---
license: apache-2.0
language:
- bn
metrics:
- wer
- cer
tags:
- seq2seq
- ipa
- bengali
- byt5
widget:
- text: আমি সে বাবুর মামু বাড়ি গিছিলাম।
  example_title: Narail Text
- text: এখন এই কুলো তার শেষ অই কুলো তার শেষ।
  example_title: Rangpur Text
- text: খয়দে সিআরের এইল্লা কি অবস্থা!
  example_title: Chittagong Text
- text: আটাইশ করছিলাম দের কানি ক্ষেত, ইবার মাইর কাইছি।
  example_title: Kishoreganj Text
- text: তারা তো ওই খারাপ খেইলাই আসে না।
  example_title: Narsingdi Text
- text: আর সব থেকে ফানি কথা হইতেছে দেখ?
  example_title: Tangail Text
---

# Regional Bengali text to IPA transcription - umt5-base

This is a fine-tuned version of [google/umt5-base](https://huggingface.co/google/umt5-base) for generating IPA transcriptions from regional Bengali text. It was trained on the dataset of the competition [“ভাষামূল: মুখের ভাষার খোঁজে“](https://www.kaggle.com/competitions/regipa/overview) by Bengali.AI.

Scores achieved so far (test set):
- **Word error rate (wer)**: 0.02390405721962450
- **Char error rate (cer)**: 0.01011514943093060

Supported district tokens:
- Kishoreganj
- Narail
- Narsingdi
- Chittagong
- Rangpur
- Tangail

---

## Loading & using the model

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("teamapocalypseml/ben2ipa-umt5base")
model = AutoModelForSeq2SeqLM.from_pretrained("teamapocalypseml/ben2ipa-umt5base")

# The input text MUST start with one of the supported district tokens
# listed above, followed by a space and the Bengali text.
text = " bengali_text_here"
text_ids = tokenizer(text, return_tensors='pt').input_ids

# A seq2seq model needs generate() to produce the output token IDs.
output_ids = model.generate(text_ids, max_length=512)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

## Using the pipeline

```python
# Use a pipeline as a high-level helper
import torch
from transformers import pipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = pipeline("text2text-generation", model="teamapocalypseml/ben2ipa-umt5base", device=device)

# Each entry of `texts` must use the same input format as in the previous section.
texts = [" bengali_text_here"]
batch_size = 8  # illustrative value, adjust to your hardware
outputs = pipe(texts, max_length=512, batch_size=batch_size)
```

## Credits

Done by [S M Jishanul Islam](https://huggingface.co/smji), [Sadia Ahmmed](https://huggingface.co/sadiaahmmed), [Sahid Hossain Mustakim](https://huggingface.co/rhsm15)
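
## Computing WER and CER

The word and character error rates reported for this model are edit-distance metrics: the number of insertions, deletions, and substitutions needed to turn a predicted transcription into the reference, divided by the reference length. The following is a minimal sketch of that idea, not the competition's scoring script. It assumes words are split on whitespace and characters are Unicode code points (so a combining diacritic counts as its own character); the helper names `edit_distance`, `wer`, and `cer` and the sample strings are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


# Made-up strings for illustration only, not real model output.
reference = "ami bari ɡelam"
hypothesis = "ami bari ɡelum"
print(f"WER: {wer(reference, hypothesis):.4f}")
print(f"CER: {cer(reference, hypothesis):.4f}")
```

Libraries such as `jiwer` or the Hugging Face `evaluate` package offer maintained implementations of both metrics and may normalize text differently from this sketch.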