---
license: mit
language:
- wo
- fr
metrics:
- bleu
pipeline_tag: translation
tags:
- text-generation-inference
---

# Model Documentation: Wolof to French Translation with NLLB-200

## Model Overview

This document describes a machine translation model fine-tuned from Meta's NLLB-200 for translating between Wolof and French. The model, hosted at `cifope/nllb-200-wo-fr-distilled-600M`, is based on the distilled 600M-parameter version of NLLB-200 and has been optimized specifically for translation between the Wolof and French languages.

## Dependencies

The model requires the `transformers` library by Hugging Face. Ensure that the library is installed:

```bash
pip install transformers
```

## Setup

Import the necessary classes from the `transformers` library:

```python
from transformers import AutoModelForSeq2SeqLM, NllbTokenizer
```

Initialize the model and tokenizer:

```python
model = AutoModelForSeq2SeqLM.from_pretrained('cifope/nllb-200-wo-fr-distilled-600M')
tokenizer = NllbTokenizer.from_pretrained('facebook/nllb-200-distilled-600M')
```

## Translation Functions

The two helpers below are identical except for their default language codes: `translate` defaults to French → Wolof, while `reversed_translate` defaults to Wolof → French. Both accept a single string or a list of strings.

### Translate from French to Wolof

The `translate` function translates text from French to Wolof:

```python
def translate(text, src_lang='fra_Latn', tgt_lang='wol_Latn', a=16, b=1.5, max_input_length=1024, **kwargs):
    # Tell the tokenizer which source-language tag to prepend
    tokenizer.src_lang = src_lang
    tokenizer.tgt_lang = tgt_lang
    inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=max_input_length)
    result = model.generate(
        **inputs.to(model.device),
        # Force the first generated token to be the target-language tag
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        # Cap output length as a linear function of the input length
        max_new_tokens=int(a + b * inputs.input_ids.shape[1]),
        **kwargs
    )
    return tokenizer.batch_decode(result, skip_special_tokens=True)
```

### Translate from Wolof to French

The `reversed_translate` function translates text from Wolof to French:

```python
def reversed_translate(text, src_lang='wol_Latn', tgt_lang='fra_Latn', a=16, b=1.5, max_input_length=1024, **kwargs):
    tokenizer.src_lang = src_lang
    tokenizer.tgt_lang = tgt_lang
    inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=max_input_length)
    result = model.generate(
        **inputs.to(model.device),
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_new_tokens=int(a + b * inputs.input_ids.shape[1]),
        **kwargs
    )
    return tokenizer.batch_decode(result, skip_special_tokens=True)
```

## Usage

To translate text, call `translate` or `reversed_translate` with the appropriate text and, if needed, explicit language codes. Because NLLB-200 is multilingual, other target codes (for example `eng_Latn`) can also be passed:

```python
french_text = "L'argent peut être échangé à la seule banque des îles située à Stanley"
wolof_translation = translate(french_text)
print(wolof_translation)

wolof_text = "alkaati yi tàmbali nañu xàll léegi kilifa gi ñów"
french_translation = reversed_translate(wolof_text)
print(french_translation)

wolof_text = "alkaati yi tàmbali nañu xàll léegi kilifa gi ñów"
english_translation = reversed_translate(wolof_text, tgt_lang="eng_Latn")
print(english_translation)
```
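Because the tokenizer call in the helpers uses `padding=True`, a list of sentences can be passed and a list of translations comes back. The snippet below is a minimal sketch of batched translation with optional GPU placement; the example sentences are placeholders, and `torch` is assumed to be available (it is required by the PyTorch `transformers` backend this model uses).

```python
import torch

# Move the model to a GPU if one is available; the helpers read model.device
# internally, so nothing else needs to change.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# padding=True in the tokenizer call lets the helpers translate a whole batch at once.
french_sentences = [  # placeholder sentences for illustration
    "Bonjour, comment allez-vous ?",
    "Le marché ouvre à huit heures.",
]
print(translate(french_sentences))  # -> list of Wolof translations, one per input
```

The model card lists BLEU as its metric. The sketch below shows one way such a score could be computed with the `sacrebleu` package (`pip install sacrebleu`); the test sentences and reference translations here are placeholders, not the actual evaluation data used for this model.

```python
import sacrebleu

# Placeholder parallel test data: Wolof sources and gold French references.
wolof_sources = ["alkaati yi tàmbali nañu xàll léegi kilifa gi ñów"]
french_references = ["<gold French translation goes here>"]

# Translate the sources, then score the hypotheses against the references.
hypotheses = reversed_translate(wolof_sources)
bleu = sacrebleu.corpus_bleu(hypotheses, [french_references])
print(f"BLEU: {bleu.score:.2f}")
```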