
T5-french-base Model

Model Overview

T5-French-Base is a T5 model with roughly 250M parameters, trained entirely from scratch on the French portion of the RedPajama 2 dataset. It was pre-trained for 85,000 steps with no supervised training, so it must be fine-tuned before it can be used on a downstream task. It is intended as a foundation for fine-tuning on French-language tasks. Because the training compute budget was very limited, the model is mainly useful for research.
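As a starting point for such fine-tuning, the checkpoint can probably be loaded with the Hugging Face `transformers` library. The sketch below is a minimal example under assumptions: the repository name is taken from the citation URL on this card, and it assumes the repository ships a `transformers`-compatible config and tokenizer. The helper name `load_model` is hypothetical.

```python
# Repository name taken from the citation URL on this card.
MODEL_ID = "guillaumephd/t5-french-base"


def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and the pre-trained seq2seq model.

    Assumes `transformers`, `torch` and `sentencepiece` are installed
    and that the Hugging Face Hub is reachable.
    """
    # Imported lazily so that defining this helper has no heavy dependency.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


# Example usage (downloads the weights on first call):
#   tokenizer, model = load_model()
#   n_params = sum(p.numel() for p in model.parameters())
```

The returned model is the raw pre-trained checkpoint, so its outputs are not expected to be useful until it has been fine-tuned.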

Model Details

  • Model Architecture: T5 Base, version 1.1 (GEGLU activation in feed-forward hidden layer, rather than ReLU)
  • Training Dataset: RedPajama 2 dataset (French-only)
  • Training Steps: 85,000 (from scratch)
  • Tokenizer: T5 Tokenizer
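
To illustrate the architecture difference noted above, here is a small pure-Python sketch contrasting a ReLU feed-forward layer with a GEGLU one. It is an illustration only, not this model's code: the function names are hypothetical, the weights are plain nested lists, and the tanh approximation of GELU is an assumption about which GELU variant is used.

```python
import math


def gelu(x: float) -> float:
    """Tanh approximation of GELU (assumed variant)."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def matvec(x, w):
    """Multiply a row vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def relu_ffn(x, w_in, w_out):
    """Original T5 feed-forward: one input projection followed by ReLU."""
    hidden = [max(0.0, h) for h in matvec(x, w_in)]
    return matvec(hidden, w_out)


def geglu_ffn(x, w_gate, w_linear, w_out):
    """T5 v1.1 feed-forward: a GELU-activated gate multiplies a linear projection."""
    gate = [gelu(h) for h in matvec(x, w_gate)]
    linear = matvec(x, w_linear)
    hidden = [g * l for g, l in zip(gate, linear)]
    return matvec(hidden, w_out)
```

The GEGLU variant uses two input projections instead of one, which is why its hidden layer needs an extra weight matrix.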

Intended Use

T5-French-Base is intended for research use only, as a pre-trained starting point for fine-tuning on specific French-language tasks such as:

  • French text generation
  • French question answering
  • French language understanding
  • French text summarization
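
Fine-tuning a T5 model on any of these tasks usually means casting each example as an input string and a target string. The sketch below shows one possible way to do that. The task prefixes and the `make_example` helper are hypothetical choices for illustration; the model was not pre-trained with them, so any consistent convention could be used instead.

```python
# Hypothetical task prefixes, chosen for illustration only.
TASK_PREFIXES = {
    "summarization": "résumer: ",
    "question_answering": "question: ",
}


def make_example(task: str, source: str, target: str) -> dict:
    """Cast one supervised example into a text-to-text pair."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task}")
    return {"input_text": TASK_PREFIXES[task] + source, "target_text": target}


example = make_example(
    "summarization",
    "Le conseil municipal a voté hier le budget de la ville.",
    "Le budget municipal a été voté.",
)
```

Each pair would then be tokenized and passed to a sequence-to-sequence training loop, with `target_text` providing the labels.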

Limitations

T5-French-Base may not be suitable for user-facing or production applications; it is meant mainly for researchers. It was trained entirely from scratch on a very limited budget (85k steps, ~250M parameters, final loss of ~1.1). It is a base model that has not been fine-tuned, so it does NOT follow instructions. Because it was trained solely on French data, it is not expected to work for tasks that require cross-lingual or multilingual capabilities.
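
To see why a base T5 checkpoint does not follow instructions, it can help to look at the kind of example it is typically pre-trained on. This card does not state the pre-training objective, so the sketch below assumes the standard T5 span-corruption objective and the usual `<extra_id_N>` sentinel tokens. The `span_corrupt` helper is hypothetical and works on whitespace-split words rather than real tokenizer output.

```python
def span_corrupt(tokens, spans):
    """Replace each (start, end) span with a sentinel and return (input, target) strings.

    Assumes the spans are non-overlapping; `end` is exclusive.
    """
    inputs, targets = [], []
    cursor = 0
    for sentinel_id, (start, end) in enumerate(sorted(spans)):
        sentinel = f"<extra_id_{sentinel_id}>"
        inputs.extend(tokens[cursor:start])  # keep the text before the span
        inputs.append(sentinel)              # mask the span in the input
        targets.append(sentinel)             # announce the span in the target
        targets.extend(tokens[start:end])    # the model must reconstruct it
        cursor = end
    inputs.extend(tokens[cursor:])
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return " ".join(inputs), " ".join(targets)


words = "Le chat dort sur le canapé du salon".split()
corrupted, target = span_corrupt(words, [(1, 2), (5, 7)])
# corrupted: "Le <extra_id_0> dort sur le <extra_id_1> salon"
# target:    "<extra_id_0> chat <extra_id_1> canapé du <extra_id_2>"
```

Under this assumption, the model has only learned to fill in masked spans, which is why a prompt phrased as an instruction would not be answered as one.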

Ethical Considerations

T5-French-Base was trained from scratch on publicly available data. No specific biases or ethical concerns have been identified so far. However, the RedPajama 2 training data may contain biases, so researchers should carefully evaluate the model's outputs for unintended consequences.

Citation

If you use the T5-French-Base model in your work, please cite the original Google T5 model, as well as the following:

@article{guillaumeT5french,
  title={T5-French-Base model: A T5 model trained on french data only},
  author={guillaumephd},
  url={https://huggingface.co/guillaumephd/t5-french-base},
  year={2024}
}

Model size: 248M params (Safetensors, tensor type F32)
