
bert-mini-amharic

This model has the same architecture as bert-mini and was pretrained from scratch on the Amharic subsets of the oscar and mc4 datasets, a total of 137 million tokens. The tokenizer was trained from scratch on the same corpus and has a vocabulary size of 24k. The model achieves the following results on the evaluation set:

  • Loss: 3.57
  • Perplexity: 35.52
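
For context, the reported perplexity is simply the exponential of the evaluation cross-entropy loss. A minimal sketch verifying this relationship and inspecting the published tokenizer (the exact vocabulary size printed is whatever the tokenizer on the Hub contains):

import math
from transformers import AutoTokenizer

# Perplexity of a language model is the exponential of its cross-entropy loss
print(math.exp(3.57))  # ≈ 35.52, matching the reported perplexity

# The tokenizer was trained from scratch on the same corpus
tokenizer = AutoTokenizer.from_pretrained("rasyosef/bert-mini-amharic")
print(tokenizer.vocab_size)  # expected to be around 24k per the description above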

Even though this model has only 9.7 million parameters, its performance is only slightly behind that of the 28x larger xlm-roberta-base model (279 million parameters) on the same Amharic evaluation set.
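
The parameter count can be verified directly; a minimal sketch, assuming the standard transformers API:

from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("rasyosef/bert-mini-amharic")
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.2f}M parameters")  # ≈ 9.7M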

How to use

You can use this model directly with a pipeline for masked language modeling:

>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='rasyosef/bert-mini-amharic')
>>> unmasker("ከሀገራቸው ከኢትዮጵያ ከወጡ ግማሽ ምዕተ [MASK] ተቆጥሯል።")

[{'score': 0.4713546335697174,
  'token': 9308,
  'token_str': 'ዓመት',
  'sequence': 'ከሀገራቸው ከኢትዮጵያ ከወጡ ግማሽ ምዕተ ዓመት ተቆጥሯል ።'},
 {'score': 0.25726795196533203,
  'token': 9540,
  'token_str': 'ዓመታት',
  'sequence': 'ከሀገራቸው ከኢትዮጵያ ከወጡ ግማሽ ምዕተ ዓመታት ተቆጥሯል ።'},
 {'score': 0.07067586481571198,
  'token': 10354,
  'token_str': 'አመት',
  'sequence': 'ከሀገራቸው ከኢትዮጵያ ከወጡ ግማሽ ምዕተ አመት ተቆጥሯል ።'},
 {'score': 0.07064681500196457,
  'token': 11212,
  'token_str': 'አመታት',
  'sequence': 'ከሀገራቸው ከኢትዮጵያ ከወጡ ግማሽ ምዕተ አመታት ተቆጥሯል ።'},
 {'score': 0.012558948248624802,
  'token': 10588,
  'token_str': 'ወራት',
  'sequence': 'ከሀገራቸው ከኢትዮጵያ ከወጡ ግማሽ ምዕተ ወራት ተቆጥሯል ።'}]

Fine-tuning

The following GitHub repository contains a notebook that fine-tunes this model for an Amharic text classification task.

https://github.com/rasyosef/amharic-news-category-classification
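
For orientation, here is a minimal sketch of the general fine-tuning recipe with the Trainer API. The dataset file, column names, and num_labels below are placeholders; the linked notebook is the authoritative version.

from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Hypothetical CSV with "text" and "label" columns; the linked notebook
# loads the actual Amharic news category dataset.
dataset = load_dataset("csv", data_files={"train": "amharic_news_train.csv"})

tokenizer = AutoTokenizer.from_pretrained("rasyosef/bert-mini-amharic")
model = AutoModelForSequenceClassification.from_pretrained(
    "rasyosef/bert-mini-amharic", num_labels=6)  # num_labels is an assumption

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-mini-amharic-news",
                           num_train_epochs=3,
                           per_device_train_batch_size=32),
    train_dataset=dataset["train"],
    tokenizer=tokenizer,
)
trainer.train()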

Fine-tuned Model Performance

Since this is a multi-class classification task, the reported precision, recall, and F1 metrics are macro averages; a short computation sketch follows the table below.

Model               Size (# params)   Accuracy   Precision   Recall   F1
bert-mini-amharic   9.67M             0.87       0.83        0.83     0.83
bert-small-amharic  25.7M             0.89       0.86        0.87     0.86
xlm-roberta-base    279M              0.90       0.88        0.88     0.88
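
As a reference for how these numbers are computed, a short sketch using scikit-learn's macro averaging (the label arrays are illustrative placeholders, not the actual evaluation data):

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder predictions and gold labels for a multi-class task
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
# average="macro" computes each metric per class, then takes the unweighted mean
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro")
print(accuracy, precision, recall, f1)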