sahajBERT-NCC / README.md
SaulLu's picture
Change widget text
0225852
---
language: bn
tags:
- collaborative
- bengali
- SequenceClassification
license: apache-2.0
datasets: IndicGlue
metrics:
- Loss
- Accuracy
- Precision
- Recall
widget:
- text: "এশিয়ায় প্রথম দৃষ্টিহীন ব্যক্তির মাউন্ট এভারেস্ট জয়|"
---
# sahajBERT News Article Classification
## Model description
[sahajBERT](https://huggingface.co/neuropark/sahajBERT) fine-tuned for news article classification using the `sna.bn` split of [IndicGlue](https://huggingface.co/datasets/indic_glue).
The model is trained for classifying articles into 5 different classes:
| Label id | Label |
|:--------:|:----:|
|0 | kolkata|
|1 | state|
|2 | national|
|3 | sports|
|4 | entertainment|
|5 | international|
## Intended uses & limitations
#### How to use
You can use this model directly with a pipeline for Sequence Classification:
```python
from transformers import AlbertForSequenceClassification, TextClassificationPipeline, PreTrainedTokenizerFast
# Initialize tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NCC")
# Initialize model
model = AlbertForSequenceClassification.from_pretrained("neuropark/sahajBERT-NCC")
# Initialize pipeline
pipeline = TextClassificationPipeline(tokenizer=tokenizer, model=model)
raw_text = "এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।" # Change me
output = pipeline(raw_text)
```
#### Limitations and bias
<!-- Provide examples of latent issues and potential remediations. -->
WIP
## Training data
The model was initialized with pre-trained weights of [sahajBERT](https://huggingface.co/neuropark/sahajBERT) at step 19519 and trained on the `sna.bn` split of [IndicGlue](https://huggingface.co/datasets/indic_glue).
## Training procedure
Coming soon!
<!-- ```bibtex
@inproceedings{...,
year={2020}
}
``` -->
## Eval results
Loss: 0.2477145493030548
Accuracy: 0.926293408929837
Macro F1: 0.9079785326650756
Recall: 0.926293408929837
Weighted F1: 0.9266428029354202
Macro Precision: 0.9109938492260489
Micro Precision: 0.926293408929837
Weighted Precision: 0.9288535478995414
Macro Recall: 0.9069095007692186
Micro Recall: 0.926293408929837
Weighted Recall: 0.926293408929837
### BibTeX entry and citation info
Coming soon!
<!-- ```bibtex
@inproceedings{...,
year={2020}
}
``` -->