---
language: bn
tags:
- collaborative
- bengali
- SequenceClassification
license: apache-2.0
datasets: IndicGlue 
metrics:
- Loss
- Accuracy
- Precision
- Recall
widget:

- text: "এশিয়ায় প্রথম দৃষ্টিহীন ব্যক্তির মাউন্ট এভারেস্ট জয়|"

---

# sahajBERT News Article Classification

## Model description

[sahajBERT](https://huggingface.co/neuropark/sahajBERT) fine-tuned for news article classification on the `sna.bn` subset (configuration) of [IndicGlue](https://huggingface.co/datasets/indic_glue).

The model is trained to classify articles into 6 classes:

| Label id | Label |
|:--------:|:----:|
|0 | kolkata|
|1 | state|
|2 | national|
|3 | sports|
|4 | entertainment|
|5 | international|

## Intended uses & limitations

#### How to use

You can use this model directly with a pipeline for Sequence Classification:
```python
from transformers import AlbertForSequenceClassification, TextClassificationPipeline, PreTrainedTokenizerFast

# Initialize tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NCC")

# Initialize model
model = AlbertForSequenceClassification.from_pretrained("neuropark/sahajBERT-NCC")

# Initialize pipeline
pipeline = TextClassificationPipeline(tokenizer=tokenizer, model=model)

raw_text = "এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।"  # Replace with your own text
output = pipeline(raw_text)  # List of dicts, each with a predicted `label` and its `score`
print(output)
```
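
The pipeline returns a list of dicts with `label` and `score` keys. Whether `label` holds a class name or a generic string such as `LABEL_3` depends on the `id2label` mapping stored in the model config, which is not verified here. The following is a minimal sketch, assuming the generic `LABEL_<id>` format, that maps predictions back to the names in the label table above. `ID2LABEL` and `readable_label` are hypothetical helpers, not part of the model or of `transformers`.

```python
# Label ids and names copied from the table in "Model description".
ID2LABEL = {
    0: "kolkata",
    1: "state",
    2: "national",
    3: "sports",
    4: "entertainment",
    5: "international",
}


def readable_label(prediction):
    """Return a copy of one pipeline prediction with a human-readable label.

    Assumes `prediction` looks like {"label": "LABEL_3", "score": 0.98}.
    Labels that do not follow the generic "LABEL_<id>" pattern are kept as-is.
    """
    label = prediction["label"]
    prefix = "LABEL_"
    if label.startswith(prefix) and label[len(prefix):].isdigit():
        label = ID2LABEL.get(int(label[len(prefix):]), label)
    return {**prediction, "label": label}


# Hand-written stand-in for `output` from the snippet above (not a real model result).
example_output = [{"label": "LABEL_3", "score": 0.98}]
print([readable_label(p) for p in example_output])
```

If the model config already stores class names, the helper leaves the predictions unchanged.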

#### Limitations and bias

<!-- Provide examples of latent issues and potential remediations. -->
WIP

## Training data

The model was initialized with the pre-trained weights of [sahajBERT](https://huggingface.co/neuropark/sahajBERT) at step 19519 and fine-tuned on the `sna.bn` subset (configuration) of [IndicGlue](https://huggingface.co/datasets/indic_glue).

## Training procedure

Coming soon! 

## Eval results


| Metric | Value |
|:-------|:------|
| Loss | 0.2477145493030548 |
| Accuracy | 0.926293408929837 |
| Macro F1 | 0.9079785326650756 |
| Recall | 0.926293408929837 |
| Weighted F1 | 0.9266428029354202 |
| Macro Precision | 0.9109938492260489 |
| Micro Precision | 0.926293408929837 |
| Weighted Precision | 0.9288535478995414 |
| Macro Recall | 0.9069095007692186 |
| Micro Recall | 0.926293408929837 |
| Weighted Recall | 0.926293408929837 |
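
Accuracy, Recall, Micro Precision, Micro Recall, and Weighted Recall all share the value 0.926293408929837. This is expected for single-label multiclass classification, where these quantities all reduce to the fraction of correct predictions. The sketch below illustrates the three averaging modes on hand-made toy labels (not the model's predictions). `averaged_scores` is a hypothetical helper written for this illustration, and the evaluation code actually used for the numbers above is not shown in this card.

```python
from collections import Counter


def averaged_scores(y_true, y_pred):
    """Compute accuracy plus micro, macro and weighted precision/recall."""
    labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)   # how often each class was predicted
    support = Counter(y_true)      # how often each class truly occurs

    # Per-class scores; a class with an empty denominator scores 0.0.
    precision = {c: true_pos[c] / pred_count[c] if pred_count[c] else 0.0 for c in labels}
    recall = {c: true_pos[c] / support[c] if support[c] else 0.0 for c in labels}

    n = len(y_true)
    correct = sum(true_pos.values())
    return {
        "accuracy": correct / n,
        # Micro: pool all decisions before dividing.
        "micro_precision": correct / len(y_pred),
        "micro_recall": correct / n,
        # Macro: unweighted mean of the per-class scores.
        "macro_precision": sum(precision.values()) / len(labels),
        "macro_recall": sum(recall.values()) / len(labels),
        # Weighted: per-class scores weighted by class support.
        "weighted_precision": sum(precision[c] * support[c] for c in labels) / n,
        "weighted_recall": sum(recall[c] * support[c] for c in labels) / n,
    }


# Toy example: 6 samples, 3 classes, one mistake (a class-0 sample predicted as 1).
toy_true = [0, 0, 0, 1, 1, 2]
toy_pred = [0, 0, 1, 1, 1, 2]
for name, value in averaged_scores(toy_true, toy_pred).items():
    print(f"{name}: {value:.4f}")
```

Macro averages treat every class equally, so they differ from accuracy when classes are imbalanced or unevenly hard, which is consistent with the lower macro scores reported above.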


### BibTeX entry and citation info

Coming soon! 
<!-- ```bibtex
@inproceedings{...,
  year={2020}
}
``` -->