papluca committed on
Commit
9865598
1 Parent(s): 8442e3d

Add model usage info

Files changed (1)
  1. README.md +45 -0
README.md CHANGED
@@ -103,6 +103,51 @@ As a baseline to compare `xlm-roberta-base-language-detection` against, we have
  |vi |0.971 |0.990 |0.980 |500 |
  |zh |1.000 |1.000 |1.000 |500 |
 
+ ## How to get started with the model
+
+ The easiest way to use the model is via the high-level `pipeline` API:
+
+ ```python
+ from transformers import pipeline
+
+ text = [
+     "Brevity is the soul of wit.",
+     "Amor, ch'a nullo amato amar perdona."
+ ]
+
+ model_ckpt = "papluca/xlm-roberta-base-language-detection"
+ pipe = pipeline("text-classification", model=model_ckpt)
+ pipe(text, top_k=1, truncation=True)
+ ```
+
+ Or one can proceed with the tokenizer and model separately:
+
+ ```python
+ import torch
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+ text = [
+     "Brevity is the soul of wit.",
+     "Amor, ch'a nullo amato amar perdona."
+ ]
+
+ model_ckpt = "papluca/xlm-roberta-base-language-detection"
+ tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
+ model = AutoModelForSequenceClassification.from_pretrained(model_ckpt)
+
+ inputs = tokenizer(text, padding=True, truncation=True, return_tensors="pt")
+
+ with torch.no_grad():
+     logits = model(**inputs).logits
+
+ preds = torch.softmax(logits, dim=-1)
+
+ # Map raw predictions to languages
+ id2lang = model.config.id2label
+ vals, idxs = torch.max(preds, dim=1)
+ {id2lang[k.item()]: v.item() for k, v in zip(idxs, vals)}
+ ```
+
  ## Training procedure
 
  Fine-tuning was done via the `Trainer` API. Here is the [Colab notebook](https://colab.research.google.com/drive/15LJTckS6gU3RQOmjLqxVNBmbsBdnUEvl?usp=sharing) with the training code.
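
A note on the label-mapping step in the second added snippet: the final dict comprehension is keyed by language, so two inputs detected as the same language would collapse into a single entry. Below is a minimal sketch of a per-input alternative. It uses plain Python in place of `torch` so that it runs standalone. The `id2label` subset, the third example sentence, and the logits are made-up illustration values, not model output, and `softmax` and `top_predictions` are hypothetical helpers that are not part of the commit.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_predictions(texts, batch_logits, id2label):
    """Return one (text, label, score) tuple per input, preserving input order."""
    results = []
    for text, logits in zip(texts, batch_logits):
        probs = softmax(logits)
        # Index of the highest-probability class for this input.
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((text, id2label[best], probs[best]))
    return results


# Hypothetical values for illustration only (not real model output).
id2label = {0: "en", 1: "it", 2: "fr"}
texts = [
    "Brevity is the soul of wit.",
    "To be, or not to be.",
    "Amor, ch'a nullo amato amar perdona.",
]
batch_logits = [
    [4.0, 0.5, 0.2],
    [3.5, 0.1, 0.4],
    [0.3, 5.0, 0.6],
]

for text, label, score in top_predictions(texts, batch_logits, id2label):
    print(f"{label}  {score:.3f}  {text}")
```

With the real model, `logits.tolist()` and `model.config.id2label` from the snippet could presumably stand in for `batch_logits` and `id2label`. Returning a list rather than a dict keeps both English sentences in the result.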