SetFit with BAAI/bge-small-en-v1.5
This is a SetFit model for binary text classification (SARCASTIC vs. NON_SARCASTIC). It uses BAAI/bge-small-en-v1.5 as the Sentence Transformer embedding model and a SetFitHead instance for classification.
The model has been trained using an efficient few-shot learning technique that involves:
- Fine-tuning a Sentence Transformer with contrastive learning.
- Training a classification head with features from the fine-tuned Sentence Transformer.
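The contrastive step above trains on sentence pairs rather than single sentences. The following is a minimal sketch of how such pairs can be built from labeled examples, assuming a target of 1.0 for same-label pairs and 0.0 for different-label pairs. The function name and the toy sentences are hypothetical, and SetFit's own sampler differs in detail (for example, it can oversample to balance positive and negative pairs).

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Pair every two examples; target is 1.0 for same label, 0.0 otherwise."""
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(zip(texts, labels), 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Hypothetical toy data, not taken from the training set.
texts = [
    "great, another monday",
    "i love waiting in line",
    "the train leaves at 9",
    "my phone needs an update",
]
labels = ["SARCASTIC", "SARCASTIC", "NON_SARCASTIC", "NON_SARCASTIC"]

pairs = make_contrastive_pairs(texts, labels)
for text_a, text_b, target in pairs:
    print(f"{target:.1f}  {text_a!r} <-> {text_b!r}")
```

With a similarity-based loss, the embedding model is then pushed to place same-label sentences close together and different-label sentences far apart, which is what makes the features useful for the classification head.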
Model Details
Model Description
- Model Type: SetFit
- Sentence Transformer body: BAAI/bge-small-en-v1.5
- Classification head: a SetFitHead instance
- Maximum Sequence Length: 512 tokens
- Number of Classes: 2
Model Labels
NON_SARCASTIC examples:
- 'so the newer devices have the ios screenshot i m still on ios but my ipad mini 1 st gen shows the ios screenshot . odd .'
- 'why do amazon need a test authorisation when i add a new payment card , as well as the authorisation they get when i actually use it ?'
- 'waterboarding sounds like a lot of fun until you find out what it is'

SARCASTIC examples:
- "have you been reading long ? you are not very good at it . it has nothing to do with who i like , especially since i am not a fan of corbyn anyway . it ' s that in one case someone was literally slapped in the face , and in the other someone wore a milkshake . battery > being annoying"
- 'wish one of the many people dressed as killers were actually one n killed me'
- 'is it even christmas if there isn t a fight with neighbours and a broken wrist ?'
Evaluation
Metrics
| Label | Accuracy | F1 | Precision | Recall |
|:------|:---------|:-------|:----------|:-------|
| all | 0.6618 | 0.3952 | 0.2891 | 0.6242 |
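F1 is the harmonic mean of precision and recall, so the reported F1 can be checked against the other two reported values. A minimal sketch (the function name is illustrative):

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values taken from the metrics reported above.
precision = 0.2891
recall = 0.6242

f1 = f1_from_precision_recall(precision, recall)
print(f"F1 = {f1:.4f}")
```

The result agrees with the reported F1 to four decimal places. The gap between recall and precision indicates that the model flags many non-sarcastic texts as sarcastic.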
Uses
Direct Use for Inference
First install the SetFit library:
pip install setfit
Then you can load this model and run inference.
from setfit import SetFitModel

# Download the model from the Hugging Face Hub
model = SetFitModel.from_pretrained("w11wo/bge-small-en-v1.5-isarcasm")
# Run inference on a single sentence
preds = model("last day in my twenties")
Training Details
Training Set Metrics
| Training set | Min | Median | Max |
|:-------------|:----|:--------|:----|
| Word count | 2 | 19.8489 | 63 |

| Label | Training Sample Count |
|:--------------|:----------------------|
| NON_SARCASTIC | 609 |
| SARCASTIC | 609 |
Training Hyperparameters
- batch_size: (256, 16)
- num_epochs: (3, 8)
- max_steps: -1
- sampling_strategy: oversampling
- body_learning_rate: (2e-05, 5e-06)
- head_learning_rate: 0.002
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: True
- use_amp: False
- warmup_proportion: 0.1
- l2_weight: 0.01
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: True
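Several of the hyperparameters above are tuples. In SetFit, a tuple value gives one setting for the embedding fine-tuning phase and one for the classifier training phase, in that order, while a single value applies to both. The sketch below splits the tuple-valued settings by phase; the helper name is hypothetical and is not part of the SetFit API.

```python
# Tuple-valued hyperparameters copied from the list above.
hyperparameters = {
    "batch_size": (256, 16),
    "num_epochs": (3, 8),
    "body_learning_rate": (2e-05, 5e-06),
}


def split_by_phase(params):
    """Return (embedding_phase, classifier_phase) settings as two dicts."""
    embedding_phase, classifier_phase = {}, {}
    for name, value in params.items():
        if isinstance(value, tuple):
            # First element: embedding phase; second: classifier phase.
            embedding_phase[name], classifier_phase[name] = value
        else:
            # A single value is used for both phases.
            embedding_phase[name] = classifier_phase[name] = value
    return embedding_phase, classifier_phase


embedding_phase, classifier_phase = split_by_phase(hyperparameters)
print("embedding phase: ", embedding_phase)
print("classifier phase:", classifier_phase)
```

Read this way, the Sentence Transformer body was fine-tuned for 3 epochs with a batch size of 256, and the head was then trained for 8 epochs with a batch size of 16.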
Training Results
| Epoch | Step | Training Loss | Validation Loss |
|:-------|:-----|:--------------|:----------------|
| 0.0003 | 1 | 0.2571 | - |
| 0.0172 | 50 | 0.251 | - |
| 0.0344 | 100 | 0.2556 | - |
| 0.0517 | 150 | 0.2513 | - |
| 0.0689 | 200 | 0.2531 | - |
| 0.0861 | 250 | 0.2518 | - |
| 0.1033 | 300 | 0.2553 | - |
| 0.1206 | 350 | 0.2501 | - |
| 0.1378 | 400 | 0.2546 | - |
| 0.1550 | 450 | 0.2506 | - |
| 0.1722 | 500 | 0.2317 | - |
| 0.1895 | 550 | 0.093 | - |
| 0.2067 | 600 | 0.0139 | - |
| 0.2239 | 650 | 0.0166 | - |
| 0.2411 | 700 | 0.0053 | - |
| 0.2584 | 750 | 0.0013 | - |
| 0.2756 | 800 | 0.0121 | - |
| 0.2928 | 850 | 0.0096 | - |
| 0.3100 | 900 | 0.0043 | - |
| 0.3272 | 950 | 0.0014 | - |
| 0.3445 | 1000 | 0.0009 | - |
| 0.3617 | 1050 | 0.0117 | - |
| 0.3789 | 1100 | 0.0144 | - |
| 0.3961 | 1150 | 0.0084 | - |
| 0.4134 | 1200 | 0.0006 | - |
| 0.4306 | 1250 | 0.0005 | - |
| 0.4478 | 1300 | 0.0081 | - |
| 0.4650 | 1350 | 0.0144 | - |
| 0.4823 | 1400 | 0.0045 | - |
| 0.4995 | 1450 | 0.0042 | - |
| 0.5167 | 1500 | 0.0005 | - |
| 0.5339 | 1550 | 0.003 | - |
| 0.5512 | 1600 | 0.0004 | - |
| 0.5684 | 1650 | 0.0005 | - |
| 0.5856 | 1700 | 0.0004 | - |
| 0.6028 | 1750 | 0.0004 | - |
| 0.6200 | 1800 | 0.0026 | - |
| 0.6373 | 1850 | 0.0004 | - |
| 0.6545 | 1900 | 0.0004 | - |
| 0.6717 | 1950 | 0.0003 | - |
| 0.6889 | 2000 | 0.0014 | - |
| 0.7062 | 2050 | 0.0004 | - |
| 0.7234 | 2100 | 0.0003 | - |
| 0.7406 | 2150 | 0.0003 | - |
| 0.7578 | 2200 | 0.0004 | - |
| 0.7751 | 2250 | 0.0003 | - |
| 0.7923 | 2300 | 0.0003 | - |
| 0.8095 | 2350 | 0.0003 | - |
| 0.8267 | 2400 | 0.0003 | - |
| 0.8440 | 2450 | 0.0003 | - |
| 0.8612 | 2500 | 0.0003 | - |
| 0.8784 | 2550 | 0.0003 | - |
| 0.8956 | 2600 | 0.0003 | - |
| 0.9128 | 2650 | 0.0003 | - |
| 0.9301 | 2700 | 0.0003 | - |
| 0.9473 | 2750 | 0.0004 | - |
| 0.9645 | 2800 | 0.0003 | - |
| 0.9817 | 2850 | 0.0003 | - |
| 0.9990 | 2900 | 0.0036 | - |
| 1.0162 | 2950 | 0.0003 | - |
| 1.0334 | 3000 | 0.0003 | - |
| 1.0506 | 3050 | 0.0002 | - |
| 1.0679 | 3100 | 0.0002 | - |
| 1.0851 | 3150 | 0.0002 | - |
| 1.1023 | 3200 | 0.0002 | - |
| 1.1195 | 3250 | 0.0002 | - |
| 1.1368 | 3300 | 0.0003 | - |
| 1.1540 | 3350 | 0.0004 | - |
| 1.1712 | 3400 | 0.0002 | - |
| 1.1884 | 3450 | 0.0002 | - |
| 1.2056 | 3500 | 0.0002 | - |
| 1.2229 | 3550 | 0.0002 | - |
| 1.2401 | 3600 | 0.0002 | - |
| 1.2573 | 3650 | 0.0009 | - |
| 1.2745 | 3700 | 0.0002 | - |
| 1.2918 | 3750 | 0.0002 | - |
| 1.3090 | 3800 | 0.0002 | - |
| 1.3262 | 3850 | 0.0002 | - |
| 1.3434 | 3900 | 0.0002 | - |
| 1.3607 | 3950 | 0.0002 | - |
| 1.3779 | 4000 | 0.0002 | - |
| 1.3951 | 4050 | 0.0002 | - |
| 1.4123 | 4100 | 0.0002 | - |
| 1.4296 | 4150 | 0.0002 | - |
| 1.4468 | 4200 | 0.0003 | - |
| 1.4640 | 4250 | 0.0002 | - |
| 1.4812 | 4300 | 0.0002 | - |
| 1.4984 | 4350 | 0.0002 | - |
| 1.5157 | 4400 | 0.0002 | - |
| 1.5329 | 4450 | 0.0002 | - |
| 1.5501 | 4500 | 0.0002 | - |
| 1.5673 | 4550 | 0.0002 | - |
| 1.5846 | 4600 | 0.0002 | - |
| 1.6018 | 4650 | 0.0002 | - |
| 1.6190 | 4700 | 0.0002 | - |
| 1.6362 | 4750 | 0.0002 | - |
| 1.6535 | 4800 | 0.0002 | - |
| 1.6707 | 4850 | 0.0002 | - |
| 1.6879 | 4900 | 0.0002 | - |
| 1.7051 | 4950 | 0.0002 | - |
| 1.7224 | 5000 | 0.0003 | - |
| 1.7396 | 5050 | 0.0002 | - |
| 1.7568 | 5100 | 0.0002 | - |
| 1.7740 | 5150 | 0.0002 | - |
| 1.7913 | 5200 | 0.0002 | - |
| 1.8085 | 5250 | 0.0002 | - |
| 1.8257 | 5300 | 0.0038 | - |
| 1.8429 | 5350 | 0.0002 | - |
| 1.8601 | 5400 | 0.0002 | - |
| 1.8774 | 5450 | 0.0002 | - |
| 1.8946 | 5500 | 0.0002 | - |
| 1.9118 | 5550 | 0.0002 | - |
| 1.9290 | 5600 | 0.0005 | - |
| 1.9463 | 5650 | 0.0002 | - |
| 1.9635 | 5700 | 0.0002 | - |
| 1.9807 | 5750 | 0.0002 | - |
| 1.9979 | 5800 | 0.0002 | - |
| 2.0152 | 5850 | 0.0001 | - |
| 2.0324 | 5900 | 0.0002 | - |
| 2.0496 | 5950 | 0.0002 | - |
| 2.0668 | 6000 | 0.0002 | - |
| 2.0841 | 6050 | 0.0002 | - |
| 2.1013 | 6100 | 0.0002 | - |
| 2.1185 | 6150 | 0.0002 | - |
| 2.1357 | 6200 | 0.0001 | - |
| 2.1529 | 6250 | 0.0002 | - |
| 2.1702 | 6300 | 0.0002 | - |
| 2.1874 | 6350 | 0.0001 | - |
| 2.2046 | 6400 | 0.0001 | - |
| 2.2218 | 6450 | 0.0001 | - |
| 2.2391 | 6500 | 0.0001 | - |
| 2.2563 | 6550 | 0.0001 | - |
| 2.2735 | 6600 | 0.0001 | - |
| 2.2907 | 6650 | 0.0001 | - |
| 2.3080 | 6700 | 0.0001 | - |
| 2.3252 | 6750 | 0.0001 | - |
| 2.3424 | 6800 | 0.0001 | - |
| 2.3596 | 6850 | 0.0001 | - |
| 2.3769 | 6900 | 0.0001 | - |
| 2.3941 | 6950 | 0.0001 | - |
| 2.4113 | 7000 | 0.0001 | - |
| 2.4285 | 7050 | 0.0001 | - |
| 2.4457 | 7100 | 0.0001 | - |
| 2.4630 | 7150 | 0.0001 | - |
| 2.4802 | 7200 | 0.0001 | - |
| 2.4974 | 7250 | 0.0001 | - |
| 2.5146 | 7300 | 0.0001 | - |
| 2.5319 | 7350 | 0.0001 | - |
| 2.5491 | 7400 | 0.0001 | - |
| 2.5663 | 7450 | 0.0001 | - |
| 2.5835 | 7500 | 0.0001 | - |
| 2.6008 | 7550 | 0.0001 | - |
| 2.6180 | 7600 | 0.0001 | - |
| 2.6352 | 7650 | 0.0001 | - |
| 2.6524 | 7700 | 0.0001 | - |
| 2.6697 | 7750 | 0.0001 | - |
| 2.6869 | 7800 | 0.0001 | - |
| 2.7041 | 7850 | 0.0001 | - |
| 2.7213 | 7900 | 0.0001 | - |
| 2.7385 | 7950 | 0.0001 | - |
| 2.7558 | 8000 | 0.0001 | - |
| 2.7730 | 8050 | 0.0001 | - |
| 2.7902 | 8100 | 0.0001 | - |
| 2.8074 | 8150 | 0.0001 | - |
| 2.8247 | 8200 | 0.0001 | - |
| 2.8419 | 8250 | 0.0001 | - |
| 2.8591 | 8300 | 0.0001 | - |
| 2.8763 | 8350 | 0.0001 | - |
| 2.8936 | 8400 | 0.0001 | - |
| 2.9108 | 8450 | 0.0001 | - |
| 2.9280 | 8500 | 0.0001 | - |
| 2.9452 | 8550 | 0.0001 | - |
| 2.9625 | 8600 | 0.0001 | - |
| 2.9797 | 8650 | 0.0001 | - |
| 2.9969 | 8700 | 0.0001 | - |
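The Epoch and Step columns together imply the number of optimizer steps per epoch. A rough estimate from the last logged row, assuming the Epoch column is simply the step count divided by the steps per epoch:

```python
# Last logged row of the training results: (epoch, step).
last_epoch, last_step = 2.9969, 8700

# Estimated optimizer steps in one full epoch of embedding fine-tuning.
steps_per_epoch = last_step / last_epoch
print(f"about {round(steps_per_epoch)} steps per epoch")
```

Because the logged epoch values are rounded to four decimals, this is an estimate rather than an exact count.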
Framework Versions
- Python: 3.10.12
- SetFit: 1.0.1
- Sentence Transformers: 2.2.2
- Transformers: 4.32.0
- PyTorch: 2.1.1+cu121
- Datasets: 2.14.5
- Tokenizers: 0.13.3
Citation
BibTeX
@article{https://doi.org/10.48550/arxiv.2209.11055,
doi = {10.48550/ARXIV.2209.11055},
url = {https://arxiv.org/abs/2209.11055},
author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
title = {Efficient Few-Shot Learning Without Prompts},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}