SentenceTransformer based on sentence-transformers/stsb-distilbert-base

This is a sentence-transformers model finetuned from sentence-transformers/stsb-distilbert-base on the sentence-transformers/quora-duplicates dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
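
Only pooling_mode_mean_tokens is enabled in the Pooling module, so each sentence embedding is the average of its token embeddings. The following is a minimal plain-Python sketch of that idea, assuming padding tokens are excluded through the attention mask. The helper name mean_pool and the 3-dimensional toy vectors are hypothetical, and this is not the library's own implementation.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [total / count for total in summed]


# Toy input: three tokens (the last one is padding), 3 dimensions instead of 768.
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

The padding vector does not influence the result, which is why sentences of different lengths can share a batch.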

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/stsb-distilbert-base-ocl")
# Run inference
sentences = [
    'Is stretching bad?',
    'Is stretching good for you?',
    'If i=0; what will i=i++ do to i?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
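
The 3 x 3 result holds one score for every pair of input sentences. As a rough illustration of what such a matrix contains, the sketch below computes pairwise cosine similarity in plain Python. It assumes the model scores pairs with cosine similarity, which is consistent with the cosine_* metrics reported below but is not stated in this section. The toy 2-dimensional vectors and the helper names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Score every vector against every other vector (and itself)."""
    return [[cosine_similarity(a, b) for b in vectors] for a in vectors]


# Toy stand-ins for the three sentence embeddings.
toy_embeddings = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(score, 4) for score in row])
```

The matrix is symmetric and its diagonal is 1 up to floating-point error, because each sentence is compared with itself.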

Evaluation

Metrics

Binary Classification

Metric Value
cosine_accuracy 0.86
cosine_accuracy_threshold 0.8104
cosine_f1 0.8251
cosine_f1_threshold 0.7248
cosine_precision 0.7347
cosine_recall 0.9407
cosine_ap 0.8872
dot_accuracy 0.828
dot_accuracy_threshold 157.3549
dot_f1 0.7899
dot_f1_threshold 145.7113
dot_precision 0.7155
dot_recall 0.8814
dot_ap 0.8369
manhattan_accuracy 0.868
manhattan_accuracy_threshold 208.0035
manhattan_f1 0.8308
manhattan_f1_threshold 208.0035
manhattan_precision 0.7922
manhattan_recall 0.8733
manhattan_ap 0.8868
euclidean_accuracy 0.867
euclidean_accuracy_threshold 9.2694
euclidean_f1 0.8301
euclidean_f1_threshold 9.5257
euclidean_precision 0.7888
euclidean_recall 0.876
euclidean_ap 0.8884
max_accuracy 0.868
max_accuracy_threshold 208.0035
max_f1 0.8308
max_f1_threshold 208.0035
max_precision 0.7922
max_recall 0.9407
max_ap 0.8884
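
Each *_threshold value is the score cut-off that gave the best result for the matching metric on the evaluation set. The sketch below applies the reported cosine accuracy threshold to a score and recomputes F1 from the reported precision and recall. The helper names are hypothetical, and treating a score strictly above the threshold as "duplicate" is an assumption about the comparison direction.

```python
COSINE_ACCURACY_THRESHOLD = 0.8104  # cosine_accuracy_threshold from the table


def is_duplicate(cosine_score, threshold=COSINE_ACCURACY_THRESHOLD):
    """Classify a question pair as duplicate when its score clears the threshold."""
    return cosine_score > threshold


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(is_duplicate(0.85))
print(is_duplicate(0.75))

# Recompute cosine_f1 from cosine_precision and cosine_recall.
cosine_f1 = f1_score(0.7347, 0.9407)
print(round(cosine_f1, 3))
```

The recomputed value agrees with cosine_f1 to about three decimal places, which suggests that the reported precision and recall are taken at the F1 threshold (0.7248) and not at the accuracy threshold.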

Paraphrase Mining

Metric Value
average_precision 0.5344
f1 0.5448
precision 0.5311
recall 0.5592
threshold 0.8626

Information Retrieval

Metric Value
cosine_accuracy@1 0.928
cosine_accuracy@3 0.9712
cosine_accuracy@5 0.9782
cosine_accuracy@10 0.9874
cosine_precision@1 0.928
cosine_precision@3 0.4151
cosine_precision@5 0.2666
cosine_precision@10 0.1417
cosine_recall@1 0.7994
cosine_recall@3 0.9342
cosine_recall@5 0.9561
cosine_recall@10 0.9766
cosine_ndcg@10 0.9516
cosine_mrr@10 0.9509
cosine_map@100 0.939
dot_accuracy@1 0.8926
dot_accuracy@3 0.9518
dot_accuracy@5 0.9658
dot_accuracy@10 0.9768
dot_precision@1 0.8926
dot_precision@3 0.4027
dot_precision@5 0.2608
dot_precision@10 0.1388
dot_recall@1 0.768
dot_recall@3 0.9106
dot_recall@5 0.9402
dot_recall@10 0.9623
dot_ndcg@10 0.9264
dot_mrr@10 0.9243
dot_map@100 0.9094
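
The gap between accuracy@k and precision@k in the table follows from how the metrics are defined. Accuracy@k counts a query as a hit if any of its top k results is relevant, while precision@k divides the number of relevant results by k, so it falls as k grows when a query has only a few relevant documents. The sketch below shows the per-query calculations on a hypothetical ranked list; the document IDs and helper names are invented for illustration, and the values in the table are averages over all queries.

```python
def accuracy_at_k(ranked, relevant, k):
    """1.0 if at least one of the top-k results is relevant, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(doc in relevant for doc in ranked[:k]) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents found in the top k."""
    return sum(doc in relevant for doc in ranked[:k]) / len(relevant)


def reciprocal_rank_at_k(ranked, relevant, k):
    """1 / rank of the first relevant result within the top k, else 0.0."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


# One hypothetical query with two relevant documents.
ranked = ["d7", "d2", "d9", "d4", "d1"]
relevant = {"d7", "d4"}
for k in (1, 3, 5):
    print(
        k,
        accuracy_at_k(ranked, relevant, k),
        round(precision_at_k(ranked, relevant, k), 4),
        recall_at_k(ranked, relevant, k),
    )
print(reciprocal_rank_at_k(ranked, relevant, 10))
```

For this query, accuracy stays at 1.0 for every k while precision drops from 1.0 to 0.4, the same pattern as in the reported averages.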

Training Details

Training Dataset

sentence-transformers/quora-duplicates

  • Dataset: sentence-transformers/quora-duplicates at 451a485
  • Size: 100,000 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 6 tokens, mean 15.5 tokens, max 45 tokens
    • sentence2 (string): min 6 tokens, mean 15.46 tokens, max 78 tokens
    • label (int): 0: ~64.10%, 1: ~35.90%
  • Samples:
    • sentence1: What are the best ecommerce blogs to do guest posts on about SEO to gain new clients?
      sentence2: Interested in being a guest blogger for an ecommerce marketing blog?
      label: 0
    • sentence1: How do I learn Informatica online training?
      sentence2: What is Informatica online training?
      label: 0
    • sentence1: What effects does marijuana use have on the flu?
      sentence2: What effects does Marijuana use have on the common cold?
      label: 0
  • Loss: OnlineContrastiveLoss
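
As a rough illustration of the idea behind an online contrastive loss, the sketch below keeps only the "hard" pairs in a batch: duplicate pairs that are farther apart than the closest non-duplicate pair, and non-duplicate pairs that are closer than the farthest duplicate pair. It assumes the inputs are cosine distances (1 minus cosine similarity) and a margin of 0.5, neither of which is stated in this card. It is a simplified sketch and not the library's implementation, which also has to handle batches that lack one of the two labels.

```python
def online_contrastive_loss(distances, labels, margin=0.5):
    """Contrastive loss over the hard positive and hard negative pairs only."""
    positives = [d for d, y in zip(distances, labels) if y == 1]
    negatives = [d for d, y in zip(distances, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("this sketch needs at least one pair of each label")

    # Hard positives: duplicates farther apart than the closest non-duplicate.
    hard_positives = [d for d in positives if d > min(negatives)]
    # Hard negatives: non-duplicates closer than the farthest duplicate.
    hard_negatives = [d for d in negatives if d < max(positives)]

    # Pull hard positives together, push hard negatives out to the margin.
    positive_loss = sum(d ** 2 for d in hard_positives)
    negative_loss = sum(max(0.0, margin - d) ** 2 for d in hard_negatives)
    return positive_loss + negative_loss


# Toy batch: pair distances with label 1 (duplicate) or 0 (not duplicate).
distances = [0.1, 0.6, 0.3, 0.9]
labels = [1, 1, 0, 0]
loss = online_contrastive_loss(distances, labels)
print(round(loss, 4))
```

In the toy batch only the duplicate pair at distance 0.6 and the non-duplicate pair at distance 0.3 contribute; the two easy pairs are ignored.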

Evaluation Dataset

sentence-transformers/quora-duplicates

  • Dataset: sentence-transformers/quora-duplicates at 451a485
  • Size: 1,000 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 6 tokens, mean 15.82 tokens, max 46 tokens
    • sentence2 (string): min 6 tokens, mean 15.91 tokens, max 72 tokens
    • label (int): 0: ~62.90%, 1: ~37.10%
  • Samples:
    • sentence1: How should I prepare for JEE Mains 2017?
      sentence2: How do I prepare for the JEE 2016?
      label: 0
    • sentence1: What is the gate exam?
      sentence2: What is the GATE exam in engineering?
      label: 0
    • sentence1: Where do IRS officers get posted?
      sentence2: Does IRS Officers get posted abroad?
      label: 0
  • Loss: OnlineContrastiveLoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
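
With warmup_ratio 0.1, and with the learning_rate of 5e-05 and the linear lr_scheduler_type listed under All Hyperparameters, the learning rate ramps up over the first 10% of training and then decays linearly to zero. The sketch below reproduces that shape in plain Python. It assumes the total step count is the 100,000 training samples divided by the batch size of 64, rounded up, and the helper learning_rate_at is hypothetical.

```python
import math

TRAIN_SAMPLES = 100_000  # training set size from this card
BATCH_SIZE = 64          # per_device_train_batch_size
EPOCHS = 1               # num_train_epochs
PEAK_LR = 5e-05          # learning_rate
WARMUP_RATIO = 0.1       # warmup_ratio

# Assumed step counts: one optimizer step per batch on a single device.
total_steps = math.ceil(TRAIN_SAMPLES / BATCH_SIZE) * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def learning_rate_at(step):
    """Linear warmup to PEAK_LR, then linear decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - warmup_steps)


print(total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps):
    print(step, learning_rate_at(step))
```

Under these assumptions the peak learning rate is reached at step 157 of 1563.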

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: False
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: None
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss loss cosine_map@100 quora-duplicates-dev_average_precision quora-duplicates_max_ap
0 0 - - 0.9235 0.4200 0.7276
0.0640 100 2.5123 - - - -
0.1280 200 2.0534 - - - -
0.1599 250 - 1.7914 0.9127 0.4082 0.8301
0.1919 300 1.9505 - - - -
0.2559 400 1.9836 - - - -
0.3199 500 1.8462 1.5923 0.9190 0.4445 0.8688
0.3839 600 1.7734 - - - -
0.4479 700 1.7918 - - - -
0.4798 750 - 1.5461 0.9291 0.4943 0.8707
0.5118 800 1.6157 - - - -
0.5758 900 1.7244 - - - -
0.6398 1000 1.7322 1.5294 0.9309 0.5048 0.8808
0.7038 1100 1.6825 - - - -
0.7678 1200 1.6823 - - - -
0.7997 1250 - 1.4812 0.9351 0.5126 0.8865
0.8317 1300 1.5707 - - - -
0.8957 1400 1.6145 - - - -
0.9597 1500 1.5795 1.4705 0.9390 0.5344 0.8884
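
The Epoch column can be reproduced from the Step column. Assuming one epoch is the 100,000 training samples divided by the batch size of 64, rounded up, each logged epoch value is the step divided by that count. The sketch below checks this against a few rows of the table; the helper epoch_fraction is hypothetical.

```python
import math

# Assumed steps in one epoch: training samples / batch size, rounded up.
steps_per_epoch = math.ceil(100_000 / 64)


def epoch_fraction(step):
    """Fraction of the epoch completed after the given step."""
    return round(step / steps_per_epoch, 4)


# Step -> Epoch pairs copied from the training log.
logged = {100: 0.0640, 250: 0.1599, 500: 0.3199, 1000: 0.6398, 1500: 0.9597}
for step, epoch in logged.items():
    print(step, epoch_fraction(step), epoch)
```

The match suggests that one epoch is about 1563 steps, so the last logged step (1500) falls just before the end of the single training epoch.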

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 0.040 kWh
  • Carbon Emitted: 0.016 kg of CO2
  • Hours Used: 0.202 hours
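
The three figures above can be combined into two derived quantities: the average power draw during training and the implied carbon intensity of the electricity. The calculation below is plain arithmetic on the reported values, which are rounded, so the results are approximate.

```python
energy_kwh = 0.040  # Energy Consumed
co2_kg = 0.016      # Carbon Emitted
hours = 0.202       # Hours Used

# Average power over the run, in watts.
average_power_watts = energy_kwh / hours * 1000
# Implied carbon intensity, in kg of CO2 per kWh.
carbon_intensity = co2_kg / energy_kwh

print(round(average_power_watts))
print(round(carbon_intensity, 2))
```

This works out to roughly 198 W on average and about 0.4 kg of CO2 per kWh.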

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.0.0.dev0
  • Transformers: 4.41.0.dev0
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.26.1
  • Datasets: 2.18.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}