tomaarsen HF staff
Add new SentenceTransformer model.
67a57b4 verified
metadata
language:
  - en
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - loss:CosineSimilarityLoss
base_model: sentence-transformers/all-mpnet-base-v2
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
widget:
  - source_sentence: A boy is vacuuming.
    sentences:
      - A little boy is vacuuming the floor.
      - A woman is riding an elephant.
      - People are sitting on benches.
  - source_sentence: A man shoots a man.
    sentences:
      - The man is aiming a gun.
      - A man is tracking in the wood.
      - A woman leading a white horse.
  - source_sentence: A plane in the sky.
    sentences:
      - A plane rides on a road.
      - A tiger walks around aimlessly.
      - Two dogs playing on the shore.
  - source_sentence: A baby is laughing.
    sentences:
      - The baby laughed in his car seat.
      - A toddler walks down a hallway.
      - There are dogs in the forest.
  - source_sentence: The gate is yellow.
    sentences:
      - The gate is blue.
      - US spends $50m on carp invasion
      - Suicide bomber strikes in Syria
pipeline_tag: sentence-similarity
co2_eq_emissions:
  emissions: 9.73131270828096
  energy_consumed: 0.025035406836808046
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.122
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
  - name: SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev
          type: sts-dev
        metrics:
          - type: pearson_cosine
            value: 0.9105652572605438
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.9097842782963139
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8999692728646553
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.909018931820409
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.9003677259034385
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.9097842782963139
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.9105652590717077
            name: Pearson Dot
          - type: spearman_dot
            value: 0.9097842782963139
            name: Spearman Dot
          - type: pearson_max
            value: 0.9105652590717077
            name: Pearson Max
          - type: spearman_max
            value: 0.9097842782963139
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.8764756843077764
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8733461504859822
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8668031220817161
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8725075805222068
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8674774784108314
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8733464312456004
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.8764756858675475
            name: Pearson Dot
          - type: spearman_dot
            value: 0.8733464312456004
            name: Spearman Dot
          - type: pearson_max
            value: 0.8764756858675475
            name: Pearson Max
          - type: spearman_max
            value: 0.8733464312456004
            name: Spearman Max

SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-mpnet-base-v2 on the sentence-transformers/stsb dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
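
The Pooling and Normalize stages listed above can be illustrated without the library. The following is a minimal sketch in plain Python, assuming mean pooling over non-padding tokens followed by L2 normalization; the function names and the toy 4-dimensional vectors are hypothetical and stand in for the real 768-dimensional token embeddings.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in summed]

def l2_normalize(vector, eps=1e-12):
    """Scale the vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]

# Toy input: three token positions, the last one is padding.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 2.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)   # average of the two real tokens
embedding = l2_normalize(pooled)   # unit-length sentence embedding
```

Because the final stage normalizes every embedding to unit length, dot product and cosine similarity give the same ranking for this model.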

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/all-mpnet-base-v2-sts")
# Run inference
sentences = [
    'The gate is yellow.',
    'The gate is blue.',
    'US spends $50m on carp invasion',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings)
print(similarities.shape)
# torch.Size([3, 3])
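
To make the shape of the similarity output concrete, here is a small library-free sketch of a pairwise cosine similarity matrix. It assumes cosine similarity as the scoring function; the helper names and the 2-dimensional toy vectors are hypothetical and only stand in for real 768-dimensional embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_matrix(vectors):
    """Pairwise scores: entry [i][j] compares vectors[i] with vectors[j]."""
    return [[cosine(u, v) for v in vectors] for u in vectors]

# Three toy unit-length "embeddings".
toy_embeddings = [[0.6, 0.8], [0.8, 0.6], [-0.8, 0.6]]
scores = similarity_matrix(toy_embeddings)

for row in scores:
    print([round(value, 2) for value in row])
```

For N input sentences the result is an N x N matrix whose diagonal holds each sentence compared with itself.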

Evaluation

Metrics

Semantic Similarity (dataset: sts-dev)

Metric              Value
pearson_cosine      0.9106
spearman_cosine     0.9098
pearson_manhattan   0.9000
spearman_manhattan  0.9090
pearson_euclidean   0.9004
spearman_euclidean  0.9098
pearson_dot         0.9106
spearman_dot        0.9098
pearson_max         0.9106
spearman_max        0.9098

Semantic Similarity (dataset: sts-test)

Metric              Value
pearson_cosine      0.8765
spearman_cosine     0.8733
pearson_manhattan   0.8668
spearman_manhattan  0.8725
pearson_euclidean   0.8675
spearman_euclidean  0.8733
pearson_dot         0.8765
spearman_dot        0.8733
pearson_max         0.8765
spearman_max        0.8733
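
The pearson_* and spearman_* values above are correlation coefficients between model similarity scores and gold similarity labels. The sketch below shows, in plain Python, how the two coefficients can be computed; it is an illustration under that assumption, not the evaluator's actual code, and the `gold` and `predicted` lists are made-up toy values.

```python
import math

def pearson(x, y):
    """Pearson correlation: linear agreement between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

gold = [1.0, 0.76, 0.2, 0.0]          # toy gold labels
predicted = [0.95, 0.80, 0.50, 0.10]  # toy model similarities

print(round(pearson(gold, predicted), 4), round(spearman(gold, predicted), 4))
```

Spearman only looks at the ordering of the pairs, so it reaches 1.0 whenever the model ranks pairs in the same order as the labels, even if the raw scores differ.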

Training Details

Training Dataset

sentence-transformers/stsb

  • Dataset: sentence-transformers/stsb at ab7a5ac
  • Size: 5,749 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    column     type    min       mean         max
    sentence1  string  6 tokens  10.0 tokens  28 tokens
    sentence2  string  5 tokens  9.95 tokens  25 tokens
    score      float   0.0       0.54         1.0
  • Samples:
    sentence1 | sentence2 | score
    A plane is taking off. | An air plane is taking off. | 1.0
    A man is playing a large flute. | A man is playing a flute. | 0.76
    A man is spreading shreded cheese on a pizza. | A man is spreading shredded cheese on an uncooked pizza. | 0.76
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
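
As a rough illustration of the loss named above, the sketch below computes the mean squared error between the cosine similarity of each sentence pair and its gold score. It is a plain-Python approximation under that assumption, not the library implementation; the function names and the toy 2-dimensional embeddings are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_similarity_mse(pairs, labels):
    """Mean squared error between predicted cosine scores and gold scores."""
    predictions = [cosine(u, v) for u, v in pairs]
    squared_errors = [(p - y) ** 2 for p, y in zip(predictions, labels)]
    return sum(squared_errors) / len(squared_errors)

# Toy batch: identical vectors (cosine 1.0) and orthogonal vectors (cosine 0.0).
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
labels = [1.0, 0.5]

loss = cosine_similarity_mse(pairs, labels)
print(loss)
```

The first pair matches its label exactly and contributes nothing; the second pair is off by 0.5, so the whole loss comes from it.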
    

Evaluation Dataset

sentence-transformers/stsb

  • Dataset: sentence-transformers/stsb at ab7a5ac
  • Size: 1,500 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    column     type    min       mean          max
    sentence1  string  5 tokens  15.1 tokens   45 tokens
    sentence2  string  6 tokens  15.11 tokens  53 tokens
    score      float   0.0       0.47          1.0
  • Samples:
    sentence1 | sentence2 | score
    A man with a hard hat is dancing. | A man wearing a hard hat is dancing. | 1.0
    A young child is riding a horse. | A child is riding a horse. | 0.95
    A man is feeding a mouse to a snake. | The man is feeding a mouse to the snake. | 1.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 4
  • warmup_ratio: 0.1
  • fp16: True
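
These settings, together with the 5,749 training samples, fix the length of the run. The arithmetic below is a hedged sketch: it assumes a single device, no gradient accumulation, a final partial batch that is kept, the 5e-05 learning rate from the full hyperparameter list, and a linear schedule with warmup written out by hand rather than taken from the library. The variable and function names are illustrative.

```python
import math

train_samples = 5749
batch_size = 16       # per_device_train_batch_size
epochs = 4            # num_train_epochs
warmup_ratio = 0.1
peak_lr = 5e-5        # learning_rate from the full hyperparameter list

# The last, smaller batch is kept, so round up.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def linear_schedule_lr(step):
    """Linear ramp up to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / (total_steps - warmup_steps))

epoch_at_step_100 = 100 / steps_per_epoch

print(steps_per_epoch, total_steps, warmup_steps)
print(round(epoch_at_step_100, 4))
```

The resulting total of 1440 steps and the fractional epoch at step 100 agree with the Step and Epoch columns reported in the Training Logs.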

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: False
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: None
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch   Step  Training Loss  Validation Loss  sts-dev_spearman_cosine  sts-test_spearman_cosine
0.2778  100   0.0218         0.0210           0.8939                   -
0.5556  200   0.0203         0.0190           0.8990                   -
0.8333  300   0.019          0.0183           0.9021                   -
1.1111  400   0.0147         0.0190           0.9033                   -
1.3889  500   0.0092         0.0187           0.9038                   -
1.6667  600   0.0089         0.0180           0.9031                   -
1.9444  700   0.0089         0.0184           0.9045                   -
2.2222  800   0.0056         0.0181           0.9066                   -
2.5     900   0.0045         0.0182           0.9075                   -
2.7778  1000  0.0047         0.0179           0.9083                   -
3.0556  1100  0.0045         0.0179           0.9090                   -
3.3333  1200  0.003          0.0176           0.9088                   -
3.6111  1300  0.0029         0.0176           0.9093                   -
3.8889  1400  0.0031         0.0176           0.9098                   -
4.0     1440  -              -                -                        0.8733

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 0.025 kWh
  • Carbon Emitted: 0.010 kg of CO2
  • Hours Used: 0.122 hours
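
The figures above can be cross-checked with a little arithmetic. This sketch uses the unrounded values from the card metadata and assumes the metadata `emissions` value is in grams of CO2, which is consistent with the 0.010 kg reported here. The derived quantities are approximate because the hours value is already rounded.

```python
energy_kwh = 0.025035406836808046   # energy_consumed from the metadata
emissions_g = 9.73131270828096      # emissions from the metadata (assumed grams)
hours = 0.122                       # hours_used from the metadata

emissions_kg = emissions_g / 1000
avg_power_w = energy_kwh / hours * 1000        # average draw in watts
intensity_g_per_kwh = emissions_g / energy_kwh # implied grid carbon intensity

print(round(emissions_kg, 3))
print(round(avg_power_w), round(intensity_g_per_kwh))
```

The implied average draw of roughly 200 W and carbon intensity of a few hundred grams per kWh are derived values, not numbers reported by CodeCarbon in this card.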

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.0.0.dev0
  • Transformers: 4.41.0.dev0
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.26.1
  • Datasets: 2.18.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}