tomaarsen posted an update Feb 23
🤗 Sentence Transformers v2.4.0 for embedding models is now out! It introduces a lot of powerful features, such as:

1. Matryoshka Loss function - you can now train & perform inference on 🪆 Matryoshka Embedding models (see the training sketch after this list). See also our blogpost: https://huggingface.co/blog/matryoshka

2. CoSENTLoss & AnglELoss: state-of-the-art loss functions. These are quite interesting: as drop-in replacements, they outperform CosineSimilarityLoss on nearly all benchmarks! See also the docs: https://sbert.net/docs/package_reference/losses.html#cosentloss and the sketch after this list.

3. Prompt templates: Many popular models, such as intfloat/multilingual-e5-large and BAAI/bge-large-en-v1.5, prefix their texts with prompts, so this release adds configuration options to include them automatically: model.encode(..., prompt_name="query") applies the prompt configured under the name "query". More info in the docs: https://sbert.net/examples/applications/computing-embeddings/README.html#prompt-templates and in the sketch after this list.

4. Instructor support: Support for the INSTRUCTOR line of models, such as hkunlp/instructor-large (see the sketch after this list). Learn how to use them here: https://sbert.net/docs/pretrained_models.html#instructor-models

5. Removed NLTK & sentencepiece dependencies: Should allow for a smaller installation & a slightly faster import!

6. Updated documentation: a new Loss Overview section: https://sbert.net/docs/training/loss_overview.html and more detailed loss functions: https://sbert.net/docs/package_reference/losses.html
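Here is a minimal training sketch for the Matryoshka loss from item 1, assuming the v2.x fit() training loop; the base model, training pairs, and dimension list below are placeholders:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("microsoft/mpnet-base")  # placeholder base model
train_examples = [
    InputExample(texts=["anchor sentence", "matching sentence"]),
    InputExample(texts=["another anchor", "another match"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# Wrap any base loss so it is also applied to the truncated embeddings
base_loss = losses.MultipleNegativesRankingLoss(model)
train_loss = losses.MatryoshkaLoss(model, base_loss, matryoshka_dims=[768, 512, 256, 128, 64])

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```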
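And a sketch of CoSENTLoss as a drop-in replacement for CosineSimilarityLoss (item 2): both expect sentence pairs with a float similarity label, so only the loss line changes. The pairs and scores here are made up:

```python
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("microsoft/mpnet-base")  # placeholder base model
train_examples = [
    InputExample(texts=["A plane is taking off.", "An air plane is taking off."], label=0.95),
    InputExample(texts=["A man is playing a flute.", "A man is eating pasta."], label=0.05),
]

# train_loss = losses.CosineSimilarityLoss(model)  # before
train_loss = losses.CoSENTLoss(model)  # drop-in replacement
# train_loss = losses.AnglELoss(model)  # same data format
```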
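For the prompt templates from item 3, a sketch along these lines should work, assuming prompts can be passed to the constructor (they can also live in the model's configuration); the "query: "/"passage: " strings are the prefixes the E5 models expect:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "intfloat/multilingual-e5-large",
    prompts={"query": "query: ", "passage": "passage: "},
)
# "query: " is prepended automatically before encoding
query_embeddings = model.encode(["What is the capital of France?"], prompt_name="query")
passage_embeddings = model.encode(["Paris is the capital of France."], prompt_name="passage")
```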
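And for the INSTRUCTOR models (item 4), the instruction is passed as a free-form prompt at encode time; the instruction string and input below are just examples:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("hkunlp/instructor-large")
embeddings = model.encode(
    ["Dynamical Scalar Degree of Freedom in Horava-Lifshitz Gravity"],
    prompt="Represent the Science title: ",
)
```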

And much more! See the full release notes here: https://github.com/UKPLab/sentence-transformers/releases/tag/v2.4.0

Some more very exciting updates are still on the horizon!

Random idea for a more generalized Matryoshka (tbf it's not really a Matryoshka anymore): why not just use weights for the individual dimensions?

```python
import math
import torch

num_dims = 768  # example embedding dimensionality
# log2 needs a float tensor, hence the .float()
rel_weights = math.log2(num_dims) - (torch.arange(num_dims) + 1).float().log2() + 1 / 2
```

This would result in the same general distribution of weights over the loss as the Matryoshka as we move toward the higher-numbered dimensions, except smoothly, without the steps: each additional dimension adds less and less to the whole. It would then make sense to truncate the embedding to any arbitrary length, not just to a handful of preset sizes.
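For intuition, here is a hypothetical comparison with the implicit "step" weights of a standard Matryoshka setup, where, loosely speaking, each dimension contributes once per truncation size that still includes it; the matryoshka_dims values are placeholders:

```python
import torch

num_dims = 768
# Implicit "step" weights of a Matryoshka loss with preset truncation sizes:
# dimension i takes part in every truncation d with i < d.
matryoshka_dims = [768, 512, 256, 128, 64]
step_weights = torch.tensor(
    [sum(i < d for d in matryoshka_dims) for i in range(num_dims)],
    dtype=torch.float32,
)
# The rel_weights above decay smoothly from ~log2(768) + 0.5 down to 0.5,
# tracing the same downward trend as step_weights, just without the jumps.
```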


I've had the same idea before! I think this should work as well, but I haven't had time to do the research myself. Perhaps @SeanLee97 is interested in trying this out?
