Why do some embedding models contain "llama" in their names?

#4
by terilias - opened

Hello,

I was reading the README file in the GitHub repository of the models and noticed that in the "English STS Results" section (https://github.com/SeanLee97/AnglE#english-sts-results), certain AnglE models include the name of the Llama LLM, such as "SeanLee97/angle-llama-7b-nli-v2". I'm curious about the significance of this. I attempted to glean insights from the associated paper, but I find myself confused. If I interpret it correctly, the authors utilized Llama 2 to generate the training text dataset. Is this the only reason the LLM's name is included in the model name? Furthermore, does this imply that these AnglE models are more effective when paired specifically with Llama 2 as the LLM in RAG applications?

Hi @terilias, there are two different usages of LLaMA in our paper, as follows:

  1. First usage: we use LLaMA as a feature extractor to encode text and obtain the corresponding text embeddings. Subsequently, we fine-tune these embeddings using AnglE optimization. This is the primary usage of LLaMA in our paper, and the models listed in the English STS Results belong to this setting. Here, LLaMA's parameters are optimized (see the loading/encoding sketch after this list).

  2. Second usage: we maintain the original function of LLaMA and utilize it as an annotator to generate supervised training data through prompt engineering. We refer to this approach as LLM-supervised. In this setting, LLaMA's parameters remain fixed, since we only use it for text generation. We introduce this method because AnglE is a supervised approach, but in domain-specific applications supervised data is often limited; LLM-supervised annotation helps address this by providing additional labeled data (an illustrative prompting sketch also follows below).
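For the first usage, here is a minimal sketch of how such a model is loaded and used purely as an embedding encoder, following the pattern shown in the AnglE README (the angle_emb API may differ between versions, so treat this as illustrative rather than definitive):

```python
# pip install angle-emb
from angle_emb import AnglE, Prompts

# Load the Llama 2 backbone plus the AnglE-tuned LoRA weights on top of it.
angle = AnglE.from_pretrained(
    'NousResearch/Llama-2-7b-hf',
    pretrained_lora_path='SeanLee97/angle-llama-7b-nli-v2',
)
# LLaMA-based AnglE models expect a prompt template wrapped around the input text.
angle.set_prompt(prompt=Prompts.A)

# encode() returns sentence embeddings (a numpy array here), not generated text.
vec = angle.encode({'text': 'hello world'}, to_numpy=True)
print(vec.shape)
```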
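And for the second usage, a purely illustrative sketch of LLM-supervised labeling. This is not the exact prompt, scale, or model from the paper; the prompt wording, the 0-5 scale, and the chat model name are assumptions made for the example:

```python
from transformers import pipeline

# Any instruction-following LLM could play the annotator role; Llama-2-chat is
# used here only as a placeholder (it is a gated model on the Hugging Face Hub).
annotator = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")

# Hypothetical labeling prompt, not the paper's actual template.
prompt = (
    "Rate the semantic similarity of the two sentences on a scale from 0 to 5.\n"
    "Sentence 1: A man is playing a guitar.\n"
    "Sentence 2: Someone is playing an instrument.\n"
    "Score:"
)
# The generated score becomes a pseudo-label for supervised AnglE training;
# the annotator's own weights are never updated.
out = annotator(prompt, max_new_tokens=5, do_sample=False)
print(out[0]["generated_text"])
```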

WhereIsAI org

B.T.W., the models like SeanLee97/angle-llama-xxx were only trained on NLI datasets. Although they can achieve SOTA on STS tasks, we cannot guarantee that they generalize well to other tasks. Therefore, if you want to use them in real applications, it is recommended to use WhereIsAI/UAE-Large-V1 instead.
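A minimal usage sketch for that model, again following the AnglE README pattern (the exact angle_emb API may have changed since):

```python
# pip install angle-emb
from angle_emb import AnglE

# UAE-Large-V1 is a BERT-style encoder, so no LoRA weights or prompt template
# are needed for plain sentence embeddings.
angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls')
vecs = angle.encode(['hello world', 'another sentence'], to_numpy=True)
print(vecs.shape)
```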

Thank you for your response and your time, @SeanLee97! Based on your feedback, it seems that my assumption, namely that it's preferable to use Llama 2 as the text generator (when I use these AnglE models) rather than other LLMs in RAG applications, is incorrect. Am I right?

WhereIsAI org

Yes, that assumption is incorrect.
Our models like SeanLee97/angle-llama-xxx can be used as feature generators for producing sentence embeddings for vector search.
They cannot be used as LLMs for text generation.
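To make the distinction concrete, here is a small, self-contained sketch of what "feature generator for vector search" means in practice: the embedding model only turns text into vectors, and retrieval is then plain cosine similarity over those vectors (the helper names below are invented for this example):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between two batches of row vectors."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k documents most similar to the query.

    query_vec has shape (1, dim) and doc_vecs has shape (num_docs, dim),
    both produced by an embedding model such as the AnglE encoders above.
    Answer generation in a RAG pipeline is handled by a separate,
    independently chosen LLM.
    """
    scores = cosine_similarity(query_vec, doc_vecs)[0]
    return np.argsort(-scores)[:k]
```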

Ok, I understand. I thought there was a connection between the embedding model selected for vector search and the LLM used for text generation. However, I now realize that these two components are entirely independent, even though an LLM was utilized in training the embedding model. Thanks a lot for clarifying!

WhereIsAI org

Thanks again for following our work!

SeanLee97 changed discussion status to closed
