Why do some embedding models contain "llama" in their names?

#4
by terilias - opened

Hello,

I was reading the README file in the GitHub repository of the models and noticed that in the "English STS Results" section (https://github.com/SeanLee97/AnglE#english-sts-results), certain AnglE models include the name of the Llama LLM, such as "SeanLee97/angle-llama-7b-nli-v2". I'm curious about the significance of this. I attempted to glean insights from the associated paper, but I find myself confused. If I interpret it correctly, the authors utilized Llama 2 to generate the training text dataset. Is this the only reason the LLM's name is included in the model name? Furthermore, does this imply that these AnglE models are more effective when paired specifically with Llama 2 as the LLM in RAG applications?

Hi @terilias, there are two different usages of LLaMA in our paper, as follows:

  1. First usage: we use LLaMA as a feature extractor to encode text and obtain the corresponding text embeddings. Subsequently, we fine-tune these embeddings using AnglE optimization. This is the primary usage of LLaMA in our paper, and the models listed in the English STS Results belong to this setting. Here, LLaMA's parameters are optimized (see the loading/encoding sketch after this list).

  2. Second usage: we maintain the original function of LLaMA and utilize it as an annotator to generate supervised training data through prompt engineering. We refer to this approach as LLM-supervised. In this setting, LLaMA's parameters remain fixed, since we only use it for text generation. We introduce this method because AnglE is a supervised approach, but in domain-specific applications supervised data is often limited; LLM-supervised annotation helps address this by providing additional labeled data (an illustrative prompting sketch also follows below).
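For the first usage, here is a minimal sketch of how such a model is loaded and used purely as an embedding encoder, following the pattern shown in the AnglE README (the angle_emb API may differ between versions, so treat this as illustrative rather than definitive):

```python
# pip install angle-emb
from angle_emb import AnglE, Prompts

# Load the Llama 2 backbone plus the AnglE-tuned LoRA weights on top of it.
angle = AnglE.from_pretrained(
    'NousResearch/Llama-2-7b-hf',
    pretrained_lora_path='SeanLee97/angle-llama-7b-nli-v2',
)
# LLaMA-based AnglE models expect a prompt template wrapped around the input text.
angle.set_prompt(prompt=Prompts.A)

# encode() returns sentence embeddings (a numpy array here), not generated text.
vec = angle.encode({'text': 'hello world'}, to_numpy=True)
print(vec.shape)
```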
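And for the second usage, a purely illustrative sketch of LLM-supervised labeling. This is not the exact prompt, scale, or model from the paper; the prompt wording, the 0-5 scale, and the chat model name are assumptions made for the example:

```python
from transformers import pipeline

# Any instruction-following LLM could play the annotator role; Llama-2-chat is
# used here only as a placeholder (it is a gated model on the Hugging Face Hub).
annotator = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")

# Hypothetical labeling prompt, not the paper's actual template.
prompt = (
    "Rate the semantic similarity of the two sentences on a scale from 0 to 5.\n"
    "Sentence 1: A man is playing a guitar.\n"
    "Sentence 2: Someone is playing an instrument.\n"
    "Score:"
)
# The generated score becomes a pseudo-label for supervised AnglE training;
# the annotator's own weights are never updated.
out = annotator(prompt, max_new_tokens=5, do_sample=False)
print(out[0]["generated_text"])
```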

WhereIsAI org

B.T.W., the models like SeanLee97/angle-llama-xxx were only trained on NLI datasets. Although they can achieve SOTA on STS tasks, we cannot guarantee that they generalize well to other tasks. Therefore, if you want to use them in real applications, it is recommended to use WhereIsAI/UAE-Large-V1 instead.
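A minimal usage sketch for that model, again following the AnglE README pattern (the exact angle_emb API may have changed since):

```python
# pip install angle-emb
from angle_emb import AnglE

# UAE-Large-V1 is a BERT-style encoder, so no LoRA weights or prompt template
# are needed for plain sentence embeddings.
angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls')
vecs = angle.encode(['hello world', 'another sentence'], to_numpy=True)
print(vecs.shape)
```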

Thank you for your response and your time, @SeanLee97! Based on your feedback, it seems that my assumption, namely that it's preferable to use Llama 2 as the text generator (when I use these AnglE models) rather than other LLMs in RAG applications, is incorrect. Am I right?

WhereIsAI org

Yes, that assumption is incorrect.
Our models like SeanLee97/angle-llama-xxx can be used as feature generators for producing sentence embeddings for vector search.
They cannot be used as LLMs for text generation.
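To make the distinction concrete, here is a small, self-contained sketch of what "feature generator for vector search" means in practice: the embedding model only turns text into vectors, and retrieval is then plain cosine similarity over those vectors (the helper names below are invented for this example):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between two batches of row vectors."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k documents most similar to the query.

    query_vec has shape (1, dim) and doc_vecs has shape (num_docs, dim),
    both produced by an embedding model such as the AnglE encoders above.
    Answer generation in a RAG pipeline is handled by a separate,
    independently chosen LLM.
    """
    scores = cosine_similarity(query_vec, doc_vecs)[0]
    return np.argsort(-scores)[:k]
```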

Ok, I understand. I thought there was a connection between the embedding model selected for vector search and the LLM used for text generation. However, I now realize that these two components are entirely independent, even though an LLM was utilized in training the embedding model. Thanks a lot for clarifying!

WhereIsAI org

Thanks again for following our work!

SeanLee97 changed discussion status to closed
