Restart the space for new models

#28
by infgrad - opened

Hi, @Muennighoff
Thanks for the great work!
I submitted two new Chinese text embedding models, "stella-base-zh" and "stella-large-zh". Could you help restart this space?

Thanks!

Massive Text Embedding Benchmark org

Done! Congrats on the strong performance! cc @Jinkin

Hi @Muennighoff, thanks for taking the time to restart the leaderboard cache. Could you help refresh the leaderboard again this time?

I have another query:

  • How can I also get a German leaderboard for MTEB, like the CH and PO language ones that you have on the GH repo?
Massive Text Embedding Benchmark org

Done!

We can add a German leaderboard, but there aren't many German datasets at this point, so I would wait for more first.

Okay, understood. I will try to generate a German dataset using translation.

If we consider knowledge distillation techniques for text representations in other languages, do you think they are worth it for the semantic search task? @Muennighoff

Massive Text Embedding Benchmark org

If you do human translation, it's fine; a high-quality machine translation may be OK, too (BEIR-PL is machine-translated).

By semantic search do you mean Retrieval?

Yes, exactly for the retrieval process.

Then, to re-rank the results for the user query, I use a cross-encoder, which works perfectly.
The default pre-trained model sometimes works well with German text, but with some complicated sentences it performs badly.
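
For anyone following along, the two-stage setup described here (embedding-based retrieval, then cross-encoder re-ranking) could be sketched roughly as below. This is only an illustration with toy vectors: `retrieve_then_rerank` and the `rerank_score` callable are hypothetical names, and `rerank_score` stands in for whatever cross-encoder is actually used.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_then_rerank(
    query_vec: Sequence[float],
    doc_vecs: List[Sequence[float]],
    rerank_score: Callable[[int], float],  # hypothetical stand-in for a cross-encoder
    top_k: int = 3,
) -> List[int]:
    """Return document indices: top_k by embedding similarity, re-ordered by rerank_score."""
    # Stage 1: cheap retrieval using embedding similarity.
    by_similarity = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_similarity[:top_k]
    # Stage 2: re-order only the shortlisted candidates with the more expensive scorer.
    return sorted(candidates, key=rerank_score, reverse=True)
```

Note that the re-ranker only sees the shortlisted candidates, so a document the embedding model misses in stage 1 cannot be recovered in stage 2. That is one reason the quality of the German embeddings still matters even when the cross-encoder works well.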

That is why I thought to use knowledge distillation.
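
As a rough sketch of what that could look like: one common formulation of multilingual knowledge distillation trains a student so that its embeddings of both a source sentence and its translation match a teacher's embedding of the source sentence. The function names below are hypothetical and the vectors are toy values, so this shows only the shape of the objective, not a training setup.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    teacher_src: Sequence[float],  # teacher embedding of the source sentence
    student_src: Sequence[float],  # student embedding of the same source sentence
    student_tgt: Sequence[float],  # student embedding of the translation (e.g. German)
) -> float:
    """Sum of both MSE terms: the student should mimic the teacher on either language."""
    return mse(teacher_src, student_src) + mse(teacher_src, student_tgt)
```

With this objective, the translated sentence is pulled toward the same point in the embedding space as the original, which is why parallel (translated) sentence pairs are the main data requirement.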

Massive Text Embedding Benchmark org

I see. Not sure, maybe it could work.
