Model Introduction
360Zhinao-search is built by multi-task fine-tuning on top of a BERT model developed in-house. It achieves an average score of 75.05 on the C-MTEB-Retrieval benchmark, ranking first on the leaderboard at the time of writing.
The C-MTEB-Retrieval leaderboard comprises 8 [query, passage] similarity retrieval subtasks from different domains and uses NDCG@10 (Normalized Discounted Cumulative Gain at rank 10) as the evaluation metric.
Model | T2Retrieval | MMarcoRetrieval | DuRetrieval | CovidRetrieval | CmedqaRetrieval | EcomRetrieval | MedicalRetrieval | VideoRetrieval | Avg |
---|---|---|---|---|---|---|---|---|---|
360Zhinao-search | 87.12 | 83.32 | 87.57 | 85.02 | 46.73 | 68.9 | 63.69 | 78.09 | 75.05 |
AGE_Hybrid | 86.88 | 80.65 | 89.28 | 83.66 | 47.26 | 69.28 | 65.94 | 76.79 | 74.97 |
OpenSearch-text-hybrid | 86.76 | 79.93 | 87.85 | 84.03 | 46.56 | 68.79 | 65.92 | 75.43 | 74.41 |
piccolo-large-zh-v2 | 86.14 | 79.54 | 89.14 | 86.78 | 47.58 | 67.75 | 64.88 | 73.1 | 74.36 |
stella-large-zh-v3-1792d | 85.56 | 79.14 | 87.13 | 82.44 | 46.87 | 68.62 | 65.18 | 73.89 | 73.6 |
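For readers unfamiliar with the metric behind these scores, the following is a minimal sketch of NDCG@k with the common linear-gain formulation. The function names and the `relevance` dictionary layout are illustrative assumptions, not part of the C-MTEB evaluation code, and evaluation toolkits may differ in details such as tie handling.

```python
import math
from typing import Dict, List


def dcg_at_k(gains: List[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions (linear gain)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids: List[str], relevance: Dict[str, float], k: int = 10) -> float:
    """NDCG@k for one query.

    ranked_ids: passage ids in the order returned by the retriever.
    relevance:  hypothetical mapping from passage id to graded relevance
                (ids missing from the mapping count as 0).
    """
    gains = [relevance.get(pid, 0.0) for pid in ranked_ids]
    # The ideal ranking places the most relevant passages first.
    ideal_gains = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A benchmark score is then the mean of this per-query value over all queries in a subtask, usually reported multiplied by 100.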
Optimization points
- Data filtering: Strictly prevent leakage of the C-MTEB-Retrieval test data by removing every query and passage that appears in the test set from the training data;
- Data source enhancement: Use open-source data and LLM-synthesized data to improve data diversity;
- Negative example mining: Use multiple methods to mine hard (difficult-to-distinguish) negative examples, increasing the information gain of each training sample;
- Training efficiency: Use multi-machine, multi-GPU training with DeepSpeed to optimize GPU memory utilization.
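The model card does not describe how these steps were implemented. As a rough illustration of two of them, the sketch below shows one simple way to filter leaked test texts (exact match after whitespace and case normalization) and one common way to mine hard negatives (taking the highest-scoring non-positive passages under an existing embedding model). All function names and parameters are hypothetical.

```python
from typing import List, Set

import numpy as np


def normalize_text(text: str) -> str:
    """Lowercase and strip all whitespace so trivial variants compare equal."""
    return "".join(text.lower().split())


def drop_leaked(train_texts: List[str], test_texts: List[str]) -> List[str]:
    """Remove training texts that also occur in the test set (assumed exact-match rule)."""
    blocked = {normalize_text(t) for t in test_texts}
    return [t for t in train_texts if normalize_text(t) not in blocked]


def mine_hard_negatives(
    query_embs: np.ndarray,
    passage_embs: np.ndarray,
    positives: List[Set[int]],
    num_negatives: int = 2,
) -> List[List[int]]:
    """For each query, return the indices of the top-scoring passages that are not positives.

    query_embs:   (num_queries, dim) L2-normalized embeddings.
    passage_embs: (num_passages, dim) L2-normalized embeddings.
    positives:    for each query, the set of known relevant passage indices.
    """
    scores = query_embs @ passage_embs.T  # cosine similarity for normalized inputs
    mined = []
    for qi, row in enumerate(scores):
        order = np.argsort(-row)  # passage indices from most to least similar
        negatives = [int(pi) for pi in order if int(pi) not in positives[qi]]
        mined.append(negatives[:num_negatives])
    return mined
```

Passages that score highly yet are not labeled relevant are harder for the model to separate from true positives, which is why they tend to be more informative than randomly sampled negatives. In practice they may also include unlabeled positives, so some extra filtering is usually needed.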
Usage
from transformers import AutoModel, AutoTokenizer
import torch
import numpy as np

# Load the tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained('qihoo360/360Zhinao-search')
model = AutoModel.from_pretrained('qihoo360/360Zhinao-search')
model.eval()

# "What color is the sky" / "The sky is blue"
sentences = ['天空是什么颜色的', '天空是蓝色的']
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt', max_length=512)

if __name__ == "__main__":
    with torch.no_grad():
        last_hidden_state = model(**inputs, return_dict=True).last_hidden_state
    # Use the first token ([CLS]) of each sequence as the sentence embedding.
    embeddings = last_hidden_state[:, 0]
    # L2-normalize so that the dot product equals the cosine similarity.
    embeddings = torch.nn.functional.normalize(embeddings, dim=-1)
    embeddings = embeddings.cpu().numpy()
    print("embeddings:")
    print(embeddings)
    cos_sim = np.dot(embeddings[0], embeddings[1])
    print("cos_sim:", cos_sim)
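Because the embeddings in the usage example are L2-normalized, ranking several passages against one query reduces to a matrix-vector product. The helper below is a minimal sketch of that step; `rank_passages` and its parameters are illustrative names, not part of the model's API, and it expects NumPy arrays such as those produced above.

```python
from typing import List, Tuple

import numpy as np


def rank_passages(
    query_emb: np.ndarray, passage_embs: np.ndarray, top_k: int = 3
) -> List[Tuple[int, float]]:
    """Return (passage_index, cosine_similarity) pairs, best match first.

    query_emb:    (dim,) L2-normalized query embedding.
    passage_embs: (num_passages, dim) L2-normalized passage embeddings.
    """
    scores = passage_embs @ query_emb  # one cosine similarity per passage
    order = np.argsort(-scores)[:top_k]  # indices of the highest scores
    return [(int(i), float(scores[i])) for i in order]
```

For larger corpora, the same scores are typically computed with an approximate nearest-neighbor index rather than a dense product over all passages.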
License
The source code of this repository follows the open-source license Apache 2.0.
360Zhinao open-source models support commercial use. If you wish to use these models, or continue training them, for commercial purposes, please apply by email (g-zhinao-opensource@360.cn). For the specific license agreement, please see the "360 Zhinao Open-Source Model License".
Evaluation results
Reranking (self-reported):
Dataset | map | mrr |
---|---|---|
MTEB CMedQAv1 (test set) | 87.005 | 89.347 |
MTEB CMedQAv2 (test set) | 88.483 | 90.578 |
MTEB MMarcoReranking | 32.409 | 31.487 |
MTEB T2Reranking | 67.803 | 78.145 |
Retrieval (self-reported, MTEB):
Metric | CmedqaRetrieval | CovidRetrieval | DuRetrieval | EcomRetrieval | MMarcoRetrieval | MedicalRetrieval |
---|---|---|---|---|---|---|
map_at_1 | 27.171 | 73.762 | 25.967 | 53.600 | 71.375 | 55.600 |
map_at_3 | 35.883 | 80.102 | 54.432 | 61.683 | 78.479 | 59.683 |
map_at_5 | 38.220 | 81.162 | 69.246 | 63.078 | 79.519 | 60.478 |
map_at_10 | 40.109 | 81.663 | 79.928 | 63.948 | 80.056 | 61.035 |
map_at_100 | 41.938 | 81.871 | 82.764 | 64.379 | 80.287 | 61.542 |
map_at_1000 | 42.051 | 81.877 | 82.794 | 64.392 | 80.294 | 61.598 |
mrr_at_1 | 41.285 | 74.078 | 89.000 | 53.600 | 73.739 | 55.600 |
mrr_at_3 | 46.837 | 80.260 | 92.467 | 61.683 | 79.212 | 59.683 |
mrr_at_5 | 48.223 | 81.266 | 92.677 | 63.078 | 80.059 | 60.478 |
mrr_at_10 | 49.247 | 81.745 | 92.810 | 63.948 | 80.535 | 61.035 |
mrr_at_100 | 50.199 | 81.953 | 92.857 | 64.379 | 80.735 | 61.542 |
mrr_at_1000 | 50.245 | 81.959 | 92.860 | 64.392 | 80.742 | 61.598 |
ndcg_at_1 | 41.285 | 73.973 | 89.000 | 53.600 | 73.739 | 55.600 |
ndcg_at_3 | 41.613 | 82.034 | 84.889 | 64.308 | 80.401 | 60.951 |
ndcg_at_5 | 43.703 | 83.905 | 84.607 | 66.800 | 82.107 | 62.388 |
ndcg_at_10 | 46.727 | 85.021 | 87.570 | 68.904 | 83.321 | 63.686 |
ndcg_at_100 | 53.791 | 85.884 | 90.135 | 71.019 | 84.350 | 66.417 |
ndcg_at_1000 | 55.706 | 86.023 | 90.427 | 71.345 | 84.542 | 67.924 |
precision_at_1 | 41.285 | 73.973 | 89.000 | 53.600 | 73.739 | 55.600 |
precision_at_3 | 23.423 | 29.329 | 75.883 | 23.967 | 30.053 | 21.533 |
precision_at_5 | 16.914 | 18.546 | 64.880 | 15.580 | 18.954 | 13.620 |
precision_at_10 | 10.340 | 9.631 | 42.245 | 8.440 | 9.878 | 7.200 |
precision_at_100 | 1.602 | 1.000 | 4.834 | 0.943 | 1.039 | 0.854 |
precision_at_1000 | 0.184 | 0.101 | 0.490 | 0.097 | 0.106 | 0.097 |
recall_at_1 | 27.171 | 73.762 | 25.967 | 53.600 | 71.375 | 55.600 |
recall_at_3 | 41.528 | 87.460 | 57.084 | 71.900 | 85.199 | 64.600 |
recall_at_5 | 48.162 | 91.939 | 74.763 | 77.900 | 89.220 | 68.100 |
recall_at_10 | 57.049 | 95.258 | 89.796 | 84.400 | 92.846 | 72.000 |
recall_at_100 | 86.271 | 98.946 | 98.042 | 94.300 | 97.498 | 85.400 |
recall_at_1000 | 99.023 | 100.000 | 99.610 | 96.800 | 98.992 | 97.300 |