IndoBERT Cross-Encoder

This is a Cross-Encoder model for Indonesian (ID) that can be used for passage re-ranking. It was trained on mMARCO, the multilingual version of the MS MARCO Passage Ranking task.

The model can be used for information retrieval. See SBERT.net Retrieve & Re-rank for the overall approach.
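The retrieve and re-rank pattern can be sketched as two stages: a cheap first-stage retriever narrows the collection to a short candidate list, and the cross-encoder then scores each (query, passage) pair. The sketch below is illustrative only. `retrieve_candidates` is a toy term-overlap retriever standing in for BM25 or a similar system, and `score_pairs` is a hypothetical callable that would wrap the cross-encoder in practice.

```python
from typing import Callable, List, Sequence, Tuple


def retrieve_candidates(query: str, corpus: Sequence[str], k: int) -> List[str]:
    """Toy first stage: keep the k passages sharing the most terms with the query."""
    query_terms = set(query.lower().split())

    def overlap(passage: str) -> int:
        return len(query_terms & set(passage.lower().split()))

    return sorted(corpus, key=overlap, reverse=True)[:k]


def rerank(
    query: str,
    candidates: Sequence[str],
    score_pairs: Callable[[List[Tuple[str, str]]], Sequence[float]],
) -> List[Tuple[str, float]]:
    """Second stage: score every (query, passage) pair and sort by descending score."""
    pairs = [(query, passage) for passage in candidates]
    scores = score_pairs(pairs)  # hypothetical scorer, e.g. a cross-encoder wrapper
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [(passage, float(score)) for passage, score in ranked]
```

The first stage only has to achieve high recall; the ordering within the candidate list is left to the second stage.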

Usage with SentenceTransformers

When you have SentenceTransformers installed, you can use the model like this:

from sentence_transformers import CrossEncoder

# 'model_name' is a placeholder; replace it with this model's Hub ID or a local path.
model = CrossEncoder('model_name', max_length=512)
query = 'How many people live in Berlin?'
docs = [
    'Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
    'New York City is famous for the Metropolitan Museum of Art.',
]
# Pair the query with every candidate passage; predict returns one score per pair.
pairs = [(query, doc) for doc in docs]
scores = model.predict(pairs)
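`predict` returns one score per pair, in input order. A minimal sketch of turning those scores into a ranked top-k list follows, assuming that a higher score means higher estimated relevance. `top_k_passages` is a hypothetical helper name, not part of SentenceTransformers.

```python
from typing import List, Sequence, Tuple


def top_k_passages(
    docs: Sequence[str], scores: Sequence[float], k: int = 10
) -> List[Tuple[int, str, float]]:
    """Return (rank, passage, score) triples for the k highest-scoring passages."""
    if len(docs) != len(scores):
        raise ValueError("docs and scores must have the same length")
    # Sort passage indices by descending score, then keep the first k.
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [(rank, docs[i], float(scores[i])) for rank, i in enumerate(order[:k], start=1)]
```

With the variables from the snippet above, `top_k_passages(docs, scores, k=1)` would return the single best-scoring passage.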

Usage with Transformers

With the transformers library, you can use the model like this:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# 'model_name' is a placeholder; replace it with this model's Hub ID or a local path.
model = AutoModelForSequenceClassification.from_pretrained('model_name')
tokenizer = AutoTokenizer.from_pretrained('model_name')

query = 'How many people live in Berlin?'
docs = [
    'Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
    'New York City is famous for the Metropolitan Museum of Art.',
]

# Tokenize each (query, passage) pair; the query is repeated once per passage.
features = tokenizer([query] * len(docs), docs, padding=True, truncation=True, return_tensors="pt")

model.eval()
with torch.no_grad():
    # One row of logits per (query, passage) pair.
    scores = model(**features).logits
    print(scores)
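The logits printed above are unbounded raw scores. If values between 0 and 1 are easier to threshold or display, a sigmoid can be applied; because the sigmoid is monotonic, the ranking does not change. The sketch below assumes the model emits a single logit per pair (a `[batch, 1]` tensor), which this card does not state. `logits_to_scores` is a hypothetical helper that would take a flat list such as `scores.squeeze(-1).tolist()`.

```python
import math
from typing import List, Sequence


def logits_to_scores(logits: Sequence[float]) -> List[float]:
    """Map raw logits to the range 0..1 with a numerically stable sigmoid."""
    scores = []
    for x in logits:
        if x >= 0:
            scores.append(1.0 / (1.0 + math.exp(-x)))
        else:
            # For negative inputs, use exp(x) to avoid overflow in exp(-x).
            z = math.exp(x)
            scores.append(z / (1.0 + z))
    return scores
```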

Performance

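| Model | mMARCO Dev MRR@10 | mMARCO Dev R@1000 | Mr. TyDi Test MRR@10 | Mr. TyDi Test R@1000 | MIRACL Test nDCG@10 | MIRACL Test R@1000 |
|---|---|---|---|---|---|---|
| $\text{BM25 (Elasticsearch)}$ | .114 | .642 | .279 | .858 | .391 | .971 |
| $\text{IndoBERT}_{\text{CAT}}$ | .181 | .642 | .447 | .858 | .455 | .971 |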
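
For reference, MRR@10 and R@1000 are commonly defined as in the sketch below. This follows the standard definitions and is not the evaluation script used to produce the numbers above; `ranked_ids` and `relevant_ids` are hypothetical per-query inputs.

```python
from typing import Sequence, Set


def reciprocal_rank_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """1 / rank of the first relevant passage within the top k, or 0 if none appears."""
    for rank, pid in enumerate(ranked_ids[:k], start=1):
        if pid in relevant_ids:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 1000) -> float:
    """Fraction of the relevant passages that appear within the top k."""
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant_ids) / len(relevant_ids)


def mean_over_queries(values: Sequence[float]) -> float:
    """Average a per-query metric over all queries (gives MRR@k or mean recall)."""
    return sum(values) / len(values) if values else 0.0
```

Re-ranking a fixed candidate list only reorders it, so recall at the list's full depth cannot change; this is consistent with the identical R@1000 values in the two rows.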