Model Card for bltlab/queryner-bert-base-uncased

E-commerce query segmentation model in English.

Model Details

Model Description

This is a token classification model that uses BERT base uncased as its base model. It is fine-tuned on the [QueryNER training dataset](https://huggingface.co/datasets/bltlab/queryner).

Developed by: BLT Lab in collaboration with eBay.
Funded by: eBay
Shared by: [@cpalenmichel](https://github.com/cpalenmichel)
Model type: Token Classification / Sequence Labeling / Chunking
Language(s) (NLP): English
License: CC-BY 4.0
Finetuned from model: BERT base uncased

Model Sources

The underlying model is BERT base uncased.

Repository: https://github.com/bltlab/query-ner
Paper: Accepted at LREC-COLING; coming soon.

Uses

Direct Use

The model is intended for research purposes and for e-commerce query segmentation.

Downstream Use

Potential downstream use cases include weighting entity spans, linking spans to knowledge bases, and removing spans as a recovery strategy for null and low-recall queries.
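
As an illustration of the span-removal idea, here is a minimal sketch that drops every word tagged with a given entity type from a query. The function name `drop_spans` and the label names in the usage note (`color`, `brand`, `product`) are hypothetical and chosen only for the example; they are not taken from the QueryNER label set.

```python
from typing import List


def drop_spans(words: List[str], tags: List[str], drop_type: str) -> str:
    """Return the query with all words of entity type `drop_type` removed.

    `tags` are BIO labels aligned one-to-one with `words`
    (for example "B-color", "I-color", "O").
    """
    kept = []
    for word, tag in zip(words, tags):
        # Strip the "B-" / "I-" prefix to get the entity type; "O" has none.
        entity_type = tag[2:] if tag != "O" else None
        if entity_type != drop_type:
            kept.append(word)
    return " ".join(kept)
```

For example, with words `["red", "nike", "running", "shoes"]`, tags `["B-color", "B-brand", "B-product", "I-product"]`, and `drop_type="color"`, the relaxed query is `"nike running shoes"`. Which span types are safe to drop is an application decision that this sketch does not address.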

Out-of-Scope Use

This model is trained only on the training data of the QueryNER dataset. It may not perform well on other domains without additional training data and further fine-tuning.

Bias, Risks, and Limitations

See the limitations section of the paper.

How to Get Started with the Model

See the Hugging Face tutorials for token classification and load the model with AutoModelForTokenClassification. Note that, unlike the inference API, we post-process the predictions so that each word takes only the tag of its first subtoken.
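
The following is a minimal sketch of that workflow, not the authors' exact inference code. It assumes the model ID `bltlab/queryner-bert-base-uncased`, whitespace tokenization of the query, and a fast tokenizer (needed for `word_ids`). The helper names `first_subtoken_tags` and `tag_query` are hypothetical.

```python
from typing import List, Optional, Tuple

MODEL_ID = "bltlab/queryner-bert-base-uncased"


def first_subtoken_tags(
    word_ids: List[Optional[int]], subtoken_tags: List[str]
) -> List[str]:
    """Keep one tag per word: the tag predicted for its first subtoken."""
    word_tags: List[str] = []
    previous = None
    for word_id, tag in zip(word_ids, subtoken_tags):
        # word_id is None for special tokens such as [CLS] and [SEP].
        if word_id is not None and word_id != previous:
            word_tags.append(tag)
        previous = word_id
    return word_tags


def tag_query(query: str) -> List[Tuple[str, str]]:
    """Tag a query word by word (assumption: words are split on whitespace)."""
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)

    words = query.split()
    encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits

    predicted_ids = logits.argmax(dim=-1)[0].tolist()
    subtoken_tags = [model.config.id2label[i] for i in predicted_ids]
    word_tags = first_subtoken_tags(encoding.word_ids(0), subtoken_tags)
    return list(zip(words, word_tags))
```

Calling `tag_query("some query text")` downloads the model on first use and returns one `(word, tag)` pair per whitespace-separated word.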

Training Details

Training Data

See paper for details.

Training Procedure

See paper for details.

Training Hyperparameters

See paper for details.

Evaluation

Evaluation details are provided in the paper. Scoring was done with SeqScore, using the conlleval repair method for invalid label transition sequences.
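
To illustrate what a conlleval-style repair does, the sketch below rewrites a BIO label sequence so that any `I-` label that does not continue a span of the same type starts a new span instead. This is an illustration of the idea under that assumption, not SeqScore's implementation; the function name is hypothetical and the entity types `X` and `Y` are placeholders.

```python
from typing import List, Optional


def repair_bio_conlleval_style(labels: List[str]) -> List[str]:
    """Turn invalid I- labels into B- labels so every span starts with B-."""
    repaired: List[str] = []
    prev_type: Optional[str] = None  # type of the previous token, None after "O"
    for label in labels:
        if label == "O":
            prev_type = None
            repaired.append(label)
            continue
        prefix, _, entity_type = label.partition("-")
        # An I- label is only valid directly after a label of the same type.
        if prefix == "I" and entity_type != prev_type:
            label = "B-" + entity_type
        repaired.append(label)
        prev_type = entity_type
    return repaired
```

For example, `["O", "I-X", "I-X", "I-Y"]` becomes `["O", "B-X", "I-X", "B-Y"]`: the first `I-X` follows `O` and the `I-Y` follows a different type, so both begin new spans.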

Testing Data, Factors & Metrics

Testing Data

QueryNER test set: https://huggingface.co/datasets/bltlab/queryner

Factors

Evaluation is reported as micro-F1 at the entity level on the QueryNER test set. We used the conlleval repair method for invalid label transitions.

Metrics

We use micro-F1 at the entity level, as this is common practice for NER models.
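
For readers unfamiliar with the metric, the sketch below shows one way to compute entity-level micro-F1 from BIO label sequences: a predicted entity counts as correct only if its boundaries and type both match a gold entity, and counts are pooled over all queries before precision and recall are computed. It is an illustration only (the reported scores come from SeqScore), and the helper names are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_spans(labels: List[str]) -> Set[Span]:
    """Extract entity spans from one BIO label sequence."""
    spans: Set[Span] = set()
    start, current = None, None
    for i, label in enumerate(labels + ["O"]):  # sentinel closes a trailing span
        continues = label.startswith("I-") and label[2:] == current
        if not continues:
            if current is not None:
                spans.add((start, i, current))
            if label == "O":
                start, current = None, None
            else:
                start, current = i, label[2:]
    return spans


def micro_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Entity-level micro-F1 pooled over all sequences."""
    true_pos = n_gold = n_pred = 0
    for gold_labels, pred_labels in zip(gold, pred):
        gold_spans, pred_spans = bio_spans(gold_labels), bio_spans(pred_labels)
        true_pos += len(gold_spans & pred_spans)  # exact boundary and type match
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because counts are pooled before dividing, frequent entity types contribute more to the score than rare ones.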

Results

[More Information Needed]

Environmental Impact

Rough estimate

Hardware Type: 1 RTX 3090 GPU
Hours used: < 2 hours
Cloud Provider: Private
Compute Region: northamerica-northeast1
Carbon Emitted: 0.02

Citation

Accepted at LREC-COLING; coming soon.

BibTeX:

Accepted at LREC-COLING; coming soon.

Model Card Authors

Chester Palen-Michel [@cpalenmichel](https://github.com/cpalenmichel)

Model Card Contact

Chester Palen-Michel [@cpalenmichel](https://github.com/cpalenmichel)
