
Evaluation settings (lm-evaluation-harness output): `limit: None, provide_description: False, num_fewshot: 5, batch_size: None`

| Task | Version | Metric | Value | Stderr |
|------|---------|--------|-------|--------|
| hendrycksTest-college_chemistry | 1 | acc | 0.4600 | ± 0.0501 |
| | | acc_norm | 0.4600 | ± 0.0501 |
| hendrycksTest-high_school_chemistry | 1 | acc | 0.5222 | ± 0.0351 |
| | | acc_norm | 0.5222 | ± 0.0351 |
| hendrycksTest-college_biology | 1 | acc | 0.7222 | ± 0.0375 |
| | | acc_norm | 0.7222 | ± 0.0375 |
| hendrycksTest-high_school_biology | 1 | acc | 0.7355 | ± 0.0251 |
| | | acc_norm | 0.7355 | ± 0.0251 |
| winogrande | 0 | acc | 0.7758 | ± 0.0117 |
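
If you want to reproduce these numbers, here is a minimal sketch using EleutherAI's lm-evaluation-harness. It assumes the v0.3-era API (the `hendrycksTest-*` task names come from that generation of the harness); the repo id is taken from this card, and the exact harness version used for the table above isn't stated, so treat this as an approximation rather than the exact command used.

```python
# Sketch of re-running the evaluation above with lm-evaluation-harness (v0.3-era API).
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=Ba2han/BioMistral-v0.2",
    tasks=[
        "hendrycksTest-college_chemistry",
        "hendrycksTest-high_school_chemistry",
        "hendrycksTest-college_biology",
        "hendrycksTest-high_school_biology",
        "winogrande",
    ],
    num_fewshot=5,   # matches the settings line above
    batch_size=None,
    limit=None,
)
print(evaluator.make_table(results))
```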

This model was trained from the base Mistral-7B-Instruct-v0.2 on 710 examples, 200 of which come from the camel-ai/biology dataset. The rest were scraped personally and consist of very long scientific articles and textbooks.

It beats Mistral-7B-Instruct-v0.2 on the MMLU chemistry and biology subsets. It should be able to generate mostly factual, basic, lengthy scientific text. I guess it could be "we have Cosmopedia at home" for people who want to create cheap pretraining datasets from scratch.

Template:

```
[Context]
You are a helpful assistant. Read the instruction and write a response accordingly.

[User]
{prompt}

[Assistant]
```
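
Here is a minimal generation sketch that applies the template above with the transformers library. The repo id and template come from this card; the example instruction and the sampling settings are placeholders, not recommended values.

```python
# Minimal sketch: format a prompt with the card's template and generate with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Ba2han/BioMistral-v0.2"
TEMPLATE = (
    "[Context]\n"
    "You are a helpful assistant. Read the instruction and write a response accordingly.\n\n"
    "[User]\n"
    "{prompt}\n\n"
    "[Assistant]\n"
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

# Placeholder instruction; swap in your own.
prompt = TEMPLATE.format(prompt="Explain how enzymes lower activation energy.")
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```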

