Datasets: NeurIPS LLM Challenge 2023
Datasets that I considered for my submission to the 2023 NeurIPS Large Language Model Efficiency Challenge.
Note: Ultimately used in my full eval submission, with the exclusion of dolly_hhrlhf. Included only in Mistral-7B-sft-v1.
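Excluding one subset such as dolly_hhrlhf from a mixed instruction dataset can be sketched as below. This is only an illustration, not the pipeline used for the submission: the `source` field, the sample records, and the `exclude_sources` helper are all assumptions made for the example.

```python
# Hypothetical records; the `source` field and its values are assumptions
# made for illustration, not the real dataset schema.
records = [
    {"prompt": "Summarize the article.", "response": "...", "source": "dolly_hhrlhf"},
    {"prompt": "Solve for x: 2x = 6.", "response": "x = 3", "source": "competition_math"},
    {"prompt": "Explain photosynthesis.", "response": "...", "source": "other_subset"},
]


def exclude_sources(rows, excluded):
    """Return only the rows whose `source` is not in `excluded`."""
    excluded = set(excluded)
    return [row for row in rows if row.get("source") not in excluded]


# Drop the dolly_hhrlhf subset and keep everything else.
filtered = exclude_sources(records, {"dolly_hhrlhf"})
print(len(filtered), "of", len(records), "records kept")
```

With the Hugging Face `datasets` library, an equivalent predicate could be passed to `Dataset.filter`, provided the dataset actually exposes a column identifying each row's origin.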
databricks/databricks-dolly-15k
Note: Used for both Mistral-7B-sft-v0 and Mistral-7B-sft-v1 in my submissions.
hendrycks/competition_math
Note: This turned out to be one of the holdout tasks. mosaicml/instruct-v3 includes training problems from competition_math, which contributes to the high scores on the GSM8k and MATH benchmarks.
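One way to check whether a training mix overlaps with a benchmark, as happened here with competition_math, is to compare normalized problem text. The sketch below is a rough illustration under stated assumptions: the `normalize` and `find_overlap` helpers and the sample strings are hypothetical, and it catches only copies that match exactly after normalization, not paraphrases.

```python
import re


def normalize(text):
    """Lowercase and collapse punctuation/whitespace so near-identical copies compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def find_overlap(train_prompts, benchmark_problems):
    """Return training prompts whose normalized text also appears in the benchmark."""
    benchmark_keys = {normalize(p) for p in benchmark_problems}
    return [p for p in train_prompts if normalize(p) in benchmark_keys]


# Hypothetical examples, not taken from any real dataset.
train_prompts = ["What is 2 + 2?", "Name the capital of France."]
benchmark_problems = ["what is 2+2 ?", "Solve x^2 = 4."]

overlap = find_overlap(train_prompts, benchmark_problems)
print(overlap)
```

A stricter audit would likely need fuzzy or n-gram matching, since reformatted or partially quoted problems slip past an exact comparison like this one.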
kaist-ai/CoT-Collection
Note: Looked promising, but I did not have time to explore it.
tasksource/icl-symbol-tuning-instruct
Note: Considered for improving in-context learning (ICL). Did not have time to explore.
cais/mmlu
Note: Decided against training on MMLU data.
GAIR/lima
Note: Avoided due to its CC BY-NC-SA license, though it would have been allowed for the competition. Likely would have been a good resource otherwise.
grammarly/coedit
Note: The plan here was to target robustness metrics by finetuning an expert model to correct perturbations and/or clarify the input. This could also have helped with paraphrasing or other text revision tasks if they appeared in the hidden eval. Did not have time to fully explore.
leslyarun/c4_200m_gec_train100k_test25k
Note: Similar use case to coedit.
wanyu/IteraTeR_human_sent
Note: Similar use case to coedit.
social_i_qa
Note: Now knowing that the holdout tasks had ethics questions, I wish I had used this.
lighteval/siqa
Note: Same as social_i_qa.
tau/commonsense_qa
Note: Now knowing that the holdout tasks had ethics questions, I wish I had used this.
euirim/goodwiki
Note: Could have been useful for RAG.
multi_news
Note: The thought was that this could help with CNN/DM summarization, but quality and license concerns, combined with acceptable performance without it, led to its exclusion.
math_qa
allenai/ropes
allenai/openbookqa
allenai/ai2_arc
riddle_sense
allenai/qasc
nyu-mll/blimp
google/boolq
corypaik/prost
allenai/sciq
facebook/belebele
derek-thomas/ScienceQA
openlifescienceai/medmcqa
embedding-data/QQP_triplets
VMware/open-instruct