Datasets: NeurIPS LLM Challenge 2023

mdouglas's Collections

updated Apr 10

Datasets I considered using in my submission to the 2023 NeurIPS Large Language Model Efficiency Challenge.

mosaicml/instruct-v3

Viewer • Updated Oct 2, 2023 • 5.95k • 32

Note Ultimately used in my full eval submission, excluding dolly_hhrlhf. Included only in Mistral-7B-sft-v1.
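A minimal sketch of how such an exclusion could be done, assuming each record carries a `source` field naming the dataset it came from. The field name, the helper `exclude_sources`, and the toy records (including the placeholder source names other than dolly_hhrlhf) are illustrative assumptions, not the code used for the submission.

```python
from typing import Any, Dict, Iterable, List, Set


def exclude_sources(
    records: Iterable[Dict[str, Any]],
    excluded: Set[str],
    source_key: str = "source",  # assumed column name, not confirmed by the note
) -> List[Dict[str, Any]]:
    """Return the records whose source is not in the excluded set."""
    return [r for r in records if r.get(source_key) not in excluded]


# Toy stand-in records; the real dataset would be loaded separately.
toy_records = [
    {"prompt": "p1", "response": "r1", "source": "dolly_hhrlhf"},
    {"prompt": "p2", "response": "r2", "source": "hypothetical_source_a"},
    {"prompt": "p3", "response": "r3", "source": "hypothetical_source_b"},
]

kept = exclude_sources(toy_records, {"dolly_hhrlhf"})
print(len(kept), "of", len(toy_records), "records kept")
```

The same predicate could be passed to a dataset library's filter method; a plain list comprehension is used here so the sketch runs without downloads.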
databricks/databricks-dolly-15k

Viewer • Updated Jun 30, 2023 • 28.5k • 647

Note Used both for Mistral-7B-sft-v0 and Mistral-7B-sft-v1 in my submissions.
hendrycks/competition_math

Updated Jun 8, 2023 • 11.7k • 89

Note This turned out to be one of the holdout tasks. mosaicml/instruct-v3 includes training problems from competition_math, which contributed to the high scores on the GSM8k and MATH benchmarks.
kaist-ai/CoT-Collection

Viewer • Updated Oct 14, 2023 • 525 • 85

Note Looked promising, but I did not have time to explore it.
tasksource/icl-symbol-tuning-instruct

Viewer • Updated Jul 26, 2023 • 9 • 16

Note Considered for improving in-context learning (ICL), but I did not have time to explore it.
cais/mmlu

Viewer • Updated Mar 8 • 1.49M • 229

Note Decided against training on MMLU data.
GAIR/lima

Viewer • Updated Jun 8, 2023 • 6.13k • 378

Note Avoided due to its CC BY-NC-SA license, though the competition would have allowed it. It likely would have been a good resource otherwise.
grammarly/coedit

Viewer • Updated Oct 21, 2023 • 3.41k • 45

Note The plan here was to target robustness metrics by finetuning an expert model to correct perturbations and/or clarify the input. This could also have helped with paraphrasing or other text revision tasks if they appeared in the hidden eval. I did not have time to fully explore it.
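To make the idea concrete, here is a hedged sketch of how (perturbed input, clean target) pairs for such a correction model might be built. The adjacent-character swap, the helper names `perturb` and `make_pairs`, and the example sentence are all hypothetical; the note does not say which perturbations were intended.

```python
import random
from typing import Dict, List


def perturb(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Swap adjacent letters with probability `rate` to simulate typos."""
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        both_letters = chars[i].isalpha() and chars[i + 1].isalpha()
        if both_letters and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return "".join(chars)


def make_pairs(sentences: List[str], rate: float = 0.1, seed: int = 0) -> List[Dict[str, str]]:
    """Pair each perturbed sentence with its clean original as the target."""
    return [
        {"input": perturb(s, rate, seed + k), "target": s}
        for k, s in enumerate(sentences)
    ]


pairs = make_pairs(["The quick brown fox"], rate=1.0)
print(pairs[0])
```

A model finetuned on pairs like these would learn to map noisy text back to its clean form; revision datasets such as coedit cover a broader set of edits than this toy perturbation.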
leslyarun/c4_200m_gec_train100k_test25k

Viewer • Updated Oct 26, 2022 • 99 • 5

Note Similar use case to coedit.
wanyu/IteraTeR_human_sent

Viewer • Updated Oct 24, 2022 • 13

Note Similar use case to coedit.
social_i_qa

Viewer • Updated Jan 18 • 23.1k • 7

Note Now knowing that the holdout tasks had ethics questions, I wish I had used this.
lighteval/siqa

Viewer • Updated Oct 7, 2023 • 70.1k • 3

Note Same as social_i_qa.
tau/commonsense_qa

Viewer • Updated Jan 4 • 10.4k • 53

Note Now knowing that the holdout tasks had ethics questions, I wish I had used this.
euirim/goodwiki

Viewer • Updated Sep 11, 2023 • 76 • 47

Note Could have been useful for RAG.
multi_news

Viewer • Updated Jan 18 • 3.62k • 57

Note The thought was that this could help with CNN/DM summarization, but quality and license concerns, combined with acceptable performance without it, led me to exclude it.
math_qa

Viewer • Updated Jan 18 • 32k • 68
allenai/ropes

Viewer • Updated Jan 4 • 63 • 41
allenai/openbookqa

Viewer • Updated Jan 4 • 4.29k • 60
allenai/ai2_arc

Viewer • Updated Dec 21, 2023 • 1.16M • 86
riddle_sense

Viewer • Updated Jan 18 • 199 • 21
allenai/qasc

Viewer • Updated Jan 4 • 108 • 9
nyu-mll/blimp

Viewer • Updated Jan 23 • 2.43k • 31
google/boolq

Viewer • Updated Jan 22 • 7.37k • 54
corypaik/prost

Viewer • Updated Oct 25, 2022 • 395 • 1
allenai/sciq

Viewer • Updated Jan 4 • 5.84k • 77
facebook/belebele

Viewer • Updated Nov 15, 2023 • 24.8k • 77
derek-thomas/ScienceQA

Viewer • Updated Feb 25, 2023 • 2.27k • 108
openlifescienceai/medmcqa

Viewer • Updated Jan 4 • 3.41k • 95
embedding-data/QQP_triplets

Viewer • Updated Aug 2, 2022 • 532 • 4
VMware/open-instruct

Viewer • Updated Jul 12, 2023 • 156 • 39
