Clémentine Fourrier

clefourrier

http://clefourrier.github.io

AI & ML interests

None yet

Articles

Introducing the Open Arabic LLM Leaderboard

Introducing the Open Leaderboard for Hebrew LLMs!

Bringing the Artificial Analysis LLM Performance Leaderboard to Hugging Face

Improving Prompt Consistency with Structured Generations

Introducing the Open Chain of Thought Leaderboard

The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare

Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs

Introducing the Chatbot Guardrails Arena

Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes?

TTS Arena: Benchmarking Text-to-Speech Models in the Wild

Introducing the Red-Teaming Resistance Leaderboard

Introducing the Open Ko-LLM Leaderboard: Leading the Korean LLM Evaluation Ecosystem

NPHardEval Leaderboard: Unveiling the Reasoning Abilities of Large Language Models through Complexity Classes and Dynamic Updates

Introducing the Enterprise Scenarios Leaderboard: a Leaderboard for Real World Use Cases

The Hallucinations Leaderboard, an Open Effort to Measure Hallucinations in Large Language Models

A guide to setting up your own Hugging Face leaderboard: an end-to-end example with Vectara's hallucination leaderboard

2023, year of open LLMs

Open LLM Leaderboard: DROP deep dive

Overview of natively supported quantization schemes in 🤗 Transformers

What's going on with the Open LLM Leaderboard?

Introduction to Graph Machine Learning

clefourrier's activity

New activity in HuggingFaceH4/open_llm_leaderboard 3 days ago

restore-emoji

#743 opened 4 days ago by

GSM8K do not appear.

#742 opened 4 days ago by

New activity in open-llm-leaderboard/results 4 days ago

Git clone fails because of invalid paths

#59 opened 7 days ago by

New activity in HuggingFaceH4/open_llm_leaderboard 4 days ago

force-model-revision

#739 opened 8 days ago by

porting-app-poc

#732 opened 11 days ago by

Llama-3 70b model eval failed and can't submit again

#709 opened 22 days ago by

New activity in open-llm-leaderboard/details_Weyaxi__a 6 days ago

Renaming Model Weyaxi/a to Weyaxi/Einstein-v6.1-LLama3-8B-Instruct-Ties

#1 opened 7 days ago by

New activity in open-llm-leaderboard/results 6 days ago

Renaming Model Weyaxi/a to Weyaxi/Einstein-v6.1-LLama3-8B-Instruct-Ties

#60 opened 7 days ago by

New activity in open-llm-leaderboard/requests 6 days ago

Renaming Model Weyaxi/a to Weyaxi/Einstein-v6.1-LLama3-8B-Instruct-Ties

#128 opened 7 days ago by

New activity in HuggingFaceH4/open_llm_leaderboard 11 days ago

performance-improvement

#705 opened 23 days ago by

New activity in HuggingFaceH4/open_llm_leaderboard 13 days ago

No good way to identify number of activated parameters causes MIxtral evaluation failures

#680 opened about 1 month ago by

New activity in HuggingFaceH4/open_llm_leaderboard 14 days ago

Resubmit mlabonne/OrpoLlama-3-8B

#725 opened 16 days ago by

New activity in HuggingFaceH4/open_llm_leaderboard 15 days ago

72b models eval failed

#689 opened 29 days ago by

About Building Own Leaderboard

#717 opened 19 days ago by

11B model evaluation failed

#722 opened 16 days ago by

Identifying flagged datasets

#723 opened 16 days ago by

Average Column

#724 opened 16 days ago by

How to Build Similar Leaderboard for Non-text Domain Language Models?

#727 opened 16 days ago by

New activity in HuggingFaceH4/open_llm_leaderboard 17 days ago

prod-mirror

#708 opened 22 days ago by

ALL Jamba models failing

#690 opened 28 days ago by

New activity in HuggingFaceH4/open_llm_leaderboard 18 days ago

Expand the existing or introduce a new "knowledge base" of the leaderboard to improve model classification

#581 opened 3 months ago by

GSM8K failure with Llama 3 finetunes

#703 opened 24 days ago by

Encoder + GRU for time series

#711 opened 21 days ago by

3

#714 opened 20 days ago by

Failed model evaluation

#716 opened 20 days ago by

GPTQ failed submission request

#669 opened about 1 month ago by

New activity in HuggingFaceH4/open_llm_leaderboard 24 days ago

dummy column refactoring

#688 opened 29 days ago by

Add google/recurrentgemma-2b-it

#677 opened about 1 month ago by

Failed evaluation, and cannot summit again

#699 opened 25 days ago by

Merge model failed

#700 opened 24 days ago by

failed run Llama-3-11.5B-v2

#698 opened 25 days ago by

Llama-3-8B finetuned model Failed Evaluation

#695 opened 26 days ago by

GPTQ and Mixtral models will need to be relaunched

#692 opened 27 days ago by

Removing FAILED models

#701 opened 24 days ago by

New activity in gaia-benchmark/leaderboard 25 days ago

Runtime error :(

#15 opened about 1 month ago by

New activity in HuggingFaceH4/open_llm_leaderboard 25 days ago

Runtime Error

#697 opened 25 days ago by

Llama 3 foundation models failing!

#693 opened 26 days ago by

New activity in open-llm-leaderboard/requests 25 days ago

Delete MaziyarPanahi/Llama-3-16B-Instruct-v0.1_eval_request_False_float16_Original.json

#114 opened 26 days ago by

Delete MaziyarPanahi/Llama-3-13B-Instruct-v0.1_eval_request_False_float16_Original.json

#113 opened 26 days ago by

Delete MaziyarPanahi/Goku-8x22B-v0.2_eval_request_False_float16_Original.json

#112 opened 26 days ago by

Delete MaziyarPanahi/Goku-8x22B-v0.1_eval_request_False_float16_Original.json

#110 opened 26 days ago by

Delete MaziyarPanahi/Goku-8x22B-v0.2_eval_request_False_float16_Original.json

#104 opened 28 days ago by

Failure request indicators

#103 opened 28 days ago by

FAILED

#105 opened 28 days ago by

Evaluation Failure NLPark/Test0_SLIDE

#107 opened 27 days ago by

Llama-3-MoE eval failed

#109 opened 26 days ago by

Evaluation Failure - rmdhirr/Calyx_7B

#106 opened 27 days ago by

Failed run

#108 opened 27 days ago by

New activity in HuggingFaceH4/open_llm_leaderboard 27 days ago

Future feature: system prompt and chat support

#459 opened 5 months ago by

New activity in MM-UPD/MM-UPD 28 days ago

Have a benchmark leaderboard for the dataset

#1 opened 28 days ago by

New activity in HuggingFaceH4/open_llm_leaderboard 29 days ago

python-upgrade

#683 opened about 1 month ago by

transformers-upgrade

#687 opened 29 days ago by

72b Eval model failed

#685 opened 30 days ago by

False Flagging

#686 opened 29 days ago by Walmart-the-bag

Announcement: Flagging merged models with incorrect metadata

#510 opened 5 months ago by

New activity in HuggingFaceH4/open_llm_leaderboard 30 days ago

Any updates on redesigning the leaderboard?

#595 opened 3 months ago by

commented on a paper 6 months ago

GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023