Average Column

by DaytonaPhil - opened

The AVERAGE column is statistically invalid. It should be removed.

Open LLM Leaderboard org

Though not all datasets contain the same number of samples, we still want to average the model score across "capabilities", treating each task as a single block with equal weight. We'll keep it as is.
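To make the distinction concrete, here is a minimal sketch of the two ways a column like this could be computed. It is not the leaderboard's actual code: the task names, scores, and sample counts are hypothetical, and `macro_average` / `micro_average` are illustrative helper names. The first function weights every task equally (the "one block per task" view described above); the second weights each task by its number of samples.

```python
def macro_average(task_scores):
    """Unweighted mean of per-task scores: every task counts once."""
    return sum(task_scores.values()) / len(task_scores)


def micro_average(task_scores, task_sizes):
    """Mean of per-task scores weighted by each task's sample count."""
    total_samples = sum(task_sizes[task] for task in task_scores)
    weighted_sum = sum(task_scores[task] * task_sizes[task] for task in task_scores)
    return weighted_sum / total_samples


# Hypothetical numbers, for illustration only.
scores = {"task_a": 60.0, "task_b": 80.0}   # per-task score in percent
sizes = {"task_a": 1000, "task_b": 3000}    # samples per task

print(macro_average(scores))         # each task weighted equally
print(micro_average(scores, sizes))  # larger task dominates
```

With these made-up numbers the equal-weight average is 70.0 and the sample-weighted average is 75.0, so the two only agree when all tasks have the same size or the same score.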

clefourrier changed discussion status to closed
