MT-Bench Results

#8
by 0-hero - opened

MT-Bench

Model                          MT-Bench
Claude 3 Opus                  9.43
GPT-4-1106-Preview             9.32
Claude 3 Sonnet                9.18
WizardLM-2 8x22B               9.12
GPT-4-0314                     8.96
Mixtral-8x22B-Instruct-v0.1    8.66
zephyr-orpo-141b-A35b-v0.1     8.17
Matter-0.2-8x22B               8.00

Nice!
It will be interesting to see more benchmark results here.
I guess Mixtral-8x22B-Instruct-v0.1 is better at multilingual tasks than WizardLM-2 8x22B.
Maybe merging them could work even better :)
