GSM8K failure with Llama 3 finetunes

#703

by jeiku - opened 26 days ago

Discussion

jeiku

26 days ago

I have noticed a large number of GSM8K failures with Llama 3 finetunes and was wondering if HF has any plans to address this issue? I suspect it may be due to model uploaders modifying the tokenizer_config.json for GGUF/EXL2 quantization. I have uploaded a model which has not been altered to test this theory. I would love to hear what someone with more experience has to say.
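One way to check the tokenizer_config.json theory is to compare a finetune's file against the base model's and see which top-level keys differ. The following is a minimal sketch, assuming both files have already been downloaded to local paths; the function name `diff_json_configs` and the `<missing>` marker are hypothetical and not part of any leaderboard tooling.

```python
import json
from pathlib import Path

# Marker used when a key exists in only one of the two files (an assumption of this sketch).
_MISSING = "<missing>"


def diff_json_configs(base_path, finetune_path):
    """Return {key: (base_value, finetune_value)} for each top-level key that differs."""
    base = json.loads(Path(base_path).read_text(encoding="utf-8"))
    tuned = json.loads(Path(finetune_path).read_text(encoding="utf-8"))

    diff = {}
    for key in sorted(set(base) | set(tuned)):
        base_value = base.get(key, _MISSING)
        tuned_value = tuned.get(key, _MISSING)
        if base_value != tuned_value:
            diff[key] = (base_value, tuned_value)
    return diff
```

An empty result would suggest the tokenizer config was not altered, which would point away from this theory.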

SaylorTwift

Hugging Face H4 org 25 days ago

Hi! Can you link the request for the model you submitted? It will make it easier to check the logs and pinpoint the issue :)

jeiku

24 days ago • edited 24 days ago

https://huggingface.co/datasets/open-llm-leaderboard/details_jeiku__Average_Normie_l3_v1_8B

https://huggingface.co/datasets/open-llm-leaderboard/details_jeiku__Chaos_RP_l3_8B

I'm not sure if this is what you mean, but both of these failed GSM8K even though a prior model from the same lineage passed. I have also seen this issue with other creators. It may be unrelated but I am also having an issue with models disappearing from the leaderboard. I track their progress through eval, but they never post to the leaderboard.

clefourrier

Hugging Face H4 org 21 days ago

Hi @jeiku !
I believe it could be helpful for you to take a look at the FAQ (in the FAQ tab of the leaderboard). We explain there how to find request files, why some models don't appear on the leaderboard, etc.

MaziyarPanahi

16 days ago

Hi @SaylorTwift @clefourrier

To avoid opening a similar issue: I just noticed that 4 of my new submissions are missing GSM8K. The models are similar to others that have all their metrics reported successfully. The only difference I see with these 4 is that they are missing generation_config:

Should I add generation_config file and re-submit?
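A quick way to confirm which models lack the file is to scan local copies of the model repositories. This is a minimal sketch, assuming each model has been downloaded to its own directory and that the file uses the standard `generation_config.json` name; the function name is hypothetical.

```python
from pathlib import Path


def models_missing_generation_config(model_dirs):
    """Return the directories (as strings) that do not contain generation_config.json."""
    return [
        str(model_dir)
        for model_dir in model_dirs
        if not (Path(model_dir) / "generation_config.json").is_file()
    ]
```

Comparing this list against the models that are missing GSM8K shows whether the two sets actually line up.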

alozowski

Hugging Face H4 org 12 days ago

Hi everyone,

There is indeed a bug with GSM8K for these models, we need a little more time to figure out what the problem is – we will get back as soon as possible!

MaziyarPanahi

12 days ago

Thanks @alozowski

I tested locally with llm-eval (similar to the one the leaderboard uses). The GSM8K result was empty, but after adding generation_config I can see GSM8K scores. I am not sure generation_config is the definitive workaround, because I do have models without any generation_config file and they worked fine.

Thanks again for your time looking into this.

alozowski

Hugging Face H4 org 11 days ago

Hi @jeiku and @MaziyarPanahi,

It seems that the problem is actually in the generation_config file. Could you please add it for your models and ping me here when you are ready? I will resubmit your models for evaluation right away

jeiku

10 days ago

I am not interested in resubmitting, but I will be sure to source a generation_config file for my next finetune. Unfortunately, mergekit, which I use to merge LoRAs en masse, does not produce this file with Llama 3. I will grab the original file and include it in my next submission. Thank you for looking into this.
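Grabbing the original file can be scripted as a post-merge step. The following is a minimal sketch, assuming the base model and the merged output both live in local directories and that the file uses the standard `generation_config.json` name; the function name and the `overwrite` parameter are hypothetical and not part of mergekit.

```python
import shutil
from pathlib import Path

CONFIG_NAME = "generation_config.json"


def copy_generation_config(base_dir, merged_dir, overwrite=False):
    """Copy generation_config.json from base_dir into merged_dir.

    Returns True if the file was copied, False if the destination already
    had one and overwrite is False.
    """
    src = Path(base_dir) / CONFIG_NAME
    dst = Path(merged_dir) / CONFIG_NAME

    if not src.is_file():
        raise FileNotFoundError(f"{src} not found in the base model directory")
    if dst.exists() and not overwrite:
        return False

    shutil.copyfile(src, dst)
    return True
```

Leaving an existing file untouched by default avoids silently replacing generation settings that a finetune set on purpose.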

alozowski

Hugging Face H4 org 10 days ago

Since it appears that the situation has been resolved, I will close this discussion

alozowski changed discussion status to closed 10 days ago

MaziyarPanahi

10 days ago

I have added generation_config to all these models, you can re-submit them if possible:

Many thanks again, appreciate it.

alozowski

Hugging Face H4 org 10 days ago

Hi @MaziyarPanahi !

I've resubmitted your Llama-3-8B-Instruct-v0.2 model as it has no GSM8K results, but you can already check other models in the leaderboard – see my screenshot

MaziyarPanahi

10 days ago

@alozowski Thank you so much! I appreciate your help.
