Does quantization work?

#3
by KnutJaegersberg - opened

I played with your model a bit, but I have not been able to quantize a fine-tune based on it with AWQ, GPTQ, or GGUF.

What problem did you run into? I managed to quantize a GGUF just fine with it.

Hmm... I fine-tuned a model, merged the adapter, and tried to quantize the result.
For AWQ/GPTQ, it added an extra weight layer (lm_head or something like that), and the output of the quantized model was rubbish.
For GGUF I could not even produce the fp16 model file; the conversion failed at the first step, apparently because it could not find some of the layers.
Have you tried that with a QLoRA fine-tune?
You can also try mine:
https://huggingface.co/KnutJaegersberg/Deacon-34B-qlora
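
For reference, a minimal sketch of merging a QLoRA adapter into its base with PEFT before quantizing (the repo names are the ones mentioned in this thread; adjust paths and dtype to your setup):

```python
# Minimal merge sketch (assumes transformers + peft are installed).
# Repo names below are taken from this thread; swap in your own as needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "KnutJaegersberg/Yi-34B-Llamafied"    # must match the adapter's base
adapter_id = "KnutJaegersberg/Deacon-34B-qlora"

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_id)
model = model.merge_and_unload()                 # fold the LoRA weights into the base

tok = AutoTokenizer.from_pretrained(base_id)
model.save_pretrained("merged-model", safe_serialization=True)
tok.save_pretrained("merged-model")
```

The resulting folder can then be handed to the AWQ/GPTQ/GGUF tooling as a plain fp16 checkpoint.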

It's a great model, generating good output.

Have you tried that with a QLoRA fine-tune?

The one I used was: https://huggingface.co/Doctor-Shotgun/limarpv3-yi-llama-34b-lora

https://huggingface.co/KnutJaegersberg/Deacon-34B-qlora

The merge script failed to merge your QLoRA adapter, probably because I tried to merge it with "chargoddard/Yi-34B-Llama", while your adapter seems to expect "KnutJaegersberg/Yi-34B-Llamafied" as its base.
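
One quick way to see which base an adapter expects is to check base_model_name_or_path in its adapter_config.json (the local path below is just an example clone of the adapter repo):

```python
# Print the base model the adapter was trained against.
# "Deacon-34B-qlora" is an example local clone path; adjust to where you downloaded it.
import json

with open("Deacon-34B-qlora/adapter_config.json") as f:
    cfg = json.load(f)

print(cfg["base_model_name_or_path"])  # e.g. "KnutJaegersberg/Yi-34B-Llamafied"
```

If that doesn't match the model you merge into, the merge can fail outright or silently combine the adapter with the wrong weights.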

Yes, I made it refer to that one. I fine-tuned the model with AutoTrain using the merge-adapter argument, but a separate adapter still came out of it.
Locally I tried the adapter with your model (or at least the adapter config referred to it), and that worked. For consistency I point to the files that came out of that run; those worked locally, too.
Afterwards I also merged it manually, I think with your original model, and the merge itself worked. Then I tried to quantize the merged model, and that didn't work (well enough).

I'm still fairly new to fine tuning.
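
Since AWQ/GPTQ complained about an extra lm_head-like layer above, one sanity check before quantizing a merged checkpoint is to list its lm_head and embedding tensors and confirm the shapes look right. A rough sketch, assuming the merged model was saved as safetensors under a placeholder folder called merged-model:

```python
# List lm_head / embedding tensors in a merged checkpoint before quantizing.
# "merged-model" is a placeholder path for the locally saved merged model.
import glob
from safetensors import safe_open

for path in sorted(glob.glob("merged-model/*.safetensors")):
    with safe_open(path, framework="pt") as f:
        for key in f.keys():
            if "lm_head" in key or "embed" in key:
                print(path, key, f.get_slice(key).get_shape())
```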

Perhaps my llamafied repo is already the merged model... I did that earlier, manually removing the adapter, but for this model my interactions in textgen webui gave me the feeling it still needed the adapter.

It's bleeding-edge tech and beginner's luck :)
