Llama v2 GPTQ context length

#7 opened by andrewsameh

I've noticed that the context length for all Llama v2 models is 4k, but for the quantized versions it is 2k according to the config.json file.
Does quantization lower the context length, or is it something that can be adjusted?

A quick workaround is to simply override the max_context_length parameter, e.g. config.max_context_length = 4096.
I have used it with prompts of up to 4096 tokens and it works.
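
For example, here is a minimal sketch of that override when loading through the Hugging Face transformers API. The repo name is illustrative, and the attribute there is max_position_embeddings (max_context_length may be the name used by a different loader); loading a GPTQ repo this way assumes optimum/auto-gptq are installed.

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "TheBloke/Llama-2-7B-GPTQ"  # illustrative repo name

# Load the shipped config and override the 2048 value with Llama 2's 4k window.
config = AutoConfig.from_pretrained(model_id)
config.max_position_embeddings = 4096

# Pass the patched config back in so longer prompts are accepted.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    device_map="auto",
)
```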

Yeah this should be fixed and I will do so shortly. In the Meta repos they have this:

[screenshot of Meta's config.json showing "max_position_embeddings": 2048]

I'm not quite sure why max_position_embeddings is 2048 instead of 4096, but I think I will duplicate what they have unless/until told otherwise.

Has this been updated yet? I assume not, as I've not seen it. :thinking_face:

Would also love to see a Stability AI Orca 2 version of the Llama 2 13B, uncensored if that's possible yet :D

Thanks TheBloke, dude you fookin rock :)


Meta has already fixed max_position_embeddings to 4096 in their repos, but I think I saw this mentioned in another comment.

Yeah, I've updated max_length and max_position_embeddings to 4096 in my Llama 2 repos, matching Meta, who did fix this a couple of days ago.

The main branch has been updated, but other branches are still at 2048, such as gptq-4bit-128g-actorder_True. Will they be updated in the future?
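
In the meantime, the same override can be applied per branch. A small sketch, assuming the transformers loader (the repo name is again illustrative; revision= selects the branch on the Hub):

```python
from transformers import AutoConfig

# Load the config from the branch that still ships 2048 and patch it locally.
config = AutoConfig.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",               # illustrative repo name
    revision="gptq-4bit-128g-actorder_True",  # branch mentioned above
)
config.max_position_embeddings = 4096  # override until the branch config is updated
```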
