"How to run in llama.cpp" is for 2048 instead of 8192 context size

#1
by wolfram - opened

On the model card, under "How to run in llama.cpp", the example command line sets the context size to 2048 instead of the 8192 this model supports:

./main -t 10 -ngl 32 -m hermes-llongma-2-13b-8k.ggmlv3.q4_K_M.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: Write a story about llamas\n### Response:"

What would the proper command be? Besides raising the context size, does it require scaling parameters like --rope-freq-base or --rope-freq-scale?

Yes, it does. I'll update that. I believe the options are -c 8192 --rope-freq-base 10000 --rope-freq-scale 0.5
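
For reference, here is a rough sketch of how those options would slot into the model card example in place of -c 2048. The values come straight from this thread and have not been checked against other llama.cpp builds. The variable names (MODEL, CTX, ROPE_FREQ_BASE, ROPE_FREQ_SCALE, LONG_CTX_ARGS) are only illustrative. The script prints the command rather than running it, so it can be checked first.

```shell
# Values taken from this thread; adjust them for other models.
MODEL="hermes-llongma-2-13b-8k.ggmlv3.q4_K_M.bin"
CTX=8192               # context size the model supports
ROPE_FREQ_BASE=10000   # value suggested in the reply above
ROPE_FREQ_SCALE=0.5    # value suggested in the reply above

# Options that replace "-c 2048" in the model card example.
LONG_CTX_ARGS="-c $CTX --rope-freq-base $ROPE_FREQ_BASE --rope-freq-scale $ROPE_FREQ_SCALE"

# Print the assembled command for review instead of running it.
# The prompt is left out here; append the -p "..." argument from the original example.
printf '%s\n' "./main -t 10 -ngl 32 -m $MODEL --color $LONG_CTX_ARGS --temp 0.7 --repeat_penalty 1.1 -n -1"
```

Once the printed line looks right, paste it into the terminal with the original -p prompt appended.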

Great, thanks for the confirmation and for updating the information.

It worked for me using the equivalent koboldcpp command line options: --contextsize 8192 --ropeconfig 0.5 10000

wolfram changed discussion status to closed
