Still endless output with any GGUF if not in instruct mode

#14
by alexcardo - opened

It looks like there is still no fix for using llama.cpp in basic or server mode. The only thing that works is the "-ins" parameter, which only lets you chat with the model interactively.

If, like me, you want to use it in production (building an app on top of it), every prompt template produces endless output. I have tried all the prompt templates I could find without any success.
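
For context, here is a minimal sketch of the kind of request I am sending to the llama.cpp server (assuming a server started with something like `./server -m model.gguf --port 8080`, its `/completion` endpoint, and the standard Llama 3 instruct prompt template assembled by hand; the explicit `stop` strings are the usual string-level workaround when the GGUF's EOS metadata is wrong):

```python
# Minimal sketch: query a running llama.cpp server with a hand-built Llama 3 prompt
# and explicit stop strings, so generation is cut off even if the model never emits
# a proper EOS token.
import json
import urllib.request

SERVER_URL = "http://127.0.0.1:8080/completion"  # assumed default llama.cpp server address


def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble the standard Llama 3 instruct chat template by hand."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )


payload = {
    "prompt": build_llama3_prompt(
        "You are a helpful assistant.", "Say hello in one sentence."
    ),
    "n_predict": 256,
    "temperature": 0.7,
    # String-level stop sequences: generation is truncated when these appear in the
    # output text, regardless of the GGUF's EOS token metadata.
    "stop": ["<|eot_id|>", "<|end_of_text|>"],
}

req = urllib.request.Request(
    SERVER_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())
    print(result["content"])
```

If the GGUF's EOS token metadata is correct, the server should stop at `<|eot_id|>` on its own; the explicit `stop` list is only a belt-and-braces workaround.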

If anyone can provide a proper solution, I would appreciate it.

Quant Factory org

@alexcardo, can you try the v2 version: https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF-v2? It was created with llama.cpp's tokenizer fix for Llama 3.
