Possible stopping issues?

#106
by Dampfinchen - opened

Hello,

First off, thank you for this great model.

With Llama 3 Instruct, I have noticed rare cases where the model emits <|eot_id|> at inappropriate times, namely in the middle of a sentence. This is what it looks like:

"*She seems to relax even further, her tension easing as she sits next to you. She gazes out at the rain, her eyes lost in thought, as the sound of the droplets hitting the awning creates a soothing background noise. Every so"

In this roleplay case, it stops when it wants to say "Every so often". The frontend is set up correctly with the correct Llama 3 template. I've noticed the model is particularly prone to stopping at a sentence like this, especially when using a relatively high repetition penalty (rep pen). The model also sometimes forgets to place the closing asterisk at the end of a sentence.
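For readers unfamiliar with "rep pen": a common formulation of the repetition penalty is the CTRL-style one (Hugging Face's `RepetitionPenaltyLogitsProcessor` works this way; each backend's exact implementation may differ). The sketch below uses invented logits and hypothetical token names. It only illustrates one possible mechanism, which is an assumption and not something this thread establishes: if " often" already appeared earlier in the context, penalizing it shifts probability toward unpenalized tokens, which can include `<|eot_id|>`.

```python
import math
from typing import Dict, Iterable


def apply_repetition_penalty(logits: Dict[str, float],
                             seen: Iterable[str],
                             penalty: float) -> Dict[str, float]:
    """CTRL-style penalty: make every already-seen token less likely.

    Positive logits are divided by `penalty` and negative ones multiplied
    by it, so for penalty > 1 a penalized token always loses probability.
    """
    seen = set(seen)
    out = {}
    for tok, logit in logits.items():
        if tok in seen:
            logit = logit / penalty if logit > 0 else logit * penalty
        out[tok] = logit
    return out


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Convert logits to probabilities (max-subtracted for stability)."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


# Hypothetical logits for the step after "Every so", assuming " often"
# already occurred earlier in the context (values are made up).
logits = {" often": 4.0, " with": 2.0, "<|eot_id|>": 1.0}
print(softmax(logits))
print(softmax(apply_repetition_penalty(logits, [" often"], 1.3)))
```

Nothing about `<|eot_id|>` itself changes here; its share rises only because the competing token was pushed down. Conversely, if the backend's penalty window covers earlier turns, `<|eot_id|>` (which closes every previous turn) may be penalized as well.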

Now, I'm using a modern GGUF with the latest tokenizer fixes. Unfortunately, I can't tell whether the FP16 model suffers from the same issue, i.e. whether it is a GGUF issue or not, because I don't have a PC capable of running the full FP16 model.

If anyone wants to test this, I have prepared a GGUF of my latest merge at https://huggingface.co/Dampfinchen/Llama-3-8B-Ultra-Instruct-SaltSprinkle-Q8_0-GGUF. It amplifies Meta Instruct and therefore makes this issue more reproducible than it otherwise would be. When I set a higher rep pen and lead the model to write "Every so often", it takes a while, but I'd say it's reproducible. I did encounter this issue with the official Llama 3 Instruct and at low rep pen as well, just much less often.

You might be wondering whether I had this issue before the EOS token was changed from 128001 (<|end_of_text|>) to 128009 (<|eot_id|>), and yes, the issue was noticeable before then. So that's not the cause of the problem.
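For reference, in the Llama 3 tokenizer `<|end_of_text|>` is ID 128001 and `<|eot_id|>` is ID 128009. Below is a minimal sketch of a generation loop that treats either ID as a stop token. The `next_token` callable is a hypothetical stand-in for a backend's sampler, not any real backend API.

```python
from typing import Callable, List

# Llama 3 special-token IDs: <|end_of_text|> and <|eot_id|>.
END_OF_TEXT_ID = 128001
EOT_ID = 128009
STOP_IDS = {END_OF_TEXT_ID, EOT_ID}


def generate(next_token: Callable[[List[int]], int],
             prompt_ids: List[int],
             max_new_tokens: int = 400) -> List[int]:
    """Append sampled tokens until a stop ID or the token budget is hit.

    `next_token` (hypothetical) receives the full context and returns
    the next token ID, e.g. by sampling from the model's distribution.
    """
    context = list(prompt_ids)
    output: List[int] = []
    for _ in range(max_new_tokens):
        tok = next_token(context)
        if tok in STOP_IDS:
            break  # the stop token ends the turn and is not emitted
        output.append(tok)
        context.append(tok)
    return output
```

Because the loop stops on whichever stop ID is sampled, it cannot distinguish a well-placed stop from one in mid-sentence; that depends entirely on what the sampler picks.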

Does anyone have insight into this?

Note: Very likely a GGUF issue as I'm not having issues with exl2.

Dampfinchen changed discussion status to closed
Dampfinchen changed discussion status to open

I have retested it for a while, and now these issues also appear with exl2. It seems to be an issue with the model itself. Maybe it's because we now effectively have two EOS tokens (<|eot_id|> and <|end_of_text|>)?

A fresh example that happened just now:

"The forest trail stretches before them, a meandering path through trees adorned with snow, ice,<|eot_id|>"

Asterisk formatting issue: [screenshot: urgh.png]

Models used:

https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF

https://huggingface.co/turboderp/Llama-3-8B-Instruct-exl2

(with the updated tokenizer.json, config, etc. from Meta)

Using SillyTavern as the frontend with the correct prompt template, and llama.cpp and TabbyAPI as backends.

I have been having similar issues: the model outputs "happy to help" and stops early in some cases.

Edit: Fixed (I think) by correctly formatting the system prompt

It can be quite random, so make sure to continue to test this and report if you are noticing anything!

I am using the correct prompt format, which looks like this in SillyTavern:

'<|start_header_id|>system<|end_header_id|>\n' +
'\n' +
'A helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>\n' +
'\n' +
"Let's get started.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n" +
'\n' +
'Hello, how may I help you today?<|eot_id|><|start_header_id|>user<|end_header_id|>\n' +
'\n' +
'Write 10 sentences that end with the word "apple".<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n' +
'\n',
(note: BOS is added automatically by the backend)
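A minimal sketch of how the Llama 3 Instruct prompt above is assembled from a list of messages. `format_llama3_prompt` is a hypothetical helper written for illustration, not SillyTavern's implementation; with the four messages shown above it reproduces that prompt string exactly, leaving BOS to the backend by default.

```python
from typing import List, Tuple


def format_llama3_prompt(messages: List[Tuple[str, str]],
                         add_bos: bool = False) -> str:
    """Render (role, content) pairs in the Llama 3 Instruct format.

    Each turn is: start header, role, end header, two newlines, content,
    then <|eot_id|>. The prompt ends with an open assistant header so the
    model writes the reply. BOS is omitted unless add_bos=True.
    """
    parts = ["<|begin_of_text|>"] if add_bos else []
    for role, content in messages:
        parts.append(
            f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"
        )
    # Open the assistant turn; generation should end with <|eot_id|>.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    ("system", "A helpful assistant."),
    ("user", "Let's get started."),
    ("assistant", "Hello, how may I help you today?"),
    ("user", 'Write 10 sentences that end with the word "apple".'),
]
print(format_llama3_prompt(messages))
```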

I've found that part of the reason the model emits the EOT token too early is that it appears to be overly confident. Here's an example:

Generating (77 / 400 tokens) [( her 20.78%) ( enjoying 13.64%) ( taking 12.20%) ( feeling 12.10%)]
Generating (78 / 400 tokens) [( eyes 89.87%) ( gaze 5.46%) ( heart 2.47%) ( brown 2.20%)]
Generating (79 / 400 tokens) [( filled 5.05%) ( shining 40.16%) ( sparkling 29.57%) ( reflecting 13.37%) ( taking 7.74%)]
Generating (80 / 400 tokens) [(<|eot_id|> 5.22%) ( with 94.78%)]

Actual text: "and she occasionally looks up at him with a quiet smile, her eyes filled<|eot_id|>"
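In each log line above, the first bracketed entry is the token that was actually sampled (the actual text confirms this), followed by the other candidates. Below is a minimal sketch, with hypothetical helper names `parse_step` and `off_top_picks`, that parses lines in this format and flags steps where the sampled token was not the most likely candidate, such as steps 79 and 80 above. It assumes every line follows exactly the format shown.

```python
import re
from typing import List, Tuple

# One "( token 12.34%)" candidate; the token keeps its leading space.
CANDIDATE_RE = re.compile(r"\(([^()]*?) ([\d.]+)%\)")
STEP_RE = re.compile(r"Generating \((\d+) /")


def parse_step(line: str) -> Tuple[int, List[Tuple[str, float]]]:
    """Return the step number and (token, percent) candidates of one line."""
    step = int(STEP_RE.match(line).group(1))
    bracket = line[line.index("["):]  # skip the "(77 / 400 tokens)" part
    cands = [(tok, float(p)) for tok, p in CANDIDATE_RE.findall(bracket)]
    return step, cands


def off_top_picks(lines: List[str]) -> List[Tuple[int, str, float, str, float]]:
    """List steps where the sampled token (first entry) was not the top one."""
    flagged = []
    for line in lines:
        step, cands = parse_step(line)
        sampled, p_sampled = cands[0]
        best, p_best = max(cands, key=lambda c: c[1])
        if best != sampled:
            flagged.append((step, sampled, p_sampled, best, p_best))
    return flagged


log = [
    "Generating (77 / 400 tokens) [( her 20.78%) ( enjoying 13.64%) ( taking 12.20%) ( feeling 12.10%)]",
    "Generating (78 / 400 tokens) [( eyes 89.87%) ( gaze 5.46%) ( heart 2.47%) ( brown 2.20%)]",
    "Generating (79 / 400 tokens) [( filled 5.05%) ( shining 40.16%) ( sparkling 29.57%) ( reflecting 13.37%) ( taking 7.74%)]",
    "Generating (80 / 400 tokens) [(<|eot_id|> 5.22%) ( with 94.78%)]",
]
for row in off_top_picks(log):
    print(row)
```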
