Model can't stop explaining itself

#53
by fedeparra - opened

Hi. Here's a problem I've noticed with all Phi models so far:

prompt = "You are a robot. Federico, your owner, says in Spanish 'por favor, ven a la cocina'. Please select among the following options the action that seems more appropriate to Federico's injunction: NAVIGATE, JUMP, DANCE, TURN, JOKE. Limit your response to one word."

Output: NAVIGATE

In this scenario, the most appropriate action for the robot to take in response to Federico's command "por favor, ven a la cocina" (please, come to the kitchen) would be to navigate towards the kitchen. The other options (JUMP, DANCE, JOKE) do not align with the request to move to a specific location.

As you can see, I specifically asked for a one-word response, but the model can't help itself: it "feels" obliged to explain its reasoning.

This is terrible for using the model as a parser or for limited-set decisions like the one in this example, since we can't rely on the format of the response.

Capping the output at a handful of tokens can help, but different words have different token lengths, so that's not a real solution.
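A crude workaround is to post-process the completion yourself: keep only the first word and validate it against the closed set of actions. A minimal sketch (the names are just examples):

```python
# Minimal post-processing sketch: accept only the first word of the
# completion, normalized, and reject anything outside the allowed set.
ALLOWED_ACTIONS = {"NAVIGATE", "JUMP", "DANCE", "TURN", "JOKE"}

def parse_action(completion: str) -> str | None:
    words = completion.strip().split()
    if not words:
        return None
    first = words[0].strip(".,;:!").upper()
    return first if first in ALLOWED_ACTIONS else None

print(parse_action("NAVIGATE\n\nIn this scenario, the most..."))  # -> NAVIGATE
```

But that's still a band-aid; ideally the model would just follow the instruction.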

Besides, I'm sure the experts at Microsoft know some specific prompting magic that can make the model less verbose?

Hi,

Have you used chat format in your prompt? Something like:

<|user|>\nQuestion<|end|>\n<|assistant|>

Since it is an instruct version, formatting your prompt in the chat format might be helpful.
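For example, with transformers you can let the tokenizer build that string for you via its chat template (a sketch, using the 4k instruct checkpoint as an example):

```python
from transformers import AutoTokenizer

# Build the Phi-3 chat format via the tokenizer's chat template instead
# of concatenating the special tokens by hand.
tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
messages = [{"role": "user", "content": "Limit your response to one word."}]
prompt = tok.apply_chat_template(messages, tokenize=False,
                                 add_generation_prompt=True)
print(repr(prompt))
# roughly: '<|user|>\nLimit your response to one word.<|end|>\n<|assistant|>\n'
```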

I did follow that pattern, and I tried 20 or so different prompts telling the model to stop after one word, without success.

I'm encountering a similar problem. The model keeps repeating its answer and does not stop until it reaches the max_new_tokens limit.

Microsoft org

It could be related to some missing stop tokens. Could you please retry the generation using 32000, 32001 and 32007 as the stop tokens?
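With transformers that would look roughly like this (model id and prompt are just examples; trust_remote_code=True may be needed depending on your transformers version):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: pass all three candidate stop-token ids to generate().
model_id = "microsoft/Phi-3-mini-4k-instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

inputs = tok("<|user|>\nReply with one word: hello?<|end|>\n<|assistant|>\n",
             return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=32,
    eos_token_id=[32000, 32001, 32007],  # <|endoftext|>, <|assistant|>, <|end|>
)
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:],
                 skip_special_tokens=True))
```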

I don't think it's that, since the model doesn't continue generating indefinitely: it ends right after the explanation. It just needs to explain its reasoning and won't obey instructions not to. It looks like it was trained specifically to explain its responses and can't help doing so no matter how much we ask.

Microsoft org

That’s odd. Even with the 4k model, I get the correct response on the Inference API:

[screenshot IMG_3590.jpeg: the Inference API returns just the one-word answer]

When I removed the last instruction, it produces some additional content:

[screenshot IMG_3591.jpeg: with the last instruction removed, extra content is produced]

To show that the response is not being cut off by the token limit, I also used the following:

[screenshot IMG_3592.jpeg: a longer generation, showing the response isn't truncated by the token count]

Additional explanation can be caused by missing stop tokens. For example, if the model generates a <|end|> and does not stop there, it will keep generating extra content, since it expects a user query or an assistant response to follow.
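You can also double-check which ids those special tokens actually map to in the tokenizer (sketch):

```python
from transformers import AutoTokenizer

# Confirm which ids the Phi-3 special tokens map to in the vocabulary.
tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
for token in ["<|endoftext|>", "<|assistant|>", "<|end|>"]:
    print(token, tok.convert_tokens_to_ids(token))
# expected: 32000, 32001, 32007
```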

Interesting! I'm using the ONNX version provided by Microsoft (it's in the same collection), which uses the new ONNX Runtime generate() API. Also, it's 4-bit quantized, and quantized models sometimes have issues with stop tokens.

I thought this was a problem common to all the versions. Now that I see that's not the case, I'll repost on the ONNX version instead, and I'll also check the stop-token issue.
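For the ONNX Runtime generate() API, the stop ids come from the genai_config.json shipped next to the model, so a first check is to make sure all three ids are listed there. A sketch (the local path and exact key layout are assumptions and may differ by release):

```python
import json

# Sketch: confirm the ONNX genai config lists all three stop ids.
# (The path is an example; the key layout may vary by release.)
with open("Phi-3-mini-4k-instruct-onnx/genai_config.json") as f:
    cfg = json.load(f)
print(cfg.get("model", {}).get("eos_token_id"))
# hoping for [32000, 32001, 32007]
```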

Thank you!

nguyenbh changed discussion status to closed
