Model can't stop explaining itself

#53
by fedeparra - opened

Hi. Here's a problem I've noticed with all Phi models so far:

prompt = "You are a robot. Federico, your owner, says in Spanish 'por favor, ven a la cocina'. Please select among the following options the action that seems more appropriate to Federico's injunction: NAVIGATE, JUMP, DANCE, TURN, JOKE. Limit your response to one word."

Output: NAVIGATE

In this scenario, the most appropriate action for the robot to take in response to Federico's command "por favor, ven a la cocina" (please, come to the kitchen) would be to navigate towards the kitchen. The other options (JUMP, DANCE, JOKE) do not align with the request to move to a specific location.

As you can see, I specifically asked for a one-word response, but the model can't help itself: it "feels" obliged to explain its reasoning.

This is terrible for using the model as a parser or for limited-set decisions like the one in this example, since we can't rely on the format of the response.

Capping the output at a handful of tokens can help, but different words have different token lengths, so that's not a real solution.
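A crude workaround is to post-process the completion yourself: keep only the first word and validate it against the closed set of actions. A minimal sketch (the names are just examples):

```python
# Minimal post-processing sketch: accept only the first word of the
# completion, normalized, and reject anything outside the allowed set.
ALLOWED_ACTIONS = {"NAVIGATE", "JUMP", "DANCE", "TURN", "JOKE"}

def parse_action(completion: str) -> str | None:
    words = completion.strip().split()
    if not words:
        return None
    first = words[0].strip(".,;:!").upper()
    return first if first in ALLOWED_ACTIONS else None

print(parse_action("NAVIGATE\n\nIn this scenario, the most..."))  # -> NAVIGATE
```

But that's still a band-aid; ideally the model would just follow the instruction.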

Besides, I'm sure the experts at Microsoft know some specific prompting magic that can make the model less verbose?

Hi,

Have you used chat format in your prompt? Something like:

<|user|>\nQuestion<|end|>\n<|assistant|>

Since it is an instruct version, formatting your prompt in the chat format might be helpful.
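For example, with transformers you can let the tokenizer build that string for you via its chat template (a sketch, using the 4k instruct checkpoint as an example):

```python
from transformers import AutoTokenizer

# Build the Phi-3 chat format via the tokenizer's chat template instead
# of concatenating the special tokens by hand.
tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
messages = [{"role": "user", "content": "Limit your response to one word."}]
prompt = tok.apply_chat_template(messages, tokenize=False,
                                 add_generation_prompt=True)
print(repr(prompt))
# roughly: '<|user|>\nLimit your response to one word.<|end|>\n<|assistant|>\n'
```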

I did follow that pattern, and I tried 20 or so different prompts telling the model to stop after one word, without success.

I'm encountering a similar problem. The model keeps repeating its answer and does not stop until it reaches the max_new_tokens limit.

Microsoft org

It could be related to some missing stop tokens. Could you please retry the generation using 32000, 32001 and 32007 as the stop tokens?
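With transformers that would look roughly like this (model id and prompt are just examples; trust_remote_code=True may be needed depending on your transformers version):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: pass all three candidate stop-token ids to generate().
model_id = "microsoft/Phi-3-mini-4k-instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

inputs = tok("<|user|>\nReply with one word: hello?<|end|>\n<|assistant|>\n",
             return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=32,
    eos_token_id=[32000, 32001, 32007],  # <|endoftext|>, <|assistant|>, <|end|>
)
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:],
                 skip_special_tokens=True))
```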

I don't think it's that, since the model doesn't continue generating indefinitely: it ends right after the explanation. It just needs to explain its reasoning and won't obey instructions not to. It looks like it was trained specifically to explain its responses and can't help doing so no matter how much we ask.

Microsoft org

That’s odd. Even with the 4k model, I get the correct response on the Inference API:

[screenshot IMG_3590.jpeg: the Inference API returns just the one-word answer]

When I removed the last instruction, it produces some additional content:

[screenshot IMG_3591.jpeg: with the last instruction removed, extra content is produced]

To show that the response is not being cut off by the token limit, I also used the following:

[screenshot IMG_3592.jpeg: a longer generation, showing the response isn't truncated by the token count]

Additional explanation can be caused by missing stop tokens. For example, if the model generates a <|end|> and does not stop there, it will keep generating extra content, since it expects a user query or an assistant response to follow.
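You can also double-check which ids those special tokens actually map to in the tokenizer (sketch):

```python
from transformers import AutoTokenizer

# Confirm which ids the Phi-3 special tokens map to in the vocabulary.
tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
for token in ["<|endoftext|>", "<|assistant|>", "<|end|>"]:
    print(token, tok.convert_tokens_to_ids(token))
# expected: 32000, 32001, 32007
```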

Interesting! I'm using the ONNX version provided by Microsoft (it's in the same collection), which uses the new ONNX Runtime generate() API. Also, it's 4-bit quantized, and quantized models sometimes have issues with stop tokens.

I thought this was a problem common to all the versions. Now that I see that's not the case, I'll repost on the ONNX version instead, and I'll also check the stop-token issue.
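For the ONNX Runtime generate() API, the stop ids come from the genai_config.json shipped next to the model, so a first check is to make sure all three ids are listed there. A sketch (the local path and exact key layout are assumptions and may differ by release):

```python
import json

# Sketch: confirm the ONNX genai config lists all three stop ids.
# (The path is an example; the key layout may vary by release.)
with open("Phi-3-mini-4k-instruct-onnx/genai_config.json") as f:
    cfg = json.load(f)
print(cfg.get("model", {}).get("eos_token_id"))
# hoping for [32000, 32001, 32007]
```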

Thank you!

nguyenbh changed discussion status to closed
