Irrelevant text generation while prompting

#63
by ad6398 - opened

I am asking Phi-2 to explain photosynthesis, using two methods

Greedy Decoding (Normal Method)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = 'microsoft/phi-2'
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
eval_model = AutoModelForCausalLM.from_pretrained(base_model_id, trust_remote_code=True, torch_dtype=torch.float16, load_in_8bit=True)
tok_eval_prompt = tokenizer("Instruction: Explain photosynthesis.\nOutput:", return_tensors="pt").to(eval_model.device)  # illustrative prompt text

with torch.no_grad():
    raw_op_greedy = eval_model.generate(**tok_eval_prompt, max_new_tokens=500, repetition_penalty=1.15)

Output:
[screenshot of the greedy-decoding output]

Nucleus Sampling

with torch.no_grad():
    rnp = eval_model.generate(**tok_eval_prompt, max_new_tokens=500, repetition_penalty=1.15, do_sample=True, top_p=0.90, num_return_sequences=3)

Outputs:

[screenshots of the sampled outputs]

In all cases the model adds some irrelevant text after the explanation. I was wondering what the reason could be. Is it the max_new_tokens parameter? Do we need to set it explicitly for every query, after guessing the length at which Phi-2 won't add redundant text?

My second question is regarding support for sampling of generated text. I noticed the statement below in the model card. Does this mean that, along with beam search, top-k or top-p sampling is somehow irrelevant to Phi-2, and that it works best with greedy decoding only?

In the generation function, our model currently does not support beam search (num_beams > 1).

I tried multiple flavours (code generation, chat mode, instruction-output), and the sampling method gave the worst results. Is there a specific reason, or am I doing something wrong with sampling or some other parameter?

The instruct template you're using has a typo: "Instruct: " should be used instead of "Instruction".

More information is in the technical reports for Phi.
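
For example, using the Instruct/Output format from the model card, the prompt would be built like this (a sketch reusing the tokenizer and eval_model from the first snippet; the question text itself is just an example):

prompt = "Instruct: Explain photosynthesis.\nOutput:"
tok_eval_prompt = tokenizer(prompt, return_tensors="pt").to(eval_model.device)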

@baasitsh it gave similar results.

Then it's probably because none of the Phi models are instruction-tuned or tuned for chat use cases, so they don't know when to stop generating.
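
As a minimal workaround sketch (it reuses the eval_model, tokenizer, and tok_eval_prompt from the snippets above; the StopOnSubstring class and the "\nInstruct:" marker are my own choices, not part of transformers), you can pass a custom stopping criterion that halts generation once a marker shows up in the newly generated text:

from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnSubstring(StoppingCriteria):
    def __init__(self, tokenizer, stop_string, prompt_len):
        self.tokenizer = tokenizer
        self.stop_string = stop_string
        self.prompt_len = prompt_len  # number of prompt tokens to skip when checking

    def __call__(self, input_ids, scores, **kwargs):
        # Decode only the newly generated tokens and stop once the marker appears
        generated = self.tokenizer.decode(input_ids[0][self.prompt_len:], skip_special_tokens=True)
        return self.stop_string in generated

prompt_len = tok_eval_prompt["input_ids"].shape[1]
stopping = StoppingCriteriaList([StopOnSubstring(tokenizer, "\nInstruct:", prompt_len)])

with torch.no_grad():
    out = eval_model.generate(**tok_eval_prompt, max_new_tokens=500, repetition_penalty=1.15, stopping_criteria=stopping)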

Normally you need to prefix the prompt with "Instruct:" as well as end it with an output-termination string. The model outputs an "end of text" token which you can pick up when it finishes. Normally this works very well, though on occasion it will keep going until it reaches the output token limit. As people have mentioned here, this is a base model :) Here's what I'm using:

def generate_llm_response(model, tokenizer, device, prompt, max_length):

  output_termination = "\nOutput:"
  total_input = f"Instruct:{prompt}{output_termination}"
  inputs = tokenizer(total_input, return_tensors="pt", return_attention_mask=True)
  inputs = inputs.to(device)
  eos_token_id = tokenizer.eos_token_id

  outputs = model.generate(**inputs, max_length=max_length, eos_token_id=eos_token_id)

  # Decode the full sequence (prompt plus generated text)
  generated_text = tokenizer.batch_decode(outputs)[0]

  # Split the text at "Output:" and take the second part
  split_text = generated_text.split("Output:", 1)
  assistant_response = split_text[1].strip() if len(split_text) > 1 else ""
  assistant_response = assistant_response.replace("<|endoftext|>", "").strip()

  return assistant_response
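
For reference, a call might look like this (a rough usage sketch assuming the eval_model and tokenizer loaded earlier in the thread; the max_length value is arbitrary):

device = eval_model.device  # with load_in_8bit=True the model is already on the GPU
answer = generate_llm_response(eval_model, tokenizer, device, "Explain photosynthesis.", max_length=500)
print(answer)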

I have created a naive solution for this problem which removes the extra text at the bottom of the answer. Please check the code here: github.com/YodaGitMaster/medium-phi2-deploy-finetune-llm

If you know a more elegant way to do it, please write me a message; I'm really looking forward to it.
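
For illustration only (this is not the code from that repo; the cut_after_answer name and the marker strings are my own guesses), one naive way to drop the trailing text is to cut the decoded answer at the first marker that looks like the start of a new, made-up block:

def cut_after_answer(text, markers=("\nInstruct:", "\nExercise", "\nQuestion:")):
    # Truncate at the earliest marker that signals the model has wandered into a new block
    cut = len(text)
    for m in markers:
        idx = text.find(m)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].strip()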
