Irrelevant text generation while prompting

#63
by ad6398 - opened

I am asking Phi-2 to explain photosynthesis, using two methods

Greedy Decoding (Normal Method)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = 'microsoft/phi-2'
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
eval_model = AutoModelForCausalLM.from_pretrained(base_model_id, trust_remote_code=True, torch_dtype=torch.float16, load_in_8bit=True)
tok_eval_prompt = tokenizer("Instruction: Explain photosynthesis.\nOutput:", return_tensors="pt").to(eval_model.device)  # illustrative prompt text

with torch.no_grad():
    raw_op_greedy = eval_model.generate(**tok_eval_prompt, max_new_tokens=500, repetition_penalty=1.15)

Output:
[screenshot of the greedy-decoding output]

Nucleus Sampling

with torch.no_grad():
    rnp = eval_model.generate(**tok_eval_prompt, max_new_tokens=500, repetition_penalty=1.15, do_sample=True, top_p=0.90, num_return_sequences=3)

Outputs:

[screenshots of the sampled outputs]

In all cases the model adds some irrelevant text after the explanation. I was wondering what the reason could be. Is it the max_new_tokens parameter? Do we need to set it explicitly for every query, after guessing the length at which Phi-2 won't add redundant text?

My second question is regarding support for sampling of generated text. I noticed the statement below in the model card. Does this mean that, along with beam search, top-k or top-p sampling is somehow irrelevant to Phi-2, and that it works best with greedy decoding only?

In the generation function, our model currently does not support beam search (num_beams > 1).

I tried multiple flavours (code generation, chat mode, instruction-output), and the sampling method gave the worst results. Is there a specific reason, or am I doing something wrong with sampling or some other parameter?

The instruct template you're using has a typo: "Instruct: " should be used instead of "Instruction".

More information is in the technical reports for Phi.
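
For example, using the Instruct/Output format from the model card, the prompt would be built like this (a sketch reusing the tokenizer and eval_model from the first snippet; the question text itself is just an example):

prompt = "Instruct: Explain photosynthesis.\nOutput:"
tok_eval_prompt = tokenizer(prompt, return_tensors="pt").to(eval_model.device)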

@baasitsh it gave similar results.

Then it's probably because none of the Phi models are instruction-tuned or tuned for chat use cases, so they don't know when to stop generating.
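
As a minimal workaround sketch (it reuses the eval_model, tokenizer, and tok_eval_prompt from the snippets above; the StopOnSubstring class and the "\nInstruct:" marker are my own choices, not part of transformers), you can pass a custom stopping criterion that halts generation once a marker shows up in the newly generated text:

from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnSubstring(StoppingCriteria):
    def __init__(self, tokenizer, stop_string, prompt_len):
        self.tokenizer = tokenizer
        self.stop_string = stop_string
        self.prompt_len = prompt_len  # number of prompt tokens to skip when checking

    def __call__(self, input_ids, scores, **kwargs):
        # Decode only the newly generated tokens and stop once the marker appears
        generated = self.tokenizer.decode(input_ids[0][self.prompt_len:], skip_special_tokens=True)
        return self.stop_string in generated

prompt_len = tok_eval_prompt["input_ids"].shape[1]
stopping = StoppingCriteriaList([StopOnSubstring(tokenizer, "\nInstruct:", prompt_len)])

with torch.no_grad():
    out = eval_model.generate(**tok_eval_prompt, max_new_tokens=500, repetition_penalty=1.15, stopping_criteria=stopping)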

Normally you need to prefix the prompt with "Instruct:" as well as end it with an output-termination string. The model outputs an "end of text" token which you can pick up when it finishes. Normally this works very well, though on occasion it will keep going until it reaches the output token limit. As people have mentioned here, this is a base model :) Here's what I'm using:

def generate_llm_response(model, tokenizer, device, prompt, max_length):

  output_termination = "\nOutput:"
  total_input = f"Instruct:{prompt}{output_termination}"
  inputs = tokenizer(total_input, return_tensors="pt", return_attention_mask=True)
  inputs = inputs.to(device)
  eos_token_id = tokenizer.eos_token_id

  outputs = model.generate(**inputs, max_length=max_length, eos_token_id=eos_token_id)

  # Decode the full sequence (prompt plus generated text)
  generated_text = tokenizer.batch_decode(outputs)[0]

  # Split the text at "Output:" and take the second part
  split_text = generated_text.split("Output:", 1)
  assistant_response = split_text[1].strip() if len(split_text) > 1 else ""
  assistant_response = assistant_response.replace("<|endoftext|>", "").strip()

  return assistant_response
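
For reference, a call might look like this (a rough usage sketch assuming the eval_model and tokenizer loaded earlier in the thread; the max_length value is arbitrary):

device = eval_model.device  # with load_in_8bit=True the model is already on the GPU
answer = generate_llm_response(eval_model, tokenizer, device, "Explain photosynthesis.", max_length=500)
print(answer)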

I have created a naive solution for this problem which removes the extra text at the bottom of the answer. Please check the code here: github.com/YodaGitMaster/medium-phi2-deploy-finetune-llm

If you know a more elegant way to do it, please write me a message; I'm really looking forward to it.
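
For illustration only (this is not the code from that repo; the cut_after_answer name and the marker strings are my own guesses), one naive way to drop the trailing text is to cut the decoded answer at the first marker that looks like the start of a new, made-up block:

def cut_after_answer(text, markers=("\nInstruct:", "\nExercise", "\nQuestion:")):
    # Truncate at the earliest marker that signals the model has wandered into a new block
    cut = len(text)
    for m in markers:
        idx = text.find(m)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].strip()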
