Hallucination issue in Llama-2-13B-chat-GPTQ

#45
by DivyanshTiwari7 - opened

I am using this model in ooba's text-generation-webui, but it is hallucinating a lot and not even following the given prompt properly.
My use case is chatPDF, where I provide chunks of PDFs as a knowledge source in the prompt along with the question I need answered. But even when the answer is present in the chunks, Llama 2 sometimes fails to generate the correct response. It also sometimes gives out-of-context answers from its own knowledge instead of sticking to the prompt.

Can anyone please explain the reason for these issues and a possible fix? Any discussion on this would be highly appreciated.
Thank you

@TheBloke

I mean, it is a quantized 13B model, but you might be using the wrong prompt format. You need to choose the Llama 2 prompt format.

Thanks for replying @johnwick123forevr. In text-generation-webui I am using the OpenAI-compatible API to call the Llama 2 model. It accepts the prompt in the GPT message format and converts it on the backend into the following format for Llama 2:

'''
You are a helpful AI assistant. Whatever conversation you have with the customer needs to be carried out in a polite and professional manner.
Generate answers ONLY based on the facts given in the list of sources below. When you generate an answer from a source, you need to use that source completely and not omit a single point. If there isn't enough information below, say you don't know and do not generate answers on your own. If asking a clarifying question to the user would help, ask the question.
For tabular information return it as an html table. Do not return markdown format. If the question is not in English, answer in the language used in the question.
For answers that do not have a table, generate the answer in a point wise manner.
For every answer you give, you HAVE to mention the section number from where you have generated the answer. For example you need to write the reference in the following format - (Reference Section: 10.4.2)

Sources:
{ Source knowledge chunk here }
Below is an instruction that describes a task. Write a response that appropriately completes the request.

Instruction:

{user query here}

'''

But the Llama model does not always stick to the prompt and hallucinates. Sometimes it does not give an answer even when it is present in the source knowledge chunks.

Hm. That might be the problem.

Llama 2 chat needs a format like this:
[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
{prompt} [/INST]
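
In case it helps, here is a minimal sketch of how that string could be assembled in Python. The helper name and placeholder text are just illustrative, not taken from the webui backend:

```python
# Minimal sketch of assembling a Llama-2-chat prompt string.
# build_llama2_prompt is just an illustrative helper, not from the webui code.
def build_llama2_prompt(system_prompt: str, user_message: str) -> str:
    return (
        "[INST] <<SYS>>\n"
        f"{system_prompt}\n"
        "<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

prompt = build_llama2_prompt(
    "You are an AI assistant. Answer ONLY from the sources given below.",
    "Sources:\n<source chunks here>\n\nQuestion:\n<user query here>",
)
print(prompt)
```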

Yes, I tried this format, but it was exceeding the token limit since we also send chunks along with the prompt. I will try it again with a shorter prompt. Thanks @johnwick123forevr
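
To see whether the prompt actually fits, I can check the token count before sending it. A rough sketch, assuming the standard HF tokenizer for the chat model (the repo id and headroom value are just assumptions, adjust to whatever you actually load):

```python
# Rough check of prompt length against Llama 2's 4096-token context window.
from transformers import AutoTokenizer

# Assumes the base chat repo; swap in the tokenizer that matches your model.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-chat-hf")

prompt = "[INST] <<SYS>>\nYou are an AI.\n<</SYS>>\n\nSources: ...\n\nQuestion: ... [/INST]"
n_tokens = len(tokenizer(prompt)["input_ids"])
print(n_tokens)

# Leave headroom for the generated answer itself, e.g. ~512 tokens.
if n_tokens > 4096 - 512:
    print("Prompt too long, trim the source chunks")
```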

Hmm, then just remove the whole "you are a helpful, respectful and honest assistant" part and just put "You are an AI."

@johnwick123forevr Thanks. That made the results better, but the hallucination of Llama 2 13B still persists. It gives different responses for the same query, and they are not always correct. Do you think prompting alone can help here, or do we need to look at alternatives like fine-tuning?

@DivyanshTiwari7 Regarding "it gives different responses for the same query": try setting top_k=1 in model.generate or the pipeline.
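
For example, something like this with the transformers pipeline. Just a sketch: loading the GPTQ repo this way assumes auto-gptq/optimum is installed, and do_sample=False already makes decoding greedy, so top_k=1 is mostly belt and braces:

```python
# Sketch: greedy decoding so the same prompt gives the same answer each time.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="TheBloke/Llama-2-13B-chat-GPTQ",
    device_map="auto",
)

prompt = "[INST] <<SYS>>\nYou are an AI.\n<</SYS>>\n\nWhat does section 10.4.2 say? [/INST]"
out = pipe(prompt, max_new_tokens=256, do_sample=False, top_k=1)
print(out[0]["generated_text"])
```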
