Prompt Template with Langchain

#5
by Hanifahreza

I'm trying to build an LLM RAG system with LangChain and ChromaDB, imitating this model's given prompt template, but the output is gibberish. Here's how I define the model, tokenizer, ChromaDB, and the prompt template:

from transformers import AutoTokenizer, AutoModelForCausalLM
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings

# Load model and tokenizer
model_id = "/home/model/SeaLLM-7B-v2.5/"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto')

# ChromaDB vector store over `pages` (document loading not shown)
db = Chroma.from_documents(
    pages,
    HuggingFaceEmbeddings(model_name="/home/model/all-MiniLM-L6-v2/"),
    persist_directory='/home/playground/Triton/chromadb/')
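As a quick sanity check of the retrieval side on its own (a sketch; the query string below is just an example), a plain similarity search shows what context the retriever would return:

# Check what the vector store returns for an example query
docs = db.similarity_search("net sales apple", k=2)
for d in docs:
    print(d.page_content[:200])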

prompt_template = """
<|im_start|>system
Anda adalah sistem asisten. Anda akan diberikan sebuah pertanyaan. Anda diberikan
konteks berikut untuk membantu menjawab pertanyaan tersebut:
CONTEXT: {context}<eos>
<|im_start|>user
QUESTION: {question}<eos>
<|im_start|>assistant
ANSWER:"""

from langchain.prompts import PromptTemplate
prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
print(tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt_template)))
# ['<bos>', '\n', '<', '|', 'im', '_', 'start', '|>', 'system', '\n', 'Anda', '▁adalah', '▁sistem', '▁asisten', '.', '▁Anda', '▁akan', '▁diberikan', '▁sebuah', '▁pertanyaan', '.', '▁Anda', '▁diberikan', '\n', 'kon', 'teks', '▁berikut', '▁untuk', '▁membantu', '▁menjawab', '▁pertanyaan', '▁tersebut', ':', '\n', 'CONTEXT', ':', '▁{', 'context', '}', '<eos>', '\n', '<', '|', 'im', '_', 'start', '|>', 'user', '\n', 'QUESTION', ':', '▁{', 'question', '}', '\n']
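One way to cross-check a hand-written template (a sketch, assuming the tokenizer ships a chat template for this model; the message contents are placeholders) is to render the same messages with apply_chat_template and compare:

messages = [
    {"role": "system", "content": "Anda adalah sistem asisten. CONTEXT: ..."},
    {"role": "user", "content": "QUESTION: ..."},
]
# tokenize=False returns the rendered prompt string instead of token ids
reference = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(reference)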

I suspect there's something wrong with my prompt template because I'm using LangChain, but I can't find what it is. Any help is really appreciated. Thanks for your hard work.

SeaLLMs - Language Models for Southeast Asian Languages org

@Hanifahreza There should be no \n at the beginning, but I don't think that is an issue.

Can you render your full LangChain prompt into a complete prompt string and run the model directly with model.generate(**inputs, do_sample=True, temperature=0.7) to see if it works normally?

Note that if you've set a repetition penalty, you must set it to 1.
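For example, something along these lines (a sketch only, not from the thread; prompt_text stands for the fully rendered LangChain prompt):

inputs = tokenizer(prompt_text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, do_sample=True, temperature=0.7,
                        repetition_penalty=1.0,  # keep at 1 if a penalty was set elsewhere
                        max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))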

OK, so I have tried crafting the LangChain prompt by eliminating the '\n' after the <bos> token, like this:

prompt_template = """<|im_start|>system
Anda adalah sistem asisten. Anda akan diberikan sebuah pertanyaan yang harus dijawab dalam Bahasa Indonesia. 
Anda diberikan konteks berikut untuk membantu menjawab pertanyaan tersebut:
CONTEXT: {context}<eos>
<|im_start|>user
QUESTION: {question}
"""

prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
print(tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt_template)))
#['<bos>', '<', '|', 'im', '_', 'start', '|>', 'system', '\n', 'Anda', '▁adalah', '▁sistem', '▁asisten', '.', '▁Anda', '▁akan', '▁diberikan', '▁sebuah', '▁pertanyaan', '▁yang', '▁harus', '▁di', 'jawab', '▁dalam', '▁Bahasa', '▁Indonesia', '.', '▁', '\n', 'Anda', '▁diberikan', '▁kon', 'teks', '▁berikut', '▁untuk', '▁membantu', '▁menjawab', '▁pertanyaan', '▁tersebut', ':', '\n', 'CONTEXT', ':', '▁{', 'context', '}', '<eos>', '\n', '<', '|', 'im', '_', 'start', '|>', 'user', '\n', 'QUESTION', ':', '▁{', 'question', '}', '\n']

Then I filled in a dummy context and question whose answer is obvious from the prompt, and fed it to the model directly like this:

# Fill the template with a dummy context and question
inputs = {
    "context": 'net sales apple adalah 3 juta rupiah',
    "question": 'berapa net sales apple?'
}

full_prompt = prompt_template.format(**inputs)
# Generate directly from the rendered prompt, bypassing LangChain
generated_output = model.generate(
    input_ids=tokenizer.encode(full_prompt, return_tensors="pt"),
    max_length=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(generated_output[0], skip_special_tokens=True))

The result of that print is:

'<|im_start|>system\nAnda adalah sistem asisten. Anda akan diberikan sebuah pertanyaan yang harus dijawab dalam Bahasa Indonesia. \nAnda diberikan konteks berikut untuk membantu menjawab pertanyaan tersebut:\nCONTEXT: net sales apple adalah 3 juta rupiah\n<|im_start|>user\nQUESTION: berapa net sales apple?\nANSWER: Net sales Apple adalah 3 juta rupiah.'

It seems like the model does indeed work: it provides the correct result after ANSWER. After some investigation, I think I found the culprit behind the gibberish here:

from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferWindowMemory

db = Chroma.from_documents(
    pages,
    HuggingFaceEmbeddings(model_name="/home/model/all-MiniLM-L6-v2/"),
    persist_directory='/home/playground/Triton/chromadb/')
retriever = db.as_retriever()

# Keep the last 4 turns of conversation in memory
memory = ConversationBufferWindowMemory(
    memory_key="chat_history", k=4,
    return_messages=True, input_key='question', output_key='answer')

# `llm` is the LangChain wrapper around the model (its definition is not shown here)
qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    combine_docs_chain_kwargs={"prompt": prompt},
    return_generated_question=True,
)

question = "berapa net sales Apple?"
bot_result = qa({"question": question})

print(bot_result['generated_question'])
# 128011280112801128011280112801128011280112801128011280…
print(bot_result['answer'])
# 128011280112801128011280112801128011280112801128011280…

So I guess something goes wrong when the question is generated from the prompt template after the context and question are passed to it, but I don't understand what.
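One way to confirm exactly what prompt the chain sends to the model (a sketch; set_debug lives in langchain.globals in recent LangChain versions) is to enable debug logging and rerun the chain:

from langchain.globals import set_debug

set_debug(True)  # prints every prompt the chain sends to the LLM
bot_result = qa({"question": question})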

SeaLLMs - Language Models for Southeast Asian Languages org

@Hanifahreza I remember this case. When you pass in llm=llm, it doesn't follow the chat format; it injects the prompt/instruction directly as pure text, which causes the model to fail to follow the instruction. You'll need to figure that out on your end.
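One possible direction (an illustration only, not a fix confirmed in this thread) is to give the chain's question-generation step a prompt wrapped in the same chat markers, via the condense_question_prompt argument:

from langchain.prompts import PromptTemplate

# Question-generation prompt wrapped in the model's chat format (sketch)
condense_template = """<|im_start|>user
Given the following conversation and a follow-up question, rephrase the follow-up question into a standalone question.
Chat History: {chat_history}
Follow-up Input: {question}<eos>
<|im_start|>assistant
Standalone question:"""

qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    combine_docs_chain_kwargs={"prompt": prompt},
    condense_question_prompt=PromptTemplate(
        template=condense_template,
        input_variables=["chat_history", "question"]),
    return_generated_question=True,
)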
