Prompt Template with Langchain

#5
by Hanifahreza

I'm trying to build an LLM RAG system with LangChain and ChromaDB, imitating this model's given prompt template, but the output is gibberish. Here's how I define the model, tokenizer, ChromaDB, and the prompt template:

from transformers import AutoTokenizer, AutoModelForCausalLM
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings

# Load model and tokenizer
model_id = "/home/model/SeaLLM-7B-v2.5/"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto')

# ChromaDB vector store over `pages` (document loading not shown)
db = Chroma.from_documents(
    pages,
    HuggingFaceEmbeddings(model_name="/home/model/all-MiniLM-L6-v2/"),
    persist_directory='/home/playground/Triton/chromadb/')
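As a quick sanity check of the retrieval side on its own (a sketch; the query string below is just an example), a plain similarity search shows what context the retriever would return:

# Check what the vector store returns for an example query
docs = db.similarity_search("net sales apple", k=2)
for d in docs:
    print(d.page_content[:200])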

prompt_template = """
<|im_start|>system
Anda adalah sistem asisten. Anda akan diberikan sebuah pertanyaan. Anda diberikan
konteks berikut untuk membantu menjawab pertanyaan tersebut:
CONTEXT: {context}<eos>
<|im_start|>user
QUESTION: {question}<eos>
<|im_start|>assistant
ANSWER:"""

from langchain.prompts import PromptTemplate
prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
print(tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt_template)))
# ['<bos>', '\n', '<', '|', 'im', '_', 'start', '|>', 'system', '\n', 'Anda', '▁adalah', '▁sistem', '▁asisten', '.', '▁Anda', '▁akan', '▁diberikan', '▁sebuah', '▁pertanyaan', '.', '▁Anda', '▁diberikan', '\n', 'kon', 'teks', '▁berikut', '▁untuk', '▁membantu', '▁menjawab', '▁pertanyaan', '▁tersebut', ':', '\n', 'CONTEXT', ':', '▁{', 'context', '}', '<eos>', '\n', '<', '|', 'im', '_', 'start', '|>', 'user', '\n', 'QUESTION', ':', '▁{', 'question', '}', '\n']
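One way to cross-check a hand-written template (a sketch, assuming the tokenizer ships a chat template for this model; the message contents are placeholders) is to render the same messages with apply_chat_template and compare:

messages = [
    {"role": "system", "content": "Anda adalah sistem asisten. CONTEXT: ..."},
    {"role": "user", "content": "QUESTION: ..."},
]
# tokenize=False returns the rendered prompt string instead of token ids
reference = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(reference)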

I suspect there's something wrong with my prompt template because I'm using LangChain, but I can't find what it is. Any help is really appreciated. Thanks for your hard work.

SeaLLMs - Language Models for Southeast Asian Languages org

@Hanifahreza There should be no \n at the beginning, but I don't think that is an issue.

Can you render your full LangChain prompt into a complete prompt string and run the model directly with model.generate(**inputs, do_sample=True, temperature=0.7) to see if it works normally?

Note that if you've set a repetition penalty, you must set it to 1.
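For example, something along these lines (a sketch only, not from the thread; prompt_text stands for the fully rendered LangChain prompt):

inputs = tokenizer(prompt_text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, do_sample=True, temperature=0.7,
                        repetition_penalty=1.0,  # keep at 1 if a penalty was set elsewhere
                        max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))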

OK, so I have tried crafting the LangChain prompt by eliminating the '\n' after the <bos> token, like this:

prompt_template = """<|im_start|>system
Anda adalah sistem asisten. Anda akan diberikan sebuah pertanyaan yang harus dijawab dalam Bahasa Indonesia. 
Anda diberikan konteks berikut untuk membantu menjawab pertanyaan tersebut:
CONTEXT: {context}<eos>
<|im_start|>user
QUESTION: {question}
"""

prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
print(tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt_template)))
#['<bos>', '<', '|', 'im', '_', 'start', '|>', 'system', '\n', 'Anda', '▁adalah', '▁sistem', '▁asisten', '.', '▁Anda', '▁akan', '▁diberikan', '▁sebuah', '▁pertanyaan', '▁yang', '▁harus', '▁di', 'jawab', '▁dalam', '▁Bahasa', '▁Indonesia', '.', '▁', '\n', 'Anda', '▁diberikan', '▁kon', 'teks', '▁berikut', '▁untuk', '▁membantu', '▁menjawab', '▁pertanyaan', '▁tersebut', ':', '\n', 'CONTEXT', ':', '▁{', 'context', '}', '<eos>', '\n', '<', '|', 'im', '_', 'start', '|>', 'user', '\n', 'QUESTION', ':', '▁{', 'question', '}', '\n']

Then I filled in a dummy context and question whose answer is obvious from the prompt, and fed it to the model directly like this:

# Fill the template with a dummy context and question
inputs = {
    "context": 'net sales apple adalah 3 juta rupiah',
    "question": 'berapa net sales apple?'
}

full_prompt = prompt_template.format(**inputs)
# Generate directly from the rendered prompt, bypassing LangChain
generated_output = model.generate(
    input_ids=tokenizer.encode(full_prompt, return_tensors="pt"),
    max_length=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(generated_output[0], skip_special_tokens=True))

The result of that print is:

'<|im_start|>system\nAnda adalah sistem asisten. Anda akan diberikan sebuah pertanyaan yang harus dijawab dalam Bahasa Indonesia. \nAnda diberikan konteks berikut untuk membantu menjawab pertanyaan tersebut:\nCONTEXT: net sales apple adalah 3 juta rupiah\n<|im_start|>user\nQUESTION: berapa net sales apple?\nANSWER: Net sales Apple adalah 3 juta rupiah.'

It seems like the model does indeed work: it provides the correct result after ANSWER. After some investigation, I think I found the culprit behind the gibberish here:

from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferWindowMemory

db = Chroma.from_documents(
    pages,
    HuggingFaceEmbeddings(model_name="/home/model/all-MiniLM-L6-v2/"),
    persist_directory='/home/playground/Triton/chromadb/')
retriever = db.as_retriever()

# Keep the last 4 turns of conversation in memory
memory = ConversationBufferWindowMemory(
    memory_key="chat_history", k=4,
    return_messages=True, input_key='question', output_key='answer')

# `llm` is the LangChain wrapper around the model (its definition is not shown here)
qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    combine_docs_chain_kwargs={"prompt": prompt},
    return_generated_question=True,
)

question = "berapa net sales Apple?"
bot_result = qa({"question": question})

print(bot_result['generated_question'])
# 128011280112801128011280112801128011280112801128011280…
print(bot_result['answer'])
# 128011280112801128011280112801128011280112801128011280…

So I guess something goes wrong when the question is generated from the prompt template after the context and question are passed to it, but I don't understand what.
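One way to confirm exactly what prompt the chain sends to the model (a sketch; set_debug lives in langchain.globals in recent LangChain versions) is to enable debug logging and rerun the chain:

from langchain.globals import set_debug

set_debug(True)  # prints every prompt the chain sends to the LLM
bot_result = qa({"question": question})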

SeaLLMs - Language Models for Southeast Asian Languages org

@Hanifahreza I remember this case. When you pass in llm=llm, it doesn't follow the chat format; it injects the prompt/instruction directly as pure text, which causes the model to fail to follow the instruction. You'll need to figure that out on your end.
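One possible direction (an illustration only, not a fix confirmed in this thread) is to give the chain's question-generation step a prompt wrapped in the same chat markers, via the condense_question_prompt argument:

from langchain.prompts import PromptTemplate

# Question-generation prompt wrapped in the model's chat format (sketch)
condense_template = """<|im_start|>user
Given the following conversation and a follow-up question, rephrase the follow-up question into a standalone question.
Chat History: {chat_history}
Follow-up Input: {question}<eos>
<|im_start|>assistant
Standalone question:"""

qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    combine_docs_chain_kwargs={"prompt": prompt},
    condense_question_prompt=PromptTemplate(
        template=condense_template,
        input_variables=["chat_history", "question"]),
    return_generated_question=True,
)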
