Output of the model

#6
by Akkurt97 - opened

Hey guys, I am using Meta-Llama-3-8B-Instruct-Q5_K_M.gguf, and my setup for RAG looks like this:

from langchain_community.llms import LlamaCpp
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Define the LlamaCpp model
llm = LlamaCpp(
    model_path=r"E:\llm-models\Meta-Llama-3-8B-Instruct-Q5_K_M.gguf",
    temperature=0.5,
    max_tokens=1024,
    top_p=0.9,
    repeat_penalty=1.1,
    n_batch=4096,
    n_ctx=8000,
    n_gpu_layers=-1,
)

template = """
You are an assistant for answering questions.
You are an assistant whose purpose is to provide appropriate answers to questions based on the data and information provided to you.
Avoid deviating from the topic when responding. For example, if you're asked about a specific date, try to answer based on the information provided.
Also, when you don't have an answer, the information I've acquired may be insufficient to respond to that question.
When a user asks you a question regarding a KPI for March 2020, derive this inference from the provided information and share it with the user concisely.
Users may ask comparative questions, so strive to generate logical and meaningful responses. Keep your answers brief and avoid unnecessary elaboration or repetition.

For example 
If a question come like this Q:  Compare 2021 total paycell active customer count to 2022 paycell active customer count.
You should answer this A:  The total Paycell active customer count for 2021 was 5,309,428.0. The total Paycell active customer count for 2022 was 6,590,586.0. Therefore, the active customer count for Paycell increased from 2021 to 2022.
It should be short and concise.

Question: {question} 
Context: {context} 
Answer:

"""

CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(template)
#prompt_perspectives = ChatPromptTemplate.from_template(template)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | CONDENSE_QUESTION_PROMPT
    | llm
    | StrOutputParser()
)

When I ask a question, the answer comes out like this:

user_question ="In which month did the highest virtual card activation occur, and what could be the reasons for this?"
result = rag_chain.invoke(user_question)
print(result)

llama_print_timings: load time = 26376.14 ms
llama_print_timings: sample time = 826.03 ms / 1024 runs ( 0.81 ms per token, 1239.66 tokens per second)
llama_print_timings: prompt eval time = 26373.97 ms / 2929 tokens ( 9.00 ms per token, 111.06 tokens per second)
llama_print_timings: eval time = 176159.48 ms / 1023 runs ( 172.20 ms per token, 5.81 tokens per second)
llama_print_timings: total time = 212466.79 ms / 3952 tokens
- - - - ( ( (nd ier in inier in in
( (: - - - - - - - -? - - - ( ( - ( ( ( (- (-- --- - - - - ( ( ( - - -.

  • ( - ( - --- ( (- --- ( - (---.

[...the output continues like this for the full 1024 tokens: runs of dashes, parentheses, "in", "of", "the", and stray numbers, with no coherent text anywhere...]
Where did I go wrong?

LM Studio Community org

The latest Llama 3 models are extremely sensitive to their instruct templates, to a weirdly large degree, so make sure you follow the template as closely as you can. Your chain is sending the prompt to the model as raw text, without Llama 3's special tokens, and a missing instruct template is exactly the kind of thing that produces gibberish like the output above.
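
For reference, here is a minimal sketch of what that could look like with your LangChain setup. The special tokens below are Llama 3's documented instruct format; the explicit stop sequence is an assumption on my part, since some early Llama 3 GGUF conversions had broken EOS metadata and would generate until max_tokens, which would match the 1024-token wall of noise in your timings:

from langchain_community.llms import LlamaCpp
from langchain_core.prompts import PromptTemplate

# Wrap the whole prompt in Llama 3's instruct tokens; the model was
# trained to see turns delimited exactly like this.
template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are an assistant for answering questions based on the provided context.
Keep your answers brief and avoid repetition.<|eot_id|><|start_header_id|>user<|end_header_id|>

Question: {question}
Context: {context}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""

CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(template)

llm = LlamaCpp(
    model_path=r"E:\llm-models\Meta-Llama-3-8B-Instruct-Q5_K_M.gguf",
    n_ctx=8000,
    n_gpu_layers=-1,
    # Assumption: treat <|eot_id|> as an explicit stop sequence in case
    # the GGUF's EOS metadata is wrong (a known issue with some early
    # Llama 3 conversions).
    stop=["<|eot_id|>"],
)

The rest of your chain can stay exactly as it is; only the template string (and possibly the stop sequence) needs to change.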
