AwanLLM/Awanllm-Llama-3-8B-Instruct-DPO-v0.2-GGUF

Based on Meta-Llama-3-8B-Instruct and governed by the Meta Llama 3 License agreement: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

We found a tokenization mistake in the previous DPO model, so this is a new version that tests DPO training on the following dataset:

https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k

The open LLM results are really BAD lol. Something in this dataset seems to disagree with Llama 3?

We are happy for anyone to try it out and give some feedback. We won't have the model up on our LLM API at https://awanllm.com...

Instruct format:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ model_answer_1 }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_2 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
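The template above can be assembled programmatically. The following is a minimal Python sketch, not an official helper: the function name `build_llama3_prompt` and the `(role, content)` turn structure are assumptions made for illustration. It only concatenates the special tokens shown in the format above and leaves the prompt open at the assistant header so the model writes the next answer.

```python
def build_llama3_prompt(system_prompt, turns):
    """Build a Llama 3 instruct prompt string.

    `turns` is a list of (role, content) tuples, where role is
    "user" or "assistant" (a hypothetical structure for this sketch).
    """
    parts = ["<|begin_of_text|>"]

    # System block: header, blank line, content, end-of-turn token.
    parts.append(
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|>"
    )

    # Each prior turn uses the same header / content / <|eot_id|> layout.
    for role, content in turns:
        if role not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {role!r}")
        parts.append(
            f"<|start_header_id|>{role}<|end_header_id|>\n\n"
            f"{content}<|eot_id|>"
        )

    # Open assistant header: the model generates from here.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    prompt = build_llama3_prompt(
        "You are a helpful assistant.",
        [
            ("user", "Hello!"),
            ("assistant", "Hi, how can I help?"),
            ("user", "Summarize DPO in one sentence."),
        ],
    )
    print(prompt)
```

Some runtimes prepend `<|begin_of_text|>` themselves when tokenizing, so it is worth checking that the token does not end up duplicated in your setup.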

Quants:

FP16: https://huggingface.co/AwanLLM/Awanllm-Llama-3-8B-Instruct-DPO-v0.2

GGUF: https://huggingface.co/AwanLLM/Awanllm-Llama-3-8B-Instruct-DPO-v0.2-GGUF