CPU Inference

#13
by Ange09 - opened

Hello TheBloke,
Is there any way to perform inference on CPU with the model?
Thank you very much.

Technically, yes, you can run GPTQ on CPU, but it's horribly slow.

If you want CPU-only inference, use the GGML versions found at https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML
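For reference, here is a minimal sketch of CPU inference with one of those GGML files using llama-cpp-python. The model filename and thread count are placeholders, and this assumes you have installed the library (`pip install llama-cpp-python`) and downloaded a quantized `.bin` from the repo above:

```python
import os


def format_prompt(user_message: str,
                  system_message: str = "You are a helpful assistant.") -> str:
    """Build a Llama-2-chat style prompt string."""
    return (f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n"
            f"{user_message} [/INST]")


if __name__ == "__main__":
    # Placeholder filename: any quantized GGML file from the repo works here.
    model_path = "llama-2-13b-chat.ggmlv3.q4_0.bin"
    if os.path.exists(model_path):
        from llama_cpp import Llama  # pip install llama-cpp-python
        # n_threads should roughly match your physical CPU core count.
        llm = Llama(model_path=model_path, n_ctx=2048, n_threads=8)
        out = llm(format_prompt("What is GGML?"), max_tokens=128)
        print(out["choices"][0]["text"])
```

Even on CPU, a 13B q4 model typically needs around 10 GB of RAM, so pick a quantization level that fits your machine.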
