llama.cpp support?

#6 by ct-2

Is there a way to run this from RAM, or offloaded to disk, with transformers in 4-bit? Thanks!

Support is being worked on in llama.cpp; follow the issue at https://github.com/ggerganov/llama.cpp/issues/6877. That requires not only support for the model, but also someone to actually produce the quantizations, which will take a very long time given the size of the model (and will be wildly impractical for most users).
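As for the transformers side of the question, a minimal sketch of 4-bit loading with CPU/disk offload via bitsandbytes is below. The model id is a placeholder for whichever checkpoint this thread is about, and whether the weights actually fit depends on your available RAM, disk, and a CUDA GPU for the 4-bit layers:

```python
# Sketch: 4-bit quantized loading with CPU RAM / disk offload via bitsandbytes.
# "org/model" is a placeholder; substitute the actual repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "org/model"  # placeholder

# Quantize weights to 4-bit NF4 on load; do compute in fp16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" spills layers to CPU RAM, then to disk (offload_folder),
# when they don't fit on the GPU. For a model this size, expect it to be
# extremely slow even if it loads at all.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    offload_folder="offload",
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```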
