How to run the full model?

#171 by dounykim

I'm trying to run the Mixtral 8x7B model. I've managed to run quantized versions with GGUF and llama.cpp, but I haven't been able to run the full model without quantization. I've heard that high-performance GPUs are required to run it as-is. What level of GPU is needed, and can you recommend servers or sites where such GPUs can be rented? I've used AWS before, but it turned out to be quite expensive, so if anyone has run Mixtral 8x7B on AWS, I'd appreciate an estimate of the costs as well.

Hi there,
The full-precision model requires around 100 GB of VRAM; that's simply how big the model is. You can check that on their website directly.
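The figure follows from back-of-the-envelope arithmetic (the ~46.7B total parameter count is from the official model card; the rest is rounding):

```python
# Rough VRAM estimate for Mixtral 8x7B at half precision (fp16/bf16).
# All 8 experts stay resident in memory, so the full ~46.7B parameters count.
params = 46.7e9
bytes_per_param = 2  # fp16/bf16
weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: ~{weights_gb:.0f} GB")  # ~93 GB, before KV cache and activations
```

On top of the weights you still need headroom for the KV cache and activations, which is where the ~100 GB figure comes from.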
Now for the GPU renting... why not Vast.AI? Though it really depends on what you are going to use it for.
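If it helps, here is a minimal loading sketch with 🤗 Transformers (the model ID is the official Hub repo; the prompt and generation settings are just illustrative, and I haven't benchmarked this exact snippet):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # official repo on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" lets Accelerate shard the ~93 GB of bf16 weights across
# every visible GPU, so it's the *total* VRAM of the instance that matters,
# not the VRAM of any single card (e.g. 2x 80 GB A100s would fit).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Explain mixture-of-experts in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

This needs `transformers` and `accelerate` installed and enough combined VRAM on the rented instance; if the instance falls short, quantized loading is the usual fallback.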

@pandora-s Hello, thanks for the comment. I'm trying to test its performance by giving it some prompts and looking at the responses. I checked Vast.AI, but can you recommend which instance to use? Do I need to look for 100 GB of per-GPU RAM?
