int4/8 quantization so that we can deploy on consumer-grade GPU cards

#7
by Yhyu13 - opened

Hi, would you be willing to release a quantized version of GLM 10B? That would allow it to run on a 16GB card, which would be great.

Hi, you can check out my PR, which adds int8 quantization. I haven't tested int4.
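For anyone unfamiliar with what int8 quantization does, here is a minimal sketch of the general idea (symmetric per-tensor quantization): weights are mapped to 8-bit integers with a shared scale factor, roughly halving memory versus fp16. This is an illustrative toy, not the actual code from the PR or the GLM-10B implementation.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> int8 in [-127, 127].

    A single scale factor maps the largest-magnitude weight to 127.
    This is a hypothetical sketch of the technique, not GLM's code.
    """
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid div-by-zero
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

# Toy example: four weights stored in 1 byte each instead of 2-4.
weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
```

The rounding step introduces at most half a quantization step of error per weight, which is why int8 inference usually loses little accuracy while cutting memory enough to fit a ~10B-parameter model on a 16GB card.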
