macadeliccc 
posted an update Mar 10
Quantize 7B-parameter models in 60 seconds using Half-Quadratic Quantization (HQQ).

This game-changing technique quantizes models as large as Llama-2-70B in under 5 minutes, roughly 50x faster than traditional methods, and delivers high-quality compression without any calibration data.

Mobius Labs' innovative approach not only significantly reduces memory requirements but also makes it possible to run large models on consumer-grade GPUs, paving the way for more accessible and efficient machine learning research.

Mobius Labs' method uses a robust optimization formulation to find the optimal quantization parameters, specifically minimizing the error between the original and dequantized weights. The loss function promotes sparsity through a non-convex ℓp-norm with p < 1, which makes the problem challenging yet solvable with a Half-Quadratic solver.
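
Written out, the objective described above looks roughly like the following. The symbols are my own shorthand rather than something stated in this post: W is the weight matrix, s the scale, z the zero-point, Q the quantizer and Q^{-1} the dequantizer. Treating the loss as the p-th power of the ℓp-norm is an assumption on my part.

```latex
\[
\operatorname*{argmin}_{z,\,s}\;
\varphi\!\left(W - Q_{z,s}^{-1}\!\left(Q_{z,s}(W)\right)\right),
\qquad
Q_{z,s}(W) = \operatorname{round}\!\left(\frac{W}{s} + z\right),
\quad
Q_{z,s}^{-1}(W_q) = s\,(W_q - z)
\]
\[
\varphi(x) = \lVert x \rVert_p^p = \sum_i \lvert x_i \rvert^p,
\qquad 0 < p < 1
\]
```

Because p < 1, the loss tolerates a few large errors (outlier weights) in exchange for driving most errors toward zero, which is where the sparsity comes from.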

This solver simplifies the problem by introducing an extra variable and dividing the optimization into manageable sub-problems. Their implementation cleverly fixes the scale parameter to simplify calculations and focuses on optimizing the zero-point, utilizing closed-form solutions for each sub-problem to bypass the need for gradient calculations.
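
To make the alternating scheme concrete, here is a minimal NumPy sketch of the idea as I understand it, not the hqq library's actual code. The function names (`shrink_lp`, `hq_optimize_zero`), the per-row grouping, and the default values for `p`, `beta`, `kappa` and `iters` are all assumptions chosen for illustration. The extra variable `W_e` absorbs the quantization error, each sub-problem has a closed-form update, and the scale stays fixed throughout.

```python
import numpy as np


def shrink_lp(x, beta, p=0.7):
    """Generalized soft-thresholding for an lp (p < 1) penalty."""
    mag = np.abs(x)
    # Guard against 0 ** (p - 1), which is infinite when p < 1.
    safe = np.maximum(mag, 1e-8)
    return np.sign(x) * np.maximum(mag - safe ** (p - 1) / beta, 0.0)


def quantize(W, scale, zero, qmax):
    """Map float weights onto the integer grid [0, qmax]."""
    return np.clip(np.round(W / scale + zero), 0, qmax)


def dequantize(W_q, scale, zero):
    """Map quantized values back to floats."""
    return scale * (W_q - zero)


def hq_optimize_zero(W, nbits=4, p=0.7, beta=10.0, kappa=1.01, iters=20):
    """Optimize one zero-point per row of W with the scale held fixed.

    Illustrative sketch only: hyperparameters and grouping are assumptions.
    """
    qmax = 2 ** nbits - 1
    w_min = W.min(axis=1, keepdims=True)
    w_max = W.max(axis=1, keepdims=True)
    scale = np.maximum(w_max - w_min, 1e-8) / qmax  # fixed, never updated
    zero = -w_min / scale                           # min/max starting point

    def mean_abs_error(z):
        W_r = dequantize(quantize(W, scale, z, qmax), scale, z)
        return float(np.abs(W - W_r).mean())

    best_zero, best_err = zero, mean_abs_error(zero)
    for _ in range(iters):
        W_q = quantize(W, scale, zero, qmax)
        W_r = dequantize(W_q, scale, zero)
        # Sub-problem 1: closed-form update of the extra variable W_e.
        W_e = shrink_lp(W - W_r, beta, p)
        # Sub-problem 2: closed-form (row-wise mean) update of the zero-point.
        zero = np.mean(W_q - (W - W_e) / scale, axis=1, keepdims=True)
        beta *= kappa  # gradually tighten the coupling between W_e and the error
        err = mean_abs_error(zero)
        if err < best_err:
            best_zero, best_err = zero, err
        else:
            break  # stop as soon as the error no longer improves
    return scale, best_zero, best_err


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(8, 64))
    scale, zero, err = hq_optimize_zero(W)
    print("mean abs reconstruction error:", err)
```

No gradients are computed anywhere: each iteration is a thresholding step followed by a mean, which is why this kind of solver can be so fast. The sketch keeps the best zero-point seen so far, so the returned error is never worse than the plain min/max starting point.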

Check out the Colab demo, where you can quantize models (text generation and multimodal) for use with the vLLM or timm backends as well as transformers!

AutoHQQ: 👉 https://colab.research.google.com/drive/1cG_5R_u9q53Uond7F0JEdliwvoeeaXVN?usp=sharing
Code: https://github.com/mobiusml/hqq
HQQ Blog post: https://mobiusml.github.io/hqq_blog/

Edit: Here is an example of how powerful HQQ can be: macadeliccc/Nous-Hermes-2-Mixtral-8x7B-DPO-HQQ

Citations:

@misc {badri2023hqq,
title = {Half-Quadratic Quantization of Large Machine Learning Models},
url = {https://mobiusml.github.io/hqq_blog/},
author = {Hicham Badri and Appu Shaji},
month = {November},
year = {2023}
}