For this quantization, we used one 16-bit codebook (the 1x16 configuration).
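
As a rough illustration of what a single 16-bit codebook implies, the sketch below works out the number of codebook entries and the code bits spent per weight. The group size of 8 weights per code is an assumption, inferred from the "2Bit" in the repository name (ISTA-DASLab/Meta-Llama-3-70B-AQLM-2Bit-1x16) rather than stated in this card; the function names are hypothetical helpers, not part of any library.

```python
# Back-of-the-envelope arithmetic for a 1x16 codebook configuration.
NUM_CODEBOOKS = 1   # from the card: 1 codebook
CODEBOOK_BITS = 16  # from the card: 16 bits per code
GROUP_SIZE = 8      # ASSUMPTION: weights encoded by one code (not stated in the card)


def codebook_entries(bits: int) -> int:
    """Number of distinct vectors addressable by a code of `bits` bits."""
    return 2 ** bits


def code_bits_per_weight(num_codebooks: int, bits: int, group_size: int) -> float:
    """Bits of code index stored per weight, ignoring codebook and scale overhead."""
    return num_codebooks * bits / group_size


entries = codebook_entries(CODEBOOK_BITS)
bits_per_weight = code_bits_per_weight(NUM_CODEBOOKS, CODEBOOK_BITS, GROUP_SIZE)

print(f"codebook entries: {entries}")
print(f"code bits per weight: {bits_per_weight}")
```

This counts only the code indices; the codebook itself and any tensors kept in higher precision add to the real on-disk size.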

Results (measured with lm_eval==4.0):

| Model | Quantization | MMLU (5-shot) | ArcC | ArcE | Hellaswag | Winogrande | PiQA | Model size, GB |
|---|---|---|---|---|---|---|---|---|
| meta-llama/Meta-Llama-3-70B | - | 0.7868 | 0.6041 | 0.8670 | 0.6634 | 0.8248 | 0.8090 | 141.2 |
| | 1x16 | 0.7486 | 0.5009 | 0.7896 | 0.6346 | 0.7954 | 0.7814 | 21.9 |
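
To make the trade-off in the table easier to read, here is a minimal sketch that computes the size reduction and the per-benchmark score change from the reported numbers. The values are copied from the table; the variable names are illustrative only.

```python
# Scores and sizes copied from the results table above.
baseline = {
    "MMLU (5-shot)": 0.7868, "ArcC": 0.6041, "ArcE": 0.8670,
    "Hellaswag": 0.6634, "Winogrande": 0.8248, "PiQA": 0.8090,
}
quantized = {
    "MMLU (5-shot)": 0.7486, "ArcC": 0.5009, "ArcE": 0.7896,
    "Hellaswag": 0.6346, "Winogrande": 0.7954, "PiQA": 0.7814,
}
baseline_size = 141.2   # unquantized model size from the table
quantized_size = 21.9   # 1x16 model size from the table

# How many times smaller the quantized checkpoint is.
compression_ratio = baseline_size / quantized_size

# Score change per benchmark (negative means the quantized model scores lower).
deltas = {task: quantized[task] - baseline[task] for task in baseline}

# Benchmark with the largest drop.
worst_task = min(deltas, key=deltas.get)

print(f"compression ratio: {compression_ratio:.2f}x")
for task, delta in deltas.items():
    print(f"{task}: {delta:+.4f}")
print(f"largest drop: {worst_task}")
```

The same dictionaries can be extended with further rows if other quantization configurations are evaluated with the same harness.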
