ISTA-DASLab/Meta-Llama-3-70B-Instruct-AQLM-2Bit-1x16

Tags: Text Generation, text-generation-inference, Inference Endpoints

Biomed Foundation Model

#4 opened 13 days ago by

Yi-34B AQLM?

#3 opened 25 days ago by

~8 tok/sec with ~5k context on vLLM with Flash Attention and `kv_cache_dtype="fp8"` on an RTX 3090 Ti (24 GB VRAM)

#2 opened 27 days ago by
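Discussion #2 runs the model with an fp8 KV cache at roughly 5k tokens of context on a 24 GB card. A rough back-of-the-envelope sketch can show how much memory that cache takes. It assumes the standard Llama-3-70B attention shape (80 layers, 8 key/value heads under grouped-query attention, head dimension 128) and 1 byte per cached element for fp8 versus 2 bytes for fp16. The 5,000-token figure is a hypothetical stand-in for "~5k". The estimate ignores vLLM's paged-block rounding, activation memory, and the quantized weights themselves.

```python
# Assumed Llama-3-70B attention shape (taken from its published config, not from this page):
NUM_LAYERS = 80
NUM_KV_HEADS = 8   # grouped-query attention: 8 KV heads shared by 64 query heads
HEAD_DIM = 128     # hidden_size 8192 / 64 attention heads


def kv_cache_bytes(num_tokens: int, bytes_per_elem: int) -> int:
    """Estimate bytes needed to cache keys and values for num_tokens tokens."""
    # Factor 2 accounts for storing both a key and a value vector per head per layer.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_elem
    return num_tokens * per_token


if __name__ == "__main__":
    ctx = 5_000  # hypothetical stand-in for the thread's "~5k context"
    fp8 = kv_cache_bytes(ctx, 1)   # fp8: 1 byte per element
    fp16 = kv_cache_bytes(ctx, 2)  # fp16: 2 bytes per element
    print(f"fp8  KV cache: {fp8 / 1e9:.2f} GB")   # 0.82 GB
    print(f"fp16 KV cache: {fp16 / 1e9:.2f} GB")  # 1.64 GB
```

Under these assumptions, each token costs about 160 KiB of cache in fp8. Switching from fp16 to fp8 therefore roughly halves the KV-cache footprint at a given context length, which frees headroom on a 24 GB card.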