ISTA-DASLab/Meta-Llama-3-70B-Instruct-AQLM-2Bit-1x16
Text Generation
Transformers
Safetensors
llama
facebook
meta
llama-3
conversational
text-generation-inference
Inference Endpoints
aqlm
arxiv:2401.06118
Biomed Foundation Model
#4 opened 13 days ago by amrothemich (1 comment)

Yi-34B AQLM?
#3 opened 25 days ago by llama-anon

~8 tok/sec with ~5k context on vLLM with Flash Attention and `kv_cache_dtype="fp8"` on 3090TI 24GB VRAM
#2 opened 27 days ago by ubergarm (2 comments)