txin/35b-beta-long-3.75bpw-exl2


Notes

  • 3.75bpw test quant of CausalLM/35b-beta-long, which is itself a finetune of CohereForAI/c4ai-command-r-v01 (hence the corrected licensing).
  • Should theoretically fit within 24GB of VRAM for inference.
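
The 24GB VRAM claim can be sanity-checked with back-of-the-envelope arithmetic. The 35B parameter count and 3.75 bits per weight come from the model name; the idea that the remainder is headroom for KV cache and activations is an assumption, not a measurement:

```python
# Rough VRAM estimate for a 3.75 bpw quant of a 35B-parameter model.
# Parameter count and bits-per-weight are taken from the model name;
# treating the leftover as KV-cache/activation headroom is an assumption.

PARAMS = 35e9   # 35B parameters
BPW = 3.75      # bits per weight in this exl2 quant

weight_gb = PARAMS * BPW / 8 / 1e9
print(f"quantized weights: ~{weight_gb:.1f} GB")      # ~16.4 GB

headroom_gb = 24 - weight_gb
print(f"headroom on a 24 GB card: ~{headroom_gb:.1f} GB")  # ~7.6 GB
```

At long contexts the KV cache grows linearly with sequence length, so the remaining ~7.6 GB is what actually limits usable context on a 24GB card.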

TBA

The tokenizer differs from Cohere's, and the chat template is ChatML; the model is fully fine-tuned at 128K+ context.
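
Since the card specifies ChatML as the chat template, a prompt can be assembled as a plain string following that convention (the helper function and the system/user messages below are illustrative, not from the model card):

```python
# Minimal ChatML prompt assembly. The role markers follow the ChatML
# convention; the example messages are illustrative only.

def chatml_prompt(messages):
    """Render a list of {'role', 'content'} dicts as a ChatML string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    # Leave the assistant turn open so the model generates the reply.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the plot of Hamlet."},
])
print(prompt)
```

In practice the tokenizer's bundled chat template (e.g. via `tokenizer.apply_chat_template` in Transformers) should be preferred over hand-rolled formatting.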

No LoRAs, no quants, no tricks; 30M+ SFT samples.

Pressure testing from: https://github.com/LeonEricsson/llmcontext

