Getting an error while loading model_basename = "gptq_model-8bit-128g"

#20 opened by Pchaudhary

I am using the below code:

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
model_basename = "gptq_model-8bit-128g"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           model_basename=model_basename,
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           use_triton=use_triton,
                                           quantize_config=None)

But I am getting this error:

FileNotFoundError Traceback (most recent call last)
in <cell line: 11>()
9 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
10
---> 11 model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
12 model_basename=model_basename,
13 use_safetensors=True,

1 frames
/usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/_base.py in from_quantized(cls, model_name_or_path, save_dir, device_map, max_memory, device, low_cpu_mem_usage, use_triton, torch_dtype, inject_fused_attention, inject_fused_mlp, use_cuda_fp16, quantize_config, model_basename, use_safetensors, trust_remote_code, warmup_triton, trainable, **kwargs)
712
713 if resolved_archive_file is None: # Could not find a model file to use
--> 714 raise FileNotFoundError(f"Could not find model in {model_name_or_path}")
715
716 model_save_name = resolved_archive_file

FileNotFoundError: Could not find model in TheBloke/Llama-2-13B-chat-GPTQ

Please update to AutoGPTQ 0.3.2, released yesterday. In AutoGPTQ 0.3.0 and 0.2.2 there was a bug where the revision parameter was not followed. This is now fixed.
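For reference, the upgrade should just be something like:

pip install --upgrade auto-gptq

(or pin it explicitly with pip install auto-gptq==0.3.2).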

Ok, I will try that.

Is the below code correct if I want to load the model from a particular branch (i.e. gptq-8bit-128g-actorder_True)?

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
model_basename = "gptq_model-4bit-128g"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           revision="gptq-8bit-128g-actorder_True",
                                           model_basename=model_basename,
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           quantize_config=None)

Can you please provide me with Python code to load the 8-bit 128g model?

Yes, I just saw that one - presumably some subtle basename thing changed, perhaps?

The required model_basename changed yesterday (August 20th). It is now model_basename = "model" - or you can just leave that line out completely, as it's now configured automatically by quantize_config.json. You no longer need to specify model_basename in the .from_quantized() call. But if you do specify it, set it to "model".

This change was made to add support for an upcoming change in Transformers, which will allow loading GPTQ models directly from Transformers.

I did automatically update the README to reflect the model_basename change, but haven't mentioned the changes in more detail yet. I will be updating all GPTQ READMEs in the next 48 hours to make this clearer.
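To make that concrete, here is a minimal sketch of the simplified call - it is the same code as above with model_basename dropped, assuming AutoGPTQ 0.3.2 and the updated repo files (untested here):

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# model_basename is no longer passed - it is picked up from quantize_config.json
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           use_triton=False,
                                           quantize_config=None)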


Ok, thanks for that - so that's the main branch model? What is suggested for the others, something similar?

Same for all of them. They're all called model.safetensors now, and each branch's respective quantize_config.json includes that, so you don't need to specify model_basename any more.
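So, to answer the earlier question about the 8-bit 128g model directly: something like this should now be all that's needed - the same call as above, just pointing revision at the branch and leaving model_basename out (untested sketch):

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Select the branch via revision; the basename comes from that branch's quantize_config.json
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           revision="gptq-8bit-128g-actorder_True",
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           quantize_config=None)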
