When load_in_8bit=True, the chat becomes VERY VERY SLOW and returns nothing

#53

by leoyangsw - opened Apr 27, 2023

Discussion

leoyangsw

Apr 27, 2023


edited Apr 27, 2023

import torch
from transformers import AutoModel, AutoTokenizer

checkpoint = "THUDM/chatglm-6b"
# Load the weights in 8-bit (load_in_8bit=True) with automatic device placement.
model = AutoModel.from_pretrained(checkpoint, torch_dtype=torch.float16, device_map="auto", load_in_8bit=True, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)

history = []
while True:
    query = input("Man:\n").strip()
    # This call is VERY VERY SLOW and returns nothing.
    response, history = model.chat(tokenizer, query, history=history)
    print("\nBot:\n" + response)
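To tell a slow call apart from one that really returns an empty reply, it may help to time a single turn and look at the raw response before printing it. The following is only a minimal sketch: `timed_chat` and `fake_chat` are hypothetical names that are not part of ChatGLM or transformers, and the stub stands in for `model.chat` so the snippet runs without loading any weights.

```python
import time


def timed_chat(chat_fn, tokenizer, query, history):
    """Run one chat turn and return (response, history, elapsed_seconds).

    chat_fn is assumed to follow the call shape used in this post:
    chat_fn(tokenizer, query, history=history) -> (response, history).
    """
    start = time.perf_counter()
    response, history = chat_fn(tokenizer, query, history=history)
    elapsed = time.perf_counter() - start
    return response, history, elapsed


def fake_chat(tokenizer, query, history=None):
    """Hypothetical stand-in for model.chat; it simply echoes the query."""
    history = list(history or [])
    response = "echo: " + query
    history.append((query, response))
    return response, history


if __name__ == "__main__":
    response, history, elapsed = timed_chat(fake_chat, None, "hello", [])
    # repr() makes an empty string visible as '' instead of a blank line.
    print(f"elapsed: {elapsed:.3f}s, response: {response!r}")
```

With the real model, passing `model.chat` in place of `fake_chat` would show how long one turn takes and whether `response` is an empty string.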

LLLyyy7

Jul 21, 2023

I have the same problem. Has it been solved?

dodobird666

Sep 13, 2023

I've run into the same problem: it is extremely slow and returns nothing. Have you solved it?
