When load_in_8bit=True, the chat becomes VERY VERY SLOW and returns nothing

#53
by leoyangsw - opened
import torch
from transformers import AutoModel, AutoTokenizer

checkpoint = "THUDM/chatglm-6b"
# Load the model with bitsandbytes 8-bit quantization
model = AutoModel.from_pretrained(checkpoint, torch_dtype=torch.float16, device_map="auto", load_in_8bit=True, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)

history = []
while True:
    query = input("Man:\n").strip()
    # This call is very slow, and the response comes back empty
    response, history = model.chat(tokenizer, query, history=history)
    print("\nBot:\n" + response)
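
To put a number on "very slow" and confirm that the response really is empty, a minimal timing sketch along these lines might help. `timed_chat` and `EchoModel` are hypothetical names, not part of ChatGLM or transformers; the stub only stands in for any object with a `chat(tokenizer, query, history=...)` method like the one called above, so the sketch runs without downloading weights.

```python
import time


def timed_chat(model, tokenizer, query, history=None):
    """Call model.chat once and return (response, history, seconds elapsed)."""
    history = [] if history is None else history
    start = time.perf_counter()
    response, history = model.chat(tokenizer, query, history=history)
    elapsed = time.perf_counter() - start
    return response, history, elapsed


class EchoModel:
    """Hypothetical stand-in for the real model (assumption, for illustration only)."""

    def chat(self, tokenizer, query, history=None):
        history = list(history or [])
        response = "echo: " + query
        history.append((query, response))
        return response, history


if __name__ == "__main__":
    response, history, elapsed = timed_chat(EchoModel(), None, "hello")
    # An empty string here would match the "returns nothing" symptom
    print(f"{elapsed:.3f}s, empty response: {not response}")
```

Swapping `EchoModel()` and `None` for the real `model` and `tokenizer`, once with and once without `load_in_8bit=True`, should give comparable timings for the two configurations.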

I have the same problem. Has it been solved?

I'm hitting the same problem: it is extremely slow and returns nothing. Have you solved it?
