Token indices sequence length is longer than the specified maximum sequence length for this model (4645 > 2048)

#5 opened by Maverick17

How can I get the full token sequence input range? I'm asking because I get this warning:

Token indices sequence length is longer than the specified maximum sequence length for this model (4645 > 2048). Running this sequence through the model will result in indexing errors

Hugging Face H4 org

Can you share the code snippet you're running when this happens? The fine-tuning was done on sequences of at most 2048 tokens, but the base model's context is much larger.
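
For reference, the two limits involved can be read directly from the checkpoint; this is a minimal sketch using standard Transformers config fields, nothing specific to this thread. The tokenizer's model_max_length is what the warning compares against, while the model config's max_position_embeddings is the base model's actual context window.

from transformers import AutoConfig, AutoTokenizer

checkpoint = "HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1"

# The tokenizer limit is what the length check (and the warning) uses
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
print(tokenizer.model_max_length)

# The model config holds the base model's true context window
config = AutoConfig.from_pretrained(checkpoint)
print(config.max_position_embeddings)

If the second number is larger, raising tokenizer.model_max_length to match it makes the warning reflect the real limit.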

@lewtun I'm running this code:

import torch
from transformers import AutoModelForCausalLM, pipeline

checkpoint = "HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1"

# Quantize to 4-bit (requires bitsandbytes) and shard across available devices
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    load_in_4bit=True,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# The tokenizer is loaded from the checkpoint name inside the pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=checkpoint,
)
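
Note that passing tokenizer=checkpoint makes the pipeline load the tokenizer with its default model_max_length, which is what produces the warning. Assuming the base model's context window really is larger (see the sketch above), one workaround is to pass a tokenizer object with the limit raised:

from transformers import AutoConfig, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# Align the tokenizer limit with the model's real context window
tokenizer.model_max_length = AutoConfig.from_pretrained(checkpoint).max_position_embeddings

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)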

Hugging Face H4 org

Thanks! Can you share the prompt that is triggering the warning?

No, I can't, but it's a very long one: almost 7000 tokens.
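
For what it's worth, the exact prompt length can be measured with a plain tokenizer call; prompt here is a hypothetical variable holding the text:

# Count tokens without truncation to see how far the prompt exceeds the limit
n_tokens = len(tokenizer(prompt, truncation=False)["input_ids"])
print(n_tokens)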
