IndexError while using with language "japanese" (transcribe)

#16
by KKotaki - opened

An error occurs with the following code in rare cases.
Please let me know how to fix it.

The shape of the input was no different from that of other audio data that transcribes fine.

code:

import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

device = 'cuda'
model_path = 'openai/whisper-medium'
sample_rate = 16_000

model = WhisperForConditionalGeneration.from_pretrained(model_path)
processor = WhisperProcessor.from_pretrained(model_path, language="Japanese", task="transcribe")

model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="ja", task="transcribe")
model.config.suppress_tokens = []
model.to(device)

inputs = processor.feature_extractor(
            audio_data,
            return_tensors="pt",
            sampling_rate=sample_rate,
).input_features.to(device)
print(inputs.shape)
# torch.Size([1, 80, 3000])

predicted_ids = model.generate(
            inputs,
            max_length=sample_rate * 30,
            forced_decoder_ids=model.config.forced_decoder_ids,
)
# error occurred!!
# I don't think forced_decoder_ids=... should be necessary, but without it the
# detected language is "fi" for some reason, so I specify it explicitly.

error:

Traceback (most recent call last):
  File "/xxx/infer.py", line 68, in transcribe
    predicted_ids = self._model.generate(
  File "/xxx/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/xxx/lib/python3.9/site-packages/transformers/generation/utils.py", line 1391, in generate
    return self.greedy_search(
  File "/xxx/lib/python3.9/site-packages/transformers/generation/utils.py", line 2189, in greedy_search
    next_token_logits = outputs.logits[:, -1, :]
IndexError: index -1 is out of bounds for dimension 1 with size 0
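
For context on what the traceback means: `greedy_search` slices out the logits for the last generated position with `outputs.logits[:, -1, :]`, and the `IndexError` says the sequence dimension of those logits has size 0, i.e. the model returned logits for zero positions. A stdlib-only stand-in for that failing slice (hypothetical, just to show the mechanism):

```python
# per_step_logits stands in for outputs.logits along dimension 1;
# indexing [-1] on a size-0 dimension fails the same way.
per_step_logits = []
try:
    per_step_logits[-1]
except IndexError as exc:
    print(type(exc).__name__)  # IndexError
```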

Hey @KKotaki ! Thanks for reporting this! It looks like there's an issue with the generation code. Could you open an issue in HF Transformers, sharing the code, audio file and full error trace? We'll then be able to discuss the issue with you and propose a fix 🤗 Thank you! Issue link here: https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.yml

Taking a closer look at your code, I notice that you have max_length set to 30 * 16000. It's worth noting that max_length corresponds to the maximum number of generated text tokens (not the number of audio samples), so you can set this to ~256.
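
To make the units concrete, here is a quick sanity check using the values from the snippet above (the 448 figure is Whisper's decoder positional limit, `max_target_positions`):

```python
sampling_rate = 16_000              # audio samples per second
audio_samples = 30 * sampling_rate  # what max_length was mistakenly set to
print(audio_samples)                # 480000

# max_length counts generated *text tokens*, not audio samples.
# Whisper's decoder supports at most 448 target positions, so a
# max_length of around 256 is more than enough for a 30 s segment.
max_length = 256
assert max_length <= 448
```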

See https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig.max_length

With the latest version of transformers (4.29), you can omit forced_decoder_ids and pass language="japanese" directly to generate.

If these two suggestions do not work, I'd recommend opening an issue as described above!

@sanchit-gandhi
Thank you for your response!
I will try your suggestions and get back to you on the points you raised.
