NameError: name 'Kosmos2Tokenizer' is not defined

#10
by Ashwath-Shetty - opened

I'm getting the error below: @ydshieh
"NameError: name 'Kosmos2Tokenizer' is not defined"

system specs:

  • OS: AWS Sagemaker(Amazon Linux 2, Jupyter Lab 3
    (notebook-al2-v2))
  • Python: 3.10
  • Transformers: 4.31.0
  • PyTorch: 2.0.1
  • CUDA (python -c 'import torch; print(torch.version.cuda)'): 11.8

code:
import requests

from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model = AutoModelForVision2Seq.from_pretrained("ydshieh/kosmos-2-patch14-224", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("ydshieh/kosmos-2-patch14-224", trust_remote_code=True)

prompt = "An image of"

# url = "https://huggingface.co/ydshieh/kosmos-2-patch14-224/resolve/main/snowman.png"

image = Image.open("images/images_sample/f01-01-9780323479912.jpg")

# The original Kosmos-2 demo saves the image first then reload it. For some images, this will give slightly different image input and change the generation outputs.
# Uncomment the following 2 lines if you want to match the original demo's outputs.
# (One example is the two_dogs.jpg from the demo)
# image.save("new_image.jpg")
# image = Image.open("new_image.jpg")

inputs = processor(text=prompt, images=image, return_tensors="pt")

output:

NameError Traceback (most recent call last)
Cell In[6], line 2
1 model = AutoModelForVision2Seq.from_pretrained("ydshieh/kosmos-2-patch14-224", trust_remote_code=True)
----> 2 processor = AutoProcessor.from_pretrained("ydshieh/kosmos-2-patch14-224", trust_remote_code=True)
4 prompt = "An image of"
6 # url = "https://huggingface.co/ydshieh/kosmos-2-patch14-224/resolve/main/snowman.png"

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/transformers/models/auto/processing_auto.py:269, in AutoProcessor.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
267 if os.path.isdir(pretrained_model_name_or_path):
268 processor_class.register_for_auto_class()
--> 269 return processor_class.from_pretrained(
270 pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
271 )
272 elif processor_class is not None:
273 return processor_class.from_pretrained(
274 pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
275 )

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/transformers/processing_utils.py:215, in ProcessorMixin.from_pretrained(cls, pretrained_model_name_or_path, cache_dir, force_download, local_files_only, token, revision, **kwargs)
211 if token is not None:
212 # change to token in a follow-up PR
213 kwargs["use_auth_token"] = token
--> 215 args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
216 return cls(*args)

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/transformers/processing_utils.py:259, in ProcessorMixin._get_arguments_from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
256 else:
257 attribute_class = getattr(transformers_module, class_name)
--> 259 args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
260 return args

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py:685, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
683 else:
684 class_ref = tokenizer_auto_map[0]
--> 685 tokenizer_class = get_class_from_dynamic_module(class_ref, pretrained_model_name_or_path, **kwargs)
686 _ = kwargs.pop("code_revision", None)
687 if os.path.isdir(pretrained_model_name_or_path):

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/transformers/dynamic_module_utils.py:443, in get_class_from_dynamic_module(class_reference, pretrained_model_name_or_path, cache_dir, force_download, resume_download, proxies, use_auth_token, revision, local_files_only, repo_type, code_revision, **kwargs)
430 # And lastly we get the class inside our newly created module
431 final_module = get_cached_module_file(
432 repo_id,
433 module_file + ".py",
(...)
441 repo_type=repo_type,
442 )
--> 443 return get_class_in_module(class_name, final_module.replace(".py", ""))

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/transformers/dynamic_module_utils.py:164, in get_class_in_module(class_name, module_path)
160 """
161 Import a module on the cache directory for modules and extract a class from it.
162 """
163 module_path = module_path.replace(os.path.sep, ".")
--> 164 module = importlib.import_module(module_path)
165 return getattr(module, class_name)

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/importlib/__init__.py:126, in import_module(name, package)
124 break
125 level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)

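File <frozen importlib._bootstrap>:1050, in _gcd_import(name, package, level)

File <frozen importlib._bootstrap>:1027, in _find_and_load(name, import_)

File <frozen importlib._bootstrap>:1006, in _find_and_load_unlocked(name, import_)

File <frozen importlib._bootstrap>:688, in _load_unlocked(spec)

File <frozen importlib._bootstrap_external>:883, in exec_module(self, module)

File <frozen importlib._bootstrap>:241, in _call_with_frames_removed(f, *args, **kwds)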

File ~/.cache/huggingface/modules/transformers_modules/ydshieh/kosmos-2-patch14-224/b9379a4db0f6c911ad452fb7235256ddb1ae0cea/tokenization_kosmos2_fast.py:48
37 PRETRAINED_VOCAB_FILES_MAP = {
38 "vocab_file": {
39 "microsoft/kosmos-2-patch14-224": "https://huggingface.co/microsoft/kosmos-2-patch14-224/resolve/main/sentencepiece.bpe.model",
40 }
41 }
43 PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
44 "microsoft/kosmos-2-patch14-224": 2048,
45 }
---> 48 class Kosmos2TokenizerFast(PreTrainedTokenizerFast):
49 """
50 Construct a "fast" KOSMOS-2 tokenizer (backed by HuggingFace's tokenizers library). Adapted from
51 [RobertaTokenizer] and [XLNetTokenizer]. Based on
(...)
99 format <patch_index_xxxx> where xxxx is an integer.
100 """
102 vocab_files_names = VOCAB_FILES_NAMES

File ~/.cache/huggingface/modules/transformers_modules/ydshieh/kosmos-2-patch14-224/b9379a4db0f6c911ad452fb7235256ddb1ae0cea/tokenization_kosmos2_fast.py:106, in Kosmos2TokenizerFast()
104 max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
105 model_input_names = ["input_ids", "attention_mask"]
--> 106 slow_tokenizer_class = Kosmos2Tokenizer
108 def __init__(
109 self,
110 vocab_file=None,
(...)
122 ):
123 # Mask token behave like a normal word, i.e. include the space before it
124 mask_token = AddedToken(mask_token, lstrip=True, rstrip=False) if isinstance(mask_token, str) else mask_token

NameError: name 'Kosmos2Tokenizer' is not defined

Hi,

Do you have sentencepiece installed?
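
A quick way to check this from the same notebook kernel is sketched below. It uses only the standard library, and the helper name is_installed is made up for illustration. The likely mechanism (an assumption, not confirmed in this thread) is that the remote tokenizer code only imports the slow Kosmos2Tokenizer when sentencepiece is available, so the name is left undefined otherwise.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found on the current import path."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("sentencepiece"):
    print("sentencepiece is available")
else:
    # Install into the environment the kernel actually runs in,
    # then restart the kernel before loading the processor again.
    print("sentencepiece is missing; try: pip install sentencepiece")
```

Running the check inside the notebook matters because a terminal `pip` may point at a different environment than the kernel.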

Thanks @ydshieh, that solved the issue. Just a suggestion: it might be a good idea to mention this in the tutorial.
Thank you for your great work.

I have a couple of questions, I hope you don't mind.

  1. The model is giving very brief output. I want a detailed output that describes the image (at least 300 words). How can I achieve that?
  2. Where can I find the list of model parameters to tune? Is there also a tool/framework for tuning?
  3. Can we fine-tune/train this on our own data? If yes, how?
  4. Can we do few-shot prompting? If yes, how?

I know these are a lot of questions. Thanks in advance.

The model is giving very brief output. I want a detailed output that describes the image (at least 300 words). How can I achieve that?

You can use "<grounding>Describe this image in detail:" as the prompt.

Where can I find the list of model parameters to tune? Is there also a tool/framework for tuning?

It's up to you to decide which parameters to tune. The original training updated the whole set of trainable parameters.
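
As a rough illustration of choosing which parameters to tune, the sketch below freezes everything except parameters whose names start with a chosen prefix. It assumes PyTorch and uses a toy stand-in module; the helper freeze_except and the prefixes "vision" and "text" are hypothetical and are not the actual Kosmos-2 submodule names, which you would need to look up with model.named_parameters().

```python
import torch.nn as nn


def freeze_except(model: nn.Module, trainable_prefixes: tuple) -> int:
    """Freeze all parameters whose names do not start with one of the prefixes.

    Returns the number of parameter elements left trainable.
    """
    trainable = 0
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(trainable_prefixes)
        if param.requires_grad:
            trainable += param.numel()
    return trainable


# Toy stand-in for a real model (hypothetical submodule names).
toy = nn.ModuleDict({"vision": nn.Linear(4, 4), "text": nn.Linear(4, 2)})

# Keep only the "text" part trainable: 4*2 weights + 2 biases.
print(freeze_except(toy, ("text",)))  # prints 10
```

Passing an empty-prefix tuple such as ("",) leaves every parameter trainable, which corresponds to the full training described above.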

Can we fine-tune/train this on our own data? If yes, how?

The original repository contains information about the dataset used: https://github.com/microsoft/unilm/tree/master/kosmos-2
I haven't (yet) tried anything related to training or the original datasets.

Can we do few-shot prompting? If yes, how?

Yes, it is possible; see page 9 of their paper (linked from the repository above).
However, my current implementation doesn't make it easy to prepare that input format.

ydshieh changed discussion status to closed

Thank you @ydshieh for patiently answering all the questions.
I'm still getting the same result, though, for "Describe this image in detail:". The output length is less than 100 words.

You can try adding min_new_tokens=XXX (with a value you prefer) and/or changing max_new_tokens=64 to something higher in the model.generate() call.
If the generation is still short (or long enough but of lower quality), it means the model was not really trained on long (enough) text descriptions.
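
A minimal sketch of the length settings suggested above: the helper length_kwargs and the values 256 and 512 are illustrative assumptions, while min_new_tokens and max_new_tokens are standard generation arguments in transformers. Note that these count tokens, not words, so a 300-word description will usually need a budget above 300.

```python
def length_kwargs(min_new_tokens: int, max_new_tokens: int) -> dict:
    """Build length-related keyword arguments for a generate() call."""
    if min_new_tokens < 0:
        raise ValueError("min_new_tokens must be non-negative")
    if max_new_tokens < min_new_tokens:
        raise ValueError("max_new_tokens must be >= min_new_tokens")
    return {"min_new_tokens": min_new_tokens, "max_new_tokens": max_new_tokens}


# Illustrative values only; tune them for your images.
gen_kwargs = length_kwargs(min_new_tokens=256, max_new_tokens=512)
print(gen_kwargs)

# Then pass them alongside the inputs you already give to generate, e.g.:
# generated_ids = model.generate(<your existing inputs>, **gen_kwargs)
```

Forcing a high minimum can make the model ramble or repeat itself once it has nothing more to say, which matches the caveat above about quality.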

Hello, I am also interested in fine-tuning this model on data for my downstream task. Is there any news on this?
