blog: https://huggingface.co/blog/paligemma

#2
by NickyNicky - opened

Thanks for the model. I am following the steps in the blog and have completed some of them, but when I run the training step it gives me the following:

[screenshots of the error attached]

I'm sharing the Colab link:
https://colab.research.google.com/drive/1eSJoBGOO0_oulB5gfXqkhtIqiLngKBwy?usp=sharing

I would also like to know whether it is also necessary to set:
model.hidden_activation = "gelu_pytorch_tanh"
It asks me for this in a warning message.

[screenshot of the warning attached]

Is it also possible to use flash-attn?
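Something like this is what I mean (just a sketch; it assumes flash-attn is installed and that this checkpoint supports the flag):

import torch
from transformers import PaliGemmaForConditionalGeneration

# sketch: FlashAttention-2 requires the weights in fp16/bf16
model = PaliGemmaForConditionalGeneration.from_pretrained(
    "google/paligemma-3b-pt-224",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)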

I also wanted to know if it is compatible with the library:
from trl import SFTTrainer

thank you so much.

Hello, SFTTrainer is just a wrapper around the Trainer so I think it should work, although it has some features on top like NEFTune which I don't know if they would work. About the Gemma warnings, you can ignore them. For the index error let me check, I wrote that part and ran it a ton of times so it shouldn't have happened 😅

Google org

In the meanwhile, let me give you my training script that works for sure, while I figure out which line I missed when I moved it to the blog:

from datasets import load_dataset
from transformers import AutoTokenizer, PaliGemmaForConditionalGeneration, PaliGemmaProcessor
import torch
import os
from PIL import Image
from transformers import TrainingArguments, Trainer

def collate_fn(examples):
  # prefix + question, then a newline, then the answer as the target suffix
  texts = ["answer " + example["question"] + "\n" + example['multiple_choice_answer'] for example in examples]
  images = [example["image"].convert("RGB") for example in examples]

  tokens = processor(text=texts, images=images,
                    return_tensors="pt", padding="longest",
                    tokenize_newline_separately=False)

  labels = tokens["input_ids"].clone()

  # ignore padding (and token id 256000) when computing the loss
  labels[labels == processor.tokenizer.pad_token_id] = -100
  labels[labels == 256000] = -100
  tokens["labels"] = labels

  # cast floating-point tensors to the model dtype and move everything to the GPU
  tokens = tokens.to(DTYPE).to("cuda")
  return tokens

ds = load_dataset('HuggingFaceM4/VQAv2', split="train")

ds_remove = ["question_type", "answer_type", "answers", "image_id", "question_id"]
ds = ds.remove_columns(ds_remove)

model_id = "google/paligemma-3b-pt-224"
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16) 
processor = PaliGemmaProcessor.from_pretrained(model_id)
print("initialized processor")

DTYPE = model.dtype

for param in model.vision_tower.parameters():
    param.requires_grad = False

# todo: try again with projector unfrozen
for param in model.multi_modal_projector.parameters():
    param.requires_grad = False

ds = ds.train_test_split(test_size=0.1)
train_ds = ds["train"]
val_ds = ds["test"]


args=TrainingArguments(
            num_train_epochs=2,
            remove_unused_columns=False,
            per_device_train_batch_size=4,
            gradient_accumulation_steps=4,
            warmup_steps=2,
            learning_rate=2e-5,
            weight_decay=1e-6,
            adam_beta2=0.999,
            logging_steps=100,
            output_dir="./output10",
            optim="adamw_hf",
            save_strategy="steps",
            save_steps=1000,
            #optim="paged_adamw_8bit",
            push_to_hub=True,
            save_total_limit=1,
            bf16=True,
            report_to=["tensorboard"],
            dataloader_pin_memory=False
        )

trainer = Trainer(
        model=model,
        train_dataset=train_ds,
        eval_dataset=val_ds,
        data_collator=collate_fn,
        args=args
        )
print("initialized trainer")
print("Current device:", trainer.model.device)

trainer.train()

trainer.push_to_hub()
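
For reference, a minimal inference sketch for the resulting checkpoint (the image path and question are placeholders; if the processor was not saved with the checkpoint, load it from the base model_id instead):

import torch
from PIL import Image
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

# sketch: load the fine-tuned checkpoint and ask a single question
model = PaliGemmaForConditionalGeneration.from_pretrained(
    "./output10", torch_dtype=torch.bfloat16
).to("cuda")
processor = PaliGemmaProcessor.from_pretrained("./output10")

image = Image.open("my_image.jpg").convert("RGB")   # placeholder image
prompt = "answer What is in the image?"             # same "answer ..." prefix as training

inputs = processor(text=prompt, images=image, return_tensors="pt")
inputs = inputs.to(model.dtype).to("cuda")

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=20)

# drop the prompt (and image) tokens before decoding
generated = output[0][inputs["input_ids"].shape[1]:]
print(processor.tokenizer.decode(generated, skip_special_tokens=True))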

@NickyNicky in the blog post I forgot to pass remove_unused_columns=False, hence the error 🤦‍♀️ Unrelated, but when using a data collator we also need to pass dataloader_pin_memory=False (relevant when loading data from CPU to GPU).
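
Concretely, just the two flags in question (a sketch; the remaining arguments are the same as in the full script above):

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./output10",
    remove_unused_columns=False,   # keep the raw "image"/"question" columns for the collator
    dataloader_pin_memory=False,   # the collator already moves batches to the GPU
)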

Don't worry, we all make mistakes. Thank you very much for the prompt response, I'm going to try the code.

I also have another question: wasn't this model trained with a template?

How does the model know where the beginning and end of a response are without those tokens, or which tokens were used for this model?
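
One way to inspect which special tokens the tokenizer actually defines (a sketch, using the processor from the script above):

# sketch: print the special tokens PaliGemma's tokenizer knows about
print(processor.tokenizer.special_tokens_map)
print(processor.tokenizer.bos_token, processor.tokenizer.eos_token, processor.tokenizer.pad_token)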

Your code:

def collate_fn(examples):
  texts = ["answer " + example["question"] + "\n" + example['multiple_choice_answer'] for example in examples]
  images = [example["image"].convert("RGB") for example in examples]

  tokens = processor(text=texts, images=images,
                    return_tensors="pt", padding="longest",
                    tokenize_newline_separately=False)

  labels = tokens["input_ids"].clone()#.squeeze()

  labels[labels == processor.tokenizer.pad_token_id] = -100
  labels[labels == 256000] = -100
  tokens["labels"] = labels

  tokens = tokens.to(DTYPE).to("cuda")
  return tokens

I added this code but I don't know if it's right.

device = "cuda"

image_token = processor.tokenizer.convert_tokens_to_ids("<image>")
def collate_fn(examples):
  
  # texts = ["answer " + example["question"] + "\n" + example['multiple_choice_answer'] for example in examples]
  # prompt= template.replace("{text_user}",example["question"]).replace("{text_user}",example['multiple_choice_answer'])
  template= """<bos><start_of_turn>system\nyou are a useful AI.<end_of_turn>\n<start_of_turn>user\n{text_user}<end_of_turn>\n<start_of_turn>model\n{text_model}<end_of_turn><eos>"""
  texts = [template.replace("{text_user}",example["question"]).replace("{text_model}",example['multiple_choice_answer']) for example in examples]
  images = [example["image"].convert("RGB") for example in examples]
  tokens = processor(text=texts, images=images,
                    return_tensors="pt", padding="longest",
                    tokenize_newline_separately=False)
  labels = tokens["input_ids"].clone()
  labels[labels == processor.tokenizer.pad_token_id] = -100
  labels[labels == image_token] = -100
  tokens["labels"] = labels
  tokens = tokens.to(torch.bfloat16).to(device)
  return tokens

The new collate_fn code:

template= """<bos><start_of_turn>system\nyou are a useful AI.<end_of_turn>\n<start_of_turn>user\n{text_user}<end_of_turn>\n<start_of_turn>model\n{text_model}<end_of_turn><eos>"""
texts = [template.replace("{text_user}",example["question"]).replace("{text_model}",example['multiple_choice_answer']) for example in examples]

Hello, SFTTrainer is just a wrapper around the Trainer so I think it should work, although it has some features on top like NEFTune which I don't know if they would work. About the Gemma warnings, you can ignore them. For the index error let me check, I wrote that part and ran it a ton of times so it shouldn't have happened 😅

They can be used without problems: neftune_noise_alpha=10, AdaLoRA, and LoftQ.

@NickyNicky this is not really a conversational/multi-turn model, it's a single-turn model, and the newline is what conditions the model to generate the response here; that's also why the newline tokenization flag is needed during fine-tuning but not at inference. An eos token could maybe be added, but not heavy chat templates.
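
In other words, a minimal sketch of that single-turn format (the question/answer strings are only examples):

def build_text(question: str, answer: str, eos_token: str = "<eos>") -> str:
    # task prefix, then a newline that conditions the answer, then the answer
    prefix = "answer " + question
    return prefix + "\n" + answer + eos_token

# fine-tuning target
print(build_text("What color is the car?", "red"))
# at inference only the prefix is passed: "answer What color is the car?"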

thank you so much.

close.

NickyNicky changed discussion status to closed

Hello @merve ,

I ran your code and fine-tuned Paligemma, but the output model is behaving strangely and replying with more questions. Here is the demo space: https://huggingface.co/spaces/taesiri/sample-paligemma-finetuned.

I am getting this warning when loading the model:

The tokenizer class you load from this checkpoint is 'LlamaTokenizer'. 
The class this function is called from is 'GemmaTokenizerFast'.

Are we sure that the training dataset format, tokenizer, and other configurations are set correctly? How can I debug this? Many thanks. 🤗🤗
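
For instance, one sanity check might be to decode a collated batch and look at what the model is actually trained on (a sketch, reusing collate_fn, train_ds and processor from the script above):

batch = collate_fn([train_ds[0], train_ds[1]])

# full input as the model sees it (includes the <image> placeholder tokens)
print(processor.tokenizer.decode(batch["input_ids"][0]))

# only the positions that contribute to the loss
label_ids = batch["labels"][0]
print(processor.tokenizer.decode(label_ids[label_ids != -100]))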

@NickyNicky this is not really a conversational/multi-turn model, it's a single-turn model, and the newline is what conditions the model to generate the response here; that's also why the newline tokenization flag is needed during fine-tuning but not at inference. An eos token could maybe be added, but not heavy chat templates.

Can you please add documentation on this, and on how tokens are managed for training without depending on the Trainer wrapper?

Google org

Hello, we have made a few changes which also include API changes around preprocessing for finetuning, you can refer to this notebook: https://colab.research.google.com/drive/1x_OEphRK0H97DqqxEyiMewqsTiLD_Xmi?usp=sharing
