@merve on Hugging Face: "New open Vision Language Model by @Google: PaliGemma 💙🤍 📝 Comes in 3B…"

merve

posted an update 26 days ago

New open Vision Language Model by @Google : PaliGemma 💙🤍

📝 Comes in 3B size, as pretrained, mix and fine-tuned models at 224, 448 and 896 resolution
🧩 Combination of Gemma 2B LLM and SigLIP image encoder
🤗 Supported in transformers

PaliGemma can do:
🧩 Image segmentation and detection! 🤯
📑 Detailed document understanding and reasoning
🙋 Visual question answering, captioning and any other VLM task!
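
For readers who want to try these tasks through transformers, here is a minimal sketch, not an official recipe. It assumes the task prefixes described in the PaliGemma model cards (`caption`, `answer`, `detect`, `segment`), which this post does not spell out. `build_prompt` and `run_paligemma` are hypothetical helper names, and the default checkpoint is simply the mix model linked in the comments below.

```python
def build_prompt(task: str, text: str = "", lang: str = "en") -> str:
    """Build a task prompt (hypothetical helper).

    Assumption: the prefixes follow the PaliGemma model cards;
    they are not stated in this post.
    """
    if task == "caption":
        return f"caption {lang}"
    if task == "vqa":
        return f"answer {lang} {text}"
    if task == "detect":
        return f"detect {text}"
    if task == "segment":
        return f"segment {text}"
    raise ValueError(f"unknown task: {task!r}")


def run_paligemma(
    image,
    prompt: str,
    model_id: str = "google/paligemma-3b-mix-448",
    max_new_tokens: int = 50,
) -> str:
    """Generate text for one PIL image and one prompt (hypothetical helper)."""
    # Imported here so build_prompt works without torch/transformers installed.
    import torch
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()

    inputs = processor(text=prompt, images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )

    # Keep only the newly generated tokens, dropping the echoed prompt.
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)
```

Usage would look like `run_paligemma(Image.open("photo.jpg"), build_prompt("detect", "cat"))`, where `photo.jpg` is a placeholder path. The model is reloaded on every call to keep the sketch short; in practice you would load it once and reuse it.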

Read our blog 🔖 hf.co/blog/paligemma
Try the demo 🪀 hf.co/spaces/google/paligemma
Check out the Spaces and the models all in the collection 📚 google/paligemma-release-6643a9ffbf57de2ae0448dda
Collection of fine-tuned PaliGemma models google/paligemma-ft-models-6643b03efb769dad650d2dda

MoonRide

26 days ago

Nice scores in benchmarks, but it failed at my first test image: https://huggingface.co/google/paligemma-3b-mix-448/discussions/2

Something might be wrong with the demo Space configuration, or... we need better benchmarks.

merve

26 days ago • edited 26 days ago

@MoonRide it's not about benchmarks; the training dataset of the mix checkpoint is different from your use case. I responded on your issue with more details.

Cuiunbo

26 days ago • edited 26 days ago

Hi! Nice work!
I tried this model and it is more than capable of doing what I thought it could do. It's awesome! I have a few questions about the details.
Is the training data mentioned in the blog all of the training data, or did PaliGemma use other training data that is not mentioned?
Is there any plan to open-source a chatty model?

merve

26 days ago

@Cuiunbo I think @giffmana et al. will release a technical report in the coming days. For the mix and fine-tuned models, the details should be in the model cards. As for a chatty model, I don't think that's the intention of this release.
