Hugo Laurençon

HugoLaurencon

Articles

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

18 days ago

Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

Mar 15

Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model

Aug 22, 2023

Putting ethical principles at the core of research lifecycle

May 19, 2022

Posts 4

Post

Idefics2 is trained mostly on OBELICS, our open interleaved image-text document dataset.

Training on interleaved data is crucial for reaching high performance on VQA tasks, accepting an arbitrary number of images as input, and performing in-context learning.

Dataset: HuggingFaceM4/OBELICS
Nomic visualization: https://atlas.nomic.ai/map/f2fba2aa-3647-4f49-a0f3-9347daeee499/ee4a84bd-f125-4bcc-a683-1b4e231cb10f
Link to OBELICS thread: https://twitter.com/HugoLaurencon/status/1694005892839006301
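
To make "interleaved" concrete, here is a minimal sketch of how an interleaved image-text document could be flattened into a single training sequence. It assumes a document is stored as two parallel lists, images and texts, where exactly one of the two is set at each position. That layout, the IMAGE_TOKEN placeholder, and the flatten_interleaved helper are illustrative assumptions, not taken from the post or from the Idefics2 code.

```python
from typing import List, Optional, Tuple

# Hypothetical placeholder marking where an image sits in the text stream.
IMAGE_TOKEN = "<image>"


def flatten_interleaved(
    images: List[Optional[str]],
    texts: List[Optional[str]],
) -> Tuple[str, List[str]]:
    """Merge parallel image/text lists into one prompt plus ordered image refs.

    Assumed layout: at each position exactly one of images[i] / texts[i]
    is set, so the original document order is preserved.
    """
    if len(images) != len(texts):
        raise ValueError("images and texts must have the same length")

    parts: List[str] = []
    image_refs: List[str] = []
    for img, txt in zip(images, texts):
        if img is not None and txt is not None:
            raise ValueError("a position cannot hold both an image and a text")
        if img is not None:
            parts.append(IMAGE_TOKEN)
            image_refs.append(img)
        elif txt is not None:
            parts.append(txt)
        # Positions where both are None are skipped.
    return " ".join(parts), image_refs


# Made-up document with two images, each followed by a sentence.
doc_images = ["cat.jpg", None, "dog.jpg", None]
doc_texts = [None, "A cat on a sofa.", None, "A dog in a park."]
prompt, refs = flatten_interleaved(doc_images, doc_texts)
print(prompt)
print(refs)
```

Because any number of image positions can appear in one document, a model trained on such sequences sees multi-image inputs and image-text examples followed by further images, which is the setting that in-context learning relies on.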

Post

The Cauldron is a massive collection of 50 high-quality datasets, all converted to the user/assistant format, and ready to use to fine-tune any Vision Language Model.

The Cauldron covers a wide range of tasks, including general visual question answering, counting, captioning, text transcription, document understanding, chart/figure understanding, table understanding, visual reasoning, geometry, spotting differences between two images, and converting a screenshot to code.

Dataset: HuggingFaceM4/the_cauldron
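
As a rough illustration of what "converted to the user/assistant format" can mean, the sketch below turns a plain question-answer sample into a list of user/assistant turns. The field names ("images", "texts", "user", "assistant", "source") and both helper functions are assumptions made for this example; check the dataset card for the actual schema.

```python
from typing import Dict, List, Tuple


def to_user_assistant(
    image_refs: List[str],
    qa_pairs: List[Tuple[str, str]],
    source: str,
) -> Dict[str, object]:
    """Wrap raw (question, answer) pairs as user/assistant turns.

    The output layout is a hypothetical one chosen for illustration.
    """
    turns = [
        {"user": question.strip(), "assistant": answer.strip(), "source": source}
        for question, answer in qa_pairs
    ]
    return {"images": list(image_refs), "texts": turns}


def render_chat(sample: Dict[str, object]) -> str:
    """Render the turns as plain text, one line per speaker."""
    lines: List[str] = []
    for turn in sample["texts"]:
        lines.append(f"User: {turn['user']}")
        lines.append(f"Assistant: {turn['assistant']}")
    return "\n".join(lines)


# Made-up counting sample with one image and two question-answer pairs.
sample = to_user_assistant(
    image_refs=["fruit_bowl.jpg"],
    qa_pairs=[
        ("How many apples are in the bowl? ", "3"),
        ("What color are they?", " Red"),
    ],
    source="hypothetical_counting_set",
)
print(render_chat(sample))
```

Putting every task into the same turn-based shape means one fine-tuning loop can consume captioning, counting, and document-understanding samples without task-specific preprocessing.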
