Introducing Idefics2: A Powerful 8B Vision-Language Model for the community
It was trained for roughly 1 month on 32 nodes of 8 H100s each (256 GPUs in total).
Thanks for the feedback. If flash attention is a problem, you can always enable or disable it when loading the model.
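For anyone wondering what that toggle might look like, here is a minimal sketch. It assumes a recent transformers release where `from_pretrained` accepts the `attn_implementation` argument, and that the checkpoint is `HuggingFaceM4/idefics2-8b`. The helper names `attention_kwargs` and `load_idefics2` are made up for illustration, and the half-precision dtype is an assumption (flash attention needs fp16 or bf16).

```python
def attention_kwargs(use_flash_attention: bool) -> dict:
    """Hypothetical helper: pick the attention backend passed to from_pretrained."""
    backend = "flash_attention_2" if use_flash_attention else "eager"
    return {"attn_implementation": backend}


def load_idefics2(use_flash_attention: bool = False):
    """Hypothetical loader: downloads the checkpoint, so call it only when needed."""
    # Imported lazily so attention_kwargs works without torch/transformers installed.
    import torch
    from transformers import AutoModelForVision2Seq

    return AutoModelForVision2Seq.from_pretrained(
        "HuggingFaceM4/idefics2-8b",   # assumed checkpoint name
        torch_dtype=torch.float16,      # assumption: half precision
        **attention_kwargs(use_flash_attention),
    )
```

Calling `load_idefics2(use_flash_attention=False)` should fall back to the standard attention path, which is the safer choice if the flash-attn package is not installed or the GPU does not support it.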
We will publish all the details of how the foundation model was trained when it is released!