Merve Noyan PRO

merve

https://github.com/merveenoyan/smol-vision

AI & ML interests

VLMs, vision & co

Recent Activity

updated a dataset about 20 hours ago

vlmbook/images

published a dataset about 20 hours ago

vlmbook/images

Posts 100

Post

4005

sooo many open AI releases this past week, let's summarize! 🤗
merve/april-11-releases-67fcd78be33d241c0977b9d2

multimodal
> Moonshot AI released Kimi VL Thinking, the first working open-source multimodal reasoning model, and Kimi VL Instruct; both are 16B MoEs with 3B active params (OS)
> InternVL3 released, based on Qwen2.5VL: 7 ckpts of various sizes (1B to 78B)

LLMs
> NVIDIA released Llama-3_1-Nemotron-Ultra-253B-v1, an LLM built on Llama 405B for reasoning, chat and tool use
> Agentica released DeepCoder-14B-Preview, a fine-tuned version of DeepSeek-R1-Distill-Qwen-14B trained on problem-test pairs, along with the compiled dataset
> Zyphra/ZR1-1.5B is a new small reasoning LLM built on R1-Distill-1.5B (OS)
> Skywork-OR1-32B-Preview is a new reasoning model by Skywork

Image Generation
> HiDream released three new image generation models: HiDream I1 Dev, I1 Full, and I1 Fast (OS)

*OS ones have Apache 2.0 or MIT licenses

Articles 25

Article

75

Cohere on Hugging Face Inference Providers 🔥

Collections 50

spaces 105

Vision Papers

All paper summaries read by Merve

Running on Zero

ShieldGemma2 VLM

Demo for ShieldGemma 2, multimodal safety model

UDOP

Generate text from document images

Running on Zero

Paligemma2 Vqav2

PaliGemma2 LoRA finetuned on VQAv2

Running on Zero

OWLSAM

State-of-the-art open-vocabulary image segmentation ⚡️

Sam2.1

models 92

merve/SmolVLM2-500M-Video-Instruct-video-feedback

Image-Text-to-Text • Updated Feb 20 • 1

merve/SmolVLM2-500M-Video-Instruct-videofeedback

Image-Text-to-Text • Updated Feb 20 • 1

merve/SmolVLM2-500M-Video-Instruct-emotions

Image-Text-to-Text • Updated Feb 20 • 6

merve/colpali_ufo

Updated Dec 20, 2024 • 1

merve/paligemma_vqav2

Image-Text-to-Text • Updated Dec 18, 2024 • 148 • 13

merve/paligemma2-3b-vqav2

Updated Dec 5, 2024 • 93 • 6

merve/google-ckpts

Updated Oct 22, 2024

merve/google-tokenizers

Updated Oct 22, 2024

merve/idefics3-llama-vqav2

Updated Sep 11, 2024

merve/idefics3llama-vqav2

Updated Sep 11, 2024 • 8

datasets 28

merve/vlm_test_images

Viewer • Updated 11 days ago • 10 • 435 • 1

merve/retail-in-the-wild

Viewer • Updated Mar 6 • 20 • 181 • 2

merve/model-test-inputs

Updated Oct 21, 2024 • 28

merve/vqav2-small

Viewer • Updated Aug 8, 2024 • 21.4k • 1.5k • 11

merve/SGinW

Viewer • Updated Jul 11, 2024 • 16.7k • 341 • 1

merve/pascal-voc

Viewer • Updated Jul 6, 2024 • 336k • 934 • 1

merve/YouCook2

Viewer • Updated May 28, 2024 • 2k • 53

merve/faiss_embeddings

Updated Jan 25, 2024 • 41

merve/pokemon-ds-embeddings

Viewer • Updated Jan 10, 2024 • 833 • 30 • 4

merve/tr-h4-norobots

Updated Jan 7, 2024 • 51 • 10