Omar Sanseviero

osanseviero

https://osanseviero.github.io/hackerllama/

AI & ML interests

Llamas, model merging, massive ASR for data collection, 3D ML, on-device ML, quantization, model judging, ML in browser, healthcare applications, education, intersection of art and ML. 🦙

Articles

Welcome Llama 3 - Meta's new open LLM

CodeGemma - an official Google release for code LLMs

🪆 Introduction to Matryoshka Embedding Models

Welcome Gemma - Google's new open LLM

Constitutional AI with Open LLMs

Preference Tuning LLMs with Direct Preference Optimization Methods

Mixture of Experts Explained

Welcome Mixtral - a SOTA Mixture of Experts on Hugging Face

Inference for PROs

Spread Your Wings: Falcon 180B is here

Code Llama: Llama 2 learns to code

Results of the Open Source AI Game Jam

Llama 2 is here - get it on Hugging Face

The Falcon has landed in the Hugging Face ecosystem

Hugging Face Machine Learning Demos on arXiv

What's new in Diffusers? 🎨

Announcing Evaluation on the Hub

An Introduction to Deep Reinforcement Learning

Welcome spaCy to the 🤗 Hub

Sentence Transformers in the 🤗 Hub

osanseviero's activity

upvoted an article about 8 hours ago

Article

Adapt custom AI models to the trainer API and to 🤗

3 days ago • 13

upvoted an article about 13 hours ago

Article

Multimodal Augmentation for Documents: Recovering “Comprehension” in “Reading and Comprehension” task

about 18 hours ago • 11

upvoted 3 papers about 19 hours ago

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

Paper • 2405.09546 • Published 1 day ago • 6

Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model

Paper • 2405.09215 • Published 2 days ago • 9

ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models

Paper • 2405.09220 • Published 2 days ago • 15

upvoted 2 articles about 20 hours ago

Article

2024-04-22 - Hub Incident Post Mortem

about 8 hours ago • 14

Article

Hugging Face + Google Visual Blocks

about 10 hours ago • 9

upvoted 8 papers 1 day ago

SpeechVerse: A Large-scale Generalizable Audio Language Model

Paper • 2405.08295 • Published 3 days ago • 7

SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models

Paper • 2405.08317 • Published 3 days ago • 7

Understanding the performance gap between online and offline alignment algorithms

Paper • 2405.08448 • Published 3 days ago • 9

No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding

Paper • 2405.08344 • Published 3 days ago • 9

Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory

Paper • 2405.08707 • Published 3 days ago • 18

Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning

Paper • 2405.08054 • Published 3 days ago • 14

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Paper • 2405.08748 • Published 2 days ago • 13

VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models

Paper • 2403.06098 • Published Mar 10 • 15

upvoted a collection 1 day ago

Embedding Model Datasets

A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 49 items • Updated 1 day ago • 10

upvoted 7 papers 1 day ago

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Paper • 2405.01535 • Published 14 days ago • 90

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

Paper • 2405.07526 • Published 4 days ago • 12

Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training

Paper • 2405.06932 • Published 6 days ago • 14

LogoMotion: Visually Grounded Code Generation for Content-Aware Animation

Paper • 2405.07065 • Published 5 days ago • 13

Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots

Paper • 2405.07990 • Published 3 days ago • 15

SUTRA: Scalable Multilingual Language Model Architecture

Paper • 2405.06694 • Published 9 days ago • 33

What matters when building vision-language models?

Paper • 2405.02246 • Published 13 days ago • 71

upvoted a collection 2 days ago

PaliGemma FT Models

108 items • Updated 2 days ago • 8

upvoted an article 2 days ago

Article

PaliGemma – Google's Cutting-Edge Open Vision Language Model

3 days ago • 84

upvoted a collection 2 days ago

PaliGemma Release

Pretrained and mix checkpoints for PaliGemma • 10 items • Updated 2 days ago • 80

upvoted a collection 3 days ago

SFR-Instruct-LLaMA-3-8B-R

3 items • Updated 3 days ago • 13

upvoted an article 3 days ago

Article

It's raining diffusion personalization techniques☔️🎭🖼️

Apr 11 • 16

upvoted a paper 3 days ago

RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published 4 days ago • 51

upvoted a paper 4 days ago

CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

Paper • 2405.05949 • Published 7 days ago • 2

upvoted a collection 4 days ago

MAmmoTH2

Scaling up instruction data from the web to build better LLMs • 10 items • Updated 6 days ago • 4

upvoted 3 papers 4 days ago

Multi-Head Mixture-of-Experts

Paper • 2404.15045 • Published 24 days ago • 53

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Paper • 2404.14619 • Published 24 days ago • 120

WildChat: 1M ChatGPT Interaction Logs in the Wild

Paper • 2405.01470 • Published 14 days ago • 52

upvoted an article 4 days ago

Article

Preference Tuning LLMs with Direct Preference Optimization Methods

Jan 18 • 17

upvoted a collection 5 days ago

Yi-1.5 (2024/05)

6 items • Updated 4 days ago • 59

upvoted an article 5 days ago

Article

Introducing RWKV — An RNN with the advantages of a transformer

May 15, 2023 • 3

upvoted a paper 6 days ago

LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published 18 days ago • 105

upvoted an article 6 days ago

Article

🦙⚗️ Using Llama3 and distilabel to build fine-tuning datasets

20 days ago • 54

upvoted a collection 6 days ago

Searching for Better ViT Baselines

Exploring ViT hparams and model shapes for the GPU poor (between tiny and base). • 15 items • Updated 3 days ago • 8

upvoted a collection 7 days ago

Neo-Models

Neo • 7 items • Updated 8 days ago • 4

upvoted 3 articles 7 days ago

Article

Can we create pedagogically valuable multi-turn synthetic datasets from Cosmopedia?

10 days ago • 6

Article

Energy Star Ratings for AI Models

8 days ago • 13

Article

Getting Started with Sentiment Analysis using Python

Feb 2, 2022 • 7

upvoted 5 articles 9 days ago

Article

Introducing the Open Chain of Thought Leaderboard

24 days ago • 20

Article

Open-source LLMs as LangChain Agents

Jan 24 • 9

Article

Introducing the Open Leaderboard for Hebrew LLMs!

12 days ago • 23

Article

Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints

16 days ago • 48

Article

Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent

25 days ago • 71

upvoted 2 collections 10 days ago

Arctic

A collection of pre-trained dense-MoE Hybrid transformer models • 2 items • Updated 23 days ago • 18

Granite Code Models

A series of code models trained by IBM and released under the Apache 2.0 license. We release both the base pretrained and instruct models. • 10 items • Updated 5 days ago • 116

upvoted an article 17 days ago

Article

StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation

18 days ago • 68

upvoted a collection 23 days ago

OpenELM Instruct Models

4 items • Updated Apr 12 • 96

upvoted a paper 24 days ago

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published 25 days ago • 230

upvoted a collection 24 days ago

Quantized-FT-Orca-Math

Models trained during quantization-aware fine-tuning experiments using PyTorch's FSDP. • 8 items • Updated about 1 month ago • 6

upvoted an article 27 days ago

Article

Fine-tune Llama 3 with ORPO

24 days ago • 175

upvoted an article 28 days ago

Article

Welcome Llama 3 - Meta's new open LLM

29 days ago • 238

upvoted a collection 28 days ago

Meta Llama 3

This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated 28 days ago • 516

upvoted 2 collections 30 days ago

Idefics2 🐶

Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated 11 days ago • 75

fuck quadratic attention

11 items • Updated 23 days ago • 19