RichardForests (Richrich)

upvoted a paper 4 days ago

Diffusion Model Alignment Using Direct Preference Optimization

Paper • 2311.12908 • Published Nov 21, 2023 • 47

upvoted 4 papers 9 days ago

upvoted 8 papers 12 days ago

Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory

Paper • 2405.08707 • Published 19 days ago • 25

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published 18 days ago • 96

LoRA Learns Less and Forgets Less

Paper • 2405.09673 • Published 18 days ago • 73

Toon3D: Seeing Cartoons from a New Perspective

Paper • 2405.10320 • Published 17 days ago • 19

Observational Scaling Laws and the Predictability of Language Model Performance

Paper • 2405.10938 • Published 16 days ago • 10

Layer-Condensed KV Cache for Efficient Inference of Large Language Models

Paper • 2405.10637 • Published 17 days ago • 16

Towards Modular LLMs by Building and Reusing a Library of LoRAs

Paper • 2405.11157 • Published 16 days ago • 23

FIFO-Diffusion: Generating Infinite Videos from Text without Training

Paper • 2405.11473 • Published 15 days ago • 52

upvoted a collection 12 days ago

PEFT

Collection

183 items • Updated 7 days ago • 11

upvoted a paper 12 days ago

MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Paper • 2405.12130 • Published 13 days ago • 41

upvoted a collection 12 days ago

[lecture artifacts] aligning open language models

Collection

artifacts referenced in the talk timeline! Slides: https://docs.google.com/presentation/d/1quMyI4BAx4rvcDfk8jjv063bmHg4RxZd9mhQloXpMn0/edit?usp=sharin • 63 items • Updated Apr 17 • 47

upvoted a paper 16 days ago

CAT3D: Create Anything in 3D with Multi-View Diffusion Models

Paper • 2405.10314 • Published 17 days ago • 37

upvoted 2 articles 21 days ago

Article

Easily Train Models with H100 GPUs on NVIDIA DGX Cloud

Mar 18

• 1

Article

🤗 PEFT: Parameter-Efficient Fine-Tuning of Billion-Scale Models on Low-Resource Hardware

Feb 10, 2023

• 17

upvoted 8 papers 25 days ago

FLAME: Factuality-Aware Alignment for Large Language Models

Paper • 2405.01525 • Published May 2 • 21

StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

Paper • 2405.01434 • Published May 2 • 45

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

Paper • 2405.04324 • Published 26 days ago • 14

SAGS: Structure-Aware 3D Gaussian Splatting

Paper • 2404.19149 • Published Apr 29 • 12

GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

Paper • 2404.19702 • Published Apr 30 • 15

InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation

Paper • 2404.19427 • Published Apr 30 • 65

KAN: Kolmogorov-Arnold Networks

Paper • 2404.19756 • Published Apr 30 • 97

LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29 • 115

upvoted a paper 28 days ago

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Paper • 2405.01535 • Published May 2 • 103

upvoted a paper about 1 month ago

Iterative Reasoning Preference Optimization

Paper • 2404.19733 • Published Apr 30 • 41

upvoted an article about 1 month ago

Article

Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints

May 1

• 53

upvoted a paper about 1 month ago

Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting

Paper • 2404.19758 • Published Apr 30 • 9

upvoted an article about 1 month ago

Article

Improving Prompt Consistency with Structured Generations

Apr 30

• 46

upvoted 3 papers about 1 month ago

MicroDreamer: Zero-shot 3D Generation in ~20 Seconds by Score-based Iterative Reconstruction

Paper • 2404.19525 • Published Apr 30 • 8

Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding

Paper • 2404.16710 • Published Apr 25 • 56

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Paper • 2404.14619 • Published Apr 22 • 122

upvoted 2 articles about 1 month ago

Article

Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent

Apr 22

• 73

Article

Design choices for Vision Language Models in 2024

Apr 16

• 20

upvoted a paper about 1 month ago

Multi-Head Mixture-of-Experts

Paper • 2404.15045 • Published Apr 23 • 55

upvoted a collection about 1 month ago

Open-Bezoar

Collection

Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data • 7 items • Updated Apr 19 • 6

upvoted a paper about 1 month ago

OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data

Paper • 2404.12195 • Published Apr 18 • 11

upvoted 3 papers about 2 months ago

DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

Paper • 2404.06903 • Published Apr 10 • 14

RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion

Paper • 2404.07199 • Published Apr 10 • 22

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11 • 32

upvoted a collection about 2 months ago

DPO vs KTO vs IPO

Collection

A collection of datasets and models used for the Aligning LLMs with Direct Preference Optimization Methods blogpost • 2 items • Updated Jan 16 • 11

upvoted 3 papers about 2 months ago

Zephyr: Direct Distillation of LM Alignment

Paper • 2310.16944 • Published Oct 25, 2023 • 116

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Paper • 2404.07973 • Published Apr 11 • 28

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Paper • 2404.07839 • Published Apr 11 • 39

upvoted an article about 2 months ago

Article

Mixture of Depth is Vibe

Apr 22

• 36

upvoted a paper about 2 months ago

MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation

Paper • 2404.05674 • Published Apr 8 • 11

upvoted an article about 2 months ago

Article

CodeGemma - an official Google release for code LLMs

Apr 9

• 97

upvoted a paper about 2 months ago

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27 • 87

upvoted a collection about 2 months ago

VisionLM

Collection

118 items • Updated 3 days ago • 5

upvoted an article about 2 months ago

Article

DS-MoE: Making MoE Models More Efficient and Less Memory-Intensive

Apr 9

• 26

upvoted a collection about 2 months ago

Eurus

Collection

Advancing LLM Reasoning Generalists with Preference Trees • 11 items • Updated Apr 15 • 22

upvoted 3 papers about 2 months ago

CameraCtrl: Enabling Camera Control for Text-to-Video Generation

Paper • 2404.02101 • Published Apr 2 • 17

VideoMamba: State Space Model for Efficient Video Understanding

Paper • 2403.06977 • Published Mar 11 • 22

SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series

Paper • 2403.15360 • Published Mar 22 • 11

upvoted 2 collections about 2 months ago

Papers - Coding - Chain of Thought

Collection

3 items • Updated Apr 22 • 1

Papers - Fine-tuning - DPO

Collection

Refer to additional papers: https://link.springer.com/article/10.1007/s10994-014-5458-8 and https://link.springer.com/article/10.1007/BF00992696 • 17 items • Updated about 1 month ago • 1

upvoted a paper about 2 months ago

AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent

Paper • 2404.03648 • Published Apr 4 • 22
