DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation Paper • 2405.20289 • Published May 30 • 6
RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance Paper • 2405.14677 • Published May 23 • 8
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding Paper • 2405.08748 • Published May 14 • 17
ByteEdit: Boost, Comply and Accelerate Generative Image Editing Paper • 2404.04860 • Published Apr 7 • 24
RL for Consistency Models: Faster Reward Guided Text-to-Image Generation Paper • 2404.03673 • Published Mar 25 • 14
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3 • 60
Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction Paper • 2403.18795 • Published Mar 27 • 17
SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series Paper • 2403.15360 • Published Mar 22 • 11
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers Paper • 2403.12943 • Published Mar 19 • 13
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 567 • See the ternary-quantization sketch after this list.
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27 • 87
Sora Reference Papers Collection • A collection of all papers referenced in OpenAI's "Video generation models as world simulators" technical report • openai.com/sora • 30 items • Updated Feb 20 • 50
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data Paper • 2402.08093 • Published Feb 12 • 52
ReplaceAnything3D: Text-Guided 3D Scene Editing with Compositional Neural Radiance Fields Paper • 2401.17895 • Published Jan 31 • 15
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning Paper • 2402.00769 • Published Feb 1 • 18
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling Paper • 2401.15977 • Published Jan 29 • 34
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model Paper • 2401.09417 • Published Jan 17 • 51
PALP: Prompt Aligned Personalization of Text-to-Image Models Paper • 2401.06105 • Published Jan 11 • 46
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models Paper • 2401.05252 • Published Jan 10 • 43
Audiobox: Unified Audio Generation with Natural Language Prompts Paper • 2312.15821 • Published Dec 25, 2023 • 12
VCoder: Versatile Vision Encoders for Multimodal Large Language Models Paper • 2312.14233 • Published Dec 21, 2023 • 14
DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation Paper • 2312.13578 • Published Dec 21, 2023 • 23
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation Paper • 2312.12491 • Published Dec 19, 2023 • 66
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models Paper • 2311.13435 • Published Nov 22, 2023 • 15
Diffusion Model Alignment Using Direct Preference Optimization Paper • 2311.12908 • Published Nov 21, 2023 • 47 • See the Diffusion-DPO loss sketch after this list.
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis Paper • 2311.12454 • Published Nov 21, 2023 • 27
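Two entries above name techniques compact enough to illustrate. The most-upvoted paper, "The Era of 1-bit LLMs" (arXiv:2402.17764), constrains every weight to the ternary set {-1, 0, +1} using absmean scaling, which works out to log2(3) ≈ 1.58 bits of information per weight. Below is a minimal numpy sketch of that quantizer, not the paper's released code; the function name and the eps guard are my own additions.

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Quantize a weight matrix to {-1, 0, +1} via absmean scaling,
    as described in 'The Era of 1-bit LLMs' (arXiv:2402.17764).
    Returns the ternary matrix and the scale used to dequantize."""
    gamma = np.mean(np.abs(w)) + eps           # absmean scale of the matrix
    w_q = np.clip(np.round(w / gamma), -1, 1)  # round, then clip to {-1, 0, 1}
    return w_q.astype(np.int8), gamma

# Each weight takes one of 3 values -> log2(3) ~= 1.58 bits per weight.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
w_q, gamma = absmean_ternary_quantize(w)
w_approx = w_q * gamma                         # dequantized approximation
print(w_q)
print(f"bits/weight: {np.log2(3):.2f}")
```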
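Similarly, "Diffusion Model Alignment Using Direct Preference Optimization" (arXiv:2311.12908) adapts the DPO objective to diffusion models by comparing denoising errors on a preferred versus a rejected image, each measured against a frozen reference model. A minimal sketch, assuming the paper's timestep weighting T·ω(λ_t) is folded into β; the argument names are hypothetical placeholders, not the paper's API.

```python
import numpy as np

def diffusion_dpo_loss(err_theta_w, err_ref_w, err_theta_l, err_ref_l, beta=0.1):
    """Per-pair Diffusion-DPO objective (simplified from arXiv:2311.12908).
    Each argument is a squared denoising error ||eps - eps_model(x_t, t)||^2
    for the preferred (w) or rejected (l) image, under either the trained
    model (theta) or the frozen reference model (ref)."""
    # How much better the trained model denoises the preferred sample ...
    adv_w = err_theta_w - err_ref_w
    # ... versus the rejected sample, both relative to the reference model.
    adv_l = err_theta_l - err_ref_l
    # Logistic loss pushes adv_w below adv_l, favoring the preferred image.
    logits = -beta * (adv_w - adv_l)
    return np.log1p(np.exp(-logits))  # equals -log sigmoid(logits)

# Toy numbers: the trained model denoises the preferred image better than the
# reference does, and the rejected image worse, so the loss dips below log 2.
print(diffusion_dpo_loss(0.8, 1.0, 1.2, 1.0))
```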