MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series • Paper • 2405.19327 • Published 3 days ago • 34 • 2
Yuan 2.0-M32: Mixture of Experts with Attention Router • Paper • 2405.17976 • Published 4 days ago • 15 • 2
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning • Paper • 2405.18386 • Published 4 days ago • 12 • 3
Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels • Paper • 2405.16822 • Published 5 days ago • 10 • 2
CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner • Paper • 2405.14979 • Published 8 days ago • 13 • 2
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models • Paper • 2405.15738 • Published 8 days ago • 41 • 7
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation • Paper • 2405.14598 • Published 9 days ago • 10 • 1
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data • Paper • 2405.14333 • Published 9 days ago • 27 • 2
ReVideo: Remake a Video with Motion and Content Control • Paper • 2405.13865 • Published 10 days ago • 19 • 4
Many-Shot In-Context Learning in Multimodal Foundation Models • Paper • 2405.09798 • Published 16 days ago • 25 • 2
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning • Paper • 2405.12130 • Published 12 days ago • 41 • 7
Personalized Residuals for Concept-Driven Text-to-Image Generation • Paper • 2405.12978 • Published 11 days ago • 8 • 2
OmniGlue: Generalizable Feature Matching with Foundation Model Guidance • Paper • 2405.12979 • Published 11 days ago • 7 • 2
Diffusion for World Modeling: Visual Details Matter in Atari • Paper • 2405.12399 • Published 11 days ago • 25 • 3
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control • Paper • 2405.12970 • Published 11 days ago • 20 • 4
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework • Paper • 2405.11143 • Published 12 days ago • 33 • 2
FIFO-Diffusion: Generating Infinite Videos from Text without Training • Paper • 2405.11473 • Published 13 days ago • 50 • 8
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding • Paper • 2405.08748 • Published 18 days ago • 17 • 2
MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels • Paper • 2405.07526 • Published 19 days ago • 14 • 1
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots • Paper • 2405.07990 • Published 19 days ago • 15 • 4
RLHF Workflow: From Reward Modeling to Online RLHF • Paper • 2405.07863 • Published 19 days ago • 57 • 5
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation • Paper • 2405.01434 • Published 30 days ago • 44 • 3