ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models Paper • 2405.09220 • Published 2 days ago • 15
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model Paper • 2405.09215 • Published 2 days ago • 9
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots Paper • 2405.07990 • Published 4 days ago • 15
SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts Paper • 2405.07518 • Published 4 days ago • 19
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models Paper • 2403.13372 • Published Mar 20 • 53
Self-Play Preference Optimization for Language Model Alignment Paper • 2405.00675 • Published 16 days ago • 18
Customizing Text-to-Image Models with a Single Image Pair Paper • 2405.01536 • Published 15 days ago • 17
Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge Paper • 2405.00263 • Published 16 days ago • 13
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound Paper • 2405.00233 • Published 16 days ago • 12
Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting Paper • 2404.19758 • Published 17 days ago • 9
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation Paper • 2404.19752 • Published 17 days ago • 17
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation Paper • 2404.19427 • Published 17 days ago • 64
Stylus: Automatic Adapter Selection for Diffusion Models Paper • 2404.18928 • Published 18 days ago • 14
DressCode: Autoregressively Sewing and Generating Garments from Text Guidance Paper • 2401.16465 • Published Jan 29 • 7
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting Paper • 2404.18911 • Published 18 days ago • 26
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models Paper • 2404.17672 • Published 20 days ago • 17
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published 18 days ago • 62
MaPa: Text-driven Photorealistic Material Painting for 3D Shapes Paper • 2404.17569 • Published 21 days ago • 10
PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning Paper • 2404.16994 • Published 21 days ago • 30
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Paper • 2404.13686 • Published 26 days ago • 25
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published 25 days ago • 230
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions Paper • 2404.13208 • Published 27 days ago • 37
MeshLRM: Large Reconstruction Model for High-Quality Mesh Paper • 2404.12385 • Published 29 days ago • 23
CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting Paper • 2404.09458 • Published Apr 15 • 6
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 12 • 61
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video Paper • 2404.09833 • Published Apr 15 • 27
WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents Paper • 2404.05902 • Published Apr 8 • 20
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples Paper • 2404.07544 • Published Apr 11 • 15
Transferable and Principled Efficiency for Open-Vocabulary Segmentation Paper • 2404.07448 • Published Apr 11 • 10
Audio Dialogues: Dialogues dataset for audio and music understanding Paper • 2404.07616 • Published Apr 11 • 14
RULER: What's the Real Context Size of Your Long-Context Language Models? Paper • 2404.06654 • Published Apr 9 • 31
Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior Paper • 2404.06780 • Published Apr 10 • 9
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published Apr 10 • 92
RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion Paper • 2404.07199 • Published Apr 10 • 22
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing Paper • 2404.05717 • Published Apr 8 • 23
PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations Paper • 2404.04421 • Published Apr 5 • 14
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs Paper • 2404.05719 • Published Apr 8 • 57
Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition Paper • 2404.02514 • Published Apr 3 • 9
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation Paper • 2404.02733 • Published Apr 3 • 19
Condition-Aware Neural Network for Controlled Image Generation Paper • 2404.01143 • Published Apr 1 • 11
MaGRITTe: Manipulative and Generative 3D Realization from Image, Topview and Text Paper • 2404.00345 • Published Mar 30 • 16
InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion Paper • 2403.17422 • Published Mar 26 • 1
TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models Paper • 2403.17005 • Published Mar 25 • 13
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers Paper • 2403.12943 • Published Mar 19 • 13