Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection Paper • 2405.10300 • Published 4 days ago • 16
TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction Paper • 2405.10315 • Published 4 days ago • 7
Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion Paper • 2405.09874 • Published 4 days ago • 10
Many-Shot In-Context Learning in Multimodal Foundation Models Paper • 2405.09798 • Published 4 days ago • 22
CAT3D: Create Anything in 3D with Multi-View Diffusion Models Paper • 2405.10314 • Published 4 days ago • 29
Naturalistic Music Decoding from EEG Data via Latent Diffusion Models Paper • 2405.09062 • Published 5 days ago • 4
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation Paper • 2405.09546 • Published 5 days ago • 6
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model Paper • 2405.09215 • Published 5 days ago • 12
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models Paper • 2405.09220 • Published 5 days ago • 20
SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models Paper • 2405.08317 • Published 6 days ago • 7
SpeechVerse: A Large-scale Generalizable Audio Language Model Paper • 2405.08295 • Published 6 days ago • 8
Understanding the performance gap between online and offline alignment algorithms Paper • 2405.08448 • Published 6 days ago • 10
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding Paper • 2405.08344 • Published 6 days ago • 9
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory Paper • 2405.08707 • Published 6 days ago • 21
Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning Paper • 2405.08054 • Published 7 days ago • 17
Compositional Text-to-Image Generation with Dense Blob Representations Paper • 2405.08246 • Published 7 days ago • 11
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding Paper • 2405.08748 • Published 6 days ago • 15
MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels Paper • 2405.07526 • Published 7 days ago • 13
Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training Paper • 2405.06932 • Published 9 days ago • 15
LogoMotion: Visually Grounded Code Generation for Content-Aware Animation Paper • 2405.07065 • Published 9 days ago • 14
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots Paper • 2405.07990 • Published 7 days ago • 15
SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts Paper • 2405.07518 • Published 7 days ago • 19
SUTRA: Scalable Multilingual Language Model Architecture Paper • 2405.06694 • Published 13 days ago • 33
EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars Paper • 2404.19110 • Published 21 days ago • 3
EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis Paper • 2308.05725 • Published Aug 10, 2023 • 1
GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer Paper • 2311.08526 • Published Nov 14, 2023 • 7
NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data Paper • 2402.15343 • Published Feb 23 • 8
Salient Object-Aware Background Generation using Text-Guided Diffusion Models Paper • 2404.10157 • Published Apr 15 • 1
An Optimistic Acceleration of AMSGrad for Nonconvex Optimization Paper • 1903.01435 • Published Mar 4, 2019 • 1
Improving Generalization Performance by Switching from Adam to SGD Paper • 1712.07628 • Published Dec 20, 2017 • 1
AlignBench: Benchmarking Chinese Alignment of Large Language Models Paper • 2311.18743 • Published Nov 30, 2023 • 1
AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding Paper • 2405.03121 • Published 15 days ago • 1
ImageInWords: Unlocking Hyper-Detailed Image Descriptions Paper • 2405.02793 • Published 16 days ago • 1
You Only Cache Once: Decoder-Decoder Architectures for Language Models Paper • 2405.05254 • Published 12 days ago • 5
FER-YOLO-Mamba: Facial Expression Detection and Classification Based on Selective State Space Paper • 2405.01828 • Published 17 days ago • 1
A decoder-only foundation model for time-series forecasting Paper • 2310.10688 • Published Oct 14, 2023 • 3
LLM-AD: Large Language Model based Audio Description System Paper • 2405.00983 • Published 18 days ago • 13
FLAME: Factuality-Aware Alignment for Large Language Models Paper • 2405.01525 • Published 18 days ago • 21
Customizing Text-to-Image Models with a Single Image Pair Paper • 2405.01536 • Published 18 days ago • 17
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment Paper • 2405.01481 • Published 18 days ago • 20
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Paper • 2405.01434 • Published 18 days ago • 44
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published 21 days ago • 109
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published 18 days ago • 92
STT: Stateful Tracking with Transformers for Autonomous Driving Paper • 2405.00236 • Published 20 days ago • 7
Self-Play Preference Optimization for Language Model Alignment Paper • 2405.00675 • Published 19 days ago • 18
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound Paper • 2405.00233 • Published 20 days ago • 12