CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data (arXiv:2404.15653, published Apr 24)
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning (arXiv:2405.12130)
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention (arXiv:2405.12981)
LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models (arXiv:2405.14477)
Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras (arXiv:2405.14866)
Transformers Can Do Arithmetic with the Right Embeddings (arXiv:2405.17399)
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections (arXiv:2405.17991)
Jina CLIP: Your CLIP Model Is Also Your Text Retriever (arXiv:2405.20204)