papers - a chethan62 Collection

chethan62 's Collections

papers

models

papers

updated about 1 month ago

The Chosen One: Consistent Characters in Text-to-Image Diffusion Models

Paper • 2311.10093 • Published Nov 16, 2023 • 54
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation

Paper • 2311.12229 • Published Nov 20, 2023 • 26
Diffusion Model Alignment Using Direct Preference Optimization

Paper • 2311.12908 • Published Nov 21, 2023 • 47
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

Paper • 2312.00845 • Published Dec 1, 2023 • 36
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators

Paper • 2312.03793 • Published Dec 6, 2023 • 17
One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications

Paper • 2312.16145 • Published Dec 26, 2023 • 8
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones

Paper • 2312.16862 • Published Dec 28, 2023 • 28
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Paper • 2312.04461 • Published Dec 7, 2023 • 49
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All

Paper • 2401.13795 • Published Jan 24 • 64
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

Paper • 2401.13627 • Published Jan 24 • 70
MambaByte: Token-free Selective State Space Model

Paper • 2401.13660 • Published Jan 24 • 47
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion

Paper • 2401.13388 • Published Jan 24 • 9
Lumiere: A Space-Time Diffusion Model for Video Generation

Paper • 2401.12945 • Published Jan 23 • 82
Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study

Paper • 2401.12789 • Published Jan 23 • 5
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

Paper • 2401.11708 • Published Jan 22 • 27
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers

Paper • 2401.11605 • Published Jan 21 • 19
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Paper • 2401.10891 • Published Jan 19 • 54
DiffusionGPT: LLM-Driven Text-to-Image Generation System

Paper • 2401.10061 • Published Jan 18 • 26
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Paper • 2401.09417 • Published Jan 17 • 51
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference

Paper • 2401.08671 • Published Jan 9 • 12
UFO: A UI-Focused Agent for Windows OS Interaction

Paper • 2402.07939 • Published Feb 8 • 13
λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space

Paper • 2402.05195 • Published Feb 7 • 16
FiT: Flexible Vision Transformer for Diffusion Model

Paper • 2402.12376 • Published Feb 19 • 46
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Paper • 2403.04692 • Published Mar 7 • 36
Yi: Open Foundation Models by 01.AI

Paper • 2403.04652 • Published Mar 7 • 59
StableDrag: Stable Dragging for Point-based Image Editing

Paper • 2403.04437 • Published Mar 7 • 24
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion

Paper • 2403.05121 • Published Mar 8 • 16
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation

Paper • 2403.12015 • Published Mar 18 • 60
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation

Paper • 2403.16990 • Published Mar 25 • 24
ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27 • 48
LLM Agent Operating System

Paper • 2403.16971 • Published Mar 25 • 62
Getting it Right: Improving Spatial Consistency in Text-to-Image Models

Paper • 2404.01197 • Published Apr 1 • 29
CosmicMan: A Text-to-Image Foundation Model for Humans

Paper • 2404.01294 • Published Apr 1 • 15
On the Scalability of Diffusion-based Text-to-Image Generation

Paper • 2404.02883 • Published Apr 3 • 17
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models

Paper • 2404.07724 • Published Apr 11 • 10
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Paper • 2404.07973 • Published Apr 11 • 28
EdgeFusion: On-Device Text-to-Image Generation

Paper • 2404.11925 • Published Apr 18 • 19
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation

Paper • 2404.19427 • Published Apr 30 • 65
DressCode: Autoregressively Sewing and Generating Garments from Text Guidance

Paper • 2401.16465 • Published Jan 29 • 9