One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning Paper • 2306.07967 • Published Jun 13, 2023 • 23
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation Paper • 2306.07954 • Published Jun 13, 2023 • 111
AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation Paper • 2306.09864 • Published Jun 16, 2023 • 13
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing Paper • 2306.10012 • Published Jun 16, 2023 • 33
One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization Paper • 2306.16928 • Published Jun 29, 2023 • 35
DreamTime: An Improved Optimization Strategy for Text-to-3D Content Creation Paper • 2306.12422 • Published Jun 21, 2023 • 11
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing Paper • 2306.14435 • Published Jun 26, 2023 • 20
DreamDiffusion: Generating High-Quality Images from Brain EEG Signals Paper • 2306.16934 • Published Jun 29, 2023 • 29
Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors Paper • 2306.17843 • Published Jun 30, 2023 • 41
DisCo: Disentangled Control for Referring Human Dance Generation in Real World Paper • 2307.00040 • Published Jun 30, 2023 • 24
LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance Paper • 2307.00522 • Published Jul 2, 2023 • 27
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis Paper • 2307.01952 • Published Jul 4, 2023 • 74
DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models Paper • 2307.02421 • Published Jul 5, 2023 • 33
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation Paper • 2307.06942 • Published Jul 13, 2023 • 20
Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation Paper • 2307.03869 • Published Jul 8, 2023 • 20
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning Paper • 2307.04725 • Published Jul 10, 2023 • 63
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models Paper • 2307.06949 • Published Jul 13, 2023 • 49
DreamTeacher: Pretraining Image Backbones with Deep Generative Models Paper • 2307.07487 • Published Jul 14, 2023 • 19
Text2Layer: Layered Image Generation using Latent Diffusion Model Paper • 2307.09781 • Published Jul 19, 2023 • 12
FABRIC: Personalizing Diffusion Models with Iterative Feedback Paper • 2307.10159 • Published Jul 19, 2023 • 29
TokenFlow: Consistent Diffusion Features for Consistent Video Editing Paper • 2307.10373 • Published Jul 19, 2023 • 54
Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning Paper • 2307.11410 • Published Jul 21, 2023 • 14
ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation Paper • 2308.00906 • Published Aug 2, 2023 • 12
ConceptLab: Creative Generation using Diffusion Prior Constraints Paper • 2308.02669 • Published Aug 3, 2023 • 22
AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose Paper • 2308.03610 • Published Aug 7, 2023 • 22
3D Gaussian Splatting for Real-Time Radiance Field Rendering Paper • 2308.04079 • Published Aug 8, 2023 • 161
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models Paper • 2308.06721 • Published Aug 13, 2023 • 24
Dual-Stream Diffusion Net for Text-to-Video Generation Paper • 2308.08316 • Published Aug 16, 2023 • 23
TeCH: Text-guided Reconstruction of Lifelike Clothed Humans Paper • 2308.08545 • Published Aug 16, 2023 • 30
VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation Paper • 2309.00398 • Published Sep 1, 2023 • 18
CityDreamer: Compositional Generative Model of Unbounded 3D Cities Paper • 2309.00610 • Published Sep 1, 2023 • 14
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models Paper • 2309.05793 • Published Sep 11, 2023 • 50
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation Paper • 2309.06380 • Published Sep 12, 2023 • 32
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models Paper • 2309.15103 • Published Sep 26, 2023 • 42
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack Paper • 2309.15807 • Published Sep 27, 2023 • 30
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation Paper • 2309.15818 • Published Sep 27, 2023 • 18
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation Paper • 2309.16653 • Published Sep 28, 2023 • 41
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis Paper • 2310.00426 • Published Sep 30, 2023 • 60
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion Paper • 2310.03502 • Published Oct 5, 2023 • 74
Aligning Text-to-Image Diffusion Models with Reward Backpropagation Paper • 2310.03739 • Published Oct 5, 2023 • 21
MotionDirector: Motion Customization of Text-to-Video Diffusion Models Paper • 2310.08465 • Published Oct 12, 2023 • 13
GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors Paper • 2310.08529 • Published Oct 12, 2023 • 16
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion Paper • 2310.08579 • Published Oct 12, 2023 • 14
Wonder3D: Single Image to 3D using Cross-Domain Diffusion Paper • 2310.15008 • Published Oct 23, 2023 • 19
DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design Paper • 2310.15144 • Published Oct 23, 2023 • 12
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation Paper • 2310.16656 • Published Oct 25, 2023 • 36
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior Paper • 2310.16818 • Published Oct 25, 2023 • 27
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images Paper • 2310.16825 • Published Oct 25, 2023 • 27
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation Paper • 2310.19512 • Published Oct 30, 2023 • 14
De-Diffusion Makes Text a Strong Cross-Modal Interface Paper • 2311.00618 • Published Nov 1, 2023 • 21
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models Paper • 2311.04145 • Published Nov 7, 2023 • 30
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module Paper • 2311.05556 • Published Nov 9, 2023 • 73
Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model Paper • 2311.06214 • Published Nov 10, 2023 • 28
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion Paper • 2311.07885 • Published Nov 14, 2023 • 36
DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model Paper • 2311.09217 • Published Nov 15, 2023 • 20
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs Paper • 2311.09257 • Published Nov 14, 2023 • 43
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models Paper • 2311.10093 • Published Nov 16, 2023 • 54
MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture Paper • 2311.10123 • Published Nov 16, 2023 • 14
SelfEval: Leveraging the discriminative nature of generative models for evaluation Paper • 2311.10708 • Published Nov 17, 2023 • 14
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning Paper • 2311.10709 • Published Nov 17, 2023 • 24
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression Paper • 2311.10794 • Published Nov 17, 2023 • 22
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort Paper • 2311.11243 • Published Nov 19, 2023 • 14
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching Paper • 2311.11284 • Published Nov 19, 2023 • 16
PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction Paper • 2311.12024 • Published Nov 20, 2023 • 16
MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer Paper • 2311.12052 • Published Nov 18, 2023 • 28
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models Paper • 2311.12092 • Published Nov 20, 2023 • 19
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation Paper • 2311.12229 • Published Nov 20, 2023 • 26
Diffusion Model Alignment Using Direct Preference Optimization Paper • 2311.12908 • Published Nov 21, 2023 • 46
FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline Paper • 2311.13073 • Published Nov 22, 2023 • 53
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model Paper • 2311.13231 • Published Nov 22, 2023 • 25
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes Paper • 2311.13384 • Published Nov 22, 2023 • 48
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs Paper • 2311.13600 • Published Nov 22, 2023 • 41
VideoBooth: Diffusion-based Video Generation with Image Prompts Paper • 2312.00777 • Published Dec 1, 2023 • 19
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence Paper • 2312.02087 • Published Dec 4, 2023 • 19
ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation Paper • 2312.02201 • Published Dec 2, 2023 • 30
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model Paper • 2312.02238 • Published Dec 4, 2023 • 24
DiffiT: Diffusion Vision Transformers for Image Generation Paper • 2312.02139 • Published Dec 4, 2023 • 13
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models Paper • 2312.00845 • Published Dec 1, 2023 • 36
Analyzing and Improving the Training Dynamics of Diffusion Models Paper • 2312.02696 • Published Dec 5, 2023 • 31
Orthogonal Adaptation for Modular Customization of Diffusion Models Paper • 2312.02432 • Published Dec 5, 2023 • 12
LivePhoto: Real Image Animation with Text-guided Motion Control Paper • 2312.02928 • Published Dec 5, 2023 • 15
Fine-grained Controllable Video Generation via Object Appearance and Context Paper • 2312.02919 • Published Dec 5, 2023 • 9
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation Paper • 2312.03641 • Published Dec 6, 2023 • 19
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators Paper • 2312.03793 • Published Dec 6, 2023 • 17
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding Paper • 2312.04461 • Published Dec 7, 2023 • 48
HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image Paper • 2312.04543 • Published Dec 7, 2023 • 21
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models Paper • 2312.04410 • Published Dec 7, 2023 • 14
DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models Paper • 2312.05107 • Published Dec 8, 2023 • 32
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation Paper • 2312.04557 • Published Dec 7, 2023 • 12
Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors Paper • 2312.04963 • Published Dec 7, 2023 • 15
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior Paper • 2312.06655 • Published Dec 11, 2023 • 21
Photorealistic Video Generation with Diffusion Models Paper • 2312.06662 • Published Dec 11, 2023 • 23
FreeInit: Bridging Initialization Gap in Video Diffusion Models Paper • 2312.07537 • Published Dec 12, 2023 • 23
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition Paper • 2312.07536 • Published Dec 12, 2023 • 15
DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing Paper • 2312.07409 • Published Dec 12, 2023 • 21
Clockwork Diffusion: Efficient Generation With Model-Step Distillation Paper • 2312.08128 • Published Dec 13, 2023 • 11
DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models Paper • 2312.09767 • Published Dec 15, 2023 • 24
Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models Paper • 2312.09608 • Published Dec 15, 2023 • 13
FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection Paper • 2312.09252 • Published Dec 14, 2023 • 9
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing Paper • 2312.11392 • Published Dec 18, 2023 • 18
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation Paper • 2312.12491 • Published Dec 19, 2023 • 66
InstructVideo: Instructing Video Diffusion Models with Human Feedback Paper • 2312.12490 • Published Dec 19, 2023 • 14
Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis Paper • 2312.13834 • Published Dec 20, 2023 • 25
DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation Paper • 2312.13578 • Published Dec 21, 2023 • 23
Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models Paper • 2312.13913 • Published Dec 21, 2023 • 22
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models Paper • 2312.14091 • Published Dec 21, 2023 • 13
DreamTuner: Single Image is Enough for Subject-Driven Generation Paper • 2312.13691 • Published Dec 21, 2023 • 23
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models Paper • 2312.13763 • Published Dec 21, 2023 • 9
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models Paper • 2312.13964 • Published Dec 21, 2023 • 16
Make-A-Character: High Quality Text-to-3D Character Generation within Minutes Paper • 2312.15430 • Published Dec 24, 2023 • 25
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos Paper • 2312.15770 • Published Dec 25, 2023 • 12
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis Paper • 2312.17681 • Published Dec 29, 2023 • 15
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM Paper • 2401.01256 • Published Jan 2, 2024 • 16
Image Sculpting: Precise Object Editing with 3D Geometry Control Paper • 2401.01702 • Published Jan 2, 2024 • 18
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models Paper • 2401.05252 • Published Jan 10, 2024 • 43
InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes Paper • 2401.05335 • Published Jan 10, 2024 • 26
PALP: Prompt Aligned Personalization of Text-to-Image Models Paper • 2401.06105 • Published Jan 11, 2024 • 46
Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation Paper • 2401.05675 • Published Jan 11, 2024 • 20
TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering Paper • 2401.06003 • Published Jan 11, 2024 • 20
InstantID: Zero-shot Identity-Preserving Generation in Seconds Paper • 2401.07519 • Published Jan 15, 2024 • 50
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens Paper • 2401.09985 • Published Jan 18, 2024 • 12
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data Paper • 2401.10891 • Published Jan 19, 2024 • 53
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs Paper • 2401.11708 • Published Jan 22, 2024 • 27
EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models Paper • 2401.11739 • Published Jan 22, 2024 • 16
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers Paper • 2401.11605 • Published Jan 21, 2024 • 19
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23, 2024 • 82
Deconstructing Denoising Diffusion Models for Self-Supervised Learning Paper • 2401.14404 • Published Jan 25, 2024 • 16
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All Paper • 2401.13795 • Published Jan 24, 2024 • 64
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling Paper • 2401.15977 • Published Jan 29, 2024 • 33
StableIdentity: Inserting Anybody into Anywhere at First Sight Paper • 2401.15975 • Published Jan 29, 2024 • 16
BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation Paper • 2401.17053 • Published Jan 30, 2024 • 29
Anything in Any Scene: Photorealistic Video Object Insertion Paper • 2401.17509 • Published Jan 30, 2024 • 16
ReplaceAnything3D: Text-Guided 3D Scene Editing with Compositional Neural Radiance Fields Paper • 2401.17895 • Published Jan 31, 2024 • 15
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning Paper • 2402.00769 • Published Feb 1, 2024 • 17
Boximator: Generating Rich and Controllable Motions for Video Synthesis Paper • 2402.01566 • Published Feb 2, 2024 • 25
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation Paper • 2402.05054 • Published Feb 7, 2024 • 24
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation Paper • 2402.04324 • Published Feb 6, 2024 • 22
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation Paper • 2402.10210 • Published Feb 15, 2024 • 28
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization Paper • 2402.09812 • Published Feb 15, 2024 • 11
GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting Paper • 2402.10259 • Published Feb 15, 2024 • 13
MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction Paper • 2402.12712 • Published Feb 20, 2024 • 13
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis Paper • 2402.14797 • Published Feb 22, 2024 • 18
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition Paper • 2402.15504 • Published Feb 23, 2024 • 19
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27, 2024 • 87
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model Paper • 2402.17412 • Published Feb 27, 2024 • 21
ViewFusion: Towards Multi-View Consistency via Interpolated Denoising Paper • 2402.18842 • Published Feb 29, 2024 • 13
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on Paper • 2403.01779 • Published Mar 4, 2024 • 25
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Paper • 2403.03206 • Published Mar 5, 2024 • 40
ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models Paper • 2403.02084 • Published Mar 4, 2024 • 11
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters Paper • 2403.02677 • Published Mar 5, 2024 • 16
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation Paper • 2403.04692 • Published Mar 7, 2024 • 35
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models Paper • 2403.05438 • Published Mar 8, 2024 • 14
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion Paper • 2403.05121 • Published Mar 8, 2024 • 16
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment Paper • 2403.05135 • Published Mar 8, 2024 • 39
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis Paper • 2403.08764 • Published Mar 13, 2024 • 34
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control Paper • 2403.09055 • Published Mar 14, 2024 • 23
SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion Paper • 2403.12008 • Published Mar 18, 2024 • 18
Generic 3D Diffusion Adapter Using Controlled Multi-View Editing Paper • 2403.12032 • Published Mar 18, 2024 • 14
LightIt: Illumination Modeling and Control for Diffusion Models Paper • 2403.10615 • Published Mar 15, 2024 • 14
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation Paper • 2403.12015 • Published Mar 18, 2024 • 60
GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation Paper • 2403.12365 • Published Mar 19, 2024 • 10
RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS Paper • 2403.13806 • Published Mar 20, 2024 • 18
AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks Paper • 2403.14468 • Published Mar 21, 2024 • 18
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition Paper • 2403.14148 • Published Mar 21, 2024 • 17
GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation Paper • 2403.14621 • Published Mar 21, 2024 • 14
FlashFace: Human Image Personalization with High-fidelity Identity Preservation Paper • 2403.17008 • Published Mar 25, 2024 • 18
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation Paper • 2403.16990 • Published Mar 25, 2024 • 24
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions Paper • 2403.16627 • Published Mar 25, 2024 • 20
Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction Paper • 2403.18795 • Published Mar 27, 2024 • 17
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion Paper • 2403.18818 • Published Mar 27, 2024 • 22
EgoLifter: Open-world 3D Segmentation for Egocentric Perception Paper • 2403.18118 • Published Mar 26, 2024 • 7
GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling Paper • 2403.19655 • Published Mar 28, 2024 • 14
Getting it Right: Improving Spatial Consistency in Text-to-Image Models Paper • 2404.01197 • Published Apr 1, 2024 • 29
FlexiDreamer: Single Image-to-3D Generation with FlexiCubes Paper • 2404.00987 • Published Apr 1, 2024 • 21
CameraCtrl: Enabling Camera Control for Text-to-Video Generation Paper • 2404.02101 • Published Apr 2, 2024 • 16
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3, 2024 • 59
On the Scalability of Diffusion-based Text-to-Image Generation Paper • 2404.02883 • Published Apr 3, 2024 • 17
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation Paper • 2404.02733 • Published Apr 3, 2024 • 19
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models Paper • 2404.02747 • Published Apr 3, 2024 • 11
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching Paper • 2404.03653 • Published Apr 4, 2024 • 28
Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition Paper • 2404.02514 • Published Apr 3, 2024 • 9
ByteEdit: Boost, Comply and Accelerate Generative Image Editing Paper • 2404.04860 • Published Apr 7, 2024 • 24
UniFL: Improve Stable Diffusion via Unified Feedback Learning Paper • 2404.05595 • Published Apr 8, 2024 • 22
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators Paper • 2404.05014 • Published Apr 7, 2024 • 22
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing Paper • 2404.05717 • Published Apr 8, 2024 • 23
BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion Paper • 2404.04544 • Published Apr 6, 2024 • 20
Magic-Boost: Boost 3D Generation with Multi-View Conditioned Diffusion Paper • 2404.06429 • Published Apr 9, 2024 • 6
DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting Paper • 2404.06903 • Published Apr 10, 2024 • 14
RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion Paper • 2404.07199 • Published Apr 10, 2024 • 22
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback Paper • 2404.07987 • Published Apr 11, 2024 • 46
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models Paper • 2404.07724 • Published Apr 11, 2024 • 10
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model Paper • 2404.09967 • Published Apr 15, 2024 • 20
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing Paper • 2404.09990 • Published Apr 15, 2024 • 11
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation Paper • 2404.13026 • Published Apr 19, 2024 • 21
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Paper • 2404.13686 • Published Apr 2024 • 26
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models Paper • 2404.14507 • Published Apr 2024 • 21
PuLID: Pure and Lightning ID Customization via Contrastive Alignment Paper • 2404.16022 • Published Apr 2024 • 16
Interactive3D: Create What You Want by Interactive 3D Generation Paper • 2404.16510 • Published Apr 2024 • 17
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings Paper • 2404.16820 • Published Apr 2024 • 15
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving Paper • 2404.16771 • Published Apr 2024 • 16
HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections Paper • 2404.16845 • Published Feb 14, 2024 • 5
Stylus: Automatic Adapter Selection for Diffusion Models Paper • 2404.18928 • Published Apr 2024 • 14
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation Paper • 2404.19427 • Published Apr 2024 • 65
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model Paper • 2404.19759 • Published Apr 2024 • 21
GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting Paper • 2404.19702 • Published Apr 2024 • 15
Paint by Inpaint: Learning to Add Image Objects by Removing Them First Paper • 2404.18212 • Published Apr 2024 • 19
Spectrally Pruned Gaussian Fields with Neural Compensation Paper • 2405.00676 • Published May 2024 • 8
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Paper • 2405.01434 • Published May 2024 • 44
Customizing Text-to-Image Models with a Single Image Pair Paper • 2405.01536 • Published May 2024 • 17
Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning Paper • 2405.08054 • Published May 2024 • 17
Compositional Text-to-Image Generation with Dense Blob Representations Paper • 2405.08246 • Published May 2024 • 11
CAT3D: Create Anything in 3D with Multi-View Diffusion Models Paper • 2405.10314 • Published May 2024 • 29
Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion Paper • 2405.09874 • Published May 2024 • 10