It's raining diffusion personalization techniques☔️🎭🖼️

Community Article Published April 11, 2024

Recently, generating high-quality portraits from reference photos has become possible with as little as a single reference image & without any optimization⚡️

Figure taken from InstantID: Zero-shot Identity-Preserving Generation in Seconds

Using these new zero-shot methods, anyone can easily generate a self-portrait in their choice of style, composition, and background👩🏻‍🎨

Here are 3 zero-shot pipelines to know and try🚀

  1. Face-to-all
  2. InstantID
  3. IP Adapter FaceID

🎭IP Adapter FaceID🎭

IP Adapters consist of 2 core components:


  1. An image encoder that extracts image features from the reference image(s)
  2. Decoupled cross-attention layers for text features and image features: for each cross-attention layer in the original UNet, a new cross-attention layer is added to inject the image features.

💡To improve face fidelity, IP Adapter FaceID uses face ID embeddings instead of CLIP image embeddings, while IP Adapter FaceID Plus uses them in addition to CLIP embeddings.
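The decoupled cross-attention idea can be sketched in a few lines of pure Python. The UNet's queries attend to the text tokens and to the image tokens in two separate attention operations, and the image result is added with a weight. This is a toy illustration that assumes tiny hand-written keys and values. The function names are hypothetical, and in the real adapter the image keys and values come from learned projection layers inside the UNet.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention, softmax(Q K^T / sqrt(d)) V, one query row at a time."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def decoupled_cross_attention(queries, text_kv, image_kv, scale=1.0):
    """Attend to text and image tokens separately, then add the two results.

    text_kv and image_kv are (keys, values) pairs. The queries are shared; in
    IP Adapter the image keys/values come from newly added projection layers,
    which this sketch skips by taking them directly as inputs (assumption).
    """
    text_out = attention(queries, *text_kv)
    image_out = attention(queries, *image_kv)
    return [[t + scale * i for t, i in zip(t_row, i_row)]
            for t_row, i_row in zip(text_out, image_out)]


# Toy example: 2 query positions, 1 text token, 1 image token, feature dim 2.
queries = [[1.0, 0.0], [0.0, 1.0]]
text_kv = ([[1.0, 0.0]], [[1.0, 2.0]])
image_kv = ([[0.0, 1.0]], [[3.0, 4.0]])
print(decoupled_cross_attention(queries, text_kv, image_kv, scale=0.5))  # → [[2.5, 4.0], [2.5, 4.0]]
```

Setting `scale` to 0 recovers plain text-only cross-attention. Because the two branches are kept separate, the strength of the image condition can be tuned at inference time.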

🎭InstantID🎭


Similar to IP Adapter, InstantID uses ID embeddings and decoupled cross-attention, but it adds a new component: IdentityNet.

💡IdentityNet is an adapted ControlNet that encodes detailed features of the reference face with additional spatial control. It makes 2 main modifications to ControlNet:

❶ Instead of fine-grained OpenPose facial keypoints, only five facial keypoints (two for the eyes, one for the nose, and two for the mouth) are used as the conditional input.

❷ Text prompts are removed, and the ID embedding is used as the condition for the cross-attention layers in the ControlNet.
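To make IdentityNet's two changes concrete, here is a hypothetical sketch of the inputs it receives: a sparse spatial map built from the five keypoints, and the ID embedding in place of the text-prompt embeddings. The keypoint order, the square-marker rendering, and the names `keypoint_map` and `identitynet_conditions` are assumptions made for illustration. The actual pipeline renders its own visualization of the landmarks.

```python
# Hypothetical landmark order; the source only says two eyes, one nose, two mouth points.
KEYPOINT_NAMES = ["left_eye", "right_eye", "nose", "mouth_left", "mouth_right"]


def keypoint_map(keypoints, height, width, radius=1):
    """Rasterize five (x, y) facial keypoints into a height x width grid.

    Each keypoint becomes a filled square of side 2 * radius + 1 (clipped at
    the borders) whose value 1..5 identifies the landmark; background stays 0.
    """
    if len(keypoints) != len(KEYPOINT_NAMES):
        raise ValueError("expected exactly five keypoints")
    grid = [[0] * width for _ in range(height)]
    for label, (x, y) in enumerate(keypoints, start=1):
        for row in range(max(0, y - radius), min(height, y + radius + 1)):
            for col in range(max(0, x - radius), min(width, x + radius + 1)):
                grid[row][col] = label
    return grid


def identitynet_conditions(keypoints, id_embedding, height, width):
    """Hypothetical bundle of IdentityNet inputs.

    The spatial condition is the sparse five-keypoint map (instead of
    fine-grained OpenPose facial keypoints), and the cross-attention context
    is the ID embedding (instead of text-prompt embeddings).
    """
    return {
        "spatial_condition": keypoint_map(keypoints, height, width),
        "cross_attention_context": [list(id_embedding)],
    }


# Toy 8 x 8 "face": eyes, nose, and mouth corners as (x, y) pixel positions.
kps = [(2, 2), (6, 2), (4, 4), (3, 6), (5, 6)]
for row in keypoint_map(kps, 8, 8, radius=0):
    print("".join(str(v) if v else "." for v in row))
```

Printing the map for a toy face shows how sparse the spatial condition is compared with a dense landmark set.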

🎭Face-to-all🎭

A diffusers 🧨 workflow inspired by @fofr's Face-to-Many ComfyUI workflow🔥

This workflow extends the original InstantID pipeline & combines it with any SDXL LoRA:

  1. Adding the option to stylize with any SDXL style LoRA, which is especially useful for styles the base diffusion model doesn't know (browse the LoRA Studio for inspo ✨)
  2. Improving structure preservation, so the composition of the reference image is maintained.
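Stylizing with a LoRA comes down to adding a low-rank update to frozen weights. Below is a minimal pure-Python sketch of that standard per-layer update, W' = W + scale · BA. This is generic LoRA math, not code from the Face-to-all workflow. In diffusers, loading and merging are handled by the library (for example `load_lora_weights` and `fuse_lora`). The names `merge_lora`, `lora_down`, and `lora_up` are hypothetical.

```python
def matmul(a, b):
    """Plain nested-list matrix product: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(weight, lora_down, lora_up, scale=1.0):
    """Return W + scale * (B @ A) for one linear layer.

    weight (W): out x in, lora_down (A): r x in, lora_up (B): out x r.
    Any alpha / r factor is assumed to be folded into `scale`.
    """
    delta = matmul(lora_up, lora_down)  # out x in, rank at most r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]      # rank-1 "down" matrix, 1 x 2
B = [[1.0], [0.0]]    # rank-1 "up" matrix, 2 x 1
print(merge_lora(W, A, B, scale=0.5))  # → [[1.5, 1.0], [0.0, 1.0]]
```

Because B·A has rank at most r, a LoRA only needs to store two thin matrices per layer. This is why many different style LoRAs can be swapped on top of one SDXL base model.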