ddpo-alignment

This model was finetuned from Stable Diffusion v1-4 using DDPO and a reward function that uses LLaVA to measure prompt-image alignment. See the project website for more details.

The model was finetuned for 200 iterations with a batch size of 256 samples per iteration. During finetuning, we used prompts of the form "a(n) <animal> <activity>", with the animal and activity drawn from the lists below, so prompts built from those lists should give the best results. We also observed limited generalization to other prompts.

Activities:

  • washing dishes
  • playing chess
  • riding a bike

Animals:

  • cat
  • dog
  • horse
  • monkey
  • rabbit
  • zebra
  • spider
  • bird
  • sheep
  • deer
  • cow
  • goat
  • lion
  • tiger
  • bear
  • raccoon
  • fox
  • wolf
  • lizard
  • beetle
  • ant
  • butterfly
  • fish
  • shark
  • whale
  • dolphin
  • squirrel
  • mouse
  • rat
  • snake
  • turtle
  • frog
  • chicken
  • duck
  • goose
  • bee
  • pig
  • turkey
  • fly
  • llama
  • camel
  • bat
  • gorilla
  • hedgehog
  • kangaroo
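
As an illustration of the prompt format, the following is a minimal sketch that builds prompts from the two lists above. The helper names (`build_prompt`, `all_prompts`) are hypothetical and not part of any released code. The rule for choosing "a" versus "an" (based on whether the animal name starts with a vowel) is an assumption about how "a(n)" was resolved during finetuning.

```python
import itertools
import random

# Activities and animals copied from the lists in this model card.
ACTIVITIES = ["washing dishes", "playing chess", "riding a bike"]

ANIMALS = [
    "cat", "dog", "horse", "monkey", "rabbit", "zebra", "spider", "bird",
    "sheep", "deer", "cow", "goat", "lion", "tiger", "bear", "raccoon",
    "fox", "wolf", "lizard", "beetle", "ant", "butterfly", "fish", "shark",
    "whale", "dolphin", "squirrel", "mouse", "rat", "snake", "turtle",
    "frog", "chicken", "duck", "goose", "bee", "pig", "turkey", "fly",
    "llama", "camel", "bat", "gorilla", "hedgehog", "kangaroo",
]


def build_prompt(animal: str, activity: str) -> str:
    """Format a prompt as "a(n) <animal> <activity>".

    Assumption: the article is "an" when the animal name starts with a
    vowel letter and "a" otherwise.
    """
    article = "an" if animal[0].lower() in "aeiou" else "a"
    return f"{article} {animal} {activity}"


def all_prompts() -> list[str]:
    """Return every animal/activity combination as a prompt string."""
    return [build_prompt(a, act) for a, act in itertools.product(ANIMALS, ACTIVITIES)]


if __name__ == "__main__":
    # Print a few randomly chosen in-distribution prompts.
    for prompt in random.sample(all_prompts(), k=3):
        print(prompt)
```

Prompts produced this way stay within the combinations listed in this card. Any other prompt relies on the limited generalization noted above.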