Stroke-based generation using SDEdit offers an interesting testing ground for our model's learned tendencies. In this setting, a color stroke layout (left) serves as the reference for generating an image according to the prompts "Fantasy landscape, trending on artstation" (top) and "High-resolution rendering of a crowded colorful sci-fi city" (bottom). DPO-SDXL generates much more visually compelling imagery than its initialization model, SDXL.
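For readers who want to try this setting themselves, below is a minimal sketch of SDEdit-style image-to-image generation with the Hugging Face diffusers SDXL pipeline, swapping in a DPO-tuned UNet. The checkpoint ID, file names, and strength value here are illustrative assumptions, not fixed choices from the paper.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionXLImg2ImgPipeline, UNet2DConditionModel

# Assumed Hub ID for DPO-SDXL weights; substitute whatever checkpoint you have.
unet = UNet2DConditionModel.from_pretrained(
    "mhdang/dpo-sdxl-text2image-v1", subfolder="unet", torch_dtype=torch.float16
)
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    unet=unet,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# SDEdit: start from a coarse color-stroke layout and partially re-noise it,
# letting the model fill in detail consistent with the prompt.
strokes = Image.open("color_layout.png").convert("RGB").resize((1024, 1024))
image = pipe(
    prompt="Fantasy landscape, trending on artstation",
    image=strokes,
    strength=0.8,  # fraction of the noise schedule to traverse; higher = freer
).images[0]
image.save("fantasy_landscape.png")
```

The `strength` parameter controls how much of the stroke layout survives: lower values stay faithful to the colors and composition, while higher values let the model reinterpret them more freely.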
The bottom line
Diffusion-DPO enables aligning diffusion models with human goals and values (a minimal sketch of its loss appears after this list).
This training process closes the performance gap between StableDiffusion-XL-1.0 and closed-source models such as Midjourney v5 and Emu.
Commonly criticized areas, such as person generation, emerge as clear improvements when training on human preferences.
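To make the first point concrete, here is a minimal PyTorch sketch of the Diffusion-DPO objective. It assumes you already have epsilon predictions from the trainable model and a frozen reference model on the preferred ("winning") and rejected ("losing") noised latents; the function name is illustrative, and the per-timestep weighting from the paper is folded into a single beta constant for brevity.

```python
import torch
import torch.nn.functional as F

def diffusion_dpo_loss(model_pred_w, model_pred_l,
                       ref_pred_w, ref_pred_l,
                       noise_w, noise_l,
                       beta=5000.0):
    """Simplified Diffusion-DPO loss over a batch of preference pairs.

    Inputs are epsilon predictions on the noised preferred / rejected
    latents from the trainable model and the frozen reference model,
    plus the true noise used to corrupt each latent.
    """
    # Squared denoising error, summed over all non-batch dimensions.
    def err(pred, eps):
        return ((pred - eps) ** 2).flatten(1).sum(dim=1)

    # Margin between the preferred and rejected samples, for the trainable
    # model and for the reference model.
    model_diff = err(model_pred_w, noise_w) - err(model_pred_l, noise_l)
    ref_diff = err(ref_pred_w, noise_w) - err(ref_pred_l, noise_l)

    # The preferred sample should be denoised better (relative to the
    # reference) than the rejected one; logsigmoid turns that margin
    # into the DPO objective.
    return -F.logsigmoid(-beta * (model_diff - ref_diff)).mean()
```

Minimizing this loss pushes the fine-tuned model to denoise the preferred sample better, relative to the reference model, than the rejected one, which is exactly the implicit-reward view of DPO carried over to the diffusion setting.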
Looking ahead, there are many directions preference optimization in diffusion models can take. The work presented here is still practically at proof-of-concept scale; we expect that scaling up training could improve models even further.
Furthermore, many varieties of preference and feedback can be used here. We only covered generic human preference, but experiments in our paper show that specific attributes, such as text faithfulness or visual appeal, can each be optimized for individually. That is before even considering more targeted objectives such as personalization. RLHF has been a huge and rapidly growing field in language models, and we are extremely excited both to continue developing these kinds of diffusion approaches and to see work from the broader research community.
The authors of the research paper discussed in this blog post are Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, and Nikhil Naik.