Post by **rochona** » Mon May 26, 2025 10:17 am

We further find that human evaluators prefer BootPIG over existing subject-driven generation methods. Here, we conduct two studies per comparison: one evaluating which method generates an image that best matches the prompt (prompt matching), and another evaluating which method generates an image that best maintains the key features of the subject (subject matching).

Impact and Ethical Considerations
The ability to personalize images according to a subject opens up many avenues for applications. From creating marketing images with your product to inpainting your pet into a family photo, the possibilities of personalized generation are endless. On top of that, generating these images without waiting through a finetuning process means that these applications are only a few keystrokes away.

While BootPIG enables many exciting applications, it is important to be mindful of the limitations of text-to-image models. Text-to-image models reflect the biases captured in their training data. As a result, harmful stereotypes and inappropriate content may be generated [9]. Additionally, subject-driven generation introduces the risk of misinformation. BootPIG relies on pretrained text-to-image models and thus may perpetuate these biases.

Conclusion and Future Directions
BootPIG allows users to generate any image with their desired subject while avoiding the hassle of finetuning the text-to-image model at inference. BootPIG pre-trains a general subject encoder, which can handle multiple reference images, using only synthetic data. The entire training process takes approximately 1 hour. Future work could explore the use of BootPIG for subject-driven inpainting or look to introduce additional visual controls (e.g., depth maps) into the model.

Additional Details

Senthil Purushwalkam is a Research Scientist at Salesforce AI. His research interests lie in Computer Vision and NLP.

Akash Gokul is an AI Resident at Salesforce Research. He is currently working on improving multimodal generative models.

Shafiq Rayhan Joty is Research Director at Salesforce Research AI, where he directs the NLP group’s work on large language models (LLMs) and generative AI. His research interests are in LLMs, multi-modal NLP and robust NLP.

Nikhil Naik is a Principal Researcher at Salesforce AI. His research interests are in computer vision, natural language processing, and their applications in sciences. His current work focuses on generative modeling and multimodal representation learning.