Pinterest is developing its own AI text-to-image generation system, and it takes a different approach from the generative tools in other apps.
According to a recent overview from the Pinterest Engineering team, Pinterest’s “Canvas” model is designed to generate background options for products while keeping the product shot as the main focus.
This requires additional training. Most text-to-image models learn to generate images from text descriptions by associating image captions with the visual content they describe. However, the captions on product shots often say nothing about the background, which prompted Pinterest’s team to devise a method for separating the background from the foreground, so that the tool can be guided with simple commands.
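To picture what separating foreground from background means in practice, here is a minimal sketch of how a binary product mask can keep the product pixels untouched while everything else is replaced. It is an illustration only: Pinterest has not published its code, the `composite` function and the tiny grayscale "images" are hypothetical, and a real inpainting model uses the mask inside the generation process rather than pasting pixels together afterwards.

```python
from typing import List

Image = List[List[int]]  # grayscale intensities 0-255, kept tiny for illustration
Mask = List[List[int]]   # 1 = product (foreground), 0 = background


def composite(product: Image, generated: Image, mask: Mask) -> Image:
    """Keep product pixels where the mask is 1; use generated pixels elsewhere."""
    if not (len(product) == len(generated) == len(mask)):
        raise ValueError("inputs must have the same height")
    out: Image = []
    for p_row, g_row, m_row in zip(product, generated, mask):
        if not (len(p_row) == len(g_row) == len(m_row)):
            raise ValueError("inputs must have the same width")
        out.append([p if m == 1 else g for p, g, m in zip(p_row, g_row, m_row)])
    return out


# Hypothetical 2x3 example: the product fills the two left columns.
product = [[200, 200, 10], [200, 200, 10]]
mask = [[1, 1, 0], [1, 1, 0]]
generated = [[50, 60, 70], [80, 90, 100]]  # stand-in for a generated background

result = composite(product, generated, mask)
```

In this toy case the two left columns of `result` still hold the original product values, and only the right column comes from the generated background.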
According to Pinterest:
“Training Pinterest Canvas gives us a strong base model that understands what objects look like, what their names are, and how they are typically arranged in scenes. However, our goal is to train models that can visualize or reimagine real ideas or products in new contexts.”
Conceptually, Pinterest aims to leverage its existing database of product images to identify common framing, placement, and background types, thereby improving the AI's ability to generate the backgrounds that users request.
It’s a complex approach, but Pinterest has developed a multi-stage system that can do this with a high level of accuracy:
“We use a segmentation model to create product masks by distinguishing the foreground from the background. Since existing text captions usually only describe the product and neglect the background—an essential factor for guiding the background inpainting process—we incorporate more comprehensive and detailed captions from a visual LLM. In this stage, we train a LoRA on all UNet layers for quick, parameter-efficient fine-tuning. Finally, we perform a brief fine-tuning on a curated set of highly engaged promoted product images to align the model with aesthetics that appeal to Pinners.”
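For readers unfamiliar with the LoRA step mentioned in the quote, the sketch below shows the general idea of low-rank adaptation: the original weight matrix stays frozen, and only two small matrices are trained, whose product is added to it. This is the standard LoRA formulation, not Pinterest's code. The matrix sizes, the rank, and the `alpha` scaling value are all assumptions chosen for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w: Matrix, b: Matrix, a: Matrix, alpha: float) -> Matrix:
    """Effective weight W + (alpha / rank) * B @ A, with W left frozen."""
    rank = len(a)  # A has shape (rank x d_in)
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Number of trained values in B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_in + d_out)


# Hypothetical 4x4 frozen layer with a rank-1 adapter.
w = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
a = [[0.5, -0.5, 0.5, -0.5]]            # 1 x 4
b_init = [[0.0], [0.0], [0.0], [0.0]]   # 4 x 1, zero so training starts from W
b_trained = [[1.0], [0.0], [0.0], [0.0]]  # made-up values after "training"

start = lora_weight(w, b_init, a, alpha=2.0)
adapted = lora_weight(w, b_trained, a, alpha=2.0)
```

The parameter saving is what makes this "parameter-efficient": for an assumed 1024 x 1024 layer, a rank-8 adapter trains `lora_trainable_params(1024, 1024, 8)` values, which is 16,384 instead of the full 1,048,576.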
Ultimately, the system is tailored to generate backgrounds based on existing Pin images while also aligning the model with specific visual styles to streamline the creation process.
As a result, brands will be able to describe any style they want using common descriptors, and Pinterest’s system will generate product shot options in that aesthetic.
It’s an intriguing concept that Pinterest is currently testing with selected advertising partners.
This could be an effective method for generating more variations of your Pin images and increasing your product's appeal through various design styles.