After a series of controversies caused by technical hiccups and licensing changes, AI startup Stability AI announced its latest family of image-generation models.
According to the company, the Stable Diffusion 3.5 series is more customizable and versatile than Stability's previous-generation tech, as well as more performant. In total, there are three models:
Stable Diffusion 3.5 Large: At 8 billion parameters, it's the most powerful of the three and can generate images at resolutions up to 1 megapixel. (Parameter count roughly correlates with problem-solving capability; models with more parameters generally perform better.)
Stable Diffusion 3.5 Large Turbo: A Turbo version of Stable Diffusion 3.5 Large that generates images significantly faster, at a slight cost to quality.
Stable Diffusion 3.5 Medium: A version optimized to run on edge devices like smartphones and laptops, capable of generating images at resolutions from 0.25 megapixel to 2 megapixels.
Stable Diffusion 3.5 Large and 3.5 Large Turbo are available today, while Stable Diffusion 3.5 Medium will arrive on October 29.
Stability claims that the Stable Diffusion 3.5 models should produce more "diverse" outputs, meaning images that feature people with varied skin tones and features, without requiring "extensive" prompting.
"During training, multiple versions of prompts, with shorter prompts prioritized, caption every image, ensuring a broader and more diverse distribution of image concepts for any given text description, Stability chief technology officer Hanno Basse explains in an interview with TechCrunch. "Unlike most generative AI companies, we train on a wide variety of data, including filtered publicly available datasets and synthetic data."
Some companies have kludgily built these sorts of "diversifying" features into image generators in the past, prompting outcries on social media. An older version of Google's Gemini chatbot, for instance, would display an anachronistic group of figures for historical prompts such as "a Roman legion" or "U.S. senators." Google had to pause image generation of people for nearly six months as it developed a fix.
With luck, Stability's approach is wiser than others'. Unfortunately, we can't offer any impressions, as Stability didn't provide early access.
Stability's previous flagship image generator, Stable Diffusion 3 Medium, was broadly panned for its odd artifacts and poor prompt adherence. The firm warns that the Stable Diffusion 3.5 models may have similar prompting issues, and it lays the blame on engineering and architectural trade-offs. But Stability also says the models are more robust than their predecessors at producing images in a host of different styles, including 3D art.
"Greater variation in outputs from the same prompt with different seeds may occur, which is intentional as it helps preserve a broader knowledge-base and diverse styles in the base models," Stability wrote in a blog post shared with TechCrunch. "However as a result, prompts lacking specificity might lead to increased uncertainty in the output, and the aesthetic level may vary."
One thing that hasn't changed with the new models is Stability's licensing.
As with previous Stability models, the Stable Diffusion 3.5 series is free to use for "non-commercial" purposes, including research. Businesses with less than $1 million in annual revenue can also commercialize the models at no cost. Organizations with more than $1 million in revenue have to enter into a contract with Stability for an enterprise license.
Stability caused a scandal this summer over its restrictive fine-tuning terms, which gave, or at least seemed to give, the company the right to charge fees for models trained on images from its image generators. The company backtracked following the backlash, allowing for more liberal commercial use. Stability reasserted today that users own the media they generate with Stability models.
"We want creators to distribute and monetize their work all along the pipeline," said Ana Guillén, VP of marketing and communications at Stability, via email. "As long as they share a copy of our community license with the users of those creations and give 'Powered by Stability AI' visibility on websites, user interfaces, blog posts, About pages, or product documentation."
Stable Diffusion 3.5 Large and Stable Diffusion 3.5 Large Turbo can be self-hosted or used through Stability's API and third-party platforms including Hugging Face, Fireworks, Replicate, and ComfyUI. Stability says it expects to release ControlNets for the models, which enable fine-tuning, in the coming days.
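For those self-hosting, the workflow looks roughly like any other Hugging Face diffusion model. Below is a minimal sketch using the diffusers library's StableDiffusion3Pipeline; the checkpoint ID, step count, and guidance scale are assumptions based on Stability's Hugging Face listing, not official documentation, so adjust them for your hardware and use case.

```python
# Sketch: generating an image with Stable Diffusion 3.5 Large via the diffusers library.
# Assumes the "stabilityai/stable-diffusion-3.5-large" checkpoint and a CUDA-capable GPU.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    torch_dtype=torch.bfloat16,  # reduced precision to fit the 8-billion-parameter model in memory
)
pipe = pipe.to("cuda")

image = pipe(
    prompt="a photograph of a lighthouse at dusk, 35mm film",
    num_inference_steps=28,  # the Turbo variant is built to need far fewer steps
    guidance_scale=3.5,
).images[0]

image.save("lighthouse.png")
```

The same pipeline class covers the Large Turbo checkpoint, which trades a lower step count for speed, consistent with Stability's framing of Turbo as the faster, slightly lower-quality option.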
Stability's models, like most AI models, are trained on public web data — some of which may be copyrighted or under a restrictive license. Stability and many other AI vendors argue that the fair-use doctrine shields them from copyright claims. But that hasn't stopped data owners from filing a growing number of class action lawsuits.
Stability leaves it to customers to defend themselves against copyright claims, and offers no payout carve-out should they be found liable.
The company does permit data owners to request that their data be removed from its training datasets. As of March 2023, artists had removed 80 million images from Stable Diffusion's training data, the company says.
Asked about safety measures around misinformation ahead of the U.S. general elections, Stability said that it "has taken—and continues to take—reasonable steps to prevent the misuse of Stable Diffusion by bad actors." The startup declined to give specific technical details about those steps, however.
As of March, Stability only explicitly prohibited the production of "deceptive" content using its generative AI tools; it did not prohibit content that could influence elections, hurt election integrity, or feature politicians and public figures.