In February, Google temporarily disabled Gemini’s ability to generate images of people after users reported historical inaccuracies. For instance, when asked to depict “a Roman legion,” Gemini would show an anachronistic group of racially diverse soldiers, while “Zulu warriors” were rendered as stereotypically Black.
Google CEO Sundar Pichai apologized, and Demis Hassabis, co-founder of DeepMind, said a fix would arrive “in very short order,” within a couple of weeks. It took much longer than that, despite some employees working 120-hour weeks. In the coming days, Gemini will once again be able to generate images of people, but only for certain users: those enrolled in one of Google’s paid Gemini plans (Gemini Advanced, Business, or Enterprise) will regain the feature as part of an early access test limited to English-language users.
Google has not provided a timeline for when this feature will be available for the free Gemini tier or in other languages. A Google spokesperson stated, “Gemini Advanced gives our users priority access to our latest features. This helps us gather valuable feedback while delivering a highly anticipated feature first to our premium subscribers.”
Regarding the fixes for the people generation feature, Google claims that Imagen 3, the latest image-generating model in Gemini, includes measures to ensure that the images it produces are more “fair.” For example, Imagen 3 was trained using AI-generated captions designed to improve the variety and diversity of concepts in its training data. Additionally, the training data was filtered for “safety” and reviewed with fairness considerations, according to Google.
When asked for more details about Imagen 3’s training data, a spokesperson said only that it was trained on “a large dataset comprising images, text, and associated annotations.” They added, “We’ve significantly reduced the potential for undesirable responses through extensive internal and external red-teaming testing, collaborating with independent experts to ensure ongoing improvement. Our focus has been on rigorously testing people generation before turning it back on.”
On a positive note, all Gemini users will receive Imagen 3 within the week, although those not subscribed to a premium tier will not have access to people generation. Google claims that Imagen 3 understands text prompts better than its predecessor, Imagen 2, and is more “creative and detailed” in its generations. The company also says the model produces fewer artifacts and errors and is the best Imagen model to date for rendering text.
To address concerns about deepfakes, Imagen 3 will use SynthID, a method developed by DeepMind that applies invisible, cryptographic watermarks to various forms of AI-generated media. Google had previously announced that Imagen 3 would use SynthID, so this isn't particularly surprising. Still, the contrast between how Google is handling image generation in Gemini and in its other products, such as Pixel Studio, is curious.
In addition to Imagen 3, Google is introducing Gems for Gemini, though only for Gemini Advanced, Business, and Enterprise users. Similar to OpenAI’s GPTs, Gems are customized versions of Gemini that can act as “experts” on specific topics, such as vegetarian cooking.
Google describes Gems in a blog post: “With Gems, you can create a team of experts to help you think through a challenging project, brainstorm ideas for an upcoming event, or write the perfect caption for a social media post. Your Gem can also remember a detailed set of instructions to help you save time on tedious, repetitive, or difficult tasks.”
To create a Gem, users simply write instructions, assign a name, and get started.
Gems are accessible on desktop and mobile in 150 countries and “most languages,” according to Google, although they are not yet supported in Gemini Live. At launch, several examples are available, including a “learning coach,” a “career guide,” a “brainstormer,” and a “coding partner.”
When asked whether Google plans to let users publish and use other users’ Gems, akin to OpenAI's GPT Store, the answer amounted to “no.”
A Google spokesperson stated, “Right now, we’re focused on learning how people will use Gems for creativity and productivity. Nothing further to share at this time.”