This week in AI, synthetic data rose to prominence.
OpenAI last Thursday introduced Canvas, a new interface for its chatbot platform ChatGPT. Canvas opens a window with a workspace for writing and coding projects. Users can generate text or code in the Canvas workspace, then highlight sections, much as they would in Word, to have ChatGPT edit them.
From the user's perspective, Canvas is a big quality-of-life improvement. The most interesting part of the feature to us, however, is the fine-tuned model powering it. OpenAI said it used a version of GPT-4o "tailored with synthetic data to enable new user interactions in Canvas."
ChatGPT head of product Nick Turley wrote on X that OpenAI applied novel synthetic data generation techniques, including distilling outputs from its o1-preview model, to fine-tune GPT-4o to open canvas, make targeted edits, and leave high-quality inline comments. That approach, he said, let the team train the model faster than usual and unlock new user interactions without relying on human-generated data.
Worth noting, too, is that OpenAI isn't the only Big Tech company turning to synthetic data to train models.
For Movie Gen, its suite of AI-powered tools for creating and editing video clips, Meta relied in part on synthetic captions generated by an offshoot of its Llama 3 models. The company brought in a team of human annotators to fix errors in and add greater detail to those captions, but much of the heavy lifting was automated.
OpenAI CEO Sam Altman has argued that AI will one day produce synthetic data good enough to train itself effectively. That would be a boon for companies like OpenAI, which spends a fortune on human annotators and data licenses.
Meta fine-tuned the Llama 3 models themselves using synthetic data. And OpenAI is said to be sourcing synthetic training data from o1 for its next-generation model, code-named Orion.
On the other hand, leaning on a synthetic-data-first approach poses risks. As a researcher recently pointed out to me, the models used to generate synthetic data hallucinate (that is, they make things up) and carry biases and limitations of their own. Those flaws show up in the data they produce.
Using synthetic data safely, therefore, requires thorough curation and filtering, just as with any human-created dataset. Failing to do so can lead to model collapse, in which a model becomes less "creative" and more biased in its outputs, eventually seriously compromising its functionality.
This isn't easy at scale. But with real-world training data becoming more expensive (not to mention increasingly difficult to obtain), AI vendors may see synthetic data as their only viable path forward. Let's hope they're careful.
News
Ads in AI Overviews: Google says it will shortly start showing ads in AI Overviews, the AI-generated summaries it returns for some Google Search queries.
Google Lens, now with video: Lens, Google's visual search app, has been upgraded to answer near-real-time questions about your surroundings. You can capture a video via Lens and ask questions about objects of interest in it. (Ads are probably coming for this, too.)
From Sora to DeepMind: Tim Brooks, a lead on OpenAI's video generator Sora, is leaving to join rival Google DeepMind. "Starting a new adventure @GoogleDeepMind," Brooks posted on X. "I'll be working on video generation technologies and world simulators."
Fluxing it up: Black Forest Labs, the Andreessen Horowitz-backed startup behind the image generation component of xAI's Grok assistant, has launched an API in beta and released a new model.
Not so transparent: California's just-passed AB-2013 bill requires companies developing generative AI systems to publish a high-level summary of the data that they used to train their systems. So far, few companies are willing to say whether they'll comply. The law gives them until January 2026.
Research paper of the week
Apple researchers have been working on so-called computational photography for years, and depth mapping is an important part of that process. Originally this was done with stereoscopy or a dedicated depth sensor like a lidar unit, but those tend to be expensive, complex, and take up valuable internal real estate. Doing it strictly in software is preferable in many ways. That's what this paper, Depth Pro, is all about.
Aleksei Bochkovskii et al. share a method for zero-shot monocular depth estimation with high detail. In other words, it uses a single camera, doesn't need to be trained on specific things (say, it works on a camel despite never having seen one), and catches even difficult aspects like tufts of hair. It's probably being used on iPhones today, though likely in an optimized, custom-built version, but if you're feeling adventurous, you can try a little depth estimation yourself using the code on this GitHub page.
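If you're curious what running it looks like, here's a minimal sketch based on the usage example in the repository's README; the function names (depth_pro.create_model_and_transforms, depth_pro.load_rgb) are taken from that README and may change between releases, so treat this as illustrative rather than definitive.

```python
# Minimal sketch: running Depth Pro on a single image.
# Assumes the apple/ml-depth-pro package is installed and the pretrained
# checkpoint has been downloaded as described in the repo's README.
import depth_pro

# Build the model and its preprocessing transform, then switch to eval mode.
model, transform = depth_pro.create_model_and_transforms()
model.eval()

# Load an RGB image; f_px is the focal length in pixels, if the file's
# EXIF data provides one (it can also be passed in explicitly).
image, _, f_px = depth_pro.load_rgb("example.jpg")
image = transform(image)

# Run inference: the result includes a metric depth map (in meters)
# and the model's own focal-length estimate.
prediction = model.infer(image, f_px=f_px)
depth = prediction["depth"]                    # depth map in meters
focallength_px = prediction["focallength_px"]  # estimated focal length
```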
Model of the week
Google has released a new model in its Gemini family, Gemini 1.5 Flash-8B, which it claims is among its most performant.
A "distilled" version of Gemini 1.5 Flash, already optimized for speed and efficiency, Gemini 1.5 Flash-8B costs 50% less to use, has lower latency, and comes with 2x higher rate limits in AI Studio, Google's AI-focused developer environment.
Flash-8B nearly matches the performance of the 1.5 Flash model, which debuted in May, on most benchmarks, Google writes in a blog post. "Our models continue to be informed by developer feedback and our own testing of what is possible."
Google says Gemini 1.5 Flash-8B is a good fit for chat, transcription and translation, and for just about any other task that's "simple" and "high-volume." The model is also free to use through Google's Gemini API, albeit with a rate limit of 4,000 requests per minute.
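For a sense of what calling it looks like, here's a minimal sketch assuming the google-generativeai Python SDK and an API key from AI Studio; the model identifier string and the prompt are illustrative, not prescriptive.

```python
# Minimal sketch: calling Gemini 1.5 Flash-8B through the Gemini API,
# assuming the google-generativeai Python SDK is installed.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; use your AI Studio key

# A "simple, high-volume" task like short translation is the kind of
# workload Google suggests for this model.
model = genai.GenerativeModel("gemini-1.5-flash-8b")
response = model.generate_content("Translate to French: The weather is nice today.")
print(response.text)
```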
Grab bag
Anthropic has released its Message Batches API, which lets devs process large numbers of AI model queries asynchronously at a lower cost.
Similar to batch requests for Google's Gemini API, Anthropic's Message Batches API lets devs send batches of up to 10,000 queries at a time. Each batch is processed within a 24-hour window and costs 50% less than standard API calls.
Anthropic says the Message Batches API is well suited for broad tasks such as analyzing large datasets, classifying them at scale, and running model evaluations. Analyzing an entire corporate document repository, which might run to millions of files, becomes economically feasible thanks to the batching discount, the company notes in a post.
The Message Batches API is now in public beta and supports Anthropic's models, including Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku.
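As a rough illustration of the flow, here's a minimal sketch assuming the Anthropic Python SDK's batches interface; the exact method path and field names can vary by SDK version (during the beta the methods may live under client.beta.messages.batches), so check the docs before relying on it.

```python
# Minimal sketch: submitting a batch of queries with Anthropic's
# Message Batches API via the Python SDK (shape assumed from the docs).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Each request carries a custom_id so results can be matched back
# once the batch finishes processing (within the 24-hour window).
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"doc-{i}",
            "params": {
                "model": "claude-3-5-sonnet-20240620",
                "max_tokens": 256,
                "messages": [
                    {"role": "user", "content": f"Classify document {i}: ..."}
                ],
            },
        }
        for i in range(3)  # up to 10,000 queries per batch
    ]
)

# Poll the batch later to retrieve results once processing has ended.
print(batch.id, batch.processing_status)
```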