OpenAI's DevDay introduces the Realtime API and other exciting features for AI app developers.

It's been a tumultuous week or so for OpenAI, with executive departures and significant developments in the world of fundraising, but the startup was all business Tuesday at its 2024 DevDay, where it sought to persuade developers to build tools with its AI models. The company announced several new tools, including a public beta of its "Realtime API," which makes it easier to build apps with low-latency, AI-generated voice responses. It's not quite ChatGPT's Advanced Voice Mode, but it's close.

Speaking in a press briefing ahead of the conference, OpenAI Chief Product Officer Kevin Weil said that the recent departures of chief technology officer Mira Murati and chief research officer Bob McGrew would not affect the company's progress.

"I'll start by saying Bob and Mira have been incredible leaders. I've learned a lot from them, and they are a huge part of getting us to where we are today," said Weil. "And also, we're not going to slow down."

As OpenAI undergoes its latest C-suite shuffle – a reminder of the chaos that followed last year's DevDay – the company is trying to persuade developers that it remains their best bet for building AI apps. The company touts more than 3 million developers building with its AI models, but it's operating in an increasingly crowded space.

In a statement, OpenAI said it has cut the cost of accessing its API by 99% over the last two years, though the cuts were likely prompted by constant undercutting from competitors like Meta and Google.

One of the new features, dubbed the Realtime API, gives developers the chance to build near-real-time, speech-to-speech experiences in their apps, with a choice of six voices provided by OpenAI. Those voices are distinct from the ones offered for ChatGPT, and developers can't use third-party voices, presumably to avoid copyright disputes. (The ambiguously Scarlett Johansson-based voice isn't available anywhere.)
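For developers curious what that looks like in practice, here's a minimal sketch of opening a Realtime API session over a WebSocket, based on the event shapes OpenAI documented at launch. The details are beta-stage and may change; the `websocket-client` package and the "alloy" voice chosen here are just one way to wire it up.

```python
import json
import os
from websocket import create_connection  # pip install websocket-client

# Connect to the beta Realtime API endpoint (model name per launch docs).
ws = create_connection(
    "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview",
    header=[
        f"Authorization: Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta: realtime=v1",
    ],
)

# Configure the session: pick one of the six OpenAI-provided voices.
ws.send(json.dumps({
    "type": "session.update",
    "session": {"voice": "alloy", "modalities": ["text", "audio"]},
}))

# Add a user turn, then ask the model to respond.
ws.send(json.dumps({
    "type": "conversation.item.create",
    "item": {
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": "Plan my weekend in London."}],
    },
}))
ws.send(json.dumps({"type": "response.create"}))

# Audio streams back incrementally as "response.audio.delta" events
# (base64-encoded chunks); "response.done" marks the end of the turn.
while True:
    event = json.loads(ws.recv())
    if event["type"] == "response.done":
        break
```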

During the briefing, OpenAI's head of developer experience, Romain Huet, shared a demo of a trip-planning app built on top of the Realtime API. Users could converse with an AI assistant about an upcoming trip to London and receive low-latency responses. The Realtime API also supports tool calling, which meant the app could annotate a map with restaurant locations as it answered.
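That map annotation works through the Realtime API's session-level tool definitions. The sketch below, reusing the `ws` connection from the example above, registers a hypothetical `annotate_map` function; the tool name and parameters are invented for illustration, while the event shape follows OpenAI's beta docs.

```python
import json

# Register a tool at the session level (reusing `ws` from the sketch above).
ws.send(json.dumps({
    "type": "session.update",
    "session": {
        "tools": [{
            "type": "function",
            "name": "annotate_map",  # hypothetical app-side function
            "description": "Drop a pin on the map for a place the assistant mentions.",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "latitude": {"type": "number"},
                    "longitude": {"type": "number"},
                },
                "required": ["name", "latitude", "longitude"],
            },
        }],
    },
}))
# When the model decides to call the tool, the app receives the function-call
# arguments as events, runs annotate_map locally, and sends the result back.
```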

In another demo, Huet showed how the Realtime API could conduct a phone call with a human to ask about ordering food for an event. Unlike Google's infamous Duplex, OpenAI's API can't call restaurants or shops directly, but it can integrate with calling APIs like Twilio to do so. Notably, OpenAI doesn't appear to be adding disclosures so that its AI models automatically identify themselves on calls like this, despite the fact that those AI-generated voices sound pretty realistic. For now, it seems to be up to developers to add that disclosure, something that may be required by a new law in California.
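Until the rules settle, one way a developer could bolt on such a disclosure is through the session's standing instructions. This is only a sketch, again reusing the `ws` connection from above, and the wording is entirely hypothetical:

```python
import json

# Prepend a standing instruction so the model self-identifies on calls
# (reusing `ws` from above; the phrasing is an invented example).
ws.send(json.dumps({
    "type": "session.update",
    "session": {
        "instructions": (
            "You are placing a phone call on behalf of a user. Begin every "
            "call by clearly stating that you are an AI assistant."
        ),
    },
}))
```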

Meanwhile, as part of its DevDay announcements, OpenAI introduced vision fine-tuning in its API, allowing developers to fine-tune GPT-4o with images as well as text. That should, in theory at least, improve GPT-4o's performance on tasks that require visual understanding. OpenAI's head of product for the API, Olivier Godement, told TechCrunch that developers "won't be able to upload copyrighted imagery, such as this picture of Donald Duck, images that depict violence, or any imagery that violates OpenAI's safety policies."
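Vision fine-tuning reuses the standard fine-tuning pipeline: training examples are JSONL chat transcripts whose user messages can carry image parts. The sketch below is illustrative; the file name, prompt, and image URL are hypothetical, while the upload and job-creation calls follow OpenAI's published Python SDK.

```python
import json
from openai import OpenAI

# One training example: a user message mixing text and an image, plus the
# ideal assistant answer. Prompt, URL, and file name are hypothetical.
example = {
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "What road sign is shown here?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/sign.jpg"}},
        ]},
        {"role": "assistant", "content": "A yield sign."},
    ]
}
with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")

# Upload the data and kick off the job with the standard fine-tuning API.
client = OpenAI()
training_file = client.files.create(
    file=open("train.jsonl", "rb"), purpose="fine-tune"
)
client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",  # GPT-4o snapshot that supports vision fine-tuning
)
```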

Some of DevDay's announcements find OpenAI racing to match what its model-licensing rivals already offer. The company's new prompt caching feature, for instance, closely resembles the one Anthropic launched a few months ago: it lets developers cache frequently used context between API calls, decreasing costs and improving latency. OpenAI says developers can save 50% with the feature, while Anthropic promises a 90% discount for it.
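Unlike Anthropic's version, OpenAI's caching is applied automatically to long, repeated prompt prefixes, so the main lever developers have is ordering: put the big static context first and the per-request content last. A rough sketch, with field names as documented at launch:

```python
from openai import OpenAI

client = OpenAI()
LONG_STATIC_CONTEXT = "..."  # e.g. a multi-thousand-token system prompt

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        # The static prefix is what becomes cacheable on repeat calls.
        {"role": "system", "content": LONG_STATIC_CONTEXT},
        # Per-request content goes last so it doesn't break the prefix.
        {"role": "user", "content": "Today's question goes here."},
    ],
)
# On a second, similar call the usage block should report cache hits.
print(resp.usage.prompt_tokens_details.cached_tokens)
```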

Finally, OpenAI rolled out a model distillation feature that lets developers use the outputs of large models like o1-preview and GPT-4o to fine-tune smaller models like GPT-4o mini. Smaller models are generally cheaper to run than larger ones, and distillation should make them more capable. As part of the feature, OpenAI is launching a beta evaluation tool so developers can measure how a fine-tune performs within OpenAI's API.
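The distillation workflow runs through stored completions: developers flag a large model's responses for storage, then use them as training data for a smaller model. A hedged sketch of the capture step, with a hypothetical metadata tag:

```python
from openai import OpenAI

client = OpenAI()

# Capture a "teacher" response from the large model. store=True persists
# the completion; the metadata tag (hypothetical) makes it easy to filter
# these examples later when assembling a distillation dataset.
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this support ticket: ..."}],
    store=True,
    metadata={"task": "distill-v1"},
)
# The stored completions can then be selected as training data for a
# fine-tune of a smaller model such as gpt-4o-mini, and the new beta
# evals tool can score the result against the larger model's outputs.
```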

Perhaps DevDay made bigger waves with what it didn't announce. There was, for instance, no news on the GPT Store, which was unveiled at last year's DevDay. OpenAI has reportedly been piloting a revenue-sharing program with some of the most popular creators of GPTs, but the company hasn't shared much since.

OpenAI also didn't release any new AI models at this year's DevDay. Developers waiting for OpenAI o1 (not the preview or mini version) or the company's video generation model, Sora, will have to wait a little longer.
