ElevenLabs, an AI voice cloning and text-to-speech API provider, on Monday rolled out the ability to build conversational AI bots in its developer platform.
According to the company, users can now build full conversational agents and customize variables such as tone of voice and response length.
Until now, ElevenLabs has focused mainly on creating voices and AI tools for text-to-speech services. Sam Sklar, head of growth at the company, told TechCrunch that many of its clients were already using the platform to create conversational AI agents, but the toughest parts were integrating the knowledge base and handling interruptions from customers. That's why the company decided to build a full pipeline for conversational bots.
Users can log into their ElevenLabs account and start building a conversational agent by choosing a template or creating a new project. They can pick the agent's primary language, first message, and system prompt to define the agent's persona. Developers also need to pick a large language model (Gemini, GPT, or Claude) and set the temperature of responses, which determines how creative the responses should be, as well as a token usage limit.
They may also fine-tune other parameters such as voice, latency, stability, authentication criteria, and the maximum number of conversations with the AI agent.
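The settings described above can be pictured as a single configuration object. The sketch below is purely illustrative: the `AgentConfig` class, its field names, its defaults, and the 0-to-1 temperature range are all assumptions made for this example and are not taken from the ElevenLabs SDK. It only shows how such options might be grouped and sanity-checked before use.

```python
from dataclasses import dataclass


# Hypothetical container for the options mentioned in the article.
# None of these names or defaults come from the ElevenLabs SDK.
@dataclass
class AgentConfig:
    language: str               # agent's primary language, e.g. "en"
    first_message: str          # what the agent says first
    system_prompt: str          # defines the agent's persona
    llm: str                    # model family: "gemini", "gpt", or "claude"
    temperature: float = 0.5    # higher values allow more creative responses
    max_tokens: int = 512       # token usage limit (assumed name and default)
    voice: str = "default"      # assumed placeholder voice name
    max_conversations: int = 100

    def validate(self) -> None:
        """Raise ValueError if any setting is outside the assumed bounds."""
        if self.llm not in {"gemini", "gpt", "claude"}:
            raise ValueError(f"unsupported LLM: {self.llm}")
        if not 0.0 <= self.temperature <= 1.0:  # range is an assumption
            raise ValueError("temperature must be between 0 and 1")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")


config = AgentConfig(
    language="en",
    first_message="Hi, how can I help you today?",
    system_prompt="You are a friendly support agent.",
    llm="claude",
    temperature=0.3,
)
config.validate()  # passes silently for these values
```

Keeping validation in one place means a misconfigured agent fails early, before any call is placed.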
Users can add their own knowledge base, in the form of a file, URL, or even a text block, to power the conversational bot. In addition, they can integrate their own custom LLM with the bot. The ElevenLabs SDK is compatible with Python, JavaScript, React, and Swift. The company also offers a WebSocket API for more customization.
Companies can also define criteria to collect some data items — for example, the names and email addresses of customers speaking to the agent — together with evaluation criteria in natural language to define success or failure of the call.
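One way to picture this is as two lists attached to an agent: the data items to collect and the natural-language criteria used to judge a call. The sketch below is a hypothetical model only. The `DataItem`, `EvaluationCriterion`, and `CallReport` classes are invented for illustration and do not reflect how ElevenLabs stores or evaluates this information; in particular, the pass/fail results are assumed to be supplied by some external evaluator.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    name: str          # key under which the value is stored, e.g. "email"
    description: str   # what the agent should collect


@dataclass
class EvaluationCriterion:
    name: str
    prompt: str        # natural-language definition of success


@dataclass
class CallReport:
    # Values gathered during the call, keyed by DataItem.name.
    collected: dict[str, str] = field(default_factory=dict)
    # Pass/fail per criterion, assumed to come from an external evaluator.
    results: dict[str, bool] = field(default_factory=dict)

    def missing(self, items: list[DataItem]) -> list[str]:
        """Names of requested data items that were not collected."""
        return [item.name for item in items if item.name not in self.collected]

    def succeeded(self, criteria: list[EvaluationCriterion]) -> bool:
        """A call succeeds only if every criterion was marked as passed."""
        return all(self.results.get(c.name, False) for c in criteria)


items = [
    DataItem("name", "The customer's full name"),
    DataItem("email", "The customer's email address"),
]
criteria = [
    EvaluationCriterion("resolved", "The customer's question was answered."),
]

report = CallReport(
    collected={"name": "Ada Lovelace"},
    results={"resolved": True},
)
print(report.missing(items))       # ['email']
print(report.succeeded(criteria))  # True
```

Treating an unevaluated criterion as a failure is a conservative choice made for this sketch: a call is counted as successful only when every criterion has been explicitly marked as passed.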
ElevenLabs is leveraging its existing pipeline for the text-to-speech part, but it had to develop speech-to-text capabilities for the new conversational AI product. The company is not offering its speech-to-text API as a stand-alone product for now, but it might do so in the future, which would make it a competitor to Google’s, Microsoft’s, and Amazon’s speech-to-text APIs, as well as specialized offerings such as OpenAI’s Whisper, AssemblyAI, Deepgram, Speechmatics, and Gladia.
The company, which is aiming to raise new funding at a valuation north of $3 billion, also competes with other voice AI startups, such as Vapi and Retell, which are also building conversational agents. More notably, the company will rival OpenAI’s real-time conversational API. However, ElevenLabs believes that its customization options and the ability to switch models will give it an edge over OpenAI.