There seems to be huge scope for generative AI in the translation world, and one startup is taking it further than previously conceived: a hyperrealistic dubbing tool, built on GenAI video, that reproduces a person's original voice speaking the new language, while the video and the speaker's physical movements are automatically modified to match the new speech patterns.
Three years into stealth mode, the company is launching its first product, BodyTalk, alongside its first external funding of $9.5 million.
Panjaya was founded by two deep learning specialists, Hilik Shani and Ariel Shalom, who spent most of their professional lives quietly building deep learning technology for the Israeli government and who now serve, respectively, as the startup's general manager and CTO. They hung up their G-man hats in 2021 with the startup itch, and a little over 1.5 years ago they were joined by Guy Piekarz as CEO.
Piekarz isn't a founder at Panjaya, but an intriguing name to have on the list: In 2013, he sold Matcha, the startup he actually founded, to Apple. Matcha was an early, buzzy player in streaming video discovery and recommendation. It was acquired during what turned out to be the very early days of Apple's strategy around TV and streaming, when these were more rumors than actual products. The young, scrappy startup was bootstrapped and sold for a song: $10 million to $15 million, modest when compared to the significant push Apple has since made into streaming media. Piekarz stayed with Apple for nearly a decade, building Apple TV and then its sports vertical. He was then introduced to Panjaya through Viola Ventures, one of its backers (others include R-Squared Ventures, JFrog co-founder and CEO Shlomi Ben Haim, Chris Rice, Guy Schory, Ryan Floyd of Storm Ventures, Ali Behnam of Riviera Partners, and Oded Vardi).
"I was long gone from Apple by then and was going to do something completely different," Piekarz said. "But the demo blew my mind, and the rest is history."
BodyTalk is interesting for how it brings together several technologies that play on different aspects of synthetic media.
It starts with audio-based translation, which currently covers 29 languages.
Then the translation is voiced in an imitation of the original speaker, and that voice is laid over a version of the original video in which the speaker's lips and other movements have been altered to match the new words and phrasing. All of this is generated automatically once users upload their videos; users also get access to a dashboard with further editing tools. Future plans include an API, as well as getting closer to real-time processing. (Right now, BodyTalk is "near real-time," taking minutes to process videos, Piekarz said.)

"We're using best of breed where we need to," Piekarz said of the company's use of third-party large language models and other tools. "And we're building our own AI models where the market doesn't really have a solution."
An example of that is the company's lip syncing, he continued. "Our whole lip sync engine is homegrown by our AI research team, because we haven't found anything that gets to that level and quality of multiple speakers, angles, and all the business use cases we want to support."
For now, the company is B2B-focused; its clients include JFrog and the TED media organization. It plans to extend further into media, especially in areas like education, marketing, healthcare, and medicine.
The resulting translated videos are pretty uncanny, not unlike what you get with deepfakes, although Piekarz winces at that term, which has picked up negative connotations over the years that are the opposite of the market the startup is targeting.
"'Deepfake' is not something that we're interested in," he said. "We're looking to avoid that whole name." Instead, he said, think of Panjaya as part of the "deep real category."
The company is setting "guardrails" around the technology to prevent misuse, Piekarz added, by targeting only the B2B market and limiting who gets access to its tools. Further out, he believes, many more tools will be developed, including watermarking, that will help detect when a video has been altered to create synthetic media, both legitimate and nefarious. "We certainly want to be a part of that and not let misinformation" spread, he said.
The fine print
Panjaya's broader competitive set includes big names like Vimeo and ElevenLabs, as well as smaller players like Speechify and Synthesis. For all of them, improving dubbing can feel like swimming against a gale-force headwind: captions, after all, have become a standard part of how people consume video these days.
That's down to a myriad of reasons: poor speakers, background noise in our busy lives, mumbling actors, lower production budgets, more sound effects, and more. A CBS survey of American TV viewers found that more than half kept subtitles on "some (21%) or all (34%) of the time."
But some people love subtitles simply because it's fun to read them, and an entire cult has been built around that.
On social media and other apps, subtitles are baked right into the experience. TikTok, for example, started turning on captions by default for all videos in November 2023.
At the same time, there is a vast market for dubbed content that crosses borders, and even though English is often treated as the lingua franca of the internet, research from groups such as CSA has shown that content in native languages gets better engagement, especially in a B2B context. Panjaya's pitch is that more natural native-language content could perform better still.
Its customers seem to back that theory. TED has reported that Talks dubbed with Panjaya's tooling saw growth of 115%, with completion rates doubling on the translated videos.