Microsoft will soon let Teams users clone their voices so that their doppelgängers can speak for them in meetings in other languages.
At the Microsoft Ignite 2024 event on Tuesday, the company revealed Interpreter in Teams, a new Microsoft Teams tool that delivers "real-time, speech-to-speech" interpretation capabilities. Starting in early 2025, people using Teams for meetings will be able to use Interpreter to simulate their voices in up to nine languages: English, French, German, Italian, Japanese, Korean, Portuguese, Mandarin Chinese, and Spanish.
"Imagine being able to sound just like you in a different language," Microsoft CMO Jared Spataro wrote in a blog post shared with TechCrunch. "The Interpreter agent in Teams provides real-time speech-to-speech translation during meetings, and you can opt to have it simulate your speaking voice for a more personal and engaging experience."
Microsoft provided few concrete details in announcing the feature, which will be available only to Microsoft 365 subscribers. It did say, however, that the tool does not store any biometric data, does not add sentiment beyond what is "naturally present" in a voice, and can be disabled through Teams settings.
"Interpreter is designed to replicate the speaker's message as faithfully as possible without adding assumptions or extraneous information," a Microsoft spokesperson told TechCrunch. "Voice simulation can only be enabled when users provide consent via a notification during the meeting, or by enabling 'Voice simulation consent' in settings."
Microsoft is far from the first to build such tech: several firms have created tools that can digitally replicate voices with reasonably natural quality. Meta recently said it is testing a translation tool that can automatically translate voices in Instagram Reels into other languages, and ElevenLabs offers a robust platform for multilingual speech generation.
AI translations also tend to be less lexically rich than a human interpreter's, and AI translators often fail to capture cultural nuances, analogies, and colloquialisms. Still, the cost savings are tantalizing enough for some to consider the trade-off worthwhile. The market for natural language processing technologies, which includes translation, could be worth as much as $35.1 billion by 2026, according to research firm MarketsandMarkets.
AI clones raise security concerns, too.
Deepfakes have spread across social media like wildfire and are increasingly difficult to separate from reality. Already this year, deepfakes of President Joe Biden, Taylor Swift, and Vice President Kamala Harris have garnered millions of views and reshares. Deepfakes have also been used against individuals, for instance by posing as loved ones. Losses tied to impersonation scams topped $1 billion last year, according to the FTC.
Just this year, cybercriminals reportedly staged a Teams meeting with deepfaked C-level staff that was convincing enough to get the targeted company to wire $25 million to them.
In part because of the risks (and optics), OpenAI earlier this year decided against rolling out its voice cloning tech, Voice Engine.
From what's been revealed so far, Interpreter in Teams is a pretty narrow application of voice cloning. Still, that doesn't mean the tool will be safe from abuse. One can imagine a bad actor feeding Interpreter a misleading recording — for example, someone asking for bank account information — to get a translation in the language of their target.
Hopefully, we’ll get a better idea of the safeguards Microsoft will add around Interpreter in the months to come.