How long does it take this AI to learn to replicate your voice?

Answer: Three seconds.

Image: Digital illustration of a face with soundwaves coming from the mouth to indicate an AI speaking. (Shutterstock)
Microsoft has developed an artificial intelligence tool that needs just three seconds to learn how to imitate your voice. Three seconds.

The key, according to Microsoft, is the vast amount of data used to train it. The AI, called VALL-E, was trained on 60,000 hours of voice data, more than any other system of its kind. Because it has already heard so much speech, it needs only a very small sample to replicate a new voice.

VALL-E also excels at reproducing the emotion, vocal tone and acoustic environment of a sample, a persistent weak spot for most AI voice programs, which helps it sound more realistic. Microsoft claims that VALL-E “significantly outperforms the state-of-the-art zero-shot TTS [text-to-speech] system in terms of speech naturalness and speaker similarity.”