PIONEER: Building the world's best AI voice-cloning company from a warzone | Alex Serdiuk
The CEO of Respeecher on winning an Emmy, making grown men cry with an AI-resurrection of Luke Skywalker, and being shelled by the Russian army
Hello all,
I’m releasing the next instalment of my PIONEER conversation series to mark the one-year anniversary of the Russian invasion of Ukraine later this week. In this episode, I speak to a founder running his company from a warzone.
🇺🇦Alex Serdiuk is the CEO of Respeecher, a Ukrainian company at the cutting edge of AI-powered voice cloning. The invasion hasn’t stopped Alex and his team from manifesting their vision of using AI to create the highest-quality voices, whilst testing the limits of what is possible with synthetically generated voice.
Voice cloning as an art form
It all started in 2018 when AI breakthroughs led Alex and his co-founders to ask if machine learning could be applied to make the highest-quality voices. Turns out that AI can do a heck of a lot when it comes to voice.
🎤 CLONING: First, AI has mastered the ability to ‘clone’ any human voice. Although that seems self-evident now, it’s truly astounding how quickly this even emerged as a possibility — let alone an inevitability. Even then, there are AI-generated voices — and there are AI-generated voices.
On one end of the spectrum, text-to-speech models generate slightly robotic AI voices from text input. This is highly scalable, mass-market stuff. On the other end of the spectrum, where Respeecher sits, speech-to-speech models turn voice cloning into a high art form. With a real human voice as the input data, the final outputs are immense: vocals that wash over you in an immersive, surround-sound experience, and you’d never know they are AI-generated.
🎤 SKINNING: After mastering the art of cloning, Alex and his team started exploring new capabilities to ‘skin’ voices. This means taking a voice and transforming it with a ‘skin’ into… almost anything.
An actor can deliver a performance in their normal voice, for example, and have it skinned in post-production to make it raspier, softer, harder. AI can even be used to make a voice speak in different accents, or, even more astonishing, in a different language, all while its unique attributes (tone, depth and tenor) are preserved. We all know how crap dubbing can ruin a movie experience, right? Well now, all vocal performances can not only be improved but localised too.
The ultimate kicker? All of these capabilities can be deployed — like so much of the Generative AI revolution — as a SaaS.
Augment or automate?
🎭 One of the recurrent trends in Generative AI — not least with voice synthesis — is the question of job automation. Will voice cloning cause performers to lose work?
Yes, synthetic voices will become a ‘thing’, but Alex is also categorical about the huge new opportunities for actors.
Even if off-the-shelf AI-generated voices end up in lower-quality content, the best content will require AI to work with humans. Because what AI cannot do, according to Alex, is deliver a performance. That’s where humans will have to do the heavy lifting. Respeecher’s toolkit will be applied to augment, rather than automate, a human performance.
Alex also argues (correctly, IMO) that voice synthesis will lead to new opportunities for performers to license and scale their IP (in this case, their vocals) into multiple projects and across markets. They’ll even be able to sign deals to license their IP posthumously.
Voice cloning in entertainment
No wonder Respeecher is making big waves in Hollywood.
🌙 It was part of a team that won an Emmy in 2020 for In the Event of Moon Disaster. An immersive art project, it tells an ‘alternative history’ of the Moon Landing. In an AI-manipulated film, Richard Nixon is resurrected in what looks like archive footage to read the real speech that was prepared for the President in the event of a moon disaster.
While the project was conceptualised as a public service announcement, an effort to inoculate the public against the use of hyperrealistic AI-generated media as a tool of misinformation, its flawless execution was an early example of how Generative AI will develop into applications for sublime storytelling.
Since then, Respeecher has gone from strength-to-strength.
🎥 They’ve worked with Lucasfilm to ‘de-age’ Luke Skywalker’s voice, something so powerful that it made ‘grown men cry.’ (I’m not a Star Wars person myself… but I totally believe it.) They also hit the news when James Earl Jones agreed to immortalise his voice with AI so he could return as Darth Vader in future TV projects.
Misuse of technology: A study on human nature
⚔️ Obviously, there are many ways in which AI voice cloning can go terribly wrong: mis- and disinformation, ‘vishing’, and non-consensual appropriation of identity at scale, to name a few. Just a few weeks ago, ElevenLabs, a new voice-cloning start-up, had to close its beta when it found that its tech was being misused to do things like make actor Emma Watson’s voice read ‘Mein Kampf’ on 4chan.
Alex and I discuss the weaponisation of these technologies at length. Like many other pioneers I speak to, I found him to be deeply thoughtful about mitigating the (inevitable) misuse of voice cloning.
🌎 We agreed that the ‘good’ and ‘bad’ applications of technology are ultimately a story about humanity itself. Until we get to AGI (and I don’t think we are there yet!) this is about how all aspects of human nature (the good, bad and ugly) are amplified by exponential technologies — perhaps none more so than AI.
I am sure this is a reflection we will encounter again and again.
In this episode, we cover:
Generative AI and the future of content creation
Voice-cloning applications: from Hollywood movie studios to game development, security and healthcare
Respeecher’s unique speech-to-speech vs text-to-speech approach
How will AI voice augment human performance vs automate it?
‘Performance’ and the limits of AI voice cloning
Voice synthesis as a medium to expand a performer’s IP
Balancing accessibility vs safety
Consent and cloning of voices
The weaponisation of voice cloning in ‘vishing,’ identity appropriation, and mis- and disinformation
Legal issues around voice cloning: IP, privacy, copyright
‘Watermarking’ or authenticating voice tracks
PS: I have loads of other interesting PIONEER interviews that I’ll be releasing soon. Watch this space.
For now — enjoy.
Nina