Apr 5, 2024

OpenAI's Voice Engine - What It Means for Cybersecurity


OpenAI's Voice Engine revolutionizes text-to-speech, raising cybersecurity and authenticity challenges


Ross Lazerowitz

Co-Founder and CEO

Last week, OpenAI put out a blog post outlining Voice Engine, a text-to-speech model that can produce natural, realistic speech from a 15-second voice sample. The technology has been used behind the scenes to power their text-to-speech API and ChatGPT Voice. Text-to-speech itself is not new; it has existed since the late 50s, but neural model development has accelerated in recent years, giving rise to natural-sounding, realistic voices. OpenAI demoed exciting use cases like language translation, voice recovery for people with neurological conditions, and communication for non-verbal people.
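
To make the distinction concrete: the publicly available piece of this stack is OpenAI's text-to-speech API, which exposes a handful of preset voices built on Voice Engine; the 15-second cloning capability itself is not public. Here is a minimal sketch of a call to that API using the official Python SDK (the input text and output filename are arbitrary):

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Preset voices only ("alloy", "nova", ...); there is no public parameter
# for cloning a voice from a 15-second sample.
response = client.audio.speech.create(
    model="tts-1",  # "tts-1-hd" trades latency for quality
    voice="alloy",
    input="Text-to-speech has come a long way since the late fifties.",
)

# The default response format is MP3.
with open("speech.mp3", "wb") as f:
    f.write(response.content)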


What's notable about the post is OpenAI's decision to slow down the rollout of this technology. Many companies, like ElevenLabs, Play.ht, and Rime Labs, already offer APIs for text-to-speech and voice cloning, but Voice Engine appears to produce a much more realistic voice.


What does this mean for cybersecurity?

While the blog post mainly focuses on use cases, it concludes with a warning about the potential societal effects of this technology and a few recommendations:


"Engine both underscores its potential and motivates the need to bolster societal resilience against the challenges brought by ever more convincing generative models. Specifically, we encourage steps like: - Phasing out voice based authentication as a security measure for accessing bank accounts and other sensitive information   - Exploring policies to protect the use of individuals' voices in AI - Educating the public in understanding the capabilities and limitations of AI technologies, including the possibility of deceptive AI content - Accelerating the development and adoption of techniques for tracking the origin of audiovisual content, so it's always clear when you're interacting with a real person or with an AI."

Let's read between the lines on some of these:


Voice-based authentication is dead

“Phasing out voice based authentication as a security measure for accessing bank accounts and other sensitive information”


There has been a lot of talk about deepfake audio detection. Pindrop, for example, whose main line of business is voice authentication systems, is leading the way here, offering a detection service and tracing the Biden robocall deepfake back to ElevenLabs. I infer from this recommendation that OpenAI does not believe detection services will be reliable going forward. Deepfake audio detection may go the way of the GPT text detector OpenAI hosted before GPT-4: after GPT-4's release, they took it down, stating that they no longer had a reliable way to detect text generated by their own model. For voice, OpenAI noted that they would include a watermark in generated audio, but there will be methods to defeat it, as well as steady improvements in open-source models that bad actors can run themselves.
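
To see why watermarking alone is a weak defense, consider a deliberately naive sketch (this is not OpenAI's scheme, which has not been published): the watermark is a faint tone near the top of the audio band, and a single low-pass filter strips it while leaving the speech mostly intact.

```python
import numpy as np
from scipy.signal import butter, sosfilt

SR = 16_000  # sample rate in Hz

def embed_watermark(audio: np.ndarray, freq: float = 7_500.0,
                    amp: float = 0.002) -> np.ndarray:
    """Hide a faint tone near the top of the band as a 'watermark'."""
    t = np.arange(len(audio)) / SR
    return audio + amp * np.sin(2 * np.pi * freq * t)

def detect_watermark(audio: np.ndarray, freq: float = 7_500.0,
                     amp: float = 0.002) -> bool:
    """Matched filter: correlate against the known watermark tone."""
    t = np.arange(len(audio)) / SR
    score = abs(np.dot(audio, np.sin(2 * np.pi * freq * t))) / len(audio)
    return score > amp / 4  # half the expected score for a marked clip

def lowpass(audio: np.ndarray, cutoff: float) -> np.ndarray:
    """The attacker's one-liner: filter out the band the mark lives in."""
    sos = butter(8, cutoff, btype="low", fs=SR, output="sos")
    return sosfilt(sos, audio)

# Band-limited noise standing in for two seconds of synthetic speech.
rng = np.random.default_rng(0)
speech = lowpass(0.1 * rng.standard_normal(2 * SR), cutoff=4_000.0)

marked = embed_watermark(speech)
print(detect_watermark(marked))                           # True: mark present
print(detect_watermark(lowpass(marked, cutoff=6_000.0)))  # False: mark stripped
```

Real watermarks are far more robust than this toy, spreading the mark across time and frequency, but re-encoding, resampling, and re-recording attacks remain an open problem; the cat-and-mouse dynamic is the same.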


The need for media traceability

"Accelerating the development and adoption of techniques for tracking the origin of audiovisual content so it's always clear when you're interacting with a real person or with an AI"


There’s a promising new standard for media traceability from the Coalition for Content Provenance and Authenticity, or C2PA. Backed by Google, Microsoft, Adobe, and other tech giants, it seeks to create an easy way to sign and verify content. It’s very early, but if the major social platforms and smartphone makers adopt it, with tight integration into Adobe's creative tools, it could make the web safer. I remain skeptical, since we’ve had PGP for email for 30+ years and most users still haven’t adopted it, but I’m hopeful.
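
C2PA's actual manifest format is built on X.509 certificate chains and COSE signatures, but the core mechanism is plain public-key signing. Here is a toy sketch of that mechanism using the Python cryptography library (Ed25519; key distribution and the manifest structure are omitted):

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The capture device or editing tool holds a signing key; verifiers hold the
# matching public key (C2PA distributes these via X.509 certificate chains).
signing_key = Ed25519PrivateKey.generate()
public_key = signing_key.public_key()

media = b"\x00" * 1024               # stand-in for an audio file's bytes
signature = signing_key.sign(media)  # detached signature over the content

# A platform checks provenance before showing a "content credentials" badge.
try:
    public_key.verify(signature, media)
    print("provenance intact")
except InvalidSignature:
    print("content was modified after signing")

# A single flipped byte breaks verification.
try:
    public_key.verify(signature, media + b"tamper")
except InvalidSignature:
    print("tampered copy rejected")
```

The hard part, as with PGP, isn't the cryptography; it's distributing keys and getting every camera, editing tool, and platform in the chain to participate.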


Education is the low-hanging fruit

"Educating the public in understanding the capabilities and limitations of AI technologies, including the possibility of deceptive AI content"


I couldn't agree more with this one. If you grew up trusting what you could see and hear, this technology is a massive change to your priors. Humans are the best anomaly detectors in the world; we need to get better at teaching them what this technology can and cannot do. At Mirage Security, we are trying to make that a reality. Reach out if you’d like to learn more.


Where it's all going

In summary, OpenAI's Voice Engine marks a significant advance in text-to-speech technology, offering incredible opportunities for accessibility and new challenges for cybersecurity. Its potential for creating realistic deepfakes demands a reevaluation of security measures, particularly voice-based authentication, and underscores the importance of methods for verifying the authenticity of digital content. Educating the public about AI's capabilities and limitations is crucial to navigating these changes. As we embrace these innovations, collaboration among tech companies, security professionals, and policymakers will be essential to capture the benefits while protecting against misuse.

Try Mirage

Learn how to protect your organization from spearphishing.

Free Vishing Simulation

Concerned about voice phishing? Get a free vishing simulation and speak directly with our founders.

© Copyright 2024, All Rights Reserved by ROSNIK Inc.
