OpenAI Unveils Voice Engine: A Revolutionary AI Voice Cloning Technology

Divya Bisht4 weeks ago

0 38 2 minutes read

OpenAI, the renowned artificial intelligence research organization, has recently unveiled Voice Engine, a groundbreaking voice cloning technology capable of emulating any speaker through analysis of a mere 15-second audio sample.

Promising “natural-sounding speech” with emotive and realistic voices, the ChatGPT-maker’s new model is built upon the foundation of OpenAI’s existing text-to-speech API and has been in development since 2022. The company has already integrated a version of this toolset to power preset voices within their current text-to-speech API and Read Aloud feature, showcasing samples on their official blog that closely resemble authentic voices.

OpenAI envisions numerous beneficial applications for Voice Engine, such as aiding in reading assistance, language translation, and assisting individuals with speech impairments. However, the company is also mindful of the potential misuse of the technology.

The spectre of deepfake manipulation looms large, prompting concerns regarding privacy and ethical implications. Consequently, OpenAI asserts that Voice Engine is not yet ready for widespread implementation, citing the imperative to address serious privacy concerns before proceeding with a full-scale rollout.

Acknowledging the significant risks associated with this technology, particularly during an election year, OpenAI underscores its commitment to soliciting feedback from various stakeholders spanning government, media, entertainment, education, and civil society. All preview testers have consented to adhere to OpenAI’s usage policies, which prohibit the impersonation of individuals without their explicit consent or legal authorization.

Moreover, users of Voice Engine must disclose to their audience that the voices being utilized are AI-generated. OpenAI has implemented additional safety measures, including watermarking to trace the origin of audio and proactive monitoring of system usage. Upon official release, a “no-go voice list” will be implemented to detect and prevent the use of AI-generated speakers bearing resemblance to prominent figures.

The launch of OpenAI’s Sora and now Voice Engine, come awfully close to one of the most hotly contested elections in the US. Naturally, concerns for the misuse of these new AI tools among political analysts and technology pundits run high.

As for pricing and availability, OpenAI has remained tight-lipped. Having said that potential pricing data suggests Voice Engine may undercut competitors in the market. Speculations indicate a cost of $15 per one million characters, roughly equivalent to 162,500 words, positioning Voice Engine as a cost-effective solution for audiobook production.

Additionally, OpenAI hints at an “HD” version with double the cost, although specifics regarding its functionality remain undisclosed.

In parallel to this announcement, OpenAI has forged another significant partnership with Microsoft to develop an AI-based supercomputer dubbed “Stargate,” with projected costs reaching $100 billion, as reported by The Information. These recent endeavours underscore OpenAI’s continued commitment to pioneering advancements in artificial intelligence and collaborative innovation with industry leaders.