ChatGPT’s Advanced Voice Mode is uncanny


OpenAI’s new ChatGPT Advanced Voice Mode, now available to a small group of users in an “alpha” test, gives the popular chatbot a more realistic, human-like conversational voice. It can be accessed through the official ChatGPT app for iOS and Android.

However, just days after the first alpha testers got access, people are already sharing videos of ChatGPT Advanced Voice Mode on social media. The clips show it making remarkably expressive noises, mimicking Looney Tunes characters, and counting so quickly that it runs out of “breath,” just as a human would.

Here are a few of the most intriguing examples that early alpha users on X have shared.

Language instruction and translation

Several users on X pointed out that ChatGPT Advanced Voice Mode can offer interactive instruction tailored to someone trying to learn or practice another language, suggesting that the popular language-learning app Duolingo may be in jeopardy.

OpenAI’s new GPT-4o model, which also powers Advanced Voice Mode, is the company’s first natively multimodal large model. Unlike GPT-4, which relied on separate domain-specific OpenAI models, GPT-4o was designed to handle vision and audio inputs and outputs directly, without handing off to other specialized models for those media.

As a result, if the user grants ChatGPT access to their phone’s camera, Advanced Voice Mode can talk about what it sees. Manuel Sainsily, a mixed reality design instructor at McGill University, shared an example in which Advanced Voice Mode used this feature to translate screens from a Japanese copy of Pokémon Yellow running on a Game Boy Advance SP:
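Advanced Voice Mode itself lives in the ChatGPT app, but GPT-4o’s underlying vision capability is also exposed through OpenAI’s standard chat completions API. As a rough sketch of how a screen translation like Sainsily’s could be requested programmatically (the model name, prompt wording, and helper function here are illustrative assumptions, not OpenAI’s documented recipe):

```python
import base64

def build_translation_request(image_bytes: bytes, target_language: str = "English") -> dict:
    """Build a chat-completions payload asking GPT-4o to translate on-screen text.

    Returns keyword arguments for client.chat.completions.create(...).
    The model name and prompt wording are illustrative assumptions.
    """
    # Images are passed inline as a base64-encoded data URL.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                # A multimodal message mixes text and image content parts.
                "content": [
                    {"type": "text",
                     "text": f"Translate any on-screen text in this image into {target_language}."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }
        ],
    }

# Actually sending the request requires the openai client and an API key:
#   from openai import OpenAI
#   client = OpenAI()
#   response = client.chat.completions.create(**build_translation_request(png_bytes))
#   print(response.choices[0].message.content)
```

The voice side of Advanced Voice Mode has no equivalent public API endpoint as of this writing, which is part of what makes the app-only alpha notable.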

Humanlike utterances

Italian-American AI writer Cristiano Giardina has shared multiple tests of the new ChatGPT Advanced Voice Mode on his blog, including a widely circulated demonstration in which he asks it to count to 50 increasingly quickly. It obeys, pausing only toward the very end to catch its breath.

Giardina later clarified in a post on X that ChatGPT’s Advanced Voice Mode has simply learned natural speech patterns, which include pauses for breath, and that the transcript of the counting experiment contained no actual breaths.

As demonstrated in the YouTube video below, ChatGPT Advanced Voice Mode can even mimic applause and clearing its throat.

Beatboxing

In a video that he uploaded to X, startup CEO Ethan Sutin demonstrated how he was able to get ChatGPT Advanced Voice Mode to beatbox convincingly and fluently like a human.

Audio storytelling and roleplaying

If the user asks ChatGPT to “play along” and sets up a fictional scenario, such as traveling back in time to Ancient Rome, it can also roleplay (the SFW sort), as demonstrated in a video uploaded to X by Ethan Mollick, a professor at the University of Pennsylvania’s Wharton School of Business:

In this example, originally posted on Reddit and re-shared on X, the user asks ChatGPT Advanced Voice Mode to tell a story. It does so complete with AI-generated sound effects, such as footsteps and thunder.

In addition, it is capable of mimicking the voice of an intercom:

Mimicking and reproducing distinct accents

Giardina demonstrated how ChatGPT Advanced Voice Mode can imitate numerous regional British accents:

…as well as impersonate a soccer commentator:

Sutin demonstrated its ability to mimic a variety of regional American accents, including Southern Californian, Maine, Boston, and Minnesotan/Midwestern.

And it can imitate fictional characters, too…

Finally, Giardina showed that ChatGPT Advanced Voice Mode can mimic the speech patterns of many fictional characters, and can recognize and distinguish among them:

So what are the practical benefits of this mode? Beyond engaging demonstrations and experiments, will it make ChatGPT more useful or attract a broader audience? And will it lead to an increase in audio-based scams?

As this technology becomes more widely available, it could revolutionize fields such as language learning, audio content creation, and accessibility services. However, it also raises potential concerns about voice imitation and the creation of misleading audio content. As OpenAI continues to refine and expand access to Advanced Voice Mode, it will be crucial to monitor its impact on various industries and its potential societal implications.