ChatGPT’s Advanced Voice Mode is uncanny


OpenAI’s new ChatGPT Advanced Voice Mode, now available to a small group of users in an “alpha” test, gives the popular chatbot a more realistic, human-like conversational voice. It can be accessed through the official ChatGPT app for iOS and Android.

However, just days after the first alpha testers got access, people are already sharing videos of ChatGPT Advanced Voice Mode on social media. The clips show it making remarkably expressive noises, mimicking Looney Tunes characters, and counting so quickly that it runs out of “breath,” just as a human would.

Here are a few of the most intriguing examples that early alpha users on X have shared.

Language instruction and translation

Several users on X pointed out that ChatGPT Advanced Voice Mode can offer interactive instruction tailored to someone trying to learn or practice another language, suggesting that the popular language-learning app Duolingo may be in jeopardy.

OpenAI’s new GPT-4o model, which also powers Advanced Voice Mode, is the company’s first natively multimodal large model. Unlike GPT-4, which relied on separate domain-specific OpenAI models, GPT-4o was designed to handle vision and audio inputs and outputs directly, without handing off to other specialized models for those media.

As a result, if the user grants ChatGPT access to their phone’s camera, Advanced Voice Mode can talk about what it sees. Manuel Sainsily, a mixed reality design instructor at McGill University, shared an example in which Advanced Voice Mode used this feature to translate screens from a Japanese copy of Pokémon Yellow running on a Game Boy Advance SP:
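Advanced Voice Mode itself lives in the ChatGPT app, but GPT-4o’s underlying vision capability is also exposed through OpenAI’s standard chat completions API. As a rough sketch of how a screen translation like Sainsily’s could be requested programmatically (the model name, prompt wording, and helper function here are illustrative assumptions, not OpenAI’s documented recipe):

```python
import base64

def build_translation_request(image_bytes: bytes, target_language: str = "English") -> dict:
    """Build a chat-completions payload asking GPT-4o to translate on-screen text.

    Returns keyword arguments for client.chat.completions.create(...).
    The model name and prompt wording are illustrative assumptions.
    """
    # Images are passed inline as a base64-encoded data URL.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                # A multimodal message mixes text and image content parts.
                "content": [
                    {"type": "text",
                     "text": f"Translate any on-screen text in this image into {target_language}."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }
        ],
    }

# Actually sending the request requires the openai client and an API key:
#   from openai import OpenAI
#   client = OpenAI()
#   response = client.chat.completions.create(**build_translation_request(png_bytes))
#   print(response.choices[0].message.content)
```

The voice side of Advanced Voice Mode has no equivalent public API endpoint as of this writing, which is part of what makes the app-only alpha notable.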

Humanlike utterances

Italian-American AI writer Cristiano Giardina has shared multiple tests of the new ChatGPT Advanced Voice Mode on his blog, including a widely circulated demonstration in which he asks it to count to 50 increasingly quickly. It obeys, pausing only toward the very end to catch its breath.

Giardina later clarified in a post on X that ChatGPT’s Advanced Voice Mode has simply learned natural speech patterns, which include pauses for breath, and that the transcript of the counting experiment contained no actual breaths.

As demonstrated in the YouTube video below, ChatGPT Advanced Voice Mode can even mimic applause and clearing its throat.

Beatboxing

In a video that he uploaded to X, startup CEO Ethan Sutin demonstrated how he was able to get ChatGPT Advanced Voice Mode to beatbox convincingly and fluently like a human.

Audio storytelling and roleplaying

If the user asks ChatGPT to “play along” and sets up a fictional scenario, such as traveling back in time to Ancient Rome, it can also roleplay (the SFW sort), as demonstrated in a video uploaded to X by Ethan Mollick, a professor at the University of Pennsylvania’s Wharton School of Business:

In this example, originally posted on Reddit and re-shared on X, the user asks ChatGPT Advanced Voice Mode to tell a story. It does so complete with AI-generated sound effects, such as footsteps and thunder.

In addition, it is capable of mimicking the voice of an intercom:

Mimicking and reproducing distinct accents

Giardina demonstrated how ChatGPT Advanced Voice Mode can imitate numerous regional British accents:

…as well as impersonate a soccer commentator:

Sutin demonstrated its ability to mimic a variety of regional American accents, including Southern Californian, Maine, Boston, and Minnesotan/Midwestern.

And it can imitate fictional characters, too…

Finally, Giardina showed that ChatGPT Advanced Voice Mode can mimic the speech patterns of many fictional characters, and can recognize and distinguish among them:

So what are the practical benefits of this mode? Beyond engaging demonstrations and experiments, will it make ChatGPT more useful or attract a broader audience? And will it lead to an increase in audio-based scams?

As this technology becomes more widely available, it could revolutionize fields such as language learning, audio content creation, and accessibility services. However, it also raises potential concerns about voice imitation and the creation of misleading audio content. As OpenAI continues to refine and expand access to Advanced Voice Mode, it will be crucial to monitor its impact on various industries and its potential societal implications.