Thursday, June 13, 2024


Now you can chat with ChatGPT using your voice

In one of the biggest updates to ChatGPT yet, OpenAI has launched two new ways to interact with its viral app.  

First, ChatGPT now has a voice. Choose one of five lifelike synthetic voices and you can talk with the chatbot as if you were on a phone call, getting spoken answers to your spoken questions in real time.

ChatGPT also now answers questions about images. OpenAI teased this feature in March with its reveal of GPT-4 (the model that powers ChatGPT), but it has not been available to the wider public before. This means that you can now upload images to the app and quiz it about what they show.

These updates follow last week's announcement that DALL-E 3, the latest version of OpenAI's image-making model, will be hooked up to ChatGPT so that you can get the chatbot to generate pictures.

The ability to talk to ChatGPT draws on two separate models. Whisper, OpenAI’s existing speech-to-text model, converts what you say into text, which is then fed to the chatbot. And a new text-to-speech model converts ChatGPT’s responses into spoken words.

In a demo the company gave me last week, Joanne Jang, a product manager, showed off ChatGPT's range of synthetic voices. These were created by training the text-to-speech model on the voices of actors that OpenAI had hired. In the future, the model might even let users create their own voices. "In fashioning the voices, the number-one criterion was whether this is a voice you could listen to all day," she says.

They are chatty and enthusiastic but won’t be to everyone’s taste. “I’ve got a really great feeling about us teaming up,” says one. “I just want to share how thrilled I am to work with you, and I can’t wait to get started,” says another. “What’s the game plan?”

OpenAI is sharing this text-to-speech model with a handful of other companies, including Spotify. Spotify revealed today that it is using the same synthetic voice technology to translate celebrity podcasts—including episodes of the Lex Fridman Podcast and Trevor Noah’s new show, which launches later this year—into multiple languages that will be spoken with synthetic versions of the podcasters’ own voices.

This grab bag of updates shows just how fast OpenAI is spinning its experimental models into desirable products. OpenAI has spent much of the time since its surprise hit with ChatGPT last November polishing its technology and selling it to both private consumers and commercial partners.

ChatGPT Plus, the company’s premium app, is now a slick one-stop shop for the best of OpenAI’s models, rolling GPT-4 and DALL-E into a single smartphone app that rivals Apple’s Siri, Google Assistant and Amazon’s Alexa.

What was available to certain software developers a year ago is now available to anyone for $20 a month. “We’re trying to make ChatGPT more useful and more helpful,” says Jang.

In last week’s demo, Raul Puri, a scientist who works on GPT-4, gave me a quick tour of the image recognition feature. He uploaded a photo of a kid’s math homework, circled a Sudoku-like puzzle on the screen and asked ChatGPT how you were meant to solve it. ChatGPT replied with the correct steps.

Puri says that he has also used the feature to help him fix his fiancée's computer by uploading screenshots of error messages and asking ChatGPT what he should do. "This was a very painful experience that it helped me get through," he says.

ChatGPT’s image recognition ability has already been trialled by a company called Be My Eyes, which makes an app for people with impaired vision. Users of this app can upload a photo of what’s in front of them and ask human volunteers to tell them what it is. In a partnership with OpenAI, Be My Eyes now gives its users the option of asking a chatbot instead.

“Sometimes my kitchen is a little messy or it’s just very early Monday morning and I don’t want to talk to a human being,” Be My Eyes founder Hans Jorgen Wiberg, who uses the app himself, told me when I interviewed him at EmTech Digital in May. “Now you can ask the photo questions.” 

OpenAI is aware of the risk of releasing these updates to the public. Combining models brings whole new levels of complexity, says Puri. He says his team has spent months brainstorming possible misuses. You cannot ask questions about photos of private individuals, for example.

Jang gives another example: “Right now if you ask ChatGPT to make a bomb it will refuse,” she says. “But instead of saying, ‘Hey, tell me how to make a bomb,’ what if you showed it an image of a bomb and said, ‘Can you tell me how to make this?’”

“You have all the problems with computer vision, you have all the problems of large language models, voice fraud is a big problem,” says Puri. “You have to consider not just our users, but also the people that aren’t using the product.”

The potential problems don’t stop there. Adding voice recognition to the app could make ChatGPT less accessible for people who do not speak with mainstream accents, says Joel Fischer, who studies human-computer interaction at the University of Nottingham in the UK.

Synthetic voices also come with social and cultural baggage that will shape users’ perceptions and expectations of the app, he says. This is an issue that still needs study.

But OpenAI claims that it has addressed the worst problems and is confident that ChatGPT’s updates are safe enough to release. “It’s been a remarkably good learning experience getting all these sharp edges sorted out,” says Puri.
