OpenAI has redesigned how users interact with ChatGPT Voice to make it more intuitive and natural. Users can now talk to the AI chatbot directly inside a conversation, without switching to a separate voice-only interface. The update is available on both mobile and web.
OpenAI confirmed the update through an official X post, “You can now use ChatGPT Voice right inside chat—no separate mode needed. You can speak, see responses appear, scroll back through previous messages, and view multimedia elements like images or maps in real time.”
The post further read, “Rolling out to all users on mobile and web. Just update your app.”
This integration lets users talk to ChatGPT while reading its responses, scrolling through earlier messages, and viewing visuals such as images, diagrams, or maps, all within the same chat window.
Previously, turning on voice mode took users to a separate full-screen interface that featured an animated blue circle and a mute button.
While this design worked, it needlessly cut voice interaction off from the main conversation. Users could hear the responses but could not see them without leaving the mode, which was frustrating when a reply was missed or misheard.
The new in-chat design removes this friction. ChatGPT now speaks its replies while the text appears on screen in real time. Users can scroll back, check shared media, or refer to an earlier prompt without interrupting the voice session. They can also switch smoothly between speaking and typing, and tapping ‘End’ exits the voice session.
The in-chat experience is now the default for voice on both the web and mobile apps. Users only need to update the app to get it.
Users who prefer the previous full-screen voice interface can still use it. To restore the original look, go to Settings → Voice Mode → Separate mode.
With this update, OpenAI is moving toward more natural, human-like conversations that combine speech, text, and visuals in a single, seamless interaction.