OpenAI has launched three new audio models through its Realtime API, marking a significant push to enhance the intelligence and multilingual capabilities of voice-powered applications. The three models, GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper, collectively address reasoning, translation, and transcription in live voice interactions.
Overview
GPT-Realtime-2 is the flagship release, described by OpenAI as its most intelligent voice model yet, featuring GPT-5-class reasoning capabilities. The model has a 128,000-token context window, quadrupling the 32,000-token limit of its predecessor, GPT-Realtime-1.5. It supports variable reasoning levels from minimal to high, making it versatile for a range of applications. On the Big Bench audio benchmark, GPT-Realtime-2 scored roughly 15 percent higher than GPT-Realtime-1.5.
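As a rough illustration of what selecting a reasoning level might look like, here is a minimal sketch of building a Realtime API session-configuration event. The model name comes from the announcement, but the event shape and the "reasoning_effort" field name are hypothetical placeholders, not documented parameters.

```python
import json

# Hypothetical sketch: configuring a Realtime API session for GPT-Realtime-2.
# The "session.update" event shape and "reasoning_effort" field name are
# assumptions for illustration only.
def build_session_update(reasoning_effort: str = "minimal") -> str:
    """Build a session-configuration event selecting a reasoning level."""
    allowed = {"minimal", "low", "medium", "high"}
    if reasoning_effort not in allowed:
        raise ValueError(f"unknown reasoning effort: {reasoning_effort}")
    event = {
        "type": "session.update",
        "session": {
            "model": "gpt-realtime-2",
            "reasoning_effort": reasoning_effort,  # assumed parameter name
            "modalities": ["audio", "text"],
        },
    }
    return json.dumps(event)


# A voice agent doing simple command routing might pick "minimal" for speed,
# while a support agent reasoning over a long call could request "high".
payload = build_session_update("high")
```

The design point the sketch captures is that the reasoning level is a per-session knob: the same model serves both latency-sensitive and reasoning-heavy workloads.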
Translation and Transcription Capabilities
GPT-Realtime-Translate handles live speech translation from over 70 input languages into 13 output languages, keeping pace with the speaker in real time. This capability is crucial for global communication, enabling applications to serve diverse user bases effectively. GPT-Realtime-Whisper provides streaming speech-to-text transcription with controllable latency, allowing developers to trade speed against accuracy based on their application's needs.
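The two capabilities above might be wired up as session configurations like the following sketch. The model names are from the announcement; every field name ("target_language", "latency_mode") and the set of latency modes are hypothetical placeholders used only to illustrate the speed-versus-accuracy knob.

```python
# Hedged sketch: one session config for live translation, one for streaming
# transcription. Field names below are assumptions, not documented parameters.
def translate_session(target_language: str) -> dict:
    """Ask gpt-realtime-translate to translate incoming speech into one target."""
    return {
        "type": "session.update",
        "session": {
            "model": "gpt-realtime-translate",
            "target_language": target_language,  # assumed field name
        },
    }


def transcribe_session(latency_mode: str = "balanced") -> dict:
    """Trade latency against accuracy for gpt-realtime-whisper transcription."""
    if latency_mode not in {"low", "balanced", "accurate"}:  # assumed modes
        raise ValueError(f"unknown latency mode: {latency_mode}")
    return {
        "type": "session.update",
        "session": {
            "model": "gpt-realtime-whisper",
            "latency_mode": latency_mode,  # assumed field name
        },
    }
```

A live-captioning app would likely favor a low-latency mode and tolerate occasional corrections, while a compliance-recording pipeline would favor accuracy.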
Pricing and Adoption
Pricing for GPT-Realtime-2 starts at $32 per million audio input tokens. GPT-Realtime-Translate is priced at $0.034 per minute, while GPT-Realtime-Whisper costs $0.017 per minute. Several companies have already participated in early testing, with Zillow reporting a 26-point improvement in call success rates using GPT-Realtime-2 and BolnaAI noting a 12.5 percent reduction in word error rates when evaluating GPT-Realtime-Translate for Hindi, Tamil, and Telugu.
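Using the per-unit prices quoted above, a quick back-of-envelope cost estimate looks like this (the rates come straight from the article; the helper names are just for illustration):

```python
# Per-unit prices as quoted in the announcement.
REALTIME_2_INPUT_PER_M_TOKENS = 32.00  # USD per 1M audio input tokens
TRANSLATE_PER_MINUTE = 0.034           # USD per minute
WHISPER_PER_MINUTE = 0.017             # USD per minute


def realtime2_input_cost(audio_input_tokens: int) -> float:
    """Audio-input cost for GPT-Realtime-2 at the quoted per-token rate."""
    return audio_input_tokens / 1_000_000 * REALTIME_2_INPUT_PER_M_TOKENS


def per_minute_cost(minutes: float, rate: float) -> float:
    """Cost of a metered-by-the-minute model (Translate or Whisper)."""
    return minutes * rate


# Examples: 500,000 audio input tokens of GPT-Realtime-2 cost $16.00;
# a 60-minute translated call costs about $2.04, and transcribing the
# same hour with GPT-Realtime-Whisper costs about $1.02.
print(realtime2_input_cost(500_000))
print(per_minute_cost(60, TRANSLATE_PER_MINUTE))
```

Note the different metering units: the flagship model is billed per audio token, while the translation and transcription models are billed per minute, so comparing them requires an estimate of how many audio tokens a minute of speech produces.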
The Realtime API includes safety protocols, such as real-time classifiers that can terminate conversations violating content standards, and supports EU data residency requirements. All three models are available immediately through OpenAI's Realtime API, giving developers a toolset for building more sophisticated and engaging voice-powered applications.
In practical terms, these advancements mean that developers can now build applications that not only understand voice commands more accurately but also respond in a more human-like manner, thanks to the advanced reasoning capabilities of GPT-Realtime-2. The translation and transcription capabilities of GPT-Realtime-Translate and GPT-Realtime-Whisper further expand the potential reach of these applications, making them accessible to a broader, global audience.
As the technology continues to evolve, it will be interesting to see how these models are integrated into various sectors, from customer service and education to entertainment and beyond. The potential for more natural and effective voice interactions is vast, and OpenAI's latest releases are significant steps towards realizing this potential.