
"OpenAI released three new voice models in its API, broadening the range of surfaces where developers can plug GPT-class reasoning into live audio. The three are GPT-Realtime-2, a successor to the company's existing realtime voice model with what OpenAI describes as GPT-5-class reasoning; GPT-Realtime-Translate, a live translation model with more than 70 input and 13 output languages; and GPT-Realtime-Whisper, a streaming speech-to-text model built for low-latency transcription."
"What OpenAI is offering with GPT-Realtime-2 is a single model that handles audio in and audio out, with reasoning happening inside the audio loop rather than between transcription and synthesis steps. The release lands in the middle of a voice-AI build-out that the rest of the industry has spent the past year staffing for. Enterprises that have shipped voice agents have done so on a stack of stitched-together components: Whisper or Deepgram for transcription, ElevenLabs or Cartesia for text-to-speech, GPT-4 or Claude for the reasoning step, and bespoke turn-taking and barge-in logic in the middle."
"GPT-Realtime-2 picks up several capabilities that production voice teams have been simulating with prompt scaffolding. Preambles let an agent say "let me check that" while it calls tools, so users do not sit through silence. Parallel tool calls let the model fire multiple back-end requests simultaneously and narrate which one is in flight. Recovery behaviour catches failures and surfaces them rather than freezing the conversation."
OpenAI released three new voice models for its API: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. GPT-Realtime-2 is a successor realtime voice model that accepts audio input and produces audio output, with reasoning occurring inside the audio loop. GPT-Realtime-Translate performs live translation with more than 70 input languages and 13 output languages. GPT-Realtime-Whisper is a streaming speech-to-text model designed for low-latency transcription. The models aim to reduce the need for stitched-together voice stacks by combining reasoning, transcription, and translation capabilities. GPT-Realtime-2 also includes features such as preambles, parallel tool calls, and recovery behavior to keep conversations responsive and resilient.
Read at TNW | Openai
Unable to calculate read time
Collection
[
|
...
]