Thinking Machines says today’s AI conversations work like text chains: the model waits for you to finish, then replies. The company is developing a different approach in which the model processes your input and generates its response simultaneously, closer to a phone call. The goal is a faster, more natural back-and-forth that better matches how people actually talk.
Thinking Machines Lab has unveiled “interaction models,” a new kind of multimodal AI built to communicate in real time. Unlike systems that wait for separate inputs, these models process audio and visuals together, enabling continuous responses and sharply lower latency. The goal: make human-AI collaboration feel more natural, especially for time-sensitive enterprise and industrial use cases.
Thinking Machines is previewing “interaction models” meant to move AI beyond turn-based chat. Its system processes 200ms chunks in full duplex (listening, talking, and responding to visual cues at once) while a separate background model handles deeper reasoning. The company reports major gains on FD-bench benchmarks, but availability is initially limited to a research preview.
OpenAI has unveiled three new audio models for developers, aimed at making voice agents faster, smarter, and more interactive in real time. GPT-Realtime-2 tackles complex requests even when users interrupt. GPT-Realtime-Translate delivers live multilingual translation, while GPT-Realtime-Whisper provides instant speech-to-text for captions and notes. Early adopters include Zillow and Priceline.