In 2025, generative AI has evolved beyond text generation into multi-modal use cases ranging from audio transcription and translation to voice agents that require real-time data streaming. Today's applications demand something more: continuous, real-time dialogue between users and models, where data flows both ways simultaneously over a single persistent connection. Consider a speech-to-text use case: you need to stream audio as input while receiving the transcribed text back as a continuous stream. Such use cases require bidirectional streaming.
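To make the pattern concrete, here is a minimal, self-contained sketch of full-duplex behavior using Python's `asyncio`. It is not the SageMaker API: the queues stand in for the persistent connection, and `mock_transcriber` is a hypothetical stand-in for the model. The point it illustrates is that the sender keeps pushing audio chunks while the receiver collects transcripts concurrently, with neither side blocking on the other.

```python
import asyncio

async def send_audio(inbound: asyncio.Queue, chunks):
    # Stream audio chunks toward the model without waiting for replies.
    for chunk in chunks:
        await inbound.put(chunk)
        await asyncio.sleep(0)  # yield so the other tasks can run concurrently
    await inbound.put(None)  # signal end of the audio stream

async def mock_transcriber(inbound: asyncio.Queue, outbound: asyncio.Queue):
    # Hypothetical model stand-in: emits a partial transcript per audio chunk.
    while (chunk := await inbound.get()) is not None:
        await outbound.put(f"transcript:{chunk}")
    await outbound.put(None)  # propagate end-of-stream to the receiver

async def receive_transcripts(outbound: asyncio.Queue):
    # Collect transcripts as they arrive, while audio is still being sent.
    results = []
    while (text := await outbound.get()) is not None:
        results.append(text)
    return results

async def main():
    inbound, outbound = asyncio.Queue(), asyncio.Queue()
    chunks = ["a0", "a1", "a2"]
    # All three tasks run concurrently: send, transcribe, and receive.
    _, _, transcripts = await asyncio.gather(
        send_audio(inbound, chunks),
        mock_transcriber(inbound, outbound),
        receive_transcripts(outbound),
    )
    return transcripts

print(asyncio.run(main()))  # → ['transcript:a0', 'transcript:a1', 'transcript:a2']
```

A production client would replace the queues with a single persistent HTTP/2 or WebSocket connection to the endpoint, but the concurrency shape (one task writing, one task reading, over the same connection) is the same.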
We're introducing bidirectional streaming for Amazon SageMaker AI Inference, which transforms inference from a transactional exchange into a continuous conversation. Speech-based applications work best when conversations flow naturally, without interruptions. With bidirectional streaming, speech-to-text becomes immediate: the model listens and transcribes at the same time, so…