Spatial Speech Translation consists of two AI models. The first divides the space surrounding the person wearing the headphones into small regions, then uses a neural network to search those regions for potential speakers and pinpoint their direction.
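
To make the region-by-region search concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the researchers' implementation: the article does not say how the regions are shaped or how many there are, so the equal angular sectors, the count of 36, the 0.5 threshold, and every function name here are hypothetical. The `toy_scorer` function stands in for the neural network, which in the real system would score each region from the headphone microphone audio.

```python
from typing import Callable, List, Tuple

# Hypothetical type: takes a sector's (start, end) angles in degrees and
# returns a confidence in [0, 1] that a speaker is inside that sector.
RegionScorer = Callable[[float, float], float]


def angular_regions(num_regions: int) -> List[Tuple[float, float]]:
    """Split the 360-degree circle around the listener into equal sectors."""
    width = 360.0 / num_regions
    return [(i * width, (i + 1) * width) for i in range(num_regions)]


def find_speaker_directions(
    score_region: RegionScorer,
    num_regions: int = 36,      # assumed value, not from the article
    threshold: float = 0.5,     # assumed value, not from the article
) -> List[float]:
    """Return the center angle of every sector the scorer flags as occupied."""
    directions = []
    for start, end in angular_regions(num_regions):
        if score_region(start, end) >= threshold:
            directions.append((start + end) / 2.0)
    return directions


def toy_scorer(start: float, end: float) -> float:
    """Stand-in for the neural network: pretends two speakers are present,
    one at 45 degrees and one at 200 degrees."""
    speaker_angles = (45.0, 200.0)
    return 1.0 if any(start <= a < end for a in speaker_angles) else 0.0


if __name__ == "__main__":
    print(find_speaker_directions(toy_scorer))
```

With 10-degree sectors, the sketch reports each speaker's direction as the center of the sector that contains them, so its precision is limited by the sector width. Smaller regions would give finer direction estimates at the cost of more regions to score.
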
The second model then translates the speakers’ words from French, German, or Spanish into English text using publicly available data sets. The same model extracts the unique characteristics and emotional tone of each speaker’s voice, such as…
Article Source
https://www.technologyreview.com/2025/05/09/1116215/a-new-ai-translation-system-for-headphones-clones-multiple-voices-simultaneously/