By Devesh Beri
Publication Date: 2026-04-03 20:05:00
Images generated by MAI-Image-1.
Credit: Microsoft
On Thursday, Microsoft introduced three new foundational AI models—MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2—focused on transcription, audio, and image generation, respectively. The tech giant positions them as in-house systems that will provide it with better control over cost, performance, and integration across its software and cloud services.
MAI-Transcribe-1 offers text-to-speech transcription in 25 different languages. This could be used to create instant transcripts of Teams meetings or customer-facing phone calls. Microsoft describes MAI-Transcribe-1 as “lightning fast,” meaning it should produce captions or transcripts with very low latency. The brand also reports its model as having a lower word error rate than GPT-Transcribe, Gemini 3.1 Flash, and other transcription-focused AI models.
MAI-Voice-1 is a…