Microsoft Introduces 3 Foundational AI Models To Take on OpenAI, Anthropic

By Devesh Beri
Publication Date: 2026-04-03 20:05:00

Images generated by MAI-Image-1.

Credit: Microsoft

On Thursday, Microsoft introduced three new foundational AI models (MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2) focused on transcription, audio, and image generation, respectively. The tech giant positions them as in-house systems that give it more control over cost, performance, and integration across its software and cloud services.

MAI-Transcribe-1 offers speech-to-text transcription in 25 languages. This could be used to create instant transcripts of Teams meetings or customer-facing phone calls. Microsoft describes MAI-Transcribe-1 as “lightning fast,” meaning it should produce captions or transcripts with very low latency. The company also reports that the model has a lower word error rate than GPT-Transcribe, Gemini 3.1 Flash, and other transcription-focused AI models.
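
For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn a model's transcript into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that standard definition only. It is not Microsoft's evaluation method; the function name and the simple lowercase, whitespace-based tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return word-level edit distance divided by the reference length.

    Illustrative only: real benchmarks usually apply more careful text
    normalization (punctuation, numerals, casing) before scoring.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of five reference words.
    print(word_error_rate("the meeting starts at noon",
                          "the meeting start at noon"))
```

A lower value means fewer mistakes per reference word. Because inserted words count as errors, the rate can exceed 1.0 when a transcript contains many extra words.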

MAI-Voice-1 is a…