By Nicole Kobie
Publication Date: 2026-05-12 12:00:00
AI providers keep telling us to hand off work to agents, but research from Microsoft suggests that might not be a wise move.
In a pre-print paper, a trio of Microsoft researchers found that large language models (LLMs) corrupt documents over the course of long, multi-step workflows, resulting in data deletion and even hallucinated content.
Top-tier foundation models – including Gemini 3.1 Pro, Claude Opus 4.6, and GPT 5.4 – corrupted an average of 25% of a document's content during the research, while other models corrupted more than half.
“Delegation requires trust – the expectation that the LLM will faithfully execute the task without introducing errors into documents,” said researchers Philippe Laban, Tobias Schnabel, and Jennifer Neville.
“Our analysis shows that current LLMs are unreliable delegates: they introduce sparse but severe errors that silently corrupt documents, compounding over long interactions.”
The work comes amid growing challenges with extensive AI use, with…