Measuring the autonomy of AI agents in practice

By @AnthropicAI
Publication Date: 2026-02-18 12:00:00

AI agents are here, and they are already being used in contexts that vary widely, from email triage to cyber espionage. Understanding this spectrum is critical to the safe use of AI. Yet we know surprisingly little about how people actually use agents in the real world.

We analyzed millions of human-agent interactions in both Claude Code and our public API using our privacy-preserving analysis tooling, and asked: How much autonomy do humans give agents? How does this change as people gain experience? In which domains do agents operate? And are the agents’ actions risky?

This is what we found out:

  • Claude Code works autonomously for longer. Among the longest-running sessions, the time before Claude Code stopped and handed control back to the user nearly doubled in three months, from under 25 minutes to over 45 minutes. This increase holds across all model versions, suggesting it is not driven solely by improved capabilities: users are also trusting existing models to do more…
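The tail statistic behind this finding can be sketched in a few lines: take per-session autonomous run times, group them by model version, and compare a high percentile (the "longest-running sessions"). The record layout, field names, and values below are illustrative assumptions, not the article's actual data or pipeline.

```python
from collections import defaultdict

# Hypothetical per-session records: minutes an agent ran autonomously
# before the user stepped in, tagged by model version.
# Values are illustrative only.
sessions = [
    {"model": "model-a", "minutes": m} for m in (5, 8, 12, 20, 26)
] + [
    {"model": "model-b", "minutes": m} for m in (7, 15, 22, 35, 48)
]

def percentile(values, p):
    """Linear-interpolated percentile (p in [0, 1]) of a list of numbers."""
    values = sorted(values)
    k = (len(values) - 1) * p
    lo = int(k)
    hi = min(lo + 1, len(values) - 1)
    return values[lo] + (values[hi] - values[lo]) * (k - lo)

# Focus on the tail of the distribution (e.g. the 95th percentile of
# autonomous run time), computed separately per model version so that
# capability gains and behavioral change can be disentangled.
by_model = defaultdict(list)
for record in sessions:
    by_model[record["model"]].append(record["minutes"])

tail_autonomy = {m: percentile(v, 0.95) for m, v in by_model.items()}
```

Splitting by model version is the key design choice: if the tail grows for every version, including older ones, the shift reflects how users delegate, not just what newer models can do.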