By Asif Razzaq
Publication Date: 2026-05-24 08:56:00
Most web agents today drive a browser one action at a time. The model receives the current page state, as a screenshot or DOM text, and predicts the next click, keypress, or scroll. This action-at-a-time design made sense when language models had limited reasoning ability. As models have become more capable at writing and debugging code, that rigid loop has become more of a constraint than a help.
Microsoft Research’s AI Frontiers lab built a different approach. Their new open-source framework, Webwright, gives the agent a terminal instead of a stateful browser session. The agent writes Playwright code to control browsers, runs bash commands, inspects logs, and iteratively refines scripts. Playwright is an open-source browser automation library, also from Microsoft, that supports programmatic control of Chromium, Firefox, and WebKit browsers.
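To make the write, run, inspect, refine cycle concrete, here is a minimal sketch in Python. It is not Webwright's code: the helper names `run_script` and `refine_until_success` and the `DRAFTS` list are hypothetical, and a fixed list of drafts stands in for the model that would rewrite the script after reading the log. In a real setting each draft would contain Playwright calls; the drafts here are plain Python so the sketch runs without a browser.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script(source: str, timeout: float = 30.0) -> tuple[int, str]:
    """Write `source` to a temp file, run it, and return (exit code, combined log)."""
    with tempfile.TemporaryDirectory() as workdir:
        path = Path(workdir) / "task.py"
        path.write_text(source, encoding="utf-8")
        proc = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,  # collect stdout and stderr as the "log"
            text=True,
            timeout=timeout,
        )
    return proc.returncode, proc.stdout + proc.stderr


def refine_until_success(drafts: list[str]) -> tuple[int, str]:
    """Run drafts in order until one exits cleanly.

    Assumption: the list of drafts is a stand-in for an agent that reads the
    failing log and writes a corrected script.
    """
    log = ""
    for attempt, source in enumerate(drafts, start=1):
        code, log = run_script(source)
        if code == 0:
            return attempt, log
    raise RuntimeError(f"no draft succeeded; last log:\n{log}")


# Hypothetical drafts: the first has a typo (NameError), the second fixes it.
DRAFTS = [
    "print(titel)\n",
    "title = 'Example Domain'\nprint(title)\n",
]

if __name__ == "__main__":
    attempt, log = refine_until_success(DRAFTS)
    print(f"succeeded on attempt {attempt}: {log.strip()}")
```

The point of the sketch is the shape of the loop: the unit of work is a whole script plus its log, not a single click, so a failure surfaces as a traceback the agent can read and fix.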
What Webwright Does Differently
Webwright separates the agent from the browser and treats…

