By Yuqing Gao, Jian Tan
Publication Date: 2026-02-04 03:10:00
This blog was written in collaboration with Fan Bu, Jason Mackay, Borya Sobolev, Dev Khanolkar, Ali Dabir, Puneet Kamal, Li Zhang, and Lei Jin.
“Everything is a file”; some are databases

Introduction
Machine data, including logs, metrics, telemetry traces, configuration snapshots, and API response payloads, underpins observability and diagnosis in modern computing systems. In practice, this data is embedded into prompts as an interleaved composition of natural-language instructions and large machine-generated payloads, typically represented as JSON blobs or Python/AST literals. While large language models excel at reasoning over text and code, they frequently struggle with machine-generated sequences, particularly when those sequences are long, deeply nested, and dominated by repetitive structure.
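To make "dominated by repetitive structure" concrete, here is a minimal sketch. It assumes a prompt is built by appending a serialized JSON payload to an instruction, and it uses character counts as a rough stand-in for token counts. The helpers `build_prompt` and `key_overhead` and the sample metric records are hypothetical illustrations, not part of any system described in this post. The sketch measures how much of a compact JSON payload is spent repeating object keys.

```python
import json

def build_prompt(instruction: str, payload: object) -> str:
    """Hypothetical helper: interleave a natural-language instruction
    with a pretty-printed machine-data payload."""
    return f"{instruction}\n\nData:\n{json.dumps(payload, indent=2)}"

def key_overhead(payload: object) -> float:
    """Fraction of characters in the compact JSON serialization spent on
    object keys (quotes included). Assumes all dict keys are strings."""
    key_chars = 0

    def walk(node: object) -> None:
        nonlocal key_chars
        if isinstance(node, dict):
            for k, v in node.items():
                key_chars += len(json.dumps(k))  # key as it appears, with quotes
                walk(v)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(payload)
    total = len(json.dumps(payload, separators=(",", ":")))
    return key_chars / total

# Hypothetical telemetry: 100 records that repeat the same four keys.
records = [
    {"timestamp": 1700000000 + i, "host": "web-01", "metric": "cpu_util", "value": 0.5}
    for i in range(100)
]

print(f"key overhead: {key_overhead(records):.1%}")
print(build_prompt("Flag any anomalous CPU readings.", records[:2]))
```

For this flat, four-field schema, a little over 40% of the serialized characters are repeated keys rather than values. Deeper nesting and longer field names push that share higher, and the pretty-printed form used in the prompt adds indentation on top of it.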
We repeatedly observe three failure modes:
- Token explosion from verbosity: Nested keys and repeated schema dominate the context window, fragmenting the…