Show HN: LLM agents that write Python to analyze execution traces at scale

We combined Stanford's ACE (agents learning from execution feedback) with the Reflective Language Model pattern. Instead of reading traces in a single pass, an LLM writes and runs Python in a sandbox to explore them programmatically, finding cross-trace patterns that single-pass analysis misses. The framework achieved a 2x consistency improvement on τ2-bench.
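
To give a feel for what "programmatically exploring traces" can look like, here is a minimal sketch of the kind of script an LLM might write inside the sandbox. It is not taken from the framework: the trace format (`id`, `steps`, `tool`, `ok`) and the function name `recurring_failures` are assumptions made purely for illustration. The script counts, per tool, how many distinct traces contain a failed call, which is a pattern that only shows up when traces are compared against each other.

```python
from collections import Counter

# Hypothetical trace format (an assumption, not the framework's schema):
# each trace has an "id" and a list of "steps"; each step records the tool
# that was called and whether the call succeeded.
TRACES = [
    {"id": "t1", "steps": [{"tool": "search", "ok": True},
                           {"tool": "refund", "ok": False}]},
    {"id": "t2", "steps": [{"tool": "refund", "ok": False},
                           {"tool": "refund", "ok": False}]},
    {"id": "t3", "steps": [{"tool": "search", "ok": False},
                           {"tool": "lookup", "ok": True}]},
]


def recurring_failures(traces, min_traces=2):
    """Map each tool to the number of distinct traces in which it failed.

    Only tools that failed in at least `min_traces` traces are returned.
    """
    counts = Counter()
    for trace in traces:
        # Use a set so repeated failures inside one trace count only once.
        failed_tools = {s["tool"] for s in trace["steps"] if not s["ok"]}
        counts.update(failed_tools)
    return {tool: n for tool, n in counts.items() if n >= min_traces}


if __name__ == "__main__":
    print(recurring_failures(TRACES))
```

On the toy data above, `refund` fails in two separate traces while `search` fails in only one, so only `refund` is reported. A single-pass read of any one trace would see just an isolated failure.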