Web use agent harness w/ 30x token reduction, 12x TTFT reduction w/ Qwen 3.5 9B on potato device (And no, I did not use vision capabilities)

r/LocalLLaMA
Generative AI Open Source AI

Browser use agents tend to prefer the models' native multimodality over concrete source, and, even if they do, they still tend to take too much context to even barely function. I was running into this problem when using LLM Agents; Then I came up with an idea. What if I can just. send the rendered DOM to the agent, but with markdown-like compression? Turns out, it works! It reduces token consumption by thirty-two times on GitHub (vs. raw DOM), at least according to my experiments, while only taking ~30ms to parse.