Web use agent harness w/ 30x token reduction, 12x TTFT reduction w/ Qwen 3.5 9B on potato device (And no, I did not use vision capabilities)
r/LocalLLaMA
•
Generative AI
Open Source AI
Browser use agents tend to prefer the models' native multimodality over concrete source, and, even if they do, they still tend to take too much context to even barely function. I was running into this problem when using LLM Agents; Then I came up with an idea. What if I can just. send the rendered DOM to the agent, but with markdown-like compression? Turns out, it works! It reduces token consumption by thirty-two times on GitHub (vs. raw DOM), at least according to my experiments, while only taking ~30ms to parse.