I Tested 15 LLMs for Web Scraping and Built Heuristics Instead
Dev.to AI
The problem nobody talks about: 600KB of DOM

When I started building a web scraper, the obvious move was to send the page to an LLM and ask it to extract the data. Simple, right? Wrong. A typical product listing page is 500-700KB of raw DOM. Sending that to any model means you're paying for ~150,000 tokens per page, waiting 15-30 seconds per request, and hitting context limits on anything complex. I hit this wall on page one.
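To make the ~150,000 figure concrete, here is a rough back-of-the-envelope sketch. It assumes about 4 bytes of HTML per token, a common rule of thumb that is not an exact tokenizer count, and the real ratio varies by model and markup. The function names and the price in the usage lines are hypothetical placeholders, not numbers from my tests.

```python
def estimate_tokens(dom_bytes: int, bytes_per_token: float = 4.0) -> int:
    """Rough token estimate for a raw DOM of the given size.

    bytes_per_token=4.0 is an assumed rule of thumb, not a measured value.
    """
    return round(dom_bytes / bytes_per_token)


def estimate_cost_usd(tokens: int, usd_per_million_input_tokens: float) -> float:
    """Input cost for one page at a given (hypothetical) price per million tokens."""
    return tokens / 1_000_000 * usd_per_million_input_tokens


if __name__ == "__main__":
    # 500-700KB pages, treating 1KB as 1,000 bytes for simplicity.
    for size_kb in (500, 600, 700):
        tokens = estimate_tokens(size_kb * 1_000)
        # 1.00 USD per million input tokens is a placeholder price.
        cost = estimate_cost_usd(tokens, usd_per_million_input_tokens=1.00)
        print(f"{size_kb}KB -> ~{tokens:,} tokens, ~${cost:.2f} per page")
```

Under that assumption a 600KB page lands at roughly 150,000 tokens, and the per-page cost scales linearly with however many pages you crawl.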