Why is nobody talking about how broken web scraping is for AI agents right now?
r/ChatGPT • Generative AI
I thought I was being smart building an AI competitor analysis tool. I hooked up Puppeteer to scrape pricing pages, but I didn't realize the target sites had updated their bot protection. My scraper got stuck in an infinite Cloudflare Turnstile captcha loop. Instead of crashing, my script just kept feeding the bot-challenge HTML into Claude/OpenAI to "parse the pricing data." It ran all night, burning millions of tokens on literal garbage HTML. Woke up to a catastrophic Stripe receipt. I am never managing headless browsers again.
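For anyone hitting the same thing, here is a rough sketch of the kind of guard that might have stopped this: check the HTML for challenge-page markers before it goes anywhere near the model, and put a hard cap on tokens per run. It works on plain HTML strings, so it doesn't care whether the page came from Puppeteer or something else. Everything named here (`CHALLENGE_MARKERS`, `looks_like_challenge`, `TokenBudget`, `parse_pricing`, the `call_llm` callable) is made up for illustration, the marker strings are heuristics rather than an official list, and the 4-characters-per-token estimate is a crude assumption, not a real tokenizer.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed heuristics for spotting a Cloudflare challenge page.
# Not an official or complete list; tune for the sites you scrape.
CHALLENGE_MARKERS = (
    "challenges.cloudflare.com",  # host that serves the Turnstile script
    "cf-turnstile",               # container class used by the Turnstile widget
    "just a moment...",           # title often seen on the interstitial page
)


class BudgetExceeded(RuntimeError):
    """Raised when a run would spend more tokens than allowed."""


def looks_like_challenge(html: str) -> bool:
    """Return True if the HTML contains any known challenge marker."""
    lowered = html.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)


@dataclass
class TokenBudget:
    """Hard cap on estimated tokens for one scraping run."""
    limit: int
    used: int = 0

    def charge(self, text: str) -> None:
        # Crude estimate (assumption): roughly 4 characters per token.
        estimate = len(text) // 4 + 1
        if self.used + estimate > self.limit:
            raise BudgetExceeded(
                f"would use {self.used + estimate} tokens, limit is {self.limit}"
            )
        self.used += estimate


def parse_pricing(
    html: str,
    call_llm: Callable[[str], str],  # hypothetical wrapper around your LLM client
    budget: TokenBudget,
) -> Optional[str]:
    """Send HTML to the model only if it is not a challenge page and fits the budget."""
    if looks_like_challenge(html):
        return None  # skip the page; nothing is sent to the model
    budget.charge(html)  # raises BudgetExceeded instead of silently overspending
    return call_llm(html)
```

The point is that the script fails loudly: a challenge page returns `None` without costing anything, and blowing the budget raises `BudgetExceeded` so the run stops instead of grinding on all night. A cap on consecutive skipped pages would be a sensible next addition.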