Asked 6 different devs how they handle web scraping for AI pipelines. Got 6 completely different answers. Here's what actually works.

r/ChatGPT
Generative AI

Been trying to figure out the "right" way to get clean web data into AI workflows without the whole thing being a maintenance nightmare. Talked to a bunch of people building similar stuff. Answers ranged from "just use beautifulsoup" to "build your own playwright cluster" to "scraping is dead, use APIs only."

After trying most of these approaches myself, here's my honest take:

BeautifulSoup: fine for dead simple static sites, breaks immediately on anything JS-rendered.

Playwright/Puppeteer DIY: does work, but you're now maintaining infrastructure, not building a product.
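To make the "breaks on anything JS rendered" point concrete, here's a rough sketch. It uses Python's stdlib `html.parser` as a stand-in for BeautifulSoup so it runs with no installs. That swap is my assumption, but the limitation is the same: a static parser only sees the markup the server sends and never executes scripts. The two sample pages and the `extract_text` helper are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical static page: the content is already in the HTML the server sends.
STATIC_PAGE = """
<html><body>
  <h1>Pricing</h1>
  <p class="price">$19/month</p>
</body></html>
"""

# Hypothetical JS-rendered page: the server sends an empty shell and a script
# that would fill it in, but only inside a real browser.
JS_PAGE = """
<html><body>
  <div id="root"></div>
  <script>
    document.getElementById("root").innerHTML = "<p class='price'>$19/month</p>";
  </script>
</body></html>
"""


class VisibleText(HTMLParser):
    """Collects text outside <script>/<style>, the way a static scraper would."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside a script or style element

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html):
    """Parse the raw HTML string and return the visible text chunks."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.chunks


static_text = extract_text(STATIC_PAGE)
js_text = extract_text(JS_PAGE)

print(static_text)  # ['Pricing', '$19/month']
print(js_text)      # [] because the script never ran, so there is nothing to scrape
```

Getting the price out of the second page means running the script first, which is exactly where a headless browser (and the infrastructure that comes with it) enters the picture.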