How are people building deep research agents?

r/LocalLLaMA
Generative AI

For those building deep research agents, how are you actually retrieving information from the web in practice? Are you mostly calling search/research APIs (Exa, Tavily, Perplexity, etc.) and then visiting each returned link, opening those pages in a browser runtime (Playwright/Puppeteer) and brute-force scraping the HTML, or are you using some more efficient architecture? I'm curious what the typical pipeline looks like.

submitted by /u/Tricky-Promotion6784
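To make the question concrete, here is a rough sketch of the "search, then visit each link and scrape the HTML" pipeline described above. It is only an illustration under assumptions: `html_to_text` and `gather_sources` are hypothetical names, and `search` and `fetch` are placeholder callables you would back with whatever search API client and page loader (a plain HTTP client or a Playwright/Puppeteer page) you actually use. Only the Python standard library is used here.

```python
from html.parser import HTMLParser
from typing import Callable, Iterable


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script>, <style>, and <noscript> contents."""

    _SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Brute-force HTML-to-text: join all remaining text nodes with spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def gather_sources(
    query: str,
    search: Callable[[str], Iterable[str]],  # hypothetical: query -> result URLs
    fetch: Callable[[str], str],             # hypothetical: URL -> raw HTML
    max_pages: int = 5,
) -> list[dict]:
    """Search, visit up to max_pages results, and return extracted text per URL."""
    docs = []
    for url in list(search(query))[:max_pages]:
        try:
            html = fetch(url)
        except Exception:
            # Skip pages that fail to load instead of aborting the whole run.
            continue
        docs.append({"url": url, "text": html_to_text(html)})
    return docs
```

Passing `search` and `fetch` in as callables keeps the sketch independent of any particular provider, so swapping a search API or moving from plain HTTP to a headless browser only changes those two functions.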