AI RESEARCH

Architecture advice: Real-time pipeline for YouTube Audio -> Whisper -> LLM -> SSE (Sub-10s latency) [D]

r/MachineLearning

Currently, my flow is a slow waterfall: Download full audio -> Whisper -> LLM -> Return results. For a 30-minute video, the user waits forever.