[R] How are you managing long-running preprocessing jobs at scale? Curious what's actually working

r/MachineLearning

We're a small ML team working on a project, and we keep running into the same wall: large preprocessing jobs (think 50-100GB datasets) running on a single machine take hours, and when one fails halfway through, it's painful. We've looked at Prefect, Temporal, and a few others, but they all feel like they need a full-time DevOps person to set up and maintain properly. Most of our team is focused on the models, not the infrastructure.