$\texttt{YC-Bench}$: Benchmarking AI Agents for Long-Term Planning and Consistent Execution

ArXi:2604.01212v1 Announce Type: cross As LLM agents tackle increasingly complex tasks, a critical question is whether they can maintain strategic coherence over long horizons: planning under uncertainty, learning from delayed feedback, and adapting when early mistakes compound. We