When Robots Do the Chores: A Benchmark and Agent for Long-Horizon Household Task Execution

arXiv:2605.14504v1 Announce Type: new Long-horizon household tasks demand robust high-level planning and sustained reasoning capabilities, which are largely overlooked by existing embodied AI benchmarks that emphasize short-horizon navigation or manipulation and rely on fixed task categories. We