PDDL-Mind: Large Language Models are Capable on Belief Reasoning with Reliable State Tracking

arXiv:2604.17819v1 Announce Type: new
Large language models (LLMs) perform substantially below human level on existing theory-of-mind (ToM) benchmarks, even when augmented with chain-of-thought prompting or probabilistic belief updates. We argue that these failures primarily arise from unreliable implicit state tracking rather than limitations in high-level reasoning. We