AI RESEARCH

PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts

arXiv CS.AI

arXiv:2506.06211v2. Puzzlehunts are a genre of complex, multi-step puzzles lacking well-defined problem definitions. In contrast to conventional reasoning benchmarks consisting of tasks with clear instructions and constrained environments, puzzlehunts require discovering the underlying problem structure from multimodal evidence and iterative reasoning, mirroring real-world domains such as scientific discovery, exploratory data analysis, or investigative problem-solving.