General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks

arXiv:2604.11778v1 Announce Type: cross Contemporary large language models (LLMs) have demonstrated remarkable reasoning capabilities, particularly in specialized domains like mathematics and physics. However, their ability to generalize these reasoning skills to broader, general contexts--often termed general reasoning--remains under-explored. Unlike domain-specific reasoning, general reasoning relies less on expert knowledge but still presents formidable reasoning challenges, such as complex constraints, nested logical branches, and semantic interference. To address this gap, we