AI RESEARCH

LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening

arXiv cs.CL

arXiv:2605.19597v1 Announce Type: new Evaluating large language models (LLMs) on natural-language logical reasoning is essential because rule-governed tasks require