AI RESEARCH
LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening
arXiv cs.CL
arXiv:2605.19597v1 Announce Type: new
Evaluating large language models (LLMs) on natural-language logical reasoning is essential because rule-governed tasks require…