EngiBench: A Benchmark for Evaluating Large Language Models on Engineering Problem Solving

arXiv:2509.17677v2 Announce Type: replace Large language models (LLMs) have shown strong performance on mathematical reasoning under well-defined conditions. However, real-world engineering problems involve uncertainty, context, and open-ended settings that extend beyond symbolic computation. Existing benchmarks largely focus on well-defined or abstract reasoning and therefore fail to capture these complexities. We