SupChain-Bench: Benchmarking Large Language Models for Real-World Supply Chain Management

arXiv:2602.07342v2 Announce Type: replace

Large language models (LLMs) have shown promise in complex reasoning and tool-based decision making, motivating their application to real-world supply chain management. However, supply chain workflows require reliable long-horizon, multi-step orchestration grounded in domain-specific procedures, which remains challenging for current models. To systematically evaluate LLM performance in this setting, we