PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization

arXiv:2605.15222v1 Announce Type: cross Large language models (LLMs) can often generate functionally correct code, but their ability to produce efficient implementations for performance-critical systems tasks remains limited. Existing code benchmarks mainly emphasize correctness or algorithmic problem solving, while realistic systems-level optimization is still underexplored. To address this gap, we