ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation

arXiv:2507.14201v3 Announce Type: replace-cross We present ExCyTIn-Bench, the first benchmark to Evaluate an LLM agent X on the task of Cyber Threat Investigation through security questions derived from investigation graphs. Real-world security analysts must sift through large volumes of heterogeneous security logs and follow multi-hop chains of evidence to investigate threats. With recent advances in LLMs, building LLM-based agents for automatic threat investigation is a promising direction.