AI RESEARCH

Counterfactual Trace Auditing of LLM Agent Skills

arXiv CS.AI

arXiv:2605.11946v1 Announce Type: new

Large Language Model agents are increasingly augmented with agent skills, yet current methods for evaluating those skills remain limited. Most deployed benchmarks report only the pass rate before and after a skill is attached, treating the skill as a black-box change to agent behavior. We