AI RESEARCH

When Outcome Looks Right But Discipline Fails: Trace-Based Evaluation Under Hidden Competitor State

arXiv cs.LG

arXiv:2605.18580v1 Announce Type: cross

Outcome-only evaluation can certify economically unsafe agents: a policy can hit a business KPI while violating the behavioral discipline required for deployment. In hotel pricing with hidden competitor state, a learner can achieve plausible revenue per available room (RevPAR) while failing to preserve the rate discipline of a rule-based revenue-management competitor.
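To make the gap between outcome-only and trace-based evaluation concrete, here is a minimal Python sketch. It is not the paper's method. Revenue per available room is computed in the standard way (room revenue divided by rooms available), but everything else is an assumption made for illustration: the `Night` record, the two toy traces, the `max_rate_jump` proxy for rate discipline, and both thresholds are hypothetical, and the numbers are invented rather than taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Night:
    rate: float           # nightly rate posted by the pricing policy
    rooms_sold: int       # rooms sold at that rate
    rooms_available: int  # rooms that could have been sold


def revpar(trace):
    """Revenue per available room over the whole trace (outcome metric)."""
    revenue = sum(n.rate * n.rooms_sold for n in trace)
    available = sum(n.rooms_available for n in trace)
    return revenue / available


def max_rate_jump(trace):
    """Largest night-to-night relative rate change.

    Hypothetical stand-in for a rate-discipline check; the paper's actual
    trace-based criteria are not specified here.
    """
    return max(abs(b.rate - a.rate) / a.rate for a, b in zip(trace, trace[1:]))


# Hypothetical thresholds, chosen only for this illustration.
REVPAR_TARGET = 55.0
MAX_ALLOWED_JUMP = 0.2


def passes_outcome_check(trace):
    return revpar(trace) >= REVPAR_TARGET


def passes_trace_check(trace):
    return max_rate_jump(trace) <= MAX_ALLOWED_JUMP


# Two invented traces with identical revenue but very different rate behavior.
steady = [Night(100.0, 60, 100) for _ in range(4)]
erratic = [
    Night(150.0, 40, 100),
    Night(75.0, 80, 100),
    Night(150.0, 40, 100),
    Night(75.0, 80, 100),
]

for name, trace in [("steady", steady), ("erratic", erratic)]:
    print(name, revpar(trace), passes_outcome_check(trace), passes_trace_check(trace))
```

Both toy traces clear the outcome check with the same RevPAR, yet only the steady one passes the trace check. That is the failure mode the abstract describes: an evaluator that looks only at the KPI cannot tell the two policies apart.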