AI RESEARCH

Gaming the Metric, Not the Harm: Certifying Safety Audits against Strategic Platform Manipulation

arXiv cs.LG

arXiv:2605.06324v1 Announce Type: cross

Online-safety regulation under the UK Online Safety Act and the EU Digital Services Act increasingly treats scalar metrics as compliance evidence. Once announced, such a metric also becomes an optimization target: a strategic platform can improve its score by routing recommendations through semantically equivalent content variants, without reducing the true harm. We ask when such an audit metric can still certify a genuine reduction in harm.
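To make the gaming mechanism concrete, here is a minimal toy sketch. It is not the paper's model. Every name and modeling choice in it is a hypothetical assumption for illustration. The audit metric counts exposures to a fixed set of flagged item IDs. Each item carries a known true-harm value. The platform swaps each flagged item for an unflagged variant on the same topic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str   # identifier the auditor can see and flag
    topic: str     # hypothetical stand-in for semantic content; variants share a topic
    harm: float    # hypothetical true harm per exposure (unobserved by the metric)


def audit_metric(recs: list[Item], flagged_ids: set[str]) -> float:
    """Hypothetical scalar metric: share of recommendations that hit a flagged item ID."""
    return sum(r.item_id in flagged_ids for r in recs) / len(recs)


def true_harm(recs: list[Item]) -> float:
    """Total harm actually delivered by the recommendations."""
    return sum(r.harm for r in recs)


def reroute(recs: list[Item], variants: dict[str, list[Item]],
            flagged_ids: set[str]) -> list[Item]:
    """Strategic platform: replace each flagged item with an unflagged,
    semantically equivalent variant (same topic) whenever one exists."""
    out = []
    for r in recs:
        if r.item_id in flagged_ids:
            alt = next((v for v in variants.get(r.topic, [])
                        if v.item_id not in flagged_ids), None)
            out.append(alt if alt is not None else r)
        else:
            out.append(r)
    return out


# Toy scenario: one flagged harmful item, one unflagged variant with identical harm.
original = Item("a1", "harmful_topic", 1.0)
variant = Item("a2", "harmful_topic", 1.0)
benign = Item("b1", "cooking", 0.0)

recs = [original, original, benign, benign]
flagged = {"a1"}
variants = {"harmful_topic": [original, variant]}

gamed = reroute(recs, variants, flagged)

print("metric before:", audit_metric(recs, flagged), "harm before:", true_harm(recs))
print("metric after: ", audit_metric(gamed, flagged), "harm after: ", true_harm(gamed))
```

In this toy, the metric falls from 0.5 to 0.0 while total harm stays at 2.0. The score improves, but the harm the metric was meant to track does not change.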