Diagnosing Non-Markovian Observations in Reinforcement Learning via Prediction-Based Violation Scoring

arXiv:2603.27389v1 Announce Type: cross Reinforcement learning algorithms assume that observations satisfy the Markov property, yet real-world sensors frequently violate this assumption through correlated noise, latency, or partial observability. Standard performance metrics conflate Markov breakdowns with other sources of suboptimality, leaving practitioners without diagnostic tools for such violations. This paper