Adapting Critic Match Loss Landscape Visualization to Off-policy Reinforcement Learning

arXiv:2603.14589v1 Announce Type: cross This work extends an established critic match loss landscape visualization method from online to off-policy reinforcement learning (RL), aiming to reveal the optimization geometry behind critic learning. Off-policy RL differs from stepwise online actor-critic learning in its replay-based data flow and target computation.
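
The contrast between stepwise online targets and replay-based off-policy targets can be illustrated with a minimal Python sketch. It is not the method from this work: the names `online_td_target`, `ReplayBuffer`, and `off_policy_td_targets` are hypothetical, and the use of separate target policy and target critic functions is an assumption borrowed from common off-policy actor-critic practice, not something the abstract states.

```python
import random
from collections import deque


def online_td_target(reward, next_value, done, gamma=0.99):
    """One-step target for the transition just observed (stepwise online case)."""
    return reward + gamma * (0.0 if done else 1.0) * next_value


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity, seed=0):
        self.storage = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement; the batch mixes old and new data.
        return self.rng.sample(list(self.storage), batch_size)


def off_policy_td_targets(batch, target_policy, target_critic, gamma=0.99):
    """Targets for a replayed batch.

    Assumption: the bootstrap value comes from separate target functions
    rather than from the critic currently being trained.
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        next_action = target_policy(next_state)
        bootstrap = 0.0 if done else target_critic(next_state, next_action)
        targets.append(reward + gamma * bootstrap)
    return targets
```

In the online case each target depends only on the most recent transition, whereas in the replay case the targets for one update are computed over a sampled batch of stored transitions, which is the difference in data flow the abstract refers to.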