Malliavin Calculus for Counterfactual Gradient Estimation in Adaptive Inverse Reinforcement Learning

arXiv:2604.01345v1

Inverse reinforcement learning (IRL) recovers the loss function of a forward learner from its observed responses. Adaptive IRL aims to reconstruct the loss function of a forward learner by passively observing its gradients as it performs reinforcement learning (RL). This paper proposes a novel passive Langevin-based algorithm that achieves adaptive