Interactive Inverse Reinforcement Learning of Interaction Scenarios via Bi-level Optimization

arXiv:2605.08131v1 Announce Type: new Inverse reinforcement learning (IRL) learns a reward function and a corresponding policy that best fit an expert's demonstration data. However, in the current IRL setting, the learner is isolated from the expert and can only passively observe the expert's demonstrations. This limits the applicability of IRL to interactive settings, where the learner actively interacts with the expert and must infer the expert's reward function from those interactions.