Hierarchical Apprenticeship Learning from Imperfect Demonstrations with Evolving Rewards

arXiv:2604.00258v1 Announce Type: cross While apprenticeship learning has shown promise for inducing effective pedagogical policies directly from student interactions in e-learning environments, most existing approaches rely on optimal or near-optimal expert demonstrations under a fixed reward. Real-world student interactions, however, are often inherently imperfect and evolving: students explore, make errors, revise strategies, and refine their goals as understanding develops.