Generalisation in Multitask Fitted Q-Iteration and Offline Q-learning

arXiv:2512.20220v2

We study offline multitask reinforcement learning in settings where multiple tasks share a low-rank representation of their action-value functions. In this regime, a learner is given fixed datasets collected from several related tasks, has no access to further online interaction, and seeks to exploit the shared structure to improve statistical efficiency and generalization.
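The abstract does not specify an algorithm. The following is only a minimal illustrative sketch, built on our own assumptions, of one way multitask fitted Q-iteration could exploit a shared low-rank structure. It assumes linear Q-functions Q_t(s, a) = phi(s, a)^T w_t. After each per-task regression step, the d x T weight matrix W = [w_1, ..., w_T] is truncated to rank k by SVD, so that all tasks share one k-dimensional column space. The helper names (`one_hot_phi`, `multitask_fqi`), the dataset format `(S, A, R, S_next)` and the SVD-truncation step are hypothetical and are not taken from the paper.

```python
import numpy as np


def one_hot_phi(n_states, n_actions):
    """Hypothetical tabular (one-hot) features phi(s, a) of size n_states * n_actions."""
    def phi(S, A):
        S, A = np.asarray(S), np.asarray(A)
        X = np.zeros((len(S), n_states * n_actions))
        X[np.arange(len(S)), S * n_actions + A] = 1.0
        return X
    return phi


def multitask_fqi(datasets, phi, n_actions, rank, gamma=0.9, n_iters=200, reg=1e-8):
    """Illustrative multitask FQI with a rank-constrained linear Q-function (a sketch).

    datasets: list of T offline datasets (S, A, R, S_next), one per task (assumed format).
    Returns W of shape (d, T): column t holds task t's weights, so that
    Q_t(s, a) = phi(s, a) @ W[:, t], and rank(W) <= rank.
    """
    S0, A0 = datasets[0][0], datasets[0][1]
    d = phi(S0[:1], A0[:1]).shape[1]
    T = len(datasets)
    W = np.zeros((d, T))
    for _ in range(n_iters):
        W_ls = np.zeros((d, T))
        for t, (S, A, R, S_next) in enumerate(datasets):
            X = phi(S, A)
            # Greedy bootstrap value max_a' Q_t(s', a') under the current estimate.
            q_next = np.stack(
                [phi(S_next, np.full(len(S_next), a)) @ W[:, t] for a in range(n_actions)],
                axis=1,
            )
            y = R + gamma * q_next.max(axis=1)
            # Per-task ridge regression onto the Bellman targets (offline data only).
            W_ls[:, t] = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)
        # Assumed shared-structure step: keep only the top-`rank` singular directions,
        # so every task's weight vector lies in one common rank-`rank` subspace.
        U, s, Vt = np.linalg.svd(W_ls, full_matrices=False)
        W = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return W


# Usage: one state, two actions, three tasks whose rewards differ only by a scale c.
phi = one_hot_phi(n_states=1, n_actions=2)
S = np.array([0, 0])
A = np.array([0, 1])
datasets = [(S, A, np.array([c, 0.0]), S) for c in (1.0, 2.0, 3.0)]
W = multitask_fqi(datasets, phi, n_actions=2, rank=1)
print(np.round(W, 3))  # column t is approximately (10 * c_t, 9 * c_t)
```

In this toy example the optimal action values of all three tasks are multiples of one vector, so a rank-1 constraint loses nothing. Each rank-1 iterate then coincides with independent per-task FQI, while the tasks are described by a single shared direction. With `rank` equal to the number of tasks, the truncation is exact and the sketch reduces to running ordinary FQI on each task separately.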