Group-Relative Contextual Bandit Policy Gradient for Homepage Recommendation

Towards AI
Efficient Reinforcement Learning from Relative Slate Quality in Contextual Bandits