Group-Relative Contextual Bandit Policy Gradient for Homepage Recommendation
Towards AI
Efficient Reinforcement Learning from Relative Slate Quality in Contextual Bandits