AI RESEARCH
Near-Optimal Privacy-Preserving Learning for Max-Min Fair Multi-Agent Bandits
arXiv CS.LG
arXiv:2306.04498v3 Announce Type: replace We study fair multi-agent multi-armed bandit learning under collision-only coordination. Agents cannot communicate explicitly during learning; each agent observes only its own rewards and whether a collision occurred, that is, whether several agents accessed the same arm. The goal is to learn a max-min fair allocation while keeping each agent's reward samples and empirical reward estimates local.
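To make the target of the learning problem concrete, the following is a minimal sketch of what a max-min fair allocation looks like when the mean rewards are known. It is not the paper's algorithm: it is a centralized brute-force oracle that reads the full reward matrix, which is exactly the information the agents in this setting are not allowed to share. The function name `max_min_fair_allocation`, the matrix `mu`, and the assumption that each agent is assigned a distinct arm (so no collisions occur) are illustrative choices, not taken from the source.

```python
from itertools import permutations


def max_min_fair_allocation(mu):
    """Return (assignment, value) maximizing the minimum mean reward.

    mu[i][k] is the (hypothetical) mean reward of agent i on arm k.
    Assumes at least as many arms as agents and one distinct arm per
    agent, so the allocation is collision-free.
    """
    n_agents = len(mu)
    n_arms = len(mu[0])
    best_value, best_assignment = float("-inf"), None
    # Enumerate every collision-free assignment of agents to arms.
    for arms in permutations(range(n_arms), n_agents):
        # Fairness objective: the reward of the worst-off agent.
        value = min(mu[i][arms[i]] for i in range(n_agents))
        if value > best_value:
            best_value, best_assignment = value, arms
    return best_assignment, best_value


# Illustrative 3-agent, 3-arm instance (made-up numbers).
mu = [
    [0.9, 0.5, 0.1],
    [0.8, 0.2, 0.3],
    [0.7, 0.6, 0.4],
]
assignment, value = max_min_fair_allocation(mu)
print(assignment, value)  # (1, 0, 2) 0.4
```

In this instance, giving agent 0 its favorite arm 0 would force agent 1 onto an arm worth at most 0.3, so the fair allocation instead sends agent 0 to arm 1 and lifts the worst-off agent to 0.4. Brute-force enumeration is only practical for small instances; it is used here because it states the objective directly.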