Near-Optimal Privacy-Preserving Learning for Max-Min Fair Multi-Agent Bandits

arXiv:2306.04498v3

We study fair multi-agent multi-armed bandit learning under collision-only coordination. Agents cannot communicate explicitly during learning; each observes only its own rewards and whether a collision occurs when several agents access the same arm. The goal is to learn a max-min fair allocation while keeping each agent's reward samples and empirical reward estimates local.
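To make the objective concrete, the following is a minimal sketch of what a max-min fair allocation and collision-only feedback look like on a toy instance. It is not the paper's algorithm: it is a centralized brute-force search that assumes the full matrix of expected rewards is known, which is exactly what the decentralized, privacy-preserving setting rules out. The names `max_min_fair_assignment`, `collision_feedback`, and the reward matrix `means` are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def max_min_fair_assignment(means):
    """Brute-force the collision-free assignment that maximizes the minimum
    expected reward over agents.

    means[i][k] is the (assumed known) expected reward of agent i on arm k.
    Returns (assignment, value), where assignment[i] is the arm of agent i.
    """
    n_agents = len(means)
    n_arms = len(means[0])
    best_assignment, best_value = None, float("-inf")
    # Each permutation gives every agent a distinct arm, so no collisions.
    for arms in permutations(range(n_arms), n_agents):
        value = min(means[i][arms[i]] for i in range(n_agents))
        if value > best_value:
            best_assignment, best_value = arms, value
    return best_assignment, best_value


def collision_feedback(choices):
    """Return, for each agent, whether its chosen arm was also chosen by
    another agent. This is the only cross-agent signal in the setting."""
    counts = {}
    for arm in choices:
        counts[arm] = counts.get(arm, 0) + 1
    return [counts[arm] > 1 for arm in choices]


# Illustrative 3-agent, 3-arm instance (values are made up).
means = [
    [0.9, 0.5, 0.1],
    [0.8, 0.2, 0.3],
    [0.7, 0.6, 0.4],
]

assignment, value = max_min_fair_assignment(means)
print(assignment, value)              # (1, 0, 2) 0.4
print(collision_feedback([0, 0, 2]))  # [True, True, False]
```

In this instance every agent prefers arm 0, yet the max-min fair allocation gives arm 0 to agent 1, because that raises the worst-off agent's expected reward to 0.4. In the setting described above, each agent would know only its own row of `means` (through noisy samples) and would have to reach such an allocation using collision indicators alone.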