Multi-Objective Multi-Agent Bandits: From Learning Efficiency to Fairness Optimization

arXiv:2605.06864v1 Announce Type: new

We study multi-objective multi-agent multi-armed bandits (MO-MA-MAB) under stochastic rewards, where agents observe heterogeneous reward vectors and communicate over time-varying graphs. We formulate this emerging problem setting to address efficient learning, measured by Pareto regret, and we incorporate fair learning as an additional goal, captured via social welfare.
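
To make the notion of Pareto regret concrete, here is a minimal sketch. It assumes the standard single-agent definition from the multi-objective bandit literature: the Pareto suboptimality gap of an arm is the smallest uniform boost to its mean reward vector that leaves it undominated, and Pareto regret is the sum of those gaps over the arms pulled. The abstract does not state the multi-agent definition used in this work, so the function names `pareto_gaps` and `pareto_regret` and the example means are illustrative assumptions, not the authors' formulation.

```python
def pareto_gaps(means):
    """Pareto suboptimality gap of each arm (assumed standard definition).

    means: list of mean reward vectors, one per arm, all of equal length.
    """
    gaps = []
    for mu_i in means:
        # For a rival arm j, the smallest uniform boost that stops j from
        # dominating arm i is the minimum over objectives of mu_j[d] - mu_i[d].
        # The gap is the largest such boost over all rivals, floored at zero.
        needed = max(
            min(b - a for a, b in zip(mu_i, mu_j))
            for mu_j in means
        )
        gaps.append(max(needed, 0.0))
    return gaps


def pareto_regret(means, pulls):
    """Cumulative Pareto regret of a sequence of pulled arm indices."""
    gaps = pareto_gaps(means)
    return sum(gaps[arm] for arm in pulls)


# Hypothetical two-objective instance: arm 2 is dominated by arm 3.
example_means = [[0.9, 0.1], [0.1, 0.9], [0.4, 0.4], [0.5, 0.5]]
example_gaps = pareto_gaps(example_means)
example_regret = pareto_regret(example_means, [2, 2, 3, 0])
```

In this instance only arm 2 has a positive gap, so regret accrues only on the rounds that pull it. Arms on the Pareto front contribute nothing, which is why Pareto regret alone does not say how rewards are balanced across objectives or agents.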