Adaptive Ensemble Aggregation for Actor-Critics

arXiv:2507.23501v2

Ensembles are ubiquitous in off-policy actor-critic learning, yet their efficacy depends critically on how they are aggregated. Current methods typically rely on static rules or task-specific hyperparameters to balance overestimation bias and variance, leaving open the challenge of a truly adaptive approach. We