AI RESEARCH

High entropy leads to symmetry equivariant policies in Dec-POMDPs

arXiv cs.LG

arXiv:2511.22581v3 Announce Type: replace

We prove that in any Dec-POMDP, sufficiently high entropy regularization ensures that policy gradient flow with a tabular softmax parametrization converges, from any initialization, to the same joint policy, and that this joint policy is equivariant with respect to all symmetries of the Dec-POMDP. In particular, policies obtained from different initializations are fully compatible: their cross-play returns equal their self-play returns.
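The claim can be illustrated on a toy problem. The sketch below is not the paper's construction or proof. It assumes a hypothetical one-step, two-agent matching game (a degenerate Dec-POMDP with no state or observations), approximates the gradient flow by small Euler steps, and uses hand-picked values for the entropy weight `tau`. All names (`PAYOFF`, `train`, `self_and_cross_play`) are illustrative.

```python
import math

# Hypothetical toy problem (an assumption, not from the paper): both agents
# pick one of two actions and share reward 1 if the actions match.
# Swapping the action labels for both agents is a symmetry of this game.
PAYOFF = [[1.0, 0.0], [0.0, 1.0]]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_return(pi1, pi2):
    """Unregularized expected reward when agent 1 plays pi1 and agent 2 plays pi2."""
    return sum(pi1[a] * PAYOFF[a][b] * pi2[b] for a in range(2) for b in range(2))


def logit_gradient(pi_self, pi_other, tau, transpose=False):
    """Gradient of (return + tau * entropy) w.r.t. one agent's softmax logits."""
    q = []
    for a in range(2):
        row = [PAYOFF[b][a] if transpose else PAYOFF[a][b] for b in range(2)]
        value = sum(row[b] * pi_other[b] for b in range(2))
        q.append(value - tau * math.log(pi_self[a]))
    baseline = sum(pi_self[a] * q[a] for a in range(2))
    return [pi_self[a] * (q[a] - baseline) for a in range(2)]


def train(logits1, logits2, tau, step=0.1, iters=5000):
    """Euler discretization of the gradient flow from the given initial logits."""
    th1, th2 = list(logits1), list(logits2)
    for _ in range(iters):
        p1, p2 = softmax(th1), softmax(th2)
        g1 = logit_gradient(p1, p2, tau)
        g2 = logit_gradient(p2, p1, tau, transpose=True)
        th1 = [t + step * g for t, g in zip(th1, g1)]
        th2 = [t + step * g for t, g in zip(th2, g2)]
    return softmax(th1), softmax(th2)


def self_and_cross_play(tau):
    """Train two runs whose initial logits favor opposite conventions."""
    a1, a2 = train([1.0, -1.0], [1.0, -1.0], tau)
    b1, b2 = train([-1.0, 1.0], [-1.0, 1.0], tau)
    self_play = expected_return(a1, a2)    # run A's agents paired together
    cross_play = expected_return(a1, b2)   # agent 1 from run A, agent 2 from run B
    return a1, self_play, cross_play


if __name__ == "__main__":
    for tau in (1.0, 0.05):  # hand-picked "high" and "low" entropy weights
        pi, sp, xp = self_and_cross_play(tau)
        print(f"tau={tau}: policy={pi}, self-play={sp:.4f}, cross-play={xp:.4f}")
```

In this toy, `tau = 1.0` drives both runs to the uniform policy, which is unchanged by swapping the action labels, so self-play and cross-play returns coincide. With `tau = 0.05` the two runs drift toward opposite conventions and cross-play falls well below self-play. The specific values of `tau` were chosen for this game only and say nothing about the threshold established in the paper.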