AI RESEARCH
Actor-Critic with Active Importance Sampling
arXiv CS.LG
arXiv:2605.07094v1 Announce Type: new For continuous action spaces, AISAC employs Gaussian behavior policies optimized through cross-entropy minimization. We provide theoretical analysis demonstrating variance reduction and unbiasedness. Experiments on Inverted Pendulum and Half Cheetah tasks show improved learning speed, sample efficiency, and
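
To make the idea of a Gaussian behavior policy fitted by cross-entropy minimization more concrete, here is a minimal one-dimensional sketch. It is not the AISAC algorithm from the paper: the function names (`fit_behavior_policy`, `importance_sampling_estimate`), the toy integrand `f`, and all hyperparameters are assumptions chosen for illustration. The sketch estimates an expectation under a fixed Gaussian target policy, fits the behavior Gaussian by weighted maximum likelihood (which minimizes the cross-entropy to the variance-optimal proposal, proportional to target density times |f|), and reweights samples so the estimate stays unbiased.

```python
import numpy as np


def gaussian_logpdf(x, mean, std):
    """Log-density of a 1-D Gaussian N(mean, std^2)."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)


def fit_behavior_policy(f, target_mean, target_std, n_iters=10, n_samples=2000, seed=0):
    """Fit a Gaussian behavior policy by iterated cross-entropy minimization.

    Each iteration samples from the current behavior Gaussian, weights the
    samples by (target density / behavior density) * |f|, and refits the
    Gaussian by weighted maximum likelihood.
    """
    rng = np.random.default_rng(seed)
    mean, std = float(target_mean), float(target_std)  # start at the target policy
    for _ in range(n_iters):
        a = rng.normal(mean, std, size=n_samples)
        log_ratio = gaussian_logpdf(a, target_mean, target_std) - gaussian_logpdf(a, mean, std)
        weights = np.exp(log_ratio) * np.abs(f(a))
        weights /= weights.sum()
        mean = float(np.sum(weights * a))
        # Small constant keeps the standard deviation strictly positive.
        std = float(np.sqrt(np.sum(weights * (a - mean) ** 2)) + 1e-6)
    return mean, std


def importance_sampling_estimate(f, target_mean, target_std, beh_mean, beh_std, n_samples, rng):
    """Importance-sampling estimate of E_target[f(a)] using behavior samples."""
    a = rng.normal(beh_mean, beh_std, size=n_samples)
    ratio = np.exp(gaussian_logpdf(a, target_mean, target_std) - gaussian_logpdf(a, beh_mean, beh_std))
    return float(np.mean(ratio * f(a)))


def f(a):
    """Toy integrand (an assumption): a bump centered away from the target mean."""
    return np.exp(-0.5 * (a - 3.0) ** 2)


if __name__ == "__main__":
    beh_mean, beh_std = fit_behavior_policy(f, target_mean=0.0, target_std=1.0)
    rng = np.random.default_rng(1)
    estimate = importance_sampling_estimate(f, 0.0, 1.0, beh_mean, beh_std, 5000, rng)
    print(beh_mean, beh_std, estimate)
```

For this toy integrand with a standard normal target, the variance-optimal proposal is itself Gaussian with mean 1.5 and variance 0.5, and the true expectation is exp(-9/4)/sqrt(2), roughly 0.0745, so the fitted parameters and the estimate can be checked against closed-form values.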