Recurrent Deep Reinforcement Learning for Chemotherapy Control under Partial Observability

arXiv:2605.02552v1

Chemotherapy dose optimization can be formulated as a dynamic treatment regime, requiring sequential decisions under uncertainty that must balance tumor suppression against toxicity. However, most reinforcement learning approaches assume full observability of the patient state, a condition rarely met in clinical practice. We investigate whether memory-augmented policies can improve chemotherapy control under partial observability.