Reinforcement Learning with LLM-Guided Action Spaces for Synthesizable Lead Optimization

arXiv:2604.07669v2 Announce Type: replace-cross Lead optimization in drug discovery requires improving therapeutic properties while ensuring that molecular modifications correspond to feasible synthetic routes. Existing approaches either prioritize property scores without enforcing synthesizability or rely on expensive enumeration over large reaction networks. Direct application of Large Language Models (LLMs) to molecular generation, meanwhile, frequently produces chemically invalid structures. We.