Reward Shaping and Action Masking for Compositional Tasks using Behavior Trees and LLMs

arXiv:2605.05795v1

Decomposing complex tasks into a sequence of simpler subtasks can improve learning efficiency for an autonomous agent. Reinforcement learning (RL) can be used to optimize agent policies to complete subtasks, but it requires well-defined subtask rewards and benefits from action masking. Recent work uses large language models (LLMs) to automate reward shaping and action masking; however, none of these approaches fully addresses reactivity to subtask failure and modularity with respect to varying objects in compositional tasks.
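Action masking restricts a policy to the actions that are valid for the current subtask. The sketch below is a generic illustration and does not reproduce this paper's method, which builds on behavior trees and LLMs. It assumes a discrete action space and a boolean mask for each subtask. The function name `masked_softmax` and the example mask are hypothetical. Excluding masked actions before normalization is equivalent to setting their logits to negative infinity, so disallowed actions get probability exactly zero.

```python
import math
from typing import Sequence


def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> list[float]:
    """Hypothetical helper: softmax over only the actions allowed by `mask`.

    Disallowed actions (mask[i] is False) receive probability 0.0, which is
    equivalent to setting their logits to -inf before a standard softmax.
    """
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be allowed")
    # Subtract the largest allowed logit for numerical stability.
    m = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - m) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical subtask in which only actions 0 and 2 are relevant.
probs = masked_softmax([1.0, 3.0, 1.0, 0.5], [True, False, True, False])
print(probs)  # [0.5, 0.0, 0.5, 0.0]
```

Action 1 has the highest raw logit, but the mask gives it zero probability. The agent therefore never explores actions that are irrelevant to the active subtask.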