Dynamic-TreeRPO: Breaking the Independent Trajectory Bottleneck with Structured Sampling

arXiv:2509.23352v3 Announce Type: replace-cross

The integration of reinforcement learning (RL) into flow matching models for text-to-image (T2I) generation has driven substantial advances in generation quality. However, these gains often come at the cost of exhaustive exploration and inefficient sampling, because the samples within a sampling group vary only slightly. Building on this insight, we propose Dynamic-TreeRPO, which implements the sliding-window sampling strategy as a tree-structured search with dynamic noise intensities along the tree depth.
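To make the idea of tree-structured sampling concrete, the following is a minimal toy sketch, not the paper's implementation. It assumes a fixed branching factor, a linearly decaying noise schedule over depth, and a stand-in `toy_step` function in place of a real flow matching sampler; all names and parameters here (`noise_scale`, `toy_step`, `tree_sample`, `sigma_max`, `sigma_min`) are hypothetical. It only illustrates how sibling branches can share a common prefix and how the noise intensity can depend on depth.

```python
import random


def noise_scale(depth, max_depth, sigma_max=1.0, sigma_min=0.2):
    """Hypothetical schedule: noise decays linearly from sigma_max at the
    root level to sigma_min at the deepest level (an assumption, not the
    schedule used in the paper)."""
    if max_depth <= 1:
        return sigma_max
    frac = depth / (max_depth - 1)
    return sigma_max + frac * (sigma_min - sigma_max)


def toy_step(state, sigma, rng):
    """Stand-in for one stochastic sampling step. A real sampler would
    apply the model's update here; this toy only adds Gaussian noise."""
    return [x + sigma * rng.gauss(0.0, 1.0) for x in state]


def tree_sample(root, max_depth, branching, rng):
    """Expand a tree level by level. Every node at a given depth spawns
    `branching` children, each perturbed with that depth's noise scale.
    Returns the leaf states and the number of sampling steps executed."""
    frontier = [root]
    steps = 0
    for depth in range(max_depth):
        sigma = noise_scale(depth, max_depth)
        next_frontier = []
        for state in frontier:
            for _ in range(branching):
                next_frontier.append(toy_step(state, sigma, rng))
                steps += 1
        frontier = next_frontier
    return frontier, steps


rng = random.Random(0)
leaves, tree_steps = tree_sample([0.0, 0.0], max_depth=3, branching=2, rng=rng)

# Sampling the same number of leaves as fully independent trajectories
# would run every step once per leaf, with no shared prefixes.
independent_steps = len(leaves) * 3
```

In this toy setting, a depth of 3 and a branching factor of 2 give 8 leaves using 14 sampling steps, whereas 8 independent trajectories of length 3 would take 24 steps. Because siblings share their prefix, they differ only from the branching point onward, which is one way a tree layout can structure the variation within a sampling group.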