A Comparative Study of Dynamic Programming and Reinforcement Learning in Finite Horizon Dynamic Pricing

arXiv:2604.14059v1 Announce Type: cross

This paper provides a systematic comparison between Fitted Dynamic Programming (DP), where demand is estimated from data, and Reinforcement Learning (RL) methods in finite-horizon dynamic pricing problems. We analyze their performance across environments of increasing structural complexity, ranging from a single-typology benchmark to multi-typology settings with heterogeneous demand and inter-temporal revenue constraints.
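
For readers unfamiliar with the DP side of the comparison, the following is a minimal sketch of backward induction for a single-product, finite-horizon pricing problem. It is not the paper's implementation. The logistic demand function `purchase_prob`, its parameters `a` and `b`, the price grid, and the one-sale-per-period assumption are all hypothetical choices made for illustration; in a fitted DP the demand function would instead be estimated from data.

```python
import math


def purchase_prob(price, a=2.0, b=0.5):
    """Hypothetical logistic demand: probability that one unit sells at `price`.

    In a fitted-DP setting this function would be replaced by a model
    estimated from observed sales data; a and b are made-up parameters.
    """
    return 1.0 / (1.0 + math.exp(-(a - b * price)))


def solve_pricing_dp(horizon, inventory, prices, demand=purchase_prob):
    """Backward induction over (period, remaining inventory).

    Assumes at most one unit can be sold per period (illustrative only).
    Returns:
      V[t][s]      expected revenue-to-go from period t with s units left
      policy[t][s] revenue-maximizing price in that state (None if s == 0)
    """
    # Terminal condition: unsold inventory has zero value after the horizon.
    V = [[0.0] * (inventory + 1) for _ in range(horizon + 1)]
    policy = [[None] * (inventory + 1) for _ in range(horizon)]

    for t in range(horizon - 1, -1, -1):
        for s in range(1, inventory + 1):
            best_value, best_price = float("-inf"), None
            for p in prices:
                q = demand(p)
                # Sale: earn p and move to s - 1 units; no sale: stay at s.
                value = q * (p + V[t + 1][s - 1]) + (1.0 - q) * V[t + 1][s]
                if value > best_value:
                    best_value, best_price = value, p
            V[t][s] = best_value
            policy[t][s] = best_price
    return V, policy


if __name__ == "__main__":
    price_grid = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    V, policy = solve_pricing_dp(horizon=10, inventory=3, prices=price_grid)
    print("Expected revenue from the initial state:", round(V[0][3], 3))
    print("First-period price with full inventory:", policy[0][3])
```

An RL method would target the same value function without access to `demand`, learning from sampled transitions rather than from an explicit expectation over the estimated demand model.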