PEER: Unified Process-Outcome Reinforcement Learning for Structured Empathetic Reasoning

arXiv:2508.09521v2

Emotional conversations require more than fluent responses. Supporters need to understand the seeker's situation and emotions, adopt an appropriate strategy, and respond in a natural, human-like manner. Despite advances in large language models, current systems often lack structured, psychology-informed reasoning. These systems are also hard to improve through reinforcement learning because reward signals are unreliable, and reinforcement fine-tuning can amplify repetitive response patterns.