P-Check: Advancing Personalized Reward Model via Learning to Generate Dynamic Checklist

arXiv:2601.02986v2 Announce Type: replace

Recent work on personalized reward modeling has primarily focused on leveraging user interaction history to align model judgments with individual preferences. However, existing approaches largely treat user context as a static or implicit conditioning signal, and so fail to capture the dynamic, multi-faceted nature of human judgment.