Information-Consistent Language Model Recommendations through Group Relative Policy Optimization

arXiv:2512.12858v2

Large Language Models (LLMs) are increasingly deployed in business-critical domains such as finance, education, healthcare, and customer support, where users expect consistent and reliable recommendations. Yet LLMs often produce different outputs when a prompt is rephrased in minor ways, even when the rephrasings are semantically equivalent. Such inconsistency undermines trust, complicates compliance, and disrupts the user experience.