Can Revealed Preferences Clarify LLM Alignment and Steering?
arXiv cs.LG
arXiv:2605.08556v1 Announce Type: new

LLMs are increasingly used to make high-stakes decisions under uncertainty, where alignment depends not only on factual accuracy but also on how models weigh tradeoffs between different outcomes. We present an empirical pipeline for estimating the implied preferences that an LLM's observed choices optimize. We elicit the model's probability distribution over the unknowns, along with the choice it would make for the decision task, and then fit a discrete choice model to recover the cost function that best rationalizes the model's decisions.
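The sketch below illustrates the second stage in the simplest case we could construct. It is a minimal example under stated assumptions, not the authors' pipeline. It assumes a binary decision (act or not act), a single unknown whose elicited probability of an adverse outcome is p, and a two-cost structure: acting when the outcome does not occur costs c_fp, and not acting when it does occur costs c_fn. A cost minimizer acts when (1 - p) * c_fp < p * c_fn, that is, when p > t with t = c_fp / (c_fp + c_fn). If choices are noisy in the logit sense, P(act | p) = sigmoid(k * (p - t)), so a logistic regression of the choice on p with intercept a and slope b recovers t = -a / b. The names `simulate_choices`, `fit_logit`, and `implied_costs` are hypothetical, and the synthetic records stand in for elicited LLM outputs.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_choices(n, t_true, k, seed=0):
    """Hypothetical stand-in for elicitation: returns (p, acted) pairs, where p is the
    stated probability of the adverse outcome and acted is the binary choice, drawn
    from a logit model with decision threshold t_true and sharpness k."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        p = rng.random()
        acted = 1 if rng.random() < sigmoid(k * (p - t_true)) else 0
        records.append((p, acted))
    return records


def fit_logit(records, iters=100, tol=1e-10):
    """Maximum-likelihood fit of P(act | p) = sigmoid(a + b * p) via Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for p, y in records:
            mu = sigmoid(a + b * p)
            w = mu * (1.0 - mu)
            # Gradient of the log-likelihood with respect to (a, b).
            ga += y - mu
            gb += (y - mu) * p
            # Entries of the negative Hessian (observed information).
            haa += w
            hab += w * p
            hbb += w * p * p
        det = haa * hbb - hab * hab
        # Newton step: solve [[haa, hab], [hab, hbb]] @ (da, db) = (ga, gb).
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def implied_costs(a, b):
    """Map logit coefficients to the implied threshold t = c_fp / (c_fp + c_fn)
    and the implied cost ratio c_fp / c_fn = t / (1 - t)."""
    t = -a / b
    return t, t / (1.0 - t)


records = simulate_choices(n=2000, t_true=0.3, k=15.0)
a, b = fit_logit(records)
t_hat, ratio_hat = implied_costs(a, b)
print(f"implied threshold {t_hat:.3f}, implied c_fp/c_fn {ratio_hat:.3f}")
```

Only the ratio of the two costs is identified here, since scaling both costs by the same constant leaves every choice unchanged; the slope b absorbs the noise scale k. Richer settings (more than two actions, several unknowns) would call for a multinomial choice model in place of this binary logit.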