Hedging and Non-Affirmation: Quantifying LLM Alignment on Questions of Human Rights

arXiv:2502.19463v2

Hedging and non-affirmation are behaviors exhibited by large language models (LLMs) that limit the clear endorsement of specific statements. While these behaviors are desirable in subjective contexts, they are undesirable in the context of human rights, which apply unambiguously to all groups. We present a systematic framework to measure these behaviors in unconstrained LLM responses regarding various identity groups. We evaluate six large