AI RESEARCH
Testing the Limits of Truth Directions in LLMs
arXiv CS.AI
arXiv:2604.03754v1 (cross-listed). Large language models (LLMs) have been shown to encode the truth of statements in their activation space along a linear truth direction. Previous studies have argued that these directions are universal in certain respects, while recent work has questioned this