Testing the Limits of Truth Directions in LLMs

arXiv:2604.03754v1 Announce Type: cross

Large language models (LLMs) have been shown to encode the truth of statements in their activation space along a linear truth direction. Previous studies have argued that these directions are universal in certain aspects, while recent work has questioned this
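The abstract does not say how a truth direction is estimated. A common approach, assumed here purely for illustration and not necessarily the method this paper uses, is a difference-of-means probe. It averages activations for true and for false statements, takes the normalized difference as the direction, and classifies a statement by which side of a midpoint threshold its projection falls on. The sketch below uses toy Gaussian vectors in place of real hidden states. Every name in it (`fit_truth_direction`, `synthetic_acts`, and so on) is hypothetical, as are the dimensions and noise levels.

```python
import math
import random

# Hypothetical setup: each "activation" is a list of floats standing in for a
# model hidden-state vector. Real activations would be read from an LLM; this
# sketch only uses synthetic data with a planted truth axis.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def fit_truth_direction(true_acts, false_acts):
    """Unit-norm difference-of-means direction plus a midpoint threshold."""
    mu_t = mean_vector(true_acts)
    mu_f = mean_vector(false_acts)
    diff = [a - b for a, b in zip(mu_t, mu_f)]
    norm = math.sqrt(sum(d * d for d in diff))
    direction = [d / norm for d in diff]
    # Threshold halfway between the projected class means.
    threshold = (dot(mu_t, direction) + dot(mu_f, direction)) / 2
    return direction, threshold

def probe_accuracy(true_acts, false_acts, direction, threshold):
    """Fraction of statements whose projection lands on the correct side."""
    hits = sum(dot(a, direction) > threshold for a in true_acts)
    hits += sum(dot(a, direction) <= threshold for a in false_acts)
    return hits / (len(true_acts) + len(false_acts))

def synthetic_acts(rng, n, dim, sign, truth_axis=0, shift=1.5, noise=0.3):
    """Toy activations: Gaussian noise plus sign * shift along one axis."""
    acts = []
    for _ in range(n):
        v = [rng.gauss(0.0, noise) for _ in range(dim)]
        v[truth_axis] += sign * shift
        acts.append(v)
    return acts

rng = random.Random(0)
DIM = 8
train_t = synthetic_acts(rng, 200, DIM, +1)
train_f = synthetic_acts(rng, 200, DIM, -1)
test_t = synthetic_acts(rng, 100, DIM, +1)
test_f = synthetic_acts(rng, 100, DIM, -1)

# Fit on one split, evaluate on a held-out split from the same toy distribution.
direction, threshold = fit_truth_direction(train_t, train_f)
acc = probe_accuracy(test_t, test_f, direction, threshold)
print(f"held-out accuracy: {acc:.2f}")
```

Because the toy data plants truth along a single axis, the recovered direction aligns with that axis and the probe separates held-out statements easily. The universality debate the abstract refers to concerns whether a direction fitted on one set of statements still works on different ones. In this sketch, that would correspond to evaluating on data drawn from a different distribution, which the example does not attempt.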