Traces of Social Competence in Large Language Models

arXiv:2603.04161v2

The False Belief Test (FBT) has been the main method for assessing Theory of Mind (ToM) and related socio-cognitive competencies. For Large Language Models (LLMs), the reliability and explanatory potential of this test have remained limited by issues such as data contamination, insufficient model details, and inconsistent controls. We address these issues by testing 17 open-weight models on a balanced set of 192 FBT variants (Trott, 2023), using Bayesian logistic regression to identify how model size and post.