Position: Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead

arXiv:2507.23009v2 Announce Type: replace-cross

Large Language Models (LLMs) have achieved remarkable results on a range of standardized tests originally designed to assess human cognitive and psychological traits, such as intelligence and personality. While these results are often interpreted as strong evidence of human-like characteristics in LLMs, this paper argues that such interpretations constitute an ontological error. Human psychological and educational tests are theory-driven measurement instruments, calibrated to a specific human population.