AI RESEARCH

Probing Ethical Framework Representations in Large Language Models: Structure, Entanglement, and Methodological Challenges

arXiv CS.CL

ArXi:2603.23659v1 Announce Type: new When large language models make ethical judgments, do their internal representations distinguish between normative frameworks, or collapse ethics into a single acceptability dimension? We probe hidden representations across five ethical frameworks (deontology, utilitarianism, virtue, justice, commonsense) in six LLMs spanning 4B--72B parameters.