AI RESEARCH
Probing Ethical Framework Representations in Large Language Models: Structure, Entanglement, and Methodological Challenges
arXiv CS.CL
•
ArXi:2603.23659v1 Announce Type: new When large language models make ethical judgments, do their internal representations distinguish between normative frameworks, or collapse ethics into a single acceptability dimension? We probe hidden representations across five ethical frameworks (deontology, utilitarianism, virtue, justice, commonsense) in six LLMs spanning 4B--72B parameters.