CounterMoral: Editing Morals in Language Models

arXiv:2603.27338v1 Announce Type: new Recent advancements in language model technology have significantly enhanced the ability to edit factual information. Yet, the modification of moral judgments, a crucial aspect of aligning models with human values, has garnered less attention. In this work, we