AI RESEARCH

Deliberative Alignment is Deep, but Uncertainty Remains: Inference time safety improvement in reasoning via attribution of unsafe behavior to base model

arXiv CS.AI

arXiv:2604.09665v1 Announce Type: cross While the wide adoption of refusal