AI RESEARCH
Deliberative Alignment is Deep, but Uncertainty Remains: Inference time safety improvement in reasoning via attribution of unsafe behavior to base model
arXiv cs.AI
arXiv:2604.09665v1 Announce Type: cross While the wide adoption of refusal