AI RESEARCH

Faithfulness vs. Safety: Evaluating LLM Behavior Under Counterfactual Medical Evidence

arXiv cs.CL

arXiv:2601.11886v2 Announce Type: replace

In high-stakes domains like medicine, it may be generally desirable for models to faithfully adhere to the context provided. But what happens if the context does not align with model priors or safety protocols? In this paper, we investigate how LLMs behave and reason when presented with counterfactual (or even adversarial) medical evidence.