Nvidia built a silent opinion engine into NemotronH to gaslight you, and they're not the only ones doing it

r/LocalLLaMA

Hey everyone, I found something weird while uncensoring Nvidia's NemotronH family this past week. For certain graphic categories, these models don't just refuse harmful prompts in the usual way. Nvidia trained in a completely separate behavior and touts it as a positive technological breakthrough: the model quietly rewrites what you asked for into the opposite. There is no disclosure and no refusal message, just content that is different from what you requested. The thinking trace makes it obvious.