Selectively reducing eval awareness and murder in Gemma 3 27B via steering (5 minute read)

TLDR AI

Google's Gemma 3 models, including the 27B variant, were steered to selectively reduce features that correlate with evaluation awareness and with the intent to murder.
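
For readers unfamiliar with steering, here is a minimal sketch of the general idea, not the method used in the linked work: a hidden-state vector is shifted along a direction associated with a feature, and a negative coefficient reduces how strongly that feature is expressed. The function names `steer` and `projection`, the toy two-dimensional vectors, and the coefficient are all assumptions made for illustration; a real setup would operate on the model's internal activations.

```python
import math


def _norm(vec):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(v * v for v in vec))


def projection(hidden, direction):
    """Scalar component of `hidden` along `direction` (hypothetical helper)."""
    n = _norm(direction)
    if n == 0:
        raise ValueError("direction must be non-zero")
    return sum(h * d for h, d in zip(hidden, direction)) / n


def steer(hidden, direction, alpha):
    """Shift `hidden` by `alpha` along the unit-normalized `direction`.

    A negative `alpha` reduces the feature, a positive one amplifies it.
    This is an illustrative assumption, not the authors' implementation.
    """
    n = _norm(direction)
    if n == 0:
        raise ValueError("direction must be non-zero")
    unit = [d / n for d in direction]
    return [h + alpha * u for h, u in zip(hidden, unit)]


# Toy example: remove exactly the component that lies along the direction.
hidden = [3.0, 4.0]
direction = [2.0, 0.0]
strength = projection(hidden, direction)
steered = steer(hidden, direction, -strength)
```

Subtracting the full projection, as in the toy example, removes the feature's component entirely. Smaller coefficients would only dampen it, which is closer in spirit to a selective reduction.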