Selectively reducing eval awareness and murder in Gemma 3 27B via steering (5 minute read)
TLDR AI
•
Open Source AI
AI Research
Google's Gemma 3 models, including the 27B variant, were steered to alter features correlating with evaluation awareness and the intent to murder.