AI RESEARCH
I tested 14 LLMs from 0.6B to 123B. All of them get worse at following instructions when users are hostile [R]
r/MachineLearning
•
TL;DR. Across 14 instruct-model configurations spanning Llama 3.1, Mistral, and Qwen3 from 0.6B to 123B, hostile user prompts produce a significant IFEval instruction-following degradation that replicates across architecture, quantization tier (FP16 vs Q4 MLX), routing (dense vs MoE), and scale. Mean hostility residual at 7-8B instruct class is 7.4pp (approximately 10% relative drop). Effect attenuates monotonically with scale but remains significant at every scale tested, including Mistral Large at 123B. Primary finding. At 7-8B instruct FP16, three independently developed.