AI RESEARCH

[P] Clip to Grok Update: Weight Norm Clipping now 39–249× | 6 Tasks (mod arithmetic, mixed ops, S5 permutation) | max_norm Measured Per Task

r/MachineLearning

Figure: Seed 0 results on mul mod 97, mixed add/sub/mul/div mod 97, and S5 permutation, with max_norm ablation.

Update to our previous post. We're two independent researchers. Since the last post we have expanded from modular multiplication to six algebraic tasks: four single modular arithmetic operations, their composition into a mixed task, and S5 permutation composition (non-abelian, 120 elements).

Method (unchanged): per-row ℓ₂ clipping on decoder weights after every optimizer step. No weight decay, no extra memory.
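For readers who want the clipping step spelled out, here is a minimal sketch in plain Python. It is not our training code. The helper name `clip_rows`, the list-of-lists weight layout, and the toy values are illustrative assumptions, and it assumes "clipping" means rescaling a row only when its ℓ₂ norm exceeds `max_norm`.

```python
import math

def clip_rows(weight, max_norm):
    """Rescale each row of `weight` in place so its L2 norm is at most `max_norm`.

    `weight` is a list of rows (lists of floats), a stand-in for a decoder
    weight matrix. Rows already inside the limit are left unchanged.
    """
    for row in weight:
        norm = math.sqrt(sum(x * x for x in row))
        if norm > max_norm:
            scale = max_norm / norm
            for j in range(len(row)):
                row[j] *= scale
    return weight

# Toy decoder matrix: the first row (norm 5.0) exceeds the limit,
# the second row (norm 0.5) does not.
W = [[3.0, 4.0], [0.3, 0.4]]
clip_rows(W, max_norm=1.0)
row_norms = [math.sqrt(sum(x * x for x in row)) for row in W]
```

In a real training loop the same operation would run once after each optimizer step, directly on the decoder weight tensor. It modifies the weights in place and keeps no per-parameter state, which is why it needs no extra memory.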