[R] Empirical evidence for a primitive layer in small language models — 18 experiments across 4 architectures
r/MachineLearning
We ran 18 experiments probing small language models (360M to 1B parameters) with inputs ranging from random phonemes to Wierzbicka's universal semantic primitives. The main finding: there is a consistent activation gap between what we term Layer 0a (scaffolding primitives: SOMEONE, TIME, PLACE) and Layer 0b (content primitives: FEAR, GRIEF, JOY, ANGER). The gap averaged +0.245 across the four tested architectures (Qwen 2.5, Gemma 3, LLaMA 3.2, SmolLM2) and had the same sign in every model.
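For anyone who wants to sanity-check this kind of claim on their own runs, here is a minimal sketch of how a per-model gap and the "directionally consistent" check could be computed. Everything in it is an assumption for illustration: the post does not specify the activation metric or the sign convention, so the sketch takes the gap as mean Layer 0b score minus mean Layer 0a score, and the function names (`activation_gap`, `summarize_gaps`), model labels, and numbers are hypothetical placeholders, not our measured values.

```python
from statistics import mean


def activation_gap(scaffold_scores, content_scores):
    """Gap for one model: mean content (0b) score minus mean scaffolding (0a) score.

    The sign convention is an assumption; the scores are whatever scalar
    activation measure you extracted per primitive.
    """
    return mean(content_scores) - mean(scaffold_scores)


def summarize_gaps(gaps_by_model):
    """Average the per-model gaps and check that they all share one sign."""
    gaps = list(gaps_by_model.values())
    same_sign = all(g > 0 for g in gaps) or all(g < 0 for g in gaps)
    return {"mean_gap": mean(gaps), "directionally_consistent": same_sign}


# Placeholder scores for two hypothetical models (NOT real results).
toy_scores = {
    "model_a": {"0a": [0.10, 0.20, 0.30], "0b": [0.40, 0.50, 0.60, 0.50]},
    "model_b": {"0a": [0.20, 0.20, 0.20], "0b": [0.30, 0.50, 0.40, 0.40]},
}

gaps = {
    name: activation_gap(scores["0a"], scores["0b"])
    for name, scores in toy_scores.items()
}
summary = summarize_gaps(gaps)
print(gaps)
print(summary)
```

Averaging per model first, then across models, keeps one architecture with more probes from dominating the headline number. A single mean also hides spread, so it is worth reporting the per-model gaps next to it.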