Why SSMs struggle in parameter-constrained training: empirical findings at 25M parameters [R]

r/MachineLearning

After ~3 weeks of experimentation in OpenAI's Parameter Golf competition, I wrote up why SSMs are structurally disadvantaged relative to transformers in a time- and size-constrained regime (10 min