Why SSMs struggle in parameter-constrained training: empirical findings at 25M parameters [R]
r/MachineLearning
After ~3 weeks of experimentation in OpenAI's Parameter Golf competition, I wrote up why SSMs are structurally disadvantaged relative to transformers in a time- and size-constrained regime (10 min