AI RESEARCH

[R] I trained a 3k parameter model on XOR sequences of length 20. It extrapolates perfectly to length 1,000,000. Here's why I think that's architecturally significant.

r/MachineLearning

I've been working on an alternative to attention-based sequence modeling that I'm calling Geometric Flow Networks