AI RESEARCH
[R] I trained a 3k-parameter model on XOR sequences of length 20. It extrapolates perfectly to length 1,000,000. Here's why I think that's architecturally significant.
r/MachineLearning
I've been working on an alternative to attention-based sequence modeling that I'm calling Geometric Flow Networks