GRU: The Simpler Successor to LSTM That Quietly Took Over Sequence Modeling
Towards AI
How GRU Simplified Memory Control While Preserving Long-Term Learning

LSTM solved one of the biggest failures in early recurrent neural networks: memory collapse over long sequences.

But after LSTM became successful, researchers noticed something uncomfortable. The architecture worked well. The architecture was also unnecessarily heavy.

Three gates. Two separate memory systems. Multiple matrix multiplications. A large parameter count. It solved the problem, but at a computational cost.
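To put a rough number on that cost, here is a minimal sketch that counts trainable parameters for a single recurrent layer. It assumes the standard formulations: an LSTM layer has four weight blocks (three gates plus the candidate cell update) and a GRU layer has three (two gates plus the candidate state), each with one bias vector. The function names and the layer sizes are illustrative, not taken from any library, and some frameworks (PyTorch, for example) keep two bias vectors per block, so their reported totals will be slightly higher.

```python
def gated_rnn_params(input_size, hidden_size, num_blocks):
    """Parameter count for one recurrent layer built from `num_blocks` weight blocks.

    Each block maps the concatenated [input, previous hidden state] to a
    hidden-sized vector, and is assumed to carry a single bias vector.
    """
    weights = hidden_size * (input_size + hidden_size)
    biases = hidden_size
    return num_blocks * (weights + biases)


def lstm_params(input_size, hidden_size):
    # Assumption: input gate, forget gate, output gate, candidate cell update.
    return gated_rnn_params(input_size, hidden_size, num_blocks=4)


def gru_params(input_size, hidden_size):
    # Assumption: update gate, reset gate, candidate hidden state.
    return gated_rnn_params(input_size, hidden_size, num_blocks=3)


# Arbitrary example sizes, chosen only for illustration.
input_size, hidden_size = 128, 256
lstm = lstm_params(input_size, hidden_size)
gru = gru_params(input_size, hidden_size)
print(f"LSTM parameters: {lstm}")
print(f"GRU parameters:  {gru}")
print(f"GRU / LSTM ratio: {gru / lstm:.2f}")
```

Under this counting convention the ratio is always 3/4, regardless of layer size: a GRU layer of the same width carries about a quarter fewer parameters than its LSTM counterpart.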