GRU: The Simpler Successor to LSTM That Quietly Took Over Sequence Modeling

Towards AI

How GRU Simplified Memory Control While Preserving Long-Term Learning

LSTM solved one of the biggest failures in early recurrent neural networks: memory collapse over long sequences, where vanishing gradients erase what the network learned from early inputs. But after LSTM became successful, researchers noticed something uncomfortable.

The architecture worked well. The architecture was also unnecessarily heavy. Three gates (input, forget, and output). Two separate memory states (the cell state and the hidden state). Multiple matrix multiplications at every time step. A large parameter count.

It solved the problem, but at a computational cost.
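To put a rough number on that cost, here is a minimal sketch that counts the trainable parameters in a single recurrent layer. It assumes the textbook formulation, in which each gate or candidate block has an input weight matrix, a recurrent weight matrix, and one bias vector. The function names and the example sizes (128 input features, 256 hidden units) are illustrative choices, not taken from any particular library.

```python
def gated_rnn_params(input_size: int, hidden_size: int, num_blocks: int) -> int:
    """Count parameters in one recurrent layer built from `num_blocks`
    gate/candidate blocks (assumed: one bias vector per block)."""
    per_block = (
        hidden_size * input_size      # input-to-hidden weights
        + hidden_size * hidden_size   # hidden-to-hidden weights
        + hidden_size                 # bias
    )
    return num_blocks * per_block


def lstm_params(input_size: int, hidden_size: int) -> int:
    # 3 gates (input, forget, output) + 1 candidate cell update = 4 blocks
    return gated_rnn_params(input_size, hidden_size, num_blocks=4)


def gru_params(input_size: int, hidden_size: int) -> int:
    # 2 gates (update, reset) + 1 candidate hidden state = 3 blocks
    return gated_rnn_params(input_size, hidden_size, num_blocks=3)


if __name__ == "__main__":
    d, h = 128, 256  # hypothetical layer sizes, chosen only for illustration
    lstm, gru = lstm_params(d, h), gru_params(d, h)
    print(f"LSTM parameters: {lstm:,}")
    print(f"GRU parameters:  {gru:,}")
    print(f"GRU / LSTM ratio: {gru / lstm:.2f}")
```

Under these assumptions the GRU layer always comes out at three quarters of the LSTM count, because it uses three weight blocks where LSTM uses four. Some frameworks store a second bias vector per block, so the totals they report can be slightly higher than this sketch suggests.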