Improving Deep Neural Learning Networks (Part 2): Optimization Algorithms

Gradient descent is just the starting point; the real question is how fast and how reliably you can reach a good minimum.

The series has 4 parts:

Part 1: Practical Aspects Improvements
Part 2: Optimization Algorithms
Part 3: Hyperparameter Tuning, Batch Normalization & Frameworks
Part 4: From Deep Neural Networks to LLMs and Agentic AI

Let’s get into Part 2!

1. Mini-batch Gradient Descent

Mini-batch Gradient Descent sits between two extremes: Batch Gradient Descent (full batch) and Stochastic Gradient Descent (SGD).

Batch Gradient Descent (full batch): uses the entire.