Convergence rates for gradient descent in the training of overparameterized artificial neural networks with piecewise affine activation

arXiv:2102.11840v2 Announce Type: replace In recent years, artificial neural networks have developed into a powerful tool for addressing a multitude of problems for which classical solution approaches reach their limits. However, it is still unclear why gradient descent optimization algorithms with random initialization, such as the well-known batch gradient descent, are able to achieve zero