Large-time asymptotics in deep learning

Esteve C., Geshkovski B., Pighin D., Zuazua E. Large-time asymptotics in deep learning (2021). hal-02912516

Abstract. It is by now well known that practical deep supervised learning may roughly be cast as an optimal control problem for a specific discrete-time, nonlinear dynamical system called an artificial neural network. In this work, we consider the continuous-time formulation of the deep supervised learning problem and study its behavior as the final time horizon increases, which in the neural network setting can be interpreted as increasing the number of layers.
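
In the continuous-time reading, this may be sketched as follows; the notation (x_i, w, b, sigma, T) is generic and chosen for illustration rather than taken verbatim from the paper. Each data point x_i^0 is propagated by a parametrized dynamical system,

\[
\dot{x}_i(t) = \sigma\big(w(t)\, x_i(t) + b(t)\big), \qquad t \in (0, T), \qquad x_i(0) = x_i^0,
\]

and the controls (w, b), playing the role of weights and biases, are chosen so that the final states x_i(T) match the corresponding labels; increasing the horizon T then corresponds to increasing the number of layers in the discrete network.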

When considering the classical regularized empirical risk minimization problem, we show that, in long time, the optimal states approach the zero training error regime, whilst the optimal control parameters approach, on an appropriate scale, minimal-norm parameters whose corresponding states lie precisely in the zero training error regime. This result provides an alternative theoretical underpinning to the notion that neural networks learn best in the overparametrized regime, here seen from the perspective of a large number of layers.
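
Schematically, and again with illustrative notation (P, loss, and lambda are our placeholders, not taken from the paper), the regularized empirical risk minimization problem in question takes the form

\[
\min_{(w,\, b)} \; \frac{1}{N} \sum_{i=1}^{N} \mathrm{loss}\big(P\, x_i(T),\, y_i\big) \;+\; \lambda \int_0^T \|(w(t), b(t))\|^2 \, dt,
\]

where P maps the final state to the label space and lambda > 0 weights the parameter norm; the statement above concerns the behavior of its minimizers as T tends to infinity.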

We also propose a learning problem consisting of minimizing a cost with a state tracking term, and establish the well-known turnpike property, which indicates that the solutions of the learning problem over long time intervals consist of three pieces: the first and last are transient short-time arcs, while the middle piece is a long-time arc staying exponentially close to the optimal solution of an associated static learning problem. This property in fact yields a quantitative estimate for the number of layers required to reach the zero training error regime.
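
A typical quantitative form of the turnpike property (stated here only as an illustration, with constants C and mu that are not taken from the paper) is the estimate

\[
\|x(t) - \bar{x}\| + \|u(t) - \bar{u}\| \;\le\; C \left( e^{-\mu t} + e^{-\mu (T - t)} \right), \qquad t \in [0, T],
\]

where (x-bar, u-bar) solves the associated static learning problem and C, mu > 0 are independent of T. Away from the two transient arcs the trajectory is thus exponentially close to the steady optimum, so a horizon, that is, a number of layers, of order log(1/epsilon) suffices to come within epsilon of the zero training error regime.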

Both of the aforementioned asymptotic regimes are addressed in the context of continuous-time and continuous space-time neural networks, the latter taking the form of nonlinear integro-differential equations, hence covering residual neural networks of both fixed and variable depth.
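
To make the layers-as-time analogy concrete, here is a minimal numerical sketch (ours, not from the paper; the ReLU activation, step size, and random weights are illustrative assumptions) showing that a residual network is a forward-Euler discretization of the continuous-time dynamics, so that depth plays the role of the horizon T:

    import numpy as np

    def relu(z):
        # Illustrative choice of the activation sigma.
        return np.maximum(z, 0.0)

    def resnet_forward(x0, weights, biases, T):
        # Forward-Euler discretization of x'(t) = sigma(W(t) x + b(t)):
        #   x_{k+1} = x_k + dt * sigma(W_k x_k + b_k).
        # At fixed step size dt, more layers (len(weights)) means a longer horizon T.
        dt = T / len(weights)
        x = x0
        for W, b in zip(weights, biases):
            x = x + dt * relu(W @ x + b)
        return x

    # Toy usage: 10 residual layers realize the horizon T = 10 with unit step.
    rng = np.random.default_rng(0)
    d, L, T = 4, 10, 10.0
    weights = [0.1 * rng.standard_normal((d, d)) for _ in range(L)]
    biases = [np.zeros(d) for _ in range(L)]
    print(resnet_forward(rng.standard_normal(d), weights, biases, T))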

Read Full Paper

arXiv:2008.02491

Tags: deep learning, Neural ODEs, optimal control, Residual Neural Networks, Supervised Learning, turnpike property
Last updated on March 17, 2022
