Interplay between depth and width for interpolation in neural ODEs

A. Alvarez-Lopez, A. Hadj Slimane, E. Zuazua. Interplay between depth and width for interpolation in neural ODEs (2024), M3AS.

Abstract. Neural ordinary differential equations (neural ODEs) have emerged as a natural tool for supervised learning from a control perspective, yet a complete understanding of their optimal architecture remains elusive. In this work, we examine the interplay between their width $p$ and number of layer transitions $L$ (effectively the depth $L+1$). Specifically, we assess the model expressivity in terms of its capacity to interpolate either a finite dataset $\mathcal{D}$ comprising $N$ pairs of points, or two probability measures in $\mathbb{R}^d$ within a Wasserstein error margin $\varepsilon>0$. Our findings reveal a balancing trade-off between $p$ and $L$, with $L$ scaling as $O(1+N/p)$ for dataset interpolation, and $L=O\left(1+(p\varepsilon^d)^{-1}\right)$ for measure interpolation. In the autonomous case, where $L=0$, a separate study is required, which we undertake focusing on dataset interpolation. We address the relaxed problem of $\varepsilon$-approximate controllability and establish an error decay of $\varepsilon \sim O\left(\log(p)\,p^{-1/d}\right)$. This decay rate is a consequence of applying a universal approximation theorem to a custom-built Lipschitz vector field that interpolates $\mathcal{D}$. In the high-dimensional setting, we further demonstrate that $p=O(N)$ neurons are likely sufficient to achieve exact control.
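
For context, a minimal sketch of the model class referred to above, assuming the control-theoretic neural ODE parametrization typically used in this line of work (the paper's exact formulation may differ):
$$
\dot{x}(t) \;=\; W(t)\,\sigma\bigl(A(t)\,x(t) + b(t)\bigr), \qquad t \in (0,T), \quad x(0) = x_0 \in \mathbb{R}^d,
$$
where $\sigma$ is an activation function applied componentwise, $W(t) \in \mathbb{R}^{d \times p}$, $A(t) \in \mathbb{R}^{p \times d}$, $b(t) \in \mathbb{R}^{p}$, and the controls $(W, A, b)$ are piecewise constant on $L+1$ subintervals of $(0,T)$, so that $L$ counts the layer transitions and $p$ the width. Under this reading, interpolating $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$ means choosing controls whose flow map $\Phi_T$ satisfies $\Phi_T(x_i) = y_i$ for every $i$, and the autonomous case $L=0$ corresponds to time-independent parameters.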

Read Full Paper