A. Alvarez-Lopez, A. Hadj Slimane, E. Zuazua. Interplay between depth and width for interpolation in neural ODEs (2024) M3AS
Abstract. Neural ordinary differential equations (neural ODEs) have emerged as a natural tool for supervised learning from a control perspective, yet a complete understanding of their optimal architecture remains elusive. In this work, we examine the interplay between their width p and the number of layer transitions L (effectively, a depth of L+1). Specifically, we assess the expressivity of the model in terms of its capacity to interpolate either a finite dataset \mathcal{D} comprising N pairs of points or two probability measures in \mathbb{R}^d within a Wasserstein error margin \varepsilon>0. Our findings reveal a balancing trade-off between p and L, with L scaling as O(1+N/p) for dataset interpolation and L=O\left(1+(p\varepsilon^d)^{-1}\right) for measure interpolation. In the autonomous case, where L=0, a separate study is required, which we undertake, focusing on dataset interpolation. We address the relaxed problem of \varepsilon-approximate controllability and establish an error decay of \varepsilon\sim O(\log(p)\,p^{-1/d}). This decay rate follows from applying a universal approximation theorem to a custom-built Lipschitz vector field that interpolates \mathcal{D}. In the high-dimensional setting, we further demonstrate that p=O(N) neurons are likely sufficient to achieve exact control.
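For orientation, the width p and the layer transitions L in the scaling laws above refer to a controlled neural ODE; a minimal sketch of that model, assuming the standard width-p parametrization used in this line of work (the exact form in the paper may differ), is

\dot{x}(t) = W(t)\,\sigma\big(A(t)\,x(t) + b(t)\big), \qquad t \in (0,T), \qquad x(0) = x_0 \in \mathbb{R}^d,

where \sigma is a componentwise activation (e.g. ReLU), and the controls W(t) \in \mathbb{R}^{d\times p}, A(t) \in \mathbb{R}^{p\times d}, b(t) \in \mathbb{R}^p are taken piecewise constant on L+1 subintervals of (0,T). Under this reading, p is the number of neurons of the vector field, L counts the switches between parameter values (the layer transitions), and L=0 is the autonomous, time-independent case studied separately in the abstract.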