Interplay between depth and width for interpolation in neural ODEs

A. Alvarez-Lopez, A. Hadj Slimane, E. Zuazua. Interplay between depth and width for interpolation in neural ODEs (2024), M3AS.

Abstract. Neural ordinary differential equations (neural ODEs) have emerged as a natural tool for supervised learning from a control perspective, yet a complete understanding of their optimal architecture remains elusive. In this work, we examine the interplay between their width $p$ and number of layer transitions $L$ (effectively the depth $L+1$). Specifically, we assess the model expressivity in terms of its capacity to interpolate either a finite dataset $\mathcal{D}$ comprising $N$ pairs of points, or two probability measures in $\mathbb{R}^d$ within a Wasserstein error margin $\varepsilon>0$. Our findings reveal a balancing trade-off between $p$ and $L$, with $L$ scaling as $O(1+N/p)$ for dataset interpolation, and $L=O\left(1+(p\varepsilon^d)^{-1}\right)$ for measure interpolation. In the autonomous case, where $L=0$, a separate study is required, which we undertake focusing on dataset interpolation. We address the relaxed problem of $\varepsilon$-approximate controllability and establish an error decay of $\varepsilon \sim O\left(\log(p)\,p^{-1/d}\right)$. This decay rate is a consequence of applying a universal approximation theorem to a custom-built Lipschitz vector field that interpolates $\mathcal{D}$. In the high-dimensional setting, we further demonstrate that $p=O(N)$ neurons are likely sufficient to achieve exact control.
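
For context, a minimal sketch of the model class referred to above, assuming the control-theoretic neural ODE parametrization typically used in this line of work (the paper's exact formulation may differ):
$$
\dot{x}(t) \;=\; W(t)\,\sigma\bigl(A(t)\,x(t) + b(t)\bigr), \qquad t \in (0,T), \quad x(0) = x_0 \in \mathbb{R}^d,
$$
where $\sigma$ is an activation function applied componentwise, $W(t) \in \mathbb{R}^{d \times p}$, $A(t) \in \mathbb{R}^{p \times d}$, $b(t) \in \mathbb{R}^{p}$, and the controls $(W, A, b)$ are piecewise constant on $L+1$ subintervals of $(0,T)$, so that $L$ counts the layer transitions and $p$ the width. Under this reading, interpolating $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$ means choosing controls whose flow map $\Phi_T$ satisfies $\Phi_T(x_i) = y_i$ for every $i$, and the autonomous case $L=0$ corresponds to time-independent parameters.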

Read Full Paper