Sparse approximation in learning via neural ODEs

Esteve C., Geshkovski B. Sparse approximation in learning via neural ODEs (2021)

Abstract. We consider the continuous-time, neural ordinary differential equation (neural ODE) perspective of deep supervised learning, and study the impact of the final time horizon $T$ in training. We focus on a cost consisting of an integral of the empirical risk over the time interval, and $L^1$-parameter regularization. Under homogeneity assumptions on the dynamics (typical for ReLU activations), we prove that any global minimizer is sparse, in the sense that there exists a positive stopping time $T^*$ beyond which the optimal parameters vanish. Moreover, under appropriate interpolation assumptions on the neural ODE, we provide quantitative estimates of the stopping time $T^*$, and of the training error of the trajectories at the stopping time. The latter stipulates a quantitative approximation property of neural ODE flows with sparse parameters. In practical terms, a shorter time-horizon in the training problem can be interpreted as considering a shallower residual neural network (ResNet), and since the optimal parameters are concentrated over a shorter time horizon, such a consideration may lower the computational cost of training without discarding relevant information.
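
As a rough illustration of the setting described in the abstract, the training problem can be sketched as follows. The notation here is assumed for illustration and is not taken from the paper: $\{(x_i^0, y_i)\}_{i=1}^n$ denotes the training data, $f$ the vector field of the neural ODE, $\theta$ the time-dependent parameters, $\mathrm{loss}$ a generic loss function, and $\lambda > 0$ a regularization weight.

$$
\dot{x}_i(t) = f\big(x_i(t), \theta(t)\big), \qquad x_i(0) = x_i^0, \qquad t \in (0, T),
$$

$$
J_T(\theta) = \int_0^T \frac{1}{n} \sum_{i=1}^n \mathrm{loss}\big(x_i(t), y_i\big)\, dt \;+\; \lambda \int_0^T \|\theta(t)\|_1 \, dt .
$$

In this notation, the sparsity statement reads: a global minimizer $\theta_T$ satisfies $\theta_T(t) = 0$ for almost every $t \in (T^*, T)$. The link to ResNets comes from the standard forward Euler discretization with step size $h > 0$ (again assumed notation), in which each time step plays the role of one layer:

$$
x_i^{k+1} = x_i^k + h\, f\big(x_i^k, \theta^k\big), \qquad k = 0, \dots, N-1, \qquad N h = T .
$$

Under this reading, parameters that vanish after $T^*$ correspond to layers beyond roughly $T^*/h$ that could be dropped, which is one way to interpret the abstract's remark about shallower networks.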

Last updated on March 17, 2022
