Abstract. We analyze Neural Ordinary Differential Equations (NODEs) from a control theoretical perspective to address some of the main properties and paradigms of Deep Learning (DL), in particular, data classification and universal approximation. These objectives are tackled and achieved from the perspective of the simultaneous control of systems of NODEs. For instance, in the context of classification, each item to be classified corresponds to a different initial datum for the control problem of the NODE, to be classified, all of them by the same common control, to the location (a subdomain of the euclidean space) associated to each label. Our proofs are genuinely nonlinear and constructive, allowing us to estimate the complexity of the control strategies we develop. The nonlinear nature of the activation functions governing the dynamics of NODEs under consideration plays a key role in our proofs, since it allows deforming half of the phase space while the other half remains invariant, a property that classical models in mechanics do not fulfill. This very property allows to build elementary controls inducing specific dynamics and transformations whose concatenation, along with properly chosen hyperplanes, allows achieving our goals in finitely many steps. The nonlinearity of the dynamics is assumed to be Lipschitz. Therefore, our results apply also in the particular case of the ReLU activation function. We also present the counterparts in the context of the control of neural transport equations, establishing a link between optimal transport and deep neural networks.