Wed.3 16:00–17:15 | H 0112 | PDE

Decomposition-Based Methods for Optimization of Time-Dependent Problems (2/2)

Chair: Matthias Heinkenschloss | Organizers: Carl Laird, Matthias Heinkenschloss
16:00

Nicolas Gauger

joint work with Stefanie Günther, Jacob B. Schroder, Lars Ruthotto, Eric Cyr

Layer-parallel training of deep residual neural networks

Residual neural networks (ResNets) are a promising class of deep neural networks that have shown excellent performance for a number of learning tasks, e.g., image classification and recognition. Mathematically, ResNet architectures can be interpreted as forward Euler discretizations of a nonlinear initial value problem whose time-dependent control variables represent the weights of the neural network. Hence, training a ResNet can be cast as an optimal control problem for the associated dynamical system. For similar time-dependent optimal control problems arising in engineering applications, parallel-in-time methods have shown notable improvements in scalability. In this talk, we demonstrate the use of these techniques for the efficient and effective training of ResNets. The proposed algorithms replace the classical (sequential) forward and backward propagation through the network layers with a parallel nonlinear multigrid iteration applied to the layer domain. This adds a new dimension of parallelism across layers that is attractive when training very deep networks. From this basic idea, we derive multiple layer-parallel methods. The most efficient version employs a simultaneous optimization approach in which updates to the network parameters are based on inexact gradient information in order to speed up the training process. Using numerical examples from supervised classification, we demonstrate that the new approach achieves training performance similar to that of traditional methods, but enables layer-parallelism and thus provides speedup over layer-serial methods through greater concurrency.
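The ODE viewpoint in the abstract can be made concrete with a minimal NumPy sketch: a ResNet forward pass written as forward Euler steps on dx/dt = σ(W(t)x + b(t)). The tanh activation, step size, and dimensions are illustrative assumptions, not details from the talk.

```python
import numpy as np

def resnet_forward(x, weights, biases, h=0.1):
    """Forward propagation through a ResNet, read as forward Euler
    steps on the ODE dx/dt = tanh(W(t) x + b(t)) with step size h.
    One residual layer corresponds to one Euler step."""
    for W, b in zip(weights, biases):
        x = x + h * np.tanh(W @ x + b)  # x_{k+1} = x_k + h * sigma(W_k x_k + b_k)
    return x

# toy example: 3 layers, state dimension 2 (arbitrary illustrative data)
rng = np.random.default_rng(0)
weights = [rng.standard_normal((2, 2)) for _ in range(3)]
biases = [rng.standard_normal(2) for _ in range(3)]
x0 = np.ones(2)
xN = resnet_forward(x0, weights, biases)
```

In this reading, the weights play the role of time-dependent controls, and the sequential loop over layers is exactly the serial time-stepping that the layer-parallel multigrid iteration replaces.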

16:25

Alexander Engelmann

joint work with Timm Faulwasser

Decomposition of optimal control problems using bi-level distributed ALADIN

We explore the decomposition of optimal control problems using a recently proposed decentralized variant of the Augmented Lagrangian Alternating Direction Inexact Newton (ALADIN) method. Specifically, we consider a bi-level distributed variant of ALADIN, in which the outer ALADIN structure is combined with a second, inner level of distribution that handles the coupling QP by means of decentralized ADMM or similar algorithms. We draw upon case studies from energy systems and from predictive control to illustrate the efficacy of the proposed framework.
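As a generic illustration of the kind of decentralized inner solver mentioned in the abstract (not the authors' implementation), here is consensus ADMM on a tiny separable quadratic problem; the problem data and penalty parameter are arbitrary assumptions.

```python
import numpy as np

def admm_consensus(a, rho=1.0, iters=100):
    """Consensus ADMM for min_x sum_i (x - a_i)^2 / 2.
    Each agent i keeps a local copy x_i; the consensus variable z
    plays the role of the shared coupling variable. Optimum: mean(a)."""
    n = len(a)
    x = np.zeros(n)   # local copies
    z = 0.0           # consensus (coupling) variable
    u = np.zeros(n)   # scaled dual variables
    for _ in range(iters):
        x = (a + rho * (z - u)) / (1.0 + rho)  # local solves (closed form)
        z = np.mean(x + u)                     # consensus update
        u = u + x - z                          # dual update
    return z

a = np.array([1.0, 2.0, 6.0])
z_star = admm_consensus(a)  # converges to mean(a) = 3.0
```

The local minimizations are independent and could run on separate agents; only the consensus and dual updates require communication, which is what makes such schemes attractive as the inner level of a distributed framework.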

16:50

Carl Laird

joint work with Michael Bynum, Bethany Nicholson, Jose Santiago Rodriguez, John Siirola, Victor Zavala

Schur-complement and ADMM approaches for Time-Domain Decomposition in Optimization with PyNumero

PyNumero is a Python package based on Pyomo that supports the development of numerical optimization algorithms. Using MPI, PyNumero has been used to develop decomposition algorithms for structured nonlinear programming problems, including problem-level decomposition approaches (e.g., ADMM) and approaches that decompose the linear algebra in the inner SQP step. In this presentation, we show the computational performance of Schur-complement-based and ADMM approaches with time-domain decomposition for dynamic optimization problems.
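The linear-algebra decomposition mentioned above can be sketched in plain NumPy (this is not PyNumero code; the block sizes and data are arbitrary assumptions): a KKT-like system whose leading block is block diagonal, one block per time window, is solved by eliminating the window variables through a Schur complement on the coupling variables.

```python
import numpy as np

# Solve the structured system
#   [A  B] [x]   [f]
#   [B' C] [y] = [g]
# where A is block diagonal (one block per time window) and y couples
# the windows. Eliminating x yields the Schur complement system
#   (C - B' A^{-1} B) y = g - B' A^{-1} f,
# and the solves with A decouple across windows (hence parallelize).

rng = np.random.default_rng(1)
blocks = [np.eye(3) + 0.1 * rng.standard_normal((3, 3)) for _ in range(2)]
A = np.block([[blocks[0], np.zeros((3, 3))],
              [np.zeros((3, 3)), blocks[1]]])
B = rng.standard_normal((6, 2))
C = 2.0 * np.eye(2)
f = rng.standard_normal(6)
g = rng.standard_normal(2)

Ainv_B = np.linalg.solve(A, B)   # in practice: independent per-window solves
Ainv_f = np.linalg.solve(A, f)
S = C - B.T @ Ainv_B             # Schur complement (small coupling system)
y = np.linalg.solve(S, g - B.T @ Ainv_f)
x = Ainv_f - Ainv_B @ y          # back-substitute the window variables

# verify against the monolithic solve
full = np.block([[A, B], [B.T, C]])
xy = np.linalg.solve(full, np.concatenate([f, g]))
```

The Schur complement system is small (one row per coupling variable), so the dominant cost is the embarrassingly parallel per-window factorizations, which is the source of the speedups such approaches target.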