# Approximate Methods of Optimal Control via Dynamic Programming Models

Time: Tue 2023-03-21 15.00

Language: English

Subject area: Electrical Engineering

Doctoral student: Yuchao Li , Reglerteknik

Opponent: Professor Moritz Diehl, Department of Microsystems Engineering and Department of Mathematics, University of Freiburg

Supervisor: Professor Jonas Mårtensson, Integrated Transport Research Lab, ITRL, Reglerteknik; Professor Karl H. Johansson, Reglerteknik

QC 20230227

## Abstract

Optimal control theory has a long history and broad applications. Motivated by the goal of obtaining insights through unification and taking advantage of the abundant capability to generate data and perform online simulation, this thesis studies the discrete-time infinite horizon optimal control problems and introduces some approximate solution methods via abstract dynamic programming (DP) models. The proposed methods involve approximation in value space through the use of data and simulator, apply to a broad class of problems, and strike a good balance between satisfactory performance and computational expenditure.

First, we consider deterministic problems with nonnegative stage costs. We derive sufficient conditions under which a local controllability condition holds for the constrained nonlinear systems, and apply the results to establish the convergence of the classical algorithms, including value iteration, policy iteration (PI), and optimistic PI. These results provide a starting point for the design of suboptimal schemes. Then we propose algorithms that take advantage of system trajectory or the presence of parallel computing units to approximate the optimal costs. These algorithms can be viewed as variants of model predictive control (MPC) or rollout, and can be applied to deterministic problems with arbitrary state and control spaces, and arbitrary dynamics. It admits extensions to problems with trajectory constraints, and a multiagent structure. Via the viewpoint provided by the abstract DP models, we also derive the performance bounds of MPC applied to unconstrained and constrained linear quadratic problems, as well as their nonlinear counterparts. These insights suggest new designs of MPC, which likely lead to larger feasible regions of the scheme while costing hardly any loss of performance measured by the costs accumulated over infinite stages.

Moreover, we derive algorithms to address problems with a fixed discount factor on future costs. We apply abstract DP models to analyze $\lambda$-PI with randomization algorithms for problems with infinite policies. We show that a contraction property induced by the discount factor is sufficient for the well-posedness of the algorithm. Moreover, we identify the conditions under which the algorithm is convergent with probability one. Guided by the analysis, we exemplify a data-driven approximate implementation of the algorithm for the approximation of the optimal costs of constrained linear and nonlinear control problems. The obtained optimal cost approximations are applied in a related suboptimal scheme. Then we consider discounted problems with discrete state and control spaces and a multiagent structure. When applying rollout to address the problem, the main challenge is to perform minimization over a large control space. To this end, we propose a rollout variant that involves reshuffling the order of the agents. The approximation of the costs of base policies is through the use of on-line simulation. The proposed approach is applied to address multiagent path planning problems within a warehouse context, where through on-line replanning, the robots can adapt to a changing environment while avoiding collision with each other.