# Learning Sequential Decision Rules in Control Design: Regret-Optimal and Risk-Coherent Methods

**Time:**
Wed 2021-06-09 16.00

**Location:**
zoom link for online defense (English)

**Doctoral student:**
Matias I. Müller, Reglerteknik

**Opponent:**
Professor Tom Oomen, Eindhoven University of Technology, Department of Mechanical Engineering

**Supervisor:**
Associate Professor Cristian R. Rojas, Reglerteknik; Professor Håkan Hjalmarsson, Reglerteknik

## Abstract

Engineering sciences deal with the problem of optimal design in the face of uncertainty. In particular, control engineering is concerned with designing policies, laws, or algorithms that make decisions sequentially from unreliable data. This thesis addresses two instances of optimal sequential decision making, each arising in a different problem.

The first problem is the estimation of the H_{∞}-norm (or, in general, the ℓ_{2}-gain for nonlinear systems), a fundamental quantity in control design through, e.g., the small gain theorem. Given an unknown system, the goal is to find its maximum ℓ_{2}-gain which, in a model-free approach, involves solving a sequential input design problem. The H_{∞}-norm estimation problem (or simply "gain estimation problem") is cast as the composition of a multi-armed bandit problem that generates data and an optimal estimation problem given that data. Generating data is a sequential input-design problem in which, at every round, the decision-maker chooses one (or many) frequencies at which to sample the unknown frequency response of the system under study. We show that Thompson Sampling (TS), a classical bandit algorithm, is optimal within the class of algorithms that choose only one frequency per round. Additionally, we introduce Weighted Thompson Sampling (WTS), a TS-based algorithm that can sample many frequencies at every round, and we prove that WTS is an optimal bandit policy within the class of algorithms that can sample many frequencies simultaneously. We also discuss the problem of estimating the H_{∞}-norm of the system from the data provided by the bandit algorithm. In particular, we show that, for a proposed estimator and for every bandit policy in a wide class of algorithms, the expected estimation error of the gain of the system asymptotically matches the Cramér-Rao lower bound.
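As a rough illustration of the single-frequency setting described above, the following is a minimal sketch of Gaussian Thompson Sampling over a frequency grid. It is not the algorithm analysed in the thesis: the function name `thompson_gain_estimate`, the additive Gaussian noise model with known standard deviation, the flat prior, and the toy first-order system are all assumptions made for this example.

```python
import numpy as np

def thompson_gain_estimate(true_gains, noise_std=0.5, rounds=2000, seed=0):
    """Toy Gaussian Thompson Sampling over a frequency grid (illustrative only).

    true_gains[k] is the gain at frequency k, unknown to the learner.
    Each round one frequency is probed and a noisy gain reading is observed.
    """
    rng = np.random.default_rng(seed)
    true_gains = np.asarray(true_gains, dtype=float)
    n_arms = true_gains.size
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)

    # Probe every frequency once so that each posterior is well defined.
    for k in range(n_arms):
        sums[k] += true_gains[k] + noise_std * rng.standard_normal()
        counts[k] += 1

    for _ in range(rounds - n_arms):
        means = sums / counts
        # Draw one sample per frequency from its posterior
        # (assumed: flat prior, known noise variance).
        samples = means + noise_std / np.sqrt(counts) * rng.standard_normal(n_arms)
        k = int(np.argmax(samples))  # probe the frequency that looks best
        sums[k] += true_gains[k] + noise_std * rng.standard_normal()
        counts[k] += 1

    means = sums / counts
    best = int(np.argmax(counts))  # most-probed frequency
    return means[best], best, counts

# Hypothetical system G(z) = 1 / (1 - 0.8 z^{-1}), sampled at 8 frequencies.
omegas = np.linspace(0.0, np.pi, 8)
gains = np.abs(1.0 / (1.0 - 0.8 * np.exp(-1j * omegas)))
estimate, best, counts = thompson_gain_estimate(gains)
print(f"estimated peak gain {estimate:.2f} at omega = {omegas[best]:.2f}")
```

In this toy setup the peak gain sits at the lowest frequency, and the sampling effort should concentrate there as rounds accumulate, which is the behaviour a bandit formulation of gain estimation relies on.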

In the second part, we address the problem of risk-coherent optimal control design for disturbance rejection under uncertainty, where optimality is studied in both an H_{2} and an H_{∞} sense. We consider a parametric model for the plant and the noise spectrum, where the modeling error between the model and the real system is uncertain. This uncertainty is condensed in a probability density function over the different realizations of the parameters defining the model. We use this information to design a controller that minimizes the risk of poor closed-loop performance, within a framework drawn from the financial theory of risk. When the parameters in the plant are not known with sufficient accuracy for control purposes, we introduce a framework that allows us to tackle the joint-stabilization problem by means of sequential convex relaxations, each of them leading to a semi-definite program. When instead the noise spectrum is uncertain, we propose a systematic scenario approach for designing H_{2}- and H_{∞}-optimal controllers in terms of quadratically-constrained linear programs and sequential semi-definite programming, respectively. Simulations show that, from a risk-theoretical perspective, exploiting the information encoded in the probability density function of the model parameters better balances the risk of poor closed-loop performance.
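To make the risk-based design idea concrete, here is a small scenario-style sketch. It assumes Conditional Value-at-Risk (CVaR), a standard coherent risk measure, as the risk functional; the abstract does not say which measure the thesis uses. The scalar plant, the skewed parameter density, and the names `cvar` and `closed_loop_cost` are likewise hypothetical choices for illustration, and a grid search stands in for the convex programs mentioned above.

```python
import numpy as np

def cvar(costs, alpha=0.9):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of the costs."""
    costs = np.sort(np.asarray(costs, dtype=float))
    n_tail = max(1, int(np.ceil((1.0 - alpha) * costs.size - 1e-9)))
    return costs[-n_tail:].mean()

def closed_loop_cost(a, k):
    """Stationary state variance of x+ = (a - k) x + w with unit-variance w.

    Returns infinity for scenarios in which the closed loop is unstable.
    """
    pole = np.asarray(a, dtype=float) - k
    return np.where(np.abs(pole) < 1.0, 1.0 / (1.0 - pole**2), np.inf)

rng = np.random.default_rng(0)
# Hypothetical skewed density for the uncertain pole: a = 0.6 + 0.8 * Beta(2, 5).
a_scenarios = 0.6 + 0.8 * rng.beta(2.0, 5.0, size=2000)
k_grid = np.linspace(0.0, 1.5, 151)

# Risk of each candidate static gain, evaluated over all sampled scenarios.
risk = np.array([cvar(closed_loop_cost(a_scenarios, k)) for k in k_grid])
k_risk = k_grid[np.argmin(risk)]
k_nominal = a_scenarios.mean()  # certainty-equivalent gain, for comparison

print(f"CVaR-optimal gain: {k_risk:.2f}, nominal gain: {k_nominal:.2f}")
```

The point of the sketch is only that the controller is chosen by looking at the tail of the cost distribution induced by the parameter density, rather than at a single nominal model.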