Efficient Exploration and Robustness in Controlled Dynamical Systems
Time: Wed 2023-09-13 14.00
Location: Kollegiesalen, Brinellvägen 8, Stockholm
Video link: https://kth-se.zoom.us/j/65648025970
Subject area: Electrical Engineering Computer Science Mathematical Statistics
Doctoral student: Alessio Russo , Reglerteknik, Statistical Learning for Control
Opponent: Associate Professor Marcello Restelli, Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano; Assistant Professor Nikolai Matni, Department of Electrical and Systems Engineering, University of Pennsylvania; Associate Professor Jana Tumova, Robotik, perception och lärande, RPL; Associate Professor Mikael Asplund, Department of Computer and Information Science, Linköping University
Supervisor: Professor Alexandre Proutiere, Reglerteknik; Professor Henrik Sandberg, ACCESS Linnaeus Centre, Reglerteknik
In this thesis, we explore two distinct topics. The first part of the thesis delves into efficient exploration in multi-task bandit models and model-free exploration in large Markov decision processes (MDPs). This section is driven by the recent research community's interest in uncovering optimal methods to navigate complex decision-making problems. In the second part, we examine attack vectors against MDPs and dynamical systems, which is motivated by the research community's recent push to enhance the safety of Machine Learning models against malicious attacks. Additionally, we explore how to quantify uncertainty in an off-policy evaluation problem, reflecting our ongoing efforts to deepen understanding in this domain.
In the first part of the thesis, we present an investigation into the sample complexity of identifying the optimal arm in multi-task bandit problems. In this setting, an agent can select a task and efficiently learn through cross-task knowledge transfer. We derive instance-specific lower bounds that any probably approximately correct (PAC) algorithm should satisfy, revealing both theoretically and empirically significant improvements in scaling over previous methods. Subsequently, we investigate model-free exploration in Reinforcement Learning (RL) problems. By leveraging an information-theoretical viewpoint, we focus on the instance-specific lower bound for the number of samples needed to identify a nearly-optimal policy. We develop an approximation of this lower bound involving quantities that can be inferred using model-free approaches. This leads to a new model-free exploration strategy applicable to continuous MDPs.
In the second part of the thesis, we begin by investigating two types of attacks on MDPs: those that alter the observations and those that manipulate the control signals of the victim. We present strategies for designing optimal attacks to minimize the collected reward of the victim. Our strategies show how uncertainties induced by the attack can be modeled using a partially observable MDP (POMDP) framework. We also illustrate how to devise attacks that achieve lower detectability, approaching the problem from a statistical detection perspective. Next, we explore the problem of an eavesdropper aiming to detect changes in an MDP. Approaching this problem from a statistical detection perspective, we introduce a novel definition of online privacy based on the average amount of information per observation of the underlying stochastic system. We derive privacy upper bounds and calculate policies that attain higher privacy levels, supplementing our analysis with examples and numerical simulations. Finally, we present a new off-policy estimation method based on Conformal Prediction that outputs an interval containing the target policy's true reward, demonstrating how to handle the distribution shift between target and behavior policies, and maintain the certainty level while reducing the interval length.Next, we shift our focus onto linear dynamical systems. We study poisoning attacks on data-driven control methods, focusing on how slight changes in the dataset induced by an adversary can significantly deteriorate control methods and potentially destabilize the system. We propose a novel algorithm for computing effective attacks, providing a theoretical basis for our strategy. We also study the detectability of poisoning attacks, focusing on the impact of data poisoning on least-squares estimates. We establish conditions under which the set of models compatible with the data includes the true model of the system, and we analyze different poisoning strategies for the attacker. On the basis of the arguments presented herein, we propose a stealthy data poisoning attack on the least-squares estimator that can evade classical statistical tests.