Multi-Agent Learning Under Spatio-Temporal Constraints in Coordinated Communication Networks

Time: Thu 2026-05-07 14.00

Location: F3 (Flodis), Lindstedtsvägen 26

Video link: https://kth-se.zoom.us/s/67709142389

Language: English

Doctoral student: Albin Larsson Forsberg, Robotics, Perception and Learning

Opponent: Sindri Magnússon

Supervisor: Jana Tumova, Robotics, Perception and Learning; Alexandros Nikou; Aneta Vulgarakis Feljan

QC 20260410

Abstract

Modern cellular networks have grown more complex over the years, transitioning from sparse macro-cell deployments to ultra-dense, heterogeneous systems. In this thesis we consider, in particular, a radio resource management (RRM) problem called remote electrical tilt (RET). The objective in RET optimization is to tune antenna tilt parameters in the network so that radio resources are allocated where they are most needed. As cellular networks evolve toward 6G, we expect an unprecedented increase in the need for autonomous decision making in the networks, introducing new coordination challenges exacerbated by the denser networks. Traditional network management relies on manual engineering and rule-based heuristics; because it scales poorly, it is insufficient for the needs of the next generation. While Multi-Agent Reinforcement Learning appears to be a promising tool for autonomously adapting the network, currently deployed solutions often struggle with the large scale of the problem. Additionally, they fail to provide formal guarantees and remain limited by myopic, step-wise reward structures that cannot capture the complex constraints that communication service providers (CSPs) may impose on the network. The lack of these attributes holds back deployment in live networks beyond small-scale pilot studies.

This thesis proposes a series of approaches that aim to provide high-assurance autonomous network parameter control. The contributions progressively build on each other from spatial interference coordination to long-horizon, risk-aware planning to satisfy CSP network intents. First, we address the myopic constraints by leveraging graph-based decomposition and coordination graphs to factorize the joint action space, enabling scalable constrained learning in dense urban environments. Recognizing that critical infrastructure demands reliability beyond mean performance, we also introduce a risk-aware constrained learning framework utilizing Conditional Value-at-Risk to provide probabilistic reasoning over constraints in the network.

To bridge the gap between low-level control and high-level CSP intents, we transition from scalar rewards to formal specifications. We utilize Signal Temporal Logic (STL) and transformer-based architectures to satisfy complex intents, enabling agents to reason over long-horizon requirements. Finally, we move beyond traditional control policies toward generative planning of trajectory rollouts. By using diffusion probabilistic models, we aim to enable the generation of safe, high-quality plans that respect hard constraints with probabilistic guarantees.

The proposed methods are evaluated on high-fidelity simulators modeled after real-world urban topologies. The results demonstrate that by integrating structural coordination, formal logic, and generative modeling, it is possible to address many of the issues that plague contemporary autonomous network management. The policies obtained with these approaches are not only high-performing but also interpretable, safe, and aligned with the rigorous demands of next-generation telecommunications infrastructure.