Towards Efficient Distributed Intelligence
Cost-Aware Sensing and Offloading for Inference at the Edge
Time: Fri 2026-01-16 10:00
Location: Salongen, Osquars backe 31, Stockholm
Video link: https://kth-se.zoom.us/s/61617488895
Language: English
Subject area: Electrical Engineering
Doctoral student: Vishnu Narayanan Moothedath, Teknisk informationsvetenskap
Opponent: Professor Ganesh Ayalvadi, University of Bristol, Bristol, United Kingdom
Supervisor: Professor James Gross, Teknisk informationsvetenskap; Professor György Dán, Nätverk och systemteknik
Abstract
The ongoing proliferation of intelligent systems, driven by artificial intelligence (AI) and 6G, is leading to a surge in closed-loop inference tasks performed on distributed compute nodes. These systems operate under strict latency and energy constraints, extending the challenge beyond achieving high accuracy to enabling timely and energy-efficient inference. This thesis examines how distributed inference can be optimised through two key decisions: when to sample the environment and when to offload computation to a more accurate remote model. These decisions are guided by the semantics of the underlying environment and its associated costs. The semantics are kept abstract, and pre-trained inference models are employed, ensuring a platform-independent formulation adaptable to the rapid evolution of distributed intelligence and wireless technologies.
Regarding sampling, we studied the trade-off between sampling cost and detection delay in event-detection systems without sufficient local inference capabilities. The problem was posed as an optimisation over sampling instants under a stochastic event sequence and analysed at different levels of modelling complexity, ranging from periodic to aperiodic sampling. Closed-form, algorithmic, and approximate solutions were developed, with some results of independent mathematical interest. Simulations in realistic settings showed marked gains in efficiency over systems that neglect event semantics. In particular, aperiodic sampling achieved a stable improvement of ~10% over optimised periodic policies across parameter variations.
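To make the periodic case concrete, the following minimal Python sketch works through a toy version of the trade-off. It assumes Poisson event arrivals at rate lam, a fixed cost c per sample, and a linear delay penalty d; these assumptions and all numerical values are illustrative, not the thesis's general model. Under them, the long-run cost rate is J(D) = c/D + lam*d*D/2, minimised by the closed-form period D* = sqrt(2c/(lam*d)), which the sketch checks against a grid search.

import math

def cost_rate(D, c, lam, d):
    """Expected cost per unit time: sampling cost plus expected delay penalty."""
    return c / D + lam * d * D / 2.0

def optimal_period(c, lam, d):
    """Closed-form minimiser of the toy cost rate above (illustrative only)."""
    return math.sqrt(2.0 * c / (lam * d))

if __name__ == "__main__":
    c, lam, d = 1.0, 0.2, 5.0  # sampling cost, event rate, delay penalty (made-up values)
    D_star = optimal_period(c, lam, d)
    print(f"closed-form period D* = {D_star:.3f}, cost rate = {cost_rate(D_star, c, lam, d):.3f}")
    # Sanity check: coarse grid search over candidate periods.
    grid = [0.05 * k for k in range(1, 400)]
    D_grid = min(grid, key=lambda D: cost_rate(D, c, lam, d))
    print(f"grid-search period = {D_grid:.3f}")

Even this optimally tuned period ignores the semantics of individual events, which is the gap the thesis's aperiodic policies exploit.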
Regarding offloading, we introduced a novel Hierarchical Inference (HI) framework, which makes sequential offload decisions between a low-latency, energy-efficient local model and a high-accuracy remote model using locally available confidence measures. We proposed HI algorithms based on thresholds and ambiguity regions learned online by suitably extending Prediction with Expert Advice (PEA) approaches to continuous expert spaces and partial feedback. HI algorithms minimise the expected cost across inference rounds, combining offloading and misclassification costs, and are shown to achieve a uniformly sublinear regret of O(T^{2/3}). The proposed algorithms are agnostic to model architecture and communication systems, do not alter model training, and support model updates during operation. Benchmarks on standard classification tasks using the softmax output as a confidence measure showed that HI adaptively distributes inference based on offloading costs, achieving results close to the offline optimum. HI is also shown to add resilience to distribution changes and model mismatches, especially when asymmetric misclassification costs are present.
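The threshold rule at the core of HI can likewise be sketched in a few lines. The snippet below is a hedged illustration, not the thesis's algorithm: a grid of candidate thresholds stands in for the continuous expert space, the update is basic exponential weighting with full feedback (the thesis handles partial feedback), and the confidence measure, costs, and synthetic data are placeholders.

import math
import random

random.seed(0)
THRESHOLDS = [k / 20 for k in range(1, 20)]  # candidate offload thresholds (the "experts")
weights = [1.0] * len(THRESHOLDS)
ETA = 0.5                                    # learning rate (illustrative)
C_OFFLOAD, C_MISS = 0.2, 1.0                 # offload and misclassification costs

def round_cost(theta, conf, local_correct):
    """Cost of one inference round under threshold theta."""
    if conf >= theta:                        # confident enough: trust the local model
        return 0.0 if local_correct else C_MISS
    return C_OFFLOAD                         # otherwise offload to the remote model

total = 0.0
for t in range(1000):
    # Toy environment: local confidence is uniform and calibrated,
    # i.e. the local model is correct with probability equal to conf.
    conf = random.random()
    local_correct = random.random() < conf
    # Sample a threshold in proportion to its weight and act with it.
    theta = random.choices(THRESHOLDS, weights=weights)[0]
    total += round_cost(theta, conf, local_correct)
    # Full-information exponential-weights update for every candidate threshold.
    weights = [w * math.exp(-ETA * round_cost(th, conf, local_correct))
               for w, th in zip(weights, THRESHOLDS)]
    s = sum(weights)
    weights = [w / s for w in weights]       # normalise to avoid numerical underflow

print(f"average cost per round: {total / 1000:.3f}")

In this toy setting the weights concentrate near the threshold 1 - C_OFFLOAD/C_MISS, so cheaper offloading pushes more rounds to the remote model, mirroring the adaptive, cost-driven distribution of inference described above.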
In summary, this thesis presents efficient approaches for the sampling and offloading of inference tasks, where various performance metrics are combined into a single cost structure. The work extends beyond conventional inference problems to areas with similar trade-offs, advancing toward efficient distributed intelligence that infers at the right time and in the right place. Future work includes conceptual extensions such as joint sampling-offloading design, and integration with collaborative model-training architectures.