Embedded Machine Learning
Reliability and Performance Enhancement
Time: Wed 2026-06-03 14.00
Location: Kollegiesalen, Brinellvägen 8, Royal Institute of Technology
Video link: https://kth-se.zoom.us/j/63924695817
Language: English
Subject area: Information and Communication Technology
Doctoral student: Yizhi Chen, Elektronik och inbyggda system
Opponent: Associate Professor Luca Pezzarossa, Danmarks Tekniske Universitet - DTU
Supervisors: Ahmed Hemani, Elektronik och inbyggda system; Artur Podobas
QC 20260512
Abstract
Deploying machine learning (ML) on resource-constrained embedded systems presents significant challenges in reliability as well as in computational, memory, and power efficiency. This thesis describes our research contributions in two main themes: reliability and performance optimization for embedded ML systems.
Regarding reliability, we developed an online image sensor fault detection method for autonomous vehicles that uses historical variance comparison to identify defective pixels without interrupting camera operation. We further investigated the impact of various image sensor faults on pruned neural networks for object detection, examining both spatial faults (blur, darkness, and speckle noise) and temporal faults arising from sensor aging. Our work thus addresses reliability from a system perspective, covering both sensor-level fault detection and network-level fault tolerance for embedded AI applications.
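As a rough illustration of variance-based defect detection, the sketch below compares each pixel's temporal variance over a recent window of ordinary frames with that same pixel's variance in earlier frames. A stuck pixel stops varying, so its recent variance collapses; a noisy pixel's variance explodes. The per-pixel comparison, the two threshold ratios, and all function names are hypothetical choices for illustration, not the method developed in the thesis. Because the sketch only reads frames the camera already produces, it needs no calibration pause.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple

Frame = List[List[float]]  # one grayscale frame, indexed [row][col]


def pixel_variance(frames: Sequence[Frame], r: int, c: int) -> float:
    """Temporal (population) variance of pixel (r, c) across the given frames."""
    return pvariance([frame[r][c] for frame in frames])


def detect_defective_pixels(
    history: Sequence[Frame],
    recent: Sequence[Frame],
    low_ratio: float = 0.1,    # hypothetical: below this fraction -> stuck pixel
    high_ratio: float = 10.0,  # hypothetical: above this multiple -> noisy pixel
) -> List[Tuple[int, int]]:
    """Flag pixels whose recent variance departs sharply from their own history."""
    rows, cols = len(recent[0]), len(recent[0][0])
    flagged = []
    for r in range(rows):
        for c in range(cols):
            hist_var = pixel_variance(history, r, c)
            if hist_var == 0:
                continue  # no reference behaviour to compare against
            recent_var = pixel_variance(recent, r, c)
            if recent_var < low_ratio * hist_var or recent_var > high_ratio * hist_var:
                flagged.append((r, c))
    return flagged


# Usage: a 2x2 sensor whose pixel (1, 1) gets stuck at 100 in the recent frames.
history = [[[k, 2 * k], [3 * k, k + 1]] for k in range(4)]
recent = [[[k, 2 * k], [3 * k, 100]] for k in range(4)]
print(detect_defective_pixels(history, recent))  # [(1, 1)]
```

In a real vehicle the scene itself changes the variance, so a practical detector would need thresholds tuned against normal driving data; the fixed ratios here only show the comparison step.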
Regarding performance optimization, we aim to reduce latency, power, and energy consumption in ML accelerators through task mapping, data encoding, and approximate computing techniques. Specifically, we explored approximate computing through an FPGA-based non-negative matrix factorization (NMF) accelerator with hybrid logarithmic approximation, achieving a 69× energy reduction compared with an ARM CPU implementation. For DNN accelerators, we proposed a travel-time-based task mapping strategy that achieves up to 12.1% latency reduction by dynamically balancing workloads across processing elements (PEs). We further developed a ‘1’-bit count-based data ordering method that reduces bit transitions (BTs) on network-on-chip (NoC) links by up to 40.85%, consequently lowering link power consumption.
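To make the idea of logarithmic approximation concrete, the sketch below uses plain Mitchell approximation, swapped in for illustration; the thesis's hybrid scheme is not described here. NMF is commonly solved with multiplicative updates of the form h ← h · num / den, and in the log domain that product and quotient become one addition and one subtraction. Because NMF data are non-negative, no sign handling is needed, although zeros must still be treated separately. The function names are hypothetical.

```python
import math


def mitchell_log2(x: float) -> float:
    """Mitchell's approximation: log2(x) ~ k + (x / 2**k - 1) for 2**k <= x < 2**(k+1)."""
    if x <= 0:
        raise ValueError("log approximation needs a strictly positive input")
    m, e = math.frexp(x)  # x = m * 2**e with 0.5 <= m < 1
    return (e - 1) + (2.0 * m - 1.0)


def mitchell_exp2(y: float) -> float:
    """Inverse approximation: 2**y ~ (1 + frac(y)) * 2**floor(y)."""
    k = math.floor(y)
    return math.ldexp(1.0 + (y - k), k)


def approx_update(h: float, num: float, den: float) -> float:
    """Approximate h * num / den using one add and one subtract in the log domain."""
    return mitchell_exp2(mitchell_log2(h) + mitchell_log2(num) - mitchell_log2(den))


print(approx_update(2.0, 6.0, 4.0))  # 3.0 (exact in this case)
print(approx_update(3.0, 3.0, 1.0))  # 8.0 (exact answer is 9.0)
```

The second call shows the weakness of the plain approximation: Mitchell's multiplication can be off by roughly 11%, which is the kind of error an accelerator must either tolerate or correct.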
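One possible reading of travel-time-based mapping is greedy list scheduling in which each candidate PE is charged the network travel time of the task's input data in addition to its current backlog. The sketch below assumes a 2D mesh NoC with XY routing, a single data source at (0, 0), a fixed per-hop latency, and all inputs injected at time zero; every name and parameter is hypothetical, and this is not the strategy evaluated in the thesis.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (row, col) of a PE on a 2D mesh NoC


def travel_time(src: Coord, dst: Coord, hop_latency: int) -> int:
    """Cycles for data to reach dst, assuming XY routing (hops = Manhattan distance)."""
    hops = abs(src[0] - dst[0]) + abs(src[1] - dst[1])
    return hops * hop_latency


def finish_time(pe_ready: int, arrival: int, cycles: int) -> int:
    """A PE starts once it is idle and the task's input data has arrived."""
    return max(pe_ready, arrival) + cycles


def map_tasks(
    task_cycles: List[int],
    pes: List[Coord],
    source: Coord = (0, 0),
    hop_latency: int = 2,
) -> Tuple[Dict[int, Coord], int]:
    """Assign each task to the PE that would finish it earliest; return mapping and makespan."""
    ready = {pe: 0 for pe in pes}
    mapping: Dict[int, Coord] = {}
    for tid, cycles in enumerate(task_cycles):
        finishes = {
            pe: finish_time(ready[pe], travel_time(source, pe, hop_latency), cycles)
            for pe in pes
        }
        best = min(pes, key=finishes.__getitem__)  # ties go to the earlier PE in the list
        ready[best] = finishes[best]
        mapping[tid] = best
    return mapping, max(ready.values())


pes = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(map_tasks([10, 10, 10, 10], pes))
# ({0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}, 14)
```

Here the far PE (1, 1) is used only once closer PEs are busy long enough that its 4-cycle travel time no longer matters, which is how accounting for travel time spreads work without overloading any single PE.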
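The quantity being minimized is easy to state: when words are sent back to back over one link, the number of toggling wires between two consecutive words is the popcount of their XOR. The sketch below sorts (activation, weight) pairs by the weight's ‘1’-bit count so that similar words travel together; moving the pairs jointly keeps a dot product unchanged. The pairing scheme, the 8-bit example values, and all names are assumptions for illustration, not the thesis's exact method.

```python
from typing import List, Sequence, Tuple


def ones(word: int) -> int:
    """Number of '1' bits in a non-negative integer word."""
    return bin(word).count("1")


def bit_transitions(words: Sequence[int]) -> int:
    """Total wire toggles when the words are sent back to back over one link."""
    return sum(ones(a ^ b) for a, b in zip(words, words[1:]))


def order_pairs(acts: Sequence[int], weights: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Stable-sort (activation, weight) pairs by the weight's '1'-bit count.

    Pairs move together, so sum(a * w) over the pairs is unchanged.
    """
    pairs = sorted(zip(acts, weights), key=lambda p: ones(p[1]))
    return [a for a, _ in pairs], [w for _, w in pairs]


weights = [0xFF, 0x01, 0xFE, 0x03, 0x7F, 0x00]
acts = [3, 1, 4, 1, 5, 9]
new_acts, new_weights = order_pairs(acts, weights)
print(bit_transitions(weights), bit_transitions(new_weights))  # 34 12
```

This toy sketch counts only the weight stream; reordering also permutes the activation stream, which could change its transitions on another link, so a full evaluation would count both.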
To enable efficient hardware implementation, we designed an approximate sorting unit for the data ordering method, achieving a 35.4% area reduction with only a 4.5% relative loss in BT reduction effectiveness (from 20.42% to 19.50%). For emerging LLM architectures, we proposed a soft-edge quantizer for State Space Models (SSMs) that improves accuracy by preserving outlier information through multi-scale quantization.
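Why multiple scales help with outliers can be shown with a simple hard-threshold, two-scale quantizer. This is a generic illustration, not the soft-edge quantizer from the thesis, whose boundary handling is not described here. With a single scale, one large outlier stretches the step size until every small value rounds to zero; giving values within a threshold their own fine scale keeps them. The threshold, the 4-bit width, and the function names are hypothetical.

```python
from typing import List, Sequence


def quantize(x: float, scale: float, qmax: int) -> int:
    """Symmetric round-to-nearest integer code, clamped to [-qmax, qmax]."""
    return max(-qmax, min(qmax, round(x / scale)))


def single_scale(xs: Sequence[float], bits: int = 4) -> List[float]:
    """Baseline: one scale set by the largest magnitude (the outlier)."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(x) for x in xs) / qmax
    return [quantize(x, scale, qmax) * scale for x in xs]


def two_scale(xs: Sequence[float], threshold: float, bits: int = 4) -> List[float]:
    """Fine scale for |x| <= threshold, coarse scale for outliers.

    In hardware each code would also need a tag bit recording which scale applies.
    """
    qmax = 2 ** (bits - 1) - 1
    fine = threshold / qmax
    coarse = max(abs(x) for x in xs) / qmax
    out = []
    for x in xs:
        scale = fine if abs(x) <= threshold else coarse
        out.append(quantize(x, scale, qmax) * scale)
    return out


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


xs = [0.1, -0.2, 0.3, 0.05, -0.15, 8.0]
print(mse(xs, single_scale(xs)))    # about 0.0275: all five small values collapse to 0
print(mse(xs, two_scale(xs, 0.35)))  # close to 0: small values and the outlier both survive
```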