
Embedded Machine Learning

Reliability and Performance Enhancement

Time: Wed 2026-06-03 14.00

Location: Kollegiesalen, Brinellvägen 8, Royal Institute of Technology

Video link: https://kth-se.zoom.us/j/63924695817

Language: English

Subject area: Information and Communication Technology

Doctoral student: Yizhi Chen, Elektronik och inbyggda system

Opponent: Associate Professor Luca Pezzarossa, Danmarks Tekniske Universitet - DTU

Supervisor: Ahmed Hemani, Elektronik och inbyggda system; Artur Podobas,


QC 20260512

Abstract

Deploying machine learning (ML) on resource-constrained embedded systems presents significant challenges in reliability as well as computational, memory, and power efficiency. This thesis presents our research contributions across two main themes: reliability and performance optimization for embedded ML systems.

Regarding reliability, we developed an online image sensor fault detection method for autonomous vehicles, utilizing historical variance comparison to identify defective pixels without interrupting camera functionality. We further investigated the impact of various image sensor faults on pruned neural networks for object detection, examining both spatial faults (blur, darkness, speckle noise) and temporal faults arising from sensor aging. Our work addresses reliability from a system perspective, covering both sensor-level fault detection and neural network-level fault tolerance for embedded AI applications.
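As an illustration only, the idea of comparing per-pixel variance over a history of frames can be sketched as follows. This is one plausible reading of "historical variance comparison", not the method from the thesis: the function name `flag_suspect_pixels`, the `ratio` threshold, and the use of the median variance as the reference are all assumptions made for this sketch.

```python
from statistics import median, pvariance


def flag_suspect_pixels(history, ratio=0.1):
    """Return indices of pixels whose temporal variance is unusually low.

    history: list of frames, each frame a flat list of pixel intensities.
    ratio:   hypothetical threshold; a pixel is flagged when its variance
             falls below ratio * (median variance over all pixels).
    """
    # Regroup the data so that each entry holds one pixel's values over time.
    per_pixel = list(zip(*history))
    variances = [pvariance(series) for series in per_pixel]
    reference = median(variances)
    return [i for i, v in enumerate(variances) if v < ratio * reference]


# Five frames of a four-pixel sensor; pixel 2 is stuck at 255.
frames = [
    [10, 12, 255, 11],
    [40, 38, 255, 42],
    [90, 88, 255, 91],
    [60, 63, 255, 59],
    [20, 22, 255, 19],
]
suspects = flag_suspect_pixels(frames)
```

Because the check only reads frames that the camera has already produced, a scheme of this kind can run alongside normal operation, which is the property the abstract highlights.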

Regarding performance optimization, we aim to reduce latency, power, and energy consumption in ML accelerators through task mapping, data encoding, and approximate computing techniques. Specifically, we explored approximate computing through an FPGA-based non-negative matrix factorization accelerator with hybrid logarithmic approximation, achieving a 69× energy reduction compared with an ARM CPU implementation. For DNN accelerators, we proposed a travel time-based task mapping strategy, achieving up to 12.1% latency reduction by dynamically balancing workloads across processing elements. We further developed a ‘1’-bit count-based data ordering method to reduce bit transitions (BT) on NoC links, achieving up to 40.85% BT reduction and consequently lowering link power consumption.

To enable efficient hardware implementation, we designed an approximate sorting unit for the data ordering method, achieving a 35.4% area reduction with only a 4.5% relative loss in BT reduction effectiveness (from 20.42% to 19.50%). For emerging LLM architectures, we proposed a soft-edge quantizer for State Space Model quantization that improves accuracy by preserving outlier information through multi-scale quantization.
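To make the ‘1’-bit count-based data ordering idea concrete, the following minimal sketch sorts a sequence of 8-bit words by their number of ‘1’ bits and counts the bit transitions between consecutive words before and after. It assumes the words may be freely reordered before transmission; the helper names and the sample payload are hypothetical, and the thesis's actual ordering scheme and hardware sorting unit may differ.

```python
def popcount(word: int) -> int:
    """Number of '1' bits in a non-negative integer."""
    return bin(word).count("1")


def bit_transitions(words) -> int:
    """Total bits that toggle between consecutive words on a link."""
    return sum(popcount(a ^ b) for a, b in zip(words, words[1:]))


def order_by_ones(words):
    """Reorder words by ascending '1'-bit count (stable sort)."""
    return sorted(words, key=popcount)


# Hypothetical payload alternating between mostly-ones and mostly-zeros words.
payload = [0b11111111, 0b00000000, 0b11110111,
           0b00001000, 0b11111110, 0b00000001]

before = bit_transitions(payload)                # 37 toggles
after = bit_transitions(order_by_ones(payload))  # 12 toggles
```

Words with similar ‘1’-bit counts tend to differ in fewer positions, so grouping them usually lowers the toggle count, although the ordering is a heuristic and does not guarantee a reduction for every input.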
