
Efficient Machine Learning for Edge Computing

Architecture and Application

Time: Fri 2025-03-28 13.00

Location: Ka-Sal B (Peter Weissglas), Kistagången 16, Kista

Video link: https://kth-se.zoom.us/j/63180568741

Language: English

Subject area: Information and Communication Technology

Doctoral student: Wenyao Zhu, Electronics and Embedded Systems

Opponent: Associate Professor Miquel Moretó Planas, Computer Architecture Department (DAC), Universitat Politècnica de Catalunya (UPC), Barcelona, Spain

Supervisor: Professor Zhonghai Lu, Electronics and Embedded Systems; Associate Professor Dejiu Chen, Mechatronics and Embedded Control Systems



Abstract

Machine learning has demonstrated exceptional capability in solving complex tasks across a wide range of fields. Advances in hardware accelerators have enabled the deployment of machine learning models on edge devices, facilitating real-time AI applications in resource-constrained systems. Recent accelerators have increasingly adopted Network-on-Chip (NoC) architectures to support massive data communication within large-scale processing element arrays. However, as the complexity of these accelerators continues to grow, effective design-space exploration before hardware prototyping becomes essential. Additionally, achieving high flexibility and efficiency across diverse machine learning workloads remains a significant challenge, especially for edge computing.

To address these problems, we approach the topic from both the architecture side and the application side. First, we introduce a cycle-accurate simulation tool for NoC-based deep neural network (DNN) accelerators. The simulator enables rapid and precise evaluation of inference efficiency across design parameters and provides detailed performance traces of system behavior, facilitating the optimization of DNN inference efficiency and reducing the time and cost of hardware prototyping. We then propose novel architectural designs for NoC-based DNN accelerators that leverage in-network processing techniques to improve end-to-end latency and resource utilization. Two key approaches are presented: an activation-in-network design that offloads non-linear operations to the NoC, and a pooling-on-the-go design that minimizes communication overhead for pooling layers. Both designs demonstrate substantial improvements in processing efficiency over existing NoC-based accelerator architectures while maintaining scalability and adaptability across diverse DNN workloads.
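As a toy sketch of the general principle behind in-network reduction (an illustration of the idea only, not the microarchitecture described in the thesis), consider max-pooling performed hop-by-hop along a route instead of gathering every operand at a single sink node. The function names and message-count model below are illustrative assumptions:

```python
# Toy model of in-network (pooling-on-the-go) reduction versus a
# gather-at-the-sink baseline. In the baseline, every processing
# element (PE) injects its own packet toward the sink; with in-network
# reduction, each router folds its local value into the passing packet,
# so only one packet is forwarded per hop.

def gather_then_pool(values):
    """Baseline: all values travel to the sink; pooling happens there.
    Returns (pooled result, number of injected packets)."""
    messages = len(values)          # one packet per PE
    return max(values), messages

def pool_on_the_go(values):
    """In-network pooling: the reduction is applied at each router as
    the packet passes. Returns (pooled result, number of forwards)."""
    acc = values[0]
    hops = 0
    for v in values[1:]:
        acc = max(acc, v)           # reduction performed in the network
        hops += 1                   # a single packet moves per hop
    return acc, hops

vals = [3, 7, 2, 9, 4]
print(gather_then_pool(vals))   # → (9, 5)
print(pool_on_the_go(vals))     # → (9, 4)
```

Both routes compute the same pooled value; the in-network variant avoids concentrating all traffic at the sink, which is the intuition behind reducing communication overhead for pooling layers.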

The third part explores the application of machine learning in embedded sensor systems, focusing on lower-limb prostheses. A wearable pressure measurement system is developed to collect and analyze intra-socket pressure data, and two machine learning applications are proposed to solve sub-tasks in comfortable prosthetic socket design. A clustering-based method optimizes sensor deployment by reducing redundancy while preserving data integrity, and a gait phase recognition approach built on multiple hidden Markov models and Gaussian mixture models achieves high accuracy and computational efficiency, outperforming conventional techniques.
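As an illustrative sketch only (the function, threshold, and synthetic data below are assumptions, not the method from the thesis), a clustering-style reduction of redundant sensor channels can group channels by pairwise correlation and keep one representative per group:

```python
import numpy as np

def select_representative_sensors(data, corr_threshold=0.95):
    """Greedy correlation clustering of sensor channels.

    data: (samples, sensors) array. Channels whose absolute pairwise
    correlation exceeds corr_threshold are treated as redundant; one
    representative index is kept per cluster.
    """
    corr = np.abs(np.corrcoef(data.T))
    n = data.shape[1]
    assigned = set()
    representatives = []
    for i in range(n):
        if i in assigned:
            continue
        # All still-unassigned channels strongly correlated with i.
        cluster = [j for j in range(n)
                   if j not in assigned and corr[i, j] >= corr_threshold]
        assigned.update(cluster)
        representatives.append(i)
    return representatives

# Synthetic "pressure" data: 8 channels, where channels 1 and 2 are
# near-copies of channel 0, and channel 5 is a near-copy of channel 4.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 5))
noise = lambda: 0.01 * rng.normal(size=200)
data = np.column_stack([
    base[:, 0], base[:, 0] + noise(), base[:, 0] + noise(),
    base[:, 1],
    base[:, 2], base[:, 2] + noise(),
    base[:, 3], base[:, 4],
])
kept = select_representative_sensors(data)
print(kept)  # → [0, 3, 4, 6, 7]
```

Here 8 channels collapse to 5 representatives, mirroring the goal of pruning redundant sensor placements while keeping the informative signals intact.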

By tackling these challenges in NoC-based accelerator design and in machine learning applications for embedded systems, this thesis bridges the gap between hardware optimization and practical deployment, paving the way for future advances in embedded intelligence.

urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-360884