Efficient Machine Learning for Edge Computing
Architecture and Application
Time: Fri 2025-03-28 13.00
Location: Ka-Sal B (Peter Weissglas), Kistagången 16, Kista
Video link: https://kth-se.zoom.us/j/63180568741
Language: English
Subject area: Information and Communication Technology
Doctoral student: Wenyao Zhu , Elektronik och inbyggda system
Opponent: Associate Professor Miquel Moretó Planas, Computer Architecture Department (DAC), Universitat Politècnica de Catalunya (UPC), Barcelona, Spain
Supervisor: Professor Zhonghai Lu, Elektronik och inbyggda system; Associate Professor Dejiu Chen, Mekatronik och inbyggda styrsystem
QC 20250305
Abstract
Machine learning has demonstrated exceptional capability in solving complex tasks across a wide range of fields. Advances in hardware accelerators have enabled the deployment of machine learning models on edge devices, facilitating real-time AI applications in resource-constrained systems. Recent accelerators have increasingly adopted Network-on-Chip (NoC) architectures to support massive data communication within large-scale processing element arrays. However, as the complexity of these accelerators continues to grow, effective design-space exploration before hardware prototyping becomes essential. Additionally, achieving high flexibility and efficiency across diverse machine learning workloads remains a significant challenge, especially for edge computing.
To address these problems, we explore from both the architecture side and the application side. Firstly, we introduce a cycle-accurate simulation tool for NoC-based deep neural network (DNN) accelerators. This simulator enables rapid and precise evaluation of inference efficiency by exploring design parameters. By providing detailed performance tracing into system behavior, the simulator facilitates the optimization of DNN inference efficiency, which can reduce the time and cost associated with hardware prototyping. Then we focus on novel architectural designs for NoC-based DNN accelerators, leveraging in-network processing techniques to improve end-to-end latency and resource utilization. Two key approaches are proposed: an activation-in-network design that offloads non-linear operations to the NoC and a pooling on-the-go design that minimizes communication overhead for pooling layers. These designs demonstrate substantial improvements in processing efficiency upon existing NoC-based accelerator architectures, while maintaining scalability and adaptability for diverse DNN workloads.
The third part explores the application of machine learning in embedded sensor systems, with a focus on lower-limb prostheses. A wearable pressure measurement system is developed to collect and analyze intra-socket pressure data. Two machine learning applications are proposed for solving sub-tasks within the field of comfortable prosthetic socket design. A clustering-based method is developed for optimizing sensor deployment by reducing redundancy while maintaining data integrity. A gait phase recognition approach that utilizes multiple hidden Markov models and Gaussian mixture models is developed. The proposed gait recognition method achieves high accuracy and computational efficiency, which outperforms conventional techniques.
By tackling the challenges in NoC-based accelerator design and machine learning applications for embedded systems, we bridge the gap between hardware optimization and practical deployment. These techniques would pave the way for future advancements in embedded intelligence.