
Domain-Specific Compilation Framework with High-Level Tensor Abstraction for Fast Fourier Transform and Finite-Difference Time-Domain Methods

Time: Thu 2025-06-12, 14:00

Location: D3, Lindstedtsvägen 5, Stockholm

Language: English

Subject area: Computer Science

Doctoral student: Yifei He, Computational Science and Technology (CST)

Opponent: Associate Professor Ren Bin, College of William & Mary, Williamsburg, VA, USA

Supervisors: Professor Stefano Markidis, Computational Science and Technology (CST); Associate Professor Artur Podobas, Software and Computer Systems (SCS)



Abstract

With the end of Dennard scaling, hardware performance improvements now stem from increased architectural complexity, which in turn demands more sophisticated programming models. Today’s computing landscape includes a broad spectrum of hardware targets—CPUs, GPUs, FPGAs, and domain-specific ASICs—each requiring substantial manual effort and low-level tuning to fully exploit their potential. Performance programming has evolved beyond traditional code optimization and increasingly depends on domain-specific compilers, constraint-solving frameworks, advanced performance models, and automatic or learned strategies for code generation.

Conventional implementations of numerical libraries often rely on handwritten, platform-specific kernels. While such kernels may achieve high performance for selected routines, they typically underperform on others, and their lack of portability incurs high development overhead and creates performance bottlenecks, impeding scalability across heterogeneous hardware systems.

To address these challenges, this thesis presents the design and implementation of end-to-end domain-specific compilers for numerical workloads, focusing on applications such as Fast Fourier Transform (FFT) and Finite-Difference Time-Domain (FDTD) solvers. The proposed framework is built on the Multi-Level Intermediate Representation (MLIR) and Low-Level Virtual Machine (LLVM) infrastructures. It models compute kernels as operations on 3D tensor abstractions with explicit computational semantics, and high-level optimizations, including loop tiling, fusion, and vectorization, are applied automatically by the compiler.
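
To make these transformations concrete, the following C sketch shows, by hand, the kind of loop tiling and fusion the compiler applies automatically. It contrasts a naive one-dimensional Yee-style FDTD field update with a tiled version in which both field updates are fused per tile; the grid size NX, tile width TILE, and update coefficient are illustrative assumptions, not kernels taken from the thesis framework.

/* Hand-written sketch of loop tiling and fusion for a 1D FDTD update.
   Illustrative only; the thesis framework derives such schedules
   automatically from a tensor-level description. */
#include <stdio.h>
#include <math.h>

#define NX   4096   /* assumed grid size */
#define TILE 256    /* assumed cache-friendly tile width */

/* Naive step: update all of H using the old E, then all of E. */
static void fdtd_step_naive(double *e, double *h, double c) {
    for (int i = 0; i < NX - 1; ++i)
        h[i] += c * (e[i + 1] - e[i]);
    for (int i = 1; i < NX; ++i)
        e[i] += c * (h[i] - h[i - 1]);
}

/* Tiled and fused step: both field updates run per tile, so the tile's
   data stays cache-resident. Processing tiles left to right preserves
   the dependences: e[i] reads h[i-1], which at a tile's left edge was
   already updated by the previous tile. */
static void fdtd_step_tiled(double *e, double *h, double c) {
    for (int t = 0; t < NX; t += TILE) {
        int hi = t + TILE < NX ? t + TILE : NX;
        for (int i = t; i < hi && i < NX - 1; ++i)
            h[i] += c * (e[i + 1] - e[i]);
        for (int i = (t > 0 ? t : 1); i < hi; ++i)
            e[i] += c * (h[i] - h[i - 1]);
    }
}

int main(void) {
    static double e1[NX], h1[NX], e2[NX], h2[NX];
    for (int i = 0; i < NX; ++i) {
        e1[i] = e2[i] = sin(0.01 * i);   /* arbitrary initial field */
        h1[i] = h2[i] = 0.0;
    }
    for (int s = 0; s < 100; ++s) {      /* both schedules, 100 steps */
        fdtd_step_naive(e1, h1, 0.5);
        fdtd_step_tiled(e2, h2, 0.5);
    }
    double maxdiff = 0.0;                /* verify identical results */
    for (int i = 0; i < NX; ++i) {
        double d = fabs(e1[i] - e2[i]);
        if (d > maxdiff) maxdiff = d;
    }
    printf("max |e_naive - e_tiled| = %g\n", maxdiff);
    return 0;
}

In the framework described above, rewrites of this kind are applied to the tensor-level intermediate representation rather than to C source, which is what allows the same high-level kernel description to be lowered to different CPU and GPU targets.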

We evaluate the proposed code generation pipeline across diverse hardware platforms, including Intel, AMD, and ARM CPUs, as well as GPUs. Experimental results demonstrate the approach’s ability to deliver both high performance and portability across heterogeneous architectures.

urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-363493