Leveraging Intermediate Representations for High-Performance Portable Discrete Fourier Transform Frameworks
with Application to Molecular Dynamics
Time: Fri 2023-06-09 14.00
Location: VIC-studion, Lindstedtsvägen 5, Stockholm
Language: English
Doctoral student: Måns Andersson , Beräkningsvetenskap och beräkningsteknik (CST)
Opponent: Hartwig Anzt, University of Tennessee
Supervisor: Stefano Markidis, Beräkningsvetenskap och beräkningsteknik (CST); Artur Podobas, Programvaruteknik och datorsystem, SCS; Niclas Jansson, Parallelldatorcentrum, PDC; Ivy Bo Peng, Parallelldatorcentrum, PDC, Beräkningsvetenskap och beräkningsteknik (CST)
QC 20230522
Abstract
The Discrete Fourier Transform (DFT) and its improved formulations, the Fast Fourier Transforms (FFTs), are vital for scientists and engineers in a range of domains from signal processing to the solution of partial differential equations. A growing trend in Scientific Computing is heterogeneous computing, where accelerators are used instead or together with CPUs. This has led to problems for developers in unifying portability, performance, and productivity.
This thesis first motivates this work by showing the importance of having efficient DFT calculations, describes the DFT algorithm and a formulation based on matrix-factorizations which has been developed to formulate FFT algorithms and express their parallelism to exploit modern computer architectures, such as accelerators.
The first paper is a motivating study of the breakdown of the performance and scalability of the high-performance Molecular Dynamics code GROMACS where DFT calculations are a main performance bottleneck. In particular, the long-range interactions are solved with the Particle-Mesh Ewald algorithm which uses a three-dimensional Fast Fourier Transform.
The two following papers present two approaches to leverage factorization with the help of two different frameworks using Intermediate Representation and compiler technology, for the development of fast and portable code. The second paper presents a front-end and a pipeline for code generation in a domain-specific language based on Multi-Level Intermediate Representation (MLIR) for developing Fast Fourier Transform libraries. The last paper investigates and optimizes an implementation of an important kernel within the matrix-factorization framework: the batched DFT. It is implemented with data-centric programming and a data-centric intermediate representation called Stateful Dataflow multi-graphs (SDFG). The paper evaluates strategies for complex-valued data layout for performance and portability and we show that there is a trade-off between portability and maintainability in using the native complex data type and that an SDFG-level abstraction could be beneficial for developing higher-level applications.