Minisymposium
MS6D - Motif-Based Automated Performance Engineering for HPC
Description
In this minisymposium we will discuss efforts to improve performance portability and programming productivity for motif-based high-performance algorithms used in a wide range of scientific applications. We will describe domain-specific libraries (DSLs) that express mathematical/programming motifs (data objects and operations on those data objects), along with software back-ends that translate the library calls into high-performance code. With a motif-aware software stack, the scientific application code is much smaller than fully optimized hand-written code, and the application-level code remains unchanged when moving between platforms, leading to a less expensive development process. The four talks in this minisymposium cover multiple motifs, including structured grids, Fast Fourier Transforms (FFTs), particle methods, and dense/sparse linear algebra, as well as different approaches to supporting motif-based DSLs. This minisymposium aims to bring together groups developing motif-based high-performance codes with different software tools, automating the process as much as possible.
Presentations
ProtoX is a code generation framework for stencil and pointwise operations, the key components in numerically approximating the solutions of various partial differential equations (PDEs). The frontend for ProtoX is Proto, a C++-based domain-specific library that provides a high level of abstraction and an intuitive interface for designing and scheduling algorithms that solve PDEs numerically on structured grids. The high-level abstractions used in Proto can be fused together to improve performance; however, such abstraction fusion cannot be performed easily by a compiler. To overcome this issue, ProtoX uses SPIRAL as its backend. SPIRAL is a code generation system focused on generating highly optimized target code in C/C++. The performance gains obtained with ProtoX are demonstrated on examples such as the 2D Poisson problem and the 2D and 3D Euler equations used in the study of gas dynamics. Results from CPU and GPU implementations will be discussed. An LLM-based approach to reducing numerical error in structured-grid examples from Proto will also be introduced.
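The fusion idea above can be illustrated with a minimal NumPy sketch. This is not Proto's actual API; the function names and the 5-point Laplacian example are assumptions chosen for illustration. It shows why folding a pointwise operation into a stencil sweep removes an intermediate grid-sized temporary and a second traversal of the grid, the kind of cross-abstraction transformation that ProtoX delegates to SPIRAL rather than to a general-purpose compiler.

```python
import numpy as np

def laplacian_then_scale_unfused(u, h, alpha):
    """Two passes: apply a 5-point Laplacian stencil, then a pointwise
    scale. Each pass sweeps the full grid and the first materializes a
    grid-sized temporary."""
    lap = np.zeros_like(u)
    lap[1:-1, 1:-1] = (u[:-2, 1:-1] + u[2:, 1:-1] +
                       u[1:-1, :-2] + u[1:-1, 2:] -
                       4.0 * u[1:-1, 1:-1]) / h**2
    return alpha * lap          # second full sweep over the grid

def laplacian_then_scale_fused(u, h, alpha):
    """One pass: the pointwise scale is folded into the stencil
    expression, eliminating the intermediate array and extra sweep."""
    out = np.zeros_like(u)
    c = alpha / h**2
    out[1:-1, 1:-1] = c * (u[:-2, 1:-1] + u[2:, 1:-1] +
                           u[1:-1, :-2] + u[1:-1, 2:] -
                           4.0 * u[1:-1, 1:-1])
    return out
```

Both versions compute the same values; the fused form is what a motif-aware code generator can emit automatically once it sees both operations in one expression.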
Motif-specific libraries have long been the gold standard for writing large-scale scientific applications. These libraries expose key operators and can be specialized to a given hardware architecture without significant API modification. However, as algorithms increase in complexity, the library approach leaves performance on the table due to optimizations available across library calls. We present recent efforts to optimize library-based applications without significant source code modification. FFTX is a new FFT library that uses code generation and runtime compilation to dynamically generate FFT kernels and optimize algorithms that use FFTs, such as convolution. IRISX leverages the code generation capabilities of FFTX to dynamically dispatch runtime-generated kernels via the IRIS runtime system, enabling diverse heterogeneity and performance portability. Finally, FortranX can optimize legacy Fortran programs without source code modification by leveraging FFTX/IRISX.
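Convolution is the canonical example of an FFT-using algorithm mentioned above. The following NumPy sketch (not FFTX code; the function names are illustrative) shows the convolution theorem that such libraries exploit: a circular convolution becomes an elementwise product in frequency space, turning an O(n²) computation into O(n log n), and the three transforms become a kernel-fusion opportunity for a code generator.

```python
import numpy as np

def circular_convolve_fft(f, g):
    """Circular convolution via the convolution theorem:
    conv(f, g) = IFFT(FFT(f) * FFT(g)), costing O(n log n)."""
    return np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))

def circular_convolve_direct(f, g):
    """Direct O(n^2) reference: (f*g)[i] = sum_j f[j] g[(i-j) mod n]."""
    n = len(f)
    return np.array([sum(f[j] * g[(i - j) % n] for j in range(n))
                     for i in range(n)])
```

A library that sees the whole pipeline, rather than three separate FFT calls, can fuse the pointwise product with the transforms; that cross-call view is what FFTX's runtime code generation provides.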
Particle-particle particle-mesh (PPPM) methods describe advective dynamics (represented by particle discretizations) coupled to nonlocal fields (represented by grid discretizations). Our objective is to implement one such method, the Method of Local Corrections (MLC) for vorticity transport in 3D, on GPUs using performance-portable libraries: Cabana for particles, and FFTX for FFT-based solutions to the field equations. MLC splits the calculation of the velocity field into two parts: a near field (local N-body calculations) and a far field (a single free-space convolution). MLC exercises the full capabilities and data choreography of Cabana, in which local N-body calculations and grid-to/from-particle interpolations are accelerated. FFTX is needed for the free-space convolution step that approximates the velocity on the grid. Using the MLC method, we also develop a more systematic numerical analysis for particle methods in 3D via numerical experiments. Particles that deform significantly from their initial grid locations lead to large interpolation errors, which in turn lead to velocity errors and "particle noise". We address this with adaptive remapping, mapping particles back to an Eulerian grid selectively in regions where the particle locations have undergone large deformations.
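The grid-to/from-particle interpolations mentioned above can be sketched in 1D with linear (cloud-in-cell) weights. This is a hedged illustration, not Cabana's API; the function names, the periodic domain, and the 1D setting are assumptions chosen to keep the example small. Each particle's charge is shared between its two neighboring grid nodes in deposition, and the same weights are used for interpolation back to particles.

```python
import numpy as np

def deposit_cic(positions, charges, n_cells, dx):
    """Particle-to-grid deposition with linear (cloud-in-cell) weights
    on a periodic 1D grid: each particle splits its charge between the
    two nearest nodes, so total charge is conserved."""
    grid = np.zeros(n_cells)
    for x, q in zip(positions, charges):
        i = int(x / dx) % n_cells       # left grid node
        w = x / dx - int(x / dx)        # fractional offset from that node
        grid[i] += q * (1.0 - w)
        grid[(i + 1) % n_cells] += q * w
    return grid

def interpolate_cic(grid, positions, dx):
    """Grid-to-particle interpolation with the same linear weights, so
    deposition and interpolation are adjoint operations."""
    n_cells = len(grid)
    out = np.empty(len(positions))
    for k, x in enumerate(positions):
        i = int(x / dx) % n_cells
        w = x / dx - int(x / dx)
        out[k] = (1.0 - w) * grid[i] + w * grid[(i + 1) % n_cells]
    return out
```

The interpolation error of such schemes grows as particles drift far from a regular arrangement, which is precisely the "particle noise" that adaptive remapping onto an Eulerian grid is designed to control.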
In the Japanese computational science community, which developed the K computer and Fugaku, high demand for large-scale eigenvalue calculations in condensed matter science has prompted updates to numerical software. Capability computing is a crucial method for effectively addressing challenging large-scale systems and is a standard scientific tool. At the same time, an evolving approach is also essential from the perspective of capacity computing, which manages large batches of eigenvalue computations. We aim to develop statistical, ensemble, and AI-enabled computational frameworks that leverage advanced approximation algorithms, cutting-edge system runtimes, and software frameworks such as Kokkos, IRIS, C++, Python, and Julia. Our goal is to discuss the development of the solver and present a roadmap connecting the creation of next-generation mathematical software with this framework and next-generation computers, and to encourage participants to engage in the conversation.
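The capacity-computing pattern described above, running many independent moderate-sized eigenvalue problems as one batch, can be sketched with NumPy's stacked-matrix support. This is an illustrative sketch, not the solver discussed in the talk; the function name and batch sizes are assumptions.

```python
import numpy as np

def batched_eigh(batch):
    """Solve an ensemble of independent symmetric eigenproblems at once.

    `batch` has shape (m, n, n); np.linalg.eigh broadcasts over the
    leading dimension, so the m problems are dispatched as one batched
    call rather than m separate solver invocations."""
    eigvals, eigvecs = np.linalg.eigh(batch)
    return eigvals, eigvecs

# Build a small batch of random symmetric matrices.
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 6, 6))
sym = 0.5 * (a + a.transpose(0, 2, 1))
vals, vecs = batched_eigh(sym)
```

On HPC systems the same pattern maps onto batched dense solvers and task runtimes, which is where frameworks such as Kokkos and IRIS enter the picture.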