Minisymposium
MS6D - Motif-Based Automated Performance Engineering for HPC
Description
In this minisymposium we will discuss efforts to improve performance portability and programming productivity for motif-based high-performance algorithms used in a wide range of scientific applications. We will describe domain-specific libraries (DSLs) that express mathematical/programming motifs (data objects and operations on those data objects), along with software back-ends that translate the library calls into high-performance code. With a motif-aware software stack, the scientific application code is much smaller than fully optimized hand-written code, and the application-level code remains unchanged when moving between platforms, leading to a less expensive development process. The four talks in this minisymposium cover multiple motifs, including structured grids, Fast Fourier Transforms (FFTs), particle methods, and dense/sparse linear algebra, as well as different approaches to supporting motif-based DSLs. This minisymposium aims to bring together groups developing motif-based high-performance codes with different software tools, automating the process as much as possible.
Presentations
ProtoX is a code generation framework for stencil and pointwise operations, the key components in numerically approximating the solutions of various partial differential equations (PDEs). The frontend for ProtoX is Proto, a C++-based domain-specific library that provides a high level of abstraction and an intuitive interface for designing and scheduling algorithms that solve PDEs numerically on structured grids. The high-level abstractions used in Proto can be fused together to improve performance; however, such abstraction fusion cannot be performed easily by a compiler. To overcome this issue, ProtoX uses SPIRAL as its backend. SPIRAL is a code generation system focused on generating highly optimized target code in C/C++. The performance gains obtained with ProtoX are demonstrated on examples such as the 2D Poisson problem and the 2D and 3D Euler equations used in the study of gas dynamics. Results from CPU and GPU implementations will be discussed. An LLM-based approach to reducing numerical error in structured-grid examples from Proto will also be introduced.
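The fusion idea above can be illustrated with a minimal NumPy sketch. This is not Proto's actual API; the function names and the 5-point Laplacian example are assumptions chosen for illustration. It shows why folding a pointwise operation into a stencil sweep removes an intermediate grid-sized temporary and a second traversal of the grid, the kind of cross-abstraction transformation that ProtoX delegates to SPIRAL rather than to a general-purpose compiler.

```python
import numpy as np

def laplacian_then_scale_unfused(u, h, alpha):
    """Two passes: apply a 5-point Laplacian stencil, then a pointwise
    scale. Each pass sweeps the full grid and the first materializes a
    grid-sized temporary."""
    lap = np.zeros_like(u)
    lap[1:-1, 1:-1] = (u[:-2, 1:-1] + u[2:, 1:-1] +
                       u[1:-1, :-2] + u[1:-1, 2:] -
                       4.0 * u[1:-1, 1:-1]) / h**2
    return alpha * lap          # second full sweep over the grid

def laplacian_then_scale_fused(u, h, alpha):
    """One pass: the pointwise scale is folded into the stencil
    expression, eliminating the intermediate array and extra sweep."""
    out = np.zeros_like(u)
    c = alpha / h**2
    out[1:-1, 1:-1] = c * (u[:-2, 1:-1] + u[2:, 1:-1] +
                           u[1:-1, :-2] + u[1:-1, 2:] -
                           4.0 * u[1:-1, 1:-1])
    return out
```

Both versions compute the same values; the fused form is what a motif-aware code generator can emit automatically once it sees both operations in one expression.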
Motif-specific libraries have long been the gold standard for writing large-scale scientific applications. These libraries expose key operators and can be specialized to a given hardware architecture without significant API modification. However, as algorithms increase in complexity, the library approach leaves performance on the table due to optimizations available across library calls. We present recent efforts to optimize library-based applications without significant source code modification. FFTX is a new FFT library that uses code generation and runtime compilation to dynamically generate FFT kernels and optimize algorithms that use FFTs, such as convolution. IRISX leverages the code generation capabilities of FFTX to dynamically dispatch runtime-generated kernels via the IRIS runtime system, enabling diverse heterogeneity and performance portability. Finally, FortranX can optimize legacy Fortran programs without source code modification by leveraging FFTX/IRISX.
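Convolution is the canonical example of an FFT-using algorithm mentioned above. The following NumPy sketch (not FFTX code; the function names are illustrative) shows the convolution theorem that such libraries exploit: a circular convolution becomes an elementwise product in frequency space, turning an O(n²) computation into O(n log n), and the three transforms become a kernel-fusion opportunity for a code generator.

```python
import numpy as np

def circular_convolve_fft(f, g):
    """Circular convolution via the convolution theorem:
    conv(f, g) = IFFT(FFT(f) * FFT(g)), costing O(n log n)."""
    return np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))

def circular_convolve_direct(f, g):
    """Direct O(n^2) reference: (f*g)[i] = sum_j f[j] g[(i-j) mod n]."""
    n = len(f)
    return np.array([sum(f[j] * g[(i - j) % n] for j in range(n))
                     for i in range(n)])
```

A library that sees the whole pipeline, rather than three separate FFT calls, can fuse the pointwise product with the transforms; that cross-call view is what FFTX's runtime code generation provides.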
Particle-particle particle-mesh (PPPM) methods describe advective dynamics (represented by particle discretizations) coupled to nonlocal fields (represented by grid discretizations). Our objective is to implement one such method, the Method of Local Corrections (MLC) for vorticity transport in 3D, on GPUs using performance-portable libraries: Cabana for particles, and FFTX for FFT-based solutions to the field equations. MLC splits the calculation of the velocity field into two parts: a near field (local N-body calculations) and a far field (a single free-space convolution). MLC exercises the full capabilities and data choreography of Cabana, in which local N-body calculations and grid-to/from-particle interpolations are accelerated. FFTX is needed for the free-space convolution step that approximates the velocity on the grid. Using the MLC method, we also develop a more systematic numerical analysis for particle methods in 3D via numerical experiments. Particles that deform significantly from their initial grid locations lead to large interpolation errors, which in turn lead to velocity errors and "particle noise". We address this with adaptive remapping, mapping particles back to an Eulerian grid selectively in regions where the particle locations have undergone large deformations.
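The grid-to/from-particle interpolations mentioned above can be sketched in 1D with linear (cloud-in-cell) weights. This is a hedged illustration, not Cabana's API; the function names, the periodic domain, and the 1D setting are assumptions chosen to keep the example small. Each particle's charge is shared between its two neighboring grid nodes in deposition, and the same weights are used for interpolation back to particles.

```python
import numpy as np

def deposit_cic(positions, charges, n_cells, dx):
    """Particle-to-grid deposition with linear (cloud-in-cell) weights
    on a periodic 1D grid: each particle splits its charge between the
    two nearest nodes, so total charge is conserved."""
    grid = np.zeros(n_cells)
    for x, q in zip(positions, charges):
        i = int(x / dx) % n_cells       # left grid node
        w = x / dx - int(x / dx)        # fractional offset from that node
        grid[i] += q * (1.0 - w)
        grid[(i + 1) % n_cells] += q * w
    return grid

def interpolate_cic(grid, positions, dx):
    """Grid-to-particle interpolation with the same linear weights, so
    deposition and interpolation are adjoint operations."""
    n_cells = len(grid)
    out = np.empty(len(positions))
    for k, x in enumerate(positions):
        i = int(x / dx) % n_cells
        w = x / dx - int(x / dx)
        out[k] = (1.0 - w) * grid[i] + w * grid[(i + 1) % n_cells]
    return out
```

The interpolation error of such schemes grows as particles drift far from a regular arrangement, which is precisely the "particle noise" that adaptive remapping onto an Eulerian grid is designed to control.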
In the Japanese computational science community, which developed the K computer and Fugaku, high demand for large-scale eigenvalue calculations in condensed matter science has prompted updates to numerical software. Capability computing is a crucial method for effectively addressing challenging large-scale systems and is a standard scientific tool. At the same time, an evolving approach is also essential from the perspective of capacity computing, which manages large batches of eigenvalue computations. We aim to develop statistical, ensemble, and AI-enabled computational frameworks that leverage advanced approximation algorithms, cutting-edge system runtimes, and software frameworks such as Kokkos, IRIS, C++, Python, and Julia. Our goal is to discuss the development of the solver and present a roadmap connecting the creation of next-generation mathematical software with this framework and next-generation computers, and to encourage participants to engage in the conversation.
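The capacity-computing pattern described above, running many independent moderate-sized eigenvalue problems as one batch, can be sketched with NumPy's stacked-matrix support. This is an illustrative sketch, not the solver discussed in the talk; the function name and batch sizes are assumptions.

```python
import numpy as np

def batched_eigh(batch):
    """Solve an ensemble of independent symmetric eigenproblems at once.

    `batch` has shape (m, n, n); np.linalg.eigh broadcasts over the
    leading dimension, so the m problems are dispatched as one batched
    call rather than m separate solver invocations."""
    eigvals, eigvecs = np.linalg.eigh(batch)
    return eigvals, eigvecs

# Build a small batch of random symmetric matrices.
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 6, 6))
sym = 0.5 * (a + a.transpose(0, 2, 1))
vals, vecs = batched_eigh(sym)
```

On HPC systems the same pattern maps onto batched dense solvers and task runtimes, which is where frameworks such as Kokkos and IRIS enter the picture.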