AP2B - ACM Papers Session 2B
Achieving net-positive fusion energy and its commercialization requires not only engineering marvels but also state-of-the-art, massively parallel codes that can handle reactor-scale simulations. The GENE-X code is a global continuum gyrokinetic turbulence code designed to predict energy confinement and heat exhaust for future fusion reactors. GENE-X is capable of simulating plasma turbulence from the core region to the wall of a magnetic confinement fusion (MCF) device. Originally written in Fortran 2008, GENE-X leverages MPI+OpenMP for parallel computing. In this paper, we augment the Fortran-based compute operators in GENE-X with a C++17 layer, exposing them to a wide array of C++-compatible tools. Here we focus on offloading the augmented operators to GPUs via directive-based programming models such as OpenACC and OpenMP offload. The performance of GENE-X is comprehensively characterized, e.g., by roofline analysis on a single GPU and scaling analysis on multiple GPUs. The major compute operators achieve significant performance improvements, shifting the bottleneck to inter-GPU communication. We discuss additional opportunities to further enhance performance, such as reducing memory traffic and improving memory utilization efficiency.
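To illustrate the directive-based approach, below is a minimal sketch of a C++17 compute loop offloaded to the GPU with OpenMP target directives. The operator, function name, and three-point stencil are hypothetical stand-ins, not code taken from GENE-X; an OpenACC variant would use an analogous `#pragma acc parallel loop` with data clauses.

    #include <cstddef>
    #include <vector>

    // Hypothetical stand-in for a directive-offloaded compute operator:
    // a simple three-point stencil, not an actual GENE-X kernel.
    void apply_operator(const std::vector<double>& in, std::vector<double>& out,
                        double coeff) {
        const std::size_t n = in.size();
        if (n < 3) return;
        const double* in_p = in.data();
        double* out_p = out.data();
        // Map the arrays to the device and distribute iterations across
        // GPU teams and threads.
        #pragma omp target teams distribute parallel for \
            map(to: in_p[0:n]) map(from: out_p[0:n])
        for (std::size_t i = 1; i < n - 1; ++i) {
            out_p[i] = coeff * (in_p[i - 1] - 2.0 * in_p[i] + in_p[i + 1]);
        }
        out_p[0] = out_p[n - 1] = 0.0;  // hypothetical boundary handling on host
    }

With a compiler that supports OpenMP offload, the directive moves the loop onto the GPU; without offload support the same code falls back to the host, which is one attraction of the directive-based models named in the abstract.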
The Particle-In-Cell (PIC) algorithm coupled with binary collision modules is a widely applicable method for simulating plasmas over a broad range of regimes (from the collisionless kinetic regime to the collisional regime). While several popular PIC codes implement binary collision modules, their performance on GPUs can be constrained by the default parallelization strategy, which assigns one GPU thread per simulation cell. This approach can underutilize GPU resources for simulations with many macroparticles per cell and relatively few cells per GPU. To address this limitation, we propose an alternative parallelization strategy that instead distributes GPU threads across independent pairs of colliding particles. Our proposed strategy yields a speedup of up to $\sim 4 \times$ for cases with relatively few cells per GPU, and comparable performance otherwise.
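A minimal sketch of such a pair-based decomposition is given below, in C++ with OpenMP offload for consistency with the example above. All names (Particle, collide_pair, cell_start, pair_offset) are hypothetical illustrations rather than the API of any specific PIC code, and the collision update is replaced by a trivial velocity swap placeholder.

    #include <cstddef>
    #include <vector>

    struct Particle { double vx, vy, vz; };

    // Hypothetical placeholder for the binary-collision update; a velocity
    // swap trivially conserves momentum and energy.
    #pragma omp declare target
    inline void collide_pair(Particle& a, Particle& b) {
        Particle tmp = a; a = b; b = tmp;
    }
    #pragma omp end declare target

    // One GPU thread per colliding pair instead of one per cell. Particles
    // are assumed sorted by cell: cell_start[c] is the first particle of
    // cell c, and pair_offset is an exclusive scan of the pair counts per
    // cell, so pair_offset[n_cells] is the total number of pairs.
    void collide_all_pairs(std::vector<Particle>& parts,
                           const std::vector<std::size_t>& cell_start,
                           const std::vector<std::size_t>& pair_offset) {
        if (cell_start.size() < 2) return;
        const std::size_t n_cells = cell_start.size() - 1;
        const std::size_t n_pairs = pair_offset[n_cells];
        const std::size_t n_parts = parts.size();
        Particle* p = parts.data();
        const std::size_t* cs = cell_start.data();
        const std::size_t* po = pair_offset.data();
        #pragma omp target teams distribute parallel for \
            map(tofrom: p[0:n_parts]) \
            map(to: cs[0:n_cells + 1], po[0:n_cells + 1])
        for (std::size_t g = 0; g < n_pairs; ++g) {
            // Recover the owning cell of global pair g by binary search
            // over the pair offsets: po[lo] <= g < po[lo + 1].
            std::size_t lo = 0, hi = n_cells;
            while (lo + 1 < hi) {
                const std::size_t mid = lo + (hi - lo) / 2;
                if (po[mid] <= g) lo = mid; else hi = mid;
            }
            const std::size_t local = g - po[lo];   // pair index within cell lo
            Particle& a = p[cs[lo] + 2 * local];    // fixed even/odd pairing
            Particle& b = p[cs[lo] + 2 * local + 1];
            collide_pair(a, b);  // pairs are disjoint, so no synchronization needed
        }
    }

Because every pair touches two distinct particles, the flat loop over pairs exposes far more parallelism than a loop over cells when there are many macroparticles per cell, which is the regime where the abstract reports the $\sim 4 \times$ speedup.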