Minisymposium

MS6F - Application Perspective on SYCL, a Modern Programming Model for Performance and Portability

Fully booked

Wednesday, June 18, 2025

14:00

16:00

CEST

Room 5.2D02

Live streaming recording

Session recording

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse varius enim in eros elementum tristique. Duis cursus, mi quis viverra ornare, eros dolor interdum nulla, ut commodo diam libero vitae erat. Aenean faucibus nibh et justo cursus id rutrum lorem imperdiet. Nunc ut sem vitae risus tristique posuere.

Session Chair

Andrey

Alekseenko

KTH Royal Institute of Technology

Description

HPC and data-intensive computing now stand as the fourth pillar of science. The heterogeneous architectures, primarily GPUs, are a staple of HPC. However, the fast pace of development and increasing diversity of hardware bring forth not only new opportunities, but also challenges. Proprietary heterogeneous programming models often pose risks due to vendor lock-in, hurting collaboration and limiting portability, hence undermining reproducibility in scientific research. SYCL, a vendor-agnostic, C++-based standard for heterogeneous computing with several mature implementations for a wide range of hardware accelerators, offers a promising path towards vendor-agnostic, high-performance computing. This minisymposium focuses on the experiences of scientific application developers using SYCL as a performance-portability solution across different hardware accelerators, fostering a collaborative open-software ecosystem. The talks will focus on best practices for developing performance-portable and maintainable code using SYCL, as well as on co-development and interaction between the SYCL Working Group and application developers. The aim of this workshop is to contribute to the wider adoption of open standards in scientific computing.

Presentations

14:00

14:30

CEST

Celerity - SYCL-Based High-Productivity Development at HPC Scale

The vendor-agnostic SYCL standard provides a C++-based foundation for the development of applications targeting heterogeneous systems. However, while SYCL supports multiple devices in a single shared memory host system or node, leveraging this support requires developers to manually take care of the issues of work splitting and data coherence across devices. Developing for an HPC environment, which features a large number of accelerators in a distributed memory cluster, is even more challenging and labor-intensive. As a consequence, algorithmic decisions are often difficult to change once taken, reducing the potential for experimenting with different approaches at scale.This talk will briefly introduce a subset of SYCL, and demonstrate how the Celerity runtime system extends the applicability of single-GPU concepts to clusters, in a manner largely transparent to the programmer. The applicability of this system, and its potential for enabling the rapid evaluation of different implementation choices, will be illustrated with scalability studies on a set of real-world use cases. Additionally, some SYCL software engineering best practices based on the Celerity development experience will be shared.

Peter Thoman (University of Innsbruck)

14:30

15:00

CEST

Coastal Ocean Simulations with Unstructured and Block-Structured Grids on CPUs, GPUs, and FPGAs

The shallow water equations model fluid flow in regions where the horizontal scale is much larger than the vertical depth. Applications of the shallow water equations include the modelling of tides, tsunamis or atmospheric flows. In this talk, we will discuss a SYCL implementation of a discontinuous Galerkin discretization for the two dimensional shallow water equations with application to coastal ocean simulations. We will first have a brief discussion of the discontinuous Galerkin discretization and important performance characteristics of the resulting scheme on unstructured grids. We will then show how this scheme can be implemented in SYCL, what abstractions are necessary for different architectures, and highlight differences betweencompilers and backends. At last, we briefly discuss energy efficiency for our code across the tested architectures and give a brief outlook on how block-structured grids can be used to improve simulation performance.

Markus Büttner (University of Bayreuth); Christoph Alt (Paderborn University, Friedrich-Alexander-Universität Erlangen-Nürnberg); Tobias Kenter (Paderborn University); and Vadym Aizinger (University of Bayreuth)

15:00

15:30

CEST

Experiences with Using SYCL for NWChemEx on Exascale Supercomputers

The rise of exascale computing offers new possibilities for computational chemistry, particularly in modeling molecular systems. We examine the integration of the Coupled Cluster with Single, Double, and Perturbative Triple excitations (CCSD(T)) method within the NWChemEx library, using SYCL and oneMKL to ensure portability and performance across exascale hardware. By leveraging oneAPI, we create a unified platform that targets heterogeneous systems, including CPUs, GPUs, and accelerators, without sacrificing maintainability or performance. Our work addresses the challenges of porting, debugging, and optimizing algorithms for various exascale hardware used in Department of Energy (DOE) facilities, including NVIDIA, AMD, and Intel architectures. Using SYCL’s portability, we achieve efficient parallelization across these GPU architectures, ensuring scalability on platforms like NVIDIA A100, AMD MI250, and Intel Ponte Vecchio. Benchmarks on DOE exascale systems demonstrate the efficiency and scalability of our approach, highlighting the potential of SYCL to fully leverage exascale architectures for next-generation quantum chemistry. This work is a key step toward developing exascale-ready computational chemistry software that is efficient and portable across high-performance computing platforms.

Abhishek Bagusetty and Kevin Harms (Argonne National Laboratory)

15:30

16:00

CEST

Shamrock: Scaling SPH to Exascale for Astrophysics Using SYCL

Shamrock is a native SYCL-based framework designed for numerical astrophysics on exascale architectures. Developed in C++17, Shamrock addresses the growing need for efficient and scalable solutions in hydrodynamical simulations for astrophysics. The core of the framework utilizes a ray-tracing-inspired parallel tree structure nested within an abstracted domain decomposition. This design enables the efficient implementation of the Smoothed Particle Hydrodynamics (SPH) algorithm, providing significant speedups for large-scale simulations and scalability across thousands of GPUs. We chose SYCL for its portability across CPU and GPU vendors, as well as its performance portability, allowing us to maintain the same code paths throughout the codebase. This presentation will introduce Shamrock, its use of SYCL, its performance across multiple vendors and backends, and the reasons behind our initial decision to develop the framework with the buffer/accessor model. We will also discuss our migration to the Unified Shared Memory (USM) model to achieve finer memory control and reduced latencies.

Timothée David--Cléris (Université Grenoble Alpes)

Bookmark
this session

Unbookmark
this session

Saving...