
Minisymposium

MS4B - Julia for HPC: Reproducible High-Performance Computing

Fully booked
Tuesday, June 17, 2025
15:00
-
17:00
CEST
Room 5.0B15 & 16



Session Chair

Description

The fourth instalment of the “Julia for HPC” PASC minisymposium focuses on reproducibility, a cornerstone of scientific research and a key component of High-Performance Computing (HPC). As hardware and software evolve rapidly, reproducibility of both scientific results and application performance becomes increasingly complex. This challenge is particularly pronounced in HPC, where software packages are often tailored to specific hardware architectures for optimal performance. Addressing reproducibility in HPC therefore necessitates the careful development of portable libraries and applications, portable packaging, and a package management system that controls versions of all kinds of software dependencies. This minisymposium highlights the Julia programming language and its environment, which tackle these interconnected challenges in a holistic manner. Tightly integrated tools, such as the package manager and artifact builder, coordinate packaging and environment management, offering a consistent approach. Expert speakers will provide insights into the current state of reproducibility in Julia, highlighting strengths and remaining hurdles. The talks will be presented from the perspectives of both tooling developers and domain scientists. This minisymposium is aimed at Julia users eager to deepen their understanding of the existing reproducibility toolchain, as well as non-Julia users curious about how Julia’s reproducibility solutions might be adapted to other software ecosystems.
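As background to the reproducibility theme described above, the following is a minimal sketch of Julia's built-in environment workflow (the Pkg package manager with Project.toml and Manifest.toml files), which the description alludes to; the package name "Example", the version, and the project path are placeholders, not material from the talks.

# Minimal sketch: pin an environment so that collaborators and CI systems
# resolve exactly the same dependency versions.
using Pkg

Pkg.activate("MyProject")                     # use a project-local Project.toml/Manifest.toml
Pkg.add(name = "Example", version = "0.5.3")  # record an exact version in the manifest
Pkg.instantiate()                             # recreate the recorded environment elsewhere
Pkg.status()                                  # inspect the resolved dependency versions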

Presentations

15:00
-
15:30
CEST
Reproducible Heterogeneous Computing with the Julia Language

I will talk about IPUToolkit.jl, a package for running Julia code on the Intelligence Processing Unit (IPU), a massively parallel accelerator developed by Graphcore and powered by almost 1500 cores. I will show how Julia enables a high degree of code reuse on this specialised hardware, using advanced packages such as DifferentialEquations.jl and Enzyme.jl.
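To illustrate the code-reuse idea behind the talk, here is a hypothetical Julia sketch (not IPUToolkit.jl's actual API): a routine written once against abstract array types can be handed plain CPU arrays or, in principle, a backend-specific array type, with the appropriate method selected by multiple dispatch.

# Hypothetical sketch (not IPUToolkit.jl's API): one generic routine,
# written against abstract array types, reused unchanged across backends.
using LinearAlgebra

# Explicit Euler step for du/dt = A*u, written once for any array type.
euler_step(A::AbstractMatrix, u::AbstractVector, dt) = u .+ dt .* (A * u)

A = [0.0 1.0; -1.0 0.0]      # plain CPU arrays here; a device array type
u = [1.0, 0.0]               # could be substituted without changing euler_step
u = euler_step(A, u, 0.01)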

Mosè Giordano (University College London)
15:30
-
16:00
CEST
Generating Architecture-Agnostic Performance Tests from Functional Unit Tests Using Classical Performance Models

PerfTest.jl is a Julia package conceived to bridge the gap between unit testing and architecture-agnostic performance testing. It provides a set of features that lets users set up performance regression tests from functional unit tests with minimal effort. Emphasis is placed on letting users create flexible test suites with classical performance models that can be applied across different machines.
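To make the idea concrete, here is an illustrative sketch (not PerfTest.jl's actual API) of pairing a functional unit test with a hard-coded performance check; the function name and the runtime threshold are hypothetical, and the point of PerfTest.jl is precisely to replace such machine-specific thresholds with classical performance models.

using Test, BenchmarkTools

my_kernel(x) = sum(abs2, x)            # hypothetical function under test
x = rand(10^6)

@testset "my_kernel" begin
    @test my_kernel(x) ≈ sum(x .^ 2)   # functional unit test
    t = @belapsed my_kernel($x)        # measured runtime in seconds
    @test t < 5e-3                     # naive, machine-specific threshold
end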

Daniel Sergio Vega Rodriguez (Università della Svizzera italiana), Samuel Omlin (ETH Zurich / CSCS), and Dimosthenis Pasadakis and Olaf Schenk (Università della Svizzera italiana)
16:00
-
16:30
CEST
A GPU-Accelerated Unified API for Singular Values Enabling Reproducibility Across Architectures and Data Types

We present a portable, GPU-accelerated implementation of a QR-based singular value algorithm in Julia that allows code reproducibility across GPUs from several different vendors. Singular Value Decomposition (SVD) is a fundamental numerical tool in scientific computing and machine learning, providing optimal low-rank matrix approximations with applications ranging from dimensionality reduction to data compression and signal processing. Our implementation leverages Julia’s multiple dispatch and metaprogramming capabilities, integrating with the GPUArrays and KernelAbstractions frameworks to provide a unified type- and hardware-agnostic API. It supports diverse GPU architectures and data types, including half precision and Apple Metal. We benchmark the algorithm against several state-of-the-art linear algebra libraries and confirm performance reproducibility through a unified API. We explore GPU kernel optimization through parameter tuning to enable efficient parallelism and improved memory locality. Performance results on multiple GPU backends and data types demonstrate scalability combined with reproducibility, highlighting Julia’s suitability for high-performance linear algebra in heterogeneous environments.
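As a hedged illustration of what a unified, type-agnostic API means in Julia (standard-library usage, not the authors' package), the same generic call dispatches to a method appropriate for the element type of the input.

using LinearAlgebra

A64 = rand(512, 256)
A32 = Float32.(A64)

σ64 = svdvals(A64)   # same generic front end...
σ32 = svdvals(A32)   # ...dispatched per element type behind the scenes

# The talk extends this pattern to GPU array types and half precision,
# so the identical call works across backends and data types.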

Evelyne Ringoot, Rabab Alomairy, Valentin Churavy, and Alan Edelman (Massachusetts Institute of Technology)
16:30
-
17:00
CEST
Parallelizing GaPSE.jl with KernelAbstractions.jl: A Real-World Example of Reproducibility in Julia

Julia is gaining traction in scientific computing, and at the Leibniz Supercomputing Centre (LRZ) we are exploring its potential on our high-performance computing (HPC) systems, particularly on the Intel Ponte Vecchio GPUs of the SuperMUC-NG Phase 2 supercomputer. The Julia package KernelAbstractions.jl enables vendor-agnostic parallelization, allowing developers to write kernels that run efficiently on both multi-threaded CPUs and various GPU architectures with minimal modifications. The ability to write a single-source, hardware-agnostic kernel bridges the gap between different hardware backends and enhances the reproducibility of both results and performance across diverse computing environments. To evaluate its real-world impact, we apply KernelAbstractions.jl to GaPSE.jl, a cosmology code that computes two-point correlation functions of galaxies including general relativistic effects. GaPSE.jl must evaluate numerous nested integrals, which can be computationally expensive. By leveraging parallel execution on CPUs and GPUs, we aim to significantly accelerate these calculations, improving efficiency and scalability. In this talk, we share our experience developing and optimizing kernels with KernelAbstractions.jl, benchmark Julia’s performance on HPC hardware, and show how reproducibility is ensured in a real-world application.
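For readers unfamiliar with KernelAbstractions.jl, here is a minimal, self-contained sketch of a vendor-agnostic kernel (a generic example, not taken from GaPSE.jl); the backend object chosen at launch time selects the CPU or a specific GPU, while the kernel source stays the same.

# Minimal KernelAbstractions.jl sketch: one kernel definition,
# launchable on CPU or GPU backends without modification.
using KernelAbstractions

@kernel function saxpy!(y, a, @Const(x))
    i = @index(Global)
    @inbounds y[i] = a * x[i] + y[i]
end

x = rand(Float32, 1_000_000)
y = zeros(Float32, 1_000_000)

backend = get_backend(y)     # CPU here; CUDABackend(), ROCBackend(), ... on GPUs
saxpy!(backend)(y, 2.0f0, x; ndrange = length(y))
KernelAbstractions.synchronize(backend)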

Matteo Foglieni (Leibniz Supercomputing Centre)