
Minisymposium Presentation

Reimagining Performance and Reproducibility in the Post-Moore Era: Innovations in Checkpointing and Workflow Management

Tuesday, June 17, 2025
12:00 - 12:30 CEST
Climate, Weather and Earth Sciences
Chemistry and Materials
Computer Science and Applied Mathematics
Humanities and Social Sciences
Engineering
Life Sciences
Physics

Description

In the post-Moore era, the quest for enhanced performance and reproducibility is more critical than ever. As researchers and engineers in high-performance computing (HPC) and scientific computing, reimagining key areas such as algorithms, hardware architecture, and software is essential to drive progress. In this talk, we will explore how performance engineering is evolving, focusing on checkpointing and the management of intermediate data in scientific workflows. We will first discuss the shift from traditional low-frequency checkpointing techniques to modern high-frequency approaches that require complete histories and efficient memory use. By breaking data into chunks, using hash functions to store only modified data, and leveraging Merkle-tree structures, we improve efficiency, scalability, and GPU utilization while addressing challenges like sparse data updates and limited I/O bandwidth. We will also examine the balance between performance and data persistence in workflows, where cloud infrastructures often sacrifice reproducibility for speed. To overcome this, we propose a persistent, scalable architecture that makes node-local data shareable across nodes. By rethinking checkpointing and cloud data architectures, we show how innovations in algorithms, hardware, and software can significantly advance both performance and reproducibility in the post-Moore era.
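The chunk, hash, and Merkle-tree idea behind high-frequency incremental checkpointing can be illustrated with a small sketch. The abstract does not specify the chunk size, hash function, tree layout, or storage backend. This CPU-only Python version therefore assumes fixed-size chunks, SHA-256 from `hashlib`, a binary Merkle tree in which an odd node at the end of a level is paired with itself, and an in-memory dictionary as a content-addressed chunk store. All names (`chunk_hashes`, `merkle_levels`, `changed_chunks`, `IncrementalCheckpointStore`) are hypothetical. It is not the authors' GPU implementation. Comparing two trees top-down skips every subtree whose hash is unchanged, so a sparse update only stores the chunks that actually differ.

```python
import hashlib

# Assumption: tiny fixed-size chunks so the demo is easy to follow;
# a real checkpointing system would use much larger chunks.
CHUNK_SIZE = 4


def chunk_hashes(data: bytes, chunk_size: int) -> list[bytes]:
    """Split data into fixed-size chunks and hash each one with SHA-256."""
    return [hashlib.sha256(data[i:i + chunk_size]).digest()
            for i in range(0, len(data), chunk_size)]


def merkle_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Build a binary Merkle tree: levels[0] holds the leaves, levels[-1] the root.

    Assumption: an odd node at the end of a level is hashed with itself.
    """
    if not leaves:
        raise ValueError("need at least one chunk")
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        parents = []
        for i in range(0, len(prev), 2):
            right = prev[i + 1] if i + 1 < len(prev) else prev[i]
            parents.append(hashlib.sha256(prev[i] + right).digest())
        levels.append(parents)
    return levels


def changed_chunks(old: list[list[bytes]], new: list[list[bytes]]) -> list[int]:
    """Walk two same-shape trees from the root down; return differing leaf indices.

    Subtrees whose hashes match are never visited.
    """
    frontier = [0]
    for level in range(len(old) - 1, -1, -1):
        diff = [i for i in frontier if old[level][i] != new[level][i]]
        if level == 0:
            return diff
        width = len(old[level - 1])
        frontier = [c for i in diff for c in (2 * i, 2 * i + 1) if c < width]
    return []


class IncrementalCheckpointStore:
    """Hypothetical in-memory store keeping the full checkpoint history."""

    def __init__(self, chunk_size: int = CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.chunks: dict[bytes, bytes] = {}        # digest -> chunk bytes
        self.trees: list[list[list[bytes]]] = []    # one Merkle tree per version

    def checkpoint(self, data: bytes) -> list[int]:
        """Record a version; return indices of chunks that differ from the last one."""
        cs = self.chunk_size
        tree = merkle_levels(chunk_hashes(data, cs))
        prev = self.trees[-1] if self.trees else None
        if prev is not None and len(prev[0]) == len(tree[0]):
            dirty = changed_chunks(prev, tree)
        else:
            # First version or resized buffer: treat every chunk as new.
            dirty = list(range(len(tree[0])))
        for i in dirty:
            # Content-addressed: a chunk whose digest is already stored is not copied again.
            self.chunks.setdefault(tree[0][i], data[i * cs:(i + 1) * cs])
        self.trees.append(tree)
        return dirty

    def restore(self, version: int) -> bytes:
        """Reassemble any past version from its leaf digests."""
        return b"".join(self.chunks[d] for d in self.trees[version][0])


store = IncrementalCheckpointStore(chunk_size=4)
buf = bytearray(b"aaaabbbbccccdddd")
print(store.checkpoint(bytes(buf)))   # [0, 1, 2, 3]
buf[9] = ord("X")                     # sparse update inside chunk 2
print(store.checkpoint(bytes(buf)))   # [2]
print(len(store.chunks))              # 5
print(store.restore(0))               # b'aaaabbbbccccdddd'
```

In this sketch every chunk is still hashed on each checkpoint. The savings come from the comparison and storage steps: only the dirty chunk from the second version is stored, and every earlier version stays restorable from its list of digests.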

Authors