Minisymposium Presentation
Porting High Energy Physics Simulation Workflows Across US DOE HPC Facilities
Presenter
Dr. Ozgur Ozan Kilic is a Research Scientist at Brookhaven National Laboratory (BNL), where he focuses on analyzing and improving the performance of (AI/ML-driven) scientific workflows and high-performance computing (HPC). His research interests include:
• Scientific workflow management and optimization
• HPC scheduling and resource management
• Performance modeling and simulation
• Data and computational resilience on HPC
Description
High Energy Physics (HEP) experiments, such as ATLAS and DUNE, generate massive amounts of data that require advanced simulation and data processing workflows. Growing computational needs have made it necessary to explore running these workflows on leadership-class HPC facilities (Perlmutter, Polaris, Frontier). However, the transition presents significant portability challenges, because each facility has its own architecture and software stack. This talk explores strategies and tools for achieving cross-facility portability of large-scale HEP workflows, using DUNE and ATLAS simulation workflows as examples. Our main focus is on understanding how to coordinate the emerging standards arising from various facility-specific APIs (e.g., Superfacility API, Globus Compute, and others). Additionally, we examine detailed portability issues arising from heterogeneous CPU/GPU architectures, diverse library dependencies, and varying shared file systems, with the aim of identifying possible solutions for sustainable workflows. Our experiences highlighted various portability challenges and revealed the need for closer collaboration between HEP application developers, HPC centers, and the broader software communities. This study also underlines the crucial role of the Integrated Research Infrastructure (IRI), whose development can benefit greatly from the experiences of HEP workflows.