Minisymposium
MS1A - FAIR-By-Design HPC-Driven Research
Description
In this minisymposium, we discuss state-of-the-art solutions and ongoing developments to embed FAIR data principles into HPC research workflows. Despite the growing awareness of the importance of FAIR data principles, implementing them remains a challenge. This is especially true for HPC-powered research, due to the sheer number of simulations, the lack of tools to simplify their management, and the rapidly evolving landscape of computational frameworks. The minisymposium focuses on best practices for reproducible workflows via state-of-the-art tools and approaches, concerning both workflow engines and the underlying simulation software running on current HPC architectures. We discuss applications from climate research and materials science, two domains with similar requirements in terms of workflows, access to HPC resources, and related data management needs. Nevertheless, these two domains have so far had limited interaction and cross-fertilization. Invited speakers are both developers and advanced users of widely adopted open-source workflow engines with a clear focus on reproducibility. They will showcase current efforts and future directions for expanding workflow engines to facilitate adherence to FAIR requirements while keeping the user interface (for both workflow developers and users) as simple as possible.
Presentations
With the increase in simulation resolution, climate and weather models now potentially output petabytes of data, so the largest projects require complex workflows that tightly integrate preprocessing, postprocessing, potential downstream applications, and archiving. We present here Sirocco, a climate and weather workflow tool currently under development, written in Python in a collaboration between ETHZ, CSCS, and PSI. It builds on top of the low-level, generic workflow library AiiDA and the AiiDA-workgraph layer. Sirocco defines a user-friendly YAML-based configuration format describing the workflow graph in which, just like tasks, data nodes naturally become first-class citizens by using AiiDA objects. The graph is first generated as an internal representation, then translated to an aiida-workgraph object, and finally orchestrated by AiiDA. Sitting on top of AiiDA, Sirocco can naturally target different HPC systems. Still, the current effort focuses on the Alps system and the ICON model, with the aim of not only making complex workflows possible but also making the lives of ICON users on Alps easier overall, by integrating ICON specifics and the advanced HPC management of Alps, involving technologies like user environments and FirecREST.
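To make the idea of tasks and data as equal first-class nodes of a workflow graph concrete, the short Python sketch below builds a small task/data dependency graph from a configuration dictionary and resolves an execution order. The schema, node names, and helper function are illustrative assumptions only; they do not reproduce Sirocco's actual YAML format or the AiiDA/aiida-workgraph API.

from graphlib import TopologicalSorter

# Hypothetical workflow description: in Sirocco this would live in a YAML file;
# here a plain dict stands in for it. Tasks and the data they consume/produce
# are both named nodes of the graph.
config = {
    "tasks": {
        "preprocess":  {"inputs": ["initial_conditions"], "outputs": ["forcing_data"]},
        "icon_run":    {"inputs": ["forcing_data"],        "outputs": ["raw_output"]},
        "postprocess": {"inputs": ["raw_output"],          "outputs": ["monthly_means"]},
        "archive":     {"inputs": ["monthly_means"],       "outputs": []},
    }
}

def build_dependencies(cfg):
    """Map every node (task or data) to the set of nodes it depends on."""
    deps = {}
    for task, spec in cfg["tasks"].items():
        deps.setdefault(task, set()).update(spec["inputs"])   # a task waits for its input data
        for output in spec["outputs"]:
            deps.setdefault(output, set()).add(task)          # data waits for its producing task
    return deps

# A valid execution order interleaving data and task nodes.
print(list(TopologicalSorter(build_dependencies(config)).static_order()))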
Solid-solid interfaces are ubiquitous in science, and their tribological, electronic, optical, and magnetic properties impact the functionality of many technologies, from engines to solar cells. Studying solid-solid interfaces experimentally is challenging due to their buried nature. Simulations help bridge knowledge gaps and serve as powerful tools for screening and designing solutions. I will present how we address the computational challenges posed by the automated, first-principles study of solid-solid interfaces by developing Tribchem [1]. Tribchem is a Python software package based on the FireWorks platform that automates the creation and optimization of interface models, the identification of the optimal computational parameters, job submission, and data retrieval and storage. Importantly, in the spirit of the FAIR framework, Tribchem defines a high-level interface class to store and retrieve results from its own database and to connect to public ones. I will show how Tribchem is applied to study the tribological properties of different classes of interfaces, such as metal-metal, metal-semiconductor, and metal-2D-material interfaces [2]. These results are part of the ERC SLIDE project (Grant agreement No. 865633). [1] G. Losi, O. Chehaimi, and M.C. Righi, Journal of Chemical Theory and Computation (2023). [2] P. Restuccia, G. Losi, O. Chehaimi, M. Marsili, and M.C. Righi, ACS Applied Materials & Interfaces (2023).
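As a loose illustration of such a high-level interface record, the sketch below stores and retrieves interface results with content-derived identifiers. The class name, fields, and in-memory "database" are hypothetical stand-ins and are not Tribchem's actual data model or the FireWorks API.

from dataclasses import dataclass, field, asdict
import hashlib
import json

@dataclass
class InterfaceRecord:
    """Illustrative record for one solid-solid interface calculation."""
    material_a: str                                   # e.g. "Fe(110)"
    material_b: str                                   # e.g. "MoS2 monolayer"
    adhesion_energy_j_m2: float                       # computed adhesion energy
    parameters: dict = field(default_factory=dict)    # functional, cutoffs, k-points, ...

    def uid(self) -> str:
        """Content-derived identifier, convenient for findability and de-duplication."""
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()[:12]

database = {}  # stands in for Tribchem's own database or a public repository connector

def store(record: InterfaceRecord) -> str:
    key = record.uid()
    database[key] = asdict(record)
    return key

def retrieve(key: str) -> InterfaceRecord:
    return InterfaceRecord(**database[key])

key = store(InterfaceRecord("Fe(110)", "MoS2 monolayer", 0.42,
                            {"functional": "PBE", "encut_eV": 520}))
print(retrieve(key).adhesion_energy_j_m2)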
Cesium-telluride photocathodes are established materials for electron sources in particle accelerators. While ab initio methods like density functional theory (DFT) show great potential to complement experimental research efforts [Cocchi & Saßnick, Micromachines 12, 1002 (2021)], the performance of these materials is hindered by the poor control of the microstructure and stoichiometry during growth. To overcome these limitations, computational predictions and high-throughput screening are essential to identify and characterize these systems. This application stimulated the development of aim2dat (https://aim2dat.github.io/), a numerical library implementing workflows to perform DFT calculations while ensuring data provenance and reproducibility, in addition to an effective and sustainable usage of high-performance computing resources. In the first step, the stability and electronic properties of a set of Cs-Te crystal structures and stoichiometries are analyzed [Saßnick & Cocchi, J. Chem. Phys. 156, 104108 (2022)]. Next, surface slabs of the Cs2Te compound are computed and their electronic properties are discussed [Saßnick & Cocchi, npj Comput. Mater. 10, 38 (2024)]. Finally, to expand the pool of crystals beyond the experimentally resolved systems, machine learning models are incorporated to predict new stable binary cesium-telluride crystals [Saßnick & Cocchi, Adv. Theory Simul., 2401344 (2025)]. The proposed approach aims to accelerate the discovery and optimization of high-performance Cs-Te photocathodes.
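As a minimal, self-contained example of the kind of stability screening described above, the snippet below evaluates formation energies per atom for binary Cs-Te compositions relative to elemental references. The energies are placeholder values, not actual DFT results, and the function is not part of aim2dat.

def formation_energy_per_atom(e_total, n_cs, n_te, e_cs_ref, e_te_ref):
    """E_f = (E_total - n_Cs*E_Cs - n_Te*E_Te) / (n_Cs + n_Te), in eV/atom."""
    return (e_total - n_cs * e_cs_ref - n_te * e_te_ref) / (n_cs + n_te)

# Placeholder per-atom elemental references and per-formula-unit totals (eV);
# illustrative numbers only.
e_cs_ref, e_te_ref = -0.9, -3.1
candidates = {
    "Cs2Te": (-7.5, 2, 1),   # (E_total, n_Cs, n_Te)
    "CsTe":  (-4.5, 1, 1),
}

for formula, (e_tot, n_cs, n_te) in candidates.items():
    e_f = formation_energy_per_atom(e_tot, n_cs, n_te, e_cs_ref, e_te_ref)
    verdict = "candidate for stability" if e_f < 0 else "unstable vs. elements"
    print(f"{formula}: E_f = {e_f:+.3f} eV/atom ({verdict})")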
We introduce Autosubmit, a workflow management tool developed to support Earth sciences research and operations in HPC environments. In constant development to follow FAIR-by-design principles, Autosubmit enables seamless management, execution, and sharing of scientific experiments and workflows while ensuring that the resulting data are easily discoverable, accessible, and reproducible across diverse systems. It includes advanced automation features such as meta-scheduling, high-level configuration, and automatic retries, ensuring efficiency and reliability in complex climate modeling tasks. Unlike other workflow solutions in the domain, it integrates the capabilities of an experiment manager, workflow orchestrator, and monitor in a self-contained application. Interoperability is key: the tool offers multi-platform support, is written in Python, and provides a user-friendly GUI built in JavaScript. It enables single-point access to workflows, ensuring that users can interact with a distributed, scalable database and manage tasks across multiple hosts. Customizable granularity and dynamic task aggregation optimize the makespan of the simulation, while integrated performance metrics offer insights for workflow optimization. Robustness is ensured through a scalable architecture, automatic recovery mechanisms, and built-in traceability, providing real-time monitoring of task status, logs, and workflow statistics. By supporting reproducibility, traceability, and collaboration, Autosubmit efficiently manages high-performance computing resources in the context of climate research.
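To illustrate the automatic-retry idea in isolation, here is a small, generic wrapper that re-submits a failing batch command a bounded number of times. It is a sketch of the concept only and does not reflect Autosubmit's actual implementation or configuration keys.

import subprocess
import time

def run_with_retries(cmd, max_retries=3, wait_s=30.0):
    """Re-run a command until it succeeds or the retry budget is exhausted."""
    for attempt in range(1, max_retries + 1):
        returncode = subprocess.run(cmd).returncode
        if returncode == 0:
            return 0
        if attempt < max_retries:
            print(f"attempt {attempt}/{max_retries} failed (rc={returncode}); retrying in {wait_s}s")
            time.sleep(wait_s)
    return returncode

# Hypothetical usage with a Slurm batch script (--wait makes sbatch block and
# propagate the job's exit status):
# run_with_retries(["sbatch", "--wait", "run_model.sh"])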