
Minisymposium Presentation

Portable Analysis Workflows for Data Reproducibility

Wednesday, June 18, 2025
15:30 - 16:00 CEST
Climate, Weather and Earth Sciences
Chemistry and Materials
Computer Science and Applied Mathematics
Humanities and Social Sciences
Engineering
Life Sciences
Physics

Description

As High Energy Physics (HEP) enters an era of unprecedented dataset sizes, ensuring the reproducibility and preservation of data analyses has become a growing concern in the community. HEP physicists often manage complex analysis workflows by hand, including job submission and data management. This manual approach is both labor-intensive and error-prone. Ultimately, it leaves the dependencies between analysis steps undocumented, which makes coordinating, sharing, and reproducing an analysis difficult. To address these issues, workflow management tools such as Snakemake, Common Workflow Language, and Luigi Analysis Workflows have been adopted in the HEP community. A workflow management system suitable for HEP must be transparent, configurable, portable, and scalable to support the increasing use of HPC resources in data analysis. However, workflow tools are often perceived as mere documentation rather than as integral components of the analysis process, which makes adoption challenging for many physicists. In this talk, we will present our experience with workflow management tools in HEP analysis, highlighting the workflow manager features that are essential for HEP applications. Drawing on feedback from analysis reproducibility training for HEP professionals, we describe typical issues physicists face when developing their workflows. We emphasize the risk of blindly re-executing inherited workflows and advocate for integrated testing within workflow design.
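
As an illustration of the last point, the following is a minimal sketch, not taken from the talk, of a workflow whose dependencies are declared explicitly and which carries its own validation step. It uses only the Python standard library rather than any of the tools named above, and the step names, event values, and bin edges are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Hypothetical analysis steps; the values are illustrative, not real data.
def select_events(_inputs):
    """Stand-in for an event selection producing one value per event."""
    return [12.0, 47.5, 88.1, 91.2, 130.4]


def build_histogram(inputs):
    """Count events in fixed bins [0, 50), [50, 100), [100, 150)."""
    edges = [0, 50, 100, 150]
    counts = [0] * (len(edges) - 1)
    for value in inputs["select_events"]:
        for i in range(len(counts)):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += 1
                break
    return counts


def check_histogram(inputs):
    """Integrated test: every selected event must land in some bin."""
    counts = inputs["build_histogram"]
    n_events = len(inputs["select_events"])
    if sum(counts) != n_events:
        raise ValueError(f"histogram holds {sum(counts)} of {n_events} events")
    return True


# Each step lists the steps it depends on, so nothing is left implicit.
STEPS = {
    "select_events": (select_events, []),
    "build_histogram": (build_histogram, ["select_events"]),
    "check_histogram": (check_histogram, ["select_events", "build_histogram"]),
}


def run_workflow(steps):
    """Run every step after its dependencies and return all results."""
    graph = {name: set(deps) for name, (_, deps) in steps.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = steps[name]
        results[name] = func({dep: results[dep] for dep in deps})
    return results


results = run_workflow(STEPS)
```

Because the check is itself a step in the graph, anyone who re-executes an inherited workflow runs the validation along with the analysis, and a changed bin range or input would raise an error instead of silently producing a different histogram.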

Authors