P24 - How to Build an Energy Dataset for HPC
Description
Quantifying energy consumption in the HPC domain is becoming increasingly critical, driven by rising energy costs. To gain a comprehensive understanding of the energy footprint of modern systems like Alps, whose power demand exceeds that of its predecessor Piz Daint, the Swiss National Supercomputing Centre (CSCS) decided to collect and aggregate energy consumption data from various sources associated with each job executed on Alps, creating an energy dataset. This dataset is stored in a MySQL database and consists of:
- Job Metadata: fields such as job ID, CPU hours consumed, nodes utilized, start time, end time, and elapsed time, sourced from the SLURM workload manager via the jobcompletion plugin.
- Energy Metrics: energy consumption data derived from telemetry sensors installed on the Alps supercomputer. These sensors, exposed through DMTF's Redfish® standard, capture raw power data that is then processed to compute the energy consumption of all nodes associated with a job and aggregate the results (see the sketch after this list).
- SLURM Energy Data: energy figures added to the job metadata through SLURM's energy plugin.
- Quality Indicators: additional fields that assess and ensure the reliability and accuracy of the computed energy consumption metrics.
- DCGM Data: GPU metrics collected with NVIDIA's Data Center GPU Manager (DCGM).
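To make the per-job aggregation step concrete, the following is a minimal sketch of how energy could be computed from timestamped node-level power samples. The function names, the trapezoidal integration of power over time, and the coverage-based quality indicator are illustrative assumptions, not CSCS's actual pipeline.

```python
# Sketch: aggregate per-node power telemetry (e.g. from Redfish sensors)
# into a per-job energy figure plus a simple quality indicator.
# All names and the integration scheme are assumptions for illustration.
from datetime import datetime
from typing import Dict, List, Tuple

Sample = Tuple[datetime, float]  # (timestamp, power in watts)

def node_energy_joules(samples: List[Sample]) -> float:
    """Integrate power over time (trapezoidal rule) to get joules."""
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        dt = (t1 - t0).total_seconds()
        energy += 0.5 * (p0 + p1) * dt
    return energy

def job_energy(samples_by_node: Dict[str, List[Sample]],
               start: datetime, end: datetime) -> Tuple[float, float]:
    """Sum energy over all nodes of a job.

    Returns (energy_joules, coverage), where coverage is the fraction
    of the job window actually covered by samples across all nodes;
    a low value could flag an unreliable energy figure.
    """
    total = 0.0
    covered = 0.0
    window = (end - start).total_seconds()
    for samples in samples_by_node.values():
        in_window = [(t, p) for t, p in samples if start <= t <= end]
        if len(in_window) < 2:
            continue  # too few samples on this node to integrate
        total += node_energy_joules(in_window)
        covered += (in_window[-1][0] - in_window[0][0]).total_seconds()
    n_nodes = max(len(samples_by_node), 1)
    coverage = covered / (window * n_nodes) if window > 0 else 0.0
    return total, coverage
```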
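Likewise, a hypothetical sketch of how one aggregated record, combining the SLURM job metadata with the computed energy and quality fields, could be written to the MySQL database. The table name job_energy, its columns, and the connection parameters are assumptions, not the actual schema; the snippet uses the mysql-connector-python package.

```python
# Sketch: persist one job record into MySQL.
# Table layout and credentials are illustrative assumptions.
import mysql.connector

INSERT_SQL = """
INSERT INTO job_energy
  (job_id, cpu_hours, num_nodes, start_time, end_time, elapsed_s,
   energy_joules, slurm_energy_joules, coverage)
VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)
"""

def store_record(record: tuple) -> None:
    conn = mysql.connector.connect(
        host="db.example.org",       # placeholder host
        user="energy", password="…", # placeholder credentials
        database="energy_dataset",   # placeholder database name
    )
    try:
        cur = conn.cursor()
        cur.execute(INSERT_SQL, record)  # parameterized insert
        conn.commit()
    finally:
        conn.close()
```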
Presenter
Mathilde Gianolli was born in Mendrisio on 1 December 1992. She studied Mathematics at ETHZ. After her studies, she worked from June 2020 to February 2023 as a data analyst for the engineering company IFEC in Rivera, focusing on noise and energy quantification. Since June 2023 she has worked at CSCS as a software engineer on the node-hours and energy accounting system of the new supercomputer Alps.