
Minisymposium Presentation

Monitoring and Analysis of Energy Consumption in HPC Systems

Wednesday, June 18, 2025
14:00 - 14:30 CEST
Climate, Weather and Earth Sciences
Chemistry and Materials
Computer Science and Applied Mathematics
Humanities and Social Sciences
Engineering
Life Sciences
Physics

Description

Energy efficiency is a critical challenge for modern data centers as they grow in scale and complexity. This talk presents the energy monitoring and analysis strategies employed in the data center of TU Dresden. The focus is on how energy consumption and other metrics are measured and analyzed at different levels: the building infrastructure, HPC clusters and racks, and individual nodes. We will provide insights into practical challenges and solutions for monitoring HPC systems and offer a perspective on how such tools and techniques contribute to improving energy efficiency and sustainability in large-scale computing environments. We outline the methodology behind capturing comprehensive measurement data to better understand consumption patterns and enable system-level optimizations. This process includes integrating sensor data and metrics from various sources within the data center into a unified view of energy usage. A key component of our approach is MetricQ, an in-house developed, highly scalable, distributed framework for processing metric data. MetricQ supports high-resolution data collection and real-time visualization, allowing us to analyze trends and accurately identify inefficiencies. Its responsiveness facilitates iterative exploration of many long-running datasets.
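The abstract does not describe MetricQ's interfaces, so the following is only a minimal, hypothetical Python sketch of two operations that a monitoring pipeline of this kind typically needs. The first turns high-resolution power samples into energy by integrating power over time. The second reduces a long time series to per-interval min/max/mean summaries, so that plots covering long periods stay responsive. The names `energy_joules`, `downsample`, and `IntervalAggregate` are illustrative assumptions and are not part of MetricQ's API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical sample format: (timestamp in seconds, power in watts).
Sample = Tuple[float, float]


@dataclass
class IntervalAggregate:
    """Summary of all samples whose timestamps fall into one interval."""
    start: float    # interval start time in seconds
    minimum: float  # lowest power reading in watts
    maximum: float  # highest power reading in watts
    mean: float     # arithmetic (not time-weighted) mean of readings in watts
    count: int      # number of raw samples in the interval


def energy_joules(samples: List[Sample]) -> float:
    """Integrate time-ordered power samples with the trapezoidal rule (W * s = J)."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def _summarize(start: float, values: List[float]) -> IntervalAggregate:
    """Build one aggregate from the power readings of a single interval."""
    return IntervalAggregate(
        start=start,
        minimum=min(values),
        maximum=max(values),
        mean=sum(values) / len(values),
        count=len(values),
    )


def downsample(samples: Iterable[Sample], interval: float) -> List[IntervalAggregate]:
    """Bucket time-ordered samples into fixed-width intervals of `interval` seconds."""
    result: List[IntervalAggregate] = []
    bucket: List[float] = []
    bucket_start: Optional[float] = None
    for t, p in samples:
        start = (t // interval) * interval  # align the timestamp to its interval
        if bucket_start is not None and start != bucket_start:
            # A new interval begins, so emit the summary of the finished one.
            result.append(_summarize(bucket_start, bucket))
            bucket = []
        bucket_start = start
        bucket.append(p)
    if bucket_start is not None:
        result.append(_summarize(bucket_start, bucket))
    return result


if __name__ == "__main__":
    # Made-up readings at 1 Hz, for illustration only.
    readings = [(0.0, 100.0), (1.0, 200.0), (2.0, 100.0), (3.0, 300.0)]
    print(energy_joules(readings))  # 500.0
    for agg in downsample(readings, 2.0):
        print(agg)
```

Keeping min and max alongside the mean preserves short power spikes that a plain average would hide when a long period is viewed at coarse resolution. Energy for any interval can still be computed from the raw samples with `energy_joules`.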

Authors