Minisymposium
MS1G - Sustainable Computing for Big Data Infrastructures
Description
The rapid growth of data in fields like astronomy, particle physics, and genomics, alongside the rising need for large AI model training, poses significant sustainability challenges for big data infrastructures. Carbon-conscious innovation is crucial for future computing infrastructures of the Square Kilometre Array Observatory (SKAO), expected to generate over 700 petabytes annually for the next 50 years to enable groundbreaking discoveries in physics and astronomy. Similarly, effective decarbonization strategies are necessary for legacy systems, such as CERN’s Worldwide LHC Computing Grid (WLCG), which offers global access to over 1.5 exabytes of data. This minisymposium will explore innovative strategies for designing energy-efficient hardware and optimizing software pipelines while minimizing environmental impact. It will feature sustainable computing research and solutions for infrastructures like SKAO and CERN, emphasizing real-world examples and insights into improving sustainability metrics for big data infrastructures. Key topics will include reducing carbon footprints, ensuring performance portability, and co-designing energy-efficient accelerators for high-performance computing, all crucial for addressing the growing demand for extreme-scale scientific computing. By fostering collaboration and sharing cutting-edge research, this event aims to enhance energy efficiency in scientific computing, develop more sustainable high-performance computing infrastructures, and advance carbon-aware practices to meet the demands of a data-driven future.
Presentations
The Worldwide LHC Computing Grid (WLCG) has been developed over the last two decades to include more than 160 sites in over 40 countries. This infrastructure grew out of technical, financial, and political considerations, and is a remarkable achievement. However, it was also created before considering the environmental impact of computing infrastructures became an imperative. Great efforts are now being made to reduce the carbon footprint of WLCG computing and respond strongly to this agenda. In the future, many more scientific projects will need large-scale computing infrastructures to tackle increasingly large data sets. Whilst these endeavours may be able to build on the technical success of WLCG, they also have the opportunity to build in, and optimise, environmental considerations from the start. In this talk, I will give an overview of the work that is going on within WLCG to “green the Grid” and consider what aspects might be done differently if we had the opportunity to start again.
High Performance Computing (HPC) is increasingly defined by heterogeneity, with diverse hardware architectures and growing core counts per device. Optimizing performance and ensuring code portability are not just technical challenges but also essential components of sustainable computing. Efficient resource utilization and energy-aware optimizations are critical for reducing the environmental impact of large-scale simulations while maintaining adaptability across rapidly evolving HPC systems. In this talk, we will explore three strategies for achieving performance and portability in modern heterogeneous HPC environments and evaluate their sustainability. We will share insights from optimizing legacy applications, developing new simulation frameworks, and integrating data analysis pipelines that exploit multiple levels of parallelism—both within and across nodes. Our discussion will highlight the trade-offs between performance and portability while considering energy efficiency and long-term sustainability in computational science. We will conclude by examining future computing trends and the increasing complexity of next-generation HPC systems, emphasizing strategies to balance computational power, energy efficiency, and scientific productivity in an era of growing heterogeneity.
The increasing computational demand from big data-intensive domains such as genomics, astronomy, and artificial intelligence accelerates data center (DC) carbon emissions. This trend underscores the need for sustainable computing strategies that balance performance, energy efficiency, and carbon reduction in large computing infrastructures. This talk examines trade-offs in upgrading legacy DCs to meet growing performance demands while aligning with sustainability goals. We analyze the acceleration-versus-energy-efficiency dilemma of different platform choices and the specialization-versus-flexibility balance among CPUs, GPUs, FPGAs, and ASICs. Effective carbon-reduction procurement strategies require considering both operational and capital expenditures. We will introduce a design space exploration framework to support optimal decision-making for DC upgrades and sustainable investments. A key focus will be FPGAs as a means of balancing performance, energy efficiency, and flexibility. We will discuss hardware-software co-design methodologies for reducing time-to-solution in complex workloads. Real-world case studies demonstrate the effectiveness of FPGA-based solutions in genomics and radio astronomy, showcasing tangible energy savings while maintaining high computational throughput. This session provides practical methodologies for designing next-generation, low-carbon computing infrastructures, ensuring DCs can scale without worsening the climate crisis.
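The operational-versus-embodied carbon trade-off behind such a design space exploration can be sketched in a few lines. The platform names and every number below (embodied footprints, power draws, throughputs, grid intensity, workload size) are illustrative assumptions for the sketch, not measurements or vendor data:

```python
# Toy design-space exploration for a data-center upgrade: choose the platform
# that minimizes total (embodied + operational) carbon for a fixed workload.
# All numbers are illustrative assumptions, not measured values.

from dataclasses import dataclass

@dataclass
class Platform:
    name: str
    embodied_kg: float   # manufacturing footprint per device, kgCO2e (assumed)
    power_w: float       # average draw under load, watts (assumed)
    throughput: float    # workload units processed per second (assumed)

GRID_INTENSITY = 0.3     # kgCO2e per kWh, assumed grid mix
WORKLOAD_UNITS = 1e11    # total work to process over the upgrade's lifetime

def total_carbon(p: Platform) -> float:
    """Embodied carbon plus operational carbon for the full workload."""
    runtime_h = WORKLOAD_UNITS / p.throughput / 3600
    operational_kg = p.power_w / 1000 * runtime_h * GRID_INTENSITY
    return p.embodied_kg + operational_kg

platforms = [
    Platform("CPU",  embodied_kg=300, power_w=250, throughput=1e3),
    Platform("GPU",  embodied_kg=900, power_w=400, throughput=8e3),
    Platform("FPGA", embodied_kg=600, power_w=120, throughput=4e3),
]

for p in sorted(platforms, key=total_carbon):
    print(f"{p.name}: {total_carbon(p):.0f} kgCO2e total")
```

With these assumed numbers the low-power FPGA wins once the workload is large enough for operational carbon to dominate; with a small workload the cheap-to-manufacture CPU would win instead, which is exactly the operational-versus-capital tension the talk describes.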
Los Alamos National Laboratory has operated advanced high performance computing facilities since the late 1950s. Today, the computing facilities use about 50% of the lab's total energy and 31% of its total water. Efficient water and energy usage remains essential to support growth and an expanded mission. Data-driven engineering, innovation, and strategic planning are the lab's toolkit for both its external challenges and its internal processes. We present the history of resource restrictions, water reuse, carbon-free power procurement, and computing optimizations, with the following accomplishments: high-performance computing facilities now run almost entirely on reclaimed water instead of potable water, reducing potable water consumption nearly to zero. LANL has signed a contract to purchase and use 170 MW of solar power from a new solar farm in the northwest corner of the state, enabling mission expansion using a renewable energy source. Current efforts include the application of machine learning models to accurately predict short-term energy needs, enabling more informed and diverse energy procurement strategies. While we have made great strides, there is still more to do; future plans include a central utility plant and ways to reuse waste heat.
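The short-term energy-prediction idea mentioned above can be illustrated with a minimal sketch: an autoregressive linear model fit by least squares to a synthetic hourly load series with a daily cycle. The data, lag structure, and model are assumptions made for illustration, not LANL's actual facility data or methods:

```python
# Minimal autoregressive forecast of hourly facility load, fit with ordinary
# least squares. The synthetic daily-cycle data below is an illustrative
# assumption, not real measurements.

import math
import numpy as np

# Synthetic hourly load in MW: a 30 MW baseline plus a daily sinusoidal cycle.
hours = np.arange(24 * 14)  # two weeks of hourly samples
load = 30 + 10 * np.sin(2 * math.pi * hours / 24)

LAGS = 24  # predict the next hour from the previous 24 hours

# Build the lagged feature matrix X and target vector y.
X = np.array([load[t - LAGS:t] for t in range(LAGS, len(load))])
y = load[LAGS:]

# Fit ordinary least squares: y ≈ X @ w.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# One-step-ahead forecast from the most recent 24 hours.
forecast = load[-LAGS:] @ w
actual_next = 30 + 10 * math.sin(2 * math.pi * len(hours) / 24)
print(f"forecast: {forecast:.2f} MW, true next value: {actual_next:.2f} MW")
```

Because the synthetic signal is exactly periodic, the fitted model recovers the next value almost perfectly; real facility load would add weather, scheduling, and workload features on top of this lag structure.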