Minisymposium
MS2G - Data Without Borders: Fostering Equity and Access in Scientific Research
Description
This minisymposium aims to bridge geographical divides, such as those between Europe and the United States, and domain-specific barriers across fields like materials science, astrophysics, and high-energy physics. Our primary objective is to foster collaboration and knowledge exchange among communities that often operate in isolation, ensuring that diverse perspectives are represented. We are committed to engaging a diverse range of speakers, including both male and female experts, to enrich the dialogue with varied insights. In addition, this symposium focuses on supporting underserved communities that face challenges in accessing high-end computing resources and data storage capabilities. This includes regions with limited access to advanced technology and data infrastructure, where such constraints pose barriers to conducting leading-edge research. These communities often struggle to obtain and share the data critical to their scientific work. This minisymposium offers them a platform for exchanging knowledge, best practices, and resources. Through this collaborative approach, we strive to empower researchers from diverse backgrounds to make substantial contributions to their fields and to the broader scientific community.
Presentations
The growing climate impact of increased greenhouse gas emissions and CO2 levels in Earth's atmosphere highlights the value and importance of technologies that reduce this impact. Electricity generation is a significant contributor to global CO2 emissions, and the data center industry is expected to account for anywhere from 3% to 13% of global electricity demand by 2030. Data centers can facilitate grid decarbonization in a manner different from isolated power loads, and Google has set out to increase and scale carbon awareness via new technology solutions. We present an overview of Google's Carbon-Intelligent Computing System and its enhancements, and discuss the technical challenges associated with increasing and harnessing the temporal and spatial flexibility of the diverse workloads running in Google data centers. Furthermore, we share insights from building systems that leverage different types of workload flexibility to increase resource efficiency and reduce environmental impact while meeting infrastructure and application SLOs. Finally, we pose some key open questions from our investigations of load-shaping strategies aimed not only at reducing grid-level emissions, but also at contributing to a more resilient, robust, and cost-efficient decarbonization of energy systems.
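As a rough illustration of the temporal-flexibility idea described above (a minimal sketch, not Google's actual scheduler), the following example defers a batch job to the lowest-carbon window of a hypothetical hourly carbon-intensity forecast, subject to a deadline; all names and numbers are illustrative.

```python
# Hypothetical sketch of carbon-aware temporal load shifting: choose the start
# hour that minimizes the mean grid carbon intensity over a job's runtime,
# subject to a completion deadline. Not Google's actual system.

def best_start_hour(forecast_gco2_per_kwh, job_hours, deadline_hour):
    """Return (start_hour, mean_intensity) for the lowest-carbon feasible window.

    forecast_gco2_per_kwh: hourly carbon-intensity forecast for the grid region
    job_hours: how many contiguous hours the deferrable job needs
    deadline_hour: latest hour by which the job must have finished
    """
    best_hour, best_cost = None, float("inf")
    latest_start = min(deadline_hour, len(forecast_gco2_per_kwh)) - job_hours
    for start in range(latest_start + 1):
        window = forecast_gco2_per_kwh[start:start + job_hours]
        cost = sum(window) / job_hours  # mean gCO2/kWh over the run
        if cost < best_cost:
            best_hour, best_cost = start, cost
    return best_hour, best_cost


# Example: a 3-hour deferrable job that must finish within 12 hours.
forecast = [450, 430, 400, 380, 300, 250, 240, 260, 320, 410, 470, 480]
print(best_start_hour(forecast, job_hours=3, deadline_hour=12))  # -> (5, 250.0)
```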
The exponential growth of Earth system data presents both opportunities and significant challenges for Earth system science research. The presentation outlines NSF NCAR's implementation of the Next-Generation Geoscience Data Exchange (NG-GDEX), a data commons architecture, alongside on-premise cloud computing resources, complementing our existing HPC and data analysis infrastructure. NSF NCAR aims to remove technical barriers that impede participation in Earth system science across institutional and geographical boundaries. Our long-term goal is to deliver data to the scientific community when it is needed and in the format researchers want. Our approach unifies currently siloed repositories while integrating with the Open Science Data Federation (OSDF) to democratize access to petabyte-scale Earth system model and observational datasets. We will describe our approach to making high-demand datasets directly accessible from our HPC systems, enabling researchers to leverage data-proximate computing without unnecessary data movement. NSF NCAR is committed to prioritizing community needs and is working to implement the recommendations of our data commons workshop. We aim to break down barriers, enhance reproducibility, and foster a more inclusive and collaborative global research environment.
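To give a concrete flavor of the format-flexible, movement-minimizing access a data commons enables, here is a minimal sketch assuming an analysis-ready Zarr store published at a hypothetical HTTPS endpoint; the URL and variable names are illustrative and do not represent actual NG-GDEX or OSDF interfaces.

```python
# Hypothetical sketch: open a cloud-optimized (Zarr) dataset over HTTPS and
# subset it lazily, so only the requested slice is actually transferred.
# The endpoint URL and variable names are placeholders, not real NG-GDEX paths.
import fsspec
import xarray as xr

store = fsspec.get_mapper("https://data.example.org/ng-gdex/era5-demo.zarr")
ds = xr.open_zarr(store, consolidated=True)   # lazy: reads metadata only

# Subset in time and space before any heavy data movement happens.
subset = ds["t2m"].sel(time="2020-01", lat=slice(60, 20), lon=slice(-130, -60))
monthly_mean = subset.mean(dim="time").compute()  # triggers the actual reads
print(monthly_mean)
```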
The National Science Data Fabric (NSDF) is building an open, accessible, and scalable infrastructure to democratize access to scientific data and computing resources. This talk presents NSDF’s vision and practical strategies for maximizing the return on national investments in data infrastructure. We highlight the integration of FAIR digital objects, end-to-end data services, and cross-institutional collaboration that enables broad participation across disciplines and institutions. By supporting real-world use cases in areas such as climate science, materials discovery, and biomedical research, NSDF is driving sustainable innovation and ensuring that scientific discovery is both inclusive and impactful.
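As a small illustration of what a FAIR digital object might carry (a hypothetical record, not NSDF's actual schema), the sketch below bundles a persistent identifier, license, checksum, typed metadata, and a resolvable access location so that data can be found, verified, and reused across institutions.

```python
# Hypothetical FAIR digital object record (illustrative fields only, not
# NSDF's actual schema): a persistent identifier, machine-readable metadata,
# a checksum for integrity, and one or more resolvable access URLs.
fair_digital_object = {
    "pid": "doi:10.0000/example.nsdf.2025.001",      # persistent identifier
    "type": "dataset",
    "title": "Simulated X-ray diffraction volumes (demo)",
    "license": "CC-BY-4.0",
    "checksum": {"algorithm": "sha256", "value": "9f86d081..."},  # placeholder digest
    "size_bytes": 42_000_000_000,
    "access": [
        {"protocol": "https", "url": "https://data.example.org/nsdf/xrd/v1.zarr"},
    ],
    "metadata": {
        "domain": "materials science",
        "instrument": "synchrotron beamline (simulated)",
        "created": "2025-03-01",
    },
}

def is_findable_and_accessible(fdo: dict) -> bool:
    """Minimal FAIR check: has a PID, a license, and at least one access URL."""
    return bool(fdo.get("pid")) and bool(fdo.get("license")) and bool(fdo.get("access"))

print(is_findable_and_accessible(fair_digital_object))  # True
```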
The management of scientific data is a complex task, especially when the data are distributed across geographically dispersed and heterogeneous storage systems. Rucio was originally developed for the needs of the ATLAS high-energy physics experiment at the Large Hadron Collider (LHC) at CERN. With over a decade of successful operation and more than one exabyte of data under active management, Rucio has proven to be scalable and performant. Thanks to its design, features, and strong community support, numerous other scientific collaborations, both within high-energy physics and beyond, have adopted Rucio for their data-management needs. This talk is an introduction to Rucio, explaining its fundamental concepts and describing some of its key features.
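To make two of those fundamental concepts concrete, here is a brief sketch assuming the Rucio Python client (rucio.client.Client): data identifiers (DIDs) addressed as scope:name, and declarative replication rules. The scope, dataset name, and RSE expression below are placeholders, not a definitive workflow.

```python
# Sketch of Rucio's core ideas with the Rucio Python client. Datasets are DIDs
# addressed as scope:name; a replication rule declares the desired state, and
# Rucio's daemons create and monitor the transfers needed to satisfy it.
# Scope, dataset name, and RSE expression are placeholders.
from rucio.client import Client

client = Client()  # uses the standard Rucio configuration and credentials

# Register a dataset DID in a user scope.
client.add_dataset(scope="user.jdoe", name="analysis_output_2025")

# Ask Rucio to keep two replicas on storage elements matching the expression.
client.add_replication_rule(
    dids=[{"scope": "user.jdoe", "name": "analysis_output_2025"}],
    copies=2,
    rse_expression="tier=2&type=DISK",
)
```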