Minisymposium
MS3B - Challenges and Opportunities for Next-Generation Research Applications and Workflows
Session Chair
Description
We are increasingly engaged in transdisciplinary research to address the complex challenges facing our world, including transitioning to renewable energy systems, advancing personalized medicine, utilizing digital twins, and accurately predicting climate change and its impacts on local and regional ecosystems. As we look toward a future shaped by computing, data, and AI, we aim to leverage a wide range of digital services and methodologies. In this context, application- and workflow-focused approaches can play a crucial role in advancing scientific frontiers by harnessing the potential of integration. Such approaches could also serve as a long-term strategy for upholding the principles of sustainability, openness, and transparency, particularly within federated ecosystems. Discussions about next-generation application workflows are therefore just as crucial for advancing research as conversations about the development of digital infrastructure. This minisymposium will convene experts from various domains, each focusing on a different aspect of the scientific research lifecycle. Speakers will critically examine the role of AI, explore performance and productivity beyond Moore's Law, and discuss how generative strategies can empower physics-based simulations. Representing early-career researchers, Dr. Filippo Gatti from France will discuss generative strategies for physics-based simulations.
Presentations
Field stations advance our understanding of the physical, biogeochemical, ecological, social, and economic interactions that constitute place. Society has sophisticated ‘open science’ (OS) cyberinfrastructure, and progress is being made toward digital twins of Earth systems; yet local communities often feel disconnected from place-based scientific information and its benefits. One reason is that metadata describing samples and data collected in situ, including the legal and social metadata that are vital for their reuse, can be stripped or lost in downstream applications. A novel publishing platform, iPlaces, creates a culture in which the common self-interest of all participants is clear to everyone. iPlaces enables field stations (and other anchor institutions) to publish project descriptions and related documentation in their station-branded journal. Using familiar peer-review processes, station directors act as editors in a collaborative ecosystem that leverages OS data services while empowering local communities to enter into dialogue with research teams. Benefits flow up and down value chains as: (1) place-based metadata are systematically layered onto research projects, (2) global OS infrastructure automatically applies this metadata to downstream research outputs, and (3) data trust services link outputs back to field stations and their communities. The power of this approach is discussed in a variety of contexts.
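As a purely hypothetical Python sketch of points (1)-(3) above (the actual iPlaces data model and services are not specified in this abstract, and all identifiers below are placeholders), place-based metadata can be attached once at the field station and then carried, rather than stripped, by every downstream output:

# Hypothetical illustration: place-based metadata attached to a project record
# travels with any research output derived from it.
project = {
    "id": "doi:10.xxxx/example-project",        # placeholder identifier
    "station": "Example Field Station",
    "place_metadata": {
        "legal": "collection permit reference",     # legal terms that must travel with the data
        "social": "community consultation notes",
        "location": {"lat": 46.5, "lon": 8.0},
    },
}

def derive_output(project, output_id):
    """Layer the project's place-based metadata onto a downstream research output."""
    return {
        "id": output_id,
        "derived_from": project["id"],                       # data-trust link back to the station
        "place_metadata": dict(project["place_metadata"]),   # metadata is copied, not stripped
    }

dataset = derive_output(project, "doi:10.xxxx/example-dataset")

In this toy picture, the "derived_from" link is what lets outputs be traced back to the field station and its community, while the copied metadata block keeps the legal and social context attached to reused data.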
In the post-Moore era, the quest for enhanced performance and reproducibility is more critical than ever. As researchers and engineers in high-performance computing (HPC) and scientific computing, reimagining key areas such as algorithms, hardware architecture, and software is essential to drive progress. In this talk, we will explore how performance engineering is evolving, focusing on checkpointing and the management of intermediate data in scientific workflows. We will first discuss the shift from traditional low-frequency checkpointing techniques to modern high-frequency approaches that require complete histories and efficient memory use. By breaking data into chunks, using hash functions to store only modified data, and leveraging Merkle-tree structures, we improve efficiency, scalability, and GPU utilization while addressing challenges like sparse data updates and limited I/O bandwidth. We will also examine the balance between performance and data persistence in workflows, where cloud infrastructures often sacrifice reproducibility for speed. To overcome this, we propose a persistent, scalable architecture that makes node-local data shareable across nodes. By rethinking checkpointing and cloud data architectures, we show how innovations in algorithms, hardware, and software can significantly advance both performance and reproducibility in the post-Moore era.
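A minimal Python sketch of the chunk-and-hash idea (an illustration under our own assumptions, not the speakers' implementation): the state is split into fixed-size chunks, each chunk is hashed, only chunks whose hashes changed since the previous checkpoint are stored, and a Merkle-style root summarizes the full state so that checkpoint histories can be compared cheaply.

import hashlib
import numpy as np

CHUNK = 4096  # bytes per chunk (illustrative)

def chunk_hashes(buf: bytes):
    """Hash each fixed-size chunk of a state buffer."""
    return [hashlib.sha256(buf[i:i + CHUNK]).digest()
            for i in range(0, len(buf), CHUNK)]

def merkle_root(hashes):
    """Pairwise-combine chunk hashes into a single Merkle root."""
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2:          # duplicate the last node on odd levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

class IncrementalCheckpointer:
    """Store only the chunks that changed since the last checkpoint."""
    def __init__(self):
        self.prev_hashes = []
        self.store = {}             # hash -> chunk bytes (deduplicated)

    def checkpoint(self, state: np.ndarray):
        buf = state.tobytes()
        hashes = chunk_hashes(buf)
        new_chunks = 0
        for i, h in enumerate(hashes):
            unchanged = i < len(self.prev_hashes) and self.prev_hashes[i] == h
            if not unchanged and h not in self.store:
                self.store[h] = buf[i * CHUNK:(i + 1) * CHUNK]
                new_chunks += 1
        self.prev_hashes = hashes
        return merkle_root(hashes), new_chunks

ckpt = IncrementalCheckpointer()
state = np.zeros(1_000_000, dtype=np.float64)
root0, n0 = ckpt.checkpoint(state)      # first checkpoint populates the deduplicated store
state[123_456] = 42.0                   # sparse update touching a single chunk
root1, n1 = ckpt.checkpoint(state)      # only the touched chunk is newly stored
print(n0, n1, root0 != root1)

Because the sparse update touches a single chunk, the second checkpoint stores almost no new data, which is how this style of checkpointing copes with sparse updates and limited I/O bandwidth.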
ODISSEI’s advanced scientific computing infrastructure demonstrates how high-performance computing (HPC) can transform social science research. By leveraging a national supercomputer, ODISSEI provides social scientists with a secure, powerful HPC environment to process massive longitudinal datasets and complex data linkages. Researchers can now apply complex models and simulations, such as large-scale network analysis, agent-based modeling, and deep neural networks, to sensitive, high-resolution data, thanks to ample memory, massive parallelism, and specialized hardware (GPUs) that accelerate computation. HPC yields significant computational efficiencies: tasks that once took months can run in a matter of days, greatly accelerating the research workflow and iterative discovery. ODISSEI’s infrastructure is highly scalable, accommodating the ever-growing volume and complexity of datasets drawn from administrative, experimental, and web sources while maintaining performance. Equally important, this integration of cutting-edge computing within social science fosters interdisciplinary collaboration between researchers, data providers, and computer scientists. Notably, the combination of extensive, well-annotated social science datasets with supercomputing capabilities that ODISSEI offers is unprecedented, positioning it at the forefront of data-intensive social research. In this presentation, I discuss a number of use cases in which exceptionally rich data sources have led to new and innovative research lines.
This study integrates the Multiple-Input Fourier Neural Operator (MIFNO) with the diffusion model of Gabrielidis et al. (2024) to address challenges in capturing mid-frequency details in synthetic earthquake ground motion. MIFNO, a computationally efficient surrogate model for seismic wave propagation, processes 3D heterogeneous geological data along with earthquake source characteristics and is trained to reproduce the three-component (3C) earthquake wavefield at the surface. The HEMEWS-3D database (Lehmann et al., 2024) is used, comprising 30,000 earthquake simulations across varying geologies with random source positions and orientations. These reference simulations were conducted with the high-performance SEM3D software (CEA et al., 2017), which excels at simulating fault-to-structure scenarios at a regional scale. While SEM3D provides accurate results at lower frequencies, the surrogate's performance degrades with increasing frequency, owing both to complex physical phenomena and to the known spectral bias of neural networks, which struggle with small-scale features. This limitation restricts MIFNO's applicability in earthquake engineering for nuclear applications. The proposed combination with the diffusion model aims to mitigate this issue and improve the accuracy of mid-frequency predictions in synthetic ground motion generation.
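As a rough, self-contained illustration of the Fourier-layer building block underlying neural operators such as MIFNO (a generic PyTorch sketch under our own assumptions, not the MIFNO architecture or the diffusion component described above):

import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    """Learn a linear map on a truncated set of Fourier modes (the core FNO idea)."""
    def __init__(self, channels: int, modes: int):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (channels * channels)
        # complex weights with shape (in_channels, out_channels, modes)
        self.weight = nn.Parameter(
            scale * torch.randn(channels, channels, modes, dtype=torch.cfloat))

    def forward(self, x):                                  # x: (batch, channels, n_points)
        x_ft = torch.fft.rfft(x)                           # to the frequency domain
        out_ft = torch.zeros_like(x_ft)
        # mix channels on the lowest `modes` frequencies only; higher modes are truncated
        out_ft[:, :, :self.modes] = torch.einsum(
            "bim,iom->bom", x_ft[:, :, :self.modes], self.weight)
        return torch.fft.irfft(out_ft, n=x.size(-1))       # back to the physical domain

layer = SpectralConv1d(channels=3, modes=16)   # e.g. a 3C trace with 16 retained modes
trace = layer(torch.randn(8, 3, 256))          # batch of 8 synthetic traces, 256 samples

The truncation of higher Fourier modes in such layers is one intuition for why operator surrogates remain accurate at low frequencies but lose small-scale detail, which is the gap the diffusion-based refinement in this study targets.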