AP2D - ACM Papers Session 2D
The volume of scientific literature is growing exponentially, leading to underutilized discoveries, duplicated efforts, and limited cross-disciplinary collaboration. Retrieval-Augmented Generation (RAG) offers a way to assist scientists by improving the factuality of Large Language Models (LLMs) in processing this influx of information. However, scaling RAG to handle millions of articles introduces significant challenges, including the high computational costs associated with parsing documents and embedding scientific knowledge, as well as the algorithmic complexity of aligning these representations with the nuanced semantics of scientific content. To address these issues, we introduce HiPerRAG, a RAG workflow powered by high-performance computing (HPC) to index and retrieve knowledge from more than 3.6 million scientific articles. At its core are Oreo, a high-throughput model for multimodal document parsing, and ColTrast, a query-aware encoder fine-tuning algorithm that enhances retrieval accuracy by using contrastive learning and late-interaction techniques. HiPerRAG delivers robust performance on existing scientific question answering (Q/A) benchmarks and two new benchmarks introduced in this work, achieving 90% accuracy on SciQ and 76% on PubMedQA, outperforming both domain-specific models like PubMedGPT and commercial LLMs such as GPT-4. Scaling to thousands of GPUs on the Polaris, Sunspot, and Frontier supercomputers, HiPerRAG delivers million-document-scale RAG workflows for unifying scientific knowledge and fostering interdisciplinary innovation.
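The abstract does not give ColTrast's implementation. As a rough illustration of how contrastive learning and late interaction can be combined for query-aware retrieval fine-tuning, the sketch below scores a query against passages with ColBERT-style MaxSim and trains with an InfoNCE-style loss. All function names, shapes, and hyperparameters are illustrative assumptions, not HiPerRAG's code.

```python
# Sketch: ColBERT-style late-interaction (MaxSim) scoring combined with an
# InfoNCE-style contrastive loss, one plausible reading of "contrastive
# learning + late interaction". Names and sizes are illustrative.
import torch
import torch.nn.functional as F

def late_interaction_score(q_tok: torch.Tensor, d_tok: torch.Tensor) -> torch.Tensor:
    """MaxSim: each query token is matched to its best document token.

    q_tok: (num_q_tokens, dim) L2-normalized query token embeddings
    d_tok: (num_d_tokens, dim) L2-normalized document token embeddings
    """
    sim = q_tok @ d_tok.T                  # (num_q_tokens, num_d_tokens)
    return sim.max(dim=1).values.sum()     # sum of per-query-token maxima

def contrastive_loss(q_tok, pos_tok, neg_toks, temperature=0.05):
    """InfoNCE over one positive passage and a list of negative passages."""
    scores = torch.stack(
        [late_interaction_score(q_tok, pos_tok)]
        + [late_interaction_score(q_tok, n) for n in neg_toks]
    ) / temperature
    # The positive passage sits at index 0 of the score vector.
    return F.cross_entropy(scores.unsqueeze(0), torch.zeros(1, dtype=torch.long))

# Toy usage with random, normalized token embeddings.
dim = 128
q = F.normalize(torch.randn(32, dim), dim=-1)
pos = F.normalize(torch.randn(180, dim), dim=-1)
negs = [F.normalize(torch.randn(200, dim), dim=-1) for _ in range(7)]
print(contrastive_loss(q, pos, negs).item())
```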
The emergence of foundational models and generative artificial intelligence (GenAI) is poised to transform productivity in scientific computing, especially in code development, refactoring, and translating from one programming language to another. However, because the output of GenAI cannot be guaranteed to be correct, manual intervention remains necessary. Some of this intervention can be automated through task-specific tools, alongside additional methodologies for correctness verification and effective prompt development. We explored the application of GenAI in assisting with code translation, language interoperability, and codebase inspection within a legacy Fortran codebase used to simulate particle interactions at the Large Hadron Collider (LHC). In the process, we developed a tool, CodeScribe, which combines prompt engineering with user supervision to establish an efficient process for code conversion. In this paper, we demonstrate how CodeScribe assists in converting Fortran code to C++, generating Fortran-C APIs for integrating legacy systems with modern C++ libraries, and providing developer support for code organization and algorithm implementation. We also address the challenges of AI-driven code translation and highlight its benefits for enhancing productivity in scientific computing workflows.
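CodeScribe's interface is not shown in the abstract. The following Python sketch only illustrates the general pattern it describes: a translation prompt combined with an explicit user-review gate. `llm_complete`, the prompt wording, and the file handling are hypothetical placeholders, not CodeScribe's actual API.

```python
# Sketch of a prompt-engineering loop for Fortran-to-C++ conversion with a
# human review gate, in the spirit of the workflow described above.
from pathlib import Path

PROMPT_TEMPLATE = """You are translating legacy Fortran to modern C++17.
Preserve numerical behavior and argument order. Keep the routine name.

Fortran source:
{fortran_source}

Return only the translated C++ code."""

def llm_complete(prompt: str) -> str:
    """Hypothetical LLM call; replace with your provider's client."""
    raise NotImplementedError

def translate_routine(fortran_path: Path, out_dir: Path) -> Path:
    """Draft a C++ translation of one Fortran routine."""
    prompt = PROMPT_TEMPLATE.format(fortran_source=fortran_path.read_text())
    draft = out_dir / (fortran_path.stem + ".cpp")
    draft.write_text(llm_complete(prompt))
    return draft  # the draft still goes through compile tests and human review

def review_and_accept(draft: Path) -> bool:
    """User-supervision step: show the draft and require explicit approval."""
    print(draft.read_text())
    return input(f"Accept {draft.name}? [y/N] ").strip().lower() == "y"
```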
Federated finetuning is essential for unlocking the knowledge embedded in pretrained Large Language Models (LLMs) when data is distributed across clients. Unlike single-institution finetuning, federated finetuning enables collaboration across decentralized datasets while preserving data privacy. Low-Rank Adaptation (LoRA) has gained popularity in Federated Learning (FL) because its reduced number of trainable parameters lowers the high computing cost of LLM training and improves energy efficiency. However, this approach assumes all clients have sufficient computing resources, which is often unrealistic due to the heterogeneity of resources across clients. While some clients may access powerful GPUs, others have limited or no such resources. Federated finetuning using synthetic data allows participation without local LLM training but introduces a performance gap compared to local updates. To address this, we propose a novel two-stage algorithm leveraging the storage and computing power of a strong server. In the first stage, resource-constrained clients generate synthetic data under the coordination of the strong server, which stores the data. In the second stage, the strong server uses this synthetic data on behalf of constrained clients to perform federated LoRA finetuning alongside clients with sufficient resources. This ensures participation from all clients. Experimental results demonstrate that incorporating local updates from even a small fraction of clients improves performance compared to using synthetic data for all clients. Additionally, we integrate the Gaussian mechanism in both stages to ensure client-level differential privacy.
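As a rough sketch of the aggregation step the second stage implies, the snippet below averages LoRA adapter updates, whether produced locally by resource-rich clients or by the strong server on behalf of constrained clients from their synthetic data, and applies clipping plus Gaussian noise for client-level differential privacy. The clip norm, noise multiplier, and tensor shapes are illustrative assumptions rather than the paper's settings.

```python
# Sketch: FedAvg over clipped LoRA adapter updates with Gaussian noise.
# Noise is scaled by clip_norm / n so the per-client sensitivity of the
# average is bounded after clipping. Hyperparameters are illustrative.
import torch

def clip_update(update: dict[str, torch.Tensor], clip_norm: float) -> dict[str, torch.Tensor]:
    """Scale one client's LoRA update so its global L2 norm is at most clip_norm."""
    total = torch.sqrt(sum(p.pow(2).sum() for p in update.values()))
    scale = torch.clamp(clip_norm / (total + 1e-12), max=1.0)
    return {name: p * scale for name, p in update.items()}

def aggregate_lora(updates: list[dict[str, torch.Tensor]],
                   clip_norm: float = 1.0,
                   noise_multiplier: float = 0.5) -> dict[str, torch.Tensor]:
    """Average clipped LoRA updates and add Gaussian noise (client-level DP)."""
    clipped = [clip_update(u, clip_norm) for u in updates]
    n = len(clipped)
    aggregated = {}
    for name in clipped[0]:
        mean_update = torch.stack([u[name] for u in clipped]).mean(dim=0)
        noise = torch.randn_like(mean_update) * noise_multiplier * clip_norm / n
        aggregated[name] = mean_update + noise
    return aggregated

# Toy usage: three clients, each contributing LoRA A/B matrices for one layer.
updates = [{"lora_A": torch.randn(8, 512), "lora_B": torch.randn(512, 8)} for _ in range(3)]
new_adapters = aggregate_lora(updates)
```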