Minisymposium Presentation

MLDocking: Accelerated Drug Discovery with Transformer-Based Surrogate Models and In-Memory Workflows on Heterogeneous HPC Systems

Tuesday, June 17, 2025, 12:00 - 12:30 CEST

Climate, Weather and Earth Sciences

Chemistry and Materials

Computer Science and Applied Mathematics

Engineering

Life Sciences

Physics

Presenter

Riccardo Balin

Argonne National Laboratory


Description

The use of AI in drug discovery workflows has accelerated the screening of billions of molecules to identify top candidates for binding to particular proteins. Typically, these workflows are composed of distinct tasks run sequentially on HPC clusters to iteratively screen the list of compounds, identify top candidates, perform molecular dynamics simulations, and fine-tune the AI surrogate. The sequential nature of these offline workflows results in multiple job submissions with long queue times and heavy use of the parallel file system. In this talk, we present MLDocking, an automated drug discovery workflow that leverages Dragon, a novel distributed runtime specifically designed to manage dynamic processes, memory, and data on HPC systems. MLDocking automates the identification of top candidates by executing all workflow components concurrently, efficiently distributing tasks across the CPU and GPU resources available on current heterogeneous HPC systems. Moreover, it limits the use of the file system by performing all data sharing operations through an in-memory distributed dictionary that serves data from local memory or through fast RDMA transfers across the system’s interconnect. The talk will cover results obtained from scaling the workflow on the Aurora supercomputer and lessons learned in managing large datasets for in-situ workflows.
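As a rough illustration of the in-memory data sharing idea described above, the following minimal sketch has several screening workers write surrogate scores into a shared in-memory store, from which the top candidates are then selected without touching the file system. It is a single-node stand-in built only on the Python standard library and does not use the Dragon API; `SharedStore`, `surrogate_score`, `screen`, `top_candidates`, and `run_workflow` are hypothetical names introduced here, and the scoring function is a toy placeholder rather than a real surrogate model.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedStore:
    """Thread-safe in-memory key-value store.

    Assumption: a simplified stand-in for a distributed dictionary;
    it lives in one process and involves no RDMA or multi-node logic.
    """

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def items(self):
        with self._lock:
            return list(self._data.items())


def surrogate_score(smiles: str) -> float:
    """Toy placeholder for surrogate inference (hypothetical scoring rule)."""
    return (sum(ord(c) for c in smiles) % 100) / 100.0


def screen(store: SharedStore, compounds):
    """Score a chunk of compounds and publish results to the shared store."""
    for smiles in compounds:
        store.put(smiles, surrogate_score(smiles))


def top_candidates(store: SharedStore, k: int):
    """Read scores back from memory and return the k highest-scoring compounds."""
    ranked = sorted(store.items(), key=lambda kv: kv[1], reverse=True)
    return [smiles for smiles, _ in ranked[:k]]


def run_workflow(compounds, n_workers: int = 2, k: int = 2):
    """Run screening workers concurrently, then select the top candidates."""
    store = SharedStore()
    # Round-robin split of the compound list across workers.
    chunks = [compounds[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(screen, store, chunk) for chunk in chunks]
        for future in futures:
            future.result()  # re-raise any worker exception
    return top_candidates(store, k)
```

Calling `run_workflow(["CCO", "c1ccccc1", "CC(=O)O", "CCN"], n_workers=2, k=2)` returns two of the input strings. In a real deployment the store would be replaced by a distributed dictionary spanning nodes, and the scoring, sorting, simulation, and fine-tuning components would run as concurrent processes rather than threads.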