Minisymposium Presentation

MLDocking: Accelerated Drug Discovery with Transformer-Based Surrogate Models and In-Memory Workflows on Heterogeneous HPC Systems

Tuesday, June 17, 2025
12:00 - 12:30 CEST
Climate, Weather and Earth Sciences
Chemistry and Materials
Computer Science and Applied Mathematics
Humanities and Social Sciences
Engineering
Life Sciences
Physics

Description

The use of AI in drug discovery workflows has accelerated the task of screening billions of molecules to identify top candidates for binding to particular proteins. Typically, these workflows are composed of distinct tasks run sequentially on HPC clusters to iteratively screen the list of compounds, identify top candidates, perform molecular dynamics simulations, and fine-tune the AI surrogate. The sequential nature of these offline workflows results in multiple job submissions with long queue times and heavy use of the parallel file system. In this talk, we present MLDocking, an automated drug discovery workflow that leverages Dragon, a novel distributed runtime specifically designed to manage dynamic processes, memory, and data on HPC systems. MLDocking automates the identification of top candidates by executing all workflow components concurrently and distributing tasks efficiently across the CPU and GPU resources available on current heterogeneous HPC systems. Moreover, it limits the use of the file system by performing all data sharing operations through an in-memory distributed dictionary that relies on local memory access or fast RDMA transfers across the system’s interconnect. The talk will cover results obtained by scaling the workflow on the Aurora supercomputer and lessons learned in managing large datasets for in-situ workflows.
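
The pattern described above, concurrent workflow stages that exchange data through a shared in-memory dictionary instead of files, can be illustrated with a minimal single-node sketch. This is not MLDocking or the Dragon API: it uses only the Python standard library, with a lock-protected `dict` standing in for the distributed dictionary and threads standing in for distributed processes. The names `InMemoryStore`, `toy_score`, `screen`, `select_top`, and `run_pipeline` are hypothetical and chosen for illustration only.

```python
import heapq
import queue
import threading


class InMemoryStore:
    """Lock-protected dict; a single-node stand-in for a distributed dictionary."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data[key]


def toy_score(smiles):
    # Hypothetical placeholder for surrogate-model inference.
    # It has no chemical meaning; it only yields a deterministic number.
    return sum(ord(ch) for ch in smiles) % 97


def screen(compounds, store, ready):
    # Stage 1: score each compound and publish the result in memory.
    for smiles in compounds:
        store.put(smiles, toy_score(smiles))
        ready.put(smiles)  # notify the next stage that this key is available
    ready.put(None)  # sentinel: no more compounds


def select_top(store, ready, k, result):
    # Stage 2: runs concurrently with stage 1, reading scores from the store.
    scored = []
    while True:
        key = ready.get()
        if key is None:
            break
        scored.append((store.get(key), key))
    result.extend(heapq.nlargest(k, scored))


def run_pipeline(compounds, k=2):
    store = InMemoryStore()
    ready = queue.Queue()
    result = []
    workers = [
        threading.Thread(target=screen, args=(compounds, store, ready)),
        threading.Thread(target=select_top, args=(store, ready, k, result)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return result  # list of (score, compound), highest score first
```

Calling `run_pipeline(["CCO", "c1ccccc1", "CC(=O)O", "N"], k=2)` returns the two highest-scoring `(score, compound)` pairs without writing anything to disk. In a real deployment the store would span nodes and the stages would run as separate processes on CPU and GPU resources, which this sketch does not attempt to model.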

Authors