Minisymposium

MS1D - Geometries and Topology of Learning for Computational Discovery in High Dimensional Biological Systems with Applications to Human Health

Monday, June 16, 2025

11:20 - 13:20 CEST

Room 6.0D13

Session Chair

Michael Kirby

Colorado State University

Description

In recent years there has been substantial interest in using machine learning and AI algorithms for data-driven scientific discovery. This interest has largely been fueled by significant increases in the power of high-performance computing, coupled with the growing availability of massive data sets ranging from weather and climate simulations to biological studies of the host response to infectious disease. This computational and data-driven research has led to a number of significant discoveries related to, e.g., protein-protein interactions, biomarkers of infectious disease, molecular neuroscience, immunology, cancer, and structural biology. This minisymposium will highlight recent work at the interface of high-performance computing, algorithms, and mathematics for understanding complex biological systems. It will feature simulations and AI analyses using high-performance computing resources at Oak Ridge National Laboratory and the National Center for Atmospheric Research (NCAR). The focus will be on health-related applications, including modeling pathogen emergence in relation to climate change, graph learning and statistical shape analysis for understanding complex biological systems, and learning neural activity patterns from deep geometric and topological networks.

Presentations

11:20 - 11:50 CEST

Computational modeling of protein structures: Quantifying the effect of mutations on protein structures

Point mutations in protein sequences can alter a protein's native fold, stability, and function, and may result in observable disease phenotypes. Currently, there are over 200,000 experimental protein structures deposited in the Protein Data Bank, enabling the training of deep learning models for structure prediction. However, these models often fail to accurately predict the structural effects of mutations. As a result, we lack a quantitative understanding of the effect of mutations on protein structures. Here, we curate a dataset of X-ray crystal structure duplicates and their corresponding single-point mutant structures, creating opportunities to leverage high-performance computing for large-scale analysis and the training of predictive models. We quantify the local structural deformation per residue between wildtype-mutant pairs and compare it to the baseline within wildtype duplicates. Our analysis shows that, on average, the magnitude of structural perturbation decreases as the sequence and spatial distance from the mutation site increase. In future studies, we aim to identify the key physical features that determine mutational impact and to develop predictive models using data-driven approaches. These results could advance our understanding of genetic diseases and support the development of structure-based drug discovery and therapeutic design.

Zhuoyi Liu, Alex Calabrese, and Corey O'Hern (Yale University)
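
As a rough illustration of what a per-residue deformation measure can look like, the sketch below superimposes two structures with the Kabsch algorithm and reports how far each residue's C-alpha atom has moved. The abstract does not say which deformation measure the authors use, so the choice of global rigid superposition, the function names, and the (N, 3) coordinate arrays with matched residue order are all assumptions made for this example.

```python
import numpy as np

def kabsch_align(P, Q):
    """Rigidly superimpose P onto Q (both N x 3 arrays); return the moved copy of P."""
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3 x 3 cross-covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(U @ Vt) >= 0 else -1.0
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    return Pc @ R + Q.mean(axis=0)

def per_residue_deviation(wildtype, mutant):
    """Distance between matched C-alpha atoms after superposition (hypothetical measure)."""
    aligned = kabsch_align(mutant, wildtype)
    return np.linalg.norm(aligned - wildtype, axis=1)

if __name__ == "__main__":
    # Synthetic coordinates, not real protein data: a copy with one displaced residue.
    rng = np.random.default_rng(0)
    wt = rng.normal(scale=10.0, size=(50, 3))
    mut = wt.copy()
    mut[17] += np.array([2.0, 0.0, 0.0])
    dev = per_residue_deviation(wt, mut)
    print("residue with largest deviation:", int(np.argmax(dev)))
```

Applying the same function to pairs of wildtype duplicates would give the kind of baseline the abstract mentions, against which wildtype-mutant deviations can be compared.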

11:50 - 12:20 CEST

Learning and Shape Analysis of Pose Image Manifolds

Despite the high dimensionality of images, the sets of images of 3D objects have long been hypothesized to form low-dimensional manifolds. What is the nature of such manifolds? How do they differ across objects and object classes? Answering these questions can provide key insights for explaining and advancing the success of machine learning algorithms in computer vision. This paper investigates dual tasks -- learning and analyzing shapes of image manifolds -- by revisiting a classical problem of manifold learning from a novel geometrical perspective. It uses geometry-preserving transformations to map the pose image manifolds, the sets of images formed by rotating 3D objects, to low-dimensional latent spaces. The pose manifolds of different objects in latent spaces are found to be nonlinear, smooth manifolds. The paper then compares the shapes of these manifolds for different objects using Kendall's shape analysis, modulo rigid motions and global scaling, and clusters objects according to these shape metrics. Interestingly, pose manifolds for objects from the same classes are frequently clustered together. The geometries of image manifolds can be exploited to simplify vision and image processing tasks, to predict performance, and to provide insights into learning methods.

Anuj Srivastava, Benjamin Beadett, and Shenyuan Liang (Florida State University)
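
To make "modulo rigid motions and global scaling" concrete, the sketch below computes a Kendall-style shape distance between two point configurations: each is centered and scaled to unit size, and the optimal rotation is then removed via an SVD. Treating a latent-space manifold as a set of matched sample points, and the function names used here, are assumptions for illustration; the abstract does not describe how the authors sample or register the manifolds.

```python
import numpy as np

def preshape(X):
    """Remove translation and scale: center the points and normalize to unit Frobenius norm."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc)

def kendall_distance(X, Y):
    """Geodesic (Procrustes) distance between the shapes of X and Y, both k x m arrays."""
    A, B = preshape(X), preshape(Y)
    U, s, Vt = np.linalg.svd(A.T @ B)
    # Restrict to proper rotations: if the optimal orthogonal map is a reflection,
    # the smallest singular value enters the sum with a negative sign.
    if np.linalg.det(U @ Vt) < 0:
        s = s.copy()
        s[-1] = -s[-1]
    return float(np.arccos(np.clip(s.sum(), -1.0, 1.0)))

if __name__ == "__main__":
    # Synthetic point sets, not actual pose manifolds.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(30, 3))
    Y = rng.normal(size=(30, 3))
    print("distance between two unrelated configurations:", kendall_distance(X, Y))
```

A matrix of such pairwise distances can be passed to any standard clustering method, which is one plausible way to group objects by manifold shape.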

12:20 - 12:50 CEST

Developing a Data-Driven Farmer Vulnerability Index for Farms in Rural Communities with High Performance Computing

African Swine Fever (ASF) is a highly contagious and deadly viral disease infecting domestic and feral swine populations in Africa and Asia and, more recently, in Europe, South America, and the Caribbean. ASF has devastating impacts on swine industries in the affected countries. This study proposes to develop an ASF farmer vulnerability risk index for rural swine farms that integrates an array of potential environmental factors (e.g., domestic and feral swine population densities, precipitation, temperature, vegetation), geographic and social factors, and seasonality to assess and predict regions where outbreaks are more likely to occur. The approach is data agnostic, leveraging a broad range of available data with the goal of identifying discriminating features in a data-driven manner. A feature indexing approach is used to construct labeled training data from historical outbreaks to train a machine learning model that produces a spatial risk index. Sparse optimization tools are employed to identify the salient features most useful for predictive modeling. The resultant risk index can guide surveillance and preventive strategies, while also outlining limitations related to data granularity and model generalizability. This study integrates diverse data sets to identify areas of risk and potentially inform efforts to mitigate the spread of ASF.

David Kott and Connor Price (Colorado State University), Tom Hopson and Jason C. Knievel (NSF National Center for Atmospheric Research), Tracy L. Webb (Colorado State University), Olga Wilhelmi (NSF National Center for Atmospheric Research), and Michael Kirby (Colorado State University)
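
One common form of sparse optimization for feature selection is L1-regularized regression, sketched below with a plain iterative soft-thresholding (ISTA) solver. The abstract does not name the specific sparse optimization tools used in the study, so the lasso objective, the solver, the regression (rather than classification) setting, and all function names and parameter values are assumptions for illustration only.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t; entries smaller than t become exactly zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1 / (2n)) * ||y - X w||^2 + lam * ||w||_1 by iterative soft-thresholding."""
    n, p = X.shape
    # Step size 1/L, where L is the Lipschitz constant of the smooth term's gradient.
    step = n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

if __name__ == "__main__":
    # Synthetic data standing in for environmental and social covariates (not real ASF data):
    # only three of twenty features actually influence the response.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 20))
    w_true = np.zeros(20)
    w_true[[2, 7, 11]] = [1.5, -2.0, 1.0]
    y = X @ w_true + 0.1 * rng.standard_normal(200)
    w = lasso_ista(X, y, lam=0.1)
    print("selected feature indices:", np.flatnonzero(w).tolist())
```

The features with nonzero weights are the ones the penalty retains; increasing `lam` prunes more aggressively at the cost of more shrinkage on the retained weights.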

12:50 - 13:20 CEST

AI-Driven Systems Biology for Addiction: Large-Scale Multi-Omics Network Modeling and AI Agents for Mechanistic Discovery

Understanding the genetic and molecular underpinnings of addiction and related disorders requires integrative approaches that leverage large-scale omics data, network biology, and artificial intelligence. This work presents a systems biology framework that combines predictive expression networks, foundation models, and AI agents to elucidate mechanisms underlying opioid and nicotine addiction. By integrating genome-wide association studies (GWAS), transcriptomics, and multiplex network modeling, we identify gene clusters linked to addiction-related phenotypes, emphasizing shared mechanisms between opioid use and smoking cessation. Using the MENTOR framework, we partition genes of interest into mechanistically coherent clades, revealing significant overlap between addiction pathways. Network-based analyses uncover key regulators, including BDNF/NTRK2 and MAPK signaling, which influence neuronal plasticity and reinforcement learning. AI-driven interpretation automates gene-function annotation, improving mechanistic inference. Further, retrieval-augmented generation (RAG) agents and reinforcement learning models facilitate high-throughput interpretation of biological networks, accelerating hypothesis generation. This study highlights AI’s role in translating multi-omics data into actionable insights for addiction biology. The framework extends to broader disease contexts, offering a scalable model for systems medicine. Future directions include validation through retrospective clinical trials and experimental assays, emphasizing the potential for AI-guided therapeutic discovery.

Daniel Jacobson and Matthew Lane (Oak Ridge National Laboratory)
