Minisymposium Presentation

Building Ultra-Large Pangenomes

Monday, June 16, 2025
15:00 - 15:30 CEST
Climate, Weather and Earth Sciences
Chemistry and Materials
Computer Science and Applied Mathematics
Humanities and Social Sciences
Engineering
Life Sciences
Physics

Presenter

Yatish Turakhia - University of California San Diego

Dr. Yatish Turakhia is an Assistant Professor in the Department of Electrical and Computer Engineering at the University of California San Diego (UCSD), with affiliations in the Department of Computer Science and Engineering (CSE) and the Bioinformatics and Systems Biology (BISB) graduate program. Prior to joining UCSD, he was a postdoctoral scholar at the Genomics Institute, UC Santa Cruz. Dr. Turakhia earned his Ph.D. in Electrical Engineering from Stanford University in 2019 and his bachelor’s and master’s degrees in Electrical Engineering from the Indian Institute of Technology (IIT) Bombay in 2014. He is a recipient of the MIT Technology Review’s Innovators Under 35 award, Hellman Fellowship, Jacobs Early Career Award, Amazon Research Award, NVIDIA Graduate Fellowship, and multiple paper awards.

Description

Pangenomics is an emerging field that lets us study within-species genetic diversity, and its relationship to physical traits (phenotypes), accurately and comprehensively by using a collection of genomes from a species instead of a single reference genome. Future pangenomics applications will require analyzing ultra-large and ever-growing collections of genomes. While existing pangenome data formats can represent the genetic variation in a collection of genomes, they do not store the genomes' shared evolutionary and mutational histories, and they are unlikely to keep up with the speed and volume of genome sequencing data. In this talk, I will present ongoing work from my lab on a novel pangenomic data representation that achieves significant improvements in memory efficiency and in the representative power of pangenomes. I will then discuss how we are leveraging GPUs and HPC systems to construct massive pangenomes consisting of millions of sequences. While the focus will be on microbial genomes, I will also discuss how these approaches can be extended to more complex genomes.

Authors