P38 - Scalable Genomic Context Analysis with GCsnap2 on HPC Clusters
Description
GCsnap2 Cluster is a scalable, Python-based, high-performance solution for genomic context analysis, co-developed by computer and life scientists to overcome the scalability limitations of its predecessor, GCsnap1 Desktop. By leveraging distributed computing with mpi4py.futures, GCsnap2 Cluster achieved a 30× improvement in execution time and can now perform genomic context analyses of hundreds of thousands of protein-coding gene sequences on HPC clusters. Its modular architecture enables the creation of task-specific workflows and flexible deployment in various computational environments, making it ideally suited for bioinformatics studies of large-scale datasets. This work highlights the potential of applying similar approaches to solve scalability challenges in other scientific domains that rely on large-scale data analysis pipelines.
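As a rough illustration of the distribution pattern that mpi4py.futures supports, the sketch below splits a list of target identifiers into batches and maps a worker function over them through an executor. It is not GCsnap2's actual code: `chunked`, `analyze_batch`, `analyze_all`, `run_locally`, and `run_with_mpi` are hypothetical names, and the per-batch work is a trivial placeholder. The only mpi4py API assumed is `MPIPoolExecutor`, which follows the standard `concurrent.futures` executor interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Sequence, Tuple


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def analyze_batch(batch: List[str]) -> List[Tuple[str, int]]:
    """Hypothetical placeholder for per-batch work.

    A real pipeline would look up and process each target here; this
    stand-in only pairs each identifier with its string length.
    """
    return [(target, len(target)) for target in batch]


def analyze_all(targets: Sequence[str], executor, batch_size: int = 2):
    """Map analyze_batch over batches using any concurrent.futures-style executor."""
    results: List[Tuple[str, int]] = []
    # executor.map returns batch results in submission order.
    for batch_result in executor.map(analyze_batch, chunked(targets, batch_size)):
        results.extend(batch_result)
    return results


def run_locally(targets: Sequence[str], batch_size: int = 2):
    """Run the same workflow with threads, e.g. for a quick test on a laptop."""
    with ThreadPoolExecutor(max_workers=2) as executor:
        return analyze_all(targets, executor, batch_size)


def run_with_mpi(targets: Sequence[str], batch_size: int = 2):
    """Run the workflow on MPI worker processes (assumes mpi4py is installed)."""
    from mpi4py.futures import MPIPoolExecutor  # imported lazily: needs an MPI library

    with MPIPoolExecutor() as executor:
        return analyze_all(targets, executor, batch_size)
```

On a cluster, a script that calls `run_with_mpi(...)` under an `if __name__ == "__main__":` guard would typically be launched with something like `mpiexec -n 4 python -m mpi4py.futures script.py`. Because the workflow only depends on the executor interface, the same code can be exercised locally with `run_locally(...)` before moving to an HPC system.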
Presenter(s)

Presenter
Since November 2024, I have been a research assistant in the High Performance Computing (HPC) group and a Ph.D. student in the PhD Program Data Science at the University of Basel. I am working on improving scheduling in HPC systems using machine learning. I am also responsible for the μ-Cluster.

In 2024, I received my M.Sc. degree in Computer Science, with a major in Machine Intelligence, from the University of Basel. My master's thesis focused on improving the performance of the genomic context analysis tool GCsnap.

I received my Bachelor's degree from the University of Basel in 2022. My bachelor's thesis was on benchmarking DAPHNE, an integrated data analysis pipeline for large-scale data management, HPC, and machine learning.

Before diving into computer science, I earned an M.Sc. in Business and Economics and worked as a regional economic forecaster.