Paper
AP1D - ACM Papers Session 1D
Live streaming
GCsnap2 Cluster is a scalable, high-performance tool for genomic context analysis, developed to overcome the limitations of its predecessor, GCsnap1 Desktop. Leveraging distributed computing with mpi4py.futures, GCsnap2 Cluster achieved a 22× improvement in execution time and can now perform genomic context analysis for hundreds of thousands of input sequences on HPC clusters. Its modular architecture enables the creation of task-specific workflows and flexible deployment in various computational environments, making it well suited for bioinformatics studies of large-scale datasets. This work highlights the potential for applying similar approaches to solve scalability challenges in other scientific domains that rely on large-scale data analysis pipelines.
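As a rough illustration of the distribution pattern named above, the sketch below splits a list of input sequence identifiers into batches and maps a per-batch function over an executor. It is not GCsnap2 Cluster's actual code: `make_batches`, `analyze_batch`, `run_batches`, `run_locally`, `run_with_mpi`, and the sample identifiers are all hypothetical names chosen for this example, and `analyze_batch` is only a placeholder for a real pipeline step. The same `run_batches` function works with `MPIPoolExecutor` from mpi4py.futures because it follows the `concurrent.futures` executor interface.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def analyze_batch(batch):
    """Hypothetical stand-in for one pipeline step on a batch of sequence IDs.

    It only pairs each ID with its string length; a real step would do the
    actual genomic context work.
    """
    return [(seq_id, len(seq_id)) for seq_id in batch]


def run_batches(executor, sequence_ids, batch_size):
    """Map analyze_batch over all batches and flatten the results in input order."""
    results = []
    for batch_result in executor.map(analyze_batch, make_batches(sequence_ids, batch_size)):
        results.extend(batch_result)
    return results


def run_locally(sequence_ids, batch_size=2, workers=2):
    """Run with a standard-library thread pool (no MPI required)."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return run_batches(executor, sequence_ids, batch_size)


def run_with_mpi(sequence_ids, batch_size=2):
    """Run with MPI worker processes; requires mpi4py and an MPI installation.

    A script calling this is typically launched with something like:
        mpiexec -n 4 python -m mpi4py.futures script.py
    """
    from mpi4py.futures import MPIPoolExecutor  # imported lazily on purpose

    with MPIPoolExecutor() as executor:
        return run_batches(executor, sequence_ids, batch_size)
```

Keeping the executor as a parameter means the batching logic can be tested on a laptop and moved to a cluster without changes; only the function that creates the executor differs.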
Positive natural selection is the driving force that enables species to survive and reproduce in their environment. Localizing traces of positive selection has practical applications in studying virus evolution and designing more effective drug treatments. State-of-the-art methods for the detection of positive selection combine Convolutional Neural Networks (CNNs) with sliding-window algorithms to scan genomic sequences with high precision, but require prohibitively long execution times to process whole genomes with fine-grained resolution. We present an FPGA-accelerated system for efficiently scanning whole genomes with high granularity, implementing a quantized version of FAST-NN, a CNN that has been designed through a hardware-aware neural architecture search. FAST-NN employs a compact representation of genomic data as features, which eliminates potential I/O bottlenecks in hardware. Our accelerator architecture consists of a dedicated stage for each CNN layer in a pipelined datapath that integrates a specialized buffer design; this enables data reuse between overlapping sliding windows by leveraging the dilated convolutions in FAST-NN. A design point implemented on an Alveo U250 accelerator card achieves comparable accuracy to FAST-NN, with a maximum reduction of only 2.2% due to quantization, while producing a classification outcome in each clock cycle at a frequency of 100 MHz. Scanning the entire human genome (excluding the sex chromosomes), we observed between 19.51× and 28.61× faster processing than a PyTorch implementation on a 16-core CPU, and between 1.22× and 2.89× faster processing than a high-end GPU. The architecture is adaptable to other domains where CNNs are deployed in sliding-window algorithms for large-scale data processing.
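The data-reuse idea mentioned above can be illustrated in software with a single dilated 1D convolution. The sketch below is not the FAST-NN architecture or the accelerator's buffer design; the function names, weights, dilation, and window size are assumptions made for this example. It shows that overlapping windows at stride 1 share their convolution outputs, so computing the layer once over the whole sequence gives the same per-window results as recomputing it for every window.

```python
def dilated_conv_valid(x, weights, dilation):
    """1D dilated convolution (cross-correlation form) without padding."""
    span = (len(weights) - 1) * dilation + 1  # receptive field of one output
    return [
        sum(w * x[i + j * dilation] for j, w in enumerate(weights))
        for i in range(len(x) - span + 1)
    ]


def scan_per_window(x, weights, dilation, window):
    """Naive scan: recompute the convolution independently for every window."""
    return [
        dilated_conv_valid(x[start:start + window], weights, dilation)
        for start in range(len(x) - window + 1)
    ]


def scan_with_reuse(x, weights, dilation, window):
    """Reuse scan: convolve the full sequence once, then slice per window."""
    span = (len(weights) - 1) * dilation + 1
    outputs_per_window = window - span + 1
    full = dilated_conv_valid(x, weights, dilation)
    return [
        full[start:start + outputs_per_window]
        for start in range(len(x) - window + 1)
    ]
```

In the naive scan each output is recomputed once for every window that contains it, whereas the reuse scan computes each output exactly once; a hardware pipeline can exploit the same overlap by buffering intermediate values between consecutive windows.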