Data released on February 05, 2016
Linkage disequilibrium is the non-random association of alleles at different loci; it occurs when the genotypes at two loci depend on each other. The model of genetic hitchhiking predicts that strong positive selection affects the patterns of linkage disequilibrium around the site of a beneficial allele, resulting in specific motifs of correlations between the neutral polymorphisms that surround the fixed beneficial allele. Linkage disequilibrium builds up on each side of the beneficial allele and diminishes between sites located on opposite sides of it. This specific pattern occurs more frequently when positive selection has acted on the population than under various neutral models. Thus, detecting such patterns could accurately reveal targets of positive selection along a recombining chromosome or a genome. Calculating linkage disequilibrium in whole genomes is computationally expensive, since correlations of alleles need to be evaluated for millions of pairs of sites. To analyze large datasets efficiently, the algorithmic implementations employed in modern population genetics need to exploit the multiple cores of current workstations in a scalable way.
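The pattern described above (high linkage disequilibrium on each side of the selected site, low linkage disequilibrium across it) can be made concrete with a small sketch. The text does not name an LD measure or a test statistic, so two assumptions are made here: LD is measured as the squared correlation r² between binary SNP columns, and the contrast is summarized with the ω statistic of Kim and Nielsen, which divides the mean within-side r² by the mean across-side r². The function names and the toy data are hypothetical and do not reproduce the OmegaPlus implementation.

```python
from itertools import combinations


def r_squared(a, b):
    """Squared correlation between two binary SNP columns (0/1 per sample)."""
    n = len(a)
    pa = sum(a) / n                                # frequency of allele 1 at site a
    pb = sum(b) / n                                # frequency of allele 1 at site b
    pab = sum(x & y for x, y in zip(a, b)) / n     # frequency of the 1-1 haplotype
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0                                 # monomorphic site: treat LD as 0
    d = pab - pa * pb                              # disequilibrium coefficient D
    return d * d / denom


def omega(snps, split):
    """Mean within-side r^2 divided by mean across-side r^2.

    `snps` is a list of SNP columns ordered by position; sites before
    index `split` form the left side, the rest the right side.
    Simplification of this sketch: each side needs at least two SNPs.
    """
    left, right = snps[:split], snps[split:]
    if len(left) < 2 or len(right) < 2:
        raise ValueError("need at least two SNPs on each side")
    within = sum(r_squared(a, b) for a, b in combinations(left, 2))
    within += sum(r_squared(a, b) for a, b in combinations(right, 2))
    n_within = len(left) * (len(left) - 1) // 2 + len(right) * (len(right) - 1) // 2
    across = sum(r_squared(a, b) for a in left for b in right)
    n_across = len(left) * len(right)
    if across == 0:
        return float("inf")                        # no LD across the candidate site
    return (within / n_within) / (across / n_across)


# Toy data: four samples, two SNPs on each side of a candidate site.
snps = [
    [1, 1, 0, 0], [1, 1, 0, 0],   # left side, perfectly correlated
    [1, 1, 1, 0], [1, 1, 1, 0],   # right side, perfectly correlated
]
print(omega(snps, 2))
```

A large value indicates that LD is concentrated within each flank rather than across the candidate site, which is the signature described above. In a genome scan, the split position and the flank sizes would be varied, and every evaluation involves many pairwise r² values, which is where the computational cost comes from.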
Population genomic datasets come in different types and shapes and typically exhibit heterogeneous SNP density. Due to such peculiarities, implementing generally scalable parallel algorithms is a challenging task. In this work, we present a series of four parallelization strategies targeting shared-memory systems for the computationally intensive problem of detecting genomic regions that have contributed to the past adaptation of the species, also referred to as selective sweeps, based on linkage disequilibrium (LD) patterns. We provide a thorough performance evaluation of the proposed parallel algorithms for computing LD and outline the benefits of each approach. Furthermore, we compare our open-source software OmegaPlus to a variety of neutrality tests.
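The four strategies are not described in this text, so the following is only a generic illustration of how pairwise LD work might be divided among workers on a shared-memory machine, not a reconstruction of any of them. It assumes r² as the LD measure, splits the list of SNP pairs into fixed-size chunks, and lets a thread pool sum r² per chunk; all names, the chunk size, and the worker count are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def r_squared(a, b):
    """Squared correlation between two binary SNP columns (0/1 per sample)."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0                      # monomorphic site: treat LD as 0
    return (pab - pa * pb) ** 2 / denom


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def sum_r2(snps, pairs):
    """Sum r^2 over one chunk of (i, j) index pairs."""
    return sum(r_squared(snps[i], snps[j]) for i, j in pairs)


def total_ld_parallel(snps, workers=4, chunk_size=1000):
    """Sum r^2 over all SNP pairs, one chunk of pairs per task."""
    pairs = list(combinations(range(len(snps)), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = pool.map(lambda chunk: sum_r2(snps, chunk),
                                chunked(pairs, chunk_size))
        return sum(partial_sums)


# Toy data: four SNP columns over four samples.
snps = [
    [1, 1, 0, 0], [1, 1, 0, 0],
    [1, 1, 1, 0], [1, 1, 1, 0],
]
serial = sum_r2(snps, list(combinations(range(len(snps)), 2)))
parallel = total_ld_parallel(snps, workers=2, chunk_size=2)
print(serial, parallel)
```

In CPython, threads mainly illustrate the decomposition, because the interpreter lock prevents pure-Python arithmetic from running concurrently; a real speedup would need a process pool or a compiled implementation. The sketch also hints at why SNP density matters: when SNPs are unevenly distributed, equally sized genomic regions contain very different numbers of pairs, so the way work is split affects load balance.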
The computational demands of selective sweep detection algorithms can vary significantly depending on attributes such as SNP density heterogeneity and whether DNA or binary data are analyzed, which correspond to the finite and infinite site models, respectively. Choosing the right parallel algorithm for the analysis can lead to a significant reduction in processing time as well as major energy savings. However, it is not straightforward to determine which parallel algorithm will execute most efficiently for a particular dataset on a specific processor architecture and number of available cores.
Alachiotis, N., & Pavlidis, P. (2016). Scalable linkage-disequilibrium-based selective sweep detection: a performance guide. GigaScience, 5(1). doi:10.1186/s13742-016-0114-9