Software and supporting material for: "SmileFinder: a resampling-based platform to evaluate signatures of selection from genome-wide sets of matching allele frequency data between populations".
Dataset type: Software, Workflow
Data released on November 05, 2014
SmileFinder is a simple program that searches genome-wide allele frequency datasets for the sweep patterns left by historical selection. It does this by comparing the diversity within, and the differentiation between, two or more populations of a diploid species against the neutral expectation. The program calculates mean heterozygosity and mean fixation index (FST, a measure of the variance in allele frequencies among populations) in sets of sliding windows of incrementally increasing sizes. It then builds a resampled distribution (the baseline) of random multi-locus sets, matched in size to the sliding windows and drawn by unrestricted sampling of loci across the genome. Each sliding-window value is assigned a percentile by superimposing it on the resampled distribution for the same window size. The number of resamplings can easily be scaled from 1,000 (1K) to 100 million (100M); the more resamplings, the more precise the percentiles assigned to extreme observed values.
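As a rough illustration of the scan described above, the Python sketch below computes per-locus statistics, window means for each window size, a resampled baseline of randomly drawn loci of the same size, and a percentile for each window. It is not SmileFinder's code. The estimators (expected heterozygosity 2p(1 - p) and FST = (HT - HS) / HT with equal population weights), sampling loci without replacement within each random set, the one-locus window step, and the percentile definition (share of baseline values strictly below the observed value) are all assumptions. Function names such as `locus_stats` and `scan` are hypothetical.

```python
import random
from bisect import bisect_left


def expected_het(p):
    """Expected heterozygosity 2p(1 - p) of a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def locus_stats(freqs):
    """Return (H_S, F_ST) for one biallelic locus.

    `freqs` holds the frequency of the same allele in each population.
    Assumption: populations are weighted equally and
    F_ST = (H_T - H_S) / H_T, set to 0 at monomorphic loci.
    """
    h_s = sum(expected_het(p) for p in freqs) / len(freqs)
    h_t = expected_het(sum(freqs) / len(freqs))
    fst = (h_t - h_s) / h_t if h_t > 0 else 0.0
    return h_s, fst


def window_means(values, size):
    """Means of every run of `size` consecutive loci (step of one locus)."""
    return [sum(values[i:i + size]) / size
            for i in range(len(values) - size + 1)]


def resampled_baseline(values, size, n_resamples, rng):
    """Sorted means of `n_resamples` random sets of `size` loci, each drawn
    from anywhere in the genome regardless of position (assumed: without
    replacement within a set)."""
    return sorted(sum(rng.sample(values, size)) / size
                  for _ in range(n_resamples))


def percentile(baseline, x):
    """Percentage of baseline values strictly below x (assumed definition)."""
    return 100.0 * bisect_left(baseline, x) / len(baseline)


def scan(values, sizes, n_resamples=1000, seed=1):
    """For each window size, the percentile of every window mean
    relative to a size-matched resampled baseline."""
    rng = random.Random(seed)
    results = {}
    for size in sizes:
        baseline = resampled_baseline(values, size, n_resamples, rng)
        results[size] = [percentile(baseline, m)
                         for m in window_means(values, size)]
    return results
```

For example, `scan([locus_stats(f)[1] for f in freqs_by_locus], sizes=[5, 10, 20])` would return, for each window size, one FST percentile per window, where `freqs_by_locus` is a hypothetical list of per-population allele frequencies ordered by chromosomal position. Sorting the baseline once per window size lets each percentile be found by binary search, which keeps large resampling counts affordable.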
The output from SmileFinder can be used to plot percentile values along chromosome maps to look for signatures of selection, or to compare lists of candidate genes against random gene sets to test for an overrepresentation of sweeps. Both applications have already been used in published studies. This publicly available, open-source program should be a useful tool for preliminary scans of selection in worldwide databases of human genetic variation, as well as in population datasets for the many non-human species for which such data are rapidly emerging with the advent of new genotyping and sequencing technologies. A Galaxy workflow showing how SmileFinder can be used is available from GigaGalaxy (http://galaxy.cbiit.cuhk.edu.hk/u/gigascience/g/wguiblet2014).
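The comparison of candidate genes with random gene sets can be sketched as a simple resampling test. Here each gene is assumed to carry a single score (for instance, the highest window percentile overlapping it), and a gene counts as a sweep if its score reaches an arbitrary threshold of 99. Both choices, and the names `gene_score` and `overrepresentation_test`, are hypothetical illustrations rather than the procedure used in the published studies.

```python
import random


def count_sweeps(genes, gene_score, threshold):
    """Number of genes whose score is at or above the threshold."""
    return sum(1 for g in genes if gene_score[g] >= threshold)


def overrepresentation_test(candidates, gene_score, threshold=99.0,
                            n_sets=10000, seed=1):
    """Compare the number of sweep genes in `candidates` with random gene
    sets of the same size drawn from all scored genes.

    Returns (observed count, empirical one-sided p-value). The add-one
    correction keeps the p-value above zero when no random set matches.
    """
    rng = random.Random(seed)
    all_genes = list(gene_score)
    observed = count_sweeps(candidates, gene_score, threshold)
    as_extreme = 0
    for _ in range(n_sets):
        # Random gene set of the same size as the candidate list.
        random_set = rng.sample(all_genes, len(candidates))
        if count_sweeps(random_set, gene_score, threshold) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_sets + 1)
```

For example, with 100 scored genes of which five exceed the threshold, a candidate list containing exactly those five yields an observed count of 5 and a p-value close to 1/(n_sets + 1), whereas a list of five non-sweep genes yields a p-value of 1.0.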
Read the peer-reviewed publication(s):
Guiblet, W. M., Zhao, K., O’Brien, S. J., Massey, S. E., Roca, A. L., & Oleksyk, T. K. (2015). SmileFinder: a resampling-based approach to evaluate signatures of selection from genome-wide sets of matching allele frequency data in two or more diploid populations. GigaScience, 4(1), 1–6. doi:10.1186/2047-217x-4-1
Accessions (data included in GigaDB):