Data released on August 28, 2015
One of the fundamental problems of modern genomics is to extract the genetic architecture of a complex trait from a data set of individual genotypes and trait values. This problem is complicated by the large number of candidate genes, the potentially large number of causal loci, and the likely presence of some nonlinear interactions between different genes. Compressed Sensing methods obtain solutions to under-constrained systems of linear equations. These methods can be applied to the problem of determining the best model relating genotype to phenotype, and generally deliver better performance than simply regressing the phenotype against each genetic variant, one at a time. We introduce a Compressed Sensing method that can reconstruct nonlinear genetic models (i.e., including epistasis, or gene-gene interactions) from phenotype-genotype (GWAS) data. Our method uses L1-penalized regression applied to nonlinear functions of the sensing matrix.
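As a rough illustration of L1-penalized regression on nonlinear functions of the sensing matrix, the sketch below expands a genotype matrix with pairwise product columns and fits a lasso by coordinate descent. It is not the authors' implementation: the function names, the choice of pairwise products as the nonlinear terms, the penalty value `lam`, and the noise-free toy phenotype are all assumptions made for the example.

```python
import random
from itertools import combinations

def expand_with_interactions(genotypes):
    """Columns: every single locus g_j, then every pairwise product g_j * g_k."""
    n_loci = len(genotypes[0])
    terms = [(j,) for j in range(n_loci)] + list(combinations(range(n_loci), 2))
    rows = []
    for g in genotypes:
        row = []
        for term in terms:
            value = 1
            for j in term:
                value *= g[j]
            row.append(float(value))
        rows.append(row)
    return terms, rows

def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 on centred data
    by cyclic coordinate descent. Returns the coefficient list b."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]            # residual when all b are 0
    cols = []
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        cols.append([v - mean for v in col])   # centre each column
    col_sq = [sum(v * v for v in col) / n for col in cols]
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j, col in enumerate(cols):
            if col_sq[j] == 0.0:
                continue                       # constant column carries no signal
            rho = sum(c * r for c, r in zip(col, resid)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
    return beta

# Toy data (assumed): 200 individuals, 6 loci, genotypes coded 0/1/2.
rng = random.Random(1)
mafs = [rng.uniform(0.05, 0.5) for _ in range(6)]
genotypes = [[(rng.random() < p) + (rng.random() < p) for p in mafs]
             for _ in range(200)]
# Hypothetical phenotype: two linear effects plus one epistatic term.
phenotype = [1.0 * g[0] - 0.8 * g[3] + 1.5 * g[1] * g[2] for g in genotypes]

terms, X = expand_with_interactions(genotypes)
beta = lasso_cd(X, phenotype, lam=0.01)
selected = {t: round(b, 2) for t, b in zip(terms, beta) if abs(b) > 1e-8}
print(selected)
```

Each key of `selected` is a tuple of locus indices: a one-element tuple is a linear term and a two-element tuple is an interaction. Which terms survive depends on `lam`, the sample size, and the allele frequencies, so the toy run should be read as a demonstration of the mechanics rather than of recovery performance.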
Most of our simulations are performed using synthetic genomes with the minor allele frequency (MAF) restricted to values between 0.05 and 0.5. The synthetic genomes are generated as follows: draw a random population-level MAF ∈ (0.05, 0.5) for each locus, then populate each individual genome with SNP values of 0, 1, or 2 according to the MAF at that locus.
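The generation procedure described above can be sketched in a few lines. The description does not specify how the MAF is drawn or how genotypes are sampled from it, so this example assumes a uniform draw for the MAF and two independent allele draws per locus (Hardy-Weinberg proportions). The function name `synthetic_genomes` and its parameters are hypothetical.

```python
import random

def synthetic_genomes(n_individuals, n_loci, maf_low=0.05, maf_high=0.5, seed=0):
    """Return (mafs, genomes); genomes[i][j] is the minor-allele count
    (0, 1, or 2) of individual i at locus j."""
    rng = random.Random(seed)
    # One population-level MAF per locus (assumed: uniform over the range).
    mafs = [rng.uniform(maf_low, maf_high) for _ in range(n_loci)]
    genomes = [
        # Assumed: each of the two alleles is the minor allele with
        # probability p, independently, so the sum is 0, 1, or 2.
        [(rng.random() < p) + (rng.random() < p) for p in mafs]
        for _ in range(n_individuals)
    ]
    return mafs, genomes

mafs, genomes = synthetic_genomes(n_individuals=200, n_loci=50)
print(len(genomes), len(genomes[0]))
```

Fixing `seed` makes a run reproducible. Real genomes also show correlations between nearby loci (linkage disequilibrium), which this independent-locus sketch does not model.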
Ho, C. M., & Hsu, S. D. (2015). Determination of nonlinear genetic architecture using compressed sensing. GigaScience, 4(1). doi:10.1186/s13742-015-0081-6