Comprehensive characterization of genomic variation in a human individual is important for understanding disease and for developing personalized approaches to treatment. Many tools exist for identifying single nucleotide polymorphisms (SNPs), small indels and large deletions based on DNA re-sequencing strategies. However, those approaches consistently display a significant bias against the recovery of complex structural variants and novel sequence in individual genomes, and they lack sequence interpretation such as ancestral state and mechanism. Here we present a novel approach, implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variants and novel sequence in population-scale de novo assemblies at single nucleotide resolution. Our approach scales well, which makes it applicable to large population studies of species with complex genomes, such as Homo sapiens. Application of AsmVar to several human de novo assemblies captures a wide spectrum of structural variants and novel sequences present in the human population with high sensitivity and specificity. Our method provides a direct way to investigate structural variations and novel sequences from de novo assemblies, which is important for the construction of a population-scale pan-genome. Our study also suggests the advantages of the de novo assembly strategy for defining genome structure.
This software has been released under the MIT License. Copyright 2014-2015.

Additional details

Read the peer-reviewed publication(s):

Liu, S., Huang, S., Rao, J., Ye, W., Krogh, A., & Wang, J. (2015). Discovery, genotyping and characterization of structural variation and novel sequence at single nucleotide resolution from de novo genome assemblies on a population scale. GigaScience, 4(1). doi:10.1186/s13742-015-0103-4

Additional information:

Sample ID | Taxonomic ID | Common Name | Genbank Name | Scientific Name | Sample Attributes
NA12878 | 9606 | Human | human | Homo sapiens | Ethnicity: Caucasian; Cell type: B-lymphocyte; Cell line: GM12878

File Name | Sample ID | Data Type | File Format | Size | Release Date
 | | Software | zip | 25.25 MB | 2015-10-27
 | NA12878 | Other | zip | 80.35 MB | 2015-10-27
 | | Readme | TEXT | 2.59 KB | 2015-10-27

Funding body | Awardee | Award ID | Comments
Danish National Advanced Technology Foundation
Danish National Research Foundation
Novo Nordisk UK Research Foundation
State Key Development Program for Basic Research of China-973 Program

Date | Action
December 3, 2015 | Dataset published
January 12, 2016 | Manuscript link added: 10.1186/s13742-015-0103-4