Supporting data for "SVEngine: an efficient and versatile simulator of genome structural variations with features of cancer clonal evolution"

Dataset type: Software, Genomic
Data released on June 21, 2018

Xia LC; Ai D; Lee H; Andor N; Li C; Zhang NR; Ji HP (2018): Supporting data for "SVEngine: an efficient and versatile simulator of genome structural variations with features of cancer clonal evolution" GigaScience Database.


Simulated genome sequence data with defined features can facilitate the development and benchmarking of structural variant analysis programs. However, only a limited number of data simulators provide structural variants in silico, and there is a paucity of programs that generate structural variants with different allelic fractions and haplotypes. We developed SVEngine, an open-source tool, to address this need. SVEngine simulates next-generation sequencing data with embedded structural variants. As input, SVEngine takes template haploid sequences (FASTA) together with an external variant file, a variant distribution file and/or a clonal phylogeny tree file (NEWICK). It then simulates and outputs sequence contigs (FASTA), sequence reads (FASTQ) and/or post-alignment files (BAM). All of these files contain the desired variants, and accompanying BED files record the ground truth. SVEngine's flexible design lets users specify the size, position and allelic fraction of deletion, insertion, duplication, inversion and translocation variants. SVEngine also simulates sequence data that replicates the characteristics of a sequencing library with mixed DNA insert sizes. SVEngine is highly parallelized to reduce simulation time. We demonstrated the versatile features of SVEngine and its improved runtime compared with other available simulators. These features include the simulation of locus-specific variant frequencies designed to mimic the phylogeny of cancer clonal evolution. We validated the accuracy of the simulations by checking sequence mapping features such as coverage change, read clipping, insert size shift and neighbouring hanging read pairs for representative variant types. SVEngine is implemented as a standard Python package and is freely available for academic use at:
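The link between a clonal phylogeny, a locus-specific allelic fraction and an observable mapping feature such as coverage change can be illustrated with a short sketch. This is not SVEngine's code or API: every function name below is hypothetical, the clone tree is a plain Python dict standing in for a NEWICK file, variants are assumed heterozygous in a diploid genome, and the depth model is a deliberately simplified one (deletions lower coverage, tandem duplications raise it, other types are treated as copy-neutral).

```python
import math
from typing import Dict, List

# Hypothetical clone tree: each clone maps to its child clones.
# SVEngine reads NEWICK trees; this dict form is only an illustrative stand-in.
CloneTree = Dict[str, List[str]]


def descendants(tree: CloneTree, clone: str) -> List[str]:
    """Return `clone` and every clone below it (iterative depth-first walk)."""
    stack, found = [clone], []
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(tree.get(node, []))
    return found


def cell_fraction(tree: CloneTree, clone_fractions: Dict[str, float], origin: str) -> float:
    """Fraction of cells carrying a variant that first arose in clone `origin`.

    The variant is inherited by all descendant clones, so its cellular
    frequency is the summed abundance of the origin clone's subtree.
    """
    if not math.isclose(sum(clone_fractions.values()), 1.0):
        raise ValueError("clone fractions must sum to 1")
    return sum(clone_fractions.get(c, 0.0) for c in descendants(tree, origin))


def allelic_fraction(cell_frac: float, mutant_copies: int = 1, ploidy: int = 2) -> float:
    """Convert cellular frequency to allelic fraction (default: heterozygous, diploid)."""
    return cell_frac * mutant_copies / ploidy


def expected_depth(base_depth: float, af: float, svtype: str) -> float:
    """Simplified expected read depth inside an SV at allelic fraction `af`.

    A deletion removes the region from a fraction `af` of alleles; a tandem
    duplication adds one extra copy on that fraction. All other types are
    treated as copy-neutral here, which is a simplifying assumption.
    """
    if not 0.0 <= af <= 1.0:
        raise ValueError("allelic fraction must be in [0, 1]")
    if svtype == "DEL":
        return base_depth * (1.0 - af)
    if svtype == "DUP":
        return base_depth * (1.0 + af)
    return base_depth


if __name__ == "__main__":
    # Normal cells sit at the root; clone A founds subclones B and C.
    tree = {"root": ["A"], "A": ["B", "C"]}
    fractions = {"root": 0.4, "A": 0.1, "B": 0.3, "C": 0.2}
    ccf = cell_fraction(tree, fractions, "A")
    af = allelic_fraction(ccf)
    print(round(ccf, 3), round(af, 3))  # 0.6 0.3
    print(round(expected_depth(30, af, "DEL"), 1))  # 21.0
```

Summing over the subtree captures the key idea behind phylogeny-driven frequencies: a variant placed on an early branch is shared by all of its descendant clones, so it ends up at a higher allelic fraction than a variant placed on a leaf.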

Additional details



Samples:
Sample ID: NA12878
Taxonomic ID: 9606
Common name: Human
GenBank name: human
Scientific name: Homo sapiens
Sample attributes:
  Description: Genomic DNA of female Homo sapiens use...
  Alternative names: human
  Isolation source: peripheral vein

Files:
Data type: BitBucket archive; File format: archive; Size: 41.65 MB; Release date: 2018-06-14
Data type: Readme; File format: TEXT; Size: 2.7 KB; Release date: 2018-06-14
Funding:
National Human Genome Research Institute; awardee: HP Ji; award ID: R01HG006137
National Cancer Institute; awardee: N Andor; award ID: U01CA15192001; comments: Cancer Target Discovery and Development (CTDD) Consortium
National Cancer Institute; awardee: N Andor; award ID: K99 CA215256
National Institutes of Health; awardee: HP Ji; award ID: P01 CA91955
National Institutes of Health; awardee: HP Ji; award ID: NIH U01CA15192001
National Natural Science Foundation of China; awardee: D Ai; award ID: 61370131
American Cancer Society; awardee: HP Ji; award ID: RSG-13-297-01-TBG
History:
June 22, 2018: Dataset published