Examplar data demonstrating the improvement of genome assembly and annotation by using AGOUTI.
Dataset type: Genomic, Software, Transcriptomic
Data released on June 26, 2016
Zhang SV; Zhuo L; Hahn MW (2016): Examplar data demonstrating the improvement of genome assembly and annotation by using AGOUTI. GigaScience Database. http://dx.doi.org/10.5524/100195
Genomes sequenced using short-read, next-generation sequencing technologies are error-filled and fragmented into thousands of small contigs. These incomplete and fragmented assemblies lead to errors in gene identification, such that single genes spread across multiple contigs are annotated as separate gene models. Such biases can confound inferences about the number of genes within species, as well as gene gain and loss between species. We present AGOUTI (Annotated Genome Optimization Using Transcriptome Information), a tool that uses RNA-seq data to simultaneously combine contigs into scaffolds and fragmented gene models into single models. We show that AGOUTI improves both the contiguity of genome assemblies and the accuracy of gene annotation, providing updated versions of each as output. Running AGOUTI on a simulated dataset, we show that it is highly accurate and that it achieves higher accuracy and contiguity than existing methods. Here we provide the software, available free of charge under the MIT license, as well as the synthetic dataset for reuse and reproducibility. For the most recent updates to the software, please refer to the GitHub page.
Additional details
Additional information:
https://github.com/svm-zhang/AGOUTI
Accessions (data generated as part of this study):
SRA: SRR3031982
SRA: SRR3031978
SRA: SRR3031987
PROJECT: PRJNA322306
Sample ID | Taxonomic ID | Common Name | Genbank Name | Scientific Name | Sample Attributes |
---|---|---|---|---|---|
SAMN00678264 | 6239 | roundworm | | Caenorhabditis elegans | Description: C. elegans, RNA extracted from early e...; Life stage: early embryo; Alternative accession-BioSample: SAMN00678264; ... |
SAMN04335047 | 4081 | tomato | | Solanum lycopersicum | Description: S. lycopersicum, RNA extracted from ro...; Life stage: mature; Alternative accession-BioSample: SAMN04335047; ... |
SAMN04348653 | 4081 | tomato | | Solanum lycopersicum | Description: S. lycopersicum, RNA extracted from r...; Life stage: mature; Alternative accession-BioSample: SAMN04348653; ... |
SAMN04348661 | 142760 | | | Solanum lycopersicoides | Description: S. lycopersicoides, RNA extracted from...; Life stage: mature; Alternative accession-BioSample: SAMN04348661; ... |
SAMN05150828 | 6239 | roundworm | | Caenorhabditis elegans | Description: C. elegans N2_CB strain, DNA extracted from whole worms; Life stage: Larval L4 phase; Alternative accession-BioSample: SAMN05150828 |