Examplar data demonstrating the improvement of genome assembly and annotation by using AGOUTI.

Dataset type: Genomic, Software, Transcriptomic
Data released on June 26, 2016

Zhang SV; Zhuo L; Hahn MW (2016): Examplar data demonstrating the improvement of genome assembly and annotation by using AGOUTI. GigaScience Database. http://dx.doi.org/10.5524/100195

DOI: 10.5524/100195

Genomes sequenced using short-read, next-generation sequencing technologies are error-filled and fragmented into thousands of small contigs. These incomplete and fragmented assemblies lead to errors in gene identification, such that single genes spread across multiple contigs are annotated as separate gene models. Such biases can confound inferences about the number of genes within species, as well as gene gain and loss between species. We present AGOUTI (Annotated Genome Optimization Using Transcriptome Information), a tool that uses RNA-seq data to simultaneously combine contigs into scaffolds and fragmented gene models into single models. We show that AGOUTI improves both the contiguity of genome assemblies and the accuracy of gene annotation, providing updated versions of each as output. Running AGOUTI on a simulated dataset, we show that it is highly accurate and that it achieves higher accuracy and contiguity than other existing methods. Here we provide the software, available free of charge under the MIT license, as well as the synthetic dataset for reuse and reproducibility. For the most recent updates to the software, please refer to the GitHub page (https://github.com/svm-zhang/AGOUTI).
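The general idea of using RNA-seq evidence to link contigs can be illustrated with a small sketch. This is not AGOUTI's actual algorithm or code: the function name `joinable_contig_pairs`, the `min_support` threshold, and the toy read-pair mappings are all hypothetical, chosen only to show how read pairs whose mates map to different contigs could be counted as evidence that the two contigs belong together.

```python
from collections import Counter


def joinable_contig_pairs(read_pairs, min_support=2):
    """Return contig pairs linked by at least `min_support` read pairs.

    read_pairs: iterable of (contig_of_mate1, contig_of_mate2) tuples.
    Hypothetical helper for illustration only; not part of AGOUTI.
    """
    support = Counter()
    for contig_a, contig_b in read_pairs:
        if contig_a != contig_b:
            # Sort so that (A, B) and (B, A) count as the same link.
            support[tuple(sorted((contig_a, contig_b)))] += 1
    # Keep only links supported by enough read pairs.
    return {pair: n for pair, n in support.items() if n >= min_support}


# Toy input (invented): each tuple names the contigs the two mates map to.
toy_pairs = [
    ("contig1", "contig2"),
    ("contig2", "contig1"),
    ("contig1", "contig1"),  # both mates on one contig: no linking evidence
    ("contig2", "contig3"),  # a single supporting pair: below the threshold
]

print(joinable_contig_pairs(toy_pairs))
```

With the toy input, only the contig1/contig2 link passes the assumed threshold of two supporting read pairs. A real scaffolder would also need contig orientation, ordering, and gene-model information, none of which this sketch attempts.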

Additional details

Read the peer-reviewed publication(s):

doi:10.1186/s13742-016-0136-3

Additional information:

https://github.com/svm-zhang/AGOUTI

Accessions (data generated as part of this study):

SRA: SRR3031982
SRA: SRR3031978
SRA: SRR3031987
PROJECT: PRJNA322306





Sample ID: SAMN00678264
Taxonomic ID: 6239
Common Name: roundworm
Scientific Name: Caenorhabditis elegans
Description: C. elegans, RNA extracted from early e...
Life stage: early embryo
Alternative accession-BioSample: SAMN00678264
...

Sample ID: SAMN04335047
Taxonomic ID: 4081
Genbank Name: tomato
Scientific Name: Solanum lycopersicum
Description: S. lycopersicum, RNA extracted from ro...
Life stage: mature
Alternative accession-BioSample: SAMN04335047
...

Sample ID: SAMN04348653
Taxonomic ID: 4081
Genbank Name: tomato
Scientific Name: Solanum lycopersicum
Description: S. lycopersicum, RNA extracted from r...
Life stage: mature
Alternative accession-BioSample: SAMN04348653
...

Sample ID: SAMN04348661
Taxonomic ID: 142760
Scientific Name: Solanum lycopersicoides
Description: S. lycopersicoides, RNA extracted from...
Life stage: mature
Alternative accession-BioSample: SAMN04348661
...

Sample ID: SAMN05150828
Taxonomic ID: 6239
Common Name: roundworm
Scientific Name: Caenorhabditis elegans
Description: C. elegans N2_CB strain, DNA extracted from whole worms
Life stage: Larval L4 phase
Alternative accession-BioSample: SAMN05150828

Displaying 1-5 of 5 Sample(s).




File Name  Sample ID  Data Type          File Format  Size      Release Date
                      Text               TEXT         4.21 MB   2016-05-30
                      Text               TEXT         5.19 MB   2016-05-30
                      Other              TEXT         4.17 MB   2016-05-30
                      Text               TEXT         2.55 MB   2016-05-30
                      Text               TEXT         3.53 MB   2016-05-30
                      Other              TEXT         2.52 MB   2016-05-30
                      Text               TEXT         9.63 MB   2016-05-30
                      Text               TEXT         10.79 MB  2016-05-30
                      Other              TEXT         9.55 MB   2016-05-30
                      Sequence assembly  FASTA        98.04 MB  2016-05-30

Displaying 1-10 of 41 File(s).

Funding body: National Science Foundation
Award ID: DEB-1249633
Awardee: Matthew W Hahn

Date               Action
June 26, 2016      Dataset publish
July 21, 2016      Manuscript Link added: 10.1186/s13742-016-0136-3
May 1, 2020        File readme.txt updated
December 3, 2020   File n2_ee_50-120.200k.bam updated
December 3, 2020   File n2_ee_50-120.100k.bam updated
November 14, 2022  Data type for File fastqs.tar.gz updated