
Data released on June 26, 2016

Exemplar data demonstrating the improvement of genome assembly and annotation by using AGOUTI.

Hahn, M. W.; Zhang, S. V.; Zhuo, L. (2016): Exemplar data demonstrating the improvement of genome assembly and annotation by using AGOUTI. GigaScience Database.

Genomes sequenced using short-read, next-generation sequencing technologies are error-filled and fragmented into thousands of small contigs. These incomplete and fragmented assemblies lead to errors in gene identification: a single gene spread across multiple contigs is annotated as several separate gene models. Such biases can confound inferences about the number of genes within a species, as well as about gene gain and loss between species. We present AGOUTI (Annotated Genome Optimization Using Transcriptome Information), a tool that uses RNA-seq data to simultaneously combine contigs into scaffolds and fragmented gene models into single models. We show that AGOUTI improves both the contiguity of genome assemblies and the accuracy of gene annotation, providing updated versions of each as output. Using a simulated dataset, we show that AGOUTI is highly accurate and achieves greater accuracy and contiguity than other existing methods. Here we provide the software, available free of charge under the MIT license, as well as the synthetic dataset, for reuse and reproducibility. For the most recent updates to the software, please refer to the GitHub page.
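The abstract describes joining contigs into scaffolds using RNA-seq evidence. The following is only a minimal sketch of that general idea, not AGOUTI's actual algorithm (which also works with gene models and must handle orientation and gaps). It assumes RNA-seq read pairs have already been aligned and reduced to the pair of contigs their two mates hit. The names `count_links`, `build_scaffolds` and the `min_support` threshold are hypothetical, chosen for illustration.

```python
from collections import Counter, defaultdict


def count_links(pairs):
    """Count read pairs whose two mates map to different contigs."""
    links = Counter()
    for contig_a, contig_b in pairs:
        if contig_a != contig_b:
            # Canonical (sorted) key so that A-B and B-A count as the same link.
            links[tuple(sorted((contig_a, contig_b)))] += 1
    return links


def build_scaffolds(contigs, pairs, min_support=2):
    """Chain contigs into linear scaffolds using well-supported read-pair links.

    Hypothetical sketch: orientation, gap sizes and gene models are ignored.
    """
    adjacency = defaultdict(set)
    for (a, b), n in count_links(pairs).items():
        if n >= min_support:  # hypothetical minimum number of supporting pairs
            adjacency[a].add(b)
            adjacency[b].add(a)

    # A linear scaffold can extend in only two directions, so drop every
    # link touching an ambiguous contig (more than two linked neighbours).
    ambiguous = {c for c, nbrs in adjacency.items() if len(nbrs) > 2}
    for c in ambiguous:
        for nbr in adjacency[c]:
            adjacency[nbr].discard(c)
        adjacency[c] = set()

    visited = set()

    def walk(start):
        # Follow unvisited neighbours until the chain ends.
        path = [start]
        visited.add(start)
        cur = start
        while True:
            nxt = sorted(n for n in adjacency[cur] if n not in visited)
            if not nxt:
                return path
            cur = nxt[0]
            path.append(cur)
            visited.add(cur)

    scaffolds = []
    # Start from chain ends (0 or 1 links) so each chain is walked once.
    for c in contigs:
        if c not in visited and len(adjacency[c]) <= 1:
            scaffolds.append(walk(c))
    # Anything left lies on a cycle; break it at an arbitrary contig.
    for c in contigs:
        if c not in visited:
            scaffolds.append(walk(c))
    return scaffolds
```

For example, with contigs `c1` to `c4`, two pairs linking `c1`-`c2`, two linking `c2`-`c3` and a single pair linking `c3`-`c4`, the default threshold yields the scaffolds `[["c1", "c2", "c3"], ["c4"]]`: the weakly supported `c3`-`c4` link is not used. Discarding links around ambiguous contigs is a conservative simplification; a real tool would try to resolve such branches instead.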


Related manuscripts:


Additional information:

Accessions (data included in GigaDB):

SRA: SRR3031982
SRA: SRR3031978
SRA: SRR3031987


Keywords: AGOUTI, assembly, Genome, RNA-Seq, Exome

Dataset type: Software, Transcriptomic, Genomic


  • Funding body - National Science Foundation
  • Award ID - DEB-1249633
  • Comment - Matthew W Hahn

Samples:



Sample ID     Taxonomic ID  Common Name  Scientific Name
SAMN00678264  6239          roundworm    Caenorhabditis elegans
  Description: C. elegans, RNA extracted from early e...
  Life stage: early embryo
  Alternative accession-biosample: SAMN00678264
SAMN04335047  4081          tomato       Solanum lycopersicum
  Description: S. lycopersicum, RNA extracted from ro...
  Life stage: mature
  Alternative accession-biosample: SAMN04335047
SAMN04348653  4081          tomato       Solanum lycopersicum
  Description: S. lycopersicum, RNA extracted from r...
  Life stage: mature
  Alternative accession-biosample: SAMN04348653
SAMN04348661  142760        -            Solanum lycopersicoides
  Description: S. lycopersicoides, RNA extracted from...
  Life stage: mature
  Alternative accession-biosample: SAMN04348661
SAMN05150828  6239          roundworm    Caenorhabditis elegans
  Description: C. elegans N2_CB strain, DNA extracted from whole worms
  Life stage: Larval L4 phase
  Alternative accession-biosample: SAMN05150828
5 samples in total.

Files: (FTP site)



File Type  File Format  Size      Release Date
Text       TEXT         4.21 MB   2016-05-30
Text       TEXT         5.19 MB   2016-05-30
Other      TEXT         4.17 MB   2016-05-30
Text       TEXT         2.55 MB   2016-05-30
Text       TEXT         3.53 MB   2016-05-30
Other      TEXT         2.52 MB   2016-05-30
Text       TEXT         9.63 MB   2016-05-30
Text       TEXT         10.79 MB  2016-05-30
Other      TEXT         9.55 MB   2016-05-30
Assembly   FASTA        98.04 MB  2016-05-30
First 10 of 41 files shown.
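The file list includes an assembly in FASTA format, and the abstract frames AGOUTI's benefit partly in terms of assembly contiguity. A common, simple way to summarize contiguity is the N50: the length L such that sequences of length L or longer together cover at least half of the assembly. The sketch below computes sequence lengths and N50 from FASTA text using only the Python standard library. It is an illustration under that assumption, not necessarily the metric or tooling the authors used, and the function names are hypothetical.

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in FASTA-formatted text lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record (if any).
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly
```

For example, sequence lengths of 10, 8, 5, 3 and 2 bp give an N50 of 8, because 10 + 8 = 18 bp already covers at least half of the 28 bp total. To summarize a downloaded file, something like `with open("assembly.fasta") as fh: print(n50(fasta_lengths(fh)))` works, where `assembly.fasta` is a placeholder for the local file name. Comparing this value before and after scaffolding gives one rough view of a contiguity change.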


