Supporting data for "Improving the annotation of the Heterorhabditis bacteriophora genome"

Dataset type: Genomic
Data released on March 07, 2018

McLean F; Berger D; Laetsch DR; Schwartz HT; Blaxter M (2018): Supporting data for "Improving the annotation of the Heterorhabditis bacteriophora genome" GigaScience Database.


Genome assembly and annotation remain exacting tasks. As the tools available for these tasks improve, it is useful to return to data produced with earlier instances to assess their credibility and correctness. The entomopathogenic nematode Heterorhabditis bacteriophora is widely used to control insect pests in horticulture. The genome sequence for this species was reported to encode an unusually high proportion of unique proteins and a paucity of secreted proteins compared to other related nematodes. We revisited the H. bacteriophora genome assembly and gene predictions to ask whether these unusual characteristics were biological or methodological in origin. We mapped an independent resequencing dataset to the genome and used the blobtools pipeline to identify potential contaminants. While present (0.2% of the genome span, 0.4% of predicted proteins), assembly contamination was not significant. Re-prediction of the gene set using BRAKER1 and published transcriptome data generated a predicted proteome that was very different from the published one. The new gene set had a much reduced complement of unique proteins, better completeness values that were in line with other related species' genomes, and an increased number of proteins predicted to be secreted. It is thus likely that methodological issues drove the apparent uniqueness of the initial H. bacteriophora genome annotation and that similar contamination and misannotation issues affect other published genome assemblies.

Additional details

Read the peer-reviewed publication(s):

McLean, F., Berger, D., Laetsch, D. R., Schwartz, H. T., & Blaxter, M. (2018). Improving the annotation of the Heterorhabditis bacteriophora genome. GigaScience, 7(4). doi:10.1093/gigascience/giy034


Sample ID: G2a1223
Taxonomic ID: 37862
Scientific Name: Heterorhabditis bacteriophora
Sample Attributes:
  Strain: G2a1223
  Description: Heterorhabditis bacteriophora G2a1223...
  Collected by: Adler DILLMAN

File Name  Sample ID  Data Type   File Format  Size       Release Date
                      annotation  UNKNOWN      5.54 MB    2018-03-05
                      annotation  GFF          37.52 MB   2018-03-05
                      annotation  UNKNOWN      38.33 MB   2018-03-05
                      annotation  UNKNOWN      5.16 MB    2018-03-05
                      annotation  GFF          35.6 MB    2018-03-05
                      annotation  UNKNOWN      36.47 MB   2018-03-05
                      Text        TEXT         31.45 KB   2018-03-05
                      Text        TEXT         0.59 KB    2018-03-05
                      Text        TEXT         0.69 KB    2018-03-05
                      Annotation  TEXT         512.65 KB  2018-03-12
Displaying 1-10 of 28 File(s).
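
The annotation files listed above include GFF-format gene models. As a minimal sketch of how such a file might be summarised after download, the Python below counts the feature types (column 3) in tab-delimited GFF records. The function name count_feature_types and the example records are illustrative assumptions and are not taken from this dataset; the feature type names in the real files may differ.

```python
import io
from collections import Counter


def count_feature_types(lines):
    """Count the values in column 3 (feature type) of GFF-style records."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment or directive lines (they start with '#').
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A GFF feature line has exactly nine tab-separated columns.
        if len(fields) != 9:
            continue
        counts[fields[2]] += 1
    return counts


# Made-up records for illustration only; not taken from the dataset.
example = io.StringIO(
    "##gff-version 3\n"
    "contig1\tsketch\tgene\t1\t900\t.\t+\t.\tID=g1\n"
    "contig1\tsketch\tmRNA\t1\t900\t.\t+\t.\tID=g1.t1;Parent=g1\n"
    "contig1\tsketch\tCDS\t1\t300\t.\t+\t0\tParent=g1.t1\n"
    "contig1\tsketch\tCDS\t500\t900\t.\t+\t0\tParent=g1.t1\n"
    "contig2\tsketch\tgene\t50\t400\t.\t-\t.\tID=g2\n"
)

counts = count_feature_types(example)
for feature_type, n in sorted(counts.items()):
    print(feature_type, n)
```

To apply the same function to a downloaded file, pass an open file handle in place of the in-memory example, for instance inside a `with open(path) as handle:` block, where path is whatever local location the file was saved to.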
Funding body: Wellcome Trust
Awardee: Dr Florence McLean
Award ID: 204052/Z/16/Z
Date Action
March 7, 2018 Dataset publish
March 7, 2018 Additional file readme.txt added
March 7, 2018 File README.txt removed
March 7, 2018 File removed : README.txt
March 7, 2018 File readme.txt removed
March 7, 2018 File removed : readme.txt
March 7, 2018 Additional file readme.txt added
March 8, 2018 External Link added :
March 16, 2018 Link added : PROJECT:PRJNA438576
July 3, 2018 Manuscript Link added : 10.1093/gigascience/giy034
November 29, 2018 File updated