Supporting data for "GeneSeqToFamily: a Galaxy workflow to find gene families based on the Ensembl Compara GeneTrees pipeline"

Dataset type: Genomic, Workflow, Software
Data released on January 19, 2018

Thanki AS; Soranzo N; Haerty W; Davey RP (2018): Supporting data for "GeneSeqToFamily: a Galaxy workflow to find gene families based on the Ensembl Compara GeneTrees pipeline" GigaScience Database.


Gene duplication is a major factor contributing to evolutionary novelty, and the contraction or expansion of gene families has often been associated with morphological, physiological and environmental adaptations. The study of homologous genes helps us to understand the evolution of gene families. It plays a vital role in finding ancestral gene duplication events as well as identifying genes that have diverged from a common ancestor under positive selection. Various tools, such as MSOAR, OrthoMCL and HomoloGene, are available to identify gene families and visualise syntenic information between species, providing an overview of the evolution of syntenic regions at the family level. Unfortunately, none of them provides information about structural changes within genes, such as the conservation of ancestral exon boundaries amongst multiple genomes.

The Ensembl Compara GeneTrees computational pipeline generates gene trees based on coding sequences and provides details about exon conservation; it is used in the Ensembl Compara project to discover gene families. However, a certain amount of expertise is required to configure and run this pipeline via the command line. We have therefore converted the command-line Ensembl Compara GeneTrees pipeline into a Galaxy workflow, called GeneSeqToFamily, and provided additional functionality. The workflow uses existing tools from the Galaxy ToolShed, together with additional wrappers and tools that are required to run it. GeneSeqToFamily represents the Ensembl Compara pipeline as a set of interconnected Galaxy tools. These tools can be run interactively within Galaxy's user-friendly workflow environment, while still providing the flexibility to tailor the analysis by changing configurations and tools if necessary.
Additional tools allow users to visualise the gene families produced by the workflow using the Aequatus.js interactive tool, which was developed as part of the Aequatus software project.

Additional details

Read the peer-reviewed publication(s):

Thanki, A. S., Soranzo, N., Haerty, W., & Davey, R. P. (2018). GeneSeqToFamily: a Galaxy workflow to find gene families based on the Ensembl Compara GeneTrees pipeline. GigaScience, 7(3). doi:10.1093/gigascience/giy005

Additional information:

Data Type        File Format   Size        Release Date
GitHub archive   archive       3.22 MB     2018-01-16
Software         UNKNOWN       7.6 GB      2018-01-16
mixed archive    archive       232.38 KB   2018-01-16
Readme           TEXT          2.82 KB     2018-01-16
Funding:
Biotechnology and Biological Sciences Research Council. Awardee: N Soranzo. Comments: BBSRC Biomathematics and Bioinformatics Training fund (2014).
Biotechnology and Biological Sciences Research Council. Awardees: A Thanki, W Haerty, R P Davey. Award ID: BBS/E/T/000PR9814. Comments: BBSRC strategic funds.
Dataset history:
January 19, 2018: Dataset published
March 30, 2018: Manuscript link added (doi:10.1093/gigascience/giy005)