Supporting data for "PiGx: Reproducible genomics analysis pipelines with GNU Guix"

Dataset type: Software
Data released on September 03, 2018

Wurmus R; Uyar B; Osberg B; Franke V; Gosdschan A; Wreczycka K; Ronen J; Akalin A (2018): Supporting data for "PiGx: Reproducible genomics analysis pipelines with GNU Guix". GigaScience Database.


In bioinformatics, as well as other computationally intensive research fields, there is a need for workflows that can reliably produce consistent output, from known sources, independent of the software environment or configuration settings of the machine on which they are executed. Indeed, this is essential for controlled comparison between different observations or for the wider dissemination of workflows. Providing this type of reproducibility and traceability, however, is often complicated by the need to accommodate the myriad dependencies included in a larger body of software, each of which generally comes in various versions. Moreover, in many fields (bioinformatics being a prime example), these versions are subject to continual change due to rapidly evolving technologies, further complicating problems related to reproducibility. Here, we propose a principled approach for building analysis pipelines and managing their dependencies. As a case study to demonstrate the utility of our approach, we present a set of highly reproducible pipelines for the analysis of RNA-seq, ChIP-seq, Bisulfite-seq, and single-cell RNA-seq. All pipelines process raw experimental data and generate reports containing publication-ready plots and figures, with interactive report elements and standard observables. Users may install these highly reproducible packages and apply them to their own datasets without any special computational expertise beyond the use of the command line. We hope such a toolkit will provide immediate benefit to laboratory workers wishing to process their own datasets or bioinformaticians seeking to automate all, or parts of, their analyses. In the long term, we hope our approach to reproducibility will serve as a blueprint for reproducible workflows in other areas. Our pipelines, along with their corresponding documentation and sample reports, are available at

Additional details

Read the peer-reviewed publication(s):

Wurmus, R., Uyar, B., Osberg, B., Franke, V., Gosdschan, A., Wreczycka, K., … Akalin, A. (2018). PiGx: reproducible genomics analysis pipelines with GNU Guix. GigaScience, 7(12). doi:10.1093/gigascience/giy123

Additional information:

File Name  Sample ID  Data Type       File Format  Size       Release Date
-          -          mixed archive   TAR          1.76 KB    2018-08-24
-          -          mixed archive   TAR          4.85 KB    2018-08-24
-          -          HTML            HTML         8.25 MB    2018-08-24
-          -          HTML            HTML         1.67 MB    2018-08-24
-          -          HTML            HTML         12.88 MB   2018-08-24
-          -          GitHub archive  archive      4.83 MB    2018-08-24
-          -          GitHub archive  archive      27.48 MB   2018-08-24
-          -          GitHub archive  archive      435.46 KB  2018-08-24
-          -          GitHub archive  archive      53.42 MB   2018-08-24
-          -          GitHub archive  archive      7.65 MB    2018-08-24
Displaying 1-10 of 24 File(s).
Funding body                                              Awardee      Award ID   Comments
German Federal Ministry of Education and Research (BMBF)  B Uyar       031 A538C  RBC
Berlin Institute for Health                               K Wreczycka
Date               Action
September 3, 2018  Dataset published
November 3, 2018   Manuscript link added: 10.1093/gigascience/giy123