GigaDB Dataset - DOI 10.5524/100304 - Supporting data for "Calculating quality of public high-throughput sequencing data to obtain suitabl ...


Supporting data for "Calculating quality of public high-throughput sequencing data to obtain suitable subset for reanalysis from the Sequence Read Archive"

Dataset type: Metadata
Data released on April 19, 2017

Ohta T; Nakazato T; Bono H (2017): Supporting data for "Calculating quality of public high-throughput sequencing data to obtain suitable subset for reanalysis from the Sequence Read Archive" GigaScience Database. http://dx.doi.org/10.5524/100304

DOI: 10.5524/100304

It is important for public data repositories to promote the reuse of archived data. In the growing field of omics science, however, the increasing number of submissions of high-throughput sequencing (HTSeq) data to public repositories prevents users from choosing a suitable data set from among the large number of search results. Repository users need to be able to set a threshold to reduce the number of results to obtain a suitable subset of high-quality data for reanalysis.
We calculated the quality of sequencing data archived in a public data repository, the Sequence Read Archive (SRA), by using the quality control software FastQC. We obtained quality values for 1,171,313 experiments, which can be used to evaluate the suitability of data for reuse. We also visualized the data distribution in SRA by integrating the quality information and metadata of experiments and samples.
We provide quality information for all of the archived sequencing data, which enables users to obtain sequencing data of sufficient quality for reanalysis. The calculated quality data are available to the public in various formats. Our data also provide an example of how a third party can enhance the reuse of public data by adding metadata to published research data.
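The thresholding idea described above can be sketched in Python as follows. This is a minimal illustration, not the authors' method: the column names `experiment_id` and `mean_quality`, the sample values, and the threshold of 30.0 are all hypothetical, since this page does not list the columns of the released TSV file.

```python
import csv
import io


def filter_by_quality(tsv_file, column, threshold):
    """Yield rows whose numeric value in `column` is at least `threshold`."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    for row in reader:
        try:
            value = float(row[column])
        except (TypeError, ValueError):
            # Skip rows where the value is missing or not a number.
            continue
        if value >= threshold:
            yield row


# Made-up sample data; the column names are assumptions for illustration only.
sample = io.StringIO(
    "experiment_id\tmean_quality\n"
    "EXP_A\t35.2\n"
    "EXP_B\t18.7\n"
    "EXP_C\tNA\n"
)

selected = list(filter_by_quality(sample, "mean_quality", 30.0))
print([row["experiment_id"] for row in selected])  # prints ['EXP_A']
```

To try this on the released data, the same function could be given a file opened with `open(path, newline="")`, using whichever quality column the dataset's readme.txt documents in place of the hypothetical `mean_quality`.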

Additional details

Read the peer-reviewed publication(s):

(PubMed: 28449062)

Additional information:

https://github.com/inutano/sra-quanto

http://integbio.jp/rdf/?view=detail&id=quanto

Files
History

File Name	Data Type	File Format	Size	Release Date
quanto.data.20161021.tsv	Tabular data	TSV	464.48 MB	2017-04-05
quanto.tar.gz	Mixed archive	TAR	114.59 MB	2017-04-05
readme.txt	Readme	TEXT	2.26 KB	2017-04-05
sra-quanto-master.zip	GitHub archive	archive	28.13 KB	2017-04-05


Date	Action
April 19, 2017	Dataset publish
October 2, 2017	Manuscript Link added : 10.1093/gigascience/gix029
November 9, 2022	Manuscript Link updated : 10.1093/gigascience/gix029
