
Contents

  1. What is GigaDB?
  2. What journals are integrated with GigaDB?
  3. Why use GigaDB?
  4. What kinds of data does GigaDB accept?
  5. My research is on human subjects. Can I archive my data in GigaDB?
  6. In what file format(s) should I submit my data?
  7. When should I submit my data?
  8. How can I modify files I have submitted to GigaDB while my article is in review?
  9. What should I prepare before submission?
  10. How can I make my data submission as accessible and reusable as possible?
  11. How do I submit data?
  12. How do I write a ReadMe file?
  13. How do I cite the data in my manuscript?
  14. Are there any problems with publishing my final research paper AFTER publishing the data in GigaScience?
  15. Do I have the option to embargo release of my data?
  16. How much does it cost?
  17. Do I have to pay to download or use the data?
  18. How do I download a large dataset with my slow internet connection?
  19. How do I cite data from GigaDB?
  20. How do I download information to my citation management software?
  21. What is a dataset?
  22. Does my journal work with GigaDB and how?
  23. What is a GigaDB DOI?
  24. Why does GigaDB use Creative Commons Zero?
  25. Can the GigaDB repository help me prepare a data management plan?
  26. What are the charges for submitting data?
  27. Why is submission to GigaDB not closely integrated with submission to GigaScience?
  28. How are datasets in GigaDB backed up?
  29. What happens to data after it is submitted?
  30. Can I see how often my dataset is being used and downloaded?
  31. How may data from GigaDB be reused?
  32. What is the side-bar on the right of all dataset pages?
  33. What is Hypothes.is?
  34. How do I report missing values in my metadata?


What is GigaDB?

GigaDB is the home for all data, files, tools and software associated with GigaScience manuscripts. GigaDB curators ensure that the information is complete and appropriately formatted before cataloging and publishing it. Submission of data to GigaDB complements, but does not replace, community-approved public repositories: supporting data and source code should still be made publicly available in a suitable public repository. GigaDB can link all publicly deposited data together with additional files and tools that do not have a natural home in any other public repository.

What journals are integrated with GigaDB?

At present, only GigaScience.

Why use GigaDB?

GigaScience is committed to enabling reproducible science. To do this, readers need to be able to easily find and get hold of all the underlying data, methods, workflows, software and anything else that was used in the research. In the past, authors of research articles have made (justified) claims that there was no way of making all their data available; GigaDB has now filled that gap. GigaDB complements, but does not replace, community-approved public repositories, and can link all publicly deposited data together with additional files and tools that do not have a natural home in any other public repository. All data and files required for reproducibility of GigaScience manuscripts should be either hosted in, or linked to from, GigaDB.

What kinds of data does GigaDB accept?

Anything related to a GigaScience article that does NOT already have a relevant public repository (e.g. sequence data should still be deposited in the INSDC archives and/or the SRA).

My research is on human subjects. Can I archive my data in GigaDB?

Yes, as long as the data is fully consented and legally and ethically approved for public release. We encourage complete disclosure, including, where possible, a blank copy of the consent form that was signed.

In what file format(s) should I submit my data?

There are many specific formats depending on the data types, but the rule of thumb is to use non-proprietary formats and, where possible, follow the standard for the relevant field. Our curators will be on hand to help with any specific questions on this matter.

When should I submit my data?

At the same time or soon after submission of the manuscript. GigaScience reviewers will be looking to see if the underlying data is available and appropriate, so they will need access to it. This can be via your own private servers if you prefer, but we offer a secure staging area to host data under review/pre-release.

How can I modify files I have submitted to GigaDB while my article is in review?

While the dataset has "pre-release" status, any file can be replaced by over-writing the original file with the newly modified one. After the data has been published, over-writing is no longer permitted. New files can still be added, and all published files will remain available (unless there is a very good reason to remove them). Versioning is still possible for updates and major changes to the files if they need to be changed post-publication.

What should I prepare before submission?

Organise your data into logical directories/folders, and name files consistently without using spaces or special characters. Re-read your methods section to check that everything mentioned there is available, either by links to public repositories or as files you have organised to submit to GigaDB.
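The naming advice above can be checked mechanically before upload. The following is a minimal sketch, not a GigaDB tool: the allowed character set (letters, digits, dot, underscore, hyphen) and the function names are assumptions chosen for illustration, since the FAQ does not define exactly which characters count as "special".

```python
import re
from pathlib import Path

# Assumed "safe" character set: ASCII letters, digits, dot, underscore, hyphen.
# This pattern is an illustration, not an official GigaDB rule.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def find_problem_names(names):
    """Return the names that contain spaces or other characters outside the safe set."""
    return [name for name in names if not SAFE_NAME.fullmatch(name)]


def scan_directory(root):
    """Walk a directory tree and list every file or folder whose name fails the check."""
    paths = sorted(Path(root).rglob("*"))
    return [str(path) for path in paths if not SAFE_NAME.fullmatch(path.name)]


if __name__ == "__main__":
    # Hypothetical file names used only to demonstrate the check.
    print(find_problem_names(["reads_1.fastq.gz", "final results.xlsx", "table#2.csv"]))
```

Running `scan_directory` on the top-level folder you intend to submit gives a list of paths to rename before contacting the curators.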

How can I make my data submission as accessible and reusable as possible?

  • By ensuring all files are in non-proprietary formats that anyone can use without the need for expensive software.
  • By making sure data tables are in tables, not PDF.
  • By using a CC0 waiver or other suitable public domain licenses for datasets.
  • By using OSI (Open Source Initiative) licenses for software, and linking to versions in code repositories for updates and forking.
  • By including as much metadata about the samples, specimens, files, methods, etc. as possible.

How do I submit data?

All data submissions should be approved before being started. Please contact editorial@gigasciencejournal.com to discuss your article and associated data with our editors. Once approved, there are two possible routes to provide the metadata about your data:
  1. Use the online submission wizard. This is a good option for datasets with few authors and few files. The wizard currently does not have functionality to upload tabular information, so everything must be typed in individually.
  2. Use the template spreadsheet (Excel, but compatible with OpenOffice too) downloadable from here. This option is better where there are multiple authors and/or multiple files and/or samples. NB: the spreadsheet contains macros, but these only make the forward and back buttons work, so they can be disabled; you can just click the relevant tabs at the bottom of the spreadsheet.
For more details on submitting using the Spreadsheet please see here.

How do I write a ReadMe file?

The readme file is an important part of any dataset and our curators will be able and willing to assist with this if required. We intend to formalise the readme format at some point in the near future, but for now here is an example of the format we try to work to:

filename = readme.txt
format = ASCII plain text (not RTF, not .doc !)

<Dataset title>
==========
<Author list>:<year>, GigaScience database, <DOI>
summary:
---------
[optionally you may include a summary text about the dataset or directory structure used here]
Associated data:
--------------
[list any URL links or DOIs to other public repository data]

Directories:
----------
[list any directories of related files with a description to help users understand why these files are grouped into a directory]
<directory_name> - <description of the group of files in the directory>

Files:
-----
[list the files available in this dataset with a brief description for each]
<filename> - <description>
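For datasets with many files, the example layout above can be filled in programmatically. The sketch below is one possible way to do so and is not an official GigaDB tool: the function names, the "; " author separator, and the (name, description) pair structure are assumptions. Writing with ASCII encoding enforces the plain-text requirement by raising an error on any non-ASCII character.

```python
def build_readme(title, authors, year, doi, summary="",
                 associated=(), directories=(), files=()):
    """Assemble readme text following the example layout.

    `directories` and `files` are sequences of (name, description) pairs;
    `associated` is a sequence of URLs or DOIs for data held elsewhere.
    """
    author_list = "; ".join(authors)  # separator is an assumption
    lines = [
        title,
        "==========",
        f"{author_list}:{year}, GigaScience database, {doi}",
        "summary:",
        "---------",
    ]
    if summary:
        lines.append(summary)
    lines += ["Associated data:", "--------------"]
    lines += list(associated)
    lines += ["", "Directories:", "----------"]
    lines += [f"{name} - {description}" for name, description in directories]
    lines += ["", "Files:", "-----"]
    lines += [f"{name} - {description}" for name, description in files]
    return "\n".join(lines) + "\n"


def write_readme(path, text):
    """Write the readme as ASCII plain text; non-ASCII characters raise UnicodeEncodeError."""
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        handle.write(text)


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    print(build_readme(
        title="Example dataset title",
        authors=["First Author", "Second Author"],
        year=2015,
        doi="<DOI>",
        directories=[("raw_data", "unprocessed instrument output")],
        files=[("readme.txt", "this file")],
    ))
```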

How do I cite the data in my manuscript?

GigaScience supports and has signed the FORCE 11 Joint Declaration of the Data Citation Principles, feeling strongly that data citations should be accorded the same importance in the scholarly record as citations of publications. GigaDB datasets can and should be cited in the same manner as any other reference, although the exact format depends on each journal's instructions. Following DCC and DataCite guidelines, in the GigaScience journal the citation within the references section will be of the form:
Author List; publication year: "Dataset title", GigaScience Database. DOI.
Example:
Peter E Larsen; Yang Dai; (2015): Supporting materials for "Metabolome of Human gut microbiome is predictive of host dysbiosis".; GigaScience Database. http://dx.doi.org/10.5524/100163
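If you need to produce many such reference entries, the pattern can be assembled from its parts. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the punctuation mirrors the single Larsen and Dai example given in this answer rather than any formal specification. Always check the result against the target journal's instructions.

```python
def format_gigadb_citation(authors, year, title, doi):
    """Build a reference string shaped like the example citation.

    Assumption: authors are separated by "; ", the year is parenthesised,
    and the title is passed exactly as it should appear, including any
    trailing full stop.
    """
    author_list = "; ".join(authors)
    return f"{author_list}; ({year}): {title}; GigaScience Database. {doi}"


if __name__ == "__main__":
    print(format_gigadb_citation(
        authors=["Peter E Larsen", "Yang Dai"],
        year=2015,
        title='Supporting materials for "Metabolome of Human gut microbiome '
              'is predictive of host dysbiosis".',
        doi="http://dx.doi.org/10.5524/100163",
    ))
```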

Are there any problems with publishing my final research paper AFTER publishing the data in GigaScience?

There shouldn't be, and a major rationale for data publishing is to incentivise earlier release of data in this manner. It is commonly understood throughout the publishing community that publishing data (as a Data Note or in a public archive) is a good thing to be encouraged, and as such, there are no penalties for subsequently publishing research based on those data. GigaScience has published plenty of Data Notes and released datasets before the analysis papers were published. Some examples are:
  • 3,000 Rice Genomes Project (13.4 Tb data).
  • Polar Bear genome - dataset released in GigaDB nearly 3 years before the analysis paper was published in Cell.
  • Deadly 2011 outbreak E. coli genome that led to over 50 deaths in Germany (and eventually published in NEJM).

Our Polar bear genome data was released nearly three years before any official publication came out from the project, and despite being used by at least 5 other studies, the analysis paper made the cover of Cell (see the blog for more).

Journals do not consider the publication of a dataset with a DOI and associated protocol information as a 'prior publication' that would preclude subsequent publication of new results obtained from such a dataset. F1000 Research did a useful survey to confirm this with a number of publishers (see: F1000 policy), and this is only going to become increasingly observed and accepted as most of the publishers are now promoting their own Data Journals.



Do I have the option to embargo release of my data?

Release as early as possible is encouraged. Our standard protocol, however, is to keep data private to the peer reviewers until the associated manuscript has been formally accepted, at which point the dataset is released. This is usually several days before the manuscript is published, due to production times of the BMC publishing system. While we cannot foresee any reason why datasets should be embargoed for extended periods, we can discuss this further on a case-by-case basis. If you have major concerns about someone else publishing on your data before you, we can add a Fair Use policy statement to the GigaDB dataset page, which looks like this:

These data are made available pre-publication under the recommendations of the Fort Lauderdale/Toronto meetings. Please respect the rights of the data producers to publish their whole dataset analysis first. The data is being made available so that the research community can make use of them for more focused studies without having to wait for publication of the whole dataset analysis paper. If you wish to perform analyses on this complete dataset, please contact the authors directly so that you can work in collaboration rather than in competition.

This dataset fair use agreement is in place until <author can specify a date up to 12 months away>



How much does it cost?

There are currently no separate data publishing charges (DPCs) for GigaDB, as we currently do not accept data that is not accompanied by a GigaScience manuscript. All DPCs for GigaScience manuscripts are covered by the article publishing charges (APCs) of that manuscript (up to a terabyte is automatically included, but contact us if you need more). For APCs of GigaScience manuscripts please see here.

Do I have to pay to download or use the data?

No. All data provided by GigaDB is free to download and use. On occasion, when datasets are very large and internet connections are slow, some users may request data to be sent by hard disk. GigaDB cannot bear the cost of this: we will assist in copying the data onto the disks and help arrange shipment, but the user will be required to cover the cost of the disks and shipment.

How do I download a large dataset with my slow internet connection?

There are two ways to download data from GigaDB:
  1. FTP. This is the "normal" method: click the download button on any dataset page and this is how your data will be sent.
  2. Hard drive shipment. On occasion, when datasets are very large and internet connections are slow, some users may request data to be sent by hard disk. GigaDB cannot bear the cost of this: we will assist in copying the data onto the disks and help arrange shipment, but the user will be required to cover the cost of the disks and shipment.
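On a slow or unreliable connection, an interrupted FTP transfer does not have to start again from zero. The sketch below uses Python's standard `ftplib` to resume a partial download from the number of bytes already on disk. It is an illustration only: the function name is hypothetical, the host and path in the usage comment are placeholders, and it assumes the FTP server supports the REST command, which this FAQ does not confirm.

```python
import os
from ftplib import FTP


def resume_download(ftp, remote_path, local_path, blocksize=64 * 1024):
    """Download remote_path over an open FTP connection, appending to any partial local file.

    Returns the size of the local file after the transfer.
    """
    # Bytes already on disk from an earlier, interrupted attempt.
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    with open(local_path, "ab") as handle:
        # rest=offset sends an FTP REST command so the server skips those bytes.
        ftp.retrbinary(
            f"RETR {remote_path}",
            handle.write,
            blocksize=blocksize,
            rest=offset or None,
        )
    return os.path.getsize(local_path)


# Hypothetical usage; replace the host and path with the values behind the
# dataset's download button.
# with FTP("ftp.example.org") as ftp:
#     ftp.login()  # anonymous login
#     resume_download(ftp, "path/to/file.tar.gz", "file.tar.gz")
```

Re-running the same call after a dropped connection continues where the previous attempt stopped. If the local file may be corrupt rather than merely incomplete, delete it and start again.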


How do I cite data from GigaDB?

GigaScience supports and has signed the FORCE 11 Joint Declaration of the Data Citation Principles, feeling strongly that data citations should be accorded the same importance in the scholarly record as citations of publications. GigaDB datasets can and should be cited in the same manner as any other reference, although the exact format depends on each journal's instructions. Following DCC and DataCite guidelines, in the GigaScience journal the citation within the references section will be of the form:
Author List; publication year: "Dataset title", GigaScience Database. DOI.
Example:
Peter E Larsen; Yang Dai; (2015): Supporting materials for "Metabolome of Human gut microbiome is predictive of host dysbiosis".; GigaScience Database. http://dx.doi.org/10.5524/100163

How do I download information to my citation management software?

On each dataset page there are three buttons after the authors' names: "RIS", "BIBTEX" and "TEXT". You may use these to download the citation of the dataset in those formats.

What is a dataset?

The term dataset in GigaDB refers to a collection of related works, including but not limited to files, software, workflows, experiments, data, metadata and results. Each dataset has its own webpage, which has a DOI (digital object identifier). These datasets are permanent and citable records of research output, designed to modernize the classical publishing framework while maintaining the familiarity of citations and their metrics. While uncommon, it is possible for a dataset to be made up of several other datasets in a nested fashion. For example, the Avian phylogenomics project data dataset (http://dx.doi.org/10.5524/101000) is a compilation of 48 other datasets, some of which were published earlier and some at the same time. This allows the original authors to cite just one dataset to cover them all, but also allows future users to cite individual datasets if they require. We will discuss the merits of such procedures on a case-by-case basis with the submitter.

Does my journal work with GigaDB and how?

While we have no formal agreements with any particular journals, we are happy to work with other journals to ensure timely and coherent joint publications. Please discuss this with the editors (editorial@gigasciencejournal.com).

What is a GigaDB DOI?

A Digital Object Identifier (DOI) is a stable, citable link to an electronic resource. A GigaDB DOI is a stable and citable link to a dataset hosted by GigaDB.

Why does GigaDB use Creative Commons Zero?

It is widely recognized that publicly funded research data should be made publicly available for free to be used by anyone. The Creative Commons Zero (CC0) waiver provides the explicit statement of that fact, and it is transparent to all that the data hosted by GigaDB are all freely available for any use case. CC0 is thought to be the most appropriate method for dedicating data to the public domain, but for more on the rationale and practicalities see this BMC Research Notes editorial. Citation of data usage is greatly encouraged in order to provide the recognition to the data producers, both for their efforts in the production and in their foresight and generosity in making the data CC0.

Can the GigaDB repository help me prepare a data management plan?

At present, GigaDB doesn't have the resources to assist with data management plans, but there are many useful resources available on the internet, including places like the DCC (UK focus) or the CIC (US focus).

What are the charges for submitting data?

There are currently no separate data publishing charges (DPCs) for GigaDB, as we currently do not accept data that is not accompanied by a GigaScience manuscript. All DPCs for GigaScience manuscripts are covered by the article publishing charges (APCs) of that manuscript (up to a terabyte is automatically included, but contact us if you need more). For APCs of GigaScience manuscripts please see here.

Why is submission to GigaDB not closely integrated with submission to GigaScience?

Due to various differences in the BMC's editorial tools and the GigaDB system, unfortunately at this time it is not possible to integrate the submission process, but our editors and curators will do everything they can to make the process as smooth as possible for authors.

How are datasets in GigaDB backed up?

We have a regular backup of data, so if you find a corrupt file please let us know and we will replace it with a copy from back-up.

What happens to data after it is submitted?

We will host your data on our private GigaDB server and give the reviewers access to it. If the manuscript is accepted, we will move the data to our public production server. If it is not accepted, the data will be deleted.

Can I see how often my dataset is being used and downloaded?

Partly. If a user clicks the download button on the website, the download is recorded in the database and you can see on the dataset page how many times this has happened. However, if a file is pulled directly from the FTP server, it is currently not recorded in the database. This functionality is on our to-do list and will be addressed as soon as we can.

How may data from GigaDB be reused?

It can be used for anything by anyone. Most* data is given the CC0 licence specifically to remove any restrictions on reuse.
* On occasion we host, for the convenience of our users, some files that are already covered by other licences (e.g. more appropriate OSI-compliant licences for software, or multiple (all open) licences in a workflow or virtual machine). Where this happens, we make every effort to make users aware of the different licences.

What is the side-bar on the right of all dataset pages?

This is the Hypothes.is annotation tool bar. See "What is Hypothes.is?" for more details.



What is Hypothes.is?

http://hypothes.is

Hypothes.is is an open source project helping to bring a discussion, annotation and curation layer to the web. We are collaborating with Hypothes.is to make all our datasets open to discussion by anyone (with a Hypothes.is user account). Simply highlight the text of interest on the page and click the "New Note" icon that appears. To see previous notes, click the number on the side bar, or open the side bar to see all previous public annotations.

How do I report missing values in my metadata?

There are various reasons why certain data values may need to be omitted from the sample metadata while you still want the metadata to comply with particular Minimum Information standards such as the GSC MIxS. To maintain compliance when there are missing values within the mandatory fields, please use the following terms only:
Term - Definition
not applicable - information is inappropriate to report; can indicate that the standard itself fails to model or represent the information appropriately
restricted access - information exists but cannot be released openly because of privacy concerns
not provided - information is not available at the time of submission; a value may be provided at a later stage
not collected - information was not collected and will therefore never be available
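Before submitting sample metadata, it can help to scan the mandatory fields for blanks or ad hoc placeholders and replace them with one of the four terms above. The following is a minimal sketch, not a GigaDB or GSC validator: the function names, the list of ad hoc placeholders, and the field names in the usage example are assumptions chosen for illustration.

```python
# The four permitted terms, taken from the list above.
PERMITTED_TERMS = {
    "not applicable",
    "restricted access",
    "not provided",
    "not collected",
}

# Assumed examples of ad hoc placeholders people often type instead;
# this list is illustrative and not part of any standard.
AD_HOC_PLACEHOLDERS = {"", "na", "n/a", "none", "null", "unknown", "missing", "-", "?"}


def classify(value):
    """Return 'permitted-missing', 'ad-hoc-missing', or 'value' for one metadata entry."""
    text = "" if value is None else str(value).strip()
    if text in PERMITTED_TERMS:
        return "permitted-missing"
    if text.lower() in AD_HOC_PLACEHOLDERS:
        return "ad-hoc-missing"
    return "value"


def find_noncompliant_fields(sample, mandatory_fields):
    """List the mandatory fields that are absent, blank, or use an ad hoc placeholder."""
    return [
        field
        for field in mandatory_fields
        if classify(sample.get(field)) == "ad-hoc-missing"
    ]


if __name__ == "__main__":
    # Hypothetical sample record; field names are for illustration only.
    sample = {"collection_date": "not collected", "lat_lon": "NA", "geo_loc_name": "China"}
    mandatory = ["collection_date", "lat_lon", "geo_loc_name", "env_medium"]
    print(find_noncompliant_fields(sample, mandatory))
```

Each field the function reports should be given either a real value or whichever of the four terms best describes why the value is missing.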