<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:wfw="http://wellformedweb.org/CommentAPI/">
<channel>
<title>GigaDB</title>
<link>http://www.gigadb.org</link>
<description><![CDATA[GigaDB RSS Feed
]]></description>
<language>en-us</language>
<pubDate>Tue, 26 Jun 2018 08:46:51 +0000</pubDate>
<item>
<title>
Supporting data for &quot;Binning Enables Efficient Host Genome Reconstruction in Cnidarian Holobionts&quot;</title>
<link>
http://gigadb.org/dataset/100462</link>
<pubDate>
Mon, 25 Jun 2018 00:00:00 +0000</pubDate>
<description><![CDATA[
Many cnidarians, including stony corals, engage in complex symbiotic associations, comprising the eukaryotic host, photosynthetic algae, and highly diverse microbial communities – together referred to as the holobiont. This taxonomic complexity makes sequencing and assembling coral host genomes extremely challenging. Therefore, previous cnidarian genome projects were based on symbiont-free tissue samples. However, this approach may not be applicable to the majority of cnidarian species for ecological reasons. We therefore evaluated the performance of an alternative method based on sequence binning for reconstructing the genome of the stony coral <i>Porites rus</i> from a hologenomic sample, and compared it to traditional approaches. <br> Our results demonstrate that binning performs well for hologenomic data, producing sufficient reads for assembling the draft genome of <i>P. rus</i>. An assembly evaluation based on operational criteria showed results comparable to those of symbiont-free approaches in terms of completeness and usefulness, despite a high degree of fragmentation in our assembly. In addition, we found that binning provides sufficient data for exploratory k-mer estimation of genomic features, such as genome size and heterozygosity. <br> Binning constitutes a powerful approach for disentangling taxonomically complex coral hologenomes. Considering the recent decline of coral reefs on the one hand and previous limitations to coral genome sequencing on the other, binning may facilitate rapid and reliable genome assembly. This study also represents an important milestone in advancing binning from the metagenomic to the hologenomic level and from the prokaryotic to the eukaryotic level.
]]></description>
</item>
<item>
<title>
Supporting data for &quot;SVEngine: an efficient and versatile simulator of genome structural variations with features of cancer clonal evolution&quot;</title>
<link>
http://gigadb.org/dataset/100473</link>
<pubDate>
Thu, 21 Jun 2018 00:00:00 +0000</pubDate>
<description><![CDATA[
Simulated genome sequence data with known features can facilitate the development and benchmarking of structural variant analysis programs. However, only a limited number of data simulators provide structural variants in silico. Moreover, there is a paucity of programs that generate structural variants with different allelic fractions and haplotypes. We developed SVEngine, an open source tool, to address this need. SVEngine simulates next generation sequencing data with embedded structural variations. As input, SVEngine takes template haploid sequences (FASTA) and an external variant file, a variant distribution file and/or a clonal phylogeny tree file (NEWICK). It then simulates and outputs sequence contigs (FASTAs), sequence reads (FASTQs) and/or post-alignment files (BAMs). All of these files contain the desired variants, along with BED files containing the ground truth. SVEngine's flexible design enables one to specify the size, position and allelic fraction of deletion, insertion, duplication, inversion and translocation variants. Finally, SVEngine simulates sequence data that replicates the characteristics of a sequencing library with mixed sizes of DNA insert molecules. SVEngine is highly parallelized to reduce simulation time. We demonstrated the versatile features of SVEngine and its improved runtime compared with other available simulators. SVEngine's features include the simulation of locus-specific variant frequency designed to mimic the phylogeny of cancer clonal evolution. We validated the accuracy of the simulations. Our evaluation included checking various read mapping features such as coverage change, read clipping, insert size shift and neighbouring hanging read pairs for representative variant types. SVEngine is implemented as a standard Python package and is freely available for academic use at: https://bitbucket.org/charade/svengine.
]]></description>
</item>
<item>
<title>
Supporting data for &quot;Hot-starting software containers for STAR aligner&quot;</title>
<link>
http://gigadb.org/dataset/100468</link>
<pubDate>
Tue, 19 Jun 2018 00:00:00 +0000</pubDate>
<description><![CDATA[
Using software containers has become standard practice to reproducibly deploy and execute biomedical workflows on the cloud. However, applications with time-consuming initialization steps incur unnecessary costs when executed repeatedly. <br>We demonstrate that hot-starting from containers that have been frozen after the application has already begun execution can speed up bioinformatics workflows by avoiding repetitive initialization steps. We use an open source tool called Checkpoint and Restore in Userspace (CRIU) to save the state of the containers as a collection of checkpoint files on disk after the application has read in the indices. The resulting checkpoint files are migrated to the host, and CRIU is used to regenerate the containers in that ready-to-run hot-start state. As a proof-of-concept example, we create a hot-start container for the STAR aligner and deploy this container to align RNA sequencing data. We compare the performance of the alignment step with and without checkpoints on cloud platforms using local and network disks.<br>We demonstrate that hot-starting Docker containers from snapshots taken after repetitive initialization steps are completed significantly speeds up the execution of the STAR aligner on all experimental platforms, including Amazon Web Services (AWS), Microsoft Azure and local virtual machines. Our method could potentially be employed in other bioinformatics applications in which a checkpoint can be inserted after a repetitive initialization phase.
]]></description>
</item>
<item>
<title>
Supporting data for &quot;High quality assembly of the reference genome for scarlet sage, Salvia splendens, an economically important ornamental plant&quot;</title>
<link>
http://gigadb.org/dataset/100463</link>
<pubDate>
Tue, 19 Jun 2018 00:00:00 +0000</pubDate>
<description><![CDATA[
Salvia splendens Ker-Gawler, scarlet or tropical sage, is a tender herbaceous perennial widely introduced and seen in public gardens all over the world. With few molecular resources available, breeding is still restricted to traditional phenotypic selection, and the genetic mechanisms underlying phenotypic variation remain unknown. Hence, a high quality reference genome would be very valuable for marker-assisted breeding, genome editing and molecular genetics. We generated 66 gigabases (Gb) and 37 Gb of raw DNA sequence, respectively, from whole-genome sequencing of a largely homozygous scarlet sage inbred line using the PacBio Single-Molecule Real-Time (SMRT) and Illumina HiSeq sequencing platforms. PacBio de novo assembly yielded a final genome with a scaffold N50 size of 3.12 megabases (Mb) and a total length of 808 Mb. Repetitive sequences accounted for 57.52% of the genome sequence, and 54,008 protein-coding genes were predicted from the masked genome using a combination of ab initio and homology-based gene prediction. The divergence time between S. splendens and S. miltiorrhiza was estimated at 28.21 million years ago (Mya). Moreover, 3,797 species-specific genes and 1,187 expanded gene families were identified in the scarlet sage genome. We provide the first genome sequence and gene annotation for the scarlet sage. The availability of these resources will be of great importance for further breeding strategies, genome editing and comparative genomics among related species.
]]></description>
</item>
<item>
<title>
Supporting data for &quot;A workflow for simplified analysis of ATAC-cap-seq data in R&quot;</title>
<link>
http://gigadb.org/dataset/100476</link>
<pubDate>
Tue, 19 Jun 2018 00:00:00 +0000</pubDate>
<description><![CDATA[
ATAC-cap-seq is a high-throughput sequencing method that combines targeted nucleic acid enrichment of precipitated DNA fragments with an upstream ATAC-seq step. Working with a set of regions of interest that may be small in number and biologically dependent raises additional analytical difficulties. Common statistical pipelines for RNA-seq might be assumed to apply, but they can give misleading results on ATAC-cap-seq data. A tool is needed to allow a non-specialist user to quickly and easily summarise data and apply sensible and effective normalisation and analysis. We developed atacR to allow a user to easily analyse their ATAC enrichment experiment. It provides comprehensive summary functions and diagnostic plots for studying enriched tag abundance. Between-sample normalisation is made straightforward, with functions provided for normalising based on user-defined control regions, whole library size, or regions selected from the least variable regions in a dataset. Three methods for detecting differential abundance of enriched tags are provided: Bootstrap t, Bayes Factor and a wrapped version of the standard exact test in the edgeR package. We compared the precision, recall and F-score of each detection method on resampled datasets with varying numbers of replicates, significance thresholds and numbers of genes changed. The Bayes factor method had the greatest overall detection power, though edgeR was slightly stronger in simulations with lower numbers of genes changed. Our package allows a non-specialist user to easily and effectively apply methods appropriate to the analysis of ATAC-cap-seq data in a reproducible manner. The package is implemented in pure R and is fully interoperable with common workflows in Bioconductor.
]]></description>
</item>
<item>
<title>
Supporting data for &quot;A similarity-based approach to leverage multi-cohort medical data on the diagnosis and prognosis of Alzheimer&#039;s disease&quot;</title>
<link>
http://gigadb.org/dataset/100471</link>
<pubDate>
Fri, 15 Jun 2018 00:00:00 +0000</pubDate>
<description><![CDATA[
Heterogeneous diseases such as Alzheimer's disease manifest a variety of phenotypes among populations. Early diagnosis and effective treatment offer cost benefits. Many studies on biochemical and imaging markers have shown promise for improving diagnosis, yet establishing quantitative diagnostic criteria for ancillary tests remains challenging. <br> We have developed a similarity-based approach that matches individuals to subjects with similar conditions. We modeled the disease with a Gaussian process and tested the method in the Alzheimer's Disease Big Data DREAM Challenge. Our diagnostic model, which ranked highest among the submitted methods, predicted cognitive impairment scores on an independent test dataset with a correlation score of 0.573. It differentiated Alzheimer's disease patients from control subjects with an area under the receiver operating characteristic curve of 0.920. Without longitudinal information about subjects, the model identified, through the similarity network, patients vulnerable to conversion from mild cognitive impairment (MCI) to Alzheimer's disease (AD). This diagnostic framework can be applied to other diseases with clinical heterogeneity, such as Parkinson's disease.
]]></description>
</item>
<item>
<title>
Supporting data for &quot;Arabidopsis phenotyping through Geometric Morphometrics&quot;</title>
<link>
http://gigadb.org/dataset/100457</link>
<pubDate>
Mon, 11 Jun 2018 00:00:00 +0000</pubDate>
<description><![CDATA[
Much technical progress has recently been made in plant phenotyping. High-throughput platforms and improved algorithms for rosette image segmentation now make it possible to extract shape and size parameters at a massive scale for genetic, physiological and environmental studies. The development of low-cost phenotyping platforms and freeware resources makes it possible to widely expand phenotypic analysis tools for Arabidopsis. However, objective descriptors of shape parameters that could be used independently of the platform and segmentation software are still lacking, and shape descriptions still rely on ad hoc or sometimes even contradictory descriptors, which can make comparisons difficult and perhaps inaccurate. Modern geometric morphometrics is a family of methods in quantitative biology proposed to be the main source of data and analytical tools in the emerging field of phenomics. Based on the location of landmarks (corresponding points) on imaged specimens, and by combining geometry, multivariate analysis and powerful statistical techniques, these tools offer the possibility to reproducibly and accurately account for shape variation among groups and to measure it in shape distance units. Here, a particular scheme of landmark placement on Arabidopsis rosette images is proposed to study shape variation in the case of viral infection processes. Shape differences between control and infected plants are quantified and visualized throughout the infectious process. Quantitative comparisons between two unrelated ssRNA+ viruses are shown, and reproducibility issues are assessed. Combined with the newest automated platforms and plant segmentation procedures, geometric morphometric tools could boost phenotypic feature extraction and processing in an objective, reproducible manner.
]]></description>
</item>
<item>
<title>
Supporting data for &quot;Long-read sequencing and de novo genome assembly of Ammopiptanthus nanus, a desert shrub&quot;</title>
<link>
http://gigadb.org/dataset/100466</link>
<pubDate>
Fri, 08 Jun 2018 00:00:00 +0000</pubDate>
<description><![CDATA[
Ammopiptanthus nanus is a rare broad-leaved shrub of the desert and arid regions of Central Asia. This plant species exhibits extremely high tolerance to drought and freezing and has been used in research on abiotic stress tolerance in plants. As a relic of the Tertiary period, A. nanus is of great significance to plant biogeographic research in the ancient Mediterranean region. Here we report a draft genome assembly, generated using the PacBio platform, and gene annotation for A. nanus. <br> A total of 64.72 gigabases (Gb) of raw PacBio Sequel reads were generated from four 20 kb libraries. After filtering, 64.53 Gb of clean reads were obtained, giving 72.59× coverage depth. Assembly using Canu gave an assembly length of 823.74 Mb, with a contig N50 of 2.76 Mb. The final size of the assembled A. nanus genome was close to the 889 Mb estimated by k-mer analysis. Gene annotation completeness was evaluated using BUSCO: 1,327 of the 1,440 conserved genes (92.15%) were found in the A. nanus assembly. Genome annotation revealed that 74.08% of the A. nanus genome is composed of repetitive elements, and 53.44% of long terminal repeat elements (LTRs). We predicted 37,188 protein-coding genes, of which 96.53% were functionally annotated.<br> The genomic sequences of A. nanus could be a valuable resource for comparative genomic analysis in the legume family, and will be useful for understanding the phylogenetic relationships of the Thermopsideae and the evolutionary response of plant species to the Qinghai-Tibetan Plateau uplift.
]]></description>
</item>
<item>
<title>
Supporting data for &quot;Chromosome-level reference genome of the Siamese fighting fish Betta splendens, a model species for the study of aggression&quot;</title>
<link>
http://gigadb.org/dataset/100433</link>
<pubDate>
Fri, 08 Jun 2018 00:00:00 +0000</pubDate>
<description><![CDATA[
Siamese fighting fish (Betta splendens) are notorious for their aggressiveness and accordingly have been widely used to study aggression. However, the lack of a reference genome has so far limited understanding of the genetic basis of aggression in this species. Here we present the first reference genome assembly of the Siamese fighting fish.<br>We first sequenced and de novo assembled a 465.24 Mb genome for the B. splendens variety Giant, with a weighted average (N50) scaffold size of 949.03 Kb and an N50 contig size of 19.01 Kb, covering 99.93% of the estimated genome size. To obtain a chromosome-level genome assembly, we constructed one Hi-C library and generated 75.24 Gb of reads using the BGISEQ-500 platform. We anchored approximately 93% of the scaffold sequences into 21 chromosomes and evaluated the quality of our assembly using the Hi-C contact frequency heatmap and BUSCO. We also performed comparative chromosome analyses between Oryzias latipes and B. splendens, revealing conserved chromosome evolution in B. splendens. We predicted a total of 23,981 genes, assisted by RNA-seq data generated from brain, liver, muscle and heart tissues of Giant, and annotated repetitive sequences accounting for 15% of the genome. Additionally, we resequenced five other B. splendens varieties and detected ~3.4M single-nucleotide variations (SNVs) and 27,305 indels.<br>We provide the first chromosome-level genome for the Siamese fighting fish. The genome will lay a valuable foundation for future research on aggression in B. splendens.
]]></description>
</item>
<item>
<title>
Supporting data for &quot;AMBER: Assessment of Metagenome BinnERs&quot;</title>
<link>
http://gigadb.org/dataset/100454</link>
<pubDate>
Thu, 07 Jun 2018 00:00:00 +0000</pubDate>
<description><![CDATA[
Reconstructing the genomes of microbial community members is key to the interpretation of shotgun metagenome samples. Genome binning programs deconvolute reads or assembled contigs of such samples into individual bins, but assessing their quality is difficult due to the lack of evaluation software and standardized metrics. We present AMBER, an evaluation package for the comparative assessment of genome reconstructions from metagenome benchmark data sets. It calculates the performance metrics and comparative visualizations used in the first benchmarking challenge of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). As an application, we show the outputs of AMBER for eleven different binning programs on two CAMI benchmark data sets. AMBER is implemented in Python and available under the Apache 2.0 license on <a href="https://github.com/CAMI-challenge/AMBER" target="_blank">GitHub</a>.
]]></description>
</item>
</channel></rss>