<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:wfw="http://wellformedweb.org/CommentAPI/">
<channel>
<title>GigaDB</title>
<link>http://www.gigadb.org</link>
<description><![CDATA[GigaDB RSS Feed
]]></description>
<language>en-us</language>
<pubDate>Fri, 14 Jul 2017 10:26:14 +0000</pubDate>
<item>
<title>
Supporting data for &quot;Population-wide Sampling of Retrotransposon Insertion Polymorphisms Using Deep Sequencing and Efficient Detection&quot;</title>
<link>
http://gigadb.org/dataset/100318</link>
<pubDate>
Thu, 13 Jul 2017 00:00:00 +0000</pubDate>
<description><![CDATA[
Active retrotransposons play important roles during evolution and continue to shape our genomes today, especially in genetic polymorphisms underlying a diverse set of diseases. However, studies of human retrotransposon insertion polymorphisms (RIPs) based on whole-genome deep sequencing at the population level have not been sufficiently undertaken, despite the obvious need for a thorough characterization of RIPs in the general population.<br>
Herein, we present a novel and efficient computational tool named Specific Insertions Detector (SID) for the detection of non-reference RIPs. We demonstrate that SID is suitable for high depth whole-genome sequencing (WGS) data using paired-end reads obtained from simulated and real datasets. We construct a comprehensive RIP database using a large population of 90 Han Chinese individuals with a mean 68× depth per individual. In total, we identify 9342 recent RIPs, and 8433 of these RIPs are novel compared with dbRIP, including 5826 Alu, 2169 long interspersed nuclear element 1 (L1), 383 SVA, and 55 long terminal repeats (LTR). Among the 9342 RIPs, 4828 were located in gene regions and five were located in protein-coding regions. We demonstrate that RIPs can, in principle, be an informative resource to perform population evolution and phylogenetic analyses. Taking the demographic effects into account, we identify a weak negative selection on SVA and L1 but approximately neutral selection for Alu elements based on the frequency spectrum of RIPs.<br>
SID is a powerful open-source program for the detection of non-reference RIPs. We built a non-reference RIP dataset that greatly enhanced the diversity of RIPs detected in the general population and should be invaluable to researchers interested in many aspects of human evolution, genetics, and disease. As a proof-of-concept, we demonstrate that the RIPs can be used as biomarkers in a similar way to single nucleotide polymorphisms (SNPs).
<br>
]]></description>
</item>
<item>
<title>
Supporting data for &quot;Deep whole-genome sequencing of 90 Han Chinese genomes&quot;</title>
<link>
http://gigadb.org/dataset/100302</link>
<pubDate>
Wed, 12 Jul 2017 00:00:00 +0000</pubDate>
<description><![CDATA[
Next generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data, due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low frequency and novel variants. Although whole exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole genome sequencing data are limited for any population, and a large number of low-frequency, population-specific variants remain uncharacterized.
We have performed whole genome sequencing at high depth (~80X) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 North Han Chinese and 45 South Han Chinese samples. 83 of these 90 have not been sequenced by the 1000 Genomes Project. We have identified 12,568,804 single nucleotide polymorphisms, 2,074,734 short InDels and 26,142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7,007,685 novel variants with low frequency (defined as minor allele frequency < 5%), including 5,816,839 SNPs, 1,172,919 InDels, and 17,927 structural variants.
Using deep sequencing data, we have built a greatly expanded spectrum of genetic variation for the Han Chinese genome. Compared to the 1000 Genomes Project, this Han Chinese deep sequencing data enhances characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement for the 1000 Genomes Project, as well as for other human genome projects.

]]></description>
</item>
<item>
<title>
Supporting data for &quot;Connections between human gut microbiome and gestational diabetes mellitus&quot;</title>
<link>
http://gigadb.org/dataset/100326</link>
<pubDate>
Tue, 11 Jul 2017 00:00:00 +0000</pubDate>
<description><![CDATA[
The human gut microbiome can modulate metabolic health and affect insulin resistance, and may play an important role in the etiology of gestational diabetes mellitus (GDM). Here, we compared the gut microbial composition of 43 GDM patients and 81 healthy pregnant women via whole-metagenome shotgun sequencing of their fecal samples collected at 21-29 weeks, to explore associations between GDM and the composition of microbial taxonomic units and functional genes.
A metagenome-wide association study (MGWAS) identified 154,837 genes, which clustered into 129 metagenome linkage groups (MLGs) for species description, with significant relative abundance differences between the two cohorts. Parabacteroides distasonis, Klebsiella variicola, etc., were enriched in GDM patients, whereas Methanobrevibacter smithii, Alistipes spp., Bifidobacterium spp. and Eubacterium spp. were enriched in controls. The ratios of the gross abundances of GDM-enriched MLGs to control-enriched MLGs were positively correlated with blood glucose levels. A Random Forest model shows that fecal MLGs have excellent discriminatory power to predict GDM status.

]]></description>
</item>
<item>
<title>
Supporting data for &quot;The pearl oyster Pinctada fucata martensii genome and multi-omic analyses provide insights into biomineralization&quot;</title>
<link>
http://gigadb.org/dataset/100240</link>
<pubDate>
Tue, 11 Jul 2017 00:00:00 +0000</pubDate>
<description><![CDATA[
Nacre, the iridescent material found in pearls and shells of molluscs, is formed through an extraordinary process of matrix-assisted biomineralization. Despite recent advances, many parts of the biomineralization process and its evolutionary origin remain a mystery. The pearl oyster Pinctada fucata martensii is a well-known master of biomineralization, but the molecular mechanisms that underlie its production of remarkable shells and pearls are not fully understood.
We sequenced the highly polymorphic genome of the pearl oyster and conducted multi-omic and biochemical studies to probe nacre formation. We identified a large set of novel proteins participating in matrix-framework formation, many in expanded families, including components similar to those found in vertebrate bones such as collagen-related VWA-containing proteins (VWAP), fibronectin, chondroitin sulfotransferases and regulatory elements.
Considering that there are only collagen-based matrices in vertebrate bones and chitin-based matrices in most invertebrate skeletons, the presence of both chitin and elements of collagen-based matrices in nacre matrices suggests that elements of chitin- and collagen-based matrices are deeply rooted and might be part of an ancient biomineralizing matrix. Our results expand the current shell matrix-framework model and provide new insights into the evolution of diverse biomineralization systems.
]]></description>
</item>
<item>
<title>
Supporting data for &quot;Draft genome of the Antarctic dragonfish, Parachaenichthys charcoti&quot;</title>
<link>
http://gigadb.org/dataset/100321</link>
<pubDate>
Fri, 07 Jul 2017 00:00:00 +0000</pubDate>
<description><![CDATA[
The Antarctic bathydraconid dragonfish, Parachaenichthys charcoti, is an Antarctic notothenioid teleost endemic to the Southern Ocean.
The Southern Ocean has cooled to −1.8 °C over the past 30 million years, and its seawater has remained cold and isolated by the Antarctic Circumpolar Current (ACC). Notothenioids dominate the Antarctic fish fauna, making up 90% of the biomass, and all notothenioids have undergone molecular and ecological diversification to survive in this cold environment. Therefore, they are considered an attractive Antarctic fish model for evolutionary and ancestral genomic studies. Bathydraconidae is a speciose family of the Notothenioidei, the dominant taxonomic component of Antarctic teleosts. To understand the process of evolution of Antarctic fish, we selected a typical Antarctic bathydraconid dragonfish, P. charcoti. Here, we have sequenced, de novo assembled and annotated a comprehensive genome from P. charcoti. <br>
The draft genome of P. charcoti is 709 Mb in size. The N50 contig length is 6,145 bp and the N50 scaffold length is 178,362 kb. The genome of P. charcoti is predicted to contain 32,712 genes, 18,455 of which have been assigned preliminary functions. A total of 8,951 orthologous groups common to seven fish species were identified, while 333 genes were identified in P. charcoti only; 2,519 orthologous groups were also identified in both P. charcoti and N. coriiceps, another Antarctic fish. Four gene ontology (GO) terms were statistically overrepresented among the 333 genes unique to P. charcoti, according to GO enrichment analysis.<br>
The draft P. charcoti genome will broaden our understanding of the evolution of Antarctic fish in their extreme environment. It will provide a basis for further investigating the unusual characteristics of Antarctic fishes.

]]></description>
</item>
<item>
<title>
Supporting data for &quot;Transcriptome Analysis of the Response of Burmese Python to Digestion&quot;</title>
<link>
http://gigadb.org/dataset/100320</link>
<pubDate>
Thu, 06 Jul 2017 00:00:00 +0000</pubDate>
<description><![CDATA[
Exceptional and extreme feeding behaviour makes the Burmese python (Python bivittatus) an interesting model to study physiological remodelling and metabolic adaptation in response to refeeding after prolonged starvation. In this study, we used transcriptome sequencing of five visceral organs during fasting as well as 24h and 48h after ingestion of a large meal to unravel the postprandial changes in Burmese pythons. We first used the pooled data to perform a de novo assembly of the transcriptome and supplemented this with a proteomic survey of enzymes in the plasma and gastric fluid. <br>
We constructed a high-quality transcriptome with 34,423 transcripts of which 19,713 (57%) were annotated. Among highly expressed genes (FPKM>100 in one tissue) we found the transition from fasting to digestion was associated with differential expression of 43 genes in the heart, 206 genes in the liver, 114 genes in the stomach, 89 genes in the pancreas and 158 genes in the intestine. We interrogated the function of these genes to test previous hypotheses on the response to feeding. We also used the transcriptome to identify 314 secreted proteins in the gastric fluid of the python. <br>
Digestion was associated with an upregulation of genes related to metabolic processes, and translational changes therefore appear to support the postprandial rise in metabolism. We identify stomach-related proteins from a digesting individual and demonstrate that the sensitivity of modern LC-MS/MS equipment allows the identification of gastric juice proteins that are present during digestion.

]]></description>
</item>
<item>
<title>
Supporting data for &quot;A recurrence based approach for validating structural variation using long-read sequencing technology&quot;</title>
<link>
http://gigadb.org/dataset/100325</link>
<pubDate>
Thu, 06 Jul 2017 00:00:00 +0000</pubDate>
<description><![CDATA[
Although numerous algorithms have been developed to identify structural variants (SVs) in genomic sequences, there is a dearth of approaches that can be used to evaluate their results. This is significant, as the accurate identification of structural variation is still an outstanding but important problem in genomics. The emergence of new sequencing technologies that generate longer sequence reads can, in theory, provide direct evidence for all types of SVs regardless of the length of the region they span. However, current efforts to use these data in this manner require the use of large computational resources to assemble these sequences as well as visual inspection of each region.<br>
Here we present VaPoR, a highly efficient algorithm that autonomously validates large SV sets using long read sequencing data. We assessed the performance of VaPoR on SVs in both simulated and real genomes and report a high-fidelity rate for overall accuracy across different levels of sequence depths. We show that VaPoR can interrogate a much larger range of SVs while still matching existing methods in terms of false positive validations and providing additional features considering breakpoint precision and predicted genotype. We further show that VaPoR can run quickly and efficiently without requiring a large processing or assembly pipeline.<br>
VaPoR provides a long read based validation approach for genomic SVs that requires relatively low read depth and computing resources and thus will provide utility with targeted or low-pass sequencing coverage for accurate SV assessment. The VaPoR Software is available at: https://github.com/mills-lab/vapor.
]]></description>
</item>
<item>
<title>
Supporting data for &quot;High precision registration between zebrafish brain atlases using symmetric diffeomorphic normalization&quot;</title>
<link>
http://gigadb.org/dataset/100322</link>
<pubDate>
Wed, 05 Jul 2017 00:00:00 +0000</pubDate>
<description><![CDATA[
Atlases provide a framework for spatially mapping information from diverse sources into a common reference space. Specifically, brain atlases allow annotation of gene expression, cell morphology, connectivity, and activity. In larval zebrafish, advances in genetics, imaging, and computational methods now allow the collection of such information brain-wide. However, due to technical considerations, disparate datasets may use different references and may not be aligned to the same coordinate space. Two recent larval zebrafish atlases exemplify this problem: Z-Brain, containing gene expression, neural activity and neuroanatomical segmentations, was acquired using immunohistochemical stains, while the Zebrafish Brain Browser (ZBB) was constructed from live scans of fluorescent reporters in transgenic larvae. Although different references were used, the atlases included several common transgenic patterns that provide potential 'bridges' for transforming each into the other's coordinate space. We tested multiple bridging channels and registration algorithms and found that the symmetric diffeomorphic normalization (SyN) algorithm improved live brain registration precision while better preserving cell morphology than B-spline based registrations. SyN also corrected for tissue distortion introduced during fixation. Multi-reference channel optimization provided a transformation that enabled Z-Brain and ZBB to be co-aligned with precision of approximately a single cell diameter and minimal perturbation of cell and tissue morphology. Finally, we developed software to visualize brain regions in three dimensions, including a virtual reality neuroanatomy explorer. This study demonstrates the feasibility of integrating whole brain datasets, despite disparate reference templates and acquisition protocols, when sufficient information is present for bridging. Increased accuracy and interoperability of zebrafish digital brain atlases will facilitate neurobiological studies.
]]></description>
</item>
<item>
<title>
Supporting data for &quot;Large-scale phenomics analysis of a T-DNA tagged mutant population.&quot;</title>
<link>
http://gigadb.org/dataset/100314</link>
<pubDate>
Wed, 05 Jul 2017 00:00:00 +0000</pubDate>
<description><![CDATA[
Rice, Oryza sativa L., is one of the most important crops in the world. With the rising world population, feeding people in a more sustainable and environment-friendly way becomes increasingly important. Therefore, the rice research community needs to share resources to better understand the functions of rice genes that are the foundation for future agricultural biotechnology development, and one way to achieve this goal is via the extensive study of insertional mutants.<br>
We have constructed a large rice insertional mutant population in a japonica rice variety, Tainung 67. The collection contains about 93,000 mutant lines, of which 85% have phenomics data and 65% have flanking sequence data. We screened the phenotypes of 12 individual plants for each line grown under field conditions according to 68 subcategories and 3 quantitative traits. Both phenotypes and integration sites are searchable in the Taiwan Rice Insertional Mutants Database (http://trim.sinica.edu.tw).<br>
Detailed analyses of phenomics data, T-DNA flanking sequences and whole-genome sequencing data for rice insertional mutants can lead to the discovery of novel genes. In addition, studies of mutant phenotypes can reveal relationships among varieties, cultivation locations, and cropping seasons.

]]></description>
</item>
<item>
<title>
Supporting data for &quot;iCAVE: an open source tool for visualizing biomolecular networks in 3D, stereoscopic 3D and immersive 3D&quot;</title>
<link>
http://gigadb.org/dataset/100288</link>
<pubDate>
Tue, 04 Jul 2017 00:00:00 +0000</pubDate>
<description><![CDATA[
Visualizations of biomolecular networks assist in systems-level data exploration in many cellular processes. Data generated from high-throughput experiments increasingly inform these networks, yet current tools do not adequately scale with the concomitant increase in their size and complexity.
We present an open-source software platform, interactome-CAVE (iCAVE), for visualizing large and complex biomolecular interaction networks in three dimensions (3D). Users can explore networks (i) in 3D using a desktop; (ii) in stereoscopic 3D using 3D-vision glasses and a desktop; or (iii) in immersive 3D within a CAVE environment. iCAVE introduces 3D extensions of known 2D network layout, clustering, and edge-bundling algorithms, as well as new 3D network layout algorithms. Furthermore, users can simultaneously query several built-in databases within iCAVE for network generation, or visualize their own networks (e.g. disease, drug, protein, metabolite). iCAVE has a modular structure that allows rapid development by addition of algorithms, datasets or features without affecting other parts of the code.
Overall, iCAVE is the first freely available open source tool that enables 3D (optionally stereoscopic or immersive) visualizations of complex, dense or multi-layered biomolecular networks. While primarily designed for researchers utilizing biomolecular networks, iCAVE can assist researchers in any field.
]]></description>
</item>
</channel></rss>