Supporting data for "Genome Diversity in Ukraine"

Dataset type: Bioinformatics
Data released on December 11, 2020

The main goal of this collaborative effort is to provide genome-wide data for a previously underrepresented population in Eastern Europe, and to cross-validate data from genome sequences and genotypes of the same individuals acquired by different technologies. We collected 97 genome-grade DNA samples from individuals representing major regions of Ukraine, all of whom consented to public data release. DNBSEQ-G50 sequences and genotypes from an Illumina GWAS chip were cross-validated on multiple samples, and additionally referenced to one sample that was resequenced on an Illumina NovaSeq6000 S4 at high coverage. The genome data were searched for genomic variation represented in this population, and several classes of variants are reported: large structural variants, indels, CNVs, SNPs and microsatellites. This study provides the largest survey to date of genetic variation in Ukraine, creating a public reference resource intended to support historic and medical research in a large understudied population. While most of the common variation is shared with other European populations, this survey contributes a number of novel SNPs and structural variants that have not been reported in the gnomAD/1KG databases, which represent the global distribution of genomic variation. These endemic variants will become a valuable resource for designing future population and clinical studies, will help address questions about ancestry and admixture, and will fill a missing piece in the puzzle characterizing human population diversity in Eastern Europe. Our results indicate that the genetic diversity of the Ukrainian population is uniquely shaped by evolutionary and demographic forces, and cannot be ignored in future genetic and biomedical studies. These data will contribute a wealth of new information, bringing forth different risk and/or protective alleles.
The newly discovered low-frequency and local variants can be added to current genotyping arrays for use in genome-wide association studies, clinical trials, and genome assessment of proliferating cancer cells.
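
The cross-validation described above compares genotype calls for the same individual obtained from two technologies. As a minimal sketch of what such a comparison could look like, the Python below computes genotype concordance over the sites called by both platforms. It is not the authors' pipeline: the function names, the site keys and all genotype values are hypothetical and made up for illustration.

```python
from typing import Dict, Tuple

# Hypothetical genotype calls keyed by (chromosome, position); values are
# unordered allele pairs. All data below is invented for illustration.
Genotype = Tuple[str, ...]
Calls = Dict[Tuple[str, int], Genotype]


def normalize(gt: Genotype) -> Genotype:
    """Sort alleles so that A/G and G/A compare as equal."""
    return tuple(sorted(gt))


def concordance(seq_calls: Calls, array_calls: Calls) -> Tuple[int, int, float]:
    """Return (shared sites, matching sites, concordance rate).

    Only sites called by both technologies are compared.
    """
    shared = seq_calls.keys() & array_calls.keys()
    matches = sum(
        1 for site in shared
        if normalize(seq_calls[site]) == normalize(array_calls[site])
    )
    rate = matches / len(shared) if shared else float("nan")
    return len(shared), matches, rate


# Toy example: three sites are shared, two of them agree.
sequencing = {
    ("chr1", 1000): ("A", "G"),
    ("chr1", 2000): ("C", "C"),
    ("chr2", 500): ("T", "T"),
    ("chr3", 42): ("G", "G"),    # not present on the array
}
array = {
    ("chr1", 1000): ("G", "A"),  # same genotype, alleles listed in reverse
    ("chr1", 2000): ("C", "T"),  # discordant call
    ("chr2", 500): ("T", "T"),
}

n_shared, n_match, rate = concordance(sequencing, array)
print(f"{n_match}/{n_shared} shared sites concordant ({rate:.1%})")
```

Restricting the comparison to shared sites keeps array-only or sequencing-only positions from being counted as disagreements; a real analysis would also need to handle missing calls, strand flips and multi-allelic sites.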

Additional details

Read the peer-reviewed publication(s):

doi:10.1093/gigascience/giaa159


Accessions (data generated as part of this study):

BioProject: PRJNA661978





Sample ID  Taxonomic ID  Common Name  Genbank Name  Scientific Name  Sample Attributes
EG600001   9606          Human        human         Homo sapiens     Description: Genomic DNA extracted from blood of on...; Alternative accession-BioSample: SAMN16072365; Sex: female; ...
EG600002   9606          Human        human         Homo sapiens     Description: Genomic DNA extracted from blood of on...; Alternative accession-BioSample: SAMN16072366; Sex: female; ...
EG600003   9606          Human        human         Homo sapiens     Description: Genomic DNA extracted from blood of on...; Alternative accession-BioSample: SAMN16072367; Sex: female; ...
EG600004   9606          Human        human         Homo sapiens     Description: Genomic DNA extracted from blood of on...; Alternative accession-BioSample: SAMN16072368; Sex: female; ...
EG600005   9606          Human        human         Homo sapiens     Description: Genomic DNA extracted from blood of on...; Alternative accession-BioSample: SAMN16072369; Sex: female; ...
EG600006   9606          Human        human         Homo sapiens     Description: Genomic DNA extracted from blood of on...; Alternative accession-BioSample: SAMN16072370; Sex: female; ...
EG600007   9606          Human        human         Homo sapiens     Description: Genomic DNA extracted from blood of on...; Alternative accession-BioSample: SAMN16072371; Sex: female; ...
EG600008   9606          Human        human         Homo sapiens     Description: Genomic DNA extracted from blood of on...; Alternative accession-BioSample: SAMN16072372; Sex: male; ...
EG600009   9606          Human        human         Homo sapiens     Description: Genomic DNA extracted from blood of on...; Alternative accession-BioSample: SAMN16072373; Sex: male; ...
EG600010   9606          Human        human         Homo sapiens     Description: Genomic DNA extracted from blood of on...; Alternative accession-BioSample: SAMN16072374; Sex: female; ...
Displaying 1-10 of 97 Sample(s).




File Name  Sample ID  Data Type          File Format  Size        Release Date
                      Other              zip          10.27 MB    2020-12-03
                      mixed archive      zip          26.12 KB    2020-12-03
                      script             TEXT         4.21 KB     2020-12-03
                      text               DOC          461.25 KB   2020-11-25
                      other              PDF          1002.51 KB  2020-11-25
                      mixed archive      zip          29.12 KB    2020-11-25
                      Tabular Data       CSV          9.98 KB     2020-11-25
                      Mixed archive      TAR          656.33 MB   2020-12-03
                      Sequence variants  GZIP         11.93 GB    2020-11-25
Displaying 1-9 of 9 File(s).
Date               Action
December 11, 2020  Dataset publish
December 21, 2020  Manuscript Link added: 10.1093/gigascience/giaa159