Supporting data for "Chromatin conformation capture (Hi-C) sequencing of patient-derived xenografts: analysis guidelines"

Dataset type: Software, Bioinformatics
Data released on March 05, 2021

Dozmorov M; Tyc KM; Sheffield NC; Boyd DC; Olex AL; Reed J; Harrell JC (2021): Supporting data for "Chromatin conformation capture (Hi-C) sequencing of patient-derived xenografts: analysis guidelines". GigaScience Database. http://dx.doi.org/10.5524/100870

DOI: 10.5524/100870

Sequencing of patient-derived xenograft (PDX) mouse models allows investigation of the molecular mechanisms of human tumor samples engrafted in a mouse host. Thus, both human and mouse genetic material is sequenced. Several methods have been developed to remove mouse sequencing reads from RNA-seq or exome sequencing PDX data and improve the downstream signal. However, for more recent chromatin conformation capture technologies (Hi-C), the effect of mouse reads remains undefined. We evaluated the effect of mouse read removal on the quality of Hi-C data using in silico created PDX Hi-C data with 10% and 30% mouse reads. Additionally, we generated two experimental PDX Hi-C datasets using different library preparation strategies. We evaluated the effect of three alignment strategies (Direct, Xenome, Combined) and three pipelines (Juicer, HiC-Pro, HiCExplorer) on Hi-C data quality. Removal of mouse reads had little-to-no effect on data quality compared with the results obtained with the Direct alignment strategy. Juicer extracted more valid chromatin interactions for Hi-C matrices, regardless of the mouse read removal strategy. However, the pipeline effect was minimal, while the library preparation strategy had the largest effect on all quality metrics. Together, our study presents comprehensive guidelines on PDX Hi-C data processing.
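
The arithmetic behind an in silico mixture with a fixed mouse fraction can be sketched as follows. This is only an illustration under stated assumptions: the record does not describe how the 10% and 30% mixtures were built, and the function names (`mouse_pairs_needed`, `mix_read_pairs`) and the in-memory list representation of read pairs are hypothetical, not taken from the authors' scripts.

```python
import random


def mouse_pairs_needed(n_human_pairs: int, mouse_fraction: float) -> int:
    """Mouse read pairs to add so they form `mouse_fraction` of the mixture.

    Solves m / (n_human_pairs + m) = mouse_fraction for m.
    """
    if not 0.0 <= mouse_fraction < 1.0:
        raise ValueError("mouse_fraction must be in [0, 1)")
    return round(n_human_pairs * mouse_fraction / (1.0 - mouse_fraction))


def mix_read_pairs(human_pairs, mouse_pairs, mouse_fraction, seed=0):
    """Return a shuffled mixture of all human pairs plus sampled mouse pairs.

    Assumption: read pairs are small enough to hold in memory as lists;
    real FASTQ data would normally be subsampled with a streaming tool.
    """
    rng = random.Random(seed)  # fixed seed keeps the mixture reproducible
    n_mouse = mouse_pairs_needed(len(human_pairs), mouse_fraction)
    if n_mouse > len(mouse_pairs):
        raise ValueError("not enough mouse read pairs for this fraction")
    mixed = list(human_pairs) + rng.sample(list(mouse_pairs), n_mouse)
    rng.shuffle(mixed)
    return mixed


# Toy example: 900 human pairs mixed to a 10% mouse fraction.
human = [("human", i) for i in range(900)]
mouse = [("mouse", i) for i in range(500)]
mixed = mix_read_pairs(human, mouse, 0.10)
n_mouse_in_mix = sum(1 for species, _ in mixed if species == "mouse")
print(len(mixed), n_mouse_in_mix)
```

Keeping every human pair and varying only the number of mouse pairs means the human signal is identical across mixtures, so any change in quality metrics can be attributed to the added mouse reads.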

Additional details

Read the peer-reviewed publication(s):

(PubMed: 33880552)

Github links:

https://github.com/dozmorovlab/PDX-HiC_processingScripts

Accessions (data generated as part of this study):

BioProject: PRJNA668904





Sample ID | Taxonomic ID | Common Name | Genbank Name | Scientific Name | Sample Attributes
UCD52_CR_Arima_rep1 | 9606 | Human | human | Homo sapiens | Description: Hi-C sequence of patient-derived xenog...; Alternative accession-BioSample: SAMN16427194; Analyte type: DNA; ...
UCD52_CR_Arima_rep2 | 9606 | Human | human | Homo sapiens | Description: Hi-C sequence of patient-derived xenog...; Alternative accession-BioSample: SAMN16427195; Analyte type: DNA; ...
UCD52_CR_Phase_rep1 | 9606 | Human | human | Homo sapiens | Description: Hi-C sequence of patient-derived xenog...; Alternative accession-BioSample: SAMN16427192; Analyte type: DNA; ...
UCD52_CR_Phase_rep2 | 9606 | Human | human | Homo sapiens | Description: Hi-C sequence of patient-derived xenog...; Alternative accession-BioSample: SAMN16427193; Analyte type: DNA; ...
Displaying 1-4 of 4 Sample(s).




File Name | Sample ID | Data Type | File Format | Size | Release Date
 |  | Other | HDF5 | 341.61 MB | 2021-03-02
 |  | Other | HDF5 | 179.42 MB | 2021-03-02
 |  | Other | UNKNOWN | 2.51 GB | 2021-03-02
 |  | Other | HDF5 | 318.5 MB | 2021-03-02
 |  | Other | HDF5 | 847.31 MB | 2021-03-02
 |  | Other | HDF5 | 855.67 MB | 2021-03-02
 |  | Other | UNKNOWN | 1.38 GB | 2021-03-02
 |  | Other | HDF5 | 821 MB | 2021-03-02
 |  | Other | BED | 8.98 MB | 2021-03-02
 |  | Other | HDF5 | 310.36 MB | 2021-03-02
Displaying 1-10 of 91 File(s).

Funding body | Awardee | Award ID | Comments
Pharmaceutical Research and Manufacturers of America Foundation | M Dozmorov |  | Research Informatics Award
National Cancer Institute | J Chuck Harrell | 1R01CA246182-01A1 |
Susan G. Komen Foundation | J Chuck Harrell | CCR19608826 |

Date | Action
March 4, 2021 | Dataset publish
March 15, 2021 | Manuscript Link added: 10.1093/gigascience/giab022
November 29, 2021 | Manuscript Link updated: 10.1093/gigascience/giab022