Supporting data for "A dataset of images and morphological profiles of 30,000 small-molecule treatments using the Cell Painting assay"

Dataset type: Imaging
Data released on October 04, 2017

Shamji AF; Carpenter AE; Bray AM; Hon CS; Walpita D; Bittker JA; Sokolnicki KL; Li K; Wawer MJ; Kemp MM; Bodycombe NE; Clemons PA; Singh S; Gustafsdottir SM; Schreiber SL; Hasaka TP; Golub TR; Ljosa V; Dančík V (2017): Supporting data for "A dataset of images and morphological profiles of 30,000 small-molecule treatments using the Cell Painting assay" GigaScience Database.


Large-scale image sets acquired by automated microscopy of perturbed samples enable a detailed comparison of cell states induced by each perturbation, such as a small molecule from a diverse library. Highly multiplexed measurements of cellular morphology can be extracted from each image and subsequently mined for a number of applications.
This microscopy dataset includes 919,265 five-channel fields of view representing 30,616 tested compounds, available at The Cell Image Library repository. It also includes data files containing morphological features derived from each cell in each image, at both the single-cell level and the population-averaged (i.e., per-well) level; the image analysis workflows that generated the morphological features are also provided. Quality-control metrics are provided as metadata, flagging fields of view that are out of focus or that contain highly fluorescent material or debris. Lastly, chemical annotations are supplied for the compound treatments applied.
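The relationship between the single-cell and per-well data can be illustrated with a minimal sketch that averages single-cell measurements per well while skipping fields of view flagged by quality control. The key names ("plate", "well", "field"), the feature name "area", and the plate and well labels are hypothetical placeholders, not the column names used in the released files, and the mean is only one possible way to summarize a population.

```python
from collections import defaultdict
from statistics import mean


def per_well_profiles(cells, feature_names, flagged_fields=frozenset()):
    """Average single-cell rows into one profile per (plate, well).

    cells: iterable of dicts with hypothetical keys "plate", "well", "field"
           plus one numeric value per feature name.
    flagged_fields: set of (plate, well, field) tuples that failed QC
                    (e.g. out of focus or containing debris).
    """
    grouped = defaultdict(list)
    for cell in cells:
        if (cell["plate"], cell["well"], cell["field"]) in flagged_fields:
            continue  # drop every cell from a field of view that failed QC
        grouped[(cell["plate"], cell["well"])].append(cell)

    return {
        well_key: {name: mean(float(row[name]) for row in rows)
                   for name in feature_names}
        for well_key, rows in grouped.items()
    }


# Toy data with made-up identifiers and values.
cells = [
    {"plate": "P1", "well": "a01", "field": 1, "area": 100.0},
    {"plate": "P1", "well": "a01", "field": 1, "area": 300.0},
    {"plate": "P1", "well": "a01", "field": 2, "area": 900.0},  # flagged field
    {"plate": "P1", "well": "a02", "field": 1, "area": 50.0},
]
flagged = {("P1", "a01", 2)}
profiles = per_well_profiles(cells, ["area"], flagged)
```

In this toy example the flagged field is excluded before averaging, so the profile for well a01 is computed from two cells only.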
Because computational algorithms and methods for handling single-cell morphological measurements are not yet routine, the dataset serves as a useful resource for the wider scientific community applying morphological (image-based) profiling. The dataset can be mined for many purposes, including small-molecule library enrichment and chemical mechanism-of-action studies, such as target identification. Integration with genetically perturbed datasets could enable identification of small-molecule mimetics of particular disease- or gene-related phenotypes that could be useful as probes or potential starting points for development of future therapeutics.
The current version of the dataset was generated using updated CellProfiler pipelines to improve the quality of cell and nucleus segmentations. To evaluate the quality of segmentations, 30 wells were randomly sampled from each of seven plate maps from the bioactive compound collection, and one site was randomly sampled per well, producing a test set of 210 five-channel images. Based on this set, two expert CellProfiler users produced an improved segmentation pipeline that was used to reprocess all 406 plates. The updated pipeline also produces more measurements per cell (n=1783) compared to the previous version. This version of the dataset contains 7 fewer plates (406 compared to 413). Six of these (PlateIDs = 26782, 26783, 26784, 26791, 26792, 26796) correspond to plates that had later been reimaged and should not have been included in analysis. The seventh (PlateID = 25723) had several inconsistencies in the files that could not be resolved, and was therefore excluded.
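The counts in this paragraph can be checked with a short sketch, which also shows one way such a test set could be drawn. It assumes that the 30 wells were sampled from each of the seven plate maps, which is the reading consistent with 210 images. The function `sample_test_set`, the plate-map and well labels, the number of wells per plate map, and the number of sites per well are all hypothetical placeholders; this is not the authors' sampling code.

```python
import random

# Test-set size: 7 plate maps x 30 wells x 1 site per well.
PLATE_MAPS = 7
WELLS_PER_PLATE_MAP = 30
SITES_SAMPLED_PER_WELL = 1
test_set_size = PLATE_MAPS * WELLS_PER_PLATE_MAP * SITES_SAMPLED_PER_WELL

# Plate count: previous version minus the excluded plates.
PREVIOUS_PLATES = 413
reimaged_plates = [26782, 26783, 26784, 26791, 26792, 26796]
inconsistent_plates = [25723]
current_plates = PREVIOUS_PLATES - len(reimaged_plates) - len(inconsistent_plates)


def sample_test_set(wells_by_plate_map, sites_per_well, wells_per_map=30, seed=0):
    """Pick wells_per_map random wells per plate map and one random site per well."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    picks = []
    for plate_map, wells in sorted(wells_by_plate_map.items()):
        for well in rng.sample(wells, wells_per_map):
            site = rng.randrange(1, sites_per_well + 1)
            picks.append((plate_map, well, site))
    return picks


# Placeholder layout: 7 plate maps, 40 wells each, 6 sites per well (all assumed).
example_wells = {
    f"map_{i}": [f"well_{j:03d}" for j in range(40)] for i in range(1, 8)
}
example_picks = sample_test_set(example_wells, sites_per_well=6)

print(test_set_size, current_plates, len(example_picks))
```

Seeding the random number generator is a design choice made here so that the same test set can be regenerated later.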

Additional details

Read the peer-reviewed publication(s):

(PubMed: 28327978)

Related datasets:

doi:10.5524/100351 IsNewVersionOf doi:10.5524/100200

Additional information:

Sample ID | Taxonomic ID | Common Name | Genbank Name | Scientific Name | Sample Attributes
U2OS cell line | 9606 | Human | human | Homo sapiens | Sex: female; Disease status: osteosarcoma

File Name | Sample ID | Data Type | File Format | Size | Release Date
- | - | Tabular data | CSV | 4.93 MB | 2017-10-02
- | - | Tabular data | CSV | 17.43 KB | 2017-09-11
- | - | MD5sum | TEXT | 18.06 KB | 2017-09-11
- | - | Script | archive | 9.79 KB | 2017-10-02
- | U2OS cell line | Mixed archive | TAR | 2.41 GB | 2017-09-11
- | U2OS cell line | Mixed archive | TAR | 2.43 GB | 2017-09-11
- | U2OS cell line | Mixed archive | TAR | 2.62 GB | 2017-09-11
- | U2OS cell line | Mixed archive | TAR | 2.72 GB | 2017-09-11
- | U2OS cell line | Mixed archive | TAR | 2.71 GB | 2017-09-11
- | U2OS cell line | Mixed archive | TAR | 2.86 GB | 2017-09-11
Displaying 1-10 of 419 File(s).
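
Because the file list includes an MD5sum text file alongside multi-gigabyte archives, downloads can be checked for corruption before use. The sketch below assumes the listing follows the common md5sum layout of one "<checksum>  <file name>" pair per line; that layout, the function names, and the directory argument are assumptions, so adjust the parsing if the released file differs.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_listing(text):
    """Parse lines of the (assumed) form '<md5 hex>  <file name>'."""
    expected = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            checksum, name = parts
            # md5sum marks binary mode with a leading '*' on the file name.
            expected[name.strip().lstrip("*")] = checksum.lower()
    return expected


def verify_downloads(listing_text, directory):
    """Return {file name: True, False, or None}; None means the file is absent."""
    results = {}
    for name, checksum in parse_md5_listing(listing_text).items():
        path = Path(directory) / name
        results[name] = md5_of_file(path) == checksum if path.is_file() else None
    return results
```

A call such as `verify_downloads(Path(listing_path).read_text(), download_dir)`, where both arguments are supplied by the user, returns `False` for any archive whose checksum does not match, which is how a corrupt file like the one replaced in the history log could be detected.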
Funding body Awardee Award ID Comments
National Science Foundation CAREER DBI-1148823 Anne E Carpenter (for publishing costs)
Date | Action
October 5, 2017 | Dataset publish
November 13, 2017 | Manuscript Link added: 10.1093/gigascience/giw014
December 17, 2018 | Plate_25575.tar.gz: file attribute updated
December 17, 2018 | File Plate_25575.tar.gz updated
December 17, 2018 | File Plate_25575.tar.gz updated - authors found it was corrupt. Replaced with non-corrupt version.
March 14, 2019 | Additional file profiles.tar.gz added
March 14, 2019 | profiles.tar.gz: additional file attribute added
March 14, 2019 | File profiles.tar.gz updated
April 27, 2022 | Manuscript Link updated: 10.1093/gigascience/giw014