Supporting data for "Estimating the total number of phosphoproteins and phosphorylation sites in eukaryotic proteomes."

Dataset type: Epigenomic, Proteomic
Data released on December 21, 2016

Vlastaridis P; Kyriakidou P; Chaliotis A; Van de Peer Y; Oliver SG; Amoutzias GD (2016): Supporting data for "Estimating the total number of phosphoproteins and phosphorylation sites in eukaryotic proteomes." GigaScience Database.


Phosphorylation is the most frequent post-translational modification made to proteins and may regulate protein activity as either a molecular digital switch or a rheostat. Despite the cornucopia of high-throughput phosphoproteomic data in the last decade, it remains unclear how many proteins are phosphorylated and how many phosphorylation sites (p-sites) can exist in total within a eukaryotic proteome. We present the first reliable estimates of the total number of phosphoproteins and p-sites for four eukaryotes (human, mouse, Arabidopsis, and yeast).
In all, 188 high-throughput phosphoproteomic datasets were filtered, compiled and studied along with two low-throughput compendia. Estimates of the number of phosphoproteins and p-sites were inferred by two methods: Capture-Recapture, and fitting the saturation curve of cumulative redundant vs. cumulative non-redundant phosphoproteins/p-sites. Estimates were also adjusted for different levels of noise within the individual datasets and other confounding factors. We estimate that in total, 13,000, 11,000 and 3,000 phosphoproteins and 230,000, 156,000 and 40,000 p-sites exist in human, mouse and yeast, respectively, whereas estimates for Arabidopsis were not as reliable. Most of the phosphoproteins have been discovered for human, mouse and yeast, while the dataset for Arabidopsis is still far from complete. The datasets for p-sites are not as close to saturation as those for phosphoproteins. Integration of the low-throughput data suggests that current high-throughput phosphoproteomics is capable of capturing 70-95% of total phosphoproteins, but only 40-60% of total p-sites.
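
The Capture-Recapture idea mentioned above can be illustrated with a minimal sketch. The description does not say which estimator the authors used, so the two-sample Lincoln-Petersen estimator and Chapman's bias-corrected variant are assumptions chosen for illustration, and the p-site identifiers and both toy datasets are hypothetical. The intuition: if two independent experiments share many p-sites, most of the total has probably been seen already; if they share few, many remain undiscovered.

```python
def lincoln_petersen(first: set, second: set) -> float:
    """Estimate total population size from two independent samples."""
    overlap = len(first & second)
    if overlap == 0:
        raise ValueError("no shared items; the estimate is undefined")
    return len(first) * len(second) / overlap


def chapman(first: set, second: set) -> float:
    """Chapman's bias-corrected variant, defined even when the overlap is zero."""
    overlap = len(first & second)
    return (len(first) + 1) * (len(second) + 1) / (overlap + 1) - 1


# Hypothetical toy data: p-sites written as "protein:residue+position".
dataset_a = {"P1:S10", "P1:T45", "P2:S3", "P3:Y99", "P4:S7", "P5:S21"}
dataset_b = {"P1:S10", "P2:S3", "P4:S7", "P6:T12"}

print(lincoln_petersen(dataset_a, dataset_b))  # 6 * 4 / 3 = 8.0
print(chapman(dataset_a, dataset_b))           # 7 * 5 / 4 - 1 = 7.75
```

With only seven distinct p-sites observed across the two toy datasets, both estimators suggest that roughly one further site remains unseen. The sketch ignores the noise and confounding-factor adjustments described above.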

Additional details

Data Type       File Format    Size         Release Date
tabular data    CSV            1.42 MB      2016-12-16
Video           UNKNOWN        61.05 MB     2016-12-16
tabular data    CSV            20.62 MB     2016-12-16
tabular data    EXCEL          407.75 KB    2016-12-16
tabular data    CSV            4.89 MB      2016-12-16
tabular data    TEXT           35.73 KB     2016-12-16
tabular data    EXCEL          237.78 KB    2016-12-16
Readme          TEXT           2.73 KB      2016-12-16
tabular data    CSV            1.12 MB      2016-12-16
Funding body                                         Award ID     Awardee
European Social Fund (ESF) and National Resources    4288         G D Amoutzias
University of Thessaly                               3817         G D Amoutzias
Ghent University                                     01MR0310W    Y Van de Peer
Date                 Action
December 21, 2016    Dataset publish
March 6, 2017        Manuscript Link added: 10.1093/gigascience/giw015