Table 1.
MaxQuant identification statistics for searches against search spaces of differing source and size

Sequencing technique | UniProt | Search space entries | Search space amino acids | Identified protein groups (MaxQuant) | Identified peptides (MaxQuant) | Identified PSMs (MaxQuant) | Identified PSMs (MaxQuant+Percolator)
---|---|---|---|---|---|---|---
None | Canonical | 71,356 | 24,055,511 | 4294 | 28,443 | 180,526 | 186,937
Ribosome profiling | Canonical | 176,202 | 40,603,175 | 4333 | 28,402 | 177,473 | 185,767
Ribosome profiling | Spliced | 186,627 | 46,830,033 | 4347 | 28,372 | 176,978 | 184,578
RNA-Seq | Spliced | 4,988,183 | 757,075,232 | 3669 | 15,820 | 91,232 | 175,775

The size of each search space is given both as the number of sequence entries and as the total number of amino acids. Sequences derived from ribosome profiling or RNA-Seq were combined with reference sequences from UniProt (either canonical proteins only or canonical proteins plus splice isoforms). The resulting proteogenomic search spaces were then searched with MaxQuant. The numbers of identified PSMs, peptides, and inferred protein groups differ clearly depending on the size of the search space. The effect is most dramatic for the RNA-Seq–based search space: with about 5 million entries (757 million amino acids), MaxQuant alone identifies only about half as many PSMs (91,232 vs. 180,526) and peptides (15,820 vs. 28,443) as with the UniProt canonical search space. Rescoring with Percolator recovers a large part of this loss (175,775 PSMs). The “MaxQuant+Percolator” results serve as the baseline for the rest of the article.
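The relationship between search-space size and identification loss can be quantified directly from the values in Table 1. The following is a minimal Python sketch, assuming the table values are transcribed exactly as shown; the names `Row`, `TABLE_1`, and `compare_to_reference` are hypothetical and are not part of MaxQuant or Percolator. Each search space is compared against the UniProt canonical-only search, and for each tool the PSM count is divided by that tool's own count on the reference.

```python
from typing import NamedTuple


class Row(NamedTuple):
    entries: int
    amino_acids: int
    protein_groups: int
    peptides: int
    psms_maxquant: int
    psms_percolator: int  # MaxQuant+Percolator


# Values transcribed from Table 1, keyed by (sequencing technique, UniProt set).
TABLE_1 = {
    ("None", "Canonical"): Row(71_356, 24_055_511, 4294, 28_443, 180_526, 186_937),
    ("Ribosome profiling", "Canonical"): Row(176_202, 40_603_175, 4333, 28_402, 177_473, 185_767),
    ("Ribosome profiling", "Spliced"): Row(186_627, 46_830_033, 4347, 28_372, 176_978, 184_578),
    ("RNA-Seq", "Spliced"): Row(4_988_183, 757_075_232, 3669, 15_820, 91_232, 175_775),
}

REFERENCE = ("None", "Canonical")  # UniProt canonical proteins only


def compare_to_reference(key: tuple) -> dict:
    """Return the amino-acid fold increase over the reference search space and
    the fraction of reference PSMs retained, separately for each tool."""
    ref, row = TABLE_1[REFERENCE], TABLE_1[key]
    return {
        "amino_acid_fold": row.amino_acids / ref.amino_acids,
        "psms_retained_maxquant": row.psms_maxquant / ref.psms_maxquant,
        "psms_retained_percolator": row.psms_percolator / ref.psms_percolator,
    }


if __name__ == "__main__":
    for key in TABLE_1:
        stats = compare_to_reference(key)
        summary = ", ".join(f"{name}={value:.2f}" for name, value in stats.items())
        print(f"{key[0]} / {key[1]}: {summary}")
```

For the RNA-Seq search space this gives a roughly 31-fold larger amino acid count, with MaxQuant alone retaining about 51% of its reference PSMs and MaxQuant+Percolator retaining about 94% of its own reference PSMs.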