Table 1. Overview of PIT TGE classification results.
| Dataset (number of samples in parentheses) | | | | Homo sapiens (1) | Mus musculus (8) | Pteropus alecto (9) | Aedes aegypti (1) |
|---|---|---|---|---:|---:|---:|---:|
| Total spectra | | | | 210,560 | 293,894 | 350,890 | 829,093 |
| Standard search | Peptides | | | 24,187 | 23,151 | 22,554 | 58,336 |
| | PAGs (protein ambiguity groups) | | | 3,011 | 3,536 | 3,270 | 4,743 |
| | Total proteins | | | 12,589 | 14,107 | 3,522 | 5,692 |
| | SwissProt | Canonical | | 3,302 | 3,534 | 2 | 71 |
| | | Isoform | | 3,365 | 1,344 | 0 | 79 |
| | TrEMBL | | | 5,922 | 9,229 | 3,520 | 5,542 |
| PIT search | Peptides | | | 21,612 | 24,297 | 23,875 | 52,221 |
| | PAGs | | | 2,646 | 2,814 | 2,701 | 4,394 |
| | Total TGEs | | | 3,504 | 24,602 | 28,311 | 5,488 |
| | TGEs mapping to SwissProt | Canonical | Total | 1,134 | 1,270 | 0 | 77 |
| | | | Complete ORF | 1,134 | 1,268 | 0 | 77 |
| | | Isoform | Total | 197 | 195 | 0 | 1 |
| | | | Complete ORF | 197 | 193 | 0 | 1 |
| | TGEs mapping to TrEMBL | | Total | 38 | 925 | 765 | 1,939 |
| | | | Complete ORF | 38 | 915 | 756 | 1,930 |
| | Putative novel isoform | SwissProt | Total | 1,815 | 12,351 | 0 | 57 |
| | | | Complete ORF | 174 | 1,864 | 0 | 20 |
| | | | Score | 363 | 707 | 0 | 9 |
| | | | With specific peptide evidence | 50 | 357 | 0 | 11 |
| | | | With unique specific peptide evidence | 24 | 76 | 0 | 7 |
| | | TrEMBL | Total | 233 | 9,194 | 26,328 | 3,080 |
| | | | Complete ORF | 30 | 1,643 | 5,700 | 1,077 |
| | | | Score | 92 | 488 | 5,092 | 891 |
| | | | With specific peptide evidence | 10 | 390 | 4,735 | 903 |
| | | | With unique specific peptide evidence | 3 | 82 | 1,452 | 428 |
| | Known protein with polymorphism | SwissProt | Total | 47 | 278 | 0 | 4 |
| | | | Complete ORF | 21 | 92 | 0 | 4 |
| | | | Score | 7 | 14 | 0 | 0 |
| | | | With specific peptide evidence | 1 | 6 | 0 | 0 |
| | | | With unique specific peptide evidence | 1 | 3 | 0 | 0 |
| | | TrEMBL | Total | 8 | 187 | 251 | 97 |
| | | | Complete ORF | 5 | 86 | 95 | 85 |
| | | | Score | 0 | 31 | 25 | 32 |
| | | | With specific peptide evidence | 0 | 16 | 21 | 24 |
| | | | With unique specific peptide evidence | 0 | 6 | 13 | 12 |
| | Novel TGE | | Total | 32 | 202 | 967 | 233 |
| | | | Complete ORF | 3 | 38 | 236 | 61 |
| | | | With unique peptide evidence | 0 | 18 | 283 | 131 |

To allow comparison with standard proteomics methods, peptide and protein identification was also performed for each species by searching directly against the reference proteome; these results are shown in the top (standard search) portion of the table. Throughout the table, identified proteins are grouped by the source of the reference sequence: Swiss-Prot or TrEMBL. Swiss-Prot proteins are further divided into canonical and isoform entries. TGEs whose sequence maps exactly to a reference protein are classed as known proteins. TGEs that map to no reference protein, or whose match has an e-value above the threshold, are classified as novel TGEs. The remaining TGEs are classified either as known proteins with polymorphism or as novel isoforms of known proteins. The novel isoform TGEs are further separated into 16 classes, and the reliability of this annotation is verified by isoform-specific peptide evidence (see Supplementary Table S5 for details). For datasets with multiple samples, peptide and protein counts are the numbers of unique sequences across all samples, and PAG (protein ambiguity group) counts are reported as averages.
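
The classification rules in this legend can be illustrated with a minimal sketch. It is not the PIT implementation: the `Hit` record, its field names, and the `classify_tge` function are hypothetical, and the e-value threshold is a required argument because the legend does not state its value. The legend also does not say how a polymorphism is told apart from a novel isoform, so the sketch assumes an upstream alignment step supplies a `substitutions_only` flag for that decision.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TgeClass(Enum):
    KNOWN = "known protein"
    POLYMORPHISM = "known protein with polymorphism"
    NOVEL_ISOFORM = "putative novel isoform"
    NOVEL = "novel TGE"


@dataclass
class Hit:
    """Reference hit for one TGE (hypothetical record, not from the paper)."""
    e_value: float
    exact_match: bool         # TGE sequence maps exactly to the reference protein
    substitutions_only: bool  # ASSUMPTION: signal used here to call a polymorphism


def classify_tge(hit: Optional[Hit], e_value_threshold: float) -> TgeClass:
    """Apply the legend's rules in order; the threshold value is caller-supplied."""
    # No reference hit, or a hit with e-value above the threshold: novel TGE.
    if hit is None or hit.e_value > e_value_threshold:
        return TgeClass.NOVEL
    # Exact sequence match to a reference protein: known protein.
    if hit.exact_match:
        return TgeClass.KNOWN
    # Remaining TGEs are split in two; the criterion below is an assumption.
    if hit.substitutions_only:
        return TgeClass.POLYMORPHISM
    return TgeClass.NOVEL_ISOFORM
```

With a placeholder threshold, `classify_tge(None, 1e-5)` yields `TgeClass.NOVEL`, and a below-threshold hit that is neither exact nor substitutions-only yields `TgeClass.NOVEL_ISOFORM`. Counting the returned classes per dataset would give rows analogous to the "Total" rows in the PIT search portion of the table.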