2021 Aug 19;33(11):3421–3453. doi: 10.1093/plcell/koab211

Figure 10.


Examples of the identification of sORFs and one LW in the PeptideAtlas build. These six examples represent identifications that pass the stringent criterion of at least two distinct matched peptides, each identified at least three times (for more information, see Supplemental Data Set S3D). A, CONTRIB_sORFs_sORF2808 encodes a 4.9-kDa peptide (42 aa) and was identified by three distinct, nested peptides that map to a small portion of the pseudogene AT2G20724 in Araport11. All three peptides were identified in studies that used dimethyl labeling (modifying both free N-terminal amines and lysine side chains) and enrichment (TAILS or COFRADIC), as indicated. The 4.9-kDa predicted protein contains a very high number of lysine and arginine residues (13 in total), and tryptic digestion would yield only a single peptide of 7 aa (N-terminal methionine removed) or 8 aa. Trypsin cannot cleave the peptide bond C-terminal to a dimethylated lysine residue, which increases the chance of observing the peptides GAFEDQVKMR and GAFEDQVKMRALE. All four PSMs of GAFEDQVK and one of the four PSMs of GAFEDQVKMR are dimethylated; the other three PSMs of GAFEDQVKMR are iTRAQ8plex labeled. The single PSM of GAFEDQVKMRALE is iTRAQ8plex labeled. These results suggest that AT2G20724 is not a pseudogene but a protein-coding gene. B, CONTRIB_sORFs_sORF1912 and CONTRIB_sORFs_sORF1913 both map to the same transposon (AT2TE05755/AT2G04000) in Araport11. AT2G04000 has two exons, and each ORF corresponds to one exon. The transposon belongs to the VANDAL21 family of the DNA/MuDR superfamily. VANDAL21 elements preferentially integrate into euchromatin, mainly targeting the promoters and 5′ UTRs of broadly active genes that are enriched in the histone marks H3K4me3 and H3K36me3 (Quesneville, 2020). C, CONTRIB_sORFs_sORF7763 encodes an 11-kDa protein (96 aa) and was identified by five peptides that all map to exon 2 of AT5G00690 in Araport11.
However, Araport11 does not classify this locus as a protein-coding gene but as a “novel transcribed region”. These results suggest that AT5G00690 should be annotated as a protein-coding gene. D, CONTRIB_LW_ath_mu_ch3_8568top encodes a 16-kDa protein (136 aa) and was identified by six peptides, three of which map to AT3G02345, which is annotated as a long noncoding RNA in Araport11. Most PSMs were identified in seeds, and a few in embryos or siliques (see PeptideAtlas). A BlastP search with the 136-aa sequence against Araport11 found AT2G23148 as the closest match, but with a very poor E-value (0.003). BlastP against all nr proteins identified ARALYDRAFT_897225 from Arabidopsis lyrata as the closest match (98/117 identities for the region 20–80 aa; E-value 1E-72). The significance of this small protein remains to be determined. E, CONTRIB_LW_ath_mu_ch5_3086top encodes a 15.6-kDa protein (131 aa) and was identified by nine peptides, none of which map to an annotated genome element in Araport11. However, BlastP against all nr proteins identified AT5G03740 from ecotype Landsberg as a perfect match. The peptides were identified in samples from flowers and flower parts (petals, pollen, sepals, stamens) as well as siliques in two studies (Zhang et al., 2019; Mergner et al., 2020), even though these studies used samples from ecotype Col-0.
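The effect of lysine dimethylation on tryptic cleavage described for panel A can be illustrated with a minimal in-silico digest. This is a sketch under stated assumptions, not the pipeline used for the PeptideAtlas build: it applies the common simplified trypsin rule (cleave C-terminal to K or R unless the next residue is P), and the hypothetical flag `lysine_blocked` models dimethylated lysines by treating K as uncleavable. The input is only the observed peptide GAFEDQVKMRALE, not the full 42-aa sORF sequence, which is not given here.

```python
def trypsin_digest(sequence: str, lysine_blocked: bool = False) -> list[str]:
    """Hypothetical in-silico trypsin digest (simplified rule).

    Cleaves after K or R unless the following residue is P. When
    lysine_blocked is True, K sites are skipped, mimicking dimethylated
    lysine side chains that trypsin cannot cleave after.
    """
    cleavable = "R" if lysine_blocked else "KR"
    peptides = []
    start = 0
    for i, residue in enumerate(sequence[:-1]):  # never cut after the last residue
        next_residue = sequence[i + 1]
        if residue in cleavable and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return [p for p in peptides if p]  # drop the empty string for empty input


if __name__ == "__main__":
    observed = "GAFEDQVKMRALE"
    # Unmodified lysine: cut after K and after R
    print(trypsin_digest(observed))                       # ['GAFEDQVK', 'MR', 'ALE']
    # Dimethylated lysine: only the R site remains cleavable
    print(trypsin_digest(observed, lysine_blocked=True))  # ['GAFEDQVKMR', 'ALE']
```

With lysines blocked, the GAFEDQVK|MR cut disappears and GAFEDQVKMR is produced directly, which is consistent with the caption's point that dimethyl labeling favors the longer peptides. The sketch ignores missed cleavages, so it does not by itself account for GAFEDQVKMRALE.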