Abstract
The chloroplast chaperone CLPC1 unfolds and delivers substrates to the stromal CLPPRT protease complex for degradation. We previously used an in vivo trapping approach to identify CLPC1 interactors in Arabidopsis thaliana. We expressed a STREPII-tagged copy of CLPC1 mutated in its Walker B domains (CLPC1-TRAP), followed by affinity purification and mass spectrometry. To create a larger pool of candidate substrates, adaptors, or regulators, we carried out a far more sensitive and comprehensive in vivo protein trapping analysis. We identified 59 highly enriched CLPC1 protein interactors. These include, in particular, proteins belonging to families of unknown function (DUF760, DUF179, DUF3143, UVR-DUF151, HugZ/DUF2470), as well as the UVR domain proteins EXE1 and EXE2, which are implicated in singlet oxygen damage and signaling. Phylogenetic and functional domain analyses identified other members of these families that appear to localize (nearly) exclusively to plastids. In addition, the Arabidopsis PeptideAtlas (http://www.peptideatlas.org/builds/arabidopsis/) shows that several of these DUF proteins are of very low abundance, so enrichment in the CLPC1-TRAP was extremely selective. Evolutionary rate covariation indicated that the HugZ/DUF2470 family coevolved with the plastid CLP machinery, suggesting functional and/or physical interactions. Finally, mRNA-based coexpression networks showed that all 12 CLP protease subunits were tightly coexpressed as a single cluster with deep connections to DUF760-3. Coexpression modules for other trapped proteins suggested specific functions in biological processes. For example, UVR2 and UVR3 were associated with extraplastidic degradation, whereas DUF760-6 is likely involved in senescence. This study provides a strong foundation for discovering how the chloroplast CLP protease system selects its substrates.
Keywords: CLP serine protease, AAA+ chaperone, domain of unknown function (DUF) proteins, adaptors, proteolysis, proteostasis, substrate trapping, chloroplast, Arabidopsis thaliana
Abbreviations: ARM, armadillo repeat; DUF, domains of unknown function; ERC, Evolutionary Rate Covariation; GBP, glutamyl-tRNA reductase binding protein; GluTR, glutamyl-tRNA reductase; LS, logit score; MS/MS, tandem mass spectrometry; PPDB, Plant Proteome Database; SPC, spectral count; SPP, stromal processing peptidase
Plastids undergo developmental transitions from nonphotosynthetic plastids in roots to photosynthetic chloroplasts in green tissues, and they are able to adapt to (a)biotic conditions (1). Each plastid type must contain a specific proteome. This proteome is maintained through the coordinated actions of the proteostasis network, which involves transcription, translation, protein folding, and degradation machineries. The remodeling and stability of these proteomes during plastid differentiation and adaptation occur through selective protein synthesis and proteolysis. Understanding proteolytic hierarchies and degrons is therefore essential for understanding plastid differentiation, adaptation, and function (2, 3, 4). The most abundant and complex protease system in the chloroplast is the soluble CLP system located in the stroma. Forward and/or reverse genetics in Arabidopsis, maize, rice, and tobacco have demonstrated the essential nature of the plastid CLP system. Complete loss of the CLPC chaperone or CLPPR protease capacity results in embryo lethality. Partial loss results in delayed growth and development and in virescent leaves (5, 6, 7).
The plastid CLP system in Arabidopsis consists of a hetero-oligomeric protease core and several associated proteins. The core comprises one or more copies of five proteolytically active subunits (CLPP1 and CLPP3-6) and four proteolytically inactive proteins (CLPR1-4). The system also includes two plant-specific accessory proteins (CLPT1,2), three AAA+ chaperones (CLPC1, CLPC2, CLPD), and two adaptors, CLPS1 and CLPF. Plastids do not contain any CLPX homologs; these are instead present in mitochondria along with CLPP2 (5). A recent study examined a broad sampling of angiosperms. It found a tight correlation between amino acid substitution rates in the plastid-encoded CLPP1 and in the nuclear-encoded CLP subunits, suggesting continuing selection on interactions within this complex (8).
CLP-dependent proteolysis is an ATP-dependent, multistep, regulated process that involves the CLP protease core and CLP chaperones assembled into hexamers. The CLPC1,2 and CLPD chaperones have two ATPase domains and an IGF motif that is essential for binding to the CLP protease core complex (7). The CLPC chaperones accumulate as dimers when not engaged in the degradation cycle. Formation of the active hexamer in the ATP-bound state requires priming of the chaperone by adaptors and/or ATP (5, 9, 10). Substrates are recognized directly by the CLP chaperone(s), are actively recruited by so-called adaptor proteins or recognins, or are even delivered by other chaperones. Once the substrate interacts with the CLP chaperone, ATP-dependent substrate unfolding starts, and the CLP protease core complex is recruited to the substrate–chaperone assembly. ATP binding and hydrolysis are required for substrate unfolding. In contrast, the actual proteolytic cleavage by the catalytic CLP protease core does not require ATP. Small substrate fragments (∼6–9 aa) are released from the CLP protease core through dynamic lateral pores. Once substrate degradation is complete, the CLP chaperone–protease complex disassembles (11).
Recently, we took an in vivo CLP trapping approach in Arabidopsis that identified potential substrates and/or regulators interacting with Arabidopsis chloroplast CLPC1 (11). This approach followed strategies successfully used for substrate trapping with other AAA+ proteins in bacterial systems, reviewed in (5, 12). We generated the in vivo trap by expressing CLPC1 with a C-terminal STREPII affinity tag for purification. In this construct, two critical glutamate residues in the two Walker B domains, which are required for ATP hydrolysis, were mutated (11). Affinity purification of the CLPC1-TRAP was followed by tandem mass spectrometry (MS/MS) analysis. Compared with affinity-purified CLPC1 carrying the same C-terminal STREPII tag, about a dozen proteins were highly enriched in the trap. These enriched proteins likely represent CLP protease substrates and/or new adaptors. Several of these trapped proteins overaccumulated in CLP mutants and/or were found as interactors of the adaptor CLPS1, supporting their functional relationship to CLP. The complete plastid CLP protease core complex was strongly enriched in the CLPC1-TRAP eluates. This provided the first robust support for physical and functional interactions between CLPC and the CLP core (11). This was the first in vivo trapping experiment with CLPC1, and it provided proof of principle for chloroplast CLPC1 trapping. However, that study used a limited number of replicates, and the affinity-purified CLPC1 traps were analyzed on an older-generation Orbitrap mass spectrometer. A far more comprehensive in vivo trapping study should yield a more robust dataset and potentially many additional candidate substrates, adaptors, or other regulators. Such a dataset would also help us make more informed choices about which protein interactors to pursue experimentally.
To obtain a more in-depth analysis of CLPC1-trapped proteins, we used the same genetic material as in (11). However, we carried out affinity purification and MS/MS analysis with larger amounts of leaf starting material, more biological and technical replicates, and a far more sensitive and faster mass spectrometer. We also included an additional negative control line expressing an unrelated STREPII-tagged protease. As described in this study, this greatly expanded the depth of analysis (many more proteins, better sequence coverage). It also allowed us to apply more robust protein quantification and enrichment analyses. The trapped proteins included known plastid-localized proteins involved in various metabolic pathways. They also included a set of proteins with different types of Domains of Unknown Function (DUF), as well as other uncharacterized proteins with UVR, Armadillo, or HugZ domains. Strikingly, public proteome resources (e.g., PPDB, PeptideAtlas, SUBA) show that several of these proteins are of very low abundance, yet they were extremely enriched by the trapping approach. These proteins of unknown function could simply be substrates. However, they should also be considered candidates for a regulatory role in CLP proteolysis. For example, they could modulate CLPC chaperone or CLPPR protease activity, act as adaptors, coadaptors, or antiadaptors in substrate selection, or support the priming and oligomerization of the CLPC chaperones. In such cases, these proteins could have evolved with the CLP system. We therefore searched for signals of coevolution, at the amino acid level, between these interactors and the different components of the CLP system.
This study provides a comprehensive analysis of these DUF, UVR, and HugZ proteins and their homologs based on three approaches. These are (i) phylogenetic and Evolutionary Rate Covariation (ERC) analyses and (ii) an analysis of protein abundance across plant parts, protein sequence coverage by experimental peptides, and possible posttranslational modifications. For (ii), we used our recently launched Arabidopsis PeptideAtlas build #1 (http://www.peptideatlas.org/builds/arabidopsis/). The third approach is (iii) mRNA-based coexpression networks using information from ATTEDII (https://atted.jp/). We use the coexpression and ERC analyses to infer possible functional and/or physical relationships between the CLP machinery and these enriched proteins and their homologs.
Results and discussion
We screened for additional chloroplast CLPC1 chaperone interactors, including potential substrates, adaptors, and antiadaptors. We also aimed to improve their protein sequence coverage and the potential discovery of degrons. To this end, we carried out a comprehensive in vivo protein interaction screen with chloroplast CLPC1-WT and CLPC1-TRAP proteins expressed in wild-type Arabidopsis. Both transgenes are driven by a constitutive promoter, and each has a C-terminal STREPII tag that allows efficient affinity enrichment (11). Previously, transformation of the null clpc1-1 line with the CLPC1-STREP transgene fully complemented both the virescent and the reduced-biomass phenotypes of clpc1-1 (11). The two transgenes differ only in the critical glutamate residues in the two Walker B domains of CLPC1, which are required for ATP hydrolysis. In CLPC1-TRAP-STREPII these residues are changed to alanines (E374A and E718A), whereas CLPC1-STREPII is unmodified. The transgenic plants were grown on soil, and rosettes were harvested in three batches per genotype before bolting; these batches serve as biological replicates. Fig. S1 shows images of the plants just before harvest. The heterozygous CLPC1-TRAP-STREPII lines have reduced biomass. Their rosette leaves range from virescent in young leaves to wild-type-like green in mature, fully developed leaves (Fig. S1). The phenotype of the heterozygous CLPC1-TRAP line is less severe than that of the clpc1-1 null mutant (11). The soluble leaf proteomes were isolated under nondenaturing conditions and subjected to streptactin affinity purification. Affinity eluates were separated by SDS-PAGE, the gels were stained with Coomassie blue, and the proteins were digested in-gel with trypsin. Three biological replicates were analyzed. The resulting peptides for each biological replicate were extracted and analyzed by LC-MS/MS in triplicate runs that differed in acquisition parameters (technical replicates).
Proteins were identified and quantified based on the number of matched MS/MS spectra. We used a well-established bioinformatics "pipeline" around the search engine Mascot (11) (see Experimental procedures). Identified proteins were annotated for function and subcellular location using updated information from the Plant Proteome Database (PPDB). The CLPC affinity experiments identified 1643 proteins. Of these, 575 were assigned to the plastid based on experimental support described in the literature (see PPDB) (Table S1A). Figure 2A summarizes the proteomics workflow. The scatter plot in Figure 1A shows the number of spectral counts in CLPC1-WT and CLPC1-TRAP for all 1643 proteins; the 575 proteins annotated as plastid proteins are marked in blue. These plastid proteins represented ∼72% of the protein biomass based on both adjusted spectral counts (adjSPC) and normalized adjSPC (NadjSPC). We previously carried out a similar in vivo protein interaction analysis for transgenic plants expressing two different STREPII-tagged versions of the unrelated chloroplast glutamyl peptidase CGEP (13). As described (13), this analysis did not identify any strong candidate interactors of CGEP. This dataset therefore serves as an excellent negative control for abundant proteins and for nonspecific binding to the streptactin affinity columns. Proteins also identified in the CGEP-STREP affinity experiments are listed in Table S1.
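The ∼72% plastid share is a ratio of summed spectral counts. The minimal sketch below assumes that per-protein adjSPC values and a set of plastid-annotated identifiers are already available. The identifiers and counts are hypothetical toy values, not data from this study, and the same function would apply unchanged to NadjSPC values.

```python
from __future__ import annotations


def plastid_share(adj_spc: dict[str, float], plastid_ids: set[str]) -> float:
    """Fraction of total (adjusted) spectral counts contributed by plastid proteins."""
    total = sum(adj_spc.values())
    if total == 0:
        raise ValueError("no spectral counts to summarize")
    plastid = sum(count for pid, count in adj_spc.items() if pid in plastid_ids)
    return plastid / total


# Hypothetical toy counts (not data from this study)
counts = {"AT_P1": 60.0, "AT_P2": 12.0, "AT_X1": 28.0}
share = plastid_share(counts, {"AT_P1", "AT_P2"})
print(f"{share:.0%}")
```

Because spectral counts scale roughly with protein amount, this ratio is only a proxy for "protein biomass". The same caveat applies to the percentage reported in the text.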
Enrichment of the complete chloroplast CLP system
CLPC1 was by far the most abundant protein in all replicates, averaging about 46% of all matched MS/MS spectra (Fig. 1, A and B). CLPC1 was observed in equal amounts in the CLPC1-WT and CLPC1-TRAP samples, with an average ratio of 0.98. This demonstrates that CLPC1 affinity enrichment was consistent and successful. The CLPC1 interactome included all known proteins of the chloroplast CLP system, including the adaptor CLPF, with the exception of the adaptor CLPS1 (Table S1A). CLPS1 was not identified by MS/MS because it is a small protein (12 kDa) with relatively few suitable tryptic peptides (see also http://www.peptideatlas.org/builds/arabidopsis/). Immunoblotting with CLPS1-specific serum previously showed that CLPS1 was enriched to the same extent as CLPF (11). All chloroplast CLPP (P1,3,4,5,6) and CLPR (R1,2,3,4) core subunits, as well as the peripheral CLPT1,2 core proteins (1, 2), were at least 2-fold enriched in CLPC1-TRAP compared with CLPC1-WT. CLPF, CLPC2, and CLPD were 4- to 7-fold enriched (Fig. 1B). Together, these results show that blocking ATP hydrolysis in CLPC1 through the Walker B mutations stabilized the interaction between the CLP protease core and CLPC1, supporting our previous findings (11).
Enrichment analysis
We first used statistics to evaluate plastid proteins for potential enrichment in the CLPC1-TRAP or CLPC1-WT samples. We limited the analysis to plastid proteins with a total of at least 18 adjSPC across all experiments, resulting in 339 proteins. A volcano plot displays the log2 CLPC1-TRAP/CLPC1-WT ratio against the -log10 p-value based on the spectral counting data (Fig. 1C). Seventy-seven proteins differed significantly (p < 0.01) between CLPC1-TRAP and CLPC1-WT (Table S1B and Fig. 1C). Most of these (67) were enriched in the CLPC1-TRAP samples (upper right quadrant in Fig. 1C), and only 10 proteins were enriched in CLPC1-WT (upper left quadrant in Fig. 1C). Thirteen proteins were also observed in the affinity eluates of the negative control, CGEP. However, only two of these, stromal CPN21 and HDS, were at least 3-fold enriched in the CLPC1-TRAP (Fig. 1C; area marked in gray). This indicates that 3-fold enrichment is a strong criterion for specific trapping in the CLPC1-TRAP.
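The volcano plot places each protein at x = log2(TRAP/WT) and y = -log10(p). The sketch below computes these coordinates and assigns the quadrant labels used in the text. It assumes that the ratio and p-value are already available, for example from Table 1, and it does not reproduce the statistical test that produced the p-values. The function names are illustrative only.

```python
from __future__ import annotations

import math


def volcano_coords(ratio: float, p_value: float) -> tuple[float, float]:
    """Return (log2 TRAP/WT ratio, -log10 p-value) for one protein."""
    return math.log2(ratio), -math.log10(p_value)


def classify(ratio: float, p_value: float, alpha: float = 0.01) -> str:
    """Label a protein by side of the volcano plot, using the p < 0.01 cutoff."""
    if p_value >= alpha:
        return "not significant"
    return "enriched in TRAP" if ratio > 1 else "enriched in WT"


# HDS values taken from Table 1: TRAP/WT ratio 4.8, p = 0.004
x, y = volcano_coords(4.8, 0.004)
print(round(x, 2), round(y, 2))
# The 3-fold criterion corresponds to a vertical line at x = log2(3)
print(classify(4.8, 0.004), x >= math.log2(3))
```

On the log2 axis, a 3-fold cutoff is a vertical line at about 1.58. Proteins to the right of it that also pass the p-value cutoff fall in the gray-marked area described above.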
To obtain a stringent (conservative) set of proteins enriched in the CLPC1-TRAP eluates for further evaluation, we applied three criteria. First, we required at least 3-fold enrichment in CLPC1-TRAP compared with CLPC1-WT. Second, the protein had to be observed in at least two of the three biological CLPC1-TRAP replicates. Third, the protein needed at least 18 matched MS/MS spectra in the CLPC1-TRAP samples, an average of two matched MS/MS spectra across the nine [biological + technical] replicates. This resulted in a set of 69 proteins, of which 59 are plastid localized (Table 1; Fig. 1A, inset). Nearly all of the 10 proteins not assigned to the plastid have low SPC (between 26 and 51 across all experiments). The exception is Hsc70-4, with 117 SPC. Five of these 10 were observed in only two of the three biological replicates. One of them (AT2G13440) is likely plastid localized, whereas the others have diverse functions and are unlikely to be located in plastids. This shows that our experiments and bioinformatics workflow, including the enrichment criteria, mostly recover plastid proteins. Nonplastid proteins that pass the criteria have low numbers of matched spectra. Most of the plastid proteins (52/59) were observed in all three biological CLPC1-TRAP replicates. Importantly, all 59 proteins were identified with at least three independent nonredundant peptides, irrespective of charge state or posttranslational modification (Table S1). Of these 59 plastid proteins, 54 were also significant at p < 0.05, and the remaining five were significant at p < 0.1 (Table 1). Twelve of the 17 proteins identified as trapped in our previous study (11) are also part of this set of 59 enriched proteins, supporting their functional interaction with the CLP complex (Table 1). Only 2 of these 59 proteins, CPN21 and HDS (marked #1 and #2 in Fig. 1C), were also observed in the CGEP-STREP experiments. They could be nonspecific interactors of CLPC1, or they could functionally interact with both CLPC1 and CGEP (see further below).
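The three selection criteria can be expressed as a simple filter. The sketch below assumes that each protein already has its average TRAP/WT ratio, its number of TRAP biological replicates, and its summed TRAP adjSPC. The `Candidate` record and the two `hypothetical_*` entries are invented for illustration. Only the DUF760-1 values come from Table 1.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    trap_wt_ratio: float  # average CLPC1-TRAP/CLPC1-WT ratio (NadjSPC)
    trap_bioreps: int     # biological TRAP replicates with the protein (0-3)
    trap_adj_spc: float   # adjSPC summed over the nine TRAP runs


def passes_trap_filter(c: Candidate,
                       min_ratio: float = 3.0,
                       min_bioreps: int = 2,
                       min_trap_spc: float = 18.0) -> bool:
    """Apply the three stringency criteria described in the text."""
    return (c.trap_wt_ratio >= min_ratio
            and c.trap_bioreps >= min_bioreps
            and c.trap_adj_spc >= min_trap_spc)


candidates = [
    Candidate("DUF760-1", 4.4, 2, 19),         # values from Table 1
    Candidate("hypothetical_A", 2.1, 3, 150),  # fails the 3-fold cutoff
    Candidate("hypothetical_B", 5.9, 3, 12),   # fails the 18-spectra cutoff
]
kept = [c.name for c in candidates if passes_trap_filter(c)]
print(kept)
```

A proteome-wide version would also check plastid annotation and the three-peptide requirement. Those checks are left out here because the text reports them as properties of the resulting set, not as explicit filter steps.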
Table 1.
Protein identifier | Rank in CLPC1-TRAP, Montandon et al. (2019) (b) | Protein annotation | Function | Observed in # bio reps of TRAP (out of 3) | Total AdjSPC (all 18 exp.) | AdjSPC WT (all) | AdjSPC TRAP (all) | Average TRAP/WT (NadjSPC) (c) | p-value (d) |
---|---|---|---|---|---|---|---|---|---|
ATCG00190.1 |  | rpoB RNA polymerase (PEP) beta | DNA–RNA | 3 | 35 | 3 | 32 | 12.6 | 0.01500 |
ATCG00180.1 |  | rpoC1 RNA polymerase (PEP) beta′ | DNA–RNA | 3 | 41 | 3 | 38 | 16.4 | 0.00820 |
AT5G46580.1 |  | pentatricopeptide repeat (PPR) protein SOT1 | DNA–RNA | 3 | 82 | 15 | 67 | 4.0 | 0.02370 |
AT5G26742.1 |  | DEAD box RNA helicase (RH3) (EMB1138; globular stage) | DNA–RNA | 3 | 244 | 21 | 223 | 12.6 | 0.00010 |
AT4G36390.1 |  | tRNA/rRNA methyltransferase | DNA–RNA | 3 | 29 | 2 | 27 | 14.9 | 0.01470 |
AT4G31210.1 |  | DNA-directed topoisomerase, dually targeted to mitochondria and plastids | DNA–RNA | 3 | 30 | 4 | 26 | 9.2 | 0.02230 |
AT4G09730.1 |  | DEAD box RNA helicase RH39 (nara12); 23S rRNA processing | DNA–RNA | 3 | 27 | 1 | 26 | 26.4 | 0.01710 |
AT3G48500.1 |  | nucleoid protein (pTAC10) | DNA–RNA | 3 | 54 | 6 | 48 | 10.8 | 0.00770 |
AT3G10270.1 |  | DNA gyrase B1, dually targeted to plastids and mitochondria | DNA–RNA | 3 | 24 | 4 | 20 | 6.7 | 0.04090 |
AT3G04260.1 |  | SAP domain–containing protein (pTAC3) | DNA–RNA | 3 | 58 | 5 | 53 | 9.9 | 0.00910 |
AT3G02060.1 |  | DEAD/DEAH box helicase | DNA–RNA | 3 | 18 | 0 | 18 | 13.9 | 0.02500 |
AT2G39670.1 |  | tRNA/rRNA methyltransferase | DNA–RNA | 2 | 21 | 0 | 21 | 16.5 | 0.01910 |
AT1G74850.1 |  | pentatricopeptide repeat (PPR) protein (pTAC2) | DNA–RNA | 3 | 25 | 0 | 25 | 19.7 | 0.01400 |
AT1G30680.1 |  | DNA primase-helicase (dually targeted to chloroplasts and mitochondria) | DNA–RNA | 3 | 37 | 2 | 35 | 17.1 | 0.01060 |
AT1G02150.1 |  | pentatricopeptide repeat (PPR) protein (6 or 7 repeats); coexpresses with RNase E/G AT2G04270 | DNA–RNA | 3 | 123 | 26 | 97 | 3.7 | 0.01700 |
AT5G67030.1 |  | zeaxanthin epoxidase (ZEP) | metabolism | 3 | 62 | 13 | 49 | 3.2 | 0.04950 |
AT5G64840.1 |  | ABC transporter family protein (ATGCN5) | metabolism | 3 | 49 | 2 | 47 | 15.6 | 0.00750 |
AT5G60600.1 |  | 4-hydroxy-3-methylbut-2-enyl diphosphate synthase (HDS) | metabolism | 3 | 199 | 34 | 165 | 4.8 | 0.00400 |
AT5G52920.1 |  | pyruvate kinase-2 (typically homotetramer) | metabolism | 3 | 31 | 5 | 26 | 5.7 | 0.04290 |
AT5G45930.1 |  | Mg-protoporphyrin IX chelatase (CHLI-2) | metabolism | 3 | 24 | 3 | 21 | 8.3 | 0.03680 |
AT5G13110.1 |  | glucose-6-phosphate dehydrogenase 2 (G6PD2) | metabolism | 3 | 26 | 3 | 22 | 7.2 | 0.04170 |
AT4G30720.1 |  | pigment defective 327 (PDE327), oxidoreductase | metabolism | 3 | 43 | 9 | 34 | 4.1 | 0.05010 |
AT4G22240.1 |  | fibrillin 1b (FBN1b) | metabolism | 3 | 32 | 0 | 32 | 25.1 | 0.00790 |
AT4G21990.1 | 8 | 5′-adenylylsulfate reductase-3 (APR3) | metabolism | 3 | 81 | 4 | 77 | 24.7 | 0.00120 |
AT4G15560.1 |  | 1-deoxy-D-xylulose 5-phosphate synthase (DXS1) | metabolism | 3 | 66 | 8 | 58 | 6.4 | 0.01250 |
AT4G11570.1 | 6 | ARPP phosphatase cpFHy2 or PYRP2 (high in clpc1, clps1) | metabolism | 3 | 84 | 0 | 84 | 68.9 | 0.00040 |
AT4G04610.1 | 8 | 5′-adenylylsulfate reductase-1 (APR1) | metabolism | 3 | 218 | 24 | 194 | 8.5 | 0.00040 |
AT4G04020.1 |  | fibrillin 1a (FBN1a) | metabolism | 3 | 74 | 14 | 60 | 3.7 | 0.02860 |
AT3G44720.1 |  | arogenate dehydratase 4 (ADT4) | metabolism | 3 | 34 | 1 | 33 | 50.3 | 0.00730 |
AT3G21200.1 |  | GluTR binding protein (GBP or PGR7) | metabolism | 3 | 24 | 3 | 21 | 5.2 | 0.06020 |
AT3G10970.1 | 7 | PYRP2-related haloacid dehalogenase (HAD) hydrolase | metabolism | 2 | 54 | 0 | 54 | 43.1 | 0.00200 |
AT3G07630.1 |  | arogenate dehydratase 2 (ADT2) | metabolism | 3 | 25 | 4 | 21 | 4.3 | 0.07420 |
AT2G44530.1 |  | ribose-phosphate pyrophosphokinase | metabolism | 3 | 38 | 5 | 33 | 4.8 | 0.04410 |
AT2G35390.1 |  | ribose-phosphate pyrophosphokinase 1/phosphoribosyl diphosphate synthetase 1 (PRSI) | metabolism | 3 | 28 | 3 | 25 | 8.5 | 0.03000 |
AT2G29630.1 |  | thiamine biosynthesis (thiC family) | metabolism | 3 | 40 | 5 | 35 | 5.7 | 0.02780 |
AT1G62180.1 | 8 | 5′-adenylylsulfate reductase-2 (APR2) | metabolism | 3 | 270 | 26 | 245 | 10.2 | 0.00010 |
AT1G36180.1 |  | acetyl-CoA carboxylase (ACC2) | metabolism | 3 | 147 | 0 | 147 | 137.4 | 0.00000 |
AT5G51110.1 |  | Rubisco assembly factor 2 (RAF2) | proteostasis | 2 | 33 | 8 | 25 | 3.4 | 0.09030 |
AT5G51070.1 |  | CLPD | proteostasis | 3 | 575 | 111 | 464 | 3.8 | 0.00040 |
AT5G42390.1 |  | stromal processing peptidase (SPP) | proteostasis | 2 | 63 | 10 | 53 | 5.6 | 0.02050 |
AT5G20720.1 |  | CPN20 | proteostasis | 3 | 89 | 9 | 80 | 10.3 | 0.00230 |
AT4G25370.1 |  | CLPT1 | proteostasis | 3 | 24 | 0 | 24 | 18.3 | 0.01590 |
AT4G12060.1 |  | CLPT2 | proteostasis | 3 | 38 | 6 | 32 | 5.4 | 0.03320 |
AT3G48870.1 |  | CLPC2 | proteostasis | 3 | 1527 | 170 | 1356 | 6.3 | 0.00000 |
AT2G44650.1 | 10 | CPN10-1 | proteostasis | 3 | 19 | 0 | 19 | 17.3 | 0.01810 |
AT2G03390.1 |  | CLPF (adaptor) | proteostasis | 3 | 76 | 10 | 66 | 5.6 | 0.01250 |
AT1G35340.1 |  | LON-domain protein 2 (LON-like2) | proteostasis | 3 | 25 | 3 | 22 | 7.9 | 0.03690 |
AT5G66050.1 |  | UVR4 (DUF151 and UVR domain) | unknown | 3 | 60 | 0 | 60 | 46.7 | 0.00150 |
AT5G24060.1 | 16 | HugZ-1 | unknown | 3 | 98 | 0 | 98 | 83.5 | 0.00020 |
AT4G33630.1 |  | Executer 1 (EXE1) | unknown | 3 | 194 | 0 | 194 | 178.5 | 0.00000 |
AT3G29240.1 | 3 | DUF179-3 | unknown | 3 | 427 | 73 | 354 | 4.6 | 0.00050 |
AT3G17800.1 | 2 | DUF760-5 | unknown | 3 | 180 | 1 | 179 | 142.2 | 0.00000 |
AT2G14910.1 | 5 | DUF760-4 | unknown | 3 | 103 | 2 | 101 | 43.0 | 0.00040 |
AT1G75380.1 |  | UVR2 (DUF151 and UVR domain) | unknown | 2 | 34 | 1 | 33 | 28.4 | 0.01070 |
AT1G48450.1 | 1 | DUF760-2 | unknown | 3 | 600 | 16 | 584 | 32.1 | 0.00000 |
AT1G32160.1 |  | DUF760-1 | unknown | 2 | 23 | 4 | 19 | 4.4 | 0.08430 |
AT1G27510.1 | 9 | Executer 2 (EXE2) | unknown | 3 | 233 | 0 | 233 | 215.7 | 0.00000 |
AT1G23180.1 |  | armadillo repeat protein (ARM) | unknown | 3 | 78 | 4 | 74 | 16.6 | 0.00220 |
AT1G19660.1 |  | UVR3 (DUF151 and UVR domain) | unknown | 2 | 20 | 0 | 20 | 16.1 | 0.01990 |
All proteins show at least a 3-fold CLPC1-TRAP/CLPC1-WT ratio based on NadjSPC, have at least three independent peptides (different aa sequences), and are localized to the plastid.
(b) Montandon et al., 2019, JPR, Table 2: proteins enriched in CLPC1-TRAP, ranked 1–17 (1 is most enriched).
(c) TRAP/WT ratio of NadjSPC; a value of 1 × 10⁻⁵ was substituted for zero (this occurred only for WT).
(d) p-value (normalized to CLPC1), based on the GLEE p-value of NadjSPC.
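Footnote (c) describes how a ratio is still reported when a protein has no WT spectra. The minimal sketch below assumes that NadjSPC is a protein's adjSPC divided by the total adjSPC of its run, and that the ratio is taken between replicate means. Both are assumptions for illustration: the exact normalization used here is described only as "normalized to CLPC1", and the functions are hypothetical. The GLEE p-value is not reproduced.

```python
from __future__ import annotations

from statistics import mean

PSEUDO_WT = 1e-5  # value substituted for a zero WT NadjSPC (footnote c)


def nadj_spc(adj_spc: float, run_total: float) -> float:
    """Hypothetical normalization: adjSPC as a fraction of the run's total adjSPC."""
    return adj_spc / run_total


def trap_wt_ratio(trap: list[float], wt: list[float]) -> float:
    """Ratio of mean TRAP to mean WT NadjSPC, with a floor for absent WT signal."""
    wt_mean = mean(wt)
    if wt_mean == 0:
        wt_mean = PSEUDO_WT
    return mean(trap) / wt_mean


# Toy NadjSPC values per replicate (hypothetical, not from Table 1)
print(trap_wt_ratio([0.002, 0.004], [0.001, 0.001]))
print(trap_wt_ratio([1e-3], [0.0]))
```

A fixed pseudovalue keeps the ratio finite for proteins never seen in WT. However, those ratios then depend on the arbitrary 1 × 10⁻⁵ value, so very large ratios (e.g., for proteins with 0 WT spectra in Table 1) are best read as "TRAP-only" rather than as precise fold changes.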
Figure 1D shows, for the 59 plastid proteins, how relative abundance in the CLPC1-WT and CLPC1-TRAP eluates relates to relative enrichment in the CLPC1-TRAP. For example, DUF760-2 has a high relative abundance in the CLPC1-TRAP sample and is 32-fold enriched compared with CLPC1-WT. In contrast, EXE1, EXE2, and DUF760-5 are >140-fold enriched and were identified with ∼180–230 matched MS/MS spectra each.
Evaluation of CLPC1-TRAP enriched proteins
The enriched proteins in Table 1 can be assigned to four functional groups: (i) 15 proteins involved in DNA or RNA metabolism and (ii) 22 proteins directly or indirectly involved in chloroplast metabolism. They also include (iii) 10 proteins involved in proteostasis and (iv) 12 proteins with specific domains (DUF760, DUF179, DUF151, UVR, HugZ, and ARM) but mostly unknown functions. The proteostasis group includes chaperones (CPN20, also called CPN21, and CPN10-1) and subunits of protease systems (CLPT1, CLPT2, CLPF, CLPC2, CLPD, SPP, and LON-like2). In the next sections, we first briefly summarize the proteins in each of these four categories. We then present an extensive analysis of the DUF, UVR, HugZ, and ARM proteins, including phylogeny and mRNA-based coexpression. This analysis also includes protein identification across hundreds of experiments using the recent release of the Arabidopsis PeptideAtlas (14). It is summarized in Figure 2B.
Enriched proteins involved in DNA and RNA metabolism
Most of the 15 proteins involved in DNA or RNA metabolism were previously found to be enriched in Arabidopsis chloroplast nucleoids (15), and their homologs in maize were also nucleoid enriched (16). These 15 proteins include two subunits of the plastid-encoded RNA polymerase (PEP) complex and several PPR proteins, including pTAC2 (17, 18) and SOT1 (19, 20, 21). They also include three DEAD box RNA helicases, two of which are involved in splicing and rRNA processing (RH3 (22, 23) and RH39 (24), respectively). Two putative tRNA/rRNA methyltransferases that have not been described previously are also among them. The proteins involved with chloroplast DNA are a DNA topoisomerase, DNA gyrase B1 (25, 26), a DNA primase/helicase (27, 28), pTAC3 (29), and pTAC10 (30). None of these proteins were observed in the CGEP-STREP affinity purification (the negative control). Their enrichment in the CLPC1-TRAP ranged from 3.7- to 26.4-fold, with 18 to 223 matched MS/MS spectra per protein in the CLPC1-TRAP (Table 1). Their enrichment suggests that these proteins are degraded by the CLP system. This may be because most of the leaves (rosettes) were fully developed. Plastid gene expression and translation are expected to be reduced in fully developed leaves, which therefore likely have a lower demand for these DNA and RNA metabolism proteins. The data do not tell us whether the CLPC1 chaperone interacts directly with these proteins while they are attached to the nucleoid or while they are otherwise located in the stroma.
Enriched proteins involved in metabolism
Interestingly, none of the trapped metabolic proteins function in (high-abundance) primary carbon metabolism (e.g., the Calvin–Benson cycle or starch metabolism). Instead, they function in seven other metabolic pathways:

- fatty acid metabolism (ACC2 and pyruvate kinase)
- phenylalanine synthesis (arogenate dehydratases 2 and 4 (ADT2,4))
- sulfur metabolism (5′-adenylylsulfate reductases 1, 2, and 3 (APR1,2,3))
- the methylerythritol phosphate pathway (DXS1 and HDS)
- the thiamin pathway (THIC (31, 32), the ARPP phosphatase PYRP2 (33), and a PYRP2 homolog)
- tetrapyrrole synthesis (GluTR binding protein GBP (34, 35) and Mg-protoporphyrin IX chelatase CHLI2 (36, 37))
- nucleotide metabolism (ribose-phosphate pyrophosphokinases)

The APR family proteins, as well as PYRP2 and its homolog, were also observed in our prior, smaller-scale CLPC1-TRAP analysis (11). GBP interacts with glutamyl-tRNA reductase (GluTR), the controlling enzyme in the synthesis of heme and chlorophyll. Binding of heme to GBP inhibits the interaction of GBP with the N-terminal regulatory domain of GluTR1. This makes GluTR1 accessible for recognition and degradation by the CLP protease system (34). Indeed, CLPS1, CLPC1, CLPF, and GBP all interact with the N terminus of GluTR (34, 38). Loss-of-function mutants of CLPR2 and CLPC1 showed increased GluTR stability, whereas absence of GBP results in decreased GluTR stability (35). Finally, fibrillins 1A and 1B were highly enriched in the CLPC1-TRAP. These fibrillins mostly function as components of plastoglobules and respond to a wide range of abiotic stress conditions, but their molecular function is not known (39). The enriched metabolic proteins described above are candidate substrates for degradation by the CLPPR protease. They are less likely to function as CLP substrate adaptors or regulators.
Enriched proteins involved in chloroplast proteostasis
All known chloroplast CLP core subunits were enriched at least 2-fold in the CLPC1-TRAP. This is most likely due to stabilization of the interaction between the CLPC hexamer and the CLPPRT core complex (11). Stromal processing peptidase (SPP), which cleaves all chloroplast transit peptides (40, 41), was 5-fold enriched in the CLPC1-TRAP. SPP levels were consistently several-fold higher in various loss-of-function CLP mutants (42, 43). This suggests either that SPP is upregulated in response to proteostasis stress or that SPP is stabilized when CLP capacity is reduced. LON-domain protein 2 (LON-like2) was 7-fold enriched. LON-like2 belongs to a small family with LON-like1 (AT1G19740), LON-like3 (AT1G75460), and LON-like4 (AT2G25740). LON proteases are found in plant organelles (LON1–4 in Arabidopsis). They have an N-terminal LON domain, an AAA+ domain, and a C-terminal catalytic protease domain (44). However, the LON-like family members (also named the iLON family) have only an N-terminal LON domain and are unlikely to have proteolytic activity themselves (2). Recently, LON-like1 was suggested to repress the activity of chloroplast thioredoxin y2, but the molecular mechanism is unknown (45). In addition to LON-like2, we detected LON-like1 and LON-like3 in the CLPC eluates. LON-like1 was identified with 15 matched MS/MS spectra and a CLPC1-TRAP/CLPC1-WT ratio of 5.9. LON-like3 was identified with 194 MS/MS spectra at a 2.1-fold abundance ratio (Table S1). Neither of these LON-like proteins passed our thresholds for Table 1, but they do appear to be trapped by CLPC1. They may be CLP substrates, or they may be involved in regulating aspects of CLP substrate selection and degradation. The Rubisco assembly factor 2 (RAF2) (46), identified with 33 MS/MS spectra, was 3.4-fold enriched in the CLPC1-TRAP, but the significance of this enrichment was relatively low (p = 0.09).
Finally, both cochaperonins CPN20 and CPN10-1 (46, 47) were >10-fold enriched in the CLPC1-TRAP (Table 1). Their enrichment could reflect their involvement in substrate unfolding and/or delivery, but also their degradation. We previously observed and highlighted a strong enrichment of CPN20 in the protein interactome analysis of CLPT1,2 (43). Interestingly, a recent cryo-EM structure of the affinity-purified chloroplast CLPPR protease complex from the green alga Chlamydomonas reinhardtii showed a heterotetramer of CPN11, CPN20, and CPN23. This heterotetramer associated with one of the axial sides of the CLP core complex to form a stable 550-kDa complex (48). The authors suggested that this cochaperone complex could help coordinate protein folding and degradation in the Chlamydomonas chloroplast.
Enriched proteins with unknown function, their domains, and phylogeny
The enrichment analysis also identified 12 proteins with unknown function (Table 1). These proteins carry one of several domain types: a Domain of Unknown Function (DUF) 179, a DUF760, or a UVR domain. The UVR domain occurs either together with a DUF151 domain (UVR2, UVR3, UVR4) or without one (EXE1, EXE2, UVR1). The remaining proteins have a heme oxygenase HugZ-like domain or several armadillo repeat (ARM) domains (Table 1). Six of these proteins were significantly enriched in our previous CLPC1-TRAP study (11) (Table 1). EXE1 and EXE2 are involved in the chloroplast singlet oxygen stress response (49, 50, 51). None of the other proteins have been studied previously. None of these proteins or their homologs have known or predicted functions as metabolic enzymes. They are therefore candidate regulatory proteins in CLP proteolysis, for example as CLP protease adaptors or antiadaptors. In the remainder of this study, we focus on this interesting set of CLPC1 interactors (as also summarized in Fig. 2B).
The enrichment analysis identified one protein with a DUF179 (AT3G29240), assigned DUF179-3 (Table 1). However, inspection of the original proteome dataset (Table S1) revealed one additional DUF179 protein (AT1G33780, DUF179-1), identified with 217 matched MS/MS spectra and 1.3-fold enriched in the CLPC1-TRAP. Homology searches of the Arabidopsis genome identified one further member of this family (AT3G19780, DUF179-2) (Table 2).
Table 2.
| Protein id | Abbreviated name | This study (Table 1) or prior study (a, b) | Curated location (PPDB) | Predicted location (c) | Total AdjSPC, this study (d) | Average CLPC1-TRAP/CLPC1-WT (e) | Coevolution (ERC; Fig. 6) | mRNA coexpression (Figs. 7 and 8) | PeptideAtlas experiments, n (Fig. 9) | Conclusion on protein abundance and CLPC1 interaction and trapping |
|---|---|---|---|---|---|---|---|---|---|---|
| AT1G23180.1 | ARM | Table 1 | plastid | C | 78 | 16.6 | Coevolution of ARM with EXE2 and with the CLP core and CLPC1/2 | In module enriched for plastid proteostasis | 108 | Abundant protein, enriched in trap |
| AT1G33780.1 | DUF179-1 | | plastid stroma | C | 217 | 1.3 | Coevolution with UVR2/3 and DUF760-1 | Some connectivity | 90 | Abundant CLPC1 interactor, independent of trapping |
| AT3G19780.1 | DUF179-2 | | unknown | S | 0 | nd | Coevolution of DUF760-3, DUF760-7, and DUF179-2 | Poor connectivity | 38 | Moderately abundant, but not a CLPC1 interactor; perhaps not located in the plastid |
| AT3G29240.1 | DUF179-3 | Table 1 (b3) | plastid stroma | C | 427 | 4.6 | Coevolution with DUF179-2 | Module enriched for UBI/ATG degradation | 35 | Moderately abundant interactor, enriched in trap |
| AT5G52960.1 | DUF3143 | a, b13 | plastid stroma | C | 19 | 2.4 | | DUF760-2, DUF760-3, and DUF3143 showed many connections to the tight CLPPRT cluster | 74 | Abundant, but not a strong CLPC1 interactor |
| AT1G32160.1 | DUF760-1 (clade 1) | Table 1 | plastid | C | 23 | 4.4 | Coevolution with DUF179-1 and CLPT1/2 | Small module of UVR1, DUF760-1, DUF760-5, DUF760-7, DUF760-8; direct edges between DUF760-1, DUF760-7, DUF760-8, and HugZ-1 | 115 | Abundant, but not a strong CLPC1 interactor; enriched in trap |
| AT1G48450.1 | DUF760-2 (clade 1) | Table 1 (b1) | plastid | C | 600 | 32.1 | | DUF760-2, DUF760-3, and DUF3143 showed many connections to the tight CLPPRT cluster | 18 | Low abundance, highly enriched in trap |
| AT1G63610.1 | DUF760-3 (clade 2) | b4 | plastid | C | 821 | 2.1 | Coevolution of DUF760-3, DUF760-7, and DUF179-2 | DUF760-2, DUF760-3, and DUF3143 showed many connections to the tight CLPPRT cluster; direct edge with HugZ-3 | 105 | Abundant CLPC1 interactor, not strongly dependent on trapping |
| AT2G14910.1 | DUF760-4 (clade 2) | Table 1 (b5) | plastid | C | 103 | 43.0 | | Some connectivity | 12 | Low abundance, highly enriched in trap |
| AT3G07310.1 | DUF760-5 (clade 3) | | unknown | M | 6 | 4.5 | | Module of UVR1, DUF760-1, DUF760-5; direct edges with UVR1, UVR3, and DUF760-8 | 3 | Very low abundance, enriched in trap |
| AT3G17800.1 | DUF760-6 (clade 1) | Table 1 (b2) | plastid | C | 180 | 142.2 | | Small module of CLPD and DUF760-6 | 36 | Moderately abundant interactor, highly enriched in trap |
| AT5G14970.1 | DUF760-7 (clade 2) | | unknown | C | 13 | 10.9 | Coevolution with DUF760-3, DUF179-2, and CLPD | Small module of UVR1, DUF760-1, DUF760-5, DUF760-7, DUF760-8; direct edges between DUF760-1, DUF760-7, DUF760-8, and UVR1 | 2 | Very low abundance, enriched in trap |
| AT5G48590.1 | DUF760-8 (clade 3) | | unknown | C | 0 | nd | | Small module of UVR1, DUF760-1, DUF760-5, DUF760-7, DUF760-8; direct edges between DUF760-1, DUF760-7, DUF760-8, and HugZ-1 | 0 | Protein not detected; pseudogene? |
| AT4G33630.1 | EXE1 | Table 1 | thylakoid | C | 194 | 178.5 | | Some connectivity | 84 | Abundant interactor, highly enriched in trap |
| AT1G27510.1 | EXE2 | Table 1 (b9) | thylakoid | C | 233 | 215.7 | Coevolution with ARM | Direct edges with DUF760-2 and CLPX2 | 74 | Abundant interactor, highly enriched in trap |
| AT5G24060.1 | HugZ-1 | Table 1 (b16) | plastid | C | 97.7 | 83.5 | Coevolution with CLPP3,4,5 and EXE1 | Direct edges between DUF760-1, DUF760-7, DUF760-8, and HugZ-1; direct edges from HugZ-1 to CLPP3 and CLPP5 | 65 | Abundant CLPC1 interactor, highly enriched in trap |
| AT3G49140.1 | HugZ-2 | | plastid | C | 11.3 | 10.9 | Coevolution with CLPP3,4,5 and EXE1 | Direct edge with DUF760-3 | 108 | Abundant, but not a strong CLPC1 interactor; enriched in trap |
| AT3G59300.1 | HugZ-3 | | plastid | C | 9 | 9.4 | Coevolution with CLPF, CLPP1, P3, R2, R4, T1/2, CLPS1, DUF760-3, and ARM | Poor connectivity | 6 | Very low abundance, enriched in trap |
| AT3G09250.1 | UVR1 | | plastid stroma | C | 0 | nd | | Small module of UVR1, DUF760-1, DUF760-5, DUF760-7, DUF760-8; direct edges from UVR1 to DUF760-4,7 and CLPP3,5 | 59 | Abundant, but not a CLPC1 interactor |
| AT1G75380.1 | UVR2 | Table 1 | plastid | C | 34 | 28.4 | Coevolution with DUF179-1 | UBI/ATG degradation module of UVR2, UVR3, and DUF760-5; direct edge between UVR2 and UVR3 | 50 | Moderately abundant interactor, highly enriched in trap |
| AT1G19660.1 | UVR3 | Table 1 | plastid | C | 20 | 16.1 | Coevolution with DUF179-1 | UBI/ATG degradation module of UVR2, UVR3, and DUF760-5; direct edge between UVR2 and UVR3 | 45 | Moderately abundant interactor, enriched in trap |
| AT5G66050.1 | UVR4 | Table 1 | plastid | C | 60 | 46.7 | | Poor connectivity | 8 | Low abundance, highly enriched in trap |

a CLPS1 interactor, Nishimura et al. (2013).
b Trapped in CLPC1, Montandon et al. (2019); the number after b indicates abundance rank.
c Subcellular location predicted by TargetP. C, chloroplast; M, mitochondria; S, secreted with signal peptide.
d Total AdjSPC, adjusted matched MS/MS spectra.
e Average TRAP/WT NadjSPC (1 × 10⁻⁵ used as input for zero).
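Footnote e can be made concrete with a short sketch. It assumes that NadjSPC is each protein's AdjSPC divided by the total AdjSPC of its sample, that the ratio is taken between replicate averages, and that a protein absent from both genotypes is reported as "nd"; these are assumptions about the procedure, and the function names and spectral counts below are hypothetical.

```python
from statistics import mean

PSEUDO = 1e-5  # substituted for a zero average, as stated in footnote e

def nadjspc(adj_spc):
    """Hypothetical normalization: each protein's AdjSPC divided by the
    total AdjSPC of that sample (one dict per replicate eluate)."""
    total = sum(adj_spc.values())
    return {protein: count / total for protein, count in adj_spc.items()}

def trap_wt_ratio(protein, trap_reps, wt_reps):
    """Average NadjSPC over CLPC1-TRAP replicates divided by the average
    over CLPC1-WT replicates; returns None ("nd") if absent from both."""
    trap = mean(nadjspc(rep).get(protein, 0.0) for rep in trap_reps)
    wt = mean(nadjspc(rep).get(protein, 0.0) for rep in wt_reps)
    if trap == 0 and wt == 0:
        return None
    # A zero average is replaced by the pseudo-value so the ratio stays finite.
    return (trap or PSEUDO) / (wt or PSEUDO)

# Toy replicates (invented counts): protein "X" is ~10-fold enriched.
trap_reps = [{"CLPC1": 90, "X": 10}, {"CLPC1": 180, "X": 20}]
wt_reps = [{"CLPC1": 99, "X": 1}, {"CLPC1": 198, "X": 2}]
print(trap_wt_ratio("X", trap_reps, wt_reps))
```

The pseudo-value only matters when a protein is missing from one genotype; it then turns an undefined ratio into a large but finite enrichment.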
The enrichment analysis identified four proteins with a DUF760; we assigned these as DUF760-1,2,4,6. However, inspection of the original proteome dataset (Table S1) revealed three additional DUF760 proteins (DUF760-3,5,7), and a search of the Arabidopsis genome revealed one further member of this family (DUF760-8) (Table 2), which, however, was not observed in our CLPC1-TRAP experiments or in any other dataset in PPDB.
The enrichment analysis identified five proteins with a UVR domain, i.e., EXE1, EXE2, UVR2, UVR3, and UVR4 (Table 1). In addition, the chaperones CLPC1, CLPC2 (but not CLPD), and CLPF also have UVR domains (7, 38). A search of the Arabidopsis genome revealed one additional protein with a UVR domain, assigned UVR1 (At3G09250) (Table 2).
The enrichment analysis identified one protein with a HugZ domain (IPR037119), assigned HugZ-1. Analysis of the original proteome dataset (Table S1) found two additional proteins with a HugZ domain (assigned HugZ-2 and HugZ-3); these showed 11- and 9-fold enrichment in the CLPC1-TRAP, respectively, but did not pass our enrichment criteria owing to the relatively low number of matched MS/MS spectra (11 and 9, respectively) (Table 2).
Finally, the enrichment analysis identified one protein (AT1G23180) with four armadillo repeat (ARM) domains; we named it ARM (Table 2). The armadillo repeat is a repetitive amino acid sequence of about 40 residues that folds into a set of alpha helices (52). There are no close Arabidopsis homologs of ARM. Notably, ARM domains are frequently found in combination with U-box or F-box domains involved in proteasomal degradation. Examples are AT5G67340, AT2G44900 (ARMADILLO-1), and AT3G60350 (ARABIDILLO-2) (53, 54, 55), as well as the PUB4 E3 ligase (AT2G23140) involved in chloroplast degradation (56).
In our previous trapping study (11), we found another DUF domain protein to be enriched, DUF3143 (AT5G52960); in the current study, this protein was 2.4-fold enriched in the CLPC1-TRAP and identified in all three biological replicates (Table S1). It was also identified as an interactor of CLPS1 (57). There are no Arabidopsis homologs of DUF3143.
BLAST and functional domain searches against the Arabidopsis genome identified additional proteins with DUF179, DUF760, HugZ, and UVR domains, resulting in a total set of 22 Arabidopsis proteins (Table 2). We searched for homologs of the 22 Arabidopsis proteins in 18 species across Archaeplastida, with representatives from the glaucophytes, rhodophytes, chlorophytes, charophytes, bryophytes, lycophytes, and angiosperms, and performed phylogenetic and conserved domain prediction analyses (see Table S2 for more information). Based on this analysis, we mapped the 22 proteins to 10 gene families and, for comparison, also included the CLPF protein family (Figs. 3–5). With the exception of ARM (Fig. 5), all families underwent at least one gene duplication event within one or more species. Some show frequent duplications, including at ancient nodes in the tree (e.g., UVR2/UVR3/UVR4, Fig. 3), whereas others show only recent lineage-specific duplications, meaning that the gene remained single-copy throughout most of the tree (e.g., DUF3143, Fig. 5). Domain maps indicate that the conservation of domain architecture varies by gene family (see figure legends for details on the functional domains). Several genes exhibit conservation of one core domain paired with the occasional gain or loss of an additional domain (e.g., UVR2/UVR3/UVR4 (Fig. 3), EXE1/EXE2 (Fig. 4), CLPF (Fig. 5)). The UVR1 family presents a particularly interesting case of duplication and domain evolution (Fig. 3). Duplication occurred at an ancient point in Archaeplastida evolution, and the two resulting paralogs diverged, with one lineage acquiring a UVR domain and the other an F-box-like domain, exemplified by the two Arabidopsis F-box proteins (E3 ligase components) AT4G23960 and AT4G10925 (neither of which has been studied), which are likely involved in substrate recognition for degradation by the proteasome. This pattern suggests neofunctionalization within proteostasis.
Coevolution of CLP proteins and candidate CLP-interacting proteins
ERC is a method to reveal genes with a history of coevolution and/or shared evolutionary pressures, based on the concept that functionally related genes experience correlated changes in rates of sequence evolution across a phylogeny (58, 59, 60, 61). Recently, we used ERC across angiosperms to demonstrate signatures of coevolution between plastid-encoded and nuclear-encoded proteins, in particular for proteins involved in plastid proteostasis (8, 59). For example, ERC analysis showed strong coevolution between the plastid-encoded CLPP1 and the nuclear-encoded CLPR and CLPP subunits of the CLP proteolytic core, but the relationships among the nuclear-encoded subunits were not studied (8).
We applied this ERC method to probe for coevolution between all subunits of the plastid CLP chaperone-protease system (CLPP1,3-6, CLPR1-4, CLPT1,2, CLPS1, CLPF, CLPC1,2 and CLPD) and the candidate interactors listed in Table 2 (Figs. 2B and 6). Figure 6A shows the full matrix with p-values, and Figure 6B displays the significant relationships as a network. This analysis showed strong coevolution between all subunit pairs within the CLPPR core and between CLPT1/T2 and the CLPPR core subunits, with the exception of CLPP5. This lack of coevolution for CLPP5 is surprising given that CLPP5 is essential for both structure and proteolytic function (62). There is strong ERC between the CLPS and CLPF adaptors and between CLPS1, CLPF, and members of the CLP core and CLPT1/T2 (Fig. 6). The exception is CLPP5, which does not show coevolution with CLPT1/T2, CLPF, or CLPS. On the other hand, the chaperones CLPC and CLPD show very little signature of coevolution with the CLP core. This lack of signature could reflect either false negatives or a true absence of selective pressure to coevolve, even while interacting (note that we did previously observe elevated CLPC rates in Silene species with rapid evolution in other CLP subunits (63)). For CLPD, this lack of signal likely results from a lack of power due to absence of the gene in many of the sampled species. Overall, the high degree of ERC within the CLP complex suggests coevolution between the CLPPR core, CLPT1,2, and CLPF and CLPS that reflects functional (but not necessarily physical) interactions within the CLP machinery.
We also found signs of ERC between CLP subunits and some of the CLP interactors (Fig. 6). In particular, HugZ-1/2, HugZ-3, and ARM show ERC signatures with several members of the plastid CLP system. For instance, HugZ-3 showed coevolution with CLPP1, P3, R2, and R4 as well as with CLPT1/2, CLPS1, and CLPF, suggesting that HugZ is functionally linked to the CLP system. Interestingly, a HugZ domain is also found in the C terminus of the chloroplast-localized Arabidopsis glutamyl-tRNA reductase (GluTR) binding protein (GBP). GluTR is important for the synthesis of 5-aminolevulinate, a precursor in heme and chlorophyll biosynthesis. Importantly, GBP regulates the stability of GluTR and protects its N terminus from being recruited by CLPS1 for degradation by the CLP system (34). This striking connection suggests that the HugZ family could be directly involved in regulating CLP substrate selection. Three DUF genes (DUF179-2, DUF3143, and DUF760-7) showed coevolution with the senescence- and drought-induced CLPD chaperone, suggesting a functional connection. Finally, coevolutionary signatures were also found among pairs of candidate interactors. In particular, DUF760-7 showed coevolution with DUF179-2 and DUF760-3 (adjusted p-value <0.05), whereas DUF179-1 showed a weaker coevolution signature with DUF760-1 and UVR2/3. These coevolutionary links provide further incentive to study these interactors in more detail.
Coexpression analysis of the CLP machinery and the trapped protein families
A complementary tool to infer functional relationships between proteins is to study the correlation between mRNA expression levels across tissues or developmental stages in a single species, here Arabidopsis (Fig. 2B). To better understand the functional relationship of the trapped proteins and their homologs (Table 2) with the CLP machinery, we generated mRNA-based coexpression networks using Arabidopsis coexpression data from ATTED-II, based on both microarray and RNA-Seq experiments (64). We downloaded the 100 genes with the highest coexpression values for each of the 22 proteins in Table 2, as well as for the complete nuclear-encoded chloroplast CLP system (15 proteins), the four mitochondrial CLP proteins (CLPP2, CLPX1-3), and the plastid unfoldase CLPB3, which does not physically interact with the CLP protease system directly (Table S3A). This resulted in a set of 2157 nonredundant genes (Table S3). Coexpression was based on the logit score (LS), a monotonic transformation of the Mutual Rank index, with larger LS indicating stronger coexpression. We then constructed a coexpression network from the top 20 coexpressors of each of the 42 genes, yielding a network of 579 genes with 840 edges (1.45 edges/gene). We also generated coexpression networks based on two different minimum LS thresholds (LS ≥ 6 or 7), with 585 genes (1061 edges; 1.81 edges/gene) and 273 genes (414 edges; 1.52 edges/gene), respectively. Fig. S2 shows the three networks side by side, with bait names in yellow, plastid-localized gene products in green, mitochondria-localized gene products in orange, and gene products with unknown or other subcellular locations in gray. Each gene has the same identification number across the three networks (Table S3); 63%, 80%, and 85% of the proteins in the top 20, LS ≥ 6, and LS ≥ 7 networks, respectively, were plastid localized. Figure 7 shows the LS ≥ 6 network.
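The mutual rank (MR) underlying the ATTED-II logit score can be illustrated with a small example. MR(A, B) is the geometric mean of the rank of B in A's coexpression list and the rank of A in B's list, so a smaller MR means stronger coexpression. Because the exact MR-to-LS transformation is not given here, the sketch thresholds MR directly (for a monotonic transformation, an LS cutoff corresponds to some MR cutoff) and uses an invented correlation table; the function names are hypothetical. It also shows that the "edges/gene" values reported above are edges divided by genes, not mean degree, which would be twice as large.

```python
from math import sqrt

def ranks_from(corr, gene):
    """Rank (1 = most correlated) of every other gene for `gene`."""
    others = sorted((g for g in corr[gene] if g != gene),
                    key=lambda g: corr[gene][g], reverse=True)
    return {g: r for r, g in enumerate(others, start=1)}

def mutual_rank(corr):
    """MR(A, B) = sqrt(rank of B for A * rank of A for B)."""
    ranks = {g: ranks_from(corr, g) for g in corr}
    genes = sorted(corr)
    return {(a, b): sqrt(ranks[a][b] * ranks[b][a])
            for i, a in enumerate(genes) for b in genes[i + 1:]}

def threshold_network(scores, keep):
    """Undirected edges (gene pairs) whose score passes `keep`."""
    return {pair for pair, s in scores.items() if keep(s)}

def edges_per_gene(edges):
    genes = {g for pair in edges for g in pair}
    return len(edges) / len(genes)

# Invented, symmetric correlation table for four genes.
corr = {
    "A": {"B": 0.9, "C": 0.5, "D": 0.1},
    "B": {"A": 0.9, "C": 0.4, "D": 0.2},
    "C": {"A": 0.5, "B": 0.4, "D": 0.8},
    "D": {"A": 0.1, "B": 0.2, "C": 0.8},
}
mr = mutual_rank(corr)
edges = threshold_network(mr, lambda s: s <= 2)  # stricter = smaller MR
print(sorted(edges), edges_per_gene(edges))

# The ratios reported in the text: 1.45, 1.81, and 1.52 edges/gene.
for n_edges, n_genes in [(840, 579), (1061, 585), (414, 273)]:
    print(round(n_edges / n_genes, 2))
```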
In all three networks, the complete CLPPRT protease core complex formed a tight coexpression cluster, connected by multiple edges to CLPC1 and, to a lesser degree, CLPF. CLPS1 was more distantly connected, sharing one coexpressor (Crumpled Leaf, AT5G51020) with CLPF (LS = 6.2/6.3). Interestingly, DUF760-2, DUF760-3, and DUF3143 showed many connections to the tight CLPPRT cluster even at LS ≥ 7, suggesting that these three DUF proteins likely have a function closely associated with the plastid CLP system. At the highest stringency level (LS ≥ 7) (Fig. S2), only DUF760-1,2,3,7, DUF3143, HugZ-2, UVR1, EXE1, and EXE2 were part of the main network with the CLPPRT complex, CLPC1, CLPS1, and CLPF. Three proteins had no coexpressors at this highest stringency level (HugZ-3 and DUF179-1,2), and the other 11 proteins had between one (DUF179-3 and CLPX3) and 11 (CLPB3) coexpressors. The small DUF179 family connected to the main network only in the top 20 network (Fig. S2).
To more easily visualize the connectivity between CLP and trapped proteins, we generated a network combining the top 20 and LS ≥ 6 coexpressors but including only coexpressors with at least two edges (Fig. 8). This resulted in a dense network of 274 proteins with 478 edges connected to CLP proteins and 311 edges connected to trapped proteins (average connectivity 2.88 edges/protein); CLPX3, DUF179-2, and UVR4 were not part of this network. Ninety percent of the proteins are plastid localized. Direct edges between baits (CLP and trapped proteins) are colored red (see Fig. S3 for the network of direct edges only). Again, the CLPPRT core formed a highly connected module, and DUF760-3 was an integral part of this module through direct edges to CLPR2, CLPP4, CLPP5, and CLPP6, suggesting a closely related functional role (Fig. 8). DUF760-2 was connected to this module through CLPR2 and CLPR4 (part of the R-ring), whereas UVR1 connected to CLPP3 and CLPP5 (part of the P-ring). UVR1, DUF760-1, DUF760-5, DUF760-7, and DUF760-8 formed a smaller module (module II), connected to the main module through edges of UVR1 to CLPP3 and CLPP5. UVR2 and UVR3 are directly connected and formed a small module (III) that included DUF179-3 and connected to DUF760-5 and CLPX. Strikingly, several of the coexpressors in module III encode proteins involved in extraplastidic degradation through autophagy (ATG8f) and the UBI system. This contrasts strongly with the rest of the network, which is dominated by plastid proteins involved in various aspects of chloroplast biogenesis and proteostasis. CLPD and DUF760-6 formed a small module (IV) connecting to DUF179-3, CLPX1, and CLPB3. Coexpressors in module IV are mostly involved in senescence and plastoglobules, including the plastoglobule protease PGM48 (65) and the atypical kinase ABC1K7 (66), as well as pheophytin pheophorbide hydrolase, a key enzyme in chlorophyll degradation (67).
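The filtering step and the reported connectivity can be sketched as follows. The sketch assumes a single-pass filter in which each coexpressor's edge count is taken from the combined network before filtering, and that "average connectivity" is total edges divided by proteins; both are assumptions about the procedure. Gene names other than CLPP3 and UVR1 are placeholders, and the toy network is invented.

```python
from collections import Counter

def filter_network(edges, baits, min_edges=2):
    """Keep every bait plus non-bait coexpressors with >= min_edges edges.

    `edges` is a set of frozensets {gene1, gene2} (undirected). Degrees are
    counted once on the unfiltered network (single-pass assumption)."""
    degree = Counter(gene for edge in edges for gene in edge)
    keep = set(baits) | {g for g, d in degree.items() if d >= min_edges}
    return {edge for edge in edges if edge <= keep}

def average_connectivity(edges):
    """Edges per protein, the measure quoted in the text."""
    proteins = {gene for edge in edges for gene in edge}
    return len(edges) / len(proteins)

# Toy combined network; g1-g3 are placeholder coexpressors.
baits = {"CLPP3", "UVR1"}
edges = {frozenset(pair) for pair in [("CLPP3", "g1"), ("UVR1", "g1"),
                                      ("CLPP3", "g2"), ("CLPP3", "UVR1"),
                                      ("UVR1", "g3")]}
kept = filter_network(edges, baits)
print(len(kept), average_connectivity(kept))

# The reported value is consistent with (478 + 311) edges / 274 proteins.
print(round((478 + 311) / 274, 2))
```

Here g2 and g3 are dropped because each links to only one bait, which mirrors how single-edge coexpressors were excluded from the combined network.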
Protein observations in the Arabidopsis PeptideAtlas and comparison with CLPC1-TRAP and CLPC1-WT samples
To further evaluate the CLPC1-trapped proteins and their homologs, we took advantage of a new resource, the Arabidopsis PeptideAtlas (www.peptideatlas.org/builds/arabidopsis/) (14). The Arabidopsis PeptideAtlas is based on publicly available mass spectrometry data from many published Arabidopsis proteome studies, collected through ProteomeXchange (http://www.proteomexchange.org/) and reanalyzed through a uniform processing and metadata annotation pipeline. In the first release, ∼40 million of ∼143 million MS/MS spectra acquired from a wide range of highly diverse Arabidopsis samples (including leaves, flowers, roots, cell cultures, and subcellular fractions) were matched to the reference genome Araport11, identifying 17,858 uniquely identified proteins at the highest confidence level (canonical proteins) and 3543 lower-confidence proteins. The raw MS datasets of the CLPC1 trapping experiment described above, as well as those of our previous CLPC1 trapping study (11), are also part of this atlas. In total, this PeptideAtlas contains 266 experiments.
We collected information from PeptideAtlas for the 22 proteins, including relative abundance (matched spectra per protein length) across these very diverse datasets, overall protein sequence coverage by matched peptides, the most N-terminal residue observed, and the datasets (e.g., tissue types, subcellular fractions) in which these proteins were observed (summarized in Figs. S4–S10, Table S4, Figs. 9 and 10). Simplified information and a summary are provided in Table 2. All proteins except one (DUF760-8) were identified at the canonical (most confident) level in PeptideAtlas. Some proteins were identified in more than 100 experiments (DUF760-1, DUF760-3, ARM, HugZ-2), whereas others were nearly exclusively identified in our CLPC1 affinity experiments (e.g., UVR4, HugZ-3, DUF760-5, and DUF760-7), indicative of their low abundance and specific CLPC1 trapping (Fig. 9 and Table 2). For comparison, CLPS1 and CLPF were identified 41 and 118 times, respectively. The abundance of the canonical proteins in the current PeptideAtlas release (based on apportioned matched MS/MS spectra per protein length) ranges from 0.0018 to 1639 (the large subunit of Rubisco and CF1β of the thylakoid ATP synthase are the most abundant) (14), whereas the abundance of the 22 proteins (Table 2) ranged from 0.016 to 12.1 (Fig. 9A). DUF760-3 was by far the most abundant of these proteins in this first PeptideAtlas release, whereas DUF760-5, DUF760-7, and HugZ-3 were the least abundant, and DUF760-8 was never observed (Fig. 9A and Table 2). For comparison, the abundances of CLPS1 and CLPF and the average abundance of the CLPPRT subunits were 0.7, 7.6, and 25, respectively. We note that these abundance values can vary greatly across experiments and tissue types and therefore do not directly reflect abundance in any specific cell or tissue type; nevertheless, they provide a general measure of protein observability.
Figure 9B compares the CLPC1-TRAP/CLPC1-WT ratio with the number of experiments in PeptideAtlas, with proteins ordered by increasing number of experiments. This shows that enrichment in the CLPC1-TRAP is not related to general abundance (or observability); e.g., DUF760-6 is highly enriched in the CLPC1-TRAP but not frequently observed in PeptideAtlas. Similarly, EXE1, EXE2, DUF179-1, and others are observed many times in PeptideAtlas, but only EXE1 and EXE2 are extremely enriched in the CLPC1-TRAP.
Figure 10 shows two examples (UVR4 and DUF760-4) of primary sequence coverage and peptide observations across experiments in PeptideAtlas. In addition to our CLPC1 affinity experiments (>40-fold enriched in the CLPC1-TRAP compared with CLPC1-WT), UVR4 was detected mostly in nonphotosynthetic tissues (cell cultures, roots), whereas DUF760-4 was identified in a broader range of plant materials (leaves, flowers, and cell cultures) (Fig. 10B). However, for both proteins, many more peptides were detected in the CLPC1-TRAP experiments, showing that these proteins were truly highly enriched. All 21 observed proteins (Table 2) were identified with good sequence coverage in PeptideAtlas (32%–70%), but no peptides were identified in their N-terminal regions (see Figs. S4–S10 for all 21 proteins). The most N-terminal residue detected was at position 45, and on average the most N-terminal residue was 69 aa from the N terminus, supporting our prediction that (most of) these proteins have cleavable N-terminal chloroplast sorting sequences (chloroplast transit peptides, cTPs) (see Table 2). Moreover, for 11 proteins the bona fide N terminus of the mature protein was quite likely detected: it was identified by a semi-tryptic peptide that started immediately downstream of a residue other than K or R (hence not generated by trypsin cleavage) and ended with a C-terminal K or R. Indeed, in most of these cases, the detected N terminus fit the pattern of a cleaved cTP (i.e., cleavage downstream of a cysteine, serine, or alanine) (see examples in Fig. 10). We also evaluated possible plastid N-degrons (5, 68) and observed Leu three times (UVR3, HugZ-3, and DUF760-7) and Asp once (UVR4) as the likely N-terminal residue. It was recently shown that N-terminal Leu is recognized by CLPS1 but that the following residue (the P2′ position) greatly affects affinity, with Arg and Gly enhancing affinity and Leu, Ser, and Ala reducing it (68). Leu was followed by Ser in HugZ-3 and DUF760-7 but by Phe in UVR3. The significance of these N-terminal residues in the trapped samples remains to be determined.
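The criteria used above to flag a likely mature N terminus can be written as a small rule set. This is a hypothetical sketch, not the PeptideAtlas pipeline: a peptide is flagged when the residue preceding it is not K or R (non-tryptic N-terminal end) and the peptide ends in K or R; it also reports whether the preceding residue is C, S, or A (the cTP-like pattern), the N-terminal residue, and the P2′ residue relevant to CLPS1 binding. The protein and peptide sequences are invented for illustration.

```python
TRYPTIC = {"K", "R"}      # trypsin cleaves after K or R
CTP_P1 = {"C", "S", "A"}  # residues preceding cTP cleavage, per the text

def describe_n_terminal_peptide(protein_seq, peptide):
    """Classify the N-terminal context of a peptide within a protein."""
    start = protein_seq.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein sequence")
    preceding = protein_seq[start - 1] if start > 0 else None
    return {
        "start": start + 1,  # 1-based position of the first residue
        # Semi-tryptic: non-tryptic N-terminal end, tryptic C-terminal end.
        "candidate_mature_n_terminus": (preceding is not None
                                        and preceding not in TRYPTIC
                                        and peptide[-1] in TRYPTIC),
        "ctp_like_cleavage": preceding in CTP_P1,
        "n_terminal_residue": peptide[0],
        "p2_prime_residue": peptide[1] if len(peptide) > 1 else None,
    }

# Invented sequence and peptides, for illustration only.
protein = "MASTRVAASLSPEGVKDLR"
print(describe_n_terminal_peptide(protein, "LSPEGVK"))  # preceded by S
print(describe_n_terminal_peptide(protein, "DLR"))      # preceded by K
```

The first peptide mimics the Leu-Ser pattern discussed for HugZ-3 and DUF760-7, whereas the second is an ordinary tryptic peptide and is not flagged.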
Conclusions
This study provides a comprehensive analysis of proteins that are copurified with CLPC1 chaperones in the Arabidopsis chloroplast, in particular when ATP hydrolysis of CLPC1 is impaired through Walker B mutations. In the absence of ATP hydrolysis, the interaction between CLPC1 and its substrates is stabilized (12). Since the main function of CLPC1 is the unfolding and delivery of substrates for degradation by the CLP protease complex, most of these interactors are likely protease substrates. However, it is quite likely that proteins that act in the regulation of CLPC1 hexamerization and activation could also be stabilized in their interactions with the CLPC1-TRAP. Finally, proteins that serve to select and deliver substrates (adaptors) to the CLPC1 chaperone may be unable to leave the CLPC1 chaperone if the substrate is unable to be unfolded and released into the CLP protease.
The CLPC1-TRAP plants have pale green (virescent) young leaves, but these leaves green as they develop and mature. The virescent phenotype must be accompanied by changes in the (chloroplast) proteome, and indeed, comparative proteomics of the homozygous clpc1-1 null mutant previously revealed a proteome phenotype (57). This clpc1-1 null mutant has a much stronger phenotype (it is smaller, develops more slowly, and its leaves are very pale) than the heterozygous CLPC1-TRAP line used for the current affinity enrichment. It is likely that proteins enriched in the CLPC1-TRAP line also overaccumulate in the clpc1-1 null line, and indeed, that was the case for several proteins, in particular EXE2 and DUF179-3.
This study identified 15 trapped proteins involved in DNA and RNA metabolism and 22 proteins involved in different chloroplast metabolic pathways; most of these are likely to be CLP protease substrates, but protein half-life experiments in CLP-deficient backgrounds will be needed to investigate this further. Furthermore, another 10 proteins involved in chloroplast proteostasis were highly enriched in the CLPC1-TRAP; these include the CLPF adaptor, the CLPD chaperone, CLPT1 and CLPT2, as well as the CPN10/CPN20 cochaperone pair. Several of these proteins are direct components of the CLP chaperone-protease system (CLPF, CLPT1, CLPT2, CLPD). The >10-fold enrichment of the cochaperone pair CPN10 and CPN20 is highly intriguing given their recent identification in the Chlamydomonas CLP core structure by cryo-EM (48); perhaps the CPN10/20 proteins also interact directly with the CLP protease core complex to regulate access to the catalytic chamber.
Most of this study focused on a set of proteins in families of unknown function, i.e., DUF179, DUF760, DUF151/UVR, DUF3143, HugZ, and ARM, as well as EXE1 and EXE2. We identified 12 proteins in these families as highly enriched in the CLPC1-TRAP, and BLAST and phylogenetic analyses identified another 10 members of these families, several of which were also enriched in the CLPC1 samples. Most (or perhaps all) of these 22 proteins localize to the chloroplast, suggesting that they specifically evolved to function in chloroplast metabolism or proteostasis. These proteins could serve as adaptors or have other regulatory functions in the CLP system, and they could also be substrates. Determining regulatory functions such as CLP adaptor activity is difficult and often takes many years, as evidenced by the few examples published so far, all of them in bacteria. Examples include (i) HSPQ in Escherichia coli, which was shown to regulate Clp by inhibiting CLPS substrate selection, but only when HSPQ is acetylated, so that HSPQ acts as an antiadaptor of CLPS (69); (ii) MecA in Bacillus subtilis, which not only acts as a substrate adaptor but also functionally activates the CLPC hexamer (70); and (iii) a tripartite adaptor system involving the adaptors CpdA, RcdA, and PopA in Caulobacter crescentus, in which RcdA can also be a substrate of the Clp protease system depending on its oligomeric state. It took several laboratories and many publications to begin to establish these regulatory functions. It is also important to note that several of these adaptors are themselves substrates for degradation by the Clp system (71). Because elucidating Clp adaptor functions, and even substrates, can be so daunting, we carried out a comprehensive computational analysis of these 22 candidate adaptors and substrates (summarized in Fig. 2). We believe this will help in making more rational choices when selecting proteins for functional studies and in designing the most promising experiments.
We investigated possible signals of coevolution with the CLP system and among the candidate proteins themselves, and several proteins, in particular the HugZ family members and ARM, indeed show signs of coevolution with the CLP system. Furthermore, specific members of the DUF760 and DUF179 families show strong coevolutionary signals with each other, perhaps also indicative of protein–protein interactions between these members. To help infer function, we carried out an in-depth mRNA-based coexpression network analysis. The complete set of CLPPRT proteins showed extremely tight coexpression, consistent with a highly organized protein complex and further increasing confidence in the biological significance of the coexpression networks. Indeed, the coexpression networks suggest associations of several of the proteins with specific functions or processes, such as the association of UVR2, UVR3, and DUF760-5 with members of the autophagy pathway and ubiquitination system, including several F-box proteins. These coexpression results will help to design experiments for several of these proteins with unknown functions.
Finally, this study took advantage of the recent release of the Arabidopsis PeptideAtlas, which allowed a better understanding of the general abundance of the 22 proteins with unknown functions. This showed a wide range of abundance, and, importantly, showed that the CLPC1 trapping was highly specific as the enrichment in the CLPC1-TRAP showed no correlation with general abundance. Furthermore, the PeptideAtlas showed that all observed proteins accumulated without the first 50 to 70 amino acids, which is consistent with them having a cleavable chloroplast transit peptide for sorting from the cytoplasm (the site of protein translation for these nuclear-encoded proteins).
Altogether, this comprehensive study provides a broad foundation to study the physiological role of the chloroplast CLP chaperone-protease system and to discover the molecular players and details of substrate delivery and regulation of CLP activity.
Experimental procedures
Plant material and plant growth
Homozygous wt/CLPC1-WT-STREPII and heterozygous wt/CLPC1-TRAP-STREPII transgenic lines used in this study are described in (11). Seeds were sown on agar plates with 50% Murashige and Skoog medium, 1% sucrose, and 20 mg/L BASTA. After 3 days of stratification in the dark and cold, the plates were transferred to a 10 h/14 h light/dark cycle at 100 μE m−2 s−1 to select transgenic lines carrying either transgene. After 10 days, surviving seedlings (100% for the homozygous wt/CLPC1-WT-STREPII line) were transferred to soil and grown under the same light/dark regime. Rosettes were harvested after 38 days, just before bolting, divided into three separate batches per genotype, weighed, immediately frozen in liquid nitrogen, and stored at −80 °C. The different batches served as biological replicates.
Protein extraction and affinity purification
Batches of rosettes (10–14 g) were ground to a fine powder with a mortar and pestle in liquid nitrogen and vortexed in 10 to 12 ml extraction medium (EM; 50 mM Hepes-KOH pH 8.0, 15% glycerol, 10 mM MgCl2, 75 mM NaCl, 0.32 mg avidin/ml EM, and 250 μg/ml Pefabloc serine protease inhibitor). The suspension was filtered through four layers of Miracloth (∼25 μm, Millipore), and larger particles were removed by centrifugation for 1.5 h at 28,000 rpm in an SW28 rotor at 4 °C. The supernatants were collected, aliquoted, and either used directly for affinity purification on StrepTactinXT high-capacity affinity beads (# 2-4030-010 from IBA Life Sciences) or stored at −80 °C for later analysis. StrepTactin columns (0.5–1 ml) were prepared as in (72), washed with 2 column volumes of EM without glycerol, and equilibrated with 2 column volumes of EM. Samples (0.5–1 column volumes) were loaded, the flow-through was discarded, and columns were washed with 5 to 10 column volumes of EM without avidin. STREPII-tagged proteins were eluted in 3 column volumes of EM + 2.5 mM biotin and collected as individual fractions; biotin binds irreversibly to StrepTactin resin but reversibly to the newer-generation StrepTactinXT resin used here. The eluates were pooled and concentrated using Ultra-4 Centrifugal Filter Units with a 3-kDa cutoff by centrifugation for ∼16 h at 5000 rpm at 4 °C in a JS 13.1 rotor. The concentrates were aliquoted and stored at −80 °C for further proteome and MS/MS analysis.
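The centrifugation steps above are specified in rpm, and the corresponding force depends on the rotor radius. As a minimal sketch, the standard conversion RCF = 1.118 × 10⁻⁵ × r × rpm² (r in cm) can be applied; the 16.1 cm maximum radius used below for the SW28 rotor is an assumption based on typical manufacturer specifications and should be checked against the rotor manual.

```python
def rpm_to_rcf(rpm: float, radius_cm: float) -> float:
    """Convert rotor speed (rpm) to relative centrifugal force (x g).

    Standard relation: RCF = 1.118e-5 * r * rpm**2, with r in cm.
    """
    return 1.118e-5 * radius_cm * rpm ** 2


# Assumed maximum radius of an SW28 swinging-bucket rotor (cm);
# verify against the manual of the rotor actually used.
SW28_RMAX_CM = 16.1

rcf = rpm_to_rcf(28_000, SW28_RMAX_CM)
print(f"28,000 rpm in an SW28 rotor is about {rcf:,.0f} x g at r_max")
```

With this assumed radius, 28,000 rpm corresponds to roughly 141,000 × g at the bottom of the tube; the same function applies to the JS 13.1 concentration step once that rotor's radius is looked up.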
Proteomics and mass spectrometry
Affinity eluates from the transgenic lines expressing CLPC1-WT-STREP and CLPC1-TRAP-STREP (three biological replicates each) were separated by SDS-PAGE on Bio-Rad Criterion Tris-HCl precast gels (10.5%–14% acrylamide gradient). We refer to these eluates below as CLPC1-WT and CLPC1-TRAP. Each SDS-PAGE gel lane was cut completely into six consecutive gel slices, followed by reduction, alkylation, and in-gel digestion with trypsin (73). The peptides, resuspended in 15% formic acid, were analyzed using a QExactive mass spectrometer equipped with a nanospray flex ion source and interfaced with a nanoLC system and autosampler (Dionex Ultimate 3000 Binary RSLCnano system). Peptide samples were automatically loaded by the autosampler onto a guard column (C18 PepMap 100, 5 μm, 100 Å; 300 μm i.d. × 1 mm; Thermo Scientific) and then separated on a PepMap C18 reverse-phase nanocolumn (Inertsil ODS-3, 3 μm C18; 75 μm i.d. × 15 cm; Thermo Scientific) using 100-min gradients with 95% water, 5% ACN, 0.1% FA (solvent A) and 95% ACN, 5% water, 0.1% FA (solvent B) at a flow rate of 300 nl/min. Two blank samples were run after the six samples from each lane. The acquisition cycle consisted of a survey MS scan over a mass range of 400 to 2000 m/z at a resolving power of 70,000, followed by 10 data-dependent MS/MS scans with a 2.0 m/z isolation window. Dynamic exclusion was used for 15 s. AGC target values were set at 1 × 10⁶ for the MS survey scans (maximum scan time 30 ms) and at either 5 × 10⁵ or 5 × 10⁴ for MS/MS scans (maximum scan time 50 ms). Each sample was analyzed three times using different acquisition conditions (technical replicates): (i) MS/MS AGC 5 × 10⁵ and two internal washes with 95% B, (ii) MS/MS AGC 5 × 10⁵ and one internal wash with 95% B, and (iii) MS/MS AGC 5 × 10⁴ and one internal wash with 95% B.
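The acquisition scheme above can be laid out as a run table. The sketch below assumes that every gel slice from every lane was analyzed under each of the three acquisition conditions, and it does not count the blank runs; the dictionary keys are illustrative names, not fields from the authors' workflow.

```python
from itertools import product

GENOTYPES = ("CLPC1-WT", "CLPC1-TRAP")
BIO_REPS = (1, 2, 3)      # three rosette batches per genotype
SLICES = range(1, 7)      # six consecutive gel slices per lane

# Technical replicates: condition -> (MS/MS AGC target, internal 95% B washes)
ACQ_CONDITIONS = {
    "i": (5e5, 2),
    "ii": (5e5, 1),
    "iii": (5e4, 1),
}

# One entry per LC-MS/MS injection (blank runs not included).
runs = [
    {
        "genotype": g,
        "bio_rep": b,
        "slice": s,
        "condition": c,
        "msms_agc": ACQ_CONDITIONS[c][0],
        "washes": ACQ_CONDITIONS[c][1],
    }
    for g, b, s, c in product(GENOTYPES, BIO_REPS, SLICES, ACQ_CONDITIONS)
]

print(len(runs), "LC-MS/MS runs, excluding blanks")
```

Under these assumptions, the design amounts to 2 genotypes × 3 biological replicates × 6 slices × 3 acquisition conditions = 108 LC-MS/MS runs.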
Data processing using MASCOT and our internal workflow
Peak lists in MGF format were generated from RAW files using Distiller software (version 2.7.1.0; Matrix Science) in default mode. MGF files were searched with MASCOT v2.4.0 against TAIR10 supplemented with a small set of typical contaminants and the corresponding decoy sequences (71,148 sequences; 29,099,536 residues). Two parallel searches (Mascot p-value <0.01 for individual ion scores; precursor ion window 700–3500 Da) were carried out: (i) full tryptic (error tolerance 6 ppm for MS and 0.5 Da for MS/MS) with variable M-oxidation, Gln to pyro-Glu (N-term Q), N-terminal protein acetylation, and W mono-, di-, or tri-oxidation, fixed Cys carbamidomethylation, and up to two missed cleavages (in Mascot, PR or PK does not count as a missed cleavage); and (ii) semi-tryptic (error tolerance 3 ppm for MS and 0.5 Da for MS/MS) with variable M-oxidation, N-terminal acetylation, Gln to pyro-Glu (N-term Q), and W mono-, di-, or tri-oxidation, fixed Cys carbamidomethylation, and up to two missed cleavages. W-oxidation was included based on the recent observation that a specific tryptophan residue in EXECUTER1 is oxidized (49). To ensure a final peptide false discovery rate (FDR) below 1%, all search results were further filtered with a post-Mascot script for a minimum ion score of 33 (35 for single-peptide identifications). This resulted in a protein FDR of zero for proteins identified with two or more peptides. Proteins identified only by MS/MS spectra that were all shared with other proteins identified by unique peptides were discarded. Proteins could only be identified through the full tryptic (6 ppm) search; the semi-tryptic search served to increase protein coverage and was combined with the full tryptic search results. Proteins were quantified by spectral counting (SPC) using peptides from both the full and semi-tryptic searches.
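The ion score thresholds above can be sketched as a post-search filter. This is a minimal illustration, not the authors' post-Mascot script: the PSM record, its field names, and the simple decoy/target ratio used to estimate the FDR are assumptions chosen for clarity.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """A peptide-spectrum match (hypothetical record layout)."""
    peptide: str
    protein: str
    ion_score: float
    is_decoy: bool


def filter_psms(psms, min_score=33.0, min_score_single=35.0):
    """Keep PSMs with ion score >= min_score; for proteins supported by a
    single distinct peptide, require min_score_single instead."""
    passing = [p for p in psms if p.ion_score >= min_score]
    peptides_per_protein = defaultdict(set)
    for p in passing:
        peptides_per_protein[p.protein].add(p.peptide)
    return [
        p for p in passing
        if len(peptides_per_protein[p.protein]) > 1
        or p.ion_score >= min_score_single
    ]


def decoy_fdr(psms):
    """Simple target-decoy estimate: accepted decoy hits / accepted target hits."""
    decoys = sum(p.is_decoy for p in psms)
    targets = len(psms) - decoys
    return decoys / targets if targets else 0.0
```

In this sketch, a protein seen with one peptide scoring 34 is dropped, whereas the same score is accepted when a second peptide supports the protein.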
For quantification by spectral counting, each accession was scored for total spectral counts (SPC), unique SPC (spectra matching uniquely to that accession), and adjusted SPC (adjSPC) (73). The adjusted SPC assigns shared peptides to accessions in proportion to their relative abundance, using the unique spectral counts of each accession as the basis. Proteins that shared more than 80% of their matched peptides with other proteins across the complete dataset were grouped into families and quantified as groups together with these homologs (73). We evaluated the samples for potential enrichment based on adjSPC normalized to the total number of adjSPC in each sample (NadjSPC). Alternatively, protein abundances within each lane were normalized to the adjSPC of the CLPC proteins. Significance analysis of individual protein enrichment based on NadjSPC was done using the GLEE software developed in Python, for which a stand-alone executable version was created (https://github.com/lponnala/glee-py) (A. Poliakov, L. Ponnala, P.D. Olinares, and K.J. van Wijk, unpublished data). GLEE was run on a Windows platform with cubic polynomial fitting, adaptive binning, and 20,000 iterations for the estimation of variation. No normalization by protein length or peptide length was applied. Volcano plots were generated in Excel.
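The adjusted spectral count and the two normalizations described above can be illustrated as follows. The sketch assumes each MS/MS spectrum is represented by the set of accessions it matched; the even split of spectra shared only among accessions without unique counts is an assumption not described in the text, and the 80% family-grouping step is omitted.

```python
from collections import Counter


def adjusted_spc(spectra):
    """spectra: iterable of sets of accessions matched by each MS/MS spectrum.

    Returns (total SPC, unique SPC, adjusted SPC) as dicts keyed by accession.
    Shared spectra are split in proportion to each accession's unique SPC.
    """
    spectra = [frozenset(s) for s in spectra]
    total, unique = Counter(), Counter()
    for accs in spectra:
        total.update(accs)
        if len(accs) == 1:
            unique.update(accs)
    adjusted = {acc: float(unique[acc]) for acc in total}
    for accs in spectra:
        if len(accs) < 2:
            continue
        basis = sum(unique[a] for a in accs)
        for a in accs:
            # Assumption: split evenly if no accession in the group has unique spectra.
            adjusted[a] += unique[a] / basis if basis else 1 / len(accs)
    return dict(total), dict(unique), adjusted


def normalize(adj):
    """NadjSPC: each accession's adjSPC divided by the sample's total adjSPC."""
    s = sum(adj.values())
    return {a: v / s for a, v in adj.items()}


def normalize_to(adj, reference_accs):
    """Alternative: normalize to the summed adjSPC of reference proteins (e.g., CLPC)."""
    ref = sum(adj.get(a, 0.0) for a in reference_accs)
    return {a: v / ref for a, v in adj.items()}
```

For example, an accession with 3 unique spectra that shares 2 spectra with an accession having 1 unique spectrum receives 3 + 2 × 3/4 = 4.5 adjusted counts, and its partner receives 1 + 2 × 1/4 = 1.5.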
mRNA-based coexpression, networks, and functional enrichment
Coexpression data for the CLP genes and the genes of the trapped interactor families were downloaded (July 2020) from the plant coexpression database ATTED-II (http://atted.jp/) (64) using dataset Ath-u1. This dataset is a unified version of coexpression calculated by linear regression of both RNA-Seq and microarray coexpression data. For each bait, the 100 most strongly coexpressed genes based on the LS, a monotonic transformation of the Mutual Rank index, were used for detailed analysis. A larger LS indicates stronger coexpression, and LS = 0 indicates no coexpression. Protein function was based on an updated version of the MapMan annotation system integrated into the PPDB (http://ppdb.tc.cornell.edu/), and experimentally determined or predicted subcellular locations were obtained from the PPDB. Proteins were assigned to plastid, mitochondria, peroxisome, or “other.”
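Selecting the coexpression partners of each bait and summarizing their locations could look like the sketch below. The input dictionaries (gene ID to LS, gene ID to PPDB location) are hypothetical stand-ins for the ATTED-II download and the PPDB annotations, and discarding genes with LS ≤ 0 is an assumption based on LS = 0 indicating no coexpression.

```python
from collections import Counter

LOCATION_BINS = ("plastid", "mitochondria", "peroxisome")


def top_coexpressed(ls_scores, n=100):
    """Return the n genes with the largest LS for one bait.

    ls_scores: dict gene_id -> LS (larger = stronger coexpression).
    Assumption: genes with LS <= 0 are treated as not coexpressed and dropped.
    """
    ranked = sorted(
        ((gene, score) for gene, score in ls_scores.items() if score > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    return [gene for gene, _ in ranked[:n]]


def location_summary(genes, location_of):
    """Count genes per compartment; anything outside LOCATION_BINS is 'other'."""
    counts = Counter()
    for gene in genes:
        loc = location_of.get(gene, "other")
        counts[loc if loc in LOCATION_BINS else "other"] += 1
    return counts
```

Genes without a location annotation fall into the “other” bin, mirroring the four categories used above.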
Gene duplication and domain architecture evolution
Complete sets of annotated protein-coding sequences for 18 species across Archaeplastida were obtained from published sources (Table S2) and filtered to retain only the primary gene model for each locus. OrthoFinder (version 2.4.0) (74) was used to cluster gene families from the 18 species. Amino acid sequences were aligned using the L-INS-i algorithm in MAFFT (v7.407) (75). These alignments were manually inspected for assembly/annotation artifacts, and several sequences appeared to be erroneously annotated as two neighboring partial proteins, each covering roughly half the length of the full-length protein. Such sequences were concatenated to yield a single protein sequence for the given species. These curated sequences were used for domain analyses (see below). To prepare alignments for phylogenetic analyses, GBLOCKS (version 0.91b) (76) was used to trim poorly aligned regions. GBLOCKS parameters b1, b2, and b5 were set such that conserved, flank, and gap positions were defined based on presence in at least 50% of sequences. RAxML (v8.2.12) (77) was used to infer maximum likelihood trees using the following command for each gene:
raxmlHPC-PTHREADS-AVX -s <input file name> -n <output file name> -m PROTGAMMALG -p 12345 -x 12345 -# 100 -f a
The -m argument specifies the model used (gamma-distributed rate heterogeneity, empirical amino acid frequencies, and the LG substitution model). The -p and -x arguments provide seeds for the parsimony search and bootstrapping, respectively. The -# argument sets the number of bootstrap replicates. The -f a argument performs a rapid bootstrap analysis and a search for the best-scoring maximum likelihood tree in a single run. Gene-tree/species-tree reconciliation analyses were carried out using Notung (version 2.9) (78, 79). These analyses compared each gene tree against a predefined species tree (80, 81) in order to identify gene duplication events, rearrange poorly supported nodes, and root trees in the manner that best matches the species tree. Reconciliation used default parameters, with poorly supported relationships defined as those with <80% bootstrap support. The NCBI Conserved Domain search tool (CD-search) (82) was used with default parameters to study the evolution of domain architecture in the selected gene families, using the manually curated but untrimmed versions of the sequences (described above). Domain map figures were generated in R with the ggtree package (version 1.14.6) (83).
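For reproducibility, the tool invocations described above can be assembled as argument lists, for example from Python. Only the RAxML arguments are given in the text; the OrthoFinder, MAFFT, and Gblocks flags below are standard options of those programs, and reading the 50% Gblocks setting as b1 = b2 = half the number of sequences + 1 with b5 = h is an assumption.

```python
import subprocess
from pathlib import Path


def orthofinder_cmd(proteome_dir: Path) -> list[str]:
    # One FASTA per species (primary gene models only) in proteome_dir.
    return ["orthofinder", "-f", str(proteome_dir)]


def mafft_linsi_cmd(fasta: Path) -> list[str]:
    # L-INS-i: local pairwise alignment information plus up to 1000
    # iterative refinement cycles; MAFFT writes the alignment to stdout.
    return ["mafft", "--localpair", "--maxiterate", "1000", str(fasta)]


def gblocks_cmd(alignment: Path, n_seqs: int) -> list[str]:
    # Assumption: "at least 50% of sequences" read as the smallest b1/b2
    # that Gblocks accepts (half the sequences + 1) and b5=h (gaps in half).
    half_plus_one = n_seqs // 2 + 1
    return [
        "Gblocks", str(alignment), "-t=p",
        f"-b1={half_plus_one}", f"-b2={half_plus_one}", "-b5=h",
    ]


def raxml_cmd(alignment: Path, run_name: str, seed: int = 12345,
              bootstraps: int = 100) -> list[str]:
    # The RAxML v8 command given in the methods, as an argument list.
    return [
        "raxmlHPC-PTHREADS-AVX",
        "-s", str(alignment),   # trimmed input alignment
        "-n", run_name,         # suffix appended to output file names
        "-m", "PROTGAMMALG",    # LG matrix with gamma rate heterogeneity
        "-p", str(seed),        # seed for the parsimony starting tree
        "-x", str(seed),        # seed for rapid bootstrapping
        "-#", str(bootstraps),  # number of bootstrap replicates
        "-f", "a",              # rapid bootstrap + best-scoring ML tree
    ]


def align_family(fasta: Path, out: Path) -> None:
    # Run MAFFT and redirect its stdout into the output alignment file.
    with out.open("w") as handle:
        subprocess.run(mafft_linsi_cmd(fasta), stdout=handle, check=True)
```

The Pthreads build of RAxML normally also takes a thread count via -T; the methods do not state one, so it is left out here.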
Coevolution of CLP proteins and candidate CLP-interacting proteins
To search for evidence of coevolution among our proteins of interest, pairwise ERC analyses (58) were performed with 20 angiosperm species from a previously published dataset (59). p-Values were corrected for multiple testing using the false discovery rate (FDR) method (84). The ERC network diagram was generated in R with igraph (85).
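Conceptually, ERC correlates the relative evolutionary rates of two proteins across the branches of a shared species tree, and the resulting p-values are then FDR-corrected. The sketch below is a simplified illustration rather than the published ERC pipeline (58, 59): it assumes branch lengths have already been mapped onto the same species-tree branches and normalized, that correlation p-values are computed elsewhere, and it uses the Benjamini-Hochberg procedure as an assumed stand-in for the FDR method of (84).

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def erc(branch_rates_a, branch_rates_b):
    """Toy ERC: correlate per-branch relative rates of two proteins.

    Inputs are rates on the same species-tree branches, assumed to be
    normalized already (e.g., to a genome-wide average rate).
    """
    return pearson(branch_rates_a, branch_rates_b)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Protein pairs whose adjusted p-values fall below the chosen FDR threshold would then form the edges of an ERC network such as the one drawn with igraph.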
Arabidopsis protein names and identifiers
CLPR1 - AT1G49970; CLPR2 - AT1G12410; CLPR3 - AT1G09130; CLPR4 - AT4G17040; CLPP3 - AT1G66670; CLPP4 - AT5G45390; CLPP5 - AT1G02560; CLPP6 - AT1G11750; CLPD - AT5G51070; CLPS - AT1G68660; CLPC1 - AT5G50920; CLPC2 - AT3G48870; CLPT1 - AT4G25370; CLPT2 - AT4G12060; CLPF - AT2G03390; ARM - AT1G23180; DUF179-1 - AT1G33780; DUF179-2 - AT3G19780; DUF179-3 - AT3G29240; DUF3143 - AT5G52960; DUF760-1 - AT1G32160; DUF760-3 - AT1G63610; DUF760-7 - AT5G14970; EXE1 - AT4G33630; EXE2 - AT1G27510; DUF760-6 - AT3G17800; DUF760-4 - AT2G14910; DUF760-2 - AT1G48450; HUGZ-1 - AT5G24060; HUGZ-2 - AT3G49140.
Data availability
The MS data have been deposited to the PRIDE Archive (http://www.ebi.ac.uk/pride/archive/) via the PRIDE partner repository with the dataset identifier PXD017400. Matched posttranslational modifications included in the Mascot searches, limited information about MS-based identification results (peptide, ion score), and annotation of protein name, location, and function for the identified proteins can be found in the PPDB (http://ppdb.tc.cornell.edu/). The RAW files from PXD017400 were also processed as part of the Arabidopsis PeptideAtlas project and are available at http://www.peptideatlas.org/builds/arabidopsis/ (14). These PeptideAtlas data were explored in this paper and compared with Arabidopsis proteome datasets from other ProteomeXchange PXDs processed for the PeptideAtlas.
Supporting information
This article contains supporting information.
Conflict of interest
The authors declare that they have no conflicts of interest with the contents of this article.
Acknowledgments
Author contributions
K. J. v. W. conceptualization; G. F., E. S. F., A. M. W., E. J. S. M., S. S. B., L. P., and D. B. S. formal analysis; J.-Y. R. L. and G. F. investigation; K. J. v. W. writing – original draft; K. J. v. W. project administration.
Funding and additional information
This research was supported by grants from the National Science Foundation (MCB 1940961 to K. J. v. W., MCB-1733227 to D. B. S., and IOS-2114641 to D. B. S. and E. S. F.). A. M. W. was supported by graduate fellowships from the National Science Foundation (DGE-1321845) and the National Institutes of Health (T32-GM132057). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Edited by Gerald Hart
Footnotes
Present address for Jui-Yun Rei Liao: National Institutes of Health (NIH), Bethesda, MD 20892-4256, USA.
Present address for Elena J. S. Michel: Boyce Thompson Institute, Ithaca, New York, USA.
References
1. Jarvis P., Lopez-Juez E. Biogenesis and homeostasis of chloroplasts and other plastids. Nat. Rev. Mol. Cell Biol. 2013;14:787–802. doi: 10.1038/nrm3702.
2. van Wijk K.J. Protein maturation and proteolysis in plant plastids, mitochondria, and peroxisomes. Annu. Rev. Plant Biol. 2015;66:75–111. doi: 10.1146/annurev-arplant-043014-115547.
3. Nishimura K., Kato Y., Sakamoto W. Essentials of proteolytic machineries in chloroplasts. Mol. Plant. 2017;10:4–19. doi: 10.1016/j.molp.2016.08.005.
4. Izumi M., Nakamura S. Chloroplast protein turnover: The influence of extraplastidic processes, including autophagy. Int. J. Mol. Sci. 2018;19:828. doi: 10.3390/ijms19030828.
5. Bouchnak I., van Wijk K.J. Structure, function, and substrates of Clp AAA+ protease systems in cyanobacteria, plastids, and apicoplasts: A comparative analysis. J. Biol. Chem. 2021;296:100338. doi: 10.1016/j.jbc.2021.100338.
6. Rodriguez-Concepcion M., D'Andrea L., Pulido P. Control of plastidial metabolism by the Clp protease complex. J. Exp. Bot. 2019;70:2049–2058. doi: 10.1093/jxb/ery441.
7. Nishimura K., van Wijk K.J. Organization, function and substrates of the essential Clp protease system in plastids. Biochim. Biophys. Acta. 2015;1847:915–930. doi: 10.1016/j.bbabio.2014.11.012.
8. Williams A.M., Friso G., van Wijk K.J., Sloan D.B. Extreme variation in rates of evolution in the plastid Clp protease complex. Plant J. 2019;98:243–259. doi: 10.1111/tpj.14208.
9. Seraphim T.V., Houry W.A. AAA+ proteins. Curr. Biol. 2020;30:R251–R257. doi: 10.1016/j.cub.2020.01.044.
10. Puchades C., Sandate C.R., Lander G.C. The molecular principles governing the activity and functional diversity of AAA+ proteins. Nat. Rev. Mol. Cell Biol. 2020;21:43–58. doi: 10.1038/s41580-019-0183-6.
11. Montandon C., Friso G., Liao J.R., Choi J., van Wijk K.J. In vivo trapping of proteins interacting with the chloroplast CLPC1 chaperone: Potential substrates and adaptors. J. Proteome Res. 2019;18:2585–2600. doi: 10.1021/acs.jproteome.9b00112.
12. Rei Liao J.Y., van Wijk K.J. Discovery of AAA+ protease substrates through trapping approaches. Trends Biochem. Sci. 2019;44:528–545. doi: 10.1016/j.tibs.2018.12.006.
13. Bhuiyan N.H., Rowland E., Friso G., Ponnala L., Michel E.J.S., van Wijk K.J. Autocatalytic processing and substrate specificity of Arabidopsis chloroplast glutamyl peptidase. Plant Physiol. 2020;184:110–129. doi: 10.1104/pp.20.00752.
14. van Wijk K.J., Leppert T., Sun Q., Boguraev S.S., Sun Z., Mendoza L., Deutsch E.W. The Arabidopsis PeptideAtlas: Harnessing worldwide proteomics data to create a comprehensive community proteomics resource. Plant Cell. 2021;33:3421–3453. doi: 10.1093/plcell/koab211.
15. Huang M., Friso G., Nishimura K., Qu X., Olinares P.D., Majeran W., Sun Q., van Wijk K.J. Construction of plastid reference proteomes for maize and Arabidopsis and evaluation of their orthologous relationships; the concept of orthoproteomics. J. Proteome Res. 2013;12:491–504. doi: 10.1021/pr300952g.
16. Majeran W., Friso G., Asakura Y., Qu X., Huang M., Ponnala L., Watkins K.P., Barkan A., van Wijk K.J. Nucleoid-enriched proteomes in developing plastids and chloroplasts from maize leaves: A new conceptual framework for nucleoid functions. Plant Physiol. 2012;158:156–189. doi: 10.1104/pp.111.188474.
17. Williams-Carrier R., Zoschke R., Belcher S., Pfalz J., Barkan A. A major role for the plastid-encoded RNA polymerase complex in the expression of plastid transfer RNAs. Plant Physiol. 2014;164:239–248. doi: 10.1104/pp.113.228726.
18. Pfalz J., Liere K., Kandlbinder A., Dietz K.J., Oelmuller R. pTAC2, -6, and -12 are components of the transcriptionally active plastid chromosome that are required for plastid gene expression. Plant Cell. 2006;18:176–197. doi: 10.1105/tpc.105.036392.
19. Zhou W., Lu Q., Li Q., Wang L., Ding S., Zhang A., Wen X., Zhang L., Lu C. PPR-SMR protein SOT1 has RNA endonuclease activity. Proc. Natl. Acad. Sci. U. S. A. 2017;114:E1554–E1563. doi: 10.1073/pnas.1612460114.
20. Wu W., Liu S., Ruwe H., Zhang D., Melonek J., Zhu Y., Hu X., Gusewski S., Yin P., Small I.D., Howell K.A., Huang J. SOT1, a pentatricopeptide repeat protein with a small MutS-related domain, is required for correct processing of plastid 23S-4.5S rRNA precursors in Arabidopsis thaliana. Plant J. 2016;85:607–621. doi: 10.1111/tpj.13126.
21. Zoschke R., Watkins K.P., Miranda R.G., Barkan A. The PPR-SMR protein PPR53 enhances the stability and translation of specific chloroplast RNAs in maize. Plant J. 2016;85:594–606. doi: 10.1111/tpj.13093.
22. Lee K.H., Park J., Williams D.S., Xiong Y., Hwang I., Kang B.H. Defective chloroplast development inhibits maintenance of normal levels of abscisic acid in a mutant of the Arabidopsis RH3 DEAD-box protein during early post-germination growth. Plant J. 2013;73:720–732. doi: 10.1111/tpj.12055.
23. Asakura Y., Galarneau E.R., Watkins K.P., Barkan A., van Wijk K.J. Chloroplast RH3 DEAD-box RNA helicases in Zea mays and Arabidopsis thaliana function in splicing of specific group II introns and affect chloroplast ribosome biogenesis. Plant Physiol. 2012;159:961–974. doi: 10.1104/pp.112.197525.
24. Nishimura K., Ashida H., Ogawa T., Yokota A. A DEAD box protein is required for formation of a hidden break in Arabidopsis chloroplast 23S rRNA. Plant J. 2010;63:766–777. doi: 10.1111/j.1365-313X.2010.04276.x.
25. Wall M.K., Mitchenall L.A., Maxwell A. Arabidopsis thaliana DNA gyrase is targeted to chloroplasts and mitochondria. Proc. Natl. Acad. Sci. U. S. A. 2004;101:7821–7826. doi: 10.1073/pnas.0400836101.
26. Evans-Roberts K.M., Mitchenall L.A., Wall M.K., Leroux J., Mylne J.S., Maxwell A. DNA gyrase is the target for the quinolone drug ciprofloxacin in Arabidopsis thaliana. J. Biol. Chem. 2016;291:3136–3144. doi: 10.1074/jbc.M115.689554.
27. Diray-Arce J., Liu B., Cupp J.D., Hunt T., Nielsen B.L. The Arabidopsis At1g30680 gene encodes a homologue to the phage T7 gp4 protein that has both DNA primase and DNA helicase activities. BMC Plant Biol. 2013;13:36. doi: 10.1186/1471-2229-13-36.
28. Morley S.A., Peralta-Castro A., Brieba L.G., Miller J., Ong K.L., Ridge P.G., Oliphant A., Aldous S., Nielsen B.L. Arabidopsis thaliana organelles mimic the T7 phage DNA replisome with specific interactions between Twinkle protein and DNA polymerases Pol1A and Pol1B. BMC Plant Biol. 2019;19:241. doi: 10.1186/s12870-019-1854-3.
29. Yagi Y., Ishizaki Y., Nakahira Y., Tozawa Y., Shiina T. Eukaryotic-type plastid nucleoid protein pTAC3 is essential for transcription by the bacterial-type plastid RNA polymerase. Proc. Natl. Acad. Sci. U. S. A. 2012;109:7541–7546. doi: 10.1073/pnas.1119403109.
30. Chang S.H., Lee S., Um T.Y., Kim J.K., Do Choi Y., Jang G. pTAC10, a key subunit of plastid-encoded RNA polymerase, promotes chloroplast development. Plant Physiol. 2017;174:435–449. doi: 10.1104/pp.17.00248.
31. Bocobza S.E., Malitsky S., Araujo W.L., Nunes-Nesi A., Meir S., Shapira M., Fernie A.R., Aharoni A. Orchestration of thiamin biosynthesis and central metabolism by combined action of the thiamin pyrophosphate riboswitch and the circadian clock in Arabidopsis. Plant Cell. 2013;25:288–307. doi: 10.1105/tpc.112.106385.
32. Coquille S., Roux C., Mehta A., Begley T.P., Fitzpatrick T.B., Thore S. High-resolution crystal structure of the eukaryotic HMP-P synthase (THIC) from Arabidopsis thaliana. J. Struct. Biol. 2013;184:438–444. doi: 10.1016/j.jsb.2013.10.005.
33. Sa N., Rawat R., Thornburg C., Walker K.D., Roje S. Identification and characterization of the missing phosphatase on the riboflavin biosynthesis pathway in Arabidopsis thaliana. Plant J. 2016;88:705–716. doi: 10.1111/tpj.13291.
34. Richter A.S., Banse C., Grimm B. The GluTR-binding protein is the heme-binding factor for feedback control of glutamyl-tRNA reductase. Elife. 2019;8. doi: 10.7554/eLife.46300.
35. Apitz J., Nishimura K., Schmied J., Wolf A., Hedtke B., van Wijk K.J., Grimm B. Posttranslational control of ALA synthesis includes GluTR degradation by Clp protease and stabilization by GluTR-binding protein. Plant Physiol. 2016;170:2040–2051. doi: 10.1104/pp.15.01945.
36. Ikegami A., Yoshimura N., Motohashi K., Takahashi S., Romano P.G., Hisabori T., Takamiya K., Masuda T. The CHLI1 subunit of Arabidopsis thaliana magnesium chelatase is a target protein of the chloroplast thioredoxin. J. Biol. Chem. 2007;282:19282–19291. doi: 10.1074/jbc.M703324200.
37. Huang Y.S., Li H.M. Arabidopsis CHLI2 can substitute for CHLI1. Plant Physiol. 2009;150:636–645. doi: 10.1104/pp.109.135368.
38. Nishimura K., Apitz J., Friso G., Kim J., Ponnala L., Grimm B., van Wijk K.J. Discovery of a unique Clp component, ClpF, in chloroplasts: A proposed binary ClpF-ClpS1 adaptor complex functions in substrate recognition and delivery. Plant Cell. 2015;27:2677–2691. doi: 10.1105/tpc.15.00574.
39. Michel E.J.S., Ponnala L., van Wijk K.J. Tissue-type specific accumulation of the plastoglobular proteome, transcriptional networks, and plastoglobular functions. J. Exp. Bot. 2021;72:4663–4679. doi: 10.1093/jxb/erab175.
40. Trosch R., Jarvis P. The stromal processing peptidase of chloroplasts is essential in Arabidopsis, with knockout mutations causing embryo arrest after the 16-cell stage. PLoS One. 2011;6. doi: 10.1371/journal.pone.0023039.
41. Richter S., Lamppa G.K. Structural properties of the chloroplast stromal processing peptidase required for its function in transit peptide removal. J. Biol. Chem. 2003;278:39497–39502. doi: 10.1074/jbc.M305729200.
42. Kim J., Olinares P.D., Oh S.H., Ghisaura S., Poliakov A., Ponnala L., van Wijk K.J. Modified Clp protease complex in the ClpP3 null mutant and consequences for chloroplast development and function in Arabidopsis. Plant Physiol. 2013;162:157–179. doi: 10.1104/pp.113.215699.
43. Kim J., Kimber M.S., Nishimura K., Friso G., Schultz L., Ponnala L., van Wijk K.J. Structures, functions, and interactions of ClpT1 and ClpT2 in the Clp protease system of Arabidopsis chloroplasts. Plant Cell. 2015;27:1477–1496. doi: 10.1105/tpc.15.00106.
44. Tsitsekian D., Daras G., Alatzas A., Templalexis D., Hatzopoulos P., Rigas S. Comprehensive analysis of Lon proteases in plants highlights independent gene duplication events. J. Exp. Bot. 2019;70:2185–2197. doi: 10.1093/jxb/ery440.
45. Shin J.S., Kim S.Y., So W.M., Noh M., Yoo K.S., Shin J.S. Lon domain-containing protein 1 represses thioredoxin y2 and regulates ROS levels in Arabidopsis chloroplasts. FEBS Lett. 2020;594:986–994. doi: 10.1002/1873-3468.13664.
46. Hayer-Hartl M., Hartl F.U. Chaperone machineries of Rubisco - the most abundant enzyme. Trends Biochem. Sci. 2020;45:748–763. doi: 10.1016/j.tibs.2020.05.001.
47. Vitlin Gruber A., Zizelski G., Azem A., Weiss C. The Cpn10(1) co-chaperonin of A. thaliana functions only as a hetero-oligomer with Cpn20. PLoS One. 2014;9. doi: 10.1371/journal.pone.0113835.
48. Wang N., Wang Y., Zhao Q., Zhang X., Peng C., Zhang W., Liu Y., Vallon O., Schroda M., Cong Y., Liu C. The cryo-EM structure of the chloroplast ClpP complex reveals an interaction with the co-chaperonin complex that inhibits ClpP proteolytic activity. bioRxiv. 2021 (preprint). doi: 10.1101/2021.07.26.453741.
49. Dogra V., Li M., Singh S., Li M., Kim C. Oxidative post-translational modification of EXECUTER1 is required for singlet oxygen sensing in plastids. Nat. Commun. 2019;10:2834. doi: 10.1038/s41467-019-10760-6.
50. Dogra V., Rochaix J.D., Kim C. Singlet oxygen-triggered chloroplast-to-nucleus retrograde signalling pathways: An emerging perspective. Plant Cell Environ. 2018;41:1727–1738. doi: 10.1111/pce.13332.
51. Dogra V., Duan J., Lee K.P., Lv S., Liu R., Kim C. FtsH2-Dependent proteolysis of EXECUTER1 is essential in mediating singlet oxygen-triggered retrograde signaling in Arabidopsis thaliana. Front. Plant Sci. 2017;8:1145. doi: 10.3389/fpls.2017.01145.
52. Sharma M., Pandey G.K. Expansion and function of repeat domain proteins during stress and development in plants. Front. Plant Sci. 2015;6:1218. doi: 10.3389/fpls.2015.01218.
53. Moody L.A., Saidi Y., Gibbs D.J., Choudhary A., Holloway D., Vesty E.F., Bansal K.K., Bradshaw S.J., Coates J.C. An ancient and conserved function for Armadillo-related proteins in the control of spore and seed germination by abscisic acid. New Phytol. 2016;211:940–951. doi: 10.1111/nph.13938.
54. Nibau C., Gibbs D.J., Bunting K.A., Moody L.A., Smiles E.J., Tubby J.A., Bradshaw S.J., Coates J.C. ARABIDILLO proteins have a novel and conserved domain structure important for the regulation of their stability. Plant Mol. Biol. 2011;75:77–92. doi: 10.1007/s11103-010-9709-1.
55. Moody L.A., Saidi Y., Smiles E.J., Bradshaw S.J., Meddings M., Winn P.J., Coates J.C. ARABIDILLO gene homologues in basal land plants: Species-specific gene duplication and likely functional redundancy. Planta. 2012;236:1927–1941. doi: 10.1007/s00425-012-1742-7.
56. Woodson J.D., Joens M.S., Sinson A.B., Gilkerson J., Salome P.A., Weigel D., Fitzpatrick J.A., Chory J. Ubiquitin facilitates a quality-control pathway that removes damaged chloroplasts. Science. 2015;350:450–454. doi: 10.1126/science.aac7444.
57. Nishimura K., Asakura Y., Friso G., Kim J., Oh S.H., Rutschow H., Ponnala L., van Wijk K.J. ClpS1 is a conserved substrate selector for the chloroplast Clp protease system in Arabidopsis. Plant Cell. 2013;25:2276–2301. doi: 10.1105/tpc.113.112557.
- 58.Clark N.L., Alani E., Aquadro C.F. Evolutionary rate covariation reveals shared functionality and coexpression of genes. Genome Res. 2012;22:714–720. doi: 10.1101/gr.132647.111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Forsythe E.S., Williams A.M., Sloan D.B. Genome-wide signatures of plastid-nuclear coevolution point to repeated perturbations of plastid proteostasis systems across angiosperms. Plant Cell. 2021;33:980–997. doi: 10.1093/plcell/koab021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.de Juan D., Pazos F., Valencia A. Emerging methods in protein co-evolution. Nat. Rev. Genet. 2013;14:249–261. doi: 10.1038/nrg3414. [DOI] [PubMed] [Google Scholar]
- 61.Powers E.T., Balch W.E. Diversity in the origins of proteostasis networks--a driver for protein function in evolution. Nat. Rev. Mol. Cell Biol. 2013;14:237–248. doi: 10.1038/nrm3542. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Liao J.-Y.R., Friso G., Kim J., van Wijk K.J. Consequences of the loss of catalytic triads in chloroplast CLPPR protease core complexes in vivo. Plant Direct. 2018;2 doi: 10.1002/pld3.86. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Rockenbach K., Havird J.C., Monroe J.G., Triant D.A., Taylor D.R., Sloan D.B. Positive selection in rapidly evolving plastid-nuclear enzyme complexes. Genetics. 2016;204:1507–1522. doi: 10.1534/genetics.116.188268. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Obayashi T., Aoki Y., Tadaka S., Kagaya Y., Kinoshita K. ATTED-II in 2018: A plant coexpression database based on investigation of the statistical property of the mutual rank index. Plant Cell Physiol. 2018;59:440. doi: 10.1093/pcp/pcx209. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Bhuiyan N.H., Friso G., Rowland E., Majsec K., van Wijk K.J. The plastoglobule-localized metallopeptidase PGM48 is a positive regulator of senescence in Arabidopsis thaliana. Plant Cell. 2016;28:3020–3037. doi: 10.1105/tpc.16.00745. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.van Wijk K.J., Kessler F. Plastoglobuli: Plastid microcompartments with integrated functions in metabolism, plastid developmental transitions, and environmental adaptation. Annu. Rev. Plant Biol. 2017;68:253–289. doi: 10.1146/annurev-arplant-043015-111737. [DOI] [PubMed] [Google Scholar]
- 67.Schelbert S., Aubry S., Burla B., Agne B., Kessler F., Krupinska K., Hortensteiner S. Pheophytin pheophorbide hydrolase (pheophytinase) is involved in chlorophyll breakdown during leaf senescence in Arabidopsis. Plant Cell. 2009;21:767–785. doi: 10.1105/tpc.108.064089. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Aguilar Lucero D., Cantoia A., Sanchez-Lopez C., Binolfi A., Mogk A., Ceccarelli E.A., Rosano G.L. Structural features of the plant N-recognin ClpS1 and sequence determinants in its targets that govern substrate selection. FEBS Lett. 2021;595:1525–1541. doi: 10.1002/1873-3468.14081. [DOI] [PubMed] [Google Scholar]
- 69.Yeom J., Groisman E.A. Activator of one protease transforms into inhibitor of another in response to nutritional signals. Genes Dev. 2019;33:1280–1292. doi: 10.1101/gad.325241.119.
- 70.Carroni M., Franke K.B., Maurer M., Jäger J., Hantke I., Gloge F., Linder D., Gremer S., Turgay K., Bukau B., Mogk A. Regulatory coiled-coil domains promote head-to-head assemblies of AAA+ chaperones essential for tunable activity control. Elife. 2017;6 doi: 10.7554/eLife.30120.
- 71.Kuhlmann N.J., Doxsey D., Chien P. Cargo competition for a dimerization interface restricts and stabilizes a bacterial protease adaptor. Proc. Natl. Acad. Sci. U. S. A. 2021;118 doi: 10.1073/pnas.2010523118.
- 72.Schmidt T.G., Skerra A. The Strep-tag system for one-step purification and high-affinity detection or capturing of proteins. Nat. Protoc. 2007;2:1528–1535. doi: 10.1038/nprot.2007.209.
- 73.Friso G., Olinares P.D.B., van Wijk K.J. In: Chloroplast Research in Arabidopsis. Jarvis R.P., editor. Humana Press; New York, NY: 2011. The workflow for quantitative proteome analysis of chloroplast development and differentiation, chloroplast mutants, and protein interactions by spectral counting; pp. 265–282.
- 74.Emms D.M., Kelly S. OrthoFinder: Solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 2015;16:157. doi: 10.1186/s13059-015-0721-2.
- 75.Katoh K., Standley D.M. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 2013;30:772–780. doi: 10.1093/molbev/mst010.
- 76.Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol. Biol. Evol. 2000;17:540–552. doi: 10.1093/oxfordjournals.molbev.a026334.
- 77.Stamatakis A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–1313. doi: 10.1093/bioinformatics/btu033.
- 78.Vernot B., Stolzer M., Goldman A., Durand D. Reconciliation with non-binary species trees. J. Comput. Biol. 2008;15:981–1006. doi: 10.1089/cmb.2008.0092.
- 79.Stolzer M., Lai H., Xu M., Sathaye D., Vernot B., Durand D. Inferring duplications, losses, transfers and incomplete lineage sorting with nonbinary species trees. Bioinformatics. 2012;28:i409–i415. doi: 10.1093/bioinformatics/bts386.
- 80.Moreira D., Le Guyader H., Philippe H. The origin of red algae and the evolution of chloroplasts. Nature. 2000;405:69–72. doi: 10.1038/35011054.
- 81.One Thousand Plant Transcriptomes Initiative. One thousand plant transcriptomes and the phylogenomics of green plants. Nature. 2019;574:679–685. doi: 10.1038/s41586-019-1693-2.
- 82.Marchler-Bauer A., Derbyshire M.K., Gonzales N.R., Lu S., Chitsaz F., Geer L.Y., Geer R.C., He J., Gwadz M., Hurwitz D.I., Lanczycki C.J., Lu F., Marchler G.H., Song J.S., Thanki N., et al. CDD: NCBI's conserved domain database. Nucleic Acids Res. 2015;43:D222–D226. doi: 10.1093/nar/gku1221.
- 83.Yu G., Smith D.K., Zhu H., Guan Y., Lam T.T.-Y. ggtree: An R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 2017;8:28–36.
- 84.Benjamini Y., Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Royal Stat. Soc. Ser. B. 1995;57:289–300.
- 85.Csardi G., Nepusz T. The igraph software package for complex network research. InterJournal, Complex Systems. 2006;1695:1–9.
Data Availability Statement
The MS data have been deposited to the PRIDE Archive (http://www.ebi.ac.uk/pride/archive/) via the PRIDE partner repository and are available with the dataset identifier PXD017400. Matched posttranslational modifications included in the Mascot searches, limited MS-based identification results (peptide, ion score), and annotations of protein name, location, and function for the identified proteins are available in the PPDB (http://ppdb.tc.cornell.edu/). The RAW files from PXD017400 were also processed as part of the Arabidopsis PeptideAtlas project and are available at http://www.peptideatlas.org/builds/arabidopsis/ (14). In this paper, these PeptideAtlas data are explored and compared with Arabidopsis proteome datasets from other PXDs processed from ProteomeXchange.