Proceedings of the National Academy of Sciences of the United States of America. 2014 Oct 1;111(41):14758–14763. doi: 10.1073/pnas.1409345111

Structure of a PE–PPE–EspG complex from Mycobacterium tuberculosis reveals molecular specificity of ESX protein secretion

Damian C Ekiert, Jeffery S Cox
PMCID: PMC4205667  PMID: 25275011

Significance

Mycobacterium tuberculosis (Mtb) infects nearly a third of the global population, and understanding how Mtb establishes infection and evades host responses is key to development of improved therapies. Two mysterious protein families, called Pro-Glu motif–containing (PE) and Pro-Pro-Glu motif–containing (PPE) proteins, are highly expanded in Mtb and have been linked to virulence, but their function remains unknown. We have determined the crystal structure of a PE-PPE protein dimer bound to ESAT-6 secretion system (ESX) secretion-associated protein G (EspG), a component of the secretion system that translocates PE-PPE proteins to the bacterial cell surface. This structure reveals how each of the four EspGs in Mtb interacts with a different subset of the ∼100 PE and ∼70 PPE proteins, directing specific classes of PE-PPE “effector” proteins through separate secretory pathways.

Keywords: tuberculosis, protein secretion, antigenic variation, virulence factor, host–pathogen interactions

Abstract

Nearly 10% of the coding capacity of the Mycobacterium tuberculosis genome is devoted to two highly expanded and enigmatic protein families called PE and PPE, some of which are important virulence/immunogenicity factors and are secreted during infection via a unique alternative secretory system termed “type VII.” How PE-PPE proteins function during infection and how they are translocated to the bacterial surface through the five distinct type VII secretion systems [ESAT-6 secretion system (ESX)] of M. tuberculosis is poorly understood. Here, we report the crystal structure of a PE-PPE heterodimer bound to ESX secretion-associated protein G (EspG), which adopts a novel fold. This PE-PPE-EspG complex, along with structures of two additional EspGs, suggests that EspG acts as an adaptor that recognizes specific PE–PPE protein complexes via extensive interactions with PPE domains, and delivers them to ESX machinery for secretion. Surprisingly, secretion of most PE-PPE proteins in M. tuberculosis is likely mediated by EspG from the ESX-5 system, underscoring the importance of ESX-5 in mycobacterial pathogenesis. Moreover, our results indicate that PE-PPE domains function as cis-acting targeting sequences that are read out by EspGs, revealing the molecular specificity for secretion through distinct ESX pathways.


Tuberculosis is a major public health challenge, and new interventions are needed to control emerging, highly drug-resistant strains (1). Sequencing of the Mycobacterium tuberculosis (Mtb) genome revealed the presence of two mysterious, highly expanded protein families in pathogenic mycobacteria (2), named PE and PPE, due to the presence of N-terminal domains with conserved Pro-Glu (PE) and Pro-Pro-Glu (PPE) sequence motifs. The Mtb genome encodes ∼100 PE and ∼70 PPE genes, accounting for ∼10% of the genome’s coding capacity (2), whereas nonpathogenic mycobacteria harbor relatively few PE and PPE genes. Both protein families are highly polymorphic, localize to the cell surface or are secreted (3, 4), and are expressed during infection (5), leading to hypotheses that they are involved in virulence, antigenic diversity, or immune evasion (6). Although many PE-PPE proteins are recognized by the immune system during infection (7), it remains unclear whether they are involved in antigenic variation (8). Outside the N-terminal core PE or PPE domain, the C-terminal segments vary widely (2, 9), sometimes encoding putative enzymatic domains or large peptide repeat arrays (several greater than 1,000 aa in length). PE and PPE genes often form operons, suggesting PE and PPE proteins interact with each other, and the crystal structure of a PE–PPE protein complex showed directly that they form a heterodimer (10). However, the PE and PPE domains have no detectable sequence or structural homology to other protein families, and their function remains unknown.

Intriguingly, secretion of PE and PPE proteins has been linked to a set of related, specialized protein export pathways of mycobacteria called the 6-kDa early secreted antigenic target (ESAT-6) secretion system (ESX) (11, 12). “Type VII” secretion systems distantly related to mycobacterial ESX systems have been identified in numerous other Gram-positive bacteria (13–17). The protein components of type VII secretion systems differ considerably between bacterial species but generally include the following: (i) one or more small helical proteins of the WXG100 protein family (e.g., ESAT-6, YukE), (ii) an FtsK/SpoIII-type ATPase that is thought to drive protein secretion (e.g., EccC, YukB), and (iii) a multipass transmembrane protein that may form the pore of the translocon (e.g., EccD, YueB). Along with additional species/lineage-specific factors, such as the mycobacterial PE and PPE proteins, these components have been proposed to assemble in the plasma membrane, where they translocate specific protein substrates to the cell surface (or to the periplasmic space of mycobacteria) (figure 4B of ref. 18). The Mtb genome encodes five distinct but evolutionarily related type VII/ESX systems (ESX-1 to ESX-5) at different loci around the genome, and the primary attenuating mutation in the Mycobacterium bovis bacillus Calmette–Guérin vaccine strain is a deletion of a large segment of the ESX-1 locus (19). Each ESX system is thought to secrete a distinct complement of proteins, including the cognate ESAT-6 and 10-kDa culture filtrate protein (CFP-10) homologs encoded in each specific locus (19, 20). All ESX gene clusters, with the exception of ESX-4, also encode at least one PE-PPE gene pair, and all of the PE-PPE proteins tested so far are secreted in an ESX-dependent manner (11, 21), suggesting that PE-PPE secretion may be an important function of the ESX, although many PE-PPE proteins are encoded outside of ESX loci.
Recently, PPE proteins were reported to interact with ESX secretion-associated protein G (EspG) (22), another ESX-encoded protein of unknown structure and function. However, the molecular basis for this interaction and its functional importance in PPE secretion remain unclear.

Here, we report the crystal structures of three EspGs from Mtb and Mycobacterium smegmatis (Msmeg), and a ternary complex between an Mtb PE-PPE pair and its cognate EspG. These structures define the key elements of the PPE–EspG interaction and, coupled with bioinformatics and biochemical interaction studies, suggest that EspGs function as adaptors to deliver PE–PPE complexes to their cognate ESX system for translocation across the plasma membrane. Moreover, our work shows that the vast majority of PE-PPE proteins in Mtb interact with EspG from the ESX-5 secretion system, which has been hypothesized to play a major role in PE-PPE protein secretion (9, 20, 23).

Results and Discussion

Structure of the EspG Encoded in the ESX-3 Gene Cluster of Mtb Defines a Novel Protein Fold.

To better understand the role of EspG in PE-PPE protein secretion, we determined the crystal structure of the EspG paralog from the ESX-3 system (EspG3) of Mtb (EspG3Mt) at a resolution of 2.85 Å (SI Appendix, Table S1). Because no structural homologs of EspG could be identified based upon the primary sequence, crystals of Msmeg EspG3 (EspG3Ms) derivatized with selenomethionine were phased by multiwavelength anomalous dispersion (MAD) (24) (SI Appendix, Fig. S1 and Table S2), and the resulting model was used to phase the EspG3Mt dataset by molecular replacement. Here, we focus on the structure of EspG3Mt and will return to discuss EspG3Ms subsequently.

EspG3Mt adopts a novel, mixed α/β-fold consisting of a 10-stranded β-sheet and eight α-helices (Fig. 1A). The large sheet is kinked in the middle of the eighth β-strand (β3′) by a nonclassical bulge due to a two-residue insertion, resulting in a slight bend of the sheet. The sheet is flanked on each end by a three-helix bundle (α1–α3 and α1′–α3′; Fig. 1A), and two additional helices (α4 and α4′) pack against the back face of the β-sheet. In contrast, a large portion of the opposite face of the sheet is solvent-exposed, forming the bottom of a shallow basin lined by nearby loops and helices. The C-terminal 18 residues, including several large hydrophobic residues, are largely disordered in the electron density map but may serve as an ideal docking site for EspG onto the larger ESX apparatus anchored in the plasma membrane.

Fig. 1.


Crystal structure of EspG3Mt reveals a novel fold. (A) Overall fold and secondary structure assignment of EspG3Mt. The β-strands and α-helices of the N-terminal subdomain are denoted β1–β5 and α1–α4, whereas the structurally equivalent elements from the C-terminal subdomain are labeled β1′–β5′ and α1′–α4′. (B) EspG3Mt exhibits pseudo-twofold rotational symmetry about an axis perpendicular to the center of the β-sheet. The two pseudosymmetrical subdomains of EspG are colored differently along the axis of pseudosymmetry. There is structural similarity between the N-terminal and C-terminal EspG subdomains (C) and the most closely related protein structure identified in the Protein Data Bank in a Dali Database search (D; PDB ID code 1TU1, a protein of unknown function from Pseudomonas aeruginosa). Cartoon depictions (Left) and topology diagrams (Right) are included. Structurally equivalent elements are depicted in the same color.

Intriguingly, the overall EspG fold is approximately twofold symmetrical, with each half containing five strands and four helices (Fig. 1 B and C). The two halves of EspG have the same topology and connectivity, despite little detectable sequence similarity between the two halves. Thus, the extant EspG fold may have arisen from the duplication and fusion of an ancestral subdomain that has been heavily diversified over the course of evolution. Indeed, whereas a search of the Dali Database (25) was unable to identify proteins with a structure similar to the complete EspG, searches using either of the two subdomains identified several distantly related proteins with a similar core topology, although they were generally missing one or more helices and had additional elements not present in EspG (Fig. 1D).

Although the two halves of EspG bear striking resemblance to one another, there are two main regions of divergence between the two subdomains. First, helix α2 from the N-terminal subdomain is rotated ∼70° from the position of helix α2′ in the C-terminal subdomain (Fig. 1C, Upper vs. Lower). In addition, the short linker connecting helices α1′ and α2′ in the C-terminal subdomain forms a much longer, V-shaped “tongue” in the N-terminal subdomain between α1 and α2, and wraps around α1 to interact with the convex surface of the β-sheet nearly 20 Å away. The second major difference between the subdomains lies in strands β2′ and β3′ of the C-terminal subdomain, which are much longer and distorted by the bulge in β3′ described above (Fig. 1C, Upper vs. Lower).

EspG Encoded in the ESX-5 Gene Cluster of Mtb Interacts Exclusively with the PPE Subunit of PE25-PPE41.

EspG has been reported to interact with PPE proteins and is required for PE-PPE secretion (22, 26). To understand how EspG recognizes specific PE–PPE complexes and mediates their secretion through the ESX, we determined the crystal structure of a ternary complex of the EspG encoded in the ESX-5 gene cluster of Mtb (EspG5Mt) bound to the PE25-PPE41 dimer at a resolution of 2.45 Å (SI Appendix, Fig. S2 and Table S1). In addition, we determined a higher resolution structure of the unbound PE25-PPE41 dimer at a resolution of 1.95 Å (SI Appendix, Table S1). Our unbound PE25-PPE41 dimer is very similar to the previously reported structure (10) (rmsd of ∼0.34 Å over 250 Cα atoms). The PE25-PPE41 dimer is entirely α-helical and forms an elongated, cigar-shaped rod ∼110 Å long and ∼20 Å thick (10) (Fig. 2A). The PPE subunit spans the full length of the molecule, whereas the smaller PE domain is more compact and binds only to one end of the PPE protein. EspG5Mt binds to the opposite end of PPE41, distal from the PE25 binding site (Fig. 2 A and B), resulting in only minor structural changes in PE25-PPE41 upon binding (SI Appendix, Fig. S3). No direct contacts are made between EspG5Mt and PE25, with ∼10 Å separating the closest pair of residues. It is noteworthy that the C termini of both the PE and PPE subunits also reside at the EspG-distal end of the PE25-PPE41 rod. Because many PE and PPE proteins encode large C-terminal domains of unknown function, binding of EspG far from their C termini may prevent steric clashes with these additional domains.

Fig. 2.


Structure of the EspG5Mt–PE25–PPE41 ternary complex. (A) Overview of the EspG5Mt–PE25–PPE41 complex, with PE25 in yellow, PPE41 in red, and EspG5Mt in purple. (B) Surface representation of the PE25-PPE41 with EspG5Mt bound. The EspG5Mt contacts on PPE41 are colored red, with key residues indicated. (C) Detailed view of the EspG5Mt contact surface on PPE41 (red), with key interacting residues labeled. The structure is rotated 90° with respect to the representation in B. (D) Contact surface between PPE41 and EspG5Mt, oriented to show the shallow basin on EspG5Mt (white surface with contact residues in purple), which binds PPE41 (red). (E) Four distinct elements on EspG5Mt that make up the PPE binding surface. EspG5Mt is oriented similarly in D and E. (F) Interaction between the long tongue of EspG5Mt (peach) and the α5 helix from PPE41 (red), which accounts for nearly half of the EspG5Mt–PPE41 interface.

The EspG–PPE interface is expansive, including 42 residues on EspG and 29 on PPE protein, with a total of 3,500 Å² of solvent-accessible surface buried upon binding (Fig. 2 C and D), which is two- to threefold the buried surface area in a typical antibody/antigen interaction (27–32). Almost the entire EspG contact surface on PPE41 is derived from a single polypeptide segment (residues 116–155), which forms a helix–turn–helix motif at one end of the PE25-PPE41 rod (Fig. 2A). The tip of this helix–turn–helix is inserted into a shallow basin formed on one face of the EspG β-sheet (Fig. 2D), but the interface extends well outside this depression, with interactions running for eight helical turns along the α5 helix of PPE41. A small number of interactions are also made between EspG5Mt and three N-terminal PPE41 residues (His2, Glu4, and Pro7), although these glancing contacts are predicted to make little contribution to specificity or the overall binding energy. On EspG5Mt, four major elements are involved in PPE recognition (Fig. 2E): (i) the convex surface of the β-sheet; (ii) helix α1′ from the C-terminal EspG subdomain; (iii) an open, extended loop between strands β2 and β3; and (iv) the long, “tongue-shaped” α1-α2 loop. The α1-α2 loop alone accounts for ∼40% of the interactions with PPE protein (Fig. 2F).
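For readers less familiar with the quantity, buried surface is conventionally computed as the solvent-accessible surface area (SASA) of the two separated partners minus that of the complex. The expression below is that standard definition, written with symbols introduced here purely for illustration; it is an assumption that the reported total follows this convention, and the probe radius and software used are not stated in this section.

```latex
\Delta A_{\mathrm{buried}}
  = A_{\mathrm{SASA}}(\mathrm{EspG})
  + A_{\mathrm{SASA}}(\mathrm{PPE})
  - A_{\mathrm{SASA}}(\mathrm{EspG{:}PPE})
  \approx 3{,}500\ \text{\AA}^{2}
```

Under this convention the total counts surface lost from both partners, so each side of the interface would contribute roughly half of the figure.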

PPE Binding Surface on EspG Differs Between Paralogs.

The Mtb genome encodes four EspG paralogs, each sharing less than ∼25% pairwise sequence identity (SI Appendix, Fig. S4), and nearly 70 highly variable PPE proteins. The crystal structure of the PE–PPE–EspG complex defines the interacting surfaces between EspG and PPE proteins, allowing for an analysis of how sequence and structural variation in these regions may be involved in determining the specificity of EspG–PPE interactions. Comparison of EspG3Mt and EspG5Mt provides clues into how specificity may be mediated. Additionally, we solved a crystal structure of EspG3Ms at a resolution of 2.80 Å (Fig. 3A and SI Appendix, Tables S1 and S2). Although EspG3Mt and EspG3Ms are orthologs, they are only ∼60% identical at the sequence level, and consequently may provide additional insight into the structurally variable and/or conserved regions of the PPE-binding surface on EspG.

Fig. 3.


Structural variation between EspGs. (A) Overlay of EspG3Mt (white) and EspG3Ms (red). The variable β2-β3 loops and α2 helices are indicated. (B) Overlay of EspG3Mt (white) and EspG5Mt (blue). (C and D) Expanded views of the β2-β3 loop (C) and α1-α2 helices (D) from EspG3Mt (white), EspG3Ms (red), and EspG5Mt (blue). In D, the location of P46, which breaks helix α2 in EspG5Mt, is indicated.

EspG3Mt adopts the same overall fold as EspG5Mt and EspG3Ms (Fig. 3B; EspG5Mt vs. EspG3Mt: rmsd of ∼2.8 Å over 231 Cα atoms; EspG3Ms vs. EspG3Mt: rmsd of ∼1.6 Å over 239 Cα atoms). However, structural deviations in two key regions significantly alter the overall shape of the PPE binding surface and may have important implications for PPE binding specificity. First, a number of residues that interact with PPE protein reside in loops of variable length and structure (SI Appendix, Fig. S4), such as the β2-β3 loop, which ranges in length from approximately seven residues in EspG3Ms and homology models of EspG2Mt to 12 residues in EspG3Mt and 26 residues in EspG5Mt (Fig. 3C). Second, the α2 helix and the α1-α2 loop differ considerably between EspG5Mt and EspG3Mt (Fig. 3D). In EspG3Mt and also EspG3Ms, α2 is approximately five turns long and would clash with a PPE protein bound in the orientation observed in the EspG5Mt–PE–PPE complex. In contrast, the α2 helix in EspG5Mt is broken after ∼2.5 turns by a Pro at position 46 (Pro46), which is strictly conserved among mycobacterial EspG5s (Fig. 3D and SI Appendix, Table S3). The Pro46 kink reconfigures the adjacent, long α1-α2 tongue from an open V-shape in the EspG3 structures to a more closed U-shape in EspG5Mt. The resulting EspG5Mt tongue cradles PPE41’s α5 helix over six helical turns (Fig. 2F), accounting for over 40% of the PPE–EspG interface. The vast majority of EspG1 and EspG3 sequences lack a proline in the vicinity of position 46, and consequently may form longer α2 helices with an α1-α2 loop structure that is more similar to the α1-α2 loop structure in EspG3Mt and EspG3Ms (SI Appendix, Table S3). In contrast, 100% of EspG2 sequences analyzed have Pro46, and the α2 helix and α1-α2 loop of EspG2 may consequently adopt a conformation similar to the conformation of EspG5Mt. 
Consistent with this prediction, the sequence of the α1-α2 loop tongue from EspG2Mt is similar to that of EspG5Mt, particularly at positions that contact PPE protein in the PE25-PPE41-EspG5Mt crystal structure (SI Appendix, Fig. S5), whereas the EspG3 tongue sequence is divergent. Taken together, variation in the PPE binding surface on EspG suggests that each EspG paralog may interact with distinct subsets of the PE-PPE repertoire.

EspG-Interacting Surface on PPE Proteins Is Conserved.

In contrast to the variability of the EspGs, sequence analysis of the ∼70 PPE proteins from Mtb Erdman reveals that the core of the EspG-interacting surface on PPE protein is surprisingly well-conserved, particularly in light of the high degree of sequence diversity among PPE proteins (mean pairwise identity of ∼47%) (SI Appendix, Fig. S6). Twenty-nine PPE residues make at least one van der Waals contact with EspG, but many of these residues make just a few interactions per residue, appear somewhat flexible, and are loosely packed at the interface. Consequently, many of these residues probably make only small individual contributions to the binding energy, although in aggregate, they may have a modest effect on binding.

Three contact residues are well-conserved across Mtb PPE proteins, despite making relatively minor interactions with EspG. However, these residues play critical roles in stabilizing the helix–turn–helix motif of the PPE tip, suggesting that the EspG-binding surface of most PPE proteins adopts a similar conformation. First, the Asn122 side chain caps the C-terminal end of PPE helix α4, and ∼91% of Mtb PPE proteins have a helix-capping residue at this position (SI Appendix, Fig. S7A). Second, the Asn123 side chain reaches across the helix–turn–helix motif and forms a hydrogen bond with the Phe128 backbone, stabilizing the conformation of the connecting loop (SI Appendix, Fig. S7B). Asn123 is highly conserved, with ∼97% of Mtb PPE proteins carrying an Asn residue at this position. Third, residue Gly126 lies at the tip of the helix–turn–helix motif and adopts a positive phi angle (SI Appendix, Fig. S7B). This region of the Ramachandran plot gives strong preference to Gly, and Gly126 is almost universally conserved (64 of 65 Mtb PPE sequences). The strong conservation of these three residues suggests that the overall conformation of the helix–turn–helix motif is similar in most Mtb PPE proteins, allowing us to model the interaction between various PPE-EspG pairs and make predictions about affinity and specificity, which we subsequently tested.
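The per-position conservation figures quoted above (for example, the fraction of sequences carrying Asn at position 123 or Gly at position 126) are the kind of statistic that can be read off a multiple sequence alignment column by column. The following is a minimal sketch of that calculation, not the authors' pipeline: the function names are hypothetical, the five-sequence alignment is invented toy data covering PPE41 positions 122–126, and the set of "helix-capping" residues is an assumption chosen for illustration.

```python
from collections import Counter

# Toy alignment (INVENTED for illustration, not real PPE sequences).
# Columns 0-4 stand in for PPE41 positions 122-126 (N, N, A, L, G in PPE41).
TOY_ALIGNMENT = ["NNALG", "NNVLG", "SNLFG", "DNAFG", "A-WLW"]

# Assumed set of residues treated as helix-capping; the source does not list them.
ASSUMED_CAPPING = {"N", "D", "S", "T", "G"}


def column_residues(aligned, col):
    """Return the non-gap residues found in one alignment column."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [s[col] for s in aligned if s[col] != "-"]


def fraction_matching(aligned, col, allowed):
    """Fraction of non-gap residues in a column that belong to `allowed`."""
    residues = column_residues(aligned, col)
    if not residues:
        raise ValueError("column contains only gaps")
    return sum(r in allowed for r in residues) / len(residues)


def consensus(aligned, col):
    """Most common non-gap residue in a column and its frequency."""
    residues = column_residues(aligned, col)
    if not residues:
        raise ValueError("column contains only gaps")
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)


if __name__ == "__main__":
    print("capping at col 0:", fraction_matching(TOY_ALIGNMENT, 0, ASSUMED_CAPPING))
    print("Asn at col 1:   ", fraction_matching(TOY_ALIGNMENT, 1, {"N"}))
    print("consensus col 4:", consensus(TOY_ALIGNMENT, 4))
```

Gaps are excluded from the denominator here; counting them instead would lower the reported fractions, so the choice should be stated whenever such percentages are compared.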

After excluding residues that make no side-chain interactions (i.e., with little sequence constraint at that position) or relatively low-quality contacts, 10 PPE residues remain that likely have the greatest impact on binding affinity and specificity (Fig. 2C). Many of these residues make buried hydrophobic interactions with EspG, such as Ala124, Leu125, Trp143, and Gly147. Trp143, in particular, is well-packed at the interface, making a total of ∼22 van der Waals contacts and a cation–pi interaction with Arg104 from EspG. In contrast, Ala124, Leu125, and Gly147 are each “underpacked” (SI Appendix, Fig. S8), potentially leaving space for the accommodation of larger hydrophobic side chains at these positions. Indeed, PPE41 has a smaller residue at each of these positions compared with other PPE proteins: Ala124 instead of Val, Leu, Phe, or Trp; Leu125 instead of Phe; and Gly147 instead of Ala or Val. Several other positions make key interactions or may be otherwise conserved due to constraints imposed by the binding interface (details are provided in SI Appendix, Fig. S9). Thus, the EspG-interacting surface on PPE41 is moderately conserved, and most of the observed variation is predicted to have only a modest impact on binding to EspG5Mt.

Most PPE Proteins in Mtb Bind to EspG5.

Variation among EspG paralogs from Mtb suggests that each EspG will interact with distinct subsets of the PPE repertoire, yet the general conservation of the EspG binding surface on PPE protein suggests that most of the PPE proteins in Mtb will interact with a single EspG. Based upon the structure of the EspG5Mt–PE25–PPE41 complex and conservation of the PPE contact surface between PPE41 and most other PPE proteins, we predict that most PPE proteins in Mtb will interact with EspG5, consistent with the immunological results of Sayes et al. (7). However, the degree of sequence variation on both sides of the interface makes it difficult to rule out other models, such as each EspG interacting with numerous PPE proteins and facilitating PPE secretion from each ESX cluster. To explore the specificity of the PPE–EspG interaction further, we selected a panel of eight diverse PPE proteins from the Mtb genome for additional study (SI Appendix, Fig. S10A). This panel included six proteins encoded outside of ESX clusters (PPE14, PPE17, PPE20, PPE36, PPE41, and PPE60) and two proteins encoded within the ESX-1 and ESX-2 loci (PPE68ESX-1 and PPE69ESX-2). Despite moderate variation in the EspG-binding surface of these PPE proteins, all but one were predicted to bind EspG5. The remaining sequence, PPE69ESX-2, is a clear outlier among all of the PPE sequences in Mtb, with several polymorphisms predicted to disrupt binding to EspG5 because it (i) lacks a helix-capping residue at position 122, (ii) has a Gly126Trp substitution that is too bulky to be accommodated at the interface and unable to adopt a positive phi angle, and (iii) has a 2-aa insertion in the vicinity of the 122–126 loop.

Assaying PPE-EspG binding is complicated by the fact that PPE proteins are predicted to interact with specific PE partners and function as obligate heterodimers (33); yet, aside from PE25-PPE41, very few pairs of interacting PE and PPE proteins have been identified (10, 34). Indeed, recombinant expression of PPE proteins alone typically yields nonnative, aggregated, or insoluble material (10). To overcome these technical challenges, we constructed chimeric PPE41s by replacing the key EspG-interacting region of PPE41 with the corresponding sequence from another PPE protein of interest (Fig. 4A). The resulting chimeras are referred to as “PPE14-like,” “PPE17-like,” etc. (SI Appendix, Fig. S10B). The chimeric PPE proteins were coexpressed with PE25 to produce soluble PE-PPE heterodimers. Using this approach, we produced sufficient quantities of seven of the eight PPE proteins selected for the panel, and only the PPE36 chimera failed to express well. Binding between EspG and PPE proteins was assayed by biolayer interferometry. Binding of the PE25-PPE41 heterodimer to EspG5Mt was robust, yielding a dissociation constant (Kd) of 1.3 nM (Fig. 4B). In contrast, binding of PE25-PPE41 to EspG3Mt was undetectable (Fig. 4C and SI Appendix, Fig. S11). Remarkably, five of the six soluble PPE chimeras also bound specifically to EspG5Mt with high affinity, including the PPE14-like, PPE17-like, PPE20-like, PPE60-like, and PPE68-like (Fig. 4C and SI Appendix, Fig. S11). In contrast, the remaining PPE69-like chimera failed to bind EspG5Mt, consistent with its unusual sequence features in the EspG binding region. Instead, the PPE69-like chimera bound specifically to EspG3Mt (Fig. 4C and SI Appendix, Fig. S11). 
Taken together, our structural, bioinformatic, and biochemical data support a model wherein EspGs specifically recognize PPE domains and the vast majority of PPE proteins in Mtb interact with EspG5Mt, whereas a small number of more specialized proteins interact with the remaining EspGs.
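To give a sense of what a 1.3 nM Kd implies for the concentration series used in the binding experiment, the sketch below computes equilibrium fractional occupancy under a simple 1:1 binding model, θ = C / (C + Kd). The 1:1 model and the helper names are assumptions made for illustration; the text reports only that the data were globally fit, and kinetic fitting of association and dissociation phases is not reproduced here. The concentrations are those listed in the Fig. 4B legend.

```python
# Dissociation constant reported in the text for PE25-PPE41 binding to EspG5Mt.
KD_NM = 1.3

# PE25-PPE41 concentrations (nM) listed in the Fig. 4B legend.
CONCENTRATIONS_NM = [500, 250, 125, 62.5, 31.3, 15.6, 7.81]


def fractional_occupancy(conc_nm, kd_nm):
    """Equilibrium occupancy for an assumed 1:1 binding model: C / (C + Kd)."""
    if conc_nm < 0 or kd_nm <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return conc_nm / (conc_nm + kd_nm)


def occupancy_table(concentrations_nm, kd_nm):
    """Pair each concentration with its predicted equilibrium occupancy."""
    return [(c, fractional_occupancy(c, kd_nm)) for c in concentrations_nm]


if __name__ == "__main__":
    for conc, theta in occupancy_table(CONCENTRATIONS_NM, KD_NM):
        print(f"{conc:>7.2f} nM -> {theta:.3f}")
```

Because every concentration in the series lies well above the Kd, the predicted equilibrium occupancy exceeds 85% throughout, which is one reason affinities in this range are usually extracted from binding kinetics and not from equilibrium plateaus alone.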

Fig. 4.


Most PPE proteins bind EspG5Mt with high affinity. (A) Schematic of chimeric PPE proteins used in binding experiments, with the grafted region highlighted in red. (B) PE25-PPE41 binds to EspG5Mt with high affinity by biolayer interferometry (BLI). The raw data are plotted in blue, whereas the best global fits are shown in red. The uppermost curve corresponds to PE25-PPE41 binding at 500 nM, with progressively lower responses at 250, 125, 62.5, 31.3, 15.6, and 7.81 nM. (C) Summary of qualitative BLI binding experiments for PPE proteins and EspGs, with (+) or without (−) binding (also SI Appendix, Fig. S11).

Model for EccA-Mediated Dissociation of the EspG–PE–PPE Complex.

In contrast to PE and PPE proteins, which are translocated through the ESX, EspG is not secreted (22). In light of the nanomolar Kd between PPE protein and EspG, this observation raises questions about how the PE-PPE dimer is released from the complex for secretion. The length and conformation of the β2-β3 loop is arguably the most distinctive feature of EspG5Mt compared with the shorter loops in the EspG3 structures (Fig. 3C). However, despite the large size and striking conformation of the β2-β3 loop, it makes only a moderate number of contacts with PPE protein and much of the loop remains solvent-exposed. Intriguingly, this loop lies in close proximity to the highly conserved Pro-Pro-Glu motif (Fig. 5 A–C) that gave rise to the “PPE” protein family name (2), but it makes only glancing contacts with Pro7 and is unlikely to explain its broad conservation. The PPE surface surrounding the Pro-Pro-Glu motif is well-conserved (Fig. 5 A–C), encompassing ∼500 Å² of solvent-accessible surface area and including residues 6, 10, 109, 113, 139, and 143. This conserved patch is part of a larger, generally hydrophobic surface (Fig. 5 A–C), consistent with the hypothesis that this region may be important for protein–protein interactions. Thus, along with the adjacent β2-β3 loop from EspG, the conserved PPE surface is poised to act as a docking site for other ESX components, such as ESX conserved component A (EccA). EccA is an ATPase that may be involved in dissociating the EspG–PPE interaction because (i) EccA proteins are only encoded in ESX gene clusters that also encode PE and PPE proteins, (ii) EccA mutants accumulate PPE-bound EspG (22), and (iii) EccA interacts with both PPE protein (35) and EspG (SI Appendix, Fig. S12) in yeast two-hybrid assays. Because the β2-β3 loop is one of the most variable regions of EspG, this loop may potentially allow for specific recruitment of the cognate EccA paralog (e.g., recruitment of EccA5Mt by EspG5Mt, etc.).
Binding across the interface near the conserved site on PPE protein and the β2-β3 loop on EspG might allow EccA to dissociate the EspG–PE–PPE complex, passing the PE-PPE proteins off to the rest of the ESX system for secretion and recycling EspG to recruit additional PE-PPE proteins from the cytoplasm (Fig. 5D). Because EccA is an active ATPase (36), ATP hydrolysis is likely involved in the dissociation of the EspG–PE–PPE complex, although future studies will be required to understand this process better, as well as the relative contributions of the ATPase and tetratricopeptide repeat (TPR) (37) domains.

Fig. 5.


Putative protein binding site spans the PPE–EspG interface, suggesting a model for EccA-mediated dissociation of the PE–PPE–EspG complex. (A) PPE41 (white surface) bound to EspG5Mt (purple). Sequence analysis of the PPE surface surrounding the conserved Pro-Pro-Glu motif reveals a patch of conserved residues (orange) that are part of a larger hydrophobic surface (cyan), just adjacent to the β2-β3 loop. (B and C) Close-up views of the hydrophobic region (cyan) and conserved patch (orange) on PPE41 (white surface). (D) Possible model for EccA-mediated dissociation of PE-PPE protein from EspG. Binding of the EccA ATPase across the PPE–EspG interface, perhaps coupled with ATP hydrolysis, may facilitate the release of PE-PPE protein for secretion through ESX and recycling of EspG.

Model of PE-PPE Secretion by ESX-1–ESX-5 in Mtb.

Of the five ESX clusters in the Mtb genome, ESX-1, ESX-2, ESX-3, and ESX-5 encode EspG, PE, and PPE paralogs. In addition, numerous other PE and PPE genes are scattered throughout the genome. Because each cluster is thought to encode a complete, independent ESX system, it is generally assumed that PE, PPE, and EspG proteins from a given cluster will interact specifically with each other, and only associate with other components from the same ESX cluster (Fig. 6). However, our results suggest that some cross-talk may occur between clusters. For example, PPE68 is encoded within the ESX-1 locus in Mtb and Mycobacterium marinum, and it was previously shown to interact with EspG1 from the same cluster (22). However, the PPE68-like chimera also binds to EspG5Mt with high affinity (Figs. 4C and 6). Similarly, PPE69 is encoded within the ESX-2 cluster and presumably interacts with EspG2Mt; however, in addition, our PPE69-like chimera binds EspG3Mt (Fig. 4C). Although binding of the ESX-3–encoded PPE protein to EspG3Mt has not been directly demonstrated, the Msmeg ESX-3 orthologs form a stable complex on gel filtration (SI Appendix, Fig. S13), and the contact surfaces on PPE protein and EspG3 are highly conserved between Msmeg and Mtb.

Fig. 6.


Model for the network of interactions between the PPE protein and EspG families in Mtb. PPE proteins encoded within a given ESX cluster generally interact with the EspG from the same cluster (blue box, black lines) and are secreted through the cognate ESX. However, some cluster-encoded PPE proteins can cross-react with EspGs from other clusters, at least in vitro. In contrast, non–ESX-encoded PPE proteins, which account for the majority of PPE genes in Mtb, interact preferentially with EspG5Mt (pink box, black lines). In all, ∼95% of all PPE proteins from Mtb are predicted to interact with EspG5Mt, likely leading to their secretion to the cell surface through the ESX-5 secretion system.

Taken together, our data support a model consisting of two classes of PPE–EspG interactions (Fig. 6). First, the PE, PPE, and EspG proteins from each ESX cluster interact with one another, accounting for a relatively small segment of the overall PPE repertoire. This intracluster association probably resembles the situation in nonpathogenic mycobacteria, such as Msmeg, where the ESX-1 and ESX-3 clusters each encode a single PE, PPE, and EspG protein and no other PE or PPE genes are found elsewhere in the genome. In some cases, EspG and PPE proteins may cross-react with components from another cluster, although it remains unclear whether this cross-reactivity is functionally relevant or merely arises from incomplete orthogonalization of the different ESX clusters. Second, the non–ESX-encoded PPE proteins, which make up the vast majority of the PPE sequences, interact with EspG5 (Fig. 6). The dramatic expansion and diversification of the PE and PPE protein families, including the distribution of most of these genes throughout the genome and outside ESX clusters, appear to coincide with the acquisition of ESX-5 in pathogenic mycobacteria, which is consistent with our finding that most PPE proteins interact with EspG5 and are likely secreted in an ESX-5–dependent manner (23).
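
The two-class model can be restated compactly as a lookup rule. The Python sketch below is purely illustrative and is not part of the authors' analysis: the function name `predicted_espg_partners` and the two tables are hypothetical, and the cross-reactivity entries record only the in vitro chimera results described in the text (the PPE69–EspG2 pairing is presumed rather than demonstrated).

```python
from typing import Optional, Set

# ESX clusters that encode EspG, PE, and PPE paralogs in Mtb (from the text).
CLUSTERS_WITH_ESPG = {"ESX-1", "ESX-2", "ESX-3", "ESX-5"}

# Non-cognate EspG binding observed in vitro with the chimeras in the text.
# Keys are the home cluster of the PPE protein (hypothetical table layout).
OBSERVED_CROSS_REACTIVITY = {
    "ESX-1": {"EspG5"},  # PPE68-like chimera also binds EspG5
    "ESX-2": {"EspG3"},  # PPE69-like chimera also binds EspG3
}


def predicted_espg_partners(home_cluster: Optional[str]) -> Set[str]:
    """Return the EspG partners the model predicts for a PPE protein.

    home_cluster is the ESX cluster that encodes the PPE gene, or None
    for a PPE gene located outside every ESX cluster.
    """
    if home_cluster is None:
        # Class 2: non-ESX-encoded PPE proteins are predicted to bind EspG5.
        return {"EspG5"}
    if home_cluster not in CLUSTERS_WITH_ESPG:
        raise ValueError(f"{home_cluster} does not encode an EspG paralog")
    # Class 1: cluster-encoded PPE proteins bind the cognate EspG,
    # plus any cross-reactive EspG recorded above.
    cognate = "EspG" + home_cluster.split("-")[1]
    return {cognate} | OBSERVED_CROSS_REACTIVITY.get(home_cluster, set())
```

For example, `predicted_espg_partners("ESX-1")` yields both EspG1 and EspG5, whereas `predicted_espg_partners(None)` yields only EspG5. The rule says nothing about binding affinity or about whether a cross-reaction is functionally relevant in vivo.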

Although the interaction with EspG sheds light on how PE and PPE proteins interface with ESXs, their molecular functions remain to be defined. One intriguing possibility is that the PE and PPE domains serve as targeting modules, directing any cargo domains attached to their C termini to the ESX for translocation and anchoring to the cell surface (38), a process that is important for modulating interactions with the host during infection (39). In this model, EspG would play a role akin to the signal recognition particle in the conventional secretory pathway, by binding to nascent PE–PPE complexes and directing them to ESX machinery in the plasma membrane for further processing. Consistent with this mechanism, EspG binds ∼100 Å away from the PE and PPE C termini, thereby avoiding steric clashes or specific interactions with the putative cargo domains. However, the PE-PPE subunit may have additional functions apart from directing proteins to the ESX, and further studies are required to explore the relative importance of the PE-PPE vs. C-terminal “cargo” domains in Mtb virulence.

Materials and Methods

Recombinant proteins were expressed in Escherichia coli and purified by standard procedures. All diffraction data were collected at the Advanced Light Source beamline 8.3.1, and structures were solved by multiwavelength anomalous dispersion (MAD) or molecular replacement. Binding of EspG to PE-PPE complexes was assayed, and dissociation constants (Kd) were measured, using an Octet RED instrument (ForteBio). Further detailed information is provided in SI Appendix, SI Materials and Methods.

Acknowledgments

We thank G. Bhabha (University of California, San Francisco) for discussions and for figure preparation; A. Murzin (Medical Research Council) for analysis of the EspG fold; J. Holton, G. Meigs, and the staff of Advanced Light Source beamline 8.3.1 for beamline support; and I. Wilson (The Scripps Research Institute) for critical reading of the manuscript and for providing access to the Octet instrument. This work was supported by National Institutes of Health Grant R01AI081727 (to J.S.C.). D.C.E. is a Damon Runyon Fellow supported by the Damon Runyon Cancer Research Foundation (Grant DRG-2140-12). The Advanced Light Source is supported by the Director, Office of Science, Office of Basic Energy Sciences, of the US Department of Energy under Contract DE-AC02-05CH11231.

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The atomic coordinates and structure factors reported in this paper have been deposited in the Protein Data Bank, www.pdb.org (PDB ID codes 4W4I–4W4L).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1409345111/-/DCSupplemental.

References

  • 1. Udwadia ZF, Amale RA, Ajbani KK, Rodrigues C. Totally drug-resistant tuberculosis in India. Clin Infect Dis. 2012;54(4):579–581. doi: 10.1093/cid/cir889.
  • 2. Cole ST, et al. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature. 1998;393(6685):537–544. doi: 10.1038/31159.
  • 3. Sampson SL, et al. Expression, characterization and subcellular localization of the Mycobacterium tuberculosis PPE gene Rv1917c. Tuberculosis (Edinb). 2001;81(5-6):305–317. doi: 10.1054/tube.2001.0304.
  • 4. Cascioferro A, et al. PE is a functional domain responsible for protein translocation and localization on mycobacterial cell wall. Mol Microbiol. 2007;66(6):1536–1547. doi: 10.1111/j.1365-2958.2007.06023.x.
  • 5. Kruh NA, Troudt J, Izzo A, Prenni J, Dobos KM. Portrait of a pathogen: The Mycobacterium tuberculosis proteome in vivo. PLoS ONE. 2010;5(11):e13938. doi: 10.1371/journal.pone.0013938.
  • 6. Banu S, et al. Are the PE-PGRS proteins of Mycobacterium tuberculosis variable surface antigens? Mol Microbiol. 2002;44(1):9–19. doi: 10.1046/j.1365-2958.2002.02813.x.
  • 7. Sayes F, et al. Strong immunogenicity and cross-reactivity of Mycobacterium tuberculosis ESX-5 type VII secretion: Encoded PE-PPE proteins predicts vaccine potential. Cell Host Microbe. 2012;11(4):352–363. doi: 10.1016/j.chom.2012.03.003.
  • 8. McEvoy CRE, et al. Comparative analysis of Mycobacterium tuberculosis pe and ppe genes reveals high sequence variation and an apparent absence of selective constraints. PLoS ONE. 2012;7(4):e30593. doi: 10.1371/journal.pone.0030593.
  • 9. Gey van Pittius NC, et al. Evolution and expansion of the Mycobacterium tuberculosis PE and PPE multigene families and their association with the duplication of the ESAT-6 (esx) gene cluster regions. BMC Evol Biol. 2006;6:95. doi: 10.1186/1471-2148-6-95.
  • 10. Strong M, et al. Toward the structural genomics of complexes: Crystal structure of a PE/PPE protein complex from Mycobacterium tuberculosis. Proc Natl Acad Sci USA. 2006;103(21):8060–8065. doi: 10.1073/pnas.0602606103.
  • 11. Abdallah AM, et al. A specific secretion system mediates PPE41 transport in pathogenic mycobacteria. Mol Microbiol. 2006;62(3):667–679. doi: 10.1111/j.1365-2958.2006.05409.x.
  • 12. Stanley SA, Raghavan S, Hwang WW, Cox JS. Acute infection and macrophage subversion by Mycobacterium tuberculosis require a specialized secretion system. Proc Natl Acad Sci USA. 2003;100(22):13001–13006. doi: 10.1073/pnas.2235593100.
  • 13. Baptista C, Barreto HC, São-José C. High levels of DegU-P activate an Esat-6-like secretion system in Bacillus subtilis. PLoS ONE. 2013;8(7):e67840. doi: 10.1371/journal.pone.0067840.
  • 14. Sysoeva TA, Zepeda-Rivera MA, Huppert LA, Burton BM. Dimer recognition and secretion by the ESX secretion system in Bacillus subtilis. Proc Natl Acad Sci USA. 2014;111(21):7653–7658. doi: 10.1073/pnas.1322200111.
  • 15. Garufi G, Butler E, Missiakas D. ESAT-6-like protein secretion in Bacillus anthracis. J Bacteriol. 2008;190(21):7004–7011. doi: 10.1128/JB.00458-08.
  • 16. Zoltner M, Fyfe PK, Palmer T, Hunter WN. Characterization of Staphylococcus aureus EssB, an integral membrane component of the Type VII secretion system: Atomic resolution crystal structure of the cytoplasmic segment. Biochem J. 2013;449(2):469–477. doi: 10.1042/BJ20121209.
  • 17. Akpe San Roman S, et al. A heterodimer of EsxA and EsxB is involved in sporulation and is secreted by a type VII secretion system in Streptomyces coelicolor. Microbiology. 2010;156(Pt 6):1719–1729. doi: 10.1099/mic.0.037069-0.
  • 18. Houben EN, Korotkov KV, Bitter W. Take five—Type VII secretion systems of Mycobacteria. Biochim Biophys Acta. 2014;1843(8):1707–1716. doi: 10.1016/j.bbamcr.2013.11.003.
  • 19. Mahairas GG, Sabo PJ, Hickey MJ, Singh DC, Stover CK. Molecular analysis of genetic differences between Mycobacterium bovis BCG and virulent M. bovis. J Bacteriol. 1996;178(5):1274–1282. doi: 10.1128/jb.178.5.1274-1282.1996.
  • 20. Gey Van Pittius NC, et al. The ESAT-6 gene cluster of Mycobacterium tuberculosis and other high G+C Gram-positive bacteria. Genome Biol. 2001;2(10):RESEARCH0044. doi: 10.1186/gb-2001-2-10-research0044.
  • 21. Abdallah AM, et al. PPE and PE_PGRS proteins of Mycobacterium marinum are transported via the type VII secretion system ESX-5. Mol Microbiol. 2009;73(3):329–340. doi: 10.1111/j.1365-2958.2009.06783.x.
  • 22. Daleke MH, et al. Specific chaperones for the type VII protein secretion pathway. J Biol Chem. 2012;287(38):31939–31947. doi: 10.1074/jbc.M112.397596.
  • 23. Bottai D, et al. Disruption of the ESX-5 system of Mycobacterium tuberculosis causes loss of PPE protein secretion, reduction of cell wall integrity and strong attenuation. Mol Microbiol. 2012;83(6):1195–1209. doi: 10.1111/j.1365-2958.2012.08001.x.
  • 24. Guss JM, et al. Phase determination by multiple-wavelength x-ray diffraction: Crystal structure of a basic “blue” copper protein from cucumbers. Science. 1988;241(4867):806–811. doi: 10.1126/science.3406739.
  • 25. Hasegawa H, Holm L. Advances and pitfalls of protein structural alignment. Curr Opin Struct Biol. 2009;19(3):341–348. doi: 10.1016/j.sbi.2009.04.003.
  • 26. Bottai D, et al. ESAT-6 secretion-independent impact of ESX-1 genes espF and espG1 on virulence of Mycobacterium tuberculosis. J Infect Dis. 2011;203(8):1155–1164. doi: 10.1093/infdis/jiq089.
  • 27. Ekiert DC, et al. A highly conserved neutralizing epitope on group 2 influenza A viruses. Science. 2011;333(6044):843–850. doi: 10.1126/science.1204839.
  • 28. Xu R, et al. Structural basis of preexisting immunity to the 2009 H1N1 pandemic influenza virus. Science. 2010;328(5976):357–360. doi: 10.1126/science.1186430.
  • 29. Tsibane T, et al. Influenza human monoclonal antibody 1F1 interacts with three major antigenic sites and residues mediating human receptor specificity in H1N1 viruses. PLoS Pathog. 2012;8(12):e1003067. doi: 10.1371/journal.ppat.1003067.
  • 30. McLellan JS, et al. Structure of RSV fusion glycoprotein trimer bound to a prefusion-specific neutralizing antibody. Science. 2013;340(6136):1113–1117. doi: 10.1126/science.1234914.
  • 31. Zhou T, et al. Structural basis for broad and potent neutralization of HIV-1 by antibody VRC01. Science. 2010;329(5993):811–817. doi: 10.1126/science.1192819.
  • 32. Whittle JRR, et al. Broadly neutralizing human antibody that recognizes the receptor-binding pocket of influenza virus hemagglutinin. Proc Natl Acad Sci USA. 2011;108(34):14216–14221. doi: 10.1073/pnas.1111497108.
  • 33. Riley R, Pellegrini M, Eisenberg D. Identifying cognate binding pairs among a large set of paralogs: The case of PE/PPE proteins of Mycobacterium tuberculosis. PLOS Comput Biol. 2008;4(9):e1000174. doi: 10.1371/journal.pcbi.1000174.
  • 34. Tiwari B, Soory A, Raghunand TR. An immunomodulatory role for the Mycobacterium tuberculosis region of difference 1 locus proteins PE35 (Rv3872) and PPE68 (Rv3873). FEBS J. 2014;281(6):1556–1570. doi: 10.1111/febs.12723.
  • 35. Teutschbein J, et al. A protein linkage map of the ESAT-6 secretion system 1 (ESX-1) of Mycobacterium tuberculosis. Microbiol Res. 2009;164(3):253–259. doi: 10.1016/j.micres.2006.11.016.
  • 36. Luthra A, Mahmood A, Arora A, Ramachandran R. Characterization of Rv3868, an essential hypothetical protein of the ESX-1 secretion system in Mycobacterium tuberculosis. J Biol Chem. 2008;283(52):36532–36541. doi: 10.1074/jbc.M807144200.
  • 37. Wagner JM, Evans TJ, Korotkov KV. Crystal structure of the N-terminal domain of EccA₁ ATPase from the ESX-1 secretion system of Mycobacterium tuberculosis. Proteins. 2014;82(1):159–163. doi: 10.1002/prot.24351.
  • 38. Daleke MH, et al. Conserved Pro-Glu (PE) and Pro-Pro-Glu (PPE) protein domains target LipY lipases of pathogenic mycobacteria to the cell surface via the ESX-5 pathway. J Biol Chem. 2011;286(21):19024–19034. doi: 10.1074/jbc.M110.204966.
  • 39. Kunnath-Velayudhan S, Porcelli SA. Recent Advances in Defining the Immunoproteome of Mycobacterium tuberculosis. Front Immunol. 2013;4:335. doi: 10.3389/fimmu.2013.00335.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
