Identifying novel transcription factors required for virulence may lead to the identification of new drug targets. We have found that transcription of a virulence-critical operon of Mycobacterium tuberculosis is regulated by a novel transcription factor binding far upstream of the promoter. This operon includes the espA gene, which is required for ESX-1 secretion. This system has to be precisely regulated since continued secretion of its highly antigenic substrates would alert the immune system to the infection. Transcription of this operon is positively regulated by the EspR transcription factor, and dependent upon far-upstream sequences containing known EspR binding sites. Transcriptional activation therefore takes place over long distances of the chromosome, by looping out of DNA between the EspR binding site and the promoter. EspR thus has DNA binding characteristics similar to those of nucleoid-associated proteins (NAPs) and joins two other NAP-like proteins, Lsr2 and CRPMt, in regulating expression of this operon.
Although virulence in Mycobacterium tuberculosis, the causative agent of tuberculosis, is multifactorial, requiring the participation of several cell wall and cellular components, the ESX-1 secretion system is one component that is essential for a successful infection. In fact, the genes coding for parts of this secretion system, in the RD1 region, are deleted in the attenuated vaccine strain of M. bovis, BCG, and restoring the RD1 region to BCG results in partial reversal of attenuation. This secretion system, also known as a type VII system, transports certain proteins, notably ESAT-6 and CFP-10 (also known as EsxA and EsxB), into host cells, where they appear to affect the phagosome membrane and so facilitate dissemination of M. tuberculosis into the cytoplasm. There is also another distinct genetic locus, the espACD-Rv3613c-Rv3612c operon, which is required for ESX-1 function (and thus espA mutants are attenuated in a mouse infection), and which is expressed in a transient manner during infection of the host cell, the macrophage. An unusual and notable feature of ESX-1 is the mutually dependent nature of the secretion, such that the secretion of each substrate relies on the secretion of the other substrates. Because some of the members of the espACD-Rv3613c-Rv3612c operon are themselves substrates, ESX-1 secretion could be controlled by the short-lived expression of this operon that occurs during infection. Because the ESX-1 secreted proteins are antigenic to the adaptive immune system, it is advantageous to the bacterium to switch off the system immediately after phagocytosis by the macrophage.
The espACD-Rv3613c-Rv3612c operon is thus vital to ESX-1 secretion, and hence to the virulence of M. tuberculosis, and clearly precise control of its expression is required for a successful infection. The chromosomal context of this operon is notable in that the upstream region is relatively large, there being 1,357 bp from the start of the espA open-reading frame (ORF) to the start of the divergently transcribed Rv3617 ORF, suggesting that there is enough DNA to accommodate numerous regulatory elements. In fact, several DNA binding proteins have been shown to regulate expression of this operon. Thus it has previously been found that the mycobacterial cyclic AMP receptor protein (CRPMt) (Rickman et al., Mol Microbiol 2005) and the nucleoid-associated protein Lsr2 repress transcription of this operon (Gordon et al., Proc Natl Acad Sci U S A 2010), and the PhoP (Walters et al., Mol Microbiol 2006; Frigui et al., PLoS Pathog 2008) and EspR (Raghavan et al., Nature 2008) proteins have been found to be likely activators. We first became aware that sequences far upstream of the first gene in the operon, espA, were affecting transcription when we made transcriptional promoter fusions. By making a series of such constructs with progressively longer fragments of DNA upstream of espA, we were surprised to notice that there were stepped increases in promoter activity as measured by assays for β-galactosidase, the product of the lacZ gene, when longer fragments of upstream DNA were fused to the reporter gene. Thus, the first 800 bp upstream of espA supported relatively low expression, but when this was increased to include the 800–995 bp region there was a 6-fold enhancement in expression. We designated this latter region as the espA Activating Region (EAR). More detailed analysis showed that the critical region lay between 887 and 995 bp, with an especially large increase associated with the 11 bp between 984 and 995 bp. 
Curiously, and perhaps significantly, the presence of DNA from 1,100–1,200 bp reduced activity to an intermediate level.
We knew that these increases in activity were not due to new promoter activity since we showed that the only promoter in the 1,357 bp intergenic region was located 67 bp upstream of the espA gene. So the EAR looked much like an enhancer, an invertible, mobile DNA sequence that increases gene expression by mediating RNA polymerase assembly at a promoter. Promoter activity could be decreased by deletions within the EAR (between 993 and 1,003 bp), but it could also be decreased by inserting foreign DNA downstream of the EAR, suggesting that the EAR had been moved too far from the espA promoter at –67 bp. Addition of the EAR to a 520 bp fragment of the upstream region resulted in a sizeable increase in activity, but not if it was added to a 217 bp fragment. Moving the EAR further from the promoter in the 217 + EAR construct, by inserting a 661 bp foreign DNA fragment, did not increase promoter activity, showing that the failure of the EAR to act from 217 bp was not because it was too close to the promoter to loop out. We therefore concluded that the 217 bp fragment lacked a further site required for the EAR to function, although it is possible that the insert disrupted a genetic element in the 217 bp fragment. Nevertheless, the EAR was not a classical enhancer since inverting it abolished its enhancing effect on transcription.
Our conclusion was that the enhancing effect of the EAR could involve the binding of a protein to this site, but which one? As mentioned previously, one of the DNA binding proteins previously found to be an activator was EspR. To test the likelihood that EspR was binding to the EAR we constructed an espR deletion mutant of M. tuberculosis using homologous recombination, and used this as a host for espA promoter fusions of different lengths. From this experiment it was evident that the increases in promoter activity that resulted from adding longer fragments of upstream DNA were dependent upon a functional espR gene. We concluded that EspR must be mediating the increased transcription that resulted from increased length of the espA upstream region, thereby enhancing promoter activity, by binding far upstream of the start of transcription at –67 bp.
Significantly, while we were completing these experiments, another group published the location of EspR binding sites in the espA upstream region (Rosenberg et al., Proc Natl Acad Sci U S A 2011). These were reported to be at 468 bp, 798 bp and 983 bp upstream of the espA gene; the positions of the latter two high-affinity sites correlated very closely with the results of our in vivo transcription fusion experiments, where there was a particularly large increase in promoter activity when the upstream DNA fragments were extended from 800 bp to 995 bp. These results therefore provided a molecular explanation for our in vivo transcription activity data. The binding site identified was non-palindromic, which would explain why the EAR did not function when inverted.
As mentioned above, it seemed curious that transcription reporter gene activity decreased when longer fragments of upstream DNA were fused to the lacZ reporter gene. A possible explanation of this has now appeared from a recent publication by Blasco et al. (PLoS Pathog 2012) who have mapped the EspR binding sites using ChIP-seq experiments. This has shown that there is another major EspR binding site between 1,113 bp and 1,214 bp upstream of espA: perhaps EspR binding at this site has an inhibitory role whereas binding at the other sites activates transcription.
Interestingly, the DNA upstream of espA containing the EspR-binding sites is missing from some members of the M. tuberculosis complex. For example, in M. bovis the length of DNA upstream of espA is only 465 bp rather than the 1,357 bp in M. tuberculosis. This has previously been noted as the RD8 deletion. It implies that the sequences upstream of –465 bp relative to the start of the espA gene are not required for virulence, since M. bovis is still a virulent strain. By examining the sequences of many members of the M. tuberculosis complex, it appeared most likely that the common ancestor of the complex had the complete 1,357 bp upstream region and that this had been reduced by two independent deletions, one leading to the evolution of certain members of one lineage coming mainly from the Philippines with an upstream sequence of 288 bp, and another resulting in the evolution of strains related to M. bovis having a sequence of 465 bp. We found that the core promoter of M. bovis, although having three single nucleotide polymorphisms, had a similar activity to the core promoter of M. tuberculosis; nevertheless, with the longer upstream region containing the EspR binding sites, transcription emanating from the M. tuberculosis sequence was approximately three times greater than that from the M. bovis sequence. The consequent differences in expression of the ESX-1 secretion system among these members of the M. tuberculosis complex may contribute to their differing pathologies.
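The relationship between the length of the espA upstream region and the EspR binding sites it retains can be tabulated from the coordinates quoted in the text. The following Python fragment is only an illustrative sketch: the function and variable names are ours, and it makes the simplifying assumption that each reported site can be treated as a single point coordinate (bp upstream of the espA gene), which ignores the true width of each binding footprint.

```python
# Coordinates are bp upstream of the espA gene, as quoted in the text.
# Assumption: each reported site is treated as a single point; real binding
# footprints span several bp, so sites near a boundary may not be intact.
ESPR_POINT_SITES = [468, 798, 983]   # Rosenberg et al. 2011
ESPR_CHIP_REGION = (1113, 1214)      # additional site, Blasco et al. 2012

# Lengths of the upstream region in different members of the complex.
UPSTREAM_LENGTHS = {
    "M. tuberculosis": 1357,
    "M. bovis and related strains": 465,
    "Philippines-associated lineage": 288,
}


def sites_retained(length_bp):
    """Return the point sites that lie within length_bp upstream of espA."""
    return [site for site in ESPR_POINT_SITES if site <= length_bp]


def chip_region_retained(length_bp):
    """Return True if the far-upstream ChIP-seq region is entirely present."""
    return length_bp >= ESPR_CHIP_REGION[1]


if __name__ == "__main__":
    for name, length in UPSTREAM_LENGTHS.items():
        print(name, length, sites_retained(length), chip_region_retained(length))
```

Under this point-coordinate assumption, only the full 1,357 bp M. tuberculosis region retains any of the reported sites, in keeping with the observation that the DNA containing the EspR binding sites is missing from the shorter lineages.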
Studies of the crystal structure of EspR reported in the paper by Rosenberg et al., mentioned above, and by Blasco et al. (Mol Microbiol 2011) have provided intriguing evidence of how this protein binds to the upstream DNA sites. These studies have shown that EspR contains an N-terminal helix-turn-helix domain and an atypical C-terminal dimerization domain, resulting in a dimer of dimers with two subunits binding two consecutive major grooves. This means that the other two DNA-binding domains can form higher-order oligomers, enabling EspR to bridge distant DNA sites in a cooperative manner and promote looping of DNA, which provides a structural explanation for the results of our in vivo gene expression experiments; in fact Blasco et al. were able to observe such looping using atomic force microscopy with EspR-DNA complexes of the 1,357 bp DNA region upstream of espA.
All of this evidence demonstrating that EspR could influence gene transcription by binding at a considerable distance from the promoter indicated that EspR was not a conventional transcription factor interacting with RNA polymerase at a promoter, but suggested it was more similar to a nucleoid-associated protein (NAP). These proteins can bind to DNA at multiple sites resulting in the bending of DNA: they are the functional equivalent of histones in eukaryotes, organizing higher-order structures of chromosomes, although they do not share structural properties with them. Although some, like HU in Escherichia coli, bind DNA in a sequence-independent fashion, others such as IHF bind to well conserved DNA sequences; in this respect EspR appears to resemble the latter. NAPs can often also influence transcription via multiple possible mechanisms, including in some cases displacing an activator, or even interacting with RNA polymerase like more conventional transcription factors, in other cases by assisting contact between regulatory proteins and RNA polymerase. In many cases these proteins have the ability to introduce bends in the DNA that facilitate protein-protein interactions in the nucleoprotein complexes.
So how is EspR activating transcription from the espA promoter? As previously mentioned, we know that espA transcription is influenced by at least three other proteins: it is activated by another transcription factor, PhoP, and repressed by CRPMt and possibly by the known NAP, Lsr2. One plausible model is that the bending of DNA induced by EspR facilitates the contact between a transcription factor, such as PhoP, and RNA polymerase. In the case of CRPMt, we know that it can act as a conventional transcription factor at some genes in M. tuberculosis by interacting with RNA polymerase at sites close to the promoter (Stapleton et al., J Biol Chem 2010), but in the espA upstream region CRPMt binds at a site over 1,000 bp away from the transcription start site (M. Stapleton, C. Kahramanoglou, unpublished data). In E. coli, CRP has also been shown to bind to hundreds of sites along the chromosome and may have many of the characteristics of NAPs (Grainger et al., Proc Natl Acad Sci U S A 2005); based on this paradigm, therefore, one possibility is that CRPMt can also act as a NAP at some loci such as espA to influence gene expression at long range, perhaps in this case by displacing EspR.
Like many other known NAPs, EspR does not bind only at a single site, the espA upstream region. Thus Blasco et al. (PLoS Pathog 2012) showed that EspR binds to at least 165 genetic loci, with binding not being restricted to promoter regions. Also like other NAPs, EspR is present in the cell at levels about 30-fold higher than those of classic transcriptional activators, well in excess of that necessary to occupy all of the 582 experimentally detected (within the 165 loci) or the 1,026 computationally predicted binding sites (Blasco et al., PLoS Pathog 2012). It appears to act as an activator of transcription of some genes, but as a repressor of others. Where it does bind to promoter regions, many of the genes concerned have cell wall associated functions, including other ESX secretion systems, and some have known functions in virulence determination, such as the fadD26 gene involved in the synthesis of the wax-like compound phthiocerol dimycocerosate (PDIM). The reduced virulence of espR mutants could therefore be the result of multiple defects rather than solely via espA and ESX-1 secretion. In this respect EspR resembles CRPMt, which also occupies numerous binding sites (C. Kahramanoglou, unpublished data) and affects the expression of many genes; similarly the reduced virulence of crp mutants could be the result of multiple defects. Besides espA, some other genes regulated by CRPMt, such as fadD26, are also regulated by EspR. Moreover, there is a considerable overlap (77%) between the genes in the EspR regulon and those controlled by the known NAP Lsr2, although as Blasco et al. note, the regulatory outcomes are likely to be different since the binding mechanisms of these two proteins differ, with Lsr2 binding to the minor groove of DNA and EspR binding to the major groove. Thus it appears that M. tuberculosis has evolved mechanisms involving NAPs to coordinately regulate varied sets of genes at distant sites on the chromosome.
EspR has previously been reported to be secreted by the ESX-1 system (Raghavan et al., Nature 2008). This was thought to create a negative feedback loop that modulated the activity of this secretion system. However, the more recent study of Blasco et al. (PLoS Pathog 2012) did not substantiate the claim that EspR is a secreted protein. Nevertheless, a feedback mechanism does probably regulate the levels of EspR, but one based on the control of transcription rather than secretion, since Blasco et al. found that EspR binds to its own promoter to downregulate its expression, and that EspR protein levels are growth phase-dependent, reaching a maximum in stationary phase, at which stage espR transcription is lower.
A hallmark of M. tuberculosis infection is the ability to form latent infections through the formation of granulomas in the lungs. These dormant bacteria, which have a low metabolic activity, tend to be inaccessible to therapy by most antibiotics and form a reservoir of infection that can be reactivated, especially after disturbance of the immune system, such as occurs after HIV infection. Since it is thought that one-third of the world’s population is infected with M. tuberculosis, mostly latently, the potential for new tuberculosis cases through reactivation of latent infections is huge. An interesting prospect therefore is that NAPs are involved in the reorganization of chromatin and subsequent critical changes in gene expression that must occur after the reactivation of dormant bacteria. This process probably needs to be very closely controlled, and this would therefore make EspR, Lsr2 and CRPMt truly key players in the pathogenesis of M. tuberculosis and, as such, potential drug targets.
Footnotes
Previously published online: www.landesbioscience.com/journals/virulence/article/20918