Mol Biochem Parasitol. Author manuscript; available in PMC: 2012 Feb 1.
Published in final edited form as: Mol Biochem Parasitol. 2010 Nov 3;175(2):192–195. doi: 10.1016/j.molbiopara.2010.10.009

Regions of intrinsic disorder help identify a novel nuclear localization signal in Toxoplasma gondii histone acetyltransferase TgGCN5-B

Stacy E Dixon a, Micah M Bhatti a,1, Vladimir N Uversky b,c, A Keith Dunker b, William J Sullivan Jr a,*
PMCID: PMC3005016  NIHMSID: NIHMS251304  PMID: 21055425

Abstract

We have previously shown that protozoan parasites, such as Toxoplasma gondii, contain a high prevalence of intrinsically disordered regions in their predicted proteins. Here, we determine that both TgGCN5-family histone acetyltransferases (HATs) contain unusually high levels of intrinsic disorder. A previously identified basic-rich nuclear localization signal (NLS) in the N-terminus of TgGCN5-A is located within such a region of predicted disorder, but this NLS is not conserved in TgGCN5-B. We therefore analyzed the intrinsically disordered regions of TgGCN5-B for basic-rich sequences that could indicate a functional NLS, which led to the identification of a novel NLS for TgGCN5-B, RPAENKKRGR. The function of the TgGCN5-B NLS was validated experimentally, and the motif has predictive value. These studies demonstrate that basic-rich sequences within regions predicted to be intrinsically disordered can serve as criteria for identifying a candidate NLS.

Keywords: Apicomplexa, parasite, cellular trafficking, GCN5, chromatin, epigenetics


The obligate intracellular protozoan Toxoplasma gondii (Apicomplexa) is a serious opportunistic pathogen. Completion of genome sequencing revealed that ~58% of predicted Toxoplasma genes encode hypothetical proteins of unknown function (ToxoDB.org). The discovery of new protein motifs is essential for improving predictions about the location and function of unknown proteins. We have previously determined that the genomes of early-branching eukaryotic protozoa contain a large proportion of predicted proteins with significant amounts of intrinsic disorder [1]. Disordered regions are characterized by low to moderate amino acid sequence complexity, very few bulky hydrophobic amino acids, and an enrichment of polar and charged amino acids and the structure-breaking proline; they can be predicted using computational methods [2]. Determining the degree of disorder in a protein can help predict the biological relevance of a given domain, as many disordered regions map to sites of protein-protein interaction or post-translational modification [3–6].

We have previously described the presence of lengthy (600–800 amino acids), unconserved N-terminal extensions on the two GCN5-family histone acetyltransferases (HATs), TgGCN5-A and TgGCN5-B [7, 8]. These extensions are absent from the GCN5 homologues of other lower eukaryotes, except that of the fellow apicomplexan parasite Plasmodium falciparum [9], and they contain no currently known protein motifs that would help indicate their function. Previously, we determined that the N-terminus of TgGCN5-A plays a role in localizing the HAT to the parasite nucleus by virtue of a unique nuclear localization signal (NLS) [10]. We also noted that TgGCN5-B lacking its N-terminal extension localized mainly to the cytoplasm [8]. However, the TgGCN5-A NLS is not conserved in TgGCN5-B, suggesting that TgGCN5-B uses a different NLS to gain access to the Toxoplasma nucleus.

Primary amino acid sequences for TgGCN5-A and TgGCN5-B were analyzed using the PONDR® VLXT, VL3, VSL2, and PONDR-FIT algorithms to identify regions of intrinsic disorder (Suppl. methods). In both proteins, the bromodomain, which recognizes acetylated lysine residues [11], and the HAT catalytic domain are predicted to be highly structured (Fig. 1). In contrast, the PONDR® predictors concur that the remainder of each TgGCN5 is likely to be remarkably disordered. The most extensive predicted disorder lies within the N-terminal extension, followed by the ADA2-interacting domain and the extreme C-terminal tail (Fig. 1). Consistent with the idea that unstructured domains engage in protein-protein interactions [4], we have previously verified that the ADA2-interacting domains of TgGCN5-A and TgGCN5-B interact with one or both of the ADA2 co-activator homologues present in Toxoplasma [8]. Additionally, the NLS identified for TgGCN5-A (RKRVKR, amino acids 94–99) is embedded in a region of intrinsic disorder (Fig. 1).

Figure 1. Intrinsic disorder predictions and domain structures of TgGCN5-A (A) and TgGCN5-B (B).


The per-residue propensity for intrinsic disorder was evaluated using a set of PONDR algorithms (VL-XT – red lines; VSL2 – purple lines; VL3 – blue lines). In PONDR plots (top graphs of each plot), segments with scores above 0.5 correspond to the disordered regions, whereas those below 0.5 correspond to the ordered regions/binding sites. Long regions of predicted disorder are highlighted in gray. Position of the NLS is shown in purple. Below each plot a cartoon showing the key domains of each TgGCN5 protein is shown: HAT domain in blue and bromodomain (bromo) in gold, separated by the ADA2-interacting domain (ADA2). NLS is shown in purple.
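The thresholding rule in the legend above (per-residue scores above 0.5 are called disordered, and long runs are highlighted) can be sketched as follows. This is a minimal illustration, not the PONDR® software or the authors' pipeline: it assumes you already have one score per residue from a predictor, and the 30-residue cutoff for a "long" region (`min_length`) is an illustrative assumption, not a value stated in this paper.

```python
from typing import List, Tuple

# Scores strictly above this value are called disordered (per the Figure 1 legend).
DISORDER_THRESHOLD = 0.5

def disordered_regions(scores: List[float],
                       min_length: int = 30,
                       threshold: float = DISORDER_THRESHOLD) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) spans of at least `min_length`
    consecutive residues whose score is strictly above `threshold`.

    min_length=30 is an assumed cutoff for a "long" region, not from the paper.
    """
    regions: List[Tuple[int, int]] = []
    run_start = None  # 1-based start of the current above-threshold run
    for pos, score in enumerate(scores, start=1):
        if score > threshold:
            if run_start is None:
                run_start = pos
        else:
            # The run ended at pos - 1; keep it only if it is long enough.
            if run_start is not None and pos - run_start >= min_length:
                regions.append((run_start, pos - 1))
            run_start = None
    # A run may continue to the last residue (the C-terminus).
    if run_start is not None and len(scores) - run_start + 1 >= min_length:
        regions.append((run_start, len(scores)))
    return regions

# Toy profile (not real PONDR output): ordered, 40 disordered residues, ordered.
toy_scores = [0.2] * 5 + [0.8] * 40 + [0.3] * 10
print(disordered_regions(toy_scores))  # [(6, 45)]
```

Using a strict `>` comparison mirrors the legend's wording ("scores above 0.5"); a residue scoring exactly 0.5 is treated as ordered.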

We sought to determine the NLS for TgGCN5-B by examining basic-rich stretches contained within a disordered region (analogous to the TgGCN5-A NLS). Residues 316–320 (KKRGR) best fit these criteria, so we generated plasmids designed to express truncated, FLAG-tagged forms of recombinant TgGCN5-B in Toxoplasma, as we did previously to map the NLS for TgGCN5-A [10]. FLAG-tagged TgGCN5-B lacking the first 320 amino acids (FLAG-GCN5-BΔ320) showed cytoplasmic localization (supplemental Fig. S1A). However, FLAG-tagged TgGCN5-B lacking the first 315 amino acids (FLAG-GCN5-BΔ315), which retains the KKRGR motif, also displayed cytoplasmic localization (data not shown). Moreover, KKRGR fused to E. coli β-galactosidase (β-gal) and expressed in Toxoplasma failed to enter the nucleus (data not shown), suggesting that KKRGR is necessary but not sufficient for nuclear localization. We hypothesized that additional residues upstream of this basic-rich stretch are required for proper compartmentalization of TgGCN5-B. A new construct lacking the first 304 residues, FLAG-GCN5-BΔ304, entered the nucleus, supporting this idea (supplemental Fig. S1B). To define the minimal NLS motif, we designed additional deletion constructs lacking either the first 310 (FLAG-GCN5-BΔ310) or the first 313 (FLAG-GCN5-BΔ313) amino acid residues. While FLAG-GCN5-BΔ313 was cytoplasmic (Fig. 2A), FLAG-GCN5-BΔ310 entered the parasite nucleus (Fig. 2B). When we excised only the ten amino acids 311–320 (RPAENKKRGR) from full-length TgGCN5-B (FLAG-GCN5-BΔNLS), the recombinant protein was restricted to the parasite cytoplasm (Fig. 2C), showing that these ten residues are necessary for nuclear localization. To test whether this NLS is sufficient for nuclear localization, we attached the RPAENKKRGR residues to the C-terminus of β-gal, followed by a FLAG tag, and monitored the distribution of the fusion protein within the parasites.
While β-gal is normally excluded from the parasite nucleus (Fig. 2D), attachment of the TgGCN5-B NLS resulted in virtually all of the fusion protein translocating to the nucleus (Fig. 2E). To rule out the possibility that the FLAG epitope following the NLS contributed to the redistribution of β-gal, we replaced it with an HA tag. The resulting β-gal-NLS-HA fusion was also nuclear (supplemental Fig. S2). We conclude that the ten-residue stretch RPAENKKRGR (amino acids 311–320) is necessary and sufficient to serve as an NLS in Toxoplasma.
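The deletion-mapping reasoning above can be made explicit with a small bookkeeping sketch. This is a hypothetical helper, not the authors' analysis: it transcribes the Δ310, Δ313, Δ315, and Δ320 localizations reported in the text and assumes localization switches only once as more N-terminal residues are removed. Under that assumption, the residues retained in the longest nuclear truncation but missing from the shortest cytoplasmic one (311–313) must contain the N-terminal boundary of the required sequence.

```python
# Reported localization of N-terminal truncations of FLAG-tagged TgGCN5-B
# (key n = first n residues removed), transcribed from the text above.
TRUNCATIONS = {
    310: "nuclear",
    313: "cytoplasmic",
    315: "cytoplasmic",
    320: "cytoplasmic",
}

def nls_start_window(results: dict) -> tuple:
    """Return the 1-based, inclusive residue window kept by the longest nuclear
    truncation but removed by the shortest cytoplasmic one. Assumes that
    localization changes only once as more residues are deleted."""
    longest_nuclear = max(n for n, loc in results.items() if loc == "nuclear")
    shortest_cyto = min(n for n, loc in results.items() if loc == "cytoplasmic")
    if longest_nuclear >= shortest_cyto:
        raise ValueError("results are not monotonic; boundary is ambiguous")
    return (longest_nuclear + 1, shortest_cyto)

# Coordinate bookkeeping for the mapped NLS (TgGCN5-B residues 311-320).
NLS, NLS_START = "RPAENKKRGR", 311
core_start = NLS_START + NLS.index("KKRGR")  # first residue of the basic core
core_end = core_start + len("KKRGR") - 1     # last residue of the basic core

print(nls_start_window(TRUNCATIONS))  # (311, 313)
print(core_start, core_end)           # 316 320
```

The second calculation simply confirms that the KKRGR core occupies residues 316–320 inside the 311–320 NLS, which is why the Δ315 construct still carries the full core.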

Figure 2. Mapping of the NLS for TgGCN5-B.


IFAs using an antibody to the FLAG tag were used to detect various forms of TgGCN5-B or β-gal fusion proteins. A diagram of each protein is shown to the right, with the FLAG epitope tag and protein domains indicated. The blue box of each TgGCN5-B protein diagram represents the HAT catalytic domain and the orange box depicts the bromodomain. The β-gal protein cartoon is shown in green. A. Localization of TgGCN5-B lacking the first 313 (Δ313) or B. the first 310 (Δ310) amino acid residues. C. Localization of TgGCN5-B after an internal deletion of the ten-residue NLS (ΔNLS). D. Localization of β-gal-FLAG. E. Localization of β-gal-NLS-FLAG. F. Localization of GCN5-BΔ310 containing alanine substitutions for R311 and P312. hN, host cell nucleus; TgN, Toxoplasma nucleus; green = anti-FLAG; red = DAPI.

Searches of the NLS database [12] did not reveal an entry identical to the RPAENKKRGR NLS of TgGCN5-B, indicating that it is a novel monopartite NLS. We investigated the importance of the upstream RP residues for NLS function by introducing further mutations into FLAG-GCN5-BΔ310. Substituting either the Arg or the Pro alone with Ala did not hinder nuclear localization (data not shown); however, when both residues were mutated to alanine, nuclear localization was significantly attenuated (Fig. 2F). These results suggest that the RP residues upstream of the basic core cluster are critical for efficient nuclear localization of TgGCN5-B. This is in marked contrast to TgGCN5-A, whose basic cluster of RKRVKR residues is necessary and sufficient to operate as an NLS [10].
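The single and double alanine substitutions described above can be written out at the sequence level with the short sketch below. It is only an illustration of the mutant designs (R311A, P312A, and R311A/P312A within the 311–320 NLS segment); the helper names are hypothetical, and the sketch says nothing about how the constructs were actually built (primers are listed in Table S2).

```python
from typing import List

NLS = "RPAENKKRGR"  # TgGCN5-B residues 311-320
NLS_START = 311     # protein coordinate of the first NLS residue

def alanine_substitute(segment: str, start: int, positions: List[int]) -> str:
    """Return `segment` with residues at the given 1-based protein
    positions replaced by alanine (A)."""
    residues = list(segment)
    for pos in positions:
        idx = pos - start
        if not 0 <= idx < len(residues):
            raise IndexError(f"position {pos} lies outside the segment")
        residues[idx] = "A"
    return "".join(residues)

def mutation_label(segment: str, start: int, positions: List[int]) -> str:
    """Label such as 'R311A/P312A' (wild-type residue, position, alanine)."""
    return "/".join(f"{segment[p - start]}{p}A" for p in positions)

# The three variants discussed in the text: two single mutants and the double mutant.
for positions in ([311], [312], [311, 312]):
    print(mutation_label(NLS, NLS_START, positions),
          alanine_substitute(NLS, NLS_START, positions))
```

Only the double mutant (AAAENKKRGR) lost efficient nuclear localization in the experiments reported above.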

To determine the utility of the TgGCN5-B NLS as a predictor of nuclear localization for other Toxoplasma proteins, we searched ToxoDB for gene predictions harboring a similar motif. TgGCN5-B was the only protein containing the exact ten-residue NLS. When residues that were neither basic nor proline were allowed to vary (RPxxxKKRxR, with "x" being any amino acid), three predicted proteins were identified: two hypothetical proteins (TGGT1_113380 and TGME49_091900) and one with a PHD-finger domain (TGGT1_071200), a domain commonly found on nuclear enzymes. TGME49_091900 is predicted to be nuclear by gene ontology. Additional permutations of the TgGCN5-B NLS revealed predicted proteins such as TGGT1_056400, a putative AT-hook domain protein. AT-hook domains are commonly found on DNA-binding proteins in the nuclear compartment, but this has yet to be demonstrated in Toxoplasma [13]. Searching for the basic cluster alone (KKRxR) returned 800 proteins; while most are hypothetical, numerous identified proteins are nuclear, e.g., DNA polymerases or DNA repair proteins (Suppl. Table S1). The putative NLS of TGGT1_056400 (AT-hook 056400), RPKKRRR, falls within a region of intrinsic disorder (Fig. 3A, B). We analyzed five additional predicted proteins from Suppl. Table S1 (TGGT1_071200, TGME49_091900, TGGT1_071910, TGME49_085520, and TGGT1_068070) and also found their predicted NLSs to lie within regions of intrinsic disorder (Suppl. Fig. S3). To test the prediction that AT-hook 056400 is nuclear, we tagged the endogenous locus with an HA tag in ΔKu80 parasites [14, 15]. IFA with anti-HA confirms that the native AT-hook 056400 protein is located in the nucleus (Fig. 3C). While the nuclear localization of AT-hook 056400 lends support to the data in Suppl. Table S1, it does not conclusively demonstrate that RPKKRRR is the NLS.
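The motif notation used above ("x" = any amino acid) maps directly onto a regular expression. The sketch below shows one way to scan protein sequences you already have locally; it does not use or reproduce ToxoDB's own search tools, and the toy sequence is hypothetical. It assumes uppercase one-letter amino acid codes, so the lowercase "x" in a motif can be safely translated to the regex wildcard.

```python
import re
from typing import List, Tuple

def motif_regex(motif: str) -> "re.Pattern[str]":
    """Translate a motif written with 'x' = any residue (e.g. 'RPxxxKKRxR')
    into a lookahead regex so that overlapping hits are also reported."""
    return re.compile("(?=(" + motif.replace("x", ".") + "))")

def find_motif(sequence: str, motif: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, matched) for every hit, in 1-based inclusive coordinates."""
    hits = []
    for m in motif_regex(motif).finditer(sequence):
        matched = m.group(1)
        hits.append((m.start() + 1, m.start() + len(matched), matched))
    return hits

# The TgGCN5-B NLS matches the relaxed pattern; the shorter AT-hook 056400
# candidate (RPKKRRR) does not, which is why further permutations were needed.
print(find_motif("RPAENKKRGR", "RPxxxKKRxR"))  # [(1, 10, 'RPAENKKRGR')]
print(find_motif("RPKKRRR", "RPxxxKKRxR"))     # []
# Hypothetical toy sequence with two basic clusters.
print(find_motif("MKKRGRKKRAR", "KKRxR"))      # [(2, 6, 'KKRGR'), (7, 11, 'KKRAR')]
```

The lookahead wrapper is a design choice: a plain `finditer` skips matches that overlap an earlier hit, which can hide adjacent basic clusters.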

Figure 3. Intrinsic disorder analysis and localization of AT-hook 056400.


A. Intrinsic disorder predictions for the full-length AT-hook protein TGGT1_056400. B. Distribution of the PONDR scores over residues 2,000–3,000 of the AT-hook protein TGGT1_056400. The per-residue propensity for intrinsic disorder was evaluated using a set of PONDR algorithms (VL-XT – red lines; VSL2 – purple lines; VL3 – blue lines). Segments with scores above 0.5 correspond to the disordered regions, whereas those below 0.5 correspond to the ordered regions/binding sites. Long regions of predicted disorder are highlighted in gray. The approximate position of the predicted NLS (amino acids 2,515–2,522) is indicated in purple. C. AT-hook 056400 (TGGT1_056400) was endogenously tagged with a C-terminal 3xHA epitope tag and localized by IFA using anti-HA (green). The diagram at the right is a schematic of AT-hook 056400 with AT-hook domains as red boxes and the predicted NLS as a green box.

In summary, we have developed criteria that can assist in the elucidation of NLS motifs in Toxoplasma proteins: clusters of basic-rich amino acids residing within regions of intrinsic disorder may be indicative of nuclear localization. It is important to note that basic-rich stretches found in ordered regions do not represent functional NLSs in the TgGCN5s. For example, basic-rich amino acids 1,113–1,117 of TgGCN5-A (KKRNR) and 935–941 of TgGCN5-B (KKKCKKK) are within ordered regions yet have no role in nuclear localization (Fig. 2 and [10]).
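The two criteria summarized above (a basic-rich cluster, and residence within predicted disorder) can be combined into a single filter. This is a minimal sketch under stated assumptions, not a validated predictor: it assumes one PONDR-style score per residue, uses the KKRxR basic cluster from the ToxoDB search as the default motif, and requires every residue of a hit to score above 0.5, which is a stricter rule than the paper specifies. Hits passing the filter are only candidates; as shown for TgGCN5-B, flanking residues such as the RP dipeptide may also be required.

```python
import re
from typing import List, Tuple

def candidate_nls(sequence: str, scores: List[float], motif: str = "KKRxR",
                  threshold: float = 0.5) -> List[Tuple[int, int, str]]:
    """Return motif hits (1-based, inclusive) whose every residue has a
    disorder score above `threshold`; hits in ordered regions are dropped."""
    if len(sequence) != len(scores):
        raise ValueError("expected exactly one disorder score per residue")
    # 'x' = any residue; the lookahead also reports overlapping hits.
    pattern = re.compile("(?=(" + motif.replace("x", ".") + "))")
    hits = []
    for m in pattern.finditer(sequence):
        start, end = m.start(), m.start() + len(m.group(1))  # 0-based, end exclusive
        if all(score > threshold for score in scores[start:end]):
            hits.append((start + 1, end, m.group(1)))
    return hits

# Toy example (hypothetical scores): an ordered KKRNR, then a disordered KKRGR.
toy_seq = "KKRNR" + "GGGGG" + "KKRGR"
toy_scores = [0.2] * 5 + [0.7] * 10
print(candidate_nls(toy_seq, toy_scores))  # [(11, 15, 'KKRGR')]
```

The toy input mirrors the paper's observation that basic stretches in ordered regions (such as KKRNR in TgGCN5-A) are not functional NLSs, so the filter discards them.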

Using the criteria outlined above, we identified a novel NLS within TgGCN5-B that is both necessary and sufficient for nuclear compartmentalization. The NLS of TgGCN5-B relies on the presence of an upstream RP dipeptide. This NLS configuration showed predictive value, including the nuclear localization of an AT-hook protein (TGGT1_056400). Our results should help assign functions to the many hypothetical proteins predicted in ToxoDB.

Supplementary Material


Table S1: Predictive value of the TgGCN5-B NLS

Table S2: Primers used in this study

Figure S1: Localization of truncated forms of TgGCN5-B. Truncated forms of TgGCN5-B (fused to an N-terminal FLAG tag) were localized in IFAs using anti-FLAG. A. Localization of TgGCN5-B lacking the first 320 amino acid residues (Δ320). B. Localization of TgGCN5-B lacking the first 304 amino acid residues (Δ304). TgN, Toxoplasma nucleus; green = anti-FLAG; red = DAPI.

Figure S2: Nuclear localization of the β-gal/TgGCN5-B NLS fusion protein is not due to the FLAG epitope tag. IFA using anti-HA shows localization of β-gal fused to the TgGCN5-B NLS and an HA epitope tag at the C-terminus. TgN, Toxoplasma nucleus; green = anti-HA; red = DAPI.

Figure S3: Predicted NLSs fall within regions of intrinsic disorder. As described in the legend to Figure 1, we analyzed the degree of intrinsic disorder for the following additional proteins listed in Table S1: A. TGGT1_071200, B. TGME49_091900, C. TGGT1_071910, D. TGME49_085520, and E. TGGT1_068070. The predicted NLS of each protein is indicated and falls within a region of intrinsic disorder.

Acknowledgments

Research was supported by grants from the National Institutes of Health (AI077502 to WJS) and a Pharmacology/Toxicology Pre-doctoral Fellowship from the PhRMA Foundation (to SED). The authors thank Dr. Vern Carruthers (University of Michigan Medical School) for supplying RH ΔKu80 ΔHXGPRT parasites and Dr. Michael White (University of South Florida) for supplying the p3HA.LIC.DHFR plasmid.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  • 1. Mohan A, Sullivan WJ Jr, Radivojac P, Dunker AK, Uversky VN. Intrinsic disorder in pathogenic and non-pathogenic microbes: discovering and analyzing the unfoldomes of early-branching eukaryotes. Mol Biosyst. 2008;4:328–40. doi: 10.1039/b719168e.
  • 2. He B, Wang K, Liu Y, Xue B, Uversky VN, Dunker AK. Predicting intrinsic disorder in proteins: an overview. Cell Res. 2009;19:929–49. doi: 10.1038/cr.2009.87.
  • 3. Dunker AK, Cortese MS, Romero P, Iakoucheva LM, Uversky VN. Flexible nets. The roles of intrinsic disorder in protein interaction networks. FEBS J. 2005;272:5129–48. doi: 10.1111/j.1742-4658.2005.04948.x.
  • 4. Tompa P, Fuxreiter M, Oldfield CJ, Simon I, Dunker AK, Uversky VN. Close encounters of the third kind: disordered domains and the interactions of proteins. Bioessays. 2009;31:328–35. doi: 10.1002/bies.200800151.
  • 5. Dunker AK, Uversky VN. Signal transduction via unstructured protein conduits. Nat Chem Biol. 2008;4:229–30. doi: 10.1038/nchembio0408-229.
  • 6. Iakoucheva LM, Radivojac P, Brown CJ, O’Connor TR, Sikes JG, Obradovic Z, et al. The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res. 2004;32:1037–49. doi: 10.1093/nar/gkh253.
  • 7. Sullivan WJ Jr, Smith CK 2nd. Cloning and characterization of a novel histone acetyltransferase homologue from the protozoan parasite Toxoplasma gondii reveals a distinct GCN5 family member. Gene. 2000;242:193–200. doi: 10.1016/s0378-1119(99)00526-0.
  • 8. Bhatti MM, Livingston M, Mullapudi N, Sullivan WJ Jr. Pair of unusual GCN5 histone acetyltransferases and ADA2 homologues in the protozoan parasite Toxoplasma gondii. Eukaryot Cell. 2006;5:62–76. doi: 10.1128/EC.5.1.62-76.2006.
  • 9. Fan Q, An L, Cui L. Plasmodium falciparum histone acetyltransferase, a yeast GCN5 homologue involved in chromatin remodeling. Eukaryot Cell. 2004;3:264–76. doi: 10.1128/EC.3.2.264-276.2004.
  • 10. Bhatti MM, Sullivan WJ Jr. Histone acetylase GCN5 enters the nucleus via importin-alpha in protozoan parasite Toxoplasma gondii. J Biol Chem. 2005;280:5902–8. doi: 10.1074/jbc.M410656200.
  • 11. Dhalluin C, Carlson JE, Zeng L, He C, Aggarwal AK, Zhou MM. Structure and ligand of a histone acetyltransferase bromodomain. Nature. 1999;399:491–6. doi: 10.1038/20974.
  • 12. Nair R, Carter P, Rost B. NLSdb: database of nuclear localization signals. Nucleic Acids Res. 2003;31:397–9. doi: 10.1093/nar/gkg001.
  • 13. Aravind L, Landsman D. AT-hook motifs identified in a wide variety of DNA-binding proteins. Nucleic Acids Res. 1998;26:4413–21. doi: 10.1093/nar/26.19.4413.
  • 14. Huynh MH, Carruthers VB. Tagging of endogenous genes in a Toxoplasma gondii strain lacking Ku80. Eukaryot Cell. 2009;8:530–9. doi: 10.1128/EC.00358-08.
  • 15. Fox BA, Ristuccia JG, Gigley JP, Bzik DJ. Efficient gene replacements in Toxoplasma gondii strains deficient for nonhomologous end joining. Eukaryot Cell. 2009;8:520–9. doi: 10.1128/EC.00357-08.
