Abstract
We have previously shown that protozoan parasites, such as Toxoplasma gondii, contain a high prevalence of intrinsically disordered regions in their predicted proteins. Here, we determine that both TgGCN5-family histone acetyltransferases (HATs) contain unusually high levels of intrinsic disorder. A previously identified basic-rich nuclear localization signal (NLS) in the N-terminus of TgGCN5-A is located within such a region of predicted disorder, but this NLS is not conserved in TgGCN5-B. We therefore analyzed the intrinsically disordered regions of TgGCN5-B for basic-rich sequences that could be indicative of a functional NLS, and this led to the identification of a novel NLS for TgGCN5-B, RPAENKKRGR. The functionality of the GCN5-B NLS was validated experimentally and has predictive value. These studies demonstrate that basic-rich sequences within regions predicted to be intrinsically disordered constitute criteria for a candidate NLS.
Keywords: Apicomplexa, parasite, cellular trafficking, GCN5, chromatin, epigenetics
The obligate intracellular protozoan Toxoplasma gondii (Apicomplexa) is a serious opportunistic pathogen. Completion of genome sequencing revealed that ~58% of predicted Toxoplasma genes encode hypothetical proteins of unknown function (ToxoDB.org). The discovery of new protein motifs is essential for improving predictions about the location and function of unknown proteins. We have previously determined that the genomes of early-branching eukaryotic protozoa contain a large proportion of predicted proteins with significant amounts of intrinsic disorder [1]. Disordered regions are characterized by moderate to low amino acid sequence complexity, with very few bulky, hydrophobic amino acids and an enrichment of polar and charged amino acids and the structure-breaking proline; they can be predicted using computational methods [2]. Determining the degree of disorder in a protein can assist in predicting the biological relevance of a given domain, as many regions of disorder map to areas of protein-protein interaction or post-translational modification [3–6].
We have previously described the presence of lengthy (600–800 amino acids), unconserved N-terminal extensions on the two GCN5-family member histone acetyltransferases (HATs), TgGCN5-A and -B [7, 8]. These extensions are not present on the GCN5 homologues in other lower eukaryotes, save the fellow apicomplexan parasite Plasmodium falciparum [9], and they have no currently known protein motifs that would help indicate their function. Previously, we determined that the N-terminus of TgGCN5-A plays a role in localizing the HAT to the parasite nucleus by virtue of a unique nuclear localization signal (NLS) [10]. It was also noted that if TgGCN5-B were deprived of its N-terminal extension, the truncated protein was mainly in the cytoplasm [8]. However, the NLS for TgGCN5-A is not conserved in TgGCN5-B, suggesting that TgGCN5-B uses a different NLS to gain access to the Toxoplasma nucleus.
Primary amino acid sequences for TgGCN5-A and -B were analyzed using the PONDR® VLXT, VL3, VSL2, and PONDR-FIT algorithms to identify regions of intrinsic disorder (Suppl. methods). In both cases the bromodomain, which recognizes acetylated lysine residues [11], and the HAT catalytic domain are predicted to be highly structured (Fig. 1). In contrast, the PONDR® predictors concur that the remainder of each TgGCN5 is likely to be remarkably disordered. The most extensive predicted disorder is located within the N-terminal extension, followed by the ADA2-interacting domain and the extreme C-terminal tail (Fig. 1). Consistent with the idea that unstructured domains engage in protein-protein interactions [4], we have previously verified that the ADA2-interacting domains of TgGCN5-A and -B interact with one or both ADA2 co-activator homologues present in Toxoplasma [8]. Additionally, the NLS elucidated for TgGCN5-A (RKRVKR, amino acids 94–99) is embedded in a region of intrinsic disorder (Fig. 1).
We sought to determine the NLS for TgGCN5-B by examining basic-rich stretches contained within a disordered region (analogous to the TgGCN5-A NLS). Residues 316–320 (KKRGR) best fit these criteria, so we generated plasmids designed to express truncated, FLAG-tagged forms of recombinant TgGCN5-B in Toxoplasma, as we did previously to map the NLS for TgGCN5-A [10]. FLAG-tagged TgGCN5-B lacking the first 320 amino acids (FLAGGCN5-BΔ320) showed cytoplasmic localization (supplemental Fig. S1A). However, FLAG-tagged TgGCN5-B lacking the first 315 amino acids (FLAGGCN5-BΔ315), which retains the KKRGR motif, still displayed cytoplasmic localization (data not shown). Moreover, KKRGR fused to E. coli β-galactosidase (β-gal) expressed in Toxoplasma failed to gain access to the nucleus (data not shown), suggesting that KKRGR is necessary but not sufficient for nuclear localization. We hypothesized that additional residues upstream of this basic-rich stretch are required for proper compartmentalization of TgGCN5-B. A new construct lacking the first 304 residues, FLAGGCN5-BΔ304, supported this idea (supplemental Fig. S1B). To define the minimal NLS motif, additional deletion constructs were designed that lacked either the first 310 (FLAGGCN5-BΔ310) or 313 (FLAGGCN5-BΔ313) amino acid residues. While the construct FLAGGCN5-BΔ313 was cytoplasmic (Fig. 2A), FLAGGCN5-BΔ310 was able to enter the parasite nucleus (Fig. 2B). When we excised just the ten amino acids from 311–320 (RPAENKKRGR) from full-length TgGCN5-B (FLAGGCN5-BΔNLS), the recombinant protein was restricted to the parasite cytoplasm (Fig. 2C), validating that these ten residues are necessary for nuclear localization. To demonstrate that the elucidated NLS is sufficient for nuclear localization, we attached the RPAENKKRGR residues onto the C-terminus of E. coli β-galactosidase (β-gal) followed by a FLAG tag and monitored the distribution of the fusion protein within the parasites.
While β-gal is normally restricted from the parasite nucleus (Fig. 2D), attachment of the TgGCN5-B NLS resulted in virtually all of the fusion protein translocating to the nucleus (Fig. 2E). To rule out the possibility that the FLAG epitope following the NLS was contributing to the redistribution of β-gal, we replaced it with an HA tag. The results show that β-gal-NLSHA was also nuclear (supplemental data Fig. S2). We conclude that the ten-residue stretch, RPAENKKRGR (amino acids 311–320), is necessary and sufficient to serve as an NLS in Toxoplasma.
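The deletion mapping described above can be restated as simple coordinate arithmetic. The following is a minimal sketch, not part of the original methods: it assumes only the coordinates given in the text (RPAENKKRGR at residues 311–320) and reports which part of the NLS each N-terminal truncation would retain. The function name `retained_nls` and the `reported` table are illustrative; localization calls are entered only where the text states them explicitly.

```python
# Coordinates taken from the text: the NLS occupies residues 311-320 of TgGCN5-B.
NLS = "RPAENKKRGR"
NLS_START = 311  # 1-based position of the first NLS residue


def retained_nls(n_deleted):
    """Return the portion of the NLS left after removing the first n_deleted residues."""
    # A residue at 1-based position p survives the truncation if p > n_deleted.
    offset = max(0, n_deleted - NLS_START + 1)
    return NLS[offset:]


# Localization as reported in the text; None where no explicit call is given.
reported = {
    304: None,
    310: "nuclear",
    313: "cytoplasmic",
    315: "cytoplasmic",
    320: "cytoplasmic",
}

for n_deleted, localization in reported.items():
    fragment = retained_nls(n_deleted) or "(none)"
    print(f"delta{n_deleted}: retains {fragment}; reported: {localization}")
```

Under these coordinates, the Δ310 truncation keeps the full ten residues, Δ313 loses the RPA residues, and Δ315 keeps only the KKRGR core, which matches the observation that the basic core alone did not support nuclear entry.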
Searches of the NLS database [12] did not reveal an entry identical to the RPAENKKRGR NLS of TgGCN5-B, indicating that it is a novel monopartite NLS. We investigated the importance of the upstream RP residues for the function of the NLS by creating further mutations in FLAGGCN5-BΔ310. Point mutation of either the Arg or the Pro to Ala did not hinder nuclear localization (data not shown); however, when both residues were mutated to alanines, nuclear localization was significantly attenuated (Fig. 2F). These studies suggest that the RP residues upstream of the basic core cluster are critical for efficient nuclear localization of TgGCN5-B. This result is in marked contrast to what was observed for TgGCN5-A, whose basic cluster of RKRVKR residues is necessary and sufficient to operate as an NLS [10].
To determine the utility of the TgGCN5-B NLS as a predictor for nuclear localization of other Toxoplasma proteins, the ToxoDB was searched for gene predictions harboring a similar motif. TgGCN5-B was the only protein in possession of the exact ten-residue NLS. When permutations were allowed for residues that were not basic or proline (RPxxxKKRxR, with “x” being any amino acid), three predicted proteins were identified: two hypothetical proteins (TGGT1_113380 and TGME49_091900) and one with a PHD-finger domain (TGGT1_071200), a domain commonly found on nuclear enzymes. TGME49_091900 is predicted to be nuclear by gene ontology. Additional permutations of the TgGCN5-B NLS revealed predicted proteins such as TGGT1_056400, a putative AT-hook domain protein. AT-hook domains are commonly found on DNA-binding proteins in the nuclear compartment, but this has yet to be demonstrated in Toxoplasma [13]. A total of 800 proteins were obtained when just the basic cluster (KKRxR) was searched; while most are hypothetical, numerous proteins identified are nuclear, e.g., DNA polymerases or DNA repair proteins (Suppl. Table I). The putative NLS of TGGT1_056400 (AT-hook 056400), RPKKRRR, falls within a region of intrinsic disorder (Fig. 3A, B). We analyzed an additional five predicted proteins from Suppl. Table I (TGGT1_071200, TGME49_091900, TGGT1_071910, TGME49_085520, and TGGT1_068070) and also found their predicted NLS to be within a region of intrinsic disorder (Suppl. Fig. S3). To test the prediction that AT-hook 056400 is nuclear, we tagged the endogenous locus with an HA tag in ΔKu80 parasites [14–15]. IFA with anti-HA confirmed that the native AT-hook 056400 protein is located in the nucleus (Fig. 3C). While the nuclear localization of AT-hook 056400 lends support to the data in Suppl. Table I, it does not conclusively demonstrate that RPKKRRR is the NLS.
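The motif searches described above can be expressed as regular expressions, with each "x" position written as a single-character wildcard. The sketch below is illustrative only: it does not reproduce the ToxoDB query interface, the helper `find_motifs` is a hypothetical name, and the sequence `toy` is an invented fragment used solely to show the coordinates returned.

```python
import re

# Patterns from the text; "x" (any amino acid) becomes "." in regex syntax.
PATTERNS = {
    "exact": r"RPAENKKRGR",
    "RPxxxKKRxR": r"RP...KKR.R",
    "KKRxR": r"KKR.R",
}


def find_motifs(sequence, pattern):
    """Return (1-based start, matched text) for every hit, including overlapping hits."""
    # A lookahead with a capture group lets finditer report overlapping matches.
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence)]


# Invented fragment carrying the TgGCN5-B NLS, for illustration only.
toy = "MSTARPAENKKRGRLLE"
for name, pattern in PATTERNS.items():
    print(name, find_motifs(toy, pattern))

# The putative AT-hook 056400 signal from the text contains the basic cluster
# but is too short to satisfy the longer RPxxxKKRxR pattern.
print(find_motifs("RPKKRRR", PATTERNS["KKRxR"]))
print(find_motifs("RPKKRRR", PATTERNS["RPxxxKKRxR"]))
```

Running the same function over a FASTA file of predicted proteins would give a candidate list analogous to the one described in the text, although hit counts would depend on the database release used.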
In summary, we have developed criteria that can assist in the elucidation of NLS motifs in Toxoplasma proteins: clusters of basic-rich amino acids residing within regions of intrinsic disorder may be indicative of nuclear localization. It is important to note that basic-rich stretches found in ordered regions do not represent functional NLSs in the TgGCN5s. For example, basic-rich amino acids 1,113–1,117 of TgGCN5-A (KKRNR) and 935–941 of TgGCN5-B (KKKCKKK) are within ordered regions yet have no role in nuclear localization (Fig. 2 and [10]).
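The two-part criterion summarized above (a basic-rich cluster that also lies within predicted disorder) can be sketched as a filter over motif hits. This is a hedged illustration rather than the procedure used in the study: it assumes per-residue disorder scores between 0 and 1 have already been obtained from a predictor, it assumes a cutoff of 0.5 for calling a residue disordered, and both the sequence and the scores below are invented. The function name `motifs_in_disorder` is hypothetical.

```python
import re


def motifs_in_disorder(sequence, scores, pattern=r"KKR.R", cutoff=0.5):
    """Return (1-based start, motif) for hits whose residues all score >= cutoff."""
    if len(sequence) != len(scores):
        raise ValueError("need exactly one disorder score per residue")
    hits = []
    # The lookahead form reports overlapping matches as well.
    for m in re.finditer(f"(?=({pattern}))", sequence):
        start = m.start()
        end = start + len(m.group(1))
        # Keep the hit only if every residue in the motif is predicted disordered.
        if all(score >= cutoff for score in scores[start:end]):
            hits.append((start + 1, m.group(1)))
    return hits


# Invented input: a disordered stretch (score 0.9) followed by an ordered one (0.2).
toy_seq = "GSPKKRGRSAELLVKKRNRWLF"
toy_scores = [0.9] * 10 + [0.2] * 12

print(motifs_in_disorder(toy_seq, toy_scores))
```

In this toy input both KKRGR and KKRNR match the basic-cluster pattern, but only the first lies in the stretch scored as disordered, mirroring the distinction drawn in the text between disordered and ordered basic-rich regions.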
Using the criteria outlined above, we were able to determine a unique NLS within TgGCN5-B that is both necessary and sufficient for nuclear compartmentalization. The NLS of TgGCN5-B relies on the presence of an upstream RP dipeptide. This NLS configuration demonstrated predictive value, including nuclear localization of an AT-hook protein (TGGT1_056400). Our results should have utility in helping to assign a function to the many hypothetical proteins predicted in the ToxoDB.
Acknowledgments
Research was supported by grants from the National Institutes of Health (AI077502 to WJS) and a Pharmacology/Toxicology Pre-doctoral Fellowship from PhRMA Foundation (to SED). The authors thank Dr. Vern Carruthers (University of Michigan Medical School) for supplying RH Ku80 HXGPRT parasites and Dr. Michael White (University of South Florida) for supplying p3HA.LIC.DHFR plasmid.
References
- 1. Mohan A, Sullivan WJ Jr, Radivojac P, Dunker AK, Uversky VN. Intrinsic disorder in pathogenic and non-pathogenic microbes: discovering and analyzing the unfoldomes of early-branching eukaryotes. Mol Biosyst. 2008;4:328–40. doi: 10.1039/b719168e.
- 2. He B, Wang K, Liu Y, Xue B, Uversky VN, Dunker AK. Predicting intrinsic disorder in proteins: an overview. Cell Res. 2009;19:929–49. doi: 10.1038/cr.2009.87.
- 3. Dunker AK, Cortese MS, Romero P, Iakoucheva LM, Uversky VN. Flexible nets. The roles of intrinsic disorder in protein interaction networks. FEBS J. 2005;272:5129–48. doi: 10.1111/j.1742-4658.2005.04948.x.
- 4. Tompa P, Fuxreiter M, Oldfield CJ, Simon I, Dunker AK, Uversky VN. Close encounters of the third kind: disordered domains and the interactions of proteins. Bioessays. 2009;31:328–35. doi: 10.1002/bies.200800151.
- 5. Dunker AK, Uversky VN. Signal transduction via unstructured protein conduits. Nat Chem Biol. 2008;4:229–30. doi: 10.1038/nchembio0408-229.
- 6. Iakoucheva LM, Radivojac P, Brown CJ, O’Connor TR, Sikes JG, Obradovic Z, et al. The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res. 2004;32:1037–49. doi: 10.1093/nar/gkh253.
- 7. Sullivan WJ Jr, Smith CK 2nd. Cloning and characterization of a novel histone acetyltransferase homologue from the protozoan parasite Toxoplasma gondii reveals a distinct GCN5 family member. Gene. 2000;242:193–200. doi: 10.1016/s0378-1119(99)00526-0.
- 8. Bhatti MM, Livingston M, Mullapudi N, Sullivan WJ Jr. Pair of unusual GCN5 histone acetyltransferases and ADA2 homologues in the protozoan parasite Toxoplasma gondii. Eukaryot Cell. 2006;5:62–76. doi: 10.1128/EC.5.1.62-76.2006.
- 9. Fan Q, An L, Cui L. Plasmodium falciparum histone acetyltransferase, a yeast GCN5 homologue involved in chromatin remodeling. Eukaryot Cell. 2004;3:264–76. doi: 10.1128/EC.3.2.264-276.2004.
- 10. Bhatti MM, Sullivan WJ Jr. Histone acetylase GCN5 enters the nucleus via importin-alpha in protozoan parasite Toxoplasma gondii. J Biol Chem. 2005;280:5902–8. doi: 10.1074/jbc.M410656200.
- 11. Dhalluin C, Carlson JE, Zeng L, He C, Aggarwal AK, Zhou MM. Structure and ligand of a histone acetyltransferase bromodomain. Nature. 1999;399:491–6. doi: 10.1038/20974.
- 12. Nair R, Carter P, Rost B. NLSdb: database of nuclear localization signals. Nucleic Acids Res. 2003;31:397–9. doi: 10.1093/nar/gkg001.
- 13. Aravind L, Landsman D. AT-hook motifs identified in a wide variety of DNA-binding proteins. Nucleic Acids Res. 1998;26:4413–21. doi: 10.1093/nar/26.19.4413.
- 14. Huynh MH, Carruthers VB. Tagging of endogenous genes in a Toxoplasma gondii strain lacking Ku80. Eukaryot Cell. 2009;8:530–9. doi: 10.1128/EC.00358-08.
- 15. Fox BA, Ristuccia JG, Gigley JP, Bzik DJ. Efficient gene replacements in Toxoplasma gondii strains deficient for nonhomologous end joining. Eukaryot Cell. 2009;8:520–9. doi: 10.1128/EC.00357-08.