RNA Biology. 2011 Jul 1;8(4):552–556. doi: 10.4161/rna.8.4.15396

Conserved RNA structures in the non-canonical Hac1/Xbp1 intron

Katarzyna B Hooks, Sam Griffiths-Jones
PMCID: PMC3225973  PMID: 21593604

Abstract

The unconventional splicing of Hac1 by the ribonuclease Ire1 is a key event in the activation of the unfolded protein response (UPR) in Saccharomyces cerevisiae. This splicing is independent of the spliceosome and is mediated by a secondary structure at the intron-exon boundaries of the mRNA. Similar unconventional splicing was also described for the gene Xbp1 in human, mouse, Caenorhabditis elegans and Drosophila melanogaster, and for Hac1 in five other fungi. We used reported RNA structures to build a multiple sequence alignment and the Infernal package to search for homologous structures. We identified homologous non-canonical intron structures in 128 out of 156 searched eukaryotic genomes. Our results show that the sequence of the Hac1/Xbp1 intron is highly conserved only around the splice sites recognized by Ire1. The consensus structure of the Hac1/Xbp1 mRNA is well conserved in Fungi and Metazoa and resembles structures previously described. We show that a typical Hac1/Xbp1 intron is very short, only 20–26 bases, whereas yeast species have a long intron (>100 bases). We identified six species with unambiguous Hac1/Xbp1 homologs that have lost the non-canonical intron structure. We propose that these species use a different mechanism to regulate the UPR.

Key words: unfolded protein response, splicing, RNA structure, intron, HAC1, XBP1

Introduction

Environmental stress can cause proteins to misfold and accumulate in the endoplasmic reticulum. The unfolded protein response (UPR) protects from excessive accumulation of misfolded proteins by activating pathways that lead to production of protein-folding chaperones (reviewed in ref. 1). Splicing of the mRNA of a transcription factor named Hac1/HacA in fungi and Xbp1 in metazoans is a crucial regulatory step in the activation of the UPR. This splicing is independent of the spliceosome and takes place in the cytoplasm.2 Intron excision is performed by Ire, a kinase with a ribonuclease activity3 and the exons are joined by the tRNA ligase Trl1 (Rlg1).4 The only known substrates for Ire-mediated splicing are Hac1 and Xbp1. The protein translated from the unspliced Hac1/Xbp1 mRNA is unable to trigger UPR. Splicing causes a frame shift, and the spliced mRNA encodes a potent transcription factor.5,6

Ire resides in the membrane of the endoplasmic reticulum where it senses the status of protein folding.7 Upon ER stress, Ire is activated and splices Hac1/Xbp1. Ire recognizes the secondary structure of the mRNA flanking the splice sites and cleaves at consensus sequences in hairpin loops.3,6,8 The excision of the intron changes the open reading frame; this extends the protein in Saccharomyces cerevisiae,3 human,6 Caenorhabditis elegans and mouse,8 whereas in Trichoderma reesei and Aspergillus nidulans the spliced transcript codes for a shorter protein.9 Only proteins translated from spliced transcripts function as transcription factors.5,6

Ire-mediated splicing was first observed in S. cerevisiae.3 A similar mechanism was later described in the fungi T. reesei,9 Candida albicans,10 Yarrowia lipolytica11 and Pichia pastoris,12 and in the metazoans C. elegans, mouse,8,13 human6 and Drosophila melanogaster.14

The Ire endoribonuclease is conserved in all eukaryotes.8 The unconventional splicing mechanism might therefore be expected to be universal. Here we investigate the conservation of the RNA structures recognized by Ire. We find that the Hac1/Xbp1 intron is deeply conserved. We annotate numerous previously unidentified homologs of the Hac1/Xbp1 intron, and we find that the structures that mediate the unconventional splicing mechanism are lost in some taxa.

Results

Identifying homologs of the non-canonical Hac1/Xbp1 intron.

The Ire-spliced Hac1/Xbp1 intron has previously been described in 10 species: S. cerevisiae,3 T. reesei, A. nidulans,9 C. albicans,10 Y. lipolytica,11 Pichia pastoris,12 C. elegans, mouse,8,13 human6 and D. melanogaster.14 We manually aligned the sequences of these known introns and exon flanks, taking into account the published secondary structure annotations.3,8–11,14 A covariance model was built and searched against available eukaryotic genomes using the Infernal software. An iterative process of covariance model searches, alignment of sequences to the model, and manual refinement of the alignment and consensus secondary structure annotation resulted in a final alignment representing the evolutionary conservation of the non-canonical intron (see Sup. Data).
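One round of the build, search and align cycle described above could be scripted roughly as follows. This is a minimal sketch and not the authors' scripts: the helper names `infernal_round` and `run_round` and all file names are hypothetical, only the positional form `<program> <cmfile> <input file>` is used for the three Infernal programs, and the manual refinement of the alignment between rounds is left as a comment.

```python
import shutil
import subprocess
from pathlib import Path


def infernal_round(seed_sto, target_fa, hits_fa, workdir="."):
    """Return the three Infernal command lines for one search round.

    All file names are hypothetical placeholders. Extracting the
    high-scoring hits from the cmsearch output into `hits_fa`, and the
    manual refinement of the resulting alignment, happen outside this
    sketch.
    """
    cm = str(Path(workdir) / "intron.cm")
    return [
        ["cmbuild", cm, seed_sto],    # build a covariance model from the seed alignment
        ["cmsearch", cm, target_fa],  # search the target sequences with the model
        ["cmalign", cm, hits_fa],     # align the selected hits to the model
    ]


def run_round(commands):
    """Run each command in order and return the captured stdout per program."""
    outputs = {}
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        outputs[cmd[0]] = result.stdout
    return outputs
```

Calling `infernal_round("seed.sto", "genomes.fa", "hits.fa")` only constructs the command lines, so they can be inspected before `run_round` executes anything. Repeating the round with the refined alignment as the new seed mirrors the iterative process in the text.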

We identify homologs of the Hac1/Xbp1 intron structure in 128 of the 156 full-genome datasets searched. All Chordata besides two Ciona species were found to contain a well-conserved intron (45 out of 47 genomes). Strongylocentrotus purpuratus (Echinodermata) has an intron highly similar to those of Chordata. Almost all analyzed Arthropoda (19/20), all Nematoda (7/7), Annelida (2/2) and Mollusca (2/2) also contain a conserved Hac1/Xbp1 intron structure. The structure was either not found at all or found only with very low confidence in representatives of Acoelomata, Choanoflagellida, Cnidaria, Dictyosteliida, Placozoa and Porifera (Fig. 1).

Figure 1. The taxonomic distribution of identified Hac1/Xbp1 introns. Species in which we cannot identify a confident Hac1/Xbp1 intron structure are shown in grey. Lengths of introns are shown on the right. Species with a long insert between H2 and H3 are highlighted. Taxonomy adapted from NCBI (branch lengths not to scale).26,27

In Ascomycota, the conserved Hac1/Xbp1 intron structure was identified in 52 out of 63 species, and its sequence is especially well conserved in Pezizomycotina. We find no homologous structure in Pyrenophora tritici-repentis, probably due to poor assembly of its genome. We identify Hac1 homologs that lack the intronic structure in five Candida-related species, Vanderwaltozyma polyspora and Pichia stipitis, suggesting that the unconventional splicing mechanism may have been lost. Although Ire-mediated splicing of Hac1 has previously been described for Candida albicans,10 and our results show that C. dubliniensis and C. tropicalis have a conserved RNA structure around the putative intron, the intron is not present in other closely related species. In C. parapsilosis, Lodderomyces elongisporus, Debaryomyces hansenii, Pichia stipitis and C. guilliermondii the hairpin flanking the 5′ splice site is conserved but the 3′ hairpin has been lost. No part of the intron structure appears to be conserved in C. lusitaniae.

No Hac1-like structure was identified in two Schizosaccharomyces sp. representatives of Taphrinomycotina. We find that Schizosaccharomyces genomes do not contain a HAC1 homolog, and candidates found during the Infernal homology search lie in intergenic regions. No homologs were identified in several other fungal phyla: Basidiomycota, Mucoromycotina and Chytridiomycota.

Conservation of the RNA secondary structure.

The secondary structure of the Hac1 and Xbp1 mRNA around the non-canonical intron has common features in all species. At the exon/intron boundaries there are two short hairpins: helix 2 pairs the end of the upstream exon with the 5′ portion of the intron and helix 3 pairs the 3′ portion of the intron with the downstream exon (Fig. 2). Splice sites are located in the loop regions of each short hairpin. Terminal loop regions have a well-conserved Ire cleavage motif CNG' CNGN. Upstream and downstream exons pair in helix 1 to bring helices 2 and 3 into close proximity. The consensus cleavage motifs in the two loops are similar and are degenerate palindromes, and may therefore form a pseudoknot. However, we see no evidence for compensatory mutations that preserve a pseudoknot. Indeed, most mutations are incompatible with an interaction between the two loops, and we therefore conclude that a pseudoknot is unlikely to be conserved.
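To illustrate how the loop motif can be located in a sequence, the sketch below scans an RNA string for the CNG' CNGN pattern. It rests on two assumptions that the text does not spell out: N is read as any nucleotide, and the apostrophe is taken to mark the cleavage position, so the cut falls after the first G. The function name `find_cleavage_sites` is hypothetical, and a sequence match alone says nothing about whether the motif sits in a hairpin loop.

```python
import re

# Lookahead so that overlapping motif instances are all reported.
# Group 1 is the CNG 5' of the assumed cut, group 2 is the CNGN 3' of it.
LOOP_MOTIF = re.compile(r"(?=(C[ACGU]G)(C[ACGU]G[ACGU]))")


def find_cleavage_sites(rna):
    """Return 0-based indices of the first base 3' of each candidate cut.

    DNA input is accepted: the sequence is upper-cased and T is read as U.
    """
    rna = rna.upper().replace("T", "U")
    return [match.start() + 3 for match in LOOP_MOTIF.finditer(rna)]


if __name__ == "__main__":
    # Toy sequence, not taken from any real Hac1/Xbp1 transcript.
    print(find_cleavage_sites("AACAGCAGCAA"))
```

In practice such a scan would be combined with the structural annotation, since the consensus places the motif in the terminal loops of helices 2 and 3.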

Figure 2. Consensus structure of Hac1/Xbp1 mRNA. The non-canonical intron is highlighted in green with arrows indicating splice sites. Sequence conservation at each position is shown by the color mark-up. Inserts between helices are marked with triangles. The table (inset) summarizes the length variability of helices in different clades.

The Hac1/Xbp1 intron exists in two types: a long form as found in S. cerevisiae, and a short form as found in mammals. Both variants can be found in Fungi. Pezizomycotina and Candida sp. have a short intron (<30 nt), whereas 13 species closely related to S. cerevisiae have a long intron (>80 nt). The long intron maintains the overall structure with two short hairpins marking the splice sites; however, the hairpins are separated by 70 to 360 nucleotides. The insertion is present at the same site in all species of the S. cerevisiae clade excluding Zygosaccharomyces rouxii, suggesting a single insertional event in the common ancestor of these species and a single subsequent deletion of the inserted sequence in the Z. rouxii lineage.

Species with the longer intron have a slightly different Ire recognition motif at the 3′ boundary of the intron: CNG' AAGC instead of CNG' CNGN (Fig. 3). The hairpin flanking the 3′ splice site is also longer than in other species: 10 paired bases instead of the average 5–7. C. glabrata has the longest predicted intron in HAC1 at 379 nucleotides, which is much longer than the average 205 nucleotides for all other species with the long intron. The intron of C. glabrata also has mutations in the Ire consensus sequence at both splice sites—a C to A mutation in the 4th position of the 5′ site and an A to U mutation in the 5th position of the 3′ site—but maintains the overall RNA structure. Excision of the predicted intron in C. glabrata changes the last 15 amino acids of the predicted open reading frame and extends it by only nine amino acids. These unusual features suggest that the C. glabrata intron may not be present, or that splicing may not be functionally relevant.

Figure 3. Sequence logos for the 5′ and 3′ splice sites of the Hac1/Xbp1 non-canonical intron. The overall height of a stack represents the information content of the sequence at a particular position, whereas the relative height of letters in a stack represents the relative frequency of each nucleotide at that position. The splice sites can be identified in only three members of the Candida clade, where they are perfectly conserved. Image created by WebLogo 3.0 (weblogo.threeplusone.com/create.cgi).
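The stack and letter heights described in the caption can be computed per alignment column. The following is a minimal sketch, assuming a four-letter RNA alphabet and no small-sample correction, so the values may differ slightly from those drawn by the logo software. The function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet="ACGU"):
    """Information content in bits: log2(alphabet size) minus Shannon entropy.

    Gaps and any symbols outside the alphabet are ignored.
    """
    counts = Counter(base for base in column.upper() if base in alphabet)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(len(alphabet)) - entropy


def letter_heights(column, alphabet="ACGU"):
    """Height of each letter: its relative frequency times the stack height."""
    counts = Counter(base for base in column.upper() if base in alphabet)
    total = sum(counts.values())
    stack = column_information(column, alphabet)
    return {base: (n / total) * stack for base, n in counts.items()}
```

A perfectly conserved column such as `"CCCC"` gives the maximum of 2 bits, while a column with all four bases at equal frequency gives 0 bits.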

The shorter intron is found in all species besides the yeast species listed above, suggesting that the ancestral intron was short. The short intron is between 17 (Tribolium castaneum) and 29 nt long (Drosophila willistoni and Y. lipolytica), which makes it shorter than the shortest known spliceosomal intron.15 Excision of the intron results in a reading frame shift of +1 in C. albicans, C. tropicalis, C. dubliniensis and Stagonospora nodorum, and −1 in all other species. Additionally, Candida introns have a distinctive consensus sequence in the 5′ site (Fig. 3).
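Whether excision shifts the reading frame depends only on the intron length modulo 3. The sketch below makes that arithmetic explicit using the two extreme short-intron lengths quoted above. It deliberately does not label a result as +1 or −1, because that mapping depends on a sign convention the text does not define. The function names are hypothetical.

```python
def frame_residue(intron_length):
    """Return intron_length modulo 3 (0, 1 or 2).

    A residue of 0 leaves the downstream reading frame unchanged;
    1 or 2 shifts it.
    """
    if intron_length <= 0:
        raise ValueError("intron length must be positive")
    return intron_length % 3


def shifts_frame(intron_length):
    """True if removing the intron changes the downstream reading frame."""
    return frame_residue(intron_length) != 0


if __name__ == "__main__":
    # 17 nt and 29 nt are the shortest and longest short introns in the text.
    for length in (17, 29):
        print(length, frame_residue(length), shifts_frame(length))
```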

Discussion

The results presented here represent the first comprehensive comparative characterisation of RNA structures in the Hac1/Xbp1 intron. Published and experimentally confirmed intron sequences were used to train a covariance model to perform homolog searches. The model predicts intron boundaries in other species with high confidence. It is worth noting that it was possible to build an alignment of mRNA structure and short sequence motifs recognized by Ire whereas the alignment of Hac1 and Xbp1 coding sequences is problematic due to very low sequence conservation. We conclude that unconventional splicing of the bZIP-containing protein is evolutionarily old, present in the last common ancestor of Metazoa and Fungi and under strong selection in the majority of species.

The consensus structure of the Hac1/Xbp1 mRNA is similar to structures previously predicted by several authors,8,9,16 but juxtaposition of different taxa reveals specific mutations and differences in the length of some helices. Comparison of 150 different eukaryotic species clearly shows that the length of the HAC1 intron in S. cerevisiae is exceptional and shared by only 13 closely related Saccharomycotina species. The S. cerevisiae intron has been shown to pair with the 5′ UTR of Hac1 to stall ribosomes such that the level of translation of the inactive unspliced form of Hac1 is very low.17 Our results suggest that the 5′ UTR blocking mechanism described for S. cerevisiae HAC1 is conserved only in the long introns of Saccharomycotina.

The search for HAC/XBP homologs revealed that orthologous intronic structures could not be found in a number of species. In most cases we were unable to identify a HAC/XBP homolog, suggesting that the gene itself is lost. However, two Ciona genomes have unambiguous XBP1 homologs with intact open reading frames but no traces of the intron. This suggests that a switch has occurred in the mechanism of UPR regulation in Ciona that no longer relies on a splicing event. Among seven closely related species in the C. albicans clade, two maintain the mRNA structure perfectly, at least five have parts of the helices conserved, and the rest seem to have lost the non-canonical intron, together with all traces of surrounding RNA structure. Again, the conserved open reading frame is intact in all seven species. We hypothesize that Candida species lacking the intron structure acquired an alternative mechanism to regulate the unfolded protein response.

Materials and Methods

Hac1/Xbp1 intronic structures for mouse, human and C. elegans were extracted from Calfon et al., S. cerevisiae from the Saccharomyces Genome Database [SGD project www.yeastgenome.org (28.06.2010)], C. albicans (EMBL-Bank accession number: EF655649) from Wimalasena et al., T. reesei (EMBL-Bank: AJ413272) and A. nidulans (EMBL-Bank: AJ413273) from Saloheimo et al., Y. lipolytica from Oh et al. and D. melanogaster from FlyBase.18 A multiple sequence alignment (termed the seed alignment) of sequences from these species was created manually using the Emacs editor in RALEE mode.19 An initial consensus secondary structure was manually annotated based on the predicted secondary structure of Xbp1 from Calfon et al. and the predicted folding of individual sequences by RNAfold.20

Genome sequences used in this work are listed in Supplemental Table 1. Genomic sequences of vertebrate Xbp1 homologs were downloaded as a multiz alignment from the UCSC Genome Browser.21,22 BLAST and Infernal 1.0.2 (infernal.janelia.org)23 were used to search whole genomes and Xbp1 homolog sequences to find homologous structures, in a method based on the Rfam approach.24 Each sequence in the seed alignment was used as a query for a BLAST search (BLASTN 2.0MP-WashU with parameters W=7 -kap) of the target database. Hits with an e-value lower than 10 were collected, and 300 nt of flanking sequence added. The covariance model built from the seed alignment (using Infernal's cmbuild) was used to search this new database (using Infernal's cmsearch). High-scoring Infernal hits were aligned with the seed alignment (using Infernal's cmalign) and the alignment was manually refined for sequence and structure similarity using RALEE.19 An iterative process involving Infernal searches, alignment and manual refinement resulted in the alignment of all detected homologs.

To construct the consensus sequence of the alignment, the most abundant base in each column was taken, excluding all columns with more than 60% gaps. Consensus RNA structures were visualized in Varna 3.7.25
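The consensus rule described in the Methods (most abundant base per column, skipping columns with more than 60% gaps) can be sketched as follows. This is an illustration rather than the authors' code: the function name `consensus`, the gap characters `-` and `.`, and the tie-breaking behaviour (the base seen first in the column wins) are assumptions.

```python
from collections import Counter

GAP_CHARS = "-."  # assumed gap symbols


def consensus(alignment, max_gap_fraction=0.6):
    """Build a consensus string from equal-length aligned sequences.

    Columns whose gap fraction exceeds `max_gap_fraction` are skipped;
    otherwise the most abundant non-gap base is taken.
    """
    if not alignment:
        return ""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    bases = []
    for column in zip(*alignment):
        gaps = sum(ch in GAP_CHARS for ch in column)
        if gaps / len(column) > max_gap_fraction:
            continue  # column is mostly gaps, so it is excluded
        counts = Counter(ch.upper() for ch in column if ch not in GAP_CHARS)
        bases.append(counts.most_common(1)[0][0])
    return "".join(bases)


if __name__ == "__main__":
    # Toy alignment, not real Hac1/Xbp1 data.
    toy = ["ACG-U", "AC--U", "AG--A", "A---U", "U---U"]
    print(consensus(toy))
```

A column with exactly 60% gaps is retained here, because the text excludes only columns with more than 60% gaps.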

Supplementary Material

rna0804_0552SD1.pdf (330.8KB, pdf)
rna0804_0552SD2.xls (96KB, xls)

References

1. Hotamisligil GS. Endoplasmic reticulum stress and the inflammatory basis of metabolic disease. Cell. 2010;140:900–917. doi: 10.1016/j.cell.2010.02.034.
2. Uemura A, Oku M, Mori K, Yoshida H. Unconventional splicing of XBP1 mRNA occurs in the cytoplasm during the mammalian unfolded protein response. J Cell Sci. 2009;122:2877–2886. doi: 10.1242/jcs.040584.
3. Sidrauski C, Walter P. The transmembrane kinase Ire1p is a site-specific endonuclease that initiates mRNA splicing in the unfolded protein response. Cell. 1997;90:1031–1039. doi: 10.1016/s0092-8674(00)80369-4.
4. Kawahara T, Yanagi H, Yura T, Mori K. Endoplasmic reticulum stress-induced mRNA splicing permits synthesis of transcription factor Hac1p/Ern4p that activates the unfolded protein response. Mol Biol Cell. 1997;8:1845–1862. doi: 10.1091/mbc.8.10.1845.
5. Cox JS, Walter P. A novel mechanism for regulating activity of a transcription factor that controls the unfolded protein response. Cell. 1996;87:391–404. doi: 10.1016/s0092-8674(00)81360-4.
6. Yoshida H, Matsui T, Yamamoto A, Okada T, Mori K. XBP1 mRNA is induced by ATF6 and spliced by IRE1 in response to ER stress to produce a highly active transcription factor. Cell. 2001;107:881–891. doi: 10.1016/s0092-8674(01)00611-0.
7. Tirasophon W, Welihinda AA, Kaufman RJ. A stress response pathway from the endoplasmic reticulum to the nucleus requires a novel bifunctional protein kinase/endoribonuclease (Ire1p) in mammalian cells. Genes Dev. 1998;12:1812–1824. doi: 10.1101/gad.12.12.1812.
8. Calfon M, Zeng H, Urano F, Till JH, Hubbard SR, Harding HP, et al. IRE1 couples endoplasmic reticulum load to secretory capacity by processing the XBP-1 mRNA. Nature. 2002;415:92–96. doi: 10.1038/415092a.
9. Saloheimo M, Valkonen M, Penttila M. Activation mechanisms of the HAC1-mediated unfolded protein response in filamentous fungi. Mol Microbiol. 2003;47:1149–1161. doi: 10.1046/j.1365-2958.2003.03363.x.
10. Wimalasena TT, Enjalbert B, Guillemette T, Plumridge A, Budge S, Yin Z, et al. Impact of the unfolded protein response upon genome-wide expression patterns and the role of Hac1 in the polarized growth, of Candida albicans. Fungal Genet Biol. 2008;45:1235–1247. doi: 10.1016/j.fgb.2008.06.001.
11. Oh MH, Cheon SA, Kang HA, Kim JY. Functional characterization of the unconventional splicing of Yarrowia lipolytica HAC1 mRNA induced by unfolded protein response. Yeast. 2010;27:443–452. doi: 10.1002/yea.1762.
12. Guerfal M, Ryckaert S, Jacobs PP, Ameloot P, Van Craenenbroeck K, Derycke R, et al. The HAC1 gene from Pichia pastoris: characterization and effect of its overexpression on the production of secreted, surface displayed and membrane proteins. Microb Cell Fact. 2010;9:49. doi: 10.1186/1475-2859-9-49.
13. Shen X, Ellis RE, Lee K, Liu CY, Yang K, Solomon A, et al. Complementary signaling pathways regulate the unfolded protein response and are required for C. elegans development. Cell. 2001;107:893–903. doi: 10.1016/s0092-8674(01)00612-2.
14. Ryoo HD, Domingos PM, Kang MJ, Steller H. Unfolded protein response in a Drosophila model for retinal degeneration. EMBO J. 2007;26:242–252. doi: 10.1038/sj.emboj.7601477.
15. Gilson PR, Su V, Slamovits CH, Reith ME, Keeling PJ, McFadden GI. Complete nucleotide sequence of the chlorarachniophyte nucleomorph: nature's smallest nucleus. Proc Natl Acad Sci USA. 2006;103:9566–9571. doi: 10.1073/pnas.0600707103.
16. Gonzalez TN, Sidrauski C, Dorfler S, Walter P. Mechanism of non-spliceosomal mRNA splicing in the unfolded protein response pathway. EMBO J. 1999;18:3119–3132. doi: 10.1093/emboj/18.11.3119.
17. Ruegsegger U, Leber JH, Walter P. Block of HAC1 mRNA translation by long-range base pairing is released by cytoplasmic splicing upon induction of the unfolded protein response. Cell. 2001;107:103–114. doi: 10.1016/s0092-8674(01)00505-0.
18. Tweedie S, Ashburner M, Falls K, Leyland P, McQuilton P, Marygold S, et al. FlyBase: enhancing Drosophila Gene Ontology annotations. Nucleic Acids Res. 2009;37:555–559. doi: 10.1093/nar/gkn788.
19. Griffiths-Jones S. RALEE--RNA ALignment editor in Emacs. Bioinformatics. 2005;21:257–259. doi: 10.1093/bioinformatics/bth489.
20. Mathews DH, Sabina J, Zuker M, Turner DH. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol. 1999;288:911–940. doi: 10.1006/jmbi.1999.2700.
21. Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004;14:708–715. doi: 10.1101/gr.1933104.
22. Rhead B, Karolchik D, Kuhn RM, Hinrichs AS, Zweig AS, Fujita PA, et al. The UCSC Genome Browser database: update 2010. Nucleic Acids Res. 2010;38:613–619. doi: 10.1093/nar/gkp939.
23. Nawrocki EP, Kolbe DL, Eddy SR. Infernal 1.0: inference of RNA alignments. Bioinformatics. 2009;25:1335–1337. doi: 10.1093/bioinformatics/btp157.
24. Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S, et al. Rfam: updates to the RNA families database. Nucleic Acids Res. 2009;37:136–140. doi: 10.1093/nar/gkn766.
25. Darty K, Denise A, Ponty Y. VARNA: Interactive drawing and editing of the RNA secondary structure. Bioinformatics. 2009;25:1974–1975. doi: 10.1093/bioinformatics/btp250.
26. Fitzpatrick DA, Logue ME, Stajich JE, Butler G. A fungal phylogeny based on 42 complete genomes derived from supertree and combined gene analysis. BMC Evol Biol. 2006;6:99. doi: 10.1186/1471-2148-6-99.
27. Gerlach D, Wolf M, Dandekar T, Muller T, Pokorny A, Rahmann S. Deep metazoan phylogeny. In Silico Biol. 2007;7:151–154.

Articles from RNA Biology are provided here courtesy of Taylor & Francis