ABSTRACT
The La-related proteins (LaRPs) are an ancient superfamily of RNA-binding proteins orchestrating the major fates of RNA, from processing and maturation to regulation of mRNA translation. LaRPs are instrumental in modulating complex assemblies where the RNA is bound, folded, processed, escorted and presented to the functional effectors, often through recruitment of protein partners. This intricate web of protein–RNA and protein–protein interactions is enabled by the modular nature of the LaRPs, comprising several structured domains connected by flexible linkers, and other sequences lacking recognizable folded motifs. Recent structures, together with biochemical and biophysical studies, have provided insights into how each LaRP family has evolved unique mechanisms of RNA recognition, not only through the conserved RNA-binding unit, the La-module, but also mediated by other family-specific motifs. Furthermore, in a series of unexpected twists and turns, they have revealed that multiple structured domains and disordered regions operate in unison, through dynamic and conformational interplay, to achieve RNA substrate discrimination. This review offers a perspective on our current knowledge of the structure–function relationship of the LaRP superfamily.
KEYWORDS: La-related proteins, LaRP, RNA-binding proteins, RNA biology, La-module, RRM, structure-function relationship
Introduction
La and La-related proteins (LaRPs) comprise an ancient superfamily of proteins that are conserved in nearly all eukaryotes, except Plasmodium [1]. These proteins are broadly involved in critical processes of RNA use and metabolism in the nucleus and the cytoplasm [2]. The LaRP superfamily is distinguished by a conserved bipartite RNA-binding unit called the La-module [1], composed of a La-motif (LaM) followed by an RNA-recognition motif (RRM1). Beyond this, each LaRP family is characterized by distinct family-specific domains and motifs that contribute to structure and function [1,2] (Fig. 1).
To carry out their multifaceted and distinct cellular roles, the LaRPs have adapted to bind a wide variety of RNA ligands and protein partners. The best characterized interactions are for the La protein and LaRP7. La has an intricate functional profile, both in the nucleus and the cytoplasm. In the nucleus, it specifically associates with the short 3ʹ polyuridylate tail shared by all newly synthesized RNA polymerase III (pol III) transcripts, including precursors of small nuclear RNAs (snRNAs, e.g. SRP, 7SK), small nucleolar RNAs (snoRNAs) and tRNAs. La acts in the biogenesis of these pol III–transcribed RNAs, playing an essential role in their processing and maturation [2–4]. For tRNAs, La binding to the 3ʹUUU trailer allows 3ʹ-end maturation to occur by endonucleolytic cleavage, eventually releasing the tRNA body for subsequent CCAse action [5,6]. La also exerts a chaperone-like activity by stabilizing the pre-tRNAs in conformations competent for correct processing, an activity that implicates a second binding site on La, distinct from the 3ʹUUUOH interacting surfaces [4,5,7]. Although the majority of the protein is nucleoplasmic, in higher eukaryotes La is also found in the cytoplasm, where it acts as an IRES-transacting factor (ITAF), assisting the translation of several viral and cellular mRNAs [[8,9], reviewed in [2]]. No 3ʹ-poly(U) trailers are present in the IRESs, and the interaction of human La (HsLa) with the Hepatitis C Virus (HCV) IRES has been shown to entail an intricate interplay of different subdomains [10]. Intracellular trafficking of HsLa between nucleus, nucleolus and cytoplasm is directed by signalling motifs harboured within the C-terminal region (i.e. nuclear localization, nucleolar localization and nuclear retention signals).
Interestingly, these motifs are either in the vicinity or overlap with a short basic motif (SBM), which has been proposed to recognize the 5ʹ-triphosphate end of pre-tRNAs, as well as to bind to and promote IRES-driven translation [2,11,12]. The C-terminal half of HsLa is also subject to phosphorylation, potentially impacting RNA binding and cellular compartmentalization [2,11,13].
LaRP7 appears to be mainly nuclear, although nucleolar localization has also been reported. Like La, LaRP7 recognizes a terminal UUUOH stretch but, in contrast to La, it associates preferentially with the 7SK snRNA, a nuclear non-coding RNA of 331 nucleotides [14], which has thus far been identified as its principal ligand [15–17]. The 7SK RNA and associated proteins form the 7SK snRNP (small nuclear ribonucleoprotein) complex, responsible for sequestering the positive transcription elongation factor b (P-TEFb) in an inactive state [18]. Whilst binding of the 7SK RNA 3ʹUUUOH end is carried out by the La-module, the selective recognition between 7SK RNA and HsLaRP7 relies heavily on the RRM2 domain present in the C-terminal half of the protein, which makes stable contacts with a conserved hairpin loop located in the 3ʹ region [19–21], named 7SK-HP4 hereafter. Intriguingly, whilst both human La and LaRP7 proteins contain an RRM2, the role of this domain in RNA binding appears to have diverged between the two families, with HsLa RRM2 seemingly behaving in a more promiscuous fashion [22].
LaRP1, LaRP4, and LaRP6 proteins are predominantly localized in the cytoplasm during steady state, where they associate with mRNAs and participate in translation regulation. While the RNA-binding activities of the LaRP1 La-module remain to be elucidated, its C-terminal DM15 domain specifically binds the m7GpppC cap of 5ʹ terminal oligopyrimidine (5ʹ TOP) motifs [23], a defining trait of mRNAs that encode ribosomal proteins and other translation factors [24,25]. Thus, DM15 could substitute for the cap-binding initiation factor eIF4E, a cornerstone in the main pathway for mRNA recruitment to the ribosome [23]. Phosphorylation of LaRP1 by mTORC1 also appears to facilitate mTORC1-dependent induction of translation initiation [26], but whether this has any effect on the RNA-binding capability of LaRP1 remains to be established.
The single LaRP4 protein of invertebrates evolved into two closely related vertebrate proteins: LaRP4A (a.k.a. LaRP4) and LaRP4B (a.k.a. LaRP5) [1,2]. Both modulate mRNA stability and enhance translation, through interactions with the 3ʹ UTR of mRNAs, the poly(A)-binding protein (PABP), RACK1 (receptor for activated protein kinase C 1) and polysomes [17,27]. LaRP4A plays a role in poly(A) lengthening of mRNAs and associates with poly(A) sequences of at least 15 nucleotides (nt) [27,28]. Recent structural analysis uncovered a previously unappreciated affinity of the N-terminal region (NTR) of LaRP4A for this RNA target [29]. Interestingly, this NTR (Fig. 1) contains a variant PAM2w motif that serves as the binding platform for the PABP MLLE domain but was unexpectedly found to also play a role in poly(A) recognition, thereby sitting at the crossroads of protein–protein and protein–RNA interactions [27,29]. Much less is known about how LaRP4B recognizes RNA, although a PAR-CLIP study identified its targets as AU-rich elements in the 3ʹ UTR of a subset of mRNAs [30].
Vertebrate LaRP6 binds a stem-loop structure, comprising a large internal loop, found in the 5ʹ UTR of the collagen α1 and α2 type I and α1 type III mRNAs [31]. This association has been described as an essential step of the collagen biosynthetic pathway [32]. It is however highly likely that LaRP6 proteins perform additional biological roles, executed through recognition of other RNA and/or protein partners. In support of this, HsLaRP6 has been identified as a regulator of miR-141 and miR-145 biogenesis [33], and a recent high-throughput RNA-SELEX analysis uncovered a complex binding specificity for HsLaRP6, ranging from linear motifs and multiple dimeric motifs to internal loops embedded in a double-stranded RNA stem [34]. Protein interactors of HsLaRP6 include non-muscle myosin, RNA helicase A (RHA) and STRAP (Serine/Threonine kinase Receptor Associated Protein), the latter recruited via a short conserved motif at the C-terminus known as LSA (LaM and S1 Associated) (Fig. 1) [32,35]. The exact role of these associations in LaRP6 biology, however, is yet to be discovered. In contrast to mammals, vascular plant LaRP6 proteins are encoded by three to six genes [36], classified as 6A, 6B and 6C. The B- and C-type orthologs have acquired a PABP-interacting motif 2 (PAM2) whilst undergoing an evolutionary re-organization of their La-module [36]. Very recently, a role in mRNA regulation during pollen tube guidance has been ascribed to the LaRP6C protein from Arabidopsis thaliana (AtLaRP6C), mediated by recognition of specific boxes in the 5ʹUTR of selected genes. AtLaRP6C was found to regulate the balance between translation, degradation and storage of its mRNA targets (Bousquet-Antonelli, Conte et al., unpublished).
Molecular details for several of the above intermolecular interactions are known. Structural and biochemical work has provided critical insight into the diversity of mechanisms that the LaRPs deploy to select their respective RNA ligands. These structures show that there is extensive adaptation and elaboration of RNA contacts within the La-module, and that the La-module works in concert with family-specific domains and regions of LaRPs to regulate RNA-binding activity and dictate RNA substrate discrimination. Although a compendium of the protein and RNA interactome of LaRPs is either very patchy or missing, especially for many of the less-studied family members, a picture is emerging, placing LaRPs at the centre of major protein-RNA assemblies and regulatory RNP complexes, for which molecular details will be essential to elucidate functional profiles [2].
In this review, we summarize the field’s understanding of LaRP domain structure, dynamics, and how multiple domains and regions within these proteins synergize to bind to diverse RNAs. We will focus on interactions and mechanisms for which molecular details are known and refer elsewhere for material that will not be covered here.
The La-module: the modular RNA-binding unit conserved across LaRPs
The La-module is a modular structure in which the LaM and RRM1 are tethered by a short linker [1] (Fig. 1). It was discovered in the human La protein and defined as a single RNA-binding unit where the LaM and RRM1 domains fold independently, whilst working in full synergy to interact with the RNA [2,37]. These features were recapitulated in La proteins from different species and in most of the La-related proteins, most notably in human LaRP7, human LaRP6, fish LaRP6 (DrLaRP6A) and plant LaRP6 (AtLaRP6A and AtLaRP6C) [2]. Intriguingly, as discussed below, human LaRP4A La-module appears to deviate from this paradigm [29]. We will first discuss the structures of the individual LaM and RRM domains, and then collectively as functional RNA-binding modules.
The LaM: an elaborated winged helix domain as the family crest
The LaM is derived from the winged helix (WH) domain often found in transcription factors, where it interacts with the major groove of DNA [38]. The WH topology was first described in a complex formed by the hepatocyte nuclear-factor-3 (HNF-3) transcription factor and its target DNA sequence [39]. Three α-helices are organized around a three-stranded β-sheet with an α1-β1-α2-α3-β2-β3 topology. DNA recognition involves α3, which inserts deeply into the major groove of the DNA, with the loops contacting the adjacent minor grooves. The complex strikingly evokes a butterfly (the protein) on a rod (the DNA), and thus the functional loops were named ‘wing 1’ (W1), connecting β2 and β3, and ‘wing 2’ (W2), the C-terminal structured loop that extends from strand β3. Including the wings, a canonical WH is annotated as α1-β1-α2-α3-β2-W1-β3-W2, although there are exceptions to this common topology [40]. The first description of a WH domain performing RNA recognition was the SelB complex with SECIS RNA [41].
In LaRP proteins, the LaM is a modified WH domain where the three-stranded antiparallel sheet is surrounded by six helices, three of which (helices α1ʹ, α2 and α4) represent a bolt-on architectural feature to the canonical fold (Fig. 2A) [37,42]. One long helix (α1) and two additional helices α1ʹ and α2 precede the first strand (β1). Three helices, α3, α4, and α5, are inserted between strands β1 and β2 adopting a ‘C’-like shape, visible in Fig. 2. Notably, a hydrophobic pocket that is not present in other WH domains is here constructed by the assembly of helical elements α1ʹ, α2, α3 and α4 against the apolar face of α1, generating the main RNA interaction site (see below). The twisted β-sheet presents a solvent-exposed surface on the opposite side to α1 [37]. The wing W1, linking strands β2 and β3, is five to six residues long and superimposes well in all the available 3D-structures of LaRP LaMs. The wing W2, defined as the structured loop at the C-terminus of the domain, interacts with the base of helix α1ʹ [37,43].
The LaM is the most conserved region in the LaRP superfamily [1] and accordingly the available LaM 3D structures of HsLa protein [37], Trypanosoma brucei La [42], HsLaRP7 [20], HsLaRP6 [43] and HsLaRP4A [29] show high overall similarity, although variations in the length of helices α1 and α2 were observed (Fig. 2A-C). A tilt of α2 towards the solvent was also noted in HsLaRP6 [43]. The remarkable preservation of the LaM fold in LaRPs reflects conservation of residues that maintain the core secondary and tertiary structures, for example F25 and L36 in the hydrophobic core (numbering from HsLa will be used throughout unless otherwise stated), E22 and S75 forming an H-bond between helix α1 and the loop α5-β2, and D27 and R91 connecting the loop α1-α1ʹ to the end of β3. The N-terminal boundary for wing W2 is assigned to the conserved arginine (R90) located at the end of strand β3. The signature sequence P(V/L)P/X marks the end of wing W2 (and of the LaM itself) for HsLa [44] and HsLaRP7 [20], but not in HsLaRP6, where two downstream residues engage in contacts with helix α1ʹ [43]. This highlights the need for 3D-structural information to demarcate correctly the length and boundaries of the wing W2, the LaM and the interdomain linker between LaM and RRM1. Intriguingly, LaRP4A LaM lacks wing W2: the conserved arginine is followed by a proline and the P(L/V)P sequence is absent [29]. Thus, even at the level of the LaM building block, the family crest of LaRPs, it is clear that a conserved global aspect can mask subtle discrepancies across the LaRP superfamily.
Conservation of other amino acids in the LaM of LaRPs relates directly to function. In particular, residues Q20, Y23, Y24, D33, F35 and F55 decorate the concave surface of the LaM hydrophobic pocket, together with N29 (Fig. 2D-G). In human La and LaRP7, these six residues were shown to be responsible for the specific binding to the uridine stretch at the 3ʹ-end of their associated RNAs [20,44], as detailed below. Interestingly, they cast a similar structural pattern in HsLaRP4A [29] and HsLaRP6 [43] (Fig. 2F,G), even though these LaRPs do not recognize 3ʹ UUUOH. Of note, two of the six ‘salient’ residues are not conserved in the LaRP4 family [36]: the hydrophobic residues Y24 and F55 are replaced in HsLaRP4A by the sulphur-containing residues C130 and M160, with consequent alteration of the cavity’s properties.
The RRM1: the variable building block with yet undiscovered roles
The RNA-Recognition Motif (RRM) is built on a four-stranded β-sheet scaffold flanked by two alpha helices, in a β1-α1-β2-β3-α2-β4 topology with five loops (L1-L5). The strands assemble in a β4-β1-β3-β2 antiparallel arrangement, with the helices α1 and α2 packing on the same side of the β-sheet (Fig. 2H-J). A hallmark of canonical RRMs is the presence of two short motifs decorating the central strands β3 and β1, namely RNP1 (K/R/G – Y/G/A – F/Y – V/I/L – X – F/Y) and RNP2 (I/V/L – F/Y – I/V/L – X – N – L), respectively [45]. Aromatic residues within RNP1/2 sequences were shown to contact the RNA ligand across the face of the β-sheet in many canonical domains [45]. The RRM fold is the most common and widespread RNA-binding motif and unsurprisingly it has been extensively adapted to support a variety of functions, including variations that do not conserve RNP1 and RNP2 and others that mediate protein–protein interactions [45,46].
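As an illustrative aside, the RNP1 and RNP2 consensus sequences quoted above can be written as regular expressions and used to scan a protein sequence. The sketch below is a minimal Python example, assuming the six-position patterns exactly as given in the text, with X taken as any residue. The function name find_rnp_motifs and the toy sequence are hypothetical and are not drawn from any LaRP. Since many RRMs do not conserve RNP1 and RNP2, the absence of a match does not rule out an RRM fold.

```python
import re

# Consensus patterns transcribed from the text; "X" is treated as any residue.
RNP_PATTERNS = {
    "RNP1": "[KRG][YGA][FY][VIL].[FY]",
    "RNP2": "[IVL][FY][IVL].NL",
}


def find_rnp_motifs(sequence):
    """Return {motif: [(1-based start, matched residues), ...]} for a protein sequence."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in RNP_PATTERNS.items():
        # A lookahead with a capture group reports overlapping matches too.
        scanner = re.compile("(?=(" + pattern + "))")
        hits[name] = [(m.start() + 1, m.group(1)) for m in scanner.finditer(seq)]
    return hits


# Hypothetical toy sequence containing one match for each pattern.
toy = "MAAIFVGNLSSSSKGFVAFDD"
print(find_rnp_motifs(toy))
```

A match of this kind only flags candidate positions; whether the residues sit on strands β1 and β3 of an RRM still has to be judged from structure or alignment.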
The LaRPs contain two types of RRMs: the so-called ‘RRM1’ that has co-evolved with the LaM within the La-module [1], and a downstream ‘RRM2’ that is located in the C-terminal region of the La and LaRP7 families (see below). In contrast to the highly conserved LaM, the La-module RRM1 is much more divergent across the LaRPs. Although some of the RRM1 domains conserve the RNP1 and RNP2 sequences (e.g. human LaRP7 [20]), the majority do not. Subclasses of LaRP RRM1s were first defined by phylogenetic analysis, based on the number of predicted β strands in the core β-sheet and the length of loops (L) between secondary structure elements [1]. High-resolution structures revealed that the RRM1 domains of human La, LaRP4A and LaRP6 are indeed built around a four-stranded β-sheet scaffold [29,37,43] (Fig. 2H-J). In HsLa, the RRM1 core is preceded by a 3₁₀ helix (αN) and a third helix (α3) is found at the C-terminus, angling away from β4 [37]. The NMR structure of HsLaRP4A showed a longer α2 (Fig. 2I) and a slightly elongated loop L5 connecting helix α2 with strand β4 [29]. The RRM1 of HsLaRP4A does not have the additional αN and the C-terminal α3 helices, but a flexible C-terminal extension of 13 residues. Phylogenetic analysis revealed that HsLaRP6 RRM1 is characterized by an elongated L3 relative to other LaRPs [1]. Interestingly, the length of the loops was also instrumental in clustering the LaRP6 RRM1s into two groups: the ‘RRM-L3a’ with a long L3 in plants, protists and invertebrates, and the ‘RRM-L3b’ in vertebrates, where the extended L3 co-occurs with a longer L1 [1]. The solution structure of HsLaRP6 RRM1-L3b revealed that L3, connecting β2 to β3, comprises an additional helix, named α1ʹ, which stably packs against the β-sheet, occluding access to this putative RNA-binding platform [43]. There is little conservation in L1 and L3 regions between RRM-L3a and RRM-L3b variants [43], raising the question as to whether helix α1ʹ is conserved in the RRM-L3a subtype.
The structure of HsLaRP6 RRM1 shows that while a part of L1 is ill-defined, seven amino acids just after β1 form a short helix, α0ʹ (Fig. 2J). HsLaRP6 does not contain the terminal helices αN or α3.
The RRM1 of HsLaRP7 lacks a C-terminal helix α3 but shows a short helix αN (Fig. 2H) in the crystal structure of the La-module in complex with RNA [20]. In the absence of a structure of the isolated RRM1, however, it remains unclear whether helix αN is present in the apoprotein or is solely a structural feature adopted by the interdomain linker upon RNA binding (see below). Intriguingly, the crystal structure does not show a β4 strand, suggesting that HsLaRP7 RRM1 contains a three-stranded β-sheet platform. Nonetheless, it cannot be excluded that a complete four-stranded β-sheet may be built with additional regions of the protein that are not present in the construct used for crystallization [20] (see below). Although the lack of β4 is a clear distinguishing feature of LaRP7, this region is distal to the RNA-binding hub (see below) and, on the whole, HsLaRP7 RRM1 closely resembles HsLa RRM1 (Fig. 2H).
Structural data are not yet available for the RRM1s of other LaRPs. By sequence alignment, LaRP1 RRM has a slightly shortened L3 relative to La RRM1, but any diversification in the arrangement of the secondary structure elements will be revealed upon 3D structure determination.
Taken together, the extensive sequence and structural variability shown by the RRM1 of LaRPs offers a wealth of potential alternative RNA-binding surfaces: each LaRP contains unique RRM1 elaborations that likely contribute to RNA-binding specificity (see below).
The La-module at work: innovation and versatility in RNA recognition
Detailed knowledge of how a La-module binds RNA is available only for complexes with a 3ʹ-terminal sequence of RNA. The structures of the La-module of the human La protein in complex with short poly-uridine [poly(U)] stretches [44,47] demonstrated that the LaM and the RRM1 invented their own way to bind RNAs. They no longer follow the schemes observed in WH domains binding to DNA or RNA, which involve the wings and the recognition helix (corresponding to helix α5 of LaM) [40] or the canonical surface of the β-sheet in RRM1 [45]. Binding of the 3ʹ-terminal uridines by HsLa and HsLaRP7 [20,44] relies simultaneously on both LaM and RRM1, with their correct domain positioning aided by the interdomain linker (discussed below). Despite the missing α3 and β4 elements in HsLaRP7 RRM1, the La-modules of HsLa and HsLaRP7 establish very similar contacts with the UUUOH RNA ligands in the crystal structures, especially with the last (U−1) and penultimate (U−2) uridines (negative reverse numbering from the 3ʹ-end) (Fig. 3). The RNA is held in a cleft of the LaM edged by helices α1, α1ʹ and α2 and lined with the conserved Q20, Y23, Y24, D33, F35 and F55 residues (Fig. 3C). The base U−2 is splayed out towards the RRM1 whilst U−3 forms a stacking ladder on top of U−1, itself stacked on F35. Residues located on α1 (Q20, Y23, Y24 in human La), α1-α2 loop (D33), α2 (F35) and α3 (F55) maintain this arrangement: aromatic side chains establish π-interactions with the RNA bases whilst other amino acids anchor the RNA in place with H-bonds and electrostatic contacts, contributing to the specific RNA recognition [20,44,48]. A particular role is devoted to D33 which binds simultaneously the 2ʹ and 3ʹ hydroxyls of the terminal uridine (Fig. 3C), an interaction further stabilized by N29 and by a dipolar interaction induced by the N-terminus of helix α2. 
This specific recognition of the terminal ribose ring ensures substrate discrimination of free 3ʹ OH-containing targets, and provides a mechanistic explanation for the inefficient binding of ligands bearing a 3ʹ-phosphate group [2].
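The negative reverse numbering used above (U−1 for the 3ʹ-terminal nucleotide, U−2 for the penultimate one, and so on) can be made explicit with a short Python sketch. The helper name label_from_3prime and the example sequence are hypothetical. The contact notes simply restate HsLa interactions described in this section and are not an exhaustive list.

```python
def label_from_3prime(rna, count=3):
    """Label the last `count` nucleotides as base-1, base-2, ... counting back from the 3' end."""
    rna = rna.upper()
    count = min(count, len(rna))
    return [f"{rna[-i]}-{i}" for i in range(1, count + 1)]


# Contacts restated from the text for HsLa (not exhaustive).
HSLA_CONTACTS = {
    "U-1": "stacked on F35; 2' and 3' hydroxyls bound by D33",
    "U-2": "O2 to Q20 side-chain amide; O4 to I140 backbone amide",
    "U-3": "stacks on top of U-1",
}

# Hypothetical RNA ending in a UUU-OH trailer.
for label in label_from_3prime("GGCAUUU"):
    print(label, "->", HSLA_CONTACTS.get(label, "no contact listed"))
```

Counting from the 3ʹ end keeps the labels stable regardless of RNA length, which is convenient when comparing short oligo(U) ligands with full-length transcripts.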
Whilst the majority of the La-module-RNA contacts of human La and LaRP7 occur in the LaM, they are not sufficient for stable binding. Indeed, association of the isolated LaMs to RNA is too weak to be measurable by available biochemical and biophysical methodologies. The RRM1 domain makes an essential contribution to RNA recognition by engaging main-chain atoms at the edge of the β-sheet, in particular strand β2, to pin down the penultimate uridine U−2 in place, in the deepest recess of the RNA-binding pocket (Fig. 3). Specific recognition in HsLa is ensured by hydrogen bonds between the O2 and O4 atoms of the U−2 with the side-chain amide of Q20 (LaM) and backbone amide of I140 (RRM1), respectively [44] (Fig. 3B,C). Remarkably, U−2 is held strictly in the same position in HsLaRP7 by homologous amino acids, namely Q41 and I154, respectively, despite notable differences in the I154 immediate environment (specifically in α1 and β2 in the RRM1) where residues diverge between LaRP7 and La families [20]. Recognition around U−2 serves to pull the LaM and the RRM1 of HsLa and HsLaRP7 around the RNA, generating a V-shape structure and creating a binding crevice that fully protects the 3ʹ end of the RNA accommodated within.
Binding of La-modules to larger RNAs is still shrouded in mystery. The interaction of the La-module of HsLa and HsLaRP7 with short 3ʹ terminal oligo(U) sequences leaves putative RNA-binding surfaces outside the 3ʹUUUOH binding cleft – namely, the recognition helix (α5) and wings in LaM and the β-sheet surface in RRM1 – available to participate in binding larger RNAs. In support of this, mutations of HsLa RRM1 residues that are not involved in 3ʹUUUOH interaction negatively impact association with the larger pre-tRNAs [5,7]. Likewise, a mutation of E130 in HsLaRP7 indicated that this residue, conserved in the LaRP7 family and located on strand β1 of RRM1 near the RNP-2 motif, may participate in contacting the intact, 331 nt-long 7SK RNA [20]. Furthermore, it has been suggested that the HsLaRP7 La-module could bind 7SK RNA beyond the short 3ʹ-terminal uridine stretch, towards the foot of the 7SK 3ʹ-hairpin [Fig. S5 in [20]].
The La-modules of human La and LaRP7 display predominantly positive charges and similar electrostatic surface potential in and around the cavity where the terminal uridines are held, including the region in the LaM towards the N-terminal end of helix α1 and the interdomain linker portion below the binding crevice (Fig. 4A, I, respectively). Nonetheless, beyond this, distinct characteristics of these otherwise closely related La-modules emerge. A particularly striking difference concerns the RRM1s: in HsLa, the surface adjacent to the RNA-binding site exhibits a strong positive character overall, except for a region towards helix α1 and loop L5 (α2-β4) (Fig. 4A, E), compared to the considerably more negative electrostatic surface potential of HsLaRP7 (Fig. 4I). Furthermore, in HsLaRP7 this electronegative area is topped by a positive patch, corresponding to residues in helix α1 and strand β2 which are particularly well conserved in the LaRP7 family [20]. The other remarkable difference concerns the LaM on the solvent-exposed side of helix α1 (Fig. 4B, F, J) and the ‘back’ surface (Fig. 4C, G, K), with HsLaRP7 exhibiting a significantly more electropositive character than HsLa. From this analysis, one may speculate that beyond the 3ʹUUUOH binding cavity, larger RNA molecules may be differently accommodated by the La-modules of La and LaRP7. Generally, positively charged regions are mostly involved in binding the ribose-phosphate chain, which may play a role in guiding the RNA towards its binding site rather than specific recognition. On the other hand, electronegative surfaces have been shown to bind RNA [49], for example, negatively charged residues can establish H-bond interactions with RNA bases and backbone (e.g. D33 in HsLa).
A structure of human LaRP6 La-module in complex with RNA ligands, e.g. the stem-loop from collagen mRNA, has yet to be determined. Nonetheless, structural information on the isolated LaM and RRM1 allied with mutagenic screening showed that analogously to La and LaRP7, the conserved hydrophobic pocket of the LaM appears to be involved in RNA interaction, although the recognized stem-loop substrate is a large structure and not a 3nt UUUOH sequence [43]. An exciting riddle is thus why the key six residues involved in binding the 3ʹUUU-end RNA in La or LaRP7 are nonetheless conserved in LaRP6 and shown to be important for binding its significantly different target RNA [2,43]. Particularly enigmatic is the case of D33, the residue directly responsible for terminal ribose recognition in La and LaRP7, a feature absent from the internal collagen stem-loop RNA bound by LaRP6. The exact role of the conserved residues D33 (extensively discussed in [2]) or Q20 in binding internal RNA sequences versus 3ʹ termini will be revealed upon new structural information of LaRP6-RNA and other LaRP complexes. Close examination of the electrostatic surface potential reveals that HsLaRP6 LaM has a distinctive and highly polar character, with strongly positive and negative patches arranged side by side (Fig. 5A, E). The RNA-binding surface of the LaM as discerned by mutational analysis [43] coincides with this highly polarized region, largely but not exclusively with the electropositive surface (Fig. 5C, G).
Consonant with La and LaRP7, the interaction of LaRP6 with RNA involves the synergic action of the LaM, interdomain linker and RRM1 [43]. Extensive mutagenesis was performed on HsLaRP6 RRM1, targeting the central β-sheet (L187, Y189, R231, I260, E262), and the LaRP6-specific L1 (K196, W198 on α0ʹ) and L3 regions (R237 and R244, R245, R249 on α1ʹ helix) [43], summarized in Fig. 5D, H. The entire loop L1 was also swapped with the equivalent in HsLa [43] (not shown). Somewhat surprisingly, none of these mutations significantly altered RNA association, albeit replacing the RRM1 of HsLaRP6 with that of HsLa completely abolished binding [43]. Thus, LaRP6 RRM1 is probably engaging with the RNA in a different, family-specific fashion, distinct from both the ‘canonical’ RRMs and the La RRM1 elaboration. Notably, HsLaRP6 RRM1 displays an appreciably different surface topology and electrostatic potential than HsLa RRM1 (Figs. 5B & 4A respectively), manifest in the electronegative region encompassing helix α2 and part of loop L3 and in the electropositive patch corresponding to α1ʹ, the novel L3 α-helix typifying LaRP6 RRM1. Superposition of HsLaRP6 LaM and RRM1 to the respective domains of HsLa in complex with RNA places the La-bound poly(U) RNA in the vicinity of positive surface areas on both HsLaRP6 LaM and RRM1 (Fig. 5A dotted circle and 5B arrow). However, as mutagenesis of several residues around this region of the RRM1 did not impair RNA-binding activity (see above), it casts doubts as to whether this superposition may resemble the RNA-binding surface of HsLaRP6 La-module at all [43].
The case of the HsLaRP4A La-module came as a further surprise, as it was found not to be the main locus of interaction for poly(A) RNA [29]. Mutagenesis and NMR chemical shift perturbation data indicate that the RRM1 participates in recognition of a 15-nt oligo(A) together with N-terminal sequences, whereas the LaM appears peripheral to the process [29]. The weak binding activity of HsLaRP4A La-module for poly(A) RNA is almost entirely recapitulated by the isolated RRM1, consistent with the RRM1 functioning independently of the LaM [29]. This is, therefore, the first and to date the only example of a La-module not engaging as a single entity in RNA recognition, failing to exploit the cooperative power of the LaM and RRM1 to bind RNA. Interestingly, NMR perturbation analysis maps potential RNA-binding surfaces to the helix α2 and to the central β-sheet of HsLaRP4A RRM1, the latter being a ‘canonical’ RNA-binding site for an RRM, despite lacking the hallmark RNP-1 and RNP-2 features [29]. The LaM and RRM1 of HsLaRP4A La-module were shown by NMR investigations not to adopt a fixed relative orientation [29,50]. Nonetheless, independently of tandem domain configuration, HsLaRP4A La-module shows a strongly positive character at the interface between the LaM and RRM1, assigned to positively charged residues from the strand β3 of the LaM and from the linker (K197, R198) (Fig. 6). Strikingly, a superposition of the La-modules of HsLa and HsLaRP4A over the LaM shows that the electropositive 3ʹUUUOH RNA-binding pocket of HsLa overlays with a strongly electronegative patch on HsLaRP4A LaM (Fig. 6A, indicated by an arrowhead). Two LaRP4A-conserved glutamates (E161 and E162, human LaRP4A numbering) contribute to this unique feature, and interestingly, in other LaRP families, both of these positions are distinctly electropositive (N/K/H and K/R, respectively). Even LaRP4B has an aspartate in lieu of E161 and a non-homologous histidine in place of E162.
One may speculate that this difference in electrostatic potential of the LaM hydrophobic pocket may at least in part account for the unusual RNA-shy behaviour of HsLaRP4A LaM [29]. Moreover, HsLaRP4A LaM contains positively charged surfaces on both the ‘front’ and ‘back’ faces of the domain (Fig. 6A, B), although any involvement of these regions in RNA recognition for other targets beyond poly(A) (see below) remains unknown.
In conclusion, structural analyses clearly indicate that other LaRPs may not follow the same RNA-binding script as La and LaRP7, and suggest that La-modules can further adapt to bind more complex RNA structures.
The dynamic interdomain linker and LaM–RRM1 relationship in RNA substrate selection
In the LaRPs, as described above, the LaM and the RRM1 co-evolved to give rise to family- and member-specific La-modules able to select their RNA substrates, fine-tuned to their diverse functions. The correct LaM/RRM1 combination dictates RNA target discrimination: replacing the RRM1 of HsLaRP6 with the RRM1 from HsLa or AtLaRP6C produces RNA-binding incompetent chimeric proteins [[43] and Conte, unpublished], and swapping the RRM1 of HsLaRP7 with that of HsLa led to a loss of specificity [15].
Less attention has been given to the differences and the evolutionary pathway of the interdomain linkers connecting the LaM and RRM1. A limiting factor for such analyses resides in the uncertainty of the exact boundaries of these linkers, especially where structural data are absent (see above). Despite this, large variations in amino acid composition and length for these linkers across the LaRP superfamily have been unambiguously noted [2,43].
The interdomain linker starts immediately after the end of the LaM, which for HsLa and HsLaRP7 corresponds to the PLP/PLG signature of the W2. These have in fact been identified as the last residues that structurally belong to the LaM, as they form stable contacts with the rest of the domain. In HsLa, therefore, the linker begins at residue E99 and comprises two amino acids (99EV100) followed by the sequence 101TDEYKNDVKNR111. Upon RNA recognition, this latter stretch folds into a three-turn α-helix, incorporating the 3₁₀ αN helix of the RRM1 present in the apo form (spanning residues 108–111), which somewhat complicates the exact definition of the component boundaries. An analogous situation is plausibly recapitulated in HsLaRP7, where the linker of the RNA-bound species begins with the 117ERPK120 stretch after the PLG motif, followed by the α-helix 121DEDERT126. Nonetheless, the lack of structural information on LaRP7 in the absence of the RNA ligand makes it difficult to assess the extent of overlap between the linker α-helix of the holo form and the presumed RRM1 αN of the apo state. In HsLaRP6, the structured LaM extends beyond the 172PVP174 motif to P177, making the linker only three residues long (178NEN180), without an α-helix. In HsLaRP4A, only two residues separate the LaM and the RRM1 (197–198). Based on available structural and mutagenesis work, differences in interdomain linker features are expected to play critical roles in RNA recognition [2,43,51]: swapping the linker of human LaRP6 with the one from human La decreased binding affinity for collagen stem-loop RNA by approximately 10-fold [43], likely as a result of perturbations of interdomain distances or configurations, and/or relative degree of conformational sampling.
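To put the approximately 10-fold loss of affinity reported for the linker swap into energetic terms, the standard relation ΔΔG = RT ln(fold change) can be evaluated. The short sketch below is illustrative only: the temperature of 298.15 K is an assumption and is not taken from [43], and the function name is hypothetical.

```python
import math

# Molar gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def ddg_from_fold_change(fold_change, temperature_k=298.15):
    """Free-energy change (kcal/mol) equivalent to a fold change in affinity.

    The default temperature is an assumed value for illustration,
    not an experimental condition reported in the source.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    return R_KCAL * temperature_k * math.log(fold_change)


ddg_10x = ddg_from_fold_change(10.0)
print(f"10-fold change in affinity ~ {ddg_10x:.2f} kcal/mol")
```

Under these assumptions a 10-fold change corresponds to roughly 1.4 kcal/mol, on the order of a single hydrogen bond or salt bridge, which is compatible with the modest structural perturbations invoked above.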
The dynamic interplay of multiple elements (LaM, RRM1 and linker) in the La-module recognition of their cognate RNA remains to be characterized and is an area of active investigation. For human La, in solution and in the absence of RNA ligand, the LaM and RRM1 are structurally independent domains connected by a disordered linker, without exhibiting a rigid tandem 3D arrangement [44,52]. The La-UUUOH complex is instead composed of a V-shaped compact state, where the two domains and the linker reorient together to interact with one another and with the RNA. The fixed domain configuration adopted by the LaM and RRM1 in a complex with UUUOH is thereby abetted by the induced ordering of the interdomain linker upon RNA binding and by a few new LaM-RRM1 contacts, for example, an H-bond established by the side chains of Y23 and N139 and a salt-bridge between R57 and D125. In HsLaRP7 complexed with RNA, two salt-bridges between the LaM, linker and/or RRM1 residues were observed, specifically involving the family-conserved pairs R118-E122 and K53-E172 [20]. However, whether these bridges are formed upon RNA association or whether they restrict the tandem domain motion in the apo form to ease RNA binding remains to be established. Intriguingly, recent SAXS analyses (Conte, Dock-Bregeon et al. unpublished) appear to suggest a possible conformational selection mechanism whereby conformational sampling of flexible domains and linkers in HsLa, HsLaRP7 and HsLaRP6 apo La-modules populates pre-existing bound conformations. Similar mechanisms of multi-domain conformational selection have been shown to operate in other proteins, affecting RNA substrate selection [51].
Distinctively, the HsLaRP4A La-module adopts a more extended architecture than those of other LaRPs, most likely as a consequence of the lack of the W2 in the LaM and of a short interdomain linker [29,50]. Thus far, the La-module of HsLaRP4A appears to lack high RNA-binding capability in isolation: whether this is linked to its different extended tandem LaM/RRM1 configuration, its dynamic ensemble, sequence composition, electrostatic properties or a combination of all, remains to be established.
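The linker boundaries quoted in this section can be collected side by side. The sketch below tabulates the four human linkers using only the start positions and residues given in the text (for HsLa and HsLaRP7 these include the stretch that is helical in the RNA-bound form). The net charge is a deliberately crude formal count (+1 for K/R, -1 for D/E), an illustrative assumption that ignores histidine, pKa shifts and structural context; all names in the code are hypothetical.

```python
# Linker start residue and sequence, exactly as quoted in the text.
LINKERS = {
    "HsLa": (99, "EV" + "TDEYKNDVKNR"),    # 99EV100 + 101TDEYKNDVKNR111
    "HsLaRP7": (117, "ERPK" + "DEDERT"),   # 117ERPK120 + 121DEDERT126
    "HsLaRP6": (178, "NEN"),               # 178NEN180
    "HsLaRP4A": (197, "KR"),               # K197, R198
}


def net_charge(seq):
    """Crude formal charge: +1 per K/R, -1 per D/E (simplifying assumption)."""
    positive = sum(seq.count(residue) for residue in "KR")
    negative = sum(seq.count(residue) for residue in "DE")
    return positive - negative


def summarize(linkers):
    """Return start, end, length and crude net charge for each linker."""
    summary = {}
    for name, (start, seq) in linkers.items():
        summary[name] = {
            "start": start,
            "end": start + len(seq) - 1,
            "length": len(seq),
            "net_charge": net_charge(seq),
        }
    return summary


SUMMARY = summarize(LINKERS)
for name, row in SUMMARY.items():
    print(f"{name:9s} {row['start']}-{row['end']} "
          f"length={row['length']:2d} net_charge={row['net_charge']:+d}")
```

Under this simplified count, only the two-residue HsLaRP4A linker carries a net positive charge, in keeping with the electropositive LaM/RRM1 interface noted for this protein, whereas the longer linkers of HsLa and HsLaRP7 and the short HsLaRP6 linker are net negative.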
LaRPs in action beyond the La-module
The La-module of LaRPs is a versatile RNA-binding unit. In many known cases, as described above, this module is necessary and sufficient for RNA interaction, e.g. for La and LaRP7 to bind 3ʹ oligoU stretches and for LaRP6 to bind the collagen mRNA stem-loop [20,43,44]. However, in other instances the La-module was found to work together with other regions of the protein to associate with target RNAs. Moreover, beyond the La-module, other domains of LaRPs have emerged as having major roles in RNA recognition in an assortment of modes.
C-terminal sequences: the DM15 domain
The LaRP1 family contains a DM15 domain in the C-terminal region that has been shown to bind the mRNA cap and suggested to bind to TOP mRNA sequences or other sequences for which the +1 position is a C [[23] and extensively reviewed in [2,53]]. It remains unclear whether the La-module of LaRP1 and the DM15 cooperate in RNA substrate binding, although initial speculation has produced a compelling model for the regulation of protein synthesis from TOP-containing mRNAs (see Fig. 9 in [2]).
C-terminal sequences: RRM2, RRM1-RRM2 linkers and SBM
In the La and LaRP7 families, the La-module has been shown to work together with the C-terminal RRM [a.k.a. RRM2 or RRM2α [2]] and the RRM1-RRM2 interconnecting linkers. Structures are known for the RRM2s of HsLa [54], HsLaRP7 [21] and p65 from Tetrahymena (Ttp65), a protein of the LaRP7 family involved in telomerase assembly in ciliates [55] (Fig. 7). These domains exhibit atypical and distinctive elements, in that they lack conventional RNP-1 and RNP-2 motifs and feature an additional amphipathic helix α3 that packs across their β-sheet [21,54,56]. The structural analysis of Ttp65 RRM2 (defined as an ‘xRRM’ [56]) also uncovered a conserved (Y/W-X-D) sequence in LaRP7 members, termed RNP-3 motif and located on the β2 strand [56]. Although the RNP-3 is absent in La, HsLa RRM2 shares with Ttp65 a fifth β-strand (β4ʹ) that runs antiparallel to β4, bringing about a very pronounced twist in the β-sheet (Fig. 7A). HsLaRP7 RRM2 is instead a four β-strand domain with a longer and kinked α2 helix (Fig. 7B).
In HsLa, helix α3 is primarily stabilized by a hydrophobic network involving residues V310, A314 and I318 on the helix and amino acids L233, W261, I272, L274 and L306 on the β-sheet [54] (Fig. 7A). The opposite face of helix α3 contains polar residues, namely K316, K317, E320, and E324 [54]. In HsLa this helix also houses a Nuclear Export Signal (NES) as discussed in [2]. Analogously to Ttp65, the helix α3 of HsLaRP7 RRM2 is angled away from the central β platform compared to HsLa [21,56] (Fig. 7B), contacting the β-sheet through one pair of aromatic stacks engaging Y532 on the helix and H494 on the β-sheet [21]. Aromatic and hydrophobic residues are located deep in the elbow formed by the central β-sheet and helix α3 (Fig. 7B), leaving room, past the β-sheet, for the polar residues of α3 to make direct contacts with the RNA ligand [57]. Upon binding to the HP4 hairpin of 7SK RNA, α3 elongates by two residues, nesting into the major groove of the hairpin via electrostatic interactions (Figs. 7C & 6I). In particular, R496 forms two hydrogen bonds with the Hoogsteen face of G312, whilst Y483 (from the RNP-3) and K543 establish hydrogen bonds with the backbone phosphate of U313. The side chain of Y483, located on the edge of the β-sheet, also intercalates between G312 and G314, which are flipped out from the apical loop of the 7SK-HP4 hairpin to slot in two binding pockets within the RRM2 (Fig. 7F) [57].
Interestingly, the structure of Ttp65 RRM2 bound to a hairpin of the telomerase TER RNA, whilst akin to the HsLaRP7-HP4 complex in many ways, displays a strikingly different overall topology [56]. Firstly, in Ttp65, binding engages nucleotides from a lateral bulge of the RNA hairpin, rather than the apical loop. Consequently, the helical axis of the TER RNA is perpendicular to the 7SK RNA axis in the LaRP7 complex (compare Fig. 7E, H to Fig. 7F, I). Furthermore, whereas one of the two bound base moieties (G312 in 7SK, G121 in TER) is accommodated in a similar fashion within the RRM2 recognition pocket, contacting similar residues (arginine, tyrosine and lysine), the other one (G314 in 7SK and A122 in TER) lies on opposite sides. The extension of the C-terminal helix in the major groove of the bound RNA is also less extensive in HsLaRP7 (2 residues versus 13 in Ttp65).
Although HsLaRP7 RRM2 binds to 7SK-HP4 in isolation with high affinity, it could be envisaged that the La-module, as well as other regions of the protein, may cooperate with the RRM2 in binding the large intact 7SK RNA. Indeed, HsLaRP7 still associates efficiently with a 7SK RNA deprived of its 3ʹ domain (following deletion of the HP4 hairpin and the 3ʹUUU tail) [20]. The long linker between RRM1 and RRM2 could contribute to binding, as indicated by RNA footprint patterns observed only with the full-length HsLaRP7, and not with the combination of the La-module and an RRM2-containing C-terminal deletion mutant [20]. HsLaRP7 RRM1-RRM2 linker is very long (268 amino acids compared to 42 in HsLa) and contains several stretches of conserved basic residues, mostly poly-lysines, for instance the sequence 356–368 (in HsLaRP7). This is adjacent to the conserved stretch of residues 376–403 that has been suggested to form a short β-strand and an α-helix to complete the shortened RRM1, possibly upon 7SK binding (ACDB, personal communication). The region just upstream of RRM2 (residues 433–454 in HsLaRP7) was included in the construct used in Uchikawa et al. [20] as it bound RNA with higher affinity than a fragment starting at the exact N-terminal boundary of the RRM2 (i.e. beginning of strand β1, residue 454) (ACDB, personal communication).
In contrast to LaRP7, the RRM2 of HsLa has not been shown to have significant RNA-binding ability in isolation, at least for the RNA sequences tried thus far ([54] and Conte, unpublished data). Cooperation of LaM, RRM1, RRM2 and the in-between linkers led to the recognition of internal stem-loops, e.g. HCV IRES domain IV [10] and pri-miRNA [58]. This is a structure-oriented type of recognition, but not sequence specific, and therefore diametrically distinct from 3ʹ oligoU association [10]. Although a 3D structure is not yet available for this system, chemical shift perturbation analysis showed that the RRM2 helix α3 and the flexible RRM1-RRM2 linker participate in HCV IRES recognition, especially the linker residues 201–215 that are conserved across species [10]. Between amino acids 196 and 208 the linker comprises several positively charged residues, reminiscent of what is observed in LaRP7, and is predicted by some web servers to fold into a helix (not shown), although NMR data did not confirm the prediction [10]. Interestingly, yeast Lhp1p lacks an RRM2 but contains a flexible C-terminal tail downstream of RRM1, which has been suggested to contact a variety of non-coding RNA precursors and assist with their biogenesis [59].
The C-terminal region of La proteins from higher eukaryotes bears a short basic motif (SBM) that has been suggested to interact with RNA, e.g. the 5ʹpppG of nascent RNAs [12,60] and IRES regions [11], but molecular details are unknown, as extensively discussed in a previous review [2]. Furthermore, in some cases, a minimal region comprising only RRM1-linker-RRM2 of HsLa has been reported to suffice for binding a subset of IRES domains [11,61], illustrating the existence of alternative modes of RNA recognition and underscoring the adaptability of LaRPs.
Finally, as for the linkers, LaRP C-terminal regions outside of folded domains exhibit family-specific physiognomies. Very little is known to date about their conformational properties, except for HsLa where the sequences downstream of RRM2 bear the hallmarks of intrinsic disorder [10,54]. Interestingly, a prediction of secondary structures in HsLaRP7 suggests the formation of an α-helix following RRM2 (not shown). Taken together, it is conceivable that a combination of disordered and structured elements characterizes these lesser-studied regions of LaRPs.
The La-module and the N-terminal region: a newly discovered partnership
An interesting and salient difference across the LaRP superfamily concerns the location of the La-module in the proteins’ primary structure (Figs. 1 & 8). Whereas generally comprised in the N-terminal half, the La-module entity is found in and around the very N-terminus in La proteins and progressively shifts towards more central positions in the LaRP7, LaRP6, LaRP4 and LaRP1 families [1]. Intriguingly, very recent work on human LaRP4A [29] and zebrafish LaRP6A (Lewis, unpublished) indicated that the La-module can cooperate with preceding N-terminal regions (NTRs) to interact with RNA ligands.
Analysis of the RNA-binding properties of HsLaRP4A revealed that for optimal binding to single-stranded (ss) oligoA sequences a region comprising the La-module and the NTR, defined as the N-terminal domain (NTD, Figs. 1 & 8), is required. Defying expectations, this work revealed that the association of HsLaRP4A with poly(A) is dominated by the NTR, an intrinsically disordered region with no recognizable RNA-binding sequences or motifs, in a novel mechanism of RNA recognition yet to be fully characterized [29]. The NTR sequence conservation in the tetrapod lineage of LaRP4A proteins underscores an evolutionary-conserved RNA-recognition mechanism [29]. HsLaRP4A La-module contributes to ss-poly(A) binding mainly via the RRM1, surprisingly keeping the LaM away from the process as discussed above [29].
Additional clues to the potential role of the NTD in RNA recognition emerge from recent work on the LaRP6 family. Although as described above the La-module of human and zebrafish LaRP6 proteins is necessary and sufficient for high-affinity interaction with the conserved stem-loop of the 5ʹUTR collagen mRNA [31,43,62], the entire NTD of DrLaRP6A (comprising NTR and La-module) displays different affinities for ss homopolymeric sequences than the La-module alone, suggesting that the NTR may modulate RNA-binding activity in zebrafish (Lewis, unpublished). Interestingly, and in a striking similarity with HsLaRP4A, the NTR sequence of zebrafish LaRP6 also contains disordered regions ([62] and Lewis, unpublished data). NTR sequences are fairly conserved across vertebrates, with 61% identity between human and zebrafish NTR, suggesting that the NTR may contribute to mammalian LaRP6 binding activity as well.
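For readers less familiar with how a figure such as the 61% identity quoted above is obtained, the sketch below computes percent identity from a pairwise alignment. It is illustrative only: the two aligned strings are invented toy sequences, not LaRP6 NTR sequences, and the choice of denominator (columns with no gap in either sequence) is an assumption, since several conventions exist and the one behind the quoted value is not stated here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over ungapped columns of a pairwise alignment.

    Assumption: columns containing a gap in either sequence are excluded
    from both the numerator and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a.upper() == b.upper():
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment, for illustration only.
TOY_A = "MSDE-KLAQ"
TOY_B = "MSEEAKL-Q"
print(f"{percent_identity(TOY_A, TOY_B):.1f}% identity")
```

Because the value depends on the alignment and on how gaps are counted, identity figures from different sources are only directly comparable when the same convention is used.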
Notably, a very recent study identified human LaRP6 as able to bind to distinct sequences, in particular internal stem-loop structures and linear motifs through different strategies [34]. This may endorse the view that the La-module alone has evolved to select for structured loop elements, whereas the single-stranded RNA-binding activity is executed by the NTD, i.e. the La-module working in cooperation with N-terminal sequences. This draws parallels with what was observed for the La-module of HsLa working with C-terminal sequences (RRM2 and RRM1-RRM2 linker) to bind alternative RNA structures (see above).
Could this La-module/NTR partnership be a general trait of the LaRP6, LaRP4 and LaRP1 families? The La-module of HsLaRP4A is not the main locus of poly(A) recognition, but it cannot be excluded that it may be dominant in interacting with different sequences. Intriguingly, a putative large structured RNA, the pre-miR141, has been proposed as an HsLaRP4A target [29,63].
Finally, LaRP7 proteins also possess a longer N-terminal stretch upstream of the LaM compared to La (30 residues vs 4 in La), which comprises a stretch of basic residues. Its involvement in RNA recognition is untested but it may extend the electropositive region observed in the structure of the La-module of HsLaRP7 at the N-terminal part of helix α1 (Fig. 4K).
LaRPs as partners in RNP assembly
As discussed in the introduction, protein partners of LaRPs have been discovered and reported, and have been extensively reviewed in [2]. These interactions are likely to bear functionally important roles, fine-tune LaRP-RNA interactions and regulate RNP assembly. For example, LaRP1 contributes in a very intricate way to several pathways of general eukaryotic translation, through its association with RAPTOR (regulatory-associated protein of mTOR) [24] and 5ʹ-TOP mRNAs [53], and LaRP6’s integrated protein and RNA interactions have been proposed to serve roles in the translation of fibrillar collagen [32]. Regrettably, the structure-function investigation for the majority of these interactions is still in its infancy. Although information on the composition and function of several LaRP-containing RNP complexes has accumulated in the literature, the details of the protein–protein and protein–RNA interactions that participate in their functional recruitment at their site of action and/or underpin their mechanistic profiles remain elusive.
One of the best-documented cases to date concerns the assembly of the 7SK snRNP complex. LaRP7 is an essential member of this RNP: it establishes multiple contacts with the 7SK RNA as discussed above, ensuring its conformation in the ‘active’ closed form [20,64], whilst also physically associating with the methylase MePCE, the enzyme responsible for the addition of a methyl group on the γ-phosphate of the 7SK RNA 5ʹ cap [65]. In this assembly, LaRP7 stays bound to MePCE probably via its C-terminal region [64] and it cannot be excluded that it also contacts HEXIM, the factor responsible for recruiting and repressing P-TEFb [66].
An interesting recurrent theme across evolution is the interaction of the cytosolic LaRPs with PABP (reviewed in [2]). LaRP1 and PABP co-sediment with polyribosomes [67], and a putative PAM2 motif mapped within the La-module is required for robust LaRP1-PABP association [24]. A well-characterized PAM2 motif was found in the N-terminal regions of the A. thaliana paralogs AtLaRP6B and AtLaRP6C, mediating functionally relevant interactions with the MLLE domain of PABP [36]. Vertebrate LaRP4A and LaRP4B proteins have been reported to associate with PABP in a bipartite manner, through a variant PAM2w motif in the N-terminal region and a less well-defined PBM (PABP-binding motif) located downstream of the RRM2 [27,68] (Fig. 1). The variant PAM2w motif harbours a tryptophan residue in a position otherwise invariantly occupied by a phenylalanine in other PAM2 sequences [27,68,69]. Notwithstanding this substitution, HsLaRP4A and HsLaRP4B PAM2w sequences interact with the MLLE domain of PABP in a very similar fashion to canonical PAM2, e.g. Paip2 or eRF3a [27,70–72] (Fig. 9). The only notable difference is the exit path of the peptide chain following either the F (in PAM2) or W (in PAM2w), although its significance remains elusive. Intriguingly, HsLaRP4A association with PABP MLLE via its PAM2w motif interferes with its ability to bind poly(A) [29]. This is a particularly exciting mechanism of a mutually exclusive protein–protein and protein–RNA interaction by HsLaRP4A, likely to play important roles in the regulation of a LaRP4A-PABP-mRNA assembly at the 3ʹ poly(A) tail end, and in the recruitment of any other PAM2-containing translation factors to the 3ʹUTR of mRNAs [29].
Finally, protein and RNA-binding activity of LaRPs, as well as their cellular localization and functional activities, have been reported to be modulated by post-translational modifications (PTMs) [2,13,26,73,74]. Nonetheless, the structural consequences of PTMs on LaRPs and the exact mechanism by which they affect function remain largely unexplored to date.
Conclusions
The La-related proteins have evolved an innovative way to bind RNA via the La-module, a creative and adaptable RNA-binding unit. As more structures are being solved, it is becoming clear that the La-module’s intrinsic versatility confers on it the ability to select different RNA partners, playing on the modular construction and dynamic properties of its component domains and linker regions. However, the exact mechanism by which RNA substrate selection is achieved remains to be fully understood and it will be interesting to elucidate how large RNA ligands bind to the La-modules and whether they can access the apparent multiplicity of the RNA-binding regions that they contain. As the field progresses, it is also emerging that other regions of LaRPs beyond the La-module contribute significantly to RNA binding. It also appears that, in some cases, the La-modules have co-evolved with other regions of the LaRPs to work together in a cooperative fashion: the combined action of multiple domains and linkers can not only achieve the required RNA-binding affinity and specificity but may also expand the repertoire of RNA ligands that can be recognized. Furthermore, protein partners and post-translational modifications are likely to influence the RNA-binding capability and function of the LaRPs. Studies on the interactome of each LaRP will, therefore, be highly informative as we place the molecular mechanisms of each LaRP into the broader context of cellular signalling and physiological function. The discoveries and insights on LaRP structure and function described above are beginning to coalesce into a general model of LaRPs as key assemblers for large RNP complexes in both the nucleus and the cytoplasm. The assembly of dynamic RNPs can be regulated through LaRP cellular localization, protein–protein interactions, and post-translational modifications to exert control on RNA metabolism and biology at nearly every level, from RNA synthesis to its processing, use and degradation.
We look forward to new structural details that will elucidate exact mechanisms of action, as well as the composition and molecular organization of RNP complexes. These will undoubtedly reveal important features of LaRP biology and, knowing this superfamily of proteins, will probably throw up more surprises and unexpected elaborations.
Funding Statement
The authors are grateful for grant support of their work in this area: MRC (Wellcome Trust, Newton Royal Society Fellowship, ref. [NF140482], EU Horizon 2020 research and innovation programme Marie Sklodowska-Curie fellowship, agreement [655341]); ACDB (Région Bretagne SAD2-2017-RNA2init, SAD17009); KAL (National Institutes of Health – National Institute of General Medical Sciences) [R15GM119096].
Disclosure statement
No potential conflict of interest was reported by the authors.
References
- [1].Bousquet-Antonelli C, Deragon JM.. A comprehensive analysis of the La-motif protein superfamily. Rna. 2009;15:750–764. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [2].Maraia RJ, Mattijssen S, Cruz-Gallardo I, et al. The La and related RNA-binding proteins (LARPs): structures, functions, and evolving perspectives. Wiley Interdiscip Rev RNA. 2017;8:e1430. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [3].Stefano JE. Purified lupus antigen La recognizes an oligouridylate stretch common to the 3ʹ termini of RNA polymerase III transcripts. Cell. 1984;36:145–154. [DOI] [PubMed] [Google Scholar]
- [4].Wolin SL, Cedervall T. The La protein. Annu Rev Biochem. 2002;71:375–403. [DOI] [PubMed] [Google Scholar]
- [5].Bayfield MA, Maraia RJ. Precursor-product discrimination by La protein during tRNA metabolism. Nat Struct Mol Biol. 2009;16:430–437. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [6].Yoo CJ, Wolin SL. The yeast La protein is required for the 3ʹ endonucleolytic cleavage that matures tRNA precursors. Cell. 1997;89:393–402. [DOI] [PubMed] [Google Scholar]
- [7].Huang Y, Bayfield MA, Intine RV, et al. Separate RNA-binding surfaces on the multifunctional La protein mediate distinguishable activities in tRNA maturation. Nat Struct Mol Biol. 2006;13:611–618. [DOI] [PubMed] [Google Scholar]
- [8].Costa-Mattioli M, Svitkin Y, Sonenberg N. La autoantigen is necessary for optimal function of the poliovirus and hepatitis C virus internal ribosome entry site in vivo and in vitro. Mol Cell Biol. 2004;24:6861–6870. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [9].Holcik M, Korneluk RG. Functional characterization of the X-linked inhibitor of apoptosis (XIAP) internal ribosome entry site element: role of La autoantigen in XIAP translation. Mol Cell Biol. 2000;20:4648–4657. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [10].Martino L, Pennell S, Kelly G, et al. Analysis of the interaction with the hepatitis C virus mRNA reveals an alternative mode of RNA recognition by the human La protein. Nucleic Acids Res. 2012;40:1381–1394. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [11].Kuehnert J, Sommer G, Zierk AW, et al. Novel RNA chaperone domain of RNA-binding protein La is regulated by AKT phosphorylation. Nucleic Acids Res. 2015;43:581–594. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [12].Maraia RJ, Intine RV. La protein and its associated small nuclear and nucleolar precursor RNAs. Gene Expr. 2002;10:41–57. [PMC free article] [PubMed] [Google Scholar]
- [13].Intine RV, Tenenbaum SA, Sakulich AL, et al. Differential phosphorylation and subcellular localization of La RNPs associated with precursor tRNAs and translation-related mRNAs. Mol Cell. 2003;12:1301–1307. [DOI] [PubMed] [Google Scholar]
- [14].Diribarne G, Bensaude O. 7SK RNA, a non-coding RNA regulating P-TEFb, a general transcription factor. RNA Biol. 2009;6:122–128. [DOI] [PubMed] [Google Scholar]
- [15].He N, Jahchan NS, Hong E, et al. A La-related protein modulates 7SK snRNP integrity to suppress P-TEFb-dependent transcriptional elongation and tumorigenesis. Mol Cell. 2008;29:588–599. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [16].Krueger BJ, Jeronimo C, Roy BB, et al. LARP7 is a stable component of the 7SK snRNP while P-TEFb, HEXIM1 and hnRNP A1 are reversibly associated. Nucleic Acids Res. 2008;36:2219–2229. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [17].Markert A, Grimm M, Martinez J, et al. The La-related protein LARP7 is a component of the 7SK ribonucleoprotein and affects transcription of cellular and viral polymerase II genes. EMBO Rep. 2008;9:569–575. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [18].AJ CQ, Bugai A, Barboric M. Cracking the control of RNA polymerase II elongation by 7SK snRNP and P-TEFb. Nucleic Acids Res. 2016;44:7527–7539. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [19].Muniz L, Egloff S, Kiss T. RNA elements directing in vivo assembly of the 7SK/MePCE/Larp7 transcriptional regulatory snRNP. Nucleic Acids Res. 2013;41:4686–4698. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [20].Uchikawa E, Natchiar KS, Han X, et al. Structural insight into the mechanism of stabilization of the 7SK small nuclear RNA by LARP7. Nucleic Acids Res. 2015;43:3373–3388. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [21].Eichhorn CD, Chug R, Feigon J. hLARP7 C-terminal domain contains an xRRM that binds the 3ʹ hairpin of 7SK RNA. Nucleic Acids Res. 2016;44:9977–9989. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [22].Brown KA, Sharifi S, Hussain R, et al. Distinct dynamic modes enable the engagement of dissimilar ligands in a promiscuous atypical RNA recognition motif. Biochemistry. 2016;55:7141–7150. [DOI] [PubMed] [Google Scholar]
- [23].Lahr RM, Fonseca BD, Ciotti GE, et al. La-related protein 1 (LARP1) binds the mRNA cap, blocking eIF4F assembly on TOP mRNAs. eLife. 2017;6:e24146. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [24].Fonseca BD, Zakaria C, Jia JJ, et al. La-related protein 1 (LARP1) represses terminal oligopyrimidine (TOP) mRNA translation downstream of mTOR complex 1 (mTORC1). J Biol Chem. 2015;290:15996–16020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [25].Philippe L, Vasseur JJ, Debart F, et al. La-related protein 1 (LARP1) repression of TOP mRNA translation is mediated through its cap-binding domain and controlled by an adjacent regulatory region. Nucleic Acids Res. 2018;46:1457–1469. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [26].Hong S, Freeberg MA, Han T, et al. LARP1 functions as a molecular switch for mTORC1-mediated translation of an essential class of mRNAs. eLife. 2017;6:e25237. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [27].Yang R, Gaidamakov SA, Xie J, et al. La-related protein 4 binds poly(A), interacts with the poly(A)-binding protein MLLE domain via a variant PAM2w motif, and can promote mRNA stability. Mol Cell Biol. 2011;31:542–556. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [28].Mattijssen S, Arimbasseri AG, Iben JR, et al. LARP4 mRNA codon-tRNA match contributes to LARP4 activity for ribosomal protein mRNA poly(A) tail length protection. eLife. 2017;6:e28889. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [29].Cruz-Gallardo I, Martino L, Kelly G, et al. LARP4A recognizes polyA RNA via a novel binding mechanism mediated by disordered regions and involving the PAM2w motif, revealing interplay between PABP, LARP4A and mRNA. Nucleic Acids Res. 2019;47:4272–4291. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [30].Kuspert M, Murakawa Y, Schaffler K, et al. LARP4B is an AU-rich sequence associated factor that promotes mRNA accumulation and translation. Rna. 2015;21:1294–1305. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [31].Cai L, Fritz D, Stefanovic L, et al. Binding of LARP6 to the conserved 5ʹ stem-loop regulates translation of mRNAs encoding type I collagen. J Mol Biol. 2010;395:309–326. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [32].Stefanovic B. RNA protein interactions governing expression of the most abundant protein in human body, type I collagen. Wiley Interdiscip Rev RNA. 2013;4:535–545. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [33].Treiber T, Treiber N, Plessmann U, et al. A compendium of RNA-binding proteins that regulate microRNA biogenesis. Mol Cell. 2017;66:270–284 e213. [DOI] [PubMed] [Google Scholar]
- [34].Jolma A, Zhang J, Mondragón E, et al. Binding specificities of human RNA binding proteins towards structured and linear RNA sequences. bioRxiv. 2019;317909. doi: 10.1101/317909. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [35].Vukmirovic M, Manojlovic Z, Stefanovic B. Serine-threonine kinase receptor-associated protein (STRAP) regulates translation of type I collagen mRNAs. Mol Cell Biol. 2013;33:3893–3906. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [36].Merret R, Martino L, Bousquet-Antonelli C, et al. The association of a La module with the PABP-interacting motif PAM2 is a recurrent evolutionary process that led to the neofunctionalization of La-related proteins. Rna. 2013;19:36–50. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [37].Alfano C, Sanfelice D, Babon J, et al. Structural analysis of cooperative RNA binding by the La motif and central RRM domain of human La protein. Nat Struct Mol Biol. 2004;11:323–329. [DOI] [PubMed] [Google Scholar]
- [38].Teichmann M, Dumay-Odelot H, Fribourg S. Structural and functional aspects of winged-helix domains at the core of transcription initiation complexes. Transcription. 2012;3:2–7. [DOI] [PubMed] [Google Scholar]
- [39].Clark KL, Halay ED, Lai E, et al. Co-crystal structure of the HNF-3/fork head DNA-recognition motif resembles histone H5. Nature. 1993;364:412–420. [DOI] [PubMed] [Google Scholar]
- [40].Gajiwala KS, Burley SK. Winged helix proteins. Curr Opin Struct Biol. 2000;10:110–116. [DOI] [PubMed] [Google Scholar]
- [41].Yoshizawa S, Rasubala L, Ose T, et al. Structural basis for mRNA recognition by elongation factor SelB. Nat Struct Mol Biol. 2005;12:198–203. [DOI] [PubMed] [Google Scholar]
- [42].Dong G, Chakshusmathi G, Wolin SL, et al. Structure of the La motif: a winged helix domain mediates RNA binding via a conserved aromatic patch. EMBO J. 2004;23:1000–1007.
- [43].Martino L, Pennell S, Kelly G, et al. Synergic interplay of the La motif, RRM1 and the interdomain linker of LARP6 in the recognition of collagen mRNA expands the RNA binding repertoire of the La module. Nucleic Acids Res. 2015;43:645–660.
- [44].Kotik-Kogan O, Valentine ER, Sanfelice D, et al. Structural analysis reveals conformational plasticity in the recognition of RNA 3ʹ ends by the human La protein. Structure. 2008;16:852–862.
- [45].Clery A, Blatter M, Allain FH. RNA recognition motifs: boring? Not quite. Curr Opin Struct Biol. 2008;18:290–298.
- [46].Afroz T, Cienikova Z, Clery A, et al. One, two, three, four! How multiple RRMs read the genome sequence. Methods Enzymol. 2015;558:235–278.
- [47].Teplova M, Yuan YR, Phan AT, et al. Structural basis for recognition and sequestration of UUU(OH) 3ʹ temini of nascent RNA polymerase III transcripts by La, a rheumatic disease autoantigen. Mol Cell. 2006;21:75–85.
- [48].Curry S, Kotik-Kogan O, Conte MR, et al. Getting to the end of RNA: structural analysis of protein recognition of 5ʹ and 3ʹ termini. Biochim Biophys Acta. 2009;1789:653–666.
- [49].Travis B, Shaw PLR, Liu B, et al. The RRM of the kRNA-editing protein TbRGG2 uses multiple surfaces to bind and remodel RNA. Nucleic Acids Res. 2019;47:2130–2142.
- [50].Cruz-Gallardo I, Martino L, Trotta R, et al. Resonance assignment of human LARP4A La module. Biomol NMR Assign. 2019;13:169–172.
- [51].Mackereth CD, Sattler M. Dynamics in multi-domain protein recognition of RNA. Curr Opin Struct Biol. 2012;22:287–296.
- [52].Sanfelice D, Kelly G, Curry S, et al. NMR assignment of the N-terminal region of human La free and in complex with RNA. Biomol NMR Assign. 2008;2:107–109.
- [53].Fonseca BD, Lahr RM, Damgaard CK, et al. LARP1 on TOP of ribosome production. Wiley Interdiscip Rev RNA. 2018;9:e1480.
- [54].Jacks A, Babon J, Kelly G, et al. Structure of the C-terminal domain of human La protein reveals a novel RNA recognition motif coupled to a helical nuclear retention element. Structure. 2003;11:833–843.
- [55].Witkin KL, Collins K. Holoenzyme proteins required for the physiological assembly and activity of telomerase. Genes Dev. 2004;18:1107–1118.
- [56].Singh M, Wang Z, Koo BK, et al. Structural basis for telomerase RNA recognition and RNP assembly by the holoenzyme La family protein p65. Mol Cell. 2012;47:16–26.
- [57].Eichhorn CD, Yang Y, Repeta L, et al. Structural basis for recognition of human 7SK long noncoding RNA by the La-related protein Larp7. Proc Natl Acad Sci U S A. 2018;115:E6457–E6466.
- [58].Liang C, Xiong K, Szulwach KE, et al. Sjogren syndrome antigen B (SSB)/La promotes global microRNA expression by binding microRNA precursors through stem-loop recognition. J Biol Chem. 2013;288:723–736.
- [59].Kucera NJ, Hodsdon ME, Wolin SL. An intrinsically disordered C terminus allows the La protein to assist the biogenesis of diverse noncoding RNA precursors. Proc Natl Acad Sci U S A. 2011;108:1308–1313.
- [60].Fan H, Goodier JL, Chamberlain JR, et al. 5ʹ processing of tRNA precursors can be modulated by the human La antigen phosphoprotein. Mol Cell Biol. 1998;18:3201–3211.
- [61].Horke S, Reumann K, Rang A, et al. Molecular characterization of the human La protein.hepatitis B virus RNA.B interaction in vitro. J Biol Chem. 2002;277:34949–34958.
- [62].Castro JM, Horn DA, Pu X, et al. Recombinant expression and purification of the RNA-binding LARP6 proteins from fish genetic model organisms. Protein Expr Purif. 2017;134:147–153.
- [63].Nussbacher JK, Yeo GW. Systematic discovery of RNA binding proteins that regulate microRNA levels. Mol Cell. 2018;69:1005–1016 e1007.
- [64].Brogie JE, Price DH. Reconstitution of a functional 7SK snRNP. Nucleic Acids Res. 2017;45:6864–6880.
- [65].Jeronimo C, Forget D, Bouchard A, et al. Systematic analysis of the protein interaction network for the human transcription machinery reveals the identity of the 7SK capping enzyme. Mol Cell. 2007;27:262–274.
- [66].Martinez-Zapien D, Saliou JM, Han X, et al. Intermolecular recognition of the non-coding RNA 7SK and HEXIM protein in perspective. Biochimie. 2015;117:63–71.
- [67].Tcherkezian J, Cargnello M, Romeo Y, et al. Proteomic analysis of cap-dependent translation identifies LARP1 as a key regulator of 5ʹTOP mRNA translation. Genes Dev. 2014;28:357–371.
- [68].Schaffler K, Schulz K, Hirmer A, et al. A stimulatory role for the La-related protein 4B in translation. RNA. 2010;16:1488–1499.
- [69].Xie J, Kozlov G, Gehring K. The “tale” of poly(A) binding protein: the MLLE domain and PAM2-containing proteins. Biochim Biophys Acta. 2014;1839:1062–1068.
- [70].Grimm C, Pelz JP. 3PTH. 2010.
- [71].Kozlov G, Gehring K. Molecular basis of eRF3 recognition by the MLLE domain of poly(A)-binding protein. PLoS One. 2010;5:e10169.
- [72].Kozlov G, Menade M, Rosenauer A, et al. Molecular determinants of PAM2 recognition by the MLLE domain of poly(A)-binding protein. J Mol Biol. 2010;397:397–407.
- [73].Kota V, Sommer G, Durette C, et al. SUMO-modification of the La protein facilitates binding to mRNA in vitro and in cells. PLoS One. 2016;11:e0156365.
- [74].Zhang Y, Stefanovic B. mTORC1 phosphorylates LARP6 to stimulate type I collagen expression. Sci Rep. 2017;7:41173.
- [75].Jurrus E, Engel D, Star K, et al. Improvements to the APBS biomolecular solvation software suite. Protein Sci. 2018;27:112–128.