Proceedings of the National Academy of Sciences of the United States of America. 2017 Feb 13;114(9):2206–2211. doi: 10.1073/pnas.1616371114

Rules of RNA specificity of hnRNP A1 revealed by global and quantitative analysis of its affinity distribution

Niyati Jain a,1, Hsuan-Chun Lin b,1, Christopher E Morgan a, Michael E Harris b,2, Blanton S Tolbert a,2
PMCID: PMC5338530  PMID: 28193894

Significance

Human heterogeneous nuclear ribonucleoprotein A1 (hnRNP A1) is a key protein that functions in RNA metabolism both under normal and diseased cellular conditions. Despite playing a central role in eukaryotic biology, the mechanisms by which hnRNP A1 discriminates between cognate and noncognate RNA targets remain poorly understood. Here, we combined high-throughput sequencing analysis of equilibrium binding (HTS-EQ) experiments with independent biophysical measurements to reveal the complete hnRNP A1 specificity landscape. The data show that RNA sequence, motif copy number, spacing, and secondary structure determine specificity by modulating rates of productive hnRNP A1-RNA encounters. Thus, our work provides significant insights into the combinatorial factors that determine how hnRNP A1 identifies functional binding sites in vivo.

Keywords: hnRNP A1, protein–RNA specificity, RNA structure, thermodynamics, binding kinetics

Abstract

Heterogeneous nuclear ribonucleoprotein A1 (hnRNP A1) is a multipurpose RNA-binding protein (RBP) involved in normal and pathological RNA metabolism. Transcriptome-wide mapping and in vitro evolution identify consensus hnRNP A1 binding motifs; however, such data do not reveal how surrounding RNA sequence and structural context modulate affinity. We determined the affinity of hnRNP A1 for all possible sequence variants (n = 16,384) of the HIV exon splicing silencer 3 (ESS3) 7-nt apical loop. Analysis of the affinity distribution identifies the optimal motif 5′-YAG-3′ and shows how its copy number, position in the loop, and loop structure modulate affinity. For a subset of ESS3 variants, we show that specificity is determined by association rate constants and that variants lacking the minimal sequence motif bind competitively with consensus RNA. Thus, the results reveal general rules of specificity of hnRNP A1 and provide a quantitative framework for understanding how it discriminates between alternative competing RNA ligands in vivo.


Gene expression is regulated by an ensemble of protein–RNA complexes that assemble and disassemble throughout the lifetime of a transcript (1–5). Knowledge of the determinants of RNA-binding protein (RBP) specificity is therefore essential to understanding the relative affinity for sites of association within the transcriptome. A characteristic feature of many RBPs is their ability to elicit biological function by binding at sites in RNAs that vary significantly in sequence and local structure. This broad specificity of RBPs challenges the simplistic description of RNA binding as either specific or nonspecific. Global profiling methods provide consensus sequence motifs that represent preferential binding sites by RBPs in the transcriptome (6–11). Although powerful, such approaches are usually nonquantitative. Moreover, they do not readily provide information on how surrounding sequence identity or positioning of preferred sequences within higher order structure alters affinity. Methods such as high-throughput sequencing kinetics (HTS-KIN), RNA Bind-n-Seq, and RNA MaP have recently been developed that allow the affinities and reaction kinetics of thousands of RNAs to be measured simultaneously (6, 12–15). These methods can potentially provide global information on the surrounding context of a given sequence motif; however, they have yet to see wide application for structure/function studies of RNA specificity.

Heterogeneous nuclear ribonucleoproteins (hnRNPs) represent a diverse family of RBPs that are implicated at most stages of posttranscriptional gene regulation (16, 17). The prototypical member of this family, hnRNP A1, regulates alternative splicing, nuclear export, translation, and other RNA processing events (17). HnRNP A1 has a modular domain organization consisting of tandem RNA recognition motifs (RRMs), collectively referred to as unwinding protein 1 (UP1), and an intrinsically disordered RGG-rich C terminus. UP1 is the primary RNA binding domain, whereas the C terminus mediates homologous and heterogeneous protein–protein interactions. The individual RRMs of UP1 have a high degree of sequence homology and adopt nearly identical 3D structures (18–20). The hnRNPs, including A1, are thought to bind to pre-mRNAs combinatorially as dictated by their intrinsic specificity for RNA sequence and structure (11, 21). Thus, a comprehensive and quantitative description of hnRNP A1 specificity is required to understand its function in gene expression.

Early studies demonstrated that hnRNP A1 binds selectively to single-stranded RNA and the human telomeric sequence (TTAGGG)n (22). Selection amplification experiments (SELEX) identified a high-affinity motif, 5′-UAGGGA/U-3′, that now serves as a benchmark for sequence-specific hnRNP A1 binding (23). Transcriptome-wide studies of functional hnRNP A1 binding sites using motif-finding algorithms further indicate specificity for a 5′-UAG-3′ motif (11, 24). In previous work, we determined a small angle X-ray scattering (SAXS) model of UP1 in complex with the HIV ESS3 stem loop (SL3ESS3), guided by a 1.92-Å crystal structure of UP1 bound to a 5′-AGU-3′ oligomer (25). The structure indicates that RRM1 and the inter-RRM linker fold to form a nucleobase pocket that specifically recognizes 5′-AG-3′, which is part of the 7-nt SL3ESS3 apical loop (5′-GAUUAGU-3′) (Fig. 1). Moreover, single- or double-cytosine substitutions within the apical loop reduce UP1 binding affinity (upward of 20-fold) by slowing the rate of complex formation (26).

Fig. 1.

UP1 domain of hnRNP A1 binds specifically to 5′-AG-3′ through its nucleobase pocket. (A) Domain organization of full-length hnRNP A1. (B) Secondary structure of the HIV ESS3 (NL4-3 strain) stem loop wherein apical loop residues involved in UP1 binding are colored red. (C) Surface representation of UP1 bound to 5′-AGU-3′ (red). UP1 color coding is the same as in A.

Evaluating the distribution of hnRNP A1 among the sea of alternative binding sites in the transcriptome requires a more complete understanding of specificity. Here, we comprehensively evaluate the determinants of specific UP1–SL3ESS3 interactions by measuring relative equilibrium association constants (KA,rel) for all variants of a fully randomized 7-nt SL3ESS3 apical loop (n = 16,384). To achieve this goal, we used high-throughput sequencing analysis of equilibrium binding (HTS-EQ), a newly developed method that uses Illumina sequencing to quantify protein or enzyme binding to thousands of RNAs simultaneously. Quantitative analyses of the resulting HTS-EQ affinity distribution further establish the consensus motif 5′-YAG-3′ and also comprehensively reveal how affinity for alternative RNAs is controlled by the motif's position in the loop, by the spacing between motifs when two are present, and by the presence of unfavorable secondary structure. For a subset of SL3ESS3 variants, we further show that discrimination is primarily controlled by the association rate constant and that variants lacking the 5′-YAG-3′ motif nonetheless compete with consensus RNA for binding. Thus, the results comprehensively reveal the rules of specificity of hnRNP A1, establish a mechanistic basis for discrimination, and provide a framework for understanding how it discriminates between alternative competing RNA targets in vivo.

Results

HTS-EQ Reveals the Global Affinity Distribution of UP1-SL3ESS3.

To evaluate the determinants of UP1-SL3ESS3 specificity comprehensively, we randomized the 7-nt SL3ESS3 apical loop and performed HTS-EQ (27) (Fig. 2A). In HTS-EQ, a randomized RNA population is bound to increasing concentrations of UP1 under equilibrium conditions, and the free and bound RNAs are then separated by EMSA (the HTS-EQ work flow is shown in Fig. S1). Analyzing how the distribution of individual sequence variants, quantified by Illumina sequencing, changes as a function of RBP concentration allows measurement of KA,rel values for all possible sequence variants calibrated to native SL3ESS3 [KA,rel = KA(NNNNNNN)/KA(GAUUAGU)]. Variants that bind more weakly than the native RNA have KA,rel values <1, whereas variants that bind more strongly have KA,rel values >1.
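The calibration against native SL3ESS3 can be illustrated with a short Python sketch. It assumes a simple 1:1 binding model in which every variant sees the same free UP1 concentration, so that [UP1·RNA_i]/[RNA_i] = KA,i[UP1]free, and it assumes that read counts are proportional to concentration within the bound and free libraries. Under those assumptions the unknown library-size factors and [UP1]free cancel when each variant's bound/free read ratio is divided by that of the reference. The function name `ka_rel` and the read counts are hypothetical; the equations actually used for HTS-EQ are given in SI Materials and Methods and may differ (for example, by working from the depletion of the free population).

```python
import math

def ka_rel(bound_counts, free_counts, reference="GAUUAGU"):
    """Relative association constants from bound and free read counts.

    Assumes 1:1 binding with a common free-protein concentration, so that
    bound_i / free_i = K_A,i * [UP1]free for every variant. Normalizing each
    variant's bound/free read ratio by the reference's cancels both [UP1]free
    and the (unknown) library-size scaling of the two read-count tables.
    """
    ref_ratio = bound_counts[reference] / free_counts[reference]
    return {
        seq: (bound_counts[seq] / free_counts[seq]) / ref_ratio
        for seq in bound_counts
        # Skip variants missing from, or unobserved in, either library.
        if bound_counts[seq] > 0 and free_counts.get(seq, 0) > 0
    }

# Hypothetical read counts for three loop variants at one UP1 concentration
bound = {"GAUUAGU": 500, "GUAGGAG": 900, "GCACUUU": 100}
free = {"GAUUAGU": 1000, "GUAGGAG": 600, "GCACUUU": 1400}
rel = ka_rel(bound, free)
ln_rel = {seq: math.log(k) for seq, k in rel.items()}
print(rel["GUAGGAG"])  # 3.0
```

By construction the reference has KA,rel = 1 (lnKA,rel = 0), which matches the convention used for the histogram in Fig. 2B.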

Fig. 2.

Determination of the affinity distribution for UP1 binding to all possible variants of the 7-nt SL3ESS3 apical loop. (A) Secondary structure of SL3ESS3 showing the constant base-paired stem loop and the randomized 7-nt apical loop region (red). (B) Affinity distribution of KA,rel values for UP1 binding to all possible sequence variants in the randomized pool of SL3ESS3 variants. The vertical line corresponds to the value for the reference native SL3ESS3 (lnKA,rel = 0). Seven hundred twenty-two variants bind with higher affinities than native SL3ESS3, of which 72% contain at least one 5′-AG-3′ dinucleotide. (C) Probability sequence logo calculated for the 50 highest affinity SL3ESS3 variants. (D) Comparison of experimental KA,rel values and KA,rel values predicted using a PWM model that includes interaction coefficients (IC values) between nucleobases. (E) Comparison of the IC values derived from fitting the KA,rel affinity distribution to Eq. S7 as described in SI Materials and Methods. The magnitude of the IC values is indicated by differences in color: red, positive interactions; blue, negative interactions.

Fig. S1.

Overview of UP1-SL3ESS3 HTS-EQ work flow. (1) Randomization of the SL3ESS3 7-nt apical loop results in >16,000 different sequence variants. (2) Binding reactions are done under equilibrium binding conditions so that standard binding models can be applied. (3) Free and bound populations are separated by EMSA and purified by standard molecular biology methods. (4) Library is constructed and subjected to Illumina sequencing. Bar coding allows parallel analyses of multiple samples and experiments. (5) KA,rel is calculated as described in SI Materials and Methods using a simple competitive binding model from the ratio of each substrate to a reference substrate at known concentrations of UP1. (6) Resulting affinity distribution contains information that comprehensively defines sequence specificity, which can be modeled using PWM + IC analyses. Combining these data with additional high-throughput quantitative biochemical and computational data can provide global insights into protein specificity.

Fig. 2B shows the resulting affinity distribution [i.e., the distribution of the number of sequences across the range of observed KA,rel (lnKA,rel) values]. The KA,rel values are best visualized as natural logarithms because lnKA,rel is proportional to the free-energy difference between a variant and the native sequence. The histogram shows that native SL3ESS3 lies toward the high-affinity region of the distribution, indicating that it is already well optimized for UP1 binding; however, 722 variants bind with lnKA,rel values greater than that of the native sequence. Of these 722 variants, 72% contain at least one 5′-AG-3′ motif, as do 98% of the top 50 variants. An optimal sequence logo calculated from the top 50 variants (Fig. 2C) shows an overall preference for U, A, and G, consistent with the base composition of the high-affinity 5′-UAGGGA/U-3′ sequence identified by SELEX (23); however, the SELEX sequence lies slightly on the low-affinity side of native SL3ESS3 (lnKA,rel = −0.24) in the distribution, which illustrates the complexity of hnRNP A1 specificity. Nevertheless, HTS-EQ clearly reveals that a specific hnRNP A1 target minimally contains a 5′-AG-3′.
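Two quantities in this paragraph are easy to compute directly: the free-energy difference implied by a lnKA,rel value (ΔΔG = −RT lnKA,rel) and the per-position nucleotide frequencies that underlie a probability logo. The sketch below assumes a temperature of 298.15 K (the HTS-EQ temperature is not stated in this section) and uses a hypothetical list of top-ranked loops; it is not the logo software used for Fig. 2C.

```python
from collections import Counter

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1

def ddg_kcal(ln_ka_rel, temperature_k=298.15):
    """ΔΔG relative to the reference, ΔΔG = -RT ln K_A,rel (kcal/mol).
    Positive values indicate weaker binding than the reference."""
    return -R_KCAL * temperature_k * ln_ka_rel

def position_probabilities(seqs):
    """Per-position nucleotide frequencies, the input to a probability logo."""
    probs = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        probs.append({nt: counts[nt] / len(seqs) for nt in "ACGU"})
    return probs

def fraction_with_motif(seqs, motif="AG"):
    """Fraction of sequences containing the motif at least once."""
    return sum(motif in s for s in seqs) / len(seqs)

# Hypothetical top-ranked loops (not data from this study)
top = ["GUAGGAG", "UUAGUAG", "CUAGGUU", "GGGUGGG"]
print(round(ddg_kcal(-0.24), 2))            # 0.14
print(fraction_with_motif(top))             # 0.75
print(position_probabilities(top)[2]["A"])  # 0.75
```

With these definitions, the SELEX sequence's lnKA,rel of −0.24 corresponds to ΔΔG ≈ +0.14 kcal/mol, that is, only slightly weaker binding than native SL3ESS3.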

Nonetheless, the relative binding affinities of different sequence variants are clearly further tuned by the identity of neighboring nucleotides in the RNA. Such important features of specificity are not fully captured by models derived only from optimal sequence variants. To identify the local sequence and structure properties that determine specificity, we fit the entire distribution of KA,rel values to two analytical models of RNA sequence specificity. First, we used a position weight matrix (PWM) model that considers only the identity of the nucleobase at each position in the loop (28). Second, we fit the affinity distribution to a PWM model that includes interaction coefficients (IC values) between nucleobases, which account for positive and negative effects of sequence variation between positions in the binding site (29, 30). The PWM model performs poorly at recapitulating the determinants of UP1-SL3ESS3 specificity, whereas including IC values greatly improves the correspondence between the model and experimental data (R² > 0.7) (compare Fig. 2D and Figs. S2 and S3). Therefore, binding specificity does not involve only the selection of optimal AG nucleobases at individual positions in the loop but also includes contributions from the surrounding sequence context.
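The difference between the two models can be sketched as ordinary least-squares regression on lnKA,rel: the PWM model uses one indicator per (position, nucleotide), and the PWM + IC model adds one indicator per (position pair, nucleotide pair). This is a generic additive-plus-pairwise model written for illustration; Eqs. S6 and S7 in the SI may differ in parameterization, fitting procedure, or regularization. The usage example fits synthetic data for 5-nt loops in which the "affinity" is the number of adjacent AG dinucleotides, a purely pairwise effect that an independent-position model cannot capture.

```python
import itertools
import numpy as np

NTS = "ACGU"

def pwm_features(seq):
    """One indicator per (position, nucleotide): 4 * L features."""
    x = np.zeros(4 * len(seq))
    for i, nt in enumerate(seq):
        x[4 * i + NTS.index(nt)] = 1.0
    return x

def ic_features(seq):
    """One indicator per (position pair i < j, nucleotide pair): 16 * C(L, 2) features."""
    pairs = list(itertools.combinations(range(len(seq)), 2))
    x = np.zeros(16 * len(pairs))
    for k, (i, j) in enumerate(pairs):
        x[16 * k + 4 * NTS.index(seq[i]) + NTS.index(seq[j])] = 1.0
    return x

def r_squared_ols(X, y):
    """Least-squares fit with an intercept; returns the coefficient of determination.
    lstsq returns a minimum-norm solution, so redundant indicators are harmless."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def compare_models(seqs, ln_ka_rel):
    """R^2 of the PWM model and of the PWM + pairwise (IC) model."""
    y = np.asarray(ln_ka_rel, dtype=float)
    X_pwm = np.array([pwm_features(s) for s in seqs])
    X_ic = np.hstack([X_pwm, np.array([ic_features(s) for s in seqs])])
    return r_squared_ols(X_pwm, y), r_squared_ols(X_ic, y)

# Synthetic example: all 4^5 loops of length 5; "affinity" = number of
# adjacent AG dinucleotides plus a little noise (hypothetical data).
rng = np.random.default_rng(0)
seqs = ["".join(p) for p in itertools.product(NTS, repeat=5)]
y = np.array([sum(s[i:i + 2] == "AG" for i in range(4)) for s in seqs], dtype=float)
y += rng.normal(0.0, 0.05, size=len(seqs))
r2_pwm, r2_ic = compare_models(seqs, y)
print(f"PWM R^2 = {r2_pwm:.2f}, PWM + IC R^2 = {r2_ic:.2f}")
```

For 7-nt loops the same code builds 28 PWM features plus 21 × 16 = 336 pairwise features.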

Fig. S2.

Quantitative sequence specificity modeling of the KA,rel affinity distribution for UP1 binding of the 7-nt SL3ESS3 randomized apical loop population. (A) Comparison of experimentally observed KA,rel values to predicted values obtained by fitting the affinity distribution to a PWM sequence specificity model (Eq. S6) as described in SI Materials and Methods. A linear fit of the plot indicates that the model accounts for only ca. 20% of the variance in the data. (B) Projection of PWM values (linear coefficient) obtained from fitting the KA,rel affinity distribution to a sequence specificity model containing both PWM and pairwise IC value terms using Eq. S7 as described in SI Materials and Methods. (C) Dot-plot comparison of IC values obtained by fitting KA,rel affinity distributions for UP1 binding of the 7-nt SL3ESS3 randomized apical loop population from two independent experiments. An ideal correlation line [m (slope) = 1, b (intercept) = 0] is shown for reference. Fitting the dot plot projection of the data to a linear function yields an R² value of 0.68.

Fig. S3.

Comparison of ln(KA,rel) values obtained in two independent experiments. Individual KA,rel data points are compared as a dot plot for independently performed HTS-EQ of two unique 7-nt SL3ESS3 randomized apical loop populations. The depletion of bound species from the free population is quantified in HTS-EQ, and the greatest experimental error is observed for lower affinity sequence variants.

Recasting the IC values as a heat map offers insight into the sequence and structure features that modulate binding affinity (Fig. 2E). For example, negative IC values are observed between N1–N7 and N2–N6 for sequence combinations that can form Watson–Crick pairs, indicating that base-pairing interactions that reduce the loop size are highly unfavorable. Similarly, the subsets of variants containing the stable tetraloop sequences CUUG, GNRA, and UNCG have distributions shifted to lower KA,rel values (Fig. S4). In contrast, the heat map shows positive couplings for 5′-UA-3′ and 5′-AG-3′ dinucleotides and, to a lesser extent, 5′-CA-3′ dinucleotides at adjacent positions in the loop (Fig. 2E).
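The structural features described here can be approximated with simple sequence rules. The sketch below flags loops in which both N1:N7 and N2:N6 can form Watson–Crick pairs (which would shrink the 7-nt loop to 3 nt) and loops containing CUUG, GNRA, or UNCG anywhere in their sequence. Counting only Watson–Crick pairs (no G·U wobbles) and matching tetraloops at any position are assumptions made for illustration; the paper's structure predictions may have used a different criterion, and the function names are hypothetical.

```python
import re

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
# N = any nucleotide, R = purine (A or G)
TETRALOOPS = {"CUUG": "CUUG", "GNRA": "G[ACGU][AG]A", "UNCG": "U[ACGU]CG"}

def pairs_close_loop(loop):
    """True if N1:N7 and N2:N6 can both form Watson-Crick pairs, which would
    extend the stem by 2 bp and leave a 3-nt loop (N3-N5)."""
    return (loop[0], loop[-1]) in WATSON_CRICK and (loop[1], loop[-2]) in WATSON_CRICK

def tetraloop_motifs(loop):
    """Names of tetraloop motifs found anywhere in the loop sequence."""
    return sorted(name for name, pattern in TETRALOOPS.items() if re.search(pattern, loop))

def lacks_unfavorable_structure(loop):
    """True if neither rule above flags the loop."""
    return not pairs_close_loop(loop) and not tetraloop_motifs(loop)

for loop in ["GGUGACC", "UGUGGCA", "GAUUAGU", "GAUUCGU", "AGAAAUU"]:
    print(loop, pairs_close_loop(loop), tetraloop_motifs(loop))
```

Note that a single loop can trigger both rules; for example, 5′-GGUGACC-3′ can pair N1:N7 and N2:N6 and also contains a GNRA-type GUGA.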

Fig. S4.

Histogram comparisons of subsets of the ln(KA,rel) affinity distribution to isolate the effects of sequence and structure signatures. (A) Histogram comparison of the entire ln(KA,rel) affinity distribution for UP1 binding of the 7-nt SL3ESS3 randomized apical loop population (green) to subsets of the population of sequence variants that either contain an AG (red) or lack this dinucleotide sequence (woAG; blue). Note that the native reference sequence has ln(KA,rel) = 0. (B) Histogram comparison of the subsets of sequence variants in the ln(KA,rel) affinity distribution that contain stable tetraloop sequences CUUG (red), GNRA (green), or UNCG (blue). The native reference sequence has ln(KA,rel) = 0.

The affinity distribution of variants that contain a 5′-AG-3′ but lack unfavorable secondary structure is clearly shifted to higher KA,rel values compared with AG variants that can form such structure (Fig. 3A). Approximately 200 of the 722 variants with KA,rel values >1 lack an AG dinucleotide altogether. To identify positive determinants at adjacent positions, high-affinity variants were aligned on the AG dinucleotide in their sequence regardless of its position in the loop (Fig. 3B). A sequence probability logo calculated from this alignment further identifies a C or U located 5′ of the 5′-AG-3′ and reveals a G positioned 3′ of it as an additional positive determinant. A probability logo of high-affinity variants (KA,rel > 1) lacking 5′-AG-3′ shows a polyG pattern (Fig. 3C). Nonetheless, the optimal specificity determinant is clearly a 5′-YAG-3′ motif, which may occur at multiple positions in the loop. Comparing the KA,rel values for this motif at each possible position in the loop reveals that UAG is optimal at N5–N7 (Fig. 3 E and F), in contrast to the position of UAG at N4–N6 in the native HIV ESS3 loop (Fig. 3E). Sequence variants with two 5′-AG-3′ dinucleotides bind, on average, with higher affinity than the native sequence; however, the two AG dinucleotides must be separated by at least two nucleotides for the additive effect to be observed (Fig. 3D). Thus, analysis of the KA,rel affinity distribution identifies unfavorable loop structures, establishes 5′-YAG-3′ as the minimal sequence specificity motif, and, most importantly, defines how local sequence context modulates the contribution of this motif to affinity.
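The positional analyses in Fig. 3 D and E amount to locating each 5′-AG-3′ in a loop, grouping single-AG variants by AG position, and counting the nucleotides that separate two AGs. A minimal sketch follows; the function names and the example lnKA,rel values are hypothetical.

```python
from collections import defaultdict

def ag_starts(loop):
    """1-based positions of the A in every 5'-AG-3' (A at N_i, G at N_i+1)."""
    return [i + 1 for i in range(len(loop) - 1) if loop[i:i + 2] == "AG"]

def ag_spacing(loop):
    """Nucleotides separating two AG dinucleotides, or None unless exactly two."""
    starts = ag_starts(loop)
    if len(starts) != 2:
        return None
    return starts[1] - (starts[0] + 2)

def group_single_ag_by_position(ln_ka_rel):
    """Group lnKA,rel values of single-AG loops by AG position, e.g. '5-6' = A5G6."""
    groups = defaultdict(list)
    for loop, value in ln_ka_rel.items():
        starts = ag_starts(loop)
        if len(starts) == 1:
            groups[f"{starts[0]}-{starts[0] + 1}"].append(value)
    return dict(groups)

# Hypothetical lnKA,rel values, for illustration only
data = {"GAUUAGU": 0.0, "GUUUUAG": 0.8, "UUUUUAG": 0.6, "AGUUAGU": 0.5}
print(ag_starts("GAUUAGU"))    # [5]
print(ag_spacing("AGUUAGU"))   # 2
print(group_single_ag_by_position(data))
```

For the native loop 5′-GAUUAGU-3′ the AG sits at A5G6, which is why its UAG motif occupies N4–N6.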

Fig. 3.

Analysis of sequence and structure attributes that modulate binding affinity of hnRNP A1. (A) Histogram comparison of the KA,rel affinity distributions of sequence variants with a 5′-AG-3′ dinucleotide motif (AG, red) and the subset of these sequences filtered to remove sequence variants with predicted base-pairing interactions between N1N2 and N6N7 (no structure; steel blue). (B) Sequence probability logo for the highest affinity (KA,rel > 1) variants aligned with respect to the position of the 5′-AG-3′ in their sequence. Because all of the variants included in the analysis contain an AG at the aligned positions, the information content at these positions is 2 bits by definition. (C) Sequence probability logo of the highest affinity SL3ESS3 loop variants that lack a 5′-AG-3′ dinucleotide in their sequence. (D) Violin plot of the distributions of KA,rel values for SL3ESS3 loop sequence variants with two 5′-AG-3′ motifs plotted according to the number of nucleotides between the two AG sequences. (E) Violin plot of the probability density of KA,rel values for SL3ESS3 loop sequence variants with only one 5′-AG-3′ motif plotted according to the position of the AG in the loop (1–2 = A1G2; 2–3 = A2G3, etc.). (F) Sequence probability logo for highest affinity SL3ESS3 loop sequence variants with A6G7.

Analysis of the Thermodynamic and Kinetic Basis for UP1-SL3ESS3 Specificity.

To understand the thermodynamic contributions to UP1 specificity, we randomly selected SL3ESS3 variants from different regions of the affinity distribution and quantitatively analyzed their binding properties by calorimetric titrations (Fig. 4A and Fig. S5). The KA,rel values measured by HTS-EQ and calorimetry correlate for the selected SL3ESS3 variants (Fig. 4B and Fig. S5), indicating that HTS-EQ provides an accurate depiction of the distribution of binding affinities. Although the correlation between the HTS-EQ and isothermal titration calorimetry (ITC) KA,rel values is good (R² = 0.85), a few outliers are observed (Fig. S5). Variations in ITC sample quality and inherent noise in the high-throughput sequencing data, which increases for lower affinity substrates, are likely the major contributors to the ∼15% of the variance not accounted for by the correlation.
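Comparing ITC with HTS-EQ requires putting both on the same relative scale. Because KA = 1/Kd, the ITC-derived KA,rel for a variant equals Kd(native)/Kd(variant). The sketch below converts Kd values to KA,rel and computes R² as the squared Pearson correlation. Whether the reported R² = 0.85 was computed on linear or log-transformed values is not stated here, so the log transform in the usage example is an assumption, as are all numbers and function names.

```python
import math

def ka_rel_from_kd(kd_variant, kd_reference):
    """KA,rel = KA(variant) / KA(reference) = Kd(reference) / Kd(variant)."""
    return kd_reference / kd_variant

def r_squared(x, y):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Hypothetical ITC Kd values (nM) and HTS-EQ KA,rel values for three variants
kd_native = 100.0
itc_rel = [ka_rel_from_kd(kd, kd_native) for kd in (50.0, 100.0, 400.0)]
hts_rel = [1.8, 1.0, 0.3]
r2 = r_squared([math.log(v) for v in itc_rel], [math.log(v) for v in hts_rel])
print(itc_rel, round(r2, 3))
```

Working in log space weights a twofold difference equally at high and low affinity, which matches how the affinity distribution is plotted (lnKA,rel).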

Fig. 4.

Independent biophysical measurements correlate with HTS-EQ and reveal insights into the binding mechanism. (A) Six representative isotherms for UP1 titrated into SL3ESS3 variants from the three regions of the affinity distribution: high, medium, and low. Titrations were performed at 298 K, 140 mM K+, and pH 6.5. Processed isotherms were fit to a 1:1 binding model. Reported dissociation constants correspond to average values of two measurements. (B) Comparison of the relative binding affinities measured by HTS-EQ and ITC for the selected SL3ESS3 variants. The relative binding affinities for each variant are calibrated to native SL3ESS3 (GAUUAGU). The plot shows that the two methods provide comparable information regarding the relative binding affinities and that HTS-EQ provides an accurate depiction of specificity. Fig. S5 shows the comparison between HTS-EQ and ITC for all 15 SL3ESS3 variants (R² = 0.85). (C) Comparison of the thermodynamic signatures of UP1 titrations with SL3ESS3 variants. Thermodynamic values represent average ± SD for two replicates. (D and E) Summary of kinetic binding parameters for UP1 with select SL3ESS3 variants, showing that the predominant driver of specificity is a change in association rates, whereas dissociation rates are mostly unperturbed. Kinetic values represent average ± SEM for two replicates. (F) Dissociation constants for UP1 with select SL3ESS3 variants as determined from kinetic binding parameters. Note that the equilibrium dissociation constants measured by BLI and ITC follow the same trend but differ, on average, 1.5-fold.

Fig. S5.

(A) Calorimetric titration profiles for the nine additional randomly selected SL3ESS3 variants. The nine panels correspond to variants selected from the high, medium, and low regions of the HTS-EQ distribution. Reported dissociation constants correspond to average values of two measurements. Titrations were performed at 298 K in 10 mM K2HPO4 (pH 6.5), 120 mM KCl, 10 mM NaCl, 0.5 mM EDTA, and 1 mM TCEP. Processed thermograms were fit to a single-site binding isotherm. (B) Full comparison of the relative binding affinities measured by HTS-EQ and ITC for selected SL3ESS3 variants. The relative binding affinities for each variant are calibrated to native SL3ESS3 (GAUUAGU). The plot shows that the two methods provide comparable information regarding the relative binding affinities and that HTS-EQ provides an accurate depiction of UP1 specificity.

Consistent with observations from HTS-EQ, UP1 binds SL3ESS3 variants that contain at least one 5′-AG-3′ with greater affinity than non-AG variants, and sequences with two 5′-AG-3′ motifs bind slightly stronger than sequences with just one, likely reflecting a higher probability for UP1 to interact productively with these variants. On average, SL3ESS3 variants with no 5′-AG-3′ bind approximately sevenfold weaker than native SL3ESS3. This observation is consistent with our previous results using single-cytosine substitutions, wherein UP1 binds the loop sequence 5′-GAUUCGU-3′ 10-fold weaker than 5′-GAUUAGU-3′ (26). The thermodynamic signatures reveal that the reduction in binding affinity for non-AG variants manifests primarily as a decrease in favorable association enthalpy (ΔΔH = 8 kcal/mol).
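The thermodynamic signatures in Fig. 4C follow from the standard relations ΔG = RT ln Kd (Kd in molar) and ΔG = ΔH − TΔS. The sketch below, assuming the 298 K used for the ITC titrations and a 1 M standard state, decomposes a single titration and converts a fold change in Kd into ΔΔG; the example Kd and ΔH are hypothetical. As a point of scale, the approximately sevenfold weaker binding of non-AG variants corresponds to ΔΔG = RT ln 7 ≈ 1.15 kcal/mol at 298 K.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1

def itc_signature(kd_molar, dh_kcal, temperature_k=298.0):
    """ΔG, ΔH and -TΔS (kcal/mol) of association from an ITC Kd and ΔH.

    ΔG = RT ln(Kd / 1 M); because ΔG = ΔH - TΔS, -TΔS = ΔG - ΔH.
    """
    dg = R_KCAL * temperature_k * math.log(kd_molar)
    return {"dG": dg, "dH": dh_kcal, "minus_TdS": dg - dh_kcal}

def ddg_from_fold_change(fold_weaker, temperature_k=298.0):
    """ΔΔG (kcal/mol) for a Kd that is fold_weaker times larger: RT ln(fold)."""
    return R_KCAL * temperature_k * math.log(fold_weaker)

# Hypothetical titration: Kd = 100 nM, ΔH = -10 kcal/mol
sig = itc_signature(100e-9, -10.0)
print({k: round(v, 2) for k, v in sig.items()})
print(round(ddg_from_fold_change(7), 2))  # 1.15
```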

As predicted from the sequence specificity model derived from fitting the HTS-EQ affinity distribution, some variants form new Watson–Crick base pairs within the apical loop that inhibit binding (Fig. 2E). Indeed, the imino region of the 1H NMR spectra of 5′-GGUGACC-3′ and 5′-UGUGGCA-3′ (proposed pairing interactions N1–N7 and N2–N6, i.e., G1–C7/G2–C6 and U1–A7/G2–C6, respectively) shows new imino signals with chemical shifts consistent with GC and AU base pairs (Fig. S6). The new base pairs reduce the loop from 7 to 3 nt. Calorimetric titrations of UP1 with 5′-GGUGACC-3′ and 5′-UGUGGCA-3′ show weaker binding relative to native SL3ESS3 as a consequence of a significant loss of binding enthalpy (ΔΔH ≈ 15 kcal/mol).

Fig. S6.

(Left) Secondary structure of SL3ESS3 with the randomized nucleotides indicated in red. Numbering of the constant base-paired region corresponds to the HIV NL4-3 strain. (Right) One-dimensional 1H NMR spectra (500 MHz) show the imino region of select SL3ESS3 variants. Assignments are based on native SL3ESS3. G(T7) corresponds to nonnative 5′-G added to increase transcription efficiency. UN+1, GN+1, and GN+2 correspond to additional imino signals that are the result of new base pairing within the apical loop. All spectra were collected at 278 K in 10 mM K2HPO4 (pH 6.5), 120 mM KCl, 10 mM NaCl, 0.5 mM EDTA, and 1 mM TCEP.

We next investigated the kinetic contributions to specificity by performing biolayer interferometry (BLI) studies with selected SL3ESS3 variants (Fig. 4 D–F, Fig. S7, and Table S1). Kinetic analysis shows that all six variants dissociate from UP1 with comparable apparent first-order off-rate constants (koff,app = 4.5–6.2 × 10⁻³ s⁻¹), whereas the apparent second-order association rate constants (kon,app = 0.3–1.1 × 10⁵ M⁻¹⋅s⁻¹) show a modest but larger sequence dependence. SL3ESS3 variants with one 5′-AG-3′ have slightly lower kon,app values than 5′-GUAGGAG-3′, which contains two; however, non-AG variants associate, on average, 2.5-fold more slowly, and 5′-UGUGGCA-3′ approximately fourfold more slowly. Decreases in kon,app values were previously identified as the predominant mechanism underlying the weaker affinities of cytosine-substituted SL3ESS3 constructs, and similar observations have recently been made for other protein–RNA systems (13, 25, 27). Thus, for some SL3ESS3 variants, specificity is driven in large part by the frequency with which UP1 collides productively with a cognate RNA that minimally exposes a conformationally flexible 5′-AG-3′ motif. However, a more detailed kinetic study of SL3ESS3 variants is needed to determine whether the differences in kon,app values arise from possible intermediates in the binding pathway. Nevertheless, the equilibrium dissociation constants measured independently by calorimetric titrations (Fig. 4A) and BLI (Fig. 4F) follow the same trend, indicating that the kinetic parameters indeed offer insight into the mechanisms by which hnRNP A1 discriminates cognate from noncognate RNA targets.
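For a simple one-step binding model, the kinetic dissociation constant is Kd = koff/kon. The sketch below applies this relation to the average rate constants in Table S1 and computes two fold differences in kon,app: GUAGGAG versus UGUGGCA (about 3.9, the "approximately fourfold" above) and GUAGGAG versus the mean of the three non-AG variants (about 2.5); how the authors averaged is not stated, so the second calculation is one plausible reading. Because the tabulated Kd values are averages (presumably of per-replicate Kd values), ratios of the averaged rate constants can differ slightly from them (for example, about 38 nM versus 40.2 nM for GUAGGAG).

```python
def kd_nM(koff_per_s, kon_per_M_s):
    """Kd = koff / kon for one-step binding, converted from M to nM."""
    return koff_per_s / kon_per_M_s * 1e9

# Average rate constants from Table S1: (koff,app in s^-1, kon,app in M^-1 s^-1)
TABLE_S1 = {
    "GUAGGAG": (4.49e-3, 1.18e5),
    "GUAGUUG": (6.22e-3, 0.87e5),
    "GUAGUUU": (5.28e-3, 0.59e5),
    "GGUGUCA": (5.81e-3, 0.62e5),
    "GGUGACC": (5.13e-3, 0.51e5),
    "UGUGGCA": (6.18e-3, 0.30e5),
}

kds = {rna: kd_nM(koff, kon) for rna, (koff, kon) in TABLE_S1.items()}
kon_fold = TABLE_S1["GUAGGAG"][1] / TABLE_S1["UGUGGCA"][1]
non_ag = [kon for rna, (_, kon) in TABLE_S1.items() if "AG" not in rna]
fold_non_ag = TABLE_S1["GUAGGAG"][1] / (sum(non_ag) / len(non_ag))
for rna, kd in kds.items():
    print(f"{rna}: Kd = {kd:.0f} nM")
print(f"kon,app fold (GUAGGAG / UGUGGCA) = {kon_fold:.1f}")
print(f"kon,app fold (GUAGGAG / mean of non-AG) = {fold_non_ag:.1f}")
```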

Fig. S7.

Analysis of the binding kinetics of UP1 with six different SL3ESS3 variants as measured by BLI. The processed unimolecular dissociation and bimolecular association curves are shown, along with their respective fits. All kinetic data were analyzed using GraphPad Prism. Global fits to $Y = (Y_0 - NS)\,e^{-k_{\mathrm{off,app}} t} + NS$, where $Y_0$ is binding at time 0 and NS is binding (nonspecific) at infinite time, were used to determine the first-order off-rate constant ($k_{\mathrm{off,app}}$) from the dissociation curves. Of note, the dissociation curves were all scaled to the same t = 0 value for visualization. Global fits to $Y = Y_{\max}\,(1 - e^{-k_{\mathrm{obs}} t})$ were used to determine the apparent association rate constant ($k_{\mathrm{on,app}}$). Here, $Y_{\max}$ is the signal at maximal occupancy and $k_{\mathrm{obs}} = k_{\mathrm{on,app}} L + k_{\mathrm{off}}$, where L is a fitting parameter. BLI experiments were performed in duplicate and at 298 K in 10 mM K2HPO4 (pH 6.5), 120 mM KCl, 10 mM NaCl, 0.5 mM EDTA, and 1 mM TCEP.
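The two exponential models in this caption can be fit without specialized software. The sketch below is a NumPy-only alternative to the GraphPad Prism global fits, not a reproduction of them: it fits each trace separately by scanning the rate constant on a logarithmic grid and solving for amplitude and offset by linear least squares, then obtains kon,app and koff from a straight-line fit of kobs against analyte concentration. Treating L as a known analyte concentration (rather than a fitted parameter, as in the caption), the function names, and the simulated parameters are assumptions for illustration.

```python
import numpy as np

K_GRID = np.geomspace(1e-4, 1.0, 2000)  # candidate rate constants (s^-1)

def fit_exponential(t, y, k_grid=K_GRID):
    """Fit y = A*exp(-k*t) + C: scan k over a grid and, for each k, solve the
    linear least-squares problem for A and C. Returns (k, A, C) of the best fit.
    Dissociation: A = Y0 - NS, C = NS, k = koff. Association: A = -Ymax, C = Ymax, k = kobs."""
    best = None
    for k in k_grid:
        basis = np.column_stack([np.exp(-k * t), np.ones_like(t)])
        coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
        rss = float(np.sum((y - basis @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, float(k), float(coef[0]), float(coef[1]))
    return best[1:]

def kon_koff_from_kobs(conc_molar, kobs):
    """Straight-line fit of kobs = kon * [analyte] + koff; returns (kon, koff)."""
    kon, koff = np.polyfit(conc_molar, kobs, 1)
    return float(kon), float(koff)

# Simulated, noise-free traces with hypothetical parameters
t = np.linspace(0.0, 600.0, 301)
koff_true, kon_true = 5e-3, 1e5
dissociation = (1.0 - 0.1) * np.exp(-koff_true * t) + 0.1   # Y0 = 1.0, NS = 0.1
koff_fit, _, ns_fit = fit_exponential(t, dissociation)

conc = np.array([50e-9, 100e-9, 200e-9, 400e-9])            # analyte, M
kobs_fit = []
for c in conc:
    kobs = kon_true * c + koff_true
    association = 0.8 * (1.0 - np.exp(-kobs * t))            # Ymax = 0.8
    kobs_fit.append(fit_exponential(t, association)[0])
kon_fit, koff_from_assoc = kon_koff_from_kobs(conc, np.array(kobs_fit))
print(f"koff = {koff_fit:.2e} s^-1, kon = {kon_fit:.2e} M^-1 s^-1")
```

The grid resolution (about 0.5% between neighboring k values) bounds the precision of each rate constant; a global fit such as the one described in the caption shares parameters across traces and is generally more robust with noisy data.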

Table S1.

BLI binding kinetic parameters and global fitting statistics

| RNA | Avg. koff,app ± SEM (×10⁻³ s⁻¹) | R², replicate 1 | R², replicate 2 | 95% CI, replicate 1 (×10⁻³ s⁻¹) | 95% CI, replicate 2 (×10⁻³ s⁻¹) | Avg. kon,app ± SEM (×10⁵ M⁻¹⋅s⁻¹) | R², replicate 1 | R², replicate 2 | 95% CI, replicate 1 (×10⁵ M⁻¹⋅s⁻¹) | 95% CI, replicate 2 (×10⁵ M⁻¹⋅s⁻¹) | Avg. Kd ± SEM (nM) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GUAGGAG | 4.49 ± 0.08 | 0.99 | 0.99 | 4.38–4.45 | 4.54–4.61 | 1.18 ± 0.29 | 0.97 | 0.96 | 0.88–0.90 | 1.46–1.49 | 40.2 ± 9.2 |
| GUAGUUG | 6.22 ± 0.07 | 0.98 | 0.98 | 6.22–6.36 | 6.08–6.22 | 0.87 ± 0.04 | 0.95 | 0.94 | 0.91–0.93 | 0.82–0.84 | 71.2 ± 2.8 |
| GUAGUUU | 5.28 ± 0.16 | 0.97 | 0.97 | 5.04–5.19 | 5.36–5.53 | 0.59 ± 0.16 | 0.95 | 0.95 | 0.74–0.76 | 0.42–0.44 | 97.2 ± 28.6 |
| GGUGUCA | 5.81 ± 0.11 | 0.99 | 0.98 | 5.87–5.98 | 5.63–5.75 | 0.62 ± 0.02 | 0.98 | 0.96 | 0.60–0.61 | 0.63–0.64 | 93.5 ± 4.3 |
| GGUGACC | 5.13 ± 0.05 | 0.99 | 0.98 | 5.22–5.32 | 5.03–5.13 | 0.51 ± 0.08 | 0.97 | 0.95 | 0.44–0.44 | 0.59–0.59 | 102 ± 16.0 |
| UGUGGCA | 6.18 ± 0.04 | 0.98 | 0.98 | 6.08–6.19 | 6.17–6.28 | 0.30 ± 0.01 | 0.97 | 0.97 | 0.26–0.29 | 0.31–0.32 | 206 ± 7.4 |

Kinetic measurements were performed in duplicate, and statistics are shown for each replicate (1 and 2). Avg., average; BLI, biolayer interferometry; CI, confidence interval; SEM, standard error of the mean.

UP1 Uses the Same Binding Surface to Interact with 5′-AG-3′ and Non–AG-Containing RNAs.

We previously showed UP1 interacts with native SL3ESS3 through its RRM1 domain and inter-RRM linker to form a 1:1 complex (25). Although capable of binding RNA, RRM2 is not considered to be the preferred high-affinity binding surface. To test whether SL3ESS3 variants compete for the same surface on UP1, we carried out competitive calorimetric titrations with SL3ESS3 loop sequence variants 5′-GUAGGAG-3′ and 5′-GCACUUU-3′. For simplicity, we refer to these RNAs as (AG)SL3ESS3 and (AC)SL3ESS3, respectively. Titration of (AC)SL3ESS3 with UP1 shows that it binds with a dissociation constant of 344 nM; however, (AC)SL3ESS3 does not bind to a preformed UP1–(AG)SL3ESS3 complex under these conditions (Fig. 5A). This observation is consistent with the ∼18-fold stronger binding affinity of UP1 for (AG)SL3ESS3 compared with (AC)SL3ESS3 (Fig. 5 A and B). Conversely, (AG)SL3ESS3 binds with reduced apparent affinity for UP1 when titrated into a reaction containing preformed UP1–(AC)SL3ESS3 complexes (Fig. 5B). As expected for competitive binding, the apparent affinity of (AG)SL3ESS3 for UP1 decreases as the concentration of (AC)SL3ESS3 increases.
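The competition experiment follows the behavior expected for two RNAs binding one site. If UP1 is pre-bound by a competitor at free concentration [C] and the competitor has dissociation constant Kd,C, the titrated ligand's apparent dissociation constant is Kd,app = Kd(1 + [C]/Kd,C). The sketch assumes mutually exclusive binding at a single site and a known free competitor concentration (for example, approximated by its total concentration when it is in excess); it is not the model used to fit Fig. 5. Only the 344 nM Kd of (AC)SL3ESS3 comes from the text; the (AG)SL3ESS3 Kd below is a hypothetical value of about 19 nM, chosen to match the ∼18-fold affinity difference.

```python
def apparent_kd(kd_ligand, kd_competitor, competitor_free):
    """Apparent Kd of a ligand competing with a bound competitor for one site:
    Kd,app = Kd * (1 + [competitor]free / Kd,competitor). Units must match."""
    return kd_ligand * (1.0 + competitor_free / kd_competitor)

KD_AC = 344.0          # nM, (AC)SL3ESS3 (from the text)
KD_AG = KD_AC / 18.0   # nM, hypothetical value implied by the ~18-fold difference

for competitor_nM in (0.0, 344.0, 1000.0):
    print(f"[AC]free = {competitor_nM:6.0f} nM -> apparent Kd(AG) = "
          f"{apparent_kd(KD_AG, KD_AC, competitor_nM):.1f} nM")
```

When the free competitor concentration equals its own Kd, the apparent Kd of the titrated RNA doubles; the apparent affinity keeps falling as more competitor is added, which is the trend described for Fig. 5B.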

Fig. 5.

SL3ESS3 variants compete for the same UP1 binding surface. (A) Calorimetric titrations of (AC)SL3ESS3 with free UP1 and UP1 preloaded with (AG)SL3ESS3 at a 1.3:1 (RNA/protein) ratio show that (AC)SL3ESS3 is unable to bind to a preformed UP1–(AG)SL3ESS3 complex. (B) Calorimetric titrations of (AG)SL3ESS3 into UP1 preloaded with increasing ratios of (AC)SL3ESS3 show that (AG)SL3ESS3 can still bind UP1, although with decreasing apparent affinities. (C) 1H-13C HMQC titration of UP1 with (13C)U-selectively labeled (AC)SL3ESS3 indicates that UP1 binds site specifically to the apical loop. (D) Inclusion of unlabeled (AG)SL3ESS3 with (13C)U-selectively labeled (AC)SL3ESS3 at equimolar concentrations reveals that UP1 no longer binds (AC)SL3ESS3. (E) Addition of a twofold excess of (AG)SL3ESS3 into UP1 preloaded with (13C)U-selectively labeled (AC)SL3ESS3 shows that (AG)SL3ESS3 can effectively displace bound (AC)SL3ESS3. Collectively, these results indicate that SL3ESS3 variants compete for the same binding surface on UP1 and that UP1 preferentially binds RNA elements with a 5′-AG-3′ motif. Titrations were performed at 298 K, 140 mM K+, and pH 6.5.

It is possible that lower affinity SL3ESS3 variants bind UP1 nonspecifically, with the protein associating weakly with multiple different sites on the RNA. To test the binding mode directly, 1H-13C heteronuclear multiple quantum coherence (HMQC) titrations were performed using a (13C)U selectively labeled (AC)SL3ESS3 construct. Fig. 5C shows that, despite its lower affinity, (AC)SL3ESS3 is bound by UP1 in a site-specific manner, as only a subset of the correlation peaks show significant chemical shift perturbations. In the presence of an equimolar amount of unlabeled (AG)SL3ESS3, the 1H-13C correlation peaks of (AC)SL3ESS3 are identical to those of its unbound form (Fig. 5D). Fig. 5E shows that adding a twofold excess of (AG)SL3ESS3 to a preformed UP1–(AC)SL3ESS3 complex effectively displaces (AC)SL3ESS3. Collectively, the calorimetric and NMR titrations reveal that SL3ESS3 variants compete for the same binding surface on UP1 and that UP1 preferentially associates with (AG)SL3ESS3 over (AC)SL3ESS3 when both are present at similar concentrations. The observation that different SL3ESS3 variants compete for the same surface on UP1 is significant because it supports the premise that the relative binding affinities determined by HTS-EQ reflect a common binding mode.
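Whether perturbations are confined to a subset of peaks is usually judged from a combined chemical shift perturbation (CSP) per correlation peak. The sketch below computes a weighted 1H/13C CSP and flags peaks above the mean plus one standard deviation. The 13C weighting factor (0.25) and the cutoff are common conventions assumed here, not values taken from this study, and the peak labels and shifts are hypothetical.

```python
import math
from statistics import mean, stdev

def combined_csp(d_h, d_c, alpha=0.25):
    """Weighted 1H/13C chemical shift perturbation in ppm.
    alpha rescales 13C shift changes to the 1H scale (assumed value)."""
    return math.sqrt(d_h ** 2 + (alpha * d_c) ** 2)

def perturbed_peaks(free, bound, alpha=0.25):
    """Peaks whose CSP exceeds mean + 1 SD (an assumed cutoff), plus all CSPs.
    free and bound map peak label -> (1H ppm, 13C ppm)."""
    csp = {p: combined_csp(bound[p][0] - free[p][0], bound[p][1] - free[p][1], alpha)
           for p in free}
    values = list(csp.values())
    cutoff = mean(values) + stdev(values)
    return sorted(p for p, v in csp.items() if v > cutoff), csp

# Hypothetical peak positions before and after adding UP1
free = {"Ua": (7.80, 140.0), "Ub": (7.60, 141.0), "Uc": (7.50, 139.5), "Ud": (7.70, 140.5)}
bound = {"Ua": (7.81, 140.0), "Ub": (7.61, 141.1), "Uc": (7.90, 140.7), "Ud": (7.70, 140.5)}
significant, csp = perturbed_peaks(free, bound)
print(significant)
```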

The UP1 HTS-EQ Affinity Distribution Allows Prediction of Potential microRNA Targets.

The RNA specificity model derived from the UP1-SL3ESS3 HTS-EQ affinity distribution reveals sequence and structure attributes that should extend to other biologically relevant stem-loop systems. To test its predictive capabilities, we applied the UP1-SL3ESS3 trained model to microRNAs (miRNAs) from miRBase. Fig. S8 shows the predicted affinity distribution for a subset (n = 1,193) of miRNAs whose hairpin loops contain at least three unpaired nucleotides (Materials and Methods). The distribution indicates that hnRNP A1 has a range of affinities for miRNA targets, reflecting both positive and negative contributions of RNA sequence and structure. Indeed, the model recapitulates trends in relative affinities for 20 of 22 miRNAs whose binding to hnRNP A1 was measured experimentally using a chemiluminescent microtiter assay (31). Significantly, the model predicts the relative affinities of mir-18a (119 nM) and let-7a (12 nM), two miRNAs whose biogenesis is regulated by hnRNP A1 (32, 33). Consistent with attributes derived from the PWM + IC model, the hairpin loops of mir-18a and let-7a each contain at least one unpaired 5′-UAG-3′ motif. Thus, the RNA specificity model derived from the UP1-SL3ESS3 HTS-EQ data can be used to predict other biologically relevant RNA targets of hnRNP A1. Application of the model to miRNAs hints that hnRNP A1 might be a general accessory factor for a subset of primary miRNAs.

Fig. S8.

Prediction of potential miRNA targets for hnRNP A1. The distribution of predicted affinities for apical loops with at least three unpaired nucleotides in human miRNAs is shown as a histogram. The relative positions of 22 miRNAs whose binding affinities were measured experimentally using a chemiluminescent microtiter assay are indicated (31). The equilibrium dissociation constant for each miRNA is given in parentheses (red). Of note, equilibrium dissociation constants listed as >150 nM could not be measured accurately in the chemiluminescent microtiter assay (black).

Discussion

RNA sequence and structure contribute to specific protein recognition; therefore, identifying these intrinsic properties is necessary to fully interpret how protein–RNA networks control gene expression. Although transcriptome-wide methods have proven transformative, most probe RBP specificity without obtaining information on the underlying structural context, and they do not provide quantitative information about relative affinities for alternative binding sites. Here, we applied HTS-EQ, a newly developed high-throughput method that measures the relative equilibrium constants for binding of an RBP to thousands of RNAs simultaneously (27). We used HTS-EQ to evaluate interactions between the UP1 domain and a randomized pool of HIV SL3ESS3 variants containing all possible loop sequences. Quantitative sequence specificity modeling of the resulting distribution of binding affinities allowed the intrinsic determinants of hnRNP A1 recognition to be identified. HnRNP A1 is generally characterized as nonspecific owing to its wide range of cellular functions; however, crosslinking immunoprecipitation (CLIP) and RNAcompete studies identified similar consensus patterns centered on a composite UAGG motif (11, 24, 34). Consistent with those results, the highest-affinity SL3ESS3 variant identified by HTS-EQ has the loop sequence GUAGGAG, and 76% of the top 50 variants contain UAG. Despite this correspondence, we were unable to determine a clear 7-nt consensus pattern from the HTS-EQ distribution because the minimal 5′-AG-3′ motif can occupy multiple, energetically similar registers around the apical loop. Indeed, quantitative modeling of the affinity distribution revealed favorable couplings between UA and AG dinucleotides at adjacent positions throughout the loop, which is consistent with 5′-AG-3′ being the most frequently observed dinucleotide in variants that bind more tightly than native SL3ESS3. Thus, we conclude that the specificity of hnRNP A1 centers on a conformationally exposed 5′-AG-3′ dinucleotide but that binding affinities are modulated by the surrounding sequence and structural context.

The preference for 5′-AG-3′ can be explained by the UP1-AGU crystal structure, which shows that both rA and rG engage in stereospecific contacts within the nucleobase pocket. Purines are preferred at both positions within the pocket due to favorable van der Waals and cation–π stacking interactions. Pyrimidines, due to their smaller ring size, are unlikely to satisfy these interactions with equivalent energetics. An adenine in the first position makes specific hydrogen bond interactions with functional groups contributed by Val90 and Arg88. At the second position, the guanine is primarily selected through hydrogen bonds with Gln12 and Lys15. Although some of these interactions can be satisfied by other dinucleotides, the combined effects of size and functional group specificity are unique to 5′-AG-3′. We previously hypothesized that the hydrogen bond between the N6 amino group of adenine and the α-carbonyl oxygen of Arg88 triggers a conformational change in UP1 that is transmitted across the inter-RRM linker to RRM2. Arg88 is involved in one of two conserved salt-bridge interactions that stabilize the relative RRM orientations and the nucleobase pocket. Mutation of Arg88 resulted in an ∼18-fold reduction in binding affinity for native SL3ESS3, presumably by causing the nucleobase pocket to misfold (25). Thus, the ability of rA to induce a conformational change might increase the apparent thermodynamic contribution of rA at this position, because no other nucleobase can simultaneously fulfill the stacking interactions and hydrogen bond with Arg88.

It is puzzling that 5′-AG-3′ can determine functional specificity, given that this dinucleotide is potentially highly redundant in genomes. As revealed in the HTS-EQ distribution, contextual features surrounding 5′-AG-3′ modulate the binding affinities of UP1 for SL3ESS3 variants, and individual biophysical measurements indicate that the differences in affinities manifest as perturbations in association rate constants. Therefore, hnRNP A1 achieves specificity, in part, from the frequency with which it collides productively with RNA targets that contain 5′-AG-3′ sequences embedded within optimized environments. Although the UP1 domain of hnRNP A1 retains binding affinity for non–AG-containing RNAs, the biological significance of those associations will also depend on physiological conditions within the cell. Transcript abundance, posttranscriptional modifications, and subcellular localization will each play a role in determining the extent to which hnRNP A1 occupies non-AG transcripts. Of note, hnRNP A1 has been shown to partition to RNA granules under different physiological conditions, including conditions associated with ALS pathogenesis (35). It will be of great interest to determine the sequence and structural composition of transcripts that contribute to this property.

A key function of hnRNP A1 is to regulate alternative splicing events, in which it associates with intronic or exonic splicing silencers to occlude components of the splicing apparatus. The results presented here suggest that hnRNP A1 might more globally affect splicing outcomes by binding directly to 3′-acceptor sites that contain consensus 5′-YAG-3′ motifs. Along those lines, hnRNP A1 was shown to proofread binding of U2AF to 3′-acceptor sites that contain AG but not CG dinucleotides (36). As revealed by quantitative modeling of the HTS-EQ distribution, CG dinucleotides contribute negatively toward UP1-SL3ESS3 recognition at most positions around the 7-nt apical loop (Fig. 2E). Thus, the ability of hnRNP A1 to regulate splicing likely derives from how rapidly it forms stable complexes with target transcripts that are also being competed for by other RBPs.

In summary, we applied HTS-EQ to determine the global affinity distribution of hnRNP A1 by evaluating binding of its UP1 domain to a randomized pool of HIV SL3ESS3 variants. The distribution reveals hnRNP A1 retains binding affinity for a wide range of RNA targets; however, specificity is decisively determined by the local context of 5′-YAG-3′ motifs. Therefore, the work presented here represents a quantitative and comprehensive evaluation of hnRNP A1 specificity, and it raises interesting questions with regard to the combinatorial factors that determine how hnRNP A1 identifies functional binding sites in vivo.

Materials and Methods

The overall workflow of the HTS-EQ method is summarized in Fig. S1. Details regarding sample preparation, data analysis, and data interpretation for HTS-EQ and the biophysical studies can be found in SI Materials and Methods.

SI Materials and Methods

Measurement of RNA Binding Affinity Distributions by HTS-EQ.

Randomized ESS3 RNA sequence variants (SL3ESS3R) were synthesized by in vitro transcription from synthetic DNA templates following standard procedures. The UP1 protein was expressed and purified as previously described (37). For the binding assay, 1 μM SL3ESS3R was mixed with increasing concentrations of UP1 to give UP1/RNA molar ratios of 0, 0.25, 0.5, and 0.75. The samples were equilibrated at room temperature for 30 min in 10 mM K2HPO4 (pH 6.5), 120 mM KCl, 10 mM NaCl, 0.5 mM EDTA, and 1 mM Tris(2-carboxyethyl)phosphine (TCEP). Unbound randomized substrates were separated from the UP1–SL3ESS3R complex by EMSA, run at 80 V for 1.2 h in Tris/Borate/EDTA (TBE) buffer at 4 °C. After separation, the unbound substrates were collected for high-throughput sequencing, using sample preparation steps similar to those published previously (38). The unbound RNAs were ligated to a 17-nt adapter, 5′-rAppCTGTAGGCACCATCAAT-NH2-3′ (NEB S1315S; New England Biolabs), using T4 RNA ligase 2 (NEB M0242S; New England Biolabs), and unreacted sequences were removed by gel purification. The adapter-ligated RNA constructs were then reverse-transcribed using SuperScript III Reverse Transcriptase (ThermoFisher) and the following reverse transcription primer: 5′-(Phos)-AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGC-(SpC18)-CACTCA-(SpC18)-TTCAGACGTGTGCTCTTCCGATCTATTGATGGTGCCTACAG-3′.

The designation (Phos) indicates 5′ phosphorylation, and -(SpC18)- indicates a hexa-ethyleneglycol spacer. After gel purification, the RT products were circularized using CircLigase ssDNA Ligase. The indexed reverse library PCR primer was 5′-CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCG-3′, where NNNNNN is the reverse complement of the index sequence read during Illumina sequencing. The forward library PCR primer was 5′-AATGATACGGCGACCACCGAGATCTACAC-3′, and the indexed reverse PCR primers carrying the different barcodes are listed below.

Forward index (5′→3′) Indexed reverse library PCR primer (5′→3′)
ACGACT CAAGCAGAAGACGGCATACGAGATAGTCGTGTGACTGGAGTTCAGACGTGTGCTCTTCCG
ATCAGT CAAGCAGAAGACGGCATACGAGATACTGATGTGACTGGAGTTCAGACGTGTGCTCTTCCG
CAGCAT CAAGCAGAAGACGGCATACGAGATATGCTGGTGACTGGAGTTCAGACGTGTGCTCTTCCG
CGACGT CAAGCAGAAGACGGCATACGAGATACGTCGGTGACTGGAGTTCAGACGTGTGCTCTTCCG
GCAGCT CAAGCAGAAGACGGCATACGAGATAGCTGCGTGACTGGAGTTCAGACGTGTGCTCTTCCG
TACGAT CAAGCAGAAGACGGCATACGAGATATCGTAGTGACTGGAGTTCAGACGTGTGCTCTTCCG
CTGACG CAAGCAGAAGACGGCATACGAGATCGTCAGGTGACTGGAGTTCAGACGTGTGCTCTTCCG
GCTACG CAAGCAGAAGACGGCATACGAGATCGTAGCGTGACTGGAGTTCAGACGTGTGCTCTTCCG
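The primer sequences above are internally consistent in two ways that are easy to check programmatically: each indexed reverse primer should carry the reverse complement of its forward index between the constant flanks, and the 3′ end of the RT primer should be the reverse complement of the DNA portion of the 17-nt ligation adapter. The following Python sketch performs both checks on sequences copied from the text; the helper names (`revcomp`, `embedded_index`, `check_primers`) are illustrative and are not part of the original analysis pipeline.

```python
from typing import List, Tuple

# U (RNA) is complemented to A so the helper also accepts RNA input.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")

def revcomp(seq: str) -> str:
    """Return the DNA reverse complement of a DNA or RNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

# Constant parts flanking NNNNNN in the indexed reverse library PCR primer.
PREFIX = "CAAGCAGAAGACGGCATACGAGAT"
SUFFIX = "GTGACTGGAGTTCAGACGTGTGCTCTTCCG"

INDEXED_PRIMERS = {
    "ACGACT": "CAAGCAGAAGACGGCATACGAGATAGTCGTGTGACTGGAGTTCAGACGTGTGCTCTTCCG",
    "ATCAGT": "CAAGCAGAAGACGGCATACGAGATACTGATGTGACTGGAGTTCAGACGTGTGCTCTTCCG",
    "CAGCAT": "CAAGCAGAAGACGGCATACGAGATATGCTGGTGACTGGAGTTCAGACGTGTGCTCTTCCG",
    "CGACGT": "CAAGCAGAAGACGGCATACGAGATACGTCGGTGACTGGAGTTCAGACGTGTGCTCTTCCG",
    "GCAGCT": "CAAGCAGAAGACGGCATACGAGATAGCTGCGTGACTGGAGTTCAGACGTGTGCTCTTCCG",
    "TACGAT": "CAAGCAGAAGACGGCATACGAGATATCGTAGTGACTGGAGTTCAGACGTGTGCTCTTCCG",
    "CTGACG": "CAAGCAGAAGACGGCATACGAGATCGTCAGGTGACTGGAGTTCAGACGTGTGCTCTTCCG",
    "GCTACG": "CAAGCAGAAGACGGCATACGAGATCGTAGCGTGACTGGAGTTCAGACGTGTGCTCTTCCG",
}

ADAPTER = "CTGTAGGCACCATCAAT"           # DNA portion of the 5'-rApp ligation adapter
RT_PRIMER_3P_END = "ATTGATGGTGCCTACAG"  # last 17 nt of the RT primer

def embedded_index(primer: str) -> str:
    """Extract the 6-nt NNNNNN segment between the constant primer flanks."""
    if not (primer.startswith(PREFIX) and primer.endswith(SUFFIX)):
        raise ValueError("primer does not match the expected constant flanks")
    return primer[len(PREFIX):len(primer) - len(SUFFIX)]

def check_primers() -> Tuple[List[str], bool]:
    """Return (indices whose primer lacks their reverse complement, adapter/RT match)."""
    bad = [idx for idx, p in INDEXED_PRIMERS.items()
           if embedded_index(p) != revcomp(idx)]
    return bad, revcomp(ADAPTER) == RT_PRIMER_3P_END

print(check_primers())  # ([], True)
```

An empty list means every barcode in the table is embedded as its reverse complement, and `True` confirms that the RT primer anneals to the ligated adapter.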

The gel-purified PCR products were analyzed by Illumina sequencing using standard 50-bp, single-end read protocols. High-throughput sequencing data were obtained in FASTQ format. The raw data were parsed using Perl scripts to sort samples according to the barcodes located in the primer sequences. Sequences were extracted by Nanobarcode in the software suite Nanocraft. The individual sequences and the two random nucleotides included in the PCR primers were counted and reported as a hash table by a Perl script. The relative association constant (KA,rel) of each sequence was calculated from the number of sequencing reads as described (27).
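The counting step above (tallying reads per randomized sequence into a hash table) can be sketched in Python with a `Counter`. This is a minimal illustration, not the Perl or Nanocraft code used in the study: it assumes, hypothetically, that reads are already demultiplexed by barcode and that a fixed 5′ flank (the placeholder `flank_5p`) immediately precedes the 7-nt randomized loop in each read. Reads are cDNA, so loops are reported in the DNA alphabet.

```python
from collections import Counter
from typing import Iterable, Optional

def extract_loop(read: str, flank_5p: str, loop_len: int = 7) -> Optional[str]:
    """Return the loop_len nt immediately 3' of flank_5p, or None if absent or truncated."""
    pos = read.find(flank_5p)
    if pos < 0:
        return None
    start = pos + len(flank_5p)
    loop = read[start:start + loop_len]
    return loop if len(loop) == loop_len else None

def count_loops(reads: Iterable[str], flank_5p: str, loop_len: int = 7) -> Counter:
    """Tally reads per randomized loop, skipping reads with ambiguous (N) base calls."""
    counts: Counter = Counter()
    for read in reads:
        loop = extract_loop(read, flank_5p, loop_len)
        if loop is not None and set(loop) <= set("ACGT"):
            counts[loop] += 1
    return counts

# Toy reads with a hypothetical flank "GGAC"; the last two reads are rejected.
reads = ["AAGGACGCACTTTCC", "TTGGACGTAGGAGAA", "AAGGACGCACTTTCC", "NNNNNNNN", "GGACGCA"]
print(count_loops(reads, flank_5p="GGAC"))  # Counter({'GCACTTT': 2, 'GTAGGAG': 1})
```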

Calculation of KA,rel for hnRNP A1 Binding.

The relative equilibrium constants for hnRNP A1 binding to alternative RNA ligands were determined from the Illumina sequencing data by applying a simple model for two competing RNA substrates S1 and S2. The ratio of their free concentrations at a particular concentration of enzyme (E) is expressed by

$$\frac{[S_1]}{[S_2]} = \frac{[S_{1,0}]}{[S_{2,0}]}\cdot\frac{1+[E]K_2}{1+[E]K_1}, \tag{S1}$$

where S1,0 and S2,0 are the initial concentrations of substrates S1 and S2 in the absence of enzyme, and K1 and K2 are their respective equilibrium association constants (KA = [ES]/[E][S]). The association constant for the S2 substrate is therefore

$$K_2 = \frac{1}{[E]}\left(\frac{[S_1]}{[S_2]}\cdot\frac{[S_{2,0}]}{[S_{1,0}]}\,\bigl(1+[E]K_1\bigr) - 1\right). \tag{S2}$$

A relative association constant can therefore be calculated by taking the native sequence as the reference substrate S1 and setting its association constant, K1, to 1 (equivalently, expressing [E] in units of 1/K1):

$$K_{2,\mathrm{rel}} = \frac{1}{[E]}\left(\frac{[S_1]}{[S_2]}\cdot\frac{[S_{2,0}]}{[S_{1,0}]}\,\bigl(1+[E]\bigr) - 1\right). \tag{S3}$$

Thus, the relative association constant (KA,rel) can be calculated by measuring the abundance of each sequence relative to a reference at a specific enzyme concentration. The greatest resolution relative to the affinity of the native substrate is obtained at enzyme concentrations near or below the dissociation constant of the reference, 1/K1. Accordingly, samples for HTS-EQ were taken at f[ES] = 0.2–0.5.
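Eq. S3 translates directly into a one-line function. The Python sketch below is illustrative (the names `ka_rel` and `free_fraction` are not from the study): it takes the ratio of unbound S1 to S2 at a given protein concentration, the same ratio in the protein-free input, and [E] expressed in units of 1/K1. Within a single sequencing sample, read-count ratios can stand in for concentration ratios because sequencing depth cancels. The round trip simulates two RNAs with known association constants using the free-fraction relation [S] = [S0]/(1 + [E]K) that underlies Eq. S1, then recovers K2,rel.

```python
def ka_rel(free_ratio: float, input_ratio: float, e_rel: float) -> float:
    """Relative association constant K2,rel of S2 versus reference S1 (Eq. S3).

    free_ratio  -- [S1]/[S2] among unbound RNAs at protein concentration e_rel
    input_ratio -- [S1,0]/[S2,0] in the protein-free sample
    e_rel       -- free protein concentration in units of 1/K1, i.e. [E]*K1
    """
    if e_rel <= 0:
        raise ValueError("e_rel must be positive")
    return (free_ratio / input_ratio * (1.0 + e_rel) - 1.0) / e_rel

def free_fraction(k: float, e: float) -> float:
    """Unbound fraction of an RNA with association constant k at free protein e."""
    return 1.0 / (1.0 + e * k)

# Round trip: S1 (K1 = 1) and S2 (K2 = 3) at equal input, [E]*K1 = 0.5.
e = 0.5
free_ratio = free_fraction(1.0, e) / free_fraction(3.0, e)
print(ka_rel(free_ratio, 1.0, e))  # approximately 3.0
```

The reference sequence itself returns exactly 1, as expected from setting K1 = 1.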

As described previously (27), equilibrium dissociation constants can also be determined at low protein concentrations by directly fitting the Illumina sequencing data to Eq. S1, using the number of sequence reads as an estimate of f[ES]. The two approaches provide equivalent values for KA,rel; however, greater inaccuracy is observed for low-affinity sequences because they are only slightly depleted from the residual substrate population.

Quantitative Modeling of RNA Sequence Specificity.

The KA,rel affinity distribution determined using HTS-EQ was fit to a position weight matrix (PWM) model of sequence specificity, which assumes that bases in the binding sequence contribute in an independent and identically distributed, noninteracting manner. As described in Results, however, the contribution of an individual nucleotide at a particular position in the binding site can depend on its local sequence context. We therefore used a model of sequence specificity that adds interaction terms, describing coupling between bases, to the independent variables of a standard PWM model. These terms are also postulated to be independent and identically distributed, but they nonetheless allow a simple quantification of the coupling between the contributions of individual positions to UP1 binding.

For computational analysis, the symbols of bases for each sequence in the UP1 binding site are first converted to 0 and 1 by the following function:

$$\{N_{i,j};\ 1 \le i \le 7,\ j \in \{\mathrm{A},\mathrm{C},\mathrm{G},\mathrm{U}\}\}. \tag{S4}$$

For each nucleotide, we denote Ni,j, where i is the position in the loop and j is the base identity at position i. With seven loop positions and four possible bases, this configuration gives rise to 28 random variables that are independent and identically distributed. The following indicator function assigns each of these random variables a value of 0 or 1:

$$X_i(c) = \begin{cases} 1, & c = N \\ 0, & c \neq N \end{cases}, \qquad N, c \in \{\mathrm{A},\mathrm{C},\mathrm{G},\mathrm{U}\}. \tag{S5}$$

Because the relationship between the relative association constant KA,rel and free energy is logarithmic, the following equation is used for linear correlation analysis:

$$\ln(K_{A,\mathrm{rel}}) = \sum_{i=1}^{7}\left(a_i A_i + c_i C_i + g_i G_i + u_i U_i\right). \tag{S6}$$

The PWM is defined by the coefficients ai, ci, gi, and ui (i = 1–7), where Ai, Ci, Gi, and Ui are the 0/1 indicator variables of Eq. S5 for position i.

Including IC terms (αnIn), which couple individual nucleotide positions, results in significantly better fits to the experimental data, as shown in previous work (30):

$$\ln(K_{A,\mathrm{rel}}) = \sum_{i=1}^{7}\left(a_i A_i + c_i C_i + g_i G_i + u_i U_i\right) + \sum_{n}\alpha_n I_n. \tag{S7}$$
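Scoring a loop with the PWM + IC model of Eqs. S4 through S7 amounts to one-hot encoding the seven positions and summing the coefficients that are switched on. The Python sketch below assumes that each interaction term In is the product of two position-specific indicators (pairwise coupling between bases), which is a common form for such models but is an assumption here. All coefficient values are hypothetical toy numbers, not the fitted values from this study.

```python
from typing import Dict, List, Tuple

BASES = "ACGU"
Key = Tuple[int, str]  # (loop position 1-7, base)

def indicators(loop: str) -> Dict[Key, int]:
    """0/1 indicator variables N_{i,j} for a 7-nt loop (Eqs. S4 and S5)."""
    loop = loop.upper()
    if len(loop) != 7 or set(loop) - set(BASES):
        raise ValueError("expected a 7-nt RNA loop over A, C, G, U")
    return {(i, j): int(loop[i - 1] == j) for i in range(1, 8) for j in BASES}

def ln_ka_rel(loop: str, pwm: Dict[Key, float],
              ics: List[Tuple[Key, Key, float]]) -> float:
    """Predicted ln(KA,rel): PWM coefficients plus pairwise IC terms (Eq. S7).

    Each IC term (key1, key2, alpha) contributes alpha only when both bases are
    present; I_n is taken as the product of two indicators (an assumption).
    Coefficients missing from pwm are treated as zero.
    """
    x = indicators(loop)
    score = sum(coef * x[key] for key, coef in pwm.items())
    score += sum(alpha * x[k1] * x[k2] for k1, k2, alpha in ics)
    return score

# Hypothetical toy coefficients, for illustration only.
pwm = {(2, "A"): 0.5, (3, "G"): 0.8, (3, "C"): -0.4}
ics = [((2, "A"), (3, "G"), 0.3)]
print(ln_ka_rel("UAGCUUU", pwm, ics))  # about 1.6 (A2, G3, and their coupling)
print(ln_ka_rel("UACCUUU", pwm, ics))  # about 0.1 (A2 and C3; no coupling)
```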

During fitting, only IC terms with t values greater than 3.5 were retained in each round of regression. The resulting IC values are reported in the heat map shown in Fig. 2E. The affinity distributions from HTS-EQ experiments and the IC values were visualized using MATLAB and Origin to generate the appropriate plots and histograms.
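The text specifies only the retention criterion (t > 3.5 in each round of regression), not the full fitting procedure. The NumPy sketch below shows one plausible reading, backward elimination with ordinary least squares: fit all candidate IC columns, drop those whose |t| does not exceed the threshold, and refit until every surviving term passes. It assumes a full-rank design matrix; for one-hot PWM features, that would mean dropping one reference base per position. The function names and the synthetic data are illustrative.

```python
import numpy as np

def ols_t_values(X: np.ndarray, y: np.ndarray):
    """Ordinary least squares fit; returns coefficients and their t statistics."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se

def select_ic_terms(X_base: np.ndarray, X_ic: np.ndarray, y: np.ndarray,
                    t_min: float = 3.5):
    """Refit repeatedly, dropping IC columns with |t| <= t_min, until all survivors pass.

    X_base holds the always-retained columns (e.g., PWM features); X_ic holds
    candidate interaction columns. Returns surviving IC column indices and betas.
    """
    keep = list(range(X_ic.shape[1]))
    while True:
        X = np.hstack([X_base, X_ic[:, keep]])
        beta, t = ols_t_values(X, y)
        t_ic = t[X_base.shape[1]:]
        weak = {k for k, tv in zip(keep, t_ic) if abs(tv) <= t_min}
        if not weak:
            return keep, beta
        keep = [k for k in keep if k not in weak]

# Synthetic check: only the first of two candidate IC columns has a real effect.
rng = np.random.default_rng(0)
n = 500
X_base = np.column_stack([np.ones(n), rng.normal(size=n)])
X_ic = rng.normal(size=(n, 2))
y = 1.0 + 0.5 * X_base[:, 1] + 2.0 * X_ic[:, 0] + rng.normal(scale=0.1, size=n)
keep, beta = select_ic_terms(X_base, X_ic, y)
print(keep)
```

With this synthetic data, the real coupling column has a t statistic in the hundreds, while the null column should almost always fall below the threshold and be dropped.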

Preparation of SL3ESS3 Variants for Biophysical Studies.

The RNA samples were in vitro transcribed from synthetic DNA templates (Integrated DNA Technologies). Uniformly 13C-labeled uridine 5′-triphosphate (UTP) was purchased from Cambridge Isotope Laboratories for NMR studies. Following RNA synthesis, the samples were purified using 16% (vol/vol) PAGE and electroeluted. RNA samples prepared for biophysical and NMR studies were annealed in water by heating at 95 °C for 3 min, followed by snap-cooling on ice. The RNA samples for calorimetry and kinetic studies were exchanged into 10 mM K2HPO4 (pH 6.5), 120 mM KCl, 10 mM NaCl, 0.5 mM EDTA, and 1 mM TCEP using a Millipore Amicon Ultra-4 centrifugal device. RNA samples for NMR analysis were exchanged into 10 mM K2HPO4 (pH 6.5), 120 mM KCl, 10 mM NaCl, 0.5 mM EDTA, 1 mM TCEP, and 100% D2O.

Quantification of Equilibrium Binding Affinity by Isothermal Titration Calorimetry.

Calorimetric titration studies were carried out at 25 °C using a VP-ITC calorimeter (MicroCal, LLC) with the SL3ESS3 variants. Each RNA sample was prepared by diluting to a concentration of 2.5 μM in binding buffer [10 mM K2HPO4, 120 mM KCl, 10 mM NaCl, 0.5 mM EDTA, 1 mM TCEP (pH 6.5)]. C-terminal His6-tagged UP1 protein was prepared for the titration studies by exchanging it into the same binding buffer as used for RNA sample preparation using the Amicon Ultra-4 centrifugal filter devices. The UP1 protein (∼50 μM) was titrated into ∼1.4 mL of 2.5 μM RNA over 36 injections of 8 μL each. In competition binding experiments, the GUAGGAG SL3ESS3 variant (∼50 μM) was titrated into preformed UP1–GCACUUU SL3ESS3 complex at different molar ratios. Before fitting to a 1:1 binding isotherm in Origin v7.0, the raw data were corrected for heats of dilution by subtracting the average of the last eight values of the titration curve.
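Two computational steps from the ITC paragraph above can be sketched in Python: subtracting the heat of dilution, estimated as the mean of the last eight injection heats, and the standard quadratic expression for the concentration of a 1:1 complex that single-site binding isotherms build on. This is an illustration under those stated assumptions, not the Origin v7.0 implementation used for the reported fits.

```python
import math
from typing import List, Sequence

def subtract_dilution_heat(heats: Sequence[float], n_tail: int = 8) -> List[float]:
    """Subtract the mean of the last n_tail injection heats (heat of dilution)."""
    if len(heats) < n_tail:
        raise ValueError("need at least n_tail injections")
    baseline = sum(heats[-n_tail:]) / n_tail
    return [h - baseline for h in heats]

def complex_conc(p_tot: float, r_tot: float, kd: float) -> float:
    """[PR] for 1:1 binding, from Kd = [P][R]/[PR] and mass balance (quadratic root)."""
    b = p_tot + r_tot + kd
    disc = max(0.0, b * b - 4.0 * p_tot * r_tot)  # guard against tiny negative round-off
    return (b - math.sqrt(disc)) / 2.0

# Toy heats: early injections release heat; the last eight reflect dilution only.
heats = [-10.0] * 4 + [-1.0] * 8
print(subtract_dilution_heat(heats))
# Fraction of 2.5 uM RNA bound at equimolar UP1 with a hypothetical Kd of 0.1 uM
print(complex_conc(2.5e-6, 2.5e-6, 1e-7) / 2.5e-6)
```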

Analysis of Binding Kinetics Using BLI.

All kinetics experiments were performed in 10 mM K2HPO4, 120 mM KCl, 10 mM NaCl, 0.5 mM EDTA, and 1 mM TCEP buffer at pH 6.5. UP1 biotinylation and offline loading onto streptavidin biosensors (FortéBio), along with 96-well plate preparation using serial dilutions of RNA, were conducted as previously described (25). Kinetics experiments were performed in duplicate at 25 °C and 1,000 rpm at a collection rate of 5 Hz using an Octet Red 96 instrument (FortéBio). Data were collected and processed using FortéBio Octet Data Acquisition and Data Analysis, version 7.0. Based on their relative affinities in initial scouting runs, the RNA constructs were split into two groups: the three tightest binders and the three weakest binders. Before data acquisition, the sensor-immobilized UP1 protein was presaturated with RNA to reduce nonspecific interactions during kinetic measurements. To this end, the highest RNA concentration in the 96-well plate was associated with the protein for 600 s and dissociated in buffer for 900 s for the weak group, or associated for 900 s followed by 1,800 s of dissociation in buffer for the tight group. After presaturation, the baseline was recorded for 60 s, followed by 900 s of association for both groups and dissociation in buffer for 1,800 s (tight binders) or 1,200 s (weak binders). This baseline, association, and dissociation cycle was repeated for each of the seven serial dilutions prepared for each construct. After data acquisition, 500 s of association and dissociation data were analyzed in GraphPad Prism 6.0 (GraphPad Software) to determine kinetic values. The apparent first-order dissociation rate constants (koff,app) and apparent second-order association rate constants (kon,app) were determined by global fitting of the experimental data to a 1:1 binding model in Prism 6.0 (Fig. S7).
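The 1:1 binding model referred to above describes the association phase as a single exponential approach to equilibrium with observed rate kobs = kon·C + koff, and the dissociation phase as a single exponential decay at koff. The Python sketch below writes out those two phases and, as a simpler alternative to the global fitting performed in Prism, recovers kon and koff as the slope and intercept of a straight-line fit of kobs against RNA concentration. The rate constants and concentrations in the usage example are hypothetical.

```python
import math
from typing import Sequence, Tuple

def association(t: float, conc: float, kon: float, koff: float, rmax: float) -> float:
    """1:1 association-phase response at time t after the RNA is introduced."""
    kobs = kon * conc + koff
    r_eq = rmax * kon * conc / kobs  # equals rmax * C / (C + KD), with KD = koff / kon
    return r_eq * (1.0 - math.exp(-kobs * t))

def dissociation(t: float, r0: float, koff: float) -> float:
    """1:1 dissociation phase: single-exponential decay from response r0."""
    return r0 * math.exp(-koff * t)

def kon_koff_from_kobs(concs: Sequence[float], kobs: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line kobs = kon*C + koff; returns (kon, koff)."""
    n = len(concs)
    mx, my = sum(concs) / n, sum(kobs) / n
    sxx = sum((x - mx) ** 2 for x in concs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(concs, kobs))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical values: kon = 1e5 M^-1 s^-1, koff = 1e-3 s^-1, RNA from 25 to 200 nM.
concs = [25e-9, 50e-9, 100e-9, 200e-9]
kobs = [1e5 * c + 1e-3 for c in concs]
print(kon_koff_from_kobs(concs, kobs))
```

The linear kobs analysis is easy to inspect but weights each concentration equally; global fitting of the full traces, as done in the study, uses all time points at once.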

NMR Analysis of SL3ESS3 Variants and UP1 Competitive Binding Studies.

NMR experiments were carried out on Bruker Avance high-field NMR spectrometers (800 and 900 MHz) equipped with cryogenically cooled hydrogen, carbon, and nitrogen (HCN) triple-resonance probes and a z-axis pulsed-field gradient accessory. To check RNA conformational homogeneity and purity, 1D 1H NMR experiments were performed using a Bruker Avance III HD instrument (500 MHz) equipped with the Prodigy broadband cryoprobe. All NMR data were processed by NMRPipe/NMRDraw (39) and analyzed using NMRView J software (40). For the HMQC titration experiments, a 13C-selectively labeled GCACUUU SL3ESS3 sample at 100 μM was titrated into an unlabeled UP1 at 100 μM plus unlabeled GUAGGAG SL3ESS3 at 150 μM containing 10 mM K2HPO4 (pH 6.5), 120 mM KCl, 10 mM NaCl, 0.5 mM EDTA, 1 mM TCEP, and 100% D2O at 298 K. For the displacement experiment, a twofold excess of unlabeled GUAGGAG SL3ESS3 was titrated into a preformed UP1–13C-labeled GCACUUU SL3ESS3 complex (1:1).

Application of the UP1-SL3ESS3 Trained Specificity Model to miRNA Targets.

Human pre-miRNA sequences were retrieved from the miRBase database, and their secondary structures were predicted using the Vienna Package V2.0. Because hnRNP A1 binding is sensitive to base pairing in the ESS3 loop, we selected miRNAs containing at least three unpaired nucleotides in their apical stem loops. If the apical loop contained three or five unpaired nucleotides, two or one flanking nucleotides, respectively, were added on each side of the open loop to match the 7-nt HTS-EQ dataset. Both extended sequences were analyzed, and the sequence with the highest KA,rel value was included in Fig. S8. For miRNAs with four or six nucleotides in the open loop, the corresponding 7-nt sequence with the maximum predicted affinity was counted. If the predicted loop contained more than seven nucleotides, the sequence with the maximum predicted affinity among all possible 7-nt loop sequences was included in the histogram.
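The window-selection rules above can be sketched in Python. This is one interpretation, stated as an assumption: for loops shorter than 7 nt, flanking nucleotides are added so the window stays centered (symmetric for loops of three or five; the two near-symmetric options for loops of four or six), and for loops of seven or more, every 7-nt window inside the loop is scored. The apical loop is located as the first `(...)` hairpin in a dot-bracket string, such as the output of ViennaRNA's RNAfold. The `score` argument is a placeholder for a fitted PWM + IC model; the toy scorer in the example is purely illustrative.

```python
import re
from typing import Callable, List, Optional, Tuple

def apical_loop(dot_bracket: str) -> Optional[Tuple[int, int]]:
    """0-based, end-exclusive span of the first hairpin loop '(....)', or None."""
    m = re.search(r"\((\.+)\)", dot_bracket)
    return (m.start(1), m.end(1)) if m else None

def loop_windows(seq: str, start: int, end: int, width: int = 7) -> List[str]:
    """Candidate 7-nt windows for a loop spanning seq[start:end]."""
    loop_len = end - start
    if loop_len < 3:
        return []
    if loop_len >= width:  # every window lying entirely inside the loop
        return [seq[s:s + width] for s in range(start, end - width + 1)]
    extra = width - loop_len  # flanking nucleotides needed to reach 7 nt
    windows = []
    for left in sorted({extra // 2, extra - extra // 2}):
        s = start - left
        if s >= 0 and s + width <= len(seq):
            windows.append(seq[s:s + width])
    return windows

def best_window(seq: str, dot_bracket: str,
                score: Callable[[str], float]) -> Optional[str]:
    """Highest-scoring candidate window for the apical loop, or None."""
    span = apical_loop(dot_bracket)
    if span is None:
        return None
    windows = loop_windows(seq, *span)
    return max(windows, key=score) if windows else None

# Toy hairpin with a 9-nt apical loop and a hypothetical scoring function.
print(best_window("GGAUAGGAGUUCC", "((" + "." * 9 + "))", lambda w: w.count("UU")))
```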

Acknowledgments

We thank James Delproposto for assistance with biolayer interferometry kinetic studies, which were supported by Grant P50GM103297. The Bruker Avance III HD (500 MHz) was purchased using funds provided by National Science Foundation Grant 1334048. This work was supported by National Institute of General Medical Sciences Grants GM101979 (to B.S.T.) and GM056740 (to M.E.H.).

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1616371114/-/DCSupplemental.

References

1. Singh G, Pratt G, Yeo GW, Moore MJ. The clothes make the mRNA: Past and present trends in mRNP fashion. Annu Rev Biochem. 2015;84:325–354. doi: 10.1146/annurev-biochem-080111-092106.
2. Iadevaia V, Gerber AP. Combinatorial control of mRNA fates by RNA-binding proteins and non-coding RNAs. Biomolecules. 2015;5(4):2207–2222. doi: 10.3390/biom5042207.
3. Van Assche E, Van Puyvelde S, Vanderleyden J, Steenackers HP. RNA-binding proteins involved in post-transcriptional regulation in bacteria. Front Microbiol. 2015;6:141. doi: 10.3389/fmicb.2015.00141.
4. Mitchell SF, Parker R. Principles and properties of eukaryotic mRNPs. Mol Cell. 2014;54(4):547–558. doi: 10.1016/j.molcel.2014.04.033.
5. Rissland OS. The organization and regulation of mRNA-protein complexes. Wiley Interdiscip Rev RNA. 2017;8(1):1–17. doi: 10.1002/wrna.1369.
6. Ascano M, Gerstberger S, Tuschl T. Multi-disciplinary methods to define RNA-protein interactions and regulatory networks. Curr Opin Genet Dev. 2013;23(1):20–28. doi: 10.1016/j.gde.2013.01.003.
7. Jayaseelan S, Doyle F, Tenenbaum SA. Profiling post-transcriptionally networked mRNA subsets using RIP-Chip and RIP-Seq. Methods. 2014;67(1):13–19. doi: 10.1016/j.ymeth.2013.11.001.
8. Licatalosi DD, et al. HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature. 2008;456(7221):464–469. doi: 10.1038/nature07488.
9. Cook KB, Hughes TR, Morris QD. High-throughput characterization of protein-RNA interactions. Brief Funct Genomics. 2015;14(1):74–89. doi: 10.1093/bfgp/elu047.
10. Campbell ZT, Wickens M. Probing RNA-protein networks: Biochemistry meets genomics. Trends Biochem Sci. 2015;40(3):157–164. doi: 10.1016/j.tibs.2015.01.003.
11. Huelga SC, et al. Integrative genome-wide analysis reveals cooperative regulation of alternative splicing by hnRNP proteins. Cell Reports. 2012;1(2):167–178. doi: 10.1016/j.celrep.2012.02.001.
12. Jankowsky E, Harris ME. Specificity and nonspecificity in RNA-protein interactions. Nat Rev Mol Cell Biol. 2015;16(9):533–544. doi: 10.1038/nrm4032.
13. Buenrostro JD, et al. Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes. Nat Biotechnol. 2014;32(6):562–568. doi: 10.1038/nbt.2880.
14. Lambert N, et al. RNA Bind-n-Seq: Quantitative assessment of the sequence and structural binding specificity of RNA binding proteins. Mol Cell. 2014;54(5):887–900. doi: 10.1016/j.molcel.2014.04.016.
15. Ozer A, et al. Quantitative assessment of RNA-protein interactions with high-throughput sequencing-RNA affinity profiling. Nat Protoc. 2015;10(8):1212–1233. doi: 10.1038/nprot.2015.074.
16. Geuens T, Bouhy D, Timmerman V. The hnRNP family: Insights into their role in health and disease. Hum Genet. 2016;135(8):851–867. doi: 10.1007/s00439-016-1683-5.
17. Jean-Philippe J, Paz S, Caputi M. hnRNP A1: The Swiss army knife of gene expression. Int J Mol Sci. 2013;14(9):18999–19024. doi: 10.3390/ijms140918999.
18. Ding J, et al. Crystal structure of the two-RRM domain of hnRNP A1 (UP1) complexed with single-stranded telomeric DNA. Genes Dev. 1999;13(9):1102–1115. doi: 10.1101/gad.13.9.1102.
19. Xu RM, Jokhan L, Cheng X, Mayeda A, Krainer AR. Crystal structure of human UP1, the domain of hnRNP A1 that contains two RNA-recognition motifs. Structure. 1997;5(4):559–570. doi: 10.1016/s0969-2126(97)00211-6.
20. Shamoo Y, Krueger U, Rice LM, Williams KR, Steitz TA. Crystal structure of the two RNA binding domains of human hnRNP A1 at 1.75 Å resolution. Nat Struct Biol. 1997;4(3):215–222. doi: 10.1038/nsb0397-215.
21. Dreyfuss G, Kim VN, Kataoka N. Messenger-RNA-binding proteins and the messages they carry. Nat Rev Mol Cell Biol. 2002;3(3):195–205. doi: 10.1038/nrm760.
22. Ishikawa F, Matunis MJ, Dreyfuss G, Cech TR. Nuclear proteins that bind the pre-mRNA 3′ splice site sequence r(UUAG/G) and the human telomeric DNA sequence d(TTAGGG)n. Mol Cell Biol. 1993;13(7):4301–4310. doi: 10.1128/mcb.13.7.4301.
23. Burd CG, Dreyfuss G. RNA binding specificity of hnRNP A1: Significance of hnRNP A1 high-affinity binding sites in pre-mRNA splicing. EMBO J. 1994;13(5):1197–1204. doi: 10.1002/j.1460-2075.1994.tb06369.x.
24. Bruun GH, et al. Global identification of hnRNP A1 binding sites for SSO-based splicing modulation. BMC Biol. 2016;14:54. doi: 10.1186/s12915-016-0279-9.
25. Morgan CE, et al. The first crystal structure of the UP1 domain of hnRNP A1 bound to RNA reveals a new look for an old RNA binding protein. J Mol Biol. 2015;427(20):3241–3257. doi: 10.1016/j.jmb.2015.05.009.
26. Rollins C, Levengood JD, Rife BD, Salemi M, Tolbert BS. Thermodynamic and phylogenetic insights into hnRNP A1 recognition of the HIV-1 exon splicing silencer 3 element. Biochemistry. 2014;53(13):2172–2184. doi: 10.1021/bi500180p.
27. Lin H-C, Niland CN, Zhao J, Jankowsky E, Harris ME. Analysis of binding landscapes: C5 protein contributes to ribonuclease P specificity of pre-tRNA 5′ leaders in the ground state and transition state for binding. Cell Chem Biol. 2016;23(10):1271–1281. doi: 10.1016/j.chembiol.2016.09.002.
28. Stormo GD. DNA motif databases and their uses. Curr Protoc Bioinformatics. 2015;51:2.15.1–2.15.6. doi: 10.1002/0471250953.bi0215s51.
29. Zhao Y, Ruan S, Pandey M, Stormo GD. Improved models for transcription factor binding site identification using nonindependent interactions. Genetics. 2012;191(3):781–790. doi: 10.1534/genetics.112.138685.
30. Guenther UP, et al. Hidden specificity in an apparently nonspecific RNA-binding protein. Nature. 2013;502(7471):385–388. doi: 10.1038/nature12543.
31. Towbin H, et al. Systematic screens of proteins binding to synthetic microRNA precursors. Nucleic Acids Res. 2013;41(3):e47. doi: 10.1093/nar/gks1197.
32. Michlewski G, Cáceres JF. Antagonistic role of hnRNP A1 and KSRP in the regulation of let-7a biogenesis. Nat Struct Mol Biol. 2010;17(8):1011–1018. doi: 10.1038/nsmb.1874.
33. Michlewski G, Guil S, Cáceres JF. Stimulation of pri-miR-18a processing by hnRNP A1. Adv Exp Med Biol. 2010;700:28–35. doi: 10.1007/978-1-4419-7823-3_3.
34. Ray D, et al. A compendium of RNA-binding motifs for decoding gene regulation. Nature. 2013;499(7457):172–177. doi: 10.1038/nature12311.
35. Molliex A, et al. Phase separation by low complexity domains promotes stress granule assembly and drives pathological fibrillization. Cell. 2015;163(1):123–133. doi: 10.1016/j.cell.2015.09.015.
36. Tavanez JP, Madl T, Kooshapur H, Sattler M, Valcárcel J. hnRNP A1 proofreads 3′ splice site recognition by U2AF. Mol Cell. 2012;45(3):314–329. doi: 10.1016/j.molcel.2011.11.033.
37. Jain N, Morgan CE, Rife BD, Salemi M, Tolbert BS. Solution structure of the HIV-1 intron splicing silencer and its interactions with the UP1 domain of heterogeneous nuclear ribonucleoprotein (hnRNP) A1. J Biol Chem. 2016;291(5):2331–2344. doi: 10.1074/jbc.M115.674564.
38. Ingolia NT, Brar GA, Rouskin S, McGeachy AM, Weissman JS. The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments. Nat Protoc. 2012;7(8):1534–1550. doi: 10.1038/nprot.2012.086.
39. Delaglio F, et al. NMRPipe: A multidimensional spectral processing system based on UNIX pipes. J Biomol NMR. 1995;6(3):277–293. doi: 10.1007/BF00197809.
40. Johnson BA, Blevins RA. NMR View: A computer program for the visualization and analysis of NMR data. J Biomol NMR. 1994;4(5):603–614. doi: 10.1007/BF00404272.
