SUMMARY
The comprehensive but specific identification of RNA-binding proteins as well as the discovery of RNA-associated protein functions remain major challenges in RNA biology. Here, we adapt the concept of RNA dependence, defining a protein as RNA-dependent if its interactome depends on RNA. We converted this concept into a proteome-wide, unbiased and enrichment-free screen called R-DeeP (RNA-dependent proteins) based on density gradient ultracentrifugation. Quantitative mass spectrometry identified 1784 RNA-dependent proteins including 537 lacking known links to RNA. Exploiting the quantitative nature of R-DeeP, proteins were classified as not, partially or completely RNA-dependent. R-DeeP identified the transcription factor CTCF as completely RNA-dependent and we uncovered that RNA was required for the CTCF-chromatin association. Additionally, R-DeeP allows reconstruction of protein complexes based on co-segregation. The whole dataset is available at http://R-DeeP.dkfz.de, providing a proteome-wide, specific and quantitative identification of proteins with RNA-dependent interactions to support future functional discoveries on RNA-protein complexes.
eTOC Blurb
Caudron-Herger et al. developed a proteome-wide, specific and quantitative analysis of RNA-dependent proteins based on density gradient ultracentrifugation. The R-DeeP resource is available online and guides the discovery of unexpected RNA functions by newly linking 537 proteins to RNA and enabling the quantification of the RNA-dependent fraction of a protein.

INTRODUCTION
During transcription, RNA molecules associate with various RNA-binding proteins (RBPs) to form so-called ribonucleoprotein (RNP) complexes. RBPs determine the fate of the RNA transcript and, vice versa, RNAs can affect the fate of their interacting proteins. RNPs are dynamic entities, whose protein composition around the RNA can change with each RNA processing, maturation, transport and localization step. While some proteins may constitutively bind to the RNA, others dissociate or transiently bind at later stages (Dreyfuss et al., 2002; Moore, 2005). In addition, the dynamic exchange of RBPs within RNPs responds to a variety of cellular and environmental stimuli (Glisovic et al., 2008). Since RBPs exert a broad range of key functions in RNA metabolism and regulation of gene expression (Kishore et al., 2010), defects in RBP functions are linked to severe diseases ranging from neurodegenerative disorders to cancer (Castello et al., 2013; Lukong et al., 2008).
RBPs are not only involved in the biogenesis and function of coding messenger RNAs, they also associate with non-coding RNAs (ncRNAs), which are transcribed from large parts of the genome (Consortium et al., 2007; Djebali et al., 2012). Long non-coding RNAs (lncRNAs) constitute one diverse class of ncRNAs and have been implicated in numerous physiological and pathological processes including cancer (Gutschner and Diederichs, 2012; Schmitt and Chang, 2016). Understanding the mechanisms underlying lncRNA function is essential to provide new perspectives on their functional repertoire. However, it remains difficult to elucidate specific molecular interactions between individual lncRNAs and their interacting partners (Chu et al., 2015; Engreitz et al., 2015; Roth and Diederichs, 2015). Hence, discovering novel mechanisms and the full spectrum of molecular complexes affected by (nc)RNAs is one of the most challenging tasks in RNA biology.
To comprehensively identify RBPs, proteome-wide approaches in various mammalian systems have been developed, often based on pulldown assays of polyadenylated (polyA+) RNAs (Baltz et al., 2012; Beckmann et al., 2015; Castello et al., 2012; Conrad et al., 2016; Gerstberger et al., 2014; Kwon et al., 2013; Perez-Perri et al., 2018). These studies have identified hundreds of novel polyA+ RNA-interacting RBPs. In addition, they highlighted the presence of many RBPs lacking classical RNA-binding domains (Lunde et al., 2007) but containing intrinsically disordered domains (Castello et al., 2012). Recently, methods based on protease digestion (Castello et al., 2016; Mullari et al., 2017), modified nucleotides (Bao et al., 2018; He et al., 2016) or organic phase separation of protein-crosslinked RNAs (Trendel et al., 2018) were developed to also include non-polyA+ RNA and to characterize RNA-binding domains, further lengthening the extensive list of RBP candidates. When intersecting eight human proteome-wide screens for RBPs, we found 215 proteins shared by all eight studies, while 1241 were found in only one of the studies, the latter including a large contribution from studies that also included non-polyA+ RNA (Figure S1A).
Complementary strategies could help to define a core set of RBPs independent of affinity- or property-based purifications. Besides published RBP databases (Cook et al., 2011; Ray et al., 2013) and catalogs (Gerstberger et al., 2014), the SONAR analysis predicted additional RBPs based on the interaction of proteins with RBPs or RBP candidates (Brannan et al., 2016). Also, studies on the interactomes of specific RNA species like microRNAs are emerging (Treiber et al., 2017). Here, we contribute to this important task and adapt the concept of “RNA dependence” to tackle the aforementioned challenges. We aim at determining and quantifying the full set of proteins affected by RNA, thereby paving the way to identifying additional pathways and mechanisms controlled by RNA. We present a proteome-wide, enrichment-free, and quantitative screen for proteins whose interactome depends on RNA, including 537 RNA-dependent proteins not previously linked to RNA such as HMGN1, CASP7, REEP4 or THYN1. To illustrate the quantitative nature of our approach and demonstrate its advantage, we characterize the complete RNA dependence of the transcription factor CTCF and establish RNA as an important factor for the recruitment of CTCF to chromatin.
RESULTS
R-DeeP: A Proteome-Wide Screen to Identify RNA-dependent Proteins
First, we re-define the concept of “RNA dependence” and qualify a protein as “RNA-dependent” if its interactome depends on RNA without necessarily directly binding to RNA. It is conceptually different from the notion of “RNA-binding” since RNA-dependent proteins are involved in complexes bound to RNA, which also include proteins that do not interact directly with RNA (Figure 1A). This concept allows the specific and quantitative discovery of RNA dependence of proteins and complexes - also from pathways not previously linked to RNA.
Figure 1. R-DeeP: A Proteome-wide Screen to Identify RNA-dependent Proteins.
(A) Untreated (Control) or RNase-treated (RNase) HeLa S3 cell lysates were prepared and loaded on the top of 5% to 50% sucrose density gradients. Following ultracentrifugation, triplicates of the gradients were fractionated and subjected to mass spectrometry and western blot analysis for validation of the screen. The raw mass spectrometry data were fitted using Gaussian curves to generate various parameters (position of the maxima, amplitude difference, shift distance and amount of protein as given by the area under the curve at each peak).
(B) Heatmap showing the total amount of each protein as a sum of all fractions in the control and RNase replicates. The replicates are strongly positively correlated (Pearson’s coefficient R > 0.94 for all replicate pairs).
(C) Distribution of the amount of each protein per fraction in pairs of replicates after a fraction-wise normalization step. The higher the color intensity, the higher the number of points at this position.
(D) Heatmaps of the sub-categories for shifting proteins (left shift, right shift or precipitated), representing the enrichment in the control (green) or in the RNase (red) fractions.
(E) Graph depicting the position of the maxima for each shift in the control and the RNase sample according to the mean fit of three replicates. The bar-graph inset (bottom right) shows the number of shifts in each sub-category.
(F) Classification of the proteins analyzed in the R-DeeP screen according to their shifting behavior and their prior classification (Table S1). RBP* = RBP or RBP candidate.
(G) Violin plots representing the distribution of the isoelectric points of the proteins classified as in (F). RBP* = RBP or RBP candidate. The bar indicates the median in each group (p<0.05 between all groups).
RNA-dependent proteins and complexes are expected to migrate to different positions in a sucrose density gradient in the presence or absence of RNA. This enabled us to screen for RNA-dependent proteins (R-DeeP) specifically, without bias and without enrichment strategies: we loaded HeLa S3 cell lysate on a 5% to 50% sucrose density gradient and fractionated it into 25 fractions after ultracentrifugation. The protein content of each fraction was analyzed by western blot or mass spectrometry. Each protein showed a specific distribution throughout the gradient. A shift between fractions of the control and the RNase-treated gradient revealed the RNA dependence of the protein (Figure 1A). A quality control of the RNase digest showed that RNAs were digested down to a size well below 100 nt (Figures S1B and S1C), leading to a complete loss of detection for small and large RNA species (Figure S1D). In addition, significant shifts were observed for proteins like DICER and ARGONAUTE2, known to interact with small RNA species (Meister et al., 2005) (Figures S1E and S1F).
We generated control and RNase-treated sucrose density gradients in triplicates and quantified the abundance of proteins in each fraction, using quantitative mass spectrometry-based proteomics. Thereby, we reproducibly identified and quantified 4765 proteins. We developed a comprehensive statistical analysis to identify RNA-dependent proteins based on the Gaussian-fitted distribution of each protein in the gradient (see STAR Methods). The shifts were characterized by (i) the position of the maxima in the control and RNase-treated gradients, (ii) the amount of protein shifting, represented by the area under the Gaussian fit curve, (iii) the distance and the direction of the shift, (iv) the amplitude difference at each maximum between the control- and the RNase-fitted curves and (v) the statistical significance of the difference (Figure 1A). The criteria for an “RNA-dependent” shift were defined as a distance of >1 fraction and a significant difference (FDR-corrected p-value < 0.05).
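The shift criteria stated above can be written as a small decision rule. The following Python sketch is an illustration only and not the analysis code used in the screen (see STAR Methods for that): the function name, the argument names and the rule that a peak ending in the last fraction counts as "precipitated" are assumptions made for this example, and the FDR-corrected p-value is taken as given. The peak positions in the usage lines are hypothetical.

```python
def classify_shift(control_peak, rnase_peak, adj_p_value,
                   min_distance=1.0, alpha=0.05, last_fraction=25):
    """Classify one control/RNase peak pair using the criteria from the text."""
    distance = rnase_peak - control_peak
    # Both criteria must hold: distance > 1 fraction and FDR-corrected p < 0.05.
    if abs(distance) <= min_distance or adj_p_value >= alpha:
        return "no shift"
    # Assumption: a protein ending up in the last fraction is called precipitated.
    if round(rnase_peak) >= last_fraction:
        return "precipitated"
    # Left shift = towards lower sucrose density (lower fraction numbers).
    return "left shift" if distance < 0 else "right shift"


# Hypothetical peak positions and adjusted p-values.
print(classify_shift(18.0, 5.0, 0.001))
print(classify_shift(11.0, 11.3, 0.80))
```

Keeping the two thresholds as parameters makes it easy to see how the number of called shifts would respond to a stricter distance or significance cut-off.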
Overall, protein levels within the same sample were heterogeneous reflecting the natural spectrum of expression. Between replicates, the amount of each protein remained largely consistent (Figure 1B). Following a fraction-wise normalization step, the amount of each protein per fraction correlated well for all proteins when comparing pairs of replicates, indicating the reproducibility of the method (Figure 1C, Pearson’s coefficient R >0.99). The proteins whose distribution displayed significant differences between control and RNase-treated gradients were classified into sub-categories with different types of shifts: left-shifts (shift towards lower sucrose density fractions) as the largest category, right-shifts (towards higher fractions) and precipitated proteins (accumulated in last fraction, Figure 1D).
Applying this thorough statistical analysis, we determined the fitted peak positions for each protein finding a high correlation between replicates (R>0.99, Figure S1G) and categorized them as left-shift (1931), right-shift (120) or precipitated (425), while no significant shift was observed for the majority of proteins (2857, no shift) (Figure 1E). In total, we identified 1784 proteins presenting a significant shift (Figure 1F). When comparing this group to 19 previous studies and catalogs of RBPs, 1247 (69.9%) RNA-dependent proteins had previously been listed at least once as RNA-binding (Shift & RBP), while 537 (30.1%) RNA-dependent proteins had never been linked to RNA before (Shift & No RBP). These newly identified proteins might be either RBPs that escaped the detection by the other methods or proteins that interact with RNA indirectly. They are prime candidates for the future identification of unexpected RNA-dependent molecular processes. Vice versa, 1402 proteins previously listed as RBP candidates displayed no significant RNA-dependent shifts (No shift & RBP). When compared to individual proteome-wide RBP studies in human cells, the overlap of R-DeeP ranged from 56% to 86% of the RBP candidates (Figure S2A). The percentage of RNA-dependent proteins increased with the number of times a protein had been listed as RBP candidate (Figure S2B), showing the complementarity of our approach to the previous studies. In addition, when considering proteins from complexes involved in catalytic activities acting on RNA, RNA synthesis or degradation, the R-DeeP screen compared well with the previous studies for the identification of RNA-dependent subunits (Table S2).
To further analyze the specific properties of the protein sub-categories, we investigated their isoelectric points (pI), low complexity domain (LCD) contents, amino acid composition and protein domains. Shifting proteins had a significantly higher average pI than non-shifting proteins, independently of their identification as RBP candidate (Figure 1G). The average pI for the four shifting groups - right shift, no shift, left shift and precipitated - consistently increased from right-shifted to precipitated proteins (Figure S2C). The newly identified RNA-dependent proteins (Shift & no RBP) had significantly longer LCDs (Figure S2D). Accordingly, shifting proteins were also significantly enriched in disorder-promoting amino acids (48% as compared to 46% for non-shifting proteins, p<2.2e-16) and positively charged amino acids (15%−16% for shifting proteins vs. 14% for non-shifting proteins, p<2.2e-16). Shifting and non-shifting RBP candidates had a similar distribution of LCD lengths, longer than non-shifting non-RBP candidates (Figure S2D). While left-shifted proteins were enriched in protein domains linked to RNA-binding (InterPro ID IPR000504 and related IDs IPR012677, IPR035979), right-shifted and precipitated proteins were enriched in actin family domains (IPR004000) and zinc finger-related domains (IPR001909, IPR036236), respectively. The GO molecular function “RNA binding” was enriched in the left-shifted or precipitated proteins (Figure S2E), while the GO terms “DNA binding” and “Catalytic activity” were enriched in the shifting and non-shifting proteins, respectively, independent of their classification as RBP candidate (Figure S2E), highlighting the similarities between the shifting RBPs and the newly R-DeeP-identified RNA-dependent proteins.
Validation of the R-DeeP Screen
To validate the results of the R-DeeP screen, known RBPs like HNRNPU (Heterogeneous nuclear ribonucleoprotein U) and RPS3 (40S ribosomal protein S3) served as positive controls. The results of three independent gradients and subsequent western blots were quantified. These matched the profiles of HNRNPU and RPS3 distributions as observed from mass spectrometry (Figures 2A, S3A) and confirmed the R-DeeP screen results. As controls for RNA-independent proteins, we analyzed ASNS (Asparagine synthetase) and PSMA1 (Proteasome subunit alpha type-1). Again, the quantitative analysis of three western blots mirrored the profiles of the mass spectrometry dataset for these proteins (Figures 2B, S3B). While ASNS had not been linked to RNA, PSMA1 appeared as RBP candidate in one proteome-wide study and two catalogs and is listed as RBP according to the Gene Ontology Consortium (Ashburner et al., 2000; Gerstberger et al., 2014; Mullari et al., 2017). Similarly, the protein PSMB1 (Proteasome subunit beta type-1) was listed twice as RBP candidate but did not show any RNA dependence in the R-DeeP screen (Figure S3C). To clarify this discrepancy, we performed a crosslinking immunoprecipitation (CLIP) experiment followed by PNK radioactive labelling for HNRNPU, CTCF, ASNS and PSMB1 (Figures S3D and S3E). As expected, HNRNPU and CTCF bound to RNA. In contrast, no interaction with RNA could be detected for ASNS and PSMB1, confirming their RNA independence. Among the new RNA-dependent proteins, we selected REEP4 (Receptor expression-enhancing protein 4), HMGN1 (High mobility group nucleosome binding domain 1), CASP7 (Caspase-7) and THYN1 (Thymocyte nuclear protein 1), which displayed significant shifts after RNase treatment (Figure 2C) and had not been previously linked to RNA. Our CLIP analysis revealed the RNA-binding ability of all four proteins, with a signal reduction in non-crosslinked or excessively RNase A-treated samples confirming the specificity of these results (Figures 2D and S3F). This further validated the robustness of our findings and the potential of R-DeeP to link RNA to new biological pathways.
Figure 2. Validation of the R-DeeP Screen.
(A) Mass spectrometry and western blot (WB) analysis for HNRNPU. Top panel: graphical representation of the protein amount in the 25 fractions of the sucrose density gradient as analyzed by mass spectrometry. Raw data (mean of 3 replicates) are indicated by the lines with markers. The lines without markers correspond to the respective Gaussian fit (control in green; RNase in red). The overall protein amount of the raw data was normalized to 100. Middle panel: WB analysis of the protein in 25 fractions of representative control and RNase samples. Bottom panel: graph of the quantitative analysis of three WB replicates depicting the mean of three experiments with standard error of the mean (SEM).
(B) Same as in (A) for ASNS.
(C) Graphical representation as in (A) of the amount of REEP4, HMGN1, CASP7 and THYN1 in the 25 fractions of the gradient.
(D) Representative CLIP analysis of HNRNPU (positive control), ASNS (negative control), REEP4, HMGN1, CASP7 and THYN1. WB = western blot. CLIP = CLIP autoradiography. Green arrows indicate the presence of RNA bound to the protein at the respective size. Red arrows indicate the absence of RNA.
R-DeeP Analysis of Protein Interaction Networks
Next, we investigated protein interaction networks and their dependence on RNA based on the CORUM database for protein complexes (Ruepp et al., 2010). We classified the 2710 proteins listed in the CORUM database for their relation to RNA based on the 19 studies mentioned before: “RBP” for proteins listed as RBP candidates, “RBP-interacting” for proteins found only in complex with RBP candidates, “RBP-indirect” for proteins found only in complex with “RBP-interacting” proteins and “RBP-independent” for proteins neither listed as RBP nor interacting with RBPs or RBP-interacting proteins (Figure 3A). Since 81% of the CORUM proteins appeared as RBP candidates or RBP-interacting proteins and only 12% were RBP-independent, RBPs are highly overrepresented in the CORUM database. Comparing the RNA-dependent proteins from R-DeeP to the four different classes from the CORUM database revealed a positive correlation, with the largest fraction of RNA-dependent proteins found in the RBP class and the lowest fraction found in the RBP-independent class (Figure 3B). However, this analysis also uncovered that 44% of the previously listed RBPs did not show RNA dependence, while 15% of the proteins in the RBP-independent class nonetheless did.
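The four-class scheme described above amounts to a two-step neighborhood search over complex memberships. The Python sketch below shows one possible reading of these definitions under stated assumptions: the function name, the input layout (a dictionary of complexes) and the toy protein names are hypothetical, and "found in complex with" is interpreted as sharing at least one complex. It is not the authors' implementation.

```python
def classify_corum_proteins(complexes, rbp_candidates):
    """Label every protein in `complexes` with one of the four classes.

    complexes: dict mapping a complex name to an iterable of protein names.
    rbp_candidates: iterable of proteins listed as RBP or RBP candidate.
    """
    member_sets = [set(members) for members in complexes.values()]
    proteins = set().union(*member_sets)
    rbp = proteins & set(rbp_candidates)

    def partners_of(group):
        # All proteins sharing at least one complex with a member of `group`.
        partners = set()
        for members in member_sets:
            if members & group:
                partners |= members
        return partners

    interacting = partners_of(rbp) - rbp
    indirect = partners_of(interacting) - interacting - rbp

    labels = {}
    for protein in proteins:
        if protein in rbp:
            labels[protein] = "RBP"
        elif protein in interacting:
            labels[protein] = "RBP-interacting"
        elif protein in indirect:
            labels[protein] = "RBP-indirect"
        else:
            labels[protein] = "RBP-independent"
    return labels


# Toy input with hypothetical complexes and protein names.
toy_complexes = {
    "complex_1": {"P1", "P2"},
    "complex_2": {"P2", "P3"},
    "complex_3": {"P4", "P5"},
}
print(classify_corum_proteins(toy_complexes, {"P1"}))
```

In this toy input, P2 shares a complex with the RBP candidate P1, P3 is connected to P1 only through P2, and P4 and P5 have no connection to any RBP candidate.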
Figure 3. R-DeeP Analysis of Protein Interaction Networks.
(A) Distribution of human proteins listed in the CORUM database in “RBP”: protein listed in at least one of the RBP resources (Table S1); “RBP-interacting”: protein engaged only in complexes with RBPs; “RBP-indirect”: protein engaged only in complexes with “RBP-interacting” proteins; “RBP-independent”: protein engaged in complexes with neither of the three RBP-related categories.
(B) Proportion of RNA-dependent proteins found in the R-DeeP screen for the categories defined in (A).
(C) STRING database (Szklarczyk et al., 2015) representation of the mSIN3A CORUM complex and its R-DeeP analysis. The table and the graphs illustrate the presence of a common peak in the control sample around fraction 18.6 ± 0.4. All proteins of the complex are RNA-dependent.
(D) Same as in (C) for the MCM CORUM complex.
See also Figures S4 and S5.
Based on our screening approach, we hypothesized that our dataset allowed the analysis of protein complexes since the subunits should share a common peak in the presence of RNA. The analysis of multiple protein complexes verified this hypothesis: for all five proteins of the mSIN3A complex, control peaks mapped to the same position at fraction 18.6 ± 0.4 (mean ± standard deviation) (Figure 3C). Similarly, the vast majority of the 47 ribosomal proteins of the 60S subunit co-segregated and shared a common control peak (Figure S4). As non-shifting example, the MCM complex subunits shared a common control peak at fraction 11.2 ± 0.2 also found in the RNase sample (Figure 3D). Similarly, all proteasomal proteins had a common control and RNase peak (Figure S5).
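The co-segregation check described above reduces to summarizing the control-peak positions of the subunits of one complex. The following Python sketch illustrates this with hypothetical subunit names and peak positions; the function name, the use of the sample standard deviation and the `max_sd` cut-off for calling a common peak are assumptions made for this example.

```python
from statistics import mean, stdev


def common_peak(control_peaks, max_sd=1.0):
    """Summarize the control-peak positions of the subunits of one complex.

    control_peaks: dict mapping subunit name -> fitted control peak (fraction).
    Returns (mean position, standard deviation, True if the spread is small).
    """
    positions = list(control_peaks.values())
    center = mean(positions)
    spread = stdev(positions)  # sample standard deviation (an assumption)
    return center, spread, spread <= max_sd


# Hypothetical peak positions for a five-subunit complex.
toy_peaks = {
    "subunit_1": 18.2,
    "subunit_2": 18.4,
    "subunit_3": 18.6,
    "subunit_4": 18.8,
    "subunit_5": 19.0,
}
center, spread, cosegregates = common_peak(toy_peaks)
print(f"{center:.1f} +/- {spread:.1f} fractions, common peak: {cosegregates}")
```

Running the same summary on the RNase-treated peaks would indicate whether the subunits still co-segregate after RNA digestion.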
In summary, the R-DeeP dataset can aid in the reconstruction of protein complexes based on co-segregation in the control fractions. The peak position after RNase treatment provides further information about the remaining interactions based on apparent size or co-segregation.
Protein Interaction Networks in the Absence of RNA
To determine the interaction state of each protein after RNase treatment, we performed a calibration of the sucrose density gradient using reference proteins with known molecular weight. RNase A (14 kDa), BSA (60 kDa), Aldolase (4 * 40 kDa = 160 kDa), Catalase (4 * 60 kDa = 240 kDa) and Ferritin (24 * 20 kDa = 480 kDa) were loaded onto the gradient and the position of each protein was determined (Figure S6A). Since the position in the gradient depends on both the weight and the shape of a protein, this only provided estimates. Nonetheless, molecular weights and positions of these proteins in the gradient correlated well with each other (Figure 4A). In addition, human orthologs of these proteins were detected by mass spectrometry close to the position of the standard proteins (Figure S6B). Comparing the position of proteins in the RNase-treated samples to the reference proteins allowed a rough classification of proteins according to their apparent molecular weight as “small” (smaller than the published molecular weight), “monomeric” (matching the published molecular weight), “in complex” (much larger than expected) or “precipitated” (Figure 4A). While 32% of the significantly shifting proteins became apparently monomeric in the absence of RNA, 61% remained in a smaller complex (Figure 4A). For the UPF complex, most subunits became monomeric after RNase treatment (Figure 4B), whereas all subunits of the RFC complex remained in complex (Figure 4C). Within the MeCP1 complex, most of the subunits remained in complex, but CHD4 and RBBP7 partially returned to their monomeric state (Figure S6C), illustrating the specificity of each R-DeeP profile as well as highlighting the versatility of this dataset to identify the fate of complex subunits.
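The calibration logic can be sketched as a fit of molecular weight against gradient position, followed by a comparison of apparent and published molecular weight. The Python example below is a rough illustration under several assumptions: the log-linear form of the calibration curve, the two-fold tolerance for calling a protein monomeric, all function names and the reference fraction positions are hypothetical placeholders (the measured positions are shown in Figure S6A and are not reproduced here). Only the reference molecular weights are taken from the text.

```python
import math


def fit_log_linear(fractions, mw_kda):
    """Least-squares fit of log10(MW) = slope * fraction + intercept."""
    ys = [math.log10(m) for m in mw_kda]
    n = len(fractions)
    mean_x = sum(fractions) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in fractions)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fractions, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apparent_mw(fraction, slope, intercept):
    """Apparent molecular weight (kDa) at a given gradient position."""
    return 10 ** (slope * fraction + intercept)


def classify_state(apparent_kda, published_kda, tolerance=2.0):
    """Compare apparent and published MW; `tolerance` is an assumed cut-off."""
    ratio = apparent_kda / published_kda
    if ratio > tolerance:
        return "in complex"
    if ratio < 1 / tolerance:
        return "small"
    return "monomeric"


# Reference MWs from the text; the fraction positions are hypothetical.
reference_mw = [14, 60, 160, 240, 480]
reference_fraction = [2.0, 3.5, 5.0, 6.0, 7.5]
slope, intercept = fit_log_linear(reference_fraction, reference_mw)

# Hypothetical protein of 50 kDa with its RNase peak at fraction 6.
estimate = apparent_mw(6.0, slope, intercept)
print(round(estimate), classify_state(estimate, 50))
```

Because sedimentation also depends on shape, any such curve yields estimates only, which is why a generous tolerance is used before a protein is called "in complex" or "small".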
Figure 4. Interaction Networks after RNase Treatment.
(A) Graph showing the apparent molecular weight and the fraction of the first maximum in the gradient for the reference proteins (RNase A: 14 kDa, BSA: 60 kDa, Aldolase: 4 * 40 = 160 kDa, Catalase: 4 * 60 = 240 kDa and Ferritin: 24 * 20 = 480 kDa). These were used to roughly calibrate the sucrose density gradient and to estimate the apparent molecular weight (MW) of each protein after RNase treatment. The black dashed line indicates the extrapolated relation between R-DeeP gradient fraction and MW. With respect to their position on the graph, proteins were classified into four categories after RNase treatment: at monomeric MW (green), larger than expected from their MW indicating remaining complexes (blue), smaller than monomeric MW (red) and precipitated proteins (yellow). The pie chart indicates the percentage of proteins per category.
(B) Illustration of the RNA-dependent shift of the UPF complex as defined in CORUM. UPF1 and UPF3B shift exactly to their monomeric weight, while UPF2 has an apparent molecular weight close to twice its predicted molecular weight. The name and positions (triangles) of the subunits are indicated on the graph from (A).
(C) Same as in (B) for the RFC complex from CORUM, of which all subunits shifted but remained in a smaller complex after RNase treatment.
See also Figure S6.
Quantitative Analysis of the RNA-dependent Shifts
Taking advantage of the quantitative nature of our approach, we determined the proportion of each protein shifting and classified them as RNA-independent, partially or completely RNA-dependent. For each shift, we calculated a shifting coefficient based on the amount of protein at a given peak position and its change after RNase treatment. These shifting coefficients separated the quantitative shifting categories RNA-independent (no shift), partially (partial shift) or completely RNA-dependent (complete shift) on a graphical representation (Figure 5A).
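To make the idea of a partial versus a complete shift concrete, the following Python sketch computes the share of a protein that leaves a control peak after RNase treatment. It is a simplified illustration and not the shifting coefficient used in the screen, whose exact formula is not given in this section: the function names, the normalization of the total protein amount to 100 and the two category thresholds are assumptions made for this example, and the input amounts are hypothetical.

```python
def rna_dependent_fraction(control_amount, rnase_amount, total_amount=100.0):
    """Share of the protein lost from one control peak after RNase treatment.

    control_amount: protein amount under the control peak.
    rnase_amount: protein amount remaining at the same position after RNase.
    total_amount: total protein amount across the gradient (assumed 100).
    """
    lost = max(control_amount - rnase_amount, 0.0)
    return lost / total_amount


def shift_category(fraction, partial_min=0.1, complete_min=0.9):
    """Map a lost share to a category; both thresholds are hypothetical."""
    if fraction >= complete_min:
        return "complete shift"
    if fraction >= partial_min:
        return "partial shift"
    return "no shift"


# Hypothetical amounts for three proteins (control peak, same position after RNase).
for control, rnase in [(30.0, 0.0), (95.0, 2.0), (50.0, 48.0)]:
    share = rna_dependent_fraction(control, rnase)
    print(f"{share:.2f}", shift_category(share))
```

A protein with a small RNA-dependent peak and a large RNA-independent peak falls into the partial category, whereas a protein whose entire amount leaves the control peak is classified as completely shifting.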
Figure 5. Quantitative Analysis of the RNA-dependent Shifts.
(A) Graph indicating the behavior of each protein for each pair of control and RNase peaks depicting a shifting coefficient (protein amount at maxima * loss or gain). The colors indicate proteins with no significant shift (red), with significant shifts from one control to one RNase peak (dark green) or with shifts between multiple peaks (light green). For the latter category, all possible combinations of control and RNase peaks are depicted. Proteins shifting with their full amount from a control to an RNase peak are located at the top right part of the graph (complete shift). Proteins with no change are located at the bottom left part of the graph (no shift). In between, proteins are found with a partial shift. Exemplary proteins are indicated on the graph.
(B) Schematic illustration and R-DeeP analysis of the partially shifting NPM3.
(C) Same as in (B) for the completely shifting HNRNPU.
(D) Mass spectrometry R-DeeP analysis of partially shifting established RBPs like the splicing factors LSM8 (U6 snRNA-associated Sm-like protein LSm8), SF3B6 (Splicing factor 3B subunit 6) and SYF1 (Pre-mRNA-splicing factor SYF1) and the chromatin factor SMC4 (Structural maintenance of chromosomes protein 4).
(E) Violin plots representing the distribution of the pI of proteins with full or partial shifts. The bar indicates the median, **p<0.01, two-sided t-test.
Almost all partially shifting proteins were characterized by the presence of a smaller shifting control peak and a larger non-shifting control peak to which the proteins from the smaller peak shifted. The smaller peak, lost upon RNase treatment, corresponded to the RNA-dependent fraction of the protein. The larger control peak represented the RNA-independent fraction of this protein. For example, only a small portion of the protein Nucleoplasmin-3 (NPM3) shifted, indicating that it was only partially RNA-dependent (Figure 5B). In contrast, HNRNPU showed a complete RNA-dependent shift (Figure 5C). Interestingly, several known RBPs involved in splicing or chromatin organization showed only partial shifts (Figure 5D). Further analysis revealed that partially shifting proteins had on average a significantly lower pI than fully shifting proteins (Figure 5E). Moreover, fully shifting proteins were enriched in LCDs (38% vs. 29% for partially shifting proteins) and canonical protein domains linked to RNA binding (InterPro IDs IPR000504, IPR012677, IPR035979). These results may indicate that fully shifting proteins more closely resemble classical RBPs, while partially shifting proteins may possess a broader spectrum of domains and functions.
RNA Dependence of CTCF Interaction with Chromatin
CTCF is a transcription factor known for its interaction with chromatin (Ohlsson et al., 2010; Teif et al., 2012). In addition to DNA, CTCF was shown to bind to RNA (Kung et al., 2015; Sun et al., 2013), as we also confirmed by CLIP (Figures S3D and S3E), and to multimerize in an RNA-dependent manner (Saldana-Meyer et al., 2014). Surprisingly, not a minor fraction but rather all CTCF appeared to be RNA-dependent in our screen, which was validated by a quantitative western blot analysis (Figure 6A). This complete shift led us to hypothesize that RNA could mediate CTCF interaction with chromatin. To test this hypothesis, we independently performed chromatin immunoprecipitation (ChIP), biochemical cell fractionation and immunofluorescence analysis.
Figure 6. RNA Dependence of CTCF Interaction with Chromatin.
(A) Mass spectrometry (left panel) and western blot analysis (right panel) as in Figure 2A for CTCF.
(B) Chromatin immunoprecipitation followed by qPCR analysis of two control regions not interacting with CTCF and six specific CTCF binding sites without (green) or with (red) RNase treatment. The mean of three experiments with SEM is depicted. **p<0.01, *p<0.05, two-sided t-test.
(C) Left panel: western blot analysis of CTCF and Histone H3 in nucleoplasmic and chromatin fractions without (green) or after (red) RNase treatment. Right panel: quantitative analysis of three western blot replicates including standard deviation (SD). *p<0.05, two-sided t-test.
(D) Confocal laser scanning fluorescence microscopy images showing CTCF (immunofluorescence, red), DNA (DAPI, green) and actin (immunofluorescence, grey) in control (Control) or RNase A-treated (RNase) HeLa cells. Scale bars: 20 μm.
(E) Violin plots depicting the Pearson’s correlation coefficient of the CTCF and DNA fluorescent signal in untreated (control) as compared to RNase A-treated (RNase) HeLa cells. Three replicates with 30 cells each were analyzed. Bars indicate the median of each distribution. **p<0.01, two-sided t-test.
First, CTCF ChIP followed by quantitative PCR uncovered an up to 800-fold enrichment of CTCF at its specific binding sites, which was significantly decreased upon RNase A treatment (Figure 6B). A ChIP of the transcription factor SP1, which did not shift upon RNase treatment (Figure S7A), confirmed that RNase treatment did not decrease SP1 at its tested binding sites, validating the specificity of our results (Figure S7B). Second, cell fractionation revealed that CTCF was found in the nucleoplasm or bound to chromatin (Figures S7C and S7D), in agreement with previous findings reporting dynamic and bound CTCF subsets (Agarwal et al., 2017). RNase A treatment of the cell nuclei prior to fractionation resulted in the significant dissociation of CTCF from chromatin, while histone H3 remained bound to it (Figure 6C). Finally, immunofluorescence staining for CTCF with a DAPI stain for DNA in HeLa cells revealed the rearrangement of both CTCF and chromatin albeit to a different extent (Figure 6D). Although they both occupied a large proportion of the nuclear space, RNase A treatment induced a visible and significant decrease of co-localization as revealed by computing the correlation of the fluorescent signals (Figure 6E).
Altogether, these experiments established RNA as one important factor in CTCF recruitment to chromatin based on the quantitative insight gained from the R-DeeP approach.
DISCUSSION
Functional RNAs are mostly identified by means of their expression pattern or cellular phenotypes - but elucidating the molecular mechanisms and protein interaction partners remains difficult and poses a major challenge in the field of RNA biology. Strategies have been developed to gain insight into RNA function starting from the protein side. Approaches to characterize the RNA-binding proteome range from affinity purification of individual RNAs to polyA+ RNA pulldown or organic phase separation of protein-crosslinked RNAs (Queiroz et al., 2019; Trendel et al., 2018; Urdaneta et al., 2019).
Since each technique has its advantages and limitations, complementary and orthogonal strategies that do not rely on UV light crosslinking, probe-based capture or affinity purification will substantially help to establish whether or not a protein is linked to RNA. The SONAR computational approach (Brannan et al., 2016), for example, predicted multiple new RBP candidates based on the observation that proteins interacting with RBPs were frequently RBPs themselves, a concept termed “RNA dependence”. Here, we modify this concept: we do not assume that RBP-interacting proteins are likely RBPs themselves, but postulate that proteins which do not bind RNA directly, but whose interactome depends on RNA, are also “RNA-dependent”. R-DeeP hence encompasses all proteins which are “affected” by RNA without necessarily directly binding to it (both RBPs and RBP-interacting, Figure 1A). Thus, this definition conceptually differs from the “RNA binding” concept. Considering that most RNPs are dynamic entities changing composition through assembly and disassembly within short periods of time, the concept of RNA dependence is particularly meaningful: RNA-dependent proteins may directly contact RNA or reside within well-characterized RNPs, even if they only partially or indirectly interact with RNA like selected components of the spliceosome or ribosome (Akerman et al., 2015; Gerstberger et al., 2014).
This concept directly translates into a screening approach which offers several major advantages over current methods. First, it enables proteome-wide, specific and unbiased screening - neither dependent on enrichment steps nor restricted to polyA+-RNA, similar to a previous study (He et al., 2016). Second, R-DeeP yields quantitative data establishing for each protein not only the RNA dependence, but in addition, the fraction of the protein that is RNA-dependent. This aspect is unique compared to previous approaches. A majority of the RNA-dependent proteins overlap with RBP candidates from previous studies, but the quantitative nature of R-DeeP also allows gaining new insights into the RNA dependence of known RBPs. Third, R-DeeP can aid in the reconstruction of protein complexes based on co-segregation and integrates the information from the CORUM dataset on known protein-protein complexes. The simplicity of the protocol makes it possible to biochemically fractionate the cells and to repeat mass spectrometry or western blot analysis on specific cell compartments as well as numerous other biological systems in the future.
The R-DeeP dataset and a comprehensive analysis and integration with orthogonal data are publicly available in a user-friendly and versatile database at http://R-DeeP.dkfz.de (Figure 7). The R-DeeP database provides access to the full dataset and offers search options for different types of analysis. For each protein, the database displays information about peaks and shifts, a graphical view of the protein profiles in the gradients, specific information about the protein and links to other databases, together with various download options. R-DeeP offers in addition a compilation of multiple resources for RBPs and RBP candidates. To facilitate the analysis of protein complexes, R-DeeP has been fully integrated with the CORUM database, which includes information on known mammalian protein-protein complexes (Ruepp et al., 2010). Finally, lists of co-segregating proteins for each control and RNase-treated fraction provide the possibility to search for potential interaction partners and to help reconstruct complexes. Together, the R-DeeP database is a resource for proteome-wide, specific and quantitative identification of proteins and complexes whose interactions depend on RNA.
Figure 7. R-DeeP Database: Database for RNA-dependent Proteins.
The R-DeeP database provides user-friendly access to the RNA dependence analysis for currently 4765 proteins. Multiple search options offer the possibility to display information for multiple proteins at once. In addition to peak and shift information, graphical view and download options, the database also offers details about the protein with links to other protein databases, summarizes information from the current RBP resources and integrates the analysis of the subunits of CORUM complexes for each protein. In addition, to help reconstruct complexes, a search option for potential interaction partners from the lists of co-segregating proteins has been implemented for each fraction from the control or RNase sample.
The analysis of protein properties showed that the newly identified RNA-dependent proteins were more closely related to the shifting RBPs than to the non-shifting proteins, with typical characteristics for RBPs: increased pI and positively charged amino acids (Brannan et al., 2016; Castello et al., 2012; Pancaldi and Bahler, 2011), as well as a high content of LCD and disorder-promoting amino acids (Castello et al., 2012).
The group of right-shifted proteins was of particular interest as right shifts were rare events (Figure 1D and 1E). For this group, we observed the lowest average pI (Figure S2C) and no significant enrichment for the molecular function “RNA binding” (Figure S2E). We assume that they gain interaction partners upon loss of RNA, e.g. because binding sites become available or repulsive RNA charges are diminished.
R-DeeP enables the distinction of proteins only partially interacting with RNA from proteins completely interacting with RNA revealing new aspects and dimensions of the RNA-related function of RBPs. Notably, a number of well-established RBPs showed only partial shifts (Figure 5D). For CTCF, the observation that the entire CTCF pool is RNA-dependent indicates that CTCF-binding to RNA is not related to a “moonlighting” function of CTCF and led us to establish RNA as one important factor in CTCF recruitment to chromatin (Figure 6), in addition to its RNA-dependent multimerization (Saldana-Meyer et al., 2014). Partial shifts are found in the R-DeeP screen with protein amounts down to 17.3% in the control fraction. This indicates that partial interactions are generally detectable by our approach which for the first time allows their quantitative assessment.
Regarding the non-shifting RBP candidates from previous studies, several explanations can be invoked to explain this discrepancy: they are not RBPs, they are involved in highly transitory interactions only captured with crosslinking or they are below the parameter cutoff for validation in our analysis. The threshold for the minimum shifting distance was set to >1 fraction. Repeating the analysis with a threshold of 0.7 fractions only added 185 proteins (106 RBP candidates and 79 new RNA-dependent proteins, Table S3). Some proteins might show RNA dependence in this second tier like GAPDH (Asencio et al., 2018; Garcin, 2018).
By definition, our approach does not allow determining the sites of interaction between RNA and proteins as complementary approaches did (Castello et al., 2016; He et al., 2016; Mullari et al., 2017), but it identifies in contrast both proteins directly and indirectly bound to RNA, proteins that may not crosslink efficiently with UV and proteins linked to non-polyadenylated RNAs. In addition, it quantifies the fraction of each protein depending on RNA.
Although our screening approach does not require any pre-treatment or pre-incubation of the cells with modified nucleotides as well as no purification steps, it relies on the preparation of cellular extract using detergent-containing buffer conditions. Possibly, interactions could be artificially occurring in the extract between cellular components that would be kept separated in intact cells. Such interactions cannot be excluded, but so far, we do not have any evidence in this regard with positive and negative controls behaving as expected and protein-protein complexes frequently co-segregating as previously determined according to the CORUM database. In turn, intracellular complexes could be separated during lysis and extraction, nonetheless many complexes defined in CORUM are recapitulated in the R-DeeP dataset. However, weak or short-lived interactions could be lost during the long centrifugation step so that one may expect to see mainly strong interactions. Finally, magnesium is known to stabilize RNA folding and ribosome particles. To favor strong interactions, we chose buffer conditions without magnesium. Given this potential bias, we expect the 537 RNA-dependent proteins newly discovered with R-DeeP to be based on strong interactions and to pave the way to relevant future findings in RNA biology.
Methods based on density gradient centrifugation are widely used in biology to separate or isolate cellular components and protein complexes. Recently, a study based on glycerol density gradient centrifugation, named Grad-seq, allowed the analysis of the whole cellular RNA of the bacterium Salmonella enterica with respect to its protein interactions (Smirnov et al., 2016). Future applications of R-DeeP are virtually unlimited with respect to the starting material: different cell types, different species, different pre-treatments of the cells or different subcellular compartments will expand the map of RNA dependence including its dynamics and species-, cell type- and disease-association.
In summary, we introduce the concept of “RNA dependence” to elucidate the impact of RNA on protein complexes, foster the discovery of novel RNA functions and provide the R-DeeP resource for proteome-wide, specific and quantitative identification of proteins and complexes, whose interactions depend on RNA.
STAR METHODS
CONTACT FOR REAGENT AND RESOURCE SHARING
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact Sven Diederichs (s.diederichs@dkfz-heidelberg.de).
EXPERIMENTAL MODEL AND SUBJECT DETAILS
HeLa S3 cells (female) were grown at 37°C and 5% CO2 in Ham’s F-12K (Kaighn’s) medium (Gibco, Cat. No. 21127–022) supplemented with 10% fetal bovine serum. HeLa cells (female) were grown at 37°C and 5% CO2 in Dulbecco’s modified Eagle’s medium (DMEM) supplemented with 10% fetal bovine serum. Both cell lines were authenticated.
METHOD DETAILS
HeLa S3 Cell Lysate Preparation
HeLa S3 cells were harvested and centrifuged at 1200 rpm for 5 min at room temperature (RT). Lysis buffer (25 mM Tris-HCl (pH 7.4), 150 mM KCl, 0.5% (v/v) NP-40, 2 mM EDTA, 1 mM NaF, 0.5 mM DTT, 1× EDTA-free Protease Inhibitor Cocktail (add fresh)) was added to the cell pellet (100 μl for 2 × 10⁷ cells) and the cell suspension was incubated for 30 min on ice with resuspension by vortexing every 5 min. To completely break the cells, the cell suspension was frozen in liquid nitrogen, immediately and quickly thawed and homogenized using a thin needle. This step was repeated two times. To separate lysate from cell debris, the suspension was centrifuged in a microfuge at 13000 rpm for 10 min at 4°C and the supernatant was transferred into a new tube. The protein concentration was measured using 10 μl of cell lysate in 200 μl of a copper(II) sulfate solution mixed 1:50 with a bicinchoninic acid (BCA) solution. As reference, 10 μl of standard bovine serum albumin (BSA) solutions with concentrations ranging from 1 to 15 mg/ml were used in 200 μl of BCA mix. If needed for storage purposes, the cell lysate was snap frozen in liquid nitrogen and stored at −80°C.
For sucrose density gradient fractionation, a cell lysate volume containing 2 mg of proteins was layered on top of the gradient.
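The loading volume follows from the measured protein concentration. The sketch below illustrates the arithmetic only; the linear standard curve, the absorbance readings and the function names are hypothetical assumptions and not part of the original protocol.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lysate_volume_ul(target_mg, conc_mg_per_ml):
    """Volume of lysate (in microliters) that contains target_mg of protein."""
    return target_mg / conc_mg_per_ml * 1000.0


# Hypothetical BSA standards (mg/ml) and made-up absorbance readings.
standards = [1.0, 5.0, 10.0, 15.0]
absorbance = [0.10, 0.50, 1.00, 1.50]
slope, intercept = fit_line(standards, absorbance)

# Hypothetical lysate reading of 0.80, converted back to mg/ml.
sample_conc = (0.80 - intercept) / slope
volume = lysate_volume_ul(2.0, sample_conc)  # volume holding 2 mg of protein
```

A real BCA standard curve may deviate from linearity at high concentrations, in which case a different fit would be needed.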
RNase Treatment of Cell Lysates
For the analysis of RNA-dependent proteins, 100 μl of cell extract was pre-incubated with 5 μl of each RNase (10 μg/μl RNase A, 10 U/μl RNase I, 1000 U/μl RNase T1, 10 U/μl RNase H and 1 U/μl RNase III) at 4°C for 1 h. As control, 5 μl of lysis buffer were added to 100 μl of the same lysate, which were also incubated at 4°C for 1 h.
Sucrose Density Gradient Ultracentrifugation
To prepare the sucrose density gradient, ten 1.3 ml sucrose solutions from 50% (w/v) to 5% (w/v) sucrose in 100 mM NaCl, 10 mM Tris (pH 7.5) and 1 mM EDTA were layered on top of each other starting with the 50% sucrose solution at the bottom of the tube (Ultra-clear 14 ml tubes, Beckman Coulter). Each sucrose solution layer was frozen at −80°C for 15 min prior to addition of the next layer. The sucrose gradient tubes were stored at −20°C. An hour before starting the ultracentrifuge, the sucrose gradient tubes were placed in a cold room to slowly thaw the sucrose solutions without mixing the layers. The sample to analyze was layered on top of the gradient using a 200 μl cut pipet tip to avoid perturbing the gradient. Sucrose density gradient centrifugation was performed in a Sorvall Discovery 90SE Ultracentrifuge equipped with a SW 40 Ti Swinging-Bucket Rotor at 30,000 rpm for 18 h at 4°C as previously described (Hock et al., 2007). Centrifuge, rotor and tube references are listed in the Key Resources Table.
After ultracentrifugation, the tubes were carefully removed from the centrifuge and starting from the top of the gradient, 25 fractions (500 μl each) were transferred by pipetting into fresh 1.5 ml tubes. The sucrose gradient fractions were stored at −80°C or directly further processed for Coomassie staining, western blot or mass spectrometry analysis.
Western Blot Analysis
Sodium dodecyl sulfate - polyacrylamide gel electrophoresis (SDS-PAGE) was performed to separate proteins according to their molecular weight. A polyacrylamide gel was prepared consisting of a 10% separating gel and a 5% stacking gel. The samples containing 1× SDS sample buffer (30% (w/v) glycerol, 12% (w/v) SDS, 3.6 M DTT, 0.012% (w/v) bromophenol blue, 500 mM Tris-HCl (pH 6.8)) were boiled at 95°C for 5 min and briefly centrifuged afterwards. The same volume of 20 μl from each fraction was loaded per lane and separated by discontinuous SDS-PAGE. PageRuler Prestained Protein Ladder was used as protein marker. The gels were run at 100 V to 120 V in a vertical electrophoresis chamber filled with 1× Laemmli buffer (25 mM Tris base, 192 mM glycine, 0.1% (w/v) SDS). The proteins were transferred from the gel onto a nitrocellulose membrane using a wet transfer system with wet transfer buffer (25 mM Tris base, 192 mM glycine) containing 20% methanol for either 2 h at 60 V or overnight at 20 V, both in a cold room. After blotting, the membrane was blocked for 1 h at RT in 5% milk in Tris-buffered saline (24.7 mM Tris-HCl (pH 7.4), 137 mM NaCl, 2.7 mM KCl) containing 0.05% Tween-20 (TBST). Following blocking, the membrane was incubated with the respective primary antibody diluted in the blocking solution or 5% BSA in TBST at 4°C overnight on a shaker. Next, the membrane was washed 3 times for 5 min with TBST and incubated with the appropriate secondary HRP-conjugated antibody diluted 1:5000 in blocking solution for 1 h at RT. Finally, the membrane was washed 3 times for 5 min with 1× TBST at RT. For development, the membrane was incubated with ECL reagent (Super Signal West PICO) for 5 min and developed using an INTAS ECL Chemocam imager. All antibodies are listed in the Key Resources Table.
Quantitative Analysis of the Western Blot Images
Quantitative analysis of western blot images was performed using the software LabImage 1D 2006 (Kapelan Bio-Imaging GmbH, www.labimage.com). The software allows the quantification of western blot bands with reduction of background using the “rolling ball” method. The sum of the western blot signal over the 25 fractions was normalized to 100 in order to be compared to the mass spectrometry analysis (Figure 2).
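The normalization step can be written down in a few lines. This is a minimal sketch, assuming one background-corrected band intensity per fraction; the function name and the example values are hypothetical.

```python
def normalize_to_100(signal):
    """Rescale a per-fraction signal so that its sum over all fractions is 100."""
    total = sum(signal)
    if total <= 0:
        raise ValueError("total signal must be positive")
    return [100.0 * value / total for value in signal]


# Hypothetical band intensities for 25 fractions (arbitrary units).
bands = [0.0] * 25
bands[4], bands[5], bands[6] = 200.0, 600.0, 200.0
profile = normalize_to_100(bands)
```

After this step, a western blot profile and a mass spectrometry profile of the same protein are on the same scale and can be overlaid fraction by fraction.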
TCA Protein Precipitation
For mass spectrometry analysis, the protein content of 200 μl of each fraction was precipitated. First, 200 μl of HPLC-MS grade water (Burdick Jackson/Honeywell, Cat. No. 3654) were added to the 200 μl fraction in a standard Eppendorf tube (Sigma, Cat. No. 0030 120.086 (EU) or 022363204 (USA)) and the proteins were precipitated using 100 μl of 100% TCA (100% TCA stock solution: 10 g of trichloroacetic acid (SIGMA, Cat. No. 91228) were dissolved in a final, total volume of 10 ml HPLC-MS grade water and stored at 4°C). After intense vortexing and incubating on ice for 15 min, the samples were spun 10–20 min at 13000 rpm in a microfuge at 4°C. When placing samples in the microfuge, the orientation of the tube was noted so that the spot where a pellet would be located could be tracked. Most of the TCA solution was slowly and carefully removed with a P-1000 pipet while holding the tube with the pellet away from the experimenter. The last few hundred microliters were removed with a P-200. Then, 1 ml of 10% TCA was added, and the samples were vortexed and spun 10 min at 13000 rpm in a microfuge at 4°C. The 10% TCA wash was removed and the wash was repeated once before addition of 1 ml of cold HPLC-MS grade acetone (Fisher Optima, Cat. No. A929). After vortexing to mix well and spinning for 10 min at 13000 rpm in a microfuge at 4°C, as much as possible of the acetone was carefully removed. The acetone wash was repeated twice. Finally, the tubes were left open sideways on the bench to air dry without dust settling in them.
Quantitative Mass Spectrometry Analysis
Samples were trypsin-digested (Promega, Cat. No. V5113) into peptides and dried. Previously dried, individual TMT reagent aliquots (Thermo Fisher Scientific, Cat. No. 90066) were resuspended in 40 μl of 150 mM HEPES (SIGMA, Cat. No. H3375) pH 8.5 / 20% acetonitrile (Burdick/Jackson, Cat. No. BJLC015), followed by transfer of the resuspended reagent aliquots to tubes containing the peptide digest and vortexing to mix reagent and peptides. After 1 h at RT, each reaction was quenched with 5 μl of 500 mM ammonium bicarbonate (Sigma-Aldrich, Cat. No. 9830) solution for 10 min, mixed based on fraction, diluted 3-fold with 0.1% trifluoroacetic acid (TFA, Burdick/Jackson, Cat. No. BB360P050) in water, and desalted. The desalted multiplexes were dried by vacuum centrifugation and analyzed on an Orbitrap Fusion (Senko et al., 2013) (Thermo Fisher Scientific) LC-MS/MS platform as previously described (Rusin et al., 2015). The resulting data files were searched using Comet (Eng et al., 2013) with a static mass of 229.162932 Da on peptide N-termini and lysines and 57.02146 Da on cysteines, and a variable mass of 15.99491 Da on methionines against the target-decoy version of the human proteome FASTA (UniProt; www.uniprot.org) and filtered to a ~1% FDR at the peptide level. TMT reporter ion intensities were determined using in-house software and summed per protein. At least two unique peptide sequences were required to contain quantification results per protein for inclusion into further analysis. For the detailed statistical analysis of the mass spectrometry data, please refer to the “Quantification and Statistical Analysis” section below.
Production of Heatmaps and Density Graphs
Heatmap Figure 1B: for each protein in each sample replicate (Control 1 to 3 and RNase 1 to 3), the sum of the amount in the 25 fractions (after fraction-wise normalization) was calculated and normalized to the maximum protein amount. The function heatmap.2 from the “gplots” R package (https://www.rdocumentation.org) was used to produce the heatmap.
Heatmap Figure 1D: for each shifted protein in each fraction, the average protein amount was calculated for the three control as well as for the three RNase replicates. The difference in protein amount between the average of the control minus the average of the RNase sample was calculated for each fraction, normalized to the maximum difference and represented in a heatmap using the R function heatmap.2.
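The per-protein values underlying the Figure 1D heatmap can be sketched as follows. This is an illustration in Python rather than the R code used for the figure; the function names are hypothetical, and normalizing by the largest absolute difference is an assumption about how "maximum difference" was applied.

```python
def mean_profile(replicates):
    """Fraction-wise average over a list of replicate profiles."""
    n = len(replicates)
    return [sum(values) / n for values in zip(*replicates)]


def normalized_difference(control_reps, rnase_reps):
    """Control minus RNase per fraction, scaled by the largest absolute difference."""
    control = mean_profile(control_reps)
    rnase = mean_profile(rnase_reps)
    diff = [c - r for c, r in zip(control, rnase)]
    max_abs = max(abs(d) for d in diff)  # assumption: scale by absolute maximum
    if max_abs == 0:
        return diff
    return [d / max_abs for d in diff]


# Hypothetical three-fraction profiles for one protein, three replicates each.
control_reps = [[10.0, 0.0, 0.0], [8.0, 2.0, 0.0], [12.0, 0.0, 0.0]]
rnase_reps = [[0.0, 0.0, 10.0], [0.0, 0.0, 10.0], [0.0, 0.0, 10.0]]
row = normalized_difference(control_reps, rnase_reps)
```

With this convention, positive values mark fractions that lose protein upon RNase treatment and negative values mark fractions that gain protein.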
Density graph Figure 1C: the protein amounts per fraction (after fraction-wise normalization and normalization of the total protein amount over the 25 fractions to 100) were represented for pairs of replicates (replicate 1 vs. replicate 2, 2 vs. 3 and 1 vs. 3) using the log10 of the protein amount and the R function hexbinplot from the “hexbin” R package. The increase in color intensity on the graph indicates an increased point density. To calculate the corresponding Pearson’s correlation coefficient, the function “cor.test” from the stats R package was applied on a table containing all possible combinations of protein amounts.
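For orientation, the correlation between two replicates on the log10 scale can be sketched in plain Python. This is not the R code used for the figure; the function names are hypothetical, and dropping pairs with a zero amount (whose logarithm is undefined) is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log10_correlation(rep_a, rep_b):
    """Pearson correlation of log10 amounts, skipping pairs with a zero value."""
    pairs = [(math.log10(a), math.log10(b))
             for a, b in zip(rep_a, rep_b) if a > 0 and b > 0]
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    return pearson(xs, ys)


# Hypothetical amounts: replicate 2 is exactly twice replicate 1.
rep1 = [0.5, 1.0, 4.0, 20.0, 0.0]
rep2 = [1.0, 2.0, 8.0, 40.0, 3.0]
r = log10_correlation(rep1, rep2)
```

A constant fold difference between replicates becomes a constant offset after the log transform, so it does not lower the correlation.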
Comparison of RBP Resources
The protein lists of the RBP resources used in our study were retrieved from the respective main part or supplementary information of the corresponding manuscripts (Baltz et al., 2012; Bao et al., 2018; Beckmann et al., 2015; Brannan et al., 2016; Castello et al., 2012; Castello et al., 2016; Conrad et al., 2016; Cook et al., 2011; Gerstberger et al., 2014; He et al., 2016; Hendrickson et al., 2016; Kwon et al., 2013; Mullari et al., 2017; Perez-Perri et al., 2018; Ray et al., 2013; Sundararaman et al., 2016; Treiber et al., 2017; Trendel et al., 2018). For studies in other species than human, the human homolog proteins were retrieved from the UniProt database (www.uniprot.org). A list of human proteins was retrieved from AmiGO 2 (http://amigo.geneontology.org) by searching for GO terms associated with “RNA binding” and listing the corresponding human proteins from the UniProt database.
All proteins were listed using the entry names of the proteins according to the UniProt release from February 2018. A table summarizing the RBP resources is found as Table S1.
Calculation of the Shifting Coefficients
Shifting coefficients at control peaks were calculated as the product of the “protein amount at maxima * loss”. Shifting coefficients at RNase peaks were calculated as the product of the “protein amount at maxima * gain”. For each protein, a shifting coefficient was produced for each peak in the control and in the RNase sample. For proteins with multiple peaks in either condition, all possible pairs were represented in a graph (Figure 5A).
Proteins shifting with their full amount from a control to an RNase peak are located at the top right part of the graph (one peak in each control and RNase sample). Proteins depicting no change between the control and the RNase sample are located at the bottom left part of the graph. Proteins with multiple peaks in control and/or RNase sample are distributed above or under the diagonal and present various ranges of shifting coefficient values. Proteins with no statistically significant difference at the control nor the RNase peak were depicted in red with few proteins showing larger shifting coefficient values but lacking statistical significance.
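The pairing of control and RNase peaks can be sketched as below. The function names, the tuple layout and the numeric values are hypothetical; in particular, treating the amount at the maximum and the loss or gain as fractions between 0 and 1 is an assumption, since their units are defined elsewhere in the analysis.

```python
from itertools import product


def shifting_coefficient(amount_at_maximum, change):
    """Product of the protein amount at a peak and its loss (control) or gain (RNase)."""
    return amount_at_maximum * change


def coefficient_pairs(control_peaks, rnase_peaks):
    """All (control, RNase) coefficient pairs for one protein.

    Each peak is a tuple (amount_at_maximum, loss_or_gain).
    """
    control = [shifting_coefficient(a, loss) for a, loss in control_peaks]
    rnase = [shifting_coefficient(a, gain) for a, gain in rnase_peaks]
    return list(product(control, rnase))


# Hypothetical protein with one control peak and two RNase peaks.
pairs = coefficient_pairs([(0.5, 0.5)], [(0.5, 0.5), (0.2, 0.1)])
```

Each resulting pair corresponds to one point in a graph such as Figure 5A, with the control coefficient on one axis and the RNase coefficient on the other.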
Analysis of Protein Group Properties
Isoelectric point (pI) analysis was conducted using the EMBOSS pepstats function (European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK). Low complexity domains and protein domains were analyzed via BioMart (Seg, InterPro functions) (Smedley et al., 2015). For low complexity domains, only sequences longer than 30 amino acids were considered. Amino acid composition was analyzed using the function “composition” on the local HUSAR server, returning the number and percentage of each amino acid in all the sequences of a protein group. Amino acids promoting protein disorder were analyzed (A, K, P, G, Q, E, S) (Theillet et al., 2013). The molecular function GO term analysis was conducted on the website of the Gene Ontology Consortium (http://geneontology.org).
Integration of the CORUM Database
Information about mammalian protein complexes was downloaded from the CORUM database. For Figure 3A, each protein was classified into one of the categories: “RBP” = RNA-binding protein according to the 15 RBP lists described above; “RBP-interacting” = protein engaged in complexes with RBPs defined above; “RBP-indirect” = protein engaged in complexes with “RBP-interacting” proteins but not “RBPs” themselves; “RBP-independent” = protein engaged neither in complexes with “RBP” nor “RBP-interacting” proteins. Also, for the R-DeeP-detected proteins in each of the categories, the percentage of shifting proteins was calculated (Figure 3B).
Protein complex information from the CORUM database was included into the R-DeeP database for each protein with CORUM information.
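The four categories used for Figure 3A can be expressed as set operations over complex memberships. The following is a minimal sketch under stated assumptions: complexes are given as sets of protein names, the RBP list is a set, and the function name and the example proteins A to E are hypothetical.

```python
def classify_proteins(complexes, rbps):
    """Assign each protein to RBP, RBP-interacting, RBP-indirect or RBP-independent.

    complexes: list of sets, each holding the subunits of one complex.
    rbps: set of proteins listed as RNA-binding proteins.
    """
    rbps = set(rbps)
    all_proteins = set().union(*complexes) | rbps

    # Non-RBP proteins sharing at least one complex with an RBP.
    interacting = set()
    for members in complexes:
        if members & rbps:
            interacting |= members - rbps

    # Remaining proteins sharing a complex with an RBP-interacting protein.
    indirect = set()
    for members in complexes:
        if members & interacting:
            indirect |= members - rbps - interacting

    labels = {}
    for protein in all_proteins:
        if protein in rbps:
            labels[protein] = "RBP"
        elif protein in interacting:
            labels[protein] = "RBP-interacting"
        elif protein in indirect:
            labels[protein] = "RBP-indirect"
        else:
            labels[protein] = "RBP-independent"
    return labels


# Hypothetical complexes: A is the only listed RBP.
example = classify_proteins([{"A", "B"}, {"B", "C"}, {"D", "E"}], {"A"})
```

In this toy example, B shares a complex with the RBP A, C only reaches A through B, and D and E have no connection to an RBP.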
Development of the R-DeeP Database
The R-DeeP database for RNA-dependent Proteins was developed using the Shiny open source R package from R Studio for building web applications using R (https://shiny.rstudio.com). A User Guide is included in the Supplemental Information.
Gradient Calibration with Standard Proteins
To calibrate the sucrose density gradient, standard proteins (RNase A, 14 kDa; BSA, 65 kDa; Aldolase, 160 kDa; Catalase, 240 kDa and Ferritin, 480 kDa) were loaded on the gradient and their respective positions were evaluated from a Coomassie staining of the fractions on a 10% Sodium dodecyl sulfate - polyacrylamide gel electrophoresis (SDS-PAGE) (Figure S4). For SDS-PAGE details, see method section on “Western Blot Analysis”.
Evaluation of Apparent Molecular Weight
To evaluate the apparent MW of the mass spectrometry detected proteins, the distribution of the standard proteins (position vs MW) was fitted. Using the best fit (y = 1146.9 · x^2.2577; R² = 0.9984), we calculated the apparent MW for each shifted protein using the position of the first RNase peak. By setting a cut-off of 0.5- and 2-fold for the ratio apparent MW/theoretical MW, we classified the proteins into four groups: proteins with small apparent MW, proteins at approximately monomeric size, proteins remaining in complexes larger than their expected monomeric size and precipitated proteins. All the proteins found with a first RNase peak at fraction 24 or higher were considered as precipitated.
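The classification can be illustrated with a short sketch. It reads the fit as a power law, y = 1146.9 · x^2.2577, and assumes that x is the fraction number and y the apparent MW in Da; this reading is consistent with the 14 kDa standard RNase A near fraction 3 but is not stated explicitly. The function names, group labels and the treatment of values exactly at the cut-offs are also assumptions.

```python
def apparent_mw_da(fraction):
    """Apparent molecular weight (assumed in Da) from the gradient position."""
    return 1146.9 * fraction ** 2.2577


def classify_by_size(first_rnase_peak, theoretical_mw_da):
    """Group a shifted protein by the ratio of apparent to theoretical MW."""
    if first_rnase_peak >= 24:
        return "precipitated"
    ratio = apparent_mw_da(first_rnase_peak) / theoretical_mw_da
    if ratio < 0.5:
        return "small apparent MW"
    if ratio <= 2.0:
        return "approximately monomeric"
    return "in complex"


# Hypothetical proteins: (first RNase peak, theoretical MW in Da).
monomer = classify_by_size(3, 14000.0)
complexed = classify_by_size(10, 20000.0)
pelleted = classify_by_size(24, 50000.0)
```

For example, a 20 kDa protein peaking at fraction 10 has an apparent MW of roughly 200 kDa under this fit and is therefore grouped as remaining in a complex.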
Preparation of Protein Overexpression Plasmids
Gateway clones were obtained from the GPCF at the DKFZ as specified in the Key Resources Table. C-terminal Stop codons were introduced into the plasmids containing the CTCF and PSMB1 ORFs using the primers Fwd. CTCF_Stop, Rev. CTCF_Stop, Fwd. PSMB1_Stop, Rev. PSMB1_Stop, respectively. All constructs were cloned into the pFRT-DEST/FLAG-HA by the Gateway LR cloning protocol using the Gateway LR Clonase II enzyme mix (Thermo Fisher Scientific) and following the manufacturer’s instructions. The empty Gateway expression plasmids were amplified in DB3.1 competent cells (Thermo Fisher Scientific). For LR reactions and mutagenesis, we used the One Shot TOP10 Chemically Competent cells (Thermo Fisher Scientific).
The plasmid pFRT-FLAG-HA-ΔCmR-ΔccdB (empty plasmid on Figure S2) was produced from the Gateway plasmid pFRT-Flag-HA through deletion of a fragment containing the CmR and ccdB elements by NotI digestion. The plasmid was amplified in Mach1 T1R competent cells.
Cross-Linking and Immunoprecipitation - CLIP
Based on the protocol from the Landthaler lab (Baltz et al., 2012), 14 × 15 cm plates containing 5 × 10⁶ HeLa cells per plate were used for 5 CLIP samples. First, at 90% confluence, the empty plasmid (pFRT-FLAG-HA-ΔCmR-ΔccdB, 4 plates) or the overexpression plasmid for the protein of interest (pFRT-FLAG-HA_ASNS, 4 plates; pFRT-FLAG-HA_CTCF, 1 plate; pFRT-FLAG-HA_hnRNPU, 1 plate and pFRT-FLAG-HA_PSMB1, 4 plates; pFRT-FLAG-HA_HMGN1, 1 plate; pFRT-FLAG-HA_CASP7, 1 plate; pFRT-FLAG-HA_REEP4, 1 plate and pFRT-FLAG-HA_THYN1, 1 plate) were transfected into the cells using Lipofectamine 2000 transfection reagent (Thermo Fisher Scientific) and incubated for 6 h before changing the medium. 24 h post transfection, the cells were crosslinked at 254 nm in a Stratalinker 2400 (Stratagene) or similar device with 0.15 J/cm² (XL) or not crosslinked (No XL). 5 ml PBS were added per plate, cells were collected using a cell scraper and transferred on ice into a pre-chilled 50 ml tube. The cells were centrifuged at 800 g for 5 min at 4°C and flash frozen in liquid nitrogen for storage at −80°C or further processed.
The cells were resuspended in three cell pellet volumes of NP-40 lysis buffer (50 mM HEPES-KOH pH 7.5, 150 mM KCl, 2 mM EDTA, 1 mM NaF, 1% (v/v) NP-40, 0.5 mM DTT, 1× cOmplete EDTA-free protease inhibitor) and incubated on ice for 30 min. The resulting cell lysate was cleared by centrifugation at 13000 g for 15 min at 4°C and the lysate was cleared once more by centrifugation at 13000 g for 10 min at 4°C. 20 to 80 μl were taken out as input control for WB. RNase T1 was added to a final concentration of 0.2 U/μl and incubated in a water bath for 15 min at 22°C with inverting every 2 min to mix. Finally, this reaction was cooled on ice for 5 min.
40 μl of Anti-FLAG M2 beads slurry provided as 50% suspension were washed twice with 1 ml of TBS (50 mM Tris-HCl pH 7.4, 150 mM NaCl) and resuspended in one original volume of TBS. The lysate was added to the pre-washed Anti-FLAG M2 beads and incubated in a 1.5 ml Safe Lock Eppendorf tube on a rotating wheel for 1 h at 4°C. The beads were collected on a magnetic rack, the supernatant was removed and the beads were washed 3× with 1 ml of IP wash buffer (50 mM HEPES-KOH pH 7.5, 300 mM KCl, 0.05% (v/v) NP-40, 0.5 mM DTT, 1× cOmplete EDTA-free protease inhibitor). The beads were resuspended in the original volume of IP wash buffer, incubated with RNase T1 to a final concentration of 10 U/μl in a water bath for 8 min at 22°C and cooled on ice for 5 min. The beads were washed 3× with 1 ml of high-salt wash buffer (50 mM HEPES-KOH pH 7.5, 500 mM KCl, 0.05% (v/v) NP-40, 0.5 mM DTT, 1× cOmplete EDTA-free protease inhibitor) prior to resuspension in 1 bead volume of dephosphorylation buffer (50 mM Tris-HCl pH 7.9, 100 mM NaCl, 10 mM MgCl2, 1 mM DTT). Calf intestinal alkaline phosphatase (CIP, NEB) was added to a final concentration of 0.5 U/μl and incubated for 10 min at 37°C mixing at 800 rpm. Then, the beads were washed 2× with 1 ml phosphatase buffer and 2× with 1 ml PNK buffer (50 mM Tris-HCl pH 7.5, 50 mM NaCl, 10 mM MgCl2). The beads were resuspended in one original bead volume of PNK buffer. γ[32P]ATP was added to a final concentration of 0.5 μCi/μl (stock 10 μCi/μl) and T4 PNK (NEB) was added to a final concentration of 1 U/μl and incubated for 30 min at 37°C and mixing at 800 rpm. Finally, non-radioactive ATP was added to a final concentration of 100 μM (stock 10 mM) and incubated for another 5 min at 37°C and mixing at 800 rpm.
The beads were washed 6× with 800 μl of PNK buffer and resuspended in 20 μl of elution buffer that consists of 3 ml AM 1000 (10% glycerol, 0.2 mM EDTA pH 8.0, 5 mM MgCl2, 1000 mM KCl), 7 ml AM 0 (10% glycerol, 0.2 mM EDTA pH 8.0, 5 mM MgCl2) and 100 μl 0.1% NP-40. 4 μl of FLAG peptide were added and incubated overnight at 4°C with 650 rpm mixing. Alternatively, the beads were treated with RNase A (0.2 U/μl RNase A for 15 min at 37°C) to monitor the reduction of the RNA-bound radioactive signal before elution (XL + RNase). The supernatant was transferred to a new tube and 6 μl of NuPAGE LDS sample buffer (Thermo Fisher Scientific) supplemented with 100 mM DTT were added and incubated for 5 min at 70°C.
20 μl of the FLAG peptide elution were loaded onto a 4–15% precast Mini-Protean-TGX gel (BioRad) and transferred on a membrane at 120 V for 1.5 h on ice. The hot membrane was further exposed for 1 day to a phosphorimager screen. For protein analysis per western blot, 20 μl of the IP were loaded on a 4–15% precast Mini-Protean-TGX gel (BioRad). At least one empty lane was left between the different samples.
The CLIP signal was measured in the triplicate experiments by applying the standard intensity measurement function of the free image processing software ImageJ (Schneider et al., 2012) on areas of the same size for all proteins. The signal was normalized to the maximum possible intensity (area in pixels × 256), given that an 8-bit image has 256 intensity levels (pixel values from 0 to 255).
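The normalization can be sketched as follows, assuming the measured signal is the summed pixel intensity of the selected area and using the divisor stated above (area in pixels × 256). The function name and the pixel values are hypothetical.

```python
def normalized_clip_signal(pixel_values):
    """Summed 8-bit intensity of an area divided by (area in pixels * 256)."""
    area = len(pixel_values)
    if area == 0:
        raise ValueError("the measured area must contain at least one pixel")
    return sum(pixel_values) / (area * 256)


# Hypothetical measurement area of 100 pixels, all at mid-gray.
signal = normalized_clip_signal([128] * 100)
```

Because the largest 8-bit pixel value is 255, a fully saturated area gives 255/256 with this divisor, slightly below 1.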
Chromatin Immunoprecipitation - ChIP
16 × 106 cells (4 × 106 per ChIP sample) were harvested and washed twice in cold PBS (137 mM NaCl, 2.7 mM KCI, 10 mM Na2HPO4, 2 mM KH2PO4) with centrifugation at 1,000 g at 4°C for 5 min. To isolate cell nuclei, the cell pellet was resuspended in 350 μl ChIP cell lysis buffer (5 mM HEPES pH 8.0, 85 mM KCI, 0.5% (v/v) NP-40, 1× EDTA-free Protease Inhibitor Cocktail (add fresh)) per 8 × 106 cells (2× control and 2× RNase-treated sample together) and incubated on ice for 10 min. The samples were centrifuged at 2400 g for 5 min at 4°C and the pelleted cell nuclei were washed once with iT buffer (25 mM Tris HCl (pH 7.5), 137 mM NaCl, 5 mM KCI, 0.5 mM MgCI2, 0.7 mM CaCI2, 0.3 mM Na2HPO4, 1× EDTA-free Protease Inhibitor Cocktail (add fresh)). Then, the nuclei were dissolved in 350 μl iT buffer and treated with 1 μl of each RNase (10 μg/μl RNase A, 10 U/μl RNase 1, 1000 U/μl RNase T1, 10 U/μl RNase H and 1 U/μl RNase III) or with 5 μl iT buffer as control for 10 min at 4°C while rotating. Next, the nuclear proteins were crosslinked to chromatin by the addition of formaldehyde to a final concentration of 1%. Samples were rotated for 10 min at RT. Crosslinking was stopped by adding glycine to a final concentration of 125 mM and incubating for 5 min at RT on a rotating shaker. The cell nuclei were collected by centrifugation at 1000 g for 5 min at 4°C. To lyse cell nuclei, 400 μl nuclei lysis buffer (50 mM Tris-HCI (pH 8.1), 10 mM EDTA (pH 8.0), 1% (w/v) SDS, 1× EDTA-free Protease Inhibitor Cocktail (add fresh)) was added and samples were incubated on ice for 10 min. The lysate was transferred to special Bioruptor Plus TPX microtubes (Diagenode) and sonicated for 100 cycles (30 sec on/off, high, 4°C) in a Bioruptor (Diagenode) to result in DNA fragments of 200 to 400 bp in length. Cellular debris were removed by centrifugation at 17000 g for 10 min at 4°C and the supernatant (chromatin solution) was further processed for immunoprecipitation.
To start the chromatin immunoprecipitation (IP), the lysates were diluted 1:5 in ChIP dilution buffer (0.01% (w/v) SDS, 1.1% (v/v) Triton-X, 1.2 mM EDTA (pH 8.0), 16.7 mM Tris-HCl (pH 8.1), 167 mM NaCl, 1× EDTA-free Protease Inhibitor Cocktail (add fresh)). Chromatin solutions were pre-cleared with 20 μl slurry of pre-washed protein G magnetic beads for at least 30 min at 4°C while rotating before they were incubated with specific antibodies or the corresponding normal IgG. The beads were collected using a DynaMag-2 magnet and discarded. 100 μl of the sample (input) was set aside for later normalization and the rest was split for IgG and specific antibody ChIP. 1 ng of anti-CTCF monoclonal antibody (Cell Signaling) or 0.5 ng anti-SP1 antibody (Cell Signaling) was used per IP sample. As a control, the same amount of normal rabbit IgG was added to another IP chromatin sample. Samples were incubated overnight at 4°C while rotating. The immunocomplexes were incubated with 30 μl slurry of pre-washed protein G magnetic beads for 2 h at 4°C while rotating. The beads were collected with a DynaMag-2 magnet and washed sequentially for 5 min by rotation at 4°C with 700 μl of the following buffers: low-salt wash buffer (0.1% (w/v) SDS, 1% (v/v) Triton-X, 2 mM EDTA, 20 mM Tris-HCl (pH 8.1), 150 mM NaCl), twice with high-salt wash buffer (0.1% (w/v) SDS, 1% (v/v) Triton-X, 2 mM EDTA, 20 mM Tris-HCl (pH 8.1), 500 mM NaCl), LiCl wash buffer (10 mM Tris-HCl (pH 8.1), 250 mM LiCl, 1% (v/v) NP-40, 1% (w/v) deoxycholic acid, 1 mM EDTA) and twice with TE buffer (10 mM Tris-HCl (pH 8.1), 1 mM EDTA). The immunocomplexes were then eluted in 150 μl of elution buffer (100 mM NaHCO3, 1% (w/v) SDS) for 20 min at RT with rotation. This elution step was repeated once and both eluates were combined in one tube (300 μl in total). At this stage, 200 μl TE buffer was added to the input sample.
To reverse the formaldehyde crosslink, NaCl was added to a final concentration of 0.3 M together with RNase A (final concentration 30 μg/ml) and incubated overnight at 65°C shaking at 800 rpm. Then, the samples were supplemented with 800 μl of 100% ethanol and incubated at −80°C for 2 h. After centrifugation at 13000 rpm at 4°C for 20 min, the supernatant was completely removed, each pellet was dissolved in 100 μl of H2O, 2 μl of 0.5 M EDTA (pH 8.0), 4 μl 1 M Tris-HCl (pH 6.5) and 1.5 μl of proteinase K (20 mg/ml) and was incubated at 45°C for 2 h. Finally, the samples were cleaned up by using the QIAquick Nucleotide Removal Kit following the manufacturer’s instructions and the chromatin was eluted in 50 μl H2O. The samples were stored at −20°C and further analyzed by quantitative polymerase chain reaction (PCR).
Quantitative PCR Analysis
To analyze the CTCF ChIP samples, qPCR was performed using an Applied Biosystems StepOnePlus Thermocycler. This method quantifies the amount of specific DNA fragments in the ChIP samples via a non-specific fluorescent dye that intercalates into dsDNA. The Power SYBR™ Green PCR Master Mix was used according to the manufacturer’s instructions. The reaction was set up in a total volume of 10 μl using 2 μl of the ChIP samples, 5 μl 2× Power SYBR™ Green PCR Master Mix, 2.6 μl H2O and 0.2 μl each of 10 μM forward and reverse primer. For each qPCR reaction, samples were pipetted in triplicate onto a MicroAmp Fast Optical 96-well reaction plate (Thermo Fisher Scientific). The plate was covered with a MicroAmp Optical Adhesive Film (Thermo Fisher Scientific) and centrifuged at 1000 rpm for 1 min prior to start. The initial holding stage was at 95°C for 10 min. The cycling stage was performed 40 times, consisting of a denaturation step at 95°C for 15 s and a subsequent annealing/elongation step at 60°C for 1 min.
The qPCR data was analyzed via the comparative Ct method using the StepOne™ Software v2.2.2 and Microsoft Excel. Amplification curves were checked and the threshold was set to 0.2 for every primer pair in order to facilitate the comparison of Ct values. If the SD within the technical replicates was higher than 0.5, the outlying value was deleted only if the SD was reduced to less than 0.5 afterwards. The data was normalized by calculating the fold enrichment over IgG using the formula 2^(−ΔΔCt). ΔΔCt was obtained by subtracting the Ct-value of the ChIP sample from the Ct-value of the IgG control.
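The replicate filtering and the fold enrichment over IgG can be sketched as follows. This is a minimal Python illustration, not the StepOne/Excel workflow used in the study. The function names are hypothetical, and choosing the value farthest from the replicate mean as the outlier is an assumption. Because the sign of ΔΔCt depends on the order of subtraction, the sketch fixes the convention explicitly (also an assumption): a lower Ct in the ChIP sample than in the IgG control yields an enrichment above 1.

```python
from statistics import mean, stdev


def clean_triplicate(cts, max_sd=0.5):
    """Return the Ct values to average, dropping one outlier only if that helps.

    If the SD of the technical replicates exceeds max_sd, the value farthest
    from the mean is removed, but only when the SD of the remaining values
    then falls below max_sd. Otherwise all values are kept.
    """
    cts = list(cts)
    if len(cts) < 3 or stdev(cts) <= max_sd:
        return cts
    centre = mean(cts)
    outlier = max(cts, key=lambda c: abs(c - centre))  # assumed outlier choice
    rest = list(cts)
    rest.remove(outlier)
    return rest if stdev(rest) < max_sd else cts


def fold_enrichment_over_igg(ct_chip, ct_igg):
    """Fold enrichment of the ChIP sample over the IgG control.

    Assumed sign convention: earlier amplification in the ChIP sample
    (lower Ct) gives a value above 1.
    """
    delta = mean(clean_triplicate(ct_igg)) - mean(clean_triplicate(ct_chip))
    return 2 ** delta
```

For example, a ChIP sample that amplifies three cycles earlier than its IgG control corresponds to an eight-fold enrichment under this convention.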
For the ChIP-qPCR, experimentally identified CTCF binding sites were obtained from the CTCF-binding site database CTCFBSDB 2.0 (Ziebarth et al., 2013). The precise nucleotide sequences of the binding sites were retrieved from the UCSC Genome Browser or Ensembl. To design target-specific primers, the publicly available software Primer-BLAST (NCBI) was used. Primers for qPCR had a length of approximately 20–30 nt with a melting temperature of approximately 60°C and a GC content between 40 and 60%. The resulting amplicons had a length of 80–170 nt. Sequence information of primer pairs is available in the Key Resources Table.
RT-qPCR Analysis
The reverse transcriptase reaction was started with 2 μg total RNA incubated for 5 min at 65°C with 10 mM dNTPs and 1 μl of random hexamer primer. 5× Reaction Buffer, 0.5 μl of Ribolock and 1 μl of RevertAid Reverse Transcriptase (Thermo Fisher Scientific, #EP0441) were added to the reaction and incubated at RT (25°C) for 10 min, 42°C for 1 h and 70°C for 10 min. The resulting cDNA was diluted 1:20 in water prior to use in the qPCR reaction. For RNase-treated lysate, the same volumes as for the untreated lysate were used.
Immunofluorescence Staining
For immunofluorescence (IF) staining, cells were grown on glass cover slips in a 12-well plate. Cells were washed with cold PBS and permeabilized with PBS containing 0.25% Triton-X for 2 min prior to treatment with 1 mg/ml RNase A in PBS for 10 min at RT. In parallel, another permeabilized sample was incubated with PBS for 10 min at RT as untreated control. After washing the cells with cold PBS, they were fixed with 4% paraformaldehyde for 10 min at RT. To remove the paraformaldehyde completely, cells were washed twice with cold PBS. Thereafter, cells were blocked in 10% goat serum/PBS containing 0.3% Triton-X at RT for 30 min. Primary antibodies specific to the protein of interest were diluted in 10% goat serum/PBS in the presence of 0.25% Triton-X (CTCF rabbit mAb 1:800, β-Actin mouse mAb 1:500) and incubated on the slides overnight at 4°C. To remove excess primary antibody, cells were washed three times with PBS. Appropriate secondary antibodies conjugated with fluorescent dyes were diluted 1:500 in 5% goat serum/PBS containing 0.1% Triton-X and incubated on the slides at RT for 1 h (Alexa Fluor 488 anti-rabbit, Alexa Fluor 594 anti-mouse). Then, the cells were incubated in 10 μg/ml DAPI solution in PBS for 10 min at RT. The slides were again washed three times in PBS and mounted on microscope slides with Mowiol (10% (w/v) Mowiol 4–88, 25% (w/v) glycerol and 0.1 M Tris (pH 8.5)). The fluorescently stained cells were analyzed via fluorescence microscopy using a Leica TCS SP5 II confocal laser scanning microscope in the DKFZ Microscopy Core Facility.
Colocalization Analysis
To analyze colocalization of CTCF and DAPI staining in cells, 30 images of each condition were taken using the same magnification (63×/1.40 objective). The colocalization between CTCF and DAPI stains in the images was quantified using the free image processing software ImageJ (Schneider et al., 2012) with its pre-installed plugin Coloc 2 (https://imagej.net/Coloc_2). This tool measures the pixel intensity correlation of the two fluorescence signals and returns a Pearson’s correlation coefficient.
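To make the reported quantity concrete, the Pearson’s correlation coefficient between two channels can be sketched as below. This is a plain Python illustration of the statistic itself, not a reimplementation of Coloc 2, which offers additional options such as thresholding and masking. The function name and the flat lists of pixel intensities are assumptions made for the example.

```python
from math import sqrt


def pearson_r(channel_a, channel_b):
    """Pearson correlation of two equally sized lists of pixel intensities."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (at least 2 pixels)")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    # Sums of products of deviations from the channel means.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant channel")
    return cov / sqrt(var_a * var_b)
```

Values close to 1 indicate that bright CTCF pixels coincide with bright DAPI pixels, whereas values near 0 indicate no linear relationship between the two signals.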
Cell Fractionation
For cell fractionation and RNase treatment of the cell nuclei, we adapted a previous protocol (Gagnon et al., 2014): 10^7 cells were harvested and resuspended by gentle pipetting in 1 ml ice-cold HLB buffer (Hypotonic Lysis Buffer: 10 mM Tris-HCl (pH 7.5), 10 mM NaCl, 3 mM MgCl2, 0.3% (v/v) NP-40, 10% (v/v) glycerol), complemented with 1× protease inhibitor (complete EDTA-free protease inhibitor cocktail, Roche) and 1× phosphatase inhibitor (PhosSTOP, Roche). The mix was incubated on ice for 10 min and vortexed briefly. The cells were then centrifuged at 800 g for 8 min at 4°C. The supernatant (cytoplasmic fraction) was transferred to a new tube and kept on ice, and 5 M NaCl solution was added to adjust the NaCl concentration to 140 mM. The nuclear pellet was washed 4× by adding HLB, pipetting and centrifuging at 200 g for 2 min at 4°C. It was then resuspended in 500 μl ice-cold MWS buffer (Modified Wuarin-Schibler buffer: 10 mM Tris-HCl (pH 7.5), 300 mM NaCl, 4 mM EDTA, 1 M urea, 1% (v/v) NP-40, 1% (v/v) glycerol), complemented with 1× protease inhibitor and 1× phosphatase inhibitor. The nuclei were incubated on ice for 15 min with or without treatment with 1 μl RNase A, vortexed and centrifuged at 1000 g for 5 min at 4°C. The supernatant (nucleoplasmic fraction) was transferred to a new tube on ice. The chromatin pellet was washed twice with MWS buffer by incubating for 5 min on ice and centrifuging at 500 g for 3 min at 4°C. 0.5 ml NLB (Nuclei Lysis Buffer: 20 mM Tris-HCl (pH 7.5), 150 mM KCl, 3 mM MgCl2, 0.3% (v/v) NP-40, 10% (v/v) glycerol) was added to the chromatin pellet and the pellet was sonicated 3× at 20% power for 15 s in an ice bath, with 2 min cooling between sonications. The sonicated chromatin (chromatin fraction) was kept on ice. The three fractions were used for western blot or kept at −80°C for further processing.
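The adjustment of the cytoplasmic fraction to 140 mM NaCl is a simple mixing calculation, sketched here in Python. The function name is hypothetical, and the example assumes that the recovered supernatant has a volume of about 1 ml and still contains the 10 mM NaCl of the HLB buffer. The actual recovered volume should be measured.

```python
def stock_volume_to_add(sample_volume_ul, sample_conc_mm, target_conc_mm, stock_conc_mm):
    """Volume of stock solution (in ul) that raises a sample to the target concentration.

    Derived from the mass balance
    V_s * C_s + V_add * C_stock = (V_s + V_add) * C_target.
    """
    if not sample_conc_mm <= target_conc_mm < stock_conc_mm:
        raise ValueError("target must lie between sample and stock concentrations")
    return sample_volume_ul * (target_conc_mm - sample_conc_mm) / (stock_conc_mm - target_conc_mm)


# Assumed example: 1000 ul supernatant at 10 mM NaCl, adjusted with 5 M (5000 mM) NaCl.
volume_ul = stock_volume_to_add(1000.0, 10.0, 140.0, 5000.0)
```

Under these assumptions, roughly 27 μl of the 5 M NaCl stock would be added per ml of supernatant.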
Agarose Gel Electrophoresis
To prepare the agarose gel, 1× Tris-Borate-EDTA (TBE) buffer (89 mM Tris base, 89 mM boric acid, 2 mM EDTA pH 8.0) containing 1% (w/v) LE agarose was boiled in a microwave until the agarose was dissolved. After cooling down to approximately 60°C, the solution was poured into a gel electrophoresis chamber and supplemented with 2 μl ethidium bromide. The solid gel was submerged in TBE buffer. The RNA samples were mixed with 6× RNA loading dye (95% (v/v) formamide, 0.05% (w/v) SDS, 0.05% (w/v) bromophenol blue, 0.45 mM Tris, 0.045 mM EDTA). The samples were loaded onto the gel and the electrophoresis was performed at 100 V for 20–50 min. The separated RNA fragments were visualized under ultraviolet (UV) light using the INTAS gel documentation system.
Bioanalyzer Analysis
Purified total RNA samples were loaded on an Agilent RNA chip and analyzed using a Bioanalyzer 2100 following the manufacturer’s recommendations. Representative uncropped images are shown.
QUANTIFICATION AND STATISTICAL ANALYSIS
Statistical Analysis of the Mass Spectrometry Data
The processing of the raw mass spectrometry data produced a large table containing the amount of each protein (4765 rows) in each fraction and for each condition in triplicate (3×2×25 = 150 columns). Using R programming (www.r-project.org) and the open-source Bioconductor (Gentleman et al., 2004), we developed a thorough statistical analysis to automatically identify RNA-dependent proteins.
First, we normalized the three replicates for each fraction and for each condition (for example: three replicates for fraction 1 in the control condition) using the mean value method, because median and quantile methods would not work given the many fractions without protein (value = 0). In order to automatically determine the maxima of each protein profile, it was necessary to smooth the data using a sliding window of three fractions (for example, value in fraction 3 = average of values in fractions 2, 3 and 4). Finally, the amount of each protein per sample was normalized to 100. After this step, the sum of all normalized fractions per condition (control or RNase-treated) for each protein is equal to 100 (in other words: for each condition, 100% of each protein is distributed between the 25 fractions). For each protein, the amounts of protein in the 25 fractions of one condition constituted the profile of the protein for that condition.
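The smoothing and the normalization to 100 can be sketched as follows. The original analysis was written in R, and this Python version is only an illustration. The function names are hypothetical, and the handling of the first and last fraction (averaging over the neighbors that exist) is an assumption, since the edge treatment is not specified above.

```python
def smooth_profile(values, half_window=1):
    """Sliding-window average over each fraction and its direct neighbors.

    With half_window=1 the window spans three fractions. At the two ends of
    the gradient only the existing neighbors are averaged (assumption).
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        window = values[max(0, i - half_window):min(n, i + half_window + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


def normalize_to_100(values):
    """Rescale a profile so that its fractions sum to 100 (percent of protein)."""
    total = sum(values)
    if total == 0:
        return [0.0] * len(values)  # protein not detected in this sample
    return [100.0 * v / total for v in values]
```

Applied in this order to one replicate of one condition, the two steps yield a 25-value profile in which each entry is the percentage of the protein found in that fraction.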
Next, we proceeded to the automatic selection of the maxima (value and position) for each mean protein profile, which was calculated as the mean profile from the three replicates for one condition. The absolute maximum was defined as the maximum value and the respective fraction(s). A local maximum was defined at fractions where the value was higher than the value of two neighboring fractions both before and after the local maximum and with a value >2% of the total protein amount. Finally, a number of profiles contained regions that we called “shoulders”, which contained a significant amount of protein but could not be found using the absolute or local maximum methods. To assign a value and a position to those shoulder regions, we defined a shoulder maximum with the value of the fraction that was 1) at least three fractions away from any other maximum and 2) in the middle of four consecutive fractions with values >2% of the total protein amount. Local and shoulder maxima were validated as maxima if they were >25% of the absolute maximum of the protein profile. Altogether, the absolute, local and shoulder maxima determined the number of maxima for each protein profile. To automatically fit each profile using a Gaussian curve containing as many peaks as maxima, the position of each maximum, its value and an estimation of the standard deviation were provided to the algorithm (function optim, method=“BFGS”). The fitting algorithm provided a set of optimal parameters for each peak of the Gaussian curve fitted to the protein profile: amplitude (height of the peak), mean (fraction of the peak in the fitted curve), sigma (width of the peak at half-maximal height). These parameters were used as initial estimates to fit each individual replicate with a Gaussian curve and determine the parameters for the peaks of the individual profiles. The fitting algorithm provided a value (residue) that can be used to estimate the accuracy of the fitting procedure.
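The selection of absolute and local maxima can be sketched as below. This Python version is an illustration under stated assumptions rather than the R code used for the study: the function name is hypothetical, positions are 0-based indices, the profile is assumed to be normalized to 100 so that 2% corresponds to a value of 2, and shoulder maxima are deliberately left out.

```python
def find_maxima(profile, min_amount=2.0, min_rel_height=0.25):
    """Return sorted 0-based positions of the absolute and validated local maxima.

    A local maximum must exceed its two neighbors on each side, hold more
    than min_amount percent of the protein, and reach more than
    min_rel_height times the absolute maximum. Fractions closer than two
    positions to either end cannot qualify as local maxima in this sketch.
    """
    profile = list(profile)
    absolute = max(profile)
    # The absolute maximum may occur in more than one fraction.
    maxima = {i for i, v in enumerate(profile) if v == absolute}
    for i in range(2, len(profile) - 2):
        value = profile[i]
        neighbors = profile[i - 2:i] + profile[i + 1:i + 3]
        if (all(value > x for x in neighbors)
                and value > min_amount
                and value > min_rel_height * absolute):
            maxima.add(i)
    return sorted(maxima)
```

The positions and values returned by such a routine would then serve as starting parameters for the multi-peak Gaussian fit.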
For fits with residues >260 or fits with negative amplitudes, the fits were manually corrected, either by adjusting the starting parameters given to the algorithm or by using multiple Gaussian curves to fit the individual profile. For these proteins, the graphical representations of the profiles in the R-DeeP database are shown with a red frame.
Using the parameters of the individual Gaussian fits for the three replicates in the control and the RNase-treated samples, the significance of the difference between the control and the RNase Gaussian fit at each maximum was calculated using a Student’s t-test that was corrected for multiple testing using the False Discovery Rate (FDR).
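The multiple-testing correction can be illustrated with the Benjamini-Hochberg procedure. That this specific procedure was used is an assumption: the text only states that an FDR correction was applied, and in R the option method="fdr" of p.adjust is an alias for Benjamini-Hochberg. The Python function below is a hypothetical sketch that takes p-values already obtained from the t-tests.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

A maximum would then count as significantly different between the control and RNase-treated samples if its adjusted p-value is below 0.05.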
The fitting parameters of the mean profile of three replicates were used to calculate the area under the curve at each peak, which represents the amount of protein found at this position (fraction) in the gradient. Calculating the amplitude of the mean RNase fit curve at the maximum of the mean control fit curve, and the amplitude of the mean control fit curve at the maximum of the mean RNase fit curve, allowed us to determine the loss at the maxima of the control protein profiles and the gain at the maxima of the RNase-treated protein profiles. Altogether, we defined a set of parameters to describe a shift between the position of the protein in the control sample and its position in the RNase-treated sample. These parameters are:
the respective positions of the maxima in the control and RNase samples
the protein amounts associated with each maximum
gain and loss at the maxima of the control and RNase protein profile
the significance (FDR corrected p-value) of the difference between the values of the fits in the control and RNase-treated samples
the distance between the maxima in the control and RNase-treated samples.
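The quantities behind these parameters can be sketched with a few helper functions. The Python code below is an illustration only. The function names are hypothetical, each peak is assumed to be described by a tuple (amplitude, mean, sigma), and sigma is treated as the standard deviation of a standard Gaussian, which is an assumption about the parameterization of the fit.

```python
from math import exp, pi, sqrt


def gaussian(x, amplitude, mean, sigma):
    """Value of a single Gaussian peak at position x (fraction number)."""
    return amplitude * exp(-((x - mean) ** 2) / (2 * sigma ** 2))


def multi_gaussian(x, peaks):
    """Value at x of a fitted curve made of several (amplitude, mean, sigma) peaks."""
    return sum(gaussian(x, *peak) for peak in peaks)


def peak_area(amplitude, sigma):
    """Area under one Gaussian peak, used as the protein amount at that peak."""
    return amplitude * abs(sigma) * sqrt(2 * pi)
```

Evaluating multi_gaussian for the RNase-treated fit at the position of a control maximum, and for the control fit at the position of an RNase maximum, gives the curve heights from which loss and gain are derived.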
Finally, specific criteria were applied to identify RNA-dependent proteins:
distance between maxima of control and RNase samples > 1 fraction
significance: FDR-corrected p-value < 0.05 at the control peak or the RNase peak
Shifts with negative loss or gain values were removed.
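The three criteria can be combined into a single filter, sketched here in Python. The function and argument names are hypothetical. Positions are given in fractions, and the p-values are assumed to be already FDR-corrected.

```python
def is_rna_dependent_shift(control_pos, rnase_pos, p_control, p_rnase,
                           loss, gain, min_distance=1.0, alpha=0.05):
    """Return True if a shift between two maxima passes all three criteria."""
    # Criterion 1: the maxima must be more than one fraction apart.
    if abs(control_pos - rnase_pos) <= min_distance:
        return False
    # Criterion 2: significant difference at the control or the RNase peak.
    if not (p_control < alpha or p_rnase < alpha):
        return False
    # Criterion 3: shifts with negative loss or gain values are removed.
    if loss < 0 or gain < 0:
        return False
    return True
```

A protein with at least one shift passing this filter would be classified as RNA-dependent in this sketch.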
DATA AND SOFTWARE AVAILABILITY
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (Vizcaino et al., 2014) through the PRIDE partner repository with the PX accession number PXD010484. The dataset of raw images is available through Mendeley Data at http://dx.doi.org/10.17632/7gzxpzp9fj.1.
ADDITIONAL RESOURCES
The database on RNA-dependent proteins providing several search and download options is available at http://R-DeeP.dkfz.de.
Highlights.
R-DeeP implements the RNA dependence concept using density gradient centrifugation
Proteome-wide, specific and quantitative analysis of 1784 RNA-dependent proteins
Comprehensive online resource including 537 proteins never linked to RNA before
Quantitative RNA-dependence uncovers RNA impact on CTCF recruitment to chromatin
ACKNOWLEDGEMENTS
First, the authors would like to thank the CORUM database team for providing and maintaining this important resource. We are grateful to Prof. Dr. Ute Kothe, University of Lethbridge, Canada, for helpful discussions and constructive feedback on this study. We thank the following Core Facilities of the German Cancer Research Center (DKFZ): The Light Microscopy Unit, the team of the Vector and Clone Repository, the HUSAR team (GPCF DKFZ) and Erik Bernstein from the Core Facility for Information Technology. We also thank Matthias Groß, Annette Weninger and Karin Klimo for their help with western blotting and Bioanalyzer analysis. This research was partially funded by the DKFZ NCT3.0 Integrative Project in Cancer Research (NCT3.0_2015.54 DysregPT) grant (to S.D.) and the National Institutes of Health (NIH) (T32 GM008704 to S.F.R. and R35 GM119455 to A.N.K.).
Footnotes
DECLARATION OF INTERESTS
The authors declare no competing financial interests. S.D. is co-owner of siTOOLs Biotech GmbH, Martinsried, Germany.
SUPPLEMENTAL INFORMATION
Supplemental Information includes seven figures, one table and one User Guide for the R-DeeP database.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
REFERENCES
- Agarwal H, Reisser M, Wortmann C, and Gebhardt JCM (2017). Direct Observation of Cell-Cycle-Dependent Interactions between CTCF and Chromatin. Biophys J 112, 2051–2055.
- Akerman M, Fregoso OI, Das S, Ruse C, Jensen MA, Pappin DJ, Zhang MQ, and Krainer AR (2015). Differential connectivity of splicing activators and repressors to the human spliceosome. Genome Biol 16, 119.
- Asencio C, Chatterjee A, and Hentze MW (2018). Silica-based solid-phase extraction of cross-linked nucleic acid-bound proteins. Life Science Alliance 1, e201800088.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25, 25–29.
- Baltz AG, Munschauer M, Schwanhausser B, Vasile A, Murakawa Y, Schueler M, Youngs N, Penfold-Brown D, Drew K, Milek M, et al. (2012). The mRNA-bound proteome and its global occupancy profile on protein-coding transcripts. Mol Cell 46, 674–690.
- Bao X, Guo X, Yin M, Tariq M, Lai Y, Kanwal S, Zhou J, Li N, Lv Y, Pulido-Quetglas C, et al. (2018). Capturing the interactome of newly transcribed RNA. Nat Methods 15, 213–220.
- Beckmann BM, Horos R, Fischer B, Castello A, Eichelbaum K, Alleaume AM, Schwarzl T, Curk T, Foehr S, Huber W, et al. (2015). The RNA-binding proteomes from yeast to man harbour conserved enigmRBPs. Nat Commun 6, 10127.
- Brannan KW, Jin W, Huelga SC, Banks CA, Gilmore JM, Florens L, Washburn MP, Van Nostrand EL, Pratt GA, Schwinn MK, et al. (2016). SONAR Discovers RNA-Binding Proteins from Analysis of Large-Scale Protein-Protein Interactomes. Mol Cell 64, 282–293.
- Castello A, Fischer B, Eichelbaum K, Horos R, Beckmann BM, Strein C, Davey NE, Humphreys DT, Preiss T, Steinmetz LM, et al. (2012). Insights into RNA biology from an atlas of mammalian mRNA-binding proteins. Cell 149, 1393–1406.
- Castello A, Fischer B, Frese CK, Horos R, Alleaume AM, Foehr S, Curk T, Krijgsveld J, and Hentze MW (2016). Comprehensive Identification of RNA-Binding Domains in Human Cells. Mol Cell 63, 696–710.
- Castello A, Fischer B, Hentze MW, and Preiss T (2013). RNA-binding proteins in Mendelian disease. Trends Genet 29, 318–327.
- Chu C, Spitale RC, and Chang HY (2015). Technologies to probe functions and mechanisms of long noncoding RNAs. Nat Struct Mol Biol 22, 29–35.
- Conrad T, Albrecht AS, de Melo Costa VR, Sauer S, Meierhofer D, and Orom UA (2016). Serial interactome capture of the human cell nucleus. Nat Commun 7, 11212.
- ENCODE Project Consortium, Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, et al. (2007). Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816.
- Cook KB, Kazan H, Zuberi K, Morris Q, and Hughes TR (2011). RBPDB: a database of RNA-binding specificities. Nucleic Acids Res 39, D301–308.
- Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, Tanzer A, Lagarde J, Lin W, Schlesinger F, et al. (2012). Landscape of transcription in human cells. Nature 489, 101–108.
- Dreyfuss G, Kim VN, and Kataoka N (2002). Messenger-RNA-binding proteins and the messages they carry. Nat Rev Mol Cell Biol 3, 195–205.
- Eng JK, Jahan TA, and Hoopmann MR (2013). Comet: an open-source MS/MS sequence database search tool. Proteomics 13, 22–24.
- Engreitz J, Lander ES, and Guttman M (2015). RNA antisense purification (RAP) for mapping RNA interactions with chromatin. Methods Mol Biol 1262, 183–197.
- Gagnon KT, Li L, Janowski BA, and Corey DR (2014). Analysis of nuclear RNA interference in human cells by subcellular fractionation and Argonaute loading. Nat Protoc 9, 2045–2060.
- Garcin ED (2018). GAPDH as a model non-canonical AU-rich RNA binding protein. Semin Cell Dev Biol.
- Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, et al. (2004). Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5, R80.
- Gerstberger S, Hafner M, and Tuschl T (2014). A census of human RNA-binding proteins. Nat Rev Genet 15, 829–845.
- Glisovic T, Bachorik JL, Yong J, and Dreyfuss G (2008). RNA-binding proteins and posttranscriptional gene regulation. FEBS Lett 582, 1977–1986.
- Gutschner T, and Diederichs S (2012). The hallmarks of cancer: a long non-coding RNA point of view. RNA Biol 9, 703–719.
- He C, Sidoli S, Warneford-Thomson R, Tatomer DC, Wilusz JE, Garcia BA, and Bonasio R (2016). High-Resolution Mapping of RNA-Binding Regions in the Nuclear Proteome of Embryonic Stem Cells. Mol Cell 64, 416–430.
- Hendrickson DG, Kelley DR, Tenen D, Bernstein B, and Rinn JL (2016). Widespread RNA binding by chromatin-associated proteins. Genome Biol 17, 28.
- Hock J, Weinmann L, Ender C, Rudel S, Kremmer E, Raabe M, Urlaub H, and Meister G (2007). Proteomic and functional analysis of Argonaute-containing mRNA-protein complexes in human cells. EMBO Rep 8, 1052–1060.
- Kishore S, Luber S, and Zavolan M (2010). Deciphering the role of RNA-binding proteins in the post-transcriptional control of gene expression. Brief Funct Genomics 9, 391–404.
- Kung JT, Kesner B, An JY, Ahn JY, Cifuentes-Rojas C, Colognori D, Jeon Y, Szanto A, del Rosario BC, Pinter SF, et al. (2015). Locus-specific targeting to the X chromosome revealed by the RNA interactome of CTCF. Mol Cell 57, 361–375.
- Kwon SC, Yi H, Eichelbaum K, Fohr S, Fischer B, You KT, Castello A, Krijgsveld J, Hentze MW, and Kim VN (2013). The RNA-binding protein repertoire of embryonic stem cells. Nat Struct Mol Biol 20, 1122–1130.
- Lukong KE, Chang KW, Khandjian EW, and Richard S (2008). RNA-binding proteins in human genetic disease. Trends Genet 24, 416–425.
- Lunde BM, Moore C, and Varani G (2007). RNA-binding proteins: modular design for efficient function. Nat Rev Mol Cell Biol 8, 479–490.
- Meister G, Landthaler M, Peters L, Chen PY, Urlaub H, Luhrmann R, and Tuschl T (2005). Identification of novel argonaute-associated proteins. Curr Biol 15, 2149–2155.
- Moore MJ (2005). From birth to death: the complex lives of eukaryotic mRNAs. Science 309, 1514–1518.
- Mullari M, Lyon D, Jensen LJ, and Nielsen ML (2017). Specifying RNA-Binding Regions in Proteins by Peptide Cross-Linking and Affinity Purification. J Proteome Res 16, 2762–2772.
- Ohlsson R, Bartkuhn M, and Renkawitz R (2010). CTCF shapes chromatin by multiple mechanisms: the impact of 20 years of CTCF research on understanding the workings of chromatin. Chromosoma 119, 351–360.
- Pancaldi V, and Bahler J (2011). In silico characterization and prediction of global protein-mRNA interactions in yeast. Nucleic Acids Res 39, 5826–5836.
- Perez-Perri JI, Rogell B, Schwarzl T, Stein F, Zhou Y, Rettel M, Brosig A, and Hentze MW (2018). Discovery of RNA-binding proteins and characterization of their dynamic responses by enhanced RNA interactome capture. Nat Commun 9, 4408.
- Queiroz RML, Smith T, Villanueva E, Marti-Solano M, Monti M, Pizzinga M, Mirea DM, Ramakrishna M, Harvey RF, Dezi V, et al. (2019). Comprehensive identification of RNA-protein interactions in any organism using orthogonal organic phase separation (OOPS). Nat Biotechnol 37, 169–178.
- Ray D, Kazan H, Cook KB, Weirauch MT, Najafabadi HS, Li X, Gueroussov S, Albu M, Zheng H, Yang A, et al. (2013). A compendium of RNA-binding motifs for decoding gene regulation. Nature 499, 172–177.
- Roth A, and Diederichs S (2015). Molecular biology: Rap and chirp about X inactivation. Nature 521, 170–171.
- Ruepp A, Waegele B, Lechner M, Brauner B, Dunger-Kaltenbach I, Fobo G, Frishman G, Montrone C, and Mewes HW (2010). CORUM: the comprehensive resource of mammalian protein complexes--2009. Nucleic Acids Res 38, D497–501.
- Rusin SF, Schlosser KA, Adamo ME, and Kettenbach AN (2015). Quantitative phosphoproteomics reveals new roles for the protein phosphatase PP6 in mitotic cells. Sci Signal 8, rs12.
- Saldana-Meyer R, Gonzalez-Buendia E, Guerrero G, Narendra V, Bonasio R, Recillas-Targa F, and Reinberg D (2014). CTCF regulates the human p53 gene through direct interaction with its natural antisense transcript, Wrap53. Genes Dev 28, 723–734.
- Schmitt AM, and Chang HY (2016). Long Noncoding RNAs in Cancer Pathways. Cancer Cell 29, 452–463.
- Schneider CA, Rasband WS, and Eliceiri KW (2012). NIH Image to ImageJ: 25 years of image analysis. Nat Methods 9, 671–675.
- Senko MW, Remes PM, Canterbury JD, Mathur R, Song Q, Eliuk SM, Mullen C, Earley L, Hardman M, Blethrow JD, et al. (2013). Novel parallelized quadrupole/linear ion trap/Orbitrap tribrid mass spectrometer improving proteome coverage and peptide identification rates. Anal Chem 85, 11710–11714.
- Smedley D, Haider S, Durinck S, Pandini L, Provero P, Allen J, Arnaiz O, Awedh MH, Baldock R, Barbiera G, et al. (2015). The BioMart community portal: an innovative alternative to large, centralized data repositories. Nucleic Acids Res 43, W589–598.
- Smirnov A, Forstner KU, Holmqvist E, Otto A, Gunster R, Becher D, Reinhardt R, and Vogel J (2016). Grad-seq guides the discovery of ProQ as a major small RNA-binding protein. Proc Natl Acad Sci U S A 113, 11591–11596.
- Sun S, Del Rosario BC, Szanto A, Ogawa Y, Jeon Y, and Lee JT (2013). Jpx RNA activates Xist by evicting CTCF. Cell 153, 1537–1551.
- Sundararaman B, Zhan L, Blue SM, Stanton R, Elkins K, Olson S, Wei X, Van Nostrand EL, Pratt GA, Huelga SC, et al. (2016). Resources for the Comprehensive Discovery of Functional RNA Elements. Mol Cell 61, 903–913.
- Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al. (2015). STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43, D447–452.
- Teif VB, Vainshtein Y, Caudron-Herger M, Mallm JP, Marth C, Hofer T, and Rippe K (2012). Genome-wide nucleosome positioning during embryonic stem cell development. Nat Struct Mol Biol 19, 1185–1192.
- Theillet FX, Kalmar L, Tompa P, Han KH, Selenko P, Dunker AK, Daughdrill GW, and Uversky VN (2013). The alphabet of intrinsic disorder: I. Act like a Pro: On the abundance and roles of proline residues in intrinsically disordered proteins. Intrinsically Disord Proteins 1, e24360.
- Treiber T, Treiber N, Plessmann U, Harlander S, Daiss JL, Eichner N, Lehmann G, Schall K, Urlaub H, and Meister G (2017). A Compendium of RNA-Binding Proteins that Regulate MicroRNA Biogenesis. Mol Cell 66, 270–284 e213.
- Trendel J, Schwarzl T, Horos R, Prakash A, Bateman A, Hentze MW, and Krijgsveld J (2018). The Human RNA-Binding Proteome and Its Dynamics during Translational Arrest. Cell.
- Urdaneta EC, Vieira-Vieira CH, Hick T, Wessels HH, Figini D, Moschall R, Medenbach J, Ohler U, Granneman S, Selbach M, et al. (2019). Purification of cross-linked RNA-protein complexes by phenol-toluol extraction. Nat Commun 10, 990.
- Vizcaino JA, Deutsch EW, Wang R, Csordas A, Reisinger F, Rios D, Dianes JA, Sun Z, Farrah T, Bandeira N, et al. (2014). ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat Biotechnol 32, 223–226.
- Ziebarth JD, Bhattacharya A, and Cui Y (2013). CTCFBSDB 2.0: a database for CTCF-binding sites and genome organization. Nucleic Acids Res 41, D188–194.