Abstract
RNA-binding proteins (RBPs) regulate diverse cellular processes by dynamically interacting with RNA targets. However, effective methods to capture both stable and transient interactions between RBPs and their RNA targets are still lacking, especially when the interaction is dynamic or samples are limited. Here we present an assay of reverse transcription-based RBP binding site sequencing (ARTR-seq), which relies on in situ reverse transcription of RBP-bound RNAs guided by antibodies to identify RBP binding sites. ARTR-seq avoids ultraviolet crosslinking and immunoprecipitation, allowing for efficient and specific identification of RBP binding sites from as few as 20 cells or a tissue section. Taking advantage of rapid formaldehyde fixation, ARTR-seq enables capturing the dynamic RNA binding by RBPs over a short period of time, as demonstrated by the profiling of dynamic RNA binding of G3BP1 during stress granule assembly on a timescale as short as 10 minutes.
Subject terms: RNA sequencing, RNA
This work introduces ARTR-seq for in situ measurement of RNA-binding protein (RBP) binding sites, which has been demonstrated in a small number of cells and for capturing dynamic RBP binding within short timeframes.
Main
RBPs dynamically interact with their RNA targets to regulate RNA fate in all aspects, including transcription, splicing, modification, localization, translation and degradation1. The dysfunction of RBPs or their binding to RNA substrates can lead to various defects or even diseases. Effective methods to capture RBP–RNA interactions, particularly dynamic or even transient interactions, are critical for a better understanding of RBP and its functional effect on target RNAs2.
The widely used approaches to identify RBP targets are based on immunoprecipitation (IP) of the specific RBP along with its bound RNAs, either through direct RNA IP (RIP) or crosslinking IP (CLIP) assisted by covalent capture3–15. Substrate RNAs bound by a specific RBP can be enriched through either RIP or CLIP using the antibody against the RBP, followed by high-throughput sequencing (seq) to profile RBP targets across the whole transcriptome. CLIP-seq captures RBP binding sites on substrate RNAs via covalent crosslinking. RNase treatment digests RBP-free regions of RNAs, increasing the resolution of binding site detection7–10,14,15. CLIP-seq variants such as PAR-CLIP or eCLIP improve the crosslinking efficiency, specificity or binding site resolution7,9. While effective and widely used, these methods also have limitations. They often require a large amount of starting materials due to the low IP efficiency; the ultraviolet (UV) crosslinking in CLIP-based methods is a low-efficiency chemical reaction. Recently reported tRIP-seq and LACE-seq can be applied in low-input samples but at the cost of reducing the library complexity12,13.
TRIBE- and STAMP-type approaches fuse RBPs with an RNA base editor to introduce mutations near RBP binding sites, bypassing IP to identify RBP binding sites16–21. These methods can be readily applied to study RBP binding in live cells and with limited materials down to the single-cell level. Their deployment in research has offered new opportunities; however, these editing-based methods still have limitations. They require genome manipulation by inserting base editing proteins in germlines or cell lines, hindering their application in primary cells and tissues. Inducing expression of the editing protein typically takes roughly 24 hours or longer, which makes these methods unsuitable for monitoring dynamic RNA binding by RBPs. These base editors have their own sequence preferences, potentially changing the native binding profile of the target RBP. While we were working on our method, RT&Tag, a method derived from the CUT&Tag strategy, was published22,23. This method profiles RBP–RNA interaction by oligo(dT) primer-initiated reverse transcription (RT) and Tn5 tagmentation of the resulting full-length RNA–complementary DNA (cDNA) heteroduplex in isolated nuclei. RT&Tag can identify RBP binding in polyadenylated RNAs but is ineffective for nonpolyadenylated RNAs and for cytoplasmic RBP binding. Due to the low efficiency of the Tn5 enzyme on the heteroduplex, it requires 25,000–100,000 nuclei to obtain sufficient transcriptome-wide binding signals.
To overcome the limitations of existing methods, we introduce an assay of RT-based RBP binding site sequencing (ARTR-seq) to capture RBP–RNA interactions through in situ RT. We demonstrate that ARTR-seq sensitively profiles RBP targets with good sequencing quality, using as few as 20 cells or a single tissue section. Additionally, an imaging step can be readily built into the ARTR-seq procedure, providing direct spatial information on RBPs. With ARTR-seq, we show distinct binding patterns of splicing factors and the YTH family reader proteins of RNA N6-methyladenosine (m6A) modification. ARTR-seq unbiasedly detects RNA binding by RBPs in both cytoplasm and nucleus and measures RBP binding strength on RNA substrates. Furthermore, ARTR-seq can monitor dynamic RNA binding by G3BP1 during stress granule (SG) assembly on a timescale as short as 10 minutes.
Results
Strategy and development of ARTR-seq
In ARTR-seq, we started with rapid formaldehyde fixation to preserve the cellular structure, followed by permeabilization of cell membranes (Fig. 1a(i)). We then targeted the reverse transcriptase (RTase) to the RBP of interest using corresponding antibodies (Fig. 1a(ii)). This involved delivering the primary antibody for RBP recognition (Fig. 1a(ii)1), followed by a secondary antibody to enhance the local antibody concentration, capitalizing on the potential for multiple secondary antibodies to bind a single primary antibody (Fig. 1a(ii)2). Subsequently, a fusion protein of protein A/G and RTase (pAG-RTase) was delivered to bind both primary and secondary antibodies, enabling site-specific attachment of RTase to the target RBP (Fig. 1a(ii)3). Each step was followed by thorough washing to remove any unbound antibodies or pAG-RTase.
After localizing RTase to the RBP, we initiated in situ RT at RBP binding sites by adding necessary RT components (Fig. 1a(iii)). To achieve efficient RT, we screened three commonly used RTases, including engineered Moloney murine leukemia virus (MMLV) RTase24,25, human immunodeficiency virus RTase and a truncated version of engineered MMLV RTase (25–497) in the pAG-RTase fusion constructs with a 30-amino-acid linker (Extended Data Fig. 1a,b). By employing RT with quantitative polymerase chain reaction (RT–qPCR), we confirmed pAG-MMLV RTase (25–497) as the most active and selected it for subsequent studies (Fig. 1b and Extended Data Fig. 1c).
To identify all RBP binding sites without sequence bias, we applied random RT primers with an adapter tag for library construction, and extended the primer length from commonly used 6 nucleotides (nts) to 10 nts to enhance RT efficiency (Extended Data Fig. 1d). For effective cDNA enrichment, biotinylated dNTPs were introduced into cDNA products. After screening, we found that biotin-16-dUTP and biotin-16-dCTP exhibited the least hindrance on RT efficiency (Extended Data Fig. 1e). These were included in a 1:1 ratio with regular dTTP and dCTP, respectively, in the current ARTR-seq protocol. Following cDNA enrichment with streptavidin beads, we performed adapter ligation, library amplification and high-throughput sequencing to acquire the RBP binding profile (Fig. 1a(iv)). Note that after in situ RT, immunofluorescence imaging could be performed to reveal RBP subcellular localization without disturbing the subsequent library construction if the secondary antibody and pAG-RTase are fluorophore-modified.
Validation of ARTR-seq using PTBP1
To evaluate ARTR-seq, we applied it to PTBP1, a well-studied splicing factor with a variety of published CLIP-seq datasets for comparison. To verify the production of biotinylated cDNAs from in situ RT, we monitored the biotin group in the cDNA products by dot blot, confirming the incorporation of biotin and the requirement for pAG-RTase and primary antibody for successful cDNA synthesis (Fig. 1c). With immunofluorescence staining, we further validated the colocalization of pAG-RTase, the secondary antibody and newly synthesized cDNA, and their signals largely disappeared on exclusion of the primary antibody, supporting the localized RT reaction performed by pAG-RTase tethered to the targeted RBP (Fig. 1d and Extended Data Fig. 1f). Note that the use of the secondary antibody increased the biotinylated cDNA yield (Fig. 1d and Extended Data Fig. 1f,g). Altogether, ARTR-seq specifically and effectively reverse transcribes RNAs near the targeted protein into biotinylated cDNA products.
We next tested ARTR-seq on PTBP1 using 40,000 HepG2 or HeLa cells, and compared the results with the published data from several known methods, namely CLIP, iCLIP, irCLIP, eCLIP, sCLIP, tRIP, LACE-seq and RT&Tag9–13,22,26,27. We observed that ARTR-seq displayed a comparable or higher percentage of usable reads compared to published methods, indicating a high complexity of the ARTR-seq libraries (Extended Data Fig. 2a,b). Then, we calculated the correlation between biological replicates (R = 0.98 for both HepG2 and HeLa samples), and confirmed good reproducibility of ARTR-seq (Fig. 2a).
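The replicate correlation reported above can be illustrated with a minimal sketch. It assumes a Pearson coefficient computed on log-transformed gene-level read counts; the text does not state which coefficient or transformation was used, and the function name and toy counts below are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical gene-level read counts for two biological replicates.
replicate_1 = [120, 15, 980, 43, 0, 310]
replicate_2 = [131, 11, 1012, 39, 2, 287]

# Assumption: counts are log2-transformed with a pseudocount of 1 before correlating.
log_rep_1 = [math.log2(c + 1) for c in replicate_1]
log_rep_2 = [math.log2(c + 1) for c in replicate_2]

print(f"R = {pearson_r(log_rep_1, log_rep_2):.3f}")
```

The log transformation keeps a few highly covered genes from dominating the coefficient; a rank-based coefficient would be a reasonable alternative.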
Further, we introduced input samples prepared by ARTR-seq with the omission of the primary antibody as controls to help filter out potential background signals from the nonspecific binding of the secondary antibody and RTase (Extended Data Fig. 2c). For PTBP1, we found that over 70% of usable reads and over 80% of ARTR-seq peaks were annotated to introns, with most exon peaks located within the 3′ untranslated region (3′ UTR), consistent with results reported by other methods10,12,13,26–29 (Fig. 2b and Extended Data Fig. 2d,e). The consensus motif of PTBP1 ARTR-seq peaks was identified as the canonical CU-enriched sequence, as known previously30 (Fig. 2b). At the whole-transcriptome scale, ARTR-seq reads for PTBP1 piled up at the eCLIP peaks, while the input sample did not show such accumulation31 (Extended Data Fig. 3a,b). Additionally, we observed that more than 50% of genes identified by ARTR-seq were also detected by other methods (52% for eCLIP, 51% for LACE-seq and 82% for iCLIP). At the peak level, ARTR-seq successfully identified 41% of eCLIP-targeted peaks (Extended Data Fig. 3c). Examination of individual PTBP1 binding sites revealed similar read distribution and density between ARTR-seq and eCLIP or iCLIP results (Fig. 2c and Extended Data Fig. 3d). To further validate PTBP1 bindings captured by ARTR-seq, we knocked down PTBP1 in HepG2 cells using two distinct small-interfering RNAs (siRNAs) and performed ARTR-seq (Extended Data Fig. 3e). The reads located around the ARTR-seq peaks reduced accordingly on PTBP1 knockdown, indicating the high specificity of ARTR-seq (Fig. 2d).
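The gene-level comparison with other methods amounts to a set overlap. The sketch below shows one way such a percentage could be computed, with the query set as the denominator. Gene names and the function name are hypothetical, and the actual analysis pipeline is not described in this section.

```python
def overlap_percentage(query_genes, reference_genes):
    """Percentage of query genes that also appear in the reference set."""
    query = set(query_genes)
    if not query:
        raise ValueError("query gene set is empty")
    shared = query & set(reference_genes)
    return 100.0 * len(shared) / len(query)

# Toy gene sets (hypothetical, for illustration only).
artr_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
eclip_genes = {"GENE_B", "GENE_C", "GENE_E"}

print(overlap_percentage(artr_genes, eclip_genes))  # 50.0
```

The choice of denominator matters: swapping the arguments gives the share of reference targets that were recovered, which corresponds to the peak-level statement about eCLIP peaks.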
Direct versus indirect binding sites detected by ARTR-seq
ARTR-seq identifies RBP binding by in situ RT, enabling the capture of RNAs directly bound by the RBP (direct targets) or potentially those spatially close to the RBP (indirect targets) (Extended Data Fig. 4a). To evaluate direct versus indirect targets, we used the splicing factor RBFOX2 as an example; RBFOX2 possesses a well-defined canonical binding motif ‘UGCAUG’9,31. Peaks near the UGCAUG motifs likely represent direct targets, while those farther away may indicate indirect targets. We found more than 70% of ARTR-seq peaks were within 500 nts from UGCAUG. This percentage is slightly higher than that of eCLIP9. The two methods were comparable when the distance from peaks to UGCAUG was within 200 nts (Extended Data Fig. 4b). It is worth noting that RBFOX2 may have other noncanonical binding sites beyond the UGCAUG motif, as suggested by the similar percentage of distant RBFOX2 eCLIP peaks from this motif. Stringent cutoffs of signal values and q values for peaks increased confidence in identifying the direct targets, albeit at the expense of target numbers (Extended Data Fig. 4c,d). Furthermore, we also examined YTHDF2, an m6A binding protein32. Approximately 80% of YTHDF2 ARTR-seq peaks were within 300 nts from m6A sites identified by m6A-SAC-seq33, comparable to that from the PAR-CLIP method32 (Extended Data Fig. 4e). These results indicate that the indirect interactions captured in ARTR-seq are likely limited. The percentage of direct targets identified by ARTR-seq is comparable to those observed in CLIP-based methods.
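The motif-distance criterion used above to separate likely direct from indirect targets can be sketched as follows. This is a simplified illustration on a single transcript sequence, assuming the distance is measured from each peak center to the start of the nearest UGCAUG match; the exact distance definition, the function names and the toy coordinates are assumptions.

```python
from bisect import bisect_left

MOTIF = "UGCAUG"

def motif_starts(sequence, motif=MOTIF):
    """Return sorted 0-based start positions of every (overlapping) motif match."""
    starts = []
    pos = sequence.find(motif)
    while pos != -1:
        starts.append(pos)
        pos = sequence.find(motif, pos + 1)
    return starts

def distance_to_nearest(position, sorted_sites):
    """Absolute distance from position to the closest site; None if no sites."""
    if not sorted_sites:
        return None
    i = bisect_left(sorted_sites, position)
    # Only the neighbors on either side of the insertion point can be closest.
    candidates = sorted_sites[max(i - 1, 0):i + 1]
    return min(abs(position - s) for s in candidates)

def fraction_within(peak_centers, sorted_sites, max_distance):
    """Fraction of peak centers lying within max_distance nt of any site."""
    if not peak_centers:
        raise ValueError("no peaks supplied")
    hits = 0
    for center in peak_centers:
        d = distance_to_nearest(center, sorted_sites)
        if d is not None and d <= max_distance:
            hits += 1
    return hits / len(peak_centers)

# Toy transcript with a single motif at position 100 (hypothetical data).
transcript = "A" * 100 + MOTIF + "A" * 894
sites = motif_starts(transcript)
peak_centers = [90, 400, 650, 990]

print(fraction_within(peak_centers, sites, 500))  # 0.5
print(fraction_within(peak_centers, sites, 200))  # 0.25
```

The same helper applies to the YTHDF2 comparison if the sorted list of motif starts is replaced by sorted m6A site positions.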
To further interrogate potential indirect targets identified in ARTR-seq, we limited the movement range of RTase by shortening the linker in pAG-RTase or omitting the secondary antibody (Extended Data Fig. 5a–c). We found shorter linkers reduced RT activity of pAG-RTase, indicating that shorter linkers might lead to a slowdown in the RTase kinetics (Extended Data Fig. 5d). In RBFOX2 ARTR-seq, the use of shorter linkers or omitting the secondary antibody resulted in decreased biotinylated cDNA yields but slightly increased read accumulation at RBFOX2 ARTR-seq peaks, indicating reduced RT efficiency but concentrated signals (Extended Data Fig. 5e–g). Moreover, we observed a slightly higher percentage (1.9–3.4%) of peaks within 500 nts of UGCAUG with a shorter linker or omitting the secondary antibody (Extended Data Fig. 5h). These findings indicate that restricting the RTase movement range tested here moderately reduced potential indirect RNAs captured by ARTR-seq. Optimal RT efficiency is another factor that needs to be considered when designing linkers.
Resolution of ARTR-seq
To assess the resolution of ARTR-seq, we examined the distribution of RBFOX2 peak centers around UGCAUG sites, and observed a clear enrichment with most peaks positioned within 200 nts flanking the UGCAUG motif (Extended Data Fig. 6a). Furthermore, we conducted a parallel analysis on YTHDF2. Compared to RBFOX2, we observed a similar but more enriched distribution for YTHDF2 around the corresponding m6A sites, further supporting the capability of ARTR-seq in capturing RBP binding sites (Extended Data Fig. 6b).
In an attempt to improve the resolution of binding site identification by ARTR-seq, we evaluated the impact of RNase treatment on RBFOX2 ARTR-seq. As expected, the stronger RNase treatment reduced the library fragment lengths (Extended Data Fig. 6c). We observed that the stronger RNase treatment led to a sharper enrichment of RBFOX2 ARTR-seq peaks around UGCAUG sites, indicating an improved resolution upon RNase treatment (Extended Data Fig. 6d). Through quantification of biotinylated cDNA, we found that samples with stronger RNase treatment exhibited lower RT efficiency (Extended Data Fig. 6e). Moreover, stronger RNase treatment markedly reduced the proportion of peaks located within 500 nts of the canonical UGCAUG motif. This suggests that the application of RNase may reduce reads from direct targets, thereby potentially elevating the ratio of nonspecific or indirect binding signals (Extended Data Fig. 6f). Overall, our studies revealed that RNase treatment could improve ARTR-seq resolution. The strength of RNase treatment in ARTR-seq needs to be optimized to achieve the desired balance between resolution and sensitivity, especially for samples with limited starting materials.
ARTR-seq detects PTBP1 binding sites with as few as 20 cells
The in situ RT-based ARTR-seq bypasses the IP step to minimize sample loss, potentially making it feasible for low cell number samples. To test this, we generated libraries for PTBP1 using different numbers of HepG2 cells and compared the results with published data from LACE-seq and RT&Tag of low cell number samples13,22. The correlations remained strong for ARTR-seq libraries prepared from as few as 20 cells (Extended Data Fig. 7a). Additionally, ARTR-seq libraries exhibited a much higher percentage of usable reads compared to other methods when using comparable numbers of cells (Fig. 2e and Extended Data Fig. 7b,c). Furthermore, PTBP1 ARTR-seq presented a consistently high percentage of intronic reads, suggesting its effectiveness in capturing informative reads even with limited starting materials (Extended Data Fig. 7d). We further subsampled libraries to an equal sequencing depth and examined their read distribution at peaks identified in the corresponding bulk samples. Compared to LACE-seq, ARTR-seq exhibited a clearer accumulation at the peak center with a higher proportion of effective reads (Fig. 2f and Extended Data Fig. 7e). Visible ARTR-seq signal remained stable for libraries with different numbers of cells as exemplified in the Integrative Genomics Viewer (IGV) plot (Fig. 2g).
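Subsampling to an equal depth, as mentioned above, can be sketched in a few lines. The example assumes that every library is downsampled without replacement to the size of the smallest library; the target depth actually used is not stated here, and the library names and read identifiers are hypothetical.

```python
import random

def subsample_to_common_depth(libraries, seed=0):
    """Downsample every library to the size of the smallest one.

    libraries: dict mapping a library name to a list of reads.
    Sampling is without replacement and reproducible for a given seed.
    """
    if not libraries:
        raise ValueError("no libraries supplied")
    depth = min(len(reads) for reads in libraries.values())
    rng = random.Random(seed)
    return {name: rng.sample(reads, depth) for name, reads in libraries.items()}

# Hypothetical libraries; read identifiers stand in for aligned reads.
libraries = {
    "ARTR-seq_20_cells": [f"a{i}" for i in range(500)],
    "other_method": [f"b{i}" for i in range(1200)],
}
equalized = subsample_to_common_depth(libraries)
print({name: len(reads) for name, reads in equalized.items()})
```

Fixing the seed makes the comparison reproducible; in practice the same idea is usually applied to alignment files with a dedicated tool rather than to in-memory lists.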
Because PTBP1 binds to a canonical CU-enriched sequence, we compared the CT percentages in usable reads of PTBP1 libraries constructed by different methods. We found that all the ARTR-seq libraries showed comparable or higher CT percentages compared to that of other methods10,13,26–28 (Fig. 2h). We further assessed the read distribution around CU-enriched regions and observed the stable read accumulation in ARTR-seq libraries of all cell numbers, peaking at the region center (Fig. 2i). Taken together, ARTR-seq can effectively and specifically capture the RBP binding sites, even with limited starting materials.
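The CT percentage used above as a proxy for CU-rich binding can be computed directly from read sequences. The sketch below assumes that reads are given as DNA strings already oriented to the sense strand of the RNA; strand handling is an assumption not covered in this section, and the reads are toy examples.

```python
def ct_percentage(reads):
    """Percentage of C and T bases among all bases in the given reads."""
    total = sum(len(r) for r in reads)
    if total == 0:
        raise ValueError("no bases supplied")
    ct = sum(r.upper().count("C") + r.upper().count("T") for r in reads)
    return 100.0 * ct / total

# Toy reads (hypothetical): one CU-rich read and one purine-rich read.
reads = ["CTCT", "AAGG"]
print(ct_percentage(reads))         # 50.0
print(ct_percentage(["CCTTCTAG"]))  # 75.0
```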
Application of ARTR-seq in mouse embryo sections
RBPs can have strong tissue-specific expression, or are only expressed in certain tissues rather than cultured cells. Identifying RBP binding sites in tissues remains technically challenging34. IP-based methods require dissociating tissues into single cells for UV crosslinking, limiting their application to whole tissues, particularly embedded frozen tissues or formalin-fixed tissues. Editing-based methods require genetic modification and cannot be applied to patient tissues.
ARTR-seq offers an opportunity for identifying RBP binding sites in tissues. We studied RBFOX2 with a section of OCT-embedded E11 mouse embryo to validate the feasibility of ARTR-seq in tissue samples (Fig. 3a). We first confirmed the nuclear localization of RBFOX2 with the ARTR-seq built-in imaging (Fig. 3b). The ARTR-seq reads for mouse embryo tissue showed a high percentage of usable reads and good reproducibility between biological replicates (Supplementary Fig. 1a,b). Compared to the input, a higher percentage of usable reads from RBFOX2 ARTR-seq were mapped to introns, consistent with the known binding preference of RBFOX2 (ref. 31) (Supplementary Fig. 1c). RBFOX2 binding peaks were mostly located in introns and contained the canonical UGCAUG motif9 (Fig. 3c). Additionally, we observed that mouse tissue samples displayed a similar percentage of usable reads containing UGCAUG motifs to that of HepG2 cell samples, indicating comparable signal detection efficiency of ARTR-seq for tissues and cultured cells (Fig. 3d). Examination of individual binding sites further supported the recognition of UGCAUG by RBFOX2 (Fig. 3e). Overall, ARTR-seq can identify RBP binding sites in embedded tissue samples with high specificity.
ARTR-seq profiles regulatory features of splicing factors
PTBP1 and RBFOX2 are well-known splicing factors, with PTBP1 belonging to the heterogeneous ribonucleoprotein (hnRNP) family35. To show broader applicability of ARTR-seq, we also studied HNRNPC, another splicing factor belonging to the hnRNP family (Extended Data Fig. 8a). Consistent with the binding preference of the splicing factors, both reads (over 70%) and peaks (over 80%) from the ARTR-seq libraries of all three splicing factors (PTBP1, HNRNPC and RBFOX2) were mainly located in introns in HepG2 cells (Fig. 4a,b and Extended Data Fig. 8b). The RNA-binding motifs of RBFOX2 and HNRNPC were the canonical UGCAUG and U-rich sequences, respectively, consistent with the previous report31 (Fig. 4a,b).
To explore the association between splicing factor binding and splicing regulation, we identified the alternative splicing events by comparing the ENCODE (Encyclopedia of DNA Elements) RNA sequencing (RNA-seq) data from RBP-knockdown (KD) cells with those from control cells36. We found most alternative splicing events were categorized as exon skipping (Fig. 4c). We then generated ‘splicing maps’ for exon skipping events37 (Fig. 4d). The corresponding ARTR-seq peaks were predominantly enriched at upstream proximal introns of the included exons upon PTBP1-KD, at downstream proximal introns of the excluded exons upon RBFOX2-KD and at both upstream and downstream proximal introns of the included exons upon HNRNPC-KD, but not around native cassette exons and constitutive exons. We quantified relative RBP binding strength by ARTR-seq enrichment at the gene level, and observed that genes with higher ARTR-seq enrichment tend to present a higher splicing difference upon RBP-KD (Fig. 4e and Extended Data Fig. 8c). In addition to exon skipping, included retained introns upon PTBP1-KD (491 events) outnumbered other splicing modes. With further inspection, we found that higher enrichment corresponded to higher splicing inclusion differences of retained introns, similar to the trend observed for exon skipping instances (Extended Data Fig. 8d). Altogether, ARTR-seq robustly captures distinctive binding patterns for different splicing factors, and the ARTR-seq enrichment could indicate differences in splicing.
ARTR-seq identifies binding features of m6A reader proteins
In addition to sequence recognition, RBPs can also target RNAs in a chemical modification-dependent manner. m6A modification is the most prevalent chemical modification in mammalian messenger RNA (mRNA), and m6A reader proteins can preferentially bind m6A-modified RNAs to regulate their processing and metabolism in both the nucleus and cytoplasm32,38–41. We performed ARTR-seq for two cytosolic m6A readers, YTHDF1 and YTHDF2, and a nuclear reader, YTHDC1, in HeLa cells.
We first verified the subcellular localization of the three readers with ARTR-seq built-in imaging (Extended Data Fig. 9a). Sequencing data from ARTR-seq remained highly reproducible between replicates (Extended Data Fig. 9b). Over 80% of the peaks of the two cytoplasmic m6A readers (YTHDF1 and YTHDF2) were located in exons, whereas roughly 81% of the peaks of nuclear reader YTHDC1 were located in introns or intergenic regions, consistent with their distinct subcellular localization (Fig. 5a and Extended Data Fig. 9a,c). The high unique peak ratios observed for the three reader proteins (84.2% for YTHDC1, 34.3% for YTHDF1 and 47.5% for YTHDF2) are attributed to their unique subcellular localization; YTHDF1 and YTHDF2 display different sequences of the N-terminal low-complexity domains, which most likely affect their binding to different partner proteins and therefore different RNA targets42 (Extended Data Fig. 9d). We further investigated the much more abundant non-exonic peaks of YTHDC1, and found more than half of them located in repeat elements, with long interspersed nuclear elements (roughly 45%) being the most prevalent, consistent with a previous report41 (Fig. 5b). Analysis of exonic peak distribution along mRNA showed enrichment around stop codons for all these m6A readers, resembling the meta profile of m6A modifications, especially for YTHDF1 and YTHDF2 (ref. 33) (Fig. 5c and Extended Data Fig. 9e).
Further, we calculated the percentage of exonic peaks overlapping with m6A sites in polyadenylated RNAs identified by m6A-SAC-seq (ref. 33). The ARTR-seq peaks for all three readers showed higher percentages than random peaks, comparable to the YTHDF2 peaks from PAR-CLIP32, supporting the m6A-dependent binding features of these three readers (Fig. 5d). We then analyzed the association between the m6A fraction and RBP binding strength, and observed that the group with higher m6A fractions showed higher RBP enrichment signals for YTHDF1 and YTHDF2, further suggesting ARTR-seq can measure the relative binding strength of RBPs (Fig. 5e). However, the association for YTHDC1 was weaker, potentially due to the limited number of exonic YTHDC1 peaks (Extended Data Fig. 9f). Overall, ARTR-seq captures different features of three m6A binding proteins in cytoplasm and nucleus.
Dynamic RNA binding of G3BP1 during SG assembly
SGs are membraneless organelles composed of proteins and RNAs and formed in response to stress. The RBP G3BP1 is the central node in the network of protein–RNA interaction during SG assembly43,44. Under sodium arsenite (NaAsO2) treatment, SGs could be observed after 13 min with a progressive increase in size over time, with most of the SG assembly completed by 40 min, providing a rapid stress response45. However, whether RNA targets of G3BP1 vary during SG assembly has yet to be investigated.
Taking advantage of the potential high temporal resolution offered by fast formaldehyde fixation and low material requirements of ARTR-seq, we performed ARTR-seq for G3BP1 in HeLa cells with 0.5 mM NaAsO2 treatment and monitored the SG assembly process at time points of 0, 10, 20 and 60 min poststress. We first visualized G3BP1 localization using immunofluorescence imaging, and confirmed the gradual condensation of G3BP1 into granules over time (Fig. 6a). The colocalization of G3BP1 and biotinylated cDNA products was further verified (Fig. 6b). Subsequently, the verified samples were used for ARTR-seq library construction and sequencing. We determined G3BP1 binding strength by calculating the ARTR-seq log2 fold change (log2FC) between G3BP1 and input samples at the gene level. Roughly 78% of G3BP1–RNA targets (log2FC ≥ 1, P < 0.05) were no longer enriched at 60 min (T60) post-NaAsO2 treatment (Fig. 6c). SG enrichment of RNA was previously assessed by sequencing RNAs isolated from NaAsO2-induced SGs to quantify their relative localization within SGs46. Through integrative analysis, we observed that G3BP1 targets at T60 showed notably higher SG enrichment compared to the starting point without stress (Fig. 6d). These results support the accuracy of ARTR-seq and reveal distinct RNA binding of G3BP1 in the presence and absence of stress. The functions of stress-induced G3BP1 targets (T60_only) were enriched in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of protein processing in the endoplasmic reticulum and human papillomavirus infection, consistent with previous observations47,48 (Fig. 6e).
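The gene-level binding strength described above is a log2 ratio of target signal over input. A minimal sketch follows, assuming counts-per-million normalization and a pseudocount of 1; neither choice is stated in this section, the statistical test behind the P value is omitted, and the gene names and counts are hypothetical.

```python
import math

def counts_per_million(counts):
    """Scale raw gene counts to counts per million reads in the library."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {gene: 1e6 * c / total for gene, c in counts.items()}

def log2_fold_change(target_counts, input_counts, pseudocount=1.0):
    """Gene-level log2(target / input) on depth-normalized counts."""
    target_cpm = counts_per_million(target_counts)
    input_cpm = counts_per_million(input_counts)
    genes = sorted(target_cpm.keys() & input_cpm.keys())
    return {
        g: math.log2((target_cpm[g] + pseudocount) / (input_cpm[g] + pseudocount))
        for g in genes
    }

# Hypothetical read counts per gene for a G3BP1 library and its input control.
g3bp1_counts = {"GENE_A": 300, "GENE_B": 100, "GENE_C": 600}
input_counts = {"GENE_A": 100, "GENE_B": 400, "GENE_C": 500}

lfc = log2_fold_change(g3bp1_counts, input_counts)
# Apply only the fold-change part of the target definition (log2FC >= 1).
enriched = [gene for gene, value in lfc.items() if value >= 1.0]
print(enriched)  # ['GENE_A']
```

A full analysis would add replicate-aware significance testing to cover the P < 0.05 criterion.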
To further explore the dynamic RNA targeting of G3BP1 over time, we calculated pairwise correlations of the G3BP1 binding strength among time points. The generally low correlation coefficients (R = 0.38–0.57) suggested distinct G3BP1 bindings at different time intervals (Extended Data Fig. 10a). RNAs were previously classified into SG-enriched RNAs and SG-depleted RNAs according to their SG enrichment46. We found that during SG assembly, G3BP1 binding strength from ARTR-seq gradually increased for SG-enriched RNAs and decreased for SG-depleted RNAs, suggesting a shift of G3BP1 targets toward SG-enriched RNAs (Fig. 6f,g). Some RNAs displayed stable G3BP1 binding, while others showed dynamic G3BP1 binding across time intervals (Fig. 6h and Extended Data Fig. 10b,c). We then grouped these RNAs based on G3BP1 binding strength using the fuzzy c-means clustering algorithm. We found that G3BP1 binding strength for these RNAs displayed not only unidirectional trajectories of increasing or decreasing, but also transient changes during 60 minutes of NaAsO2 treatment, suggesting rapid and dynamic cellular responses to stress (Fig. 6h,i and Extended Data Fig. 10d). Taken together, ARTR-seq unveiled the highly dynamic nature of G3BP1–RNA interactions during SG assembly, demonstrating its capability in tracking temporal changes of protein–RNA interactions with limited starting materials.
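To make the clustering step concrete, the following is a basic fuzzy c-means sketch in plain Python. It is not the implementation used for the analysis: the software and parameters are not given in this section, so the fuzzifier `m = 2`, the fixed iteration count, the deterministic initialization and the toy profiles are all assumptions.

```python
def _sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _memberships(points, centers, m):
    """Membership of each point in each cluster; every row sums to 1."""
    power = 1.0 / (m - 1.0)  # exponent applied to inverse squared distances
    rows = []
    for p in points:
        d2 = [_sq_dist(p, c) for c in centers]
        if min(d2) == 0.0:
            # The point sits exactly on one or more centers: share membership among them.
            on_center = [1.0 if d == 0.0 else 0.0 for d in d2]
            total = sum(on_center)
            rows.append([v / total for v in on_center])
        else:
            inv = [(1.0 / d) ** power for d in d2]
            total = sum(inv)
            rows.append([v / total for v in inv])
    return rows

def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=50):
    """Basic fuzzy c-means; returns (centers, memberships).

    Simplification: centers start at the first n_clusters points, whereas
    real analyses typically use random starts and a convergence check.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be > 1")
    centers = [list(p) for p in points[:n_clusters]]
    dim = len(points[0])
    for _ in range(n_iter):
        u = _memberships(points, centers, m)
        for i in range(n_clusters):
            weights = [row[i] ** m for row in u]
            total = sum(weights)
            if total > 0.0:
                centers[i] = [
                    sum(w * p[d] for w, p in zip(weights, points)) / total
                    for d in range(dim)
                ]
    return centers, _memberships(points, centers, m)

# Hypothetical binding-strength profiles across four time points.
profiles = [
    [0.0, 1.0, 2.0, 3.0],  # increasing
    [3.0, 2.0, 1.0, 0.0],  # decreasing
    [0.1, 1.1, 2.0, 3.2],  # increasing
    [3.1, 2.0, 0.9, 0.1],  # decreasing
]
centers, memberships = fuzzy_c_means(profiles, n_clusters=2)
labels = [row.index(max(row)) for row in memberships]
print(labels)  # [0, 1, 0, 1]
```

Unlike hard clustering, each RNA receives a graded membership in every cluster, which suits profiles that sit between a monotonic and a transient trajectory. Profiles are often standardized per gene before clustering so that shape, not magnitude, drives the grouping.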
Discussion
In this work, we present ARTR-seq, a method that captures RBP binding sites using in situ RT by antibody-located RTase. ARTR-seq demonstrated high sensitivity and specificity, even when using as few as 20 cells or limited tissues. The procedure is compatible with immunofluorescence imaging, providing direct spatial information of the targeted proteins without affecting downstream sequencing. With ARTR-seq, we observed the unique binding characteristics of PTBP1, RBFOX2 and HNRNPC related to their splicing regulatory roles. ARTR-seq also detected the preferences of m6A reader proteins, YTHDF1, YTHDF2 and YTHDC1. Furthermore, we showed dynamic RNA binding of G3BP1 during SG assembly.
One advantage of ARTR-seq is the use of in situ RT to bypass the antibody-based IP step, thereby reducing material loss. ARTR-seq is also highly versatile and applicable for cell lines, tissues, and even clinical formaldehyde-fixed samples. Although both methods were inspired by CUT&Tag49, ARTR-seq displays distinct advantages over the recently reported RT&Tag22. First, ARTR-seq uses random primers to unbiasedly capture local signals, while RT&Tag uses an oligo(dT) primer for RT, potentially losing signals from nonpolyadenylated RNAs. Additionally, RT&Tag may experience reduced local resolution due to uniform RT initiation from the poly-A tail and the long mature mRNA length (roughly 2,065 bp)50, leading to coverage bias toward the RNA 3′ end. Second, Tn5 tagmentation on the RNA–cDNA heteroduplex is less efficient, hindering the application of RT&Tag to limited starting materials. Third, ARTR-seq can be applied in various cellular compartments, whereas RT&Tag is limited to isolated nuclei.
Investigations of dynamic RBP binding have been hindered by low UV-crosslinking efficiency, long incubation time and high material demands using the existing methods. Benefiting from highly efficient formaldehyde crosslinking and low starting material requirements, ARTR-seq excels at capturing transient RBP binding across various time intervals. In this work, we have demonstrated its application in capturing dynamic RNA binding of G3BP1 during SG assembly on a timescale of 10 minutes. We envision that the high temporal resolution of ARTR-seq will enable the investigation of dynamic or even transient RBP–RNA interaction in many other events.
Limitations
The good quality of the primary antibody is a prerequisite for ARTR-seq. For those RBPs without good quality antibodies, ARTR-seq may not accurately capture RBP–RNA interactions. However, the availability of a suitable antibody is a common challenge faced by all antibody-based methods. To overcome this limitation, strategies such as knocking in a tag protein in frame with the targeted RBP or expressing the tagged RBP could be used.
Formaldehyde fixation preserves biological samples at a high temporal resolution, but limitations exist, such as perturbation of biomolecular condensates when protein–protein interaction dynamics are faster than the fixation rate51. Strategies to increase the fixation rate, such as increasing the formaldehyde concentration or moderately raising the fixation temperature, can mitigate such artifacts. Like most other methods, ARTR-seq may face challenges when applied to low-abundance RBPs. Approaches such as increasing starting materials or RBP overexpression could be used. Additionally, unlike the editing-based methods, which are compatible with long-read sequencing, ARTR-seq typically shows short fragment lengths (averaging around 60 bp), hindering the identification of isoform-specific binding patterns (Extended Data Fig. 6c). Last, the linker length needs to be optimized when detecting direct versus indirect targets using ARTR-seq, and RNase treatment could be considered to obtain higher resolution binding sites.
Methods
Cell culture and stress treatment
HeLa cells (American Type Culture Collection (ATCC), catalog no. CCL-2) and HepG2 cells (ATCC, catalog no. HB-8065) were purchased from ATCC and cultured in DMEM medium (Gibco) supplemented with 10% fetal bovine serum (Gibco) and penicillin-streptomycin (Gibco). K562 cells (ATCC, catalog no. CCL-243) were obtained from ATCC and cultured in RPMI 1640 Medium (Gibco) supplemented with 10% (v/v) fetal bovine serum, penicillin-streptomycin (Gibco) and 2 mM l-glutamine (Gibco). Cells were grown at 37 °C with 5% CO2. For NaAsO2 treatment, HeLa cells were grown to 90% confluence, and the medium was replaced with prewarmed DMEM medium containing 0.5 mM NaAsO2; the cells were then maintained at 37 °C with 5% CO2 for the indicated times.
Expression and purification of recombinant protein A/G-RTase
The recombinant plasmids were constructed by assembling the pET-28a vector, protein A/G (pAG), linkers of different lengths and RTase or the modified RTase with NEBuilder HiFi DNA Assembly Master Mix (NEB) or USER enzyme (NEB) following the manufacturer’s protocols. The protein A/G DNA segment was amplified from the pAG/MNase plasmid (Addgene, catalog no. 123461). The engineered MMLV RTase was modified from the pCMV-PE2 plasmid (Addgene, catalog no. 132775). The recombinant proteins were expressed in BL21(DE3) Competent Escherichia coli (NEB) with isopropyl-β-d-thiogalactoside induction at 16 °C for 18 h. Cells were collected by centrifugation at 5,500g for 10 min and lysed in a buffer of 50 mM Tris-HCl pH 7.5, 300 mM NaCl and 1 mM PMSF by sonication (10 s on, 10 s off) for 10 min at 4 °C. The recombinant proteins were purified from the supernatant using a HisTrap HP column (GE Healthcare), followed by an ion exchange chromatography column (GE Healthcare) on an AKTA Purifier 10 system (GE Healthcare) according to the manufacturer’s protocol, and then concentrated to about 20 mg ml−1. The purified enzyme was supplemented with 40% glycerol and stored at −80 °C for future use.
RT–qPCR
RNA was reverse transcribed with the purified pAG-RTases or commercial RTases in reaction buffer (50 mM Tris-HCl, 150 mM NaCl, pH 7.5) at 37 °C for 15 min, and denatured at 85 °C for 5 min. qPCR was performed with FastStart Essential DNA Green Master (Roche) on LightCycler 96 System (Roche). The efficiency of RT was quantified using the delta quantitation cycle method.
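As a worked illustration of the delta quantitation cycle calculation, the sketch below converts two Cq values into a relative RT efficiency. It assumes ideal PCR amplification (a doubling of product per cycle); the function name and the example Cq values are hypothetical and are not taken from this study.

```python
def relative_rt_efficiency(cq_sample: float, cq_reference: float) -> float:
    """Relative cDNA yield of a sample RTase versus a reference RTase.

    Assumes 100% PCR efficiency, so a one-cycle difference in Cq
    corresponds to a twofold difference in template amount.
    """
    delta_cq = cq_sample - cq_reference
    return 2.0 ** (-delta_cq)


# Hypothetical values: the sample crosses the threshold two cycles later
# than the reference, i.e. one quarter of the cDNA under this assumption.
print(relative_rt_efficiency(cq_sample=24.0, cq_reference=22.0))  # 0.25
```

If the amplification efficiency of a primer pair is known to deviate from a doubling per cycle, the base of the exponent would need to be adjusted accordingly.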
Protein detection by Coomassie brilliant blue stain and western blot
The mammalian cell samples were lysed with cold RIPA buffer (Thermo Fisher Scientific) containing 1× protease inhibitor cocktail (Roche). The cell lysate was cleared by centrifugation at 15,000g for 10 min at 4 °C. The supernatant or purified protein was then mixed with LDS loading buffer (Bio-Rad) and boiled at 95 °C for 10 min. Denatured protein was loaded onto a 4–12% NuPAGE Bis-Tris gel (Thermo Fisher Scientific). For Coomassie brilliant blue staining, the gel was stained with Imperial Protein Stain (Thermo Fisher Scientific) and imaged with a FluorChem R system (ProteinSimple). For the western blot, the protein was transferred from the gel to a polyvinylidene difluoride membrane. The membranes were blocked in 3% BSA (diluted in PBST (PBS with 0.1% Tween-20)) for 1 h at room temperature, incubated in a 1:1,000 diluted primary antibody solution at 4 °C overnight, washed four times with PBST and, if the primary antibody was not conjugated with horseradish peroxidase (HRP), incubated in a 1:5,000 dilution of HRP-conjugated secondary antibody for 1 h at room temperature. The membranes were developed with the SuperSignal West Dura Extended Duration Substrate kit (Thermo Fisher Scientific) and imaged on the FluorChem R system (ProteinSimple). Quantification was performed using ImageJ software (v.2.3.0).
Transfection
PTBP1 siRNA was purchased from Horizon Discovery/Dharmacon. Cells were seeded at 30% confluency. After incubation for 12 h, siRNA was transfected with RNAiMAX (Thermo Fisher Scientific) following the manufacturer’s manual. The medium was replaced with fresh medium at 6 h post-transfection. Cells were cultured for another 48 h, and the protein knockdown efficiency was quantified by western blot.
ARTR-seq
Cells were fixed to an imaging-compatible chamber with 1.5% paraformaldehyde (PFA) at room temperature for 10 min. To mitigate cell loss, 1.5% PFA crosslinking was applied instead of the commonly used 1% PFA crosslinking. The samples were then quenched with 125 mM glycine at room temperature for 5 min, washed twice with Dulbecco’s PBS (DPBS) and permeabilized with 0.5% Triton X-100 in DPBS on ice for 10 min. Each DPBS washing step involved 3 min of incubation at room temperature. Next, samples were washed twice with DPBS, blocked with the blocking buffer (1 mg ml−1 UltraPure BSA, 0.2 U μl−1 RNaseOUT in DPBS) at room temperature for 30 min and stained with the diluted primary antibody at room temperature for 1 h. The primary antibody was diluted with blocking buffer according to the manufacturer’s instructions for immunofluorescence or at a 1:200 dilution if no specific guidance was provided. For input samples, the primary antibody diluent was replaced by the blocking buffer. Subsequently, samples were stained with fluorophore-labeled secondary antibody (1:500 diluted in the blocking buffer) at room temperature for 30 min, followed by incubation with pAG-RTase (10 nM in the blocking buffer) for an additional 30 min. Cells were washed three times with DPBS after each staining step by shaking at room temperature for 3 min.
An RT reaction mixture was prepared by mixing 2 μM adapter-RT primer (5′-AGACGTGTGCTCTTCCGATCTNNNNNNNNNN-3′), 0.05 mM biotin-16-dUTP (Jena Bioscience), 0.05 mM biotin-16-dCTP (Jena Bioscience), 0.05 mM dTTP (Thermo Fisher Scientific), 0.05 mM dCTP (Thermo Fisher Scientific), 0.1 mM dATP (Thermo Fisher Scientific), 0.1 mM dGTP (Thermo Fisher Scientific), 1 U μl−1 RNaseOUT (Thermo Fisher Scientific) in 50 μl buffer of DPBS supplemented with 3 mM MgCl2. In situ RT was performed by immersing cells with the RT reaction mixture and incubating at 37 °C for 30 min, then stopped by adding 20 mM EDTA and 10 mM EGTA and incubating at room temperature for 3 min.
Next, cells were stained with 1:200 diluted biotin monoclonal antibody (BK-1/39), alexa fluor 488 (Thermo Fisher Scientific) in DPBS by incubation at room temperature for 1 h, followed by staining with 1 μg ml−1 Hoechst 33342 dye (Thermo Fisher Scientific) at room temperature for 15 min. The samples were then imaged by Leica SP8 laser confocal microscope. The fluorescence intensity distribution on a line was quantified by ImageJ software.
After imaging, cells were digested with 1 mg ml−1 proteinase K (Thermo Fisher Scientific) at 37 °C for 2 h. The nucleic acids were recovered by phenol-chloroform extraction (pH 8.0) and concentrated by ethanol precipitation. RNA was digested with 0.2 U μl−1 RNase H (NEB) and 1:20 diluted RNase A/T1 (Thermo Fisher Scientific) in 50 μl of the RNase reaction buffer (50 mM Tris-HCl pH 7.5, 75 mM KCl, 10 mM MgCl2, 10 mM DTT) at 37 °C for 1 h, followed by biotinylated cDNA enrichment using 10 μl of preblocked Dynabeads MyOne Streptavidin C1 (Thermo Fisher Scientific) at room temperature for 20 min. Before the enrichment, the beads were preblocked with 1 μg μl−1 UltraPure BSA (Thermo Fisher Scientific), 1 μg μl−1 UltraPure Salmon Sperm DNA Solution (Thermo Fisher Scientific) and 1 μg μl−1 yeast transfer RNA (tRNA) (Thermo Fisher Scientific) by incubation at room temperature for 30 min.
Subsequently, the cDNA adapter ligation mixture was prepared by combining 50 mM Tris-HCl pH 7.5, 10 mM MgCl2, 25% PEG 8000, 1 mM ATP, 1 U μl−1 T4 RNA ligase 1 (NEB), and 5 μM of 3′ cDNA adapter (5′Phos-NNNNNNNNAGATCGGAAGAGCGTCGTGT-3′SpC3). The 3′ cDNA adapter ligation was performed by suspending the beads in the cDNA adapter ligation mixture and incubating at 25 °C for 16 h. The biotinylated cDNA was recovered using an elution buffer composed of 95% (v/v) formamide and 10 mM EDTA (pH 8.0) by boiling at 95 °C for 10 min, followed by ethanol precipitation. The cDNA was then dissolved in 10 μl of water.
For library amplification, a 40 μl mixture was prepared containing 1× NEBNext Ultra II Q5 Master Mix (NEB), 10 μl of cDNA solution and 0.5 μM Illumina sequencing primers, such as NEBNext Multiplex Oligos for Illumina (NEB, catalog no. E7335S). The library PCR amplification followed this program: 98 °C for 30 s; 13 cycles of 98 °C for 10 s, 60 °C for 30 s and 65 °C for 45 s; 65 °C for 5 min; hold at 4 °C. The final libraries were purified using a 6% Novex TBE Gel (Thermo Fisher Scientific) with size selection between 180 and 400 bp. Next-generation sequencing was carried out either at the University of Chicago Single Cell Immunophenotyping Core on an Illumina NextSeq 550 or Illumina NextSeq 2000 machine, or at the University of Chicago Genomics Facility on an Illumina NovaSeq 6000 platform.
RNase treatment in ARTR-seq
RNase treatment was incorporated into the ARTR-seq procedure with the following modifications. After permeabilization, cells were incubated with 1 U μl−1 RNase I (Thermo Fisher Scientific) at 37 °C for 5 min, followed by two washes with DPBS. For samples with strong RNase treatment, an additional RNase I treatment, performed as described above, was conducted before initiating RT.
Dot blot
After the proteinase K digestion step in ARTR-seq, the total nucleic acids were recovered with Oligo Clean & Concentrator Kits (Zymo) to remove free biotinylated dNTPs. The concentration of nucleic acids was measured with a Nanodrop 8000 Spectrophotometer and adjusted to 50 ng µl−1. Next, 1 µl of nucleic acids was loaded onto the Amersham Hybond-N+ membrane (GE Healthcare). Membranes were air-dried and crosslinked twice at 150 mJ cm−2 in a UV Stratalinker 2400. The membranes were then blocked in 5% fatty-acid-free BSA in PBST at room temperature for 1 h, followed by incubation with streptavidin-HRP (Thermo Fisher Scientific) in PBST supplemented with 5% fatty-acid-free BSA at room temperature for another 1 h. The membrane was washed with PBST four times before being developed with the SuperSignal West Dura Extended Duration Substrate kit (Thermo Fisher Scientific) and imaged on the FluorChem R system (ProteinSimple).
ARTR-seq in the mouse embryo
C57 mouse embryo (E11) frozen tissue sections were purchased from Zyagen. The slide with frozen tissue sections was brought to room temperature for 10 min of incubation. The PAP pen was used to draw a circle around the mouse tissue on the slide, providing a thin film-like hydrophobic barrier for reagent incubation. Then the tissue was subjected to typical ARTR-seq procedures with the following change. The 2 μM adapter-barcoded RT primer (5′-AGACGTGTGCTCTTCCGATCT-(8 nt-barcode)-NNNNNNNNNN-3′) was applied for in situ RT.
ARTR-seq with low input
ARTR-seq was applied to 20 to 5,000 HepG2 cells with the following changes. 4% PFA was used to minimize cell loss for low-input samples. The 2 μM adapter-barcoded RT primer (5′-AGACGTGTGCTCTTCCGATCT-(8 nt-barcode)-NNNNNNNNNN-3′) was applied for in situ RT. After digestion of proteinase K, two biological replicates were pooled together for biotinylated cDNA enrichment, adapter ligation, library amplification and library sequencing. Sequence data were isolated based on the 8 nt barcode in adapter-barcoded RT primers.
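The barcode-based separation of pooled replicates can be sketched as follows. This is a minimal illustration rather than the in-house script used in the study: it assumes that the 8 nt barcode occupies the first eight bases of the read that starts at the RT primer, that only exact barcode matches are accepted, and the barcode sequences and sample labels are hypothetical.

```python
from collections import defaultdict

BARCODE_LEN = 8  # length of the barcode in the adapter-barcoded RT primer


def demultiplex(records, barcode_to_sample):
    """Group read names by the first BARCODE_LEN bases of each read.

    records: iterable of (read_name, sequence) tuples.
    barcode_to_sample: dict mapping barcode -> sample label (hypothetical).
    Reads without an exact barcode match are collected as 'undetermined'.
    """
    bins = defaultdict(list)
    for name, seq in records:
        sample = barcode_to_sample.get(seq[:BARCODE_LEN], "undetermined")
        bins[sample].append(name)
    return dict(bins)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGTACGT": "rep1", "TGCATGCA": "rep2"}
reads = [
    ("r1", "ACGTACGTGGGTTTAAACCC"),
    ("r2", "TGCATGCAGGGTTTAAACCC"),
    ("r3", "NNNNNNNNGGGTTTAAACCC"),
]
print(demultiplex(reads, barcodes))
```

Allowing one mismatch per barcode would recover more reads, but only if the barcode set is designed so that no two barcodes are within two substitutions of each other.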
Genome reference
Genome sequences and the corresponding annotations of Homo sapiens (GRCh38.p13, GENCODE Release 39), Mus musculus (GRCm39, GENCODE Release M29) and Drosophila melanogaster (BDGP6.32, Ensembl Release 107) were used for mapping the sequencing reads in this study. Ribosomal RNA (rRNA) reference sequences were downloaded from the National Center for Biotechnology Information (NCBI) for H. sapiens (NR_003285.3, NR_003286.4, NR_003287.4, NR_023363.1) and M. musculus (NR_003278.3, NR_003279.1, NR_003280.2, NR_046156.1), and from FlyBase for D. melanogaster (5SrRNA-CR33353, 18SrRNA-CR45841, 5.8SrRNA-CR45842 and 28SrRNA-CR4584).
ARTR-seq primary data processing
Reads from the small cell number libraries containing cell barcodes were first demultiplexed with an in-house script using read 2. The adapter sequences were trimmed with Cutadapt54 (v.4.2) using the parameters --nextseq-trim=20 -a AGATCGGAAGAGCACACGTCTGAACTCCAG; the 8 nt unique molecular identifier sequences were removed from the reads and added to the read names for subsequent deduplication. An extra 4 nt at the 3′ end of each read was removed from the adapter-free sequence to minimize mapping mismatches caused by imperfect pairing of the random primer.
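The read clean-up described above can be sketched for a single adapter-trimmed read. The sketch assumes, without confirmation from the text, that the 8 nt unique molecular identifier sits at the 5′ end of the read; the function name is hypothetical. Appending the identifier after an underscore follows the read-name convention that UMI-tools reads by default.

```python
UMI_LEN = 8  # unique molecular identifier length given in the text
TRIM_3P = 4  # extra bases removed from the 3' end, as given in the text


def move_umi_and_trim(name, seq, qual):
    """Move the UMI into the read name and clip the 3' end of the read.

    Assumes the UMI is the first UMI_LEN bases of an adapter-free read.
    Returns (new_name, new_seq, new_qual), or None if nothing would remain.
    """
    if len(seq) <= UMI_LEN + TRIM_3P:
        return None
    umi = seq[:UMI_LEN]
    end = len(seq) - TRIM_3P
    return f"{name}_{umi}", seq[UMI_LEN:end], qual[UMI_LEN:end]


# Hypothetical read: 8 nt UMI, 12 nt insert, 4 nt clipped from the 3' end.
record = move_umi_and_trim("read1", "AACCGGTTACGTACGTACGTTTTT", "I" * 24)
print(record)
```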
The reads were first mapped to the corresponding rRNA sequences using Bowtie2 (ref. 55) (v.2.4.4) with parameters: --seedlen=15, and the mapped reads were discarded to avoid rRNA contamination. The remaining unmapped reads were mapped to the corresponding genome using STAR56 (v.2.7.9a) with parameters: --readFilesCommand zcat --alignEndsType EndToEnd --genomeLoad NoSharedMemory --quantMode TranscriptomeSAM --alignMatesGapMax 15000 --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1 --outSAMprimaryFlag AllBestScore --outSAMattributes All --outSAMtype BAM SortedByCoordinate --outFilterType BySJout --outReadsUnmapped Fastx --outFilterScoreMin 10 --outFilterMatchNmin 24. Uniquely mapped reads were deduplicated to get the usable reads using UMI-tools57 (v.1.1.2) with the parameter --method unique. The usable reads were assigned to genomic regions with RNASeQC58 (v.2.4.2) using default parameters. Deduplicated reads were assigned to genes with featureCounts59 (v.2.0.3) for the calculation of Pearson’s correlation coefficient between biological replicates. For visualization in IGV60 (v.2.13.1), .bam files of the usable reads were converted to bigWig with bamCoverage in the deepTools suite61 (v.3.5.1) with normalization by the respective sequencing depth using the parameters --normalizeUsing BPM --binSize 1. All the sample tracks were set to the same scale for display, unless otherwise noted in the legend.
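For readers assembling these steps into a script, the sketch below builds the STAR, UMI-tools and bamCoverage command lines as argument lists. The option values are those listed above; the input and output options (--genomeDir, --readFilesIn, --outFileNamePrefix, -I, -S, -b, -o) and all file names are assumptions added for illustration and are not specified in the text.

```python
import shlex


def star_command(genome_dir, reads_fastq_gz, out_prefix):
    """STAR call with the alignment options listed in the text."""
    return [
        "STAR",
        "--genomeDir", genome_dir,           # assumed index location
        "--readFilesIn", reads_fastq_gz,     # assumed rRNA-depleted reads
        "--outFileNamePrefix", out_prefix,   # assumed output prefix
        "--readFilesCommand", "zcat",
        "--alignEndsType", "EndToEnd",
        "--genomeLoad", "NoSharedMemory",
        "--quantMode", "TranscriptomeSAM",
        "--alignMatesGapMax", "15000",
        "--outFilterMultimapNmax", "1",
        "--outFilterMultimapScoreRange", "1",
        "--outSAMprimaryFlag", "AllBestScore",
        "--outSAMattributes", "All",
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFilterType", "BySJout",
        "--outReadsUnmapped", "Fastx",
        "--outFilterScoreMin", "10",
        "--outFilterMatchNmin", "24",
    ]


def dedup_command(in_bam, out_bam):
    """UMI-tools deduplication that collapses only identical UMIs."""
    return ["umi_tools", "dedup", "-I", in_bam, "-S", out_bam,
            "--method", "unique"]


def bigwig_command(in_bam, out_bigwig):
    """deepTools coverage track at single-base resolution, BPM-normalized."""
    return ["bamCoverage", "-b", in_bam, "-o", out_bigwig,
            "--normalizeUsing", "BPM", "--binSize", "1"]


# Hypothetical file names; print the commands instead of running them.
for cmd in (
    star_command("star_index", "sample.norrna.fastq.gz", "sample."),
    dedup_command("sample.unique.bam", "sample.usable.bam"),
    bigwig_command("sample.usable.bam", "sample.usable.bw"),
):
    print(shlex.join(cmd))
```

Keeping each command as a list makes it straightforward to pass to subprocess.run without shell quoting issues. Note that umi_tools dedup expects a coordinate-sorted and indexed .bam file.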
Peak calling
For peak calling, we first split the usable reads in one library into two .bam files containing reads aligned to the positive and negative strands, respectively. We used macs3 (ref. 62) to identify peaks with default parameters, except for adding ‘--keep-dup all --nomodel --extsize 30’. macs3 gives the fold enrichment (signal value) and a P value based on the Poisson distribution, and corrects the P values for multiple comparisons using the Benjamini–Hochberg correction. Peaks on the two strands were called separately, using the reads from the corresponding strand in the input libraries as background, and the two peak files for one library were then combined. To generate the consensus motif for peaks, we first extended each peak by 20 nt both upstream and downstream, and the overrepresented sequences were identified using findMotifsGenome.pl in the HOMER suite52 (v.4.11) with parameters: -rna -S 10 -len 5,6,7,8,9. For motif generation for peaks in mouse tissue, the peak genomic coordinates were first converted from mm39 to mm10 using liftOver from the UCSC Genome Browser63. Peaks were assigned to specific genomic regions with in-house scripts, and peaks overlapping two genomic regions were assigned to the region with the longer overlap. The peaks from the reader YTHDC1 were further assigned to repeats and other regions with annotatePeaks.pl in the HOMER suite.
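The rule for peaks that span two genomic regions can be illustrated as follows. This is a sketch and not the in-house script: it assumes half-open coordinates on a single chromosome and strand, resolves ties in favor of the region listed first, and uses hypothetical region labels and coordinates.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def assign_region(peak, regions):
    """Return the label of the region sharing the most bases with the peak.

    peak: (start, end); regions: list of (label, start, end) tuples on the
    same chromosome and strand. Returns None when nothing overlaps.
    """
    best_label, best_len = None, 0
    for label, start, end in regions:
        shared = overlap(peak[0], peak[1], start, end)
        if shared > best_len:
            best_label, best_len = label, shared
    return best_label


# Hypothetical annotation: the peak shares 20 nt with the CDS and 30 nt
# with the 3' UTR, so it is assigned to the 3' UTR.
regions = [("CDS", 100, 200), ("3'UTR", 200, 400)]
print(assign_region((180, 230), regions))  # 3'UTR
```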
Subsampling
To calculate the percentage of usable reads at different sequencing depths, we subsampled the uniquely mapped reads with samtools view in the Samtools suite64 (v.1.16.1). For the comparison of small cell number input libraries across different methods, all libraries were reduced to the size of the smallest library. Specifically, instead of directly subsampling the fastq files, we subsampled the uniquely mapped reads to calculate the usable read percentage of each library.
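One way to derive the subsampling fractions is sketched below. It assumes that the -s option of samtools view is used, which takes a value whose integer part is the random seed and whose fractional part is the proportion of reads to keep; the library names, read counts and seed are hypothetical.

```python
def subsample_args(library_sizes, seed=1):
    """Build 'samtools view -s' values that shrink libraries to the smallest.

    library_sizes: dict mapping library name -> uniquely mapped read count.
    Returns a dict mapping name -> 'SEED.FRACTION' string, or None for a
    library that is already at the target size.
    """
    target = min(library_sizes.values())
    args = {}
    for name, size in library_sizes.items():
        fraction = target / size
        if fraction >= 1.0:
            args[name] = None  # smallest library: keep every read
        else:
            # Four decimal places; cap so the digits never roll over to 1.0.
            keep = min(9999, round(fraction * 10000))
            args[name] = f"{seed}.{keep:04d}"
    return args


# Hypothetical uniquely mapped read counts.
sizes = {"libA": 2_000_000, "libB": 500_000, "libC": 1_000_000}
print(subsample_args(sizes))
```

Each returned string would be passed as, for example, samtools view -b -s 1.2500 in.bam, and the usable read percentage would then be computed from the subsampled file.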
Alternative splicing identification
The differential alternative splicing events of each gene were identified using rMATS (v.4.1.2). The .bam files of the RBP-knockdown RNA-seq libraries and of the corresponding control libraries, with the annotation of ENCODE4 v.1.2.1 GRCh38 V29, were downloaded from the ENCODE portal and analyzed by rMATS to identify five alternative splicing modes: skipped exon, mutually exclusive exons, alternative 3′ splice site, alternative 5′ splice site and retained intron. Events with FDR ≥ 0.05 were discarded from the subsequent analysis.
ARTR-seq enrichment level at the gene level
To calculate the ARTR-seq enrichment at the gene level, we divided the reads in one library into two groups by whether they were in one specific gene, and had a pair of in–out read numbers for each of the IP and Input libraries. For each gene, we generated two-by-two tables for all the combinations of in–out read numbers between IP and Input libraries. The ARTR-seq enrichment for a gene is defined as the common odds ratio of the tables with significance determined by the Cochran–Mantel–Haenszel chi-squared test.
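The gene-level statistic can be made concrete with a short sketch of the Mantel–Haenszel common odds ratio and the Cochran–Mantel–Haenszel test over all IP and Input pairings. The function name and read counts are hypothetical. Whether a continuity correction was applied in the study is not stated; the sketch applies one by default, mirroring the default behavior of mantelhaen.test in R, and it does not handle genes with zero reads in every Input library.

```python
import math
from itertools import product


def cmh_gene_enrichment(ip_counts, input_counts, correct=True):
    """Common odds ratio, CMH chi-squared statistic and P value for one gene.

    ip_counts / input_counts: lists of (reads_in_gene, reads_outside_gene),
    one tuple per library. Each IP x Input pairing forms one 2x2 table.
    """
    num = den = delta = var = 0.0
    for (a, b), (c, d) in product(ip_counts, input_counts):
        n = a + b + c + d
        num += a * d / n                      # Mantel-Haenszel numerator
        den += b * c / n                      # Mantel-Haenszel denominator
        delta += a - (a + b) * (a + c) / n    # observed minus expected
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    odds_ratio = num / den
    yates = 0.5 if correct and abs(delta) >= 0.5 else 0.0
    chi2 = (abs(delta) - yates) ** 2 / var
    # Survival function of the chi-squared distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value


# Hypothetical counts: two IP and two Input libraries give four tables.
ip = [(200, 99_800), (220, 99_780)]
inp = [(100, 99_900), (110, 99_890)]
odds_ratio, chi2, p_value = cmh_gene_enrichment(ip, inp)
print(odds_ratio, chi2, p_value)
```

In this example the gene carries roughly twice the share of reads in the IP libraries as in the Input libraries, so the common odds ratio is close to 2.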
Data visualization and statistical analysis
Read heatmaps and profiles were generated with plotHeatmap and plotProfile in the deepTools suite61 (v.3.5.1), using genomic coordinates unless otherwise indicated. The splicing maps of splicing factors were generated by RBP-Maps37 with default parameters in the ‘Plotting peaks’ mode (--peak), and the hg19 coordinates of native cassette exons and constitutive exons were downloaded from the software’s GitHub repository. The genomic coordinates of the peaks for the splicing factors were first converted from GRCh38 to hg19 using liftOver from the UCSC Genome Browser63. The random regions are random exonic regions with the same length as pooled ARTR-seq peaks from the three m6A reader proteins, generated by bedtools shuffle in the BEDTools suite65 (v.2.30.0).
The meta distributions of binding peaks were generated by the R package Guitar66 (v.2.16.0). All statistical analyses were performed with R67, and all the plots were generated by the R package ggplot2 (ref. 68) (v.3.4.1).
Quantification of ARTR-seq signal at the gene level
To analyze G3BP1 binding strength at the gene level, ARTR-seq reads were counted for genes in both G3BP1 and paired input samples, and fold changes (FCs) and significance between G3BP1 and input were determined by DESeq2 (ref. 69) with the default settings. Only genes with a read sum of at least ten for G3BP1 and input samples were considered. RNA targets of G3BP1 were defined as those with an FC ≥ 2 and P < 0.05.
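The target definition amounts to a simple filter on the DESeq2 output, sketched below. The dictionary keys are hypothetical stand-ins for columns of a results table, and the read-sum filter is interpreted here as the total across G3BP1 and input samples, which is an assumption. Because DESeq2 reports fold changes on the log2 scale, the FC cutoff of 2 corresponds to log2FC ≥ 1.

```python
def g3bp1_targets(results, min_reads=10, fc_cutoff=2.0, p_cutoff=0.05):
    """Return the genes that pass the read-count, FC and P value filters.

    results: iterable of dicts with hypothetical keys 'gene', 'read_sum',
    'log2fc' and 'pvalue' (for example, parsed from a DESeq2 table).
    """
    targets = []
    for row in results:
        if row["read_sum"] < min_reads:
            continue  # too few reads to be considered
        fold_change = 2.0 ** row["log2fc"]
        if fold_change >= fc_cutoff and row["pvalue"] < p_cutoff:
            targets.append(row["gene"])
    return targets


# Hypothetical rows, for illustration only.
rows = [
    {"gene": "geneA", "read_sum": 150, "log2fc": 1.5, "pvalue": 0.001},
    {"gene": "geneB", "read_sum": 8, "log2fc": 3.0, "pvalue": 0.0001},
    {"gene": "geneC", "read_sum": 200, "log2fc": 0.5, "pvalue": 0.01},
    {"gene": "geneD", "read_sum": 90, "log2fc": 1.0, "pvalue": 0.2},
]
print(g3bp1_targets(rows))  # ['geneA']
```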
Clustering analysis of G3BP1 ARTR-seq signal
To track the changing pattern of the G3BP1 binding signal during SG assembly, we used the log2FC (G3BP1/input) of genes to represent the G3BP1 binding signal and performed fuzzy c-means clustering on log2FC with the Mfuzz package70 (v.2.54.0). Only genes ranking in the top 50% by standard deviation (s.d.) of log2FC were considered, and the log2FC values were scaled by z score before clustering. The cluster number was determined by the ‘Dmin’ function in the Mfuzz package. Clustering was calculated by the ‘mfuzz’ function in the Mfuzz package with 10,000 iterations and Euclidean distance as the distance metric. The membership values indicate the degree of association of genes with their respective clusters.
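The preprocessing that precedes clustering (selection of the most variable genes, then per-gene z scoring) can be sketched in a few lines. The clustering itself was performed with Mfuzz in R and is not reproduced here; the function name, gene names and log2FC values are hypothetical, and the sample standard deviation is assumed.

```python
import statistics


def select_and_scale(log2fc_by_gene, top_fraction=0.5):
    """Keep the most variable genes and z-score each gene's time course.

    log2fc_by_gene: dict mapping gene -> list of log2FC values, one per
    time point. Genes with zero s.d. are never scaled (division by zero).
    """
    sds = {g: statistics.stdev(v) for g, v in log2fc_by_gene.items()}
    ranked = sorted(sds, key=sds.get, reverse=True)
    keep = ranked[: max(1, int(len(ranked) * top_fraction))]
    scaled = {}
    for gene in keep:
        if sds[gene] == 0:
            continue
        values = log2fc_by_gene[gene]
        mean = statistics.fmean(values)
        scaled[gene] = [(x - mean) / sds[gene] for x in values]
    return scaled


# Hypothetical log2FC time courses over four time points.
data = {
    "A": [0.0, 1.0, 2.0, 3.0],
    "B": [1.0, 1.0, 1.0, 1.0],
    "C": [0.0, 0.5, 0.0, 0.5],
    "D": [0.0, 2.0, 4.0, 6.0],
}
scaled = select_and_scale(data)
print(sorted(scaled))  # ['A', 'D']
```

After scaling, each retained gene has mean 0 and s.d. 1 across time points, so clustering reflects the shape of the time course and not its magnitude.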
Functional enrichment analysis
KEGG enrichment analysis was carried out to compare G3BP1–RNA targets at different time points using the ‘compareCluster’ function in the clusterProfiler package53 (v.4.4.4). The KEGG terms with adjusted P values less than 0.05 were visualized.
Statistics and reproducibility
Unless otherwise stated, a two-tailed Student’s t-test or Wilcoxon test was performed to assess the statistical significance between groups. The resulting P values are indicated in the figures or legends. For boxplots, the box represents the 25th to 75th percentiles with a line at the median, whiskers extend to 1.5 times the interquartile range, a dot marks the mean (if applicable) and outliers are omitted. Immunofluorescence imaging experiments were repeated in at least two biological samples with consistent results.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41592-023-02146-w.
Acknowledgements
We thank all He laboratory members for the discussions. We thank P. Faber and other staff of the University of Chicago Genomics Facility for sequencing support. This work was supported by US National Institutes of Health grant no. RM1 HG008935 (C.H.). C.H. is an investigator of the Howard Hughes Medical Institute. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the paper.
Author contributions
Y.X. and C.H. conceived and designed the study. Y.X. and C.H. designed and supervised the experiments with the help from Y.-M.C. Y.-M.C. and Y.X. performed computation analysis with help from C.Y., X.D. and S.L. Y.X. performed most experiments with help from Y.-M.C., Z.Z., J.W., H.Y., P.W., T.-B.Z., Q.L., J.F. and W.T. Y.X., Y.-M.C. and C.H. wrote the paper with input from C.Y., X.D. and C.L. All authors discussed the results and commented on the paper.
Peer review
Peer review information
Nature Methods thanks Peng Jin, Yuanchao Xue and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Lei Tang and Hui Hua, in collaboration with the Nature Methods team.
Data availability
All the sequencing data generated in this study have been deposited in the NCBI’s Gene Expression Omnibus (GEO) under the accession number GSE226161. Previously published data are available under accession numbers GSE42701 (CLIP-seq26), ENCSR384KAN and ENCSR981WKN (eCLIP28), E-MTAB-3108 (iCLIP27), GSE78832 (irCLIP10), GSE137925 (LACE-seq13), GSE92995 (sCLIP11), DRA005743 (tRIP-seq12) and GSE195654 (RT&Tag22). The data were downloaded and processed as described in the articles. The processed .bam files of RNA-seq data for knockdown HNRNPC, PTBP1 and RBFOX2, along with their corresponding control data, were downloaded from ENCODE portal28 under the accession numbers of ENCSR052IYH, ENCSR305XWT, ENCSR767LLP, ENCSR104ABF, ENCSR064DXG and ENCSR603TCV. The published PAR-CLIP data and the corresponding peaks for YTHDF2 are available under the GEO accession number GSE49339. The m6A modification sites identified by m6A-SAC-seq are available under the GEO accession number GSE198246. Source data are provided with this paper.
Code availability
Code for processing ARTR-seq data is available in the following GitHub repository: https://github.com/mingming-cgz/ARTR-seq.
Competing interests
C.H. is a scientific founder, a member of the scientific advisory board and equity holder of Aferna Green, Inc. and AccuaDX Inc. and a scientific cofounder and equity holder of Accent Therapeutics. The remaining authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Yu Xiao, Yan-Ming Chen.
Extended data is available for this paper at 10.1038/s41592-023-02146-w.
Supplementary information
The online version contains supplementary material available at 10.1038/s41592-023-02146-w.
References
- 1. Gerstberger S, Hafner M, Tuschl T. A census of human RNA-binding proteins. Nat. Rev. Genet. 2014;15:829–845. doi: 10.1038/nrg3813.
- 2. Gebauer F, Schwarzl T, Valcarcel J, Hentze MW. RNA-binding proteins in human genetic disease. Nat. Rev. Genet. 2021;22:185–198. doi: 10.1038/s41576-020-00302-y.
- 3. Lerner MR, Steitz JA. Antibodies to small nuclear RNAs complexed with proteins are produced by patients with systemic lupus erythematosus. Proc. Natl Acad. Sci. USA. 1979;76:5495–5499. doi: 10.1073/pnas.76.11.5495.
- 4. Tenenbaum SA, Carson CC, Lager PJ, Keene JD. Identifying mRNA subsets in messenger ribonucleoprotein complexes by using cDNA arrays. Proc. Natl Acad. Sci. USA. 2000;97:14085–14090. doi: 10.1073/pnas.97.26.14085.
- 5. Ule J, et al. CLIP identifies Nova-regulated RNA networks in the brain. Science. 2003;302:1212–1215. doi: 10.1126/science.1090095.
- 6. Licatalosi DD, et al. HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature. 2008;456:464–469. doi: 10.1038/nature07488.
- 7. Hafner M, et al. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell. 2010;141:129–141. doi: 10.1016/j.cell.2010.03.009.
- 8. Konig J, et al. iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution. Nat. Struct. Mol. Biol. 2010;17:909–915. doi: 10.1038/nsmb.1838.
- 9. Van Nostrand EL, et al. Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP). Nat. Methods. 2016;13:508–514. doi: 10.1038/nmeth.3810.
- 10. Zarnegar BJ, et al. irCLIP platform for efficient characterization of protein-RNA interactions. Nat. Methods. 2016;13:489–492. doi: 10.1038/nmeth.3840.
- 11. Kargapolova Y, Levin M, Lackner K, Danckwardt S. sCLIP-an integrated platform to study RNA-protein interactomes in biomedical research: identification of CSTF2tau in alternative processing of small nuclear RNAs. Nucleic Acids Res. 2017;45:6074–6086. doi: 10.1093/nar/gkx152.
- 12. Masuda A, et al. tRIP-seq reveals repression of premature polyadenylation by co-transcriptional FUS-U1 snRNP assembly. EMBO Rep. 2020;21:e49890. doi: 10.15252/embr.201949890.
- 13. Su R, et al. Global profiling of RNA-binding protein target sites by LACE-seq. Nat. Cell Biol. 2021;23:664–675. doi: 10.1038/s41556-021-00696-9.
- 14. Blue SM, et al. Transcriptome-wide identification of RNA-binding protein binding sites using seCLIP-seq. Nat. Protoc. 2022;17:1223–1265. doi: 10.1038/s41596-022-00680-z.
- 15. Lorenz DA, et al. Multiplexed transcriptome discovery of RNA-binding protein binding sites by antibody-barcode eCLIP. Nat. Methods. 2023;20:65–69. doi: 10.1038/s41592-022-01708-8.
- 16. McMahon AC, et al. TRIBE: hijacking an RNA-editing enzyme to identify cell-specific targets of RNA-binding proteins. Cell. 2016;165:742–753. doi: 10.1016/j.cell.2016.03.007.
- 17. Brannan KW, et al. Robust single-cell discovery of RNA targets of RNA-binding proteins and ribosomes. Nat. Methods. 2021;18:507–519. doi: 10.1038/s41592-021-01128-0.
- 18. Nguyen DTT, et al. HyperTRIBE uncovers increased MUSASHI-2 RNA binding activity and differential regulation in leukemic stem cells. Nat. Commun. 2020;11:2026. doi: 10.1038/s41467-020-15814-8.
- 19. Xu W, Rahman R, Rosbash M. Mechanistic implications of enhanced editing by a HyperTRIBE RNA-binding protein. RNA. 2018;24:173–182. doi: 10.1261/rna.064691.117.
- 20. Flamand MN, Ke K, Tamming R, Meyer KD. Single-molecule identification of the target RNAs of different RNA binding proteins simultaneously in cells. Genes Dev. 2022;36:1002–1015. doi: 10.1101/gad.349983.122.
- 21. Meyer KD. DART-seq: an antibody-free method for global m6A detection. Nat. Methods. 2019;16:1275–1280. doi: 10.1038/s41592-019-0570-0.
- 22. Khyzha N, Henikoff S, Ahmad K. Profiling RNA at chromatin targets in situ by antibody-targeted tagmentation. Nat. Methods. 2022;19:1383–1392. doi: 10.1038/s41592-022-01618-9.
- 23. Kaya-Okur HS, et al. CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat. Commun. 2019;10:1930. doi: 10.1038/s41467-019-09982-5.
- 24. Anzalone AV, et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature. 2019;576:149–157. doi: 10.1038/s41586-019-1711-4.
- 25. Potter RJ, Rosenthal K. High fidelity reverse transcriptases and uses thereof. US patent US7056716B2. 2006.
- 26. Coelho MB, et al. Nuclear matrix protein Matrin3 regulates alternative splicing and forms overlapping regulatory networks with PTB. EMBO J. 2015;34:653–668. doi: 10.15252/embj.201489852.
- 27. Xue Y, et al. Direct conversion of fibroblasts to neurons by reprogramming PTB-regulated microRNA circuits. Cell. 2013;152:82–96. doi: 10.1016/j.cell.2012.11.045.
- 28. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74.
- 29. Fred RG, Tillmar L, Welsh N. The role of PTB in insulin mRNA stability control. Curr. Diabetes Rev. 2006;2:363–366. doi: 10.2174/157339906777950570.
- 30. Xue Y, et al. Genome-wide analysis of PTB-RNA interactions reveals a strategy used by the general splicing repressor to modulate exon inclusion or skipping. Mol. Cell. 2009;36:996–1006. doi: 10.1016/j.molcel.2009.12.003.
- 31. Van Nostrand EL, et al. A large-scale binding and functional map of human RNA-binding proteins. Nature. 2020;583:711–719. doi: 10.1038/s41586-020-2077-3.
- 32. Wang X, et al. N6-methyladenosine-dependent regulation of messenger RNA stability. Nature. 2014;505:117–120. doi: 10.1038/nature12730.
- 33. Ge R, et al. m6A-SAC-seq for quantitative whole transcriptome m6A profiling. Nat. Protoc. 2023;18:626–657. doi: 10.1038/s41596-022-00765-9.
- 34. Hafner M, et al. CLIP and complementary methods. Nat. Rev. Methods Prim. 2021;1:20. doi: 10.1038/s43586-021-00018-1.
- 35. Dvinge H. Regulation of alternative mRNA splicing: old players and new perspectives. FEBS Lett. 2018;592:2987–3006. doi: 10.1002/1873-3468.13119.
- 36. Luo Y, et al. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal. Nucleic Acids Res. 2020;48:D882–D889. doi: 10.1093/nar/gkz1062.
- 37. Yee BA, Pratt GA, Graveley BR, Van Nostrand EL, Yeo GW. RBP-Maps enables robust generation of splicing regulatory maps. RNA. 2019;25:193–204. doi: 10.1261/rna.069237.118.
- 38. Shi H, Wei J, He C. Where, when, and how: context-dependent functions of RNA methylation writers, readers, and erasers. Mol. Cell. 2019;74:640–650. doi: 10.1016/j.molcel.2019.04.025.
- 39. Wang X, et al. N6-methyladenosine modulates messenger RNA translation efficiency. Cell. 2015;161:1388–1399. doi: 10.1016/j.cell.2015.05.014.
- 40. Roundtree IA, et al. YTHDC1 mediates nuclear export of N6-methyladenosine methylated mRNAs. eLife. 2017;6:e31311. doi: 10.7554/eLife.31311.
- 41. Liu J, et al. N6-methyladenosine of chromosome-associated regulatory RNA regulates chromatin state and transcription. Science. 2020;367:580–586. doi: 10.1126/science.aay6018.
- 42. Zou Z, Sepich-Poore C, Zhou X, Wei J, He C. The mechanism underlying redundant functions of the YTHDF proteins. Genome Biol. 2023;24:17. doi: 10.1186/s13059-023-02862-8.
- 43. Yang P, et al. G3BP1 is a tunable switch that triggers phase separation to assemble stress granules. Cell. 2020;181:325–345 e328. doi: 10.1016/j.cell.2020.03.046.
- 44. Protter DSW, Parker R. Principles and properties of stress granules. Trends Cell Biol. 2016;26:668–679. doi: 10.1016/j.tcb.2016.05.004.
- 45. Wheeler JR, Matheny T, Jain S, Abrisch R, Parker R. Distinct stages in stress granule assembly and disassembly. eLife. 2016;5:e18413. doi: 10.7554/eLife.18413.
- 46. Khong A, et al. The stress granule transcriptome reveals principles of mRNA accumulation in stress granules. Mol. Cell. 2017;68:808–820 e805. doi: 10.1016/j.molcel.2017.10.015.
- 47. Chou RH, Huang H. Sodium arsenite suppresses human papillomavirus-16 E6 gene and enhances apoptosis in E6-transfected human lymphoblastoid cells. J. Cell. Biochem. 2002;84:615–624. doi: 10.1002/jcb.10055.
- 48. Sun H, et al. Sodium arsenite-induced learning and memory impairment is associated with endoplasmic reticulum stress-mediated apoptosis in rat hippocampus. Front. Mol. Neurosci. 2017;10:286. doi: 10.3389/fnmol.2017.00286.
- 49. Henikoff S, Ahmad K. In situ tools for chromatin structural epigenomics. Protein Sci. 2022;31:e4458. doi: 10.1002/pro.4458.
- 50. Lopes I, Altab G, Raina P, de Magalhaes JP. Gene size matters: an analysis of gene length in the human genome. Front. Genet. 2021;12:559998. doi: 10.3389/fgene.2021.559998.
- 51. Irgen-Gioro S, Yoshida S, Walling V, Chong S. Fixation can change the appearance of phase separation in living cells. eLife. 2022;11:e79903. doi: 10.7554/eLife.79903.
- 52.Heinz S, et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell. 2010;38:576–589. doi: 10.1016/j.molcel.2010.05.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Wu T, et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innov. 2021;2:100141. doi: 10.1016/j.xinn.2021.100141. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 2011;17:10–12. doi: 10.14806/ej.17.1.200. [DOI] [Google Scholar]
- 55.Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Dobin A, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Smith T, Heger A, Sudbery I. UMI-tools: modeling sequencing errors in unique molecular identifiers to improve quantification accuracy. Genome Res. 2017;27:491–499. doi: 10.1101/gr.209601.116. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Graubert A, Aguet F, Ravi A, Ardlie KG, Getz G. RNA-SeQC 2: efficient RNA-seq quality control and quantification for large cohorts. Bioinformatics. 2021;37:3048–3050. doi: 10.1093/bioinformatics/btab135. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30:923–930. doi: 10.1093/bioinformatics/btt656. [DOI] [PubMed] [Google Scholar]
- 60.Robinson JT, et al. Integrative Genomics Viewer. Nat. Biotechnol. 2011;29:24–26. doi: 10.1038/nbt.1754. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Ramirez F, Dundar F, Diehl S, Gruning BA, Manke T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 2014;42:W187–W191. doi: 10.1093/nar/gku365. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Zhang Y, et al. Model-based analysis of ChIP-Seq (MACS) Genome Biol. 2008;9:R137. doi: 10.1186/gb-2008-9-9-r137. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Kent WJ, et al. The human genome browser at UCSC. Genome Res. 2002;12:996–1006. doi: 10.1101/gr.229102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Danecek P, et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021;10:giab008. doi: 10.1093/gigascience/giab008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Cui X, et al. Guitar: an R/Bioconductor package for gene annotation guided transcriptomic analysis of RNA-related genomic features. BioMed. Res. Int. 2016;2016:8367534. doi: 10.1155/2016/8367534. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2014); https://www.R-project.org/
- 68.Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer, 2009).
- 69.Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Kumar L, Futschik ME. Mfuzz: a software package for soft clustering of microarray data. Bioinformation. 2007;2:5–7. doi: 10.6026/97320630002005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Oscorbin IP, Filipenko ML. M-MuLV reverse transcriptase: selected properties and improved mutants. Comput. Struct. Biotechnol. J. 2021;19:6315–6327. doi: 10.1016/j.csbj.2021.11.030. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
Data Availability Statement
All sequencing data generated in this study have been deposited in the NCBI Gene Expression Omnibus (GEO) under accession number GSE226161. Previously published data are available under accession numbers GSE42701 (CLIP-seq26), ENCSR384KAN and ENCSR981WKN (eCLIP28), E-MTAB-3108 (iCLIP27), GSE78832 (irCLIP10), GSE137925 (LACE-seq13), GSE92995 (sCLIP11), DRA005743 (tRIP-seq12) and GSE195654 (RT&Tag22). These data were downloaded and processed as described in the corresponding articles. The processed .bam files of RNA-seq data for HNRNPC, PTBP1 and RBFOX2 knockdown, along with their corresponding control data, were downloaded from the ENCODE portal28 under accession numbers ENCSR052IYH, ENCSR305XWT, ENCSR767LLP, ENCSR104ABF, ENCSR064DXG and ENCSR603TCV. The published PAR-CLIP data and the corresponding peaks for YTHDF2 are available under GEO accession number GSE49339. The m6A modification sites identified by m6A-SAC-seq are available under GEO accession number GSE198246. Source data are provided with this paper.
Code for processing ARTR-seq data is available in the following GitHub repository: https://github.com/mingming-cgz/ARTR-seq.