Nucleic Acids Research. 2024 Nov 30;53(2):gkae1173. doi: 10.1093/nar/gkae1173

Functional RNA mining using random high-throughput screening

Li-Hua Liu 1, Jinde Chen 2, Shijing Lai 3, Xuemei Zhao 4, Min Yang 5, Yi-Rui Wu 6, Zhiqian Zhang 7, Ao Jiang 8
PMCID: PMC11754670  PMID: 39673274

Abstract

Functional RNA participates in various life processes in cells. However, there is currently a lack of effective methods to screen for functional RNA. Here, we developed a technology named random high-throughput screening (rHTS). rHTS uses a random library of ∼250-nt synthesized RNA fragments, with high uniformity and abundance. These fragments are circularized into circular RNA by an auto-cyclizing ribozyme to improve their stability. Using rHTS, we successfully screened and identified three RNA fragments contributing significantly to the growth of Escherichia coli, one of which possesses coding potential. Moreover, we found that two noncoding RNAs (ncRNAs) effectively inhibited the growth of E. coli, in vivo rather than in vitro. Subsequently, we applied the rHTS to a coenzyme-dependent screening platform. In this context, two ncRNAs were identified that could effectively promote the conversion from NADPH to NADP+. Exogenous expression of these two ncRNAs was able to increase the conversion rate of glycerol dehydrogenase from glycerol to 1,3-dihydroxyacetone from 18.3% to 21.8% and 23.2%, respectively. These results suggest that rHTS is a powerful technology for functional RNA mining.

Graphical Abstract

Introduction

RNA, one of the main carriers of genetic information, is a long-chain molecule composed of ribonucleotides linked by phosphodiester bonds (1). Within cells, RNA participates in complex life processes. Its best-known function is to encode proteins; RNA that does so is referred to as messenger RNA (mRNA) (1). Other RNAs, such as microRNA, transfer RNA, small nuclear RNA and long ncRNA, are collectively called noncoding RNA (ncRNA) and perform diverse roles depending on their sequence, structure, positioning and partners (1–3). In recent years, with the development of high-throughput sequencing and bioinformatics, the field of ncRNA has received increasing attention (1,4). mRNA accounts for <5% of cellular RNA, which hints at the importance of ncRNA within cells (4,5). Functional ncRNA is widely involved in life processes, including gene editing (6), gene expression regulation (1,7), senescence and aging (8,9), and other vital processes (1,5,10). Some ncRNAs with special functions have also become research hotspots for their potential in disease treatment and tool development (5,6).

For a long time, it has been believed that functional genes are highly unlikely to have evolved from random sequences, since the number of possible nucleotide sequences, even short ones, is almost infinite, while only a very small fraction of RNA sequences may have biological activity (11,12). Intriguingly, genomic and transcriptomic analyses have revealed that many new transcripts are generated de novo from random parts of the genome, and the evolutionarily youngest lineage always has the highest emergence rate, indicating a relatively relaxed constraint on de novo evolution (13–15).

Therefore, a series of new technologies have recently emerged for identifying functional RNAs, including genome-wide CRISPR screening (16,17), high-throughput technology (18,19), deep learning-based RNA structure (20,21) and function design and identification (22), and other advanced methods (23,24). However, these approaches are primarily aimed at pinpointing RNA molecules within the transcriptome with specific functions and at crafting novel RNA molecules with analogous functions based on existing RNA sequences. The functional screening of RNA molecules from random sequences remains largely unexplored. Prior efforts from our laboratory and others have focused on creating random open reading frame (ORF) libraries via polymerase chain reaction (PCR) mutagenesis or oligonucleotide synthesis, seeking to identify peptides with unique functionalities (11,25,26). Yet, these techniques are constrained by the length of the synthesized fragments and potential biases in the preparation process. Many functional RNAs can extend over several hundred nucleotides (6,19). In recent years, with the development of DNA synthesis technology, the synthesis of long random sequences has become an effective way to explore new genes (27). This method does not rely on existing gene or genomic information to construct large-scale DNA libraries, making it more likely for researchers to discover functional DNA and products that do not exist or have not been discovered in nature (11,25,26).

Materials and methods

Strains, plasmids and gene synthesis

All strains, plasmids and RNA sequences used in this study are listed in Supplementary Table S1. The Escherichia coli strain DH5α and BL21(DE3) Chemically Competent Cells were purchased from TransGen Biotech. pBAD18, pET28a, pTnsABC and pQcascade-IS1 plasmids were purchased from Biofeng. The 250-bp random nucleotide sequence flanked by the 3′ intron and 5′ intron of the T4td auto-cyclizing ribozyme (AR) gene was synthesized by Dynegene Biotech. DNA of the T4td AR gene and the ORF2 gene (after codon optimization by GenScript Biotech) was synthesized by IGE Biotech. The DNA of NADP+-dependent glycerol dehydrogenase (GDH) from Saccharomyces cerevisiae was synthesized by IGE Biotech. DNA of the CRISPR RNA (crRNA) array was synthesized and inserted into the pQcascade-IS1 plasmid.

Plasmid construction

The 250-bp random nucleotide sequence flanked by the 3′ intron and 5′ intron of the T4td AR gene was amplified with Hieff Canace® Gold High Fidelity DNA Polymerase (Yeasen Biotech) using primers specific to the 3′ intron and 5′ intron, respectively. The amplified PCR product was recovered using the MolPure Gel Extraction Kit (Yeasen Biotech) and cloned into pBAD18-T4td using the Hieff Clone® Universal II One Step Cloning Kit (Yeasen Biotech). The product was then transformed into E. coli strain DH5α by electroporation. The total plasmid library was extracted using the MolPure® Plasmid Mini Kit and used as the 250-bp random nucleotide sequence plasmid library.

For CRISPR-associated transposon, the characteristic left and right end sequences of transposon were inserted into the paired end of pGEX-6P-1 plasmid at the flank of the lacI gene and the tac promoter, using mutant primer-mediated plasmid mutagenesis described previously. The rna7 and rna8 were amplified using the primers containing the J23119 promoter and T7 terminator. These sequences were further inserted into the engineered pGEX-6P-1 plasmid by replacing the DNA sequence between the left and right end sequences of transposon. These plasmids served as donor plasmids of the CRISPR-associated transposon. The udhA gene from E. coli DH5α and GDH gene were amplified and inserted into the pT7-GFP plasmid to replace the GFP gene.

For in vitro transcription, the sequences of rna5 and rna6 were amplified and inserted into the pT7-GFP plasmid immediately following the T7 promoter, according to the user’s manual of the T7 High Yield RNA Synthesis Kit (Yeasen Biotech).

Media and growth conditions

Cloning was carried out with E. coli DH5α and protein expression was performed with E. coli BL21(DE3). All E. coli strains were cultured in lysogeny broth (LB) medium (10 g/l tryptone, 5 g/l yeast extract and 10 g/l NaCl, pH 7.0) with corresponding antibiotics, unless otherwise noted. The NADP+-dependent screening system [engineered E. coli BL21(DE3) with the promoter substitution of the glucose-6-phosphate isomerase (pgi), phosphogluconate dehydratase (edd), pyridine nucleotide transhydrogenase (udhA) and quinone oxidoreductase (qorA) genes] was cultured in the LB medium with an isopropyl-β-d-thiogalactopyranoside (IPTG) concentration of 0.1 mmol/l and screened in the M9 minimal medium (4 g/l glucose, 6.78 g/l Na2HPO4, 3 g/l KH2PO4, 0.5 g/l NaCl, 1.0 g/l NH4Cl, 0.241 g/l MgCl2 and 0.011 g/l CaCl2, pH 7.0). For solid medium plates, 15 g/l agar was added to the liquid medium composition. Unless specifically mentioned, induction was initiated with final concentrations of 0.1% (w/v) l-arabinose for strains with the BAD promoter and 0.1 mmol/l IPTG for strains with the lac operon.

CRISPR-associated transposon

CRISPR-associated transposon procedures were conducted following a previously reported method (25). Briefly, the BL21(DE3) strain containing pTnsABC, pQcascade and donor plasmid was cultured in the LB medium with 50 mg/l kanamycin and 50 mg/l spectinomycin until the OD600 value reached 0.6. The strain was induced using 0.2 mM IPTG and incubated at 30°C overnight. Inserted fragments were identified using bacterial PCR, and the PCR products were further verified by Sanger sequencing. Positive clones were subsequently passaged multiple times in an antibiotic-free medium to eliminate the pTnsABC, pQcascade and donor plasmid until they could no longer grow in the LB medium containing kanamycin or spectinomycin. The lac operon was inserted into the promoter of the pgi, edd, udhA and qorA genes, and the rna7 and rna8 were inserted into the IS1 site, respectively.

NADP+-dependent screening system

The NADP+-dependent screening system [engineered E. coli BL21(DE3) with the promoter substitution of the pgi, edd, udhA and qorA genes] was cultured in the LB medium with an IPTG concentration of 0.1 mmol/l. To verify its efficiency, the strain was cultured in the M9 medium with or without 0.1 mmol/l IPTG, and the E. coli strain containing the plasmid of udhA, rna7 or rna8 was cultured in the M9 medium. Growth curves were recorded over 12 h. Subsequently, the concentrations of NADPH and NADP+ were determined using the Coenzyme II NADP(H) Content Assay Kit (Yeasen Biotech).

Random high-throughput screening

To verify the uniformity and coverage of the library, the synthesized 250-bp random sequence was constructed into an Illumina sequencing library using the Hieff NGS® DNA Library Prep Kit (Yeasen Biotech). The DNA library was then sequenced on the Illumina NovaSeq 6000 platform in PE150 mode.

For growth-dependent screening, the 250-bp random sequence library plasmid pool was transformed into E. coli DH5α. The strain pools were cultured in the LB medium with or without a final concentration of 10 mM l-arabinose at a temperature of 37°C and at a rotation speed of 200 rpm until the OD600 value reached 0.6. The plasmid pool was extracted and amplified using primers containing the Illumina universal sequence and index. The PCR product was recovered and sequenced on the Illumina NovaSeq 6000 platform in PE150 mode.

For NADPH-dependent screening, the 250-bp random sequence library plasmid pool was transformed into the NADP+-dependent screening system. The strain pools were cultured in the M9 medium at a temperature of 37°C and at a rotation speed of 200 rpm until the OD600 value reached 0.6. The plasmid pool was extracted and amplified using primers containing the Illumina universal sequence and index. The PCR product was recovered and sequenced on the Illumina NovaSeq 6000 platform in PE150 mode.

After sequencing, the adaptors were trimmed with Trimmomatic, and reads 1 and 2 were assembled. The abundance of each random sequence was calculated, and the enrichment ratio was analyzed. The ribosome binding site (RBS) was predicted by an RBS calculator (28), and ORFs were identified with SnapGene.
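The abundance and enrichment calculation described above can be illustrated with a minimal sketch. The exact formula used in the study is not stated, so this assumes that abundance is the fraction of total reads and that enrichment is the log2 ratio of post-screening to pre-screening abundance. The function names, the pseudocount and the toy reads are hypothetical.

```python
import math
from collections import Counter


def relative_abundance(reads):
    """Return {sequence: fraction of total reads} for a list of assembled reads."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}


def log2_enrichment(before, after, pseudo=1e-6):
    """log2(after / before) per sequence; `pseudo` (assumed) avoids division by zero."""
    seqs = set(before) | set(after)
    return {
        s: math.log2((after.get(s, 0.0) + pseudo) / (before.get(s, 0.0) + pseudo))
        for s in seqs
    }


# Toy reads standing in for assembled 250-nt inserts (hypothetical data).
pre = ["AAA"] * 50 + ["CCC"] * 50
post = ["AAA"] * 80 + ["CCC"] * 20

ratios = log2_enrichment(relative_abundance(pre), relative_abundance(post))
for seq, value in sorted(ratios.items()):
    print(seq, round(value, 2))
```

A positive value marks a sequence that became more frequent during screening and a negative value marks one that was depleted.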

Growth curve and rate determination

The E. coli strain was inoculated into a sterile LB medium containing 0.05% (w/v) l-arabinose, starting at an OD600 of 0.01, and cultivated at 37°C with a rotation speed of 200 rpm. The OD600 was measured hourly to calculate the growth rate. For the l-arabinose removal assay, the sugar was removed at the 4-h timepoint. The E. coli strain was harvested by centrifugation at 10°C and 5000 rpm for 5 min. The pellet was washed and resuspended three times with the pre-warmed LB medium at 37°C, equivalent in volume to the original, and then divided into two equal aliquots. One aliquot was supplemented with l-arabinose to a final concentration of 0.05% (w/v).
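How a growth rate can be derived from hourly OD600 readings is sketched below. The study does not state its formula, so this assumes the common specific growth rate, μ = ln(OD_t2/OD_t1)/(t2 − t1). The function name and the readings are hypothetical.

```python
import math


def specific_growth_rates(od600, interval_h=1.0):
    """Per-interval specific growth rate (1/h) from consecutive OD600 readings."""
    return [
        (math.log(later) - math.log(earlier)) / interval_h
        for earlier, later in zip(od600, od600[1:])
    ]


# Hypothetical hourly readings, starting at the inoculation OD600 of 0.01.
readings = [0.01, 0.02, 0.04, 0.08]
rates = specific_growth_rates(readings)

# Doubling time follows from ln(2) / mu; this is only valid while mu > 0.
doubling_times = [math.log(2) / mu for mu in rates]
print(rates, doubling_times)
```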

Reverse transcription and quantitative PCR

Total RNA from the E. coli strain was isolated using the MolPure Bacterial RNA Kit (Yeasen Biotech). RNA concentration and integrity were evaluated using a NanoDrop Microvolume Spectrophotometer (Thermo Fisher Scientific). Reverse transcription was performed using the Hifair II 1st Strand cDNA Synthesis Kit (gDNA digester plus) (Yeasen Biotech), starting with 2 μg of total RNA. The synthesized complementary DNA served as a template for quantitative PCR (qPCR) reactions, conducted according to the Hieff® qPCR SYBR Green Master Mix protocol. Primers specific for rna1–6 were qAR-F (5′-ttcgtctggattagttacttatcgtg-3′) and qAR-R (5′-cggaattatatccagctgcatg-3′).

In vitro transcription

In vitro transcription was carried out using the T7 High Yield RNA Synthesis Kit (Yeasen Biotech) according to the user’s manual. Briefly, 20 μg of the plasmid containing GFP, rna5 or rna6 was used in 200 μl of reaction mix. The reaction was performed at 37°C overnight. Subsequently, 20 U of DNase I was added and the reaction was incubated at 37°C for a further 1 h. The RNA was recovered using RNA Cleaner magnetic beads (Yeasen Biotech). The concentration and quality of RNA were determined by NanoDrop.

Activity analysis of GDH

The pT7-GDH plasmid was transformed into the E. coli strain BL21(DE3) with or without the rna7 or rna8 inserted into the IS1 site by CRISPR-associated transposon. The strains (initiated at an OD600 of 0.2) were cultured in the LB medium containing 20 g/l glycerol at 30°C for 48 h.

The concentrations of glycerol and 1,3-dihydroxyacetone (DHA) were determined by high-performance liquid chromatography under the following conditions: column, Lichrospher5-NH (4.6 mm × 250 mm); column temperature, 30°C; mobile phase, acetonitrile–water (90:10); flow rate, 1.0 ml/min. DHA was detected using an ultraviolet detector at a detection wavelength of 271 nm, and glycerol was detected using a differential refractive index detector at a temperature of 35°C. The retention times for DHA and glycerol were ∼5.4 and 9.5 min, respectively.

The activity of GDH was determined by calculating the conversion rate from glycerol to DHA. The conversion rate represents the ratio of the final molar production of DHA to the molar input of glycerol.
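The conversion rate defined above can be written as a short calculation. This is a sketch only: it assumes concentrations are measured in g/l in the same culture volume and converts them to moles using standard molar masses. The function name and the DHA titer are hypothetical, and the 20 g/l glycerol input is the value given in the culture conditions.

```python
GLYCEROL_G_PER_MOL = 92.09  # molar mass of glycerol (C3H8O3)
DHA_G_PER_MOL = 90.08       # molar mass of 1,3-dihydroxyacetone (C3H6O3)


def conversion_rate(dha_g_per_l, glycerol_input_g_per_l):
    """Molar DHA produced divided by molar glycerol supplied (same volume)."""
    dha_mol = dha_g_per_l / DHA_G_PER_MOL
    glycerol_mol = glycerol_input_g_per_l / GLYCEROL_G_PER_MOL
    return dha_mol / glycerol_mol


# 20 g/l glycerol is the stated input; the 4.0 g/l DHA titer is hypothetical.
rate = conversion_rate(dha_g_per_l=4.0, glycerol_input_g_per_l=20.0)
print(f"{rate:.1%}")
```

Because the two molar masses differ by only about 2%, the molar conversion rate is close to, but not identical with, the simple mass ratio.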

Statistical analysis

Statistical analyses were performed using the two-tailed Student’s t-test, one-way analysis of variance or Mann–Whitney nonparametric U-test with GraphPad Prism 9.0. Significance levels of *P ≤ 0.05 and **P ≤ 0.01 were determined by an unpaired two-tailed t-test. All experiments were performed with three replicates, and the error bars in the figures represent means ± standard deviation.

Results

Establishment of random RNA library

In order to screen for functional RNA efficiently and quickly, we synthesized a DNA fragment containing 250 random nucleotides (250N) as the main body of random RNA. Furthermore, in order to reduce the impact of flanking sequences and improve the stability of RNA, we employed an AR, originating from the modified group I catalytic intron of the thymidylate synthase gene in the T4 bacteriophage (T4td). This ribozyme can circularize exogenous RNA through self-splicing when flanked by an inserted fragment and then detach, leaving behind minimal nucleotides to form a stable covalently closed circular RNA (circRNA) molecule (29,30). We inserted the 250N sequence between the 3′ and 5′ introns of T4td AR to circularize random RNA (Figure 1A). The random circRNA library was expressed under the control of the BAD promoter (pBAD), a highly stringent promoter derived from the arabinose operon in E. coli, to screen for functional RNA.

Figure 1.

Construction of the 250N library. (A) Schematic depiction of circRNA library containing random 250-nt sequence. (B) Base distribution of circRNA library containing random 250-nt sequence. (C) Sequence number distribution of circRNA library containing random 250-nt sequence. The x-axis is arranged in ascending order of the abundance of random 250-nt sequences, while the y-axis represents the proportion of reads for each sequence relative to the total number of reads.

To verify the richness and preference of the random library, we performed Sanger sequencing on the constructed random library. The results showed that the sequences of the DNA fragments were diverse among the 10 clones we selected. Subsequently, next-generation sequencing data showed that the bases at each site in the random library had a diverse distribution (Figure 1B). Despite some bias in sequence abundance, we detected over 40 000 distinct RNA sequences (Figure 1C), indicating high diversity in our random library.
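A per-site base distribution of the kind summarized in Figure 1B can be computed as in the sketch below. This is not the analysis pipeline used in the study; the function name and the toy sequences are hypothetical, and all reads are assumed to have been trimmed to the same length.

```python
from collections import Counter


def base_frequencies(sequences):
    """Per-position fraction of A/C/G/T across equal-length sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    table = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sequences)
        table.append({b: counts.get(b, 0) / len(sequences) for b in "ACGT"})
    return table


# Four toy 3-nt inserts stand in for the 250-nt library reads (hypothetical data).
library = ["ACG", "CAT", "GTA", "TGC"]
freqs = base_frequencies(library)
print(freqs[0])
```

In an unbiased library each base would approach a frequency of 0.25 at every position, so deviations from that value indicate synthesis or amplification bias.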

High-throughput screening method for identifying survival-related RNA in E. coli

Based on the random RNA expression plasmid library constructed above, we developed a high-throughput screening method and named it random high-throughput screening (rHTS). We screened functional RNAs based on their effect on the viability of E. coli, as illustrated in Figure 2A. After over 100 generations of culture, we extracted plasmids from the surviving strains and performed next-generation sequencing. Seven RNA candidates exhibited significantly altered abundance in the surviving strains, suggesting that these RNA molecules may promote the growth of the strains (Figure 2B and C). Among the top four RNA candidates, three had obvious promoting effects on the growth of E. coli (Figure 2C). We found that only rna2 contained a potential ORF, encoding a peptide of 30 amino acids (Figure 2D). To verify whether the RNA or the peptide was responsible, we redesigned the codons of this peptide so that the peptide was preserved while the RNA sequence changed. The results showed that the expression of ORF2 slightly promoted the proliferation of E. coli, indicating that the peptide had some functional activity (Figure 2E). This effect was more evident in complex carbon source media (LB, SOB, 2YT), potentially due to ORF2-mediated enhancement of complex nutrient utilization in E. coli (Figure 2F). Growth curve analysis indicated that ORF2 expression positively influenced the overall growth rate of E. coli without substantially modifying the growth pattern, culminating in increased biomass (Figure 2G and H). Moreover, we found that rna1, which lacks a potential ORF, could greatly strengthen the proliferation of E. coli (Figure 2C), indicating that the RNA itself was responsible for this property. Through sequence truncation, we found that a core sequence (nt 130–190) might be crucial for this function; however, its activity was still weaker than that of the full-length RNA, indicating that the complete sequence contributes significantly to its activity (Figure 2I).
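Checking a candidate for a potential ORF, as was done for rna2, can be illustrated with a simple scan. The study used SnapGene for this step; the sketch below is not SnapGene's algorithm. It assumes a linear DNA-alphabet sequence, searches only the three forward frames for ATG...stop stretches, and ignores ORFs that span the circularization junction. The function name, the `min_codons` threshold and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=10):
    """Return (start, end, n_codons) for ATG...stop ORFs on the forward strand.

    `end` is exclusive and includes the stop codon; `n_codons` excludes it.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical insert: ATG followed by 11 GCT codons and a TAA stop.
toy = "CC" + "ATG" + "GCT" * 11 + "TAA" + "GG"
print(find_orfs(toy))
```

Setting `min_codons=30` would restrict the scan to ORFs at least as long as the 30-amino-acid peptide reported for rna2.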

Figure 2.

Screening survival-related RNA using rHTS. (A) Process diagram showing growth-dependent screening. (B) Scatter plot showing changes in the proportion of sequences before and after screening. (C) The effect of screened ncRNA on the growth of E. coli and the expression level of screened ncRNA detected by reverse transcription qPCR (RT-qPCR). **P ≤ 0.01 as determined by an unpaired two-tailed t-test comparing scores for the experimental group [with 0.1% (w/v) l-arabinose] and the control group (without l-arabinose); ns: not significant. (D) The potential ORF shown in rna2. (E) The effect of ORF2 expression controlled by pBAD on the growth of E. coli. The y-axis represents the OD600 value ratio of the group with or without 0–10 mM l-arabinose. **P ≤ 0.01 as determined by an unpaired two-tailed t-test comparing scores for the ORF2 group and the empty group; ns: not significant. (F) The OD600 value of E. coli with or without ORF2 expression in six types of media. *P ≤ 0.05 and **P ≤ 0.01 as determined by an unpaired two-tailed t-test comparing scores for this indicated group and the empty group; ns: not significant. (G) The growth curve of E. coli with or without ORF2 expression in the LB medium. (H) The growth rate of E. coli with or without ORF2 expression in the LB medium. (I) The effect of truncated rna1 on the growth of E. coli. The highlighted sequences represent fragments that may contain core sequences. *P ≤ 0.05 and **P ≤ 0.01 as determined by an unpaired two-tailed t-test comparing scores for this indicated group and the empty group; ns: not significant.

Furthermore, we also found that four RNA candidates decreased in the surviving strains, suggesting that these RNA molecules may inhibit the growth of the strains (Figure 2B and C). Unlike our previous study that screened antimicrobial peptides from random ORFs (25), these four RNA candidates lacked ORFs, indicating that their function was caused by the RNA molecules themselves. We monitored the abundance changes of rna1–6 by quantifying AR-L (Figure 1A). The RT-qPCR results indicated that l-arabinose enhanced the expression levels of rna1–6 by hundreds of times; however, no significant differences were observed in their RNA levels under the induction condition (Figure 2C). Subsequently, we were interested in determining whether rna5 and rna6 exert their growth inhibitory functions in vivo or in vitro. We either employed exogenous plasmids to produce these RNAs in vivo or added RNA molecules generated by in vitro transcription to the culture medium. The results showed that these two RNAs could function in vivo, rather than in vitro (Figure 3A and B). These results were quite different from those for the antimicrobial peptides we screened previously, possibly because these RNAs are not taken up autonomously by the cells. The inhibitory effect was observed across different media types, including defined (TB, SOC, M9) and complex carbon sources (Figure 3C). Growth curve analysis confirmed significant inhibitory impacts of these RNAs during the logarithmic phase of the E. coli life cycle (Figure 3D and E). The inhibitory effect was mitigated by removing l-arabinose from the culture, underscoring the in vivo growth inhibitory role of these two RNAs (Figure 3F).

Figure 3.

Screening RNA with growth inhibitory property. The growth inhibitory properties of rna5 and rna6 were verified in vivo (A) and in vitro (B), respectively. The plasmids expressing RNAs were utilized in vivo, and the purified RNAs were utilized in vitro. **P ≤ 0.01 as determined by an unpaired two-tailed t-test comparing scores for the rna5 or rna6 group and the empty group; ns: not significant. (C) The OD600 value of E. coli with or without rna5 and rna6 expression in six types of media. *P ≤ 0.05 and **P ≤ 0.01 as determined by an unpaired two-tailed t-test comparing scores for this indicated group and the empty group; ns: not significant. (D) The growth curve of E. coli with or without rna5 and rna6 expression in the LB medium. (E) The growth rate of E. coli with or without rna5 and rna6 expression in the LB medium. (F) The growth curve of E. coli with or without 0.05% (w/v) l-arabinose removal in the LB medium.

Screening RNA for NADP+ regeneration in E. coli

In addition to growth-dependent screening, coenzyme-dependent screening has also been a popular high-throughput screening method in recent years (31). It relies on intracellular redox reactions to maintain the concentration balance of oxidized and reduced coenzymes, ensuring normal cellular metabolism. An exemplar is the NADPH-dependent screening platform, designed to disrupt the NADPH/NADP+ balance by inhibiting essential NADPH-dependent enzymes such as pgi, edd, udhA and qorA (31). This disruption can result in a significant delay in the cell cycle, and cell growth can only be resumed upon restoration of NADPH-dependent enzyme activity, facilitating NADP+ regeneration (Figure 4A). To construct such a system, we utilized CRISPR-associated transposon technology (32), an efficient genome integration tool that combines Tn6677-like transposon proteins (TnsA, TnsB and TnsC) with a nuclease-deficient CRISPR–Cas system (tniQ, Cas8, Cas7 and Cas6) to catalyze crRNA-directed insertion of mobile genetic elements (cargo) into the genome at crRNA target sites. Using this tool, we replaced the promoters of the NADPH-dependent enzymes with a Tac-lacO promoter, allowing their expression to be controlled by IPTG (Figure 4A). In the absence of IPTG, the expression of these NADPH-dependent enzymes was inhibited; therefore, the growth of the engineered E. coli BL21(DE3) was dramatically inhibited, due to a drastic NADPH/NADP+ imbalance (Figure 4B). This phenomenon was effectively alleviated by IPTG supplementation or additional expression of udhA, which oxidizes NADPH to NADP+ (Figure 4B). These results suggested that we successfully constructed the NADPH-dependent system.

Figure 4.

Screening RNA for NADP+ regeneration using rHTS. (A) Schematic representation of the NADPH-dependent platform. Inhibition of enzymes dependent on NADPH leads to a significant disruption of the NADPH/NADP+ balance, consequently causing E. coli growth arrest (upper panel). The expression of these NADPH-dependent enzymes is regulated by the lactose operon, which is achieved by the CRISPR-associated transposon (lower panel). (B) Growth curve and NADPH/NADP+ ratio test before and after inhibiting NADPH-dependent enzymes, and after recovery by udhA overexpression. **P ≤ 0.01 as determined by an unpaired two-tailed t-test comparing scores for the indicated group and the group without IPTG. (C) Scatter plot showing changes in the proportion of sequences before and after screening. Growth curve (D) and NADPH/NADP+ ratio test (E) in NADPH-dependent platform before and after rna7 and rna8 overexpression. **P ≤ 0.01 as determined by an unpaired two-tailed t-test comparing scores for the indicated group and the group without IPTG. (F) Conversion rate determination from glycerol to DHA before and after rna7 and rna8 overexpression. **P ≤ 0.01 as determined by an unpaired two-tailed t-test comparing scores for the rna7 or rna8 group and the empty group.

Subsequently, we performed the rHTS in this NADPH-dependent system. Two RNAs without potential ORF were significantly enriched in our sequencing result (Figure 4C). Their overexpression significantly alleviated the growth inhibition and NADPH/NADP+ imbalance, albeit not as effectively as udhA overexpression (Figure 4D and E), indicating that these two RNAs could regulate the oxidation from NADPH to NADP+.

NADP+-dependent GDH catalyzes the oxidation of glycerol to DHA, a widely used functional additive, with the concomitant reduction of the coenzyme NADP+ to NADPH (33). However, the conversion rate from glycerol to DHA was limited to ∼18.3% by the in vivo concentration of NADP+ (Figure 4F). When we used these two RNAs to regenerate NADP+ from NADPH, the conversion rate increased to 23.2% with rna7 and 21.8% with rna8 (Figure 4F). These results indicated that these two RNAs can effectively strengthen the catalytic efficiency of GDH by promoting NADPH oxidation to NADP+.

Discussion

Functional RNAs, particularly ncRNAs, have been a hot research topic in recent years, as they participate in complex life processes in the body (4,5). With the development of omics technology, the spectrum of RNA functions continues to expand, including gene expression regulation (5), gene editing (6), immune regulation (34), genetic information transfer (35), epigenetics (36) and more (2,8). These advancements indicate that a lot of functional RNAs exist in random sequences, supporting the ‘RNA world’ hypothesis (37). However, little is currently known about how to screen for functional RNAs on a large scale, especially from random RNA libraries. Our research demonstrates that random RNA libraries can serve as a rich source for screening for functional RNAs, aligning with previous findings (11,26). It is worth noting that unlike our previous research (25), this study did not enforce the insertion of RBS and start codons into the random RNA fragments, and most of the screened RNA lacked potential ORFs. This implies the harsh conditions for the generation of ORF and RBS during the evolutionary process, and also proves the feasibility of screening functional ncRNAs from random sequences.

Of course, there are still many limitations in using random sequence libraries to screen for functional RNAs. First, the difficulty of synthesizing random sequence DNA rises sharply with increasing length, making it impossible to screen many large RNA structures or sequences using existing methods. For example, the recently reported hydrolytic endonuclease (HYER), a novel ribozyme with gene editing potential, has an average length of over 600 nt (6). Second, current random DNA synthesis techniques still struggle to achieve comprehensive and uniform coverage (38,39). The 250-nt sequence fragments synthesized in this study showed significant sequence preference in next-generation sequencing. Third, under conventional deep sequencing conditions, only tens of thousands of sequences can be detected, which is far fewer than the number of possible random nucleotide combinations. Sequencing more deeply, however, is relatively costly and can introduce considerable signal noise. Therefore, building a more specific random library may be more important. This often requires large-scale data accumulation and feature analysis, such as of conserved core sequences or structures, to arrive at a more rational random library design (40,41). Finally, although we have developed two growth-dependent screening methods, the screened functional RNAs exhibit considerable species specificity, as evidenced by the absence of significant growth regulatory functions in Bacillus subtilis (data not shown). Effective high-throughput screening methods are still lacking for most functional RNAs, including broad-spectrum ones.
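The gap between detected sequences and the theoretical sequence space can be made concrete with a short calculation. The sketch assumes four equally possible bases at each of 250 positions and takes 40 000 as the order of magnitude of distinct sequences detected; the variable names are illustrative.

```python
import math

LENGTH_NT = 250      # random insert length used in the library
DETECTED = 40_000    # order of distinct sequences detected by sequencing

possible = 4 ** LENGTH_NT                    # exact; Python integers are unbounded
log10_possible = LENGTH_NT * math.log10(4)   # same quantity on a log10 scale
log10_fraction = math.log10(DETECTED) - log10_possible

print(f"possible sequences ~ 10^{log10_possible:.1f}")
print(f"fraction sampled   ~ 10^{log10_fraction:.1f}")
```

Under these assumptions the space holds roughly 10^150 sequences, so any practical library samples a vanishingly small fraction of it, which is why a more targeted library design may matter more than deeper sequencing.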

Despite limitations in richness, sequencing depth and screening methods, we still effectively screened some functional ncRNAs under two conditions, promoting E. coli proliferation and NADP+ generation, and successfully validated and applied them. However, since these RNAs originate from random sequences and do not exist in known nucleic acid databases, it is difficult to infer the specific molecular mechanisms by which they perform these functions, whether as ribozymes or as regulatory RNAs for enzyme activity or gene expression. A series of technologies may help dissect these molecular mechanisms. For example, AlphaFold3 (20) can be used to model their tertiary structure, RNA immunoprecipitation can identify the interactome (42) and RNA sequencing can be employed to dissect the gene expression changes, among others. These are very interesting research directions.

In summary, we developed a high-throughput random RNA screening method called rHTS using a DNA library of 250 random nucleotides. We screened functional ncRNAs on a large scale in E. coli using both growth-dependent and NADP+-dependent screening modes, and validated and applied their functions. Our research not only provides an efficient high-throughput screening method for ncRNA, but also demonstrates the biological activity and application value of ncRNA.

Acknowledgements

Author contributions: A.J., Z.Z. and Y.-R.W. designed and performed the flows of studies and manuscript writing. L.-H.L., S.L. and X.Z. performed the rHTS and verified the screening results. L.-H.L. and J.C. performed the CRISPR-associated transposon. L.-H.L. verified the NADP+-dependent screening system, determined the NADPH/NADP+ ratio, and performed in vitro transcription, RT-qPCR and GDH activity assay. J.C. performed the growth curve analysis. L.-H.L. analyzed the next-generation sequencing data.

Contributor Information

Li-Hua Liu, Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd., Tongchaunghui South District, No. 40, Shangchong South, Haizhu District, Guangzhou, Guangdong 510000, P.R. China.

Jinde Chen, Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd., Tongchaunghui South District, No. 40, Shangchong South, Haizhu District, Guangzhou, Guangdong 510000, P.R. China.

Shijing Lai, Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd., Tongchaunghui South District, No. 40, Shangchong South, Haizhu District, Guangzhou, Guangdong 510000, P.R. China.

Xuemei Zhao, Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd., Tongchaunghui South District, No. 40, Shangchong South, Haizhu District, Guangzhou, Guangdong 510000, P.R. China.

Min Yang, Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd., Tongchaunghui South District, No. 40, Shangchong South, Haizhu District, Guangzhou, Guangdong 510000, P.R. China.

Yi-Rui Wu, Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd., Tongchaunghui South District, No. 40, Shangchong South, Haizhu District, Guangzhou, Guangdong 510000, P.R. China.

Zhiqian Zhang, Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd., Tongchaunghui South District, No. 40, Shangchong South, Haizhu District, Guangzhou, Guangdong 510000, P.R. China.

Ao Jiang, Tidetron Bioworks Technology (Guangzhou) Co., Ltd., Guangzhou Qianxiang Bioworks Co., Ltd., Tongchaunghui South District, No. 40, Shangchong South, Haizhu District, Guangzhou, Guangdong 510000, P.R. China.

Data availability

All data generated or analyzed during this study are included in this published article.

Supplementary data

Supplementary Data are available at NAR Online.

Funding

No external funding.

Conflict of interest statement. None declared.

References

  • 1. Ponting C.P., Oliver P.L., Reik W. Evolution and functions of long noncoding RNAs. Cell. 2009; 136:629–641.
  • 2. Slack F.J., Chinnaiyan A.M. The role of non-coding RNAs in oncology. Cell. 2019; 179:1033–1055.
  • 3. Gandhi M., Caudron-Herger M., Diederichs S. RNA motifs and combinatorial prediction of interactions, stability and localization of noncoding RNAs. Nat. Struct. Mol. Biol. 2018; 25:1070–1076.
  • 4. Cech T.R., Steitz J.A. The noncoding RNA revolution—trashing old rules to forge new ones. Cell. 2014; 157:77–94.
  • 5. Batista P.J., Chang H.Y. Long noncoding RNAs: cellular address codes in development and disease. Cell. 2013; 152:1298–1307.
  • 6. Liu Z.X., Zhang S., Zhu H.Z., Chen Z.H., Yang Y., Li L.Q., Lei Y., Liu Y., Li D.Y., Sun A. et al. Hydrolytic endonucleolytic ribozyme (HYER) is programmable for sequence-specific DNA cleavage. Science. 2024; 383:eadh4859.
  • 7. Wang K.C., Yang Y.W., Liu B., Sanyal A., Corces-Zimmerman R., Chen Y., Lajoie B.R., Protacio A., Flynn R.A., Gupta R.A. et al. A long noncoding RNA maintains active chromatin to coordinate homeotic gene expression. Nature. 2011; 472:120–124.
  • 8. Rossi M., Gorospe M. Noncoding RNAs controlling telomere homeostasis in senescence and aging. Trends Mol. Med. 2020; 26:422–433.
  • 9. Devaux Y., Zangrando J., Schroen B., Creemers E.E., Pedrazzini T., Chang C.P., Dorn G.W. 2nd, Thum T., Heymans S., Cardiolinc Network. Long noncoding RNAs in cardiac development and ageing. Nat. Rev. Cardiol. 2015; 12:415–425.
  • 10. Nagano T., Fraser P. No-nonsense functions for long noncoding RNAs. Cell. 2011; 145:178–181.
  • 11. Neme R., Amador C., Yildirim B., McConnell E., Tautz D. Random sequences are an abundant source of bioactive RNAs or peptides. Nat. Ecol. Evol. 2017; 1:0217.
  • 12. Alcock F., Clements A., Webb C., Lithgow T. Evolution. Tinkering inside the organelle. Science. 2010; 327:649–650.
  • 13. Carvunis A.R., Rolland T., Wapinski I., Calderwood M.A., Yildirim M.A., Simonis N., Charloteaux B., Hidalgo C.A., Barbette J., Santhanam B. et al. Proto-genes and de novo gene birth. Nature. 2012; 487:370–374.
  • 14. Reinhardt J.A., Wanjiru B.M., Brant A.T., Saelao P., Begun D.J., Jones C.D. De novo ORFs in Drosophila are important to organismal fitness and evolved rapidly from previously non-coding sequences. PLoS Genet. 2013; 9:e1003860.
  • 15. Zhao L., Saelao P., Jones C.D., Begun D.J. Origin and spread of de novo genes in Drosophila melanogaster populations. Science. 2014; 343:769–772.
  • 16. Liu Y., Cao Z., Wang Y., Guo Y., Xu P., Yuan P., Liu Z., He Y., Wei W. Genome-wide screening for functional long noncoding RNAs in human cells by Cas9 targeting of splice sites. Nat. Biotechnol. 2018; 36:1203–1210.
  • 17. Xu F., Li C.H., Wong C.H., Chen G.G., Lai P.B.S., Shao S., Chan S.L., Chen Y. Genome-wide screening and functional analysis identifies tumor suppressor long noncoding RNAs epigenetically silenced in hepatocellular carcinoma. Cancer Res. 2019; 79:1305–1317.
  • 18. Zhou Y., Zhu S., Cai C., Yuan P., Li C., Huang Y., Wei W. High-throughput screening of a CRISPR/Cas9 library for functional genomics in human cells. Nature. 2014; 509:487–491.
  • 19. Montalbano A., Canver M.C., Sanjana N.E. High-throughput approaches to pinpoint function within the noncoding genome. Mol. Cell. 2017; 68:44–59.
  • 20. Abramson J., Adler J., Dunger J., Evans R., Green T., Pritzel A., Ronneberger O., Willmore L., Ballard A.J., Bambrick J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024; 630:493–500.
  • 21. Townshend R.J.L., Eismann S., Watkins A.M., Rangan R., Karelina M., Das R., Dror R.O. Geometric deep learning of RNA structure. Science. 2021; 373:1047–1051.
  • 22. Sumi S., Hamada M., Saito H. Deep generative design of RNA family sequences. Nat. Methods. 2024; 21:435–443.
  • 23. Li S., Wu H., Chen L.L. Screening circular RNAs with functional potential using the RfxCas13d/BSJ-gRNA system. Nat. Protoc. 2022; 17:2085–2107.
  • 24. Zhang Y., Nguyen T.M., Zhang X.O., Wang L., Phan T., Clohessy J.G., Pandolfi P.P. Optimized RNA-targeting CRISPR/Cas13d technology outperforms shRNA in identifying functional circRNAs. Genome Biol. 2021; 22:41–47.
  • 25. Zhang Y., Liu L.-H., Xu B., Zhang Z., Yang M., He Y., Chen J., Zhang Y., Hu Y., Chen X. et al. Screening antimicrobial peptides and probiotics using multiple deep learning and directed evolution strategies. Acta Pharm. Sin. B. 2024; 14:3476–3492.
  • 26. Keefe A.D., Szostak J.W. Functional proteins from a random-sequence library. Nature. 2001; 410:715–718.
  • 27. Eisenstein M. Enzymatic DNA synthesis enters new phase. Nat. Biotechnol. 2020; 38:1113–1115.
  • 28. Salis H.M. The Ribosome Binding Site Calculator. Methods Enzymol. 2011; 498:19–42.
  • 29. Qu L., Yi Z., Shen Y., Lin L., Chen F., Xu Y., Wu Z., Tang H., Zhang X., Tian F. et al. Circular RNA vaccines against SARS-CoV-2 and emerging variants. Cell. 2022; 185:1728–1744.
  • 30. Wesselhoeft R.A., Kowalski P.S., Anderson D.G. Engineering circular RNA for potent and stable translation in eukaryotic cells. Nat. Commun. 2018; 9:2629–2638.
  • 31. Nielsen J.R., Weusthuis R.A., Huang W.E. Growth-coupled enzyme engineering through manipulation of redox cofactor regeneration. Biotechnol. Adv. 2023; 63:108102.
  • 32. Klompe S.E., Vo P.L.H., Halpin-Healy T.S., Sternberg S.H. Transposon-encoded CRISPR–Cas systems direct RNA-guided DNA integration. Nature. 2019; 571:219–225.
  • 33. Liepins J., Kuorelahti S., Penttila M., Richard P. Enzymes for the NADPH-dependent reduction of dihydroxyacetone and D-glyceraldehyde and L-glyceraldehyde in the mould Hypocrea jecorina. FEBS J. 2006; 273:4229–4235.
  • 34. Liu C.X., Guo S.K., Nan F., Xu Y.F., Yang L., Chen L.L. RNA circles with minimized immunogenicity as potent PKR inhibitors. Mol. Cell. 2022; 82:420–434.
  • 35. Samanta B., Joyce G.F. A reverse transcriptase ribozyme. eLife. 2017; 6:e31153.
  • 36. Herman A.B., Tsitsipatis D., Gorospe M. Integrated lncRNA function upon genomic and epigenomic regulation. Mol. Cell. 2022; 82:2252–2266.
  • 37. Higgs P.G., Lehman N. The RNA World: molecular cooperation at the origins of life. Nat. Rev. Genet. 2015; 16:7–17.
  • 38. Meiser L.C., Koch J., Antkowiak P.L., Stark W.J., Heckel R., Grass R.N. DNA synthesis for true random number generation. Nat. Commun. 2020; 11:5869–5877.
  • 39. Davydova A.S., Krasheninina O.A., Tupikin A.E., Kabilov M.R., Venyaminova A.G., Vorobyeva M.A. Synthesis of random DNA libraries for in vitro selection and analysis of their nucleotide composition. Russ. J. Bioorg. Chem. 2019; 45:656–661.
  • 40. Angenent-Mari N.M., Garruss A.S., Soenksen L.R., Church G., Collins J.J. A deep learning approach to programmable RNA switches. Nat. Commun. 2020; 11:5057.
  • 41. Bayard C.J., Yingling Y.G. Computer-assisted design and characterization of RNA nanostructures. Methods Mol. Biol. 2023; 2709:31–49.
  • 42. Trendel J., Schwarzl T., Horos R., Prakash A., Bateman A., Hentze M.W., Krijgsveld J. The human RNA-binding proteome and its dynamics during translational arrest. Cell. 2019; 176:391–403.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
