Pseudouridine site assignment by high-throughput in vitro RNA pseudouridylation and sequencing

Nicole M Martinez; Cassandra Schaening-Burgos; Wendy V Gilbert

doi:10.1016/bs.mie.2021.06.026

Author manuscript; available in PMC: 2022 Jul 30.

Published in final edited form as: Methods Enzymol. 2021 Jul 30;658:277–310. doi: 10.1016/bs.mie.2021.06.026


PMCID: PMC9258999 NIHMSID: NIHMS1808511 PMID: 34517951

Abstract

Pseudouridine (Ψ) is one of the most abundant modifications in cellular RNAs. High-throughput pseudouridine profiling of eukaryotic mRNAs from cells has revealed novel sites of modification across the transcriptome. Pseudouridine affects RNA structure and RNA-protein interactions, with the potential to influence many steps of mRNA metabolism and thereby affect gene expression. Identifying the mechanisms by which individual pseudouridine sites are modified by pseudouridine synthases (PUS) will facilitate studies on the molecular functions of Ψ. Multiple pseudouridine synthases are expressed in all organisms and might direct pseudouridylation of diverse cellular RNAs, but the RNA targets of many enzymes and their specificity determinants remain to be defined. We developed a high-throughput in vitro pseudouridylation assay followed by sequencing that allows validation of candidate sites identified in cells, assignment of sites as direct targets of PUS, and interrogation of the RNA sequence and structural features that direct modification. We also implemented an analysis pipeline to assign Ψ sites from these data, including an updated approach to peak calling that accounts for noisy signal from low-abundance transcripts.

Keywords: Pseudouridine, RNA modification, mRNA modification, PUS, Pseudouridine synthase, pseudouridylation, Pseudo-seq

1. Introduction

New high-throughput sequencing methods have facilitated the discovery of RNA modifications in messenger RNAs (mRNAs). mRNAs are extensively modified with a growing repertoire of non-canonical nucleotides. These mRNA modifications, collectively termed the epitranscriptome, have the potential to influence the fate of mRNAs and thereby regulate gene expression [1]. Pseudouridine is an abundant RNA modification present in the mRNAs of yeast and human cells [2–4]. Pseudouridine has distinct chemical properties that alter RNA structure as well as protein-RNA and RNA-RNA interactions [5]. When pseudouridine is artificially introduced into mRNAs, its molecular differences from uridine are sufficient to impact mRNA metabolism [5]. However, the direct endogenous functions of this modification remain largely unknown. Isomerization of uridine to pseudouridine in mRNAs is predominantly catalyzed by RNA-independent pseudouridine synthases (PUS), which bind and recognize sequence and/or structural features in their RNA targets [6]. Multiple pseudouridine synthases are expressed in all organisms, and their expression varies across tissues and cell lines, suggesting that regulated mRNA pseudouridylation might be used to control gene expression.

Consistent with regulatory potential, pseudouridine content of mRNAs changes in response to nutrient deprivation and other cellular stresses [2, 3].

A first step toward elucidating the functions of pseudouridines in mRNA metabolism is to determine which enzymes modify which targets. We have developed a sequencing-based, high-throughput in vitro pseudouridylation assay that identifies the direct targets of individual PUS (Fig. 1). This approach tests thousands of RNA sequences in parallel for pseudouridylation by recombinant PUS or cellular sources of PUS. The extent of pseudouridylation is monitored by derivatizing pseudouridines with a carbodiimide (CMC) and detecting CMC-induced reverse transcriptase stops by high-throughput sequencing. This approach allows validation of candidate sites identified by pseudouridine profiling in cells and assignment of targets to PUS [6]. Additionally, including structural and sequence variants of PUS targets in the pool defines the RNA elements required for modification by a PUS [6]. The protocol presented here detects pseudouridine but can easily be adapted to the study of other RNA modifications.

Figure 1. — Pool of thousands of DNA sequences that includes candidate pseudouridines and flanking sequence. Wild-type and mutant sequences can be included to interrogate the sequence and structural features that direct Ψ. The DNA oligos include a T7 promoter sequence and a 3′ adapter for PCR amplification and in vitro transcription. In vitro transcribe RNA sequences from the DNA pool template. In vitro pseudouridylate by incubating the folded pool of RNA sequences with recombinant pseudouridine synthases (PUS) or cell extracts as a source of PUS. Perform Pseudo-seq to detect Ψ installed by specific PUS and to determine the effects of mutations on Ψ. Use the Z-score-based approach for site calling. Figure adapted from Carlile et al. 2019 Nat Chem Bio.

2. Pool design

The pool is designed to determine which PUS modifies each site and to validate sites identified in cells.

Design a pool of DNA oligo sequences corresponding to the RNA sequences of interest. These may include pseudouridine sites identified in cells by a transcriptome-wide approach, regions that are predicted to be modified (e.g. based on sequence or structural similarity to known PUS targets), or other RNAs of interest to interrogate as possible PUS targets (e.g. RNAs that crosslink to a PUS in cells). For each pseudouridine site, include the desired amount of flanking sequence centered on the pseudouridine (e.g. for a 130-nucleotide test sequence, include the pseudouridine plus 65 nucleotides of upstream and 64 nucleotides of downstream sequence). We have tested pools with up to 130 nucleotides of endogenous sequence. Append the T7 promoter sequence (Table 1) upstream of the test sequence for in vitro transcription, and a common 3′ adapter sequence downstream of the test sequence for cDNA synthesis. These adapter sequences also serve as common handles for library construction. Order the commercially synthesized pool of DNA oligo sequences (Twist Biosciences, Fig. 1). Our DNA oligos are 170 nucleotides long in total (22-nt T7 promoter, 130-nt test sequence, 18-nt 3′ adapter). Twist Biosciences can synthesize pools of DNA oligos up to 300 nucleotides long, and a pool can include a large number of distinct oligos. For these assays we have included thousands to tens of thousands of unique sequences in the pool. The number of test sequences can be scaled up or down according to experimental needs.

Table 1:

Key Resources Table

REAGENT or RESOURCE	SOURCE	IDENTIFIER
Biological Samples
Nuclear extract	Human cells
S100	Yeast cells
Chemicals, Peptides, and Recombinant Proteins
Recombinant pseudouridine synthase	Human, yeast	PUS1, PUS7, TRUB1, TRUB2, RPUSD2
CMC (for pseudouridine modification detection)	Sigma	cat. no. C106402
Deposited Data
GSE99487	GEO archive	[1]
Oligonucleotides
GCTAATACGACTCACTATAGGG	IDT	T7 sequence to include at 5′ end of DNA oligos
CACTCGGGCACCAAGGAC	IDT	3′ adapter to append to the 3′ end of DNA oligo
GCTAATACGACTCACTATAGGG	IDT	Pool PCR F primer
GTCCTTGGTGCCCGAGTG	IDT	Pool PCR R primer
GCCTTGGCACCCGAGAATTCCGTCCTTGGTGCCCGAGTG	IDT	poolRT_short primer
/5Phos/NNNNNNNNNNGATCGTCGGACTGTAGAACTCTGAACGTG/3SpC3/	IDT	5′ adapter
AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGA	IDT	Library PCR F primer (RP1)
CAAGCAGAAGACGGCATACGAGATXXXXXXGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA	IDT	Library PCR R primer (barcoded)
Recombinant DNA
Twist DNA Pool	Twist Biosciences
Software
python
anaconda
tophat2		[2]
samtools		[3]
bedtools		[4]
cutadapt		[5]


3. Materials and equipment

1. Buffers

Before starting the experiment prepare the following buffers:

Buffers for nucleic acid preparation and extraction

DNA elution buffer

300 mM NaCl

10 mM Tris·Cl, pH 8.0

DNA loading dye (6x)

30% (v/v) Glycerol

0.025% (w/v) bromphenol blue

0.025% (w/v) xylene cyanol FF

RNA elution buffer

300 mM sodium acetate, pH 5.3

1 mM EDTA, pH 8.0

100 U/ml RNasin Plus

Add the RNasin immediately before use

Formamide buffer (2x)

95% (v/v) formamide

5 mM EDTA, pH 8.0

0.025% (w/v) SDS

0.025% (w/v) bromphenol blue

0.025% (w/v) xylene cyanol FF

Store at −20°C

Buffers for in vitro pseudouridylation

5x Pseudouridylation buffer

500 mM Tris pH 8.0

500 mM Ammonium Acetate

25 mM MgCl₂

1.5 mM EDTA

Filter sterilize using a 0.2-μm filter

Buffers for pseudouridine detection

BEU buffer

7 M Urea

4 mM EDTA, pH 8.0

50 mM Bicine, pH 8.5

Adjust the final pH to ~9.0 using NaOH or HCl

Filter sterilize using a 0.2-μm filter

Sodium carbonate buffer

50 mM Na2CO3, pH 10.4

2 mM EDTA, pH 8.0

Prepare from stocks of 1 M Na2CO3, pH 10.4, and 0.5 M EDTA, pH 8.0, and adjust the final pH to 10.4 using sodium bicarbonate. Filter sterilize using a 0.2-μm filter.

RT buffer w/o Mg2+, 10×

500 mM Tris·Cl, pH 8.6

600 mM NaCl

100 mM DTT

Store at −20°C

2. Reagents

Pool of DNA sequences (Twist Biosciences)

Recombinant PUS or cell lysate containing endogenous PUS

Deionized distilled H2O

Ice

Liquid nitrogen

Acid phenol

Chloroform

Isopropanol

100% Ethanol

70% Ethanol

5 M NaCl

10 mM Tris·Cl, pH 8.0

40 mM EDTA, pH 8.0

GlycoBlue

BEU buffer (see recipe)

CMC (Sigma, cat. no. C106402)

0.5 M CMC in BEU buffer (make fresh immediately before CMC treatment)

3 M sodium acetate, pH 5.3

Sodium carbonate buffer (see recipe)

RNasin Plus (Promega, cat. no. N2615)

2x formamide buffer, used as RNA loading dye (see recipe)

10-bp DNA ladder (Invitrogen, cat. no. 10821-015)

0.5x TBE

SYBR Gold (Invitrogen, cat. no. S-11494)

10x RT buffer without Mg2+ (see recipe)

10 mM dNTPs

100 mM MgCl2

AMV RT (Promega, cat. no. M5108)

1 N NaOH

1 N HCl

CircLigase ssDNA ligase (Epicentre, cat. no. CL4115K)

1 mM ATP

50 mM MnCl2

Phusion High-Fidelity (HF) DNA polymerase with 5x HF buffer (NEB, cat. no. M0530L)

6x DNA loading dye (see recipe)

3. Equipment

Tabletop and refrigerated centrifuges

50- and 15-mL conical tubes

Pipettes

Vortex mixer

Thermomixer (Eppendorf)

Thermal cycler (Eppendorf)

Gel electrophoresis equipment

Rocker platform

Blue light box

1.5 mL microcentrifuge tubes

200-μl PCR tubes

Alternatives:

Incubation steps can be carried out in thermomixer, thermal cycler or heat blocks.

4. Protocol

This protocol is based on our recently described protocol [6] for in vitro pseudouridylation and high-throughput sequencing. In this version of the protocol we include the following additions to the experimental approach: use of 5′ adapter ligation instead of cDNA circularization for library construction; a protocol for making and using cell extracts as a source of PUS activity; an analysis pipeline; and an updated peak-calling method.

4.1. PCR amplify the DNA pool

The first step of the protocol is to amplify the DNA pool by PCR to ensure sufficient material as input for the in vitro transcription reaction. This is achieved using PCR primers complementary to the T7 and 3′ adapter sequences that were appended to the test sequences during the Pool Design (section 2, Table 1).

The DNA oligo pool comes lyophilized. Resuspend the DNA pool at a concentration of 0.5 ng/μL with 10 mM Tris-HCl, pH 8.0.

Prepare an 11X or 550 μL PCR master mix to amplify the DNA pool.

PCR Master mix:	1X	11X
5X HF buffer	10 μL	110 μL
10 μM T7 primer	1 μL	11 μL
10 μM 3′ adapter primer	1 μL	11 μL
10 mM dNTPs	1 μL	11 μL
DMSO	1.5 μL	16.5 μL
H2O	34.25 μL	376.75 μL
Oligo Pool	1 μL	11 μL
Phusion polymerase	0.25 μL	2.75μL
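The 11X column is the 1X column multiplied by 11, which covers the ten 50-μL reactions (500 μL) that are later loaded on the gels plus one extra reaction for pipetting loss. A minimal Python sketch of that scaling, assuming one extra reaction; `scale_master_mix` is a hypothetical helper:

```python
# Per-reaction volumes (uL) for one 50-uL pool PCR, copied from the table above.
PER_REACTION_UL = {
    "5X HF buffer": 10.0,
    "10 uM T7 primer": 1.0,
    "10 uM 3' adapter primer": 1.0,
    "10 mM dNTPs": 1.0,
    "DMSO": 1.5,
    "H2O": 34.25,
    "Oligo pool": 1.0,
    "Phusion polymerase": 0.25,
}


def scale_master_mix(per_reaction: dict, n_reactions: int, extra: int = 1) -> dict:
    """Multiply each volume by (n_reactions + extra) to cover pipetting loss."""
    factor = n_reactions + extra
    return {name: round(vol * factor, 2) for name, vol in per_reaction.items()}


if __name__ == "__main__":
    assert sum(PER_REACTION_UL.values()) == 50.0  # each reaction is 50 uL
    mix = scale_master_mix(PER_REACTION_UL, n_reactions=10)
    for name, vol in mix.items():
        print(f"{name}: {vol} uL")
    print("total:", sum(mix.values()), "uL")  # total: 550.0 uL
```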


Distribute 50 μL of PCR master mix into individual PCR tubes. Place in a thermocycler for 12 PCR cycles as indicated below. Select an appropriate annealing temperature based on the melting temperature of the primer sequences.

Initial denaturation: 98°C 30s

12 cycles of:

Denature: 98°C 10s

Anneal: 58°C 30s

Extend: 72°C 30s

Final extension: 72°C 5 min


Note: The optimal number of PCR amplification cycles to maximize yield while minimizing overamplification products should be determined for each DNA pool.
Gel purify the template DNA by polyacrylamide gel electrophoresis. Cast two 8% non-denaturing TBE-polyacrylamide mini-gels.
Add 100 μL of 6x DNA loading buffer to 500 μL PCR reactions and load the PCR product distributed across all the lanes of two 8% TBE polyacrylamide gels. Run the samples for 50 min at 200V.
Stain DNA by placing gels in a container and covering each with 15 mL of 0.5X TBE plus 15 μL of SYBR Gold (1:10,000). Incubate on rocking platform for 5 min. Visualize gels with blue light illumination and cut out bands (Fig. 2).

Note: Blue light illumination is important to avoid damaging the DNA template, which can result from using UV light sources.
Using a razor blade, cut out the band and elute the PCR product by placing the gel slices in two 1.5 mL tubes, each containing 750 μL of DNA elution buffer. Rock overnight at room temperature. Make sure that the gel slices are submerged in buffer to ensure complete DNA extraction.
To precipitate DNA do as follows: Transfer solution with eluted DNA into new 1.5 mL tubes and add 750 μL isopropanol (1 volume) and 2 μL glycoblue to each tube. Vortex sample and incubate at −20°C for at least 30 min. Spin at maximum speed on microfuge for 30 min at 4°C to pellet DNA. Wash pellets with 750 μL of 70% Ethanol and spin down at maximum speed on microfuge for 10 min at 4°C. Remove supernatant and allow pellets to dry for 5 minutes at room temperature. Resuspend the DNA pellet in 8 μL of water.

Pause Point: The DNA can be stored in the isopropanol precipitation reaction at −20°C to resume at a later time or date.

Figure 2. — Example gel of PCR product distributed across wells

4.2. In vitro transcribe RNA

The next step in the protocol is to make RNA from the DNA template pool by in vitro transcription with T7 polymerase. The resulting RNA pool will be used as a substrate for in vitro pseudouridylation reactions with recombinant pseudouridine synthases.

Set up the in vitro transcription reaction using the MEGAshortscript kit (Ambion).

To the 8 μL of gel purified PCR product add:

2 μL 10X buffer

2 μL ATP

2 μL UTP

2 μL CTP

2 μL GTP

2 μL enzyme mix
Incubate the in vitro transcription reaction at 37°C for 2h. Then add 2 μL of additional T7 polymerase and incubate another 2h at 37°C.
To degrade the DNA template used for transcription, add 2 μL TURBO DNase and incubate at 37°C for 15 min.
Gel purify the RNA by denaturing polyacrylamide gel electrophoresis. Cast 8% TBE-urea-polyacrylamide mini-gel. Pre-run gels in gel electrophoresis apparatus filled with 0.5X TBE for 20 min at 200V. Pre-running will pre-heat gel to ensure that the RNA samples remain uniformly denatured and run true to size.
Add 24 μL of 2X formamide buffer (formamide is a denaturing agent) to the RNA samples and heat denature by incubating at 98°C for 2 min.
With a syringe flush each of the wells on the gel with 0.5X TBE to remove urea from the wells.
Load 6 μL of denatured RNA samples into each well of the gel and run at 200V for 60 min.
Stain RNA by placing gels in a container and covering each with 15 mL of 0.5X TBE plus 15 μL of SYBR Gold (1:10,000). Incubate on rocking platform for 5 min. Visualize gels with blue light illumination and cut out bands (Fig. 3).

Note: The use of blue light illumination is important to avoid damaging the RNA by exposure to UV light sources.
Elute RNA by placing gel slices in 750 μL of RNA elution buffer and incubating overnight at 4°C on a rocking platform.
Precipitate the RNA: Transfer the supernatant to a fresh 1.5 mL tube and precipitate RNA by adding 750 μL isopropanol (1 volume) and 2 μL glycoblue. Vortex and incubate at −20°C for at least 30 min. Spin down at maximum speed in a microfuge for 30 min to pellet RNA. Wash pellets with 750 μL of 70% ethanol and spin down at maximum speed in a microfuge for 10 min. Remove supernatant and allow pellets to dry for 5 minutes at room temperature. Resuspend dried RNA pellets in 20 μL of 10 mM Tris-HCl, pH 8.0.

Pause Point: The RNA can be stored in the isopropanol precipitation reaction at −20°C to resume at a later time or date.

Figure 3. — Example gel of RNA in vitro transcribed from pool DNA template distributed across wells

4.3. In vitro pseudouridylation with recombinant PUS or PUS-containing lysate

Once the substrate RNAs have been made, they are pseudouridylated in vitro by incubation with a source of pseudouridine synthase activity. Include a no-PUS control alongside the individual PUS samples. Comparing each PUS sample with the no-PUS control distinguishes PUS-dependent RT stops, which are indicative of pseudouridine, from PUS-independent stops and aids in PUS assignment.

Folding the RNA

PUS proteins and other tRNA modifying enzymes in many cases recognize structural features of folded RNA. Therefore, it is important to fold the purified in vitro transcribed RNA to ensure modification. The RNA is denatured in water followed by addition of 5x pseudouridylation buffer which includes magnesium to facilitate folding at 37°C.

Make an RNA stock solution (5 μM final concentration).
Prepare a master mix to fold RNA containing, per reaction, 6 μL (30 pmol) of the 5 μM RNA stock solution and 33.5 μL of water; include one reaction for each PUS being tested plus a no-PUS control.
Incubate the RNA reactions at 75°C for 2 min to denature the RNA.
Place the sample on ice for 5 min.
Add 10 μL of 5X Pseudouridylation Buffer to the RNA solution for each PUS being tested and a No PUS control.
Incubate sample at 37°C for 20 min to re-fold RNA.
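Making the 5 μM stock requires converting the measured mass concentration of the gel-purified RNA into molarity. The sketch below assumes the commonly used approximation MW(ssRNA) ≈ 320.5 × length + 159.0 g/mol and a 151-nt transcript (the full-length size for the 130-nt design); it also scales the per-reaction folding volumes given above (6 μL RNA, 33.5 μL water, 10 μL 5X buffer) to one reaction per PUS plus the no-PUS control. All helper names are hypothetical.

```python
def ssrna_mw(length_nt: int) -> float:
    """Approximate molecular weight (g/mol) of a single-stranded RNA."""
    return length_nt * 320.5 + 159.0


def ng_per_ul_to_uM(conc_ng_per_ul: float, length_nt: int) -> float:
    """1 ng/uL = 1 mg/L; mg/L divided by g/mol gives mmol/L, and x1e3 gives umol/L (uM)."""
    return conc_ng_per_ul * 1e3 / ssrna_mw(length_nt)


def folding_master_mix(n_pus: int) -> dict:
    """Folding volumes for each PUS plus one no-PUS control (per-reaction values from the text)."""
    n = n_pus + 1
    return {
        "5 uM RNA stock (uL)": 6.0 * n,                # 6 uL x 5 uM = 30 pmol per reaction
        "water (uL)": 33.5 * n,                        # combined with RNA before the 75 C step
        "5X pseudouridylation buffer (uL)": 10.0 * n,  # added after cooling on ice
    }


if __name__ == "__main__":
    target_ng_per_ul = 5.0 * ssrna_mw(151) / 1e3  # mass concentration equal to 5 uM
    print(round(target_ng_per_ul, 1))              # 242.8
    print(folding_master_mix(n_pus=3))
```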

In vitro pseudouridylation

The PUS-treated sample is incubated with a purified recombinant PUS or another source of PUS activity, such as cellular extracts (e.g. nuclear extract, S100 extract). We have observed robust pseudouridylation by some, but not all, PUS in cell extracts, which allows modification to be tested without protein purification. Extracts from WT and PUS-depleted cells can be used to identify the PUS that directs pseudouridylation of target RNAs. Cell extracts also supply potential cofactors that might influence modification by some RNA-modifying enzymes.

Purify recombinant PUS

Purify the recombinant PUS of interest or prepare cell extract. The protein purification strategy will need to be optimized for each enzyme. In [1] we describe the purification strategy for several PUS, which may provide a useful starting point. We have not included a protocol here. We have used yeast S100 extract and human extracts as a source of PUS activity. Included are protocols to prepare these. The yeast S100 extract preparation was adapted from [7].
Protocol for yeast S100 preparation
1. Grow a 750 mL yeast culture to mid log phase and pellet cells by centrifugation at 16,000 x g for 5 min.
2. Resuspend the cell pellet in 25 mL ice-cold water, and transfer to a 50 mL conical tube.
3. Pellet cells by centrifugation at 3400 x g for 5 min at 4ºC, and pour off supernatant. Pipet to remove residual water from cells and weigh the tubes. The pellets should be approximately 5–10 g wet weight.
  
  Pause Point: The washed cell pellets can be flash frozen in liquid nitrogen and stored at −80ºC.
4. Resuspend cell pellets in 1.5 mL ice cold lysis buffer (50 mM Tris HCl pH7.5, 100 mM KCl, 10 mM MgCl₂, 0.1 mM EDTA, 10% glycerol, 10 mM beta mercaptoethanol, 1 mM PMSF, protease inhibitor tablet) per gram of pellet weight.
5. Add 5 g zirconia/silica 0.5 mm beads (Biospec, cat. no. 11079105z) per gram of pellet.
6. Lyse cells by vortexing at high speed 6 × 30 sec with ≥1 min on ice in between.
7. Centrifuge crude lysate at 3400 x g for 5 min.
8. Transfer supernatant to high-speed centrifuge tubes.
9. Centrifuge at 12,000 x g for 20 min.
10. Transfer supernatant to SW41 ultracentrifuge tubes. Top off with lysis buffer to fill the tubes to within 3 mm of the top.
11. Centrifuge at 100,000 x g for 60 min.
12. Pipet out clarified S100 extract, avoiding any top layer (contains lipids) and leaving a generous layer behind on top of the pellet.
13. Flash freeze extract aliquots in liquid nitrogen and store at −80ºC.
Protocol for human nuclear extract preparation.
1. Trypsinize cells of interest (e.g. HepG2) and pellet ~2×10⁶ cells by spinning down at 500 x g for 5 min.
2. Resuspend cells in 1 mL of PBS and transfer to 1.5 mL tube.
3. Centrifuge at 845 x g for 1 min.
4. Resuspend the cell pellet in ~100 μL (per 2×10⁶ cells) of cytoplasmic extract buffer (10 mM HEPES pH 7.6, 1.5 mM MgCl2, 10 mM KCl, 0.15% NP-40).
5. Incubate on ice for 5 minutes.
6. Spin down at 4000 x g for 3 min.
7. Collect supernatant as cytoplasmic fraction.
8. Resuspend the nuclear pellet in an equal volume of nuclear extract buffer (20 mM HEPES pH 7.6, 1.5 mM MgCl2, 420 mM NaCl, 0.2 mM EDTA, 20% glycerol).
9. Perform three freeze-thaw cycles: incubate for 15 minutes at −80°C, then thaw at 37°C and vortex for 1 minute.
10. Spin down at max speed for 15 minutes at 4°C in microfuge.
11. Transfer supernatant to new 1.5 mL tube as nuclear extract and determine concentration.

In vitro pseudouridylation reaction

Prepare the in vitro pseudouridylation reaction and pre-warm it at 30°C before adding the folded RNA. Folded RNA and recombinant PUS should be added last. Final reagent concentrations are 2 mM DTT, 1x Pseudouridylation Buffer and 600 nM PUS.

In vitro pseudouridylation reaction:
90 μL	5x Pseudouridylation buffer
10 μL	100mM DTT
2.4 μL	recombinant PUS (130 μM) or ~100 μg of nuclear extract
46.5 μL of folded RNA
Bring volume to 500 μL with water.
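The stated final concentrations follow from C1V1 = C2V2 over the 500 μL reaction. The sketch below checks them. It assumes that the 46.5 μL of folded RNA is drawn from a 49.5 μL folding reaction (6 + 33.5 + 10 μL) and therefore carries part of the 5X buffer and of the 30 pmol of RNA into the reaction; this is an inference from the volumes above, not a value stated in the protocol.

```python
REACTION_UL = 500.0


def final_conc(stock: float, volume_ul: float, total_ul: float = REACTION_UL) -> float:
    """C2 = C1 * V1 / V2; the result has the same units as the stock."""
    return stock * volume_ul / total_ul


# Assumption: 46.5 uL of folded RNA is drawn from a 49.5 uL folding reaction.
carry_over = 46.5 / 49.5

dtt_mM = final_conc(100.0, 10.0)                      # 100 mM DTT stock
pus_nM = final_conc(130_000.0, 2.4)                   # 130 uM PUS stock, expressed in nM
buffer_x = final_conc(5.0, 90.0 + 10.0 * carry_over)  # 5X buffer added directly + carried over
rna_nM = 30.0 * carry_over / REACTION_UL * 1e3        # pmol / uL = uM; x1e3 gives nM

if __name__ == "__main__":
    print(f"DTT {dtt_mM:.1f} mM, PUS {pus_nM:.0f} nM, buffer {buffer_x:.2f}X, RNA {rna_nM:.0f} nM")
    # DTT 2.0 mM, PUS 624 nM, buffer 0.99X, RNA 56 nM
```

The ~624 nM result matches the ~600 nM stated above, and the buffer comes to ~1X once the buffer carried in with the folded RNA is counted.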


Incubate reactions at 30°C for 45 min to complete pseudouridylation. 45 min is sufficient for nearly complete pseudouridylation by many PUS in our hands.
Immediately add 500 μL of acid phenol to samples to stop the reactions, vortex and spin 5 min at maximum speed in a microfuge.
Transfer top phase to a new tube and add 500 μL of chloroform, vortex and spin 5 min at maximum speed in microfuge.
Precipitate the in vitro pseudouridylated RNAs: Transfer the top phase to a new tube and add 1 volume of isopropanol, 1/9 volume of 3 M sodium acetate (pH 5.3) and 2 μL of glycogen. Vortex, incubate at −20°C for at least 30 min, and spin at maximum speed for 30 min in a microfuge to pellet the RNA. Wash the RNA pellet with 750 μL of 70% ethanol and spin down at maximum speed in a microfuge for 10 min. Remove the supernatant and allow pellets to dry for 5 minutes at room temperature. Resuspend the pellet in 30 μL of water.

Pause Point: The RNA can be stored in the isopropanol precipitation reaction at −20°C to resume at a later time or date.

4.4. CMC modification and reversal

Pseudouridines are detected by derivatizing the RNA isolated and purified from the in vitro pseudouridylation reaction with the carbodiimide N-cyclohexyl-N′-(2-morpholinoethyl)carbodiimide metho-p-toluenesulfonate (CMC). CMC forms covalent adducts with guanosine, uridine, and pseudouridine. However, only the CMC-N3-pseudouridine adduct is resistant to reversal under alkaline conditions [8]. These CMC-pseudouridine adducts are then identified as CMC-dependent reverse transcriptase stops.

Note: Each PUS treatment condition is split into two reactions: a –CMC (mock-treated) and a +CMC (treated) sample. The –CMC negative control allows CMC-independent RT stops that do not result from pseudouridines to be filtered out. It is important that the –CMC control be mock treated and carried through the modification and reversal steps in parallel, to account for the substantial RNA fragmentation that occurs during these steps.

CMC modification
1. Prepare fresh 0.5 M CMC in BEU buffer immediately before use.
2. Split RNA sample into two 0.2 mL tubes by transferring 12 μL of RNA into a tube for the mock treated (–CMC) and 18 μL of RNA into a separate tube for the CMC treated sample. Bring the volume of each sample to 20 μL with water.
  
  Note: More RNA is distributed to the +CMC sample to account for lower recovery of CMC modified RNA.
3. Add 2.9 μL of 40 mM EDTA to each sample and incubate at 80°C for 3 min in thermocycler.
  
  Note: This step denatures the RNA to increase accessibility of pseudouridines that might be inaccessible due to RNA structure and improve derivatization with CMC.
4. Add 100 μL of 0.5 M CMC BEU Buffer to +CMC samples and 100 μL of BEU Buffer to the mock treated (–CMC samples).
5. Incubate RNA for 45 min at 40°C with shaking at 1,000 rpm in thermomixer.
6. Transfer samples to fresh 1.5 mL tubes.
7. Precipitate the RNA: To clean-up the reaction add 1 mL of 100% ethanol, 50 μL of sodium acetate and 2 μL of glycoblue. Vortex and incubate the reaction at −20°C for at least 30 min. Spin at maximum speed for 30 min in microfuge to precipitate the RNA. Wash pellets 2X by adding 500 μL of 70% ethanol and spin down for 10 min in microfuge at maximum speed and 4°C. Air dry pellets for 5 minutes at room temperature.
  
  Note: Use of ethanol precipitation is important because it yields better recovery of CMC modified RNA than isopropanol precipitation.
  
  Pause Point: The CMC modified RNA can be stored in the ethanol precipitation reaction at −20°C to resume at a later time or date.
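Because the 0.5 M CMC solution must be made fresh, it helps to weigh out only what is needed. A minimal sketch, assuming a molecular weight of 423.57 g/mol for CMC metho-p-toluenesulfonate (confirm against the supplier's lot information), 100 μL per +CMC sample as in step 4, and a hypothetical 20% excess for pipetting loss:

```python
CMC_MW = 423.57  # g/mol, CMC metho-p-toluenesulfonate (C21H33N3O4S); verify for your lot


def cmc_mass_mg(n_samples: int, ul_per_sample: float = 100.0,
                molar: float = 0.5, excess: float = 0.2) -> tuple:
    """Return (mg of CMC, uL of final solution) needed for n +CMC samples."""
    volume_ul = n_samples * ul_per_sample * (1.0 + excess)
    # mol/L * L * g/mol = g; x1e3 converts to mg
    mass_mg = molar * (volume_ul * 1e-6) * CMC_MW * 1e3
    return mass_mg, volume_ul


if __name__ == "__main__":
    mg, ul = cmc_mass_mg(n_samples=4)
    print(f"dissolve {mg:.1f} mg CMC in BEU buffer to a final volume of {ul:.0f} uL")
    # dissolve 101.7 mg CMC in BEU buffer to a final volume of 480 uL
```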
CMC reversal
1. Resuspend each pellet in 30 μL of sodium carbonate buffer (alkaline buffer for reversal).
2. Incubate the RNA for 2h at 50°C with shaking at 1000 rpm in thermomixer.
3. Precipitate the RNA: Add 2 μL glycoblue, 1/9 volume of sodium acetate and 2.5 volumes of ethanol and incubate at −20°C for 30 min. Spin at maximum speed for 30 min in microfuge to precipitate the RNA. Wash pellets 2X by adding 500 μL of 70% ethanol and spin down for 10 min in microfuge at maximum speed and 4°C. Allow pellets to air dry for 5 minutes. Resuspend RNA in 8 μL of 10 mM Tris-HCl pH 8.
  
  Pause Point: The RNA can be stored in the ethanol precipitation reaction at −20°C to resume at a later time or date.
Size selection of full-length RNA after CMC reversal.
Note: There is significant RNA degradation during the alkaline incubation at 50°C for CMC reversal. Adding a gel purification step to isolate full-length RNA prevents a 3′ read-coverage bias that might otherwise result from this degradation, and ensures that read 5′ ends are generated during reverse transcription rather than by RNA degradation.
1. Gel purify the RNA by denaturing polyacrylamide gel electrophoresis. Cast 8% TBE-urea-polyacrylamide mini-gel. Pre-run gels in gel electrophoresis apparatus filled with 0.5X TBE for 20 min at 200V. Pre-running will pre-heat gel to ensure that the RNA samples remain uniformly denatured and run true to size.
2. Add 8 μL of 2X formamide buffer (formamide is a denaturing agent) to the RNA samples and heat denature by incubating at 98°C for 2 min, then place on ice.
3. With a syringe flush each of the wells on the gel with 0.5X TBE to remove urea from the wells.
4. Load each 16 μL denatured RNA sample into a well of the gel and run at 200V for 60 min.
5. Stain RNA by placing gels in a container and covering each with 15 mL of 0.5X TBE plus 15 μL of SYBR Gold (1:10,000). Incubate on rocking platform for 5 min. Visualize gels with blue light illumination and cut out the full-length product (~151 nt for the example provided in this protocol, or the RNA length chosen during pool design; Fig. 4).
  
  Note: The use of blue light illumination is important to avoid damaging the RNA by exposure to UV light sources.
6. Elute RNA from gel slices in 400 μL of RNA elution buffer by incubating overnight at 4°C on a rocking platform.
7. Precipitate the RNA: Transfer eluate into new tube and add 1 mL of 100% ethanol, 2 μL of glycoblue. Vortex and incubate at −20°C for at least 30 min. Spin at maximum speed for 30 min in microfuge to precipitate the RNA. Wash pellets 2X by adding 500 μL of 70% ethanol and spin down for 10 min in microfuge at maximum speed and 4°C. Allow pellets to dry for 5 minutes at room temperature. Resuspend the RNA in 6.2 μL H2O.
  
  Pause Point: The RNA can be stored in the ethanol precipitation reaction at −20°C to resume at a later time or date.

Figure 4. — Example of gel purification of full length RNA after CMC reversal

4.5. Reverse transcription

During cDNA synthesis CMC-pseudouridine adducts induce stops to reverse transcriptase. Reverse transcribe the pool of RNA sequences using a reverse primer complementary to the 3′ adapter sequence present at the 3′ end of all the RNAs in the pool. Then size select truncated cDNAs resulting from CMC-pseudouridine adducts.

Transfer 6.2 μL of RNA into clean PCR tubes.
Add 1 μL of gel-purified poolRT_short primer (25 μM) and 0.8 μL of 10x RT buffer without magnesium to the 6.2 μL of purified RNA. In parallel, prepare a no-RNA control reaction with 6.2 μL of water in place of RNA.
Anneal the RT primer to the RNA by incubating the reactions as follows:

65°C for 4 min

55°C for 2 min

45°C for 2 min

42°C for 2 min
Centrifuge briefly to collect condensation and place on ice.
Prepare an RT extension master mix with the following components per reaction.

RT master mix:

0.6 μL 10x RT buffer w/o Mg2+

2.24 μL 25 mM dNTPs

1.16 μL 240 mM MgCl2

1.0 μL RNasin Plus

1.1 μL AMV RT

Note: Other RTs including Superscript III also stop at CMC adducts.
Add 6 μL of the RT master mix to the annealing reaction.
Incubate for 1 h at 42°C in thermocycler to synthesize cDNA.

Pause Point: The cDNA samples can be stored at −20°C to resume protocol at a later time or date.
Cast enough 8% TBE-urea-polyacrylamide mini-gel for running all samples. Pre-run gels in gel electrophoresis apparatus filled with 0.5X TBE for 20 min at 200V. Pre-running will pre-heat gel to ensure that the RNA samples remain uniformly denatured and run true to size.
Add 1.5 μL 1N NaOH to each sample and incubate for 15 min at 98°C to hydrolyze RNA templates.
Add 1.5 μL HCl to each reaction to neutralize pH.
Add 17 μL of 2x formamide buffer to each reaction.
Heat samples for 2 min at 98°C to denature the sample and place on ice.
Load samples and run 8% TBE-urea-polyacrylamide mini-gels for 68 min at 200V.
Stain gel with 15 mL of 0.5X TBE and 15 μL of SYBR Gold (1:10,000) for 5 min. Use blue light illumination to visualize the gel, since reverse transcription is very inefficient on UV-damaged RNA template. It is likely that only the full-length product will be visible on the gel. Using a suitably sized ladder as a reference, cut out the truncated cDNAs of the expected size to enrich for pseudouridine-containing RNAs. Calculate the expected size from the position of the pseudouridine-dependent stop (one nucleotide 3′ to the expected Ψ position), taking into account the length added to the cDNA by the RT primer.

Note: For an RNA pool of 130-nt test sequences with the expected pseudouridine at position 66, and using the 3′ adapter and RT primer noted above, the truncated cDNAs should run at ~104 nt. Cut a broad band spanning ±30 nt around the expected size (74 to 134 nt) (Fig. 5).
Elute cDNA by placing each gel slice in 400 μL DNA elution buffer by incubating at room temperature with rocking overnight.
Precipitate by adding 1 mL of ethanol, 2 μL glycoblue and incubating at −20°C for at least 30 min.

Pause Point: The eluted cDNA samples can be stored in the ethanol precipitation reaction at −20°C to resume at a later time or date.
Spin down at maximum speed for 30 min.
Wash pellets with 750 μL of 70% ethanol and spin down at maximum speed in microfuge for 10 min.
Allow pellets to dry for 10 min at room temperature.
Resuspend each pellet in 5 μL 10mM Tris pH 8.
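The expected cDNA sizes used for the gel cut can be derived from the pool layout. This is a sketch under stated assumptions: the transcript begins with the GGG of the T7 promoter, the 18-nt 3′ adapter ends the RNA, the Ψ position is counted from the first nucleotide of the 130-nt test sequence, reverse transcriptase stops one nucleotide 3′ of Ψ, and the 39-nt poolRT_short primer adds a 21-nt 5′ tail beyond the adapter. Under these assumptions the sketch gives 103 nt for the truncated product, within a nucleotide of the ~104 nt estimate in the note above (the exact figure depends on numbering conventions), and 172 nt for full-length cDNA. Function names are hypothetical.

```python
from typing import Optional

T7_TRANSCRIBED_NT = 3  # GGG at the 3' end of the T7 promoter sequence
ADAPTER_3P = "CACTCGGGCACCAAGGAC"
RT_PRIMER = "GCCTTGGCACCCGAGAATTCCGTCCTTGGTGCCCGAGTG"  # poolRT_short (Table 1)

# The primer's 3' 18 nt pair with the adapter; the rest is a 5' tail added to every cDNA.
PRIMER_TAIL_NT = len(RT_PRIMER) - len(ADAPTER_3P)


def rna_length(insert_nt: int) -> int:
    """Full-length transcript: promoter GGG + test sequence + 3' adapter."""
    return T7_TRANSCRIBED_NT + insert_nt + len(ADAPTER_3P)


def cdna_length(insert_nt: int, psi_pos: Optional[int] = None) -> int:
    """cDNA length; psi_pos is 1-based within the test sequence, None for full length."""
    total = rna_length(insert_nt)
    if psi_pos is None:
        first_copied = 1                                # RT reaches the RNA 5' end
    else:
        first_copied = T7_TRANSCRIBED_NT + psi_pos + 1  # stop 1 nt 3' of Psi
    return (total - first_copied + 1) + PRIMER_TAIL_NT


if __name__ == "__main__":
    print(cdna_length(130, psi_pos=66), cdna_length(130))  # 103 172
```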

Figure 5. — Example of gel purification of truncated cDNA.

4.6. 5′ adapter ligation

An adapter is ligated to the 3′ end of the synthesized cDNA fragments, which corresponds to the 5′ end of the target RNAs. This 5′ adapter (Table 1) introduces a common handle for PCR amplification and high-throughput sequencing. In the past we made these libraries using a long RT primer and cDNA circularization instead of 5′ adapter ligation. However, we have found that the 5′ adapter ligation strategy overcomes the inefficient circularization of longer cDNAs by CircLigase. The 5′ adapter also introduces a 10-nt unique molecular identifier (UMI), which allows PCR duplicates arising during final library amplification to be collapsed. Removal of PCR duplicates is important for quantifying differences in pseudouridylation among sequences.
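The duplicate-collapsing logic can be sketched as follows. This is an illustration, not the authors' pipeline: it assumes reads have already been aligned to the pool sequences (e.g. with the tools in Table 1), that the 10-nt UMI was clipped from the start of each read and kept with its alignment, and that the read 5′ end maps to the RNA nucleotide where reverse transcription stopped. The `AlignedRead` record and the function names are hypothetical. Reads sharing reference, start position and UMI are treated as PCR copies of a single cDNA.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class AlignedRead(NamedTuple):
    umi: str    # 10-nt UMI from the 5' adapter, clipped from the read
    ref: str    # pool sequence the read aligned to
    start: int  # 1-based RNA position of the read 5' end (= cDNA 3' end / RT stop)


def count_unique_stops(reads: Iterable[AlignedRead]) -> Counter:
    """Collapse PCR duplicates (same ref, start and UMI), then count stops per position."""
    unique = {(r.ref, r.start, r.umi) for r in reads}
    return Counter((ref, start) for ref, start, _umi in unique)


def candidate_psi_position(stop_start: int) -> int:
    """CMC-Psi stops fall 1 nt 3' of the Psi, so the Psi is at stop - 1."""
    return stop_start - 1


if __name__ == "__main__":
    reads = [
        AlignedRead("ACGTACGTAC", "site_1", 70),
        AlignedRead("ACGTACGTAC", "site_1", 70),  # PCR duplicate: collapsed
        AlignedRead("TTGCATGCAA", "site_1", 70),  # different UMI: a second cDNA
        AlignedRead("ACGTACGTAC", "site_1", 95),
    ]
    counts = count_unique_stops(reads)
    print(counts[("site_1", 70)], counts[("site_1", 95)])  # 2 1
    print(candidate_psi_position(70))                      # 69
```

Keying on position as well as UMI means that the same random UMI appearing at two different stop sites is still counted twice, which matters when a 10-nt UMI is shared by chance.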

Add 0.8 μL of 5′ adapter and 1 μL of DMSO to each cDNA sample.
Heat at 75°C for 2 min and then place on ice.
Assemble a ligation mix:

2 μL 10X NEB RNA ligase buffer

0.2 μL 0.1 M ATP

9 μL 50% PEG 8000

1 μL RNA ligase

1.1 μL water
Add the ligation mix to each sample, vortex briefly to mix, and flick the tube frequently throughout the day.
Incubate at room temperature overnight.
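As a sanity check on the ligation recipe, the illustrative Python sketch below totals the reaction volume and computes final concentrations. It assumes the cDNA volume is the 5 μL resuspension from the gel-purification step in Section 4.5. The ATP figure is only the supplement added here, on top of any ATP already present in the ligase buffer.

```python
# Per-reaction volumes (uL). cDNA is the 5 uL resuspension from the
# gel-purification step (assumed); the other volumes are from the steps above.
components = {
    "cDNA": 5.0,
    "5' adapter": 0.8,
    "DMSO": 1.0,
    "10X NEB RNA ligase buffer": 2.0,
    "0.1 M ATP": 0.2,
    "50% PEG 8000": 9.0,
    "RNA ligase": 1.0,
    "water": 1.1,
}

total = sum(components.values())
buffer_x = 10 * components["10X NEB RNA ligase buffer"] / total  # buffer strength (X)
atp_mM = 100 * components["0.1 M ATP"] / total  # supplemental ATP only (0.1 M = 100 mM)
peg_pct = 50 * components["50% PEG 8000"] / total  # final % PEG 8000
dmso_pct = 100 * components["DMSO"] / total  # final % DMSO

print(f"total volume: {total:.1f} uL")
print(f"buffer {buffer_x:.2f}X, added ATP {atp_mM:.2f} mM, "
      f"PEG 8000 {peg_pct:.1f}%, DMSO {dmso_pct:.1f}%")
```

With these volumes the reaction is about 20 μL at roughly 1X buffer and about 22% PEG 8000, which is the high-PEG condition commonly used to drive single-stranded ligations.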

4.7. Silane cleanup of linker-ligated cDNA

The ligated cDNA products are cleaned up on magnetic silane beads to remove the unligated linker.

Prepare beads for binding
1. Place 10 μL of MyONE Silane beads per sample on a magnet and remove the supernatant.
2. Wash once with 500 μL of RLT buffer.
3. Resuspend the beads in 60 μL of RLT buffer per sample.
Bind cDNA
1. Add the beads in 60 μL of RLT buffer to each sample and mix.
2. Add 60 μL of 100% EtOH.
3. Incubate for 5 min at room temperature, pipetting to mix during the incubation.
Wash beads
1. Place the samples on a magnet to separate the beads and remove the supernatant.
2. Add 1 mL of 75% EtOH, resuspend the beads, and transfer to a new tube.
3. After 30 s, place the sample on the magnet to separate and remove the supernatant.
4. Wash twice for 30 s each with 75% EtOH.
5. Magnetically separate and remove the residual wash.
6. Air-dry the beads for 5 min.
Elute cDNA
1. Resuspend the beads in 27 μL of 10mM Tris-HCl pH 8 and incubate for 5 min at room temperature.
2. Place the samples on the magnet to separate. Transfer 25 μL of eluate to a new tube.
  
  Pause Point: The adapter-ligated cDNA samples that are used as the input for the library PCR reactions can be stored at −20°C to perform the PCR reactions at a later time.

4.8. Diagnostic PCR

Perform a diagnostic PCR to determine the optimal cycle number for each library and then proceed to final library preparation using identified conditions.

Prepare a PCR master mix.

PCR master mix reagents per reaction:

3.34 μL 5X HF buffer

0.33 μL 10mM dNTPs

0.84 μL 10 μM RP1 primer

0.84 μL 10 μM barcoding primer

10.17 μL H2O

0.17 μL Phusion polymerase
Transfer 15.7 μL of master mix into individual PCR tubes and add 1 μL of cDNA.
Mix the reactions and divide each sample into 2 reactions to test 2 PCR cycle numbers at a time (e.g., 10 and 12 cycles).
Set PCR to the following cycles:

Initial denaturation: 98°C for 30 sec

Cycle (repeat the following three steps for the chosen number of cycles)

Denature: 98°C for 10 sec

Anneal: 60°C for 20 sec

Extend: 72°C for 40 sec

Final extension: 72°C for 5 min

Add 6X DNA loading dye to each reaction.
Cast 8% non-denaturing TBE-polyacrylamide mini-gels.
Load PCR samples and run on 8% non-denaturing TBE polyacrylamide mini-gel for 40 min at 200V.
Stain DNA by placing gels in a container and covering each with 15 mL of 0.5X TBE plus 15 μL of SYBR Gold (1:10,000). Incubate on rocking platform for 5 min. Visualize gels with blue light illumination.
Choose the optimal cycle number: the one that amplifies the desired product without producing extra bands of unexpected size from overamplification.
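When preparing many libraries, the per-reaction volumes above are usually scaled into a single master mix. The Python sketch below does this with a 10% pipetting overage (an assumed value, not part of the protocol). It also shows that the Final PCR recipe in the next section is essentially the diagnostic recipe at three times the volume (about 16.7 μL versus 50 μL per reaction, including cDNA). The dictionary keys and function names are illustrative.

```python
# Per-reaction volumes (uL), copied from the Diagnostic PCR and Final PCR lists.
DIAGNOSTIC = {
    "5X HF buffer": 3.34,
    "10 mM dNTPs": 0.33,
    "10 uM RP1 primer": 0.84,
    "10 uM barcoding primer": 0.84,
    "H2O": 10.17,
    "Phusion polymerase": 0.17,
}
FINAL = {
    "5X HF buffer": 10.0,
    "10 mM dNTPs": 1.0,
    "10 uM RP1 primer": 2.5,
    "10 uM barcoding primer": 2.5,
    "H2O": 30.5,
    "Phusion polymerase": 0.5,
}


def scale(recipe, factor):
    """Multiply every component volume by factor, rounded to 0.01 uL."""
    return {name: round(vol * factor, 2) for name, vol in recipe.items()}


def master_mix(recipe, n_reactions, overage=0.10):
    """Volumes for n_reactions plus an assumed 10% pipetting overage."""
    return scale(recipe, n_reactions * (1 + overage))


# Master mix for 8 diagnostic reactions.
for name, vol in master_mix(DIAGNOSTIC, 8).items():
    print(f"{name}: {vol} uL")

# Three diagnostic reactions' worth of mix compared with one final PCR.
tripled = scale(DIAGNOSTIC, 3)
max_gap = max(abs(tripled[k] - FINAL[k]) for k in FINAL)
print(f"largest difference: {max_gap:.2f} uL")
```

Every component of the tripled diagnostic mix falls within about 0.02 μL of the Final PCR volume, so cycle numbers chosen in the diagnostic PCR should transfer to the final reaction under the same reagent ratios.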

4.9. Final PCR

PCR-amplify the libraries for high-throughput sequencing. The final library PCR primers introduce the sequences needed for Illumina sequencing and a per-sample barcode that allows libraries to be pooled for sequencing.

Set up PCR master mix reactions.

PCR master mix components per reaction:

10.0 μL 5x HF buffer

1.0 μL 10mM dNTPs

2.5 μL 10 μM RP1 primer

2.5 μL 10 μM barcoding primer

30.5 μL H2O

0.5 μL Phusion
Add 3 μL of cDNA to each tube. Mix samples and run PCR program as above with the optimal number of cycles for each sample (determined in diagnostic PCR).
Add 10 μL of 6X DNA loading dye to each reaction.
Split each PCR reaction across 3 lanes of an 8% non-denaturing TBE polyacrylamide mini-gel and run for 40 min at 200V.
Stain the gel in 15 mL of 0.5X TBE with 15 μL of SYBR Gold (1:10,000) for 5 min, visualize the stained gel with blue-light illumination, and cut out the PCR product (Fig. 6).

Note: It is important to gel purify the library PCR product away from the contaminating PCR product that results from amplification of empty adapters to decrease the proportion of unusable reads.
Elute PCR product by placing gel slice in 400 μL DNA elution buffer and incubating at room temperature with rocking overnight.
Transfer the eluate into a new tube and precipitate by adding 1 mL of 100% ethanol and 2 μL of GlycoBlue. Vortex, incubate at −20°C for at least 30 min, and spin down at maximum speed for 30 min.
Wash pellets 1X with 750 μL of 70% Ethanol and spin down at maximum speed in microfuge for 10 min.
Allow pellets to dry for 5 min at room temperature.
Resuspend in 10 μL 10mM Tris pH 8.

Figure 6. — Example of gel purification of final library PCR.

4.10. Next-generation sequencing and pseudouridine detection

Submit the libraries for sequencing on an Illumina HiSeq with single-end 40–75 bp reads.

5. Quantification and statistical analysis

CMC-induced reverse transcription stops appear as peaks of read 5′ ends at the pseudouridylated site. To identify these sites, the reads are first mapped to the reference sequences, and then the number of read 5′ ends at each position is retrieved. For a given position, the significance of a peak is calculated relative to the positions surrounding it by means of a Z-score calculation. Pseudouridine sites are then identified as those that have a high Z-score in the CMC-treated library, but low or close to background in the mock-treated library.

Throughout this section, names within angle brackets (such as <filename>) indicate placeholders for file or directory names. We have provided scripts that carry out each of the analysis steps at https://github.com/wvgilbert-lab/PseudoSeq.

5.1. Read processing

If multiple libraries are combined into one FASTQ file, demultiplex into one file per library.
Adapter sequences at read 3′ ends, corresponding to the 5′ end of the amplicon, are trimmed using cutadapt [5].
PCR duplicates can be collapsed using fastx_collapser [9] by virtue of the 10N UMI introduced with the 5′ adapter during library construction.
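The duplicate-collapsing logic can be illustrated in a few lines of Python. This sketch assumes, as described for the paired-end libraries in Section 6.1, that the 10N UMI occupies the first 10 nucleotides of each read, directly upstream of the RT stop position. Reads identical over UMI plus insert are counted once, which mirrors what fastx_collapser does by collapsing identical sequences; the UMI is then clipped so that the read 5′ end marks the stop. The function name is hypothetical and not part of the provided scripts.

```python
from collections import Counter

UMI_LEN = 10  # assumed: the 10N UMI is the first 10 nt of each read


def collapse_and_strip_umi(reads, umi_len=UMI_LEN):
    """Collapse PCR duplicates, then clip the UMI (hypothetical helper).

    Reads that are identical over UMI + insert are treated as copies of one
    ligated molecule and kept once. Clipping the UMI afterwards leaves the
    RT stop position at the read 5' end. Reads no longer than the UMI are
    dropped.
    """
    unique = Counter(reads)  # sequence -> number of copies
    return [seq[umi_len:] for seq in unique if len(seq) > umi_len]


reads = [
    "GATTACAGAT" + "CGTACGTTAG",  # molecule 1
    "GATTACAGAT" + "CGTACGTTAG",  # PCR duplicate of molecule 1
    "TTGCAACCGA" + "CGTACGTTAG",  # same insert, different UMI: molecule 2
]
print(collapse_and_strip_umi(reads))  # ['CGTACGTTAG', 'CGTACGTTAG']
```

The two surviving reads share an insert but carry different UMIs, so they are counted as two independent RT stops rather than one amplified product.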

5.2. Read mapping

The use of RNAs that were in vitro transcribed from a defined pool greatly simplifies read mapping. The reads are mapped to a custom index of the target sequences instead of the full genome, resulting in much faster read mapping, and splicing is not taken into consideration, as the RNAs are not processed further after transcription.

In the example below, the alignment generated by the read mapper is stored in a SAM file named alignment.sam. Before any additional steps are taken, samtools is used to convert it to a BAM file, a binary version of the SAM format that allows faster manipulation, and the alignments are then sorted by position. (TopHat2 itself writes its alignments as a BAM file, accepted_hits.bam, in the output directory; that file can be sorted directly without the SAM-to-BAM conversion.)

Generate a custom bowtie2 index of the pool sequences. This step takes a FASTA file with all of the pool sequences and generates a bowtie2 index, which is stored in a series of files with a .bt2 suffix.

bowtie2-build <sequences.fasta> <pool_sequences>
Processed reads can then be mapped to the custom bowtie2 index of pool sequences using tophat2 [2].

tophat2 --output-dir <dirname> --library-type fr-unstranded \
    --no-novel-juncs --no-novel-indels \
    <pool_sequences> <reads.fastq> 1>>log 2>>log
Convert alignment file to a BAM file and sort mapped reads using samtools.

samtools view -bS alignment.sam | samtools sort -o sorted.bam

samtools index sorted.bam
Retrieve 5′ ends from mapped reads using bedtools:

bedtools genomecov -d -5 -strand + -ibam sorted.bam > ends.txt
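The ends.txt file produced by bedtools genomecov -d has one line per position with three columns: sequence name, 1-based position, and the number of read 5′ ends. A minimal Python parser that turns it into per-sequence count lists might look like the following; read_ends is a hypothetical helper, not part of pool_peak_caller.py.

```python
def read_ends(lines):
    """Parse `bedtools genomecov -d -5` output (hypothetical helper).

    Each line holds: sequence name, 1-based position, count of read 5' ends.
    Returns {name: [count at position 1, count at position 2, ...]}.
    """
    ends = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        name, pos, count = fields[0], int(fields[1]), int(fields[2])
        counts = ends.setdefault(name, [])
        if len(counts) < pos:
            counts.extend([0] * (pos - len(counts)))
        counts[pos - 1] = count
    return ends


# Usage with the file written by the bedtools command above:
# with open("ends.txt") as fh:
#     ends = read_ends(fh)
example = ["seq1\t1\t0\n", "seq1\t2\t5\n", "seq2\t1\t3\n"]
print(read_ends(example))  # {'seq1': [0, 5], 'seq2': [3]}
```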

5.3. Peak height calculation

Pseudouridine sites are identified as CMC-dependent peaks, exhibiting a significant pileup of read 5′ ends at the modified site relative to their neighboring positions in CMC-treated libraries, but not in libraries that are mock-treated or lack the relevant pseudouridine synthase (Figure 7A). To identify such sites, a modified Z-score is calculated for each position in each library, and cutoffs are selected to classify sites as significantly above background.

Figure 7. — a. Read 5′ ends mapped to a sample sequence in CMC- and mock-treated libraries. The pseudouridine site is detected as a CMC-dependent peak 1 nt downstream. Peak height is calculated relative to the target region (shaded). b. Fraction of read 5′ ends mapped to each position in each sequence. The high-coverage region (shaded) is a consequence of size-selection after reverse transcription. c. Scatter plot comparing peak heights for each position in the CMC-treated vs the mock-treated libraries. Significant CMC-dependent peaks (blue circle) are assigned as pseudouridines. A similar comparison can be made for PUS-treated vs mock-treated libraries to assign enzyme dependence.

Because of the coverage biases introduced during sample processing, especially during size-selection of truncated cDNAs, two sets of parameters must be selected empirically when calling sites. The first is the set of positions to which putative Ψ sites are compared during the Z-score calculation. Read 5′ ends outside of the size-selected window are depleted during the size-selection step (Figure 7A,B). A background set that includes many positions outside the actual target window would produce artificially high Z-scores because of the low coverage at these positions. To address this, we visualize the distribution of read coverage to determine the bounds of the target region (Figure 7B).

Next, Z-score cutoffs must be chosen (Figure 7C). This can be done by plotting the Z-scores in the +CMC libraries against those in the −CMC libraries and identifying the cluster of sites along the y-axis: these are the sites that have significantly high peaks in the CMC-treated libraries, but not in the −CMC libraries.

Below, we describe the analysis steps for a single site relative to the surrounding positions.

Identify the coverage window by generating a metaplot of read fractions. For each sequence, calculate the fraction of mapped reads assigned to each position, and visualize these values on a scatter plot (see Figure 7B for an example). The target region will appear as a ~40-nucleotide stretch with increased coverage. Select the left-most and right-most positions of this region.
Normalize the read count at each position by the total number of reads mapped to the target window in the sequence that contains it.
Select a set of background positions consisting of all positions in the window except the position of interest and the immediately adjacent positions (x − 1, x, x + 1).
Compute the mean (μ) and standard deviation (σ) using the adjusted background set, and use these to calculate a Z-score.
$Z_x = (\mathrm{reads}_x - \mu) / \sigma$
Select +CMC and -CMC cutoffs.
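The steps above can be sketched in Python as follows, working on per-sequence lists of 5′-end counts such as those in ends.txt. This is an illustration under stated assumptions rather than the logic of pool_peak_caller.py: the standard deviation is the population SD, windows are 1-based and inclusive and at least 4 positions wide, and the function names and cutoffs in the example are hypothetical. Because dividing every count by the same window total rescales μ and σ equally, the normalization does not change the Z-score within a sequence. It matters when fractions are compared across sequences, as in the metaplot.

```python
import math
import statistics


def metaplot(ends_by_seq):
    """Mean fraction of each sequence's read 5' ends at each position.

    Sequences with no reads are skipped; zip() truncates to the shortest
    sequence, so equal-length pool sequences are assumed.
    """
    fractions = [
        [c / sum(counts) for c in counts]
        for counts in ends_by_seq.values()
        if sum(counts) > 0
    ]
    return [statistics.mean(column) for column in zip(*fractions)]


def z_scores(counts, left, right):
    """Z-score for every position in the 1-based, inclusive window [left, right]."""
    window = counts[left - 1:right]
    total = sum(window)
    if total == 0:
        return {}
    frac = [c / total for c in window]  # normalize to reads in the window
    scores = {}
    for i, value in enumerate(frac):
        # Background: every window position except x - 1, x and x + 1.
        background = [f for j, f in enumerate(frac) if abs(j - i) > 1]
        mu = statistics.mean(background)
        sigma = statistics.pstdev(background)  # population SD (an assumption)
        scores[left + i] = (value - mu) / sigma if sigma > 0 else math.nan
    return scores


def call_sites(z_plus, z_minus, plus_cutoff, minus_cutoff):
    """Positions with a high +CMC Z-score and a low -CMC Z-score.

    Positions without a finite -CMC score are conservatively not called.
    """
    return [
        pos
        for pos, z in z_plus.items()
        if z >= plus_cutoff and z_minus.get(pos, math.nan) < minus_cutoff
    ]


# Toy libraries: window 60-95, background of 2-4 reads, a +CMC peak at 67.
plus = [0] * 100
for p in range(60, 96):
    plus[p - 1] = 2 + p % 3
minus = list(plus)
plus[66] = 40  # position 67
z_plus, z_minus = z_scores(plus, 60, 95), z_scores(minus, 60, 95)
print(call_sites(z_plus, z_minus, plus_cutoff=5, minus_cutoff=3))  # [67]
```

The two dictionaries returned by z_scores supply the axes of a scatter plot like Figure 7C, and call_sites applies whatever cutoffs are read off that plot.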

5.4. Using the provided scripts

The files and scripts mentioned below can be found in the GitHub repository for this method, https://github.com/wvgilbert-lab/PseudoSeq.

Install the necessary dependencies by creating the conda environment defined in pool_pseudoseq.yml.

conda env create -f pool_pseudoseq.yml
Activate the conda environment before running any of the scripts.

conda activate pool_pseudoseq
Use processReads.sh to trim, demultiplex, map, and retrieve 5′ ends. The main outputs of this step are a BAM file with sorted reads and the read 5′ ends in a tabular file.
Calculate Z-scores using the pool_peak_caller.py script. This script takes the read 5′ ends obtained from the mapped reads and returns peak information in a tabular file. The bounds of the target region are specified with the --left and --right parameters. In the example below these are positions 60 and 95 relative to the start of the sequence; change them to match the target region identified for your own libraries.

python pool_peak_caller.py --left 60 --right 95 \
    --plusfiles ends.txt -o peaks.txt
Deactivate the conda environment by running conda deactivate.

6. Alternative methods/procedures

6.1. Kinetic analysis to identify structural and sequence motifs that drive modification.

The in vitro Pseudo-seq assay described above allows sites to be assigned as Ψ or U, but it does not allow comparison across sites, because context and library capture biases influence the baseline signal. This limitation can be overcome with a kinetic approach that determines the pseudouridylation efficiency of different substrates. For this approach, the experiment is carried out as a time course, and a relative initial velocity ($v_{0,\mathrm{rel}}$) is calculated for each sequence from the series of time points.

The kinetic approach is particularly useful for defining the RNA sequence and structural determinants that are required for modification by a PUS. Mutations can be introduced to alter the sequence and/or structural context of a site, and their contribution to pseudouridylation efficiency can be determined by comparing $v_{0,\mathrm{rel}}$ values among mutants of the same sequence.

Identify motifs associated with modification by PUS
1. Determine the targets of the PUS of interest by the in vitro Pseudo-seq presented in this protocol or from genetic assignment of sites from in vivo Pseudo-seq in cells that express or lack a PUS.
2. Identify sequence motifs
  
  Perform motif enrichment analysis of PUS RNA targets using appropriate software (e.g., MEME [10]), or determine the frequency of nucleotides at each position flanking the pseudouridine to identify overrepresented sequences (e.g., WebLogo [11]).
3. Identify RNA structural motifs
  
  To identify RNA structural features shared among targets, fold all the RNA targets of a PUS in silico (e.g., RNAfold). If available, use chemical structure probing data obtained on cellular RNAs or on the RNA pool (e.g., SHAPE-seq [12], DMS-seq [13]).
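Counting nucleotides at each offset around known Ψ sites gives the per-position frequencies that a sequence logo displays. The minimal Python sketch below assumes 1-based Ψ positions and RNA sequences written with U; the function name is illustrative. Tools such as WebLogo and MEME remain the appropriate choice for publication-quality logos and statistical motif discovery.

```python
from collections import Counter


def flank_frequencies(sites, flank=3):
    """Nucleotide frequencies at each offset around pseudouridine sites.

    sites: iterable of (RNA sequence, 1-based position of the Psi).
    Returns {offset: {nucleotide: fraction}} for offsets -flank..+flank,
    where offset 0 is the modified U. Sites too close to an end are skipped.
    """
    counts = {k: Counter() for k in range(-flank, flank + 1)}
    for seq, pos in sites:
        i = pos - 1
        if i - flank < 0 or i + flank >= len(seq):
            continue
        for k in counts:
            counts[k][seq[i + k]] += 1
    return {
        k: {nt: n / sum(c.values()) for nt, n in c.items()}
        for k, c in counts.items()
    }


targets = [("GGUUCGG", 4), ("AGCUAGA", 4)]
for offset, freqs in flank_frequencies(targets, flank=1).items():
    print(offset, freqs)
```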
Design pool of mutants that disrupt identified sequence and structural features
1. To determine if the identified sequence features are required for modification, design a pool that includes all possible sequence variations of the identified motif. Keep each position of the motif constant while varying the other positions to all possible nucleotides.
2. To interrogate the importance of identified structural features, design a pool that introduces structure disrupting mutations of structural features shared among PUS targets. For instance, to determine the importance of a stem in a stem-loop, design mutants that disrupt and restore base pairing of the stem. Design weak stem-disrupting mutants by randomly selecting 25% of base pairs for mutation, and strong stem-disrupting mutants by randomly selecting an additional 25% of base pairs for mutations. Include compensatory mutations on the opposite strand to restore base pairing. Rescue of RNA pseudouridylation defects by compensatory mutations that restore RNA folding would rule out that the effect is due to the change in sequence.
3. Append a unique barcode for each wildtype and sequence variant upstream of the 3′ adapter sequence handle. This will allow unique mapping and analysis of reads originating from closely related sequences. The length of the barcode to be used in the experiment can be determined by calculating the appropriate Hamming distance (number of nucleotide differences between two barcodes) given the total number of sequences to be included in the pool.
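One way to choose a barcode length is to find the shortest length at which enough barcodes can be found that all differ from each other by at least a chosen Hamming distance. The Python sketch below uses a simple greedy search over all sequences of a given length; this search strategy, the default maximum length, and the function names are assumptions for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_barcodes(length, n_needed, min_dist):
    """Greedily collect up to n_needed barcodes with pairwise distance >= min_dist."""
    chosen = []
    for bases in product("ACGT", repeat=length):
        candidate = "".join(bases)
        if all(hamming(candidate, b) >= min_dist for b in chosen):
            chosen.append(candidate)
            if len(chosen) == n_needed:
                break
    return chosen


def shortest_length(n_needed, min_dist, max_length=10):
    """Smallest length at which the greedy search finds n_needed barcodes."""
    for length in range(min_dist, max_length + 1):
        if len(greedy_barcodes(length, n_needed, min_dist)) == n_needed:
            return length
    return None


length = shortest_length(n_needed=6, min_dist=3)
print(length, greedy_barcodes(length, 6, 3))
```

A minimum distance of 3 allows any single sequencing error to be corrected, because a read with one wrong base stays closer to its true barcode than to any other. The greedy search is not guaranteed to find the largest possible barcode set and becomes slow for long barcodes, so for pools of thousands of sequences a dedicated barcode-design tool may be preferable.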
In vitro pseudouridylation time course
1. Use the fraction of reads in the +CMC samples mapping to the Ψ-dependent RT stop position to determine the fraction of Ψ [1]. Because the CMC-dependent RT stop is known to result from a pseudouridine, the −CMC libraries can be omitted from this time course experiment to reduce the total number of samples and the cost of the experiment.
2. Include reactions for multiple time points. The specific range of time points will need to be determined empirically for each PUS to select suitable time points prior to saturation. We have used times between 0 and 15 min to define RNA sequence and structural features that are important for modification by yeast Pus1.
3. Stop each reaction at its time point by snap-freezing, dropping the tube into liquid nitrogen. Proceed to the addition of acid phenol once samples for all time points have been collected and snap-frozen.
Next-generation sequencing and kinetic analysis
Modify the following steps from the base protocol:
1. In order to obtain the 10N UMI and the stop position at the 5′ end as well as the barcode at the 3′ end of the amplicon, the libraries for the kinetic analysis of wildtype and structure disrupting mutants need to be sequenced using paired-end 40–75 bp reads.
2. Primer sequences at the 5′ end of the reverse read can be trimmed using cutadapt. The trimmed paired-end reads are then merged with PEAR.
3. PCR duplicates can be collapsed using fastx_collapser [9] by virtue of the 10N UMI introduced with the 5′ adapter during library construction. The collapsed reads then need to be trimmed, using cutadapt, of the nucleotides corresponding to the barcode 3′ end and of the 3′ adapter sequence.
4. For the kinetic analysis, the pseudouridine signal is calculated as the fraction of reads whose 5′ ends map to the expected stop position. To calculate the relative initial velocity of each substrate, the background signal (the average fraction of reads in the two 0 min time points) is subtracted from each time point, and any negative values are set to 0. The values for each time point are then normalized by the maximum signal (fraction of reads) obtained for the wild-type sequence, and a linear regression is performed to obtain the initial velocity (slope). Differences in the relative initial velocity ($v_{0,\mathrm{rel}}$) between wild-type and mutant substrates of the same type (e.g., stem-disrupted) are assessed with a paired, two-tailed Student's t-test (p < 0.05).
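The kinetic calculation in step 4 can be sketched in Python as below. The assumptions are flagged in the comments: the regression is an ordinary least-squares fit with an intercept, the wild-type maximum is taken after background subtraction, and only the paired t statistic is computed. The two-tailed p-value can then be obtained from a t distribution with n − 1 degrees of freedom, for example with scipy.stats.ttest_rel applied to the same paired values. All function names are illustrative.

```python
import math
import statistics


def background_subtract(fractions, zero_min_reps):
    """Subtract the mean 0-min signal and set negative values to 0."""
    background = statistics.mean(zero_min_reps)
    return [max(f - background, 0.0) for f in fractions]


def ols_slope(x, y):
    """Least-squares slope of y on x (fit includes an intercept; an assumption)."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def v0_rel(times, fractions, zero_min_reps, wt_max):
    """Relative initial velocity: slope of the WT-normalized signal vs time.

    wt_max is the maximum background-subtracted signal of the matching
    wild-type sequence (subtracting before taking the maximum is an assumption).
    """
    signal = background_subtract(fractions, zero_min_reps)
    return ols_slope(times, [s / wt_max for s in signal])


def paired_t(wt_v0, mut_v0):
    """Paired t statistic and degrees of freedom for WT vs mutant v0_rel values."""
    diffs = [w - m for w, m in zip(wt_v0, mut_v0)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


times = [0, 2, 5, 10]            # min
zero = [0.02, 0.02]              # two 0-min replicates
wt = [0.02, 0.06, 0.12, 0.22]    # fraction of reads at the stop position
mut = [0.02, 0.04, 0.07, 0.12]
wt_max = max(background_subtract(wt, zero))
print(v0_rel(times, wt, zero, wt_max), v0_rel(times, mut, zero, wt_max))
```

In this toy example the mutant reaches half the wild-type signal at every time point, so its relative initial velocity is half that of the wild type.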

6.2. Adaptation to study other RNA modifications

Design pool based on transcriptome-wide modification profiling (Mod-seq) in cells or based on regions of interest to interrogate modification status. Include known targets of RNA modifying enzyme as positive controls.
Purify recombinant RNA modifying enzyme or use cell extracts. Determine if enzyme is active on positive control sequences.
Modify the RNA pool in vitro with the RNA-modifying enzyme.
Read out modification status using appropriate Mod-seq approach such as chemical-based detection or antibody-based pull-down of modified RNAs followed by sequencing for modification detection.

7. Advantages

Transcriptome-wide pseudouridine profiling allows identification of endogenous sites of modification in cellular RNAs. This approach led to the discovery of pseudouridines in mRNA and allows the study of this modification in different cell types and under different environmental conditions. However, high-throughput in vitro pseudouridylation assays have several advantages over pseudouridine profiling of cellular RNAs. One advantage of the in vitro approach presented here is that the sequencing reads are evenly distributed and focused on the RNAs of interest included in the pool. This allows interrogation of whether lowly expressed RNAs are pseudouridylated by a PUS. The high sequencing coverage attained by the in vitro approach also allows confident assignment of a site as PUS-dependent based on the gain of a reverse transcriptase stop in the PUS-treated sample. This is advantageous because genetic assignment of PUS in cells is challenging: very high coverage is needed to interpret a loss of reads in PUS-depleted cells as evidence for modification by that PUS. Additionally, the in vitro approach allows RNAs to be assigned as direct targets of a particular PUS, whereas loss of pseudouridine signal in cells genetically lacking a PUS could be an indirect effect of the genotype.

The RNA features that direct modification by PUS are incompletely understood. Mutagenesis at endogenous loci using genetic engineering (e.g., CRISPR) is limited in throughput compared to in vitro assays with synthetic RNA pools. This in vitro approach allows systematic testing of the requirement for RNA sequence and structural features in modification by PUS [1]. An oligo pool can be designed to test, in parallel, thousands of sequence variants that disrupt or restore specific RNA features. The kinetic approach described in this protocol, which relies on varying time or enzyme concentration, allows determination of site-specific pseudouridylation efficiency to distinguish between weak and strong substrates of a PUS. Distinguishing strong from weak substrates can identify weak sites that might be more sensitive to regulation in different cellular contexts.

8. Limitations

In vitro Pseudo-seq has many advantages as a pseudouridine interrogation approach. One limitation is the high likelihood of false negatives. A failure to be pseudouridylated in vitro could reflect the need for additional RNA sequence for target recognition (pool lengths are currently limited to 300 nucleotides). Some PUS and sites might require cellular cofactors, which are not present when performing in vitro Pseudo-seq with recombinant PUS. This limitation might be overcome in some cases by using cell extracts as a source of PUS (see protocol above). Other features present in the endogenous context might not be recapitulated in the in vitro assay such as possible PUS recruitment mechanisms or coupling with other mRNA processing steps. Additionally, some true targets of a PUS might be modified in vitro but not in cells because the site is not accessible due to differences in subcellular localization or cellular factors might be present that sterically block the target site. Cloning the pool of DNA oligos into a mammalian expression reporter that allows expression of the pool of RNAs in cells would provide many of the benefits of the in vitro assay and provide some aspects of the endogenous context that might be important for modification [14].

9. Conclusions

Here we describe a high-throughput in vitro pseudouridylation assay and an analysis pipeline for pseudouridine peak calling. This approach allows testing of thousands of RNA sequences for modification by recombinant PUS or PUS from cell extracts. Mutational analysis of RNA sequence and structural features of RNA substrates using this in vitro pseudouridylation assay can be used to define the determinants of target recognition by PUS. RNA motifs or features identified by this approach can then be used to predict new sites or assign sites identified in cells in the absence of in cell pseudouridine profiling data [1]. Finally, knowledge of PUS RNA recognition features obtained from these experiments can be used to design engineered mutations that abolish modification of individual sites for functional characterization in the context of the endogenous mRNAs and/or in reporters. In summary, we describe a robust in vitro pseudouridylation assay that will facilitate the study of diverse aspects of pseudouridine deposition and function in RNA metabolism.

10. Acknowledgments

This work was supported by NIH (R01GM101316) to W.V.G. and NIH (K99GM135537) to N.M.M.

References

1. Carlile TM, Martinez NM, Schaening C, Su A, Bell TA, Zinshteyn B, Gilbert WV (2019) mRNA structure determines modification by pseudouridine synthase 1. Nat Chem Biol 15:966–974. 10.1038/s41589-019-0353-z
2. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL (2013) TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14:R36. 10.1186/gb-2013-14-4-r36
3. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079.
4. Quinlan AR, Hall IM (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26:841–842. 10.1093/bioinformatics/btq033
5. Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17:10. 10.14806/ej.17.1.200
6. Martinez NM, Gilbert WV (2021) Investigating pseudouridylation mechanisms by high-throughput in vitro RNA pseudouridylation and sequencing. Methods Mol Biol 2298
7. Jiang HQ, Motorin Y, Jin YX, Grosjean H (1997) Pleiotropic effects of intron removal on base modification pattern of yeast tRNA(Phe): an in vitro study. Nucleic Acids Res. 10.1093/nar/25.14.2694
8. Bakin A, Ofengand J (1993) Four newly located pseudouridylate residues in Escherichia coli 23S ribosomal RNA are all at the peptidyltransferase center: analysis by the application of a new sequencing technique. Biochemistry. 10.1021/bi00088a030
9. Gordon A, Hannon GJ (2014) FASTX-Toolkit. http://hannonlab.cshl.edu/fastx_toolkit
10. Bailey TL, Johnson J, Grant CE, Noble WS (2015) The MEME Suite. Nucleic Acids Res 43:W39–W49. 10.1093/nar/gkv416
11. Crooks GE (2004) WebLogo: a sequence logo generator. Genome Res 14:1188–1190. 10.1101/gr.849004
12. Lucks JB, Mortimer SA, Trapnell C, Luo S, Aviran S, Schroth GP, Pachter L, Doudna JA, Arkin AP (2011) Multiplexed RNA structure characterization with selective 2′-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq). Proc Natl Acad Sci 108:11063–11068. 10.1073/pnas.1106501108
13. Rouskin S, Zubradt M, Washietl S, Kellis M, Weissman JS (2014) Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 505:701–705. 10.1038/nature12894
14. Safra M, Nir R, Farouq D, Slutzkin IV, Schwartz S (2017) TRUB1 is the predominant pseudouridine synthase acting on mammalian mRNA via a predictable and conserved code. Genome Res 27:393–406. 10.1101/gr.207613.116


