Abstract
Pseudouridine (psi) is one of the most abundant mRNA modifications, yet its impact on translation is unclear, in part because existing modification maps are inconsistent, curated comparisons across cell types are lacking, and paired analyses with translation are limited. Using direct RNA nanopore sequencing coupled with our Mod-p ID analytical framework, we mapped psi at single-nucleotide resolution across six immortalized human cell lines. Nanopore sequencing provided single-molecule resolution, enabling quantification of relative modification occupancy and detection of co-occurring modifications. Integrating these psi maps with matched proteomic and ribosome profiling datasets revealed that conserved psi sites installed by the psi synthase TRUB1 are associated with increased protein production. TRUB1 knockout experiments demonstrated a motif-specific reduction in protein abundance, providing direct causal evidence that pseudouridylation enhances protein output. In contrast, transcripts bearing clustered psi sites exhibited reduced protein abundance despite elevated translation efficiency. Controlled in vitro translation experiments confirmed that increasing pseudouridine density within a physiologically relevant range directly reduces protein output, demonstrating a density-dependent effect of pseudouridylation on translation. Together, these findings establish a mechanistic framework in which single-site pseudouridylation enhances protein production, whereas hypermodification impairs translational throughput, revealing pseudouridine density and enzyme specificity as key determinants of proteome output across human cell types.
Graphical Abstract
Introduction
Pseudouridine (psi) is an RNA modification resulting from the isomerization of uridine by the psi synthase family of enzymes (PUS). Psi is found in multiple classes of RNAs, including noncoding RNAs (ncRNAs) and messenger RNAs (mRNAs). On mRNAs, psi has been implicated in modulating translational fidelity [1, 2], structural stability [3], and pre-mRNA processing [4]. Remarkably, studies demonstrate that psi within coding sequences can promote ribosomal readthrough of premature termination codons (PTCs), effectively restoring full-length protein synthesis in transcripts that would otherwise produce truncated, non-functional proteins [5]. Notably, this effect can be recapitulated through programmable pseudouridylation, in which psi is intentionally installed at specific codons to suppress nonsense mutations—highlighting a causal and tunable link between psi placement and translational outcomes [6]. However, these targeted studies represent only a fraction of the hundreds of psi sites identified across the transcriptome, and systematic investigation of how psi influences translation in its native genomic contexts has been hindered by inconsistent modification maps and limited integration with translational measurements. Specifically, the extent to which psi stoichiometry, positional context, and clustering alter ribosome dynamics and protein synthesis remains largely unexplored.
Recent work has revealed that psi is not uniformly distributed but instead exhibits marked cell type-specific and state-dependent variation. In neuronal models, psi landscapes were shown to be partially remodeled during differentiation, with certain sites remaining highly static across cell states while others are highly plastic, responding to changes in differentiation status or environmental perturbation [7]. In particular, comparisons of undifferentiated, differentiated, and lead-treated SH-SY5Y cells revealed subsets of psi sites that were selectively induced or lost under stress conditions, suggesting that psi may contribute to both developmental and adaptive translational programs [7]. Similarly, environmental stressors, including heat shock, oxidative stress, and nutrient deprivation, have been shown to trigger dynamic changes in psi deposition patterns, with stress-responsive sites often enriched for transcripts encoding stress response proteins [8, 9]. In immune cells, psi occupancy differs between immortalized and primary contexts, consistent with transformation-dependent reprogramming of RNA modification profiles [10]. Together, these findings suggest that psi is a regulated, context-specific mark capable of shaping translation in ways that depend on cell identity, environmental cues, and physiological state.
To discern the role of psi in post-transcriptional regulation, comprehensive maps are needed that define both psi positions and occupancy levels across diverse human cell types. Previous transcriptome-wide approaches have identified psi sites [11–13], and biochemical quantification of total nucleoside content has been used to estimate global psi-to-uridine ratios in mRNA [11, 12]. For site-specific detection, these methods rely on chemical modifications — either N-cyclohexyl-N′-(2-morpholinoethyl)carbodiimide methyl-p-toluenesulfonate (CMC), which blocks reverse transcriptase progression [9, 14], or bisulfite treatment, which induces systematic deletion signatures during reverse transcription [11, 13, 15]. While these cDNA-based techniques have been instrumental in cataloging psi sites, they face several limitations: amplification bias can skew quantification [16], false positives can arise when psi sites are adjacent to uridines [11], and their reliance on short-read sequencing prevents detection of co-occurring modifications on individual RNA molecules. This last limitation is particularly consequential, as it precludes analysis of combinatorial patterns that collectively influence translation in ways that single-site analyses cannot capture.
Nanopore direct RNA sequencing (DRS) is currently the only method to detect native human RNA without conversion to cDNA. Nanopore DRS employs a motor-protein adapter that enables voltage-driven translocation of RNA through a biological nanopore, generating characteristic ionic-current disruptions as each canonical nucleotide passes through the pore. During basecalling, these current signatures are decoded into the corresponding RNA sequence. Critically, because native RNA—rather than cDNA—passes through the pore, chemical modifications to the RNA nucleobases produce aberrations in the expected ionic current levels [17]. This feature has enabled DRS-based transcriptome-wide mapping of several mRNA modifications, including psi [18–20], N6-methyladenosine (m6A) [21–26], and inosine (I) [27].
Psi sites can be detected from DRS using machine-learning-based algorithms that interpret ionic current features [28] or as sequence-specific U-to-C base-calling errors [18]. In our Mod-p ID framework [18, 29, 30], these errors are compared against an unmodified in vitro transcribed (IVT) control [31], a critical reference that distinguishes genuine modification-induced signal from sequencing artifacts and enables confident transcriptome-wide modification assignment. Importantly, psi calls identified by Mod-p ID were cross-validated across sequencing platforms. Sites concordant between nanopore- and Illumina-based methods represent high-confidence calls, while platform-specific sites likely reflect genuine modifications revealed by each method’s distinct detection capabilities, underscoring the value of integrating multiple technologies to comprehensively map the psi landscape. Unlike short-read approaches, nanopore DRS measures the relative occupancy of psi at individual sites and detects multiple modifications on single RNA molecules. Because many biological effects arise from differences in modification levels rather than absolute stoichiometry, these relative measurements are sufficient to reveal patterns of psi spatial organization, from isolated high-occupancy sites to densely clustered regions along individual transcripts. Moreover, because DRS libraries are prepared by selecting for polyadenylated RNAs, we simultaneously capture both mRNAs and polyadenylated long noncoding RNAs (lncRNAs), enabling the first cell-type-resolved maps of psi in lncRNAs alongside the protein-coding transcriptome.
Here, we apply Mod-p ID to perform a comparative, transcriptome-wide analysis of psi across six diverse, immortalized human cell lines: A549 (lung), HeLa (cervix), HepG2 (liver), Jurkat (T cells), NTERA (testes), and SH-SY5Y (neuron-like). While immortalized cell lines exhibit some differences from their primary counterparts, our previous work demonstrated that the majority of psi sites are conserved between immortalized and primary cells, with the largest differences arising from differences in mRNA expression rather than modification levels [10], establishing immortalized lines as valid model systems for studying large-scale regulatory patterns. We quantified relative psi occupancy across cell types, examined its relationship with psi synthase expression, and assessed psi's correlation with protein abundance using quantitative proteomics. To directly test the functional consequences of psi on translation, we additionally performed quantitative proteomics in a homozygous TRUB1 knockout cell line to causally link conserved, TRUB1-mediated psi sites to changes in protein output, and conducted in vitro translation assays using transcripts containing increasing, physiologically relevant levels of psi incorporation. Integrating these data with ribosome profiling (Ribo-seq) to measure translational efficiency (TE) and ribosome pausing, we developed a mechanistic model in which psi regulates translation through two distinct modes: single high-occupancy sites enhance translational efficiency while clustered psi sites promote ribosome pausing, decoupling ribosome occupancy from productive elongation. This integrated multi-omic framework reveals how psi stoichiometry and spatial organization coordinate translational regulation across human cell types and provides a quantitative foundation for future programmable therapeutic pseudouridylation strategies.
Materials and methods
Accessing publicly available data sets
Nanopore DRS raw fast5 files for all IVTs (RNA002) were sourced from NIH NCBI SRA BioProject accession PRJNA947135 [31]. Nanopore DRS raw fast5 files for native HeLa replicates (RNA002) were sourced from NIH NCBI SRA BioProject accession PRJNA777450 [18]. Nanopore DRS raw fast5 files for native SH-SY5Y replicates (RNA002) were sourced from NIH NCBI SRA BioProject accession PRJNA1092333 [7]. Nanopore DRS fastq files for A549 cells and the corresponding IVT control (RNA004) can be found in NIH NCBI SRA BioProject accession PRJNA1329214.
Cell culture
HeLa, HepG2, A549, and NTERA-2 cells were cultured in DMEM (Gibco, 10566024); Jurkat cells were cultured in RPMI 1640 (Gibco, 11875093); and SH-SY5Y cells were cultured in 1:1 EMEM:F12 (Fisher Scientific, 50-983-283, Cytiva, SH3002601). All media were supplemented with 10% fetal bovine serum (Fisher Scientific, FB12999102) and 1% penicillin-streptomycin (Lonza, 17602E). Cells were cultured at 37°C with 5% CO2 in 10 cm tissue culture dishes and allowed to reach ∼80% confluence.
Total RNA extraction and poly(A) selection
Total RNA extraction from cells and poly(A) selection were performed using a previously established protocol [18, 30]. Six confluent 10 cm cell culture dishes were washed with ice-cold PBS, lysed with TRIzol (Invitrogen, 15596026) at room temperature, and transferred to an RNase-free microcentrifuge tube. Chloroform was added, and after centrifugation the total RNA partitioned into the aqueous supernatant, separate from the organic phase. The aqueous supernatant was transferred to a new RNase-free microcentrifuge tube, and an equal volume of 70% ethanol was added. The PureLink RNA Mini Kit (Invitrogen, 12183025) was used to purify the extracted total RNA following the manufacturer’s protocol. Total RNA concentration was measured using the Qubit™ RNA High Sensitivity (HS) assay (Thermo Fisher, Q32852). Poly(A) selection was performed using the NEBNext Poly(A) mRNA Magnetic Isolation Module (NEB, E7490L) according to the manufacturer’s protocol. The poly(A)-selected RNA was eluted from the beads using Tris buffer, and its concentration was measured using the same Qubit™ assay.
In vitro transcription and polyadenylation from cell lines
The protocol for IVT, capping, and polyadenylation has been described previously [18, 30, 31].
In vitro transcription and in vitro translation of FLuc
IVT was performed on FLuc mRNA templates with psiTP incorporated at final concentrations of 0, 0.5, 1, 1.5, and 2% using the HiScribe T7 High Yield RNA Synthesis Kit (NEB, E2040S). IVT products were cleaned up using the Monarch Spin RNA Cleanup Kit (NEB, T2040L). mRNA concentration was measured post-IVT using the Qubit RNA High Sensitivity Assay Kit (Thermo Fisher, Q32852). The Escherichia coli Poly(A) Polymerase Kit (NEB, M0276S) was then used to perform polyadenylation. After polyadenylation, Qubit measurements were repeated three times to ensure accurate concentration estimates. IVT and polyadenylation were performed following the manufacturers’ protocols.
In vitro translation of the hypermodified RNAs was performed using the nuclease-treated rabbit reticulocyte lysate (RRL) system (Promega, L4960). Both positive (FLuc mRNA from the RRL kit) and negative (no mRNA) controls were run simultaneously. The mRNA concentrations of each psiTP% were matched to ensure equal loading of mRNA into the RRL reaction setup. The final mRNA amount in the RRL mixture was optimized to ∼1.1 μg. The RRL reaction was set up in triplicate according to the recommended protocol and incubated at 30°C for 90 min, then immediately placed on ice to stop translation.
To measure protein abundance, the Luciferase Assay System (Promega, E1500) was used. A 2.5 μl aliquot of each FLuc RRL sample was added to a separate well. Fifty microliters of the luciferase assay substrate was added to each well, and the plate was immediately inserted into the Gemini EM Microplate Spectrofluorometer (Molecular Devices Corporation). An endpoint luminescence reading was recorded, and the resulting data were reported in relative luminescence units (RLUs). The RLUs were normalized to the 0% psiTP value, and a one-way ANOVA was performed to assess significance.
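The normalization and test described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names, the choice to divide by the mean of the 0% psiTP replicates, and the example readings in the usage note are all assumptions. Only the F statistic is computed here; a P-value would come from the F distribution, for example via `scipy.stats.f_oneway`.

```python
from statistics import mean

def normalize_to_control(rlu_by_group, control="0%"):
    """Divide every RLU reading by the mean RLU of the control group.

    Assumption: the control value is the mean of the 0% psiTP replicates.
    """
    control_mean = mean(rlu_by_group[control])
    return {group: [value / control_mean for value in values]
            for group, values in rlu_by_group.items()}

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of replicate lists."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For example, with made-up readings `{"0%": [100, 100], "1%": [50, 150]}`, `normalize_to_control` returns `{"0%": [1.0, 1.0], "1%": [0.5, 1.5]}`, and the normalized replicate lists can then be passed to `one_way_anova_f`.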
Sequencing, base calling, and alignment procedure
Cell line replicates were prepared for Nanopore DRS following the ONT SQK-RNA002 kit protocol, including the reverse transcription step. RNA sequencing on the MinION and PromethION platforms was performed using ONT R9.4.1 flow cells (FLO-MIN106 and PRO-002, respectively). Jurkat and NTERA cells were sequenced on the PromethION, while all other cell lines were sequenced on the MinION.
DRS runs were base-called with Guppy v6.4.2 using the high-accuracy model and the default base-calling quality score filter of Q ≥ 7 [35]. Basecalled read replicates for a cell line were merged and then aligned to the GRCh38.p10 reference genome using minimap2 for downstream analysis. Mod-p ID analysis to identify pseudouridine modifications was performed as described by Fanari et al. [30].
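To make the Q ≥ 7 read filter concrete, the sketch below computes a per-read mean quality score from a FASTQ quality string. It assumes Phred+33 encoding and assumes that the per-read score is the Phred-scaled mean error probability (not the arithmetic mean of per-base Q values); the function names are hypothetical and this is not Guppy's own code.

```python
import math

def mean_qscore(quality_string, phred_offset=33):
    """Phred-scaled mean error probability of one read (assumed convention)."""
    error_probs = [10 ** (-(ord(char) - phred_offset) / 10)
                   for char in quality_string]
    return -10 * math.log10(sum(error_probs) / len(error_probs))

def passes_qscore_filter(quality_string, min_q=7.0):
    """True if the read meets the minimum mean quality threshold."""
    return mean_qscore(quality_string) >= min_q
```

Averaging error probabilities penalizes a few low-quality bases more heavily than averaging Q values would, so a read with mixed Q10 and Q20 bases scores closer to Q10 than to Q15.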
DRS libraries for the A549 cells were also prepared using the SQK-RNA004 kit (Oxford Nanopore Technologies) according to the manufacturer’s instructions. Libraries were then sequenced on PromethION flow cells (ONT, FLO-PRO004RA). Raw pod5 signal data from RNA004 runs were basecalled using Dorado v0.7.3 with the high-accuracy model. Modified psi bases were called during basecalling using the parameter --modified-bases pseU. The resulting unaligned BAM files were converted to FASTQ format and aligned to the human genome (GRCh38.p10) using minimap2 v2.17 with parameters -ax splice -uf -k14 -y.
Post-processing and annotation of RNA modifications used for comparative analysis in RNA004 datasets were performed using modkit (Oxford Nanopore Technologies).
Genomic DNA extraction and Sanger sequencing
To analyze putative SNVs, we performed Sanger sequencing on genomic DNA (gDNA). gDNA extraction was performed using a Monarch Genomic DNA Purification Kit (NEB, T3010S) following the manufacturer’s protocol. Polymerase chain reaction (PCR) primers were designed to amplify ∼300 nt regions surrounding the detected psi positions using Primer-BLAST with default settings (Supplementary Table S1). Using the manufacturer’s protocol, the PCR reaction was set up with Q5 polymerase (NEB, M0491L). Thermocycling conditions were as follows: initial denaturation at 98°C for 30 s; 25 cycles of 98°C for 10 s, 63°C for 20 s, and 72°C for 15 s; final extension at 72°C for 2 min; and holding at 10°C. PCR products were purified using a Monarch PCR & DNA Cleanup Kit (NEB, T1030S) following the manufacturer’s protocol. The concentration of eluted DNA was determined using a Nanodrop spectrophotometer. The purified PCR products were imaged on a 2% agarose TBE gel to confirm specific amplification. Samples were sent to Quintara Biosciences for SimpliSeq™ Sanger sequencing. Results are shown in Supplementary Fig. S1.
Immunofluorescence staining and analysis of PUS7 and TRUB1
All six cell lines were seeded in Lab-Tek eight-well chambers (Thermo Scientific™ Cat. No. 155409PK). A549 and HeLa cells were seeded at 30 000 cells/well, HepG2 and NTERA cells at 60 000 cells/well, and SH-SY5Y cells at 100 000 cells/well. A549, HeLa, HepG2, and NTERA cell lines were cultured in 300 μl DMEM (Gibco Cat. No. 10566-016), and SH-SY5Y cells were cultured in 300 μl 1:1 EMEM:F12 media (Quality Biological Cat. No. 112-018-131, Gibco Cat. No. 11765054). After 24 h, half the media in each well was removed and replaced with an equal volume of 4% formaldehyde. Following a 2 min incubation at room temperature, the resulting 2% formaldehyde solution was aspirated and replaced with 300 µl of 4% formaldehyde. Cells were incubated at room temperature for 10 min, washed twice for 5 min each, and stored in PBS at 4°C.
Cells were removed from 4°C, and all PBS was aspirated. Each well was permeabilized with 300 µl of 0.1% PBS-Triton X-100 for 10 min at room temperature. The entire solution was aspirated, and each well was blocked with 300 µl of 2% BSA in 0.1% PBS-Triton X-100. Cells were incubated at room temperature for 1 h and washed three times with 0.1% PBS-Tween 20. Two wells in each plate were treated with 300 µl of 400 µg/µl anti-PUS7 antibodies (Prestige Antibodies, Cat. No. HPA024116) or 200 µg/µl anti-TRUB1 antibodies (Proteintech, Cat. No. 12520-1-AP) in 1% BSA in PBS-Triton. Each well treated with a primary antibody was also stained with 5 µg/ml fluorophore-conjugated anti-GAPDH antibodies (Invitrogen, Cat. No. MA5-15738-D488). Two wells on each plate were left as no-primary-antibody controls. Cells were incubated overnight in the dark at 4°C and washed once with 0.1% PBS-Tween 20. Secondary antibody staining was performed with 1:500 Alexa Fluor® 594 AffiniPure™ Alpaca Anti-Rabbit IgG (H + L) antibodies (Cat. No. 611-585-215). Following a 1 h incubation at room temperature, the cells were washed once and stained with 167 ng/ml DAPI for 20 min. Cells were then washed and stored in 2× SSC.
Cell staining was analyzed using CellProfiler [32]. The details are in the CellProfiler project file (IF_analysis.cpproj). Briefly, the DAPI, GFP, and Alexa594 channels of each multi-cell image were used for analysis. The DAPI channel was used to segment the nuclei based on the object's typical diameter. The GFP channel was used to segment the cells using adaptive minimum cross-entropy thresholding applied to the logarithm of intensity. The fluorescence intensities of the GFP and Alexa594 channels were measured as the mean intensity per cell. For the IF analysis, the mean intensity of the Alexa594 channel was divided by that of the GFP channel to obtain the fold change for both the TRUB1 and PUS7 antibody targets. Ten fluorescence microscopy images from two biological replicates were analyzed.
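The final normalization step can be illustrated with a short sketch. It assumes the ratio is taken per segmented cell and then averaged, which is one reading of the description above (a ratio of image-level means is another); the function name and input layout are hypothetical, and the CellProfiler project file remains the authoritative description.

```python
from statistics import mean

def normalized_if_signal(cells):
    """Mean Alexa594/GFP intensity ratio across segmented cells.

    cells: list of (alexa594_mean_intensity, gfp_mean_intensity) tuples,
    one per cell. Cells with zero GFP signal are skipped to avoid
    division by zero (an assumption of this sketch).
    """
    ratios = [alexa / gfp for alexa, gfp in cells if gfp > 0]
    return mean(ratios)
```

Dividing by the GAPDH (GFP-channel) signal gives a per-cell loading reference, so differences in cell size or staining efficiency between lines are partly cancelled out.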
Protein extraction and LC-MS/MS for wild-type cell lines
Cell lysis buffer was prepared using 8 M urea (Fisher, Cat. No. U15-500), 40 mM Tris–HCl, pH 7.0 (Fisher, Cat. No. BP1756-100), and Pierce Protease Inhibitor Tablets (Thermo Scientific, Cat. No. PIA32963). Approximately 10⁷ cells from each of the six cell lines were lysed using this buffer. The lysates were incubated on ice for 30 min and then centrifuged at 20 000 × g for 15 min at 4°C. Supernatants were collected, and protein concentrations were measured using the Qubit Protein Assay Kit (Invitrogen by Thermo Fisher Scientific, Cat. No. Q33211).
For proteomics analysis, 100 µg of protein per sample from three biological replicates per group was used. Samples were reduced with 10 mM TCEP at 56°C for 60 min, then alkylated with 20 mM iodoacetamide at room temperature for 30 min. Before digestion, proteins were precipitated with acetone. The protein pellet was resuspended in 100 mM triethylammonium bicarbonate (TEAB) and digested with MS-grade trypsin (Pierce™) overnight at 37°C. The peptide digest was quantified using the Thermo Scientific™ Pierce™ Quantitative Colorimetric Peptide Assay (Product No. 23275). Equal amounts of peptide were labeled with isobaric TMT reagents (Thermo Scientific™ TMT10plex; Pierce; Rockford, IL, USA) according to the manufacturer's protocol. A small fraction of the sample from each isobaric tag channel was pooled and analyzed by LC-MS to obtain the median of all reporter ion intensities. TMT-labeled peptides were pooled, cleaned with Pierce™ C18 spin columns (Thermo Scientific™), and resuspended in 2% acetonitrile (ACN) and 0.1% formic acid (FA). One microgram of multiplexed sample was loaded onto an in-house pulled-tip 75 µm × 20 cm C18 ReproSil-Pur 120 1.9 µm LC column, then separated on a Thermo RSLC Ultimate 3000 (Thermo Scientific™) with a 160 min gradient of 2%–35% solvent B (0.1% FA in ACN) at 200 nl/min and 25°C, with a 220 min total run time. Eluted peptides were analyzed on a Thermo Orbitrap Q Exactive (Thermo Scientific™) mass spectrometer in data-dependent acquisition mode. A full-scan MS survey (m/z 350–1500) was acquired on the Orbitrap at a resolution of 70 000. The AGC target for MS1 was set to 3 × 10⁶, and the ion filling time was set to 50 ms. The 15 most intense ions with charge states 2–6 were isolated and fragmented using HCD at 33% normalized collision energy, and detected at a mass resolution of 35 000 at 200 m/z. The AGC target for MS/MS was set to 1 × 10⁵, and the ion filling time was set to 120 ms. Dynamic exclusion was set for 30 s with a 0.7 m/z isolation window.
Proteomics analysis for wild-type cell lines
Raw LC-MS/MS data were loaded into MaxQuant [33] v2.4.2.0, and the peptides were identified with the built-in Andromeda search engine using a FASTA file containing all entries from the SwissProt Homo sapiens (human) database (20 218 proteins) supplemented with common contaminants.
For all searches, carbamidomethylated cysteine was set as a fixed modification, and oxidation of methionine and N-terminal protein acetylation were set as variable modifications. Trypsin/P was specified as the proteolytic enzyme. Precursor tolerance was set to ±10 ppm and fragment ion tolerance to ±20 ppm. For modified peptides, we used the default cutoffs of at least 40 for the Andromeda score and 6 for the delta score. The false discovery rate was set at <1% for peptide spectrum matches and protein group identification employing a target-decoy approach.
Reporter ions from the TMT product data sheet were input into the software to correct for labeling. The peptides were filtered to have <0.05 PEP (posterior error probability), and all the contaminants were excluded. We filtered the peptides to be present (reporter ion intensity >0) in all replicates of each cell line. The median reporter ion intensities were then calculated to account for protein length differences during quantification. The MaxQuant output is available in Supplementary Table S2. Normalized protein abundance and corresponding mRNA transcripts per million (TPM) from the paired DRS data can be found in Supplementary Table S3.
Protein preparation and processing for HeLa cell TRUB1 KO and HeLa wild-type analysis
Protein concentration was determined using a bicinchoninic acid (BCA) assay (Pierce, Thermo Fisher Scientific). Proteins (1 µg) were reduced with 5 mM dithiothreitol for 30 min at 56°C and alkylated with 15 mM iodoacetamide for 30 min in the dark at room temperature. Proteins were then desalted by acetone precipitation overnight, and the protein pellet was resuspended in 50 mM ammonium bicarbonate. One microgram of each sample was digested overnight at 37°C using sequencing-grade modified trypsin (enzyme-to-protein ratio 1:50, w/w). Digestion was quenched by acidification with formic acid to 1% (v/v). Peptides were desalted using C18 spin tips (Thermo Fisher Scientific), dried under vacuum, and reconstituted in 0.1% formic acid prior to LC-MS/MS analysis.
Peptides were analyzed using a nanoElute UHPLC system (Bruker Daltonics) coupled online to a Bruker timsTOF Ultra 2 mass spectrometer. Approximately 50 ng of total peptides was loaded onto a C18 trap column (300 µm × 5 mm, 5 µm particle size) and separated on an IonOpticks C18 column (75 µm × 25 cm, 1.6 µm particle size). Mobile phase A consisted of 0.1% formic acid in water, and mobile phase B consisted of 0.1% formic acid in acetonitrile. Peptides were eluted at a flow rate of 250 nl/min using the following gradient: 0–18 min, 5 to 23% B; 18–22 min, 23 to 35% B; 22–26 min, 35 to 90% B; 26–30 min, 90% B, followed by column washing and re-equilibration.
Mass spectrometric analysis was performed on a Bruker timsTOF Ultra 2 mass spectrometer equipped with a CaptiveSpray nano-electrospray ion source and operated in positive ion mode. Data were acquired using data-independent acquisition with parallel accumulation–serial fragmentation (DIA-PASEF). Ions were accumulated and separated in the trapped ion mobility spectrometry (TIMS) analyzer with an ion mobility range of 0.64–1.45 Vs cm⁻². DIA-PASEF scans were acquired over an m/z range of 100–1700 using multiple isolation windows distributed along the ion mobility dimension. The TIMS ramp time was set to 100 ms, resulting in a total cycle time of ~0.96 s. Collision energies were applied as a linear function of inverse ion mobility, ranging from 20 eV at 1/K₀ = 0.60 Vs cm⁻² to 59 eV at 1/K₀ = 1.60 Vs cm⁻², as implemented in the DIA-PASEF method.
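The linear collision-energy ramp described above amounts to a two-point interpolation. The sketch below evaluates it from the two anchor points stated in the text; the function name is hypothetical, and the behavior outside the 0.60–1.60 Vs cm⁻² range (simple extrapolation here) is an assumption, since the instrument method may clamp the values.

```python
def collision_energy_ev(inv_k0, low=(0.60, 20.0), high=(1.60, 59.0)):
    """Collision energy (eV) as a linear function of 1/K0 (Vs cm^-2).

    low and high are (1/K0, eV) anchor points taken from the method
    description; values outside the anchors are extrapolated.
    """
    (x0, y0), (x1, y1) = low, high
    slope = (y1 - y0) / (x1 - x0)  # 39 eV per Vs cm^-2 with the defaults
    return y0 + slope * (inv_k0 - x0)
```

With the default anchors, an ion at 1/K₀ = 1.10 Vs cm⁻² would be fragmented at about 39.5 eV, midway between the two endpoints.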
Raw DIA-PASEF data were analyzed using Spectronaut, as implemented within Bruker ProteoScape (BPS 2025c), using a library-free (directDIA) workflow. Spectra were searched against the human UniProt reference proteome (20 420 reviewed proteins), supplemented with common contaminants.
Trypsin was specified as the proteolytic enzyme, allowing up to two missed cleavages. Carbamidomethylation of cysteine residues was defined as a fixed modification, while oxidation of methionine and protein N-terminal acetylation were included as variable modifications. Precursor and fragment mass tolerances were automatically determined by Spectronaut. Identifications were filtered at 1% FDR at the precursor and protein levels, with protein q-values ≤ 0.05. Protein quantification and normalization were performed using the Spectronaut quantitative workflow as implemented with ProteoScape, applying Spectronaut’s local normalization strategy across all runs (Supplementary Table S4).
Bootstrapping
To enable comparison between MinION and PromethION runs, which differ substantially in read depth, we computationally resampled the native mRNA data. We generated 10 in silico replicates, each containing 1.2 M randomly sampled reads from the base-called FASTQ files for each cell line. To accommodate the reduced read depth, we lowered the minimum coverage threshold from 30 to 10 direct reads per query position.
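A minimal sketch of this resampling step is shown below. It assumes reads are drawn without replacement within each replicate and that replicates are drawn independently of one another; the text does not specify either point, and the function name and seed handling are hypothetical.

```python
import random

def resample_reads(read_ids, n_reads, n_replicates=10, seed=0):
    """Draw n_replicates random subsets of n_reads read IDs each.

    Assumption: sampling is without replacement within a replicate,
    and each replicate is drawn independently from the full read set.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [rng.sample(read_ids, n_reads) for _ in range(n_replicates)]
```

In practice `read_ids` would be the read names from a cell line's merged FASTQ and `n_reads` would be 1 200 000; each returned subset would then be passed through alignment and site calling with the lowered coverage threshold.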
Effect size analysis of psi site number on expression and translation efficiency
To assess the impact of conserved psi site number on gene expression, transcripts were divided into three groups based on the number of psi sites: (i) transcripts with no psi sites, (ii) transcripts with one psi site, and (iii) transcripts with more than two psi sites.
Protein abundance values were obtained from the median protein abundance estimates in Supplementary Table S3. mRNA expression levels were measured as TPM from the same table. Translation efficiency (TE) values were calculated from publicly available ribosome profiling (Ribo-seq) and RNA-seq datasets obtained from GEO (HeLa: GSE21992, GSE79664, GSE143301, GSE188692, SRA099816; A549: GSE82232, GSE101760; HepG2: GSE125757, GSE174419; SH-SY5Y: GSE148827, GSE155727). For each transcript, TE was defined as the ratio of ribosome-protected fragment counts to RNA-seq counts, as compiled in Supplementary Table S3.
Group comparisons were performed for: (i) 0 sites versus 1 site, (ii) 0 sites versus >2 sites, and (iii) 1 site versus >2 sites. Effect sizes were quantified as the log₁₀ fold-change of group medians, such that positive values indicate higher expression or TE in the group with more psi sites. To account for variability in site-level confidence, effect sizes were estimated using weighted regression, with weights corresponding to the number of transcripts contributing to each group. Separate effect sizes were computed for protein abundance, mRNA TPM, and TE.
Statistical significance of differences between groups was assessed using the Mann–Whitney U test. Multiple testing correction was performed using the Benjamini–Hochberg procedure to control the false discovery rate.
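The effect-size definition and the multiple-testing correction can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: the function names are hypothetical, the weighted regression step is not reproduced, and the Mann–Whitney P-values are taken as given (they could be obtained, for example, from `scipy.stats.mannwhitneyu`).

```python
import math
from statistics import median

def log10_fold_change(more_sites, fewer_sites):
    """log10 ratio of group medians; positive means higher in more_sites."""
    return math.log10(median(more_sites) / median(fewer_sites))

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

For instance, raw P-values of 0.01, 0.04, and 0.03 for the three group comparisons would be adjusted to 0.03, 0.04, and 0.04, respectively.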
Statistical analysis
Experiments were performed as multiple independent replicates, as indicated in the figure legends. All statistics and tests are described fully in the text or figure legends.
Results
Comparative analysis of psi mRNA modifications from six human cell lines using DRS
To quantify transcriptome-wide differences in psi expression across human cell types, we isolated and sequenced poly(A)-selected RNA from six immortalized human cell lines: (i) A549, alveolar basal epithelial carcinoma; (ii) HeLa, cervical carcinoma; (iii) HepG2, hepatocellular carcinoma; (iv) Jurkat, T-cell leukemia; (v) NTERA-2, human embryonic carcinoma derived from testicular cancer; and (vi) SH-SY5Y, neuroblastoma (Fig. 1a). The primary alignments for each sample yielded the following number of reads: 2 362 999 (A549); 4 257 869 (HeLa); 2 407 221 (HepG2); 10 931 896 (Jurkat); 10 107 973 (NTERA); 6 064 224 (SH-SY5Y) (Supplementary Table S5). The higher number of reads observed in NTERA and Jurkat is attributed to the use of the PromethION for sequencing one or all replicates of these cell lines. Following basecalling and genomic alignment, we mapped putative psi positions across the transcriptome for each cell line using our previously described Mod-p ID pipeline [18, 30], which compares our direct RNA libraries to a reference IVT library [18, 31]. An IVT control derived from the same cell type (i.e., paired-IVT or p-IVT) is preferred for this analysis; however, we have previously demonstrated that a pan-IVT comprising merged unmodified transcriptomes from different cell lines can enrich data sets in which paired IVTs do not have sufficient coverage [31]. We used stringent filtering criteria to reduce the possibility of false positives in psi identification: (i) IVT control has ≥10 reads, (ii) IVT error at the U site is ≤10%, and (iii) if the paired IVT dataset has <10 reads, we used the pan-IVT dataset. For all sites that passed these criteria, we applied Mod-p ID to calculate p-values in the six cell lines (see the “Materials and methods” section; Fig. 1b).
Figure 1.
Individual cell line putative psi identification by Mod-p ID and PUS expression analysis. (a) Experimental summary of A549, HepG2, HeLa, Jurkat, NTERA, and SH-SY5Y direct and IVT library prep and DRS. (b) Overview of the computational pipeline following DRS. (c) Number of total direct reads versus the relative occupancy (% mm) of putative psi positions detected by Mod-p ID (P < .001) for each cell line. Red dashed lines indicate baseline requirements for a position to be included in downstream analysis. (d) TPMs of PUS enzymes. Individual points are from the computational resampling of the dataset. Significance from one-way ANOVA (***P < .001). (e) (top) Fraction of the total number of reads detected by Mod-p ID in a TRUB1 or PUS7 k-mer motif divided by the total number of TRUB1 or PUS7 k-mer motifs observed versus the TPMs of the PUS enzyme, with standard error of the mean from computational resampling. (bottom) Fraction of the total number of reads detected by Mod-p ID in a TRUB1 or PUS7 k-mer motif divided by the total number of TRUB1 or PUS7 k-mer motifs observed versus the relative protein expression levels of the PUS enzyme (in arbitrary units; A.U.). (f) Scatter plot of U→C mismatch rates for 74 ground-truth psi sites: RNA002 (x-axis) versus RNA004 (y-axis). Filled circles = sites detected by Dorado; open circles = false negatives.
Our first requirement for the direct library was that the putative site must have a significant difference in U-to-C base-calling error between the direct and IVT samples (P < .001). Next, we required that the number of reads from the direct DRS library be ≥10 at a given site. Finally, we required the U-to-C mismatch error in the direct library to be at least 10%. We identified putative psi sites for each cell line (Fig. 1c and Supplementary Table S6). To find the relative ratio of putative psi sites to total uridines for a cell line, we calculated the number of putative psi sites detected by Mod-p ID and normalized by the number of aligned genomic uridines for each cell line. The values obtained from this normalization are 10.40 × 10−4 for A549 cells, 8.97 × 10−4 for HeLa cells, 4.46 × 10−4 for HepG2 cells, 9.80 × 10−4 for Jurkat cells, 10.85 × 10−4 for NTERA cells, and 5.37 × 10−4 for SH-SY5Y cells. Globally, A549 and NTERA cells had the highest relative ratio of putative psi sites to total uridines across cell lines, while HepG2 and SH-SY5Y cells had the lowest (Fig. 1c).
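The IVT and direct-library criteria described above can be sketched as a single per-site filter. This is a minimal illustration, not the Mod-p ID implementation: the `Site` record, its field names, and the function names are hypothetical, and the p-value is assumed to have been computed upstream.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical per-site summary for one cell line (field names are assumptions)."""
    direct_reads: int        # reads covering the site in the direct RNA library
    direct_mismatch: float   # U-to-C mismatch fraction in the direct library (0-1)
    paired_ivt_reads: int    # reads covering the site in the paired IVT control
    paired_ivt_error: float  # error fraction at the U site in the paired IVT
    pan_ivt_reads: int       # reads covering the site in the pan-IVT control
    pan_ivt_error: float     # error fraction at the U site in the pan-IVT
    p_value: float           # direct-versus-IVT significance, computed upstream


def choose_ivt(site: Site) -> tuple[int, float]:
    """Use the paired IVT unless it has <10 reads, then fall back to the pan-IVT."""
    if site.paired_ivt_reads >= 10:
        return site.paired_ivt_reads, site.paired_ivt_error
    return site.pan_ivt_reads, site.pan_ivt_error


def passes_filters(site: Site) -> bool:
    """Apply the thresholds stated in the text to one candidate site."""
    ivt_reads, ivt_error = choose_ivt(site)
    return (
        ivt_reads >= 10                   # IVT control has >=10 reads
        and ivt_error <= 0.10             # IVT error at the U site is <=10%
        and site.p_value < 0.001          # significant direct-versus-IVT difference
        and site.direct_reads >= 10       # >=10 reads in the direct library
        and site.direct_mismatch >= 0.10  # >=10% U-to-C mismatch in the direct library
    )
```

In this sketch the pan-IVT is consulted only when the paired IVT is under-covered, mirroring criterion (iii).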
mRNA expression profiling of PUS enzymes across cell types
We calculated TPM values for PUS enzyme mRNAs and compared their levels across cell lines (Fig. 1d and Supplementary Table S7). We categorized putative psi positions within a known psi synthase motif to test the hypothesis that higher PUS enzyme levels lead to more psi sites in each cell type. This analysis assumes that PUS mRNA levels across cell types are commensurate with their corresponding PUS enzyme levels. For the psi synthase TRUB1, we searched for the motif GUUCN [34]; for PUS7, we searched for the motif UNUAR [35]. We computed the proportion of psi positions with the motif divided by the total number of positions with the motif expressed in the transcriptome and deduced from gene models of the genome,
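$$\text{proportion}_{\mathrm{TRUB1}} = \frac{\text{number of putative psi positions within a GUUCN motif}}{\text{total number of GUUCN motifs expressed in the transcriptome}}$$
$$\text{proportion}_{\mathrm{PUS7}} = \frac{\text{number of putative psi positions within a UNUAR motif}}{\text{total number of UNUAR motifs expressed in the transcriptome}},$$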
and compared these proportions to the normalized RNA expression levels for TRUB1 and PUS7. Interestingly, we found that NTERA cells have the highest proportion of psi sites within the PUS7 motif (Fig. 1e), while SH-SY5Y cells have the lowest proportion of psi for both PUS7 and TRUB1 (Fig. 1e). These findings are consistent with our observations of high global psi levels in NTERA cells and low levels in SH-SY5Y cells (Fig. 1c). We also found that Jurkat cells have the highest proportion of psi within a TRUB1 motif, while NTERA cells have the lowest. When comparing the proportion of psi within each of these motifs to the respective enzyme mRNA expression levels, we found no correlation between PUS mRNA expression levels in each cell type and the total number of psi sites for that cell type (Fig. 1e). To assess PUS protein expression levels, we performed immunofluorescence assays, staining each cell line with fluorescently labeled Anti-Pus7 or Anti-Trub1 antibodies. We found no correlation between PUS protein expression levels and the total number of psi sites for each cell line (Fig. 1e).
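As an illustration of the motif proportion used here, the following minimal sketch counts 5-mers matching the TRUB1 (GUUCN) and PUS7 (UNUAR) motifs. The function name and the input format (lists of 5-mer strings in RNA alphabet) are assumptions for illustration only; N is expanded to any base and R to a purine (A or G) following IUPAC codes.

```python
import re

# IUPAC expansion: N = any base, R = purine (A or G).
MOTIFS = {
    "TRUB1": re.compile(r"GUUC[ACGU]"),   # GUUCN
    "PUS7": re.compile(r"U[ACGU]UA[AG]"),  # UNUAR
}


def motif_proportion(psi_kmers, all_kmers, enzyme):
    """Fraction of expressed motif instances that carry a putative psi.

    psi_kmers: 5-mers surrounding putative psi positions (hypothetical input).
    all_kmers: 5-mers surrounding every expressed candidate uridine.
    """
    pattern = MOTIFS[enzyme]
    n_psi = sum(1 for kmer in psi_kmers if pattern.fullmatch(kmer))
    n_all = sum(1 for kmer in all_kmers if pattern.fullmatch(kmer))
    return n_psi / n_all if n_all else float("nan")
```

The resulting per-cell-line proportions could then be plotted against TPM values for the corresponding enzyme.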
To further evaluate the robustness of psi site detection, we leveraged improvements in nanopore chemistry introduced during the review process. Oxford Nanopore Technologies recently released a new direct RNA kit (RNA004) and base caller, providing an opportunity to assess whether changes in sequencing chemistry affect psi-calling. Using A549 cells as a reference, we re-analyzed data with Mod-p ID on a benchmark set of orthogonally validated psi sites (“ground truth” psi positions) originally identified from RNA002 A549 data and confirmed by CeU–Seq [12], BID–seq [11], PRAISE seq [15], or RBS–Seq [13]. The updated RNA004 chemistry produced nearly identical modification profiles, reducing U→C error frequencies below 10% at only five psi positions (Fig. 1f). These results demonstrate that psi detection by Mod-p ID is highly reproducible and largely insensitive to changes in nanopore chemistry, supporting the robustness of our comparative psi mapping across cell types.
Comparison of conserved psi sites across six human cell lines
To compare the occupancy of individual psi positions across cell types, we first selected sites for which a psi modification is conserved (i.e., has been identified in every cell line) by Mod-p ID (P < .001). To ensure that any differences in psi occupancy we observed reflected true biological variation rather than differences in transcript abundance, we then filtered these positions to include only those with matched expression levels: at least 30 reads from the direct library and a minimum of 10 IVT reads for all six cell lines. This filtration step produced a list of 70 psi positions on mRNAs that were highly and comparably expressed across all cell types (Supplementary Table S8).
To these 70 sites, we applied additional filtering criteria, including a minimum of 30% U-to-C error in each cell line, to compile a final list of “housekeeping” psi sites (Supplementary Table S8). We use “housekeeping” here to denote psi sites that meet two criteria: (i) they are present on transcripts with matched abundance across all cell types, and (ii) they show unambiguous, robust psi modification well above the detection threshold in every cell line. To apply even more stringent validation and ensure these sites represent bona fide psi modifications, we cross-validated our 20 detected “housekeeping” sites using orthogonal methods, including siRNA knockdown of TRUB1 or PUS7 [7], CMC-mediated Illumina sequencing methods, and RNA bisulfite labeling methods [8, 11–13, 15]. This produced a final list of 17 “housekeeping” psi sites that have been validated across multiple detection platforms and are conserved across all six cell types (Fig. 2a and Supplementary Table S8). Given their extensive cross-platform and cross-cell-type validation, these 17 sites represent high-confidence psi modifications that could serve as valuable positive controls for future psi detection studies.
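The read-depth and mismatch thresholds that define a “housekeeping” site can be expressed as a simple check over all six cell lines. The sketch below is illustrative only: the dictionary layout and key names are hypothetical, and the orthogonal cross-validation step is not modeled.

```python
CELL_LINES = ["A549", "HeLa", "HepG2", "Jurkat", "NTERA", "SH-SY5Y"]


def is_housekeeping(site_by_cell_line):
    """Return True if a site meets the stated thresholds in every cell line.

    site_by_cell_line maps a cell line name to a dict with the hypothetical keys
    'p_value', 'direct_reads', 'ivt_reads', and 'mismatch' (U-to-C fraction, 0-1).
    """
    for cell_line in CELL_LINES:
        obs = site_by_cell_line.get(cell_line)
        if obs is None:
            return False  # not observed in this cell line, so not conserved
        if obs["p_value"] >= 0.001:
            return False  # not called by Mod-p ID at P < .001
        if obs["direct_reads"] < 30 or obs["ivt_reads"] < 10:
            return False  # expression not matched at the required depth
        if obs["mismatch"] < 0.30:
            return False  # below the 30% U-to-C error requirement
    return True
```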
Figure 2.
Conserved psi sites across six human cell lines. (a) Heatmap of relative psi occupancy for sites identified by Mod-p ID [17] with ≥30 reads and ≥30% U-to-C mismatch for each cell line. Standard deviation and range are reported across cell types. For the same position, the PRAISE [15] deletion ratio is reported. (right) Orthogonal methods confirming the psi site, including biological, bisulfite-based, and CMC-based detection. (b) 21-nt sequence of the conserved TRUB1 psi sites. (c) Predictive secondary structure modeling of canonical 21-nt sequences of the conserved TRUB1 psi sites. (d) 21-nt sequence of randomly selected unmodified uridine positions. (e) Predictive secondary structure modeling of canonical 21-nt sequences of the unmodified uridines. (f) Comparison of sequence properties between 14 TRUB1 substrates and 14 control sequences. The bars show the median value with standard error.
Of these, we mapped 11 sites to the CDS and 6 to the 3′ UTR (Supplementary Fig. S2 and Supplementary Table S9). Interestingly, the TRUB1 motif (GUUCN) was observed in 14 of our 17 highly conserved sites, and psi modification at those sites was further validated by a TRUB1 siRNA knockdown experiment [7]. These sites are found within genes that perform essential functions across tissue types. However, a few common pathways were identified. NIP7, DKC1, and PARP4 are involved in nucleic acid processing and maintenance, while SLC30A5 and SLC2A1 are involved in transport functions. Interestingly, the only PUS7 substrate on this “conserved” list was RHBDD2, which encodes an intramembrane serine protease necessary for processing various proteins and maintaining cellular homeostasis.
Next, we calculated the standard deviation (SD) for each psi position to compare across all six cell lines (Fig. 2a). The conserved psi site located on NIP7 (chr16:69342144) has the smallest SD positional occupancy, ranging from 98% (NTERA) to 100% (A549, HepG2, and SH-SY5Y). NIP7 (Nucleolar pre-rRNA processing protein) is involved in RNA-binding activity for ribosomal biogenesis. We also observed in a previous study that this position on NIP7 is relatively static in SH-SY5Y cells across multiple treatments [7]. Conversely, the relative positional occupancy of the psi on SLC2A1 (chr1:42926727) ranges from 32% U-to-C error (Jurkat) to 79% (HepG2). SLC2A1 (solute carrier family 2 member 1) is a critical glucose transporter essential for tissue-specific energy delivery. While SLC2A1 is present in both liver and T cells, its primary role in the liver is to support glucose metabolism, whereas in T cells, SLC2A1 is crucial for immune function and responses. This psi site has also been found to have stable occupancy across different treatments for SH-SY5Y cells [7].
Relationship between transcript levels and positional psi occupancy
The conserved transcript FKBP4 (chr12:2803909) has highly variable psi occupancy across the cell lines, ranging from 38% (Jurkat cells) to 69% (A549 cells), as well as high variability in transcript expression levels, ranging from 81 ± 8 (SH-SY5Y cells) to 735 ± 16 (NTERA cells) TPMs (Fig. 2a and Supplementary Fig. S2). We observed several examples of an inverse relationship between mRNA levels and psi positional occupancy. In NTERA cells, FKBP4 expression is the highest among all cell lines (349 TPMs), but it has one of the lowest psi positional occupancies at 42%. Conversely, A549 cells have significantly lower FKBP4 expression (111 ± 6 TPMs) but show the highest psi positional occupancy at 69%. SLC2A1 is also differentially expressed with variable psi occupancy at chr1:42926727 across the cell lines. However, in this case, transcript abundance and psi positional occupancy are directly correlated. HepG2 cells have the highest expression of SLC2A1 at 210 ± 9 TPMs and the highest U-to-C mismatch at 79%, and Jurkat cells have the lowest expression and psi occupancy at 37 ± 4 TPMs and 32%, respectively (Fig. 2a and Supplementary Fig. S2).
Secondary structure modeling of the region surrounding psi position for TRUB1 targets
As mentioned earlier, most highly conserved targets were TRUB1 substrates (Fig. 2a). Previous studies have analyzed the structural features characteristic of TRUB1 targets in a single cell line [34] and found that the stem-and-loop structure of mRNA within the TRUB1 motif is sufficient for mRNA pseudouridylation. Thus, we aimed to assess whether these structural features were present on sites where psi is conserved across cell lines. Using Mfold [36], we took a 21-mer sequence flanking the conserved and validated psi positions within a TRUB1 motif (14 targets) and predicted the RNA secondary structure (Fig. 2b and c; Supplementary Table S10). The 21-mer sequence was chosen because the predictions are more accurate with shorter sequences [36]. Additionally, we randomly selected 14 control sites for each target, a total of 196 sites with the same motif but bearing no psi modification (Fig. 2d and e; Supplementary Table S10).
In comparing the structural characteristics of these groups, we observed that the control sites exhibit shorter loop lengths on average compared to the psi sites (P = .0369 < .05). While the stem lengths were not significantly different between control sites and psi sites (P = .9723), the 5-mer motif for psi sites tends to be closer to the 5′ end of the stem than in the control sites (P = .0016 < .05). Conversely, the 5-mer motif for controls is closer to the 3′ end of the stem than psi targets, and this difference is significant (P < .0001; Fig. 2f).
We randomly selected 14 control sites from the 196 total controls to directly compare with the 14 psi targets (Fig. 2c and e). Interestingly, some control sites displayed structural variations distinct from those observed in psi targets. For instance, SRP68 exhibits its 5-mer motif beyond the 5′ end, while NACA positions the first nucleotide of its 5-mer at the 3′ end of the stem, with four nucleotides extending behind the stem. In contrast, all psi targets exhibit the 5-mer motif positioned within the stem, loop, or both.
Importantly, all psi targets displayed their U/psi positioned within the loop of hairpin structures for the 21-mer structures, whereas only 7 of the 14 control sites had U/psi positioned within the loop. Additionally, 3 of the 14 control sites contained internal loops, a feature not observed in psi targets. We also compared the U/psi position within 31-mer and 41-mer structures for the 14 psi targets and 14 controls. For psi targets, 12 of 14 have their psi sites in the loop in both the 31-mer and 41-mer structures (Fig. 2c; Supplementary Table S10, Supplementary Figs S4 and S5). Therefore, longer sequences do not substantially change the preference of psi sites for loop locations. As for the U controls, 3 U/psi sites are located in the loop for the 31-mer structures and 4 for the 41-mer structures, while 8 U/psi sites are located in the stem for both lengths. In summary, at all sequence lengths tested, the U/psi position in psi targets is preferentially located in a loop, whereas the corresponding position in control sites falls in a loop less often.
Site-specific psi presence or absence can be cell-type dependent
We were interested in whether psi sites were present only in specific cell lines, even when RNA expression levels were conserved. We constructed a table of Mod-p ID detected and orthogonally confirmed psi sites where the modification was detected in 3 of the 6 cell lines (Fig. 3a and Supplementary Table S11). For example, DAZAP1 (chr19:1434893) was only present in A549, HeLa, and HepG2 cells, while ITFG1 (chr16:47311407) was present only in A549, HepG2, and SH-SY5Y cells. Interestingly, for this group of modified sites, we found no positions within this category that were hypermodification type I [18], meaning all identified sites had <40% U-to-C mismatch, indicating <50% psi occupancy. We compared all aligned reads’ raw ion current traces at the psi position, revealing that psi presence affects the ion current distributions (Fig. 3b).
Figure 3.
Cell type-specific expression of psi. (a) Heatmap of relative psi occupancy for sites detected in three cell lines identified by Mod-p ID and confirmed by orthogonal methods. (b) KDE ionic current traces of cell-type split presence/absence of psi sites. Colors indicate traces for each cell line. The cell line with no psi detected is a gray-shaded distribution. (c) Heatmap of the unique absence of a cell line identified by Mod-p ID and confirmed by orthogonal methods. (d) KDE ionic current traces of cell-type-specific absence of psi sites. Colors indicate traces for each cell line. The cell line with no psi detected is a gray-shaded distribution. (e) Heatmap of cell type-specific unique presence of putative psi sites’ relative occupancy. (f) Sanger sequencing of the cell type-specific putative psi position on CENPA. (g) KDE ionic current traces of CENPA for ±3 positions. Colors indicate traces for each cell line. The cell line detected to have the putative psi is shaded with its corresponding color. The pan-IVT trace is in black.
The unique absence of a psi site for a single cell line may implicate cell type-specific functionality. To study this, we had to change our selection criteria from what was used to analyze “housekeeping” psi targets (i.e., 30 direct reads and >30% U-to-C error in every cell line). We relaxed the criteria to include positions with at least 30 direct reads in every cell line and >10% psi positional occupancy for the detected site in 5 out of 6 cell lines (Supplementary Table S11). We detected 38 putative psi sites that are uniquely absent in a single cell line with Mod-p ID, 12 of which are orthogonally confirmed psi positions (Fig. 3c). All 12 sites are present in A549 and HeLa cells, indicating no unique absence of psi at any of the detected positions in these two cell lines. SH-SY5Y demonstrates the highest occurrence of uniquely absent sites, with only 6 psi sites identified. Four of the six psi positions present in SH-SY5Y are found on transcripts (CDC42 (chr1:22092496), WDR45 (chrX:49075653), YWHAQ (chr2:9584978), and SNAP29 (chr22:20870467)) related to neuron function directly in the cellular activities or through processes crucial for neuronal health and activity. Each of these 12 positions was orthogonally validated to be psi (Fig. 3c). The position with the largest difference between the two cell lines was within H2AZ2 (chr7:448355564), which is hypermodified (type I) in HepG2 cells and unmodified in SH-SY5Y cells, yet is highly expressed in both. This gene encodes a variant of the histone H2A complex and plays roles in synaptic plasticity in neurons and liver development. We do not know which PUS enzyme is responsible for this modification. However, this position was confirmed to be psi by two orthogonal methods [11, 15].
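The relaxed selection for uniquely absent sites can be sketched as follows. This is an illustrative reading of the criteria, not the authors' code: the input layout is hypothetical, and a cell line is treated as lacking psi when its occupancy is at or below the 10% threshold, which is an assumption.

```python
def uniquely_absent_cell_line(site, min_reads=30, min_occupancy=0.10):
    """Return the single cell line lacking psi at a site, or None.

    site maps each of the six cell line names to a dict with the hypothetical
    keys 'direct_reads' and 'mismatch' (U-to-C fraction, 0-1).
    """
    if len(site) != 6:
        return None  # the site must be covered in all six cell lines
    if any(obs["direct_reads"] < min_reads for obs in site.values()):
        return None  # at least 30 direct reads are required in every cell line
    absent = [name for name, obs in site.items() if obs["mismatch"] <= min_occupancy]
    # Psi must exceed 10% occupancy in exactly five of the six cell lines.
    return absent[0] if len(absent) == 1 else None
```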
To further confirm the cell-type-specific absence of psi, we overlaid the per-read ionic current intensity distributions for all reads aligned to the psi position (Fig. 3d). As expected, at chr2:9584978 (YWHAQ), we observed a single, smooth distribution in HepG2 cells and the pan-IVT, whereas in the other five cell lines there is an additional peak to the left of the expected distribution peak. This additional peak indicates a difference in the physical properties of the sequences (and thus a change in the ionic current disruption), consistent with the presence of psi in those five cell lines and its absence in HepG2 cells (Fig. 3d).
Likewise, the unique presence of psi positions in a cell line may further provide information on the cell-type-specific functionality of psi. Of the 63 orthogonally confirmed psi sites using Mod-p ID present in only one cell line with at least 10% relative positional occupancy, 7 sites have a TRUB1 motif, and 12 have a PUS7 motif. Only 4 of the orthogonally confirmed psi sites have a U-to-C error signature above 20%: DRG2 (chr17:18107480), MRPL46 (chr14:88467298), SLC25A1 (chr22:19177964), and IPO7 (chr11:9445686). Additionally, we identified 14 putative psi sites using Mod-p ID that have >30% U-to-C errors (Fig. 3e). For all sites of unique modification presence, we ruled out the presence of a single-nucleotide variant in these cell lines by Sanger sequencing on the corresponding gDNA (Fig. 3f and Supplementary Fig. S1). We wanted to confirm that these sites deviated from the expected ionic current, which indicates a probable modification. We observed that the pan-IVT trace and the other five cell lines have similar ionic current trace distributions, whereas the distribution for the cell line bearing the modification shows a visible deviation (Fig. 3g).
Cell-type-resolved psi mapping extends to lncRNAs
In addition to mRNAs, we leveraged poly(A) selection to simultaneously map psi modifications in lncRNAs across all six cell lines, providing the first cell-type-resolved lncRNA pseudouridylation landscape. We identified 24 psi sites in lncRNAs using Mod-p ID and manual curation, with 3 sites conserved across multiple cell types and 2 cell-specific sites (Supplementary Fig. S6 and Supplementary Table S12). Interestingly, while psi sites on mRNA often fall within TRUB1/PUS7 motifs, we find only 4 sites to be harbored by these two known motifs in lncRNAs.
Type II hypermodification of transcripts can be variable or shared across cell lines
Type II psi hypermodification has been previously defined as transcripts with more than one psi modification [18]. To assess the type II hypermodification status across cell types, we filtered to require 10 direct reads and 10 paired-IVT reads at the query position (Supplementary Table S13). We detected up to 5 modified positions per transcript in A549, HeLa, HepG2, Jurkat, and NTERA cells. Note that this analysis does not necessarily indicate that the modifications are on the same strand; rather, it shows that a particular mRNA population has multiple positions that may be modified by psi, and some of these positions may be on the same strand. We detected many transcripts with two modifications across all cell lines. A549 cells, NTERA cells, and Jurkat cells consistently had the highest detected type II hypermodified transcripts. In contrast, SH-SY5Y cells had the fewest hypermodified transcripts, as we detected only up to 3 positions on a single transcript (Fig. 4a).
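The per-transcript tally behind this analysis can be illustrated with a short sketch. The input format (pairs of transcript identifier and genomic position for one cell line, already filtered for read depth) is a hypothetical simplification.

```python
from collections import Counter


def hypermod_histogram(sites):
    """Count transcripts by their number of putative psi sites (two or more).

    sites: iterable of (transcript_id, position) pairs for one cell line.
    Returns a Counter mapping number-of-sites to number-of-transcripts.
    """
    # Deduplicate positions, then count how many distinct sites each transcript has.
    per_transcript = Counter(transcript for transcript, _ in set(sites))
    # Type II hypermodification: more than one psi position on the transcript.
    return Counter(n for n in per_transcript.values() if n >= 2)
```

As noted in the text, such a tally describes the transcript population and does not by itself show that the modifications co-occur on the same molecule.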
Figure 4.
Type II hypermodification analysis. (a) Number of hypermodified type II transcripts with 2, 3, 4, and 5 putative psi sites. (b) Variable type II hypermodification across cell lines for the prosaposin gene, PSAP. Location on the transcript of three putative psi sites and IGV snapshots of each position’s motif for the six cell lines’ direct and IVT libraries and pan-IVT. (c) Special case of type II hypermodification: double modification where multiple putative psi positions are detected within a motif on LAPTM4B. IGV snapshots of each position’s motif for the six cell lines’ direct and IVT libraries and pan-IVT. The first detected modification is found in the TRUB1 motif. (d) Merged cell line KDE ionic current traces of the two double-modified putative psi sites for all modification combinations for the two sites in different colors. The ionic trace for the pan-IVT is shaded in gray.
PSAP encodes the prosaposin gene and is one example of a transcript with multiple positions orthogonally confirmed (chr10:71816374 [13, 15], 71816664 [8, 11–13, 15], 71817048 [15], and 71819612 [15]) and detected at varying positional occupancy with our Mod-p ID method (Fig. 4b). The encoded protein is essential for proper lysosomal function. Of the positions identified, chr10:71819612 is in the CDS, while the other three are in the 3′ UTR. None of these confirmed sites has a known PUS motif. All 4 psi locations are present in HeLa cells; 3 in A549 cells; 2 in NTERA; 1 in HepG2; and 0 in Jurkat and SH-SY5Y. The positional occupancies for detected psi sites are relatively low, with a maximum of 33% in HepG2 at chr10:71819612. Mod-p ID also identified a 5th putative position at chr10:71816662 in HeLa and NTERA cells. This additional putative psi site falls within the same motif of chr10:71816664, a putative double modification (Fig. 4b).
LAPTM4B (chr8:97776038, 97776040) is another example of a double modification, where chr8:97776038 has a TRUB1 motif, GUUCU (Fig. 4c), and has been orthogonally confirmed with both PRAISE [15] and CeU-seq [12]. We confirmed with Sanger sequencing that neither position was an SNV (Supplementary Fig. S1). CMC and bisulfite-based methods for psi detection cannot detect two neighboring modifications. Thus, no double modifications have been reported by these methods.
Double modifications can potentially disrupt current traces in DRS. To explore further, we performed a strand analysis of all six cell lines’ combined reads (pan-direct) to see if the double modification occurred simultaneously in both positions on a given read. This analysis is different than the previous analysis, whereby we searched for the presence of psi modifications on a given transcript. In this case, we can attribute the modifications to the same strand. A read must span both positions and only contain a U or C to be included in the analysis. We separated the 1823 reads into four groups: (i) no modification (1619 reads), (ii) U-to-C error in the first position and an unmodified second position (76 reads), (iii) an unmodified first position and U-to-C error in the second position (124 reads), and (iv) double modification (4 reads). We wanted to visualize the ionic current distribution at these two positions, specifically when the nucleotide of interest is in the middle position of the nanopore, for each grouping. We observe changes in the distribution pattern corresponding to each grouping. Overall, this suggests that singly and doubly modified populations coexist within the same motif, with distinct ionic current distributions (Fig. 4d).
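The four-way read grouping described above can be sketched as a small classifier. The function and label names are hypothetical; the input is assumed to be the basecalled nucleotide at each of the two positions on a read that spans both.

```python
from collections import Counter

ALLOWED = {"U", "C"}


def classify_read(first_base, second_base):
    """Assign a read spanning both positions to one of four groups.

    A basecalled C at a reference U is treated as the U-to-C error signature.
    Reads with any other base at either position are excluded (None).
    """
    if first_base not in ALLOWED or second_base not in ALLOWED:
        return None
    if first_base == "U" and second_base == "U":
        return "no_modification"
    if first_base == "C" and second_base == "U":
        return "first_only"
    if first_base == "U" and second_base == "C":
        return "second_only"
    return "double_modification"


def group_reads(base_pairs):
    """Tally group sizes for an iterable of (first_base, second_base) pairs."""
    labels = (classify_read(a, b) for a, b in base_pairs)
    return Counter(label for label in labels if label is not None)
```

With the counts reported in the text (1619, 76, 124, and 4 reads), the four groups sum to the 1823 reads analyzed.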
Protein expression of transcripts with conserved psi sites
To assess post-transcriptional regulation by psi, we performed a paired mRNA-protein analysis at conserved positions. This was a high-confidence group of sites validated by previous orthogonal, non-nanopore-based methods, making it an ideal group for this analysis. These positions were found within highly expressed targets (reads >30), and the levels of psi were high (U-to-C error >10%). We performed quantitative mass spectrometry across all 6 cell lines to measure protein abundance of genes containing conserved psi sites (Supplementary Table S3).
We first analyzed all measured transcripts, regardless of psi presence, to visualize the full proteome distribution for each of the 6 cell lines (Fig. 5a and Supplementary Table S3). Overall, the mRNA levels are weakly correlated with corresponding protein abundance, as expected. Additionally, the genes represented in NTERA cells showed overall lower protein expression than in the other 5 cell lines.
Figure 5.
Psi site number and sequence context differentially regulate protein output and translation. (a) Relationship between transcript abundance (TPM) and median protein abundance across six cell lines. Ribosomal protein transcripts and pseudouridylated transcripts are shown relative to unmodified transcripts. (b) Protein abundance changes for conserved pseudouridylated transcripts measured by mass spectrometry in wild-type (WT) and TRUB1 knockout (KO) HeLa cells, plotted as Δlog2(WT − KO). Transcripts with pseudouridine in the TRUB1 motif (GTTCN) show increased protein abundance in WT cells (median = 0.38, 95% CI [−0.10, 0.78]), whereas non-motif targets do not (median = −0.29, 95% CI [−0.67, 0.14]). The median and 95% CI were calculated using bootstrapping. (c) Effect size of protein abundance and mRNA expression comparing transcripts with 0, 1, or > 2 pseudouridine sites. Statistical significance was assessed by the Mann–Whitney U test. (d) TE, calculated from ribosome profiling and RNA-seq datasets, remains elevated for transcripts with multiple pseudouridine sites. Statistical significance was assessed by the Kruskal–Wallis test. (e) Schematic of firefly luciferase (FLuc) mRNA synthesized with defined pseudouridine incorporation and translated in rabbit reticulocyte lysate. (f) Luciferase activity shows reduced protein output at physiologically relevant pseudouridine densities (∼1–4 psi per 1000 nt), consistent with hypermodified Type II transcripts. (g) Model illustrating how increasing pseudouridine density promotes ribosome engagement but reduces protein output, consistent with hypermodification type II behavior. Statistical significance indicated as *P < .05.
Next, to assess post-transcriptional regulation by psi, we applied several stringent criteria for inclusion in the analysis: at least one genomic uridine position on a given transcript must have at least 30 reads and contain a conserved psi site with a minimum of 10% direct U-to-C mismatch across all six cell lines. Filtering with these criteria resulted in 68 unique transcripts. We further filtered these targets to include only those with a protein reading in each of three replicates, resulting in 22 targets, with psi found primarily in the CDS (Fig. 5a and Supplementary Fig. S3). Interestingly, ∼68% of the 68 psi-bearing mRNA targets did not have protein detected despite high levels of mRNA expression (>30 TPMs) compared to 87% of non-psi-bearing transcripts that did not have protein detected despite high levels of mRNA expression (>30 TPMs). This suggests that psi-bearing mRNA targets may have higher protein expression levels because they more frequently exceed the minimum detection threshold.
To confirm this observation, we compared the relative lengths of the detected proteins and found that the median protein lengths in the modified (364 AAs) and unmodified (358 AAs) groups are similar. To understand if the percentages of non-detected proteins in the psi-bearing transcripts (∼68%) are different from the non-psi-bearing mRNAs (87%), we focused on the detected proteins and ascertained whether 32% (proteins found in the psi-bearing group) is statistically different from 13% (proteins found in the non-psi-bearing group) by performing a simulation in which 68 genes were randomly sampled from a dataset containing gene names of mRNA transcripts with >30 reads. A total of 10 million randomly selected sets of 68 were generated and cross-referenced against a dataset of genes for which proteins were identified by mass spectrometry in all 3 replicates of the 6 cell lines. The 95% confidence interval for the mean and standard deviation of this distribution is [13.869; 13.874] and [4.148; 4.153], which suggests that the empirically observed difference of 32% and 13% between the modified and unmodified populations was not due to sample size imbalance and supports the observation that pseudouridylation impacts protein expression.
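The resampling simulation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the gene lists, the default number of draws, and the fixed seed are hypothetical, and sampling is done without replacement within each set of 68.

```python
import random
import statistics


def simulate_detection(expressed_genes, detected_genes, sample_size=68,
                       n_draws=10_000, seed=0):
    """Estimate the null distribution of the percentage of detected proteins.

    expressed_genes: list of gene names with mRNA above the read threshold.
    detected_genes: gene names with protein detected by mass spectrometry.
    Returns (mean, standard deviation) of the percentage across random draws.
    """
    rng = random.Random(seed)
    detected = set(detected_genes)
    percentages = []
    for _ in range(n_draws):
        sample = rng.sample(expressed_genes, sample_size)
        hits = sum(gene in detected for gene in sample)
        percentages.append(100 * hits / sample_size)
    return statistics.mean(percentages), statistics.stdev(percentages)
```

An observed percentage for the psi-bearing set can then be compared against this distribution; the text reports 10 million draws, which this sketch reduces by default for speed.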
Consistent with this observation, we noted an increase in protein expression for psi-bearing transcripts compared to non-psi-bearing transcripts of similar mRNA expression levels. In the scatter plot in Fig. 5a (right), psi-bearing transcripts sit at average positions along the y-axis (TPMs) yet show higher-than-median protein count values along the x-axis. One exception to this pattern is NTERA cells, which are left-shifted relative to the rest of the population, indicating lower-than-median protein abundance for these targets.
We also observed two separate populations for transcripts with a conserved psi, specifically a population of targets with high mRNA levels and relatively lower protein expression than expected based on the mRNA levels (Fig. 5a, right). We observed that the targets in the top population were mainly encoding ribosomal proteins. To test the hypothesis that mRNAs encoding ribosomal proteins typically have high mRNA expression levels yet lower protein output, we plotted all mRNAs encoding ribosomal proteins, including those without psi on them (Fig. 5a), and observed a tight population that has a significantly higher trendline than the global correlation trendline for all mRNAs. This suggests that this population was not different due to psi presence, but rather due to other factors.
TRUB1-dependent pseudouridylation selectively enhances protein expression of motif-containing targets
To directly assess the impact of pseudouridylation on protein expression, we performed quantitative mass spectrometry in wild-type (WT) and TRUB1 knockout (KO) HeLa cells and measured protein abundance changes for conserved psi-containing transcripts (Fig. 5b and Supplementary Table S4). Protein abundance was quantified as the difference in log2-transformed protein levels between WT and KO cells [Δlog2 (WT − KO)].
We stratified transcripts based on whether the conserved pseudouridine occurred within the canonical TRUB1 recognition motif (GTTCN). Transcripts containing motif-associated pseudouridines exhibited a positive median Δlog2 (WT − KO) of 0.38 (95% CI [−0.10, 0.78]), indicating increased protein abundance in WT cells. In contrast, transcripts with pseudouridines outside the TRUB1 motif showed a negative median Δlog2 (WT − KO) of −0.29 (95% CI [−0.67, 0.14]), reflecting reduced or unchanged protein abundance in WT relative to KO cells (Fig. 5b).
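A bootstrap confidence interval for the median Δlog2 (WT − KO) can be sketched as below. The text states only that bootstrapping was used; the percentile method, the number of resamples, and the function name here are assumptions for illustration.

```python
import random
import statistics


def bootstrap_median_ci(values, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the median.

    values: list of per-transcript delta-log2 (WT minus KO) protein abundances.
    Returns (observed median, lower bound, upper bound).
    """
    rng = random.Random(seed)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    )
    lower = medians[int((alpha / 2) * n_boot)]
    upper = medians[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.median(values), lower, upper
```

An interval that spans zero, as for both groups reported here, indicates that the direction of the median shift is not resolved at the 95% level by this measure alone.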
These results demonstrate that TRUB1-dependent pseudouridylation selectively enhances protein expression of its motif-containing targets, whereas pseudouridylation outside the TRUB1 motif does not confer the same effect. Together, these findings provide direct evidence that TRUB1-mediated pseudouridylation contributes to the regulation of protein output from specific mRNAs.
Type II hypermodifications correlate with elevated TE but reduce protein abundance
We further assessed whether the presence of psi modifications and the number of sites on a transcript affect expression and TE. We divided transcripts into three groups based on the number of conserved psi sites: (i) transcripts with no psi sites, (ii) transcripts with one site, and (iii) transcripts with more than two psi sites.
First, we assessed the weighted regression coefficient (effect size) of protein abundance and mRNA expression in three comparison groups: mRNAs with 0 versus 1 psi sites, 0 versus >2 psi sites, and 0 and 1 versus >2 psi sites. Protein abundance was determined by mass spectrometry. mRNA expression was measured in TPMs from nanopore DRS results. Our analysis revealed that while psi presence consistently correlates with higher transcript abundance, protein output follows a biphasic pattern: transcripts with a single psi site exhibit elevated protein levels, whereas those with two or more psi sites show reduced protein abundance (Fig. 5c).
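One way to read a weighted regression coefficient for a two-group comparison is as the slope on a binary group indicator in a weighted least-squares fit, which equals the difference between the weighted group means. The sketch below shows that calculation under stated assumptions: the weighting scheme is not described in the text, so the per-transcript weights `w`, the function name `weighted_group_effect`, and the toy values are all hypothetical.

```python
import numpy as np

def weighted_group_effect(y, group, w):
    """Slope on a 0/1 group indicator from weighted least squares.

    y     : outcome per transcript (e.g. log protein abundance)
    group : 0 for the reference group, 1 for the comparison group
    w     : non-negative weights (assumed; not specified in the source)
    """
    y = np.asarray(y, dtype=float)
    group = np.asarray(group, dtype=float)
    w = np.asarray(w, dtype=float)
    X = np.column_stack([np.ones_like(y), group])  # intercept + indicator
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns ordinary least squares into WLS.
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return float(beta[1])

# Toy example: two transcripts without psi, two with one psi site.
effect = weighted_group_effect(
    y=[1.0, 3.0, 5.0, 7.0], group=[0, 0, 1, 1], w=[1.0, 1.0, 1.0, 3.0]
)
print(f"effect size = {effect:.2f}")
```

In the toy example the weighted mean of the reference group is 2.0 and that of the comparison group is 6.5, so the fitted coefficient is 4.5. A positive coefficient corresponds to a higher outcome in the psi-containing group.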
To investigate the relationship between psi site number and TE, we performed a meta-analysis of ribosome profiling (Ribo-seq) and RNA-seq datasets from the same cell lines, calculating TE as the ratio of ribosome-protected fragments to transcript abundance for the same mRNA targets. Strikingly, TE remains high for transcripts containing multiple psi sites (Fig. 5d and Supplementary Table S3).
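The TE definition above can be sketched as a per-transcript ratio. This is an illustrative example only: the normalization of both inputs to comparable units (for example TPM) and the minimum-expression filter `min_rna` are assumptions not stated in the text, and the function name is hypothetical.

```python
import numpy as np

def translation_efficiency(rpf_abundance, rna_abundance, min_rna=1.0):
    """TE = ribosome-protected fragment abundance / transcript abundance.

    Transcripts below `min_rna` (an assumed filter) return NaN, since the
    ratio is unstable when the denominator is close to zero.
    """
    rpf = np.asarray(rpf_abundance, dtype=float)
    rna = np.asarray(rna_abundance, dtype=float)
    te = np.full(rna.shape, np.nan)
    ok = rna >= min_rna
    te[ok] = rpf[ok] / rna[ok]
    return te

# Toy values for three transcripts; the third fails the expression filter.
te = translation_efficiency([20.0, 5.0, 3.0], [10.0, 10.0, 0.5])
print(te)
```

Because TE reflects ribosome occupancy per transcript rather than completed protein, a high TE value can coexist with low protein abundance if elongation is slowed.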
To directly test whether increasing pseudouridine density affects protein production independent of cellular regulatory processes, we generated firefly luciferase (FLuc) mRNA by in vitro transcription with defined pseudouridine incorporation levels ranging from 0% to 2% of uridine residues and measured protein output using a rabbit reticulocyte lysate translation system (Fig. 5e and f). This reductionist system isolates translational effects and enables precise control over pseudouridine density. We observed significantly reduced protein output at 0.5%, 1.0%, and 1.5% pseudouridine incorporation, corresponding to ~1.25 to 3.75 pseudouridine residues per 1000 nt mRNA—densities comparable to the majority of hypermodified type II transcripts identified in our transcriptome-wide analysis. Notably, protein output at 2% pseudouridine began to approach levels observed for unmodified transcripts, suggesting a non-linear relationship between pseudouridine density and protein production. Importantly, most endogenous transcripts contain fewer than four pseudouridine sites, placing the experimentally observed reduction in protein output squarely within the physiologically relevant range of hypermodification.
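The conversion from percent incorporation to pseudouridine residues per 1000 nt can be checked with a short calculation. The uridine content of the transcript is not stated in the text; the value of 25% used below is an assumption inferred from the stated correspondence (0.5% incorporation to ~1.25 residues per 1000 nt), and the function name `psi_per_kb` is hypothetical.

```python
def psi_per_kb(psi_fraction_of_u, uridine_fraction=0.25):
    """Expected pseudouridine residues per 1000 nt of mRNA.

    psi_fraction_of_u : fraction of uridines replaced by pseudouridine
    uridine_fraction  : fraction of nucleotides that are uridine
                        (0.25 is an assumed value, see lead-in)
    """
    uridines_per_kb = 1000.0 * uridine_fraction
    return uridines_per_kb * psi_fraction_of_u

for pct in (0.5, 1.0, 1.5, 2.0):
    print(f"{pct:.1f}% psi -> {psi_per_kb(pct / 100):.2f} psi per 1000 nt")
```

Under this assumption, 0.5%, 1.0%, and 1.5% incorporation give 1.25, 2.50, and 3.75 residues per 1000 nt, matching the range quoted above, and 2% gives 5.00. Incorporation during in vitro transcription is stochastic, so these are expected averages rather than exact counts per molecule.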
Together with our transcriptome-wide analyses showing elevated TE but reduced protein abundance for hypermodified transcripts, these findings support a mechanistic model in which increasing pseudouridine density enhances ribosome recruitment but promotes elongation pausing or reduced translational throughput, ultimately limiting total protein production (Fig. 5g).
Discussion
Our recently developed method, Mod-p ID [18], enables transcriptome-wide mapping of psi sites and quantitative comparison of psi occupancy levels across conditions. Multiple orthogonal techniques have been developed to detect psi at single-nucleotide resolution, each with distinct strengths and limitations. Mod-p ID achieves levels of de novo site discovery comparable to these established methods: roughly 44% of its sites are not validated by other approaches, compared with the 55%–60% unvalidated range reported for alternative methods (Supplementary Fig. S7). In previous work, psi sites uniquely detected by Mod-p ID were orthogonally validated by CLAP (n = 9) [7, 10], demonstrating that nanopore-based detection can reveal bona fide sites missed by short-read sequencing. In addition to de novo discovery, cross-platform and cross-cell-type analyses identified a final set of 17 conserved “housekeeping” psi sites detected across all six cell types and independently validated by multiple orthogonal methods (Fig. 2a and Supplementary Table S8). Given their reproducibility across sequencing platforms and biological contexts, these sites represent high-confidence pseudouridine modifications and provide a valuable reference set for benchmarking and validating future psi detection approaches.
To confirm the robustness of psi detection, we re-evaluated Mod-p ID performance using Oxford Nanopore’s updated direct RNA chemistry (RNA004) and basecaller (Fig. 1f). Using A549 cells as a benchmark, we compared previously validated psi sites—confirmed by CeU-Seq [12], BID-seq [11], PRAISE [15], or RBS-Seq [13]—and found that RNA004 preserved nearly all modification calls, reducing U→C error rates below 10% at only five positions. These results confirm that psi detection by Mod-p ID is highly reproducible and largely unaffected by changes in nanopore chemistry, underscoring the stability of the psi-specific current signature.
The primary advantage of Mod-p ID is its ability to detect multiple modifications on individual RNA molecules. By leveraging long-read DRS, Mod-p ID enables identification of hypermodified transcripts (type II [18]) that cannot be resolved by short-read or chemical-based approaches. This allows direct measurement of psi stoichiometry, positional context, and co-occurrence along single RNA strands, providing new insight into modification density and spatial organization across transcripts.
Using this approach across six human cell lines, we identified conserved psi sites in highly expressed housekeeping mRNAs, the majority of which (>80%) were substrates for TRUB1 (Fig. 2). Some sites exhibited near-complete occupancy across cell types, while others showed substantial variability, indicating site-specific regulation of modification stoichiometry. Many of these transcripts encode components of the ribosome biogenesis and translational machinery, consistent with a role for pseudouridylation in maintaining translational homeostasis. Structural modeling confirmed enrichment of conserved TRUB1 sites within hairpin loops [34], consistent with established requirements for enzyme recognition.
We also identified cell-type-specific psi sites with lower occupancy, suggesting dynamic or condition-dependent modification. Raw ionic current analysis confirmed distinct signal shifts even at low occupancy levels, supporting the presence of bona fide modifications. In addition, cell-type-resolved mapping revealed conserved and variable pseudouridylation patterns in lncRNAs, suggesting that psi may contribute to the regulation of noncoding RNA function and gene expression programs.
Beyond single modifications, we identified transcripts bearing multiple closely spaced psi sites, including rare double modifications within the same motif. Such events represent a unique category of type II hypermodification, in which two psi residues simultaneously occupy positions within the ∼11-nt region interacting with the nanopore. These configurations are invisible to short-read methods but detectable in long-read ionic current traces [20]. For example, LAPTM4B harbors double psi modifications in A549, HeLa, and HepG2 cells at chr8:97776038 and chr8:97776040 (Fig. 4). The visible differences in the KDE of merged ionic traces may indicate that the detection of double modification is not a false positive. chr8:97776038 has been orthogonally confirmed [12, 15]. TRUB1 siRNA knockdown in HeLa cells showed the U-to-C error disappeared in both chr8:97776038 and chr8:97776040 compared to both the native HeLa mRNA and the scrambled control. We ruled out the presence of an SNV in all cell lines for these sites by Sanger sequencing on the corresponding gDNA (Supplementary Fig. S1). These findings support a model in which localized modification “hotspots” alter ribosome behavior through additive or cooperative effects.
Functional studies of psi have shown that the modification can influence TE and induce changes in mRNA translation [5, 37, 38]. Our data revealed that pseudouridylation regulates protein production in a context-dependent manner (Fig. 5). Across cell types, mRNAs containing conserved psi sites produced higher protein levels than unmodified transcripts. Importantly, analysis of TRUB1 knockout cells demonstrated that increased protein output was specific to transcripts containing pseudouridine within the TRUB1 recognition motif, providing direct causal evidence that TRUB1-mediated pseudouridylation enhances protein production of its targets (Fig. 5). We note that this trend was not universal: NTERA cells, for instance, displayed lower-than-median protein abundance for psi-modified transcripts, suggesting that psi’s translational effects depend on cell-specific regulatory contexts or isoform usage.
In contrast, transcripts containing multiple clustered psi sites exhibited reduced protein abundance despite elevated TE. Together with prior polysome-based analyses [28], our results suggest that psi can influence ribosome loading and translational throughput in distinct ways, highlighting that ribosome occupancy and protein output are not always tightly coupled. Consistent with this interpretation, controlled in vitro translation experiments demonstrated that increasing pseudouridine density within a physiologically relevant range (corresponding to ~1–4 psi sites per kilobase) directly reduced protein production. This density range closely matches that of endogenous hypermodified transcripts identified in our transcriptome-wide analysis, demonstrating that clustered pseudouridylation can directly impair translational throughput, independent of transcriptional or degradation effects.
Our findings extend the model proposed by Mauger et al. [39], who showed that psi can modulate translation through structural and kinetic effects. While uniform substitution of uridine with pseudouridine in synthetic transcripts enhances stability and TE, endogenous pseudouridylation occurs in discrete clusters. These localized modification domains likely alter ribosome kinetics, promoting pausing or elongation irregularities that reduce overall protein yield despite increased ribosome occupancy.
Taken together, our results establish a quantitative and mechanistic framework for understanding how pseudouridylation regulates translation. Single-site pseudouridylation, particularly at TRUB1 recognition motifs, enhances protein production, whereas clustered hypermodification reduces translational output. These findings demonstrate that pseudouridylation functions as a density-dependent regulator of protein expression. By enabling detection of co-occurring modifications on individual RNA molecules, Mod-p ID provides a foundation for modeling RNA modification networks and understanding how modification stoichiometry and spatial organization control gene expression. This framework also suggests new opportunities for engineering pseudouridylation patterns to modulate translation in therapeutic contexts.
Acknowledgements
We thank the Mass Spectrometry facility associated with the Chemistry and Chemical Biology Department at Northeastern University, which provided TMT-labeling and LC-MS/MS services.
Author contributions: Caroline A. McCormick (Conceptualization [lead], Data curation [lead], Formal analysis [lead], Writing—original draft [equal]), Michele Meseonznik (Data curation [supporting], Formal analysis [supporting], Writing—review & editing [supporting]), Yuchen Qiu (Data curation [supporting], Formal analysis [supporting], Writing—review & editing [supporting]), Oleksandra Fanari (Formal analysis [supporting], Writing—review & editing [supporting]), Priyanka Goyal (Formal analysis [supporting], Writing—review & editing [supporting]), Mitchell Thomas (Data curation [supporting], Formal analysis [supporting]), Mina Shokoufandeh (Data curation [supporting]), Yifang Liu (Data curation [supporting], Formal analysis [supporting]), Dylan Bloch (Data curation [supporting], Formal analysis [supporting], Writing—review & editing [supporting]), Cole Greenfield (Data curation [supporting]), Isabel N. Klink (Formal analysis [supporting]), Miten Jain (Formal analysis [supporting], Resources [supporting], Supervision [supporting]), Meni Wanunu (Formal analysis [supporting], Funding acquisition [supporting], Supervision [supporting], Writing—original draft [supporting], Writing—review & editing [supporting]), and Sara H. Rouhanifard (Conceptualization [equal], Funding acquisition [lead], Project administration [lead], Resources [lead], Supervision [lead], Writing—original draft [equal], Writing—review & editing [equal]).
Contributor Information
Caroline A McCormick, Dept. of Bioengineering, Northeastern University, Boston, MA, 02115, United States.
Michele Meseonznik, Dept. of Bioengineering, Northeastern University, Boston, MA, 02115, United States.
Yuchen Qiu, Dept. of Bioengineering, Northeastern University, Boston, MA, 02115, United States.
Oleksandra Fanari, Dept. of Bioengineering, Northeastern University, Boston, MA, 02115, United States.
Priyanka Goyal, Dept. of Bioengineering, Northeastern University, Boston, MA, 02115, United States.
Mitchell Thomas, Dept. of Bioengineering, Northeastern University, Boston, MA, 02115, United States.
Mina Shokoufandeh, Dept. of Bioengineering, Northeastern University, Boston, MA, 02115, United States.
Yifang Liu, Dept. of Bioengineering, Northeastern University, Boston, MA, 02115, United States.
Dylan Bloch, Dept. of Bioengineering, Northeastern University, Boston, MA, 02115, United States.
Cole Greenfield, Dept. of Bioengineering, Northeastern University, Boston, MA, 02115, United States.
Isabel N Klink, Dept. of Bioengineering, Northeastern University, Boston, MA, 02115, United States.
Miten Jain, Dept. of Bioengineering, Northeastern University, Boston, MA, 02115, United States; Dept. of Physics, Northeastern University, Boston, MA, 02115, United States.
Meni Wanunu, Dept. of Bioengineering, Northeastern University, Boston, MA, 02115, United States; Dept. of Physics, Northeastern University, Boston, MA, 02115, United States.
Sara H Rouhanifard, Dept. of Bioengineering, Northeastern University, Boston, MA, 02115, United States.
Supplementary data
Supplementary data is available at NAR online.
Conflict of interest
None declared.
Funding
S.H.R. and M.W. acknowledge support from NIH (R01HG011087) and NIH (R01HG012856) as well as support through an Opportunity Fund by the Technology Development Coordinating Center at Jackson Laboratories (NHGRI federal award no. U24HG011735). Funding to pay the Open Access publication charges for this article was provided by NIH (R01HG012856).
Data availability
The data that support this study are available from the corresponding author upon reasonable request. Sequences were aligned to the hg38.p10 genome assembly. Unless otherwise stated (see Accessing Publicly Available Data Sets), all FASTQ files and Fast5 raw data generated in this work are publicly available in NIH NCBI SRA under the BioProject accession PRJNA1108269.
Proteomics data is available on MassIVE (MSV000095673) and the ProteomeXchange (PXD055082). All code used in this work is publicly available at https://github.com/RouhanifardLab/PanHumanPsiProfiling and https://doi.org/10.5281/zenodo.7383335.
References
- 1. Guzzi N, Muthukumar S, Cieśla M et al. Pseudouridine-modified tRNA fragments repress aberrant protein synthesis and predict leukaemic progression in myelodysplastic syndrome. Nat Cell Biol. 2022;24:299–306. 10.1038/s41556-022-00852-9.
- 2. Jack K, Bellodi C, Landry DM et al. rRNA pseudouridylation defects affect ribosomal ligand binding and translational fidelity from yeast to human cells. Mol Cell. 2011;44:660–6. 10.1016/j.molcel.2011.09.017.
- 3. Kierzek E, Malgowska M, Lisowiec J et al. The contribution of pseudouridine to stabilities and structure of RNAs. Nucleic Acids Res. 2014;42:3492–501. 10.1093/nar/gkt1330.
- 4. Martinez NM, Su A, Burns MC et al. Pseudouridine synthases modify human pre-mRNA co-transcriptionally and affect pre-mRNA processing. Mol Cell. 2022;82:645–59. 10.1016/j.molcel.2021.12.023.
- 5. Karijolich J, Yu Y-T. Converting nonsense codons into sense codons by targeted pseudouridylation. Nature. 2011;474:395–8. 10.1038/nature10165.
- 6. Song J, Dong L, Sun H et al. CRISPR-free, programmable RNA pseudouridylation to suppress premature termination codons. Mol Cell. 2023;83:139–55. 10.1016/j.molcel.2022.11.011.
- 7. Fanari O, Tavakoli S, Qiu Y et al. Probing enzyme-dependent pseudouridylation using direct RNA sequencing to assess epitranscriptome plasticity in a neuronal cell line. Cell Syst. 2025;16:101238. 10.1016/j.cels.2025.101238.
- 8. Schwartz S, Bernstein DA, Mumbach MR et al. Transcriptome-wide mapping reveals widespread dynamic-regulated pseudouridylation of ncRNA and mRNA. Cell. 2014;159:148–62. 10.1016/j.cell.2014.08.028.
- 9. Carlile TM, Rojas-Duran MF, Zinshteyn B et al. Pseudouridine profiling reveals regulated mRNA pseudouridylation in yeast and human cells. Nature. 2014;515:143–6. 10.1038/nature13802.
- 10. Fanari O, Bloch D, Qiu Y et al. Pseudouridine reprogramming in the human T cell epitranscriptome: from primary to immortalized states. RNA. 2025;31:1320–34. 10.1261/rna.080633.125.
- 11. Dai Q, Zhang L-S, Sun H-L et al. Quantitative sequencing using BID-seq uncovers abundant pseudouridines in mammalian mRNA at base resolution. Nat Biotechnol. 2023;41:344–54. 10.1038/s41587-022-01505-w.
- 12. Li X, Zhu P, Ma S et al. Chemical pulldown reveals dynamic pseudouridylation of the mammalian transcriptome. Nat Chem Biol. 2015;11:592–7. 10.1038/nchembio.1836.
- 13. Khoddami V, Yerra A, Mosbruger TL et al. Transcriptome-wide profiling of multiple RNA modifications simultaneously at single-base resolution. Proc Natl Acad Sci USA. 2019;116:6784–9. 10.1073/pnas.1817334116.
- 14. Lovejoy AF, Riordan DP, Brown PO. Transcriptome-wide mapping of pseudouridines: pseudouridine synthases modify specific mRNAs in S. cerevisiae. PLoS One. 2014;9:e110799. 10.1371/journal.pone.0110799.
- 15. Zhang M, Jiang Z, Ma Y et al. Quantitative profiling of pseudouridylation landscape in the human transcriptome. Nat Chem Biol. 2023;19:1185–95. 10.1038/s41589-023-01304-7.
- 16. Aird D, Ross MG, Chen W-S et al. Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biol. 2011;12:R18. 10.1186/gb-2011-12-2-r18.
- 17. Workman RE, Tang AD, Tang PS et al. Nanopore native RNA sequencing of a human poly(A) transcriptome. Nat Methods. 2019;16:1297–305. 10.1038/s41592-019-0617-2.
- 18. Tavakoli S, Nabizadeh M, Makhamreh A et al. Semi-quantitative detection of pseudouridine modifications and type I/II hypermodifications in human mRNAs using direct long-read sequencing. Nat Commun. 2023;14:334. 10.1038/s41467-023-35858-w.
- 19. Huang S, Zhang W, Katanski CD et al. Interferon inducible pseudouridine modification in human mRNA by quantitative nanopore profiling. Genome Biol. 2021;22:330. 10.1186/s13059-021-02557-y.
- 20. Begik O, Lucas MC, Pryszcz LP et al. Quantitative profiling of pseudouridylation dynamics in native RNAs with nanopore sequencing. Nat Biotechnol. 2021;39:1278–91. 10.1038/s41587-021-00915-6.
- 21. Liu H, Begik O, Lucas MC et al. Accurate detection of m6A RNA modifications in native RNA sequences. Nat Commun. 2019;10:4079.
- 22. Leger A, Amaral PP, Pandolfini L et al. RNA modifications detection by comparative Nanopore direct RNA sequencing. Nat Commun. 2021;12:7198. 10.1038/s41467-021-27393-3.
- 23. Hendra C, Pratanwanich PN, Wan YK et al. Detection of m6A from direct RNA sequencing using a multiple instance learning framework. Nat Methods. 2022;19:1590–8. 10.1038/s41592-022-01666-1.
- 24. Pratanwanich PN, Yao F, Chen Y et al. Identification of differential RNA modifications from nanopore direct RNA sequencing with xPore. Nat Biotechnol. 2021;39:1394–402. 10.1038/s41587-021-00949-w.
- 25. Jenjaroenpun P, Wongsurawat T, Wadley TD et al. Decoding the epitranscriptional landscape from native RNA sequences. Nucleic Acids Res. 2021;49:e7. 10.1093/nar/gkaa620.
- 26. Lorenz DA, Sathe S, Einstein JM et al. Direct RNA sequencing enables m6A detection in endogenous transcript isoforms at base-specific resolution. RNA. 2020;26:19–28. 10.1261/rna.072785.119.
- 27. Nguyen TA, Heng JWJ, Kaewsapsak P et al. Direct identification of A-to-I editing sites with nanopore native RNA sequencing. Nat Methods. 2022;19:833–44. 10.1038/s41592-022-01513-3.
- 28. Huang S, Wylder AC, Pan T. Simultaneous nanopore profiling of mRNA m6A and pseudouridine reveals translation coordination. Nat Biotechnol. 2024;42:1831–5. 10.1038/s41587-024-02135-0.
- 29. Makhamreh A, Tavakoli S, Fallahi A et al. Nanopore signal deviations from pseudouridine modifications in RNA are sequence-specific: quantification requires dedicated synthetic controls. Sci Rep. 2024;14:22457. 10.1038/s41598-024-72994-9.
- 30. Fanari O, Meseonznik M, Bloch D et al. Protocol for differential analysis of pseudouridine modifications using nanopore DRS and unmodified transcriptome control. STAR Protocols. 2025;6:103948. 10.1016/j.xpro.2025.103948.
- 31. McCormick CA, Akeson S, Tavakoli S et al. Multicellular, IVT-derived, unmodified human transcriptome for nanopore-direct RNA analysis. Gigabyte. 2024;2024:1. 10.46471/gigabyte.129.
- 32. Stirling DR, Swain-Bowden MJ, Lucas AM et al. CellProfiler 4: improvements in speed, utility and usability. BMC Bioinformatics. 2021;22:433. 10.1186/s12859-021-04344-9.
- 33. Cox J, Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol. 2008;26:1367–72. 10.1038/nbt.1511.
- 34. Safra M, Nir R, Farouq D et al. TRUB1 is the predominant pseudouridine synthase acting on mammalian mRNA via a predictable and conserved code. Genome Res. 2017;27:393–406. 10.1101/gr.207613.116.
- 35. Giambruno R, Zacco E, Ugolini C et al. Unveiling the role of PUS7-mediated pseudouridylation in host protein interactions specific for the SARS-CoV-2 RNA genome. Mol Ther Nucleic Acids. 2023;34:102052. 10.1016/j.omtn.2023.102052.
- 36. Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003;31:3406–15. 10.1093/nar/gkg595.
- 37. Eyler DE, Franco MK, Batool Z et al. Pseudouridinylation of mRNA coding sequences alters translation. Proc Natl Acad Sci USA. 2019;116:23068–74. 10.1073/pnas.1821754116.
- 38. Karikó K, Muramatsu H, Welsh FA et al. Incorporation of pseudouridine into mRNA yields superior nonimmunogenic vector with increased translational capacity and biological stability. Mol Ther. 2008;16:1833–40. 10.1038/mt.2008.200.
- 39. Mauger DM, Cabral BJ, Presnyak V et al. mRNA structure regulates protein expression through changes in functional half-life. Proc Natl Acad Sci USA. 2019;116:24075–83. 10.1073/pnas.1908052116.