Abstract
Background & Aims:
The human liver transcriptome is complex and highly dynamic: for example, one gene may produce multiple distinct transcripts, each with distinct posttranscriptional modifications. Direct knowledge of the transcriptome dynamics, however, is largely obscured by the inaccessibility of human liver to treatments and the insufficient annotation of the human liver transcriptome at the transcript and RNA modification levels.
Methods:
We generated mice that carry humanized livers of identical genetic background and subjected them to representative metabolic treatments. We then analyzed the humanized livers with Nanopore single-molecule direct RNA sequencing (DRS) to determine the expression level, m6A modification and poly(A) tail length of all RNA transcript isoforms. Our system allows for constructing a de novo annotation of human liver transcriptomes reflecting metabolic responses and studying transcriptome dynamics in parallel.
Results:
Our analysis uncovered a vast number of novel genes and transcripts. Our transcript-level analysis of human liver transcriptomes also identified a multitude of regulated metabolic pathways that were otherwise invisible using conventional short-read RNA-seq. We revealed for the first time the dynamic changes in m6A and poly(A) tail length of human liver transcripts, many of which are transcribed from key metabolic genes. Furthermore, we performed comparative analyses of gene regulation between human and mouse and between two individuals using the liver-specific humanized mice, revealing that transcriptome dynamics are highly species- and genetic background-dependent.
Conclusion:
Our work revealed a complex metabolic response landscape of human liver transcriptome and provided a novel resource to understand transcriptome dynamics of human liver in response to physiologically relevant metabolic stimuli (https://caolab.shinyapps.io/human_hepatocyte_landscape/).
Keywords: human liver transcriptome; humanized liver; nanopore single-molecule direct RNA sequencing (DRS); m6A modification; poly(A) tail length; metabolism
Graphical Abstract

Introduction
Direct information on gene regulation in human liver by metabolic stimuli is largely lacking. As normal human livers are generally inaccessible to treatments, knowledge of stimulus-provoked changes in their transcriptomes depends heavily on extrapolation from animal studies, particularly those done in mice. Growing evidence, however, has revealed substantial differences in gene regulation between human and mouse livers[1, 2]. To directly study gene regulation in human liver, the only option is to analyze liver biopsy samples of patients and, very rarely, healthy controls. Patient-based studies, however, are often complicated by the diverse genetic backgrounds and environmental exposures of the patient population. It has been shown that in a general population, the majority of human genes (86.1%) are subject to the regulation of local genetic variants and exhibit differential expression among individuals[3]. While cohort statistics can be employed to identify the fraction of commonly regulated genes, the holistic picture of the metabolic responsive landscape of the transcriptome is lost. Thus, direct knowledge of the liver transcriptome dynamics in humans is currently lacking, hindering the overall understanding of human liver pathophysiology.
Aside from the limited accessibility to treatments, another critical obstacle to understanding gene regulation in human liver is the incomplete annotation of its transcriptome. Recent advances have unraveled the immense complexity of the human transcriptome. A single gene can produce 10 or more transcript isoforms on average[4, 5] and RNA transcripts often go through extensive modifications, all of which impact RNA metabolism and have been shown to play diverse roles in metabolic physiology[6, 7]. Intriguingly, RNA modifications and relative expression levels of transcript isoforms are often highly regulated and condition dependent. Although major annotations such as GENCODE[8] and RefSeq[9] provide useful references for gene loci, they do not have deep coverage of transcript isoforms and RNA modifications, especially those that specifically occur under defined metabolic conditions. This insufficient annotation is partly caused by the limited accessibility of human livers to treatments. This creates a conundrum: reliable assessment of transcriptome dynamics in the mostly inaccessible human liver requires a comprehensive transcriptome annotation, which in turn requires an accessible liver to obtain.
In this work, we produced mice that carried humanized livers of identical genetic background and subjected them to diverse metabolic stimuli. Nanopore DRS (Direct RNA Sequencing) of the humanized mice revealed a vast number of novel genes and transcript isoforms. We then defined their regulation at transcript and RNA modification levels, revealing substantial human liver transcriptome dynamics that were not previously recognized. This information could serve as a valuable resource to advance the understanding of human liver pathophysiology.
Results
A de novo annotation of the human liver transcriptome reveals pathophysiologically relevant metabolic responses
To enable a thorough examination of gene regulation in human liver by representative metabolic conditions, we first established an inclusive annotation of human liver transcriptomes reflecting these conditions. We produced liver-specific humanized mice[10–12] that were engrafted with hepatocytes from the same donor to reduce the impact of diverse genetic backgrounds. To capture transcriptomes reflecting a wide range of human metabolic responses, we subjected the humanized mice to several dietary interventions and transcription factor activation treatments. Dietary interventions were fasting (Fast) and ad libitum feeding (AL), the two ends of the caloric cycle, which are known to regulate the expression of most metabolic genes[13]. The key transcription factors of metabolic pathways that we activated were: PPARα, a Fast-induced master regulator of fatty acid oxidation[14]; PPARγ, an activator of lipogenic genes[15]; and FXR, a bile acid receptor that regulates broad metabolic pathways[16]. In light of the complexity of RNA metabolic regulation, we analyzed the transcriptome using a combination of Nanopore direct RNA sequencing (DRS), ATAC-seq and short-read RNA-seq[17, 18] (Table S1).
DRS was capable of detecting RNA transcripts of up to 15,000 bp, with most falling between 1,000 and 5,000 bp (Fig. S1A). We compared differentially expressed genes (DEGs) between Fast and AL detected by short-read RNA-seq and DRS, which showed similar expression patterns during fasting for both human and mouse genes (Fig. S1B). Furthermore, DEGs identified by both DRS and short-read RNA-seq were significantly enriched in fatty acid metabolism related pathways (Fig. S1C). Moreover, GSVA results, which indicate the biological function for each sample, also identified similar enrichment patterns between DRS and short-read sequencing in Fast and AL groups (Fig. S1D). These data indicate that Nanopore DRS and short-read RNA-seq are comparably capable of capturing liver transcriptome dynamics at the gene level.
Based on the Nanopore DRS analysis of humanized livers, we constructed de novo annotations of human and mouse liver transcriptomes. Our annotation resulted in a substantial expansion of RNA transcripts in both species: 54.6% of human and 70.8% of mouse transcript isoforms are novel compared to those in reference annotations (GENCODE v33 and GENCODE vM24) (Fig. 1A and Table S2). Furthermore, over 30% of all transcripts in the novel annotations were only detectable under specific treatment conditions (Fig. S1E), strongly suggesting that gene regulation in human liver by metabolic treatments can only be adequately studied when an inclusive annotation that reflects these treatments is established.
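To illustrate what an exact splice-site match against a reference annotation means, the following is a minimal sketch and not the pipeline used in this study. It assumes that each transcript is given as a chromosome, a strand and a list of exon coordinates, and it reduces the comparison to two labels ("matched" and "novel") rather than the Matched/Different/Others classes shown in Fig. 1A. All identifiers and coordinates are hypothetical toy values, and single-exon transcripts are skipped because they have no splice junctions to compare.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]                    # (start, end), 1-based inclusive
Transcript = Tuple[str, str, List[Exon]]  # (chromosome, strand, exons)


def intron_chain(transcript: Transcript) -> tuple:
    """Return a hashable key built from the ordered intron coordinates."""
    chrom, strand, exons = transcript
    ordered = sorted(exons)
    introns = tuple(
        (ordered[i][1] + 1, ordered[i + 1][0] - 1)
        for i in range(len(ordered) - 1)
    )
    return (chrom, strand, introns)


def classify(assembled: Dict[str, Transcript],
             reference: Dict[str, Transcript]) -> Dict[str, str]:
    """Label multi-exon assembled transcripts as 'matched' or 'novel'."""
    ref_chains = {intron_chain(t) for t in reference.values() if len(t[2]) > 1}
    labels = {}
    for tid, t in assembled.items():
        if len(t[2]) < 2:
            continue  # single-exon transcripts carry no splice junctions
        labels[tid] = "matched" if intron_chain(t) in ref_chains else "novel"
    return labels


# Hypothetical toy annotation (not real coordinates).
toy_reference = {
    "ref1": ("chr1", "+", [(100, 200), (300, 400), (500, 600)]),
}
toy_assembled = {
    # Same junctions as ref1; only the transcript ends differ.
    "t1": ("chr1", "+", [(120, 200), (300, 400), (500, 650)]),
    # Skips the middle exon, so the intron chain differs.
    "t2": ("chr1", "+", [(100, 200), (500, 600)]),
    # Alternative 3' splice site on the middle exon.
    "t3": ("chr1", "+", [(100, 200), (320, 400), (500, 600)]),
    # Single-exon transcript, ignored by this sketch.
    "t4": ("chr1", "+", [(700, 900)]),
}

labels = classify(toy_assembled, toy_reference)
novel_fraction = sum(v == "novel" for v in labels.values()) / len(labels)
print(labels, round(novel_fraction, 3))
```

Keying on the intron chain rather than on full exon coordinates means that transcripts differing only in their start or end positions still count as matched, which is one common convention for long-read data where transcript ends are less precisely defined.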
Fig. 1. A de novo annotation of human liver transcriptomes reflecting pathophysiologically relevant metabolic responses.



(A) Comparison of DRS and reference annotations (GENCODE v33 for human and GENCODE vM24 for mouse, respectively). Matched transcripts (Matched) have completely and exactly matched splicing sites in DRS and reference annotations. Different transcripts (Different) have different isoform structures in the annotations. Other isoforms (Others) have mismatched splicing sites or no actual overlaps with entries in the reference annotation. (B) Comparison of protein sequences of different/novel transcripts identified by DRS with those in reference databases. (C) Schematic of the novel gene definition (Top). Gel analysis of the PCR-cloned novel genes (Bottom). (D) UpSet plot displaying the overlaps of novel genes regulated by several metabolic interventions. Novel genes commonly regulated in Fast and other treatments are labeled in different colors. (E) Expression levels of human novel gene 3 (hNovel3) in the livers of healthy (Normal, n=14), obese (Obese, n=12), nonalcoholic fatty liver (NAFL, n=15), and nonalcoholic steatohepatitis (NASH, n=16) patients (Non-alcoholic fatty liver disease database, PRJNA523510, more details in Supplementary methods). Data represent mean ± SEM, *p < 0.05, **p < 0.01, ***p < 0.001, two-tailed unpaired Student’s t-test. (F) Bar plot of the top enriched KEGG pathways correlated with hNovel3. (G) Gene Set Enrichment Analysis (GSEA) enrichment plots for the Sterol Biosynthetic Process pathway using the transcriptome data of hNovel3 knockdown (siRNA1, siRNA2, and siRNA3) and control samples (n=3 for each group).
Next, we evaluated the isoform composition of genes with novel transcripts identified by DRS. We found that novel transcripts were the dominant isoform for 16.45% of these genes in humans and 18.17% in mouse (Fig. S1F). Moreover, more than 40% of these dominant novel transcripts displayed protein coding sequences that are different from those in current reference databases (Fig. 1B), supporting the idea that these novel transcripts significantly contribute to protein diversity in human liver.
In addition to protein-coding genes, noncoding genes have also been shown to play an increasingly prominent role in liver metabolism[10, 11, 19, 20]. To characterize novel transcripts located in noncoding regions, we analyzed their coding potential using five algorithms (CPC2, PLEK, CPAT, CNCI, FEELnc) and found that they were mostly noncoding transcripts. A small fraction of them (5.5%), however, do have coding potential (Fig. S1G). For example, a novel transcript in the locus of AC005538.3 (Fig. S1H), a gene labelled as noncoding in the reference annotation, displayed a very high coding potential, suggesting that our novel annotation could be used to identify novel functional elements in a genomic locus.
Furthermore, we identified 212 novel genes in intergenic regions where no gene has been recorded in current reference annotations, again pointing to the insufficient annotations of the human liver transcriptome (Table S3). To confirm the expression of these novel genes, we randomly cloned some of the novel genes and confirmed that their sequences matched those revealed by Nanopore DRS (Fig. 1C and Table S4). To further explore the potential role of these novel genes in human liver disease, we used our novel annotation to reanalyze a published liver RNA-seq dataset of a nonalcoholic fatty liver disease (NAFLD) study[21], and readily detected 160 novel genes in at least one third of the livers in this NAFLD cohort. Intriguingly, the expression levels of 38.8% of these novel genes were significantly changed in patients with NAFLD (Fig. S1I), suggesting that their dysregulation could potentially contribute to the pathogenesis of NAFLD.
Moreover, more than 40% of the novel genes were responsive to at least one metabolic stimulus, with some responsive to multiple (Fig. 1D). To assess whether this responsiveness has functional implications, we took human novel gene 3 (hNovel3) as an example. hNovel3 was downregulated in livers of humanized mice treated with FXR agonist and was upregulated in patients with NAFLD (Fig. S1J and Fig. 1E). Pathway enrichment analysis of co-regulated protein coding genes suggested that hNovel3 may be associated with the steroid biosynthesis pathway (Fig. 1F). The specific expression of the hNovel3 gene was supported by Nanopore DRS as well as ATAC-seq and H3K27Ac ChIP-seq. In addition, short-read RNA-seq also corroborated the endogenous structure of hNovel3 in both humanized livers and primary human hepatocytes (Fig. S1K). To further explore the biological function of hNovel3, we knocked it down in primary human hepatocytes and found that hNovel3 negatively regulated the Sterol Biosynthetic Process, which was consistent with the pathway analysis in humanized mice (Fig. 1G).
By performing DRS on the livers of humanized mice under pathophysiologically relevant metabolic conditions, we have constructed a novel and comprehensive annotation of human liver transcriptomes reflecting these important metabolic responses.
Human liver transcriptome dynamics in response to representative metabolic treatments
Equipped with this comprehensive annotation of the human liver transcriptome, we examined how these treatments regulate human liver gene expression at the transcript level in an in vivo setting. Strikingly, the vast majority of significantly regulated transcripts displayed no corresponding gene-level changes in any of the treatments. For example, fewer than 6% of the differentially regulated transcripts manifested gene-level changes in response to PPARα agonist treatment (Fig. 2A and Table S5–6). The percentage of transcripts that were consistently regulated at both gene and transcript levels was similarly low for all other treatments (Fig. 2A). As an example, the Glutathione S-Transferase Mu 1 (GSTM1) gene produced 20 transcript isoforms. While the levels of two isoforms were significantly changed in response to PPARα agonist, the remaining transcripts and the total gene expression level showed no change (Fig. 2B). Among these, transcript 913293c8 had the highest expression, an observation that was also supported by an independent liver DRS dataset (BioSample, SAMD00127219). Interestingly, transcript 913293c8 encodes a 154 amino acid (aa) protein, which is much shorter than the protein encoded by the isoform in the reference annotation (ENST00000309851, 219 aa) (Fig. 2C). Remarkably, three-dimensional structural alignment showed that the new isoform lacked a large domain compared to the reference transcript (Fig. 2D). Furthermore, pathways that were enriched by differentially expressed transcripts (DETs) and differentially expressed genes (DEGs) also displayed marked differences (Fig. 2E). For example, the transcripts dynamically regulated by Fast pointed to crucial biological processes such as glutathione metabolism, tyrosine metabolism and linoleic acid metabolism, none of which were detected in DEG analysis.
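The comparison between transcript-level and gene-level regulation can be illustrated with a short sketch. It assumes, as one possible reading of the definition used for Fig. 2A, that a DET counts as consistent only when its parent gene is itself a DEG and both change in the same direction. The function name, gene and transcript identifiers and fold-change values are hypothetical toy inputs, not data from this study.

```python
from typing import Dict, List, Tuple

# (transcript_id, parent_gene_id, log2 fold change of the transcript)
Det = Tuple[str, str, float]


def consistent_fraction(dets: List[Det], degs: Dict[str, float]) -> float:
    """Fraction of DETs whose parent gene is a DEG changing in the same direction.

    `degs` maps gene_id -> log2 fold change for significantly changed genes only.
    """
    if not dets:
        raise ValueError("no differentially expressed transcripts supplied")
    consistent = sum(
        1
        for _, gene, fc in dets
        if gene in degs and (fc > 0) == (degs[gene] > 0)
    )
    return consistent / len(dets)


# Hypothetical toy values.
toy_dets = [
    ("txA1", "geneA", 1.2),   # gene not a DEG          -> inconsistent
    ("txA2", "geneA", -0.8),  # gene not a DEG          -> inconsistent
    ("txB1", "geneB", 0.9),   # gene up, transcript up  -> consistent
    ("txC1", "geneC", -1.1),  # gene up, transcript down -> inconsistent
]
toy_degs = {"geneB": 1.5, "geneC": 0.7}

fraction = consistent_fraction(toy_dets, toy_degs)
print(f"{fraction:.2f}")
```

In this toy case, two isoforms of the same gene change in opposite directions and cancel out at the gene level, which is the kind of regulation that a gene-level analysis cannot register.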
Fig. 2. Human liver transcriptome dynamics in response to representative metabolic treatments.





(A) Percentage of differentially expressed transcripts (DETs) with consistent gene-level regulation patterns in short-read RNA-seq. DETs that had the same fold change direction as the DEGs were referred to as consistent (Consistent), whereas the others were inconsistent (Inconsistent). (B) Boxplot of GSTM1 gene and transcript expression levels in PPARα agonist and DMSO treatment samples. GSTM1 gene expression was measured by short-read RNA-seq and transcript expression by DRS. The two significantly changed transcripts are highlighted in purple and dark blue, respectively. *p < 0.05, two-tailed unpaired Student’s t-test. (C) Top: Isoform schematic of GSTM1 significantly changed transcripts from humanized mouse liver DRS and public human liver dataset (BioSample, SAMD00127219, https://www.ncbi.nlm.nih.gov/biosample/?term=SAMD00127219). Bottom left: the percentages of transcript expression levels of the GSTM1 gene. Bottom right: protein sequence alignment between 913293c8 and ENST00000309851.10 and the protein-coding potential analysis results. (D) Alignment of the three-dimensional structures of ENST00000309851 (dark blue) and 913293c8 (purple). (E) Comparisons of the top enriched pathways for DETs and DEGs under different treatments. (F) Venn diagram of overlapping human DETs in humanized mouse livers under various conditions. (G) PCA plot of human splicing events across samples treated with different transcription factor agonists. (H) Composition of significantly changed alternative splicing (AS) types, which include exon skipping (ES), alternative 5’ splice site selection (A5SS), alternative 3’ splice site selection (A3SS), and intron retention (IR), during treatments with FXR, PPARα, and PPARγ agonists. (I) Comparisons of the top enriched pathways (GO BP) for transcripts containing AS events that were significantly changed between treatments.
Metabolic genes are often regulated by multiple metabolic stimuli, a notion called metabolic sensitivity, which can be used to identify genes, particularly lncRNA genes, that play a role in metabolism[11, 22]. We found that a sizable portion of transcripts were regulated by at least two treatments (Fig. 2F), and 45 transcripts were changed in all treatments (Fig. S2A). These commonly regulated transcripts displayed divergent regulatory patterns in response to different treatments (Fig. S2B). In addition to the commonly regulated transcripts, we also observed hundreds of transcripts that were specifically regulated by each treatment (Fig. S2C) and enriched in diverse metabolic pathways (Fig. S2D).
Alternative splicing (AS) is one of the most important regulatory mechanisms underlying transcript diversity[23, 24]. One clear advantage of DRS is its capability to reliably define AS events. We found that the humanized livers with different metabolic treatments separated clearly according to both transcript levels and AS events in PCA (Fig. 2G and Fig. S2E–G). For example, while the dominant AS event regulated by PPARα agonist was exon skipping (ES) (more than 50%), it was intron retention (IR) in PPARγ agonist treated samples (Fig. 2H). Pathway enrichment analysis showed that the significantly changed AS events were involved in important metabolic pathways and displayed differential enrichment patterns in response to different metabolic stimuli (Fig. 2I). For example, the AS events changed in response to FXR activation showed greater enrichment in the high-density lipoprotein (HDL) particle remodeling process than those changed by PPARα and PPARγ agonist treatments. APOA1, a key regulator of HDL biogenesis, displayed more alternative 5′ splice sites (A5SS) at the end of the first exon during FXR activation (Fig. S2H).
Dynamics of m6A modification and Poly(A) tail length of RNA transcripts in humanized livers
In the past decade, RNA modifications, such as m6A and changes in poly(A) tail length, have been shown to play key roles in regulating RNA metabolism and physiological processes[6, 7]. Here we used DRS to analyze m6A modifications on human RNA transcripts[25] and found that the modification sites were enriched in the CDS and 3’ UTR regions with a clear peak at the end of the CDS and the start of the 3’ UTR (Fig. S3A), a pattern that was consistent with previous studies[26, 27]. Interestingly, both dietary interventions and transcription factor activations shifted the modification distributions towards 3’ UTR regions compared to the control groups (Fig. S3A). Moreover, PCA of m6A modifications in the 3’ UTR showed significant differences between treatments, while no such distinction was found in the CDS regions (Fig. 3A and Fig. S3B). Using dietary intervention as an example, we found that transcripts from certain genomic regions showed intensive m6A modifications whereas others were sparse (Fig. S3C). While most m6A sites existed in both AL and Fast treatments, Fast-specific modifications did occur and were generally closer to the 3’ UTR compared to those in the AL group (Fig. 3B). To further understand the dynamics of m6A modifications during Fast, we performed differential modification analysis and identified 302 significantly changed modification loci. More than one third of these loci contained the motif GGACA and more than 40% of them displayed transcript- or gene-level changes, suggesting that m6A modifications on these RNA transcripts could modulate their expression levels (Fig. 3C and Table S7). Pathway analysis indicated that these differentially regulated m6A modifications occurred on transcripts of key metabolic genes in fatty acid and alcohol metabolic processes (Fig. 3D).
Fig. 3. Dynamics of m6A modification and Poly(A) tail length of RNA transcripts in humanized livers.






(A) PCA plot of human m6A modification events localized in 3’ UTR regions with different treatments. (B) Metagene analysis of Fast- and AL-specific m6A modifications in human. (C) Heatmap of human m6A modification frequencies that were significantly changed between Fast and AL treatments. (D) The hub genes and the central network diagram for the top 5 enriched pathways generated from the significantly changed modifications between AL and Fast. (E) Kruskal-Wallis test for poly(A) tail length variance of gene isoforms with different transcription factor agonist treatments. The dashed line marks the significance threshold (p value 0.05). (F) Left: Box plot of relative expression levels of the two APOA2 transcripts in different treatments. Right top: Violin plot of poly(A) tail lengths of the transcripts of APOA2 in different treatments. The difference between the two transcripts was analyzed by two-tailed unpaired Student’s t-test. ns, p > 0.05; *p < 0.05. Right bottom: Schematic of the isoform structures of APOA2 transcripts. (G) Percentage of DETs in the human transcripts displaying differentially changed poly(A) tail lengths (DPTLs). (H) Venn diagram of the overlapped human transcripts with DPTLs in different metabolic treatments (length difference > 10 bp and p value < 0.05). (I) Top 5 significantly enriched pathways (GO BP) for human DPTL transcripts in different treatments.
Unlike conventional RNA-seq, DRS can also precisely determine poly(A) tail length on transcripts. We found that the mean length of poly(A) tails on all human transcripts in humanized livers was around 100 nt (Fig. S3D). Compared to mRNAs, most of the remaining types of transcripts displayed shorter poly(A) tail length. Surprisingly, the poly(A) tail lengths of lncRNAs were significantly longer than those of mRNAs (Fig. S3D), suggesting a unique regulatory role of poly(A) tails in lncRNA metabolism and function. We also found that the poly(A) tail lengths of transcripts showed a trend of negative correlation with their expression levels (Fig. S3E), and transcripts containing intron retention usually had longer poly(A) tails (Fig. S3F).
In light of the divergent expression regulation of different transcript isoforms within genes, we explored if poly(A) tail lengths also exhibit transcript-specific regulation. We checked the variance of poly(A) tail length among the transcripts from the same genes under different treatments and found that the distributions of poly(A) tail length on transcripts varied from gene to gene. For example, while certain genes, such as CYP2E1, showed significantly divergent poly(A) tail length among all the transcripts, other genes, such as APOE, displayed similar poly(A) lengths on all transcripts (Fig. S3G). Furthermore, different treatments also regulated the diversity of the poly(A) tail lengths of transcripts from the same gene in a context dependent manner (Fig. 3E). For example, two major transcripts from APOA2 displayed no expression level changes under any treatment. While their Poly(A) tail lengths showed no changes in DMSO and PPARγ agonist treatments, they were completely different in PPARα and FXR agonist treatments (Fig. 3F).
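As a hedged illustration of how poly(A) tail lengths of isoforms from one gene could be compared with a Kruskal-Wallis test (the test named for Fig. 3E), the sketch below computes the H statistic from per-read tail lengths grouped by isoform. It is a simplified stand-in rather than the analysis code of this study: it uses average ranks for ties but omits the tie correction factor, and the p-value shortcut exp(-H/2) is valid only for three groups (two degrees of freedom). The isoform names and tail lengths are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def kruskal_wallis_h(groups: Dict[str, List[float]]) -> Tuple[float, int]:
    """Return (H statistic, degrees of freedom) without tie correction."""
    pooled = sorted((value, name) for name, values in groups.items() for value in values)
    n_total = len(pooled)
    rank_sums = {name: 0.0 for name in groups}

    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the mean of the ranks i+1 .. j.
        average_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            rank_sums[pooled[k][1]] += average_rank
        i = j

    h = 12.0 / (n_total * (n_total + 1)) * sum(
        rank_sums[name] ** 2 / len(values) for name, values in groups.items()
    ) - 3.0 * (n_total + 1)
    return h, len(groups) - 1


# Hypothetical per-read poly(A) tail lengths (nt) for three isoforms of one gene.
toy_tails = {
    "isoform_1": [1.0, 2.0, 3.0],
    "isoform_2": [4.0, 5.0, 6.0],
    "isoform_3": [7.0, 8.0, 9.0],
}

h_stat, dof = kruskal_wallis_h(toy_tails)
# For exactly two degrees of freedom the chi-square survival function is exp(-H/2).
p_value = math.exp(-h_stat / 2.0) if dof == 2 else float("nan")
print(round(h_stat, 3), dof, round(p_value, 4))
```

For real data with many tied read lengths or more than three isoforms, a statistics library that applies the tie correction and the general chi-square distribution would be the safer choice.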
To further identify transcripts displaying a high degree of dynamics in their poly(A) tail lengths, we analyzed the tail differences in response to different treatments. Interestingly, most transcripts whose poly(A) tail lengths were regulated by metabolic treatments had similar expression levels under both conditions, suggesting a decoupling of the two events (Fig. 3G). Furthermore, only a small percentage of transcripts showed common changes across all the treatments, and most of the dynamic poly(A) tails were regulated by a single metabolic treatment (Fig. 3H and Table S8). To ascertain the potential biological functions of these transcripts with poly(A) tail changes under different treatments, we performed pathway analysis and found that activation of FXR, PPARα and PPARγ all impacted major metabolic processes (Fig. 3I).
Divergent transcriptome architecture and dynamics between human and mouse livers
To further understand the robustness of mouse as a model for human physiology, we compared gene regulation between human and mouse based on transcript-level analysis of gene expression in humanized livers. Remarkably, we found that human and mouse shared few commonly regulated DEGs (less than 3%) in response to metabolic treatments (Fig. S4A). To compare transcript-level dynamics of homologous genes between human and mouse, we examined the coefficient of variation (CV) of transcript distribution, which reflects the distributions of expression levels of transcripts and the transcript dynamics among genes. This analysis revealed an inconsistent ranking trend of CV value changes indicating divergent transcript regulation between the two species (Fig. S4B). For example, CYP2E1 displayed high CV values in both human and mouse but the expression levels of their transcripts displayed substantial divergence. More than half of human CYP2E1 transcripts exhibited decreased expression levels during Fast but one transcript, which was the highest expressed one, was up-regulated (Fig. S4C). However, mouse Cyp2e1 transcripts were all increased by Fast treatment. These results suggest that many human and mouse genes may undergo differential and even opposite transcript-level regulations in response to metabolic stimuli.
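The coefficient of variation used above can be made concrete with a small sketch. It assumes that the CV of a gene is the population standard deviation of its transcript expression levels divided by their mean; the exact estimator and normalization used in this study are not specified here, so this choice is an assumption. The expression values are hypothetical.

```python
import statistics
from typing import List


def transcript_cv(expression: List[float]) -> float:
    """Coefficient of variation of isoform expression levels within one gene."""
    mean = statistics.fmean(expression)
    if mean == 0:
        return 0.0  # gene not expressed; CV is undefined, report 0 by convention
    return statistics.pstdev(expression) / mean


# Hypothetical isoform expression levels for two genes with equal total output.
evenly_used = [10.0, 10.0, 10.0]   # all isoforms contribute equally
one_dominant = [30.0, 0.0, 0.0]    # a single isoform carries all expression

cv_even = transcript_cv(evenly_used)
cv_dominant = transcript_cv(one_dominant)
print(round(cv_even, 3), round(cv_dominant, 3))
```

A gene whose isoforms are used evenly has a CV near zero, whereas a gene dominated by one isoform has a high CV, so ranking homologous genes by CV in each species gives one way to compare how isoform usage is distributed.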
Furthermore, the regulations of m6A modifications on homologous protein coding genes also displayed substantial differences in human and mouse (Fig. S4D). Interestingly, the dynamic m6A modification loci showed a clear shift towards the 3’ end in response to both dietary treatment and transcription factor activations in mouse but not human (Fig. S4E), suggesting that m6A modification distribution and regulation machinery may be fundamentally different in the two species.
We also found that the overall patterns of poly(A) tail length separated clearly in PCA, indicating clear divergence of the length distribution between the two species (Fig. S4F). Furthermore, genes that had significantly changed poly(A) tail length in response to metabolic stimuli showed little overlap between the two species (Fig. S4G). Pathway analysis indicated that many genes with oppositely regulated poly(A) tail length play a role in critical metabolic processes such as the lipid catabolic process (Fig. S4H). For example, while the oppositely regulated genes enriched in the lipid catabolic process displayed longer poly(A) tails during Fast treatment in human, the tails were significantly shorter on the corresponding mouse RNAs (Fig. S4I). These data suggest that human and mouse undergo divergent regulation of poly(A) tail length, which may underlie the differential regulation of gene expression in the two species.
Divergent transcriptome architecture and dynamics between individuals of different genetic backgrounds
Growing evidence indicates that genetic background has a strong influence on an individual’s gene regulation and disease susceptibility[28, 29]. To evaluate the differences in transcriptomes between individuals, we performed DRS on livers of humanized mice that were generated from a second independent donor and had also been subjected to Fast treatment. We found that approximately 55% of RNA transcripts detected in the second donor (donor2) matched those in the first (donor1) (Fig. 4A). This similarity was much higher than that between donor2 and the GENCODE reference (37%), which is a collective population annotation. Nonetheless, around 20% of transcript isoforms were conclusively different between these two donors (Fig. 4A). Interestingly, only 39 (around 40%) of the novel genes detected in donor1 were expressed in donor2 (Fig. 4B). As an example, the full length of the novel gene ebabaf4b was detected by Nanopore DRS in donor2 and corroborated by short-read RNA-seq peaks (Fig. 4C), but no signal was detectable in donor1. This finding supports personalized activation of certain genes, possibly explaining the absence of certain novel genes in the public reference. In response to metabolic changes, such as Fast, only around 10% of the transcripts were consistently changed in the two donors (Fig. 4D). Furthermore, the overall patterns of enriched pathways based on specifically regulated transcripts in donor1 and donor2 were clearly different (Fig. 4E). To assess the consistency of inter-individual regulation, we further examined the expression levels of commonly changed transcripts during Fast. While most of them displayed consistent regulation, around 20% of transcripts were oppositely regulated (Fig. 4F). We found that transcripts consistently changed in both donors were involved in major metabolic pathways whereas oppositely regulated transcripts were mostly enriched in inflammatory defense responses (Fig. 4G).
To further assess the impact of the baseline differences of the two donors on their responses to Fast, we performed a correlation analysis of differentially expressed genes under baseline and Fast conditions and found a clear positive correlation (Fig. S5A, Table S9), indicating that gene expression differences between the two donors at baseline are associated with divergence in their response to Fast treatment.
Fig. 4. Divergent transcriptome architecture and dynamics between individuals of different genetic backgrounds.






(A) Comparison of DRS transcripts between donor2 and donor1 or between donor2 and the reference annotation. (B) Venn diagram of the overlapped novel genes between donor1 and donor2. (C) Isoform schematic of a novel gene from donor2. Raw RNA-seq and DRS reads from donor1 and donor2 are also displayed. (D) Venn diagram of the overlapped DETs between donor1 and donor2. Only DETs with matched isoforms were considered as overlapped. (E) Comparison of the top enriched pathways (GO BP) for common, donor1-, and donor2-specific DETs. The donor1-specific enriched pathways are labeled in red and the donor2-specific enriched pathways in blue. (F) Heatmap of the expression levels of commonly regulated DETs in response to AL and Fast treatments in donor1 and donor2. The summary of DETs is on the right. Only DETs with matched isoforms between the two donors were included. (G) The most significantly enriched pathways (GO BP) for the consistent and opposite DETs between donor1 and donor2. (H) Left top: Distribution of m6A modification sites for donor1 (in red) and donor2 (in blue), with modifications that differ between them labeled in green. Right bottom: Venn diagram showing the overlap of m6A modifications between the two donors. (I) Top: Schematic of m6A modification loci on the APOA4 gene body. Bottom: Heatmap of m6A modification frequency in the APOA4 gene body from donor1 and donor2. The modifications are labeled in different colors. (J) Pearson’s correlation analysis of genes with commonly significantly changed poly(A) tail lengths in donor1 and donor2. The consistent ones are labeled in dark blue and the opposite ones in pink.
We also compared m6A modifications on human RNA transcripts in the humanized livers from both donors and found that most modification sites (~80%) were present in both donors (Fig. 4H). For example, m6A modifications were absent from the 3’ end of APOA4 in the AL group, and seven modifications in this region were induced by Fast treatment in both donors (Fig. 4I). Only 30% of common m6A modifications, however, were regulated similarly in donor1 and donor2, and donor-specifically regulated modifications were related to important pathways such as long-chain fatty acid metabolic process for donor1 and secondary alcohol biosynthetic process for donor2 (Fig. S5B and S5C).
We observed a similar pattern in the changes in Poly(A) tail length between the donors. Only around 10% of regulated transcripts overlapped in donor1 and donor2 (Fig. S5D). Most of the commonly changed genes displayed consistent regulation but a small percentage of them were regulated oppositely (Fig. 4J). Pathway analysis showed that donor-specifically regulated genes were involved in cellular pathways such as regulation of mRNA metabolic processes in donor1 and mitochondrial translation in donor2 (Fig. S5E).
Discussion
In this study, we combined an isogenic humanized mouse model and Nanopore DRS to establish a de novo annotation of human liver transcriptome and subsequently studied the regulation of all detectable liver RNA transcripts at the levels of isoform expression, m6A modification and poly(A) tail length by representative metabolic conditions. Our work represents one of the first efforts to directly and comprehensively define metabolism-focused transcriptome dynamics of human liver and uncover a complex metabolic responsive landscape of the transcriptome that could help understand human liver physiology.
To study transcriptome dynamics of human liver in a physiological context, it is essential to establish an inclusive annotation of the human liver transcriptome and to study gene regulation in conjunction. To ensure an inclusive annotation, we subjected a humanized mouse model to diverse metabolic treatments and used DRS to analyze full-length native RNAs to resolve key elements of the transcriptome: transcript isoforms, transcript-level dynamics, m6A modifications, and poly(A) tail length. Profiling transcripts under multiple conditions was instrumental in capturing a wide range of transcripts and RNA modifications to study their regulation. Indeed, we found that over 30% of novel genes and transcripts identified in this annotation were expressed under only one specific condition and would have been invisible if samples of only one condition had been analyzed. While this work was being prepared, a report that analyzed Genotype-Tissue Expression (GTEx) samples using long-read sequencing was published, uncovering a large number of novel transcripts[30], which is consistent with our conclusion. Since our annotation was based on samples under the metabolic conditions we intended to study, all conditionally expressed transcripts and RNA modifications were already included. Thus, it is no surprise that the regulated metabolic pathways identified by our transcript-level analyses were drastically different from those identified by conventional short-read RNA-seq, which mainly detects gene-level changes in transcripts that are documented in current reference annotations. Moreover, our study also revealed for the first time the dynamic changes in m6A modifications and poly(A) tail length of many key metabolic genes, which might constitute a new mechanism for the liver to regulate the related metabolic pathways at the transcript isoform level.
At a deeper level, our work was designed to reveal a holistic picture of human liver transcriptome dynamics in a physiological context, which has never been achieved before. In a cell, all gene expression networks are intertwined and strongly impact each other, and the intrinsic connections among gene networks can only be established when expression information for all genes is available. But in a study involving individuals of different genetic backgrounds, such as a patient-based study, the holistic view would be lost once genes that were specifically regulated in certain patients were removed by cohort statistics. The humanized livers address this critical issue because each is created from a single donor liver. Although only humanized livers of specific genetic backgrounds were used in this study, much of the information about gene networks uncovered here may well be universal and can be harnessed to uncover novel connections between metabolic gene networks.
Our data reinforced the limitations of studying human liver physiology using mice, although they no doubt remain a useful reference. In addition to the gene-level differences between the two species previously reported[2], our work also revealed much more profound differences at the transcript and RNA modification levels, which should serve as a strong cautionary note against using mouse transcriptome dynamics to understand human metabolism. Of course, the humanized mouse model we used has its limitations. The mice are immunodeficient and are not suitable for studying the impact of immune cells on metabolism[12]. In addition, only hepatocytes are humanized in this model, and human hepatocytes are sometimes steatotic due to the incompatibility of several pathways between humans and mice[31–33], which could impact the expression and regulation of certain genes. But as human liver is inaccessible to treatments, we believe this model currently provides the closest possible approximation of human livers under representative metabolic conditions, and this study represents one of the first direct analyses of metabolism-related transcriptome dynamics in human liver. We hope this work will serve as a framework for designing additional tools to gain detailed knowledge of gene regulation in human organs, which is foundational to understanding diseases and developing effective therapies.
Materials and Methods
Animal experiments
Male TK-NOG mice (Cat. 12907, Taconic Biosciences) aged 8-10 weeks received an intraperitoneal (i.p.) injection of ganciclovir at a dose of 20 mg/kg (4 μl/g) one week before surgery. Thawed primary human hepatocytes were recovered in Cryopreserved Hepatocyte Recovery Medium (Cat. CM700, Thermo Fisher Scientific, Waltham, MA, USA) and re-suspended in ice-cold HBSS prior to surgery. Mice were anesthetized and injected with 1 × 10^6 primary human hepatocytes via the spleen. Eight weeks after the surgery, the humanized mice with human serum albumin levels above 0.5 mg/ml were used in experiments. For the dietary treatment, the humanized mice were allowed free access to food (AL, n=4 for donor1, n=3 for donor2) or subjected to twenty-four hours of food withdrawal (Fast, n=4 for donor1, n=3 for donor2). For the transcription factor agonist treatments, the humanized mice were injected with DMSO, fenofibrate (50 mg/kg, Sigma-Aldrich, Cat. F6020), rosiglitazone (10 mg/kg, Sigma-Aldrich, Cat. R2408) or GW4064 (30 mg/kg, Sigma-Aldrich, Cat. G5172). The livers were collected after 6 hours of fasting.
Analysis of RNA-Seq data
RNA extracted with Trizol was purified using the MagMAX RNA extraction kit (Thermo Fisher Scientific, Cat. AM1830). Strand-specific sequencing libraries were constructed using the TruSeq Stranded Total RNA Prep kit (Illumina). Sequencing was performed at the NHLBI DNA Sequencing and Genomics Core. The original FASTQ read files were trimmed and cleaned using fastp/0.23.2, then quality analysis was performed using FastQC/0.11.8. A custom genome reference index was created by combining human (GRCh38.p13) and mouse (GRCm38.p6) genome sequences for alignment of humanized mouse chimeric liver RNA-seq data as previously described [2, 10]. The primary assemblies were obtained from the GENCODE database (https://www.gencodegenes.org/). The trimmed reads were aligned with default settings of HISAT2/2.2.1.0 and quantified using featureCounts (subread/2.0) with human annotation GENCODE v33 and mouse annotation GENCODE vM24. The combined raw count files generated by featureCounts were imported into the R package DESeq2/3.1.0 and used for differentially expressed gene (DEG) analysis. For all liver samples, genes with a log2 fold change greater than 0.5 and a p value less than 0.05 were considered differentially expressed.
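The differential-expression cutoff described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `is_deg` and the example values are hypothetical, and applying the fold-change cutoff to the absolute value (so that both up- and down-regulated genes pass) is an assumption about how the threshold was used.

```python
def is_deg(log2_fold_change: float, p_value: float,
           lfc_cutoff: float = 0.5, p_cutoff: float = 0.05) -> bool:
    """Return True when a gene passes both cutoffs stated in the text."""
    # Assumption: the cutoff is applied to |log2FC| so both directions count.
    return abs(log2_fold_change) > lfc_cutoff and p_value < p_cutoff


# Hypothetical results table: gene -> (log2 fold change, p value)
results = {
    "GENE_A": (1.20, 0.001),
    "GENE_B": (-0.80, 0.030),
    "GENE_C": (0.30, 0.001),  # fold change too small
    "GENE_D": (2.00, 0.200),  # not significant
}

degs = sorted(gene for gene, (lfc, p) in results.items() if is_deg(lfc, p))
print(degs)
```

In practice the same two-condition filter would be applied to the DESeq2 results table; the sketch only makes the logic of the cutoff explicit.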
Polyadenylated RNA isolation and Nanopore direct RNA sequencing (DRS)
Freshly isolated RNA (75 μg) was adjusted to a volume of 100 μL with nuclease-free water. Polyadenylated RNAs were selected using the Dynabeads™ mRNA Purification Kit (Invitrogen, Cat. 61006) following the manufacturer’s instructions, and the quality and quantity of the RNA were assessed using the NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific). 500 ng of RNA was used for Nanopore direct RNA sequencing (DRS), generally following the Oxford Nanopore Technologies (ONT) SQK-RNA002 kit protocol, including the optional reverse transcription step recommended by ONT. Libraries were loaded onto ONT R9.4 flow cells (Oxford Nanopore Technologies) and sequenced on the GridION platform using the standard MinKNOW/19.12.5 protocol script.
Isoform-identification of DRS
The analysis was performed as previously described[34, 35]. The ONT Guppy workflow (version 2.1.0) was used for basecalling the DRS data with the parameters “--flowcell FLO-MIN106 --kit SQK-RNA002 -x cuda:all --records_per_fastq 0”[36]. The FASTQ files were aligned to the GRCh38 human genome reference and the GRCm38 mouse genome reference, respectively, using minimap2/2.17 with the parameters “-ax splice -uf -k14”. The FLAIR pipeline (https://github.com/BrooksLabUCSC/flair), with some modifications, was employed to identify isoforms from the aligned reads. First, reads with a deletion length greater than 100 nt were removed. Second, only reads with 5’ ends that overlapped with open chromatin regions, as indicated by ATAC-Seq, were considered. Next, the splice-site boundaries of DRS reads were corrected by matching them with the corresponding short-read splice junctions, and only junctions supported by at least three uniquely mapped short reads were considered valid and included [35]. Finally, FLAIR collapse with the default settings was used to generate DRS isoform annotations for both human and mouse, including splice sites and sequence information.
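Two of the read-filtering rules above (discarding reads with long deletions, and keeping only well-supported splice junctions) can be illustrated with a short Python sketch. It is not FLAIR's implementation: the helper names, the example CIGAR strings and the junction tuples are hypothetical, and the sketch assumes that only CIGAR `D` operations count as deletions, while `N` operations (introns in spliced alignments) are ignored.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def longest_deletion(cigar: str) -> int:
    """Length of the longest deletion (CIGAR 'D') in an alignment."""
    # 'N' (skipped region, i.e. an intron) is deliberately not counted.
    return max((int(n) for n, op in CIGAR_OP.findall(cigar) if op == "D"),
               default=0)


def keep_read(cigar: str, max_deletion: int = 100) -> bool:
    """Keep a read unless it contains a deletion longer than max_deletion nt."""
    return longest_deletion(cigar) <= max_deletion


def supported_junctions(short_read_support: dict, min_reads: int = 3) -> set:
    """Junctions backed by at least min_reads uniquely mapped short reads."""
    return {j for j, n in short_read_support.items() if n >= min_reads}


# Hypothetical alignments: read name -> CIGAR string
reads = {
    "read1": "300M1500N200M",   # spliced, no deletion
    "read2": "120M150D80M",     # 150-nt deletion, removed
    "read3": "90M12D400N60M",   # short deletion, kept
}
kept = sorted(name for name, cigar in reads.items() if keep_read(cigar))

# Hypothetical junctions: (chrom, donor, acceptor) -> short-read count
junctions = {("chr1", 1000, 2500): 7, ("chr1", 3000, 3400): 2,
             ("chr2", 500, 900): 3}
valid = supported_junctions(junctions)
print(kept, sorted(valid))
```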
Quantification, differential expression and alternative splicing analyses of DRS
The Nanopore reads were quantified against the DRS annotation using FLAIR quantify with default parameters. Only alignments with quality scores of 1 or greater were counted. The quantified isoform counts were analyzed using FLAIR diffExp to identify differentially expressed genes (DEGs) and transcripts (DETs), and using FLAIR diffSplice to analyze alternative splicing (AS) events. The default parameters were used for all analyses. DEGs and DETs were defined by a log2 (fold change) greater than 0.5 and a p value less than 0.05. Events with a p value less than 0.05 were considered to have significantly altered AS.
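The alignment-quality filter mentioned above can be sketched as follows, assuming that "quality score" refers to the mapping quality (MAPQ) stored in the fifth column of a SAM record. The function name and the example records are hypothetical, and the sketch is independent of FLAIR's own code.

```python
def passes_mapq(sam_line: str, min_mapq: int = 1) -> bool:
    """True for an alignment record whose MAPQ is at least min_mapq."""
    if sam_line.startswith("@"):          # header lines carry no alignment
        return False
    fields = sam_line.rstrip("\n").split("\t")
    return int(fields[4]) >= min_mapq     # column 5 of SAM is MAPQ


# Hypothetical SAM records (11 mandatory tab-separated fields each)
sam_lines = [
    "@SQ\tSN:chr1\tLN:248956422",
    "read1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "read2\t0\tchr1\t200\t0\t50M\t*\t0\t0\t*\t*",   # MAPQ 0, not counted
    "read3\t16\tchr1\t300\t1\t50M\t*\t0\t0\t*\t*",
]
counted = [line.split("\t")[0] for line in sam_lines if passes_mapq(line)]
print(counted)
```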
m6A Modification Detection and Analysis
RNA m6A modifications in the DRS data were examined using MINES (https://github.com/YeoLab/MINES)[25]. Briefly, DRS reads and modification values were aligned by Tombo (https://github.com/nanoporetech/tombo) with either a genomic or a cDNA (transcriptomic) reference. Genomic references (GRCh38/hg38 for human and GRCm38/mm10 for mouse) were from GENCODE, and cDNA references were from the DRS isoform identification step. To identify all areas in the reference containing DRACH motifs, a new set of regions was created by extending 10 bp on either side of the “A” in each motif. These areas were filtered to have a minimum coverage of five reads. Within the same treatment group, modification events that occurred in fewer than half of the samples were excluded. The distance measurement and metagene analysis of the modification sites were performed with MetaPlotR (https://github.com/olarerin/metaPlotR)[37]. DESeq2/3.1.0 was used with its default settings to examine differential modification events. The modification event counts within the same gene were combined for gene-level comparison. A log2 (fold change) greater than 0.5 and a p value lower than 0.05 were used to identify significantly changed modifications.
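The region-building and sample-prevalence steps described above can be sketched in Python. This is only an illustration under stated assumptions, not the MINES implementation: the function names and the test sequence are hypothetical, the reference is assumed to be a DNA string (so U is written as T), only the forward strand is scanned, and windows are clipped at the sequence ends.

```python
import re

# DRACH on a DNA reference: D = A/G/T, R = A/G, then A, C, H = A/C/T.
# The lookahead allows overlapping motifs to be reported.
DRACH = re.compile(r"(?=([AGT][AG]AC[ACT]))")


def drach_windows(seq: str, flank: int = 10) -> list:
    """Return (a_pos, start, end) for each DRACH motif, 0-based, end-exclusive.

    The window extends `flank` bases on either side of the central 'A'.
    """
    seq = seq.upper()
    windows = []
    for match in DRACH.finditer(seq):
        a_pos = match.start() + 2               # the 'A' is the third base
        start = max(0, a_pos - flank)           # clip at the sequence start
        end = min(len(seq), a_pos + flank + 1)  # clip at the sequence end
        windows.append((a_pos, start, end))
    return windows


def keep_site(n_samples_with_site: int, n_samples_in_group: int) -> bool:
    """Exclude events seen in fewer than half of the samples in a group."""
    return 2 * n_samples_with_site >= n_samples_in_group


# Hypothetical 29-nt reference with a single GGACT motif
reference = "T" * 12 + "GGACT" + "T" * 12
windows = drach_windows(reference)
print(windows)
```

With a flank of 10, each unclipped window spans 21 nt centred on the candidate adenosine.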
Novel Transcripts and Novel Gene Identification
Reference annotations were downloaded from the GENCODE database (https://www.gencodegenes.org/; GENCODE v33 for human and GENCODE vM24 for mouse). The DRS and public reference annotations were compared using GffCompare (https://github.com/gpertea/gffcompare)[38]. The “class code” attribute values in the output GTF file were used to categorize the transcripts/isoforms. Transcripts with a class code of “=” were considered matched, since their intron chains matched the reference annotation exactly. Novel/different transcripts were defined as those with a “class code” of “m, j, o, x, i, y, or u”, which have isoform structures different from those in the reference annotations. Transcripts with a “class code” of “c, k, n, e, s, p or r”, which have partially matched splice sites or no actual overlap with those in the reference annotations, were defined as “others”. Additionally, novel transcripts with a “class code” of “u” were categorized as new genes, since they showed no overlap with any genes in the reference annotations.
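The class-code grouping described above can be written as a small lookup. The sketch below is illustrative only: the function name, the category labels and the example attribute strings are hypothetical, and it assumes the class code appears in the GTF attribute column as `class_code "x";`.

```python
import re

# Class-code groups as defined in the text above.
NOVEL_CODES = {"m", "j", "o", "x", "i", "y", "u"}
OTHER_CODES = {"c", "k", "n", "e", "s", "p", "r"}

CLASS_CODE = re.compile(r'class_code "([^"]+)"')


def categorize(gtf_attributes: str) -> str:
    """Map the class code in a GTF attribute string to a category."""
    match = CLASS_CODE.search(gtf_attributes)
    if match is None:
        return "unclassified"
    code = match.group(1)
    if code == "=":
        return "matched"             # intron chain identical to the reference
    if code == "u":
        return "novel gene"          # novel transcript with no reference overlap
    if code in NOVEL_CODES:
        return "novel transcript"
    if code in OTHER_CODES:
        return "others"
    return "unclassified"


# Hypothetical attribute strings from an annotated GTF
examples = [
    'transcript_id "T1"; gene_id "G1"; class_code "=";',
    'transcript_id "T2"; gene_id "G1"; class_code "j";',
    'transcript_id "T3"; gene_id "G2"; class_code "u";',
    'transcript_id "T4"; gene_id "G3"; class_code "c";',
]
categories = [categorize(attrs) for attrs in examples]
print(categories)
```

Because “u” transcripts are also novel transcripts by the definition above, a tally of novel transcripts would add the "novel gene" and "novel transcript" categories together.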
Statistics
The two-tailed unpaired Student’s t-test was used for comparisons between two groups shown in Fig. 1E, Fig. 2B, Fig. 3F, and Fig. 4E and Fig. S1J, 3D–F, and S4B. For comparisons of Poly(A) tail length variance, Kruskal–Wallis one-way analysis was used in Fig. 3E and Fig. S3G. A p value of less than 0.05 was considered significant. Sample size was determined based on general standards for biological studies and requirements for statistical analysis. For in vitro experiments, a minimum sample size of 3 biological replicates was used; for in vivo experiments, a sample size of 3-4 independent mice was used.
Study approval
All animal experiments were performed in accordance with and with approval from the NHLBI Animal Care and Use Committee or the Animal Care Committee of the CIEA, Kawasaki, Japan. All human-related data sets were downloaded from public domains.
Data availability
The source data underlying Figures, Extended Data Figures and Tables are provided as a Source Data file. The raw sequencing data and processed data can be accessed at GEO through the SuperSeries dataset GSE224281, which consists of multiple SubSeries, including GSE130525 and GSE224279 for RNA-Seq data of donor 1, GSE126587 for RNA-Seq data of donor 2, GSE224277 for ATAC-Seq data, and GSE224278 for nanopore direct RNA sequencing data. All data are available from the corresponding author upon reasonable request.
Supplementary Material
Impact and implications.
Direct knowledge of the human liver transcriptome is currently very limited, which hinders the overall understanding of human liver pathophysiology. We combined a liver-specific humanized mouse model and long-read direct RNA sequencing technology to establish a de novo annotation of the human liver transcriptome and identified a multitude of regulated metabolic pathways that were otherwise invisible using conventional technologies. The extensive regulatory information on human genes that we provide will allow basic scientists to infer the disease relevance of their genes of interest and physician scientists to better pinpoint the changes in metabolic networks underlying a specific pathophysiology.
Highlights.
Liver-specific humanized mice rendered human liver cells accessible to diverse treatments in a physiological context.
A de novo annotation of human liver transcriptome was established by nanopore single-molecule direct RNA sequencing (DRS) of humanized livers under diverse conditions.
Transcript-level analysis of human liver transcriptomes enabled the identification of regulated metabolic pathways that were invisible in conventional short read RNA-seq.
Nanopore DRS revealed dynamic changes in m6A and poly(A) tail length of human liver transcripts in response to metabolic stimuli.
Individuals of different genetic backgrounds display divergent baseline transcriptome architectures which strongly influence their responses to regulation.
Acknowledgments
We thank Yan Luo, Poching Liu and Yuesheng Li (NHLBI DNA Sequencing and Genomics Core) for RNA-seq analysis. The authors gratefully acknowledge the technical assistance of Ms. Megumi Nishiwaki, Mr. Takaya Homma, and Hiroaki Kato. This study was funded by NHLBI Division of Intramural Research funds to HC (1ZIAHL006103, 1ZIAHL006159).
Footnotes
Conflict of interest
The authors have declared that no conflict of interest exists.
References
- [1]. Breschi A, Gingeras TR, Guigo R. Comparative transcriptomics in human and mouse. Nat Rev Genet 2017;18:425–440.
- [2]. Jiang C, Li P, Ruan X, et al. Comparative Transcriptomics Analyses in Livers of Mice, Humans, and Humanized Mice Define Human-Specific Gene Networks. Cells 2020;9.
- [3]. GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 2017;550:204–213.
- [4]. Nilsen TW, Graveley BR. Expansion of the eukaryotic proteome by alternative splicing. Nature 2010;463:457–463.
- [5]. Reyes A, Huber W. Alternative start and termination sites of transcription drive most transcript isoform differences across human tissues. Nucleic Acids Res 2018;46:582–592.
- [6]. Jiang X, Liu B, Nie Z, et al. The role of m6A modification in the biological functions and diseases. Signal Transduct Target Ther 2021;6:74.
- [7]. Jobbins AM, Haberman N, Artigas N, et al. Dysregulated RNA polyadenylation contributes to metabolic impairment in non-alcoholic fatty liver disease. Nucleic Acids Res 2022;50:3379–3393.
- [8]. Frankish A, Diekhans M, Ferreira AM, et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res 2018.
- [9]. O’Leary NA, Wright MW, Brister JR, et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res 2016;44:D733–745.
- [10]. Ruan X, Li P, Chen Y, et al. In vivo functional analysis of non-conserved human lncRNAs associated with cardiometabolic traits. Nat Commun 2020;11:45.
- [11]. Ruan X, Li P, Ma Y, Jiang CF, et al. Identification of human long noncoding RNAs associated with nonalcoholic fatty liver disease and metabolic homeostasis. J Clin Invest 2021;131.
- [12]. Hasegawa M, Kawai K, Mitsui T, et al. The reconstituted ‘humanized liver’ in TK-NOG mice is mature and functional. Biochem Biophys Res Commun 2011;405:405–410.
- [13]. Bideyan L, Nagari R, Tontonoz P. Hepatic transcriptional responses to fasting and feeding. Genes Dev 2021;35:635–657.
- [14]. Pawlak M, Lefebvre P, Staels B. Molecular mechanism of PPARalpha action and its impact on lipid metabolism, inflammation and fibrosis in non-alcoholic fatty liver disease. J Hepatol 2015;62:720–733.
- [15]. Lehrke M, Lazar MA. The many faces of PPARgamma. Cell 2005;123:993–999.
- [16]. Calkin AC, Tontonoz P. Transcriptional integration of metabolism by the nuclear sterol-activated receptors LXR and FXR. Nat Rev Mol Cell Biol 2012;13:213–224.
- [17]. Leonardi T, Leger A. Nanopore RNA Sequencing Analysis. Methods Mol Biol 2021;2284:569–578.
- [18]. Soneson C, Yao Y, Bratus-Neuenschwander A, et al. A comprehensive examination of Nanopore native RNA sequencing for characterization of complex transcriptomes. Nat Commun 2019;10:3359.
- [19]. Statello L, Guo CJ, Chen LL, et al. Gene regulation by long non-coding RNAs and its biological functions. Nat Rev Mol Cell Biol 2021;22:96–118.
- [20]. Poller W, Dimmeler S, Heymans S, et al. Non-coding RNAs in cardiovascular diseases: diagnostic and therapeutic perspectives. Eur Heart J 2018;39:2704–2716.
- [21]. Suppli MP, Rigbolt KTG, Veidal SS, et al. Hepatic transcriptome signatures in patients with varying degrees of nonalcoholic fatty liver disease compared with healthy normal-weight individuals. Am J Physiol Gastrointest Liver Physiol 2019;316:G462–G472.
- [22]. Yang L, Li P, Yang W, et al. Integrative Transcriptome Analyses of Metabolic Responses in Mice Define Pivotal LncRNA Metabolic Regulators. Cell Metab 2016;24:627–639.
- [23]. Kelemen O, Convertini P, Zhang Z, et al. Function of alternative splicing. Gene 2013;514:1–30.
- [24]. Leung SK, Jeffries AR, Castanho I, et al. Full-length transcript sequencing of human and mouse cerebral cortex identifies widespread isoform diversity and alternative splicing. Cell Rep 2021;37:110022.
- [25]. Lorenz DA, Sathe S, Einstein JM, et al. Direct RNA sequencing enables m(6)A detection in endogenous transcript isoforms at base-specific resolution. RNA 2020;26:19–28.
- [26]. Meyer KD, Saletore Y, Zumbo P, et al. Comprehensive analysis of mRNA methylation reveals enrichment in 3’ UTRs and near stop codons. Cell 2012;149:1635–1646.
- [27]. Dominissini D, Moshitch-Moshkovitz S, Schwartz S, et al. Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature 2012;485:201–206.
- [28]. Turan N, Katari S, Coutifaris C, et al. Explaining inter-individual variability in phenotype: is epigenetics up to the challenge? Epigenetics 2010;5:16–19.
- [29]. Fair BJ, Blake LE, Sarkar A, et al. Gene expression variability in human and chimpanzee populations share common determinants. Elife 2020;9.
- [30]. Glinos DA, Garborcauskas G, Hoffman P, et al. Transcriptome variation in human tissues revealed by long-read sequencing. Nature 2022;608:353–359.
- [31]. Naugler WE, Tarlow BD, Fedorov LM, et al. Fibroblast Growth Factor Signaling Controls Liver Size in Mice With Humanized Livers. Gastroenterology 2015;149:728–740 e715.
- [32]. Tateno C, Kataoka M, Utoh R, et al. Growth hormone-dependent pathogenesis of human hepatic steatosis in a novel mouse model bearing a human hepatocyte-repopulated liver. Endocrinology 2011;152:1479–1491.
- [33]. Carbonaro M, Wang K, Huang H, et al. IL-6-GP130 signaling protects human hepatocytes against lipid droplet accumulation in humanized liver models. Sci Adv 2023;9:eadf4490.
- [34]. Workman RE, Tang AD, Tang PS, Jain M, Tyson JR, Razaghi R, et al. Nanopore native RNA sequencing of a human poly(A) transcriptome. Nat Methods 2019;16:1297–1305.
- [35]. Tang AD, Soulette CM, van Baren MJ, et al. Full-length transcript characterization of SF3B1 mutation in chronic lymphocytic leukemia reveals downregulation of retained introns. Nat Commun 2020;11:1438.
- [36]. Wick RR, Judd LM, Holt KE. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol 2019;20:129.
- [37]. Olarerin-George AO, Jaffrey SR. MetaPlotR: a Perl/R pipeline for plotting metagenes of nucleotide modifications and other transcriptomic sites. Bioinformatics 2017;33:1563–1564.
- [38]. Pertea G, Pertea M. GFF Utilities: GffRead and GffCompare. F1000Res 2020;9.
