Skip to main content
PLOS Biology logoLink to PLOS Biology
. 2024 Feb 16;22(2):e3002505. doi: 10.1371/journal.pbio.3002505

Single-cell analysis of isoform switching and transposable element expression during preimplantation embryonic development

Chaoyang Wang 1,2,3,#, Zhuoxing Shi 3,#, Qingpei Huang 1,2, Rong Liu 1,2, Dan Su 1,2, Lei Chang 1,2, Chuanle Xiao 3, Xiaoying Fan 1,2,4,*
Editor: Bon-Kyoung Koo5
PMCID: PMC10903961  PMID: 38363809

Abstract

Alternative splicing is an essential regulatory mechanism for development and pathogenesis. Through alternative splicing one gene can encode multiple isoforms and be translated into proteins with different functions. Therefore, this diversity is an important dimension to understand the molecular mechanism governing embryo development. Isoform expression in preimplantation embryos has been extensively investigated, leading to the discovery of new isoforms. However, the dynamics of isoform switching of different types of transcripts throughout the development remains unexplored. Here, using single-cell direct isoform sequencing in over 100 single blastomeres from the mouse oocyte to blastocyst stage, we quantified isoform expression and found that 3-prime partial transcripts lacking stop codons are highly accumulated in oocytes and zygotes. These transcripts are not transcription by-products and might play a role in maternal to zygote transition (MZT) process. Long-read sequencing also enabled us to determine the expression of transposable elements (TEs) at specific loci. In this way, we identified 3,894 TE loci that exhibited dynamic changes along the preimplantation development, likely regulating the expression of adjacent genes. Our work provides novel insights into the transcriptional regulation of early embryo development.


Alternative splicing is an essential regulatory mechanism for development and pathogenesis. This study uses single cell sequencing of mouse oocyte to plastocyst to reveal large-scale of isoform switching and transposable element regulation during the maternal-to-zygotic gene transition.

Introduction

A gene can be transcribed into various isoforms, which are then translated into different proteins. Isoform compositions differ between cell types and states, making isoform switching a crucial factor in determining cell identity [1,2]. Third-generation sequencing-based single-cell RNA-sequencing methods like SCAN-seq, HIT-scISOseq, and MAS-ISO-seq have been developed to directly sequence gene isoforms [1,37]. SCAN-seq is known for its high gene detection sensitivity and ability to detect many novel transcripts in rare samples [4]. However, it fails to quantify the absolute abundance of genes and isoforms due to the high error rate of Nanopore sequencing [8,9]. On the other hand, HIT-scISOseq and MAS-ISO-seq use the PacBio HiFi sequencing platform to quantify isoform abundance in single cells with improved data throughput [6,7].

Isoform switch plays an important role in cell fate determination. PBX1, for example, can be transcribed into 3 different isoforms, each with distinct functions. PBX1a maintains the pluripotency of mouse embryonic stem cells (mESCs), while PBX1b promotes differentiation [10]. Other genes such as Tcf3 and Sall4 have similar regulatory patterns in mESCs [11,12]. The molecular regulation of preimplantation embryo development has been the focus of many studies, particularly maternal to zygote transition (MZT), which is crucial for whole-body development [1316]. Although hundreds of genes have been identified in zygotic genome activation (ZGA), the functional regulators remain largely unclear, including whether isoform switching participates in the process [1719].

Transposable elements (TEs) account for approximately 46% and approximately 37.5% of the human and mouse genome, respectively [20,21], contributing to evolution and genetic regulation. They can be divided into 2 major classes based on the transposition mode: class I retrotransposon and class II DNA transposon [22,23]. Class I, which makes up about 95% of total TEs, includes long and short interspersed elements (LINEs and SINEs, respectively) and long terminal repeats (LTRs). TEs are known to play a crucial role during embryo development [24]. For instance, MERVL and MT2_mm (a truncated form of ERVL containing only the LTR domain) can serve as promoters for totipotent gene expression, and their expression has been considered an essential totipotent biomarker [18,25,26]. LINEs, particularly LINE1, have been reported to suppress the expression of totipotent genes such as Dux [27,28]. A previous study showed that the hominoid-specific transposon (SINE-VNTR-Alu) acts as an enhancer to promote the ZGA process [29]. However, due to their highly repetitive nature, it is challenging to determine the activity of TEs at the locus level with limited read length. Analyzing TE expression in specific loci is therefore important for gene transcriptional regulation.

In this study, we adapted the HIT-scISOseq method for low-throughput cell analysis and sequenced isoforms in single blastomeres of mouse preimplantation embryos [6]. We analyzed cell heterogeneity within the same embryos at the same stage, providing insights into the timing of cell fate diversification during preimplantation embryo development. Isoforms of each gene in every single cell were quantified, and different isoform types showed varied proportions across embryonic stages. Notably, a significant number of 3-prime partial transcripts (lacking stop codons and generating proteins lacking C-termini) were observed in oocytes and zygotes, but were quickly degraded at the early 2-cell (E2C) stage. Furthermore, locus-specific TEs were analyzed, revealing dynamic expression changes during embryonic development. These TEs showed high correlation with the expression of adjacent genes, indicating their potential importance in developmental events.

Results

Modified HIT-scISOseq for the mouse preimplantation embryos sequencing

To identify gene isoforms in each blastomere of mouse oocytes and preimplantation embryos, we amplified RNAs in individual cells using a 10× gel bead and the Smart-seq2 protocol [6,30]. Subsequently, the amplified cDNAs from different cells were combined for ligation and PacBio library construction following the HIT-scISOseq method [6]. Concurrently, the barcode sequence of each cell was predetermined through Sanger sequencing of the cDNAs (Fig 1A). Data on gene and isoform expression from 161 single blastomeres were collected from 3 batches, covering various developmental stages and mESC (Table 1). Each sequencing batch produced approximately 5 million circular consensus sequencing (CCS) reads in 1 SMRT Cell 8M, with an average length of around 4 kb, indicating ligation of 2 to 3 cDNA molecules in most cases. After data splitting and mapping, about 90% of the isoforms could be accurately assigned to cells (Table 1). This approach allowed for relatively deep sequencing of samples from each stage (Fig 1B). To assess the precision of measuring absolute numbers of isoforms using our method, we also amplified ERCC and SIRV spike-ins [31,32]. At the gene level, we observed high correlation values between the added molecules and the detected UMI counts (Fig 1C). At the isoform level, different isoforms of the same SIRV gene were accurately identified without any false matches (Fig 1D). These findings demonstrate that our workflow precisely measures transcript abundance in each single cell.

Fig 1. Quality evaluation of the single-cell isoform expression data.

Fig 1

(A) Diagram of the experimental and analysis workflow for single-cell isoform sequencing of mouse preimplantation embryos. (B) Saturation curve of representative cells from each stage. The raw data for this plot is supplied in S1 Data. (C) Correlation between detected UMI counts and absolute spiked abundances of each ERCC gene. The raw data for this plot is supplied in S1 Data. (D) Isoform mapping results of the SIRV spike-ins. The raw data for this plot is supplied in S1 Data.

Table 1. Quality evaluation of samples from 3 batches.

Library Batch 1 Batch 2 Batch 3
Polymerase Reads Polymerase Reads 5534582 6002833 7099439
Polymerase Yield (GB) 427.02 540.46 410.25
Polymerase Max Length 479619 493469 483234
Polymerase Mean Length 77155.42 90033.59 57786.35
Polymerase Read N50 154554 167339 130615
Subreads Subread Yield (GB) 420.1 535.33 404.78
Subreads Max Length 479619 493469 483234
Subreads Mean Length 3001.47 4482.57 3127.49
Subread N50 3557 5186 3508
CCS CCS Reads 4412788 5079207 5914414
CCS Yield (GB) 15.6 25.46 23.5
CCS Max Length 27871 27593 27810
CCS Mean Length 3534.33 5013.54 3973.65
CCS N50 Length 4216 5791 4760
CCS Mean Passes 27 19 17
CCS MeanQV 0.96 0.97 0.94
Full-length (FL) Isoform Detection All Paired 11184443 20967437 11008106
FL 10013148 19427495 9427600
Non-FL 206358 283797 283953
Unknow 964937 1256144 1296552
FL (%) 89.53 92.66 85.64
Non-FL (%) 1.85 1.35 2.58
Unknow (%) 8.63 5.99 11.78
FL MeanLen 934.75 888.57 1081.15
FL N50 1128 1044 1365
Cell Barcode (CB) Identification FLNC 10013148 19427495 9427600
CB in Whitelist 8858087 17083701 7834317
CB in Whitelist (%) 88.46 87.94 83.1
CB Correction 254281 553029 334615
CB Correction (%) 2.54 2.85 3.55
Total Corrected CB 9112368 17636730 8168932
Total Corrected CB (%) 91 90.78 86.65

Gene and isoform expression patterns in the mouse preimplantation embryos

The mouse oocyte and zygote contain a higher number of RNA molecules compared to later stage blastomeres due to maternally inherited RNA degradation [33]. The number of transcript molecules was strongly correlated (R = 0.96) with the number of expressed genes in the cells (Fig 2A). Principal component analysis (PCA) using gene expression data and isoform expression data showed that blastomeres of different stages were clearly separated (Fig 2B and 2C). The oocyte and zygote exhibited similar expression patterns; Late 2-cell (L2C) and 4-cell blastomeres were grouped together; the 8-cell, 16-cell, and 32-cell stages were similar; and the blastocyst cells were analogous to the mESCs. Stage-specific genes and transcripts were extracted, resulting in 3,867 genes and 6,819 isoforms, respectively (S1 Table). These were divided into 6 corresponding clusters based on their expression patterns across all embryonic stages (Fig 2D and 2E and S1 Table). Cluster 1 (C1) transcripts were highly abundant in oocytes and zygotes, subsequently degraded from E2C stage. Cluster 2 (C2) included transcripts that were only up-regulated in the E2C stage. Cluster 3 (C3), cluster 4 (C4), and cluster 5 (C5) transcripts were highly expressed in the L2C to 4-cell stages, 8-cell to 32-cell stages, and blastocyst stages, respectively. The mESC-specific transcripts were in cluster 6 (C6). More genes were identified in each cluster at the isoform level, and most of the isoforms were consistent with the genes (Fig 2F and S1 Table). The results indicate that single-cell isoform expression data can be used to illustrate cellular heterogeneity and distinguish different types of cells as single-cell gene expression does.

Fig 2. Gene and isoform expression in the mouse preimplantation embryos.

Fig 2

(A) Number of genes and UMIs detected in each cell at different stages. The raw data for this plot is supplied in S2 Data. (B, C) PCA plot of all the blastomeres and mESCs based on gene expression (B) and isoform expression (C). The PCA loadings are supplied in S1 Table. The raw data for these 2 plots are supplied in S2 Data. (D, E) Heatmap of stage-specific genes (D) and isoforms (E). The raw data for these 2 plots are supplied in S2 Data. (F) Venn plot of pairwise groups of stage-specific genes and isoform-corresponding genes. (G) Isoform switch during embryo development, the upper picture is the reference transcript and our sequence isoform and the lower part is the gene and main isoform expression of Cfdp1 and Cnot7. The raw data for this plot is supplied in S2 Data. (H) Number of isoform switch events between each adjacent embryonic stages. The raw data for this plot is supplied in S2 Data. (I) GO results of genes showed isoform switch during preimplantation embryo development. The raw data for this plot is supplied in S2 Data. GO, Gene Ontology; mESC, mouse embryonic stem cell; PCA, principal component analysis.

Furthermore, isoforms were found to show different expression patterns compared to the host genes. Isoform switch events largely occurred during the transition from zygote to E2C and from E2C to L2C stages, during the time windows of minor and major ZGA, respectively (Fig 2H and S1 Table). For example, Cfdp1 increased the gene expression from E2C to L2C, with its isoform PB.73528.5 showing the same pattern, but the other isoform, PB.73528.7, was largely down-regulation at the same stage (Fig 2G). The expression levels of Cnot7 gene and its isoforms PB.71257.4 and PB.71257.165 decreased from E2C to L2C, while PB.71257.450, another isoform of the gene was inversely changed (Fig 2G). Gene Ontology (GO) analysis revealed that the genes happened with isoform switch events were enriched in cytoplasmic translation, cell division, mRNA and DNA metabolic processes, etc. (Fig 2I). Functional and structural changes were identified in only 7 genes using FunFam [34,35] (S1 Fig). These results indicate that isoform switch regulation largely exists during embryonic development, especially during the ZGA process, in addition to gene expression level.

Isoform diversity decreases along preimplantation embryo development

To explore the connection between gene and isoform expression during mouse preimplantation development, we grouped genes into 6 categories based on the number of isoform types they expressed (S2A Fig). While most genes expressed only 1 type of isoform across different stages, more genes expressed multiple types of isoforms in the earlier stages. In mouse oocytes and zygotes, around 60% of genes expressed more than 1 type of isoform, and nearly 20% of genes were found with over 5 types of isoforms. In contrast, approximately 70% of genes in mESCs expressed only 1 type of isoform, and less than 5% of genes expressed more than 5 types of isoforms (S2A Fig). The same isoform expression characteristics were observed in SCAN-seq data (S2B Fig), indicating a diverse range of isoforms in early mouse embryos [4]. To rule out the possibility that this observation was caused by higher mRNA abundance in early embryos, especially in oocytes and zygotes, we performed oocyte splits. The results showed that the ratios of genes containing different numbers of isoform types were almost consistent among intact oocytes, 1/2 oocytes, and 1/4 oocytes (S2C Fig), suggesting that the detected isoform diversity was hardly affected by the amount of mRNAs. Additionally, the genes expressing more types of isoforms were detected with higher expression levels in both our data and SCAN-seq data (S2D and S2E Fig). To validate this hypothesis, we randomly selected 3 highly expressed genes (CPM > 100) and 3 lowly expressed genes (CPM < 10) in mESC to confirm their isoform diversity by reverse transcription and PCR (RT-PCR). Although there were more types of isoforms revealed by RT-PCR than the sequencing results, the highly expressed genes still showed higher isoform diversity (S2F Fig). We then assessed the isoform dominant level in each gene expressing multiple types of isoforms by calculating the ratio of the UMI number of the major isoform to the total UMI number of the corresponding gene. The major isoform ratios increased from early to late embryonic stages, especially after the ZGA process (S2G Fig). In comparison, the major isoforms accounted for 90% of most genes in mESCs, indicating a dominant isoform expression pattern and less isoform diversity in these cells.

Large abundance of 3-prime partial transcripts are observed in mouse oocytes and zygotes

Based on the putative integrity of corresponding open reading frames (ORFs), the transcripts were categorized into 5 types: complete isoforms encoding the full ORFs, 3-prime partial transcripts and 5-prime partial transcripts lacking the stop codon and start codon sections respectively, internal transcripts predicted with proteins lacking both ends, and others where the detected ORF lengths in the isoforms were below the software-set threshold [36] (Fig 3A). When compared to the annotated transcription start site (TSS), the 5-prime partial transcripts exhibited the lowest overlap ratio with the CAGE peaks (S3A Fig), indicating that some of these transcripts might be generated by incomplete reverse transcription.

Fig 3. Expression patterns of different types of transcripts during mouse preimplantation embryo development.

Fig 3

(A) Schematic diagram of different transcript types. (B) Ratios of each type of transcripts at different stages. The raw data for this plot is supplied in S3 Data. (C) Number of 3-prime partial transcripts detected at each stage. The raw data for this plot is supplied in S3 Data. (D) The GO analysis of 3-prime partial transcripts. The raw data for this plot is supplied in S3 Data. (E) Gel picture showing the isoforms by RT-PCR of Dnajc3, Sf3b2, Hsp90aa1, and Sprk1 in mouse oocytes and mESCs. The raw image for this plot is supplied in S1 Raw Images. (F) Sanger sequencing of the candidate 3-prime partial transcripts in Fig 3E. GO, Gene Ontology; mESC, mouse embryonic stem cell.

As anticipated, the complete transcripts displayed the longest lengths, while the internal transcripts were the shortest (S3B Fig). However, the predicted protein length was similar for the 3 incomplete transcript types (S3C Fig). Intriguingly, we observed that the 3-prime partial transcripts were highly expressed in oocyte and zygote, but their expression dramatically decreased from the E2C stage (Fig 3B and 3C). This expression pattern was also observed in the SCAN-seq data (S3D and S3E Fig). Subsequently, we conducted GO analysis on genes detected with 3-prime partial transcripts (S2 Table). These genes were enriched in pathways related to RNA processing, cell cycle checkpoint, ribonucleoprotein complex biogenesis, DNA metabolic process, chromatin organization, etc. (Fig 3D), all of which are known to play essential roles in mouse and human preimplantation embryo development [13,16,3740].

We then selected some candidates to validate the enrichment of 3-prime partial transcripts. The host genes related to RNA processing (Sf3b2, Srpk1) and protein translation and transporting (Dnajc3, Hsp90aa1) were revealed by gel analysis of mouse oocyte RT-PCR products (Fig 3E). Furthermore, no stop codons were identified in these transcripts according to the Sanger sequencing results (Fig 3F).

Expression of Ncl showed significantly isoform switch during embryonic development

The Nucleolin gene encodes NCL, which is involved in various cellular processes such as ribosome biogenesis, chromatin organization and stability, and DNA and RNA metabolism [41]. It also regulates totipotent genes expression with KAP1 and LINE1 [27,28]. Isoform sequencing revealed that the Ncl gene encodes 6 isoform types, categorized based on the number of RNA recognition motifs (RRMs) they contain (Fig 4A). RT-PCR showed that the short isoform (Ncl-S-350) was more abundant than the complete isoform (Ncl-FL-71) in mouse oocyte (Fig 4B). The 2 of most enriched short isoforms were confirmed as 3-prime partial isoform by Sanger sequencing (Fig 4C). Ncl abundance was first down-regulated in the E2C stage and then increased from the L2C stage at the gene level (Fig 4D). The 2 categories of short Ncl isoforms were highly expressed in oocytes and zygotes and then almost disappeared. Conversely, the complete Ncl isoform showed lower expression in maternal RNA and was largely up-regulated after ZGA (Fig 4D). Our findings highlight the isoform switch of the Ncl gene during the ZGA process, which is masked in gene-level analysis.

Fig 4. Isoform expression pattern of Ncl during mouse preimplantation embryonic development.

Fig 4

(A) Schematic diagram of Ncl isoforms. (B) Gel picture (left) and Q-sep result (right) of Ncl RT-PCR products from mouse oocyte. The raw image for this plot is supplied in S1 Raw Images. (C) The Sanger sequencing results of the top 2 enriched short Ncl isoforms. (D) Expression levels of each category of Ncl isoforms and the Ncl gene at different stages. The raw data for this plot is supplied in S4 Data. (E) The relative ratios of Ncl full-length isoforms at different stages detected by RT-qPCR. The relative ratio of Ncl-FL/All in morula was set as 1.0. The morula had 1 sample, while other stages all had 3 replicates. The raw data for this plot is supplied in S4 Data.

To demonstrate this result, we performed reverse transcription and real-time quantitative PCR (RT–qPCR) using primers targeting all the Ncl isoform types or only the complete type, respectively, and calculated the relative percentages of the full-length isoform in different stages of mouse preimplantation embryos. As expected, less than 20% of the Ncl transcripts are complete in oocytes when we set the full-length relative ratio as 100% at the morula stage (Fig 4E). This result confirmed the dynamic isoform switch during embryo development in vivo.

TEs are dynamically activated during embryonic development

Due to the repetitive and interspersed features of TE sequences and their transcripts, TGS-based sequencing is more suitable for TE research [42,43]. Our long-read and highly accurate results can help us investigate TE expression at specific loci. TE expression was quantified in each single cell. Generally, the amount of TE RNAs belonging to different super-families decreased along embryonic stages (Fig 5A). Although maternal RNA contains the largest pool of TE elements, zygotes were detected with more TE RNA copies, suggesting TE as an important regulator to promote minor ZGA (Fig 5A). Additionally, TE expression elevated from 32-cell to blastocyst stage, indicating that TEs play a role in embryonic pluripotent stem cells. Our single-cell direct isoform sequencing data enabled mapping the TE reads to specific loci confidently, and we also calculated the number of expressed TE loci at each stage. Hundreds of TE loci were transiently transcribed at the zygote stage, further supporting the deduction that TEs regulate minor ZGA (Fig 5B). More active TE loci were detected in blastocysts than morulae, also indicating the important role of TEs in embryonic pluripotent stem cells (Fig 5B).

Fig 5. Dynamic expression patterns of TEs during mouse preimplantation embryo development.

Fig 5

(A) UMI counts of each TE superfamily detected in single cells at each stage. The raw data for this plot is supplied in S5 Data. (B) Number of expressed TE loci per cell at each stage. The raw data for this plot is supplied in S5 Data. (C) Expression level of each TE locus (upper panel) and all TE counts (lower panel) belonging to different classes in each stage, satellite was also included. Each dot represents a cell. Only active loci in each cell were calculated. The raw data for this plot is supplied in S5 Data. (D) Expression heatmap of stage specific TEs (n = 3,894 unique loci). The TEs were ordered by their averaged expression levels within each stage (also see S3 Table). The raw data for this plot is supplied in S5 Data. (E) The top5 subfamilies of ERVL expressed in clusters C1–C3 in Fig 5D. (F) The top5 subfamilies of LINE1 expressed in clusters C1–C3 in Fig 5D. (G) Correlation values between expression levels of representative stage-specific LINE1 and ERVL loci (averaged UMI counts >5) and their adjacent genes (<10 kb distance). Non-adjacent genes are defined as genes over 1 Mb away from the TE loci. The raw data for this plot is supplied in S5 Data. (H) Heatmap showing examples of consistent expressions between specific TE loci and their adjacent genes along developmental stages. The raw data for this plot is supplied in S5 Data. LINE, long interspersed element; TE, transposable element.

We further calculated the TE expression level according to total reads or single locus mapped reads belonging to each TE superfamily in each single cell (Fig 5C). In both calculation ways, the oocyte and zygote were detected with higher expression levels. However, more different expression patterns were observed between total TE superfamily expression and individual TE locus expression. For example, LINE was slightly down-regulated from E2C to 32-cell stage when looking at the superfamily, but the transcription level at each locus was up-regulated. Differently, the total LTR expression transiently increased at L2C stage, but each locus was detected with lower expression level (Fig 5C). These differences still exist when calculating in each TE family (S4A and S4B Fig). The ERVL, which has been proven to be associated with totipotent genes’ activation [18,25,26], is indeed the highest expressed in L2C samples. However, each active ERVL site in the L2C genome did not express the most copies of corresponding RNAs. This indicates that more ERVL sites are transiently active to regulate a large scale of major ZGA genes at L2C stage (S4A and S4B Fig). Gaining information on locus-specific TE expression may help us to gain a deeper understanding of how TEs regulate different developmental processes.

To study the role of specific TEs at distinct locations, not categorized in a family or subfamily, in regulating preimplantation embryo development, we sought out TE loci with stage-specific expression patterns. We identified a total of 3,894 TE loci, which could be classified into 5 clusters based on their expression patterns across all embryonic stages (Fig 5D and S3 Table). Specifically, Cluster 1 (C1) TEs exhibited higher expression in oocytes and zygotes. Cluster 2 (C2) TEs were predominantly expressed in the E2C stage. Cluster 3 (C3) and Cluster 4 (C4) TEs showed high expression in the L2C to 4-cell stages and 8-cell to blastocyst stages, respectively. The mESC-specific TEs were grouped in Cluster 5 (C5).

Subsequently, we explored the involvement of ERVL and LINE1 subfamilies in the MZT process and later stage development [18,2527] (Figs 5E, 5F, S4C and S4D). As anticipated, MERVL-int and MT2_mm, which only contain the LTR promoter of MERVL element, were the primary active MERVL subtypes in C3, a stage during which major totipotent genes are expressed [19,44] (Fig 5E). Conversely, MT2B1 was the most active subtype in the maternal genome. The LINE1 superfamily has been reported to silence totipotent genes such as Dux [27]. In our data, Lx7 emerged as the most active LINE1 subfamily since the L2C stage (Figs 5F and S4D).

A TE subfamily consists of hundreds to thousands of TE copies from different loci, and these copies are transcribed independently. We observed diverse activation of different TE copies even within the same subfamily (S5A and S5B Fig). For example, although MERVL-int and MT2_mm participate in ZGA, some copies from chromosome 1 and chromosome 5 were exclusively active in the maternal genome (S5A Fig). Most of the LINE1 copies belonging to different subfamilies were actively transcribed in oocytes, but more Lx7 copies became active only after the E2C stage (S5B Fig). The expression level of TE loci showed a notably higher correlation with that of adjacent genes (Fig 5G). We observed consistent expression of specific LINE1 and ERVL loci with adjacent genes, indicating common active regulations for different TE families (Fig 5H). As TEs can function as transcriptional regulatory elements such as enhancers, detecting locus-specific TE expression provides more insights into understanding the regulatory mechanism of preimplantation embryo development [45].

Discussion

Single-cell sequencing has significantly advanced our research in preimplantation embryo development. We have identified genes, isoforms, and TE elements that are specific to different developmental stages, revealing a rich diversity of isoforms in the maternal contents. By modifying the HIT-scISO-seq to a low-throughput method, we were able to track cells from the same embryo during the experiment. This single-cell approach not only illustrated differences across developmental stages but also highlighted cell heterogeneity within a specific stage, allowing for the evaluation of fate differentiation of blastomeres within an embryo.

The 2 blastomeres in the 2-cell embryo showed high consistency in both gene and isoform expression levels, but became increasingly different from each other from the 4-cell stage (S5C and S5D Fig), that fate divergence could be revealed at the 4-cell stage on the transcriptional level. When comparing the heterogeneity between cells from different embryos at the same stage, the correlation values also decreased along with developmental stages. The decreased cell–cell correlation at the zygote stage likely resulted from batch effects of different embryo replicates. However, the low-throughput approach, affected by the low-throughput and high cost of TGS, also makes it difficult to obtain a large sample size, especially since the cells collected in the blastocyst stage are insufficient to illustrate intra-embryo heterogeneity. We believe that future studies using high-throughput approaches are more powerful in fully specifying the transcript regulations in cell fate specification at the blastocyst stage.

TGS-based isoform sequencing, at the bulk or single-cell level, has annotated an abundance of novel transcripts and splicing events in preimplantation embryos [4,17,22]. However, it remains unclear how different types of isoforms regulate the developmental process. In the present study, by dividing the transcripts into subtypes according to their coding characteristics, we found a large number of 3-prime partial transcripts, which lack stop codons, in mouse MII oocyte and zygote (Fig 3B, 3C and 3E). This type of transcript has been extensively studied in cancer and is considered an oncogenic factor [46]. In early embryos, these transcripts might be important for the MZT process, as the host genes are highly enriched in biological processes responsible for mouse and human preimplantation embryo development (Fig 3D). Further studies are needed to resolve the generation and function of these transcripts.

TEs are the main components of the mammalian genome. However, their biological function is still largely unclear, and most of them were previously regarded as parasites or “junk DNA” [27]. Although some TE classes have been investigated in preimplantation embryos, previous studies almost exclusively used NGS-based analysis, only revealing the TEs at the subfamily level [22,27,47,48]. Our long-read direct isoform sequencing directly quantifies the TE transcription from different loci (Figs 5 and S3). Our data displays more detailed TE expression dynamics, which helps us to investigate these genome “dark matters” in more detail.

TEs had been reported to regulate gene expression by different ways. For example, LINE1 has the ability to increase the chromatin accessibility [21,27,49], while MERVL and MT2_mm can derive totipotent gene expression in 2-cell and 4-cell embryo [18,25,26]. Therefore, exploring how dynamic expression of locus-specific TEs regulates gene expression requires further investigation. On the other hand, abnormal expression of TEs in most differentiated tissues is harmful to humans. For instance, LINE1 overexpression is highly related to cancers such as gastric cancer and lung squamous cell carcinoma [5052]. HERVK overexpression is related to aging in mouse, monkey, and human [53]. Therefore, measuring TE expression at the locus-specific level offers a new way to decode the mechanisms in various human diseases.

Beyond just TE expression, several studies have identified transcript isoforms where TEs are used as alternative promoters for gene expression [18,5457]. We also attempted to find TE chimeric transcripts in the mouse preimplantation embryos. A total of 6,143 TE chimeric transcripts were identified, with over 80% only detected in 1 cell with 1 copy (S4 Table). One reason could be the low expression levels of these transcripts. Considering the limited sequencing depth from the TGS in this study, it is difficult to fully detect these transcripts. Another reason might be the current bioinformatic methods, which are not suitable for analyzing the TE chimeric transcripts. As the recent study by Berrens and colleagues [22] also captured quite a limited number of TE-derived isoforms in each cell (approximately 200 for the mouse 2-cell sample and approximately 20 for human iPSCs). The increase of TGS throughput and the development of bioinformatics would help us to further explore on such interesting regulations.

Materials and methods

Ethics statement

All animal experiments were performed according to the guidelines of the Institutional Animal Care and the Ethics Committee of the Guangzhou Institutes of Biomedicine and Health (Guangzhou, China). The research license number is IACUC2020113.

Animals and single blastomere collection

We used 6- to 8-week-old C57BL/6J female mice and DBA/2NCrl male mice in the experiment. The female mice were first injected with 7.5 IU of pregnant mare’s serum gonadotropin (PMSG) (Ningbo SanSheng Biological Technology, Cat. 110044564) and with 7.5 IU of human chorionic gonadotropin (hCG) (Ningbo SanSheng Biological Technology, Cat. 50030248) after 46 to 48 h injected. After mating, the embryos of each stage were collected at defined time periods after hCG administration [58]: 20 h (MII oocyte, no mating), 22 to 24 h (zygote), 30 to 32 h (early 2cell), 46 to 48 h (late 2cell), 54 to 56 h (4cell), 68 to 70 h (8-cell), 78 to 80 h (16cell to 32cell), and 88 to 90 h (early blastocyst). All animal experiments were performed according to the guidelines of the Guangzhou Institutes of Biomedicine and Health (Guangzhou, China). Collection of single blastomeres at each stage was carried out as previously described [4].

Single-cell cDNA amplification and TGS library construction for PacBio sequencing

We used the same amplification procedure as SCAN-seq [4], except for changing the reverse transcription primer with a 10× gel bead for each reaction for the embryonic samples. Then, each pre-amplification product was purified by 0.6× Ampure XP beads (Beckman, Cat. A63882). The concentration was measured using Qubit dsDNA HS and BR Assay Kits (Invitrogen, Cat. Q32854). The PCR product from about 60 blastomeres which were confirmed of effective amplification were pooled together in proportion to the number of amplified cycles. We took 100 ng of the pooled cDNA to build PacBio sequencing library following the protocol of HIT-scISOseq [6] and sequenced for 1 cell with HiFi mode.

Cell barcode sequence identification by Sanger sequencing

About 2 ng of the pre-amplified cDNA of each cell was further amplified using 2 × Taq Plus Master Mix (Vazyme, Cat. P212), and then cloned into T vector (Transgen, Cat. CT111-01). Next, the ligated plasmid transferred into Trans5α chemically competent cell (Transgen, Cat. CD201-01) by heat shock. The M13 primer were used to identify positive clones inserted with cDNA fragments. Single clones of bacteria were collected for Sanger sequencing to identify the barcode sequence of each cell.

Mouse ES cell culture

Mouse E14 Tg2A (E14) ES cells (male) were used for all experiments. The mESCs were cultured on 0.1% gelatin-coated plates in ES-FBS culture medium as previously described [27].

Validation of the 3-prime partial transcripts

The oocytes or the mESCs RNA were reverse transcription using oligo-dT primer (AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTT). Then, the anchor sequence of oligo-dT primer and a primer located in the start codon of the interested gene (Ncl: ATGGTGAAGCTCGCAAAGGC; Dnajc3: ATGGTGGCCCCCGGCTCGGTG; Sf3b2: ATGGCGGCGGAGCATCCCGAACCT; Hsp90aa1: ATGCCTGAGGAAACCCAGACCCA; Srpk1: ATGGAGCGGAAAGTGCTCGCGCT) were used to amplify all isoforms containing the 5′ sites. The PCR products were first checked on 1.5% agarose gel. The candidate gel bands were recovered and the sequences were confirmed by Sanger sequencing.

Validation isoform diversity of highly and lower expressed genes

The mESCs RNA were reverse transcription using oligo-dT primer as previous mentioned. Then, the forward primer that could distinguish most diverse isoforms was used to amplification target genes (Srsf7: ATGTCACGCTACGGGCGGTA; Rps5: CTGTCTGTATCAGGGCGGCG; Rps19: TTTCCCCTGGCTGGCAGCGC; Plp2: ATGGCGGATTCTGAGCGTCT; Mrps6: ATGCCCCGCTACGAGTTGGC; Ss18l2: ATGTCTGTCATCTTCGCTCCTG). All the reverse primer used was oligo-dT. The PCR products were checked on 1.5% agarose gel.

Single-cell isoform sequencing data processing

We used stand-alone versions of SMRT-Link (version 8.0.0.80529) software package to transform raw Subreads to Calling Circular Consensus Sequencing (CCS) reads with the following parameters: “—min-passes 0—min-length 50—max-length 21000—min-rq 0.75.” After CCS calling, we used HIT-scISOseq analysis software kit scISA-Tools (https://github.com/shizhuoxing/scISA-Tools) for Full-Length Non-Concatemer (FLNC) reads identification, cell barcode and UMI extraction and correction.

Alignment and generation of single-cell gene expression matrix

After trimming the primers, cell barcodes, UMIs, and polyA tails, the remaining FLNC sequences were aligned to mouse genome (10x Genomics pre-build mouse mm10 reference dataset: refdata-gex-mm10-2020-A) using minimap2 (version 2.17-r974-dirty) in spliced alignment mode with the following parameters: “-ax splice -uf—secondary = no -C5.” Then, we used gffcompare (version 0.11.6) to assign the mapped FLNCs to mm10 annotation gene models (10x Genomics pre-build mouse mm10 reference dataset: refdata-gex-mm10-2020-A) base on FLNCs genome alignment SAM file. Next, based on the identified cell barcodes, we used scGene_matrix utility of scISA-Tools to generate the single-cell gene expression matrix. The expression values were normalized as copy number per 100,000 mapped reads (CPM/10).

SIRV data evaluation

The FLNC reads were aligned to the SIRVome using minimap2 (version 2.17-r974-dirty) with the following parameters: “-ax splice -uf—MD—sam-hit-only.” We only annotated the reads with assigned barcodes and valid UMIs. Then, we used gffcompare (version 0.11.6) to assign the mapped FLNCs to SIRVome annotation GTF (SIRV-Set4) base on FLNCs SIRVome alignment SAM file. A confusion matrix was generated with the counts of FLNCs assigned to the primary SIRV isoforms or not using an in-house script.

Nonredundant isoforms classification and quality assessment

First, the “collapse_isoforms_by_sam.py” python script in cDNA_Cupcake software package (https://github.com/Magdoll/cDNA_Cupcake) was used to collapse mapped FLNCs to nonredundant isoforms with parameters: “—dun-merge-5-shorter.” After that, we used SQANTI3 (https://github.com/ConesaLab/SQANTI3). To assess whether transcripts are within known TSSs, we aligned them using the CAGE peak data (mouse.refTSS_v3.1.mm10.bed) provided with the “—CAGE_peak” parameter in SQANTI3. We further used SQANTI3 “RulesFilter” script to filtered artifact isoforms. Isoforms which classified as FSM, ISM, NIC, and NNC were kept for downstream analysis.

Isoform type classification by ORF prediction

After being processed with SQANTI3, the FASTA file of mapped genome sequences were extracted according to the SQANTI3 output GTF file. Base on the FASTA file, we used TransDecoder (v5.5.0) for ORF extraction and prediction. For those predicted with multiple ORFs, the longest ones were selected as the representative ORF. TransDecoder assigns each detected isoform as one of 4 types based on whether then contains the start and stop codon of the reference ORF: complete, 5-prime partial, 3-prime partial, and internal. Additionally, we assigned the isoforms that did not mapped to an ORF region by TransDecoder as others.

Generation of single-cell isoform expression matrix

After the SQANTI3 procedure, the scIsoform_matrix utility of scISA-Tools was used to generate single-cell isoform expression matrix based on the identified cell barcodes. We further filtered isoforms detected in less than 5 cells and finally 68,012 isoforms in the mouse embryonic samples were preserved.

PCA analysis based on gene and isoform expression

Before PCA dimensionality reduction, we used “FindVariableGenes()” function in Seurat R package to select the top 1,000 highly variable genes and isoforms, respectively. Then, the “PCA()” function in FactoMineR was used for dimension reduction process and we used the function “fviz_pca_ind()” in factoextra R package to plot the PCA map.

Stage-specific genes and isoforms

Based on the gene expression matrix and isoform expression matrix, respectively, we used “edgeR” to find differentially expressed genes/transcripts between each pair of adjacent embryonic stages under the criterion of logFC>1 and p-value <0.01. A total of 3,867 and 6,819 stage-specific genes and transcripts were identified, respectively. These genes and transcripts were clustered into 6 groups according to their expression patterns across all stages. Visualization of these genes’ and transcripts’ expression was done using R package “pheatmap.”

SCAN-seq data processing

We downloaded the SCAN-seq data available from the Sequence Read Archive (SRA) database (accession number: PRJNA616184). Following the described data processing steps of SCAN-seq. Briefly, nanoplexer (https://github.com/hanyue36/nanoplexer/) was used to demultiplex barcode for each cell in the library, and nanofilt (v2.5.0) was used for filtering low-quality reads (qscore <7) and short reads (length <100 bp), then Pychopper (v2.3) (https://github.com/nanoporetech/pychopper) was used to extract full-length reads.

After obtaining the full-length reads, we generated the gene expression matrix and isoform expression matrix using the same procedure as we did for HIT-scISOseq data.

Isoform switch analysis

Based on the isoform expression matrix, we used “IsoformSwitchAnalyzeR” to identify switch isoforms between each pair of adjacent embryonic stages under the criterion of isoform_switch_q_value<0.05 and gene_switch_q_value<0.01.

TE expression analysis

To quantify TE at the locus level, we first aligned all FLNC (full-length non-chimeric) reads to the mm10 genome. Based on the TE annotation file obtained from UCSC (http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/rmsk.txt.gz), we calculated the overlap between each uniquely aligned FLNC read and TE loci. Subsequently, we applied filtering criteria: the starting or ending position of FLNC alignment to the genome must fall within an overlapping TE locus, and only the TE locus with the longest overlap length was considered for quantitative counting of the same FLNC.

Following the above steps, we performed aggregate counting based on the cell barcode sequences corresponding to each FLNC, enabling us to obtain quantitative expression measurements of TE loci at the single-cell level.

To quantify the expression of TE-associated chimeric transcripts, we developed an in-house script. Firstly, this script extracts the chimeric alignments from FLNC. Secondly, it employs a method to identify TE sites that overlap with FLNC, as described in the unique TE mapping method above. Thirdly, we align the chimeric-mapped FLNC with the protein-coding gene positions from the reference annotation, aggregate these alignment results, and thus determine the expression quantification (UMI counts) of protein-coding genes linked to each TE locus.

Supporting information

S1 Fig. Isoform switch caused functional/structural changes of genes.

Expression pattern of gene isoforms which showed switch during preimplantation embryo development. The exact functional/structural domains predicted in each isoform are annotated based on the CATH database (http://cathdb.info/search/by_sequence).

(EPS)

pbio.3002505.s001.eps (1.6MB, eps)
S2 Fig. Relationship between gene and isoform expression.

(A, B) The ratios of genes detected with different numbers of isoform types for each stage of mouse embryos and mESCs in this study (A) and SCAN-seq data (B). The raw data for these 2 plots are supplied in S6 Data. (C) The ratios of genes detected with different numbers of isoform types for full oocyte, 1/2 oocyte, and 1/4 oocyte. The raw data for this plot is supplied in S2 Data. (D, E) Expression levels of genes detected with different numbers of isoform types for each stage of mouse embryos and mESCs in this study (D) and SCAN-seq data (E). The raw data for these 2 plots are supplied in S6 Data. (F) Gel view of cDNA amplification products of each gene. Srsf7, Rps5, and Rps19 are examples of highly expressed genes (CPM >100) and Plp2, Mrps6, and Ssl8l2 are lowly expressed genes (CPM <10). The raw image for this plot is supplied in S1 Raw Images. (G) Density plot showing the proportion of the major isoforms in genes expressing multiple isoform types. Only genes detected with UMI counts over 5 were included. The raw data for this plot is supplied in S6 Data.

(TIF)

pbio.3002505.s002.tif (2.4MB, tif)
S3 Fig. The characteristics of different types of transcripts.

(A) Ratios of the transcripts overlapped with annotated TSS. Transcripts with the 5-terminal locating within 200 bp of the CAGE peaks are regarded as overlapped transcripts. The raw data for this plot is supplied in S7 Data. (B) Length distribution of different types of transcripts. The raw data for this plot is supplied in S7 Data. (C) Relative length of the predicted protein to the complete reference ORF of each type of transcript. The raw data for this plot is supplied in S7 Data. (D) Ratios of each type of transcript at different stages calculated using SCAN-seq data. The raw data for this plot is supplied in S7 Data. (E) Expression level of the 3-prime partial transcripts detected at each stage in SCAN-seq data. The raw data for this plot is supplied in S7 Data.

(EPS)

pbio.3002505.s003.eps (1.6MB, eps)
S4 Fig. Characteristics of TE expression during preimplantation embryo development.

(A) Expression level of all TE counts belonging to different families in each stage. Each dot represents a cell. The raw data for this plot is supplied in S5 Data. Expression level of each TE locus (lower panel) belonging to different families in each stage. Only active loci in each cell were calculated. Each dot represents a cell. The raw data for this plot is supplied in S5 Data. (B) The top 5 subfamilies of ERVL expressed in clusters C4–C5 in Fig 5E. (C) The top 5 subfamilies of LINE1 expressed in clusters C4–C5 in Fig 5F.

(EPS)

pbio.3002505.s004.eps (5.1MB, eps)
S5 Fig. The loci expression of TEs and cell heterogeneity along developmental stages.

(A) The loci expression of stage specific TEs (mean expression >1 log count across all cells) belonging to ERVL family. The raw data for this plot is supplied in S8 Data. (B) The loci expression of stage specific TEs (mean expression >1 log count across all cells) belonging to LINE1 family. The raw data for this plot is supplied in S8 Data. (C) Correlation coefficients of blastomeres within the same embryos or different embryos at the same stage base on gene expression data. The raw data for this plot is supplied in S8 Data. (D) Correlation coefficients of blastomeres within the same embryos or different embryos at the same stage base on isoform expression data. The raw data for this plot is supplied in S8 Data.

(EPS)

pbio.3002505.s005.eps (3.9MB, eps)
S1 Table. The list of stage-specific genes and stage-specific isoforms.

(XLSX)

pbio.3002505.s006.xlsx (714.4KB, xlsx)
S2 Table. The expression matrix of 3-prime partial transcripts.

(XLSX)

S3 Table. The expression matrix of TE loci.

(XLSX)

pbio.3002505.s008.xlsx (3.6MB, xlsx)
S4 Table. The list of TE chimeric genes.

(XLSX)

pbio.3002505.s009.xlsx (360.5KB, xlsx)
S1 Raw Images. Raw images.

(PDF)

pbio.3002505.s010.pdf (7.7MB, pdf)
S1 Data. Raw data for Fig 1.

(XLSX)

pbio.3002505.s011.xlsx (54.4KB, xlsx)
S2 Data. Raw data for Fig 2.

(XLSX)

pbio.3002505.s012.xlsx (12.8MB, xlsx)
S3 Data. Raw data for Fig 3.

(XLSX)

pbio.3002505.s013.xlsx (20.7KB, xlsx)
S4 Data. Raw data for Fig 4.

(XLSX)

pbio.3002505.s014.xlsx (24.5KB, xlsx)
S5 Data. Raw data for Figs 5 and S4.

(XLSX)

pbio.3002505.s015.xlsx (32.2MB, xlsx)
S6 Data. Raw data for S2 Fig.

(XLSX)

pbio.3002505.s016.xlsx (53.4MB, xlsx)
S7 Data. Raw data for S3 Fig.

(XLSX)

pbio.3002505.s017.xlsx (14MB, xlsx)
S8 Data. Raw data for S5 Fig.

(XLSX)

pbio.3002505.s018.xlsx (345.5KB, xlsx)

Acknowledgments

We thank Man Zhang (Guangzhou Laboratory) for teaching the methods to collect the preimplant embryos from mice. We thank Boyan Huang (Guangzhou Laboratory) for providing with the mESC cells. We thank Xining Li and Enze Deng (Guangzhou Laboratory) for helping analysis of some preliminary RNA-seq data.

Abbreviations

CCS

circular consensus sequencing

E2C

early 2-cell

GO

Gene Ontology

hCG

human chorionic gonadotropin

L2C

Late 2-cell

LINE

long interspersed element

LTR

long terminal repeat

mESC

mouse embryonic stem cell

MZT

maternal to zygote transition

ORF

open reading frame

PCA

principal component analysis

PMSG

pregnant mare’s serum gonadotropin

RRM

RNA recognition motif

SINE

short interspersed element

SRA

Sequence Read Archive

TE

transposable element

TSS

transcription start site

ZGA

zygotic genome activation

Data Availability

All single-cell isoform sequencing data generated in this study are available at NCBI Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) under the accession number GSE250381. Code availability The HIT-scISOseq analysis pipeline and source code are available from https://github.com/shizhuoxing/scISA-Tools. Additionally, the source code utilized in this study has been published at: https://zenodo.org/records/10394889.

Funding Statement

This work was supported by grants from the National Key Research and Development Program of China (2020YFA0112201 to XF), the National Natural Science Foundation of China (32071451 to XF), the Guangdong Provincial Pearl River Talents Program (2021QN02Y747 to XF), and the R&D Program of Guangzhou National Laboratory (SRPG21-001 to XF). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Lebrigand K, Magnone V, Barbry P, Waldmann R. High throughput error corrected Nanopore single cell transcriptome sequencing. Nat Commun. 2020;11(1):4025. Epub 2020/08/14. doi: 10.1038/s41467-020-17800-6 ; PubMed Central PMCID: PMC7423900. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Joglekar A, Prjibelski A, Mahfouz A, Collier P, Lin S, Schlusche AK, et al. A spatially resolved brain region- and cell type-specific isoform atlas of the postnatal mouse brain. Nat Commun. 2021;12(1):463. Epub 2021/01/21. doi: 10.1038/s41467-020-20343-5 ; PubMed Central PMCID: PMC7815907. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Oguchi Y, Ozaki Y, Abdelmoez MN, Shintaku H. NanoSINC-seq dissects the isoform diversity in subcellular compartments of single cells. Sci Adv. 2021;7(15):eabe0317. doi: 10.1126/sciadv.abe0317 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Fan X, Tang D, Liao Y, Li P, Zhang Y, Wang M, et al. Single-cell RNA-seq analysis of mouse preimplantation embryos by third-generation sequencing. PLoS Biol. 2020;18(12):e3001017. Epub 2020/12/31. doi: 10.1371/journal.pbio.3001017 ; PubMed Central PMCID: PMC7773192. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Lebrigand K, Bergenstråhle J, Thrane K, Mollbrink A, Meletis K, Barbry P, et al. The spatial landscape of gene expression isoforms in tissue sections. Nucleic Acids Res. 2023;51(8):e47–e. doi: 10.1093/nar/gkad169 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Shi ZX, Chen ZC, Zhong JY, Hu KH, Zheng YF, Chen Y, et al. High-throughput and high-accuracy single-cell RNA isoform analysis using PacBio circular consensus sequencing. Nat Commun. 2023;14(1):2631. Epub 2023/05/07. doi: 10.1038/s41467-023-38324-9 ; PubMed Central PMCID: PMC10164132. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Al’Khafaji AM, Smith JT, Garimella KV, Babadi M, Popic V, Sade-Feldman M, et al. High-throughput RNA isoform sequencing using programmed cDNA concatenation. Nat Biotechnol. 2023. doi: 10.1038/s41587-023-01815-7 [DOI] [PubMed] [Google Scholar]
  • 8.Liu Z, Roberts R, Mercer TR, Xu J, Sedlazeck FJ, Tong W. Towards accurate and reliable resolution of structural variants for clinical diagnosis. Genome Biol. 2022;23(1):68. Epub 2022/03/05. doi: 10.1186/s13059-022-02636-8 ; PubMed Central PMCID: PMC8892125. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Lu H, Giordano F, Ning Z. Oxford Nanopore MinION Sequencing and Genome Assembly. Genomics Proteomics Bioinformatics. 2016;14(5):265–79. Epub 2016/10/31. doi: 10.1016/j.gpb.2016.05.004 ; PubMed Central PMCID: PMC5093776. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Xu Y, Zhao W, Olson SD, Prabhakara KS, Zhou X. Alternative splicing links histone modifications to stem cell fate decision. Genome Biol. 2018;19(1):133. Epub 2018/09/16. doi: 10.1186/s13059-018-1512-3 ; PubMed Central PMCID: PMC6138936. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Salomonis N, Schlieve CR, Pereira L, Wahlquist C, Colas A, Zambon AC, et al. Alternative splicing regulates mouse embryonic stem cell pluripotency and differentiation. Proc Natl Acad Sci U S A. 2010;107(23):10514–9. Epub 2010/05/26. doi: 10.1073/pnas.0912260107 ; PubMed Central PMCID: PMC2890851. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Rao S, Zhen S, Roumiantsev S, McDonald LT, Yuan GC, Orkin SH. Differential roles of Sall4 isoforms in embryonic stem cell pluripotency. Mol Cell Biol. 2010;30(22):5364–80. Epub 2010/09/15. doi: 10.1128/MCB.00419-10 ; PubMed Central PMCID: PMC2976381. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Jukam D, Shariati SAM, Skotheim JM. Zygotic Genome Activation in Vertebrates. Dev Cell. 2017;42(4):316–32. Epub 2017/08/23. doi: 10.1016/j.devcel.2017.07.026 ; PubMed Central PMCID: PMC5714289. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Tadros W, Lipshitz HD. The maternal-to-zygotic transition: a play in two acts. Development. 2009;136(18):3033–42. Epub 2009/08/25. doi: 10.1242/dev.033183 . [DOI] [PubMed] [Google Scholar]
  • 15.Vastenhouw NL, Cao WX, Lipshitz HD. The maternal-to-zygotic transition revisited. Development. 2019;146(11). Epub 2019/06/14. doi: 10.1242/dev.161471 . [DOI] [PubMed] [Google Scholar]
  • 16.Zhao LW, Zhu YZ, Wu YW, Pi SB, Shen L, Fan HY. Nuclear poly(A) binding protein 1 (PABPN1) mediates zygotic genome activation-dependent maternal mRNA clearance during mouse early embryonic development. Nucleic Acids Res. 2022;50(1):458–72. Epub 2021/12/15. doi: 10.1093/nar/gkab1213 ; PubMed Central PMCID: PMC8855302. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Qiao Y, Ren C, Huang S, Yuan J, Liu X, Fan J, et al. High-resolution annotation of the mouse preimplantation embryo transcriptome using long-read sequencing. Nat Commun. 2020;11(1):2653. Epub 2020/05/29. doi: 10.1038/s41467-020-16444-w ; PubMed Central PMCID: PMC7253418. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Macfarlan TS, Gifford WD, Driscoll S, Lettieri K, Rowe HM, Bonanomi D, et al. Embryonic stem cell potency fluctuates wi t h endogenous retrovirus activity. Nature. 2012;487(7405):57–63. Epub 2012/06/23. doi: 10.1038/nature11244 ; PubMed Central PMCID: PMC3395470. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Yang M, Yu H, Yu X, Liang S, Hu Y, Luo Y, et al. Chemical-induced chromatin remodeling reprograms mouse ESCs to totipotent-like stem cells. Cell Stem Cell. 2022;29(3):400–18 e13. Epub 2022/02/11. doi: 10.1016/j.stem.2022.01.010 . [DOI] [PubMed] [Google Scholar]
  • 20.Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, et al. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420(6915):520–62. Epub 2002/12/06. doi: 10.1038/nature01262 . [DOI] [PubMed] [Google Scholar]
  • 21.Fadloun A, Le Gras S, Jost B, Ziegler-Birling C, Takahashi H, Gorab E, et al. Chromatin signatures and retrotransposon profiling in mouse embryos reveal regulation of LINE-1 by RNA. Nat Struct Mol Biol. 2013;20(3):332–8. Epub 2013/01/29. doi: 10.1038/nsmb.2495 . [DOI] [PubMed] [Google Scholar]
  • 22.Berrens RV, Yang A, Laumer CE, Lun ATL, Bieberich F, Law CT, et al. Locus-specific expression of transposable elements in single cells with CELLO-seq. Nat Biotechnol. 2022;40(4):546–54. Epub 2021/11/17. doi: 10.1038/s41587-021-01093-1 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Geis FK, Goff SP. Silencing and Transcriptional Regulation of Endogenous Retroviruses: An Overview. Viruses. 2020;12(8). Epub 2020/08/23. doi: 10.3390/v12080884 ; PubMed Central PMCID: PMC7472088. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Chuong EB, Elde NC, Feschotte C. Regulatory activities of transposable elements: from conflicts to benefits. Nat Rev Genet. 2017;18(2):71–86. doi: 10.1038/nrg.2016.139 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Peaston AE, Evsikov AV, Graber JH, de Vries WN, Holbrook AE, Solter D, et al. Retrotransposons regulate host genes in mouse oocytes and preimplantation embryos. Dev Cell. 2004;7(4):597–606. Epub 2004/10/08. doi: 10.1016/j.devcel.2004.09.004 . [DOI] [PubMed] [Google Scholar]
  • 26.Eckersley-Maslin MA, Svensson V, Krueger C, Stubbs TM, Giehr P, Krueger F, et al. MERVL/Zscan4 Network Activation Results in Transient Genome-wide DNA Demethylation of mESCs. Cell Rep. 2016;17(1):179–92. Epub 2016/09/30. doi: 10.1016/j.celrep.2016.08.087 ; PubMed Central PMCID: PMC5055476. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Percharde M, Lin CJ, Yin Y, Guan J, Peixoto GA, Bulut-Karslioglu A, et al. A LINE1-Nucleolin Partnership Regulates Early Development and ESC Identity. Cell. 2018;174(2):391–405 e19. Epub 2018/06/26. doi: 10.1016/j.cell.2018.05.043 ; PubMed Central PMCID: PMC6046266. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Sun Z, Yu H, Zhao J, Tan T, Pan H, Zhu Y, et al. LIN28 coordinately promotes nucleolar/ribosomal functions and represses the 2C-like transcriptional program in pluripotent stem cells. Protein Cell. 2021. Epub 2021/08/01. doi: 10.1007/s13238-021-00864-5 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Yu H, Chen M, Hu Y, Ou S, Yu X, Liang S, et al. Dynamic reprogramming of H3K9me3 at hominoid-specific retrotransposons during human preimplantation development. Cell Stem Cell. 2022;29(7):1031–50.e12. Epub 2022/07/09. doi: 10.1016/j.stem.2022.06.006 . [DOI] [PubMed] [Google Scholar]
  • 30.Picelli S, Bjorklund AK, Faridani OR, Sagasser S, Winberg G, Sandberg R. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat Methods. 2013;10(11):1096–8. Epub 2013/09/24. doi: 10.1038/nmeth.2639 . [DOI] [PubMed] [Google Scholar]
  • 31.Hardwick SA, Chen WY, Wong T, Deveson IW, Blackburn J, Andersen SB, et al. Spliced synthetic genes as internal controls in RNA sequencing experiments. Nat Methods. 2016;13(9):792–8. Epub 2016/08/10. doi: 10.1038/nmeth.3958 . [DOI] [PubMed] [Google Scholar]
  • 32.Paul L, Kubala P, Horner G, Ante M, Holländer I, Alexander S, et al. SIRVs: Spike-In RNA Variants as External Isoform Controls in RNA-Sequencing. bioRxiv. 2016:080747. doi: 10.1101/080747 [DOI] [Google Scholar]
  • 33.Molè MA, Weberling A, Zernicka-Goetz M. Comparative analysis of human and mouse development: From zygote to pre-gastrulation. Curr Top Dev Biol. 2020;136:113–38. Epub 2020/01/22. doi: 10.1016/bs.ctdb.2019.10.002 . [DOI] [PubMed] [Google Scholar]
  • 34.Lewis TE, Sillitoe I, Dawson N, Lam SD, Clarke T, Lee D, et al. Gene3D: Extensive prediction of globular domains in proteins. Nucleic Acids Res. 2018;46(D1):D1282. Epub 2017/12/02. doi: 10.1093/nar/gkx1187 ; PubMed Central PMCID: PMC5753360. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Sillitoe I, Bordin N, Dawson N, Waman VP, Ashford P, Scholes HM, et al. CATH: increased structural coverage of functional space. Nucleic Acids Res. 2020;49(D1):D266–D73. doi: 10.1093/nar/gkaa1079 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Saha S, Matthews DA, Bessant C. High throughput discovery of protein variants using proteomics informed by transcriptomics. Nucleic Acids Res. 2018;46(10):4893–902. Epub 2018/05/03. doi: 10.1093/nar/gky295 ; PubMed Central PMCID: PMC6007231. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Xue Z, Huang K, Cai C, Cai L, Jiang CY, Feng Y, et al. Genetic programs in human and mouse early embryos revealed by single-cell RNA sequencing. Nature. 2013;500(7464):593–7. Epub 2013/07/31. doi: 10.1038/nature12364 ; PubMed Central PMCID: PMC4950944. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Yang Y, Wang L, Han X, Yang WL, Zhang M, Ma HL, et al. RNA 5-Methylcytosine Facilitates the Maternal-to-Zygotic Transition by Preventing Maternal mRNA Decay. Mol Cell. 2019;75(6):1188–202 e11. Epub 2019/08/11. doi: 10.1016/j.molcel.2019.06.033 . [DOI] [PubMed] [Google Scholar]
  • 39.Sha QQ, Zhu YZ, Li S, Jiang Y, Chen L, Sun XH, et al. Characterization of zygotic genome activation-dependent maternal mRNA clearance in mouse. Nucleic Acids Res. 2020;48(2):879–94. Epub 2019/11/30. doi: 10.1093/nar/gkz1111 ; PubMed Central PMCID: PMC6954448. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Fu X, Zhang C, Zhang Y. Epigenetic regulation of mouse preimplantation embryo development. Curr Opin Genet Dev. 2020;64:13–20. Epub 2020/06/22. doi: 10.1016/j.gde.2020.05.015 ; PubMed Central PMCID: PMC7641911. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Jia W, Yao Z, Zhao J, Guan Q, Gao L. New perspectives of physiological and pathological functions of nucleolin (NCL). Life Sci. 2017;186:1–10. doi: 10.1016/j.lfs.2017.07.025 [DOI] [PubMed] [Google Scholar]
  • 42.Lanciano S, Cristofari G. Measuring and interpreting transposable element expression. Nat Rev Genet. 2020;21(12):721–36. doi: 10.1038/s41576-020-0251-y [DOI] [PubMed] [Google Scholar]
  • 43.Shahid S, Slotkin RK. The current revolution in transposable element biology enabled by long reads. Curr Opin Plant Biol. 2020;54:49–56. Epub 2020/02/03. doi: 10.1016/j.pbi.2019.12.012 . [DOI] [PubMed] [Google Scholar]
  • 44.Shen H, Yang M, Li S, Zhang J, Peng B, Wang C, et al. Mouse totipotent stem cells captured and maintained through spliceosomal repression. Cell. 2021;184(11):2843–59 e20. Epub 2021/05/16. doi: 10.1016/j.cell.2021.04.020 . [DOI] [PubMed] [Google Scholar]
  • 45.Song M, Pebworth MP, Yang X, Abnousi A, Fan C, Wen J, et al. Cell-type-specific 3D epigenomes in the developing human cortex. Nature. 2020;587(7835):644–9. Epub 2020/10/16. doi: 10.1038/s41586-020-2825-4 ; PubMed Central PMCID: PMC7704572. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Mohanan NK, Shaji F, Koshre GR, Laishram RS. Alternative polyadenylation: An enigma of transcript length variation in health and disease. WIREs RNA. 2021;13(1). doi: 10.1002/wrna.1692 [DOI] [PubMed] [Google Scholar]
  • 47.Zhang W, Chen F, Chen R, Xie D, Yang J, Zhao X, et al. Zscan4c activates endogenous retrovirus MERVL and cleavage embryo genes. Nucleic Acids Res. 2019;47(16):8485–501. Epub 2019/07/16. doi: 10.1093/nar/gkz594 ; PubMed Central PMCID: PMC7145578. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Franke V, Ganesh S, Karlic R, Malik R, Pasulka J, Horvat F, et al. Long terminal repeats power evolution of genes and gene expression programs in mammalian oocytes and zygotes. Genome Res. 2017;27(8):1384–94. Epub 2017/05/20. doi: 10.1101/gr.216150.116 ; PubMed Central PMCID: PMC5538554. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Jachowicz JW, Bing X, Pontabry J, Bošković A, Rando OJ, Torres-Padilla M-E. LINE-1 activation after fertilization regulates global chromatin accessibility in the early mouse embryo. Nat Genet. 2017;49(10):1502–10. doi: 10.1038/ng.3945 [DOI] [PubMed] [Google Scholar]
  • 50.Shigaki H, Baba Y, Watanabe M, Murata A, Iwagami S, Miyake K, et al. LINE-1 hypomethylation in gastric cancer, detected by bisulfite pyrosequencing, is associated with poor prognosis. Gastric Cancer. 2013;16(4):480–7. Epub 2012/11/28. doi: 10.1007/s10120-012-0209-7 ; PubMed Central PMCID: PMC3824342. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Rodriguez-Martin B, Alvarez EG, Baez-Ortega A, Zamora J, Supek F, Demeulemeester J, et al. Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition. Nat Genet. 2020;52(3):306–19. Epub 2020/02/07. doi: 10.1038/s41588-019-0562-0 ; PubMed Central PMCID: PMC7058536. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Zhang R, Zhang F, Sun Z, Liu P, Zhang X, Ye Y, et al. LINE-1 Retrotransposition Promotes the Development and Progression of Lung Squamous Cell Carcinoma by Disrupting the Tumor-Suppressor Gene FGGY. Cancer Res. 2019;79(17):4453–65. Epub 2019/07/11. doi: 10.1158/0008-5472.CAN-19-0076 . [DOI] [PubMed] [Google Scholar]
  • 53.Liu X, Liu Z, Wu Z, Ren J, Fan Y, Sun L, et al. Resurrection of endogenous retroviruses during aging reinforces senescence. Cell. 2023;186(2):287–304 e26. Epub 2023/01/08. doi: 10.1016/j.cell.2022.12.017 . [DOI] [PubMed] [Google Scholar]
  • 54.Modzelewski AJ, Shao W, Chen J, Lee A, Qi X, Noon M, et al. A mouse-specific retrotransposon drives a conserved Cdk2ap1 isoform essential for development. Cell. 2021;184(22):5541–58.e22. Epub 2021/10/14. doi: 10.1016/j.cell.2021.09.021 ; PubMed Central PMCID: PMC8787082. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Fueyo R, Judd J, Feschotte C, Wysocka J. Roles of transposable elements in the regulation of mammalian transcription. Nat Rev Mol Cell Biol. 2022;23(7):481–97. Epub 2022/03/02. doi: 10.1038/s41580-022-00457-y ; PubMed Central PMCID: PMC10470143. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Wang J, Xie G, Singh M, Ghanbarian AT, Raskó T, Szvetnik A, et al. Primate-specific endogenous retrovirus-driven transcription defines naive-like stem cells. Nature. 2014;516(7531):405–9. Epub 2014/10/16. doi: 10.1038/nature13804 . [DOI] [PubMed] [Google Scholar]
  • 57.Tang WW, Dietmann S, Irie N, Leitch HG, Floros VI, Bradshaw CR, et al. A Unique Gene Regulatory Network Resets the Human Germline Epigenome for Development. Cell. 2015;161(6):1453–67. Epub 2015/06/06. doi: 10.1016/j.cell.2015.04.053 ; PubMed Central PMCID: PMC4459712. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Du Z, Zheng H, Huang B, Ma R, Wu J, Zhang X, et al. Allelic reprogramming of 3D chromatin architecture during early mammalian development. Nature. 2017;547(7662):232–5. Epub 2017/07/14. doi: 10.1038/nature23263 . [DOI] [PubMed] [Google Scholar]

Decision Letter 0

Richard Hodge

25 Jun 2023

Dear Dr Fan,

Thank you for submitting your manuscript entitled "Single-cell direct isoform sequencing reveals isoform switch and locus specific transposable elements regulation during preimplantation embryo development" for consideration as a Research Article by PLOS Biology. Please accept my sincere apologies for the delay in getting back to you as we consulted with an academic editor about your submission.

Your manuscript has now been evaluated by the PLOS Biology editorial staff, as well as by an academic editor with relevant expertise, and I am writing to let you know that we would like to send your submission out for external peer review.

However, before we can send your manuscript to reviewers, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please login to Editorial Manager where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire.

Once your full submission is complete, your paper will undergo a series of checks in preparation for peer review. After your manuscript has passed the checks it will be sent out for review. To provide the metadata for your submission, please Login to Editorial Manager (https://www.editorialmanager.com/pbiology) within two working days, i.e. by Jun 27 2023 11:59PM.

If your manuscript has been previously peer-reviewed at another journal, PLOS Biology is willing to work with those reviews in order to avoid re-starting the process. Submission of the previous reviews is entirely optional and our ability to use them effectively will depend on the willingness of the previous journal to confirm the content of the reports and share the reviewer identities. Please note that we reserve the right to invite additional reviewers if we consider that additional/independent reviewers are needed, although we aim to avoid this as far as possible. In our experience, working with previous reviews does save time.

If you would like us to consider previous reviewer reports, please edit your cover letter to let us know and include the name of the journal where the work was previously considered and the manuscript ID it was given. In addition, please upload a response to the reviews as a 'Prior Peer Review' file type, which should include the reports in full and a point-by-point reply detailing how you have or plan to address the reviewers' concerns.

During the process of completing your manuscript submission, you will be invited to opt-in to posting your pre-review manuscript as a bioRxiv preprint. Visit http://journals.plos.org/plosbiology/s/preprints for full details. If you consent to posting your current manuscript as a preprint, please upload a single Preprint PDF.

Feel free to email us at plosbiology@plos.org if you have any queries relating to your submission.

Kind regards,

Richard

Richard Hodge, PhD

Senior Editor, PLOS Biology

rhodge@plos.org

PLOS

Empowering researchers to transform science

Carlyle House, Carlyle Road, Cambridge, CB4 3DN, United Kingdom

ORCiD I plosbio.org I @PLOSBiology I Blog

California (U.S.) corporation #C2354500, based in San Francisco

Decision Letter 1

Richard Hodge

26 Jul 2023

Dear Dr Fan,

Thank you for your patience while your manuscript "Single-cell direct isoform sequencing reveals isoform switch and locus specific transposable elements regulation during preimplantation embryo development" was peer-reviewed at PLOS Biology. Please accept my apologies for the delays that you have experienced during the peer review process. Your manuscript has now been evaluated by the PLOS Biology editors, an Academic Editor with relevant expertise, and by three independent reviewers.

In light of the reviews, which you will find at the end of this email, we would like to invite you to revise the work to thoroughly address the reviewers' reports.

As you will see, the reviewers are generally positive about the findings but raise overlapping concerns that the manuscript lacks some analyses to control for technical artefacts and to provide additional biological relevance for the findings. In addition, Reviewer #3 notes that additional functional insight into the roles of specific isoforms in driving embryo development should be provided, as well as lineage tracing data to investigate the functional role of transposable elements. After discussions with the Academic Editor, we agree with the reviewer that these insights would strengthen the manuscript and would be required to consider a revised version at the journal.

Given the extent of revision needed, we cannot make a decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is likely to be sent for further evaluation by all or a subset of the reviewers.

We expect to receive your revised manuscript within 3 months. Please email us (plosbiology@plos.org) if you have any questions or concerns, or would like to request an extension.

At this stage, your manuscript remains formally under active consideration at our journal; please notify us by email if you do not intend to submit a revision so that we may withdraw it.

**IMPORTANT - SUBMITTING YOUR REVISION**

Your revisions should address the specific points made by each reviewer. Please submit the following files along with your revised manuscript:

1. A 'Response to Reviewers' file - this should detail your responses to the editorial requests, present a point-by-point response to all of the reviewers' comments, and indicate the changes made to the manuscript.

*NOTE: In your point-by-point response to the reviewers, please provide the full context of each review. Do not selectively quote paragraphs or sentences to reply to. The entire set of reviewer comments should be present in full and each specific point should be responded to individually, point by point.

You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.

2. In addition to a clean copy of the manuscript, please also upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Revised Article with Changes Highlighted" file type.

*Re-submission Checklist*

When you are ready to resubmit your revised manuscript, please refer to this re-submission checklist: https://plos.io/Biology_Checklist

To submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.

Please make sure to read the following important policies and guidelines while preparing your revision:

*Published Peer Review*

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*PLOS Data Policy*

Please note that as a condition of publication PLOS' data policy (http://journals.plos.org/plosbiology/s/data-availability) requires that you make available all data used to draw the conclusions arrived at in your manuscript. If you have not already done so, you must include any data used in your manuscript either in appropriate repositories, within the body of the manuscript, or as supporting information (N.B. this includes any numerical values that were used to generate graphs, histograms etc.). For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5

*Blot and Gel Data Policy*

We require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare them now, if you have not already uploaded them. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Thank you again for your submission to our journal. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Richard

Richard Hodge, PhD

Senior Editor, PLOS Biology

rhodge@plos.org

------------------------------------

REVIEWS:

Reviewer #1: Wang, Shi et al present a study to observe full length transcripts using nanopore sequencing in murine preimplantation embryos on a single cell level. This technique aids the identification of transcript isoforms as well as transposable elements. The authors conclude on a switch in isoform diversity and dynamics, which coincides with zygotic genome activation (ZGA). Additionally, the technique allows for the observation of transposable elements (TEs) on a family as well as locus specific level.

Although the authors present an interesting adaptation of the SMART-seq2 technique for low input samples, the manuscript seems to lack several key analyses and controls, as well as biological interpretation of their findings as described in more detail below. Additionally, several minor comments are listed below to improve the clarity of the text.

Major comments

1. In line 182-184 the authors state that genes expressing more isoforms are among highly expressed genes. However, this could be caused by a sampling artifact considering that at this sequence depth it is more likely to detect more isoforms of genes that are highly transcribed compared to genes that are expressed at lower levels. The authors should confirm the lower isoform diversity of lowly expressed genes by comparing RT-PCR and sanger sequencing results of high and low expressed genes as was done for other findings in their study.

2. In line 209-213, the authors discuss isoform candidates that are truncated at the 3' end. They state (as a biological finding) that there are no isoforms detected without poly (A) tail. This conclusion is in my view, incorrect, considering that the technique used (Smart-seq2) uses a poly-A-based RT approach and thus requires the transcript to have a poly(A) tail (as a consequence, SMART-seq-2 cannot naturally detect transcripts without poly(A) tails).

3. Along these lines, the authors claim that isoforms are much more frequently truncated at the 5' end. Can the authors confirm that this is indeed a biological finding and not instead a result of incomplete reverse transcription?

4. In figure 4B a DNA gel is shown accompanied by the statement that shorter isoforms are more abundant than the full-length transcript. Although this may very well be the case, a DNA gel can be hard to interpret for this purpose as longer fragments will appear brighter than shorter fragments due to the nucleotide staining. The authors should consider confirming this finding by bioanalyzer or by other more quantitative techniques.

5. It is a novel and interesting finding that DNA transposons are found in this dataset to be expressed to a similar degree as LINEs, SINEs and LTRs. However, this finding does not get explained or highlighted by the authors. This should be added to the discussion.

6. According to the methods, for the TE analysis, only transcripts fully mapped to the TE sequence are taken into consideration. However, several studies (among which a recent study by Modzelewski, Cell 2021) have identified transcript isoforms where TEs are used as an alternative promoter for gene expression. It would be highly relevant for the authors to investigate chimeric transcripts containing both genic and TE regions in their dataset.

Minor comments

1. The text should be thoroughly checked for spelling and grammar mistakes for clarity and correctness. Several examples of spelling errors (incomplete list);

a. Line 74; "Calss1" should be "class I".

b. Line 85; "Lous" should be "locus".

c. Figure 5b + legend "Expressed TE locus" should be "Expressed TE loci".

d. In the methods in line 419 the authors state that expression values are normalized by copy number per 100,000 mapped reads. However, CPM is typically per 1,000,000 mapped reads.

2. Line 77 introduces MERVL and MT2_Mm. It should be made clear the MT2_Mm is the LTR of the internal (originally coding) sequence of MERVL for a better understanding of the author's statement.

3. On several occasions the text is currently missing references. For example;

a. Line 82-85; there are several reviews that discuss this problem and computational tools/wet lab techniques that address this problem that should be cited here.

b. Line 132 after the statement that the mouse oocyte/zygote contains more RNA molecules compared to later stages.

c. Line 181 a reference is missing for this public (?) SCAN-seq dataset and method.

d. Line 286; there are several reviews that have highlighted this particular challenge of TE biology that can be cited (e.g. Lanciano and Cristofari, Nature Reviews Genetics 2020)

4. All figure legends and labels should be improved for clarity. In particular;

a. Figure 2 lacks information on how the rows in heatmap D-E are ordered within each cluster.

b. Figure 5a lacks information on what the y-axis reflects (actual raw counts? Normalized counts? Number of TEs that were identified or number of counts that are inside TEs?).

c. Figure 5C legends also does not contain information on whether the dots represent individual blastomeres or TE loci.

d. The legend of Figure 5D states that unique loci are mapped (n=3894), but this number would make more sense if the heatmap shows unique TE families (e.g. MT2_Mm). Is this correct?

e. Similar to figure 2, it is unclear how the rows of the heatmaps in figure 5 (5D, 5G and 5H) are ordered/clustered. This should be mentioned in the figure legend. Additionally, it should be highlighted in the legend that Figure 5G-H only show examples of individual loci and does not reflect all insertions of a specific family.

5. Line 303-304; The statement "Our data shows that Lx7 …" should either be toned down or moved to the discussion as this is a very strong statement without experimental proof for support.

6. Line 352; "On the other hand, abnormal expression of TE is harmful to human"; it should be specified that this is the case for most differentiated tissues, not during embryo development.

Reviewer #2 (Jong Kyoung Kim, signs review): Summary: In this manuscript, the authors developed a protocol for long-read single-cell RNA sequencing to quantify the expression levels of isoforms and locus-specific transposable elements in single cells. The protocol, modified from SCAN-seq (using nanopore sequencing) to use PacBio HiFi sequencing, was applied to mouse oocytes, zygotes and blastomeres. They characterized the dynamic isoform switching patterns and locus-specific expression of transposable elements (TEs) during mouse embryonic development.

The present manuscript is potentially important because it provides a valuable single-cell resource for understanding the complete landscape of gene expression during early embryonic development by profiling individual isoforms and locus-specific TEs. However, there are several major points that should be addressed to substantiate the authors' claims.

Major points:

1. The authors used SIRV spike-ins to evaluate the accuracy of quantifying isoform expression levels. For the SIRV spike-ins, they should cite the bioRxiv preprint by Paul et al., (https://doi.org/10.1101/080747). Fig. 1D does not provide any information on the accuracy of measuring isoform expression levels. The authors should provide this information by comparing the true and estimated isoform expression levels per cell, and by showing the relationship between technical noise (squared CV) and the mean isoform expression levels. I'm also wondering if the accuracy and technical noise of quantifying isoform expression in single cells are affected by the total amount of mRNAs in the cells. This can be done by evaluating the accuracy and technical noise of isoform quantification for each developmental stage, since the total amount of mRNAs per cell would decrease from oocytes to blastomeres.

2. Fig. S1A and B: Since the number of isoform types is strongly influenced by the expression levels of genes (Fig. S1C and D), the rich isoform diversity observed in early mouse embryos may be an artefact by their increased total amount of mRNAs per cell. The authors should show that technical replicates of a cell from early embryos with reduced total amount of mRNAs have similar isoform diversity. The technical replicates of a cell can be obtained by splitting a cell lysate into multiple aliquots. The same also applies to Fig. 3C.

3. A list of genes showing dynamic isoform switching during early embryonic development should be provided as a supplementary table, and their shared biological properties should be systematically characterized and discussed.

4. Fig. 5G and H: I'm wondering if neighboring genes of TE copies with similar expression patterns show more correlated expression during embryonic development. The role of TEs as putative regulators should be further characterized.

5. The discussion section seems truncated and abruptly ended. The limitations of the proposed protocol and the current study should be discussed.

Minor comments:

1. The terms "3' prime" and "5' prime" through the manuscript and figures should be replaced with "3'" (or "3-prime") and "5'" (or "5-prime"), respectively.

2. Many typos in the manuscript (e.g. "down regulated" in line 156, "showes" in line 241, and "is consists of" in line 305)

3. Line 278-279: The sentence is not matched with Fig. 5C.

Reviewer #3: In this manuscript, the authors present a novel approach utilizing HIT-scISOseq, a hifi long-read sequencing technique from PacBio, to investigate isoform switching during embryo development in mouse preimplantation embryos. The authors have previously published a method called SCAN-seq (Fan X et al. PLoS Biol. 2020) for long-read single-cell RNA sequencing in mouse preimplantation embryos. In this study, they aim to enhance transcript abundance estimation by incorporating unique molecular identifiers (UMIs) and spike-in correction and evaluate the capability of the modified platform to identify isoforms.

The sequencing component of this study introduces two noteworthy advancements. Firstly, the authors successfully adapt the HIT-scISOseq method for low-throughput applications, enabling the quantification and identification of isoforms associated with not only alternative splicing but transposable elements. Secondly, they classify transcripts into various types, such as 3' prime partial, 5' prime partial, complete, and insertion, and perform quantitative analyses to assess changes in these categories.

However, it remains challenging to discern the specific benefits of this approach in revealing or supporting biological themes related to embryo development. Consequently, I recommend some revisions to enhance the impact and citation potential of this paper.

Major points

1. The number of blastomeres used in this study appears to be limited, with only 161 cells analyzed from various stages of preimplantation embryos. This sample size may not provide sufficient statistical power to discern cellular differences within each stage. Additionally, the rationale behind utilizing single-cell RNA sequencing instead of bulk-long read sequencing technology is not clearly explained. It would be beneficial to clarify the necessity of a single-cell-based approach in this study. In particular, the selection of only 13 blastocyst cells raises concerns about their representativeness, as these cells appear to be more clustered compared to cells from other stages (as shown in Figure 2B and 2C).

2. Isoforms can play distinct roles based on various factors such as changes in expression levels, structural conformation, loss of enzymatic activity, and rewiring of protein interactions. To provide a more comprehensive understanding of isoform dynamics and their implications in reprogramming during development, the authors should conduct further analyses. One approach could involve leveraging protein residue-level function annotation tools, such as FunFam from Christine A. Orengo, to identify functional dissimilarities among isoforms. By employing these tools, the authors may gain insights into the specific isoforms driving the developmental stages of preimplantation embryos. Integrating such analyses into the study would enhance our understanding of the functional implications of isoform switching during embryo development.

3. An important consideration is that even when using different embryos, the presence of inserted transposable elements (TEs) should be preserved if these TEs play a crucial role in development. Therefore, the identification of functional TEs can offer an opportunity for lineage tracing across different stages of development. This aspect underscores the significance of employing a single-cell-based approach. By characterizing the expression and activity of TEs at the single-cell level, the authors can potentially unravel their functional roles and trace the lineage of cells across different developmental stages. This approach would provide valuable insights into the impact of TEs on developmental processes, further justifying the use of a single-cell-based approach in this study.

Minor points

- Please show loadings when you make PCA plots in figure 2B and 2C.

- Please add more information for figure captions and legends (e.g. -1 to 1 in 2D).

- Typo: Page 10 line 74 calss1 -> class1

Decision Letter 2

Richard Hodge

7 Dec 2023

Dear Dr Fan,

Thank you for your patience while we considered your revised manuscript "Single-cell direct isoform sequencing reveals isoform switch and locus specific transposable elements regulation during preimplantation embryo development" for publication as a Research Article at PLOS Biology. Please accept my apologies for the delays that you have experienced during this round of the peer review process. This revised version of your manuscript has been evaluated by the PLOS Biology editors, the Academic Editor and the original reviewers.

Based on the reviews, I am pleased to say that we are likely to accept this manuscript for publication, provided you satisfactorily address the remaining points raised by Reviewer #1. In addition, we would strongly encourage you to enlist the services of a professional editing service or a native English-speaking colleague to improve the overall quality of the writing in the manuscript at this stage.

Please also make sure to address the following data and other policy-related requests that I have provided below (A-I):

(A)After discussions with the editorial team, we think your manuscript would be a better fit as a Resource Article. During resubmission, we would be grateful if you could please tick ‘Methods and Resources’ as the article type in the drop down menu.

(B)We would like to suggest the following modification to the title, to make it more compelling and accessible for our broad readership:

“Single-cell analysis of isoform switching and transposable element expression during preimplantation embryonic development”

(C) In addition, we would also like to suggest the following edits to the Abstract to improve the quality of the writing. In addition, we note that the Abstract is lacking a conclusion sentence that we think should be added at this stage. The conclusion sentence would highlight the relevance of the work and/or describe what future work the dataset now enables for instance.

“Alternative splicing is an essential regulatory mechanism for development and pathogenesis. Through alternative splicing one gene can encode multiple isoforms and be translated into proteins with different functions. Therefore, this diversity is an important dimension to understand the molecular mechanisms governing embryo development. Isoform expression in preimplantation embryos has been extensively investigated, leading to the discovery of new isoforms. However, the dynamics of isoform switching of different types of transcripts throughout development remains unexplored. Here, using single-cell direct isoform sequencing in over one hundred single blastomeres from the mouse oocyte to blastocyst stage, we quantified isoform expression and found that 3-prime partial transcripts lacking stop codons are highly accumulated in oocytes and zygotes. These transcripts are not transcription by-products and might play a role in the maternal to zygote transition (MZT) process. Long-read sequencing also enabled us to determine the expression of transposable elements (TEs) at specific loci. In this way, we identified 3894 TE loci that exhibited dynamic changes during preimplantation development, likely regulating the expression of adjacent genes.”

(D) You may be aware of the PLOS Data Policy, which requires that all data be made available without restriction: http://journals.plos.org/plosbiology/s/data-availability. For more information, please also see this editorial: http://dx.doi.org/10.1371/journal.pbio.1001797

Note that we do not require all raw data. Rather, we ask that all individual quantitative observations that underlie the data summarized in the figures and results of your paper be made available in one of the following forms:

-Supplementary files (e.g., excel). Please ensure that all data files are uploaded as 'Supporting Information' and are invariably referred to (in the manuscript, figure legends, and the Description field when uploading your files) using the following format verbatim: S1 Data, S2 Data, etc. Multiple panels of a single or even several figures can be included as multiple sheets in one excel file that is saved using exactly the following convention: S1_Data.xlsx (using an underscore).

-Deposition in a publicly available repository. Please also provide the accession code or a reviewer link so that we may view your data before publication.

Regardless of the method selected, please ensure that you provide the individual numerical values that underlie the summary data displayed in the following figure panels as they are essential for readers to assess your analysis and to reproduce it:

Figure 1B-D, 2A-E, 2G-I, 3B-C, 4D-E, 5A-D, 5G-H, S2A-E, S2G, S3A-E, S4A-B, S5A-D

NOTE: the numerical data provided should include all replicates AND the way in which the plotted mean and errors were derived (it should not present only the mean/average values).

(E) We ask that you please deposit the scRNA-seq dataset in a public data repository, such as the GEO (https://www.ncbi.nlm.nih.gov/geo/). Please provide the accession number/URL of the deposition in the Data Availability Statement and ensure that the deposition is made publicly available at this stage.

(F) Please also ensure that each of the relevant figure legends in your manuscript include information on *WHERE THE UNDERLYING DATA CAN BE FOUND*, and ensure your supplemental data file/s has a legend.

(G) We require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in the following Figures:

Figure 3E, 4B, S3F

We will require these files before a manuscript can be accepted so please prepare and upload them now. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements

(H) Thank you already providing the underlying code for the HIT-scISO-seq analysis pipeline in Github (https://github.com/shizhuoxing/scISA-Tools). However, we ask that you please attach this deposition to the Zenodo repository (https://zenodo.org/) to ensure long-term maintenance of the deposition and that it is given a DOI.

(I) Please ensure that your Data Statement in the submission system accurately describes where your data can be found and is in final format, as it will be published as written there.

------------------------------------------------------------------------

As you address these items, please take this last chance to review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the cover letter that accompanies your revised manuscript.

We expect to receive your revised manuscript within two weeks.

To submit your revision, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' to find your submission record. Your revised submission must include the following:

- a cover letter that should detail your responses to any editorial requests, if applicable, and whether changes have been made to the reference list

- a Response to Reviewers file that provides a detailed response to the reviewers' comments (if applicable, if not applicable please do not delete your existing 'Response to Reviewers' file.)

- a track-changes file indicating any changes that you have made to the manuscript.

NOTE: If Supporting Information files are included with your article, note that these are not copyedited and will be published as they are submitted. Please ensure that these files are legible and of high quality (at least 300 dpi) in an easily accessible file format. For this reason, please be aware that any references listed in an SI file will not be indexed. For more information, see our Supporting Information guidelines:

https://journals.plos.org/plosbiology/s/supporting-information

*Published Peer Review History*

Please note that you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*Press*

Should you, your institution's press office or the journal office choose to press release your paper, please ensure you have opted out of Early Article Posting on the submission form. We ask that you notify us as soon as possible if you or your institution is planning to press release the article.

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please do not hesitate to contact me should you have any questions.

Kind regards,

Richard

Richard Hodge, PhD

Senior Editor, PLOS Biology

rhodge@plos.org

------------------------------------------------------------------------

Reviewer remarks:

Reviewer #1: The authors have added quite some more analyses to TEs (as myself and the other reviewers had suggested), in particular the neighboring gene analysis.

Overall, they addressed the comments I raised fully. However, I again strongly suggest the authors and the editorial team to improve the text for grammar correctness and clarity. Additionally, one minor suggestion is that I think the authors made a mistake in the label of figure 5C. As far as I understand the figure shows TE expression on a locus specific level in the upper panel and on a subfamily level on the lower level, however the legend states the opposite.

Reviewer #2 (Jong Kyoung Kim, signs review): The authors have satisfactorily addressed my comments.

Decision Letter 3

Richard Hodge

18 Jan 2024

Dear Dr Fan,

Thank you for the submission of your revised Methods and Resources entitled "Single-cell analysis of isoform switching and transposable element expression during preimplantation embryonic development" for publication in PLOS Biology. On behalf of my colleagues and the Academic Editor, Bon-Kyoung Koo, I am delighted to let you know that we can in principle accept your manuscript for publication, provided you address any remaining formatting and reporting issues. These will be detailed in an email you should receive within 2-3 business days from our colleagues in the journal operations team; no action is required from you until then. Please note that we will not be able to formally accept your manuscript and schedule it for publication until you have completed any requested changes.

Please take a minute to log into Editorial Manager at http://www.editorialmanager.com/pbiology/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production process.

PRESS

We frequently collaborate with press offices. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximise its impact. If the press office is planning to promote your findings, we would be grateful if they could coordinate with biologypress@plos.org. If you have previously opted in to the early version process, we ask that you notify us immediately of any press plans so that we may opt out on your behalf.

We also ask that you take this opportunity to read our Embargo Policy regarding the discussion, promotion and media coverage of work that is yet to be published by PLOS. As your manuscript is not yet published, it is bound by the conditions of our Embargo Policy. Please be aware that this policy is in place both to ensure that any press coverage of your article is fully substantiated and to provide a direct link between such coverage and the published work. For full details of our Embargo Policy, please visit http://www.plos.org/about/media-inquiries/embargo-policy/.

Many congratulations and thanks again for choosing PLOS Biology for publication and supporting Open Access publishing. We look forward to publishing your study. 

Sincerely, 

Ines

--

Ines Alvarez-Garcia, PhD

Senior Editor

PLOS Biology

on behalf of

Richard Hodge, PhD

Senior Editor

PLOS Biology

rhodge@plos.org

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Isoform switch caused functional/structural changes of genes.

    Expression pattern of gene isoforms which showed switch during preimplantation embryo development. The exact functional/structural domains predicted in each isoform are annotated based on the CATH database (http://cathdb.info/search/by_sequence).

    (EPS)

    pbio.3002505.s001.eps (1.6MB, eps)
    S2 Fig. Relationship between gene and isoform expression.

    (A, B) The ratios of genes detected with different numbers of isoform types for each stage of mouse embryos and mESCs in this study (A) and SCAN-seq data (B). The raw data for these 2 plots are supplied in S6 Data. (C) The ratios of genes detected with different numbers of isoform types for full oocyte, 1/2 oocyte, and 1/4 oocyte. The raw data for this plot is supplied in S2 Data. (D, E) Expression levels of genes detected with different numbers of isoform types for each stage of mouse embryos and mESCs in this study (D) and SCAN-seq data (E). The raw data for these 2 plots are supplied in S6 Data. (F) Gel view of cDNA amplification products of each gene. Srsf7, Rps5, and Rps19 are examples of highly expressed genes (CPM >100) and Plp2, Mrps6, and Ssl8l2 are lowly expressed genes (CPM <10). The raw image for this plot is supplied in S1 Raw Images. (G) Density plot showing the proportion of the major isoforms in genes expressing multiple isoform types. Only genes detected with UMI counts over 5 were included. The raw data for this plot is supplied in S6 Data.

    (TIF)

    pbio.3002505.s002.tif (2.4MB, tif)
    S3 Fig. The characteristics of different types of transcripts.

    (A) Ratios of the transcripts overlapped with annotated TSS. Transcripts with the 5-terminal locating within 200 bp of the CAGE peaks are regarded as overlapped transcripts. The raw data for this plot is supplied in S7 Data. (B) Length distribution of different types of transcripts. The raw data for this plot is supplied in S7 Data. (C) Relative length of the predicted protein to the complete reference ORF of each type of transcript. The raw data for this plot is supplied in S7 Data. (D) Ratios of each type of transcript at different stages calculated using SCAN-seq data. The raw data for this plot is supplied in S7 Data. (E) Expression level of the 3-prime partial transcripts detected at each stage in SCAN-seq data. The raw data for this plot is supplied in S7 Data.

    (EPS)

    pbio.3002505.s003.eps (1.6MB, eps)
    S4 Fig. Characteristics of TE expression during preimplantation embryo development.

    (A) Expression level of all TE counts belonging to different families in each stage. Each dot represents a cell. The raw data for this plot is supplied in S5 Data. Expression level of each TE locus (lower panel) belonging to different families in each stage. Only active loci in each cell were calculated. Each dot represents a cell. The raw data for this plot is supplied in S5 Data. (B) The top 5 subfamilies of ERVL expressed in clusters C4–C5 in Fig 5E. (C) The top 5 subfamilies of LINE1 expressed in clusters C4–C5 in Fig 5F.

    (EPS)

    pbio.3002505.s004.eps (5.1MB, eps)
    S5 Fig. The loci expression of TEs and cell heterogeneity along developmental stages.

    (A) The loci expression of stage specific TEs (mean expression >1 log count across all cells) belonging to ERVL family. The raw data for this plot is supplied in S8 Data. (B) The loci expression of stage specific TEs (mean expression >1 log count across all cells) belonging to LINE1 family. The raw data for this plot is supplied in S8 Data. (C) Correlation coefficients of blastomeres within the same embryos or different embryos at the same stage base on gene expression data. The raw data for this plot is supplied in S8 Data. (D) Correlation coefficients of blastomeres within the same embryos or different embryos at the same stage base on isoform expression data. The raw data for this plot is supplied in S8 Data.

    (EPS)

    pbio.3002505.s005.eps (3.9MB, eps)
    S1 Table. The list of stage-specific genes and stage-specific isoforms.

    (XLSX)

    pbio.3002505.s006.xlsx (714.4KB, xlsx)
    S2 Table. The expression matrix of 3-prime partial transcripts.

    (XLSX)

    S3 Table. The expression matrix of TE loci.

    (XLSX)

    pbio.3002505.s008.xlsx (3.6MB, xlsx)
    S4 Table. The list of TE chimeric genes.

    (XLSX)

    pbio.3002505.s009.xlsx (360.5KB, xlsx)
    S1 Raw Images. Raw images.

    (PDF)

    pbio.3002505.s010.pdf (7.7MB, pdf)
    S1 Data. Raw data for Fig 1.

    (XLSX)

    pbio.3002505.s011.xlsx (54.4KB, xlsx)
    S2 Data. Raw data for Fig 2.

    (XLSX)

    pbio.3002505.s012.xlsx (12.8MB, xlsx)
    S3 Data. Raw data for Fig 3.

    (XLSX)

    pbio.3002505.s013.xlsx (20.7KB, xlsx)
    S4 Data. Raw data for Fig 4.

    (XLSX)

    pbio.3002505.s014.xlsx (24.5KB, xlsx)
    S5 Data. Raw data for Figs 5 and S4.

    (XLSX)

    pbio.3002505.s015.xlsx (32.2MB, xlsx)
    S6 Data. Raw data for S2 Fig.

    (XLSX)

    pbio.3002505.s016.xlsx (53.4MB, xlsx)
    S7 Data. Raw data for S3 Fig.

    (XLSX)

    pbio.3002505.s017.xlsx (14MB, xlsx)
    S8 Data. Raw data for S5 Fig.

    (XLSX)

    pbio.3002505.s018.xlsx (345.5KB, xlsx)
    Attachment

    Submitted filename: Response to Reviewers-2023-10-20.docx

    pbio.3002505.s019.docx (1.5MB, docx)
    Attachment

    Submitted filename: response to reviewers.docx

    pbio.3002505.s020.docx (10.8KB, docx)

    Data Availability Statement

    All single-cell isoform sequencing data generated in this study are available at NCBI Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) under the accession number GSE250381. Code availability The HIT-scISOseq analysis pipeline and source code are available from https://github.com/shizhuoxing/scISA-Tools. Additionally, the source code utilized in this study has been published at: https://zenodo.org/records/10394889.


    Articles from PLOS Biology are provided here courtesy of PLOS

    RESOURCES