Briefings in Bioinformatics. 2025 Jan 11;26(1):bbae720. doi: 10.1093/bib/bbae720

Supervised analysis of alternative polyadenylation from single-cell and spatial transcriptomics data with spvAPA

Qinglong Zhang 1,#, Liping Kang 2,#, Haoran Yang 3, Fei Liu 4, Xiaohui Wu 5,6
PMCID: PMC11724721  PMID: 39799000

Abstract

Alternative polyadenylation (APA) is an important driver of transcriptome diversity that generates messenger RNA isoforms with distinct 3′ ends. The rapid development of single-cell and spatial transcriptomic technologies opened up new opportunities for exploring APA data to discover hidden cell subpopulations invisible in conventional gene expression analysis. However, conventional gene-level analysis tools are not fully applicable to APA data, and commonly used unsupervised dimensionality reduction methods often disregard experimentally derived annotations such as cell type identities. Here, we proposed a supervised analytical framework termed spvAPA, specifically used for APA analysis from both single-cell and spatial transcriptomics data. First, an iterative imputation method based on weighted nearest neighbor was designed to recover missing APA signatures, by integrating both gene expression and APA modalities. Second, a supervised feature selection method based on sparse partial least squares discriminant analysis was devised to identify APA features distinguishing cell types or spatial morphologies. Additionally, spvAPA improves the visualization of high-dimensional data for discovering novel cell subtypes, which considers APA features and dual modalities of gene expression and APA. Evaluations across nine single-cell and spatial transcriptomics datasets demonstrate the effectiveness and applicability of spvAPA. spvAPA is available at https://github.com/BMILAB/spvAPA.

Keywords: alternative polyadenylation, single-cell RNA-seq, spatial transcriptomics, supervised analysis, visualization

Introduction

Alternative polyadenylation (APA) is an essential post-transcriptional modification during messenger RNA (mRNA) maturation in eukaryotic cells [1], which generates multiple transcript isoforms with distinct 3′ ends via selective choice of different polyadenylation [poly(A)] sites. APA contributes greatly to the complexity of the transcriptome and the diversity of the proteome. APA is highly tissue specific and is dynamically regulated under various environmental conditions, cell types, and states [2–6]. APA also plays an important role in various biological processes, including cell proliferation, differentiation, neurological diseases, and tumorigenesis [7].

Currently, many sequencing strategies targeting 3′ ends of the transcripts have been developed for single-cell transcriptomics (e.g. 10x Chromium [8], Drop-seq [9], and mcSCRB-seq [10]) or spatial transcriptomics (e.g. 10x Visium [11]). These emerging technologies have also given rise to innovative bioinformatics analysis methods that enable the study of gene expression regulation at the transcript level rather than the conventional gene level [12, 13]. The transcript-level analysis has unveiled novel cell subtypes and disease-associated genes that are not discernible through traditional gene expression profiling alone [13–15]. A variety of tools, such as Sierra [12] and scAPAtrap [13], are available for identifying and quantifying poly(A) sites from single-cell RNA-seq (scRNA-seq) data. Similar to the gene–cell expression matrix obtained from scRNA-seq, an APA usage matrix (Inline graphic), with each row denoting an APA gene and each column denoting a cell, can be obtained after quantifying genome-wide poly(A) sites. The Inline graphic matrix usually records the relative usage (denoted as φ) of the proximal or distal poly(A) site in the 3′ untranslated region (3′ UTR) of APA genes. However, due to the high dropout rates inherent in scRNA-seq, the Inline graphic matrix suffers from a considerably higher degree of sparsity than the gene–cell expression matrix, posing substantial challenges to transcriptome analysis from the APA layer. Moreover, unlike the gene–cell/spot expression matrix that consists of count values, the Inline graphic matrix typically holds ratios between 0 and 1, which fails to satisfy the underlying data distribution assumptions required by many gene-level analysis methodologies.

Currently, only a few tools have been developed for APA analysis from single-cell and/or spatial transcriptomics data, including scDaPars [15] and stAPAminer [16]. scDaPars, an extension of DaPars [17] that was originally designed for bulk RNA-seq, identifies poly(A) sites at the single-cell level and imputes missing entries in the Inline graphic matrix. stAPAminer was designed for identifying spatially variable APA genes from spatial transcriptomics data, wherein missing values were estimated by considering neighboring spots determined by gene expression profiles. However, both methods rely solely on data from a single modality (APA or gene expression) for recovering missing entries, overlooking the complementary nature of the two modalities. Moreover, these methods used existing gene-level methods for count data to identify differentially expressed genes (DEGs) or spatially variable genes (SVGs), which are not fully applicable to the ratio values typically held in the Inline graphic matrix. For example, SPARK [18], used in stAPAminer for identifying SVGs, expects the input count data to fit a Poisson distribution. In addition, single-cell and spatial transcriptomics data are high dimensional, and unsupervised computational methods such as uniform manifold approximation and projection (UMAP) have been widely used for dimensionality reduction and visualization [19]. Although these algorithms are powerful, they may not always be the best option for a specific dataset, depending on the biological questions and analytical objectives [20]. Moreover, although these unsupervised methods can construct an unbiased manifold representation from the data, they disregard experimentally derived, biologically meaningful annotations, such as cell type identities, tissue morphological annotations, or differentiation stages. These annotations carry critical metadata about the underlying drivers of biological variation and could potentially be harnessed to improve the interpretability and accuracy of data analysis if properly incorporated into the dimensionality reduction process [21].

In this study, we proposed a supervised analytical framework termed spvAPA specifically designed for APA analysis from both single-cell and spatial transcriptomics data. Firstly, we designed an iterative imputation method for imputing missing entries in the Inline graphic matrix. Secondly, a supervised feature selection method based on sparse partial least squares discriminant analysis (sPLS-DA) is devised to identify APA features from scRNA-seq and spatial transcriptomics data. Additionally, spvAPA integrates a flexible visualization module that considers both the selected features and the dual modalities of gene expression and APA, thereby enhancing the visualization of high-dimensional scRNA-seq or spatial transcriptomics data. Evaluations across nine datasets demonstrated the applicability and effectiveness of spvAPA.

Materials and methods

spvAPA is a supervised analytical framework tailored for APA analysis from single-cell and spatial transcriptomics data, which consists of four modules: the Inline graphic matrix computation module, the APA signature imputation module, the supervised feature selection module, and the visualization module (Fig. 1). The Inline graphic matrix computation module (Fig. 1a) is responsible for generating the APA usage matrix. Genome-wide poly(A) sites are identified and quantified from single-cell and spatial transcriptomics data, and genes with at least two poly(A) sites in the 3′ UTR (called 3′ UTR-APA genes) are retained. Then, the APA usage (Inline graphic) for each 3′ UTR-APA gene, represented by the relative usage of distal poly(A) site (RUD), is calculated to construct the Inline graphic matrix. Next, by integrating the Inline graphic matrix and the gene–cell expression matrix, the APA signature imputation module recovers missing values in the APA usage matrix using an iterative method based on weighted nearest neighbor (WNN) [22] (Fig. 1b). Further, the supervised feature selection module (Fig. 1c) applies sPLS-DA to the matrix after imputation (called Inline graphic) in a supervised learning fashion. It considers metadata (e.g. cell or spot labels) to identify APA features that distinguish different cell types or spatial domains. Lastly, the visualization module (Fig. 1d) integrates selected APA features and both gene expression and APA modalities for visualizing high-dimensional data. This allows for enhanced visualization of single-cell or spatial transcriptomics data, revealing potentially hidden cell subpopulations that might not be discovered solely through the gene expression profile.
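The RUD calculation described above can be illustrated with a minimal sketch. It assumes that each 3′ UTR-APA gene has been reduced to one proximal and one distal read count per cell, and that entries with no reads at either site are treated as missing; the function names and toy counts are hypothetical and are not taken from the spvAPA package.

```python
import math

def rud_matrix(proximal, distal):
    """Return phi = distal / (proximal + distal) for each gene (row) and cell (column).

    Entries with no reads at either site are set to NaN, i.e. treated as missing.
    """
    phi = []
    for prox_row, dist_row in zip(proximal, distal):
        row = []
        for p, d in zip(prox_row, dist_row):
            total = p + d
            row.append(d / total if total > 0 else math.nan)
        phi.append(row)
    return phi

def missing_fraction(phi):
    """Fraction of entries in the usage matrix that are missing."""
    values = [v for row in phi for v in row]
    return sum(math.isnan(v) for v in values) / len(values)

# Toy counts (hypothetical): 2 genes x 3 cells.
proximal = [[3, 0, 1], [0, 5, 0]]
distal = [[1, 0, 3], [0, 5, 2]]
phi = rud_matrix(proximal, distal)
print(phi, missing_fraction(phi))
```

Because every observed entry is a ratio between 0 and 1 and unobserved entries remain missing rather than zero, the resulting matrix differs from a count matrix in both distribution and sparsity, which is the motivation for the dedicated imputation step.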

Figure 1.

Schema of spvAPA. (a) The Inline graphic matrix computation module generates the raw APA Inline graphic matrix from single-cell or spatial transcriptomics data. (b) The APA signature imputation module recovers missing entries in the APA Inline graphic matrix using the WNN-based method. (c) The supervised feature selection module identifies APA features from the APA Inline graphic matrix based on sPLS-DA that considers prior cell annotations. (d) The visualization module enhances dimensional reduction and visualization by integrating multimodal data for discovering subcell types and marker genes. WNN, weighted nearest neighbor; sPLS-DA, sparse partial least squares discriminant analysis.

A total of nine single-cell and spatial transcriptomics datasets were analyzed in this study (Supplementary Table 1), including two datasets of mouse olfactory bulb (SC-MOB1 and SC-MOB2) [23], a mouse spermatogenesis dataset [24], a human embryonic stem cell (HESC) dataset [25], two datasets of human peripheral blood mononuclear cells (PBMC4K and PBMC8K), and three tissue sections of ST-MOB analyzed in previous studies [7, 16, 26–29] (ST-MOB5, ST-MOB11, and ST-MOB12). The single-cell datasets were all collected from whole cells rather than nuclei. Pearson’s correlation coefficient (PCC) and the silhouette coefficient (SC) score [30] were used to evaluate the performance of the imputation methods. More details are described in the supplementary text.

Results

spvAPA effectively recovers alternative polyadenylation signatures in single-cell and spatial transcriptomics data

The spvAPA framework includes an unsupervised WNN-based imputation module for recovering the highly sparse APA matrix Inline graphic. Here, we evaluated the performance of spvAPA’s imputation method using seven single-cell and spatial transcriptomics datasets (Supplementary Table 1) and compared spvAPA against two existing APA signature imputation methods, namely, scDaPars [15] and stAPAminer [16]. We also compared spvAPA with two additional tools designed for imputing single-cell gene expression data, SAVER [31] and VIPER [32]. Since the APA profile of the imputed matrix Inline graphic should closely resemble that of the original matrix Inline graphic, we used the PCC to evaluate the correlation between the two matrices (see Materials and Methods in Supplementary Text). For all seven datasets, the cell–cell or spot–spot PCC values after imputation by spvAPA were greatly improved (Fig. 2 and Supplementary Fig. 1). After imputation using stAPAminer, an improvement in PCC was also observed. In contrast, scDaPars performed the worst, with PCC values close to zero and, in some cases, even lower than the PCC of the raw data. Surprisingly, SAVER and VIPER, originally designed for imputing gene expression data, performed even better than stAPAminer and scDaPars. Overall, spvAPA performed the best among all tools, achieving much higher PCC values than the other tools.
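The PCC-based evaluation can be sketched as follows. This is an illustrative reading of the procedure, in which the imputed profile of each cell is correlated with the average raw APA profile of its annotated cell type; the exact definition is given in the Supplementary Text and may differ. The helper names and toy values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation over positions observed (non-NaN) in both vectors."""
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n < 2:
        return math.nan
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)

def mean_profile(cells):
    """Gene-wise mean over the cells of one type, skipping missing entries."""
    profile = []
    for g in range(len(cells[0])):
        observed = [c[g] for c in cells if not math.isnan(c[g])]
        profile.append(sum(observed) / len(observed) if observed else math.nan)
    return profile

# Toy raw profiles of two cells from one cell type (4 APA genes, NaN = missing).
raw = [[0.1, math.nan, 0.9, 0.5],
       [0.2, 0.4, math.nan, 0.6]]
reference = mean_profile(raw)

# A hypothetical imputed profile for one cell of the same type.
imputed_cell = [0.2, 0.45, 0.95, 0.6]
print(pearson(imputed_cell, reference))
```

Under this reading, a higher PCC indicates that the imputed profile of a cell tracks the average raw profile of its own cell type more closely.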

Figure 2.

Pearson’s correlations estimated using imputed Inline graphic matrices by different methods and the raw Inline graphic matrix for SC-MOB1 (a), PBMC4K (b), and ST-MOB11 (c). For each dataset, the Pearson’s correlation of the APA profile of each cell/spot in each cell type/layer with the average APA profile of the raw data in the respective cell type/layer was calculated.

Next, we further evaluated the robustness and effectiveness of the WNN module from different perspectives. First, we compared the WNN module with a recent deep learning model called JAMIE (Joint Variational Autoencoders for Multimodal Imputation and Embedding) [33]. Results across the seven datasets showed that the WNN module outperformed JAMIE in imputing APA signatures (Supplementary Fig. 2). Second, we evaluated the imputation effect of the WNN module on imbalanced datasets. The results suggested that the performance improvement after imputation was smaller for rare cell types than for common cell types, but imputation always contributed to performance improvement (Supplementary Fig. 3). Moreover, we examined the impact of the parameter of the WNN module that determines the number of nearest neighbors (Inline graphic) on APA matrix imputation and APA feature selection. Results demonstrated the robustness of the WNN method to different Inline graphic values (Supplementary Fig. 4). In addition, we evaluated the computational efficiency of the WNN method on datasets with different numbers of genes and cells, and the results demonstrated its scalability (Supplementary Fig. 5). The details of these results are provided in Supplementary Text Note 1.
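To make the role of the neighbor number concrete, the following is a deliberately simplified sketch of iterative neighbor-based imputation that draws on both modalities. It is not the WNN procedure of [22] and not the spvAPA implementation: the fixed modality weight `w_apa`, the root-mean-square distance, the inverse-distance weighting, and all names are assumptions made for illustration, and the expression values are assumed to be normalized to a scale comparable to the APA ratios.

```python
import math

def shared_distance(u, v):
    """Root-mean-square difference over coordinates observed in both vectors."""
    sq = [(a - b) ** 2 for a, b in zip(u, v)
          if not (math.isnan(a) or math.isnan(b))]
    return math.sqrt(sum(sq) / len(sq)) if sq else math.inf

def impute_phi(phi, expr, k=2, w_apa=0.5, max_iter=5):
    """Fill missing phi entries from the k nearest cells (rows are cells).

    Neighbors are ranked by a fixed-weight mix of APA and expression
    distances; the passes repeat until nothing is missing or max_iter is hit.
    """
    phi = [row[:] for row in phi]
    n_cells = len(phi)
    for _ in range(max_iter):
        if not any(math.isnan(v) for row in phi for v in row):
            break
        updated = [row[:] for row in phi]
        for i in range(n_cells):
            if not any(math.isnan(v) for v in phi[i]):
                continue
            scored = []
            for j in range(n_cells):
                if j == i:
                    continue
                d_expr = shared_distance(expr[i], expr[j])
                d_apa = shared_distance(phi[i], phi[j])
                # Fall back to expression alone when no APA entries are shared.
                d = d_expr if math.isinf(d_apa) else w_apa * d_apa + (1 - w_apa) * d_expr
                scored.append((d, j))
            neighbours = sorted(scored)[:k]
            for g, value in enumerate(phi[i]):
                if not math.isnan(value):
                    continue
                num = den = 0.0
                for d, j in neighbours:
                    if not math.isnan(phi[j][g]):
                        weight = 1.0 / (d + 1e-6)  # closer cells count more
                        num += weight * phi[j][g]
                        den += weight
                if den > 0:
                    updated[i][g] = num / den
        phi = updated
    return phi

# Toy data (hypothetical): 4 cells x 2 features in each modality.
expr = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
phi = [[0.9, math.nan], [0.8, 0.2], [0.1, 0.9], [math.nan, 0.8]]
imputed = impute_phi(phi, expr, k=1)
print(imputed)
```

In this toy example, each missing entry is filled from the single most similar cell. Larger values of `k` average over more neighbors, which trades local detail for stability.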

Next, we explored spvAPA’s capability in recovering genes with differentially used APA sites (called DEAPA genes). We used the mouse spermatogenesis dataset [24] for evaluation, as distinct APA dynamics have been revealed during the three spermatic states: spermatocytes (SCs), round spermatids (RSs), and elongating spermatids (ESs) [13, 14, 34–36]. First, we identified DEAPA genes from the matrix Inline graphic (before imputation) as the reference. We applied the Wilcoxon test to assess the significance of differences in the Inline graphic values of the matrix Inline graphic for each APA gene between each pair of cell types and obtained DEAPA genes among the three stages. For instance, we identified 684 DEAPA genes between SC and ES, with 592 DEAPA genes with longer 3′ UTR in SC and only 92 in ES (Supplementary Fig. 6), aligning with previous studies that reported a shortening of 3′ UTR length in sperm cells as development progresses (SC → RS → ES) [13, 14, 34–36]. Next, we randomly masked 10%–90% of the nonmissing entries in the Inline graphic matrix to construct a matrix Inline graphic with a higher dropout rate. Accordingly, the same entries in the corresponding gene expression matrix Inline graphic were masked. This allowed us to use the original matrix Inline graphic as the gold standard for assessing the efficacy of imputation methods. After imputing the matrix Inline graphic with different imputation methods, DEAPA genes were identified. The recovery of DEAPA genes by different methods was evaluated by precision, recall, and F1 score (Fig. 3). The recall values of stAPAminer and spvAPA were comparable, with spvAPA exhibiting a marginally higher recall than stAPAminer and both demonstrating much higher performance than scDaPars in this regard. In terms of precision, spvAPA surpassed stAPAminer, but it had a mixed record against scDaPars. Using F1 score as a combined measure of precision and recall, spvAPA consistently outperformed stAPAminer and showed notably superior performance over scDaPars in the majority of cases. These results suggested the high performance of spvAPA in effectively recovering APA signatures from data with high dropout rates.
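The masking experiment and its scoring can be sketched as below. The sketch covers only two steps: random masking of a given fraction of nonmissing entries, and the comparison of a recovered DEAPA gene set against the reference set. The differential test itself is omitted, and the function names, gene labels, and toy matrix are hypothetical.

```python
import math
import random

def mask_entries(phi, fraction, seed=0):
    """Return a copy of phi with a random fraction of observed entries set to NaN."""
    rng = random.Random(seed)
    observed = [(i, j) for i, row in enumerate(phi)
                for j, v in enumerate(row) if not math.isnan(v)]
    n_mask = round(fraction * len(observed))
    masked = [row[:] for row in phi]
    for i, j in rng.sample(observed, n_mask):
        masked[i][j] = math.nan
    return masked

def precision_recall_f1(reference, recovered):
    """Score a recovered gene set against the reference gene set."""
    reference, recovered = set(reference), set(recovered)
    true_pos = len(reference & recovered)
    precision = true_pos / len(recovered) if recovered else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Toy usage matrix with 9 observed entries and 1 missing entry.
phi = [[0.1, 0.2, math.nan, 0.4, 0.5],
       [0.6, 0.7, 0.8, 0.9, 1.0]]
masked = mask_entries(phi, fraction=0.3)

# Hypothetical DEAPA gene sets before masking and after imputation.
reference_genes = {"GeneA", "GeneB", "GeneC", "GeneD"}
recovered_genes = {"GeneA", "GeneB", "GeneE"}
print(precision_recall_f1(reference_genes, recovered_genes))
```

Precision penalizes genes called only after imputation, whereas recall penalizes reference genes that are lost, so the F1 score summarizes both kinds of error in a single value.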

Figure 3.

Evaluation of different methods in recovering genes with differentially used APA sites (DEAPA genes) using the mouse spermatogenesis dataset. DEAPA genes among each pair of the three developmental states during mouse spermatogenesis, including SCs, RSs, and ESs, were calculated. DEAPA genes from the raw matrix Inline graphic without masking were used as the reference. Nonmissing entries in the Inline graphic matrix were randomly masked to construct matrices with higher dropout rate from 10% to 90%. Three imputation methods were then applied to these matrices for recovering APA signatures, and DEAPA genes were obtained.

spvAPA enhances dimensionality reduction and visualization for single-cell and spatial transcriptomics data

Having demonstrated that spvAPA can effectively recover the APA profile, we then examined whether the recovered APA profile can enhance the visualization of single-cell data to more clearly distinguish different cell types. Across all seven datasets, the UMAP visualization of the Inline graphic matrices derived from spvAPA showed superior performance compared to both the original Inline graphic matrix and the imputed Inline graphic matrices from stAPAminer and scDaPars, indicating notably enhanced differentiation among distinct cell types (Fig. 4a and Supplementary Fig. 7). Particularly, in complex tissue samples like PBMC4K and SC-MOB1 (Fig. 4a), the visualization of the Inline graphic matrix obtained by spvAPA clearly differentiates all cell types. In contrast, for PBMC4K, the visualization of the matrix Inline graphic obtained by stAPAminer or scDaPars only separated B cells and mononuclear cells (Mono) but failed to distinguish other cell types. For SC-MOB1, the visualization of the original Inline graphic matrix or the Inline graphic matrix obtained by stAPAminer or scDaPars failed to separate almost all cell types.

Figure 4.

Dimensionality reduction and visualization for single-cell and spatial transcriptomics data. (a) UMAP visualization for the raw Inline graphic matrix or imputed Inline graphic matrices by different methods for PBMC4K, SC-MOB1, and ST-MOB11. For each matrix, normalization and dimensionality reduction with PCA were first performed by Seurat before UMAP visualization. (b) Evaluation of different methods in distinguishing cell types for different datasets using SC.

Next, we used the SC metric to compare the effectiveness of Inline graphic matrices obtained by different methods in distinguishing different cell types. Across all seven datasets, except for ST-MOB5, SC scores obtained by spvAPA were noticeably higher than those obtained by other methods or the original matrix Inline graphic (Fig. 4b). For ST-MOB5, the SC score achieved with spvAPA was only marginally lower than that of stAPAminer (spvAPA = 0.389; stAPAminer = 0.423), yet still much higher than the SC score obtained by scDaPars (0.192). These results, as evidenced by evaluations across diverse scRNA-seq and spatial transcriptomics datasets, demonstrated that restoring the highly sparse APA profile via spvAPA enhanced visualization and enabled more accurate discrimination between different cell types and spatial domains.
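For reference, the silhouette coefficient can be computed from a low-dimensional embedding and the cell type labels as in the sketch below. It uses the standard definition with Euclidean distance and assumes at least two labeled groups; the embedding coordinates and labels are toy values, and the embedding actually scored in this study is described in the Supplementary Text.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient (Euclidean distance) over all points."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(dist(points[i], points[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy 2D embedding: two well-separated groups of cells.
embedding = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(embedding, ["A", "A", "B", "B"])
bad = silhouette_score(embedding, ["A", "B", "A", "B"])
print(good, bad)
```

Scores close to 1 indicate that cells lie much nearer to their own type than to any other, whereas scores near or below 0 indicate that the annotated types are mixed in the embedding.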

spvAPA discovers subcell types from scRNA-seq data

In the above visualization of the PBMC4K and PBMC8K data, we observed some small clusters containing very few single cells in the UMAP plots (Fig. 4a and Supplementary Fig. 7). We therefore examined the potential of spvAPA in discovering subcell types. We obtained additional annotation of subcell types using Azimuth [22] and found that a small cluster of cells in PBMC4K and PBMC8K was annotated as platelets (Fig. 5a and Supplementary Fig. 8a). Platelets are small cellular fragments derived from megakaryocytes within the bone marrow. In PBMC samples, platelets constitute a small residual fraction that has not been fully separated out, which makes them difficult to recognize [22]. Two platelet-specific marker genes, PF4 and PPBP, were universally highly expressed in the platelet cluster (Fig. 5b and Supplementary Fig. 8b), indicating that the small cluster exclusively discovered by spvAPA indeed consists of platelets. In contrast, using the original matrix Inline graphic or the matrix Inline graphic obtained by stAPAminer or scDaPars failed to distinguish platelets from other cells (Fig. 4a). Next, we proceeded to visualize another highly heterogeneous dataset, SC-MOB, to examine whether neuronal subtypes can be identified. The visualization of the original Inline graphic matrix or the Inline graphic matrices obtained through stAPAminer or scDaPars only distinguished one to two neuronal subtypes (Fig. 5c). In contrast, the visualization based on the spvAPA-derived Inline graphic matrix revealed a clear separation of most neuronal subtypes. Moreover, SC values obtained by spvAPA were much higher than those from the original Inline graphic matrix and the Inline graphic matrices obtained from stAPAminer or scDaPars.

Figure 5.

Discovery of subcell types from scRNA-seq data with spvAPA. (a) UMAP plot showing a small cluster of platelets in PBMC4K. (b) UMAP plot showing the expression of two platelet-specific marker genes, PF4 and PPBP. (c) UMAP visualization of neuronal subtypes using the raw Inline graphic matrix or imputed Inline graphic matrices by different methods for SC-MOB1 and SC-MOB2. The corresponding SC score is shown on each plot. (d) UMAP plot using the matrix Inline graphic generated by spvAPA shows that human embryonic stem cells differentiated for 96 h were further divided into two subclusters. (e) Expressions of four marker genes related to endoderm development in the two subclusters.

APA patterns are globally regulated and play a crucial role in cell differentiation. During embryonic development, the usage of the proximal poly(A) site gradually decreases [6, 37]. Gao et al. [15] applied scDaPars to the human embryonic stem cell data to obtain the matrix Inline graphic, and identified a new subcell type invisible to conventional gene expression analysis. Next, we explored whether the matrix Inline graphic generated by spvAPA could aid in discovering novel subcell types. Initially, by applying spvAPA to impute the APA usage matrix of the HESC data and visualizing through UMAP, the imputed matrix Inline graphic was found to separate cells of various differentiation times (Supplementary Fig. 9a). Notably, in the analysis of the same HESC data by Gao et al. [15], cells differentiated for 72 and 96 h could not be distinguished, and cells differentiated for 12 and 24 h had low separability (Supplementary Fig. 9b). In contrast, the matrix Inline graphic generated by spvAPA clearly distinguished cells from different differentiation times. Using Louvain clustering, cells differentiated for 96 h were further divided into two subclusters, Subcluster 0 and Subcluster 1 (Fig. 5d). To verify that the identified subclusters represent different stages of differentiation, we examined four marker genes related to endoderm development: GATA6, EOMES, LHX1, and HMGA2. GATA6 and EOMES were expressed at higher levels in the subcluster with a higher degree of differentiation [25]. LHX1 is crucial for kidney development [38]; HMGA2 is essential for epithelial differentiation during embryonic lung development [39]. These marker genes were more highly expressed in Subcluster 1, suggesting that Subcluster 1 was more differentiated than Subcluster 0 (Fig. 5e). These results demonstrated that the APA information restored by spvAPA could effectively discover subcell types and separate complex cell types invisible to conventional gene expression analysis.

spvAPA identifies alternative polyadenylation features from scRNA-seq data and enhances visualization

Next, we proceeded with the single-cell mouse olfactory bulb (MOB) data (SC-MOB1 and SC-MOB2) to demonstrate the ability of spvAPA in selecting APA features distinguishing cell types in a supervised manner. Firstly, we utilized the imputation module of spvAPA to obtain the matrix Inline graphic for SC-MOB1 data. Then, we employed the feature selection module based on sPLS-DA in spvAPA to identify distinctive APA features of different cell types of MOB. In total, we obtained 11 components and 881 nonredundant APA features from SC-MOB1 (Supplementary Table 2). Each component could be considered as a meta-gene, and the overall score of each meta-gene is a linear combination of the APA features of the corresponding component. Then, cell type–specific APA features were identified in each meta-gene, according to the contributions of APA features in the loading matrix obtained by sPLS-DA (Supplementary Fig. 10). For example, the meta-gene corresponding to the first component (comp 1) scores lower in red blood cells (RBCs) than in other cell types; the meta-gene of the fourth component (comp 4) scores lower in endothelial cells (ECs) but higher in microglia (MicroG). Through UMAP visualization of genes with the highest contribution in each component, it can be observed that these genes exhibit differential usage preferences of APA sites in specific cell types (Fig. 6a). For instance, the RUD score (Inline graphic) for SEMA3D in RBCs is much higher than in other cell types, indicating a preference for using the distal poly(A) site in RBCs. SEMA3D has been reported to encode a member of the semaphorin III family, which was involved in axon guidance during neuron development [40].
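The notion of a meta-gene score as a linear combination of APA features can be sketched as follows. The loading values, gene labels, and cell names are hypothetical, and the sketch assumes that the loadings of one component are already available (e.g. from an sPLS-DA fit) with unselected features having zero loading. Any centering or scaling applied to the input before fitting is omitted here.

```python
def metagene_scores(phi_hat, loadings):
    """Score each cell as the loading-weighted sum of its selected APA features.

    phi_hat: dict mapping cell -> {gene: imputed usage value}
    loadings: dict mapping gene -> nonzero loading for one component
    """
    return {cell: sum(weight * profile[gene] for gene, weight in loadings.items())
            for cell, profile in phi_hat.items()}

def top_features(loadings, n=1):
    """Genes ranked by absolute loading, i.e. by contribution to the component."""
    return sorted(loadings, key=lambda gene: abs(loadings[gene]), reverse=True)[:n]

# Hypothetical sparse loadings: only two of three genes are selected.
loadings = {"GeneA": 0.8, "GeneB": -0.6}
phi_hat = {
    "cell_1": {"GeneA": 0.1, "GeneB": 0.9, "GeneC": 0.5},
    "cell_2": {"GeneA": 0.9, "GeneB": 0.2, "GeneC": 0.5},
}
scores = metagene_scores(phi_hat, loadings)
print(scores, top_features(loadings))
```

Because the loadings are sparse, each score depends only on the selected features, and ranking genes by absolute loading gives one simple way to pick the most contributive gene of a component.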

Figure 6.

spvAPA identifies APA features from scRNA-seq data and enhances visualization. (a) Distribution of single-cell RUD scores of genes with the highest contribution in each component. (b) Venn diagram showing the overlap of APA features identified by spvAPA in SC-MOB1 with 902 marker genes. (c) Top GO terms for the 881 APA features identified by spvAPA in SC-MOB1. BP, biological process; CC, cellular component; MF, molecular function. (d) Venn diagram showing the overlap of GO terms derived from the APA features identified by spvAPA and the collected 3088 important genes for the olfactory system in SC-MOB1. (e) UMAP visualization by integrating the gene expression and APA modalities using sPLS + UMAP and Seurat + UMAP, respectively.

Comparing the 881 APA features identified in SC-MOB1 with 902 marker genes for 12 cell types identified from the single-cell gene expression profile [23] (Supplementary Table 3), only 49 genes overlapped (Fig. 6b). The 881 APA features identified in SC-MOB1 were enriched in 1248 gene ontology (GO) terms (Fig. 6c and Supplementary Table 4), with the majority of these GO terms being closely related to synapse organization and neuron development. For example, the GO term with the lowest adjusted P-value (Inline graphic), GO:0016358, is related to dendritic development of neural cells and involves 39 APA genes. As a comparison, we also conducted GO enrichment analysis for the collected 3088 important genes of the olfactory system (Supplementary Table 3) and obtained a total of 3314 significantly enriched GO terms (Supplementary Fig. 11 and Supplementary Table 5). Of the 1248 GO terms derived from the APA features, 73.5% (917) were found among the pool of 3314 GO terms (Fig. 6d). Notably, there was substantial overlap between the top 50 GO terms enriched in both gene sets (APA features and collected genes), with 48 GO terms in common. In addition, the APA features were enriched in seven KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways (Supplementary Table 6), among which five pathways were also found among the pathways enriched for the collected 3088 important genes (Supplementary Table 7). In particular, the Polycomb repressive complex pathway, enriched exclusively with APA features, has been found to be crucial in regulating basal cell fate during adult olfactory neurogenesis [41]. These results suggest that APA features identified based on APA profiles constitute a distinct set of genes compared to the marker genes derived from traditional gene expression analysis, yet both contribute substantially to the primary functions of the olfactory system. Therefore, APA dynamics are independent of gene expression and represent an important source of cell–cell heterogeneity in the olfactory bulb.

Further, we found that the inclusion of APA features can improve the dimensionality reduction visualization of scRNA-seq data, and different modalities (i.e. matrix Inline graphic and matrix Inline graphic) may have varying or complementary contributions to the differentiation of different cell types. Detailed results are described in Supplementary Text Note 2 and Supplementary Fig. 12. Moreover, the semisupervised dimensionality reduction and visualization scheme of sPLS + UMAP provided in the spvAPA framework could effectively integrate multimodal data, facilitating dimensionality reduction and visualization. Detailed results are described in Supplementary Text Note 3, Fig. 6e, and Supplementary Fig. 12.

spvAPA identifies spatially resolved alternative polyadenylation features from spatial transcriptomics data

Having demonstrated that spvAPA can effectively identify APA features from scRNA-seq data, we next examined the applicability of spvAPA to spatial transcriptomics data using ST-MOB data as an example. The MOB is arranged in layers including the olfactory nerve layer (ONL), glomerular layer (GL), external plexiform layer (EPL), mitral cell layer (MCL), and granular cell layer (GCL) (Fig. 7a). First, we identified poly(A) sites from ST-MOB and obtained the APA usage matrix Inline graphic. Then, we recovered missing entries in the matrix Inline graphic to obtain the matrix Inline graphic by the imputation module of spvAPA. The matrix Inline graphic was further processed by the sPLS-DA module in spvAPA to identify spatially resolved APA features distinguishing morphological layers (Supplementary Table 8). For ST-MOB11, four components were obtained, with the optimal number of APA features for each component being 300, 260, 50, and 18, respectively. Each component could be considered as a meta-gene, and the average RUD score of each meta-gene in each spot was then computed from the matrix Inline graphic, denoted as meta-φ. Visualization of the meta-φ values for the four components demonstrated distinct spatial APA patterns (Supplementary Fig. 13). For example, the meta-φ value of the first component (comp 1) decreased from the outer to the inner layers; the second component (comp 2) showed a lower meta-φ value in the ONL layer and a higher value in the EPL layer, with moderate levels in other layers; the third component (comp 3) exhibited a low meta-φ value in the GL layer and a high value in the EPL layer, with moderate values in the remaining layers; and the fourth component (comp 4) showed a higher meta-φ value exclusively in the MCL layer. We then selected the most contributive genes from each component (Fig. 7b). Eci3 is a gene involved in lipid formation and the recycling of dopamine (DA) vesicles at synapses, as well as affecting synaptic nuclear protein metabolism [42]. The φ values of Eci3 were close to 1 in the GCL and MCL layers and near 0 in other layers, indicating a preference for the distal poly(A) site in the GCL and MCL layers. Lrrc6 shows a preference for the proximal poly(A) site in the ONL and GCL layers. Htra3 is a gene involved in intracellular protein hydrolysis and a paralog of Htra1 and was identified as a biomarker for Alzheimer’s disease in mice through proteomic analysis [43]. It prefers using the distal poly(A) site in the GL layer. Enpp6, involved in the myelination of neuronal axons [7], shows a preference for the distal poly(A) site in the MCL layer.
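The meta-φ summary can be illustrated with a short sketch that averages the imputed RUD values of the features selected for one component within each spot and then summarizes the result by annotated layer. The spot names, gene labels, layer assignments, and values are hypothetical, and an unweighted mean is assumed.

```python
def meta_phi(phi_hat, features):
    """Average imputed RUD over the selected features of one component, per spot."""
    return {spot: sum(profile[gene] for gene in features) / len(features)
            for spot, profile in phi_hat.items()}

def layer_means(values, layers):
    """Mean meta-phi of the spots belonging to each annotated layer."""
    grouped = {}
    for spot, value in values.items():
        grouped.setdefault(layers[spot], []).append(value)
    return {layer: sum(vals) / len(vals) for layer, vals in grouped.items()}

# Toy data: three spots, two selected APA features for one component.
phi_hat = {
    "spot_1": {"GeneA": 0.8, "GeneB": 0.6},
    "spot_2": {"GeneA": 0.9, "GeneB": 0.7},
    "spot_3": {"GeneA": 0.2, "GeneB": 0.4},
}
layers = {"spot_1": "ONL", "spot_2": "ONL", "spot_3": "GCL"}

per_spot = meta_phi(phi_hat, ["GeneA", "GeneB"])
per_layer = layer_means(per_spot, layers)
print(per_spot, per_layer)
```

Plotting the per-spot values on the tissue coordinates would give a spatial map of each component, and the per-layer means offer a compact way to compare layers.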

Figure 7.

Analysis of spatial transcriptomics data with spvAPA. (a) Hematoxylin & eosin stain image and the layer annotation of the mouse olfactory bulb data ST-MOB11. (b) RUD scores for genes with the highest contribution in each component identified by the sPLS-DA module in spvAPA. (c) Top GO terms for the 468 APA features identified by spvAPA in ST-MOB11. BP, biological process; MF, molecular function. (d) Upset plot showing the overlap of APA features identified from the two replicates of SC-MOB and the three sections of ST-MOB. The bar chart shows the number of APA features from different combinations of datasets. The black dots and lines represent the combinations of datasets. The smaller bar chart on the left displays the total number of APA features from each dataset. The embedded Venn diagram shows the overlap between nonredundant APA features obtained from the two replicates of SC-MOB and those obtained from the three sections of ST-MOB.

To select more robust APA features, we applied 10-fold cross-validation with sPLS-DA (see Materials and Methods in Supplementary Text). A total of 468 genes were retained, with 258, 208, 35, and 12 APA features for the four components, respectively. These APA features were enriched in 214 GO terms (Fig. 7c and Supplementary Table 9). Interestingly, most of these GO terms were related to transcription, which differed from the GO result of SC-MOB1 (Fig. 6c). A recent study by Navarro et al. [44] on ST-MOB conducted GO analysis of genes with morphological layer-specific expression and found that a third of the biological process (BP) terms were related to transcription. This suggests that APA features that distinguish morphological layers of the olfactory bulb might predominantly affect activities such as transcription.

Next, we identified the APA features of the other two tissue sections of ST-MOB (ST-MOB5 and ST-MOB12). Surprisingly, the number of APA features recognized in ST-MOB12 (only 31) was much lower than that in the other two sections (ST-MOB5 = 443; ST-MOB11 = 468) (Fig. 7d). However, the low-dimensional embeddings of the gene expression matrix and/or APA matrix show that the layers in ST-MOB12 are much less clearly separated than those in ST-MOB5 or ST-MOB11 (Fig. 4b and Supplementary Fig. 7), which makes it difficult to identify clear APA features for ST-MOB12. In contrast, 881 and 742 APA features were identified from SC-MOB1 and SC-MOB2, respectively, with 478 shared between them, an overlap much higher than that among the three sections of ST-MOB (Fig. 7d). This indicates that the differences among sections of spatial transcriptomics data may be much greater than those among replicates of scRNA-seq data. Moreover, a total of 1067 nonredundant APA features were obtained from SC-MOB, of which only 115 overlapped with the APA features identified from ST-MOB (Fig. 7d). Such a low overlap rate suggests that only ~10% of the APA genes identified using prior information on morphological layers may be explained by cell type specificity, and that the vast majority of the identified APA features reflect spatial differences rather than differences in cell type.
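The overlap statistics above are plain set operations on feature lists. The sketch below shows the computation on toy gene sets; the function name `overlap_summary` and the gene identifiers are hypothetical, and only the final line uses figures quoted in the text.

```python
def overlap_summary(group_a, group_b):
    """Pool each group's feature sets and count the features shared by both pools."""
    union_a = set().union(*group_a)   # nonredundant features of group A
    union_b = set().union(*group_b)   # nonredundant features of group B
    shared = union_a & union_b
    return len(union_a), len(union_b), len(shared), len(shared) / len(union_a)

# Toy feature sets standing in for two scRNA-seq replicates and two tissue sections.
sc_sets = [{"g1", "g2", "g3"}, {"g2", "g3", "g4"}]
st_sets = [{"g3", "g5"}, {"g5", "g6"}]

n_sc, n_st, n_shared, frac = overlap_summary(sc_sets, st_sets)

# Share of the 1067 nonredundant SC-MOB features that also appear in ST-MOB.
reported_fraction = 115 / 1067  # roughly 0.108
```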

Discussion

Poly(A) sites are associated with genes, and the modalities of APA and gene expression are not independent. In our pipeline, we employed the RUD matrix instead of the poly(A) site expression matrix to represent the APA modality. This is because, for genes with only one poly(A) site, the poly(A) site expression approximates the gene-level expression, so using the poly(A) site expression matrix would introduce redundant information. In contrast, the RUD matrix represents variations of the 3′ UTR length across cells/spots, providing complementary information to the gene expression profile. However, the RUD matrix is even sparser than the poly(A) site or gene expression matrix; therefore, using imputation methods to restore authentic APA dynamics is of significant importance. The evaluation results demonstrated the superiority of the WNN-based imputation module provided in spvAPA (Figs 2 and 3, and Supplementary Fig. 1). Existing tools for imputing the APA matrix ∅ used only a single modality to identify nearest neighbor cells: stAPAminer used only the matrix G, and scDaPars used only the matrix ∅. However, the information from a single modality is limited. For instance, in some heterogeneous cells, gene expression profiles may be similar but APA profiles may differ greatly. In such cases, relying solely on gene expression profiles (the matrix G) to determine nearest neighbor cells will lead to erroneous estimation of cell–cell similarity. Previous studies have revealed hidden cell subpopulations invisible to gene expression [13, 15, 45, 46]. Our analysis of the SC-MOB data revealed that the gene expression profile and the APA profile each captured distinct sets of cell-type-specific genes (Figs 6b–d and Supplementary Fig. 12). Unlike scDaPars or stAPAminer, spvAPA integrates both the gene expression modality and the APA modality to extract cell type heterogeneity from different aspects, thereby identifying nearest neighbor cells more accurately for recovering missing APA signatures.
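To make the idea of choosing neighbors from both modalities concrete, the following sketch imputes missing φ entries from the nearest cells under a distance that mixes the two modalities. It is a deliberately simplified stand-in, not the WNN procedure used by spvAPA: it assumes each cell already has low-dimensional coordinates for each modality, uses one fixed mixing weight `w_expr` rather than cell-specific weights, and runs a single pass rather than iterating. All names and values are hypothetical.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def combined_distances(expr, apa, w_expr=0.5):
    """Pairwise cell-cell distance mixing both modalities with a fixed weight."""
    n = len(expr)
    return [
        [w_expr * euclidean(expr[i], expr[j])
         + (1 - w_expr) * euclidean(apa[i], apa[j])
         for j in range(n)]
        for i in range(n)
    ]

def impute(phi, dist, k=2):
    """Fill missing phi entries (None) with the mean of the k nearest observed values."""
    n_cells, n_genes = len(phi), len(phi[0])
    out = [row[:] for row in phi]
    for i in range(n_cells):
        # Other cells ordered from nearest to farthest.
        order = sorted((j for j in range(n_cells) if j != i),
                       key=lambda j: dist[i][j])
        for g in range(n_genes):
            if phi[i][g] is None:
                donors = [phi[j][g] for j in order if phi[j][g] is not None][:k]
                if donors:
                    out[i][g] = sum(donors) / len(donors)
    return out

# Toy data: four cells forming two groups in both modalities.
expr = [[1.0, 0.0], [1.1, 0.0], [5.0, 5.0], [5.2, 5.0]]   # expression coordinates
apa = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.0, 3.1]]    # APA coordinates
phi = [[0.9, None], [0.8, 0.2], [0.1, 0.7], [None, 0.9]]  # RUD values, None = missing

imputed = impute(phi, combined_distances(expr, apa), k=1)
```

Setting `w_expr` to 1 or 0 reduces the sketch to a single-modality neighbor search, which is the situation the paragraph above argues against.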

Many existing bioinformatic methods for analyzing single-cell and spatial transcriptomics data focus only on the gene level, such as detecting DEGs or SVGs based on gene expression. There are also a few emerging studies that identify DEAPA genes among different cell types or specific to spatial domains, but almost all use nonparametric statistical tests such as the Wilcoxon rank-sum test or the chi-square test [12, 14, 45–47]. Although nonparametric tests may be efficient in some cases, they consider only individual genes and ignore potential relationships between genes. Moreover, because the APA profile is represented as ratios rather than counts, and many gene-level tools used for DEG or SVG detection require data to follow a certain distribution such as the Poisson or negative binomial distribution [18, 48], these tools are not suitable for APA analysis. The sPLS-DA module implemented in spvAPA is a multivariate prediction method that uses prior category labels for APA feature selection; it can consider multiple genes simultaneously and handle the multicollinearity problem in high-dimensional data without requiring any assumptions about the data distribution. Therefore, spvAPA is highly suitable for identifying APA markers from single-cell and spatial transcriptomics data that are extremely sparse and noisy (Fig. 6). It should be noted that spvAPA is a supervised method that identifies and highlights differences between specified categories, and it is therefore not applicable to experiments without prior labels. However, because many single-cell or spatial datasets contain rich metadata (e.g. cell type information, developmental time, and cell cycle), spvAPA has broad applicability. For data lacking metadata, various automatic cell annotation methods [49] can be used to infer cell types through curated marker gene databases [50], correlation with reference expression data [51], or supervised classification for label transfer [52]. It should also be noted that, as a linear method, PLS-DA cannot capture nonlinear relationships, which potentially limits its application to certain datasets. However, despite the importance of nonlinear methods in computational biology, linear methods can perform comparably to machine learning approaches, as highlighted in previous research [53]. Furthermore, studies have shown that simple, linear models often perform satisfactorily, even for nonlinear data [21, 54–56]. Moreover, similar to the principal components (PCs) obtained from principal component analysis (PCA), the components derived from PLS-DA can also be used in the processing and analysis of single-cell data, and they can be fed into methods like UMAP to produce nonlinear embeddings.
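The label-guided, sparse feature selection discussed above can be sketched for the simplest case: one component and two classes. The code below is an illustration under those assumptions, not the sPLS-DA implementation used by spvAPA; it scores each feature by its covariance with a centered class indicator and soft-thresholds the resulting weights so that only `keep` features remain nonzero. The function name, the toy matrix, and the labels are hypothetical.

```python
import math

def sparse_pls_component(X, labels, keep):
    """First sparse PLS-DA-like component for two classes (illustrative only)."""
    n, p = len(X), len(X[0])
    # Center each feature column.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    # Code the two classes as a centered 0/1 indicator vector.
    classes = sorted(set(labels))
    y = [1.0 if lab == classes[1] else 0.0 for lab in labels]
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    # Association of each feature with the class indicator (X^T y).
    cov = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    # Soft-threshold so that only `keep` features retain a nonzero weight.
    mags = sorted((abs(c) for c in cov), reverse=True)
    lam = mags[keep] if keep < p else 0.0
    w = [math.copysign(max(abs(c) - lam, 0.0), c) for c in cov]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Component scores: projection of each cell/spot onto the sparse weights.
    scores = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    return w, scores

# Toy RUD matrix (rows = spots, columns = genes) with made-up layer labels.
X = [[0.9, 0.5, 0.3],
     [0.8, 0.4, 0.6],
     [0.1, 0.6, 0.4],
     [0.2, 0.5, 0.5]]
labels = ["GCL", "GCL", "ONL", "ONL"]

weights, scores = sparse_pls_component(X, labels, keep=1)
selected = [j for j, v in enumerate(weights) if v != 0]
```

Because the weights are computed jointly from the labels rather than by testing genes one at a time, the selected features are those that best separate the specified categories along the component. In practice the number of retained features per component would be tuned, for example by cross-validation.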

Generally, the visualization module of spvAPA is highly flexible and can be executed in an unsupervised, semisupervised, or supervised manner. The recovered APA profile from spvAPA can be used solely for visualization through unsupervised methods like UMAP (Fig. 4 and Supplementary Fig. 7). The visualization module can also use genes or APA features identified by the supervised frameworks PLS-DA or sPLS-DA. It also includes a semisupervised dimensionality reduction and visualization strategy combining sPLS-DA and UMAP, which can integrate the gene expression modality and the APA modality for enhanced visualization. Most dimensionality reduction algorithms, such as UMAP and PCA, are unsupervised and ignore existing label information. Using spvAPA, even in a single dataset, different cell labels can be used to visualize the same cells in multiple graphs. As such, each graph can highlight the cellular heterogeneity associated with a specific biomarker. Therefore, spvAPA can benefit from information related to experimental design and sample metadata, using the metadata to gain new biological insights. Admittedly, however, given the diversity of techniques used in single-cell or spatial transcriptomics, no single method yields optimal visualization and interpretability for all data, and different methods may rank differently under different metrics [20]. Users are encouraged to incorporate PCA, UMAP, and other algorithms to benefit from the unique advantages of each. Although only the results of the sPLS-DA + UMAP strategy were presented in this study, other combinations such as PLS-DA + UMAP, PCA + PLS-DA, and PCA + PLS-DA + UMAP could also be easily implemented and examined. In addition, information is always lost when high-dimensional data are reduced to two dimensions. Therefore, visualization is only one aspect of exploratory data analysis and should be accompanied by further quantitative and high-dimensional analysis.

The single-cell datasets used in this study were all from scRNA-seq rather than single-nucleus RNA-seq (snRNA-seq). Previous studies based on gene-level analysis have revealed that snRNA-seq generally performs comparably to scRNA-seq in sensitivity and classification of cell types [57, 58]. However, to the best of our knowledge, there is currently no research comparing snRNA-seq and scRNA-seq in APA profiling. In fact, there are currently very few studies specifically using snRNA-seq data for APA analysis. Previously, Agarwal et al. used a dataset comprising single-nucleus transcriptional profiling of ~2 million nuclei from mouse embryonic development [59] and revealed the landscape of APA in single cells of the developing mouse embryo. Wang et al. investigated dynamic APA usage in different cell types using the snRNA-seq data of 1424 human brain cells [60]. These studies demonstrate that snRNA-seq data can be used to extract APA sites and analyze APA dynamics across cell types or developmental stages. In principle, as long as APA sites can be extracted, spvAPA can be used for subsequent applications such as APA signature imputation, APA feature selection, and integrative visualization. Nevertheless, it would be interesting to conduct a comparative study in the future to compare the effectiveness of spvAPA on scRNA-seq and snRNA-seq data.

In addition to APA, alternative splicing (AS) of pre-mRNAs also greatly contributes to transcriptome diversity. Coordination and competition between AS and APA have been reported [61–63]. The percent spliced in (PSI) index, which represents the ratio between reads including or excluding exons, is commonly used in computational methods for AS [64]. Similar to the APA ratio, the PSI index is a ratio value between 0 and 1. We speculate that spvAPA is also applicable to recovering missing entries in PSI data and can be used to integrate PSI with gene expression data for downstream analysis and visualization. In the future, additional work will be conducted using more data from different species to test the applicability of spvAPA to AS data. In particular, it should be noted that the degree of compositionality varies across the modalities of gene expression, APA, and AS because they represent proportions or relative values constrained by fixed totals. The current version of spvAPA does not fully address the compositional nature of gene expression and APA data, and it remains unclear how compositionality affects the results in this study. In the future, specialized analysis techniques [65–68] could be incorporated for compositional single-cell data integration to resolve compositional constraints and extract meaningful biological insights. Moreover, how to integrate compositional data from more modalities, such as AS, APA, and gene expression, for more accurate determination of cell identity is also a direction worth exploring.
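The parallel between the PSI index and the APA ratio can be shown with a small sketch. It assumes that PSI is computed as inclusion reads over inclusion plus exclusion reads, and that the RUD score is computed as distal reads over distal plus proximal reads; both formulas are stated here as assumptions for illustration, and the function name `usage_ratio` is hypothetical.

```python
def usage_ratio(numerator_reads, other_reads):
    """Ratio in [0, 1]; None when the gene or exon has no reads in this cell."""
    total = numerator_reads + other_reads
    if total == 0:
        return None  # no coverage: the entry is missing rather than zero
    return numerator_reads / total

psi = usage_ratio(30, 10)      # inclusion vs exclusion reads -> 0.75
rud = usage_ratio(5, 15)       # distal vs proximal reads -> 0.25
missing = usage_ratio(0, 0)    # -> None
```

The `None` case illustrates why ratio matrices of this kind are sparser than the underlying count matrices: a ratio is undefined whenever a cell has no reads for the gene or exon at all.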

Key Points

  • spvAPA is the first tool to explore alternative polyadenylation (APA) in a supervised manner from single-cell and spatial transcriptomics data.

  • spvAPA integrates complementary information from the modalities of APA and gene expression, performing better than methods relying on a single modality.

  • spvAPA is capable of recovering missing APA signatures, identifying APA features and marker genes, facilitating interpretable visualization, and discovering novel cell subtypes.

  • Evaluation using nine single-cell and spatial transcriptomics datasets demonstrates the effectiveness and applicability of spvAPA.

Supplementary Material

Supplementary_Figures_bbae720
Supplementary_Tables_bbae720
Supplementary_Text_bbae720

Contributor Information

Qinglong Zhang, Cancer Institute, Suzhou Medical College, Soochow University, NO. 199 Ren-ai Road, SIP, Suzhou 215000, China.

Liping Kang, Cancer Institute, Suzhou Medical College, Soochow University, NO. 199 Ren-ai Road, SIP, Suzhou 215000, China.

Haoran Yang, Cancer Institute, Suzhou Medical College, Soochow University, NO. 199 Ren-ai Road, SIP, Suzhou 215000, China.

Fei Liu, Cancer Institute, Suzhou Medical College, Soochow University, NO. 199 Ren-ai Road, SIP, Suzhou 215000, China.

Xiaohui Wu, Cancer Institute, Suzhou Medical College, Soochow University, NO. 199 Ren-ai Road, SIP, Suzhou 215000, China; Jiangsu Key Laboratory of Infection and Immunity, Soochow University, NO. 199 Ren-ai Road, SIP, Suzhou 215000, China.

 

Conflict of interest: None declared.

Funding

This work was supported by the National Natural Science Foundation of China (T2222007 to X.W.).

Data availability

The datasets supporting the results of this article are available in https://github.com/BMILAB/spvAPA.

References

  • 1. Di Giammartino DC, Nishida K, Manley JL. Mechanisms and consequences of alternative polyadenylation. Mol Cell 2011;43:853–66. 10.1016/j.molcel.2011.08.017.
  • 2. Wu X, Liu M, Downie B. et al. Genome-wide landscape of polyadenylation in Arabidopsis provides evidence for extensive alternative polyadenylation. Proc Natl Acad Sci 2011;108:12533–8. 10.1073/pnas.1019732108.
  • 3. Ozsolak F, Kapranov P, Foissac S. et al. Comprehensive polyadenylation site maps in yeast and human reveal pervasive alternative polyadenylation. Cell 2010;143:1018–29. 10.1016/j.cell.2010.11.020.
  • 4. Hoque M, Ji Z, Zheng D. et al. Analysis of alternative cleavage and polyadenylation by 3′ region extraction and deep sequencing. Nat Methods 2013;10:133–9. 10.1038/nmeth.2288.
  • 5. Derti A, Garrett-Engele P, MacIsaac KD. et al. A quantitative atlas of polyadenylation in five mammals. Genome Res 2012;22:1173–83. 10.1101/gr.132563.111.
  • 6. Tian B, Manley JL. Alternative polyadenylation of mRNA precursors. Nat Rev Mol Cell Biol 2017;18:18–30. 10.1038/nrm.2016.116.
  • 7. Carrion SA, Michal JJ, Jiang Z. Alternative transcripts diversify genome function for Phenome relevance to health and diseases. Genes (Basel) 2023;14:2051. 10.3390/genes14112051.
  • 8. Zheng GX, Terry JM, Belgrader P. et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun 2017;8:14049. 10.1038/ncomms14049.
  • 9. Macosko EZ, Basu A, Satija R. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 2015;161:1202–14.
  • 10. Bagnoli JW, Ziegenhain C, Janjic A. et al. Sensitive and powerful single-cell RNA sequencing using mcSCRB-seq. Nat Commun 2018;9:2937. 10.1038/s41467-018-05347-6.
  • 11. Haga Y, Sakamoto Y, Kajiya K. et al. Whole-genome sequencing reveals the molecular implications of the stepwise progression of lung adenocarcinoma. Nat Commun 2023;14:8375.
  • 12. Patrick R, Humphreys DT, Janbandhu V. et al. Sierra: discovery of differential transcript usage from polyA-captured single-cell RNA-seq data. Genome Biol 2020;21:167. 10.1186/s13059-020-02071-7.
  • 13. Wu X, Liu T, Ye C. et al. scAPAtrap: identification and quantification of alternative polyadenylation sites from single-cell RNA-seq data. Brief Bioinform 2021;22:bbaa273. 10.1093/bib/bbaa273.
  • 14. Shulman ED, Elkon R. Cell-type-specific analysis of alternative polyadenylation using single-cell transcriptomics data. Nucleic Acids Res 2019;47:10027–39. 10.1093/nar/gkz781.
  • 15. Gao Y, Li L, Amos CI. et al. Analysis of alternative polyadenylation from single-cell RNA-seq using scDaPars reveals cell subpopulations invisible to gene expression. Genome Res 2021;31:1856–66. 10.1101/gr.271346.120.
  • 16. Ji G, Tang Q, Zhu S. et al. stAPAminer: mining spatial patterns of alternative Polyadenylation for spatially resolved Transcriptomic studies. Genomics Proteomics Bioinformatics 2023;21:3. 10.1016/j.gpb.2023.01.003.
  • 17. Xia Z, Donehower LA, Cooper TA. et al. Dynamic analyses of alternative polyadenylation from RNA-seq reveal a 3′-UTR landscape across seven tumour types. Nat Commun 2014;5:5274. 10.1038/ncomms6274.
  • 18. Zhu J, Sun S, Zhou X. SPARK-X: non-parametric modeling enables scalable and robust detection of spatial expression patterns for large spatial transcriptomic studies. Genome Biol 2021;22:184. 10.1186/s13059-021-02404-0.
  • 19. Becht E, McInnes L, Healy J. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol 2018;37:38–44. 10.1038/nbt.4314.
  • 20. Atitey K, Motsinger-Reif AA, Anchang B. Model-based evaluation of spatiotemporal data reduction methods with unknown ground truth through optimal visualization and interpretability metrics. Brief Bioinform 2024;25:bbad455. 10.1093/bib/bbad455.
  • 21. Amouzgar M, Glass DR, Baskar R. et al. Supervised dimensionality reduction for exploration of single-cell data by HSS-LDA. Patterns 2022;3:100536. 10.1016/j.patter.2022.100536.
  • 22. Hao Y, Hao S, Andersen-Nissen E. et al. Integrated analysis of multimodal single-cell data. Cell 2021;184:3573–3587.e3529. 10.1016/j.cell.2021.04.048.
  • 23. Tepe B, Hill MC, Pekarek BT. et al. Single-cell RNA-seq of mouse olfactory bulb reveals cellular heterogeneity and activity-dependent molecular census of adult-born neurons. Cell Rep 2018;25:2689–2703.e2683. 10.1016/j.celrep.2018.11.034.
  • 24. Lukassen S, Bosch E, Ekici AB. et al. Characterization of germ cell differentiation in the male mouse through single-cell RNA sequencing. Sci Rep 2018;8:6521. 10.1038/s41598-018-24725-0.
  • 25. Chu L-F, Leng N, Zhang J. et al. Single-cell RNA-seq reveals novel regulators of human embryonic stem cell differentiation to definitive endoderm. Genome Biol 2016;17:173. 10.1186/s13059-016-1033-x.
  • 26. Li Z, Chen X, Zhang X. et al. Latent feature extraction with a prior-based self-attention framework for spatial transcriptomics. Genome Res 2023;33:1757–73. 10.1101/gr.277891.123.
  • 27. Liang Y, Shi G, Cai R. et al. PROST: quantitative identification of spatially variable genes and domain detection in spatial transcriptomics. Nat Commun 2024;15:600. 10.1038/s41467-024-44835-w.
  • 28. Shi X, Yang Y, Ma X. et al. Probabilistic cell/domain-type assignment of spatial transcriptomics data with SpatialAnno. Nucleic Acids Res 2023;51:e115. 10.1093/nar/gkad1023.
  • 29. Xu H, Fu H, Long Y. et al. Unsupervised spatially embedded deep representation of spatial transcriptomics. Genome Med 2024;16:12. 10.1186/s13073-024-01283-x.
  • 30. Ferson S, Rohlf FJ, Koehn RK. Measuring shape variation of two-dimensional outlines. Syst Biol 1985;34:59–68. 10.1093/sysbio/34.1.59.
  • 31. Huang M, Wang J, Torre E. et al. SAVER: gene expression recovery for single-cell RNA sequencing. Nat Methods 2018;15:539–42. 10.1038/s41592-018-0033-z.
  • 32. Chen M, Zhou X. VIPER: variability-preserving imputation for accurate gene expression recovery in single-cell RNA sequencing studies. Genome Biol 2018;19:196. 10.1186/s13059-018-1575-1.
  • 33. Cohen Kalafut N, Huang X, Wang D. Joint variational autoencoders for multimodal imputation and embedding. Nat Mach Intell 2023;5:631–42. 10.1038/s42256-023-00663-z.
  • 34. Liu D, Brockman JM, Dass B. et al. Systematic variation in mRNA 3′-processing signals during mouse spermatogenesis. Nucleic Acids Res 2007;35:234–46. 10.1093/nar/gkl919.
  • 35. Sartini BL, Wang H, Wang W. et al. Pre-messenger RNA cleavage factor I (CFIm): potential role in alternative Polyadenylation during Spermatogenesis. Biol Reprod 2008;78:472–82. 10.1095/biolreprod.107.064774.
  • 36. Li W, Park JY, Zheng D. et al. Alternative cleavage and polyadenylation in spermatogenesis connects chromatin regulation with post-transcriptional control. BMC Biol 2016;14:6. 10.1186/s12915-016-0229-6.
  • 37. Ji Z, Lee JY, Pan Z. et al. Progressive lengthening of 3′ untranslated regions of mRNAs by alternative polyadenylation during mouse embryonic development. Proc Natl Acad Sci 2009;106:7028–33. 10.1073/pnas.0900028106.
  • 38. Reidy KJ, Rosenblum ND. Cell and molecular biology of kidney development. Semin Nephrol 2009;29:321–37. 10.1016/j.semnephrol.2009.03.009.
  • 39. Singh I, Mehta A, Contreras A. et al. Hmga2 is required for canonical WNT signaling during lung development. BMC Biol 2014;12:21. 10.1186/1741-7007-12-21.
  • 40. Valdembri D, Regano D, Maione F. et al. Class 3 semaphorins in cardiovascular development. Cell Adhes Migr 2016;10:641–51. 10.1080/19336918.2016.1212805.
  • 41. Ko T, Choi R, Issa K. et al. Polycomb repressive complex 2 regulates basal cell fate during adult olfactory neurogenesis. Stem Cell Rep 2023;18:2283–96. 10.1016/j.stemcr.2023.09.008.
  • 42. Dakik H, Mantash S, Nehme A. et al. Analysis of the Neuroproteome associated with cell therapy after Intranigral grafting in a mouse model of Parkinson disease. Front Neurosci 2021;15:621121. 10.3389/fnins.2021.621121.
  • 43. Bai B, Wang X, Li Y. et al. Deep multilayer brain proteomics identifies molecular networks in Alzheimer's disease progression. Neuron 2020;105:975–991.e977. 10.1016/j.neuron.2019.12.015.
  • 44. Navarro JF, Croteau DL, Jurek A. et al. Spatial Transcriptomics reveals genes associated with Dysregulated mitochondrial functions and stress Signaling in Alzheimer disease. iScience 2020;23:101556. 10.1016/j.isci.2020.101556.
  • 45. Li GW, Nan F, Yuan GH. et al. SCAPTURE: a deep learning-embedded pipeline that captures polyadenylation information from 3′ tag-based RNA-seq of single cells. Genome Biol 2021;22:221. 10.1186/s13059-021-02437-5.
  • 46. Zhou R, Xiao X, He P. et al. SCAPE: a mixture model revealing single-cell polyadenylation diversity and cellular dynamics during cell differentiation and reprogramming. Nucleic Acids Res 2022;50:e66. 10.1093/nar/gkac167.
  • 47. Li WV, Zheng D, Wang R. et al. MAAPER: model-based analysis of alternative polyadenylation using 3′ end-linked reads. Genome Biol 2021;22:222. 10.1186/s13059-021-02429-5.
  • 48. Finak G, McDavid A, Yajima M. et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol 2015;16:278. 10.1186/s13059-015-0844-5.
  • 49. Pasquini G, Rojo Arias JE, Schäfer P. et al. Automated methods for cell type annotation on scRNA-seq data. Comput Struct Biotechnol J 2021;19:961–9. 10.1016/j.csbj.2021.01.015.
  • 50. Zhang AW, O’Flanagan C, Chavez EA. et al. Probabilistic cell-type assignment of single-cell RNA-seq for tumor microenvironment profiling. Nat Methods 2019;16:1007–15. 10.1038/s41592-019-0529-1.
  • 51. Aran D, Looney AP, Liu L. et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat Immunol 2019;20:163–72. 10.1038/s41590-018-0276-y.
  • 52. Jia S, Lysenko A, Boroevich KA. et al. scDeepInsight: a supervised cell-type identification method for scRNA-seq data with deep learning. Brief Bioinform 2023;24:bbad266. 10.1093/bib/bbad266.
  • 53. Christodoulou E, Ma J, Collins GS. et al. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol 2019;110:12–22. 10.1016/j.jclinepi.2019.02.004.
  • 54. Boufea K, Seth S, Batada NN. scID uses discriminant analysis to identify transcriptionally equivalent cell types across single-cell RNA-Seq data with batch effect. iScience 2020;23:100914. 10.1016/j.isci.2020.100914.
  • 55. Qiao M. Factorized discriminant analysis for genetic signatures of neuronal phenotypes. Front Neuroinform 2023;17:1265079. 10.3389/fninf.2023.1265079.
  • 56. Iwahashi N, Umakoshi H, Fujita M. et al. Single-cell and spatial transcriptomics analysis of human adrenal aging. Mol Metab 2024;84:101954. 10.1016/j.molmet.2024.101954.
  • 57. Ding J, Adiconis X, Simmons SK. et al. Systematic comparison of single-cell and single-nucleus RNA-sequencing methods. Nat Biotechnol 2020;38:737–46. 10.1038/s41587-020-0465-8.
  • 58. Lake BB, Codeluppi S, Yung YC. et al. A comparative strategy for single-nucleus and single-cell transcriptomes confirms accuracy in predicted cell-type expression from nuclear RNA. Sci Rep 2017;7:6031. 10.1038/s41598-017-04426-w.
  • 59. Agarwal V, Lopez-Darwin S, Kelley DR. et al. The landscape of alternative polyadenylation in single cells of the developing mouse embryo. Nat Commun 2021;12:5101. 10.1038/s41467-021-25388-8.
  • 60. Wang Y, Feng W, Xu S. et al. Extensive involvement of alternative polyadenylation in single-nucleus neurons. Genes 2020;11:709. 10.3390/genes11060709.
  • 61. Zhang Z, Bae B, Cuddleston WH. et al. Coordination of alternative splicing and alternative polyadenylation revealed by targeted long read sequencing. Nat Commun 2023;14:5506. 10.1038/s41467-023-41207-8.
  • 62. Tian B, Pan ZH, Lee JY. Widespread mRNA polyadenylation events in introns indicate dynamic interplay between polyadenylation and splicing. Genome Res 2007;17:156–65. 10.1101/gr.5532707.
  • 63. Blake D, Lynch KW. The three as: alternative splicing, alternative polyadenylation and their impact on apoptosis in immune function. Immunol Rev 2021;304:30–50. 10.1111/imr.13018.
  • 64. Wen WX, Mead AJ, Thongjuea S. Technological advances and computational approaches for alternative splicing analysis in single cells. Comput Struct Biotechnol J 2020;18:332–43. 10.1016/j.csbj.2020.01.009.
  • 65. Büttner M, Ostner J, Müller CL. et al. scCODA is a Bayesian model for compositional single-cell data analysis. Nat Commun 2021;12:6876. 10.1038/s41467-021-27150-6.
  • 66. Lin H, Peddada SD. Analysis of compositions of microbiomes with bias correction. Nat Commun 2020;11:3514. 10.1038/s41467-020-17041-7.
  • 67. Gloor GB, Macklaim JM, Pawlowsky-Glahn V. et al. Microbiome datasets are compositional: and this is not optional. Front Microbiol 2017;8:2224. 10.3389/fmicb.2017.02224.
  • 68. Mandal S, van Treuren W, White RA. et al. Analysis of composition of microbiomes: a novel method for studying microbial composition. Microb Ecol Health Dis 2015;26:27663. 10.3402/mehd.v26.27663.


Articles from Briefings in Bioinformatics are provided here courtesy of Oxford University Press
