Abstract
The subcellular localization of RNAs, including long non-coding RNAs (lncRNAs), messenger RNAs (mRNAs), microRNAs (miRNAs) and other smaller RNAs, plays a critical role in determining their biological functions. For instance, lncRNAs are predominantly associated with chromatin and act as regulators of gene transcription and chromatin structure, while mRNAs are distributed across the nucleus and cytoplasm, facilitating the transport of genetic information for protein synthesis. Understanding RNA localization sheds light on processes like gene expression regulation with spatial and temporal precision. However, traditional wet lab methods for determining RNA localization, such as in situ hybridization, are often time-consuming, resource-demanding, and costly. To overcome these challenges, computational methods leveraging artificial intelligence (AI) and machine learning (ML) have emerged as powerful alternatives, enabling large-scale prediction of RNA subcellular localization. This paper provides a comprehensive review of the latest advancements in AI-based approaches for RNA subcellular localization prediction, covering various RNA types and focusing on sequence-based, image-based, and hybrid methodologies that combine both data types. We highlight the potential of these methods to accelerate RNA research, uncover molecular pathways, and guide targeted disease treatments. Furthermore, we critically discuss the challenges in AI/ML approaches for RNA subcellular localization, such as data scarcity and lack of benchmarks, and opportunities to address them. This review aims to serve as a valuable resource for researchers seeking to develop innovative solutions in the field of RNA subcellular localization and beyond.
I. INTRODUCTION
RNA is a nucleic acid molecule that plays crucial roles in various cellular processes, including gene expression regulation, catalysis of biochemical reactions, and the translation of genetic information into proteins. RNA is typically single-stranded and composed of a long chain of nucleotides. Each nucleotide contains a ribose sugar, a phosphate group, and one of four nitrogenous bases: adenine (A), guanine (G), cytosine (C), or uracil (U), which replaces the thymine (T) found in DNA. Cells contain many categories of RNA, including messenger RNA (mRNA), long non-coding RNA (lncRNA), and microRNA (miRNA) [1–2]. RNA regulates both transcription and translation within different compartments of a cell [3–4]. In the nucleus, RNA is transcribed from DNA, and mRNA precursors undergo processing steps such as capping, splicing, and polyadenylation [5]. Mitochondrial RNA encodes multiple proteins necessary for mitochondrial and metabolic functions [6]. In plant cells, RNA within the chloroplasts promotes the expression of photosynthesis-related genes in the nucleus [7].
Messenger RNA (mRNA) is a single-stranded RNA molecule that is complementary to a specific DNA strand within a genome [8]. mRNA serves as a crucial intermediary in translating genetic information from DNA into functional proteins [8]. mRNA is more mobile than DNA and can exit the nucleus, carrying genetic information to the ribosomes in the cytoplasm, where it acts as a template for protein synthesis during translation [9]. Structures within the mRNA also influence protein synthesis and the regulation of gene expression [10]. Before translation, mRNA undergoes modifications that enhance its stability (such as capping and polyadenylation) and, through splicing, enable the selective expression of specific regions of the genetic code, allowing different proteins to be produced from the same gene [11]. mRNA is not evenly distributed throughout cells but is instead concentrated in specific cellular compartments [12][13]. A notable example is the asymmetric mRNA distribution in ascidian embryos, first revealed by Jeffery et al. [14], a finding that advanced our understanding of the mechanisms underlying cell differentiation in early embryonic development. The non-random organization associated with cytoskeletal proteins within the cytoplasm provides insight into a potential mechanism for measuring their concentration [15].
mRNA localization relies on three primary mechanisms: directed transport along the cytoskeleton by molecular motors, protection from degradation, and diffusion with local entrapment [16][17][18]. These mechanisms involve interactions between destination-specific proteins and adaptor proteins that form a ribonucleoprotein (RNP) complex. A pivotal determinant of mRNA localization is the interaction between cis-acting signals, located mainly in the 3′ untranslated regions (UTRs) and 5′ ends of mRNA sequences [19][20][21][22][23], and trans-acting factors; these interactions facilitate the uneven distribution of specific transcripts. These cis-acting regions, often referred to as “zip codes”, are critical elements within the linear RNA sequence or structure and contain cis-regulatory elements that interact with trans-acting factors, predominantly RNA-binding proteins (RBPs). These RBPs further interact with other RBPs and RNA-binding domains (RBDs) [24][25]. These interactions are essential for controlling mRNA distribution within the cellular milieu.
Long non-coding RNAs (lncRNAs) are RNA molecules exceeding 200 nucleotides in length that, unlike mRNAs, do not encode proteins [26]. Instead, lncRNAs regulate various biological processes at transcriptional and post-transcriptional levels [27]. Their subcellular localization plays a crucial role in regulating gene expression through spatial and temporal control, which is essential for numerous cellular and developmental processes like regulated translation in highly polarized asymmetric cells, cell migration, maintenance of cellular polarity, orchestration of synaptic plasticity associated with long-term memory, assembly of protein complexes, asymmetric cell division, embryonic patterning, and cellular adaptation to stress [26][28][29][30][31][32][33][34][35][36][37]. LncRNAs can interact with chromatin-modifying complexes, altering chromatin structure and influencing gene expression by modifying transcriptional accessibility [26]. This interaction is vital in cellular differentiation and development, guiding cells toward specific fates during development and tissue formation [38]. In disease diagnosis and therapy, lncRNAs serve as important biomarkers for disease detection, prognosis, and treatment monitoring [39]. Dysregulated lncRNAs, such as MALAT1, H19, MEG3, and HOTAIR, have been linked to various cancers, while others like NBAT-1 are associated with poor cancer prognosis [40][41][42][43]. Furthermore, lncRNAs play pivotal roles in neurodegenerative diseases, such as amyotrophic lateral sclerosis (ALS), and inflammatory conditions, including inflammatory bowel diseases [44][45][46][47][48]. Their functional duality as oncogenes or tumor suppressors makes lncRNAs promising therapeutic targets [49].
Small RNAs, including microRNAs (miRNAs), are shorter than lncRNAs, typically under 200 nucleotides in length [50][51][52]. Unlike mRNAs, miRNAs are non-coding and regulate gene expression at the post-transcriptional level [53][54]. Aberrant expression of miRNAs has been implicated in diseases like cancer and COVID-19 [55][56], spurring research into miRNA-based diagnostic markers and therapeutic strategies using deep learning algorithms [57][58].
The subcellular localization of mRNA, lncRNA, and smaller RNAs such as miRNAs is integral to their functions. Proper localization of RNAs within cells is crucial for spatial and temporal regulation of gene expression, affecting cell differentiation, polarization, and development [26][27][28]. Disruptions in RNA localization are linked to diseases such as cancer [59][60][61][62][63][64], spinal muscular atrophy [65], neuronal dysfunction [59][66][67][68][69][70][71], and developmental disorders [72]. For instance, improper localization of lncRNAs, including MALAT1 and HOTAIR, has been associated with oncogenesis and metastasis [40][41][42]. These findings underscore the importance of understanding RNA localization to uncover the mechanisms driving normal cellular processes and disease pathogenesis.
RNA subcellular localization prediction is inspired by parallel achievements in protein subcellular localization prediction. In the past decade, we have developed a series of AI/ML-based methods for protein subcellular localization, including gene ontology-based methods such as GOASVM [73], mGOASVM [74], R3P-Loc [75], mPLR-Loc [76], and HybridGO-Loc [77]; methods based on interpretable features such as FUEL-mLoc [78], mLASSO-Hum [79], SpaPredictor [80], and Gram-LocEN [81]; and ensemble models such as LNP-Chlo [82] and EnTrans-Chlo [83], as well as membrane protein function predictors such as Mem-mEN [84] and Mem-ADSVM [85]. Proteins and RNA interact in complex networks to regulate and perform cellular activities [86][87]. RNA serves as both a template and a regulatory molecule, whereas proteins often play structural and catalytic roles that enable or modify RNA’s functions [88][89]. Together, they establish the foundational processes of life, including gene expression, regulation, and cellular maintenance [90][91]. Technological advances have enabled rapid prediction of protein subcellular localization [92][93][94], and a comprehensive review can be found in [46]. Given the intricate relationship between RNA and proteins, such as protein synthesis relying on mRNA-encoded genetic information and RNA localization influencing disease mechanisms, many models have been developed to predict RNA subcellular localization, including models for specific RNA types [95][96][97].
Many studies in the past decades have used wet lab methods to determine the subcellular localization of mRNA, lncRNA, and miRNA. One common method is RNA fluorescence in situ hybridization (RNA-FISH), on which state-of-the-art technologies such as smFISH [98], MERFISH [99], seqFISH+ [100], and GeoMx DSP [101] have been built to provide high-resolution images of individual transcripts. However, these methods are time-consuming and labor-intensive. In recent years, advanced high-throughput RNA sequencing methods such as APEX-RIP [102], CeFra-seq [103], and CLIP-seq [104] have been introduced to detect the subcellular localization of individual RNAs. These methodologies still have several flaws, including high complexity, inherent noise, and limited accuracy. Consequently, the limitations of both imaging-based and sequencing-based methods have driven the development of in-silico computational techniques, which offer a faster, more cost-effective, and accurate alternative for predicting the subcellular localization of mRNA, lncRNA, and miRNA [105][106][107].
Most in-silico computational methods are based on supervised machine learning approaches that require carefully designed features. These methods can be broadly categorized into three main types based on the features they use: (1) sequence-based methods, which use only nucleotide sequences as input; (2) image-based methods, which rely on bio-image data to detect subcellular compartments; and (3) hybrid methods, which combine bio-image and sequence data to achieve higher accuracy by leveraging the strengths of each data type while mitigating their weaknesses. This paper reviews these approaches, discussing their methodologies, strengths, limitations, and applications in RNA subcellular localization prediction. Fig. 1 provides an overview of different machine learning approaches and the main steps involved. The essential components of machine learning approaches to RNA subcellular localization are the inputs (nucleotide sequences in FASTA format, bio-images, or both); feature extraction (sequence-based, image-based, or hybrid); prediction, using traditional machine learning and deep learning algorithms for sequence-based and image-based models and fusion algorithms for hybrid models; and finally the localization output, which assigns an RNA to a single localization or to multiple localizations.
Fig. 1. The overview of machine learning approaches to RNA subcellular localization.
The inputs are usually nucleotide sequences in FASTA format, bio-images, or both. Typical features used in sequence-based approaches include sequence composition, nucleotide frequency distribution, signal motifs, etc. Image-based approaches use morphological features, spatial distribution features, etc. Hybrid approaches use a combination of both sequence and image features. Typical prediction algorithms include traditional machine learning and deep learning approaches for sequence-based and image-based models, fusion algorithms for the hybrid model, etc. Prediction results can be single or multiple localizations for an RNA.
In the rest of the paper, Section II introduces the common features and algorithms in sequence-based methods. Section III covers image-based methods as well as hybrid methods that leverage both image and sequence-based data. In Section IV, existing challenges and future directions are explored and our expectations outlined. Section V concludes the paper.
II. Sequence-based Methods
In this section, we introduce typical RNA subcellular localization methods based on sequence data. Fig. 2 summarizes the general features and algorithms used in sequence-based methods. First, training data are prepared by sequence refining with techniques such as CD-HIT [108] to improve quality. Data imbalance is addressed with techniques like SMOTE [109]. Then sequence features are extracted with techniques like K-mer [110][111][112], PseKNC [113][114][115], Z-curve [116], etc. before feature selection is applied. Optimal and relevant features are selected with techniques like binomial distribution [117], ANOVA [28], Incremental Feature Selection (IFS) [117], etc. Finally, the features are used to train machine learning models with different algorithms, including conventional machine learning algorithms, deep learning algorithms, or ensemble methods.
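As a small illustration of one extractor named above, the Z-curve [116] maps a nucleotide sequence to three cumulative coordinates that contrast purines with pyrimidines, amino with keto bases, and weak with strong hydrogen-bonding bases. The Python sketch below computes only the raw cumulative curve, treating T as U; published predictors usually derive fixed-length Z-curve parameters from k-mer frequencies instead, so this is not the exact feature set of any cited method.

```python
from typing import List, Tuple


def z_curve(seq: str) -> List[Tuple[int, int, int]]:
    """Return the cumulative Z-curve coordinates (x_n, y_n, z_n) of a sequence.

    x_n: purines (A, G) minus pyrimidines (C, U) seen so far
    y_n: amino bases (A, C) minus keto bases (G, U) seen so far
    z_n: weak H-bond bases (A, U) minus strong H-bond bases (G, C) seen so far
    """
    counts = {"A": 0, "C": 0, "G": 0, "U": 0}
    coords = []
    for nt in seq.upper().replace("T", "U"):
        if nt not in counts:
            raise ValueError(f"unsupported nucleotide: {nt!r}")
        counts[nt] += 1
        a, c, g, u = counts["A"], counts["C"], counts["G"], counts["U"]
        coords.append(((a + g) - (c + u), (a + c) - (g + u), (a + u) - (g + c)))
    return coords


print(z_curve("AUG"))  # [(1, 1, 1), (0, 0, 2), (1, -1, 1)]
```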
Fig. 2. Sequence-based approaches for predicting RNA subcellular location.
Training data preparation involves techniques aimed at enhancing sequence quality for further use. Data collection focuses on extracting the desired features required for different models. Data cleaning is then performed to select optimal and relevant features from these desired features. The cleaned data is then used for model training. CD-HIT: Cluster Database at High Identity with Tolerance. SMOTE: Synthetic Minority Over-sampling Technique. PseKNC: Pseudo K-tuple Nucleotide Composition. ANOVA: Analysis of Variance. IFS: Incremental Feature Selection.
A. Sequence-based Features
Many state-of-the-art approaches to RNA subcellular localization are sequence-based because RNA sequences are easier and cheaper to obtain than other data types. Sequence features can capture various characteristics, such as physicochemical properties [118], nucleic acid composition [119][120], and 3D representation [121][122]. Numerous feature extraction methods are used to analyze RNA nucleotides, secondary structures, RNA-binding motifs, and genomic loci. Additionally, some models employ self-defined features generated through in silico methodologies, such as large language models [123], to make predictions. Feature selection techniques are typically used to identify which features contribute most to each subcellular compartment. These models [28][117][124] aim to predict the subcellular locations of mRNAs or lncRNAs by analyzing the correlations between these locations and the information encoded in nucleotide sequences.
Specifically, Table 1 summarizes different kinds of sequence-based features used for RNA subcellular localization. Composition-based features are among the most commonly used. K-mer [110] is a typical method in this category, in which a long nucleotide sequence is analyzed as contiguous subsequences of length K. K-mers facilitate faster sequence comparisons by breaking sequences into smaller, more manageable pieces, which is particularly useful in large-scale genomic data analysis. Building on this concept, Yuan et al. [111] proposed a model using k-mers of lengths 3, 4, and 5 to identify sequence features and their associated RNA-binding proteins (RBPs) that contribute to lncRNA subcellular localization. Yan et al. [112] developed RNATracker, one of the first computational predictors of mRNA subcellular localization, using k-mers to identify candidate cis-regulatory regions in strongly localized transcripts. However, k-mers may lack contextual information beyond their immediate sequence. To complement them, sequence order information is another widely used type of feature.
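Following the definition above, a k-mer composition feature counts every length-K window of the sequence and normalizes by the number of windows. The sketch below assumes the four-letter alphabet ACGU (T mapped to U), skips windows containing ambiguous bases, and concatenates k = 3, 4, 5 to echo the lengths used by Yuan et al. [111]; it is an illustration, not their implementation.

```python
from itertools import product
from typing import Dict, List, Sequence


def kmer_composition(seq: str, k: int) -> Dict[str, float]:
    """Normalized counts of all 4**k k-mers over the RNA alphabet ACGU."""
    seq = seq.upper().replace("T", "U")
    counts = {"".join(p): 0 for p in product("ACGU", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing ambiguous bases (e.g. N) are skipped
            counts[kmer] += 1
            total += 1
    return {kmer: (n / total if total else 0.0) for kmer, n in counts.items()}


def multi_k_vector(seq: str, ks: Sequence[int] = (3, 4, 5)) -> List[float]:
    """Concatenate k-mer frequency vectors for several k into one feature vector."""
    vector: List[float] = []
    for k in ks:
        vector.extend(kmer_composition(seq, k).values())
    return vector


freqs = kmer_composition("ACGUACGU", 2)
print(round(freqs["AC"], 3))            # 2 of 7 windows -> 0.286
print(len(multi_k_vector("ACGUACGU")))  # 64 + 256 + 1024 = 1344
```

The vector length grows as 4^K, which is one reason longer k-mers are usually paired with the feature selection steps discussed later.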
TABLE 1:
A summary of sequence-based features for RNA subcellular localization.
| Feature | Description | References |
|---|---|---|
| K-mer | A contiguous subsequence of length K from a longer sequence of nucleotides | [110][111][112] |
| PseKNC | K-mer + distance or correlation values between K-mers at different positions in the sequence | [113][114][115] |
| PseEIIP | Integrating the Electron-Ion Interaction Potential (EIIP) values of nucleotides with the conventional k-tuple nucleotide composition to create a more informative representation of RNA sequences | [125][126][127][128][129] |
| One-hot encoding | Representing categorical data, such as nucleotide or amino acid sequences, in a vectorial format of numerical values with only a single 1 and 0’s in every other place | [130][131][132] |
PseKNC: Pseudo K-tuple Nucleotide Composition; PseEIIP: Pseudo Electron-Ion Interaction Potential.
Sequence order information is crucial in bioinformatics and genomics because it captures the arrangement of, and dependencies between, nucleotides in a sequence, which k-mer analysis alone cannot capture. Pseudo K-tuple Nucleotide Composition (PseKNC) [113] captures this information by introducing distance or correlation terms between k-mers at different positions within the sequence. Garg [114] applied PseKNC with k-values of 3, 4, and 5 in a novel model that predicts five subcellular localizations of eukaryotic mRNAs from cDNA/mRNA sequences. Su [115] also proposed a model incorporating PseKNC to predict the subcellular localization of lncRNAs. However, PseKNC has certain drawbacks, including overfitting, limited generalization without feature selection methods, and reduced interpretability. In miRNA subcellular localization prediction, a novel approach called kmerPR2vec, which fuses k-mer positional information with k-mer embeddings, has been proposed to carry more semantic information and improve discriminative ability [133].
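To make the sequence-order idea concrete, the sketch below appends λ correlation factors θ_j, each the average squared difference of a numeric property between nucleotides j positions apart, to the k-mer frequencies, weights them by w, and normalizes so the vector sums to one, following the general form of PseKNC. It is deliberately simplified: published PseKNC variants typically correlate dinucleotide or trinucleotide physicochemical properties, whereas here a single per-nucleotide property (EIIP values) is assumed, and the defaults k = 3, λ = 2, w = 0.5 are illustrative.

```python
from itertools import product
from typing import List

# Commonly cited EIIP values, used here as a single per-nucleotide property.
# Assumption: real PseKNC variants usually use several dinucleotide properties.
PROPERTY = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "U": 0.1335}


def pse_knc(seq: str, k: int = 3, lam: int = 2, w: float = 0.5) -> List[float]:
    """Simplified PseKNC: 4**k k-mer frequencies plus lam sequence-order factors."""
    seq = seq.upper().replace("T", "U")
    if set(seq) - set("ACGU"):
        raise ValueError("only A, C, G, U/T are supported")
    if len(seq) < max(k, lam + 1):
        raise ValueError("sequence too short for the chosen k and lam")
    vocab = ["".join(p) for p in product("ACGU", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    n_windows = len(seq) - k + 1
    freqs = [0.0] * len(vocab)
    for i in range(n_windows):
        freqs[index[seq[i:i + k]]] += 1 / n_windows
    # theta_j: mean squared property difference between nucleotides j positions apart
    theta = []
    for j in range(1, lam + 1):
        diffs = [(PROPERTY[seq[i]] - PROPERTY[seq[i + j]]) ** 2 for i in range(len(seq) - j)]
        theta.append(sum(diffs) / len(diffs))
    denom = sum(freqs) + w * sum(theta)
    return [f / denom for f in freqs] + [w * t / denom for t in theta]


vec = pse_knc("ACGUACGUAAGGCU")
print(len(vec))            # 64 composition terms + 2 correlation terms = 66
print(round(sum(vec), 6))  # the vector is normalized to sum to 1
```

Larger w gives the order terms more influence relative to plain composition; this extra freedom is one source of the overfitting risk mentioned above.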
In addition, physicochemical pattern features are descriptors used in bioinformatics and computational biology to capture the inherent physical and chemical properties of biological sequences like DNA, RNA, and proteins. These features help to understand the functional and structural aspects of sequences by considering properties like hydrophobicity, charge, mass, polarity, and others. Pseudo Electron-Ion Interaction Potential (PseEIIP) is a computational method to represent nucleotide sequences. It integrates the Electron-Ion Interaction Potential (EIIP) values of nucleotides with the conventional k-tuple nucleotide composition, creating a more informative representation of RNA sequences by incorporating physicochemical information into sequence analysis. Many models [125][126][127][128][129] utilize physicochemical pattern features to identify potential drug targets and predict the subcellular localization of mRNA and lncRNA by incorporating these features into the input data.
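Following that description, a PseEIIP vector weights each of the 64 trinucleotide frequencies by the summed EIIP of its three nucleotides. The sketch assumes the EIIP values commonly quoted in the literature (A 0.1260, C 0.1340, G 0.0806, T/U 0.1335); individual predictors may scale or normalize the result differently.

```python
from itertools import product
from typing import List

# Commonly cited EIIP values of the nucleotides (T's value used for U)
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "U": 0.1335}
TRINUCLEOTIDES = ["".join(p) for p in product("ACGU", repeat=3)]


def pse_eiip(seq: str) -> List[float]:
    """64-dim PseEIIP: (EIIP_x + EIIP_y + EIIP_z) * frequency of each trinucleotide xyz."""
    seq = seq.upper().replace("T", "U")
    n_windows = len(seq) - 2
    if n_windows <= 0:
        raise ValueError("sequence must contain at least three nucleotides")
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    for i in range(n_windows):
        tri = seq[i:i + 3]
        if tri in counts:  # skip windows with ambiguous bases
            counts[tri] += 1
    return [sum(EIIP[nt] for nt in tri) * counts[tri] / n_windows for tri in TRINUCLEOTIDES]


features = pse_eiip("AAAAC")
print(len(features))          # 64
print(round(features[0], 4))  # AAA: 3 * 0.1260 * (2/3) = 0.252
```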
To improve prediction performance, some models combine different types of features. For example, MMLmiRLocNet [134] gathered multi-perspective sequence representations by combining lexical features based on k-mer physicochemical properties, syntactic features derived from word2vec embeddings, and semantic representations created using pre-trained feature embeddings.
As a popular data representation technique in natural language processing, one-hot encoding is widely adopted in bioinformatics for representing categorical data, such as nucleotide or amino acid sequences, as numerical vectors. For example, DNA and RNA sequences are composed of four types of nucleotides (A, T/U, C, G). One-hot encoding represents each nucleotide as a binary vector of length four, namely (1, 0, 0, 0) for A, (0, 1, 0, 0) for T/U, (0, 0, 1, 0) for C, and (0, 0, 0, 1) for G. Each vector has a single 1 and 0’s in every other position, hence the name “one-hot”. This feature representation method was adopted by [41], where the RNA primary structure is represented by 4 bits as shown above, the RNA secondary structure by 6 bits, and a joint primary-secondary representation by 4 × 6 = 24 bits. Compared with other features, one-hot encoding is very simple and widely applicable. Its main disadvantages are high dimensionality for long sequences and sparsity due to the many zeros, which may increase computational load and memory usage. One-hot encoding is typically used as input for deep learning algorithms [130][131][132], where no biologically inspired feature engineering is needed.
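A minimal sketch of the primary-structure one-hot encoding described above, using the same column order (A, T/U, C, G). Deep learning models need a fixed input length, so the sketch assumes truncation at the 3′ end and zero-padding to a hypothetical `max_len`; real predictors choose their own truncation strategy, and unknown bases are encoded here as all-zero rows.

```python
from typing import List

ORDER = "AUCG"  # same column order as in the text: A, T/U, C, G


def one_hot(seq: str, max_len: int) -> List[List[int]]:
    """Encode a sequence as a max_len x 4 binary matrix.

    Longer sequences are truncated, shorter ones are zero-padded, and
    unknown bases (e.g. N) become all-zero rows.
    """
    seq = seq.upper().replace("T", "U")[:max_len]
    rows = []
    for nt in seq:
        row = [0, 0, 0, 0]
        if nt in ORDER:
            row[ORDER.index(nt)] = 1
        rows.append(row)
    rows += [[0, 0, 0, 0] for _ in range(max_len - len(rows))]
    return rows


for row in one_hot("ACGT", 6):
    print(row)
```

For a 4,000-nt transcript with `max_len = 4000`, the matrix has 16,000 entries of which at most 4,000 are non-zero, which illustrates the dimensionality and sparsity cost noted above.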
After feature extraction, feature selection is widely applied in various models as features often contribute differently across different compartments. Feature selection plays a vital role in enhancing model performance, reducing complexity and noise while improving model interpretability. One approach involves using a predictive model to evaluate combinations of features and select the best-performing subset. For example, Tang et al. [125] employed the sequential forward search (SFS) strategy to identify optimal feature subsets after removing highly correlated features and ranking the remaining ones. Zhang et al. [117] introduced an IFS strategy with a binomial distribution score. To address the issue of redundancy, the minimal-redundancy-maximal-relevance (mRMR) criterion was used in the model. Another approach for feature selection is based on the statistical properties of the data. Zhang et al. [28] used a combination of the binomial distribution and ANOVA to filter out irrelevant features, while Ahmad et al. [124] employed the Pearson correlation coefficient to identify the most correlated features.
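A minimal sketch of the filter-then-wrapper pattern described above: features are ranked by a one-way ANOVA F statistic (as in the ANOVA filtering of [28]) and then added one at a time, IFS-style, keeping the subset with the best leave-one-out accuracy. The nearest-centroid evaluator and the toy data are hypothetical stand-ins; the cited methods pair IFS with their own scoring (e.g., the binomial distribution score of [117]) and their own classifiers.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def anova_f(values: Sequence[float], labels: Sequence[int]) -> float:
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups: Dict[int, List[float]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand = mean(values)
    means = {y: mean(g) for y, g in groups.items()}
    ss_between = sum(len(g) * (means[y] - grand) ** 2 for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ss_within == 0:  # perfectly separated feature, or a constant one
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def loo_accuracy(X: Sequence[Sequence[float]], y: Sequence[int], feats: Sequence[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier on the given features."""
    correct = 0
    for i in range(len(X)):
        groups: Dict[int, List[List[float]]] = {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j != i:
                groups.setdefault(label, []).append([row[f] for f in feats])
        centroids = {lab: [mean(col) for col in zip(*rows)] for lab, rows in groups.items()}
        query = [X[i][f] for f in feats]
        pred = min(centroids, key=lambda lab: sum((q - c) ** 2 for q, c in zip(query, centroids[lab])))
        correct += pred == y[i]
    return correct / len(X)


def incremental_feature_selection(X, y) -> Tuple[List[int], float]:
    """Rank features by ANOVA F, then grow the top-n subset and keep the best n."""
    n_feats = len(X[0])
    scores = [anova_f([row[f] for row in X], y) for f in range(n_feats)]
    ranking = sorted(range(n_feats), key=lambda f: scores[f], reverse=True)
    best_subset, best_acc = [], -1.0
    for size in range(1, n_feats + 1):
        acc = loo_accuracy(X, y, ranking[:size])
        if acc > best_acc:  # strict '>' keeps the smallest subset among ties
            best_subset, best_acc = ranking[:size], acc
    return best_subset, best_acc


# Toy data: feature 0 separates the two classes, features 1 and 2 are noise
X = [[0.0, 5.0, 1.0], [0.5, 1.0, 2.0], [1.0, 3.0, 1.5],
     [10.0, 4.0, 1.2], [10.5, 2.0, 1.8], [11.0, 5.0, 1.1]]
y = [0, 0, 0, 1, 1, 1]
print(incremental_feature_selection(X, y))  # ([0], 1.0)
```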
Before feature extraction, Zhang and Qiao [135] employed non-negative matrix factorization (NMF) to analyze images, followed by Kullback-Leibler divergence-based non-negative matrix factorization (KLNMF) to extract features with a high contribution rate. A similar statistics-based filtering method was adopted by Fan et al. [128], who used a variance threshold to remove features whose variance did not meet a specified threshold.
B. Sequence-based Algorithms
After feature extraction and selection, classification algorithms make the final predictions of RNA subcellular localization. As the number of features increases, model complexity, sensitivity to parameters, and computational demand also rise, calling for AI/ML algorithms of varying complexity, from conventional machine learning approaches to complex deep learning models. In addition to discussing the development of these models, we explore techniques for handling multi-label localization of mRNA, lncRNA, and miRNA, as well as methods for addressing imbalanced data.
Table 2 provides a comprehensive list of sequence-based methods in reverse chronological order of publication year. In the following, we give more details on the algorithms adopted by those methods.
TABLE 2:
A summary of sequence-based methods for RNA subcellular localization.
| Method | Algorithm | Localizations | Single / Multiple Labels | Reference | Year |
|---|---|---|---|---|---|
| MulStack | RF, CNN, BiLSTM | Exosome, membrane, cytosol, ribosome, endoplasmic reticulum, and nucleus for mRNA | Multiple | [136] | 2024 |
| UMSLP | CatBoost, XGBoost, DT, GNB, MLP | Nucleus, cytoplasm, extracellular region, mitochondria, endoplasmic reticulum for mRNA | Single | [116] | 2024 |
| DeepLocRNA | Deep learning with attention | Nucleus, exosome, cytosol, cytoplasm, ribosome, membrane, endoplasmic reticulum, microvesicle, and mitochondrion for mRNA | Multiple | [130] | 2024 |
| Zuckerman & Ulitsky | RF | Cell, cytosol, nucleus for LncRNA | Single | [137] | 2024 |
| Zhang et al. | BERT | Nucleus, exosome, cytoplasm, ribosome, cytosol, extracellular vesicle for LncRNA | Single | [138] | 2024 |
| PreSubLncR | CNN, BiLSTM | Cytoplasm, exosome, nucleus, and ribosome for LncRNA | Single | [139] | 2024 |
| GATLncLoc+ C&S | GNN | Cytoplasm, cytosol, exosome, nucleus, and ribosome for LncRNA | Single | [140] | 2024 |
| Deng | GNN | Cytoplasm, cytosol, exosome, nucleus, and ribosome for LncRNA | Single | [141] | 2024 |
| RNAlight | LightGBM | Nucleus and cytoplasm for mRNA | Single | [111] | 2023 |
| MSLP | Ensemble (RF, XGBoost, CatBoost, DT, SVM, GNB) | Cytoplasm, endoplasmic reticulum, extracellular, mitochondria and nucleus for mRNA | Single | [127] | 2023 |
| MRSLpred | CNN, XGBoost | Ribosome, cytosol, endoplasmic reticulum, membrane, nucleus, and exosome for mRNA | Multiple | [132] | 2023 |
| Allocator | MLP, Graph-based | Nucleus, exosome, cytosol, ribosome, membrane, and endoplasmic reticulum for mRNA | Multiple | [142] | 2023 |
| NN-RNALoc | Neural Network | Cytosol, Insoluble, Membrane, Nuclear for mRNA | Single | [143] | 2023 |
| GraphLncLoc | GCN | nucleus, cytoplasm, ribosome, exosome for LncRNA | Single | [144] | 2023 |
| GM-lncLoc | GCN | Nucleus, exosome, cytoplasm, ribosome, cytosol for LncRNA | Single | [145] | 2023 |
| lncLocator-imb | CNN, RNN | Cytoplasm, nucleus for LncRNA | Single | [129] | 2023 |
| LightGBM-LncLoc | LightGBM | Nucleus, exosome, cytoplasm, ribosome, cytosol for LncRNA | Single | [146] | 2023 |
| LncLocFormer | Transformer | Nucleus, cytoplasm, chromatin, and insoluble cytoplasm for LncRNA | Multiple | [147] | 2023 |
| DlncRNALoc | SVM | Cytoplasm, cytosol, exosome, nucleus, and ribosome for LncRNA | Single | [148] | 2023 |
| SGCL-LncLoc | GCN | Cytoplasm, nucleus for LncRNA | Single | [149] | 2023 |
| LncDLSM | CNN | Cytoplasm, cytosol, exosome, nucleus, ribosome, etc. (from the NONCODE database) for LncRNA | Single | [150] | 2023 |
| EL-RMLocNet | LSTM | Pseudopodium, nucleolus, nucleus, cytosol, mitochondrion, ribosome, endoplasmic reticulum, exosome, microvesicle, and cytoplasm for mRNA | Multiple | [151] | 2022 |
| Clarion | XGBoost | Chromatin, cytoplasm, cytosol, exosome, membrane, nucleolus, nucleoplasm, nucleus and ribosome for mRNA | Multiple | [152] | 2022 |
| DeepLncLoc | Text CNN | Nucleus, exosome, cytoplasm, ribosome, cytosol for LncRNA | Single | [153] | 2022 |
| TACOS | Tree-based (RF, ERT, XGBoost) | Cytoplasm, nucleus for LncRNA | Single | [154] | 2022 |
| mRNALocater | Ensemble (LightGBM, XGBoost, CatBoost) | Cytoplasm, endoplasmic reticulum, extracellular, mitochondria and nucleus for mRNA | Single | [125] | 2021 |
| SubLocEP | LightGBM | Cytoplasm, endoplasmic reticulum, extracellular, mitochondria and nucleus for mRNA | Single | [126] | 2021 |
| DM3Loc | CNN, BiLSTM | Exosome, membrane, cytosol, ribosome, endoplasmic reticulum, and nucleus for mRNA | Multiple | [131] | 2021 |
| mLoc-mRNA | RF, EN | cytoplasm, endoplasmic reticulum, cytosol, exosome, mitochondrion, nucleus, pseudopodium, posterior, ribosome for mRNA | Multiple | [155] | 2021 |
| Yi & Adjeroh | CNN | Nucleus, exosome, cytoplasm, ribosome, cytosol for LncRNA | Single | [156] | 2021 |
| Locate-R | SVM | nucleus, cytoplasm, ribosome, exosome for LncRNA | Single | [124] | 2020 |
| LncLocation | SVM, RF, LR, XGBoost, LightGBM | nucleus, cytoplasm, ribosome, exosome for LncRNA | Single | [128] | 2020 |
| lncLocPred | LR | Nucleus, cytoplasm, ribosome, and exosome for LncRNA | Single | [157] | 2020 |
| RNATracker | CNN, LSTM | Cytosol, nuclear, membranes, insoluble, endoplasmic reticulum, mitochondria for mRNA | Single | [112] | 2019 |
| DeepLncRNA | DNN | Nuclear, cytosolic for LncRNA | Single | [158] | 2018 |
| lncLocator | SVM, RF | Nucleus, exosome, cytoplasm, ribosome, cytosol for LncRNA | Single | [159] | 2018 |
| DeepLNC | DNN | Cytoplasm, cytosol, exosome, nucleus, ribosome, etc. (from the LNCipedia and RefSeq databases) for LncRNA | Single | [160] | 2016 |
RF: Random Forest; CNN: Convolutional Neural Network; BiLSTM: Bi-directional Long short-term memory; CatBoost: Categorical Boost; XGBoost: eXtreme Gradient Boosting; DT: Decision Tree; GNB: Gaussian Naive Bayes; MLP: Multilayer Perceptron; BERT: Bidirectional Encoder Representations from Transformers; GNN: Graph Neural Network; LightGBM: Light Gradient Boosting Machine; GCN: Graph Convolutional Network; RNN: Recurrent Neural Network; SVM: Support Vector Machine; LSTM: Long short-term memory; ERT: Extremely Randomized Trees; EN: Elastic Net; LR: Logistic Regression; DNN: Deep Neural Network.
In several related research areas, such as protein subcellular localization, conventional machine learning classification is very popular [128][161][162][163][164]. Similarly, in RNA subcellular localization, traditional machine learning methods such as Support Vector Machines (SVM), Logistic Regression [157], tree-based methods [154], and Random Forests (RF) are widely used. These methods are favored for their low computational cost, which allows fast predictions on low-level features. For example, Fu et al. [148] proposed a model that applies discrete wavelet transform (DWT) feature extraction to the physicochemical property matrix of 2-tuple bases and uses the local Fisher discriminant analysis (LFDA) algorithm to optimize the feature information before SVM classification. Zuckerman et al. [137] employed a Random Forest classifier, training and testing it on different human and mouse cell lines.
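Fu et al. [148] apply a discrete wavelet transform to a physicochemical property matrix of 2-tuple bases before classification. The sketch below shows only a generic DWT step on a one-dimensional numeric signal, using the Haar wavelet and summarizing each decomposition level by four statistics. The wavelet, the number of levels, the summary statistics, and the per-nucleotide EIIP mapping used to build the signal are all assumptions for illustration; the LFDA and SVM stages are not shown.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def haar_step(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT (odd lengths padded by repeating the last value)."""
    x = list(signal)
    if len(x) % 2:
        x.append(x[-1])
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def dwt_features(signal: Sequence[float], levels: int = 3) -> List[float]:
    """Fixed-length vector: (mean, std, min, max) of each detail level and of the final approximation."""
    if len(signal) < 2 ** levels:
        raise ValueError("signal too short for the requested number of levels")
    feats: List[float] = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        feats += [mean(detail), pstdev(detail), min(detail), max(detail)]
    feats += [mean(approx), pstdev(approx), min(approx), max(approx)]
    return feats


# Hypothetical numeric signal: each nucleotide mapped to its EIIP value
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "U": 0.1335}
signal = [EIIP[nt] for nt in "ACGUUAGCAUGC"]
print(len(dwt_features(signal)))  # 4 statistics x (3 detail levels + 1 approximation) = 16
```

Summarizing each level with statistics turns variable-length sequences into fixed-length vectors, which is what an SVM requires.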
Although it is tempting to treat RNA subcellular localization prediction as a single-label classification problem, i.e., predicting that an RNA resides in one particular subcellular compartment, the problem is inherently multi-label because RNAs often co-localize in, or move between, two or more subcellular compartments. The most common localizations are exosome, membrane, cytosol, ribosome, endoplasmic reticulum (ER), and nucleus [131][132][136][142], though other localizations such as mitochondrion [151], pseudopodium [155], and microvesicle [130] have also been reported. As multi-label classification is more challenging than single-label classification, the traditional machine learning algorithms mentioned above are usually not the best choice when used alone. For instance, MulStack [136] is an ensemble learning model that adopts both random forest and deep learning algorithms. A similar strategy was adopted by Liu et al. in a 2014 study [113], a hybrid approach that uses both XGBoost and convolutional neural networks. In many cases [136][142][147][151], deep neural networks with attention mechanisms were preferred. Also, inspired by the success of data transformation in natural language processing, L2S-MirLoc [165] converted the multi-label miRNA subcellular localization problem into a multi-class problem, which was then addressed with different machine learning approaches.
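The text does not detail L2S-MirLoc's transformation, so the sketch below uses label powerset, one standard way to turn a multi-label problem into a multi-class one: every distinct set of compartments observed in the training data becomes its own class. The compartment names are hypothetical examples.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple


def label_powerset(label_sets: Sequence[Sequence[str]]) -> Tuple[List[int], Dict[int, FrozenSet[str]]]:
    """Map each distinct combination of labels to a single multi-class id."""
    combo_to_id: Dict[FrozenSet[str], int] = {}
    class_ids: List[int] = []
    for labels in label_sets:
        combo = frozenset(labels)  # order of labels within one RNA does not matter
        if combo not in combo_to_id:
            combo_to_id[combo] = len(combo_to_id)
        class_ids.append(combo_to_id[combo])
    id_to_combo = {i: combo for combo, i in combo_to_id.items()}
    return class_ids, id_to_combo


ids, decode = label_powerset([["nucleus"], ["cytosol", "nucleus"], ["nucleus", "cytosol"], ["exosome"]])
print(ids)                # [0, 1, 1, 2]
print(sorted(decode[1]))  # ['cytosol', 'nucleus']
```

Any multi-class classifier can then be trained on the ids, and its predictions are decoded back into label sets. The trade-off is that the number of classes can grow to 2^L for L compartments, and rare combinations receive few training examples.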
Using traditional models also has drawbacks, such as inconsistent performance across compartments and poor effectiveness in complex scenarios, especially multi-location prediction. In miRNA subcellular localization prediction, multi-label location prediction is a common scenario [166][167]. To address these issues, many ensemble methods are employed. Initially, LightGBM, an ensemble model, was often used on its own for prediction tasks [111][146]. XGBoost [152] was used in a similar way; both are gradient boosting frameworks for supervised learning. Subsequently, some models integrated these methods with others, such as CatBoost, SVM, and RF, for a single prediction task [125][128], or combined them with sequence-based and physicochemical-based models [126]. Additionally, some models used the one-versus-all (OVA) approach [127] to predict multiple locations. To leverage the advantages of both traditional and deep learning approaches, some models [124][127][132][136] combined methodologies such as CNN, MLP, and RF.
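One-versus-all decomposes a multi-location task into one binary classifier per compartment, so an RNA can receive several labels at once. In this sketch the binary learner is a plain gradient-descent logistic regression written from scratch, a hypothetical stand-in for the tree-based and ensemble learners used by the cited methods; the 0.5 threshold, the learning rate, and the toy data are also assumptions.

```python
import math
from typing import Dict, List, Sequence, Set


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 1000) -> List[float]:
    """Batch gradient-descent logistic regression; the last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def ova_fit(X, label_sets: Sequence[Set[str]]) -> Dict[str, List[float]]:
    """Train one binary 'this compartment vs. all others' model per compartment."""
    compartments = sorted(set().union(*label_sets))
    return {c: train_logistic(X, [int(c in s) for s in label_sets]) for c in compartments}


def ova_predict(models: Dict[str, List[float]], x: Sequence[float], threshold: float = 0.5) -> Set[str]:
    """Return every compartment whose binary model scores at or above the threshold."""
    return {c for c, w in models.items()
            if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1]) >= threshold}


# Toy features: feature 0 marks nuclear RNAs, feature 1 marks cytosolic RNAs
X = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [1, 1]]
Y = [{"nucleus"}, {"nucleus"}, {"cytosol"}, {"cytosol"}, {"nucleus", "cytosol"}, {"nucleus", "cytosol"}]
models = ova_fit(X, Y)
print(sorted(ova_predict(models, [1, 1])))  # ['cytosol', 'nucleus']
```

Because each compartment has its own model and threshold, the decomposition is simple, but it ignores correlations between compartments, which label powerset preserves at the cost of many more classes.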
Traditional machine learning methods generally perform well on well-designed features. However, when data are unevenly distributed across subcellular localizations, prediction performance can vary significantly from one compartment to another. To address this challenge and improve model stability, techniques such as the Synthetic Minority Over-sampling Technique (SMOTE) [109] are often employed. Other commonly used methods include Random Over-sampling (ROS) [168], Supervised Over-Sampling (SOS) [159], binomial distribution-based filtering [128], and autoencoder-based recursive feature elimination (RFE) [128]. GP-HTNLoc [169] trained frequent (head) and rare (tail) location labels separately to alleviate data imbalance and scarcity. In addition, a new dataset has been constructed on the basis of previous work [170].
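SMOTE creates a synthetic sample by picking a minority-class sample, choosing one of its k nearest minority-class neighbours, and placing a new point at a random position on the line segment between them. The following is a minimal single-label sketch; the function name `smote` and the data are hypothetical, and in practice a maintained implementation such as `imblearn.over_sampling.SMOTE` from imbalanced-learn would normally be preferred. Multi-label localization data require an adapted variant.

```python
import math
import random
from typing import List, Sequence

def smote(minority: List[Sequence[float]], n_new: int, k: int = 5, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic minority-class samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples (Euclidean distance).
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment between x and nb
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic

# Hypothetical feature vectors of an under-represented compartment.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_samples = smote(minority, n_new=10, k=2, seed=1)
print(len(new_samples))  # 10
```

Because new points are interpolations, they stay inside the region spanned by existing minority samples; this limits how much genuinely new information oversampling can add.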
In recent years, deep learning methods, particularly neural networks, have gained popularity in lncRNA/mRNA subcellular localization prediction because they often outperform conventional machine learning methods. Convolutional Neural Networks (CNNs), which extract features embedded in sequences, are widely used in this area. Zeng et al. [153] introduced an approach based on TextCNN, a deep learning architecture commonly used for text classification. They proposed a subsequence embedding method in which lncRNA sequences are divided into non-overlapping consecutive subsequences; patterns extracted from each subsequence are then combined into a comprehensive representation that preserves sequence order information. To encode the input for TextCNN, they employed word2vec, a word embedding technique widely used in natural language processing (NLP). Long Short-Term Memory (LSTM) [171] networks are another widely used deep learning technique. Bidirectional LSTM layers and attention mechanisms are often combined with CNNs to achieve higher prediction accuracy [112][139][172], and such combinations have been used to distinguish extracellular miRNAs from intracellular miRNAs [173]. Wang et al. [174] used self-attention and a fully connected layer to predict miRNA localization from concatenated features, including miRNA sequence features derived from a sequence similarity network with node2vec. In another study, Wang et al. [131] proposed a multiscale CNN with multi-head self-attention and analyzed sequence binding motifs extracted from the CNN filters. Wang et al. [130] also used a CNN with multi-head attention to extract RNA-binding protein (RBP) binding signals and used Integrated Gradients (IG) scores to analyze motifs. Liu et al. [129] combined CNNs with Gated Recurrent Units (GRUs) to address the vanishing gradient and long-term memory problems.
During training, they used a label-distribution-aware margin (LDAM) loss function to tackle class imbalance. Finally, they applied the SHAP framework [175] to interpret the model, whose inputs combined a physicochemical property matrix computed via Normalized Moreau-Broto auto-cross correlation (NMBACC) with sequence order information encoded via word2vec. Word embedding techniques such as word2vec are incorporated into many deep learning pipelines; for instance, Zhang et al. [138] proposed applying a pretrained BERT model after the word embedding step. Furthermore, Zeng et al. [147] used transformer blocks with localization-specific attention. Deep Neural Networks (DNNs) are a common choice for handling high-level features [158][160]. Babaiha et al. [143] proposed an Artificial Neural Network (ANN) that assigns probabilities of an mRNA belonging to specific compartments based on distance-based representations. This approach addresses the problem that, when feature vectors become extremely large and sparse, models may become memory-inefficient and perform poorly.
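The LDAM loss, as originally proposed by Cao et al. (2019), enlarges the decision margin for rare classes: each class j receives a margin proportional to n_j^(-1/4), where n_j is its number of training samples, and that margin is subtracted from the true-class logit before a scaled softmax cross-entropy. The single-sample sketch below follows that formulation; the `max_margin` and `scale` values are illustrative, the class counts are hypothetical, and we do not know the exact settings used by Liu et al. [129].

```python
import math
from typing import List, Sequence

def ldam_margins(class_counts: Sequence[int], max_margin: float = 0.5) -> List[float]:
    """Per-class margins proportional to n_j ** (-1/4), rescaled so the rarest class gets max_margin."""
    raw = [n ** -0.25 for n in class_counts]
    largest = max(raw)
    return [max_margin * r / largest for r in raw]

def ldam_loss(logits: Sequence[float], label: int, margins: Sequence[float], scale: float = 30.0) -> float:
    """LDAM loss for one sample: softmax cross-entropy after subtracting the
    true class's margin from its logit, with logits multiplied by `scale`."""
    adjusted = [z - margins[j] if j == label else z for j, z in enumerate(logits)]
    scaled = [scale * z for z in adjusted]
    top = max(scaled)  # log-sum-exp with max subtraction for numerical stability
    log_sum_exp = top + math.log(sum(math.exp(v - top) for v in scaled))
    return log_sum_exp - scaled[label]

# Hypothetical counts: a frequent compartment (10,000 RNAs) and a rare one (16 RNAs).
margins = ldam_margins([10000, 16])
print(margins)  # rare class gets 0.5, frequent class about 0.1
```

With all margins set to zero and `scale=1.0`, the function reduces to ordinary softmax cross-entropy, which makes the effect of the margins easy to isolate.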
The sequence processing problem can also be reformulated as a graph problem. Some models transform lncRNA sequences into de Bruijn graphs [144]; for instance, Li et al. [149] applied graph convolutional networks (GCNs) to extract high-level features from these graphs. Similarly, Deng et al. [140][141] used GCNs to predict lncRNA localization, constructing edges based on the cosine similarity between node features. This approach has also been combined with a convolutional block attention module (CBAM), which applies channel and spatial attention, for miRNA subcellular localization prediction [176]. Furthermore, Deng et al. [177] combined a GCN with an autoencoder to aggregate information from neighboring nodes together with implicit structural information, using miRNA sequence semantic information extracted from sequence similarity networks, miRNA–disease association information, and disease semantic information. To address data imbalance and overfitting, a weighted graph attention network (R-GAT), which combines graph attention mechanisms with a weighted loss, was proposed. By comparison, Cai et al. [145] integrated a GCN with model-agnostic meta-learning (MAML) and a protein sequence similarity network (SSN) to achieve high prediction accuracy. Malik et al. [151], meanwhile, employed GeneticSeq2Vec, a k-hop neighborhood relation-based statistical representation scheme for RNA sequences, to generate graphs that were then fed into an LSTM layer. Li et al. [142] found that an MLP with an attention layer, trained with the Adam optimizer, achieved the highest accuracy when applied to k-mer and CKSNAP (composition of k-spaced nucleic acid pairs) features.
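To make the graph formulation concrete, the sketch below builds a graph whose edges connect RNAs with cosine similarity above a threshold, then applies one layer of the widely used GCN propagation rule of Kipf and Welling, ReLU(D^-1/2 (A + I) D^-1/2 H W). The threshold rule, node features, and weight matrix are hypothetical; Deng et al. [140][141] may construct edges and stack layers differently.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_graph(features: Matrix, threshold: float) -> Matrix:
    """Adjacency matrix with an edge wherever cosine similarity >= threshold (no self-loops)."""
    n = len(features)
    return [[1.0 if i != j and cosine(features[i], features[j]) >= threshold else 0.0
             for j in range(n)] for i in range(n)]

def gcn_layer(adj: Matrix, H: Matrix, W: Matrix) -> Matrix:
    """One propagation step H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # always >= 1 thanks to the self-loop
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate each node's neighbourhood (including itself) ...
    agg = [[sum(norm[i][k] * H[k][f] for k in range(n)) for f in range(len(H[0]))]
           for i in range(n)]
    # ... then apply the shared linear transform W and a ReLU.
    return [[max(0.0, sum(agg[i][f] * W[f][o] for f in range(len(W))))
             for o in range(len(W[0]))] for i in range(n)]

# Hypothetical node features for three lncRNAs.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
adj = similarity_graph(features, threshold=0.8)
out = gcn_layer(adj, features, W=[[1.0, 0.0], [0.0, 1.0]])
```

The first two RNAs are linked and their representations are averaged, while the isolated third node keeps its own features; in a trained model, W would be learned and several such layers stacked.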
Deep learning methods typically perform well with high-dimensional feature input. When combined with traditional machine learning approaches, these models can exhibit greater robustness and flexibility in RNA subcellular localization prediction, achieving higher accuracy and better generalization. However, deep learning approaches may require larger training datasets than traditional machine learning approaches.
III. Image-based and Hybrid Methods
In this section, we introduce state-of-the-art RNA subcellular localization methods that use image-based data or both sequence-based and image-based data. Fig. 3 summarizes the features and algorithms at different levels of complexity, including machine learning methods, deep learning methods, and complex models that involve multimodal data and multimodal or ensemble learning. Specifically, from simple to complex: machine learning methods leverage hand-crafted features extracted from images, such as SIFT, LBP, and HOG, to train models like SVM, RF, and neural networks (e.g., CNN, DNN, GNN, and LSTM); deep learning approaches further use hand-crafted and intensity-based features, including intensity relationships and patterns, to improve predictive performance; and complex models build on these foundations by integrating multimodal data, such as sequence and image inputs, to create more comprehensive and interpretable models for RNA subcellular localization.
Fig. 3. Three primary categories of computational methodologies for processing imaging data.
The dark yellow arrow illustrates the increasing complexity of prediction models, representing the progression toward more sophisticated computational frameworks. Light green rectangles: features used for model training; light purple rectangles: algorithms for location prediction. (A) Machine Learning Methods. Hand-crafted features are extracted from images and used to train simple models. (B) Deep Learning Methods. This approach employs hand-crafted features, intensity relationships, intensity features, etc. (C) Complex Models. This approach combines multimodal data, such as sequence and image data, as inputs to create a more comprehensive and interpretable model for RNA subcellular localization. SVM (Support Vector Machine), RF (Random Forest), CNN (Convolutional Neural Network), DNN (Deep Neural Network), GNN (Graph Neural Network), LSTM (Long Short-Term Memory), SIFT (Scale-Invariant Feature Transform), LBP (Local Binary Pattern), and HOG (Histogram of Oriented Gradients).
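As an example of the hand-crafted image features in panel (A), the basic 3x3 Local Binary Pattern (LBP) operator compares each pixel with its eight neighbours and encodes the result as an 8-bit code; a normalized histogram of these codes then serves as a texture feature vector for a classifier such as SVM or RF. The sketch below is a minimal, dependency-free version with hypothetical inputs; libraries such as scikit-image (`skimage.feature.local_binary_pattern`) provide more complete variants, and the feature pipelines of the surveyed methods may differ.

```python
from typing import List

Image = List[List[float]]

# The 8 neighbours of a pixel, visited clockwise from the top-left corner.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_codes(image: Image) -> List[int]:
    """Basic 3x3 LBP code (0-255) for every interior pixel of a grayscale image."""
    codes = []
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                if image[r + dr][c + dc] >= center:  # neighbour at least as bright -> bit set
                    code |= 1 << bit
            codes.append(code)
    return codes

def lbp_histogram(image: Image) -> List[float]:
    """Normalized 256-bin histogram of LBP codes, usable as a texture feature vector."""
    codes = lbp_codes(image)
    if not codes:
        raise ValueError("image must be at least 3x3 pixels")
    hist = [0.0] * 256
    for code in codes:
        hist[code] += 1
    return [count / len(codes) for count in hist]

# A hypothetical 3x3 patch with one bright spot (e.g., a single RNA signal) in the centre.
spot = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
print(lbp_codes(spot))  # [0]
```

A bright isolated spot yields code 0 because no neighbour is as bright as the centre, whereas a flat region yields 255; the histogram therefore summarizes how punctate or uniform the signal is.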
Table 3 summarizes the main image-based and hybrid methods for RNA subcellular localization prediction, ordered by publication year (most recent first). In the following sections, we give more details about the features and algorithms. To the best of our knowledge, hybrid algorithms such as multimodal fusion models [178] and layered neural networks [180] have been proposed but not yet implemented; therefore, no information about localizations or single/multiple labels is available, as indicated by “N/A” in the table.
TABLE 3:
A summary of image-based and hybrid methods for RNA subcellular localization.
| Method | Feature | Algorithm | Localizations | Single / Multiple Labels | Reference | Year |
|---|---|---|---|---|---|---|
| Wang et al. | Hybrid | Multimodal fusion models | N/A | N/A | [178] | 2023 |
| Bento | Image (spatial features) | RF | Cell edge; cytoplasmic; nuclear; nuclear edge; random | Multiple | [179] | 2022 |
| Savulescu et al. | Hybrid | Layered neural networks | N/A | N/A | [180] | 2021 |
| Dubois et al. | Image (simulated smFISH pre-processed and down-sampled to 128 × 128 pixels) | CNN (SqueezeNet) | Cell edge; cell extension; foci; intranuclear; nuclear edge; polarized; random | Single | [181] | 2019 |
| Samacoits et al. | Image (curated localization features) | Clustering, RF | Foci; extension; nuclear envelope 2D; nuclear envelope 3D; random | Single | [182] | 2018 |
RF: Random Forest; smFISH: single molecule fluorescence in situ hybridization; CNN: Convolutional Neural Network. SqueezeNet is a compact CNN architecture. In Samacoits et al. [182], nuclear envelope 3D labels are used to discriminate between intranuclear distribution and distribution at the nuclear membrane, which are indistinguishable in the nuclear envelope 2D view. At the time of writing, the hybrid methods [178][180] have not been implemented, so there is no information (“N/A”) about localizations or single/multiple labels.
A. Image-based Features
Image-based features are integral to understanding RNA subcellular localization. Unlike sequence-based data, image data from bioimaging and simulation technologies provide spatial and morphological information, which is essential for determining the precise localization of RNA molecules within different cellular compartments. Image-based features are derived from high-resolution cellular images, enabling researchers to capture and quantify complex biological structures and processes that are invisible to traditional sequencing approaches [183][184]. The application of image-based features in machine learning approaches to RNA localization has been facilitated by advances in microscopy and image processing technologies. In practice, because acquiring ground-truth image data is very expensive and time-consuming, simulated images of RNA localization patterns have been widely used [185][186].
Representative image-based features are key to identifying and understanding the spatial distribution of RNA within cells; their design was inspired by the success of image-based mapping of subcellular protein distribution [187]. Raw image data, such as smFISH images, are usually converted into 2D coordinates of detected transcripts, similar to spatial transcriptomic data formats, which facilitates the extraction of spatial features. For instance, the Bento toolkit [179] extracts a total of 13 spatial features that capture the distribution of points in a sample, including proximity to cellular compartments and extensions, measures of symmetry about a center of mass, and measures of dispersion and point density. More sophisticated spatial-statistical features are also used, such as Ripley’s L-function [188], morphological features combined with the RNA count enrichment ratio [189], and the correlation between the z-positions of RNAs and the background intensity [182]. As a more recent trend, raw imaging data are lightly preprocessed and fed directly into deep neural networks, which automatically learn the most predictive features, capturing subtle patterns and variations in RNA distribution that are useful for subcellular localization. Dubois et al.’s work [181] exemplifies this trend.
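Ripley’s K-function counts, for a radius r, how many other points lie within distance r of a typical point, scaled by the area and point count; the L-function L(r) = sqrt(K(r)/π) is a variance-stabilized version that is approximately equal to r under complete spatial randomness. Values of L(r) above r suggest clustering (for example, RNA foci), and values below r suggest dispersion. The sketch below is a naive estimator without edge correction, applied to hypothetical transcript coordinates; the radii, region, and any edge correction used in [188] may differ.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

def ripley_k(points: Sequence[Point], r: float, area: float) -> float:
    """Naive Ripley's K estimate at radius r (no edge correction)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    close_pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= r
    )
    return area * close_pairs / (n * (n - 1))

def ripley_l(points: Sequence[Point], r: float, area: float) -> float:
    """Variance-stabilized L(r) = sqrt(K(r) / pi); L(r) ~ r under complete spatial randomness."""
    return math.sqrt(ripley_k(points, r, area) / math.pi)

# Hypothetical transcript coordinates (two tight pairs) in a 10 x 10 cell region.
points = [(0.0, 0.0), (1.0, 0.0), (8.0, 8.0), (9.0, 8.0)]
r = 1.5
print(ripley_l(points, r, area=100.0) - r)  # positive -> clustering at this scale
```

In practice L(r) - r is evaluated over a range of radii, and the resulting curve (or summary statistics of it) becomes part of the feature vector.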
B. Image-based Algorithms
To develop machine learning algorithms that predict RNA subcellular localization from image-based features, the main task is to process and interpret the large amounts of data these features represent. Image data are rich in spatial transcriptomic information, which traditional methods such as RNA sequencing (RNA-seq) or reverse transcription quantitative PCR (RT-qPCR) do not fully capture [178]. Because spatial information is orthogonal to nucleotide sequence-based data, image-based algorithms are complementary to sequence-based methods in the field of RNA subcellular localization. They not only streamline the analysis of complex image data but also open new avenues for discovering the roles and regulation of RNA and the underlying cellular mechanisms.
Typical image-based algorithms include supervised, self-supervised, and unsupervised techniques. As a robust supervised ensemble learning approach, RF is often chosen to predict RNA localization from carefully engineered image-based features. In [179], Mah et al. applied RF to mRNA localization as a multi-label classification problem. Each multi-label classifier consisted of five binary classifiers with the same base model, targeting cell edge, cytoplasmic, nuclear, nuclear edge, and random localizations. In their experiments, RF outperformed the popular supervised SVM algorithm as well as deep learning algorithms such as CNNs. Nevertheless, CNNs, which have been highly successful in computer image recognition [190] and can be trained in supervised or self-supervised settings, are generally preferred for bio-image data because they automate feature engineering. In [181], for instance, Dubois et al. applied SqueezeNet, a CNN architecture, to classify seven mRNA localization patterns and achieved an overall accuracy of 91%. Unsupervised learning algorithms such as clustering are useful for exploring unlabeled image data and identifying clusters that may correspond to RNA subcellular localizations. For example, Samacoits et al. [182] showed how this can be achieved with various clustering methods, including k-means, spectral, and hierarchical clustering.
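Of the clustering methods mentioned above, k-means is the simplest to state: it alternates between assigning each cell to its nearest centroid and moving each centroid to the mean of its assigned cells. The sketch below is a plain implementation of Lloyd’s algorithm on hypothetical per-cell feature vectors (for example, two localization features per cell); it is not the pipeline of Samacoits et al. [182], and in practice features should be scaled and k chosen with care.

```python
import math
import random
from typing import List, Sequence, Tuple

def kmeans(points: Sequence[Sequence[float]], k: int, n_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    assignment = None
    for _ in range(n_iter):
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_assignment == assignment:
            break  # converged: assignments no longer change
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical per-cell features forming two groups of localization patterns.
cells = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(cells, k=2, seed=0)
```

The resulting clusters are only candidate localization patterns; they still need to be inspected or matched against known patterns before being interpreted biologically.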
C. Hybrid Methods
As discussed above, conventional machine learning methods are unimodal, meaning that they leverage data of only one modality: either RNA sequence data or image data. While sequence data are a valuable source of sequence patterns, physicochemical features, etc., and image data are a valuable source of transcriptomic and spatial distribution patterns, neither alone captures all the patterns that encode RNA subcellular localization. State-of-the-art algorithms typically achieve AUC values between 0.4 and 0.8 [131], leaving room for improvement.
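For context on these AUC values: the area under the ROC curve equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one (ties counting one half), so 0.5 corresponds to random ranking and 1.0 to perfect ranking. The sketch below computes AUC directly from that definition for one compartment, using hypothetical scores; for multi-label localization it would typically be computed per compartment and then averaged.

```python
from typing import Sequence

def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities of "nucleus" for four RNAs.
print(roc_auc([0.9, 0.2, 0.5, 0.1], [1, 1, 0, 0]))  # 0.75
```

This pairwise form is quadratic in the number of examples; rank-based formulas give the same value more efficiently on large datasets.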
In recent years, multimodal or hybrid methods have been gaining attention. In contrast to sequence-only or image-only methods, hybrid methods leverage the complementary strengths of different data sources, facilitating a more holistic understanding of the biological systems involved. For instance, combining sequence-derived features with microscopy images can reveal how sequence motifs are linked to spatial distribution at the subcellular level and can improve the accuracy of RNA localization prediction.
This idea is inspired by the popularity of multimodal deep learning [191], which usually integrates two or more modalities among text, audio, image, and video data, and by the success of commercial large language models such as GPT-4o [192]. To exploit the complementary information from multiple modalities, fusion-based deep learning approaches [178] are promising. They have been successfully employed in biomedical fields such as disease prognosis and diagnosis [193]; one example is stratifying breast cancer patients by overall survival [194] when heterogeneous clinical data are available. At the time of writing, the authors were not aware of any published multimodal learning algorithm for predicting RNA subcellular localization, although detailed algorithmic frameworks have been proposed [178][180]. We anticipate breakthroughs in applying hybrid models to RNA localization in the near future.
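The two simplest fusion strategies can be sketched in a few lines: early (feature-level) fusion concatenates sequence-derived and image-derived feature vectors into one model input, while late (decision-level) fusion combines the per-compartment probabilities of separately trained sequence and image models. This is a hypothetical illustration with made-up function names, weights, and probabilities; the frameworks proposed in [178][180] are more elaborate, for example learning the fusion rather than fixing a weight.

```python
from typing import Dict, List, Sequence

def early_fusion(seq_features: Sequence[float], img_features: Sequence[float]) -> List[float]:
    """Early (feature-level) fusion: concatenate both modalities into one input vector."""
    return list(seq_features) + list(img_features)

def late_fusion(seq_probs: Dict[str, float], img_probs: Dict[str, float],
                seq_weight: float = 0.5) -> Dict[str, float]:
    """Late (decision-level) fusion: weighted average of per-compartment probabilities."""
    if not 0.0 <= seq_weight <= 1.0:
        raise ValueError("seq_weight must lie in [0, 1]")
    if seq_probs.keys() != img_probs.keys():
        raise ValueError("both models must score the same compartments")
    return {loc: seq_weight * seq_probs[loc] + (1.0 - seq_weight) * img_probs[loc]
            for loc in seq_probs}

# Hypothetical outputs of a sequence-based and an image-based model for one RNA.
fused = late_fusion({"nucleus": 0.8, "cytosol": 0.2}, {"nucleus": 0.4, "cytosol": 0.6})
print(fused)  # nucleus about 0.6, cytosol about 0.4
```

Late fusion is attractive when paired sequence and image data are scarce, because each unimodal model can be trained on its own dataset; early fusion requires both modalities for every training sample.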
IV. Challenges and Future Directions
Despite remarkable progress, the application of machine learning approaches to RNA subcellular localization still faces many challenges.
One such challenge is the scarcity and quality of data. RNA localization data are often limited, and obtaining high-quality, experimentally validated, and comprehensive datasets can be resource-intensive. For example, as Wang et al. [178] point out, genomic profile data may lack paired histopathology image data for multimodal deep learning approaches to localization prediction. This scarcity can lead to models trained on insufficient or biased data, limiting their generalizability and predictive power.
Another significant challenge is the over-reliance on sequence data coupled with the underutilization of image data. Most current ML models depend heavily on sequence data, such as k-mer composition and RNA-protein binding motifs, which, while informative, may not capture the full complexity of RNA localization. Gudenas and Wang [158] reported that k-mer composition accounted for 90% of their model’s decision-making process, indicating a potential over-reliance on sequence-based features. This narrow focus can overlook other critical determinants of RNA localization, such as secondary and tertiary RNA structures, post-transcriptional modifications, and dynamic interactions within the cellular environment.
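To see why k-mer composition is both convenient and limited, consider how it is computed: a sequence is mapped to the normalized counts of all 4^k overlapping k-mers. The sketch below is a generic version with a hypothetical function name; the values of k and the normalization used in [158] may differ. Because only local k-mer counts are kept, information about longer-range order and RNA structure is discarded, which is consistent with the concern raised above.

```python
from itertools import product
from typing import List

def kmer_composition(seq: str, k: int = 3) -> List[float]:
    """Normalized frequencies of all 4**k RNA k-mers (overlapping windows), in a fixed order."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    vocab = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(vocab, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    return [counts[km] / total if total else 0.0 for km in vocab]

# Hypothetical toy sequence; with k = 2 the vector has 16 entries (AA, AC, AG, AU, CA, ...).
v = kmer_composition("ACGU", k=2)
print(len(v))  # 16
```

The dimensionality grows as 4^k, so larger k quickly produces the large, sparse feature vectors discussed in Section II.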
Although sequence-based approaches far outnumber image-based approaches, image data can provide rich spatial and contextual information about RNA molecules within cells, as discussed in Section III. Despite advances in bioimaging techniques, accurate and automated annotation remains a hurdle, and annotation can also be very time-consuming. When high-quality real image data are lacking, simulated data, such as simulated smFISH images, are often used [187]. Imperfect segmentation and compartment identification can skew RNA quantification, reducing the reliability of ML predictions.
A further challenge is the lack of large-scale benchmark datasets for all RNA types to ensure replicability of experimental results. Popular open-source datasets such as the RNALocate-based mRNA dataset [114] are often limited in size (<10k samples), restricted to one type of RNA (e.g., mRNA), and biased towards sequence data, in particular nucleotide sequences. The absence of comprehensive, standardized datasets makes it difficult to validate and compare the performance of different ML models. Benchmark datasets are crucial for assessing the robustness and generalizability of ML approaches across various RNA types and experimental conditions.
These challenges also point to future directions for the field. Addressing data scarcity through high-throughput experimental techniques and collaborative data-sharing initiatives can significantly improve model training. Additionally, diversifying the features used in ML models, by incorporating structural, biochemical, and interaction data alongside sequence data, can lead to more comprehensive and accurate predictions. As discussed in Section III, hybrid methods such as multimodal learning algorithms can exploit the complementary information encoded in both sequences and images, offering a holistic view of RNA localization.
While the field of RNA subcellular localization faces significant challenges related to data scarcity, reliance on sequence data, the underutilization of image data, and the lack of large-scale benchmark datasets, these obstacles also present exciting opportunities. Efforts to build standardized datasets or comprehensive databases of RNA localization, such as lncLocator and lncATLAS for lncRNA localization [195], provide valuable resources for the research community. Integrating diverse data types, such as sequence data and image data, can enhance predictive models. Hybrid models such as multimodal learning approaches [178] that combine these data sources are promising for improving prediction accuracy and providing deeper insights into RNA localization mechanisms. By embracing a multifaceted approach that combines innovative experimental techniques, comprehensive data integration, and hybrid ML methodologies, researchers can overcome these challenges and unlock the full potential of ML in understanding the complexities of RNA subcellular localization.
V. Conclusion
This comprehensive review of RNA subcellular localization has explored three main categories of machine learning approaches: sequence-based, image-based, and hybrid methods. A significant contribution of this work is its broad coverage of various RNA types, including mRNA, lncRNA, miRNA, and other small RNAs. Sequence-based methods use RNA sequences to predict subcellular localization, benefiting from the ease of sequence acquisition. Image-based methods leverage bioimaging techniques to analyze RNA localization within cellular compartments, providing a more direct observation of RNA distribution. Integrating sequence and image data in hybrid methods represents a promising direction: by combining the strengths of both data types and mitigating the weaknesses of either alone, hybrid methods can enhance predictive accuracy by capturing both the intricate details of RNA sequences and their broader spatial context within the cell.
However, the field of RNA subcellular localization prediction still faces several key challenges. The development of more robust and generalizable models requires larger and more diverse datasets, improved feature extraction techniques, and more sophisticated algorithms capable of handling the complexity of biological data. Additionally, standardized benchmarks and evaluation metrics are needed to enable fair comparison of different methods and to ensure reproducibility.
The convergence of machine learning and RNA biology holds great promise for advancing our understanding of RNA subcellular localization. By continuing to refine and integrate computational methods with experimental approaches, we can unlock new insights into the spatial dynamics of RNA, paving the way for novel therapeutic strategies and a deeper understanding of cellular biology.
Acknowledgements
Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health under Award Number P30CA036727, and by the Office of the Director, National Institutes of Health, under Award Number R03OD038391. This work was supported by the American Cancer Society under award number IRG-22-146-07-IRG, and by the Buffett Cancer Center, which is supported by the National Cancer Institute under award number CA036727, in collaboration with the UNMC/Children’s Hospital & Medical Center Child Health Research Institute Pediatric Cancer Research Group. This study was supported, in part, by the National Institute on Alcohol Abuse and Alcoholism (P50AA030407-5126, Pilot Core grant). This study was also supported by the Nebraska EPSCoR FIRST Award (OIA-2044049). This work was also partially supported by the National Institute of General Medical Sciences under Award Numbers P20GM103427 and P20GM130447. This study was in part financially supported by the Child Health Research Institute at UNMC/Children’s Nebraska. This work was also partially supported by the University of Nebraska Collaboration Initiative Grant from the Nebraska Research Initiative (NRI). The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding organizations.
References
- [1].Vihinen M., “Systematics for types and effects of RNA variations,” RNA Biol., vol. 18, no. 4, pp. 481–498, Apr. 2021, doi: 10.1080/15476286.2020.1817266. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [2].Guo Y. et al. “RNAseq by Total RNA Library Identifies Additional RNAs Compared to Poly(A) RNA Library,” BioMed Res. Int., vol. 2015, p. 862130, 2015, doi: 10.1155/2015/862130. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [3].Adeli K., “Translational control mechanisms in metabolic regulation: critical role of RNA binding proteins, microRNAs, and cytoplasmic RNA granules,” Am. J. Physiol. Endocrinol. Metab., vol. 301, no. 6, pp. E1051–1064, Dec. 2011, doi: 10.1152/ajpendo.00399.2011. [DOI] [PubMed] [Google Scholar]
- [4].Wilkinson M. F. and Shyu A. B., “Multifunctional regulatory proteins that control gene expression in both the nucleus and the cytoplasm,” BioEssays News Rev. Mol. Cell. Dev. Biol., vol. 23, no. 9, pp. 775–787, Sep. 2001, doi: 10.1002/bies.1113. [DOI] [PubMed] [Google Scholar]
- [5].Shatkin A. J. and Manley J. L., “The ends of the affair: capping and polyadenylation,” Nat. Struct. Biol., vol. 7, no. 10, pp. 838–842, Oct. 2000, doi: 10.1038/79583. [DOI] [PubMed] [Google Scholar]
- [6].Barnett W. E., Brown D. H., and Epler J. L., “Mitochondrial-specific aminoacyl-RNA synthetases.,” Proc. Natl. Acad. Sci., vol. 57, no. 6, pp. 1775–1781, Jun. 1967, doi: 10.1073/pnas.57.6.1775. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [7].Pesaresi P. et al. “Nuclear Photosynthetic Gene Expression Is Synergistically Modulated by Rates of Protein Synthesis in Chloroplasts and Mitochondria,” Plant Cell, vol. 18, no. 4, pp. 970–991, Apr. 2006, doi: 10.1105/tpc.105.039073. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [8].Shatkin A. J. and Both G. W., “Reovirus mRNA: transcription and translation,” Cell, vol. 7, no. 3, pp. 305–313, Mar. 1976, doi: 10.1016/0092-8674(76)90159-8. [DOI] [PubMed] [Google Scholar]
- [9].Slobodin B. and Dikstein R., “So close, no matter how far: multiple paths connecting transcription to mRNA translation in eukaryotes,” EMBO Rep., vol. 21, no. 9, p. e50799, Sep. 2020, doi: 10.15252/embr.202050799. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [10].“Regulation of Protein Synthesis by mRNA Structure - Research Projects - Ermolenko Lab - University of Rochester Medical Center.” Accessed: Nov. 22, 2024. [Online]. Available: https://www.urmc.rochester.edu/labs/ermolenko/projects/regulation-of-protein-synthesis-by-mrna-structure.aspx# [Google Scholar]
- [11].Hargrove J. L. and Schmidt F. H., “The role of mRNA and protein stability in gene expression,” FASEB J. Off. Publ. Fed. Am. Soc. Exp. Biol., vol. 3, no. 12, pp. 2360–2370, Oct. 1989, doi: 10.1096/fasebj.3.12.2676679. [DOI] [PubMed] [Google Scholar]
- [12].Kloc M., Zearfoss N. R., and Etkin L. D., “Mechanisms of subcellular mRNA localization,” Cell, vol. 108, no. 4, pp. 533–544, Feb. 2002, doi: 10.1016/s0092-8674(02)00651-7. [DOI] [PubMed] [Google Scholar]
- [13].Martin K. C. and Ephrussi A., “mRNA localization: gene expression in the spatial dimension,” Cell, vol. 136, no. 4, pp. 719–730, Feb. 2009, doi: 10.1016/j.cell.2009.01.044. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [14].Jeffery W. R., Tomlinson C. R., and Brodeur R. D., “Localization of actin messenger RNA during early ascidian development,” Dev. Biol., vol. 99, no. 2, pp. 408–417, Oct. 1983, doi: 10.1016/0012-1606(83)90290-7. [DOI] [PubMed] [Google Scholar]
- [15].Lawrence J. B. and Singer R. H., “Intracellular localization of messenger RNAs for cytoskeletal proteins,” Cell, vol. 45, no. 3, pp. 407–415, May 1986, doi: 10.1016/0092-8674(86)90326-0. [DOI] [PubMed] [Google Scholar]
- [16].Holt C. E. and Bullock S. L., “Subcellular mRNA localization in animal cells and why it matters,” Science, vol. 326, no. 5957, pp. 1212–1216, Nov. 2009, doi: 10.1126/science.1176488. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [17].Medioni C., Mowry K., and Besse F., “Principles and roles of mRNA localization in animal development,” Dev. Camb. Engl., vol. 139, no. 18, pp. 3263–3276, Sep. 2012, doi: 10.1242/dev.078626. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [18].Das S., Vera M., Gandin V., Singer R. H., and Tutucci E., “Intracellular mRNA transport and localized translation,” Nat. Rev. Mol. Cell Biol., vol. 22, no. 7, pp. 483–504, Jul. 2021, doi: 10.1038/s41580-021-00356-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [19].Bergalet J. and Lécuyer E., “The functions and regulatory principles of mRNA intracellular trafficking,” Adv. Exp. Med. Biol., vol. 825, pp. 57–96, 2014, doi: 10.1007/978-1-4939-1221-6_2. [DOI] [PubMed] [Google Scholar]
- [20].Meer E. J., Wang D. O., Kim S., Barr I., Guo F., and Martin K. C., “Identification of a cis-acting element that localizes mRNA to synapses,” Proc. Natl. Acad. Sci., vol. 109, no. 12, pp. 4639–4644, Mar. 2012, doi: 10.1073/pnas.1116269109. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [21].Tian L. and Okita T. W., “mRNA-based protein targeting to the endoplasmic reticulum and chloroplasts in plant cells,” Curr. Opin. Plant Biol., vol. 22, pp. 77–85, Dec. 2014, doi: 10.1016/j.pbi.2014.09.007. [DOI] [PubMed] [Google Scholar]
- [22].Besse F. and Ephrussi A., “Translational control of localized mRNAs: restricting protein synthesis in space and time,” Nat. Rev. Mol. Cell Biol., vol. 9, no. 12, pp. 971–980, Dec. 2008, doi: 10.1038/nrm2548. [DOI] [PubMed] [Google Scholar]
- [23].Meignin C. and Davis I., “Transmitting the message: intracellular mRNA localization,” Curr. Opin. Cell Biol., vol. 22, no. 1, pp. 112–119, Feb. 2010, doi: 10.1016/j.ceb.2009.11.011. [DOI] [PubMed] [Google Scholar]
- [24].Chabanon H., Mickleburgh I., and Hesketh J., “Zipcodes and postage stamps: mRNA localisation signals and their trans-acting binding proteins,” Brief. Funct. Genomic. Proteomic., vol. 3, no. 3, pp. 240–256, Nov. 2004, doi: 10.1093/bfgp/3.3.240. [DOI] [PubMed] [Google Scholar]
- [25].Hafner M. et al. “CLIP and complementary methods,” Nat. Rev. Methods Primer, vol. 1, no. 1, pp. 1–23, Mar. 2021, doi: 10.1038/s43586-021-00018-1. [DOI] [Google Scholar]
- [26].Carlevaro-Fita J. and Johnson R., “Global Positioning System: Understanding Long Noncoding RNAs through Subcellular Localization,” Mol. Cell, vol. 73, no. 5, pp. 869–883, Mar. 2019, doi: 10.1016/j.molcel.2019.02.008. [DOI] [PubMed] [Google Scholar]
- [27].Dykes I. M. and Emanueli C., “Transcriptional and Post-transcriptional Gene Regulation by Long Non-coding RNA,” Genomics Proteomics Bioinformatics, vol. 15, no. 3, pp. 177–186, Jun. 2017, doi: 10.1016/j.gpb.2016.12.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [28].Zhang Z.-Y., Yang Y.-H., Ding H., Wang D., Chen W., and Lin H., “Design powerful predictor for mRNA subcellular location prediction in Homo sapiens,” Brief. Bioinform., vol. 22, no. 1, pp. 526–535, Jan. 2021, doi: 10.1093/bib/bbz177. [DOI] [PubMed] [Google Scholar]
- [29].Reichard J. and Zimmer-Bensch G., “The Epigenome in Neurodevelopmental Disorders,” Front. Neurosci., vol. 15, p. 776809, 2021, doi: 10.3389/fnins.2021.776809. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [30].“Brain Organoids and Long Non-coding RNAs: Neuroprotective Pathways in Alcohol-Induced Developmental Neurotoxicity - ProQuest.” Accessed: Nov. 22, 2024. [Online]. Available: https://www.proquest.com/docview/2621683760 [Google Scholar]
- [31].Jung J., “Molecular characterisation of the transcription factor – chromatin landscape interplay during neuronal cell fate acquisition,” Johannes Gutenberg-Universität Mainz, 2018. doi: 10.25358/OPENSCIENCE-3841. [DOI] [Google Scholar]
- [32].Liu Y. et al. “LncRNA SNHG1 enhances cell proliferation, migration, and invasion in cervical cancer,” Biochem. Cell Biol. Biochim. Biol. Cell., vol. 96, no. 1, pp. 38–43, Feb. 2018, doi: 10.1139/bcb-2017-0188. [DOI] [PubMed] [Google Scholar]
- [33]. Sosnovski K. E. et al. “Reduced LHFPL3-AS2 lncRNA expression is linked to altered epithelial polarity and proliferation, and to ileal ulceration in Crohn disease,” Sci. Rep., vol. 13, no. 1, p. 20513, Nov. 2023, doi: 10.1038/s41598-023-47997-7.
- [34]. Smalheiser N. R., “The RNA-centred view of the synapse: non-coding RNAs and synaptic plasticity,” Philos. Trans. R. Soc. B Biol. Sci., vol. 369, no. 1652, p. 20130504, Sep. 2014, doi: 10.1098/rstb.2013.0504.
- [35]. Liu J., Yang L.-Z., and Chen L.-L., “Understanding lncRNA–protein assemblies with imaging and single-molecule approaches,” Curr. Opin. Genet. Dev., vol. 72, pp. 128–137, Feb. 2022, doi: 10.1016/j.gde.2021.11.005.
- [36]. Bouckenheimer J. et al. “Long non-coding RNAs in human early embryonic development and their potential in ART,” Hum. Reprod. Update, vol. 23, no. 1, pp. 19–40, Dec. 2016, doi: 10.1093/humupd/dmw035.
- [37]. Solé C., Nadal-Ribelles M., de Nadal E., and Posas F., “A novel role for lncRNAs in cell cycle control during stress adaptation,” Curr. Genet., vol. 61, no. 3, pp. 299–308, 2015, doi: 10.1007/s00294-014-0453-y.
- [38]. Mirzadeh Azad F., Polignano I. L., Proserpio V., and Oliviero S., “Long Noncoding RNAs in Human Stemness and Differentiation,” Trends Cell Biol., vol. 31, no. 7, pp. 542–555, Jul. 2021, doi: 10.1016/j.tcb.2021.02.002.
- [39]. Kunej T., Obsteter J., Pogacar Z., Horvat S., and Calin G. A., “The decalog of long non-coding RNA involvement in cancer diagnosis and monitoring,” Crit. Rev. Clin. Lab. Sci., vol. 51, no. 6, pp. 344–357, Dec. 2014, doi: 10.3109/10408363.2014.944299.
- [40]. Wu T. and Du Y., “LncRNAs: From Basic Research to Medical Application,” Int. J. Biol. Sci., vol. 13, no. 3, pp. 295–307, 2017, doi: 10.7150/ijbs.16968.
- [41]. Prabhakar B., Zhong X.-B., and Rasmussen T. P., “Exploiting Long Noncoding RNAs as Pharmacological Targets to Modulate Epigenetic Diseases,” Yale J. Biol. Med., vol. 90, no. 1, pp. 73–86, Mar. 2017.
- [42]. Pandey G. K. et al. “The risk-associated long noncoding RNA NBAT-1 controls neuroblastoma progression by regulating cell proliferation and neuronal differentiation,” Cancer Cell, vol. 26, no. 5, pp. 722–737, Nov. 2014, doi: 10.1016/j.ccell.2014.09.014.
- [43]. Tsagakis I., Douka K., Birds I., and Aspden J. L., “Long non-coding RNAs in development and disease: conservation to mechanisms,” J. Pathol., vol. 250, no. 5, pp. 480–495, Apr. 2020, doi: 10.1002/path.5405.
- [44]. Wu Y.-Y. and Kuo H.-C., “Functional roles and networks of non-coding RNAs in the pathogenesis of neurodegenerative diseases,” J. Biomed. Sci., vol. 27, no. 1, p. 49, Apr. 2020, doi: 10.1186/s12929-020-00636-z.
- [45]. Scotter E. L., Chen H.-J., and Shaw C. E., “TDP-43 Proteinopathy and ALS: Insights into Disease Mechanisms and Therapeutic Targets,” Neurotherapeutics, vol. 12, no. 2, pp. 352–363, Apr. 2015, doi: 10.1007/s13311-015-0338-x.
- [46]. Liao K., Xu J., Yang W., You X., Zhong Q., and Wang X., “The research progress of LncRNA involved in the regulation of inflammatory diseases,” Mol. Immunol., vol. 101, pp. 182–188, Sep. 2018, doi: 10.1016/j.molimm.2018.05.030.
- [47]. Edgington-Mitchell L. E., “Long noncoding RNAs: novel links to inflammatory bowel disease?,” Am. J. Physiol. Gastrointest. Liver Physiol., vol. 311, no. 3, pp. G444–445, Sep. 2016, doi: 10.1152/ajpgi.00271.2016.
- [48]. Zacharopoulou E., Gazouli M., Tzouvala M., Vezakis A., and Karamanolis G., “The contribution of long non-coding RNAs in Inflammatory Bowel Diseases,” Dig. Liver Dis., vol. 49, no. 10, pp. 1067–1072, Oct. 2017, doi: 10.1016/j.dld.2017.08.003.
- [49]. Zou H. et al. “The role of lncRNAs in hepatocellular carcinoma: opportunities as novel targets for pharmacological intervention,” Expert Rev. Gastroenterol. Hepatol., vol. 10, no. 3, pp. 331–340, 2016, doi: 10.1586/17474124.2016.1116382.
- [50]. Grosshans H. and Filipowicz W., “Molecular biology: the expanding world of small RNAs,” Nature, vol. 451, no. 7177, pp. 414–416, Jan. 2008, doi: 10.1038/451414a.
- [51]. Storz G., Vogel J., and Wassarman K. M., “Regulation by small RNAs in bacteria: expanding frontiers,” Mol. Cell, vol. 43, no. 6, pp. 880–891, Sep. 2011, doi: 10.1016/j.molcel.2011.08.022.
- [52]. Babski J. et al. “Small regulatory RNAs in Archaea,” RNA Biol., vol. 11, no. 5, p. 484, Mar. 2014, doi: 10.4161/rna.28452.
- [53]. Li L., Xu J., Yang D., Tan X., and Wang H., “Computational approaches for microRNA studies: a review,” Mamm. Genome, vol. 21, no. 1–2, pp. 1–12, Feb. 2010, doi: 10.1007/s00335-009-9241-2.
- [54]. Asim M. N., Ibrahim M. A., Imran Malik M., Dengel A., and Ahmed S., “Advances in Computational Methodologies for Classification and Sub-Cellular Locality Prediction of Non-Coding RNAs,” Int. J. Mol. Sci., vol. 22, no. 16, p. 8719, Aug. 2021, doi: 10.3390/ijms22168719.
- [55]. Hamidi F. et al. “Identifying potential circulating miRNA biomarkers for the diagnosis and prediction of ovarian cancer using machine-learning approach: application of Boruta,” Front. Digit. Health, vol. 5, Aug. 2023, doi: 10.3389/fdgth.2023.1187578.
- [56]. Ren J., Guo W., Feng K., Huang T., and Cai Y., “Identifying MicroRNA Markers That Predict COVID-19 Severity Using Machine Learning Methods,” Life (Basel), vol. 12, no. 12, p. 1964, Nov. 2022, doi: 10.3390/life12121964.
- [57]. Yu L., Ju B., and Ren S., “HLGNN-MDA: Heuristic Learning Based on Graph Neural Networks for miRNA-Disease Association Prediction,” Int. J. Mol. Sci., vol. 23, no. 21, p. 13155, Oct. 2022, doi: 10.3390/ijms232113155.
- [58]. “ReHoGCNES-MDA: prediction of miRNA-disease associations using homogenous graph convolutional networks based on regular graph with random edge sampler,” Brief. Bioinform., vol. 25, no. 2, p. bbae103. Accessed: Nov. 22, 2024. [Online]. Available: https://academic.oup.com/bib/article/25/2/bbae103/7631472
- [59]. Di Liegro C. M., Schiera G., and Di Liegro I., “Regulation of mRNA transport, localization and translation in the nervous system of mammals (Review),” Int. J. Mol. Med., vol. 33, no. 4, p. 747, Jan. 2014, doi: 10.3892/ijmm.2014.1629.
- [60]. Neelamraju Y., Gonzalez-Perez A., Bhat-Nakshatri P., Nakshatri H., and Janga S. C., “Mutational landscape of RNA-binding proteins in human cancers,” RNA Biol., vol. 15, no. 1, p. 115, Nov. 2017, doi: 10.1080/15476286.2017.1391436.
- [61]. Leucci E. et al. “Melanoma addiction to the long non-coding RNA SAMMSON,” Nature, vol. 531, no. 7595, pp. 518–522, Mar. 2016, doi: 10.1038/nature17161.
- [62]. Zhang B. et al. “A comprehensive expression landscape of RNA-binding proteins (RBPs) across 16 human cancer types,” RNA Biol., vol. 17, no. 2, pp. 211–226, Feb. 2020, doi: 10.1080/15476286.2019.1673657.
- [63]. Uemura M., Zheng Q., Koh C. M., Nelson W. G., Yegnasubramanian S., and De Marzo A. M., “Overexpression of ribosomal RNA in prostate cancer is common but not linked to rDNA promoter hypomethylation,” Oncogene, vol. 31, no. 10, pp. 1254–1263, Mar. 2012, doi: 10.1038/onc.2011.319.
- [64]. Dolezal J. M., Dash A. P., and Prochownik E. V., “Diagnostic and prognostic implications of ribosomal protein transcript expression patterns in human cancers,” BMC Cancer, vol. 18, no. 1, p. 275, Mar. 2018, doi: 10.1186/s12885-018-4178-z.
- [65]. Cooper T. A., Wan L., and Dreyfuss G., “RNA and Disease,” Cell, vol. 136, no. 4, p. 777, Feb. 2009, doi: 10.1016/j.cell.2009.02.011.
- [66]. Sprenkle N. T., Sims S. G., Sánchez C. L., and Meares G. P., “Endoplasmic reticulum stress and inflammation in the central nervous system,” Mol. Neurodegener., vol. 12, no. 1, p. 42, May 2017, doi: 10.1186/s13024-017-0183-y.
- [67]. Nousiainen H. O. et al. “Mutations in mRNA export mediator GLE1 result in a fetal motoneuron disease,” Nat. Genet., vol. 40, no. 2, pp. 155–157, Feb. 2008, doi: 10.1038/ng.2007.65.
- [68]. Okamura M. et al. “Depletion of mRNA export regulator DBP5/DDX19, GLE1 or IPPK that is a key enzyme for the production of IP6, resulting in differentially altered cytoplasmic mRNA expression and specific cell defect,” PLoS One, vol. 13, no. 5, p. e0197165, 2018, doi: 10.1371/journal.pone.0197165.
- [69]. Bassell G. J. and Warren S. T., “Fragile X syndrome: loss of local mRNA regulation alters synaptic development and function,” Neuron, vol. 60, no. 2, pp. 201–214, Oct. 2008, doi: 10.1016/j.neuron.2008.10.004.
- [70]. Dictenberg J. B., Swanger S. A., Antar L. N., Singer R. H., and Bassell G. J., “A direct role for FMRP in activity-dependent dendritic mRNA transport links filopodial-spine morphogenesis to fragile X syndrome,” Dev. Cell, vol. 14, no. 6, pp. 926–939, Jun. 2008, doi: 10.1016/j.devcel.2008.04.003.
- [71]. Didiot M.-C. et al. “Nuclear Localization of Huntingtin mRNA Is Specific to Cells of Neuronal Origin,” Cell Rep., vol. 24, no. 10, pp. 2553–2560.e5, Sep. 2018, doi: 10.1016/j.celrep.2018.07.106.
- [72]. Liu-Yesucevitz L. et al. “Local RNA translation at the synapse and in disease,” J. Neurosci., vol. 31, no. 45, pp. 16086–16093, Nov. 2011, doi: 10.1523/JNEUROSCI.4105-11.2011.
- [73]. Wan S., Mak M.-W., and Kung S.-Y., “GOASVM: a subcellular location predictor by incorporating term-frequency gene ontology into the general form of Chou’s pseudo-amino acid composition,” J. Theor. Biol., vol. 323, pp. 40–48, Apr. 2013, doi: 10.1016/j.jtbi.2013.01.012.
- [74]. Wan S., Mak M.-W., and Kung S.-Y., “mGOASVM: Multi-label protein subcellular localization based on gene ontology and support vector machines,” BMC Bioinformatics, vol. 13, no. 1, p. 290, Nov. 2012, doi: 10.1186/1471-2105-13-290.
- [75]. Wan S., Mak M.-W., and Kung S.-Y., “R3P-Loc: a compact multi-label predictor using ridge regression and random projection for protein subcellular localization,” J. Theor. Biol., vol. 360, pp. 34–45, Nov. 2014, doi: 10.1016/j.jtbi.2014.06.031.
- [76]. Wan S., Mak M.-W., and Kung S.-Y., “mPLR-Loc: An adaptive decision multi-label classifier based on penalized logistic regression for protein subcellular localization prediction,” Anal. Biochem., vol. 473, pp. 14–27, Mar. 2015, doi: 10.1016/j.ab.2014.10.014.
- [77]. Wan S., Mak M.-W., and Kung S.-Y., “HybridGO-Loc: mining hybrid features on gene ontology for predicting subcellular localization of multi-location proteins,” PLoS One, vol. 9, no. 3, p. e89545, 2014, doi: 10.1371/journal.pone.0089545.
- [78]. Wan S., Mak M.-W., and Kung S.-Y., “FUEL-mLoc: feature-unified prediction and explanation of multi-localization of cellular proteins in multiple organisms,” Bioinformatics, vol. 33, no. 5, pp. 749–750, Mar. 2017, doi: 10.1093/bioinformatics/btw717.
- [79]. Wan S., Mak M.-W., and Kung S.-Y., “mLASSO-Hum: A LASSO-based interpretable human-protein subcellular localization predictor,” J. Theor. Biol., vol. 382, pp. 223–234, Oct. 2015, doi: 10.1016/j.jtbi.2015.06.042.
- [80]. Wan S., Mak M.-W., and Kung S.-Y., “Sparse regressions for predicting and interpreting subcellular localization of multi-label proteins,” BMC Bioinformatics, vol. 17, p. 97, Feb. 2016, doi: 10.1186/s12859-016-0940-x.
- [81]. Wan S., Mak M.-W., and Kung S.-Y., “Gram-LocEN: Interpretable prediction of subcellular multi-localization of Gram-positive and Gram-negative bacterial proteins,” Chemom. Intell. Lab. Syst., vol. 162, pp. 1–9, Mar. 2017, doi: 10.1016/j.chemolab.2016.12.014.
- [82]. Wan S., Mak M.-W., and Kung S.-Y., “Ensemble Linear Neighborhood Propagation for Predicting Subchloroplast Localization of Multi-Location Proteins,” J. Proteome Res., vol. 15, no. 12, pp. 4755–4762, Dec. 2016, doi: 10.1021/acs.jproteome.6b00686.
- [83]. Wan S., Mak M.-W., and Kung S.-Y., “Transductive Learning for Multi-Label Protein Subchloroplast Localization Prediction,” IEEE/ACM Trans. Comput. Biol. Bioinform., vol. 14, no. 1, pp. 212–224, 2017, doi: 10.1109/TCBB.2016.2527657.
- [84]. Wan S., Mak M.-W., and Kung S.-Y., “Mem-mEN: Predicting Multi-Functional Types of Membrane Proteins by Interpretable Elastic Nets,” IEEE/ACM Trans. Comput. Biol. Bioinform., vol. 13, no. 4, pp. 706–718, 2016, doi: 10.1109/TCBB.2015.2474407.
- [85]. Wan S., Mak M.-W., and Kung S.-Y., “Mem-ADSVM: A two-layer multi-label predictor for identifying multi-functional types of membrane proteins,” J. Theor. Biol., vol. 398, pp. 32–42, Jun. 2016, doi: 10.1016/j.jtbi.2016.03.013.
- [86]. Giannakoulias S., Ferrie J. J., Apicello A., and Mitchell C., “Prot-SCL: State of the art Prediction of Protein Subcellular Localization from Primary Sequence Using Contrastive Learning,” Sep. 05, 2023, bioRxiv. doi: 10.1101/2023.09.01.555932.
- [87]. Wang H., Ding Y., Tang J., Zou Q., and Guo F., “Identify RNA-associated subcellular localizations based on multi-label learning using Chou’s 5-steps rule,” BMC Genomics, vol. 22, no. 1, p. 56, Jan. 2021, doi: 10.1186/s12864-020-07347-7.
- [88]. Yang L., Lv Y., Li T., Zuo Y., and Jiang W., “Human proteins characterization with subcellular localizations,” J. Theor. Biol., vol. 358, pp. 61–73, Oct. 2014, doi: 10.1016/j.jtbi.2014.05.008.
- [89]. Wang H., Ding Y., Tang J., Zou Q., and Guo F., “Multi-label learning for identification of RNA-associated subcellular localizations,” Aug. 19, 2020, Research Square. doi: 10.21203/rs.3.rs-55447/v1.
- [90]. Villanueva E. et al. “System-wide analysis of RNA and protein subcellular localization dynamics,” Nat. Methods, vol. 21, no. 1, pp. 60–71, Jan. 2024, doi: 10.1038/s41592-023-02101-9.
- [91]. Pan X., Lu L., and Cai Y.-D., “Predicting protein subcellular location with network embedding and enrichment features,” Biochim. Biophys. Acta Proteins Proteomics, vol. 1868, no. 10, p. 140477, Oct. 2020, doi: 10.1016/j.bbapap.2020.140477.
- [92]. Liao Z., Pan G., Sun C., and Tang J., “Predicting subcellular location of protein with evolution information and sequence-based deep learning,” BMC Bioinformatics, vol. 22, no. 10, p. 515, Oct. 2021, doi: 10.1186/s12859-021-04404-0.
- [93]. Xiao H., Zou Y., Wang J., and Wan S., “A Review for Artificial Intelligence Based Protein Subcellular Localization,” Biomolecules, vol. 14, no. 4, p. 409, Mar. 2024, doi: 10.3390/biom14040409.
- [94]. Briesemeister S., Rahnenführer J., and Kohlbacher O., “Going from where to why—interpretable prediction of protein subcellular localization,” Bioinformatics, vol. 26, no. 9, p. 1232, Mar. 2010, doi: 10.1093/bioinformatics/btq115.
- [95]. Cui T. et al. “RNALocate v2.0: an updated resource for RNA subcellular localization with increased coverage and annotation,” Nucleic Acids Res., vol. 50, no. D1, pp. D333–D339, Jan. 2022, doi: 10.1093/nar/gkab825.
- [96]. Wang Y. et al. “IDDLncLoc: Subcellular Localization of LncRNAs Based on a Framework for Imbalanced Data Distributions,” Interdiscip. Sci. Comput. Life Sci., vol. 14, no. 2, pp. 409–420, Jun. 2022, doi: 10.1007/s12539-021-00497-6.
- [97]. Yang R., Gao S., Fu Y., and Zhang L., “lncSLPre: An Ensemble Method with Multi-Source Sequence Descriptors to Predict lncRNA Subcellular Localizations from Imbalanced Data,” Jul. 26, 2023, SSRN. doi: 10.2139/ssrn.4515036.
- [98]. Raj A., van den Bogaard P., Rifkin S. A., van Oudenaarden A., and Tyagi S., “Imaging individual mRNA molecules using multiple singly labeled probes,” Nat. Methods, vol. 5, no. 10, pp. 877–879, Oct. 2008, doi: 10.1038/nmeth.1253.
- [99]. Chen K. H., Boettiger A. N., Moffitt J. R., Wang S., and Zhuang X., “Spatially resolved, highly multiplexed RNA profiling in single cells,” Science, vol. 348, no. 6233, p. aaa6090, Apr. 2015, doi: 10.1126/science.aaa6090.
- [100]. Eng C.-H. L. et al. “Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH+,” Nature, vol. 568, no. 7751, pp. 235–239, Apr. 2019, doi: 10.1038/s41586-019-1049-y.
- [101]. Merritt C. R. et al. “Multiplex digital spatial profiling of proteins and RNA in fixed tissue,” Nat. Biotechnol., vol. 38, no. 5, pp. 586–599, May 2020, doi: 10.1038/s41587-020-0472-9.
- [102]. Kaewsapsak P., Shechner D. M., Mallard W., Rinn J. L., and Ting A. Y., “Live-cell mapping of organelle-associated RNAs via proximity biotinylation combined with protein-RNA crosslinking,” eLife, vol. 6, p. e29224, Dec. 2017, doi: 10.7554/eLife.29224.
- [103]. Benoit Bouvrette L. P. et al. “CeFra-seq reveals broad asymmetric mRNA and noncoding RNA distribution profiles in Drosophila and human cells,” RNA, vol. 24, no. 1, pp. 98–113, Jan. 2018, doi: 10.1261/rna.063172.117.
- [104]. Stoute J. and Liu K. F., “CLIP-Seq to identify targets and interactions of RNA binding proteins and RNA modifying enzymes,” Methods Enzymol., vol. 658, p. 419, Aug. 2021, doi: 10.1016/bs.mie.2021.08.001.
- [105]. “MiRGOFS: a GO-based functional similarity measurement for miRNAs, with applications to the prediction of miRNA subcellular localization and miRNA–disease association,” Bioinformatics, vol. 34, no. 20, p. 3547. Accessed: Nov. 22, 2024. [Online]. Available: https://academic.oup.com/bioinformatics/article/34/20/3547/4989872
- [106]. Xu M., Chen Y., Xu Z., Zhang L., Jiang H., and Pian C., “MiRLoc: predicting miRNA subcellular localization by incorporating miRNA-mRNA interactions and mRNA subcellular localization,” Brief. Bioinform., vol. 23, no. 2, p. bbac044, Mar. 2022, doi: 10.1093/bib/bbac044.
- [107]. Meher P. K., Satpathy S., and Rao A. R., “miRNALoc: predicting miRNA subcellular localizations based on principal component scores of physico-chemical properties and pseudo compositions of di-nucleotides,” Sci. Rep., vol. 10, no. 1, p. 14557, Sep. 2020, doi: 10.1038/s41598-020-71381-4.
- [108]. Li W. and Godzik A., “Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences,” Bioinformatics, vol. 22, no. 13, pp. 1658–1659, Jul. 2006, doi: 10.1093/bioinformatics/btl158.
- [109]. Chawla N. V., Bowyer K. W., Hall L. O., and Kegelmeyer W. P., “SMOTE: Synthetic Minority Over-sampling Technique,” Jun. 09, 2011, arXiv:1106.1813. doi: 10.48550/arXiv.1106.1813.
- [110]. Moeckel C. et al. “A survey of k-mer methods and applications in bioinformatics,” Comput. Struct. Biotechnol. J., vol. 23, pp. 2289–2303, Dec. 2024, doi: 10.1016/j.csbj.2024.05.025.
- [111]. Yuan G.-H., Wang Y., Wang G.-Z., and Yang L., “RNAlight: a machine learning model to identify nucleotide features determining RNA subcellular localization,” Brief. Bioinform., vol. 24, no. 1, p. bbac509, Jan. 2023, doi: 10.1093/bib/bbac509.
- [112]. Yan Z., Lécuyer E., and Blanchette M., “Prediction of mRNA subcellular localization using deep recurrent neural networks,” Bioinformatics, vol. 35, no. 14, pp. i333–i342, Jul. 2019, doi: 10.1093/bioinformatics/btz337.
- [113]. Liu B. et al. “iDNA-Prot|dis: Identifying DNA-Binding Proteins by Incorporating Amino Acid Distance-Pairs and Reduced Alphabet Profile into the General Pseudo Amino Acid Composition,” PLoS One, vol. 9, no. 9, p. e106691, Sep. 2014, doi: 10.1371/journal.pone.0106691.
- [114]. Garg A., Singhal N., Kumar R., and Kumar M., “mRNALoc: a novel machine-learning based in-silico tool to predict mRNA subcellular localization,” Nucleic Acids Res., vol. 48, no. W1, pp. W239–W243, Jul. 2020, doi: 10.1093/nar/gkaa385.
- [115]. Su Z.-D. et al. “iLoc-lncRNA: predict the subcellular location of lncRNAs by incorporating octamer composition into general PseKNC,” Bioinformatics, vol. 34, no. 24, pp. 4196–4204, Dec. 2018, doi: 10.1093/bioinformatics/bty508.
- [116]. Musleh S., Arif M., Alajez N. M., and Alam T., “Unified mRNA Subcellular Localization Predictor based on machine learning techniques,” BMC Genomics, vol. 25, no. 1, p. 151, Feb. 2024, doi: 10.1186/s12864-024-10077-9.
- [117]. Zhang Z.-Y., Sun Z.-J., Yang Y.-H., and Lin H., “Towards a better prediction of subcellular location of long non-coding RNA,” Front. Comput. Sci., vol. 16, no. 5, p. 165903, Jan. 2022, doi: 10.1007/s11704-021-1015-3.
- [118]. Rajesh P. and Krishnamachari A., “Composition, physicochemical property and base periodicity for discriminating lncRNA and mRNA,” Bioinformation, vol. 19, no. 12, pp. 1145–1152, 2023, doi: 10.6026/973206300191145.
- [119]. Klapproth C., Sen R., Stadler P. F., Findeiß S., and Fallmann J., “Common Features in lncRNA Annotation and Classification: A Survey,” Non-Coding RNA, vol. 7, no. 4, p. 77, Dec. 2021, doi: 10.3390/ncrna7040077.
- [120]. Forrest M. E., Pinkard O., Martin S., Sweet T. J., Hanson G., and Coller J., “Codon and amino acid content are associated with mRNA stability in mammalian cells,” PLoS One, vol. 15, no. 2, p. e0228730, Feb. 2020, doi: 10.1371/journal.pone.0228730.
- [121]. Schnell S. J., Ma J., and Yang W., “Three-Dimensional Mapping of mRNA Export through the Nuclear Pore Complex,” Genes, vol. 5, no. 4, pp. 1032–1049, Nov. 2014, doi: 10.3390/genes5041032.
- [122]. Yan F., Wang X., and Zeng Y., “3D genomic regulation of lncRNA and Xist in X chromosome,” Semin. Cell Dev. Biol., vol. 90, pp. 174–180, Jun. 2019, doi: 10.1016/j.semcdb.2018.07.013.
- [123]. Sarumi O. A. and Heider D., “Large language models and their applications in bioinformatics,” Comput. Struct. Biotechnol. J., vol. 23, pp. 3498–3505, Dec. 2024, doi: 10.1016/j.csbj.2024.09.031.
- [124]. Ahmad A., Lin H., and Shatabda S., “Locate-R: Subcellular localization of long non-coding RNAs using nucleotide compositions,” Genomics, vol. 112, no. 3, pp. 2583–2589, May 2020, doi: 10.1016/j.ygeno.2020.02.011.
- [125]. Tang Q., Nie F., Kang J., and Chen W., “mRNALocater: Enhance the prediction accuracy of eukaryotic mRNA subcellular localization by using model fusion strategy,” Mol. Ther., vol. 29, no. 8, pp. 2617–2623, Aug. 2021, doi: 10.1016/j.ymthe.2021.04.004.
- [126]. Li J., Zhang L., He S., Guo F., and Zou Q., “SubLocEP: a novel ensemble predictor of subcellular localization of eukaryotic mRNA based on machine learning,” Brief. Bioinform., vol. 22, no. 5, p. bbaa401, Sep. 2021, doi: 10.1093/bib/bbaa401.
- [127]. Musleh S., Islam M. T., Qureshi R., Alajez N. M., and Alam T., “MSLP: mRNA subcellular localization predictor based on machine learning techniques,” BMC Bioinformatics, vol. 24, no. 1, p. 109, Mar. 2023, doi: 10.1186/s12859-023-05232-0.
- [128]. Feng S., Liang Y., Du W., Lv W., and Li Y., “LncLocation: Efficient Subcellular Location Prediction of Long Non-Coding RNA-Based Multi-Source Heterogeneous Feature Fusion,” Int. J. Mol. Sci., vol. 21, no. 19, p. 7271, Oct. 2020, doi: 10.3390/ijms21197271.
- [129]. Liu H., Li D., and Wu H., “Lnclocator-imb: An Imbalance-tolerant Ensemble Deep Learning Framework for Predicting Long Non-coding RNA Subcellular Localization,” IEEE J. Biomed. Health Inform., vol. PP, Oct. 2023, doi: 10.1109/JBHI.2023.3324709.
- [130]. Wang J., Horlacher M., Cheng L., and Winther O., “DeepLocRNA: an interpretable deep learning model for predicting RNA subcellular localization with domain-specific transfer-learning,” Bioinformatics, vol. 40, no. 2, p. btae065, Feb. 2024, doi: 10.1093/bioinformatics/btae065.
- [131]. Wang D. et al. “DM3Loc: multi-label mRNA subcellular localization prediction and analysis based on multi-head self-attention mechanism,” Nucleic Acids Res., vol. 49, no. 8, p. e46, May 2021, doi: 10.1093/nar/gkab016.
- [132]. Choudhury S., Bajiya N., Patiyal S., and Raghava G. P. S., “MRSLpred-a hybrid approach for predicting multi-label subcellular localization of mRNA at the genome scale,” Front. Bioinform., vol. 4, p. 1341479, 2024, doi: 10.3389/fbinf.2024.1341479.
- [133]. Asim M. N., Malik M. I., Zehe C., Trygg J., Dengel A., and Ahmed S., “MirLocPredictor: A ConvNet-Based Multi-Label MicroRNA Subcellular Localization Predictor by Incorporating k-Mer Positional Information,” Genes, vol. 11, no. 12, p. 1475, Dec. 2020, doi: 10.3390/genes11121475.
- [134]. Bai T., Xie J., Liu Y., and Liu B., “MMLmiRLocNet: miRNA Subcellular Localization Prediction based on Multi-view Multi-label Learning for Drug Design,” IEEE J. Biomed. Health Inform., vol. PP, Oct. 2024, doi: 10.1109/JBHI.2024.3483997.
- [135]. Zhang S. and Qiao H., “KD-KLNMF: Identification of lncRNAs subcellular localization with multiple features and nonnegative matrix factorization,” Anal. Biochem., vol. 610, p. 113995, Dec. 2020, doi: 10.1016/j.ab.2020.113995.
- [136]. Liu Z., Bai T., Liu B., and Yu L., “MulStack: An ensemble learning prediction model of multilabel mRNA subcellular localization,” Comput. Biol. Med., vol. 175, p. 108289, Jun. 2024, doi: 10.1016/j.compbiomed.2024.108289.
- [137]. Zuckerman B. and Ulitsky I., “Predictive models of subcellular localization of long RNAs,” RNA, vol. 25, no. 5, pp. 557–572, May 2019, doi: 10.1261/rna.068288.118.
- [138]. Zhang Z.-Y., Zhang Z., Ye X., Sakurai T., and Lin H., “A BERT-based model for the prediction of lncRNA subcellular localization in Homo sapiens,” Int. J. Biol. Macromol., vol. 265, no. Pt 1, p. 130659, Apr. 2024, doi: 10.1016/j.ijbiomac.2024.130659.
- [139]. Wang X., Wang S., Wang R., and Gao X., “PreSubLncR: Predicting Subcellular Localization of Long Non-Coding RNA Based on Multi-Scale Attention Convolutional Network and Bidirectional Long Short-Term Memory Network,” Processes, vol. 12, no. 4, Art. no. 666, Apr. 2024, doi: 10.3390/pr12040666.
- [140]. Deng X., Tang L., and Liu L., “GATLncLoc+C&S: Prediction of LncRNA subcellular localization based on corrective graph attention network,” Mar. 12, 2024, bioRxiv. doi: 10.1101/2024.03.08.584063.
- [141]. Deng X., “Application of Graph Attention Networks in LncRNA Subcellular Localization Prediction,” Acad. J. Comput. Inf. Sci., vol. 7, no. 3, Mar. 2024, doi: 10.25236/AJCIS.2024.070310.
- [142]. Li F., Bi Y., Guo X., Tan X., Wang C., and Pan S., “Allocator is a graph neural network-based framework for mRNA subcellular localization prediction,” Dec. 15, 2023, bioRxiv. doi: 10.1101/2023.12.14.571762.
- [143]. Babaiha N. S., Aghdam R., Ghiam S., and Eslahchi C., “NN-RNALoc: Neural network-based model for prediction of mRNA sub-cellular localization using distance-based sub-sequence profiles,” PLoS One, vol. 18, no. 9, p. e0258793, 2023, doi: 10.1371/journal.pone.0258793.
- [144]. Li M., Zhao B., Yin R., Lu C., Guo F., and Zeng M., “GraphLncLoc: long non-coding RNA subcellular localization prediction using graph convolutional networks based on sequence to graph transformation,” Brief. Bioinform., vol. 24, no. 1, p. bbac565, Jan. 2023, doi: 10.1093/bib/bbac565.
- [145]. Cai J., Wang T., Deng X., Tang L., and Liu L., “GM-lncLoc: LncRNAs subcellular localization prediction based on graph neural network with meta-learning,” BMC Genomics, vol. 24, no. 1, p. 52, Jan. 2023, doi: 10.1186/s12864-022-09034-1.
- [146]. Lyu J., Zheng P., Qi Y., and Huang G., “LightGBM-LncLoc: A LightGBM-Based Computational Predictor for Recognizing Long Non-Coding RNA Subcellular Localization,” Mathematics, vol. 11, no. 3, Art. no. 602, Jan. 2023, doi: 10.3390/math11030602.
- [147]. Zeng M. et al. “LncLocFormer: a Transformer-based deep learning model for multi-label lncRNA subcellular localization prediction by using localization-specific attention mechanism,” Bioinformatics, vol. 39, no. 12, p. btad752, Dec. 2023, doi: 10.1093/bioinformatics/btad752.
- [148]. Fu X., Chen Y., and Tian S., “DlncRNALoc: A discrete wavelet transform-based model for predicting lncRNA subcellular localization,” Math. Biosci. Eng., vol. 20, no. 12, pp. 20648–20667, Nov. 2023, doi: 10.3934/mbe.2023913.
- [149].Li M. et al., “SGCL-LncLoc: An Interpretable Deep Learning Model for Improving lncRNA Subcellular Localization Prediction with Supervised Graph Contrastive Learning,” Big Data Min. Anal., vol. 7, no. 3, pp. 765–780, Sep. 2024, doi: 10.26599/BDMA.2024.9020002.
- [150].Wang Y., Zhao P., Du H., Cao Y., Peng Q., and Fu L., “LncDLSM: Identification of Long Non-coding RNAs with Deep Learning-based Sequence Model,” IEEE J. Biomed. Health Inform., vol. PP, Feb. 2023, doi: 10.1109/JBHI.2023.3247805.
- [151].Asim M. N. et al., “EL-RMLocNet: An explainable LSTM network for RNA-associated multi-compartment localization prediction,” Comput. Struct. Biotechnol. J., vol. 20, pp. 3986–4002, 2022, doi: 10.1016/j.csbj.2022.07.031.
- [152].Bi Y. et al., “Clarion is a multi-label problem transformation method for identifying mRNA subcellular localizations,” Brief. Bioinform., vol. 23, no. 6, p. bbac467, Nov. 2022, doi: 10.1093/bib/bbac467.
- [153].Zeng M., Wu Y., Lu C., Zhang F., Wu F.-X., and Li M., “DeepLncLoc: a deep learning framework for long non-coding RNA subcellular localization prediction based on subsequence embedding,” Brief. Bioinform., vol. 23, no. 1, p. bbab360, Jan. 2022, doi: 10.1093/bib/bbab360.
- [154].Jeon Y.-J., Hasan M. M., Park H. W., Lee K. W., and Manavalan B., “TACOS: a novel approach for accurate prediction of cell-specific long noncoding RNAs subcellular localization,” Brief. Bioinform., vol. 23, no. 4, p. bbac243, Jul. 2022, doi: 10.1093/bib/bbac243.
- [155].Meher P. K., Rai A., and Rao A. R., “mLoc-mRNA: predicting multiple sub-cellular localization of mRNAs using random forest algorithm coupled with feature selection via elastic net,” BMC Bioinformatics, vol. 22, p. 342, Jun. 2021, doi: 10.1186/s12859-021-04264-8.
- [156].Yi W. and Adjeroh D. A., “A Deep Learning Approach to LncRNA Subcellular Localization Using Inexact q-mers,” in 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Dec. 2021, pp. 2128–2133, doi: 10.1109/BIBM52615.2021.9669409.
- [157].Fan Y., Chen M., and Zhu Q., “lncLocPred: Predicting LncRNA Subcellular Localization Using Multiple Sequence Feature Information,” IEEE Access, vol. 8, pp. 124702–124711, 2020, doi: 10.1109/ACCESS.2020.3007317.
- [158].Gudenas B. L. and Wang L., “Prediction of LncRNA Subcellular Localization with Deep Learning from Sequence Features,” Sci. Rep., vol. 8, no. 1, p. 16385, Nov. 2018, doi: 10.1038/s41598-018-34708-w.
- [159].Cao Z., Pan X., Yang Y., Huang Y., and Shen H.-B., “The lncLocator: a subcellular localization predictor for long non-coding RNAs based on a stacked ensemble classifier,” Bioinformatics, vol. 34, no. 13, pp. 2185–2194, Jul. 2018, doi: 10.1093/bioinformatics/bty085.
- [160].Tripathi R., Patel S., Kumari V., Chakraborty P., and Varadwaj P. K., “DeepLNC, a long non-coding RNA prediction tool using deep neural network,” Netw. Model. Anal. Health Inform. Bioinforma., vol. 5, no. 1, p. 21, Jun. 2016, doi: 10.1007/s13721-016-0129-2.
- [161].Mak M.-W., Guo J., and Kung S.-Y., “PairProSVM: Protein Subcellular Localization Based on Local Pairwise Profile Alignment and SVM,” IEEE/ACM Trans. Comput. Biol. Bioinform., vol. 5, no. 3, pp. 416–422, Jul. 2008, doi: 10.1109/TCBB.2007.70256.
- [162].Tahir M., Khan F., Hayat M., and Alshehri M. D., “An effective machine learning-based model for the prediction of protein–protein interaction sites in health systems,” Neural Comput. Appl., vol. 36, no. 1, pp. 65–75, Jan. 2024, doi: 10.1007/s00521-022-07024-8.
- [163].Zhao L., Wang J., Nabil M. M., and Zhang J., “Deep Forest-based Prediction of Protein Subcellular Localization,” Curr. Gene Ther., vol. 18, no. 5, pp. 268–274, 2018, doi: 10.2174/1566523218666180913110949.
- [164].Sui J., Chen Y., Cao Y., and Zhao Y., “Accurate Identification of Submitochondrial Protein Location Based on Deep Representation Learning Feature Fusion,” in Advanced Intelligent Computing Technology and Applications, Huang D.-S., Premaratne P., Jin B., Qu B., Jo K.-H., and Hussain A., Eds., Singapore: Springer Nature, 2023, pp. 587–596, doi: 10.1007/978-981-99-4749-2_50.
- [165].Asim M. N. et al., “L2S-MirLoc: A Lightweight Two Stage MiRNA Sub-Cellular Localization Prediction Framework,” in 2021 International Joint Conference on Neural Networks (IJCNN), Jul. 2021, pp. 1–8, doi: 10.1109/IJCNN52387.2021.9534015.
- [166].Zhou H., Wang H., Tang J., Ding Y., and Guo F., “Identify ncRNA Subcellular Localization via Graph Regularized k-Local Hyperplane Distance Nearest Neighbor Model on Multi-Kernel Learning,” IEEE/ACM Trans. Comput. Biol. Bioinform., vol. 19, no. 6, pp. 3517–3529, 2022, doi: 10.1109/TCBB.2021.3107621.
- [167].Ding Y., Tiwari P., Guo F., and Zou Q., “Shared subspace-based radial basis function neural network for identifying ncRNAs subcellular localization,” Neural Netw., vol. 156, pp. 170–178, Dec. 2022, doi: 10.1016/j.neunet.2022.09.026.
- [168].Ding L. and McDonald D. J., “Predicting phenotypes from microarrays using amplified, initially marginal, eigenvector regression,” Bioinformatics, vol. 33, no. 14, pp. i350–i358, Jul. 2017, doi: 10.1093/bioinformatics/btx265.
- [169].Han S. and Liu L., “GP-HTNLoc: A graph prototype head-tail network-based model for multi-label subcellular localization prediction of ncRNAs,” Comput. Struct. Biotechnol. J., vol. 23, pp. 2034–2048, Dec. 2024, doi: 10.1016/j.csbj.2024.04.052.
- [170].Bai T. and Liu B., “ncRNALocate-EL: a multi-label ncRNA subcellular locality prediction model based on ensemble learning,” Brief. Funct. Genomics, vol. 22, no. 5, pp. 442–452, Nov. 2023, doi: 10.1093/bfgp/elad007.
- [171].Hochreiter S. and Schmidhuber J., “Long Short-Term Memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997, doi: 10.1162/neco.1997.9.8.1735.
- [172].“lncLocator 2.0: a cell-line-specific subcellular localization predictor for long non-coding RNAs with interpretable deep learning,” Bioinformatics, vol. 37, no. 16. Accessed: Nov. 22, 2024. [Online]. Available: https://academic.oup.com/bioinformatics/article/37/16/2308/6149475
- [173].“iLoc-miRNA: extracellular/intracellular miRNA prediction using deep BiLSTM with attention mechanism,” Brief. Bioinform., vol. 23, no. 5, p. bbac395. Accessed: Nov. 22, 2024. [Online]. Available: https://academic.oup.com/bib/article/23/5/bbac395/6693601
- [174].Chen L., Gu J., and Zhou B., “PMiSLocMF: predicting miRNA subcellular localizations by incorporating multi-source features of miRNAs,” Brief. Bioinform., vol. 25, no. 5, p. bbae386, Sep. 2024, doi: 10.1093/bib/bbae386.
- [175].Lundberg S. and Lee S.-I., “A Unified Approach to Interpreting Model Predictions,” Nov. 25, 2017, arXiv:1705.07874, doi: 10.48550/arXiv.1705.07874.
- [176].“MGFmiRNAloc: Predicting miRNA Subcellular Localization Using Molecular Graph Feature and Convolutional Block Attention Module,” IEEE Xplore. Accessed: Nov. 22, 2024. [Online]. Available: https://ieeexplore.ieee.org/document/10487110
- [177].“DAmiRLocGNet: miRNA subcellular localization prediction by combining miRNA–disease associations and graph convolutional networks,” Brief. Bioinform., vol. 24, no. 4, p. bbad212. Accessed: Nov. 22, 2024. [Online]. Available: https://academic.oup.com/bib/article/24/4/bbad212/7199901
- [178].Wang J., Horlacher M., Cheng L., and Winther O., “RNA trafficking and subcellular localization-a review of mechanisms, experimental and predictive methodologies,” Brief. Bioinform., vol. 24, no. 5, p. bbad249, Sep. 2023, doi: 10.1093/bib/bbad249.
- [179].Mah C. K. et al., “Bento: a toolkit for subcellular analysis of spatial transcriptomics data,” Genome Biol., vol. 25, no. 1, p. 82, Apr. 2024, doi: 10.1186/s13059-024-03217-7.
- [180].Savulescu A. F., Bouilhol E., Beaume N., and Nikolski M., “Prediction of RNA subcellular localization: Learning from heterogeneous data sources,” iScience, vol. 24, no. 11, p. 103298, Nov. 2021, doi: 10.1016/j.isci.2021.103298.
- [181].Dubois R. et al., “A Deep Learning Approach To Identify mRNA Localization Patterns,” in 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), Apr. 2019, pp. 1386–1390, doi: 10.1109/ISBI.2019.8759235.
- [182].Samacoits A. et al., “A computational framework to study sub-cellular RNA localization,” Nat. Commun., vol. 9, no. 1, p. 4584, Nov. 2018, doi: 10.1038/s41467-018-06868-w.
- [183].Kraus O. Z. et al., “Automated analysis of high-content microscopy data with deep learning,” Mol. Syst. Biol., vol. 13, no. 4, p. 924, Apr. 2017, doi: 10.15252/msb.20177551.
- [184].Winter P. W. and Shroff H., “Faster fluorescence microscopy: advances in high speed biological imaging,” Curr. Opin. Chem. Biol., vol. 20, pp. 46–53, Jun. 2014, doi: 10.1016/j.cbpa.2014.04.008.
- [185].Imbert A. et al., “FISH-quant v2: a scalable and modular tool for smFISH image analysis,” RNA, vol. 28, no. 6, pp. 786–795, Jun. 2022, doi: 10.1261/rna.079073.121.
- [186].Walter F. C., Stegle O., and Velten B., “FISHFactor: a probabilistic factor model for spatial transcriptomics data with subcellular resolution,” Bioinformatics, vol. 39, no. 5, p. btad183, May 2023, doi: 10.1093/bioinformatics/btad183.
- [187].Thul P. J. et al., “A subcellular map of the human proteome,” Science, vol. 356, no. 6340, p. eaal3321, May 2017, doi: 10.1126/science.aal3321.
- [188].Kiskowski M. A., Hancock J. F., and Kenworthy A. K., “On the use of Ripley’s K-function and its derivatives to analyze domain size,” Biophys. J., vol. 97, no. 4, pp. 1095–1103, Aug. 2009, doi: 10.1016/j.bpj.2009.05.039.
- [189].Bae S., Choi H., and Lee D. S., “Discovery of molecular features underlying the morphological landscape by integrating spatial transcriptomic data with deep features of tissue images,” Nucleic Acids Res., vol. 49, no. 10, p. e55, Jun. 2021, doi: 10.1093/nar/gkab095.
- [190].Smirnov E. A., Timoshenko D. M., and Andrianov S. N., “Comparison of Regularization Methods for ImageNet Classification with Deep Convolutional Neural Networks,” AASRI Procedia, vol. 6, pp. 89–94, Jan. 2014, doi: 10.1016/j.aasri.2014.05.013.
- [191].Yuan C. et al., “Multimodal deep learning model on interim [18F]FDG PET/CT for predicting primary treatment failure in diffuse large B-cell lymphoma,” Eur. Radiol., vol. 33, no. 1, pp. 77–88, Jan. 2023, doi: 10.1007/s00330-022-09031-8.
- [192].OpenAI et al., “GPT-4o System Card,” Oct. 25, 2024, arXiv:2410.21276, doi: 10.48550/arXiv.2410.21276.
- [193].Cui C. et al., “Deep Multi-modal Fusion of Image and Non-image Data in Disease Diagnosis and Prognosis: A Review,” Jan. 26, 2023, arXiv:2203.15588, doi: 10.48550/arXiv.2203.15588.
- [194].Huang Z. et al., “SALMON: Survival Analysis Learning With Multi-Omics Neural Networks on Breast Cancer,” Front. Genet., vol. 10, Mar. 2019, doi: 10.3389/fgene.2019.00166.
- [195].Deng X. and Liu L., “A review of machine learning-based prediction of lncRNA subcellular localization,” Adv. Comput. Signals Syst., vol. 7, no. 9, pp. 58–63, Nov. 2023, doi: 10.23977/acss.2023.070908.



