Computational and Structural Biotechnology Journal. 2024 Jun 26;23:3030–3039. doi: 10.1016/j.csbj.2024.06.030

PRIMITI: A computational approach for accurate prediction of miRNA-target mRNA interaction

Korawich Uthayopas a,b, Alex GC de Sá a,b,c, Azadeh Alavi d, Douglas EV Pires a,b,e, David B Ascher a,b,c
PMCID: PMC11340604  PMID: 39175797

Abstract

Current medical research has demonstrated the roles of miRNAs in a variety of cellular mechanisms, lending credence to the association between miRNA dysregulation and multiple diseases. Understanding the mechanisms of miRNA is critical for developing effective diagnostic and therapeutic strategies. miRNA-mRNA interactions are the most important of these mechanisms to understand, yet their experimental validation is constrained. Accordingly, several computational models have been developed to predict miRNA-mRNA interactions, but they offer limited predictive capability, poor characterisation of miRNA-mRNA interactions, and low usability. To address these drawbacks, we developed PRIMITI, a PRedictive model for the Identification of novel miRNA-Target mRNA Interactions. PRIMITI is a novel machine learning model that utilises CLIP-seq and expression data to characterise functional target sites in 3’-untranslated regions (3’-UTRs) and predict miRNA-target mRNA repression activity. The model was trained using a reliable negative sample selection approach and the robust extreme gradient boosting (XGBoost) algorithm, coupled with newly introduced features, including sequence and genetic variation information. PRIMITI achieved an area under the receiver operating characteristic (ROC) curve (AUC) of up to 0.96 for predicting functional miRNA-target site binding and 0.96 for predicting miRNA-target mRNA repression activity on cross-validation and an independent blind test. Additionally, the model outperformed state-of-the-art methods in recovering miRNA-target repressions in an unseen microarray dataset and in a collection of validated miRNA-mRNA interactions, highlighting its utility for preliminary screening. PRIMITI is available on a reliable, scalable, and user-friendly web server at https://biosig.lab.uq.edu.au/primiti.

Keywords: MicroRNAs, MiRNA-target interaction prediction, MiRNA-mediated repression, Machine learning, EXtreme Gradient Boosting (XGBoost)

Highlights

  • miRNAs play an essential role in post-transcriptional gene regulation through complementary base pair binding with mRNA target sites.

  • PRIMITI provides a new machine learning (ML) model to identify miRNA-target mRNA interactions.

  • PRIMITI incorporates novelties in the characterisation of functional miRNA binding sites and also in providing more reliable training sets for its respective ML model.

  • PRIMITI achieved strong predictive performance under cross-validation, blind test, and independent test sets, indicating its robustness in identifying miRNA-target mRNA repression.

  • PRIMITI was made available as a user-friendly web server at https://biosig.lab.uq.edu.au/primiti/.

1. Introduction

Recent evidence strongly suggests that non-coding RNAs, a type of RNA that does not encode proteins, play critical roles in a variety of cellular processes, such as cell differentiation and apoptosis [1], [2], [3]. MicroRNAs, abbreviated as miRNAs, are small non-coding RNAs that regulate gene expression by partially complementary base pairing with binding sites in target messenger RNA (mRNA) [4]. The binding primarily occurs within 3’-untranslated regions (3’-UTR) [4], with some evidence of functional binding in 5’-untranslated regions (5’-UTR) and coding sequence [3]. This process generally results in translational suppression or mRNA degradation [2], [3], [4]. Over half of all protein-coding genes are estimated to be regulated by at least one miRNA [5], highlighting the importance of miRNA functions in molecular mechanisms. Dysregulation of several miRNAs has been associated with the development of multiple diseases, such as cancers, and cardiovascular and neurological disorders [6], [7], [8].

This has led to the increasing popularity and development of personalised miRNA therapeutics, such as novel biomarkers and medications [9], [10]. Accordingly, a range of miRNA-based drugs is currently undergoing clinical trials [9], [10]. Despite this innovation in personalised medicine, the development of miRNA-based medical treatments is still constrained by an incomplete understanding of miRNA targeting mechanisms.

Accordingly, to gain a better understanding of miRNA activities, several experimental validation approaches for identifying transcripts targeted by miRNAs have been developed [11], [12], [13], [14], [15], [16]. These include procedures that utilise RNA-seq or microarrays to assess changes in target expression driven by miRNA overexpression or downregulation [11], [12], and CrossLinking-ImmunoPrecipitation sequencing (CLIP-seq) to confirm the direct molecular interactions between miRNAs and mRNA target sites [13], [14], [15], [16]. Nevertheless, large-scale experimental exploration is not feasible because of time and funding constraints and the uncertainty of the process.

Consequently, several computational approaches emerged to accurately and scalably predict miRNA-target interactions, aiming to allow the creation of better diagnostic tools and treatments, and to improve miRNA-target mRNA repression characterisation [17], [18]. They include state-of-the-art methods such as miRanda [19], [20], RNA22 [21], PITA [22], DIANA-microT-CDS v5 [23], TargetScan [24], and miRTarget [12], [25]. A broad overview of the evolution of these methods is described in Supplementary Materials.

Despite the effort invested in these methods, model usability is still greatly constrained by two main factors. First, CLIP-seq data has limited sensitivity [26], as the signal is generally obscured by background noise. This usually results in an experimental failure to capture a significant number of miRNA-target interactions and reliable non-interactions. Second, there is a lack of useful characteristics defining functional miRNA-target site interactions, which impairs target prediction accuracy. To overcome these challenges, further improvements are needed, including both an effective way to select negative samples and the incorporation of characteristics that properly define functional miRNA-target interactions and non-interactions.

In this context, iLearn, a versatile feature extraction tool designed for encoding DNA, RNA, and protein sequences, offers a comprehensive set of sequence-derived physicochemical descriptive variables (or features), known as iFeatures [27], [28]. Proven to enhance model performance in various biological problems [29], [30], iLearn presents a promising approach for enhancing miRNA-target prediction accuracy. Additionally, Genome-Wide Association Studies (GWAS) have been extensively conducted globally to collect comprehensive information on human genetic variation [31]. Single nucleotide polymorphisms (SNPs), the most commonly studied genetic variation, have been observed to have an impact on miRNA repression, providing an opportunity for better characterising miRNA-binding sites [32]. The integration of iFeatures and SNPs into predictive models has the potential to provide valuable insights into the landscape of miRNA-mediated regulation.

We propose PRIMITI, a PRedictive model for the Identification of novel miRNA-Target mRNA Interactions, to gather these insights and address the main drawbacks of current alternative methods. A machine learning model was implemented in PRIMITI to model the patterns of miRNA-mRNA interactions based on CLIP-seq and gene expression experiments [11], [12], [13], [14], [15], [16]. This model incorporates more than 150 features to characterise miRNA-target site interactions, including newly introduced iFeatures and SNP features. A negative sample selection strategy is also employed to provide more reliable non-interacting (i.e., negative) samples.

With these attributes, PRIMITI encompasses two prediction modes, PRIMITI-TS (Target Site) and PRIMITI-TM (Target mRNA). The former predicts potential miRNA-target sites that bind physicochemically, while the latter identifies potential functional miRNA-mRNA repression activity for specific miRNAs and mRNAs. This dual-mode approach acknowledges that one mRNA may contain multiple potential target sites, where multiple bindings can increase the probability of repression activity. In order to enhance the accessibility of our model, we provided a user-friendly web server platform for exploring human miRNA-target interactions, which is available at https://biosig.lab.uq.edu.au/primiti. We believe that with these characteristics, PRIMITI would greatly contribute to a better understanding of miRNA roles, the creation of accurate biomarkers, and the development of effective treatments for a range of diseases.

2. Materials and methods

2.1. PRIMITI general workflow

The proposed methodological pipeline for both PRIMITI-TS and PRIMITI-TM comprises six primary steps, which are summarised in Fig. 1. First, the datasets were collected and curated from multiple sources and experimental types [11], [12], [13], [14], [15], [16]. The following step involved feature engineering for the miRNA-target site data, which resulted in the generation of 154 features. Statistical analysis was then used to examine the differences between target-site and non-target-site samples for each feature. Next, a forward stepwise greedy feature selection method [33] was used to choose a minimal yet effective set of features for training the machine learning model. With these features, PRIMITI-TS was developed to prioritise functional miRNA-target binding sites. Because a single mRNA might have several target sites complementary to a particular miRNA, PRIMITI-TM was then constructed using the confidence scores generated by PRIMITI-TS for each target site as inputs to estimate the probability of functional miRNA suppression of mRNA targets.

Fig. 1.


PRIMITI workflow: miRNA-target mRNA interaction prediction. The development of PRIMITI is divided into six steps: (A) Data collection - experimental CLIP-seq and gene expression data were gathered from multiple sources, (B) Feature engineering - seven types of features were generated to characterise miRNA-target site interactions, (C) Statistical analysis - the Mann-Whitney U test and Chi-square test were used to analyse the binding between miRNAs and target sites, (D) Feature selection - relevant features were determined using forward stepwise greedy feature selection, (E) PRIMITI-TS model training and evaluation - the extreme gradient boosting (XGBoost) algorithm was employed to predict miRNA-target site interactions, (F) PRIMITI-TM model training and evaluation - prediction results from PRIMITI-TS were used as features to construct a final XGBoost model for prioritising potential miRNA-mediated mRNA repression.

2.2. Data collection and preprocessing

As mentioned above, PRIMITI has two prediction modes, PRIMITI-TS and PRIMITI-TM, which differ in data acquisition, as described next. The experimental datasets are listed in Table S1, and the data collection and preprocessing steps are summarised in Fig. S1.

In PRIMITI-TS, data collection started by curating miRNA-target site interactions (i.e., positive samples) from CLIP-seq data acquired from different sources [13], [14], [15], [16]. The verified miRNA-canonical target site interactions constitute 6190 positive samples between 300 miRNAs and 3331 mRNA transcripts from 3147 genes. For non-interactions (i.e., negative samples), 504,159 candidate negative samples were extracted from unverified canonical miRNA-target site interactions in the same CLIP-seq data. However, because of the limited sensitivity of CLIP-seq, interpreting unconfirmed interactions as non-interactions may produce poor-quality negative data. Although this practice is common in alternative methods [12], [24], we introduced a negative sample selection step to improve the quality of the negative data. Candidates were compared against a collection of experimentally validated miRNA-target mRNA interactions from miRTarbase [34] and Tarbase [35]; any miRNA-target mRNA pair found in this collection was considered an interaction and removed from the negative sample dataset, leaving 451,413 reliable negative samples. Given the high imbalance between interactions and non-interactions, a random under-sampling procedure [36] was used to create a 1:1 balanced dataset of miRNA-mRNA target interactions and non-interactions. Finally, high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation (HITS-CLIP) data [13] was used as an additional external test set. Applying the same validation procedure as for the CLIP-seq data yielded 98 miRNA-target site samples and 7240 miRNA-non-target site samples.
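The negative sample selection and under-sampling steps can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout (`mirna`, `mrna`, `site_start`), the function names, and the toy identifiers are assumptions made for the example.

```python
import random

def select_reliable_negatives(candidates, validated_pairs):
    """Drop candidate negatives whose (miRNA, mRNA) pair is experimentally validated."""
    return [c for c in candidates if (c["mirna"], c["mrna"]) not in validated_pairs]

def undersample(positives, negatives, seed=0):
    """Randomly keep as many negatives as there are positives (1:1 ratio)."""
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    return positives, rng.sample(negatives, n)

# Toy data; all identifiers are hypothetical.
candidates = [
    {"mirna": "miR-A", "mrna": "GENE1", "site_start": 120},
    {"mirna": "miR-A", "mrna": "GENE2", "site_start": 45},
    {"mirna": "miR-B", "mrna": "GENE1", "site_start": 300},
    {"mirna": "miR-B", "mrna": "GENE3", "site_start": 10},
]
# Stand-in for the validated pairs collected from miRTarbase and Tarbase.
validated_pairs = {("miR-A", "GENE2")}
positives = [
    {"mirna": "miR-A", "mrna": "GENE4", "site_start": 77},
    {"mirna": "miR-C", "mrna": "GENE5", "site_start": 9},
]

reliable = select_reliable_negatives(candidates, validated_pairs)
pos, neg = undersample(positives, reliable)
print(len(reliable), len(pos), len(neg))
```

Filtering is done at the miRNA-mRNA pair level, as in the text, so every candidate site on a validated pair is discarded regardless of its position.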

In PRIMITI-TM, on the other hand, miRNA-target mRNA repressions were extracted from miRTarget’s RNA-seq data [12]. By defining genes with at least a 40 % reduction in expression as target mRNAs, a total of 2351 miRNA-target mRNA interactions were found between 25 miRNAs and 1511 genes. In contrast, a set of negative samples was created from genes with unaffected expression levels, between 100 % and 110 % of the level in a negative control sample without miRNA transfection. With these thresholds, we obtained 6749 miRNA-non-target mRNA pairs involving 25 miRNAs and 3966 genes. To validate PRIMITI-TM and compare it to alternative methods, microarray data from Linsley et al. [11] was used. After normalisation and definition of functional miRNA-mRNA repression activities, 596 miRNA-mRNA interactions and 10,233 miRNA-mRNA non-interactions were observed between 7 miRNAs and 1279 genes.
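The expression thresholds above translate into a simple labelling rule. The sketch below assumes expression is given as a ratio of transfected to control level, and that the boundaries are inclusive; neither detail is stated in the text, and the function name is hypothetical.

```python
def label_by_expression(ratio, repression_cutoff=0.60, neutral_range=(1.00, 1.10)):
    """Label a miRNA-gene pair from its expression ratio (transfected / control).

    Returns 1 for a repressed target (at least a 40 % reduction),
    0 for an unaffected gene (100 % to 110 % of control),
    and None for pairs that fall outside both definitions and are left out.
    """
    if ratio <= repression_cutoff:
        return 1
    low, high = neutral_range
    if low <= ratio <= high:
        return 0
    return None

for r in (0.55, 0.80, 1.05, 1.30):
    print(r, label_by_expression(r))
```

Pairs with intermediate or increased expression receive no label, which is why the positive and negative sets together cover only part of all measured miRNA-gene pairs.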

2.3. Feature generation

We carried out an extensive literature review, searching for features that could assist in the characterisation of miRNA-target site interactions. As a result, 154 features were generated to delineate a miRNA-target site interaction for PRIMITI-TS. The set of features consists of 4 canonical site types, 4 binding stability, 13 site accessibility, 7 3’-supplementary bindings, 16 conservation, 18 human genetic variations, and 92 iFeature descriptors. They are briefly introduced as follows. A complete description of the feature generation is provided in Supplementary Material and Methods, and in Table S2.

  • Canonical site types. Canonical site type features are related to different suppression efficacy in terms of mRNA fold change and can be divided into four classes: 6-mer, 7-mer-m8, 7-mer-A1, and 8-mer [4].

  • Binding stability. The stability of the miRNA-induced silencing complex (miRISC), formed by miRNA, mRNA, and the Argonaute2 (Ago-2) protein, is referred to as the binding stability. It has been substantially correlated with miRNA targeting efficacy [37] and is therefore used in PRIMITI.

  • Site accessibility. Accessibility to a binding site is considered a critical factor while determining the targeting specificity and translational repression activity [38].

  • 3’-supplementary binding. During the binding process, and after the seed region has initiated a conformational change of miRISC, the last half of the 3’-region of miRNA is exposed for additional interactions with the target mRNA [39], [40]. This supplementary interaction is referred to as 3’-supplementary pairing and mostly occurs at the 3’-supplementary regions [41].

  • Conservation. Conservation, in turn, is a useful indicator for prioritising functional target sites [5], [41], as genetic regions coding for functional miRNA binding sites have been proven to evolve at a slower rate due to evolutionary constraints to the conserved functional regions [5].

  • Human genetic variation. Additionally, human genetic variants were used to assist in the characterisation of miRNA-binding sites. The presence of single nucleotide polymorphisms (SNPs) and disease-related SNPs at each location in a target site is considered.

  • iFeatures. iLearn is a recently presented Python toolkit that implements a comprehensive set of descriptive variables (or features), named iFeatures, to encode structural and physicochemical information of nucleic acid or peptide sequences [27], [28]. It has been widely used in multiple fields, such as the prediction of mRNA subcellular localisation or protein-protein interaction [29], [30]. Notably, this is the first time iFeatures have been used to characterise miRNA-target site interactions.

The statistical analysis of this set of features is described in detail in Supplementary Material and Methods, and its results are provided in Supplementary Results.
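As an illustration of the canonical site type features, the sketch below classifies an 8-nucleotide 3’-UTR window against a miRNA using the standard definitions: a 6-mer pairs with miRNA positions 2-7, a 7-mer-m8 adds a pair at position 8, a 7-mer-A1 adds an adenine opposite position 1, and an 8-mer has both. It is not taken from PRIMITI; the function names are hypothetical, only Watson-Crick pairs are considered, and the miRNA sequence is used purely as an example.

```python
# Watson-Crick complements for RNA (G:U wobble pairs are deliberately ignored here).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the reverse complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def classify_site(window, mirna):
    """Classify an 8-nt 3'-UTR window (5'->3') against a miRNA (5'->3').

    window[0]   faces miRNA position 8
    window[1:7] faces miRNA positions 7..2 (the seed)
    window[7]   faces miRNA position 1 (checked for adenine only)
    """
    if len(window) != 8 or len(mirna) < 8:
        raise ValueError("expected an 8-nt window and a miRNA of at least 8 nt")
    if window[1:7] != reverse_complement(mirna[1:7]):
        return None  # no seed match, so not a canonical site
    m8_match = window[0] == COMPLEMENT[mirna[7]]
    a1 = window[7] == "A"
    if m8_match and a1:
        return "8-mer"
    if m8_match:
        return "7-mer-m8"
    if a1:
        return "7-mer-A1"
    return "6-mer"

mirna = "UAGCUUAUCAGACUGAUGUUGA"  # example sequence, for illustration only
for window in ("AUAAGCUA", "AUAAGCUG", "GUAAGCUA", "GUAAGCUG", "GUAAGGUG"):
    print(window, classify_site(window, mirna))
```

In a feature table, the returned label would typically be one-hot encoded into the four site type features listed above.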

2.4. PRIMITI-TS Model training

A range of supervised machine learning (ML) algorithms was independently assessed [42], including random forest, extreme gradient boosting (XGBoost), extremely randomised trees, adaptive boosting, gradient boosting, artificial neural networks, decision trees, k-nearest neighbours, support vector machines and Gaussian processes.

Among them, XGBoost [43] outperformed the nine other algorithms when training PRIMITI-TS with the complete feature set (see Table S3). We attribute this to XGBoost’s main characteristics. XGBoost is a highly optimised decision tree-based ensemble machine learning algorithm that has advantages in efficiency, flexibility, and parallelisation. It combines the results of numerous weak tree-based classifiers to obtain increasingly precise overall predictions. Additionally, it can manage missing data internally, obviating the need to alter the data explicitly beforehand. Consequently, we kept XGBoost for the remaining parts of the PRIMITI-TS pipeline.

Internal cross-validation approaches, as well as external validation approaches with independent blind test sets, were performed to evaluate the performance of PRIMITI-TS using different metrics, including the Area Under the Receiver Operating Characteristic Curve (AUC), Balanced accuracy (bACC), F1-score (F1), Matthews Correlation Coefficient (MCC), Precision, Recall, and Specificity. The independent blind test set comprised samples separated from the training dataset prior to model training. Detailed information about the performance metrics employed is available in Supplementary Materials and Methods.
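For reference, the threshold-based metrics listed above follow directly from the confusion matrix, and AUC can be computed as the probability that a positive sample scores higher than a negative one. The sketch below uses the standard textbook definitions; the function names and toy numbers are illustrative and are not taken from the PRIMITI code base.

```python
import math

def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def classification_metrics(tp, fp, tn, fn):
    """Compute threshold-based metrics from confusion-matrix counts."""
    recall = safe_div(tp, tp + fn)        # sensitivity
    specificity = safe_div(tn, tn + fp)
    precision = safe_div(tp, tp + fp)
    f1 = safe_div(2 * precision * recall, precision + recall)
    bacc = (recall + specificity) / 2
    mcc = safe_div(tp * tn - fp * fn,
                   math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"bACC": bacc, "F1": f1, "MCC": mcc,
            "Precision": precision, "Recall": recall, "Specificity": specificity}

def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

toy = classification_metrics(tp=90, fp=10, tn=90, fn=10)
toy_auc = auc([0.9, 0.8], [0.1, 0.85])
print(toy, toy_auc)
```

The pairwise AUC is quadratic in the number of samples and is meant only to make the definition explicit; a rank-based implementation would be preferable for large datasets.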

2.5. PRIMITI-TS Feature selection

Given the set of 154 features, a forward stepwise greedy feature selection procedure was performed in PRIMITI-TS to select the best combination of features while maintaining the model’s simplicity [33], [44], [45]. Greedy feature selection starts with an empty feature set and, at each iteration, adds the feature that contributes the most to the predictive performance. The predictive performance of the XGBoost model [43], measured as MCC across a 10-fold cross-validation procedure (Fig. S2), was used to quantify the quality of each feature [42]. Subsequent features were chosen in the same way and added one by one to the current set. In the end, a set of 22 features was selected based on the trade-off between complexity and predictive performance on 10-fold cross-validation across greedy feature selection iterations (Fig. S2 and Table S4). This set contains two site types, one binding stability, one 3’-supplementary binding, three site accessibility, four conservation, two human genetic variations, and nine iFeature features [27]. Overall, this selection shows that all types of features are important in predicting the interactions between miRNAs and target mRNAs. Hence, the final PRIMITI-TS model was trained using these 22 features with the XGBoost algorithm.
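The greedy procedure can be sketched generically as below. In the study, the score of a feature subset is the cross-validated MCC of an XGBoost model; here a toy scoring function stands in for that step so the example stays self-contained. The stopping rule (`min_gain`), the function names, and the toy feature names are assumptions for illustration; the paper selected the final 22 features by inspecting the complexity-performance trade-off rather than by a fixed threshold.

```python
def greedy_forward_selection(features, score_fn, max_features=None, min_gain=0.0):
    """Forward stepwise greedy selection.

    Starts from an empty set and repeatedly adds the single feature that
    yields the highest score, stopping when no feature improves the score
    by more than min_gain (or when max_features is reached).
    """
    selected, history = [], []
    remaining = list(features)
    best_score = float("-inf")
    while remaining and (max_features is None or len(selected) < max_features):
        # Score every candidate extension of the current feature set.
        cand_score, cand = max(
            ((score_fn(selected + [f]), f) for f in remaining),
            key=lambda pair: pair[0],
        )
        if cand_score - best_score <= min_gain:
            break
        selected.append(cand)
        remaining.remove(cand)
        best_score = cand_score
        history.append((cand, cand_score))
    return selected, history

# Toy stand-in for "mean MCC over 10-fold cross-validation" (hypothetical values).
toy_gain = {"energy": 0.50, "site_8mer": 0.20, "conservation": 0.10, "noise": 0.0}

def toy_score(subset):
    # Small per-feature penalty so that uninformative features do not help.
    return sum(toy_gain[f] for f in subset) - 0.01 * len(subset)

selected, history = greedy_forward_selection(list(toy_gain), toy_score)
print(selected)
```

Each iteration evaluates every remaining feature once, so selecting k features out of n costs on the order of k·n model evaluations, which is why a fast learner is convenient for this step.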

2.6. PRIMITI-TM Model training

PRIMITI-TS was developed to predict a functional binding interaction between miRNAs and target sites. However, in certain real-world scenarios, researchers are more concerned with miRNA-mediated regulation of mRNA, which is a biological consequence of physical interactions between a miRNA and multiple target sites in one mRNA. Previous work used this information to improve the understanding of the association between miRNAs and prostate cancer and to develop miRNA therapeutics and biomarkers [46], [47], [48].

It is worth noting that predicting suppression is not equivalent to predicting binding sites, since a single target mRNA may host several binding sites and not all physical binding sites result in miRNA-mediated gene silencing. When evaluating repressive interactions, the likelihood of each putative site must be considered collectively. PRIMITI-TM is proposed to prioritise miRNA-target mRNA interactions. PRIMITI-TS first calculates the likelihood of all potential canonical binding sites for each miRNA-mRNA pair. Then, four features based on the calculated probabilities are used to train PRIMITI-TM's model: (i) the summation of the probabilities over all target locations, (ii) the probability of the highest probability site, (iii) the probability of the second highest probability site, and (iv) the probability of the third highest probability site (Fig. S3). These features are then used to characterise experimentally verified miRNA-mRNA suppression. 5-, 10- and 20-fold cross-validation procedures and an independent blind test set were used to evaluate the performance of PRIMITI-TM's model.
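The four PRIMITI-TM input features can be derived from the per-site probabilities as in the sketch below. How pairs with fewer than three candidate sites are handled is not stated in the text; padding with zero is an assumption made here, and the function and key names are hypothetical.

```python
def tm_features(site_probs):
    """Summarise per-site probabilities for one miRNA-mRNA pair.

    site_probs: PRIMITI-TS probabilities, one per candidate canonical site.
    Missing ranks are padded with 0.0 (an assumption, not from the paper).
    """
    top = sorted(site_probs, reverse=True)[:3]
    top += [0.0] * (3 - len(top))
    return {
        "sum_prob": sum(site_probs),  # (i) sum over all candidate sites
        "top1": top[0],               # (ii) highest-probability site
        "top2": top[1],               # (iii) second highest
        "top3": top[2],               # (iv) third highest
    }

print(tm_features([0.2, 0.9, 0.5, 0.1]))
print(tm_features([0.7]))
```

The sum captures the cumulative effect of many weak sites, while the top-ranked probabilities preserve information about individual strong sites.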

Overall, PRIMITI advances the field of miRNA-target interaction prediction by considering new aspects for characterising RNAs (e.g., iFeature and human genetic variation), by developing a negative sample selection method for choosing reliable miRNA-target non-interactions, and by proposing a novel way to model miRNA-target site binding and suppression.

3. Results

3.1. Performance of PRIMITI-TS

We employed a number of cross-validation procedures to evaluate the model used in PRIMITI-TS in prioritising miRNA-target site interactions. On 5-fold cross-validation, PRIMITI-TS achieved AUC, bACC, F1 and MCC values of 0.956, 0.898, 0.897 and 0.796, respectively. In addition, PRIMITI-TS yielded comparable results on 10- and 20-fold cross-validation procedures, indicating its ability to accurately predict miRNA-target site interactions (Table 1 and Fig. 2A).

Table 1.

The performance evaluation of PRIMITI-TS on 5-fold, 10-fold, and 20-fold cross-validation, and on a blind test set.

| Evaluation | AUC | bACC | F1 | MCC | Precision | Recall | Specificity |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 5-fold cross-validation | 0.956 ± 0.003 | 0.898 ± 0.006 | 0.897 ± 0.006 | 0.796 ± 0.012 | 0.900 ± 0.006 | 0.895 ± 0.008 | 0.900 ± 0.006 |
| 10-fold cross-validation | 0.956 ± 0.005 | 0.899 ± 0.007 | 0.899 ± 0.006 | 0.798 ± 0.013 | 0.901 ± 0.013 | 0.896 ± 0.010 | 0.901 ± 0.015 |
| 20-fold cross-validation | 0.959 ± 0.009 | 0.903 ± 0.017 | 0.903 ± 0.018 | 0.807 ± 0.035 | 0.906 ± 0.018 | 0.901 ± 0.027 | 0.906 ± 0.020 |
| Blind test | 0.966 | 0.904 | 0.903 | 0.809 | 0.915 | 0.892 | 0.917 |

Fig. 2.


The Receiver Operating Characteristic (ROC) Curve Analysis of PRIMITI. (A) PRIMITI-TS successfully predicts miRNA-target site interactions in a blind test, 5-fold, 10-fold, and 20-fold cross-validation (AUC of 0.966, 0.956, 0.956, and 0.959, respectively). (B) PRIMITI-TM, a model trained with miRNA-mRNA repression activities, effectively prioritises potential repression activity, with AUC values of 0.959, 0.960, 0.959, and 0.958 in a blind test, 5-fold, 10-fold, and 20-fold cross-validation, respectively.

Additionally, we conducted an assessment on an independent blind test to estimate PRIMITI-TS' ability to generalise to unseen data. The model achieved AUC, bACC, F1 and MCC scores of 0.966, 0.904, 0.903 and 0.809, respectively. These results are consistent with the predictive results from the employed cross-validation procedures (Table 1 and Fig. 2A).

The predictive performance of PRIMITI-TS on previously unseen data was further evaluated using HITS-CLIP data [13] (Table S5 and Fig. S4) across several classification thresholds. At a cut-off of 0.900, PRIMITI-TS reached bACC, F1, MCC, Precision, and Specificity of 0.918, 0.148, 0.258, 0.080 and 0.846, respectively, while achieving a Recall of 0.990. The low Precision reflects the strong class imbalance of this test set (98 target site samples versus 7240 non-target site samples).
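The effect of class imbalance on these figures can be checked with a short calculation. The sketch below derives expected confusion-matrix counts from the class sizes (98 and 7240) and the reported Recall and Specificity, then recomputes the remaining metrics. The fractional counts are expectations implied by the rounded rates, not the actual counts from the study, and the function name is hypothetical.

```python
import math

def metrics_from_rates(n_pos, n_neg, recall, specificity):
    """Derive precision, F1, bACC and MCC from class sizes and two rates."""
    tp = recall * n_pos          # expected true positives
    fn = n_pos - tp
    tn = specificity * n_neg     # expected true negatives
    fp = n_neg - tn
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    bacc = (recall + specificity) / 2
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"Precision": precision, "F1": f1, "bACC": bacc, "MCC": mcc}

hits_clip = metrics_from_rates(n_pos=98, n_neg=7240, recall=0.990, specificity=0.846)
print({name: round(value, 3) for name, value in hits_clip.items()})
```

With these inputs the calculation agrees with the reported Precision, F1, bACC and MCC to within rounding: even a Specificity of 0.846 leaves roughly 1100 false positives against fewer than 100 true positives, which caps Precision at about 0.08.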

Supplementary analyses were conducted to assess the model’s generalisability, and the results are presented in Supplementary Results (Table S6-S7). One analysis examined the model’s capability to generalise across different experimental data sources (Table S1) by constructing models using two datasets and evaluating their performance on a third. The results indicate that all models generalise well across the datasets, with an AUC of approximately 0.9. Despite a minor decline in performance, the model can accurately predict outcomes in C. elegans cells [15] when trained on human cells [14], [16], achieving AUC, bACC, F1 and MCC scores of 0.895, 0.813, 0.803 and 0.628, respectively (Table S6).

Further analysis was performed to evaluate the model’s capability to handle previously unseen miRNAs and transcripts (Table S7). Three independent tests were conducted using 30 % of miRNAs and/or mRNAs that had been separated from the training set. Although the models’ performance is slightly affected by the lack of characterisation of new miRNAs and/or transcripts, they are still capable of effectively applying their knowledge to new data, achieving AUC, bACC, F1 and MCC values above 0.920, 0.840, 0.830 and 0.690, respectively.

The results provided in this section and in the Supplementary Materials corroborate the robustness and accuracy of PRIMITI-TS for properly predicting unseen interactions between miRNAs and mRNA target sites.

3.2. PRIMITI-TS's interpretation

Model interpretability is critical for demonstrating a model's viability in real-world use, as it provides insight into how the model works [49], [50]. Thus, we employed SHapley Additive exPlanation (SHAP) [50] to analyse the relationships between the features and the model’s output, as illustrated in Fig. 3. The features in this figure are listed in descending order according to their degree of impact on model prediction. We observed that the most crucial feature is the overall interaction energy estimated from IntaRNA [51]. miRNA-target site combinations with low interaction energy tend to be functional target sites, which is biologically consistent with the fact that a miRNA-mRNA duplex structure with lower energy is more energetically stable. Additionally, these interpretation results demonstrate that site type is a significant predictor of a functional target site. Being an 8-mer site increases the likelihood of being a functional target site, while being a 6-mer site increases the likelihood of being a non-functional target site. This can be explained by the fact that an 8-mer site contains seven seed complementary pairings and an adenine at the first position, resulting in the highest binding affinity, whereas a 6-mer site has the lowest affinity. Because binding affinity correlates with repression efficiency [37], the site type plays a crucial role in determining functional repression. This is consistent with a prior analysis that shows a statistically significant difference in the distribution of site types between positive and negative samples (Table S8).

Fig. 3.


The most essential feature is the overall interaction energy, followed by 6-mer seed type and distance from the 3’-UTR terminal. The SHAP value was computed for each feature in PRIMITI-TS. The features are sorted according to their impact on the model’s prediction. Each dot denotes an interaction between a miRNA and its target site, while colours denote the values of features, with red denoting high values and blue denoting low values.

We also found that target sites located near the ends of the 3’-UTR are more likely to be functional, which is consistent with a previous study demonstrating that putative target sites are more evolutionarily conserved towards the start and end of the 3’-UTR owing to a greater degree of site accessibility [38]. Furthermore, SHAP results indicated that single-nucleotide polymorphism (SNP) characteristics provide only a small contribution to model prediction, corroborating the prior analysis’s finding (Fig. 3). We hypothesise that the scarcity of reported disease-related SNPs restricted the contribution of SNP features to the performance [52].

3.3. Performance of PRIMITI-TM

Similar to the evaluation process of PRIMITI-TS, a blind test and 5-fold, 10-fold, and 20-fold cross-validation procedures were used to assess the performance of PRIMITI-TM in predicting miRNA-target mRNA repression activity. On 5-fold cross-validation, PRIMITI-TM obtained AUC, bACC, F1, and MCC values of 0.960, 0.867, 0.834, and 0.765, respectively (Table 2). Comparable performances were achieved using 10-fold and 20-fold cross-validation (Table 2). This demonstrates that PRIMITI-TM is capable of accurately predicting the repression activity of miRNA-target mRNA. We conducted a blind test to determine PRIMITI-TM's generalisation capabilities. Performance equivalent to that in cross-validation was achieved, with AUC, bACC, F1, and MCC values of 0.959, 0.868, 0.835 and 0.764, respectively. Receiver operating characteristic (ROC) curves of all validations are shown in Fig. 2. The comparable performances in a blind test, as well as in 5-fold, 10-fold, and 20-fold cross-validation, support the robustness of PRIMITI-TM in predicting miRNA-target mRNA interactions.

Table 2.

PRIMITI-TM performance evaluation on 5-fold, 10-fold, and 20-fold cross-validation, and a blind test.

| Evaluation | AUC | bACC | F1 | MCC | Precision | Recall | Specificity |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 5-fold cross-validation | 0.960 ± 0.004 | 0.867 ± 0.012 | 0.834 ± 0.016 | 0.765 ± 0.021 | 0.902 ± 0.015 | 0.777 ± 0.021 | 0.957 ± 0.007 |
| 10-fold cross-validation | 0.959 ± 0.006 | 0.864 ± 0.013 | 0.830 ± 0.018 | 0.758 ± 0.026 | 0.897 ± 0.026 | 0.773 ± 0.025 | 0.955 ± 0.014 |
| 20-fold cross-validation | 0.958 ± 0.012 | 0.864 ± 0.020 | 0.831 ± 0.026 | 0.760 ± 0.037 | 0.901 ± 0.032 | 0.771 ± 0.033 | 0.957 ± 0.015 |
| Blind test | 0.959 | 0.868 | 0.835 | 0.764 | 0.896 | 0.782 | 0.954 |

To further evaluate the generalisation of PRIMITI-TM, experimental miRNA-mRNA repression activities from Linsley et al. [11] were employed. This dataset differs notably from the dataset used to train PRIMITI-TM in terms of experimental approach and cell line. Numerous computational models have been presented to date for predicting miRNA-mRNA interactions. In this study, we compared the performance of PRIMITI-TM with four state-of-the-art predictors: miRTarget [12], [25], TargetScan [24], DIANA-microT-CDS v5 [23], and RNA22 [21]. While these methods were trained on a variety of datasets, none was trained on the Linsley et al. microarray [11], which allows a fair comparison. The prediction results were retrieved from the respective websites and compared. Because the dataset is highly imbalanced, we evaluated the predictions using MCC, F1, bACC, Precision, Recall, and Specificity. The results show that the selected predictors fall into two categories: models that prioritise precision (miRTarget, DIANA-microT-CDS v5, and RNA22) and models that prioritise recall (PRIMITI-TM and TargetScan). Among all methods, miRTarget delivers the best overall performance (MCC, F1, and bACC) (Table 3). However, this comes at the cost of a poor recall rate, which implies that a substantial proportion of functional miRNA-target mRNA interactions are likely to be predicted as non-interactions. In contrast, our model captures the majority of positive samples, providing the highest recall of any model. This supports the applicability of PRIMITI-TM for initial interaction screening. Additionally, PRIMITI-TM outperforms TargetScan, the other recall-focused model, on all metrics, demonstrating a higher predictive potential (Table 3). We recommend combining PRIMITI-TM with several precision-focused predictors to efficiently identify miRNA-target interactions with high confidence.

Table 3.

Validation of PRIMITI-TM based on Linsley microarray data.

Model | MCC | F1 | bACC | Precision | Recall | Specificity
PRIMITI | 0.022 | 0.108 | 0.523 | 0.059 | 0.681 | 0.364
TargetScan | 0.006 | 0.103 | 0.507 | 0.056 | 0.661 | 0.352
miRTarget | 0.111 | 0.164 | 0.602 | 0.102 | 0.418 | 0.786
DIANA-microT-CDS v5 | 0.052 | 0.121 | 0.540 | 0.084 | 0.220 | 0.860
RNA22 | -0.041 | 0.078 | 0.455 | 0.044 | 0.336 | 0.575
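
For readers less familiar with the metrics in Table 3, the following minimal sketch shows how they are derived from confusion-matrix counts. The function name `binary_metrics` and the counts in the usage example are hypothetical and are not taken from the Linsley et al. data; they were chosen only to illustrate how a recall-focused model on a highly imbalanced set can combine high recall with low precision and a small MCC.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute MCC, F1, bACC, precision, recall and specificity from counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    bacc = (recall + specificity) / 2                       # balanced accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "mcc": mcc,
        "f1": f1,
        "bacc": bacc,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
    }


# Hypothetical imbalanced set: 50 positives and 950 negatives (illustration only).
m = binary_metrics(tp=34, fp=600, tn=350, fn=16)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Because MCC and bACC use all four cells of the confusion matrix, they are less easily inflated by the majority class than plain accuracy, which is why they suit an imbalanced comparison of this kind.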

Additionally, our findings were corroborated using a dataset of 759,764 experimentally validated miRNA-target mRNA interactions obtained from miRTarbase [34] and Tarbase [35]. We selected the 20 well-studied miRNAs with the highest number of validated targets. For each miRNA, we randomly selected 100 positive samples and 100 negative samples, resulting in a balanced dataset of 4000 miRNA-target interactions. The process was repeated for 10 replicates. The performance of PRIMITI-TM was compared with that of miRTarget [12], [25], TargetScan [24], DIANA-microT-CDS v5 [23], and RNA22 [21] (Table 4). The results suggest that miRTarget and PRIMITI-TM achieve the greatest overall performance: miRTarget attains the highest MCC (0.158), PRIMITI-TM attains the highest F1 (0.510), and the two models have comparable bACC (0.552 and 0.553, respectively). However, when recall is taken into account, PRIMITI-TM attained a substantially higher score (0.466) than miRTarget (0.176) (Table 4).

Table 4.

A validation based on a dataset of experimentally validated miRNA-target mRNA interactions retrieved from miRTarbase and Tarbase databases.

Model | MCC | F1 | bACC | Precision | Recall | Specificity
PRIMITI | 0.108 ± 0.012 | 0.510 ± 0.010 | 0.553 ± 0.006 | 0.564 ± 0.007 | 0.466 ± 0.013 | 0.641 ± 0.008
TargetScan | 0.079 ± 0.015 | 0.471 ± 0.010 | 0.538 ± 0.008 | 0.551 ± 0.010 | 0.411 ± 0.012 | 0.666 ± 0.012
miRTarget | 0.158 ± 0.015 | 0.283 ± 0.009 | 0.552 ± 0.005 | 0.709 ± 0.021 | 0.176 ± 0.006 | 0.928 ± 0.006
DIANA-microT-CDS v5 | 0.049 ± 0.011 | 0.034 ± 0.004 | 0.505 ± 0.001 | 0.725 ± 0.058 | 0.017 ± 0.002 | 0.993 ± 0.002
RNA22 | 0.005 ± 0.008 | 0.103 ± 0.003 | 0.501 ± 0.002 | 0.511 ± 0.016 | 0.058 ± 0.002 | 0.945 ± 0.003
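
The replicate sampling behind Table 4 (20 miRNAs, 100 positive and 100 negative samples per miRNA, 10 replicates) can be sketched as follows. This is an illustrative outline rather than the authors' code: the function `sample_balanced`, the dictionary layout, the toy identifiers, and the choice of sampling without replacement with a fixed seed are all assumptions, and how negative samples were defined is not specified here.

```python
import random


def sample_balanced(positives, negatives, n_per_class=100, n_replicates=10, seed=0):
    """Draw balanced replicates of (miRNA, target, label) rows.

    positives / negatives: dict mapping a miRNA name to a list of target IDs.
    Each replicate holds n_per_class positives and negatives per miRNA.
    """
    rng = random.Random(seed)  # fixed seed so the replicates are reproducible
    replicates = []
    for _ in range(n_replicates):
        rows = []
        for mirna in sorted(positives):
            pos = rng.sample(positives[mirna], n_per_class)  # without replacement
            neg = rng.sample(negatives[mirna], n_per_class)
            rows.extend((mirna, target, 1) for target in pos)
            rows.extend((mirna, target, 0) for target in neg)
        replicates.append(rows)
    return replicates


# Toy data standing in for the curated interactions (hypothetical identifiers).
mirnas = [f"miRNA_{i}" for i in range(20)]
positives = {m: [f"pos_target_{j}" for j in range(300)] for m in mirnas}
negatives = {m: [f"neg_target_{j}" for j in range(300)] for m in mirnas}

reps = sample_balanced(positives, negatives)
print(len(reps), len(reps[0]))  # number of replicates, rows per replicate
```

Each replicate can then be scored independently, and the mean and standard deviation of each metric across replicates give entries in the "value ± spread" form used in Table 4.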

In summary, the strong predictive performance of PRIMITI stems from its reliable data (including the miRNA-target site non-interactions, i.e., the negative samples), the advances in characterising both miRNA-target site binding and repression, and the novel aspects of modelling target-site interactions with machine learning, including a new way to summarise target site information as features for predicting miRNA-target repression. Consequently, our findings indicate that PRIMITI is well suited to pre-screening miRNA-target interactions.

4. Online web interface

PRIMITI is publicly accessible as a web server with a user-friendly interface at https://biosig.lab.uq.edu.au/primiti/. PRIMITI’s Prediction and Results pages are shown in Fig. 4. On the Prediction page (Fig. 4A), users must supply a single miRNA or a list of miRNAs following the universal miRBase format. Users must also provide a single mRNA or a list of mRNAs identified by their transcript IDs, preferably in Ensembl Transcript (ENST) format. Details on alternative input formats are listed at https://biosig.lab.uq.edu.au/primiti/help. PRIMITI then processes the inputs and returns a result table (Fig. 4B). Each miRNA-target interaction is accompanied by an ‘interaction confidence’ score, indicating the likelihood of functional miRNA-mRNA repression as determined by PRIMITI-TM. Details of each miRNA-target binding site, including a ‘binding confidence’ score for miRNA-target site binding, are also provided (Fig. 4B).

Fig. 4.

PRIMITI web server interface. (A) The PRIMITI web server requires an input file containing a list of miRNAs in miRBase format and transcripts or gene identifiers. In the case of gene identifiers, the longest transcripts will be selected for a prediction. (B) The result of PRIMITI is provided in tables. Prediction scores for PRIMITI-TS and TM will be given to each binding site and transcript, in which a higher score indicates a higher probability of interaction.
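
Before submitting a job, it can help to check identifiers locally. The sketch below is a simplified pre-check under stated assumptions and is not the server's own validation: the regular expressions `MIRNA_RE` and `ENST_RE` and the helper `split_valid` are hypothetical, the miRNA pattern covers only common miRBase-style mature names such as `hsa-miR-16-5p` or `hsa-let-7a-5p`, and the transcript pattern assumes Ensembl transcript IDs of the form `ENST` followed by 11 digits and an optional version suffix. The server also accepts other input formats (see its help page), which this check would flag as invalid.

```python
import re

# Simplified, assumed patterns for illustration only.
MIRNA_RE = re.compile(r"^[a-z]{3,4}-(?:miR|let)-\d+[a-z]?(?:-\d+)?(?:-[35]p)?$")
ENST_RE = re.compile(r"^ENST\d{11}(?:\.\d+)?$")


def split_valid(ids, pattern):
    """Split identifiers into (valid, invalid) lists according to a regex."""
    valid = [i for i in ids if pattern.match(i)]
    invalid = [i for i in ids if not pattern.match(i)]
    return valid, invalid


mirnas = ["hsa-miR-16-5p", "hsa-let-7a-5p", "miR-16"]
transcripts = ["ENST00000269305", "ENST00000269305.9", "ENSG00000141510"]

ok_mirnas, bad_mirnas = split_valid(mirnas, MIRNA_RE)
ok_tx, bad_tx = split_valid(transcripts, ENST_RE)
print("miRNAs to submit:", ok_mirnas, "| to review:", bad_mirnas)
print("transcripts to submit:", ok_tx, "| to review:", bad_tx)
```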

In addition, all the experimental data used to train, (cross-)validate and test PRIMITI’s model can be downloaded via its web server at https://biosig.lab.uq.edu.au/primiti/data. Furthermore, all code and data related to PRIMITI can be downloaded or accessed through GitHub at https://github.com/winkoramon/PRIMITI-miRNA-target-mRNA-interactions.

5. Discussion

MicroRNAs play a crucial role in controlling gene expression networks, which are responsible for a wide range of cellular processes [1], [2], [3]. Distinct miRNA profiles are present in every cell type, precisely influencing cellular phenotype. The dysregulation of miRNA functions is associated with various human diseases [6], [7], [8]. Exploring the functions of miRNA provides valuable knowledge about disease mechanisms, which helps in the discovery of biomarkers and the development of innovative therapies [9], [10]. The introduction of high-throughput sequencing techniques has improved our understanding of the activities of miRNAs, with a particular focus on their effects on target mRNAs [12], [13], [14], [15], [16]. With this progress, computational approaches have arisen as a solution to the limitations of traditional methods, enabling efficient analysis of high-throughput transcriptomic data. These approaches allow for large-scale investigation of miRNA-mediated interactions [12], [19], [20], [21], [22], [23], [24], [25].

In this work, we proposed and developed PRIMITI, a novel machine learning model for predicting miRNA-target mRNA repression. PRIMITI identifies patterns of miRNA-mediated repression of mRNAs through their 3’-UTRs from CLIP-seq and gene expression (RNA-seq or microarray) data [11], [12], [13], [14], [15], [16]. In PRIMITI’s method, we also introduced negative sample selection to filter out low-quality negative samples and improve the reliability of the training dataset (Table S9). Moreover, genetic variation and sequence-derived information were implemented to improve the characterisation of miRNA-target site binding in PRIMITI-TS. Sequence-derived information is captured by iFeatures, a set of features generated by iLearn’s scheme for encoding nucleic acid sequences into numerical features [27], [28]. Human genetic variation, represented by human SNPs, is another potential source of useful information for characterising miRNA repression [32], [52]. As a result, PRIMITI predicts miRNA-mediated repression with an AUC and MCC of up to 0.96 and 0.80 for functional miRNA-target site binding and up to 0.96 and 0.76 for miRNA-target mRNA repression activity. This predictive performance was achieved on 10-fold cross-validation over the training set and on an independent blind test. The model also performs well in predicting repression in an unseen microarray dataset and in a collection of validated miRNA-mRNA interactions, correctly recovering the largest number of validated miRNA-target repressions among the compared state-of-the-art methods, which demonstrates its utility for preliminary screening (Table 3, Table 4). PRIMITI is available both in a GitHub repository (https://github.com/winkoramon/PRIMITI-miRNA-target-mRNA-interactions) and on a web server (https://biosig.lab.uq.edu.au/primiti).

PRIMITI offers a number of benefits for the large-scale identification of miRNA-target interactions by providing valuable guidance for the experimental validation of predicted targets. By prioritising computationally selected candidates, researchers can optimise their experimental efforts, leading to more targeted and successful validations and improving the efficiency and cost-effectiveness of the pipeline. PRIMITI’s model also provides a detailed characterisation of miRNA-target site binding, including binding probability and binding structure (Fig. 4B).

6. Limitations

There are a few limitations associated with this work. Firstly, a lack of training data may limit the generalisation of our model. PRIMITI was trained on a relatively small dataset, considering that there are over 2000 unique human miRNAs [53], 100,000 unique transcripts, and many distinct cell types in the human body. Despite this limitation, our model still demonstrates reasonable predictive capabilities when trained on human data [14], [16] and tested on drastically different C. elegans data [15] (Table S6). However, we anticipate that incorporating additional data will considerably improve the model’s predictive performance and, as a consequence, its generalisation across numerous cell lines and species.

In addition, the sensitivity of the experiment may have an impact on the model accuracy, particularly in regard to negative samples. Obtaining reliable negative samples is crucial for the development of a robust machine learning model. However, due to limited sensitivity, it becomes challenging to acquire true negative samples, as some positives can be misidentified as negatives. To address this challenging aspect, we implemented a negative sample selection approach to increase the reliability of negative samples, which resulted in a significant performance improvement (Table S9). Nonetheless, an enhanced experimental method with increased sensitivity has the potential to further improve the overall performance of our proposed model.

6.1. Feature generation

In this work, PRIMITI-TS implements two new sets of features: iFeature and single nucleotide polymorphisms (SNPs). iFeature, a nucleic acid sequence encoding scheme provided by iLearn, has demonstrated significant enhancements in various machine learning models across diverse research domains [29], [30]. Our analysis of the model’s performance indicated a substantial improvement when integrating iFeature into PRIMITI-TS (Table S10). The 5-fold cross-validation results showed an increase in the Matthews correlation coefficient (MCC) from 0.71 to 0.80. This finding also suggests that employing advanced encoding schemes can effectively aid in the characterisation of target sites. In subsequent studies, it would be beneficial to investigate the potential advantages of other encoding schemes, such as RNABERT [54], to further improve the performance of the model.
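
To make the idea of encoding a nucleic acid sequence as numerical features concrete, the sketch below computes normalised k-mer frequencies, one of the simplest sequence descriptors. It is an illustration only: it does not call iLearn or iFeature, the descriptors actually used by PRIMITI-TS are not specified in this section, and the function name `kmer_frequencies` and the example sequence are hypothetical.

```python
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGU"):
    """Encode an RNA sequence as a fixed-length vector of k-mer frequencies.

    The vector has len(alphabet) ** k entries, ordered lexicographically by
    k-mer, and sums to 1 when at least one valid k-mer is present.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


# Arbitrary example RNA sequence, used only to show the output shape.
vec = kmer_frequencies("UAGCAGCACGUAAAUAUUGGCG", k=2)
print(len(vec), round(sum(vec), 6))
```

A fixed-length vector of this kind can be concatenated with other features and passed to a tree-based learner; richer descriptors follow the same pattern of mapping a variable-length sequence to a fixed set of numbers.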

Furthermore, within the framework of PRIMITI-TS, we hypothesised that leveraging human genetic variations could aid in the characterisation of miRNA-binding sites. SNPs have been observed across verified miRNA-targeted mRNA sequences, with some of them contributing to dysregulated gene expression and the progression of disease [32]. Based on this supporting evidence, we integrated a set of SNPs and disease-related SNPs from miRNASNP v3 [52] into the PRIMITI-TS model. However, our analysis did not reveal any significant differences in performance between models trained with and without SNP features (Table S10). This unexpected outcome implies that additional research is needed to determine the significance of SNP features in miRNA-target mRNA interactions and to evaluate the extent of human SNP data coverage in miRNA targets. For example, in future research we intend to evaluate alternative representations for SNPs. We also plan to incorporate multiple data sources to increase the quantity of variation data.

7. Conclusion

miRNAs play an essential role in post-transcriptional gene regulation through complementary base pairing with target sites in mRNAs. Therefore, identifying target sites in mRNAs and target mRNAs is an essential task, leading not only to improved diagnostic tools and treatments but also to a better understanding of human epigenetic regulation as a whole. In this study, PRIMITI was developed to identify miRNA-target mRNA interactions through machine learning modelling, incorporating novelties in the types of features employed (i.e., iLearn-based and SNP-related features) to better characterise functional miRNA binding sites, as well as negative sample selection to provide a more reliable training set for a machine learning model. PRIMITI was validated both internally and externally, resulting in a robust predictive performance in terms of area under the ROC curve (AUC), Matthews correlation coefficient (MCC), recall (or sensitivity), and specificity. PRIMITI’s predictive results indicate good generalisation, providing the ability to correctly identify miRNA-target mRNA repression. Unlike alternative methods, PRIMITI is made available as a robust and user-friendly predictive web server at https://biosig.lab.uq.edu.au/primiti/. In future work, we plan to improve PRIMITI by ensembling it with alternative methods and models. We expect this to extend PRIMITI’s current predictive performance, especially in terms of precision.

Code and data availability

The data used to train and test PRIMITI-TS and PRIMITI-TM and other supplementary data is available at https://biosig.lab.uq.edu.au/primiti/data. The source code for PRIMITI is available at https://github.com/winkoramon/PRIMITI-miRNA-target-mRNA-interactions.

Funding

This research was funded by the Investigator Grant from the National Health and Medical Research Council (NHMRC) of Australia [GNT1174405] and the Victorian Government's Operational Infrastructure Support Program (in part).

CRediT authorship contribution statement

David Benjamin Ascher: Writing – review & editing, Supervision, Methodology, Formal analysis, Conceptualization. Korawich Uthayopas: Writing – original draft, Methodology, Investigation, Formal analysis, Data curation. Alex de Sá: Writing – review & editing, Methodology, Formal analysis. Azadeh Alavi: Writing – review & editing, Methodology. Douglas Pires: Writing – review & editing, Methodology.

Declaration of Competing Interest

The authors declare no conflicts of interest.

Acknowledgements

We would like to thank Carlos Rodrigues for his assistance on web server deployment.

Footnotes

Appendix A

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.csbj.2024.06.030.

Contributor Information

Alex G.C. de Sá, Email: Alex.deSa@baker.edu.au.

David B. Ascher, Email: d.ascher@uq.edu.au.

Appendix A. Supplementary material

mmc1.docx (1MB, docx)

References

  • 1. Hombach S., Kretz M. Non-coding RNAs: classification, biology and functioning. Adv Exp Med Biol. 2016;937:3–17. doi: 10.1007/978-3-319-42059-2_1.
  • 2. Gebert L.F.R., MacRae I.J. Regulation of microRNA function in animals. Nat Rev Mol Cell Biol. 2019;20(1):21–37. doi: 10.1038/s41580-018-0045-7.
  • 3. O'Brien J., et al. Overview of MicroRNA biogenesis, mechanisms of actions, and circulation. Front Endocrinol (Lausanne) 2018;9:402. doi: 10.3389/fendo.2018.00402.
  • 4. Kim D., et al. General rules for functional microRNA targeting. Nat Genet. 2016;48(12):1517–1526. doi: 10.1038/ng.3694.
  • 5. Friedman R.C., et al. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 2009;19(1):92–105. doi: 10.1101/gr.082701.108.
  • 6. Peng Y., Croce C.M. The role of MicroRNAs in human cancer. Signal Transduct Target Ther. 2016;1:15004. doi: 10.1038/sigtrans.2015.4.
  • 7. Romaine S.P., et al. MicroRNAs in cardiovascular disease: an introduction for clinicians. Heart. 2015;101(12):921–928. doi: 10.1136/heartjnl-2013-305402.
  • 8. Juzwik C.A., et al. microRNA dysregulation in neurodegenerative diseases: a systematic review. Prog Neurobiol. 2019;182 doi: 10.1016/j.pneurobio.2019.101664.
  • 9. Hanna J., Hossain G.S., Kocerha J. The potential for microRNA therapeutics and clinical research. Front Genet. 2019;10:478. doi: 10.3389/fgene.2019.00478.
  • 10. Gebert L.F., et al. Miravirsen (SPC3649) can inhibit the biogenesis of miR-122. Nucleic Acids Res. 2014;42(1):609–621. doi: 10.1093/nar/gkt852.
  • 11. Linsley P.S., et al. Transcripts targeted by the microRNA-16 family cooperatively regulate cell cycle progression. Mol Cell Biol. 2007;27(6):2240–2252. doi: 10.1128/MCB.02005-06.
  • 12. Liu W., Wang X. Prediction of functional microRNA targets by integrative modeling of microRNA binding and target expression data. Genome Biol. 2019;20(1):18. doi: 10.1186/s13059-019-1629-z.
  • 13. Boudreau R.L., et al. Transcriptome-wide discovery of microRNA binding sites in human brain. Neuron. 2014;81(2):294–305. doi: 10.1016/j.neuron.2013.10.062.
  • 14. Helwak A., et al. Mapping the human miRNA interactome by CLASH reveals frequent noncanonical binding. Cell. 2013;153(3):654–665. doi: 10.1016/j.cell.2013.03.043.
  • 15. Grosswendt S., et al. Unambiguous identification of miRNA:target site interactions by different types of ligation reactions. Mol Cell. 2014;54(6):1042–1054. doi: 10.1016/j.molcel.2014.03.049.
  • 16. Kozar I., et al. Cross-linking ligation and sequencing of hybrids (qCLASH) reveals an unpredicted miRNA targetome in melanoma cells. Cancers (Basel) 2021;13(5) doi: 10.3390/cancers13051096.
  • 17. Quillet A., et al. Prediction methods for microRNA targets in bilaterian animals: toward a better understanding by biologists. Comput Struct Biotechnol J. 2021;19:5811–5825. doi: 10.1016/j.csbj.2021.10.025.
  • 18. Yue D., Liu H., Huang Y. Survey of computational algorithms for MicroRNA target prediction. Curr Genom. 2009;10(7):478–492. doi: 10.2174/138920209789208219.
  • 19. John B., et al. Human MicroRNA targets. PLoS Biol. 2004;2(11) doi: 10.1371/journal.pbio.0020363.
  • 20. Enright A.J., et al. MicroRNA targets in Drosophila. Genome Biol. 2003;5(1) doi: 10.1186/gb-2003-5-1-r1.
  • 21. Miranda K.C., et al. A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. Cell. 2006;126(6):1203–1217. doi: 10.1016/j.cell.2006.07.031.
  • 22. Kertesz M., et al. The role of site accessibility in microRNA target recognition. Nat Genet. 2007;39(10):1278–1284. doi: 10.1038/ng2135.
  • 23. Paraskevopoulou M.D., et al. DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows. Nucleic Acids Res. 2013;41(Web Server issue):W169–W173. doi: 10.1093/nar/gkt393.
  • 24. Agarwal V., et al. Predicting effective microRNA target sites in mammalian mRNAs. Elife. 2015;4 doi: 10.7554/eLife.05005.
  • 25. Chen Y., Wang X. miRDB: an online database for prediction of functional microRNA targets. Nucleic Acids Res. 2020;48(D1):D127–D131. doi: 10.1093/nar/gkz757.
  • 26. Hafner M., et al. CLIP and complementary methods. Nat Rev Methods Prim. 2021;1(1):20.
  • 27. Chen Z., et al. iLearn: an integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data. Brief Bioinform. 2020;21(3):1047–1057. doi: 10.1093/bib/bbz041.
  • 28. Chen Z., et al. iFeature: a Python package and web server for features extraction and selection from protein and peptide sequences. Bioinformatics. 2018;34(14):2499–2502. doi: 10.1093/bioinformatics/bty140.
  • 29. Musleh S., et al. MSLP: mRNA subcellular localization predictor based on machine learning techniques. BMC Bioinforma. 2023;24(1):109. doi: 10.1186/s12859-023-05232-0.
  • 30. Asim M.N., et al. MP-VHPPI: Meta predictor for viral host protein-protein interaction prediction in multiple hosts and viruses. Front Med (Lausanne) 2022;9:1025887. doi: 10.3389/fmed.2022.1025887.
  • 31. Hirschhorn J.N., Daly M.J. Genome-wide association studies for common diseases and complex traits. Nat Rev Genet. 2005;6(2):95–108. doi: 10.1038/nrg1521.
  • 32. Moszynska A., et al. SNPs in microRNA target sites and their potential role in human disease. Open Biol. 2017;7(4) doi: 10.1098/rsob.170019.
  • 33. John G.H., Kohavi R., Pfleger K. Irrelevant Features and the Subset Selection Problem. In: Cohen W.W., Hirsh H., editors. Machine Learning Proceedings 1994. Morgan Kaufmann; San Francisco (CA): 1994. pp. 121–129.
  • 34. Huang H.Y., et al. miRTarBase 2020: updates to the experimentally validated microRNA-target interaction database. Nucleic Acids Res. 2020;48(D1):D148–D154. doi: 10.1093/nar/gkz896.
  • 35. Karagkouni D., et al. DIANA-TarBase v8: a decade-long collection of experimentally supported miRNA-gene interactions. Nucleic Acids Res. 2018;46(D1):D239–D245. doi: 10.1093/nar/gkx1141.
  • 36. Lemaître G., Nogueira F., Aridas C.K. Imbalanced-learn: a python toolbox to tackle the curse of imbalanced datasets in machine learning. J Mach Learn Res. 2017;18(1):559–563.
  • 37. McGeary S.E., et al. The biochemical basis of microRNA targeting efficacy. Science. 2019;366(6472) doi: 10.1126/science.aav1741.
  • 38. Grimson A., et al. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell. 2007;27(1):91–105. doi: 10.1016/j.molcel.2007.06.017.
  • 39. Sheu-Gruttadauria J., et al. Beyond the seed: structural basis for supplementary microRNA targeting by human Argonaute2. EMBO J. 2019;38(13) doi: 10.15252/embj.2018101153.
  • 40. Schirle N.T., Sheu-Gruttadauria J., MacRae I.J. Structural basis for microRNA targeting. Science. 2014;346(6209):608–613. doi: 10.1126/science.1258040.
  • 41. Wee L.M., et al. Argonaute divides its RNA guide into domains with distinct functions and RNA-binding properties. Cell. 2012;151(5):1055–1067. doi: 10.1016/j.cell.2012.10.036.
  • 42. Géron A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. 2nd Edition. O'Reilly Media, Incorporated; 2019.
  • 43. Chen T., Guestrin C. in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery; San Francisco, California, USA: 2016. XGBoost: A Scalable Tree Boosting System; pp. 785–794.
  • 44. Pires D.E.V., Ascher D.B. mycoCSM: using graph-based signatures to identify safe potent hits against mycobacteria. J Chem Inf Model. 2020;60(7):3450–3456. doi: 10.1021/acs.jcim.0c00362.
  • 45. Nguyen T.B., et al. mmCSM-NA: accurately predicting effects of single and multiple mutations on protein-nucleic acid binding affinity. NAR Genom Bioinform. 2021;3(4) doi: 10.1093/nargab/lqab109.
  • 46. Liu J., et al. MicroRNA-199b-3p suppresses malignant proliferation by targeting Phospholipase Cepsilon and correlated with poor prognosis in prostate cancer. Biochem Biophys Res Commun. 2021;576:73–79. doi: 10.1016/j.bbrc.2021.08.078.
  • 47. Mercatelli N., et al. The inhibition of the highly expressed miR-221 and miR-222 impairs the growth of prostate carcinoma xenografts in mice. PLoS One. 2008;3(12) doi: 10.1371/journal.pone.0004029.
  • 48. Ghorbanmehr N., et al. miR-21-5p, miR-141-3p, and miR-205-5p levels in urine-promising biomarkers for the identification of prostate and bladder cancer. Prostate. 2019;79(1):88–95. doi: 10.1002/pros.23714.
  • 49. Ribeiro M.T., Singh S., Guestrin C. in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery; San Francisco, California, USA: 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier; pp. 1135–1144.
  • 50. Lundberg S.M., Lee S.-I. in Proceedings of the 31st International Conference on Neural Information Processing Systems. Curran Associates Inc; Long Beach, California, USA: 2017. A unified approach to interpreting model predictions; pp. 4768–4777.
  • 51. Mann M., Wright P.R., Backofen R. IntaRNA 2.0: enhanced and customizable prediction of RNA-RNA interactions. Nucleic Acids Res. 2017;45(W1):W435–W439. doi: 10.1093/nar/gkx279.
  • 52. Liu C.J., et al. miRNASNP-v3: a comprehensive database for SNPs and disease-related variations in miRNAs and miRNA targets. Nucleic Acids Res. 2021;49(D1):D1276–D1281. doi: 10.1093/nar/gkaa783.
  • 53. Alles J., et al. An estimate of the total number of true human miRNAs. Nucleic Acids Res. 2019;47(7):3353–3364. doi: 10.1093/nar/gkz097.
  • 54. Akiyama M., Sakakibara Y. Informative RNA base embedding for RNA structural alignment and clustering by deep representation learning. NAR Genom Bioinform. 2022;4(1) doi: 10.1093/nargab/lqac012.

Articles from Computational and Structural Biotechnology Journal are provided here courtesy of Research Network of Computational and Structural Biotechnology
