Molecular Biomedicine. 2025 Apr 9;6:22. doi: 10.1186/s43556-025-00263-w

Applications of gene pair methods in clinical research: advancing precision medicine

Changchun Wu 1, Xueqin Xie 1, Xin Yang 1, Mengze Du 2, Hao Lin 1, Jian Huang 1
PMCID: PMC11982013  PMID: 40202606

Abstract

The rapid evolution of high-throughput sequencing technologies has revolutionized biomedical research, producing vast amounts of gene expression data that hold immense potential for biological discovery and clinical applications. Effectively mining these large-scale, high-dimensional data is crucial for facilitating disease detection, subtype differentiation, and understanding the molecular mechanisms underlying disease progression. However, the conventional paradigm of single-gene profiling, measuring absolute expression levels of individual genes, faces critical limitations in clinical implementation. These include vulnerability to batch effects and platform-dependent normalization requirements. In contrast, emerging approaches analyzing relative expression relationships between gene pairs demonstrate unique advantages. By focusing on binary comparisons of two genes’ expression magnitudes, these methods inherently normalize experimental variations while capturing biologically stable interaction patterns. In this review, we systematically evaluate gene pair-based analytical frameworks. We classify eleven computational approaches into two fundamental categories: expression value-based methods quantifying differential expression patterns, and rank-based methods exploiting transcriptional ordering relationships. To bridge methodological development with practical implementation, we establish a reproducible analytical pipeline incorporating feature selection, classifier construction, and model evaluation modules using real-world benchmark datasets from pulmonary tuberculosis studies. These findings position gene pair analysis as a transformative paradigm for mining high-dimensional omics data, with direct implications for precision biomarker discovery and mechanistic studies of disease progression.

Keywords: Gene pair, High-throughput sequencing, Ranking relationships, Clinical research, Batch effects

Introduction

Genes are the fundamental units of life, playing a crucial role in inheritance and functional expression within organisms. All life phenomena, such as birth, growth, illness, aging, and death, are related to genes [1–4]. High-throughput sequencing technology [5] and DNA microarray technology [6] are crucial tools in genomics research, enabling scientists to monitor gene expression on a genomic scale [7]. With the widespread application of these technologies, a large amount of high-throughput gene expression data from cancerous and non-cancerous tissue samples has been generated. These data contain a wealth of information, although it is present only implicitly in the raw expression values [8]. Effectively utilizing this information can have a significant impact on disease detection, subtype classification, prognosis, and disease progression [9].

In clinical research, differential expression analysis based on single-gene raw expression levels has been widely used [10–12]. Common differential expression analysis methods can be categorized into several types. First, early single-gene differential expression analysis methods are typically based on hypothesis testing, including the t-test, analysis of variance (ANOVA), and non-parametric tests [13–16]. These methods assess the significance of gene expression differences under different conditions using traditional statistical hypothesis testing, and are suitable for simpler experimental designs with small sample sizes. Second, the significance analysis of microarrays (SAM) is a statistical inference method that uses ranking-based hypothesis testing to evaluate differential expression between groups. Originally developed for microarray data, SAM is also widely used in RNA-seq analysis [17–19], particularly for controlling false positives in multiple hypothesis testing. Third, methods such as limma (based on linear models and empirical Bayes methods) [20–22], edgeR [23–25], and DESeq2 [26–28] (the latter two use negative binomial models) are widely used in RNA-seq analysis. These methods excel in handling complex experimental designs and multiple conditions, offering more comprehensive differential expression results. These representative single-gene differential expression analysis methods provide powerful tools for biological research and have uncovered many important biological insights across various fields. For example, Tabone et al. developed an optimized and simplified signature to differentiate between active tuberculosis and other lung diseases using gene expression profiles [29]. Sun et al. revealed the potential of CLEC3B in lung cancer diagnosis [30]. Lee et al. identified a novel 5-gene biomarker for non-invasive diagnosis of gastric cancer, which may serve as a potential diagnostic tool for early detection [31].
Furthermore, researchers have identified numerous molecular markers associated with subtype classification [32, 33], disease prognosis [34, 35], and disease progression [36, 37]. Although these studies extract molecular features related to specific questions from high-dimensional gene data, they mainly focus on individual genes. It is noteworthy that diseases often result from interactions among multiple genes and other molecules within biological pathways or gene regulatory networks [38, 39]. Typically, genes within a network act synergistically, and one gene can significantly influence the activity of others. As a result, insights from the expression patterns of individual genes are often limited [40].

As an alternative approach, assessing the relationships between a few genes, such as their interaction patterns, can provide more valuable information on disease-related biomolecular processes [41]. This concept was first put into practice in Bø’s study, which used gene pair features to distinguish between colorectal cancer and normal controls, showing a high level of diagnostic accuracy [42]. The emergence of the gene pair method provided new perspectives for subsequent research, leading to the gradual development of many new gene pair-based methods, which have been widely applied and have become reliable sources for biological discoveries [43–47]. To date, these methods can be divided into two main categories. One category is based on gene expression values. These methods combine the expression values of two genes to create new features that can be applied to various clinical issues. Compared with single-gene expression patterns, these approaches may provide more useful insights. Moreover, some of them are robust to batch effects from different experimental platforms (Fig. 1). The other category focuses on gene ranking relationships. These methods depend solely on the within-sample ordering of the expression values of two genes, which makes them unaffected by batch effects and removes the need for data normalization. As a result, they produce accurate, robust, and easily interpretable results with strong clinical practicality (Fig. 1).

Fig. 1.

Illustration of batch effects in gene expression profiles. Gene expression profiles from two batches (blue and brown) were used to distinguish disease (cross) from healthy control (circle) samples. The X-axis and Y-axis represent the expression levels of genes B and A, respectively. In Batch 1, a fixed threshold on the expression level of Gene A (e.g., Gene A > 6) can effectively classify samples as healthy or diseased. However, this approach fails in Batch 2, highlighting the influence of batch effects when relying on the absolute expression levels of specific genes. In contrast, methods based on relative changes between two genes, such as relative expression rankings or ratios, are robust to batch effects and yield consistent classification results across both batches. For instance, in both Batch 1 and Batch 2, a sample is classified as healthy if the expression value of Gene A is higher than that of Gene B; otherwise, it is classified as diseased.
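
The contrast drawn in Fig. 1 can be reproduced with a few invented numbers. The sketch below assumes a purely additive batch shift on log-scale expression values; the sample values, the size of the shift, and all function names are illustrative assumptions and are not taken from the figure or the reviewed studies.

```python
# Toy version of Fig. 1. All values are invented for illustration.
batch1 = [
    {"A": 8.0, "B": 5.0, "label": "healthy"},
    {"A": 7.0, "B": 4.5, "label": "healthy"},
    {"A": 4.0, "B": 5.5, "label": "disease"},
    {"A": 3.5, "B": 6.0, "label": "disease"},
]
BATCH_SHIFT = 4.0  # assumed additive batch effect (log scale)
batch2 = [
    {"A": s["A"] + BATCH_SHIFT, "B": s["B"] + BATCH_SHIFT, "label": s["label"]}
    for s in batch1
]


def classify_by_threshold(sample, cutoff=6.0):
    """Absolute rule: healthy if Gene A exceeds a fixed cutoff."""
    return "healthy" if sample["A"] > cutoff else "disease"


def classify_by_pair(sample):
    """Relative rule: healthy if Gene A is expressed above Gene B."""
    return "healthy" if sample["A"] > sample["B"] else "disease"


def accuracy(rule, samples):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(rule(s) == s["label"] for s in samples) / len(samples)


for name, batch in [("batch 1", batch1), ("batch 2", batch2)]:
    print(name, accuracy(classify_by_threshold, batch), accuracy(classify_by_pair, batch))
```

In this toy setting the fixed cutoff is perfect on Batch 1 but labels every Batch 2 sample as healthy, whereas the within-sample comparison is unchanged by the shift. Real batch effects are rarely a pure constant offset, so this invariance holds only for distortions that preserve the within-sample ordering of the two genes.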

As described above, gene pair methods were first introduced in 2002 [42]. Since then, many new approaches have been developed. In this review, we systematically assess various gene pair-based data mining algorithms to provide insights and guidance for researchers choosing appropriate approaches for biomolecular exploration and clinical research. Based on existing methods, we primarily examine eleven gene pair-based approaches (three based on gene expression values and eight based on gene ranking relationships) that were developed using mRNA transcriptome data and binary classification problems. To further promote their application, we constructed a reproducible analytical pipeline based on published methodologies. Using real-world peripheral blood transcriptome gene expression profiles from pulmonary tuberculosis (PTB) as benchmark datasets, we replicated the various methods and evaluated their performance in addressing the biological challenge of PTB diagnosis. Notably, these gene pair-based methods can also be applied to other data types and other omics data. This versatility allows them to better serve various clinical scenarios, such as diagnosis, prognosis, and disease progression, thereby advancing medical progress.

Gene pair methods based on gene expression values

There are three gene pair methods based on gene expression values: Pair t-score, Gene Expression Ratios (GERs), and Pair-wise Support Vector Machine ensembles (PSVM) (Fig. 2, Table 1). When detailing each method, unless otherwise specified, it is assumed that the columns of the gene expression profile used for training represent samples, while the rows represent genes.

Fig. 2.

Timeline of the development of gene pair methods. All gene pair methods were categorized based on their underlying principles: one group relies on gene expression values, while the other focuses on gene ranking relationships. Methods based on gene expression values (blue) include Pair t-score, GERs, and PSVM. Methods based on gene ranking relationships (pink) encompass TSP, k-TSP, TSPG, and subsequent advancements such as k-TSP + SVM, RankComp, REOs, REOs + ML, and the integration of TSP with machine learning proposed in this study.

Table 1.

Gene pair methods based on gene expression values

| Methods | Author(s) | Year | ML^a | Application | Batch effects^b | Availability |
|---|---|---|---|---|---|---|
| Pair t-score [42] | Bø et al. | 2002 | Yes | Classification task | No | J-Express software |
| GERs [48] | Gordon et al. | 2002 | No | Classification task | Yes | No |
| PSVM [49] | Zak et al. | 2016 | Yes | Classification task | No | No |

Abbreviation: ML, Machine Learning

a Whether machine learning methods are used

b Whether the method is unaffected by batch effects

Pair t-score

Pair t-score assigns a score for each gene pair based on their expression values, reflecting how effectively the pair distinguishes between two experimental classes. Proposed by Bø et al. in 2002, this method introduced the concept of gene pairs for the first time [42]. The specific description is as follows:

  1. Projection onto the diagonal linear discriminant (DLD) axis [50]: Pairwise combinations of genes from the gene expression profile are used to form gene pairs. For each gene pair, project the gene expression data onto a DLD axis defined by these two genes;

  2. Calculation of the t-score: For the projected data points, calculate the two-sample t-statistic [51] between the two experimental classes. This statistic is used as the gene pair score to measure the ability of the pair to distinguish between different experimental classes. Based on the scores, a feature subset is selected using one of two methods: an exhaustive method called “all pairs” or a faster method called “greedy pairs”;

  3. All pairs: First, select the gene pair with the highest score, then remove all pairs containing either of these two genes from the list. Next, choose the highest-scoring pair from the remaining list. Repeat this process until the desired subset is obtained;

  4. Greedy pairs: First, rank all genes based on their individual t-scores. Select the gene Gi with the highest t-score, then identify the gene Gj that, together with Gi, maximizes the pair t-score. Remove these two genes from the gene set and repeat the process on the remaining genes until the desired number of genes is selected;

  5. Based on the final selected gene pair features, various machine learning algorithms can be combined and applied across multiple clinical scenarios.
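
The scoring and the "all pairs" selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the J-Express implementation: the DLD weight is taken to be the class mean difference divided by the pooled within-class variance of each gene, the t-statistic is the Welch variant, and the gene names and expression values are invented.

```python
import math
from itertools import combinations
from statistics import mean, variance


def split_by_class(values, labels):
    """Return the values of class 0 and class 1 as two lists."""
    v0 = [v for v, y in zip(values, labels) if y == 0]
    v1 = [v for v, y in zip(values, labels) if y == 1]
    return v0, v1


def t_statistic(a, b):
    """Welch two-sample t-statistic (assumed; the exact variant is not stated)."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def dld_weight(values, labels):
    """Per-gene DLD weight: class mean difference over pooled within-class variance."""
    v0, v1 = split_by_class(values, labels)
    pooled = ((len(v0) - 1) * variance(v0) + (len(v1) - 1) * variance(v1)) / (len(values) - 2)
    return (mean(v1) - mean(v0)) / pooled


def pair_t_score(gi, gj, labels):
    """Project the two genes onto their DLD axis and return |t| of the projection."""
    wi, wj = dld_weight(gi, labels), dld_weight(gj, labels)
    projected = [wi * a + wj * b for a, b in zip(gi, gj)]
    p0, p1 = split_by_class(projected, labels)
    return abs(t_statistic(p1, p0))


def all_pairs(expr, labels, n_pairs):
    """'All pairs' selection: repeatedly take the best pair among unused genes."""
    scored = sorted(
        ((pair_t_score(expr[a], expr[b], labels), a, b) for a, b in combinations(expr, 2)),
        reverse=True,
    )
    chosen, used = [], set()
    for _, a, b in scored:
        if a not in used and b not in used:
            chosen.append((a, b))
            used.update((a, b))
        if len(chosen) == n_pairs:
            break
    return chosen


# Hypothetical toy data: keys are genes, list positions are samples (3 per class).
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "G1": [1, 5, 9, 2, 6, 10],
    "G2": [2.0, 6.5, 10.2, 1.1, 5.0, 8.6],
    "G3": [3, 4, 5, 4, 3, 5],
}
print(all_pairs(expr, labels, n_pairs=1))
```

In the toy data, G1 and G2 each vary widely across samples and separate the classes poorly on their own, but their weighted combination separates them much better, which is the behaviour the pair score is designed to reward. The "greedy pairs" variant would replace the exhaustive sort with a search seeded by the best single gene.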

The authors applied this method to two public datasets: one comprising samples from colon cancer and normal tissue [52], and the other containing samples from acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) [53]. They demonstrated that accurate diagnosis can be achieved using the expression levels of only 20–30 and 15–20 genes, respectively. By observing gene pairs, they found that some genes are not effective discriminators when used alone but perform well when paired with others, indicating that gene combinations reveal information that single-gene analysis cannot uncover. The authors integrated the Pair t-score into the J-Express software package [54]. This method has subsequently been applied in, and has provided new insights for, many studies [55–58].

Gene Expression Ratios

The GERs method aims to accurately distinguish experimental classes using gene expression ratios and rationally chosen thresholds. It was proposed by Gordon et al. in 2002 [48]. The authors tested the accuracy of this method in distinguishing between malignant pleural mesothelioma (MPM) and adenocarcinoma (ADCA) in 181 tissue samples (31 MPM and 150 ADCA). Using a training set of 32 samples (16 MPM and 16 ADCA), they identified gene pairs with highly significant, negatively correlated expression levels and formed a total of 15 ratios from the expression profiles. Each ratio achieved a diagnostic accuracy of at least 90% in predicting the remaining 149 samples. They further tested combinations of multiple ratios, achieving diagnostic accuracies of 95% for MPM versus ADCA using two ratios and 99% using three ratios. The detailed method is described below:

  1. Identify genes with significantly different average expression levels between the two experimental classes based on the gene expression profile;

  2. Select two or more genes with the most significant differential expression, ensuring that the selected genes are not all more highly expressed in the same class;

  3. Select one class, then divide the expression value of each of the N genes with relatively high expression in this class by that of each of the M remaining genes to obtain N × M expression ratios;

  4. For a given expression ratio, a sample is assigned to the selected class if the ratio is > 1 and to the other class if the ratio is < 1. Based on this rule, the sample category can be determined by a single expression ratio or by combining multiple expression ratios using a majority voting principle [59].
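
Steps 3 and 4 above can be sketched in a few lines. The gene names (`U1`, `D1` to `D3`), the class names, and the expression values are hypothetical, and the sketch assumes positive, non-log-transformed expression values so that a ratio threshold of 1 is meaningful.

```python
def build_ratios(high_genes, low_genes):
    """All N x M (numerator, denominator) gene combinations."""
    return [(high, low) for high in high_genes for low in low_genes]


def ger_predict(sample, ratios, selected_class, other_class):
    """Majority vote: each ratio greater than 1 votes for the selected class."""
    votes_for = sum(sample[num] / sample[den] > 1 for num, den in ratios)
    return selected_class if votes_for > len(ratios) / 2 else other_class


# Hypothetical genes: U1 is higher in class "X"; D1-D3 are higher in class "Y".
ratios = build_ratios(["U1"], ["D1", "D2", "D3"])
sample_a = {"U1": 100.0, "D1": 20.0, "D2": 50.0, "D3": 150.0}
sample_b = {"U1": 10.0, "D1": 20.0, "D2": 5.0, "D3": 40.0}
print(ger_predict(sample_a, ratios, "X", "Y"), ger_predict(sample_b, ratios, "X", "Y"))
```

Using an odd number of ratios avoids tied votes; in this sketch a ratio of exactly 1 counts against the selected class, which is an arbitrary choice.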

The GERs method is unaffected by data collection platforms and does not require correction for batch effects. The authors who proposed this method have shown its utility in predicting the efficacy of medulloblastoma treatment. Similarly, another study has shown that GERs can predict clinical outcomes in breast cancer (BC) patients treated with tamoxifen [60]. Furthermore, numerous studies have utilized this method, including in the diagnosis of head and neck squamous cell carcinoma [61] and gastric cancer [62], illustrating its applicability across various clinical scenarios.

Pair-wise Support Vector Machine ensembles

The method combines gene pairs with a linear support vector machine (SVM) algorithm, as proposed by Zak et al. in 2016 [49]. Using this method, they identified 16 risk gene features from 46 tuberculosis progressors and 107 matched controls. These features predicted tuberculosis progression with a sensitivity of 66.1% and a specificity of 80.6% in the 12 months preceding tuberculosis diagnosis. The detailed steps are as follows:

  1. Based on gene expression profile data, randomly select 80% of the samples from each experimental category to perform differential analysis and identify differentially expressed genes;

  2. Train SVM-based models separately on all pairwise combinations of the differentially expressed genes;

  3. Using the remaining samples, set an accuracy threshold. Gene pairs whose models meet this threshold are given a score of “1”, while those that do not are given a score of “0”. Repeat steps 1–3 fifty times, so that each gene pair accumulates a score between 0 and 50. Those with a score of 45 or higher are retained as the final feature set;

  4. Construct an SVM model for each gene pair in the final feature set. Use the predictions from all SVM models and apply the majority voting principle to distinguish between the two experimental categories.
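
The resampling logic of steps 1 to 3 can be sketched as below. To keep the example dependency-free, a nearest-centroid rule is used as a placeholder for the linear SVM, so this is not the classifier of the original method. The candidate gene list is assumed to be fixed in advance (the per-round differential analysis is omitted), and the accuracy cut-off `acc_cut` and the toy data are hypothetical.

```python
import random
from itertools import combinations


def fit_nearest_centroid(points, labels):
    """Placeholder linear classifier standing in for the linear SVM."""
    centroids = {}
    for cls in set(labels):
        members = [p for p, y in zip(points, labels) if y == cls]
        centroids[cls] = [sum(col) / len(members) for col in zip(*members)]

    def predict(point):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
        )

    return predict


def stable_pairs(expr, labels, genes, rounds=50, keep_at=45, acc_cut=0.8, seed=0):
    """Count, over repeated 80/20 splits, how often each pair passes the accuracy cut."""
    rng = random.Random(seed)
    scores = {pair: 0 for pair in combinations(genes, 2)}
    for _ in range(rounds):
        train = []
        for cls in sorted(set(labels)):  # stratified 80% draw per class
            idx = [i for i, y in enumerate(labels) if y == cls]
            train += rng.sample(idx, round(0.8 * len(idx)))
        held_out = [i for i in range(len(labels)) if i not in train]
        for gi, gj in scores:
            model = fit_nearest_centroid(
                [(expr[gi][i], expr[gj][i]) for i in train],
                [labels[i] for i in train],
            )
            hits = sum(model((expr[gi][i], expr[gj][i])) == labels[i] for i in held_out)
            scores[(gi, gj)] += int(hits / len(held_out) >= acc_cut)
    return [pair for pair, score in scores.items() if score >= keep_at]


# Hypothetical toy data: 5 samples per class; A and B separate the classes, N is noise.
labels = [0] * 5 + [1] * 5
expr = {
    "A": [1.0, 1.2, 0.9, 1.1, 1.0, 5.0, 5.2, 4.9, 5.1, 5.0],
    "B": [5.0, 5.2, 4.9, 5.1, 5.0, 1.0, 1.2, 0.9, 1.1, 1.0],
    "N": [3.0, 1.0, 4.0, 2.0, 5.0, 3.0, 1.0, 4.0, 2.0, 5.0],
}
selected = stable_pairs(expr, labels, ["A", "B", "N"])
print(selected)
```

The final ensemble (step 4) would train one model per retained pair and combine their predictions by majority vote; swapping `fit_nearest_centroid` for a linear SVM from a machine learning library would bring the sketch closer to the described method.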

Gene pair methods based on gene ranking relationships

There are eight gene pair methods based on gene ranking relationships, namely Top-Scoring Pair (TSP), k-Top Scoring Pairs (k-TSP), Top-Scoring Pair of Groups (TSPG), k-TSP + SVM, Rank Comparison (RankComp), Relative Expression Orderings (REOs), REOs + Machine Learning (REOs + ML), and TSP + Machine Learning (TSP + ML) (Fig. 2, Table 2). When introducing each method in detail, unless specifically stated otherwise, it is assumed that the columns of the gene expression profile used for training represent samples and the rows represent genes.

Table 2.

Gene pair methods based on gene ranking relationships

| Methods | Author(s) | Year | ML^a | Application | Batch effects^b | Availability |
|---|---|---|---|---|---|---|
| TSP [63] | Geman et al. | 2004 | No | Classification task | Yes | R package |
| k-TSP [64] | Tan et al. | 2005 | No | Classification task | Yes | R package |
| TSPG [65] | Xu et al. | 2007 | No | Classification task | Yes | No |
| k-TSP + SVM [66] | Shi et al. | 2011 | Yes | Classification task | Yes | No |
| RankComp [67] | Wang et al. | 2015 | No | Individualized DEA | Yes | R package |
| REOs [68] | Ao et al. | 2018 | No | Classification task | Yes | No |
| REOs + ML [69] | Zhang et al. | 2020 | Yes | Classification task | Yes | No |
| TSP + ML | This article | 2024 | Yes | Classification task | Yes | No |

a Whether machine learning methods are used

b Whether the method is unaffected by batch effects

Abbreviation: ML, Machine Learning; DEA, differential expression analysis

Top-Scoring Pair

This is a novel method for microarray data classification based on pairwise comparison of relative gene expression levels, proposed by Geman et al. in 2004 [63]. The authors demonstrated the effectiveness of this method in three classification problems: predicting lymph node status in BC patients, classifying leukemia subtypes, and distinguishing prostate cancer patients from normal controls [53, 70, 71]. This method specifically involves the following steps:

  1. Assume that the columns of the gene expression profile represent samples (X1, …, XN) and the rows represent genes (G1, …, GP), and denote the expression value of the i-th gene Gi in the n-th sample Xn as Xi,n. For binary classification, let the vector of class labels for the samples be (Y1, …, YN), where Yn belongs to C = {C1, C2};

  2. Within each sample, rank all genes based on their expression levels, replacing the expression values of each gene with its rank. This process results in a new matrix R, where the rank of the i-th gene Gi in the n-th sample Xn is denoted as Ri,n;

  3. Pair the genes to form gene pairs (Gi, Gj), where i, j belong to {1, …, P} and i ≠ j. For each gene pair (Gi, Gj), calculate the frequency Pij(Cm) of Ri < Rj in each of the two classes of samples:
    $$P_{ij}(C_m) = \mathrm{Prob}\left(R_i < R_j \mid Y = C_m\right), \quad m = 1, 2 \tag{1}$$
  4. Calculate the score Δij of each gene pair (Gi, Gj), and select the gene pair(s) with the highest score as the top-scoring pair(s):
    $$\Delta_{ij} = \left| P_{ij}(C_1) - P_{ij}(C_2) \right| \tag{2}$$
  5. If the top-scoring pair is (Gi, Gj) and assuming Pij(C1) > Pij(C2), the new sample Xnew is classified according to the following rule:
    $$Y_{new} = \begin{cases} C_1, & \text{if } R_{i,new} < R_{j,new}, \\ C_2, & \text{otherwise.} \end{cases} \tag{3}$$

    If Pij(C2) > Pij(C1), the classification rule is reversed. If there are multiple top-scoring pairs, the final class is determined based on the majority voting principle using the above rules.
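
The steps above can be sketched as follows. Because comparing two ranks within a sample gives the same answer as comparing the two expression values directly (taking rank 1 as the lowest expression, an assumption since the ranking direction is not stated), the sketch skips the explicit ranking step. Gene names, class names, and values are invented, and ties among several top-scoring pairs are not handled: only the first best pair is returned.

```python
from itertools import combinations


def freq_less(xi, xj, labels, cls):
    """Fraction of samples of class `cls` in which gene i is expressed below gene j."""
    idx = [n for n, y in enumerate(labels) if y == cls]
    return sum(xi[n] < xj[n] for n in idx) / len(idx)


def top_scoring_pair(expr, labels, c1, c2):
    """Return (score, first, second), oriented so that first < second votes for c1."""
    best = None
    for gi, gj in combinations(expr, 2):
        p1 = freq_less(expr[gi], expr[gj], labels, c1)
        p2 = freq_less(expr[gi], expr[gj], labels, c2)
        score = abs(p1 - p2)  # score of the pair, Eq. (2)
        if best is None or score > best[0]:
            best = (score, gi, gj) if p1 > p2 else (score, gj, gi)
    return best


def tsp_predict(sample, pair, c1, c2):
    """Decision rule of Eq. (3): compare the two genes inside the new sample."""
    first, second = pair
    return c1 if sample[first] < sample[second] else c2


# Hypothetical toy data: keys are genes, list positions are samples.
labels = ["disease"] * 3 + ["control"] * 3
expr = {
    "G1": [2, 3, 1, 8, 9, 7],
    "G2": [5, 6, 4, 5, 6, 4],
    "G3": [1.5, 5, 6, 5, 10, 6],
}
best = top_scoring_pair(expr, labels, "disease", "control")
print(best)
print(tsp_predict({"G1": 2.5, "G2": 5.5}, best[1:], "disease", "control"))
```

The resulting classifier is a single comparison between two genes, which is what makes the decision rule easy to interpret and independent of any normalization that preserves within-sample ordering.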

The motivation for proposing this method arises from the technical and practical limitations currently faced when using gene expression microarray data for class prediction, such as disease detection, tumor identification, and treatment response prediction. For instance, the number of observed samples is typically small, often only a few dozen, while the number of genes is very large, usually in the thousands, making accurate statistical inference challenging. Moreover, traditional machine learning methods are often difficult to interpret in simple or biologically meaningful terms [72, 73]. In contrast, the TSP method provides decision rules that (1) involve only a very small number of genes and their ranking relationships [74, 75], (2) are both accurate and interpretable [76, 77], and (3) achieve prediction rates for cancer data comparable to those in previous studies that used more genes and more complex procedures. The ‘tspair’ R package has been developed based on the TSP method [78]. This method has also been applied in various studies. Isella et al. utilized this approach to derive an algorithm for assigning colorectal cancer subtypes, termed CRIS-TSP, through which they identified five CRC subtypes with distinct molecular, functional, and phenotypic characteristics [79]. Hua et al. established a prognostic model for ovarian tumors by analyzing 1580 transcriptome profiles and combining the TSP method [80]. Zhao et al. used the method to predict which patients would experience systemic tumor progression following radical prostatectomy [81]. Weichselbaum et al. developed predictive biomarkers for chemotherapy and radiotherapy in BC using this method [82]. Raponi et al. applied this method to construct a predictor for the response of AML patients to the farnesyltransferase inhibitor tipifarnib [83]. 
Additionally, this method has provided new insights for many studies, such as the extraction of prognostic features for stage I non-small cell lung cancer (NSCLC) [84]; the development of the Top-Scoring Triplet (TST) classifier, based on the expression orderings among three genes [85]; TimiGP, a gene-pair-based method for analyzing the tumor immune microenvironment [86]; ranktreeEnsemble, which integrates TSP with boosting and random forest to enhance disease classification based on gene expression [87]; and the Relative Multi-test Classification Tree (RMCT) [88] and EMTTree + RX [89], which combine decision trees and the evolutionary multi-test tree (EMTTree) with relative expression (RX) to enhance multi-omics data classification.

k-Top Scoring Pairs

In certain situations, the top-scoring pair identified by TSP may change when the training data are perturbed by adding or removing some samples. To address this issue, Tan et al. proposed the k-TSP method in 2005 as an extension of TSP [64]. The k-TSP method can be seen as an ensemble learning approach that classifies based on k disjoint top-scoring pairs. In other words, it leverages the discriminative power of multiple "weaker" rules to make more reliable predictions. This method employs straightforward and accurate decision rules, focusing on the ranking relationships of a limited number of gene pairs. The authors compared this approach with other machine learning techniques for class prediction in 19 binary and multi-class gene expression datasets involving human cancers (including colorectal cancer, leukemia, lung cancer, prostate cancer, etc.) [90–92]. The results showed that the k-TSP method performed as well as SVM and outperformed other machine learning methods (Decision Trees [93], K-Nearest Neighbors [94], and Naive Bayes [95]). Here is a detailed description of the method:

  1. Following steps 1–4 of the TSP method described above, obtain the scores Δij for all gene pairs and sort them in descending order;

  2. Since multiple gene pairs may achieve the same highest score, calculate the secondary score Γij, known as the rank difference, to break ties. For these gene pairs, first compute the average rank difference γij within each class of samples:

    $$\gamma_{ij}(C_m) = \frac{\sum_{n \in C_m} \left( R_{i,n} - R_{j,n} \right)}{|C_m|}, \quad m = 1, 2 \tag{4}$$

    where |Cm| represents the number of samples in class Cm, and Ri,n and Rj,n are defined as described in the TSP method. Next, calculate Γij:

    $$\Gamma_{ij} = \left| \gamma_{ij}(C_1) - \gamma_{ij}(C_2) \right| \tag{5}$$

    From the formula, it can be seen that Γij represents the extent of reversal of the gene pair between the two classes of samples. Ultimately, for gene pairs where Δij is equal, Γij breaks the tie, resulting in a complete ranking of gene pairs in descending order;

  3. First, select the first gene pair in the ranking. Then, remove all pairs from the ranking that include either of these two genes. Next, select the first pair from the remaining ranking. Repeat this process until k disjoint highest-scoring pairs are selected. The parameter k is determined through cross-validation and must be odd so that majority voting cannot end in a tie;

  4. Finally, for each of the k pairs of genes, vote according to formula (3) and determine the class of the new sample based on the majority voting principle.
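
A compact sketch of the four steps is given below. It assumes that rank 1 denotes the lowest expression within a sample and that there are no tied expression values; k is passed in directly rather than chosen by cross-validation, and the toy expression values are invented.

```python
from itertools import combinations


def rank_within_samples(expr):
    """Replace each value by its within-sample rank (1 = lowest expression)."""
    genes = list(expr)
    n_samples = len(expr[genes[0]])
    ranks = {g: [0] * n_samples for g in genes}
    for s in range(n_samples):
        ordered = sorted(genes, key=lambda name: expr[name][s])
        for r, g in enumerate(ordered, start=1):
            ranks[g][s] = r
    return ranks


def ktsp_fit(expr, labels, c1, c2, k):
    """Select k disjoint pairs ranked by Delta, with Gamma as the tie-breaker."""
    R = rank_within_samples(expr)
    in_c1 = [n for n, y in enumerate(labels) if y == c1]
    in_c2 = [n for n, y in enumerate(labels) if y == c2]
    scored = []
    for gi, gj in combinations(R, 2):
        p1 = sum(R[gi][n] < R[gj][n] for n in in_c1) / len(in_c1)
        p2 = sum(R[gi][n] < R[gj][n] for n in in_c2) / len(in_c2)
        gamma1 = sum(R[gi][n] - R[gj][n] for n in in_c1) / len(in_c1)  # Eq. (4)
        gamma2 = sum(R[gi][n] - R[gj][n] for n in in_c2) / len(in_c2)
        # Orient the pair so that "first ranked below second" votes for c1.
        pair = (gi, gj) if p1 > p2 else (gj, gi)
        scored.append((abs(p1 - p2), abs(gamma1 - gamma2), pair))
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
    chosen, used = [], set()
    for _, _, (a, b) in scored:
        if a not in used and b not in used:
            chosen.append((a, b))
            used.update((a, b))
        if len(chosen) == k:
            break
    return chosen


def ktsp_predict(sample, pairs, c1, c2):
    """Each pair casts one vote via the TSP rule; the majority decides."""
    votes_c1 = sum(sample[a] < sample[b] for a, b in pairs)
    return c1 if votes_c1 > len(pairs) / 2 else c2


# Hypothetical toy data: keys are genes, list positions are samples (3 "D", 3 "H").
labels = ["D"] * 3 + ["H"] * 3
expr = {
    "G1": [1, 2, 1, 9, 8, 9],
    "G2": [9, 8, 9, 1, 2, 1],
    "G3": [3, 3, 4, 6, 7, 6],
    "G4": [6, 7, 6, 3, 3, 4],
    "G5": [4, 4.5, 5.5, 5.5, 5, 5.5],
    "G6": [5, 5.5, 5, 5, 4.5, 5],
}
pairs = ktsp_fit(expr, labels, "D", "H", k=3)
print(pairs)
```

In this toy dataset many pairs share the maximal score Δij = 1, so the secondary score Γij decides the order: the pair with the largest rank reversal is picked first, and the disjointness constraint then removes every other pair that reuses one of its genes.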

This method typically involves few genes and provides biologically meaningful decision rules, and has therefore been widely applied. For instance, it has been applied in studies investigating the anti-tumor activity of AL101 against adenoid cystic carcinoma with activated NOTCH signaling [96], constructing a classifier for predicting prostate cancer progression [97], developing personalized immune prognostic features for early-stage nonsquamous NSCLC [98], developing a model to predict the likelihood of hepatocellular carcinoma (HCC) recurrence after liver transplantation [99], and building highly accurate classifiers to differentiate gastrointestinal stromal tumors from leiomyosarcomas [100]. These application examples, which involve clinical issues such as diagnosis, prognosis assessment, and progression risk, demonstrate the robustness of the k-TSP method. Furthermore, other studies have developed R packages for this method. For example, the ‘ktspair’ R package (since removed from CRAN) was developed by Damond et al. in 2011 [101]. Additionally, Afsari et al. developed the ‘switchBox’ R package in 2015, which is also based on the k-TSP method and has been widely applied [102–105]. This package employs a new approach, namely “ANOVA”, to select the number of pairs (k). Compared to the original method, this approach is less computationally intensive and less prone to overfitting [106].

Top-Scoring Pair of Groups

To identify common gene expression features across various types of cancer, Xu et al. proposed the TSPG method in 2007. This method retains the basic properties of the TSP classifier but introduces a repeated random sampling strategy, incorporating more genes into the decision-making process [65]. The motivation for developing this method is twofold. First, cancer results from the combined action of multiple genes, so understanding the mechanisms of cancer development requires identifying common cancer features composed of multiple genes. Second, when the training data are perturbed by adding or removing samples, a gene may form the top-scoring pair with different partner genes. This indicates that genes consistently appearing in top-scoring pairs are likely closely related to cancer, while genes that are paired only occasionally might not be. To retain genes that play critical roles in the carcinogenesis of common cancers and eliminate those possibly unrelated to cancer, a repeated random sampling strategy is required [107]. In their study, the authors collected and integrated approximately 1500 microarray gene expression profiles from 26 published cancer datasets, involving 21 major human cancer types (including lung, breast, bladder, ovary, pancreas, etc.) [108–110]. They then applied TSPG to the training datasets and identified a common cancer feature composed of 46 genes. After further validation with training and independent test datasets, this feature was found to be potentially useful as a robust and objective cancer diagnostic test. The detailed procedure is as follows:

  1. Randomly select S% (in this study, S = 90) of the total samples from the gene expression profile used for training;

  2. Obtain the complete ranking of gene pairs from highest to lowest by following steps 1–2 of the k-TSP method;

  3. First, select the top gene pair from the ranking and remove all pairs from the ranking that include either of these two genes. Next, select the top pair from the remaining ranking. Repeat this process until k (in this study, k = 20) disjoint highest-scoring pairs have been selected. Then, for each of these k gene pairs (Gi, Gj), if Pij(C1) > Pij(C2) (with Pij defined as in the TSP method), assign Gi to Group 1 and Gj to Group 2; otherwise, assign Gj to Group 1 and Gi to Group 2. Genes assigned to Group 1 typically exhibit lower expression levels compared to those assigned to Group 2;

  4. Repeat the preceding steps 1–3 N times (in this study, N = 1000), after which the frequency of each gene in Group 1 and Group 2 is calculated separately. Set a frequency threshold F (in this study, F = 80%) and retain the genes whose frequency exceeds this threshold;

  5. For a new sample, rank all genes in Group 1 and Group 2 based on their expression levels. If the average ranking of genes in Group 1 is lower than that in Group 2, classify the sample as Class 1; otherwise, classify it as Class 2.
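
The frequency filtering and the final group-based decision can be sketched as follows. The sketch assumes that the per-round pair selection has already been performed (for example with a k-TSP routine) and receives its output as lists of oriented pairs; the gene names and round contents are hypothetical, rank 1 is taken as the lowest expression, and a gene that switches groups between rounds is not treated specially.

```python
from collections import Counter


def build_groups(rounds, freq_cut=0.8):
    """Keep genes assigned to the same group in more than `freq_cut` of the rounds.

    rounds: one list per resampling round of oriented pairs (group1_gene, group2_gene).
    """
    count1, count2 = Counter(), Counter()
    for pairs in rounds:
        count1.update(a for a, _ in pairs)
        count2.update(b for _, b in pairs)
    n = len(rounds)
    group1 = sorted(g for g, c in count1.items() if c / n > freq_cut)
    group2 = sorted(g for g, c in count2.items() if c / n > freq_cut)
    return group1, group2


def tspg_predict(sample, group1, group2):
    """Class 1 if Group 1 genes have the lower average rank (rank 1 = lowest expression)."""
    order = sorted(group1 + group2, key=lambda g: sample[g])
    rank = {g: r for r, g in enumerate(order, start=1)}
    mean1 = sum(rank[g] for g in group1) / len(group1)
    mean2 = sum(rank[g] for g in group2) / len(group2)
    return "Class 1" if mean1 < mean2 else "Class 2"


# Hypothetical output of 10 resampling rounds with k = 2 pairs each.
rounds = [[("A", "B"), ("C", "D")]] * 9 + [[("A", "D"), ("E", "B")]]
group1, group2 = build_groups(rounds)
print(group1, group2)
print(tspg_predict({"A": 1, "B": 6, "C": 2, "D": 5, "E": 9}, group1, group2))
```

Gene E appears in only one of the ten rounds and is therefore discarded, which illustrates how the frequency threshold removes genes that enter a top-scoring pair only under particular resamplings.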

This method introduces repeated random sampling, which allows for the extraction of more robust and reliable features.

k-Top Scoring Pairs + Support Vector Machine

The widely used k-TSP method is a simple yet powerful non-parametric classifier. However, its overall robustness may not extend to challenging datasets, possibly due to the relatively straightforward voting scheme used by the classifier. Therefore, Shi et al. (2011) proposed using multiple gene pairs obtained from k-TSP as features to construct an SVM-based model [66]. This approach was applied to four cancer prognosis datasets, where k-TSP + SVM outperformed the k-TSP classifier across all datasets, achieving performance comparable to or better than SVM used alone. Furthermore, this method has also been widely applied, including in the construction of a model for predicting melanoma subtypes [111].
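
One plausible way to turn selected gene pairs into SVM inputs is to encode each sample as a vector of binary indicators, one per pair. The encoding below is an assumption made for illustration (the text above only states that gene pairs are used as features), and the pairs and expression values are hypothetical.

```python
def pair_features(sample, pairs):
    """Encode a sample as one 0/1 indicator per gene pair (1 if first < second)."""
    return [int(sample[a] < sample[b]) for a, b in pairs]


# Hypothetical pairs, e.g. as returned by a k-TSP selection step.
pairs = [("G1", "G2"), ("G3", "G4"), ("G5", "G6")]
sample = {"G1": 2.0, "G2": 8.0, "G3": 3.0, "G4": 6.0, "G5": 5.5, "G6": 5.0}
features = pair_features(sample, pairs)
print(features)
```

The SVM itself is omitted here; any SVM implementation could be trained on these vectors, which lets the model weight the pairs unequally instead of giving each pair a single equal vote.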

Rank Comparison

The differential expression analysis that focuses on inter-group comparisons can only capture differentially expressed genes at the population level, potentially masking the heterogeneity of individual differences [67]. Therefore, to provide patient-specific information for personalized medicine, it is necessary to conduct differential expression analysis at the individual level. Consequently, Wang et al. (2015) proposed a method to detect differentially expressed genes in individual disease samples by utilizing disrupted rankings within individual disease samples [112]. The principle is to use previously accumulated data to pre-determine the stable gene pairs in specific types of normal tissues. Based on these stable gene pairs, any disrupted genes and pathways in disease samples can be easily detected. In both simulated data and real paired cancer-normal sample data, this method demonstrated excellent performance. The identification and application of lung cancer prognostic biomarkers further proved the usefulness of the RankComp method in clinical practice [113, 114]. The specific description of the method is as follows:

  1. Collect normal samples of a specific tissue type from various data sources. For each sample, convert the gene expression values to their ranks within the sample. Perform pairwise comparisons for all genes. For a gene pair (Gi, Gj), if the frequency of Gi > Gj or Gi < Gj in the normal samples is greater than 0.99, the pair is defined as a stable gene pair. Finally, take the intersection of genes from all identified stable gene pairs to obtain the gene list G;

  2. For each gene Gi in the gene list G, let G-pair denote the set of stable gene pairs in normal samples that include Gi. Let a and b denote the numbers of gene pairs in G-pair with the ranking patterns Gi > Gj and Gi < Gj, respectively, in normal samples, and let c and d denote the corresponding numbers of gene pairs in G-pair with the ordering patterns Gi > Gj and Gi < Gj in disease sample k. Use Fisher’s exact test under the null hypothesis that a/b = c/d to determine whether Gi is differentially expressed in disease sample k. If there is significant evidence to reject the null hypothesis, Gi is considered differentially expressed: if a/b < c/d, Gi is defined as upregulated; if a/b > c/d, Gi is defined as downregulated;

  3. For each disease sample, iterate through each gene in the gene list G following the procedure in step 2 to determine which genes are differentially expressed in that disease sample.
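The three steps above can be sketched in plain Python. This is a minimal illustration, not the published RankComp implementation: the function names, the dictionary-based sample format, and the significance level `alpha` are assumptions introduced here, ties in the disease sample are simply skipped, and no multiple-testing correction is applied. Comparing expression values within one sample is equivalent to comparing their within-sample ranks, so ranks are not computed explicitly.

```python
from itertools import combinations
from math import comb


def stable_pairs(normal_samples, threshold=0.99):
    """Step 1: map (gi, gj) -> True if gi > gj is the stable order in normal
    samples, False if gi < gj is; unstable pairs are left out."""
    genes = sorted(normal_samples[0])
    n = len(normal_samples)
    stable = {}
    for gi, gj in combinations(genes, 2):
        higher = sum(s[gi] > s[gj] for s in normal_samples)
        lower = sum(s[gi] < s[gj] for s in normal_samples)
        if higher / n > threshold:
            stable[(gi, gj)] = True
        elif lower / n > threshold:
            stable[(gi, gj)] = False
    return stable


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))


def call_gene(gene, stable, disease_sample, alpha=0.05):
    """Step 2: test one gene in one disease sample (alpha is an assumed cutoff)."""
    a = b = c = d = 0
    for (gi, gj), gi_higher in stable.items():
        if gene not in (gi, gj):
            continue
        partner = gj if gene == gi else gi
        normal_higher = gi_higher if gene == gi else not gi_higher
        if normal_higher:
            a += 1
        else:
            b += 1
        if disease_sample[gene] > disease_sample[partner]:
            c += 1
        elif disease_sample[gene] < disease_sample[partner]:
            d += 1
    p = fisher_exact_two_sided(a, b, c, d)
    if p >= alpha:
        return "unchanged", p
    # a/b < c/d is checked by cross-multiplication to avoid division by zero.
    return ("up" if a * d < b * c else "down"), p
```

Step 3 then amounts to calling `call_gene` for every gene in G and every disease sample.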

The analysis of individual-level differential genes has significant applications. It enables pathway analysis at the individual level. Additionally, it allows for patient stratification directly based on the specific dysregulation status of each patient. This method has also provided new insights and approaches for many other studies. For example, Song et al. used the RankComp algorithm to identify a signature of seven autophagy-related gene pairs for colorectal cancer diagnosis, which distinguished cancerous from non-cancerous tissues with high accuracy and was validated across multiple datasets [115]. Hu et al. used this method to identify differentially expressed genes in normal and osteosarcoma tissue samples [116]. Cai et al. developed RankCompV2 based on this method [117]. Moreover, Wang et al. proposed an approach to detect pathways with disrupted relative expression order at the individual level [118].

Relative Expression Orderings

This method, which determines features for clinical application based on the relative expression order of gene pairs within samples, was proposed by Ao et al. in 2018 [68]. In their study, the authors used this method to identify a signature of 19 gene pairs from training datasets of hepatocellular carcinoma (HCC) to distinguish HCC and most tumor-adjacent tissues from cirrhosis tissues of non-HCC patients. The signature was validated in two large sample sets from biopsy and surgical resection specimens. The results indicate that even if the biopsy sample is not taken from an accurate location, this signature can still effectively distinguish different samples, demonstrating the practicality of this method. The specific steps of this method are as follows:

  1. For each gene pair (Gi, Gj), if the REO pattern (Gi > Gj or Gi < Gj) remains consistent across more than 85% of the samples in the first class in the training data, and reverses in more than 85% of the samples in the second class, then this gene pair is defined as a reverse gene pair between these two types of samples. The calculation of the ranking difference for reverse gene pairs in each sample is as follows:
    $$R_{ij} = \left| R_i - R_j \right| \tag{6}$$

    where Ri and Rj represent the rankings of genes Gi and Gj in the sample respectively, Rij denotes the absolute rank difference between the two genes;

  2. The mean[Rij(C1)] and mean[Rij(C2)] represent the mean absolute rank differences of reversed gene pairs (Gi, Gj) in all samples of the first and second classes, respectively. Then, calculate the geometric mean of the mean[Rij(C1)] and mean[Rij(C2)] to assess the degree of reversal of this gene pair between the two classes of samples:
    $$\mathrm{avg}R_{ij} = \sqrt{\mathrm{mean}[R_{ij}(C_1)] \times \mathrm{mean}[R_{ij}(C_2)]} \tag{7}$$

    The larger this geometric mean, the greater the degree of reversal of this gene pair between the two types of samples;

  3. All gene pairs are sorted in descending order based on their degree of reversal. The gene pair with the maximum degree of reversal is selected as the seed, and then a forward selection procedure [119] is used to search for the optimal subset of an odd number of gene pairs;

  4. Based on the subset of reversed gene pairs, classification for a given sample is achieved using a majority voting rule. Specifically, if more than half of the feature gene pairs exhibit the REO pattern of the first class, the sample is classified into that class; otherwise, it is classified into the second class.
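As a minimal sketch of steps 1, 2, and 4, the following Python code screens reverse gene pairs, scores them with the geometric mean of Eq. 7, and classifies a sample by majority vote. The function names and the dictionary-based sample format are assumptions for illustration, tied expression values receive arbitrary adjacent ranks, and the forward selection of step 3 is omitted (a caller would pass an odd-sized subset of the scored pairs to `classify`).

```python
from itertools import combinations
from math import sqrt


def ranks(sample):
    """Map gene -> within-sample rank (1 = lowest expression)."""
    ordered = sorted(sample, key=sample.get)
    return {g: r for r, g in enumerate(ordered, start=1)}


def reverse_pairs(class1, class2, consistency=0.85):
    """Return (score, a, b) tuples, best first, for pairs with a > b in more
    than `consistency` of class-1 samples and a < b in more than
    `consistency` of class-2 samples."""
    r1 = [ranks(s) for s in class1]
    r2 = [ranks(s) for s in class2]
    scored = []
    for gi, gj in combinations(sorted(class1[0]), 2):
        for a, b in ((gi, gj), (gj, gi)):  # test both orientations
            f1 = sum(r[a] > r[b] for r in r1) / len(r1)
            f2 = sum(r[a] < r[b] for r in r2) / len(r2)
            if f1 > consistency and f2 > consistency:
                # Mean absolute rank difference per class (Eq. 6).
                m1 = sum(abs(r[a] - r[b]) for r in r1) / len(r1)
                m2 = sum(abs(r[a] - r[b]) for r in r2) / len(r2)
                # Geometric mean of the two class means (Eq. 7).
                scored.append((sqrt(m1 * m2), a, b))
    return sorted(scored, reverse=True)


def classify(sample, signature):
    """Majority vote: class 1 if more than half of the pairs show a > b."""
    votes = sum(sample[a] > sample[b] for a, b in signature)
    return 1 if votes > len(signature) / 2 else 2
```

Each signature pair is stored in the orientation observed in the first class, so a vote for class 1 is simply `a > b` in the new sample.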

This method has been adopted in numerous studies. For instance, it has been applied in the diagnosis of pancreatic cancer [120], colorectal cancer [121], gastric cancer [122], as well as in prognostic prediction for hepatocellular carcinoma [123, 124], and many others.

Relative Expression Orderings + Machine Learning

The method was proposed by Zhang et al. in 2020, combining Machine Learning (ML) methods with REOs [69]. In the study, the authors initially extracted gene pair features using REOs from the HCC gene expression profile. They then applied Maximum Relevance Minimum Redundancy (mRMR) [125] and Incremental Feature Selection (IFS) [126] to eliminate irrelevant features, resulting in the identification of 11 gene pairs. These pairs were integrated with SVM to establish a diagnostic model for HCC. The authors further validated its ability to distinguish HCC using several independent datasets. The results indicate that the model can differentiate HCC from non-cancerous liver tissues and that these features exhibit robustness across clinical and pathological variations. The specific methodology is outlined as follows:

  1. Following step 1 of the REOs method described above, obtain the reverse gene pairs as preliminary features (with a threshold of 95% in this study);

  2. Based on the gene expression profile and reversed gene pairs, a new matrix is generated with columns representing reversed gene pairs and rows representing samples. Each cell in the matrix is assigned a value to denote the relationship between the genes: 1 indicates Gi > Gj, 0 indicates Gi < Gj, and −1 indicates other situations, such as when Gi or Gj expression data is unavailable or undefined;

  3. Based on the new matrix, mRMR is applied to rank gene pairs according to their maximum relevance to disease type and minimum redundancy with other gene pairs. Subsequently, IFS is used to select the optimal gene pairs as final features;

  4. An SVM-based model is established based on the final selected features.
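The encoding in step 2 can be illustrated with a short sketch. The function name and the dictionary-based sample format are hypothetical, and treating tied values as an "other situation" (−1) is an assumption made here; the resulting rows-by-pairs matrix is what would then be passed to mRMR, IFS, and an SVM.

```python
def encode_pairs(samples, pairs):
    """Rows = samples, columns = gene pairs.

    1 if Gi > Gj, 0 if Gi < Gj, -1 otherwise (a missing value, or a tie,
    which is an assumed reading of "other situations")."""
    matrix = []
    for sample in samples:
        row = []
        for gi, gj in pairs:
            xi, xj = sample.get(gi), sample.get(gj)
            if xi is None or xj is None or xi == xj:
                row.append(-1)
            else:
                row.append(1 if xi > xj else 0)
        matrix.append(row)
    return matrix
```

For example, a sample lacking a measurement for one gene of a pair receives −1 in that column instead of being dropped.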

This method has also been widely applied. For example, Xie et al. used it for type 2 diabetes diagnosis and achieved promising predictive performance in both training and external independent test datasets [127]. Zhang et al. identified 10 reversal differential partial correlation (RDC) gene pairs through reversal gene pair and differential partial correlation analyses, and then constructed a machine learning model for pancreatic ductal adenocarcinoma (PDAC) recognition, achieving 96.1% accuracy in cross-validation and outperforming gene expression-based models [128].

Top-Scoring Pair + Machine Learning

Based on the methods described above, we have found that features extracted from the order relationships of gene pairs can be effectively applied to various clinical scenarios. Therefore, drawing from this experience, we have also developed a method that combines TSP with ML methods. The specific method is as follows:

  1. Gene Pair Scoring: Obtain the gene pair scores Δij following the TSP method description (steps i–iv), and sort them in descending order;

  2. Preliminary Feature Extraction: The Δij scores range from 0 to 1. By setting a threshold (e.g., 0.5, depending on the situation), obtain preliminary features with scores greater than the threshold. If the same gene appears in different gene pairs, only the gene pair with the highest score is retained;

  3. Feature Encoding: Encode the training set based on the extracted features. Use the format ‘Gi | Gj’ as column names, and sample names as row names. The values in the matrix will be 1 (Gi < Gj), 0 (Gi = Gj), or −1 (Gi > Gj);

  4. Feature Importance Ranking: Based on the feature-encoded matrix, apply various feature importance ranking methods, such as Maximal Information Coefficient (MIC) [129], Chi-squared Test (Chi2) [130], mRMR, and Random Forest (RF) [131], to rank the preliminary features;

  5. Feature Selection and Model Construction: Apply the IFS method for feature selection. This involves adding features step-by-step based on their importance rankings from each feature importance method (such as MIC, Chi2, mRMR, RF) into multiple algorithms including eXtreme Gradient Boosting (XGBoost), Logistic Regression (LR) [132], RF, and SVM. After adding each new feature, perform cross-validation [133] and grid search. Ultimately, select the best combination of features and algorithms based on the receiver operating characteristic (ROC) area under the curve (AUC). Finally, use the entire training set to construct the final model.
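Steps 2, 3, and 5 can be sketched as follows. This is an illustrative outline under stated assumptions, not the released code: the function names are hypothetical, the rule "only the gene pair with the highest score is retained" is read here as a greedy pass over pairs in descending score order, and `evaluate` stands in for whatever cross-validated AUC routine the caller supplies.

```python
def preliminary_features(scores, threshold=0.5):
    """Step 2: keep pairs scoring above `threshold`, visiting them from the
    highest score down and skipping any pair that reuses a selected gene."""
    used, kept = set(), []
    for (gi, gj), delta in sorted(scores.items(), key=lambda kv: -kv[1]):
        if delta > threshold and gi not in used and gj not in used:
            kept.append((gi, gj))
            used.update((gi, gj))
    return kept


def encode(sample, pairs):
    """Step 3: 1 if Gi < Gj, 0 if Gi = Gj, -1 if Gi > Gj, keyed by 'Gi | Gj'."""
    row = {}
    for gi, gj in pairs:
        if sample[gi] < sample[gj]:
            row[f"{gi} | {gj}"] = 1
        elif sample[gi] == sample[gj]:
            row[f"{gi} | {gj}"] = 0
        else:
            row[f"{gi} | {gj}"] = -1
    return row


def incremental_feature_selection(ranked_features, evaluate):
    """Step 5: score every prefix of the ranking and keep the best one.

    `evaluate` is a caller-supplied function returning a score for a feature
    subset, for example a cross-validated AUC."""
    best_score, best_subset = float("-inf"), []
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = evaluate(subset)
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score
```

In the full procedure, `incremental_feature_selection` would be run once per combination of ranking method and learning algorithm, and the combination with the highest score retained.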

After obtaining initial features through TSP, this method no longer relies on simple scoring for feature selection. Instead, it applies various feature importance ranking and IFS methods. This approach enables a more effective selection of features that are more relevant to the research objectives. Furthermore, comparing multiple algorithms ultimately yields the optimal model tailored for specific clinical issues.

Comparison of gene pair methods

Among the eleven gene pair methods, all except RankComp are applicable to classification tasks. However, the Pair t-score and PSVM methods are susceptible to batch effects. Therefore, we selected the remaining eight methods for comparison, comprising one method based on expression values and seven based on ranking relationships. These methods were implemented using Python (version 3.10.13) and R (version 4.2.3). Specifically, the k-TSP method was implemented with the switchBox package (version 1.34.0), while all other methods were implemented strictly according to their original descriptions. For machine learning components, we mainly utilized the scikit-learn package (version 1.3.2) [134] and the xgboost package (version 2.0.3). The complete code is publicly available (DOI: 10.5281/zenodo.14948408).

To quantitatively compare and evaluate these eight methods, as well as to assess their applicability to real-world clinical problems, we collected four PTB-related datasets (GSE19439, GSE19442, GSE19444, and GSE83456) from the Gene Expression Omnibus (GEO) [135] database as benchmark datasets (n = 253) [136, 137]. The benchmark dataset includes 99 samples from PTB patients (positive group) and 154 samples from latent tuberculosis infection or healthy controls (negative group). PTB was confirmed by sputum culture [138], while latent tuberculosis infection was diagnosed using the Tuberculin Skin Test (TST) and Interferon-Gamma Release Assay (IGRA) [139–141]. The raw, non-normalized data from these datasets were first processed using the limma package (version 3.54.0) for background correction and normalization [142]. Subsequently, probe-to-gene symbol conversion was performed, and the intersecting genes across the four datasets were extracted. Finally, the first three datasets were used as discovery data, split into training and testing sets in a 7:3 ratio, while the fourth dataset was used as an external independent validation set (Table 3).

Table 3.

Pulmonary tuberculosis benchmark datasets

                  GEO accession    Country        PTB   LTBI   HC
Discovery data    GSE19439 [136]   England        13    17     12
                  GSE19442 [136]   South Africa   20    31     –
                  GSE19444 [136]   England        21    21     12
Validation data   GSE83456 [137]   England        45    –      61

Abbreviation: PTB, Pulmonary Tuberculosis; LTBI, Latent Tuberculosis Infection; HC, Healthy Control

In the training set, we trained models using each of the eight methods and evaluated their performance on the training, testing, and external validation sets using accuracy as the metric (Fig. 3). In terms of overall performance, although PTB is a non-cancer disease, these methods achieved an accuracy of 0.849–0.953 on the external validation set, demonstrating their potential for clinical applications. In terms of individual method performance, TSP combined with ML exhibited the best performance across the training, testing, and external validation sets. In terms of the number of feature genes used, these methods relied on only 2–30 genes to achieve excellent predictive performance. Notably, five genes (ANKRD22, PSTPIP2, FCGR1BP, GBP6, and KLF12) were selected as feature genes by more than half of the methods. Among them, ANKRD22 has been confirmed as a PTB biomarker in multiple studies [143–145]. GBP6, a member of the guanylate-binding protein family, plays a critical role in innate immunity, and several studies have reported that it is associated with PTB and can be used as a biomarker [146–148]. Although the other three genes have not been explicitly reported as PTB biomarkers, they are closely associated with the disease. For instance, PSTPIP2 has been reported in studies related to PTB [29]. While FCGR1BP has not been directly identified as a PTB biomarker, its related genes, FCGR1A [149–152], FCGR1B [153, 154], and FCGR1C [155], have all been shown to be associated with PTB. KLF12 is a transcription factor belonging to the Kruppel-like factor family, whose members have been shown to play a key role in regulating the function of macrophages and T cells [156–158]; it may therefore be involved in the host immune response to Mycobacterium tuberculosis. In conclusion, gene pair methods not only identify known disease-associated genes but also uncover novel insights, providing valuable implications for addressing clinical challenges.

Fig. 3.

Performance comparison of gene pair methods. The eight selected methods were applied to the training, testing, and external validation sets of the PTB benchmark data, and their performance was compared by accuracy

Application of gene pair methods

Gene pair methods have been widely applied in research on both cancerous and non-cancerous diseases, encompassing various domains such as disease diagnosis, subtype classification, disease progression, and prognostic prediction (Fig. 4a). Here, we highlight several related studies that not only demonstrate the broad applicability of gene pair methods across diverse research scenarios but also underscore their excellent performance advantages, demonstrating their powerful potential in addressing complex biological challenges.

Fig. 4.

The diverse applications of gene pair methods across omics data and clinical issues. a In clinical contexts, gene pair methods demonstrate utility in addressing various issues, including identifying differentially expressed genes for basic research, diagnosing diseases, differentiating disease subtypes, predicting disease progression, and assessing therapeutic efficacy. Their robustness and flexibility enable seamless integration into clinical workflows for precision medicine and biomarker discovery. b Gene pair methods are versatile analytical tools that extend beyond transcriptomics to various omics layers, including metabolomics, genomics, epigenomics, proteomics, and small RNA. These methods utilize relationships between gene pairs to extract meaningful insights from high-dimensional data without requiring strict normalization, making them applicable to a wide range of omics datasets

Application in disease diagnosis

Pathologists face significant challenges in diagnosing pancreatic cancer (PC) using biopsy specimens, as these samples may be obtained from the wrong site or contain insufficient tissue [159]. Consequently, there is an urgent need to develop a platform-independent molecular classifier to accurately distinguish between benign pancreatic lesions and PC. To address this, Zhou et al. [120] utilized the REOs method to develop a robust qualitative mRNA signature capable of differentiating both PC tissues and cancer-adjacent normal tissues from non-PC pancreatitis and healthy pancreatic tissues. The training cohort included samples from five microarray datasets, comprising 74 normal pancreatic tissue samples, 72 pancreatitis tissue samples, and 269 PC tissue samples [160–163]. Using the training data, the authors constructed a feature set consisting of 12 gene pairs and 17 individual genes. The performance of this feature set was evaluated in an external validation cohort, which included 1,007 PC tissue samples and 257 non-tumor samples derived from both microarray and RNA-sequencing data [164–169]. Validation results demonstrated a geometric mean of sensitivity and specificity of 96.7%, with an AUC of 0.978. Moreover, in 20 specimens obtained via endoscopic biopsy, the diagnostic accuracy of this feature set reached 100%. In summary, the REOs-based signature provides a reliable molecular diagnostic tool for PC, enabling objective differentiation between benign and malignant pancreatic lesions with high accuracy.

Xu et al. [170] utilized the TSP method to integrate microarray datasets from three independent prostate cancer (PCa) studies [71, 171, 172], identifying a robust gene pair (HPN and STAT6) for the diagnosis of PCa. The overall accuracy of this feature gene pair in independent datasets generated from different microarray platforms was 93.8%, with a sensitivity of 91.7% and specificity of 97.7% [173, 174].

With increasing research into the role of microRNAs (miRNAs) in cancer and their potential as diagnostic and prognostic biomarkers [175–177], Michele et al. [61] aimed to identify dysregulated miRNAs in head and neck squamous cell carcinoma (HNSCC) and evaluate their predictive capacity for the disease. RNA was isolated from primary tumor tissues and non-diseased head and neck epithelial tissues, and microarray analysis was used to assess the expression of 662 miRNAs. Subsequently, miRNAs showing differential expression in both microarray and qRT-PCR analyses were further validated. The results identified 18 miRNAs with significantly altered expression between normal and tumor tissues. Notably, a biomarker based on the expression ratio of miR-221:miR-375 demonstrated excellent predictive performance, achieving a sensitivity of 0.92 and a specificity of 0.93. These findings suggest that this biomarker holds promise as a high-value diagnostic tool.

Gastrointestinal stromal tumor (GIST) has emerged as a clinically distinct type of sarcoma, characterized by the frequent overexpression and mutation of the c-Kit oncogene and a favorable response to imatinib mesylate therapy [178]. However, distinguishing GIST from leiomyosarcomas (LMS) remains a significant diagnostic challenge [179, 180]. To improve diagnostic accuracy, Nathan et al. [100] conducted a genome-wide gene expression analysis of 68 tumor samples. Using the TSP method, the authors identified a single gene-pair classifier (OBSCN and C9orf65) capable of differentiating GIST from LMS. The classifier was validated by RT-PCR on 20 samples from the microarray study and on an additional 19 independent samples, achieving 100% accuracy. This gene pair provides a rapid, PCR-based assay that reliably distinguishes GIST from LMS and has the potential to aid in diagnosis and in the selection of appropriate therapies.

Apart from the above examples, gene pair methods have also been used for the diagnosis of ischemic stroke [181].

Application in subtype classification

Claudi et al. [79] deployed human-specific expression profiling of colorectal cancer (CRC) patient-derived xenografts (PDXs) to assess cancer-cell intrinsic transcriptional features. Through this approach, they identified five CRC intrinsic subtypes (CRIS) endowed with distinctive molecular, functional, and phenotypic characteristics. To translate the CRIS taxonomy into a clinically applicable diagnostic tool, the authors selected 526 shared CRIS genes from independent datasets across three different platforms (totaling 624 samples). Using the TSP and k-TSP methods, they developed a subtype assignment algorithm named CRIS-TSP. The results showed that the CRIS-TSP algorithm, using 40 gene pairs, maintained the classification ability of the original 526-gene classifier. When evaluated on six gene expression datasets with clinical outcome annotations (totaling 1,487 samples), the algorithm’s classification results further confirmed that CRIS-B subtype patients had a poorer prognosis. This suggests that the CRIS-TSP algorithm not only demonstrates high reliability in subtype classification but also holds potential clinical application value, offering guidance for personalized treatment in CRC patients.

Jennifer et al. [111], through gene expression analysis of 53 human melanoma cell lines and patient tumors, revealed that melanoma follows a two-dimensional differentiation trajectory, which can be further subdivided into four progressive subtypes. To train the subtype prediction model and enable more convenient cross-platform and cross-batch applications, the authors ultimately employed the TSP combined with the SVM method. Specifically, the authors selected the top 250 genes with the highest variance from the training set to construct the model, and converted the gene expression matrix into a binary matrix of gene pairs. Pairs were then scored by hypergeometric test to calculate the p value of enrichment for each subtype compared to the remaining subtypes. Gene pairs were filtered by a minimum p value of 1e-05 in at least one subtype, resulting in 1,561 gene pairs. Based on these selected gene pairs, a binary matrix for each cell line was constructed, and with the identified subtype information, an SVM model was trained using the radial basis function (RBF) kernel [182] from the R package kernlab. The model achieved a classification accuracy of 94% in leave-one-out cross-validation. This melanoma differentiation subtype prediction model can effectively support subtype diagnosis and assist in guiding personalized treatment strategies.

Application in disease progression

To predict metastatic progression in PCa, Hubert et al. [97] collected and organized gene expression profiles from different datasets, including 1,239 primary tumor samples from PCa patients, with information on metastatic events. Each dataset was preprocessed, retaining only the common genes (12,761 genes) across all datasets. The data was then split into 75% training (n = 930) and 25% testing (n = 309) using stratified sampling [183]. Subsequently, a classifier was trained based on the training set using the k-TSP method, with features consisting of 13 up- and down-regulated gene pairs. In addition to its interpretable decision rules, this signature demonstrated robustness and stability in both the training and testing sets, with AUC values of 0.69 and 0.70, respectively. Moreover, the prognostic value of this feature was tested in an independent cohort of 439 primary tumor samples from PCa patients [184, 185], where the feature was found to be significantly associated with progression-free survival (PFS).
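Stratified sampling, as used for the 75%/25% split above, keeps the proportion of each outcome class the same in both partitions. The following is a generic sketch of the idea, not the procedure used in the cited study; the function name, the `seed` parameter, and the per-class rounding rule are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Return (train_idx, test_idx), splitting each class at `train_fraction`."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)
```

Because the cut is computed within each class, a rare outcome such as a metastatic event is represented in the test set in roughly the same proportion as in the training set.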

Application in disease prognosis

Rajeshkumar et al. [186] aimed to evaluate the efficacy of AZD0530 (an orally active small molecule Src inhibitor) in a human PC xenograft and identify biomarkers that could predict its activity. The authors first treated 16 patient-derived PC xenografts with AZD0530 for 28 days. They then performed gene expression profiling on 16 tumor samples using microarrays, identified differentially expressed genes, and used the k-TSP method to identify a gene pair (LRRC19 and IGFBP2) from the 16 training samples. This gene pair was used to build a classifier. In an independent test set comprising 8 xenografts, the classifier achieved a sensitivity of 100% and specificity of 83.3%. These results indicate that the gene pair has high predictive ability and could serve as a biomarker for PC sensitivity to AZD0530.

A study by Chen et al. [187] confirmed the effectiveness of the GMPS|RAMP3 gene pair as a prognostic biomarker in HCC. By utilizing relative gene expression levels, the GMPS|RAMP3 signature overcomes platform-specific biases and biological heterogeneity, ensuring robust prognostic predictions. Experimental validation further supported the role of GMPS in HCC, with GMPS knockdown suppressing proliferation, promoting apoptosis, and increasing gemcitabine sensitivity. These findings highlight the potential of gene pair-based strategies in prognostic assessment and personalized treatment approaches for HCC. Xu et al. [188] identified a neutrophil extracellular traps (NETs)-related gene pair signature that can predict HCC prognosis and distinguish patient immune statuses, offering promise for immunotherapy strategies. In addition, Huang et al. [189] found that the CSTF2/PDE2A gene pair can predict the prognosis of HCC and regulate the cell cycle, showing promise as a novel prognostic predictor.

Most BC patients express estrogen receptor (ER) [190] and primarily receive tamoxifen as adjuvant therapy. However, approximately 40% of ER-positive BC patients either do not respond to tamoxifen or eventually develop resistance, leading to disease progression [191, 192]. Currently, clinical pathological features, such as tumor stage and grade, as well as the expression of ERBB2 and EGFR, cannot accurately predict the clinical outcomes of tamoxifen treatment [193, 194]. Therefore, there is an urgent need to identify reliable predictive biomarkers. Ma et al. [60] analyzed the gene expression profiles of 60 ER-positive primary BC patients who received tamoxifen monotherapy as adjuvant treatment. The study found that the simple gene expression ratio of HOXB13 to IL17BR could accurately predict the efficacy of tamoxifen treatment, showing superior predictive ability compared to existing biomarkers. In the tissue section dataset, the AUC value of this gene expression ratio reached 0.81; in the laser capture microdissection dataset, the AUC value further increased to 0.84. Moreover, overexpression of HOXB13 was significantly associated with poor clinical prognosis and was closely related to the enhanced invasive ability of BC cells in vitro, suggesting that HOXB13 may play a key role in BC progression [195]. To further validate their findings, the authors conducted an analysis on an independent cohort comprising formalin-fixed paraffin-embedded (FFPE) samples from 20 patients with ER-positive early primary BC who were treated with tamoxifen monotherapy. The results showed that the HOXB13:IL17BR expression ratio was significantly associated with the patients’ clinical outcome (t test p = 0.024), with higher HOXB13 expression correlated with poor outcome. In conclusion, the HOXB13:IL17BR expression ratio has the potential to be a precise biomarker for predicting the efficacy of tamoxifen monotherapy, providing an important basis for personalized BC treatment.

Apart from the examples mentioned above, gene pair methods have also been used to develop prognostic models for various cancers, including cervical cancer [196], colon adenocarcinoma [197], gastric cancer [198], clear cell renal cell carcinoma [199], lung adenocarcinoma [200, 201], colorectal cancer [202, 203], and glioma [204, 205]. These methods have achieved significant results in predicting patient survival, risk stratification, and guiding treatment strategies.

Extension of gene pair methods

Extension to other data types

In this review, we primarily explore gene pair methods developed based on mRNA transcriptome data. However, these methods and their underlying principles are also applicable to other types of data (Fig. 4b). For example, Ren et al. used the GERs method to analyze microRNA expression data in osteosarcoma, developing biomarkers for predicting osteosarcoma [206]. Patnaik et al. utilized the TSP method to identify microRNA prognostic markers that predict the recurrence of NSCLC from microRNA expression data [207]. Lin et al. applied the principle of k-TSP to serum metabolomics data obtained from liquid chromatography-mass spectrometry (LC–MS) of hepatocellular carcinoma (HCC) and chronic liver disease (CLD), ultimately selecting 27 feature pairs that successfully distinguished HCC from CLD and revealed certain deep metabolic disorders associated with liver disease progression [208]. Additionally, Yan et al. employed the principle of RankComp to accurately detect differentially methylated CpG sites in individual cancer samples from DNA methylation data of lung adenocarcinoma samples [209].

In addition to gene pair methods developed based on mRNA transcriptome data, there are also many gene pair methods designed for other types of data, and the concept of feature pairs has similarly been applied to non-gene expression data. For instance, gene pair methods have been developed for microRNA expression data [210, 211], lncRNA expression data [212, 213], methylation data [214, 215], and protein data [216].

Specifically, in proteomics, gene pair methods can be used to analyze the relative relationships of protein expression levels. Proteomics data is based on the abundance of proteins, like gene expression data in transcriptomics. By selecting protein pairs (e.g., protein A and protein B) and calculating their relative expression relationship (e.g., whether protein A is higher than protein B), features can be constructed for classification or state differentiation without relying on absolute expression levels. This relative abundance-based analysis has significant advantages, such as eliminating common batch effects and quantitative biases introduced by different protein measurement methods. Similarly, in metabolomics, gene pair methods can be used to analyze the relative relationships of metabolite abundances. Metabolomics data consists of a large number of small molecules that directly or indirectly reflect the metabolic state of cells [217]. By constructing metabolite pairs and comparing their relative abundances, disease-related metabolic pathways or key biomarkers can be revealed. For example, in metabolic diseases like type 2 diabetes, changes in the relative abundance of certain metabolite pairs can help distinguish disease states. Additionally, this method is robust to batch effects commonly found in metabolomics experiments, further enhancing its applicability in multi-center studies. In epigenomics, the concept of gene pair methods can also be extended to epigenetic data, such as DNA methylation or histone modifications. For example, by comparing the methylation levels of two CpG sites or the modification intensity of two histone modification sites, relative relationships can be established for constructing features in classification models. This analysis helps identify key patterns in epigenetic regulation and reveals epigenetic markers for specific diseases.
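The robustness argument rests on a simple property: any strictly increasing transformation applied to a whole sample leaves every within-sample ordering unchanged. The sketch below illustrates this with hypothetical protein names and a hypothetical sample-wide distortion (log transform, rescaling, and offset). It does not cover feature-specific batch effects, which can alter individual orderings.

```python
import math


def pair_features(sample, pairs):
    """Binary features: 1 if the first member of the pair is more abundant."""
    return [int(sample[a] > sample[b]) for a, b in pairs]


def batch_shift(sample, scale, offset):
    """Hypothetical sample-wide batch effect: a strictly increasing
    transformation (log2, rescale, shift) applied to every feature."""
    return {k: scale * math.log2(v + 1) + offset for k, v in sample.items()}


# Hypothetical protein abundances measured on one platform.
sample = {"P1": 120.0, "P2": 35.0, "P3": 980.0}
pairs = [("P1", "P2"), ("P1", "P3"), ("P3", "P2")]

original = pair_features(sample, pairs)
shifted = pair_features(batch_shift(sample, scale=2.5, offset=-4.0), pairs)
```

Here `original` and `shifted` are identical even though every absolute value has changed, which is why pair-based features can be pooled across platforms without re-normalization in this idealized setting.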

Furthermore, with the rapid advancement of single-cell RNA sequencing (scRNA-seq) technologies [218], many gene pair methods based on scRNA-seq data have also emerged. Chen et al. constructed a new feature matrix from scRNA-seq data, known as the “delta rank matrix” (DRM), based on the relative expression order of gene pairs. The results showed that DRM outperformed the original gene expression matrix in cell clustering, cell identification, and pseudotime reconstruction [219]. Tong et al. proposed “scRankXMBD”, a computational framework based on the relative expression order of gene pairs in scRNA-seq data, aimed at prioritizing the identification of prognosis-related cell subpopulations, demonstrating higher accuracy and consistency compared to five existing methods [220]. Shen et al. introduced a novel ensemble learning method named “scDetect” which combines the ranking relationships of gene pairs with a majority voting ensemble learning strategy, enabling high-precision cell classification across scRNA-seq data from different sequencing platforms [221]. Additionally, Yan et al. developed a new method called “RankCompV3”, which can identify differentially expressed genes in scRNA-seq data by comparing the relative expression order of gene pairs [222]. These methods also exhibit significant potential for clinical applications [223].

Extension to multiclass classification

The methods introduced in this review are primarily based on binary classification problems, but they can also be extended to multiclass classification problems. During this extension, well-known strategies such as “One-vs-One” [224], “One-vs-All” [225], and “Hierarchical Classification” [226] are commonly used. These strategies address multiclass problems by combining multiple binary classifiers.

In the “One-vs-One” method, a binary classifier is trained for each pair of categories. If there are N categories, N(N−1)/2 binary classifiers are needed, each comparing two categories, and the final class of each sample is determined through a voting mechanism. The advantage of this method is that it allows for more refined learning of the boundaries between different categories. Its drawback is that the computational cost increases significantly when the number of categories is large, because the number of classifiers grows quadratically.

In the “One-vs-All” strategy, each category is distinguished from all other categories: for each category, a binary classifier is trained to determine whether a sample belongs to that category. The final classification is made by evaluating the output probability of each classifier and selecting the category with the highest probability. Compared to “One-vs-One”, this method is simpler, requiring only N classifiers. However, its classification performance may decrease when the categories are highly similar.

“Hierarchical classification” addresses multiclass problems by constructing a hierarchy of categories. This method typically starts with a coarse classification and progressively refines the categorization, gradually narrowing the classification scope through the hierarchical structure. For example, samples might first be categorized into broad groups (e.g., disease vs. health) and then further subdivided into specific subclasses (e.g., different types of diseases). By avoiding a direct distinction between all categories, “Hierarchical classification” can reduce the number of classifiers needed, thus improving both classification efficiency and accuracy [227].
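The classifier counts and decision rules of the “One-vs-One” and “One-vs-All” strategies can be illustrated with a small sketch. It assumes, purely for illustration, that each binary classifier is a single gene pair rule (vote for the first class if the first gene is expressed above the second); the class labels, gene names, and scores are hypothetical and do not come from any study cited here.

```python
from collections import Counter

def n_one_vs_one(n_classes):
    """Number of binary classifiers needed by One-vs-One: N(N-1)/2."""
    return n_classes * (n_classes - 1) // 2

def one_vs_one_predict(sample, pair_rules):
    """Majority vote over pairwise classifiers.

    pair_rules maps (class_a, class_b) -> (gene_i, gene_j); the rule votes
    for class_a if sample[gene_i] > sample[gene_j], otherwise for class_b.
    """
    votes = Counter()
    for (class_a, class_b), (gene_i, gene_j) in pair_rules.items():
        winner = class_a if sample[gene_i] > sample[gene_j] else class_b
        votes[winner] += 1
    # Ties fall back to first-seen order here; a real pipeline needs an explicit rule.
    return votes.most_common(1)[0][0]

def one_vs_all_predict(class_scores):
    """Pick the class whose 'this class vs. the rest' classifier scores highest."""
    return max(class_scores, key=class_scores.get)

# Hypothetical three-class problem: 3 classes need 3 pairwise classifiers.
pair_rules = {
    ("A", "B"): ("g1", "g2"),
    ("A", "C"): ("g1", "g3"),
    ("B", "C"): ("g2", "g3"),
}
sample = {"g1": 5.0, "g2": 9.0, "g3": 2.0}
ovo_label = one_vs_one_predict(sample, pair_rules)

# Hypothetical per-class probabilities from three One-vs-All classifiers.
ova_label = one_vs_all_predict({"A": 0.2, "B": 0.7, "C": 0.1})
```

In this toy case both strategies assign the sample to class B. With ten classes, One-vs-One would already need 45 pairwise classifiers, compared with ten for One-vs-All.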

Additionally, Marzouka et al. developed an R package called “multiclassPairs” specifically designed for training multiclass classifiers based on feature pairs [228]. With these extensions, the gene pair methods can not only effectively address binary classification problems but also tackle challenges in multiclass classification tasks.

Discussion and conclusion

Gene pair methods analyze data by leveraging the combined information of two genes, offering more comprehensive and nuanced biological insights. Notably, these methods are inherently robust to batch effects, making them highly effective for mining diverse biological datasets across different experimental conditions. This review systematically examined eleven distinct gene pair methods, emphasizing their methodologies, applications, and implementation details. By categorizing these methods based on expression values and ranking relationships, we provided a comprehensive overview to guide researchers in their application. Additionally, we developed a reproducible analytical pipeline based on these methods, ensuring their accessibility and usability for practical research scenarios. While most of these methods lack publicly available implementation tools, we have made every effort to reproduce the corresponding code and analyses based on descriptions in published manuscripts. This absence of reference tools posed significant challenges, as relying solely on method descriptions may introduce inconsistencies or biases. Consequently, we cannot guarantee that the reproduced results are fully identical to those in the original studies. This highlights the importance of providing open-source implementations when developing methods, as they not only help other researchers validate and apply these methods but also enhance the credibility and reliability of scientific discoveries.

Beyond summarizing and reproducing these approaches, we evaluated their performance using real-world benchmark datasets (PTB transcriptomic profiles) to highlight their strengths and limitations. These evaluations revealed that gene pair methods are robust tools for extracting meaningful biological insights from high-throughput gene expression data, offering significant advantages in disease research.

Of the eleven methods reviewed in this paper, ten are primarily applied to classification tasks; the exception is RankComp, which is used to identify personalized differentially expressed genes. Among these ten, three are based on gene expression values: Pair t-score and PSVM are complex and cannot eliminate batch effects, whereas GERs can overcome this issue. The remaining seven methods are based on ranking relationships. TSP was the first method proposed using ranking relationships, but it relies solely on the top-scoring gene pairs for classification. While it uses fewer features, it typically does not achieve the best performance compared to other methods. k-TSP builds on TSP by increasing the number of feature gene pairs used for classification, thus enhancing its discriminatory power, and researchers have developed corresponding packages to facilitate its application. REOs introduces the concept of reversed gene pairs and, like k-TSP, works with multiple gene pairs, achieving similar effectiveness. TSPG ensures the stability of the identified feature gene pairs through repeated random sampling, at the cost of an increased computational burden. Finally, integrating methods such as TSP, k-TSP, and REOs with machine learning can enhance performance, although the effectiveness of such integration depends on the implementation skills of researchers.
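To make the difference between TSP and k-TSP concrete, the sketch below scores every gene pair by how strongly its ordering differs between two classes and then keeps the best k pairs. It assumes the commonly used TSP score, the absolute difference between the two classes in the frequency of the ordering gene i < gene j. It is a simplified illustration and not the code of any package: the tie-breaking rules and the requirement of disjoint pairs used by published k-TSP implementations are omitted, and the function names and toy data are hypothetical.

```python
from itertools import combinations

import numpy as np

def tsp_scores(X, y):
    """Score each gene pair (i, j) by |P(X_i < X_j | y=0) - P(X_i < X_j | y=1)|.

    X is a samples x genes array; y holds binary class labels (0 or 1).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    scores = {}
    for i, j in combinations(range(X.shape[1]), 2):
        less = X[:, i] < X[:, j]
        p0 = less[y == 0].mean()
        p1 = less[y == 1].mean()
        scores[(i, j)] = abs(p0 - p1)
    return scores

def top_k_pairs(scores, k):
    """Return the k highest-scoring pairs (k=1 mimics TSP, k>1 mimics k-TSP)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy data (hypothetical): 4 samples x 3 genes, two samples per class.
X = [
    [1.0, 5.0, 3.0],
    [2.0, 6.0, 7.0],
    [7.0, 3.0, 4.0],
    [8.0, 2.0, 9.0],
]
y = [0, 0, 1, 1]

scores = tsp_scores(X, y)
best_pair = top_k_pairs(scores, 1)
```

Here genes 0 and 1 switch order completely between the two classes, so that pair receives the maximum score of 1 and is the one a single-pair classifier would use.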

For practical applications, we offer the following recommendations according to the research scenario. For studies aiming to identify robust differentially expressed genes, we recommend RankComp, as it has shown good robustness in such analyses. For studies involving multi-batch data integration, Pair t-score and PSVM are not suitable because of their limitations in handling batch effects; we suggest methods that are more robust against batch effects, such as k-TSP and REOs. For datasets with a large number of features (e.g., over 20,000), we do not recommend Pair t-score, as it is computationally expensive. Methods such as TSP and TSP + ML are also time-consuming in such cases and are less ideal. For bench researchers without programming backgrounds, we suggest methods with well-established toolkits, such as TSP and k-TSP, which offer convenient solutions for experimental analysis. Lastly, we encourage researchers to compare multiple methods to identify the most suitable approach for their specific data. We provide the code implementation and comparative analysis of these methods for reference (DOI: 10.5281/zenodo.14948408).
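The computational concern for large feature sets follows from simple counting. The short sketch below assumes a method that screens every possible feature pair without prefiltering, which is an assumption made for illustration and not a statement about any specific implementation; the function name `n_feature_pairs` is hypothetical.

```python
from math import comb

def n_feature_pairs(n_features):
    """Number of unordered feature pairs: n(n-1)/2."""
    return comb(n_features, 2)

# Candidate pairs for a small panel and for a genome-scale expression matrix.
pairs_1k = n_feature_pairs(1_000)
pairs_20k = n_feature_pairs(20_000)
```

For 20,000 features this gives 199,990,000 candidate pairs, about 400 times more than for 1,000 features, which is why prefiltering genes before pair construction is a common way to keep the search tractable.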

However, the implementation of gene pair-based methodologies must account for their intrinsic constraints. The reduction of continuous expression data to binary ordinal relationships inherently sacrifices quantitative information about expression magnitudes, which may mask biologically critical subtle variations. Technical artifacts (e.g., measurement noise in low-abundance genes) and threshold-dependent analytical frameworks could additionally undermine cross-platform reproducibility. From a clinical perspective, despite their resistance to batch effects, persistent challenges include tumor microenvironment heterogeneity, dynamically evolving disease states, and limited biological interpretability of mechanistically unrelated gene pairs. Moreover, the absence of standardized clinical validation frameworks demands concerted efforts to bridge the gap between computational research and real-world implementation.
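Two of the constraints named above, the loss of magnitude information and the sensitivity of low-abundance comparisons to noise, can be seen in a toy example. All expression values and the assumed measurement error of 0.3 units are hypothetical and chosen only to illustrate the point.

```python
def order_feature(a, b):
    """Binary gene pair feature: 1 if a > b, else 0."""
    return int(a > b)

# Loss of magnitude information: a marginal and a large difference
# between two genes collapse to the same binary value.
narrow_gap = order_feature(100.0, 99.0)
wide_gap = order_feature(100.0, 1.0)

# Low-abundance pair: an assumed measurement error of 0.3 units on the
# first gene is enough to reverse the observed ordering.
true_a, true_b = 1.2, 1.0
clean = order_feature(true_a, true_b)
noisy = order_feature(true_a - 0.3, true_b)
```

Both samples in the first case yield the feature value 1, whereas in the second case the feature flips from 1 to 0 under the assumed error, which is one reason pairs involving low-abundance genes may be less reproducible across platforms.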

Looking forward, gene pair-based approaches offer substantial potential for advancing precision medicine across a wide range of clinical and research scenarios. These include identifying differentially expressed genes, supporting basic research, facilitating disease diagnosis, distinguishing subtypes, tracking disease progression, and predicting treatment efficacy, thereby providing valuable insights for diverse studies. Furthermore, the application of gene pair methods has expanded to other data types, such as single-cell sequencing, further demonstrating their versatility and broad utility in addressing complex biological questions. These methods not only provide novel perspectives but also serve as powerful tools for both medical research and clinical practice. Future efforts could focus on enhancing computational efficiency, integrating multi-omic data, and improving result interpretability to better support clinical decision-making.

In summary, by summarizing existing methods and providing a reproducible pipeline, this review aims to serve as a valuable resource for researchers, inspiring future innovations and fostering broader applications of gene pair analysis in biomedical research.

Acknowledgements

We would like to acknowledge the researchers whose relevant works are cited in this review and all co-authors for their support.

Authors’ contributions

CW and XX conducted the literature search, drafted the manuscript, prepared all figures and tables, and implemented the code. XY, MD, HL, and JH provided critical feedback, participated in data processing, and contributed to the revision of the manuscript. All authors have read and approved the final version of the manuscript.

Funding

This work was supported by grants from the Medico-Engineering Cooperation Funds of the University of Electronic Science and Technology of China (ZYGX2022YGRH004), the National Natural Science Foundation of China (62371112), and the Incubation Program for Innovative Science and Technology in UESTC (Y03023206100209).

Data availability

Not applicable.

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no conflicts of interest.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Hao Lin, Email: hlin@uestc.edu.cn.

Jian Huang, Email: hj@uestc.edu.cn.

References

  • 1.Guo Y, Ju Y, Chen D, Wang L. Research on the computational prediction of essential genes. Front Cell Dev Biol. 2021;9: 803608. 10.3389/fcell.2021.803608.
  • 2.Mills MC, Tropf FC, Brazel DM, van Zuydam N, Vaez A, Consortium e, et al. Identification of 371 genetic variants for age at first sex and birth linked to externalising behaviour. Nat Hum Behav. 2021;5(12):1717–30. 10.1038/s41562-021-01135-3.
  • 3.Oh IH, Reddy EP. The myb gene family in cell growth, differentiation and apoptosis. Oncogene. 1999;18(19):3017–33. 10.1038/sj.onc.1202839.
  • 4.van Dam S, Võsa U, van der Graaf A, Franke L, de Magalhães JP. Gene co-expression analysis for functional classification and gene-disease predictions. Brief Bioinform. 2018;19(4):575–92. 10.1093/bib/bbw139.
  • 5.Reuter JA, Spacek DV, Snyder MP. High-throughput sequencing technologies. Mol Cell. 2015;58(4):586–97. 10.1016/j.molcel.2015.05.004.
  • 6.Heller MJ. DNA microarray technology: devices, systems, and applications. Annu Rev Biomed Eng. 2002;4:129–53. 10.1146/annurev.bioeng.4.020702.153438.
  • 7.Chee M, Yang R, Hubbell E, Berno A, Huang XC, Stern D, et al. Accessing genetic information with high-density DNA arrays. Science. 1996;274(5287):610–4. 10.1126/science.274.5287.610.
  • 8.Liu P, Li H, Liao C, Tang Y, Li M, Wang Z, et al. Identification of key genes and biological pathways in Chinese lung cancer population using bioinformatics analysis. PeerJ. 2022;10: e12731. 10.7717/peerj.12731.
  • 9.Jacob L, Witteveen A, Beumer I, Delahaye L, Wehkamp D, van den Akker J, et al. Controlling technical variation amongst 6693 patient microarrays of the randomized MINDACT trial. Commun Biol. 2020;3(1):397. 10.1038/s42003-020-1111-1.
  • 10.Grancharova T, Gerbin KA, Rosenberg AB, Roco CM, Arakaki JE, DeLizo CM, et al. A comprehensive analysis of gene expression changes in a high replicate and open-source dataset of differentiating hiPSC-derived cardiomyocytes. Sci Rep. 2021;11(1):15845. 10.1038/s41598-021-94732-1.
  • 11.Ullah R, Naz A, Akram HS, Ullah Z, Tariq M, Mithani A, et al. Transcriptomic analysis reveals differential gene expression, alternative splicing, and novel exons during mouse trophoblast stem cell differentiation. Stem Cell Res Ther. 2020;11(1):342. 10.1186/s13287-020-01848-8.
  • 12.Mei F, Wang J, Chen Z, Yuan Z. Potentially important MicroRNAs in form-deprivation myopia revealed by bioinformatics analysis of MicroRNA profiling. Ophthalmic Res. 2017;57(3):186–93. 10.1159/000452421.
  • 13.Koizumi K, Oku M, Hayashi S, Inujima A, Shibahara N, Chen L, et al. Suppression of dynamical network biomarker signals at the predisease state (Mibyou) before metabolic syndrome in mice by a traditional Japanese medicine (Kampo Formula) Bofutsushosan. Evid Based Complement Alternat Med. 2020;2020: 9129134. 10.1155/2020/9129134.
  • 14.Lee DY, Hah JH, Jeong WJ, Chung EJ, Kwon TK, Ahn SH, et al. The expression of defensin-associated genes may be correlated with lymph node metastasis of early-stage tongue cancer. Clin Exp Otorhinolaryngol. 2022;15(4):372–9. 10.21053/ceo.2022.00150.
  • 15.Xu X, Zhang Y, Williams J, Antoniou E, McCombie WR, Wu S, et al. Parallel comparison of Illumina RNA-Seq and Affymetrix microarray platforms on transcriptomic profiles generated from 5-aza-deoxy-cytidine treated HT-29 colon cancer cells and simulated datasets. BMC Bioinformatics. 2013;14 Suppl 9(Suppl 9):S1. 10.1186/1471-2105-14-S9-S1.
  • 16.Li T-H, Qin C, Zhao B-B, Cao H-T, Yang X-Y, Wang Y-Y, et al. Identification METTL18 as a potential prognosis biomarker and associated with immune infiltrates in hepatocellular carcinoma. Front Oncol. 2021;11: 665192. 10.3389/fonc.2021.665192.
  • 17.Zhang G, Yang P. Bioinformatics genes and pathway analysis for chronic neuropathic pain after spinal cord injury. Biomed Res Int. 2017;2017: 6423021. 10.1155/2017/6423021.
  • 18.Abid F, Rubab Z, Fatima S, Qureshi A, Azhar A, Jafri A. In-silico analysis of interacting pathways through KIM-1 protein interaction in diabetic nephropathy. BMC Nephrol. 2022;23(1):254. 10.1186/s12882-022-02876-7.
  • 19.Fernández-Pérez L, Guerra B, Recio C, Cabrera-Galván JJ, García I, De La Rosa JV, et al. Transcriptomic and lipid profiling analysis reveals a functional interplay between testosterone and growth hormone in hypothyroid liver. Front Endocrinol. 2023;14: 1266150. 10.3389/fendo.2023.1266150.
  • 20.Zhang L, Thapa I, Haas C, Bastola D. Multiplatform biomarker identification using a data-driven approach enables single-sample classification. BMC Bioinformatics. 2019;20(1):601. 10.1186/s12859-019-3140-7.
  • 21.Liang E, Xiao S, Zhao C, Zhang Y, Fu G. M6A modification promotes blood-brain barrier breakdown during cerebral ischemia/reperfusion injury through increasing matrix metalloproteinase 3 expression. Heliyon. 2023;9(6): e16905. 10.1016/j.heliyon.2023.e16905.
  • 22.Guan X, Hu R, Choi Y, Srivats S, Nabet BY, Silva J, et al. Anti-TIGIT antibody improves PD-L1 blockade through myeloid and Treg cells. Nature. 2024;627(8004):646–55. 10.1038/s41586-024-07121-9.
  • 23.Aiello G, Sabino C, Pernici D, Audano M, Antonica F, Gianesello M, et al. Transient rapamycin treatment during developmental stage extends lifespan in Mus musculus and Drosophila melanogaster. EMBO Rep. 2022;23(9):e55299. 10.15252/embr.202255299.
  • 24.Dunsmore G, Rosero EP, Shahbaz S, Santer DM, Jovel J, Lacy P, et al. Neutrophils promote T-cell activation through the regulated release of CD44-bound Galectin-9 from the cell surface during HIV infection. PLoS Biol. 2021;19(8): e3001387. 10.1371/journal.pbio.3001387.
  • 25.Wong HY, Sheng Q, Hesterberg AB, Croessmann S, Rios BL, Giri K, et al. Single cell analysis of cribriform prostate cancer reveals cell intrinsic and tumor microenvironmental pathways of aggressive disease. Nat Commun. 2022;13(1):6036. 10.1038/s41467-022-33780-1.
  • 26.Wang G, Hua R, Chen X, He X, Dingming Y, Chen H, et al. MX1 and UBE2L6 are potential metaflammation gene targets in both diabetes and atherosclerosis. PeerJ. 2024;12: e16975. 10.7717/peerj.16975.
  • 27.Zhang Y, Luo Y, Liu X, Kiupel M, Li A, Wang H, et al. NCOA5 haploinsufficiency in myeloid-lineage cells sufficiently causes nonalcoholic steatohepatitis and hepatocellular carcinoma. Cell Mol Gastroenterol Hepatol. 2024;17(1):1–27. 10.1016/j.jcmgh.2023.09.007.
  • 28.Yu H, Lin J, Yuan J, Sun X, Wang C, Bai B. Screening mitochondria-related biomarkers in skin and plasma of atopic dermatitis patients by bioinformatics analysis and machine learning. Front Immunol. 2024;15: 1367602. 10.3389/fimmu.2024.1367602.
  • 29.Tabone O, Verma R, Singhania A, Chakravarty P, Branchett WJ, Graham CM, et al. Blood transcriptomics reveal the evolution and resolution of the immune response in tuberculosis. J Exp Med. 2021;218(10): e20210915. 10.1084/jem.20210915.
  • 30.Sun J, Xie T, Jamal M, Tu Z, Li X, Wu Y, et al. CLEC3B as a potential diagnostic and prognostic biomarker in lung cancer and association with the immune microenvironment. Cancer Cell Int. 2020;20:106. 10.1186/s12935-020-01183-1.
  • 31.Lee I-S, Ahn J, Kim K, Okugawa Y, Toiyama Y, Hur H, et al. A blood-based transcriptomic signature for noninvasive diagnosis of gastric cancer. Br J Cancer. 2021;125(6):846–53. 10.1038/s41416-021-01461-3.
  • 32.Egan C, Nicolae A, Lack J, Chung H-J, Skarshaug S, Pham TA, et al. Genomic profiling of primary histiocytic sarcoma reveals two molecular subgroups. Haematologica. 2020;105(4):951–60. 10.3324/haematol.2019.230375.
  • 33.Liu X, Wu L, Wang L, Li Y. Identification and classification of glioma subtypes based on RNA-binding proteins. Comput Biol Med. 2024;174: 108404. 10.1016/j.compbiomed.2024.108404.
  • 34.Liu X, Liu P, Chernock RD, Kuhs KAL, Lewis JS, Luo J, et al. A prognostic gene expression signature for oropharyngeal squamous cell carcinoma. EBioMedicine. 2020;61: 102805. 10.1016/j.ebiom.2020.102805.
  • 35.Yang X, Lei P, Huang L, Tang X, Wei B, Wei H. Prognostic value of LRRC4C in colon and gastric cancers correlates with tumour microenvironment immunity. Int J Biol Sci. 2021;17(5):1413–27. 10.7150/ijbs.58876.
  • 36.Shen Z, Du W, Perkins C, Fechter L, Natu V, Maecker H, et al. Platelet transcriptome identifies progressive markers and potential therapeutic targets in chronic myeloproliferative neoplasms. Cell Reports Medicine. 2021;2(10): 100425. 10.1016/j.xcrm.2021.100425.
  • 37.Montalban-Bravo G, Class CA, Ganan-Gomez I, Kanagal-Shamanna R, Sasaki K, Richard-Carpentier G, et al. Transcriptomic analysis implicates necroptosis in disease progression and prognosis in myelodysplastic syndromes. Leukemia. 2020;34(3):872–81. 10.1038/s41375-019-0623-5.
  • 38.Zhang X, Yuan Z, Ji J, Li H, Xue F. Network or regression-based methods for disease discrimination: a comparison study. BMC Med Res Methodol. 2016;16:100. 10.1186/s12874-016-0207-2.
  • 39.Zhang L, Ju T, Jin X, Ji J, Han J, Zhou X, et al. Network regression analysis for binary and ordinal categorical phenotypes in transcriptome-wide association studies. Genetics. 2022;222(4):iyac153. 10.1093/genetics/iyac153.
  • 40.Eddy JA, Hood L, Price ND, Geman D. Identifying tightly regulated and variably expressed networks by Differential Rank Conservation (DIRAC). PLoS Comput Biol. 2010;6(5): e1000792. 10.1371/journal.pcbi.1000792.
  • 41.Eddy JA, Sung J, Geman D, Price ND. Relative expression analysis for molecular cancer diagnosis and prognosis. Technol Cancer Res Treat. 2010;9(2):149–59. 10.1177/153303461000900204.
  • 42.Bø TH, Jonassen I. New feature subset selection procedures for classification of expression profiles. Genome Biol. 2002;3(4):research0017.1-research.11.
  • 43.Dickinson SE, Griffin BA, Elmore MF, Kriese-Anderson L, Elmore JB, Dyce PW, et al. Transcriptome profiles in peripheral white blood cells at the time of artificial insemination discriminate beef heifers with different fertility potential. BMC Genomics. 2018;19(1):129. 10.1186/s12864-018-4505-4.
  • 44.Vachani A, Nebozhyn M, Singhal S, Alila L, Wakeam E, Muschel R, et al. A 10-gene classifier for distinguishing head and neck squamous cell carcinoma and lung squamous cell carcinoma. Clin Cancer Res. 2007;13(10):2905–15. 10.1158/1078-0432.CCR-06-1670.
  • 45.Zhong J, Huang Q, Wang Y, Gao H, Jia H, Fan J, et al. Distinguishing Kawasaki disease from febrile infectious disease using gene pair signatures. Biomed Res Int. 2020;2020: 6539398. 10.1155/2020/6539398.
  • 46.Yang BY, Sakharkar MK. Alterations in gene pair correlations as potential diagnostic markers for colon cancer. Int J Mol Sci. 2022;23(20): 12463. 10.3390/ijms232012463.
  • 47.Li L, Wang B. One ferroptosis-related gene-pair signature serves as an original prognostic biomarker in lung adenocarcinoma. Front Genet. 2022;13: 841712. 10.3389/fgene.2022.841712.
  • 48.Gordon GJ, Jensen RV, Hsiao L-L, Gullans SR, Blumenstock JE, Ramaswamy S, et al. Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res. 2002;62(17):4963–7.
  • 49.Zak DE, Penn-Nicholson A, Scriba TJ, Thompson E, Suliman S, Amon LM, et al. A blood RNA signature for tuberculosis disease risk: a prospective cohort study. Lancet. 2016;387(10035):2312–22. 10.1016/S0140-6736(15)01316-1.
  • 50.Zhao S, Zhang B, Yang J, Zhou J, Xu Y. Linear discriminant analysis. Nat Rev Methods Primers. 2024;4(1):1–16. 10.1038/s43586-024-00346-y.
  • 51.Lavi ES, Lin ZP, Ratner ES. Gene expression of non-homologous end-joining pathways in the prognosis of ovarian cancer. iScience. 2023;26(10):107934. 10.1016/j.isci.2023.107934.
  • 52.Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci U S A. 1999;96(12):6745–50. 10.1073/pnas.96.12.6745.
  • 53.Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999;286(5439):531–7. 10.1126/science.286.5439.531.
  • 54.Dysvik B, Jonassen I. J-Express: exploring gene expression data using Java. Bioinformatics. 2001;17(4):369–70. 10.1093/bioinformatics/17.4.369.
  • 55.Mukherjee S, Laiakis EC, Fornace AJ, Amundson SA. Impact of inflammatory signaling on radiation biodosimetry: mouse model of inflammatory bowel disease. BMC Genomics. 2019;20(1):329. 10.1186/s12864-019-5689-y.
  • 56.Jeon H, Lee W, Park H, Lee HJ, Kim SK, Kim HB, et al. Automatic classification of tremor severity in Parkinson’s disease using a wearable device. Sensors. 2017;17(9): 2067. 10.3390/s17092067.
  • 57.Bianchini G, Pusztai L, Pienkowski T, Im Y-H, Bianchi GV, Tseng L-M, et al. Immune modulation of pathologic complete response after neoadjuvant HER2-directed therapies in the NeoSphere trial. Ann Oncol. 2015;26(12):2429–36. 10.1093/annonc/mdv395.
  • 58.Lai C, Reinders MJ, van’t Veer LJ, Wessels LF. A comparison of univariate and multivariate gene selection techniques for classification of cancer datasets. BMC Bioinformatics. 2006;7:235. 10.1186/1471-2105-7-235.
  • 59.Kanász R, Gnip P, Zoričák M, Drotár P. Bankruptcy prediction using ensemble of autoencoders optimized by genetic algorithm. PeerJ Comput Sci. 2023;9: e1257. 10.7717/peerj-cs.1257.
  • 60.Ma X-J, Wang Z, Ryan PD, Isakoff SJ, Barmettler A, Fuller A, et al. A two-gene expression ratio predicts clinical outcome in breast cancer patients treated with tamoxifen. Cancer Cell. 2004;5(6):607–16. 10.1016/j.ccr.2004.05.015.
  • 61.Avissar M, Christensen BC, Kelsey KT, Marsit CJ. MicroRNA expression ratio is predictive of head and neck squamous cell carcinoma. Clin Cancer Res. 2009;15(8):2850–5. 10.1158/1078-0432.CCR-08-3131.
  • 62.Tsujiura M, Ichikawa D, Komatsu S, Shiozaki A, Takeshita H, Kosuga T, et al. Circulating microRNAs in plasma of patients with gastric cancers. Br J Cancer. 2010;102(7):1174–9. 10.1038/sj.bjc.6605608.
  • 63.Geman D, d’Avignon C, Naiman DQ, Winslow RL. Classifying gene expression profiles from pairwise mRNA comparisons. Stat Appl Genet Mol Biol. 2004;3:Article19. 10.2202/1544-6115.1071.
  • 64.Tan AC, Naiman DQ, Xu L, Winslow RL, Geman D. Simple decision rules for classifying human cancers from gene expression profiles. Bioinformatics. 2005;21(20):3896–904. 10.1093/bioinformatics/bti631.
  • 65.Xu L, Geman D, Winslow RL. Large-scale integration of cancer microarray data identifies a robust common cancer signature. BMC Bioinformatics. 2007;8:275. 10.1186/1471-2105-8-275.
  • 66.Shi P, Ray S, Zhu Q, Kon MA. Top scoring pairs for feature selection in machine learning and applications to cancer outcome prediction. BMC Bioinformatics. 2011;12: 375. 10.1186/1471-2105-12-375.
  • 67.Luo J, Xu P, Cao P, Wan H, Lv X, Xu S, et al. Integrating genetic and gene co-expression analysis identifies gene networks involved in alcohol and stress responses. Front Mol Neurosci. 2018;11: 102. 10.3389/fnmol.2018.00102.
  • 68.Ao L, Zhang Z, Guan Q, Guo Y, Guo Y, Zhang J, et al. A qualitative signature for early diagnosis of hepatocellular carcinoma based on relative expression orderings. Liver Int. 2018;38(10):1812–9. 10.1111/liv.13864.
  • 69.Zhang Z-M, Tan J-X, Wang F, Dao F-Y, Zhang Z-Y, Lin H. Early diagnosis of hepatocellular carcinoma using machine learning method. Front Bioeng Biotechnol. 2020;8: 254. 10.3389/fbioe.2020.00254.
  • 70.West M, Blanchette C, Dressman H, Huang E, Ishida S, Spang R, et al. Predicting the clinical status of human breast cancer by using gene expression profiles. Proc Natl Acad Sci U S A. 2001;98(20):11462–7. 10.1073/pnas.201162998.
  • 71.Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell. 2002;1(2):203–9. 10.1016/s1535-6108(02)00030-2.
  • 72.Miao Z, Humphreys BD, McMahon AP, Kim J. Multi-omics integration in the age of million single-cell data. Nat Rev Nephrol. 2021;17(11):710–24. 10.1038/s41581-021-00463-x.
  • 73.Chen Y, Cai X, Cao Z, Lin J, Huang W, Zhuang Y, et al. Prediction of red blood cell transfusion after orthopedic surgery using an interpretable machine learning framework. Front Surg. 2023;10: 1047558. 10.3389/fsurg.2023.1047558.
  • 74.Zhao H, Logothetis CJ, Gorlov IP, Zeng J, Dai J. Modified logistic regression models using gene coexpression and clinical features to predict prostate cancer progression. Comput Math Methods Med. 2013;2013: 917502. 10.1155/2013/917502.
  • 75.Wang X. Robust two-gene classifiers for cancer prediction. Genomics. 2012;99(2):90–5. 10.1016/j.ygeno.2011.11.003.
  • 76.Wang H, Zhang H, Dai Z, Chen MS, Yuan Z. TSG: a new algorithm for binary and multi-class cancer classification and informative genes selection. BMC Med Genomics. 2013;6 Suppl 1(Suppl 1):S3. 10.1186/1755-8794-6-S1-S3.
  • 77.Moody L, Chen H, Pan Y-X. Considerations for feature selection using gene pairs and applications in large-scale dataset integration, novel oncogene discovery, and interpretable cancer screening. BMC Med Genomics. 2020;13(Suppl 10):148. 10.1186/s12920-020-00778-x.
  • 78.Leek JT. The tspair package for finding top scoring pair classifiers in R. Bioinformatics. 2009;25(9):1203–4. 10.1093/bioinformatics/btp126.
  • 79.Isella C, Brundu F, Bellomo SE, Galimi F, Zanella E, Porporato R, et al. Selective analysis of cancer-cell intrinsic transcriptional traits defines novel clinically relevant subtypes of colorectal cancer. Nat Commun. 2017;8: 15107. 10.1038/ncomms15107.
  • 80.Hua Y, Cai D, Shirley CA, Mo S, Chen R, Gao F, et al. A prognostic model for ovarian neoplasms established by an integrated analysis of 1580 transcriptomic profiles. Sci Rep. 2023;13(1):19429. 10.1038/s41598-023-45410-x.
  • 81.Zhao H, Logothetis CJ, Gorlov IP. Usefulness of the top-scoring pairs of genes for prediction of prostate cancer progression. Prostate Cancer Prostatic Dis. 2010;13(3):252–9. 10.1038/pcan.2010.9.
  • 82.Weichselbaum RR, Ishwaran H, Yoon T, Nuyten DSA, Baker SW, Khodarev N, et al. An interferon-related gene signature for DNA damage resistance is a predictive marker for chemotherapy and radiation for breast cancer. Proc Natl Acad Sci U S A. 2008;105(47):18490–5. 10.1073/pnas.0809242105.
  • 83.Raponi M, Lancet JE, Fan H, Dossey L, Lee G, Gojo I, et al. A 2-gene classifier for predicting response to the farnesyltransferase inhibitor tipifarnib in acute myeloid leukemia. Blood. 2008;111(5):2589–96. 10.1182/blood-2007-09-112730.
  • 84.Qi L, Chen L, Li Y, Qin Y, Pan R, Zhao W, et al. Critical limitations of prognostic signatures based on risk scores summarized from gene expression levels: a case study for resected stage I non-small-cell lung cancer. Brief Bioinform. 2016;17(2):233–42. 10.1093/bib/bbv064.
  • 85.Lin X, Afsari B, Marchionni L, Cope L, Parmigiani G, Naiman D, et al. The ordering of expression among a few genes can provide simple cancer biomarkers and signal BRCA1 mutations. BMC Bioinformatics. 2009;10: 256. 10.1186/1471-2105-10-256.
  • 86.Li C, Zhang B, Schaafsma E, Reuben A, Wang L, Turk MJ, et al. TimiGP: inferring cell-cell interactions and prognostic associations in the tumor immune microenvironment through gene pairs. Cell Reports Medicine. 2023;4(7): 101121. 10.1016/j.xcrm.2023.101121.
  • 87.Lu M, Yin R, Chen XS. Ensemble methods of rank-based trees for single sample classification with gene expression profiles. J Transl Med. 2024;22(1):140. 10.1186/s12967-024-04940-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Czajkowski M, Jurczuk K, Kretowski M. Enhancing multi-omics data classification with relative expression analysis and decision trees. Journal of Computational Science. 2025;84: 102460. 10.1016/j.jocs.2024.102460. [Google Scholar]
  • 89.Czajkowski M, Jurczuk K, Kretowski M. Enhancing transparency of omics data analysis with the evolutionary multi-test tree and relative expression. Expert Syst Appl. 2025;276: 127131. 10.1016/j.eswa.2025.127131. [Google Scholar]
  • 90.Pomeroy SL, Tamayo P, Gaasenbeek M, Sturla LM, Angelo M, McLaughlin ME, et al. Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature. 2002;415(6870):436–42. 10.1038/415436a. [DOI] [PubMed] [Google Scholar]
  • 91.Shipp MA, Ross KN, Tamayo P, Weng AP, Kutok JL, Aguiar RCT, et al. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat Med. 2002;8(1):68–74. 10.1038/nm0102-68. [DOI] [PubMed] [Google Scholar]
  • 92.Armstrong SA, Staunton JE, Silverman LB, Pieters R, den Boer ML, Minden MD, et al. MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat Genet. 2002;30(1):41–7. 10.1038/ng765. [DOI] [PubMed] [Google Scholar]
  • 93.Podgorelec V, Kokol P, Stiglic B, Rozman I. Decision trees: an overview and their use in medicine. J Med Syst. 2002;26(5):445–63. 10.1023/a:1016409317640. [DOI] [PubMed] [Google Scholar]
  • 94.Zhang Z. Introduction to machine learning: k-nearest neighbors. Ann Transl Med. 2016;4(11):218. 10.21037/atm.2016.03.37. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Zhang Z. Naïve Bayes classification in R. Ann Transl Med. 2016;4(12):241. 10.21037/atm.2016.03.38. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96.Ferrarotto R, Mishra V, Herz E, Yaacov A, Solomon O, Rauch R, et al. AL101, a gamma-secretase inhibitor, has potent antitumor activity against adenoid cystic carcinoma with activated NOTCH signaling. Cell Death Dis. 2022;13(8):678. 10.1038/s41419-022-05133-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97.Pakula H, Omar M, Carelli R, Pederzoli F, Fanelli GN, Pannellini T, et al. Distinct mesenchymal cell states mediate prostate cancer progression. Nat Commun. 2024;15(1):363. 10.1038/s41467-023-44210-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Li B, Cui Y, Diehn M, Li R. Development and validation of an individualized immune prognostic signature in early-stage nonsquamous non-small cell lung cancer. JAMA Oncol. 2017;3(11):1529–37. 10.1001/jamaoncol.2017.1609. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99.Liu S, Nalesnik MA, Singhi A, Wood-Trageser MA, Randhawa P, Ren B-G, et al. Transcriptome and exome analyses of hepatocellular carcinoma reveal patterns to predict cancer recurrence in liver transplant patients. Hepatology Communications. 2022;6(4):710–27. 10.1002/hep4.1846. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100.Price ND, Trent J, El-Naggar AK, Cogdell D, Taylor E, Hunt KK, et al. Highly accurate two-gene classifier for differentiating gastrointestinal stromal tumors and leiomyosarcomas. Proc Natl Acad Sci U S A. 2007;104(9):3414–9. 10.1073/pnas.0611373104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101.Horr C, Buechler SA. Breast cancer consensus subtypes: a system for subtyping breast cancer tumors based on gene expression. NPJ breast cancer. 2021;7(1):136. 10.1038/s41523-021-00345-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102.Scialdone A, Natarajan KN, Saraiva LR, Proserpio V, Teichmann SA, Stegle O, et al. Computational assignment of cell-cycle stage from single-cell transcriptome data. Methods. 2015;85:54–61. 10.1016/j.ymeth.2015.06.021. [DOI] [PubMed] [Google Scholar]
  • 103.Dihge L, Vallon-Christersson J, Hegardt C, Saal LH, Häkkinen J, Larsson C, et al. Prediction of lymph node metastasis in breast cancer by gene expression and clinicopathological models: development and validation within a population-based cohort. Clin Cancer Res. 2019;25(21):6368–81. 10.1158/1078-0432.CCR-19-0075. [DOI] [PubMed] [Google Scholar]
  • 104.Omar M, Nuzzo PV, Ravera F, Bleve S, Fanelli GN, Zanettini C, et al. Notch-based gene signature for predicting the response to neoadjuvant chemotherapy in triple-negative breast cancer. J Transl Med. 2023;21(1):811. 10.1186/s12967-023-04713-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105.Hochberg JT, Sohal A, Handa P, Maliken BD, Kim T-K, Wang K, et al. Serum miRNA profiles are altered in patients with primary sclerosing cholangitis receiving high-dose ursodeoxycholic acid. JHEP Rep. 2023;5(6): 100729. 10.1016/j.jhepr.2023.100729. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106.Afsari B, Fertig EJ, Geman D, Marchionni L. switchBox: an R package for k-Top Scoring Pairs classifier development. Bioinformatics. 2015;31(2):273–4. 10.1093/bioinformatics/btu622. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 107.Michiels S, Koscielny S, Hill C. Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet. 2005;365(9458):488–92. 10.1016/S0140-6736(05)17866-0. [DOI] [PubMed] [Google Scholar]
  • 108.Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang CH, Angelo M, et al. Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci U S A. 2001;98(26):15149–54. 10.1073/pnas.211566398. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109.Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, Vasa P, et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci U S A. 2001;98(24):13790–5. 10.1073/pnas.191502998. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110.Yu YP, Landsittel D, Jing L, Nelson J, Ren B, Liu L, et al. Gene expression alterations in prostate cancer predicting tumor aggression and preceding development of malignancy. J Clin Oncol. 2004;22(14):2790–9. 10.1200/JCO.2004.05.158. [DOI] [PubMed] [Google Scholar]
  • 111.Tsoi J, Robert L, Paraiso K, Galvan C, Sheu KM, Lay J, et al. Multi-stage differentiation defines melanoma subtypes with differential vulnerability to drug-induced iron-dependent oxidative stress. Cancer Cell. 2018;33(5):890-904.e5. 10.1016/j.ccell.2018.03.017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 112.Wang H, Sun Q, Zhao W, Qi L, Gu Y, Li P, et al. Individual-level analysis of differential expression of genes and pathways for personalized medicine. Bioinformatics. 2015;31(1):62–8. 10.1093/bioinformatics/btu522. [DOI] [PubMed] [Google Scholar]
  • 113.Wei T-YW, Juan C-C, Hisa J-Y, Su L-J, Lee Y-CG, Chou H-Y, et al. Protein arginine methyltransferase 5 is a potential oncoprotein that upregulates G1 cyclins/cyclin-dependent kinases and the phosphoinositide 3-kinase/AKT signaling cascade. Cancer Sci. 2012;103(9):1640–50. 10.1111/j.1349-7006.2012.02367.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 114.Chen D-T, Nasir A, Culhane A, Venkataramu C, Fulp W, Rubio R, et al. Proliferative genes dominate malignancy-risk gene signature in histologically-normal breast tissue. Breast Cancer Res Treat. 2010;119(2):335–46. 10.1007/s10549-009-0344-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 115.Song Q-S, Wu H-J, Lin Q, Tang Y-K. Identification of a seven autophagy-related gene pairs signature for the diagnosis of colorectal cancer using the RankComp algorithm. J Bioinform Comput Biol. 2023;21(3):2350012. 10.1142/S0219720023500129. [DOI] [PubMed] [Google Scholar]
  • 116.Hu G, Cheng Z, Wu Z, Wang H. Identification of potential key genes associated with osteosarcoma based on integrated bioinformatics analyses. J Cell Biochem. 2019;120(8):13554–61. 10.1002/jcb.28630. [DOI] [PubMed] [Google Scholar]
  • 117.Cai H, Li X, Li J, Liang Q, Zheng W, Guan Q, et al. Identifying differentially expressed genes from cross-site integrated data based on relative expression orderings. Int J Biol Sci. 2018;14(8):892–900. 10.7150/ijbs.24548. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 118.Wang H, Cai H, Ao L, Yan H, Zhao W, Qi L, et al. Individualized identification of disease-associated pathways with disrupted coordination of gene expression. Brief Bioinform. 2016;17(1):78–87. 10.1093/bib/bbv030. [DOI] [PubMed] [Google Scholar]
  • 119.Blanchet FG, Legendre P, Borcard D. Forward selection of explanatory variables. Ecology. 2008;89(9):2623–32. 10.1890/07-0986.1. [DOI] [PubMed] [Google Scholar]
  • 120.Zhou Y-J, Lu X-F, Meng J-L, Wang X-Y, Ruan X-J, Yang C-J, et al. Qualitative transcriptional signature for the pathological diagnosis of pancreatic cancer. Front Mol Biosci. 2020;7: 569842. 10.3389/fmolb.2020.569842. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 121.Wang O, Shi D, Li Y, Zhou X, Yan H, Yao Q. lncRNA pair as candidate diagnostic signature for colorectal cancer based on the within-sample relative expression levels. Front Oncol. 2022;12: 912882. 10.3389/fonc.2022.912882. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 122.Yan H, Li M, Cao L, Chen H, Lai H, Guan Q, et al. A robust qualitative transcriptional signature for the correct pathological diagnosis of gastric cancer. J Transl Med. 2019;17(1):63. 10.1186/s12967-019-1816-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 123.Bu X, Ma L, Liu S, Wen D, Kan A, Xu Y, et al. A novel qualitative signature based on lncRNA pairs for prognosis prediction in hepatocellular carcinoma. Cancer Cell Int. 2022;22(1):95. 10.1186/s12935-022-02507-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 124.Li G, Xu W, Zhang L, Liu T, Jin G, Song J, et al. Development and validation of a CIMP-associated prognostic model for hepatocellular carcinoma. EBioMedicine. 2019;47:128–41. 10.1016/j.ebiom.2019.08.064. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 125.Peng H, Long F, Ding C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell. 2005;27(8):1226–38. 10.1109/TPAMI.2005.159. [DOI] [PubMed] [Google Scholar]
  • 126.Li F, Li C, Wang M, Webb GI, Zhang Y, Whisstock JC, et al. GlycoMine: a machine learning-based approach for predicting N-, C- and O-linked glycosylation in the human proteome. Bioinformatics. 2015;31(9):1411–9. 10.1093/bioinformatics/btu852. [DOI] [PubMed] [Google Scholar]
  • 127.Xie X, Wu C, Ma C, Gao D, Su W, Huang J, et al. Detecting key genes relative expression orderings as biomarkers for machine learning-based intelligent screening and analysis of type 2 diabetes mellitus. Expert Syst Appl. 2024;255: 124702. 10.1016/j.eswa.2024.124702. [Google Scholar]
  • 128.Zhang Z-Y, Sun Z-J, Gao D, Hao Y-D, Lin H, Liu F. Excavation of gene markers associated with pancreatic ductal adenocarcinoma based on interrelationships of gene expression. IET Syst Biol. 2024;18(6):261–70. 10.1049/syb2.12090. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 129.Li X, Ge J, Liu Z, Yang S, Wang L, Liu Y. Estimating the methane flux of the Dajiuhu subalpine peatland using machine learning algorithms and the maximal information coefficient technique. Sci Total Environ. 2024;916: 170241. 10.1016/j.scitotenv.2024.170241. [DOI] [PubMed] [Google Scholar]
  • 130.Auliah FN, Nilamyani AN, Shoombuatong W, Alam MA, Hasan MM, Kurata H. PUP-Fuse: prediction of protein pupylation sites by integrating multiple sequence representations. Int J Mol Sci. 2021;22(4): 2120. 10.3390/ijms22042120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 131.Wang M, Perucho JAU, Vardhanabhuti V, Ip P, Ngan HYS, Lee EYP. Radiomic features of T2-weighted imaging and diffusion kurtosis imaging in differentiating clinicopathological characteristics of cervical carcinoma. Acad Radiol. 2022;29(8):1133–40. 10.1016/j.acra.2021.08.018. [DOI] [PubMed] [Google Scholar]
  • 132.LaValley MP. Logistic regression. Circulation. 2008. 10.1161/CIRCULATIONAHA.106.682658. [DOI] [PubMed] [Google Scholar]
  • 133.King RD, Orhobor OI, Taylor CC. Cross-validation is safe to use. Nat Mach Intell. 2021;3(4):276-. 10.1038/s42256-021-00332-z. [Google Scholar]
  • 134.Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine Learning in Python. J Mach Learn Res. 2011;12(null):2825–30. [Google Scholar]
  • 135.Clough E, Barrett T. The gene expression omnibus database. Methods Mol Biol. 2016;1418:93–110. 10.1007/978-1-4939-3578-9_5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 136.Berry MPR, Graham CM, McNab FW, Xu Z, Bloch SAA, Oni T, et al. An interferon-inducible neutrophil-driven blood transcriptional signature in human tuberculosis. Nature. 2010;466(7309):973–7. 10.1038/nature09247. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 137.Blankley S, Graham CM, Turner J, Berry MPR, Bloom CI, Xu Z, et al. The transcriptional signature of active tuberculosis reflects symptom status in extra-pulmonary and pulmonary tuberculosis. PLoS ONE. 2016;11(10): e0162220. 10.1371/journal.pone.0162220. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 138.Joyce SM. Sputum analysis and culture. Ann Emerg Med. 1986;15(3):325–8. 10.1016/s0196-0644(86)80576-5. [DOI] [PubMed] [Google Scholar]
  • 139.Campbell JR, Krot J, Elwood K, Cook V, Marra F. A systematic review on TST and IGRA tests used for diagnosis of LTBI in immigrants. Mol Diagn Ther. 2015;19(1):9–24. 10.1007/s40291-014-0125-0. [DOI] [PubMed] [Google Scholar]
  • 140.Huebner RE, Schein MF, John B, Bass J. The tuberculin skin test. Clin Infect Dis. 1993;17(6):968–75. [DOI] [PubMed] [Google Scholar]
  • 141.Lalvani A, Pareek M. Interferon gamma release assays: principles and practice. Enferm Infecc Microbiol Clin. 2010;28(4):245–52. 10.1016/j.eimc.2009.05.012. [DOI] [PubMed] [Google Scholar]
  • 142.Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7): e47. 10.1093/nar/gkv007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 143.Jobe D, Darboe F, Muefong CN, Barry A, Coker EG, Mohammed N, et al. Gene expression in TB disease measured from the periphery is different from the site of infection. Tuberculosis (Edinb). 2022;134: 102187. 10.1016/j.tube.2022.102187. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 144.Natarajan S, Ranganathan M, Hanna LE, Tripathy S. Transcriptional profiling and deriving a seven-gene signature that discriminates active and latent tuberculosis: an integrative bioinformatics approach. Genes. 2022;13(4): 616. 10.3390/genes13040616. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 145.Eckold C, Kumar V, Weiner J, Alisjahbana B, Riza A-L, Ronacher K, et al. Impact of intermediate hyperglycemia and diabetes on immune dysfunction in tuberculosis. Clin Infect Dis. 2021;72(1):69–78. 10.1093/cid/ciaa751. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 146.Blankley S, Graham CM, Levin J, Turner J, Berry MPR, Bloom CI, et al. A 380-gene meta-signature of active tuberculosis compared with healthy controls. Eur Respir J. 2016;47(6):1873–6. 10.1183/13993003.02121-2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 147.Kaforou M, Wright VJ, Oni T, French N, Anderson ST, Bangani N, et al. Detection of tuberculosis in HIV-infected and -uninfected African adults using whole blood RNA expression signatures: a case-control study. PLoS Med. 2013;10(10): e1001538. 10.1371/journal.pmed.1001538. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 148.Anderson ST, Kaforou M, Brent AJ, Wright VJ, Banwell CM, Chagaluka G, et al. Diagnosis of childhood tuberculosis and host RNA expression in Africa. N Engl J Med. 2014;370(18):1712–23. 10.1056/NEJMoa1303657. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 149.Thakur C, Tripathi A, Ravichandran S, Shivananjaiah A, Chakraborty A, Varadappa S, et al. A new blood-based RNA signature (R9), for monitoring effectiveness of tuberculosis treatment in a South Indian longitudinal cohort. iScience. 2022;25(2):103745. 10.1016/j.isci.2022.103745. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 150.Sutherland JS, Loxton AG, Haks MC, Kassa D, Ambrose L, Lee J-S, et al. Differential gene expression of activating Fcγ receptor classifies active tuberculosis regardless of human immunodeficiency virus status or ethnicity. Clin Microbiol Infect. 2014;20(4):O230–8. 10.1111/1469-0691.12383. [DOI] [PubMed] [Google Scholar]
  • 151.Gebremicael G, Kassa D, Quinten E, Alemayehu Y, Gebreegziaxier A, Belay Y, et al. Host gene expression kinetics during treatment of tuberculosis in HIV-coinfected individuals is independent of highly active antiretroviral therapy. J Infect Dis. 2018;218(11):1833–46. 10.1093/infdis/jiy404. [DOI] [PubMed] [Google Scholar]
  • 152.Sambarey A, Devaprasad A, Mohan A, Ahmed A, Nayak S, Swaminathan S, et al. Unbiased identification of blood-based biomarkers for pulmonary tuberculosis by modeling and mining molecular interaction networks. EBioMedicine. 2017;15:112–26. 10.1016/j.ebiom.2016.12.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 153.Penn-Nicholson A, Mbandi SK, Thompson E, Mendelsohn SC, Suliman S, Chegou NN, et al. RISK6, a 6-gene transcriptomic signature of TB disease risk, diagnosis and treatment response. Sci Rep. 2020;10(1):8629. 10.1038/s41598-020-65043-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 154.Esmail H, Lai RP, Lesosky M, Wilkinson KA, Graham CM, Horswell S, et al. Complement pathway gene activation and rising circulating immune complexes characterize early disease in HIV-associated tuberculosis. Proc Natl Acad Sci U S A. 2018;115(5):E964–73. 10.1073/pnas.1711853115. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 155.Hoang LT, Jain P, Pillay TD, Tolosa-Wright M, Niazi U, Takwoingi Y, et al. Transcriptomic signatures for diagnosing tuberculosis in clinical practice: a prospective, multicentre cohort study. Lancet Infect Dis. 2021;21(3):366–75. 10.1016/S1473-3099(20)30928-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 156.Tetreault M-P, Yang Y, Katz JP. Krüppel-like factors in cancer. Nat Rev Cancer. 2013;13(10):701–13. 10.1038/nrc3582. [DOI] [PubMed] [Google Scholar]
  • 157.Sweeney TE, Braviak L, Tato CM, Khatri P. Genome-wide expression for diagnosis of pulmonary tuberculosis: a multicohort analysis. Lancet Respir Med. 2016;4(3):213–24. 10.1016/S2213-2600(16)00048-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 158.Chang A, Loy CJ, Eweis-LaBolle D, Lenz JS, Steadman A, Andgrama A, et al. Circulating cell-free RNA in blood as a host response biomarker for detection of tuberculosis. Nat Commun. 2024;15(1):4949. 10.1038/s41467-024-49245-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 159.Banafea O, Mghanga FP, Zhao J, Zhao R, Zhu L. Endoscopic ultrasonography with fine-needle aspiration for histological diagnosis of solid pancreatic masses: a meta-analysis of diagnostic accuracy studies. BMC Gastroenterol. 2016;16(1):108. 10.1186/s12876-016-0519-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 160.Frampton AE, Castellano L, Colombo T, Giovannetti E, Krell J, Jacob J, et al. MicroRNAs cooperatively inhibit a network of tumor suppressor genes to promote pancreatic tumor growth and progression. Gastroenterology. 2014;146(1):268-77.e18. 10.1053/j.gastro.2013.10.010. [DOI] [PubMed] [Google Scholar]
  • 161.Sutaria DS, Jiang J, Azevedo-Pouly ACP, Lee EJ, Lerner MR, Brackett DJ, et al. Expression profiling identifies the noncoding processed transcript of HNRNPU with proliferative properties in pancreatic ductal adenocarcinoma. Non-coding RNA. 2017;3(3): 24. 10.3390/ncrna3030024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 162.Jiang J, Azevedo-Pouly ACP, Redis RS, Lee EJ, Gusev Y, Allard D, et al. Globally increased ultraconserved noncoding RNA expression in pancreatic adenocarcinoma. Oncotarget. 2016;7(33):53165–77. 10.18632/oncotarget.10242. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 163.Klett H, Fuellgraf H, Levit-Zerdoun E, Hussung S, Kowar S, Küsters S, et al. Identification and validation of a diagnostic and prognostic multi-gene biomarker panel for pancreatic ductal adenocarcinoma. Front Genet. 2018;9: 108. 10.3389/fgene.2018.00108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 164.Moffitt RA, Marayati R, Flate EL, Volmar KE, Loeza SGH, Hoadley KA, et al. Virtual microdissection identifies distinct tumor- and stroma-specific subtypes of pancreatic ductal adenocarcinoma. Nat Genet. 2015;47(10):1168–78. 10.1038/ng.3398. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 165.Stratford JK, Bentrem DJ, Anderson JM, Fan C, Volmar KA, Marron JS, et al. A six-gene signature predicts survival of patients with localized pancreatic ductal adenocarcinoma. PLoS Med. 2010;7(7): e1000307. 10.1371/journal.pmed.1000307. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 166.Crnogorac-Jurcevic T, Chelala C, Barry S, Harada T, Bhakta V, Lattimore S, et al. Molecular analysis of precursor lesions in familial pancreatic cancer. PLoS ONE. 2013;8(1): e54830. 10.1371/journal.pone.0054830. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 167.Rs Janky, Binda MM, Allemeersch J, Van den Broeck A, Govaere O, Swinnen JV, et al. Prognostic relevance of molecular subtypes and master regulators in pancreatic ductal adenocarcinoma. BMC Cancer. 2016;16:632. 10.1186/s12885-016-2540-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 168.Hiraoka N, Yamazaki-Itoh R, Ino Y, Mizuguchi Y, Yamada T, Hirohashi S, et al. CXCL17 and ICAM2 are associated with a potential anti-tumor immune response in early intraepithelial stages of human pancreatic carcinogenesis. Gastroenterology. 2011;140(1):310–21. 10.1053/j.gastro.2010.10.009. [DOI] [PubMed] [Google Scholar]
  • 169.Grimont A, Pinho AV, Cowley MJ, Augereau C, Mawson A, Giry-Laterrière M, et al. SOX9 regulates ERBB signalling in pancreatic cancer development. Gut. 2015;64(11):1790–9. 10.1136/gutjnl-2014-307075. [DOI] [PubMed] [Google Scholar]
  • 170.Xu L, Tan AC, Naiman DQ, Geman D, Winslow RL. Robust prostate cancer marker genes emerge from direct integration of inter-study microarray data. Bioinformatics. 2005;21(20):3905–11. 10.1093/bioinformatics/bti647. [DOI] [PubMed] [Google Scholar]
  • 171.Stuart RO, Wachsman W, Berry CC, Wang-Rodriguez J, Wasserman L, Klacansky I, et al. In silico dissection of cell-type-associated patterns of gene expression in prostate cancer. Proc Natl Acad Sci U S A. 2004;101(2):615–20. 10.1073/pnas.2536479100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 172.Welsh JB, Sapinoso LM, Su AI, Kern SG, Wang-Rodriguez J, Moskaluk CA, et al. Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer. Cancer Res. 2001;61(16):5974–8. [PubMed] [Google Scholar]
  • 173.Lapointe J, Li C, Higgins JP, van de Rijn M, Bair E, Montgomery K, et al. Gene expression profiling identifies clinically relevant subtypes of prostate cancer. Proc Natl Acad Sci U S A. 2004;101(3):811–6. 10.1073/pnas.0304146101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 174.LaTulippe E, Satagopan J, Smith A, Scher H, Scardino P, Reuter V, et al. Comprehensive gene expression analysis of prostate cancer reveals distinct transcriptional programs associated with metastatic disease. Cancer Res. 2002;62(15):4499–506. [PubMed] [Google Scholar]
  • 175.Blenkiron C, Goldstein LD, Thorne NP, Spiteri I, Chin S-F, Dunning MJ, et al. MicroRNA expression profiling of human breast cancer identifies new markers of tumor subtype. Genome Biol. 2007;8(10): R214. 10.1186/gb-2007-8-10-r214. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 176.Chang SS, Jiang WW, Smith I, Poeta LM, Begum S, Glazer C, et al. MicroRNA alterations in head and neck squamous cell carcinoma. Int J Cancer. 2008;123(12):2791–7. 10.1002/ijc.23831. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 177.Lu J, Getz G, Miska EA, Alvarez-Saavedra E, Lamb J, Peck D, et al. MicroRNA expression profiles classify human cancers. Nature. 2005;435(7043):834–8. 10.1038/nature03702. [DOI] [PubMed] [Google Scholar]
  • 178.Hattori E, Oyama R, Kondo T. Systematic review of the current status of human sarcoma cell lines. Cells. 2019;8(2): 157. 10.3390/cells8020157. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 179.Clary BM, DeMatteo RP, Lewis JJ, Leung D, Brennan MF. Gastrointestinal stromal tumors and leiomyosarcoma of the abdomen and retroperitoneum: a clinical comparison. Ann Surg Oncol. 2001;8(4):290–9. 10.1007/s10434-001-0290-3. [DOI] [PubMed] [Google Scholar]
  • 180.Fletcher CDM, Berman JJ, Corless C, Gorstein F, Lasota J, Longley BJ, et al. Diagnosis of gastrointestinal stromal tumors: A consensus approach. Hum Pathol. 2002;33(5):459–65. 10.1053/hupa.2002.123545. [DOI] [PubMed] [Google Scholar]
  • 181.Fan J, Chen M, Cao S, Yao Q, Zhang X, Du S, et al. Identification of a ferroptosis-related gene pair biomarker with immune infiltration landscapes in ischemic stroke: a bioinformatics-based comprehensive study. BMC Genomics. 2022;23:59. 10.1186/s12864-022-08295-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 182.Buhmann MD. Radial basis functions. Acta Numer. 2000;9:1–38. 10.1017/S0962492900000015. [Google Scholar]
  • 183.Liberty E, Lang K, Shmakov K, editors. Stratified sampling meets machine learning. International conference on machine learning. 2016. 2016-06-11: PMLR.
  • 184.Network CGAR. The molecular taxonomy of primary prostate cancer. Cell. 2015;163(4):1011–25. 10.1016/j.cell.2015.10.025. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 185.Liu J, Lichtenberg T, Hoadley KA, Poisson LM, Lazar AJ, Cherniack AD, et al. An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. Cell. 2018;173(2):400-16.e11. 10.1016/j.cell.2018.02.052. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 186.Rajeshkumar NV, Tan AC, De Oliveira E, Womack C, Wombwell H, Morgan S, et al. Antitumor effects and biomarkers of activity of AZD0530, a Src inhibitor, in pancreatic cancer. Clin Cancer Res. 2009;15(12):4138–46. 10.1158/1078-0432.CCR-08-3021. [DOI] [PubMed] [Google Scholar]
  • 187.Chen Z, Zeng Y, Ma P, Xu Q, Zeng L, Song X, et al. Integrated GMPS and RAMP3 as a signature to predict prognosis and immune heterogeneity in hepatocellular carcinoma. Gene. 2025;933: 148958. 10.1016/j.gene.2024.148958. [DOI] [PubMed] [Google Scholar]
  • 188.Xu Q, Ying H, Xie C, Lin R, Huang Y, Zhu R, et al. Characterization of neutrophil extracellular traps related gene pair for predicting prognosis in hepatocellular carcinoma. J Gene Med. 2023;25(11): e3551. 10.1002/jgm.3551. [DOI] [PubMed] [Google Scholar]
  • 189.Huang Y, Xu J, Xie C, Liao Y, Lin R, Zeng Y, et al. A novel gene pair CSTF2/DPE2A impacts prognosis and cell cycle of hepatocellular carcinoma. J Hepatocell Carcinoma. 2023;10:1639–57. 10.2147/JHC.S413935. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 190.O’Brien KM, Cole SR, Tse C-K, Perou CM, Carey LA, Foulkes WD, et al. Intrinsic breast tumor subtypes, race, and long-term survival in the Carolina Breast Cancer Study. Clin Cancer Res. 2010;16(24):6100–10. 10.1158/1078-0432.CCR-10-1533. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 191.Osborne CK, Schiff R. Growth factor receptor cross-talk with estrogen receptor as a mechanism for tamoxifen resistance in breast cancer. Breast. 2003;12(6):362–7. 10.1016/s0960-9776(03)00137-1. [DOI] [PubMed] [Google Scholar]
  • 192.Clarke R, Liu MC, Bouker KB, Gu Z, Lee RY, Zhu Y, et al. Antiestrogen resistance in breast cancer and the role of estrogen receptor signaling. Oncogene. 2003;22(47):7316–39. 10.1038/sj.onc.1206937. [DOI] [PubMed] [Google Scholar]
  • 193. Lichtner RB. Estrogen/EGF receptor interactions in breast cancer: rationale for new therapeutic combination strategies. Biomed Pharmacother. 2003;57(10):447–51. doi: 10.1016/j.biopha.2003.09.006.
  • 194. Dowsett M. Overexpression of HER-2 as a resistance mechanism to hormonal therapy for breast cancer. Endocr Relat Cancer. 2001;8(3):191–5. doi: 10.1677/erc.0.0080191.
  • 195. Zeltser L, Desplan C, Heintz N. Hoxb-13: a new Hox gene in a distant region of the HOXB cluster maintains colinearity. Development. 1996;122(8):2475–84. doi: 10.1242/dev.122.8.2475.
  • 196. Chen H, Xie H, Wang P, Yan S, Zhang Y, Wang G. A 25 immune-related gene pair signature predicts overall survival in cervical cancer. Cancer Inform. 2022;21:11769351221090921. doi: 10.1177/11769351221090921.
  • 197. Jiang Z, Xu J, Zhang S, Lan H, Bao Y. A pairwise immune gene model for predicting overall survival and stratifying subtypes of colon adenocarcinoma. J Cancer Res Clin Oncol. 2023;149(12):10813–29. doi: 10.1007/s00432-023-04957-y.
  • 198. Ning Z-k, Tian H-k, Liu J, Hu C-g, Liu Z-t, Li H, et al. Analysis and application of RNA binding protein gene pairs to predict the prognosis of gastric cancer. Heliyon. 2023;9(7):e18242. doi: 10.1016/j.heliyon.2023.e18242.
  • 199. Li X, Du G, Li L, Peng K. Cellular specificity of lactate metabolism and a novel lactate-related gene pair index for frontline treatment in clear cell renal cell carcinoma. Front Oncol. 2023;13:1253783. doi: 10.3389/fonc.2023.1253783.
  • 200. Li Y, An L, Jia Z, Li J, Zhou E, Wu F, et al. Identification of ubiquitin-related gene-pair signatures for predicting tumor microenvironment infiltration and drug sensitivity of lung adenocarcinoma. Cancers. 2022;14(14):3478. doi: 10.3390/cancers14143478.
  • 201. Yin Z, Deng J, Zhou M, Li M, Zhou E, Liu J, et al. Exploration of a novel circadian miRNA pair signature for predicting prognosis of lung adenocarcinoma. Cancers. 2022;14(20):5106. doi: 10.3390/cancers14205106.
  • 202. Li K, Wang W. Establishment of m7G-related gene pair signature to predict overall survival in colorectal cancer. Front Genet. 2022;13:981392. doi: 10.3389/fgene.2022.981392.
  • 203. Chen Q, Tang P, Huang H, Qiu X. Establishment of a circular RNA regulatory stemness-related gene pair signature for predicting prognosis and therapeutic response in colorectal cancer. Front Immunol. 2022;13:934124. doi: 10.3389/fimmu.2022.934124.
  • 204. Cheng L, Wu H, Zheng X, Zhang N, Zhao P, Wang R, et al. GPGPS: a robust prognostic gene pair signature of glioma ensembling IDH mutation and 1p/19q co-deletion. Bioinformatics. 2023;39(1):btac850. doi: 10.1093/bioinformatics/btac850.
  • 205. Wang G, Man Y, Cao K, Zhao L, Lun L, Chen Y, et al. An immune-related gene pair signature predicts the prognosis and immunotherapeutic response in glioblastoma. Heliyon. 2024;10(19):e39025. doi: 10.1016/j.heliyon.2024.e39025.
  • 206. Huiwen R, Cheng Y, Hongwei S, Hongwei L. Predictive effect of microRNA ratio in osteosarcoma. J Clin Oncol. 2014;41(8):708. doi: 10.3760/cma.j.issn.1673-422X.2014.09.019.
  • 207. Patnaik SK, Kannisto E, Knudsen S, Yendamuri S. Evaluation of microRNA expression profiles that may predict recurrence of localized stage I non-small cell lung cancer after surgical resection. Cancer Res. 2010;70(1):36–45. doi: 10.1158/0008-5472.CAN-09-3153.
  • 208. Lin X, Gao J, Zhou L, Yin P, Xu G. A modified k-TSP algorithm and its application in LC-MS-based metabolomics study of hepatocellular carcinoma and chronic liver diseases. J Chromatogr B Analyt Technol Biomed Life Sci. 2014;966:100–8. doi: 10.1016/j.jchromb.2014.05.044.
  • 209. Yan H, Guan Q, He J, Lin Y, Zhang J, Li H, et al. Individualized analysis reveals CpG sites with methylation aberrations in almost all lung adenocarcinoma tissues. J Transl Med. 2017;15(1):26. doi: 10.1186/s12967-017-1122-y.
  • 210. Ma L, Gao Y, Huo Y, Tian T, Hong G, Li H. Integrated analysis of diverse cancer types reveals a breast cancer-specific serum miRNA biomarker through relative expression orderings analysis. Breast Cancer Res Treat. 2024;204(3):475–84. doi: 10.1007/s10549-023-07208-3.
  • 211. Yan H, Cai H, Guan Q, He J, Zhang J, Guo Y, et al. Individualized analysis of differentially expressed miRNAs with application to the identification of miRNAs deregulated commonly in lung cancer tissues. Brief Bioinform. 2018;19(5):793–802. doi: 10.1093/bib/bbx015.
  • 212. Zheng X, Leung K-S, Wong M-H, Cheng L. Long non-coding RNA pairs to assist in diagnosing sepsis. BMC Genomics. 2021;22(1):275. doi: 10.1186/s12864-021-07576-4.
  • 213. Peng F, Wang R, Zhang Y, Zhao Z, Zhou W, Chang Z, et al. Differential expression analysis at the individual level reveals a lncRNA prognostic signature for lung adenocarcinoma. Mol Cancer. 2017;16(1):98. doi: 10.1186/s12943-017-0666-z.
  • 214. Li H, Jiang F, Du Y, Li N, Chen Z, Cai H, et al. Identification of differential DNA methylation alterations of ovarian cancer in peripheral whole blood based on within-sample relative methylation orderings. Epigenetics. 2022;17(3):314–26. doi: 10.1080/15592294.2021.1900029.
  • 215. Yan H, He J, Guan Q, Cai H, Zhang L, Zheng W, et al. Identifying CpG sites with different differential methylation frequencies in colorectal cancer tissues based on individualized differential methylation analysis. Oncotarget. 2017;8(29):47356–64. doi: 10.18632/oncotarget.17647.
  • 216. Liu Y, Lin Y, Yang W, Lin Y, Wu Y, Zhang Z, et al. Application of individualized differential expression analysis in human cancer proteome. Brief Bioinform. 2022;23(3):bbac096. doi: 10.1093/bib/bbac096.
  • 217. Huang T, Pu Y, Wang X, Li Y, Yang H, Luo Y, et al. Metabolomic analysis in spondyloarthritis: A systematic review. Front Microbiol. 2022;13:965709. doi: 10.3389/fmicb.2022.965709.
  • 218. Kolodziejczyk AA, Kim JK, Svensson V, Marioni JC, Teichmann SA. The technology and biology of single-cell RNA sequencing. Mol Cell. 2015;58(4):610–20. doi: 10.1016/j.molcel.2015.04.005.
  • 219. Chen Y, Zhang H, Sun X. Improving the performance of single-cell RNA-seq data mining based on relative expression orderings. Brief Bioinform. 2022;24(1):bbac556. doi: 10.1093/bib/bbac556.
  • 220. Tong M, Lin Y, Yang W, Song J, Zhang Z, Xie J, et al. Prioritizing prognostic-associated subpopulations and individualized recurrence risk signatures from single-cell transcriptomes of colorectal cancer. Brief Bioinform. 2023;24(3):bbad078. doi: 10.1093/bib/bbad078.
  • 221. Shen Y, Chu Q, Timko MP, Fan L. scDetect: a rank-based ensemble learning algorithm for cell type identification of single-cell RNA sequencing in cancer. Bioinformatics. 2021;37(22):4115–22. doi: 10.1093/bioinformatics/btab410.
  • 222. Yan J, Zeng Q, Wang X. RankCompV3: a differential expression analysis algorithm based on relative expression orderings and applications in single-cell RNA transcriptomics. BMC Bioinformatics. 2024;25(1):259. doi: 10.1186/s12859-024-05889-1.
  • 223. Zhao Z, Guo Y, Liu Y, Sun L, Chen B, Wang C, et al. Individualized lncRNA differential expression profile reveals heterogeneity of breast cancer. Oncogene. 2021;40(27):4604–14. doi: 10.1038/s41388-021-01883-6.
  • 224. Zou S, Gader P, Zare A. Hyperspectral tree crown classification using the multiple instance adaptive cosine estimator. PeerJ. 2019;7:e6405. doi: 10.7717/peerj.6405.
  • 225. Lyu S, Tian X, Li Y, Jiang B, Chen H. Multiclass probabilistic classification vector machine. IEEE Trans Neural Networks Learn Syst. 2020;31(10):3906–19. doi: 10.1109/TNNLS.2019.2947309.
  • 226. Klein DJ, Elliott MN, Haviland AM, Morrison PA, Orr N, Gaillot S, et al. A comparison of methods for classifying and modeling respondents who endorse multiple racial/ethnic categories: a health care experience application. Med Care. 2019;57(6):e34–41. doi: 10.1097/MLR.0000000000001012.
  • 227. Daisey K, Brown SD. Effects of the hierarchy in hierarchical, multi-label classification. Chemom Intell Lab Syst. 2020;207:104177. doi: 10.1016/j.chemolab.2020.104177.
  • 228. Marzouka N-A-D, Eriksson P. multiclassPairs: an R package to train multiclass pair-based classifier. Bioinformatics. 2021;37(18):3043–4. doi: 10.1093/bioinformatics/btab088.

Data Availability Statement

Not applicable.


Articles from Molecular Biomedicine are provided here courtesy of Springer
