Abstract
Integrins, a family of transmembrane receptor proteins, are well known to play important roles in cancer development and metastasis. However, a comprehensive understanding of these roles has not been achieved due to the complex relationships between specific integrins, cancer types, and the stages of cancer progression. Publicly accessible repositories from the Genotype-Tissue Expression (GTEx) and The Cancer Genome Atlas (TCGA) projects provide rich datasets for exploring these relationships using machine learning (ML). In this study, integrin RNA-Seq expression data of ~ 8 healthy tissues in GTEx and corresponding tumors in TCGA were selected. Integrin expression was used to train ML models to distinguish between different healthy tissues, between different solid tumors, and between normal and tumor samples from the same tissue type. These ML models can classify samples by tissue origin or disease status with high accuracy, and the integrins essential to these classifiers were identified. In some cases, the expression of only one or two integrins was needed to classify tissue type, tumor type or disease status with accuracy > 0.9. For example, expression of ITGA7 alone can distinguish healthy and cancerous breast tissue. Additionally, integrin co-expression networks in healthy and cancerous breast tissues were compared and were found to change significantly from healthy to cancer, indicating changes in functional involvement of integrins due to cancer. Integrin expression in metastatic tumors was further examined using data from the AURORA project for Metastatic Breast Cancer (MBC), and several integrins such as ITGAD, ITGA4, ITGAL, and ITGA11 were found to have significantly lower expression in metastases than in primary tumors.
Keywords: Integrins, Artificial intelligence, Metastasis, Receptors, Extracellular matrix (ECM)
Subject terms: Breast cancer, Cancer genetics
Introduction
Integrins, a family of transmembrane proteins responsible for cell/cell and cell/extracellular matrix interactions, play key roles in cancer and metastasis1–7. Cancer metastasis is the major cause of death among cancer patients8–10, and its prevention would significantly reduce the mortality associated with cancer. Hence, integrins are the target of cancer therapeutics under development5,11. Structurally, integrins are heterodimers composed of an α and a β subunit3. In humans, there are 18 known α subunits and 8 known β subunits that form at least 24 functional heterodimers12. They have traditionally been subdivided into 5 distinct classes based on ligand interaction and other features, such as the structure of the α subunit (Supplementary Fig. 1)13,14.
Due to the importance of integrins in cell communication mechanisms, it is not surprising that integrins have been widely implicated as key players in cancer development4,15–18. In breast cancer, for example, links between integrins and tumor development have been established through studies showing that deletion of integrins β1 and β4, respectively, inhibits tumor initiation and progression to invasive adenocarcinoma19. Metabolic changes associated with breast cancer and metastasis also involve integrin signaling, such as hypoxia-inducible factors targeting integrins and modifying their signaling pathways during hypoxia, a known hallmark of solid tumors6,20. Integrins have been implicated in organotropism of breast cancer metastasis, as the specific integrins expressed in exosomes released by tumors have been associated with preparing the premetastatic niche in different tissues21.
Despite the well-established connection between integrins and cancer, the specifics of the relationships between cancer and the expression of individual integrins are complex, with single integrins seeming to have opposite impacts in different cancers and different integrins having various functions during a single cancer type’s progression4. For example, α3 has been identified as both a risk factor and a protective factor in different cancers, as high expression of α3 was linked to poor prognosis in some cancers and better prognosis in others22,23. Another example of the complex behavior of integrins in cancer is their expression in breast cancer17,23. Some integrins, such as α11, β1, and β3 were found to promote tumor cell invasion, progression, and metastasis in breast cancer18,24–26, while others, such as α7, are down regulated in breast cancer17. Because of this complexity, a systematic investigation of how integrin expression patterns are associated with cancer development, progression, and metastasis is valuable.
The advent of next-generation sequencing technologies, such as RNA-Seq, has made comprehensive characterization of genomic and transcriptomic data in tumors possible. The establishment of large-scale projects, such as the Genotype Tissue Expression (GTEx)27, The Cancer Genome Atlas Project (TCGA)28, and AURORA29, a study of metastatic breast cancer (MBC), provides centralized data across various tissues, tumor types, and tumor conditions. Data generated by these projects provide opportunities for mining and machine learning (ML) to comprehensively investigate the changes in gene expression that underlie cancer development and metastasis. Much of the current knowledge of integrin-cancer relationships has been gained using a piecemeal approach that focuses on a single member of the integrin family or on a single cancer. A comprehensive investigation of integrin expression data across the integrin family and across tumor types can put previously acquired piecemeal results in broader context, by indicating if the observed expression changes are unique to a given integrin-cancer combination or if they are found across integrins and cancers. Such analysis can also identify integrins whose expression is highly variable across cancers or during cancer development. These integrins could then be prioritized in focused studies that aim to more fully unravel integrin-cancer relationships. Moreover, ML can identify complex patterns in the data that may not be apparent in traditional statistical analysis. Many ML algorithms have been developed and applied to analyze gene expression in various diseases, including cancer30,31, by building predictive models and/or discovering biomarkers32. Panels of multi-gene expression used as clinical tools, including PAM5033–35, Oncotype DX36, and MammaPrint37, are examples of the application of machine learning to analyze gene expression data.
This study reports a comprehensive analysis of integrin expression patterns utilizing publicly available datasets and examines how integrin expression varies across healthy tissues, primary tumors, and metastasis (Fig. 1). The analysis consists of two main sections. The first section compares integrin expression across a set of 8 healthy tissues and their corresponding tumors and finds that there is wide variation in integrin expression across tissues and across the members of the integrin family in both healthy tissues and cancer. Then ML methods, such as t-SNE visualization and Random Forest (RF) classification models, are used to examine if the variation in integrin expression patterns alone is sufficient for distinguishing samples by tissue for both healthy tissues and solid tumors and by disease status (i.e., distinguishing between healthy and cancer samples from the same tissue). The second section focuses on breast cancer and compares integrin expression in healthy breast tissue, primary breast tumors, and breast cancer metastases. Co-expression networks associated with integrins in normal breast tissue samples and breast cancer samples are compared and found to undergo significant rearrangements. For these re-arranged co-expression networks, there are corresponding changes in functional enrichment. Finally, integrin expression changes in metastatic breast cancer samples are tracked and integrins with significant expression changes are identified.
Fig. 1.
Flowchart depicting dataset selection and data analysis.
Materials and methods
Data sources
Most analysis in this work focused on expression of 27 integrin genes: 18 α subunits, 8 β subunits, and the integrin-like gene ITGBL1. Except for data from AURORA29, this study analyzed RNA-Seq data normalized with the TOIL38 Recompute pipeline. Specifically, we used the “gene expression RNAseq RSEM tpm” file of the TCGA-TARGET-GTEx cohort downloaded from XenaBrowser39,40. GTEx is our main source of healthy tissue data and TCGA is our source of cancer tissue data. We selected tissues to analyze by choosing all non-brain organs that had sample size > 100 in both GTEx and TCGA projects, resulting in the selection of 8 tissues: breast, colon, liver, lung, pancreas, prostate, stomach, and testis. Supplementary Table 1 summarizes the sample sizes of each healthy tissue and cancer studied.
Metastatic breast cancer data from the AURORA US Metastasis Project29 was downloaded as the AURORA Upper Quantile Normalized (UQN) RNA-Seq dataset from the GEO repository (GSE209998)41,42 in February 2023. We transformed the gene expression values by computing log2(x + 1) where x is the value in the downloaded file. This dataset consists of 44 primary tumor and 79 metastatic samples from 53 patients. Metastasis samples came from 19 locations, with liver (sample size, n = 18), lymph node (n = 11), brain (n = 9), and lung (n = 8) being the most common.
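The log2(x + 1) transformation described above can be illustrated with a minimal sketch. The gene-by-sample table below is hypothetical (it is not AURORA data), and the variable names `uqn` and `log_expr` are placeholders chosen for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical upper-quantile-normalized values (genes x samples).
# These numbers are made up for illustration; they are not AURORA data.
uqn = pd.DataFrame(
    {"sample_1": [0.0, 3.0, 1023.0], "sample_2": [1.0, 7.0, 255.0]},
    index=["ITGA4", "ITGA7", "ITGB1"],
)

# log2(x + 1): zero counts stay at zero and the dynamic range is compressed.
log_expr = np.log2(uqn + 1)
print(log_expr)
```

Adding 1 before taking the logarithm avoids undefined values for genes with zero expression in a sample.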
Machine learning and statistical methods
t-SNE visualization
t-distributed stochastic neighbor embedding (t-SNE) was used for dimension reduction of the expression of the 27 integrins using the Python scikit-learn43 library with perplexity values between 30 and 50. The two dimensions of the resulting t-SNE embedding were used for visualization.
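A minimal sketch of this step with scikit-learn is given below. The input matrix is synthetic (three artificial "tissues" with shifted means), and the perplexity of 30 and the fixed random seed are choices made for this example; the exact settings used for each figure are not reproduced here.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-in for an expression matrix: 120 samples x 27 integrins,
# drawn from three artificial "tissues" whose mean expression differs.
n_per_tissue, n_integrins = 40, 27
X = np.vstack(
    [rng.normal(loc=shift, scale=1.0, size=(n_per_tissue, n_integrins))
     for shift in (0.0, 3.0, 6.0)]
)
labels = np.repeat(["tissue_A", "tissue_B", "tissue_C"], n_per_tissue)

# Two-dimensional embedding; perplexity must be smaller than the sample count.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
embedding = tsne.fit_transform(X)

print(embedding.shape)  # one (x, y) coordinate pair per sample
```

The two columns of `embedding` can then be plotted as a scatter plot colored by `labels`.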
Random forest classification
We employed Random Forest (RF) models in several ways: (i) to classify samples by tissue or cancer type using multiclass or one-vs-all classifiers, (ii) to classify samples from the same tissue origin as healthy or cancer, and (iii) to classify samples as primary breast tumor or metastasis. RF models were implemented with the Python scikit-learn43 Random Forest classifier and used a 50%/50% training/test split. To address class imbalance, we set the class_weight parameter to ‘balanced’. This setting yielded results consistent with those obtained when employing over- or under-sampling techniques using the imbalanced-learn library. For each RF model, we conducted 500 iterations, each with a randomly selected training/test split. Model validation metrics, such as accuracy and area under the ROC curve (AUROC), and feature importances were averaged over the 500 iterations.
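The repeated-split procedure can be sketched as follows for a binary healthy/cancer classifier. This is an illustration under stated assumptions, not the study's code: the data are synthetic, only 10 iterations are run instead of 500 to keep the example fast, and the stratified split and `n_estimators=100` are assumptions that the text does not specify.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic, imbalanced data: 60 "healthy" and 140 "tumor" samples x 27 features.
n_healthy, n_tumor, n_features = 60, 140, 27
X = rng.normal(size=(n_healthy + n_tumor, n_features))
y = np.array([0] * n_healthy + [1] * n_tumor)
X[y == 1, 0] -= 3.0  # feature 0 is made lower in "tumor" samples

n_iterations = 10  # the study reports 500; reduced here for speed
accuracies, aurocs, importances = [], [], []
for seed in range(n_iterations):
    # 50%/50% split; stratification is an assumption of this sketch.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=seed
    )
    rf = RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=seed
    )
    rf.fit(X_tr, y_tr)
    accuracies.append(accuracy_score(y_te, rf.predict(X_te)))
    aurocs.append(roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]))
    importances.append(rf.feature_importances_)

# Average metrics and feature importances over all iterations.
mean_acc, sd_acc = np.mean(accuracies), np.std(accuracies)
mean_auroc = np.mean(aurocs)
mean_importance = np.mean(importances, axis=0)
print(f"accuracy {mean_acc:.3f} ± {sd_acc:.3f}, AUROC {mean_auroc:.3f}")
print("top feature index:", int(np.argmax(mean_importance)))
```

Averaging over many random splits gives both a mean and a spread for each metric, which is how values such as "0.953 ± 0.007" can be reported.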
Co-expression analysis
For gene co-expression analysis, we computed Pearson correlation coefficients (R) between the expression of genes across samples. Integrin-integrin pairs were considered co-expressed if the R between their expression was greater than +0.6. The mean Pearson coefficient across all integrin-integrin pairs was 0.14 in the GTEx breast dataset and 0.24 in the TCGA BRCA primary tumor dataset. For integrin-integrin pairs, we searched STRING-db44 to check for known associations in human. Graphviz45 was used for co-expression network visualization. We also examined co-expression relationships between integrins and ~ 19,000 protein coding genes that were selected using BioMart46–48. For select integrins (ITGA2, ITGA10 and ITGAM), we examined functional enrichment of Gene Ontology (GO)49,50 terms associated with co-expressed genes using the DAVID51,52 Bioinformatics webserver. For each integrin, in each dataset (GTEx Breast and TCGA BRCA), the top 2000 correlated genes (selected based on Pearson R values) were used as gene list inputs for the DAVID webserver, with the complete ~ 19,000 protein coding gene list serving as the background. The 5 Biological Process (BP) GO terms with the highest enrichment (i.e., lowest FDR) among co-expressed genes in GTEx Breast samples were selected. The enrichment among co-expressed genes in TCGA BRCA for these terms was then determined.
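The pairwise correlation and thresholding step can be sketched as below. The expression values are synthetic: two columns are constructed to be correlated and one to be independent, and the gene names serve only as column labels for this illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_samples = 200

# Synthetic expression (samples x genes). ITGAM and ITGB2 share a common
# signal so that they correlate; ITGA7 is independent noise. Not real data.
shared = rng.normal(size=n_samples)
expr = pd.DataFrame({
    "ITGAM": shared + 0.3 * rng.normal(size=n_samples),
    "ITGB2": shared + 0.3 * rng.normal(size=n_samples),
    "ITGA7": rng.normal(size=n_samples),
})

# Pearson R between every pair of genes across samples.
corr = expr.corr(method="pearson")

# Keep each unordered pair once and apply the co-expression threshold.
threshold = 0.6
genes = list(corr.columns)
edges = [
    (genes[i], genes[j], float(corr.iloc[i, j]))
    for i in range(len(genes))
    for j in range(i + 1, len(genes))
    if corr.iloc[i, j] > threshold
]
for a, b, r in edges:
    print(f"{a} -- {b}: R = {r:.2f}")
```

The resulting edge list is the kind of input that a graph layout tool such as Graphviz can render as a network.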
Statistical tests
The expression of each integrin was categorized as under- or over-expressed in cancer based on whether the expression of the integrin was lower in tumor samples compared to healthy/normal samples, or vice versa. For all plots associated with RF models, an asterisk denotes that the expression difference was significant, where significance was defined as the Bonferroni adjusted p-value of a t-test (ttest_ind from scipy.stats and multipletests from statsmodels.stats.multitest) falling below 0.05. A similar analysis was performed for metastatic breast cancer data, with under-expression defined as having lower expression in metastatic samples. One-way ANOVA (f_oneway from scipy.stats) followed by Tukey post-hoc tests (pairwise_tukeyhsd from statsmodels.stats.multicomp) was used to identify integrins with significantly different expression in primary breast tumors and four metastasis locations (liver, lymph node, brain, and lung).
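A minimal sketch of these tests on synthetic data follows. To keep the example dependent only on NumPy and SciPy, the Bonferroni adjustment is written out by hand (each p-value multiplied by the number of tests and capped at 1), which is the quantity the Bonferroni method computes; the Tukey post-hoc step is not shown. All expression values and group sizes are invented for illustration.

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind

rng = np.random.default_rng(0)
integrins = ["ITGA7", "ITGA2", "ITGB1"]

# Synthetic expression values per integrin; not real GTEx/TCGA data.
healthy = {"ITGA7": rng.normal(6.0, 1.0, 100),
           "ITGA2": rng.normal(3.0, 1.0, 100),
           "ITGB1": rng.normal(8.0, 1.0, 100)}
tumor = {"ITGA7": rng.normal(4.0, 1.0, 100),
         "ITGA2": rng.normal(4.5, 1.0, 100),
         "ITGB1": rng.normal(8.0, 1.0, 100)}

# Independent two-sample t-test per integrin.
raw_p = np.array([ttest_ind(healthy[g], tumor[g]).pvalue for g in integrins])

# Bonferroni adjustment written out explicitly.
adj_p = np.minimum(raw_p * len(integrins), 1.0)

results = {}
for gene, p in zip(integrins, adj_p):
    direction = "under" if tumor[gene].mean() < healthy[gene].mean() else "over"
    results[gene] = (direction, bool(p < 0.05))
    print(gene, direction, "*" if p < 0.05 else "")

# One-way ANOVA across a primary-tumor group and four metastasis-site groups
# (synthetic; the second group is given a lower mean).
groups = [rng.normal(mean, 1.0, 20) for mean in (5.0, 3.0, 5.0, 5.0, 5.0)]
f_stat, anova_p = f_oneway(*groups)
print(f"ANOVA F = {f_stat:.2f}, p = {anova_p:.2e}")
```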
Results
Integrin expression patterns can differentiate tissue compartments
Unperturbed human tissues differ markedly in the composition of their extracellular matrices, and this could be expected to correspond to variation in integrin expression. To explore this notion, we calculated mean mRNA expression levels of integrins in 8 selected healthy tissues from the GTEx database (Fig. 2a). There was variation in both the overall mean expression of the different integrins and the expression of each integrin across tissues. To further explore the relationship between integrin expression and tissue, we created a t-SNE plot of the samples in the 8 tissues using only integrin expression as input features (Fig. 2b). The t-SNE plot shows that, in general, samples from each tissue formed clusters that were relatively distinct from those of other tissues, indicating that integrin expression patterns are tissue specific.
Fig. 2.
Variation of integrin expression across healthy tissues (a) Mean expression of integrins in 8 healthy tissues in the GTEx dataset. The middle line of each box shows the mean of means. (b) t-SNE plot based on integrin expression in healthy tissues shows clustering of samples based on tissue. (c) Feature importance of integrins in a multiclass Random Forest model using a 50%/50% training/test split that classifies healthy samples by tissue. The magenta line shows the change in the accuracy of the model as features are added one-by-one from the most important feature to the least. The accuracy of the model using all 27 integrins is 0.953 ± 0.007 and AUROC is 0.995 ± 0.002. (d) Violin plots of expression across tissues for ITGA8, the top feature in the multiclass RF model and in one-vs-all RF models for prostate and stomach. (e) Violin plots of expression across tissues for ITGA6, the third-ranked feature in the multiclass RF model and the top feature in the RF models for breast and liver.
We then applied a multiclass Random Forest (RF) model to classify GTEx samples according to tissue based on integrin expression. This model had a high accuracy (0.953 ± 0.007) and AUROC (0.995 ± 0.002), suggesting that integrin expression patterns are unique to tissue compartments. The confusion matrix of the model’s predictions showed that performance was strong across tissues, as classification accuracy was greater than 95% for 6 of the 8 tissues (Supplementary Fig. 2). Accuracy for prostate (84%) and stomach (88%) was somewhat lower, as the model incorrectly classified ~ 7% of the samples of these tissues as colon. In addition to making predictions, random forest models can be used to determine how features contribute to the model’s predictions. Specifically, we used the feature importance of each integrin in the multiclass RF model to determine the integrins that were most important when classifying samples by tissue (Fig. 2c). Integrins with high feature importance tended to have particularly high or low expression in one or two tissues (Fig. 2d,e and Supplementary Fig. 3), with ITGA8, for example, having relatively high expression in lung and prostate and low expression in liver and pancreas (Fig. 2d). The maximum feature importance in the multiclass RF model was relatively low (0.07), indicating that the model was not highly dependent on the expression of a single integrin (Fig. 2c).
We also created one-vs-all RF models for each of the 8 tissues to further investigate the ability of integrin expression patterns to classify samples by tissue type and identify the integrins that were important in distinguishing individual tissues (Table 1). Like the multiclass model, the one-vs-all RF models had high accuracy (> 95% for all 8 tissues). However, the feature importances of the one-vs-all models showed different behavior than the multiclass model, as many one-vs-all models were reliant on the expression of one or two integrins (Supplementary Fig. 4). In the one-vs-all model for stomach, for example, the feature importance of ITGA8 was 0.175, a value almost three times that of any other integrin (Supplementary Fig. 4g). While the RF models described to this point used the expression of all integrins as input features, models with high accuracy could be achieved using only a handful of integrins. Figure 2c shows how the accuracy of the multiclass RF model changes as the number of features is increased (i.e., integrins are added to the model one at a time in order of their feature importance in the multiclass model with all integrins). Achieving an accuracy of 90% for the multiclass model required including the expression of only 5 integrins, and the accuracy reached a plateau after ~ 10 integrins were added to the model. For most one-vs-all models, the expression of only 1 or 2 integrins was needed for an accuracy higher than 0.9 (Supplementary Fig. 4).
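The feature-addition curve described above (the magenta line in Fig. 2c) can be sketched as follows. The data are synthetic with three classes and ten features, and a single training/test split is used for brevity; ranking features by the importances of a model fitted on the training half is an assumption of this sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data: 3 classes x 80 samples, 10 features; only features 0 and 1
# carry class information. Not real expression data.
n_per_class, n_features = 80, 10
X = rng.normal(size=(3 * n_per_class, n_features))
y = np.repeat([0, 1, 2], n_per_class)
X[y == 1, 0] += 3.0
X[y == 2, 1] += 3.0

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

def fit_rf(features, targets):
    """Fit a Random Forest with balanced class weights."""
    model = RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0
    )
    return model.fit(features, targets)

# Rank features by importance in the model that uses all features.
full_model = fit_rf(X_tr, y_tr)
order = np.argsort(full_model.feature_importances_)[::-1]

# Refit with the top-k features for k = 1..n_features and record test accuracy.
accuracy_curve = []
for k in range(1, n_features + 1):
    cols = order[:k]
    model_k = fit_rf(X_tr[:, cols], y_tr)
    accuracy_curve.append(model_k.score(X_te[:, cols], y_te))

# Smallest number of features reaching 0.9 accuracy, or None if never reached.
n_needed = next(
    (k for k, acc in enumerate(accuracy_curve, start=1) if acc >= 0.9), None
)
print("feature order:", order.tolist())
print("accuracy by number of features:", np.round(accuracy_curve, 3).tolist())
print("features needed for accuracy >= 0.9:", n_needed)
```

The value of `n_needed` corresponds to the "features required to achieve an accuracy of 0.9" column reported in Table 1.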
Table 1.
Key integrins in Random Forest classification models.
Classifier | Features required to achieve an accuracy of 0.9 | Mean accuracy* |
---|---|---|
Top integrin features in classification of GTEx healthy tissue samples | ||
Breast vs. others | ITGA6, ITGB5 | 0.975 ± 0.006 |
Colon vs. others | ITGBL1, ITGB4, ITGB3 | 0.976 ± 0.005 |
Liver vs. others | ITGA6 | 0.995 ± 0.002 |
Lung vs. others | ITGAX | 0.996 ± 0.002 |
Pancreas vs. others | ITGA9, ITGB3, ITGAE, ITGA1 | 0.989 ± 0.004 |
Prostate vs. others | ITGA8, ITGB8 | 0.973 ± 0.006 |
Stomach vs. others | ITGA8, ITGB4, ITGB5 | 0.958 ± 0.008 |
Testis vs. others | ITGA2B, ITGB6 | 0.994 ± 0.003 |
GTEx multiclass | ITGA8, ITGB4, ITGA6, ITGA1, ITGB6 | 0.953 ± 0.007 |
Top integrin features in classification of TCGA cancer samples | ||
BRCA (breast) vs. others | ITGA3, ITGA10, ITGBL1, ITGB4, ITGA8 | 0.914 ± 0.007 |
COAD (colon) vs. others | ITGA6, ITGB4 | 0.968 ± 0.004 |
LIHC (liver) vs. others | ITGB6 | 0.985 ± 0.002 |
LUAD (lung) vs. others | ITGA3, ITGB6, ITGA6 | 0.948 ± 0.004 |
LUSC (lung) vs. others | ITGB8, ITGA1, ITGB4 | 0.942 ± 0.005 |
PAAD (pancreas) vs. others | ITGB6 | 0.971 ± 0.003 |
PRAD (prostate) vs. others | ITGA8, ITGB6 | 0.981 ± 0.004 |
STAD (stomach) vs. others | ITGB4, ITGB5, ITGA6, ITGA1 | 0.930 ± 0.006 |
TGCT (testis) vs. others | ITGA2B | 0.991 ± 0.002 |
TCGA multiclass** | ITGA3, ITGB6, ITGA2B, ITGB8 | 0.855 ± 0.008 |
*Mean accuracy of model including all integrins as input features.
**For the TCGA multiclass classifier, only the top four features are listed here as the model accuracy did not reach 0.9 even with all integrins used as input features.
Integrin expression patterns can differentiate solid tumor types
We then followed a similar procedure to examine integrin expression in TCGA primary tumor samples derived from the same tissue types analyzed earlier. As in the healthy samples, expression varied across integrins and across tumor tissues (Fig. 3a). The t-SNE plot for 9 tumor types (there are two types of lung cancer: lung squamous cell carcinoma and lung adenocarcinoma) shows that the samples are relatively well separated based on their tissue lineage (Fig. 3b). However, they appear to be less clearly delineated than in the healthy tissues (Fig. 2b). This observation was confirmed by a multiclass RF model trained/tested on the tumor samples, which had an accuracy of 0.86 ± 0.01, below the 0.95 achieved for healthy samples (Fig. 3c). The lower accuracy for the tumor samples can be mostly explained by the model tending to incorrectly classify samples as breast cancer (Supplementary Fig. 5). One-vs-all classifiers were able to predict the origin of tumor samples with accuracy close to that of healthy samples for many tissues (Table 1). In agreement with the multiclass RF confusion matrix, breast cancer had the lowest accuracy among the one-vs-all models. In general, the feature importance of the integrins in the tumor RF models (Fig. 3d,e; Supplementary Figs. 6 and 7) indicated similar behavior to that of the healthy samples (e.g., no integrins had high feature importance in the multiclass model), although the specific integrins with high feature importance were different in many cases.
Fig. 3.
Variation of integrin expression across cancer types (a) Mean expression of integrins in 9 cancer types in the TCGA dataset. The middle line of each box shows the mean of means. (b) t-SNE plot based on integrin expression in cancer shows clustering of samples based on cancer type. (c) Feature importance of integrins in a multiclass Random Forest model with a 50%/50% training/test split that classifies primary tumor samples by cancer type. The magenta line shows the model accuracy as features are added one-by-one from the most important feature to the least. The accuracy of the model using all 27 integrins is 0.855 ± 0.008 and AUROC is 0.982 ± 0.002. (d) Violin plot of expression across cancers of ITGA3, the top feature of the multiclass RF model and the one-vs-all RF models for BRCA and LUAD. (e) Violin plot of expression across cancers for ITGB6, the second ranked feature of the multiclass RF model and the top feature of one-vs-all RF models for LIHC and PAAD.
Taken together, the results from this section indicate that integrin expression patterns can be used to classify samples by tissue for both healthy and tumor tissues. In some cases, the success of the Random Forest models is due to the same integrins when classifying both healthy tissues and the tumors derived from these tissues (Table 1). For example, in the one-vs-all models for prostate (ITGA8) and testis (ITGA2B), the same integrin was identified as the most important feature for classifying both healthy and tumor samples. However, the specific integrins with the highest feature importance in corresponding models were usually different, as, for example, there was no overlap in the top five integrins in the multiclass RF models for healthy and tumor samples.
Changes in integrin expression patterns from healthy tissue to tumor samples
To gain more insight into how integrin expression patterns change from healthy tissues to tumors, we looked at the difference between the mean expression of each integrin in GTEx healthy tissue samples and corresponding TCGA primary tumor samples (Fig. 4). The two subtypes of healthy colon tissue and the two lung tumor types were combined in both cases. As expected, the relationship between the development of cancer and integrin expression is complex, with cancer resulting in expression changes that vary across integrins for a given tissue and across tissues for a given integrin in most cases. One exception to this is pancreas, in which all integrins had higher expression in cancer than in healthy tissue, a behavior that has been previously reported16,53–56. There are also some integrins (e.g., ITGA7, ITGA10, and ITGA2B) whose expression decreased in tumors for all tissues except pancreas and integrins (e.g., ITGA2 and ITGA11) whose expression increased in tumors for most tissues. To compare the variation in integrin expression, we calculated the standard deviation in mean expression for each integrin across tissues for GTEx samples and across tumors for TCGA (Supplementary Fig. 8). This standard deviation was higher for GTEx samples than TCGA samples for 22 of the integrins. Therefore, there was reduced variation in integrin expression across cancers for most integrins in comparison with healthy tissue counterparts.
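The comparison of variation across tissues can be sketched as below. The tables of mean expression are hypothetical (three tissues, two integrins), and the sample standard deviation (pandas default, ddof = 1) is an assumption, since the text does not state which estimator was used.

```python
import pandas as pd

# Hypothetical mean expression per tissue (rows) and integrin (columns).
# Values are invented for illustration only.
gtex_means = pd.DataFrame(
    {"ITGA8": [2.0, 6.0, 4.0], "ITGB1": [7.0, 7.0, 7.0]},
    index=["breast", "liver", "lung"],
)
tcga_means = pd.DataFrame(
    {"ITGA8": [3.0, 5.0, 4.0], "ITGB1": [6.0, 8.0, 7.0]},
    index=["breast", "liver", "lung"],
)

# Standard deviation of the tissue means, computed separately per integrin.
gtex_sd = gtex_means.std(axis=0)
tcga_sd = tcga_means.std(axis=0)

# Count integrins whose across-tissue variation is larger in healthy tissue.
n_higher_in_healthy = int((gtex_sd > tcga_sd).sum())
print(pd.DataFrame({"GTEx SD": gtex_sd, "TCGA SD": tcga_sd}))
print("integrins with higher SD in GTEx:", n_higher_in_healthy)
```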
Fig. 4.
Impact of cancer development on integrin expression. The heatmap shows differences between the mean expression of an integrin in GTEx healthy tissue samples and the mean expression of that integrin in corresponding TCGA primary tumor samples. Red indicates higher expression in tumor samples, while blue indicates lower expression in tumor samples.
Integrin expression differentiates healthy and cancer breast tissue samples
To further investigate the roles of integrins in cancer development, we focused our attention on breast cancer. We tested if integrin expression could distinguish healthy tissue samples from tumor samples. RF classifiers were trained to classify healthy versus tumor samples using two comparisons: (i) patient-matched tumor/normal tissue samples from TCGA BRCA (Fig. 5a) and (ii) TCGA BRCA primary tumor and GTEx healthy breast samples (Fig. 5b). In both cases, the RF classifier reached a high accuracy, around 0.95 for the patient-matched case and 0.99 for TCGA BRCA/GTEx. The top feature identified in both RF classifiers was ITGA7. ITGA7 and the other top features in these RF models tended to have lower expression in tumors than in healthy tissue in breast samples (Fig. 5c) and in several other organs (Fig. 4).
Fig. 5.
Integrin expression in healthy and cancer breast tissue. Feature importance and increase in accuracy with number of features for binary Random Forest models classifying samples as healthy or cancer for (a) primary tumor and solid tissue normal samples in the TCGA BRCA patient-matched data (accuracy of the model using all 27 integrins is 0.954 ± 0.020 and AUROC is 0.991 ± 0.006) and (b) TCGA BRCA primary tumor samples and GTEx healthy tissue samples (accuracy of the model using all 27 integrins is 0.993 ± 0.004 and AUROC is 0.999 ± 0.001). (c) Violin plots of the expression of integrins with high feature importance in the RF model for TCGA BRCA primary tumor and GTEx normal tissue samples.
Supplementary Fig. 9 presents similar results for RF classifiers of normal/tumor samples for the other tissue types investigated here (pancreas and testis were excluded due to having very few patient-matched samples). Breast tissue was unique among the tissues, as it was the only tissue with the same top feature in the RF classifiers trained using both the patient-matched TCGA and the TCGA/GTEx data. We investigated how the ability of ITGA7 to classify breast samples as healthy/cancer compared with two gene sets whose expression has been shown to be related to cancer development. Specifically, we created RF healthy/cancer breast sample classifiers using ITGA7 expression in combination with the PAM50 gene set33–35 (Supplementary Fig. 10a) or a set of genes from Donato et al.57 that are differentially expressed between hypoxic circulating tumor cell (CTC) clusters and normoxic CTCs (Supplementary Fig. 10b). In both classifiers, ITGA7 had the highest feature importance, ranking above all genes in both the PAM50 and Donato sets.
Integrin co-expression in breast cancer
To further investigate changes in integrin expression in breast cancer, we examined co-expression relationships involving integrins in the GTEx breast and TCGA BRCA (primary tumor) datasets. Gene co-expression identifies genes with similar expression profiles that may indicate functional relationships or co-regulation58,59. We studied co-expression between integrins and created networks of co-expressed integrin-integrin pairs in healthy (Fig. 6a) and cancerous (Fig. 6b) breast tissue. These co-expression networks were based solely on correlation in expression and did not include any previous biological knowledge, such as integrins that are known to form dimers. For both healthy tissues and cancers, the co-expression of integrins is common, with both healthy and cancer networks having over 20 co-expressed integrin-integrin pairs. Many of the co-expressed integrin-integrin pairs in both datasets have been previously linked in associations included in STRING-db44 (Supplementary Tables 2 and 3).
Fig. 6.
Integrin co-expression relationships in healthy breast tissue and breast cancer. Integrin-integrin co-expression networks in the (a) GTEx (breast) and (b) TCGA BRCA primary tumor datasets, with edges indicating pairs that have Pearson R ≥ 0.6 in the dataset. Green edges indicate integrin pairs that are co-expressed in both healthy and cancer datasets. Red edges indicate integrin pairs whose co-expression was strongly impacted by cancer (i.e., the difference between the R value in the healthy and cancer datasets was > 0.4).
Despite the general similarity in the structure of the co-expression networks for the two conditions (in terms of nodes and edges), cancer does greatly impact integrin-integrin co-expression. This impact can be observed through examination of the identities of the integrins in the co-expressed pairs, as ~ 90% of the co-expressed integrin pairs are not conserved (i.e., Pearson R ≥ 0.6 in either healthy or cancer samples, but not both) across the two conditions. In many cases, the difference in the correlation coefficient of an integrin-integrin pair between healthy and cancer samples is large (integrin pairs connected by red edges in Fig. 6; Supplementary Tables 2 and 3). For example, ITGA3 and ITGB8 are highly co-expressed in healthy tissue (R = 0.72), but this relationship is absent in cancer (R = 0.00). Cancer does not disrupt the co-expression of all integrin-integrin pairs, as four co-expression relationships (ITGAM-ITGB2 and connections between ITGA1, ITGAV, and ITGB1; shown with green edges in Fig. 6) are conserved across the two networks (i.e., R ≥ 0.6 in both healthy and cancer samples). To examine if the impact of cancer on integrin co-expression was found more broadly, we investigated the co-expression between integrins and all protein coding genes (Supplementary Fig. 11). The integrins in conserved integrin-integrin pairs (e.g., ITGAV, ITGAM, and ITGB1) tended to have more integrin-protein coding gene relationships that were conserved in healthy and cancer samples than the other integrins. Thus, there was agreement between the conservation of integrin-integrin co-expression and the extent of conservation of integrin-protein coding gene co-expression. To examine if these changes in co-expression could be related to changes in the function of integrins in cancer, we performed functional enrichment analysis of the co-expressed protein coding genes for select integrins (ITGA2, ITGA10, and ITGAM; Supplementary Table 4). 
For ITGA2 and ITGA10, two integrins whose co-expression relationships are greatly disrupted by cancer (Supplementary Fig. 11), many of the highly enriched GO terms in healthy samples show much less enrichment (or are not enriched) in cancer samples. While some terms, such as “regulation of DNA-templated transcription”, are enriched for co-expressed genes in both GTEx and TCGA samples, other terms, such as “cell–cell adhesion” for ITGA2 and “cytoplasmic translation” for ITGA10, are significantly enriched in only GTEx samples (Supplementary Table 4). In contrast, ITGAM, which has relatively consistent co-expressed protein coding genes in healthy (GTEx breast) and cancer (TCGA primary tumor) samples (Supplementary Fig. 11), also has relatively conserved GO categories in healthy and cancer samples (Supplementary Table 4).
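The distinction drawn in Fig. 6 between conserved pairs (R ≥ 0.6 in both conditions) and strongly impacted pairs (difference in R > 0.4) can be sketched as below. The ITGA3-ITGB8 values (0.72 and 0.00) are those quoted in the text; the ITGAM-ITGB2 values are hypothetical placeholders chosen only to satisfy R ≥ 0.6 in both conditions, and the helper `corr_matrix` is introduced for this example. Restricting the impacted category to pairs that are co-expressed in at least one condition is also an assumption of this sketch.

```python
import numpy as np
import pandas as pd

genes = ["ITGAM", "ITGB2", "ITGA3", "ITGB8"]

def corr_matrix(pair_values):
    """Build a symmetric correlation table from {(gene_a, gene_b): R}."""
    matrix = pd.DataFrame(np.eye(len(genes)), index=genes, columns=genes)
    for (a, b), r in pair_values.items():
        matrix.loc[a, b] = r
        matrix.loc[b, a] = r
    return matrix

# ITGA3-ITGB8 values are from the text; ITGAM-ITGB2 values are hypothetical.
healthy_r = corr_matrix({("ITGAM", "ITGB2"): 0.80, ("ITGA3", "ITGB8"): 0.72})
cancer_r = corr_matrix({("ITGAM", "ITGB2"): 0.75, ("ITGA3", "ITGB8"): 0.00})

conserved, impacted = [], []
for i, a in enumerate(genes):
    for b in genes[i + 1:]:
        r_h, r_c = healthy_r.loc[a, b], cancer_r.loc[a, b]
        if r_h >= 0.6 and r_c >= 0.6:
            conserved.append((a, b))   # green edges: co-expressed in both
        elif max(r_h, r_c) >= 0.6 and abs(r_h - r_c) > 0.4:
            impacted.append((a, b))    # red edges: large change in R

print("conserved:", conserved)
print("strongly impacted:", impacted)
```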
Integrin expression patterns in metastatic breast cancer
Metastasis is the major cause of tumor-related deaths60, but the biological profiles associated with metastasis, especially at distant sites, are not well understood61. Recently, large-scale projects, such as AURORA, have begun to investigate gene expression in metastatic samples, and we used data from AURORA to compare integrin expression in primary breast tumors with that in all metastasis sites (Fig. 7a). The expression of some integrins (e.g., ITGAX, ITGBL1, and ITGAL) was significantly lower in metastatic samples. We also created a t-SNE plot to visualize whether integrin expression could distinguish primary and metastatic samples and found that there is some clustering of the metastatic samples (Fig. 7b). A Random Forest model trained to classify samples as primary or metastatic tumors based on integrin expression had an accuracy of 0.78 (Fig. 7c). However, the F1 score for primary tumor samples was relatively low (0.65 for primary tumor and 0.84 for metastasis samples), and the model tended to improperly classify primary tumor samples as metastatic. This result may be due to sample imbalance, as there were 79 metastasis samples and only 44 primary tumor samples. The integrin-based RF model performed similarly to models based on expression of the PAM50 (mean accuracy ~ 0.80, Supplementary Fig. 13a) and Donato gene groups (mean accuracy ~ 0.79, Supplementary Fig. 13b).
Fig. 7.
Integrin expression in metastatic breast cancer. (a) Violin plots of integrin expression in primary tumor (breast) and all metastasis sites combined from the AURORA dataset. (b) t-SNE plot based on integrin expression in AURORA. (c) Feature importance plot and accuracy as the number of features increases for a Random Forest model classifying primary breast tumor and metastasis samples (accuracy is 0.78 ± 0.046, AUROC = 0.84 ± 0.041). The magenta line shows model accuracy as each feature is added from highest to lowest importance.
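The classification analysis summarized in Fig. 7c can be illustrated with a minimal sketch. It is not the code used in this study: the expression matrix is synthetic, the number of features (26), the shifted features, the number of trees, and the 5-fold cross-validation scheme are all assumptions made for illustration. Only the class sizes (44 primary and 79 metastatic samples) come from the text. The sketch reports overall accuracy, the per-class F1 scores that reveal the effect of class imbalance, and accuracy as features are added from highest to lowest importance.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

rng = np.random.default_rng(0)

# Synthetic stand-in for an integrin expression matrix (NOT AURORA data).
# Class sizes follow the text; the feature count is a placeholder.
n_primary, n_met, n_features = 44, 79, 26
X = rng.normal(size=(n_primary + n_met, n_features))
y = np.array([0] * n_primary + [1] * n_met)  # 0 = primary, 1 = metastasis
X[y == 1, :3] -= 1.0  # hypothetical: three features are lower in metastases


def make_rf():
    """Return a fresh Random Forest (hyperparameters are assumptions)."""
    return RandomForestClassifier(n_estimators=200, random_state=0)


cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Out-of-fold predictions give one prediction per sample.
pred = cross_val_predict(make_rf(), X, y, cv=cv)
accuracy = accuracy_score(y, pred)

# Per-class F1: the minority class (primary) is the one to watch.
f1_primary, f1_met = f1_score(y, pred, average=None, labels=[0, 1])

# Rank features by importance. Ranking on the full data set is a
# simplification; a stricter analysis would rank within each fold.
importances = make_rf().fit(X, y).feature_importances_
order = np.argsort(importances)[::-1]

# Accuracy as features are added from highest to lowest importance.
cumulative_accuracy = [
    accuracy_score(y, cross_val_predict(make_rf(), X[:, order[:k]], y, cv=cv))
    for k in range(1, 6)
]

print(f"accuracy = {accuracy:.2f}")
print(f"F1 primary = {f1_primary:.2f}, F1 metastasis = {f1_met:.2f}")
print("accuracy with top-k features:", np.round(cumulative_accuracy, 2))
```

Reporting F1 per class rather than a single averaged score makes it easier to see when a model favors the majority class, which is the behavior described above for primary tumor samples.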
While the results discussed to this point compared primary tumor samples with all metastatic samples combined, we also compared integrin expression in primary tumor with samples from specific metastasis sites with the largest number of samples: liver, brain, lymph node, and lung (Fig. 8a). The difference in expression between primary samples and at least one of these metastasis sites was significant for 10 integrins, including ITGA4, ITGA11, and ITGA7 (Supplementary Table 5). In general, the expression of integrins was lower in the metastatic samples from all four of the sites (Fig. 8a), a result in agreement with expression differences between primary tumor samples and all metastatic samples (Fig. 7a).
Fig. 8.
Comparison of integrin expression in breast cancer that has metastasized to different sites. (a) Heatmap of differences between the mean expression of integrins in metastatic samples (Liver, Brain, Lymph node and Lung) and in primary tumor. (b) Difference in mean expression of integrins in (metastatic) liver and primary breast tumors (AURORA), liver and breast tissue in GTEx (healthy) samples, and liver and breast (primary tumor) in TCGA samples. (c) Difference in mean expression of integrins in (metastatic) lung and primary breast tumors (AURORA), lung and breast tissue in GTEx (healthy) samples, and lung and breast (primary tumor) in TCGA samples. In (b, c), asterisks (*) indicate significance from independent t-test after Bonferroni correction.
The lower expression of integrins in metastasis was particularly pronounced in liver metastasis, as over 20 of the integrins had lower expression in samples from liver metastasis than in primary tumor. To put this result in context, we also compared integrin expression in breast and liver samples from the GTEx and TCGA data sets (Fig. 8b). In these comparisons, expression in liver samples was also lower than in breast samples for the majority of integrins, indicating that the difference in integrin expression in breast cancer that has metastasized to liver mirrors the differences between breast and liver in healthy and primary tumor samples. We performed a similar comparison of lung and breast tissue (Fig. 8c) and found less consistent results, as there was a mix of cases where integrin expression was higher and lower in lung tissue than breast tissue. We note that the data in Fig. 8b and c can be used to compare changes in integrin expression (i.e. whether expression was higher or lower in liver/lung than in breast) but should not be used to compare the magnitudes of these changes as the AURORA and GTEx/TCGA data were not normalized together.
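The per-integrin comparison behind Fig. 8 (difference in mean expression, with significance from an independent t-test after Bonferroni correction) can be sketched as follows. This is an illustrative example on synthetic data, not the analysis code of this study: the gene subset, the number of liver metastasis samples (20), the expression values, and the 0.05 threshold are assumptions.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)

# A few integrin names taken from the text; the values are synthetic.
genes = ["ITGA4", "ITGA7", "ITGA11", "ITGAV"]
primary = rng.normal(loc=5.0, size=(44, len(genes)))    # 44 primary tumors
liver_met = rng.normal(loc=5.0, size=(20, len(genes)))  # hypothetical count
liver_met[:, :3] -= 2.0  # hypothetical: first three genes lower in metastasis

# Difference in mean expression (metastasis minus primary), one per gene.
diff = liver_met.mean(axis=0) - primary.mean(axis=0)

# Independent two-sample t-test for each gene (column-wise).
t_stat, p_raw = ttest_ind(liver_met, primary, axis=0)

# Bonferroni correction: multiply by the number of tests, cap at 1.
p_bonf = np.minimum(p_raw * len(genes), 1.0)
significant = p_bonf < 0.05  # threshold is an assumption

for gene, d, p, sig in zip(genes, diff, p_bonf, significant):
    print(f"{gene}: diff = {d:+.2f}, adjusted p = {p:.3g}{' *' if sig else ''}")
```

When the same calculation is repeated on data sets that were not normalized together, only the sign of `diff` should be compared across them, in line with the caveat above about magnitudes.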
Discussion
Integrins have long been established as key players in cancer, but the specifics of relationships between integrin expression patterns and cancer and how they vary across tissues and cancer types have yet to be fully elucidated. In this work, we attempted to unravel some of these relationships by using machine learning methods to investigate integrin expression patterns in healthy and cancer samples from 8 solid tissues. We found that integrin expression patterns had sufficient variation across tissues and cancer types to enable the creation of Random Forest models that could classify samples by tissue/cancer type or as healthy/cancer using integrin expression alone. A wide array of integrins were important features in these classifiers, and, for most tissues, different integrins were the important features for healthy tissues and their corresponding cancers. There were a couple of exceptions to this trend, as ITGA8 and ITGA2B were the most important integrins in the classification of both healthy and cancer samples in prostate and testis, respectively, and ITGB4 was important in the classification of both healthy and cancer samples in stomach and colon (see Table 1).
We focused on breast cancer and investigated the impact of cancer development on the co-expression relationships of integrins and how integrin expression patterns change during metastasis. The co-expression of most integrins was significantly altered in breast tissue after cancer development, but there was a group of integrins, including ITGAM and ITGB2, whose co-expression relationships were largely conserved between healthy and cancer breast samples. Functional enrichment analysis with integrins that have significantly altered co-expression (e.g. ITGA2 and ITGA10) showed corresponding alterations in GO categories between healthy and cancer (Supplementary Table 4). For ITGAM, on the other hand, enriched functions were relatively consistent (Supplementary Table 4). Metastasis was shown to result in significantly lower expression of some integrins, including ITGA4, ITGA11, and ITGA7. This reduced expression in metastasis was particularly notable for metastasis to the liver, behavior similar to the lower expression of integrins in liver and liver cancer in comparison with healthy breast and breast cancer, respectively.
One of the major results of this work is that, with some exceptions (e.g., the increased expression of integrins in pancreatic cancer compared to healthy pancreas tissue), the expression of integrins cannot be considered as a unified block. Instead, the behavior of individual integrins must be considered. To this end, we summarized important results identified in this work for each integrin (Supplementary Table 6). Some examples of specific integrins whose expression had interesting behavior include:
ITGA7
The RF models in this work identified ITGA7 as the key integrin that enabled the classification of breast tissue samples as healthy or cancer, corresponding with previous studies that identified ITGA7 as a potential predictive marker of chemotherapy response and found that it was involved in regulating migration and invasion in breast cancer19,43. The expression of ITGA7 was found to be lower in breast cancer in comparison with corresponding healthy tissues, a result that may be explained by previous findings that the ITGA7 gene promoter CpG island, along with promoter regions of several other integrins (e.g., ITGA1, ITGA4), is abnormally hypermethylated in breast cancer samples62. Reduced expression of ITGA7 in cancer was not unique to breast, as its expression was lower in most cancers in comparison with corresponding healthy tissues. ITGA7 was also found to be the most important integrin when classifying samples as healthy or cancer in liver, lung, and prostate (Supplementary Fig. 9). Furthermore, ITGA7 had significantly lower expression in breast cancer that metastasized to liver than in primary breast tumors.
ITGA3
Associations between expression of ITGA3 and cancer risk are complex, as high ITGA3 expression has been associated with an increased risk in some cancers and a reduced risk in others22,23. In breast cancer, for example, ITGA3 was shown to have a methylated promoter region, suggesting gene silencing, and high ITGA3 expression was found to be associated with improved relapse-free survival (RFS) among breast cancer patients23. Interestingly, ITGA3 was identified as the integrin with the highest feature importance in both the multi-class RF model that classified cancer samples by cancer type and the one-vs-all classifiers for breast and lung (LUAD) cancers (Fig. 3 and Table 1). The importance of ITGA3 in classifying samples was unique to cancers, as it was not highly ranked in any models that classified healthy samples by tissue. The difference in the importance of ITGA3 in classifying healthy and cancer samples was somewhat surprising given the modest differences between mean ITGA3 expression in healthy tissues and their corresponding cancers (Fig. 4). However, we did find that cancer development had a strong impact on the co-expression relationships of ITGA3 in breast samples (Fig. 6 and Supplementary Tables 2 and 3).
ITGAV
ITGAV has been investigated as a therapeutic target in cancers for over 20 years63,64. However, these studies have yet to translate to therapeutic benefits for patients, and it has been suggested that future targeting of ITGAV-containing heterodimers, such as αvβ3, relies on understanding their expression in individual tumors64. For the tissues/cancers investigated in this work, the expression of ITGAV was relatively consistent across tissues and healthy/cancer conditions, and ITGAV was not identified as an integrin with a high feature importance in any of the RF models that classified samples by tissue or disease status. Additionally, its co-expression was relatively well conserved across healthy and cancer samples when considering both its co-expression with other integrins (i.e., 2 of the 4 conserved integrin-integrin co-expression relationships involved ITGAV) and with all protein coding genes (Supplementary Fig. 11).
ITGB4
ITGB4 has been reported to be up-regulated in colon cancer and associated with overall survival, giving it a potential role as both a therapeutic target and prognostic marker for this cancer type65. The analysis in this work also showed that the expression of ITGB4 increased in colon cancer and, furthermore, identified ITGB4 as the second most important feature in distinguishing both healthy colon and colon cancer samples from other healthy tissues/cancer. Additionally, ITGB4 was the integrin that had the largest increase in expression in lung cancer compared to healthy lung tissue, corresponding with previous studies that identified ITGB4 as a potential candidate marker for tumor status and a prognostic indicator of small cell lung carcinoma (SCLC)66.
While this study was able to characterize integrin expression patterns across many tissues and their corresponding cancers, we recognize that it suffers from several limitations and presents opportunities for future analysis. First, our analysis focused on bulk RNA-Seq datasets and did not consider single-cell expression. Therefore, we cannot determine the extent to which the observed differences in expression patterns are due to differences in the mixtures of cell types in different samples. Second, this study analyzed only mRNA expression and did not include measures of integrin protein amounts in cells or in functional heterodimers on cell surfaces. However, we believe that the results presented here can be used to create hypotheses for future bioinformatic and experimental analysis. As the number of single-cell and expression datasets continues to grow, we plan on using these data sets to continue to investigate relationships between integrin expression patterns and cancer.
In conclusion, the current study reports a systematic examination of integrin expression in healthy tissues, their corresponding solid tumors, and metastatic breast tumors using machine learning and other statistical tools. This study tracks the variation in integrin expression patterns across healthy tissues and cancers to provide valuable foundational information for integrin-based therapy development.
Acknowledgements
The authors wish to thank Dr Janusz Rak at McGill University for inspiring this study and providing insightful comments. A preprint67 for this article is available.
Author contributions
The authors wish it to be known that H.S. and S.G. contributed equally and are joint first authors. Co-first authors can prioritize their own names when citing or referring to this paper in their resumé/portfolio. J.D.Z., X.H., and Y.W. initially conceptualized the study. H.S. and S.G. conducted all analysis, wrote Python code, and prepared all figures and tables, with input from C.L., Y.W., Y.J., Q.C., and J.D.Z. H.S. and S.G. conducted all statistical analysis with input from Y.W., Y.J., and J.D.Z. H.S. and S.G. prepared the first draft of the manuscript, and H.S., S.G., X.H., Y.J., Q.C., J.D.Z., and Y.W. edited the draft.
Funding
We acknowledge the receipt of partial funding from the University of Memphis through the Memphis-Meharry partnership program. X.H. acknowledges funding from the National Cancer Institute of the National Institutes of Health under award number 1R15CA280765-01. Q.C. acknowledges financial support from the National Institute of General Medical Sciences of the National Institutes of Health under award number 1R35GM145206.
Data availability
This study analyzes publicly available datasets (GTEx, TCGA, AURORA-US). GTEx and TCGA data used in this study are available as the “gene expression RNAseq: RSEM tpm (UCSC Toil RNA-seq Recompute)” file in the combined TCGA-TARGET-GTEx cohort of the UCSC XenaBrowser (https://xenabrowser.net/datapages/?cohort=TCGA%20TARGET%20GTEx&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443). The AURORA dataset is available as the AURORA Upper Quantile Normalized (UQN) RNA-Seq dataset from the GEO repository (GSE209998).
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Hossain Shadman and Saghar Gomrok.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-025-89497-w.
References
- 1. Molecular Biology of the Cell. (Garland Science, 2002).
- 2. Kadry, Y. A. & Calderwood, D. A. Structural and signaling functions of integrins. Biochim. Biophys. Acta BBA Biomembr. 1862, 183206 (2020).
- 3. Bouvard, D., Pouwels, J., De Franceschi, N. & Ivaska, J. Integrin inactivators: Balancing cellular functions in vitro and in vivo. Nat. Rev. Mol. Cell Biol. 14, 430–442 (2013).
- 4. Hamidi, H. & Ivaska, J. Every step of the way: Integrins in cancer progression and metastasis. Nat. Rev. Cancer 18, 533–548 (2018).
- 5. Desgrosellier, J. S. & Cheresh, D. A. Integrins in cancer: biological implications and therapeutic opportunities. Nat. Rev. Cancer 10, 9–22 (2010).
- 6. Yousefi, H. et al. Understanding the role of integrins in breast cancer invasion, metastasis, angiogenesis, and drug resistance. Oncogene 40, 1043–1063 (2021).
- 7. Hou, S., Wang, J., Li, W., Hao, X. & Hang, Q. Roles of integrins in gastrointestinal cancer metastasis. Front. Mol. Biosci. 8, 708779 (2021).
- 8. Mani, K. et al. Causes of death among people living with metastatic cancer. Nat. Commun. 15, 1519 (2024).
- 9. Seyfried, T. N. & Huysentruyt, L. C. On the origin of cancer metastasis. Crit. Rev. Oncog. 18, 43–73 (2013).
- 10. Chaffer, C. L. & Weinberg, R. A. A perspective on cancer cell metastasis. Science 331, 1559–1564 (2011).
- 11. Chen, J.-R., Zhao, J.-T. & Xie, Z.-Z. Integrin-mediated cancer progression as a specific target in clinical therapy. Biomed. Pharmacother. 155, 113745 (2022).
- 12. Takada, Y., Ye, X. & Simon, S. The integrins. Genome Biol. 8, 215 (2007).
- 13. Koivisto, L., Heino, J., Häkkinen, L. & Larjava, H. Integrins in wound healing. Adv. Wound Care 3, 762–783 (2014).
- 14. Johnson, M. S., Lu, N., Denessiouk, K., Heino, J. & Gullberg, D. Integrins during evolution: Evolutionary trees and model organisms. Biochim. Biophys. Acta BBA Biomembr. 1788, 779–789 (2009).
- 15. Sökeland, G. & Schumacher, U. The functional role of integrins during intra- and extravasation within the metastatic cascade. Mol. Cancer 18, 12 (2019).
- 16. Li, J. et al. Integrin β1 in pancreatic cancer: Expressions, functions, and clinical implications. Cancers 14, 3377 (2022).
- 17. Bhandari, A. et al. ITGA7 functions as a tumor suppressor and regulates migration and invasion in breast cancer. Cancer Manag. Res. 10, 969–976 (2018).
- 18. Primac, I. et al. Stromal integrin α11 regulates PDGFRβ signaling and promotes breast cancer progression. J. Clin. Invest. 129, 4609–4628 (2019).
- 19. Cooper, J. & Giancotti, F. G. Integrin signaling in cancer: Mechanotransduction, stemness, epithelial plasticity, and therapeutic resistance. Cancer Cell 35, 347–367 (2019).
- 20. Chen, Z., Han, F., Du, Y., Shi, H. & Zhou, W. Hypoxic microenvironment in cancer: Molecular mechanisms and therapeutic interventions. Signal Transduct. Target. Ther. 8, 70 (2023).
- 21. Hoshino, A. et al. Tumour exosome integrins determine organotropic metastasis. Nature 527, 329–335 (2015).
- 22. Gui, J. et al. Identifying the prognosis implication, immunotherapy response prediction value, and potential targeted compound inhibitors of integrin subunit α3 (ITGA3) in human cancers. Heliyon 10, e24236 (2024).
- 23. Li, Y. et al. ITGA3 is associated with immune cell infiltration and serves as a favorable prognostic biomarker for breast cancer. Front. Oncol. 11, 658547 (2021).
- 24. Dos Santos, P. B., Zanetti, J. S., Ribeiro-Silva, A. & Beltrão, E. I. Beta 1 integrin predicts survival in breast cancer: A clinicopathological and immunohistochemical study. Diagn. Pathol. 7, 104 (2012).
- 25. Felding-Habermann, B. et al. Integrin activation controls metastasis in human breast cancer. Proc. Natl. Acad. Sci. 98, 1853–1858 (2001).
- 26. Ross, M. H. et al. Bone-induced expression of integrin β3 enables targeted nanotherapy of breast cancer metastases. Cancer Res. 77, 6299–6312 (2017).
- 27. Lonsdale, J. et al. The genotype-tissue expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
- 28. The Cancer Genome Atlas Research Network et al. The cancer genome atlas pan-cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
- 29. Garcia-Recio, S. et al. Multiomics in primary and metastatic breast tumors from the AURORA US network finds microenvironment and epigenetic drivers of metastasis. Nat. Cancer 10.1038/s43018-022-00491-x (2022).
- 30. Shipp, M. A. et al. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat. Med. 8, 68–74 (2002).
- 31. Abbas, M. & El-Manzalawy, Y. Machine learning based refined differential gene expression analysis of pediatric sepsis. BMC Med. Genomics 13, 122 (2020).
- 32. Khalsan, M. et al. A survey of machine learning approaches applied to gene expression analysis for cancer prediction. IEEE Access 10, 27522–27534 (2022).
- 33. Parker, J. S. et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J. Clin. Oncol. 27, 1160–1167 (2009).
- 34. Wallden, B. et al. Development and verification of the PAM50-based Prosigna breast cancer gene signature assay. BMC Med. Genomics 8, 54 (2015).
- 35. Prat, A., Parker, J. S., Fan, C. & Perou, C. M. PAM50 assay and the three-gene model for identifying the major and clinically relevant molecular subtypes of breast cancer. Breast Cancer Res. Treat. 135, 301–306 (2012).
- 36. Baehner, F. L. The analytical validation of the Oncotype DX Recurrence Score assay. Ecancermedicalscience 10, 675 (2016).
- 37. Wittner, B. S. et al. Analysis of the mammaprint breast cancer assay in a predominantly postmenopausal cohort. Clin. Cancer Res. 14, 2988–2993 (2008).
- 38. Vivian, J. et al. Toil enables reproducible, open source, big biomedical data analyses. Nat. Biotechnol. 35, 314–316 (2017).
- 39. Goldman, M. J. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat. Biotechnol. 38, 675–678 (2020).
- 40. UCSC Xena. https://xenabrowser.net/datapages/.
- 41. Edgar, R. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30, 207–210 (2002).
- 42. Barrett, T. et al. NCBI GEO: Archive for functional genomics data sets—update. Nucleic Acids Res. 41, D991–D995 (2012).
- 43. Pedregosa, F. et al. Scikit-learn: Machine learning in Python. JMLR 12, 2825–2830 (2011).
- 44. Szklarczyk, D. et al. STRING v10: Protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43, D447–D452 (2015).
- 45. Gansner, E. R. & North, S. C. An open graph visualization system and its applications to software engineering. Softw. Pract. Exp. 30, 1203–1233 (2000).
- 46. Durinck, S. & Huber, W. biomaRt. Bioconductor 10.18129/B9.BIOC.BIOMART (2017).
- 47. Durinck, S. et al. BioMart and Bioconductor: A powerful link between biological databases and microarray data analysis. Bioinformatics 21, 3439–3440 (2005).
- 48. Durinck, S., Spellman, P. T., Birney, E. & Huber, W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protoc. 4, 1184–1191 (2009).
- 49. Ashburner, M. et al. Gene ontology: Tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
- 50. The Gene Ontology Consortium et al. The gene ontology knowledgebase in 2023. Genetics 224, 031 (2023).
- 51. Sherman, B. T. et al. DAVID: A web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 50, W216–W221 (2022).
- 52. Huang, D. W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009).
- 53. Cruz Da Silva, E., Dontenwill, M., Choulier, L. & Lehmann, M. Role of integrins in resistance to therapies targeting growth factor receptors in cancer. Cancers 11, 692 (2019).
- 54. Cruz-Monserrate, Z., Qiu, S., Evers, B. M. & O’Connor, K. L. Upregulation and redistribution of integrin α6β4 expression occurs at an early stage in pancreatic adenocarcinoma progression. Mod. Pathol. 20, 656–667 (2007).
- 55. Walsh, N., Clynes, M., Crown, J. & O’Donovan, N. Alterations in integrin expression modulates invasion of pancreatic cancer cells. J. Exp. Clin. Cancer Res. 28, 140 (2009).
- 56. Cai, H. et al. Overexpressed integrin alpha 2 inhibits the activation of the transforming growth factor β pathway in pancreatic cancer via the TFCP2-SMAD2 axis. J. Exp. Clin. Cancer Res. 41, 73 (2022).
- 57. Donato, C. et al. Hypoxia triggers the intravasation of clustered circulating tumor cells. Cell Rep. 32, 108105 (2020).
- 58. Saelens, W., Cannoodt, R. & Saeys, Y. A comprehensive evaluation of module detection methods for gene expression data. Nat. Commun. 9, 1090 (2018).
- 59. Hou, J. et al. Distance correlation application to gene co-expression network analysis. BMC Bioinform. 23, 81 (2022).
- 60. Riggio, A. I., Varley, K. E. & Welm, A. L. The lingering mysteries of metastatic recurrence in breast cancer. Br. J. Cancer 124, 13–26 (2021).
- 61. Tutzauer, J. et al. Gene expression in metastatic breast cancer—patterns in primary tumors and metastatic tissue with prognostic potential. Front. Mol. Biosci. 10, 1343979 (2024).
- 62. Strelnikov, V. V. et al. Abnormal promoter DNA hypermethylation of the integrin, nidogen, and dystroglycan genes in breast cancer. Sci. Rep. 11, 2264 (2021).
- 63. Alday-Parejo, B., Stupp, R. & Rüegg, C. Are integrins still practicable targets for anti-cancer therapy?. Cancers 11, 978 (2019).
- 64. Slack, R. J., Macdonald, S. J. F., Roper, J. A., Jenkins, R. G. & Hatley, R. J. D. Emerging therapeutic opportunities for integrin inhibitors. Nat. Rev. Drug Discov. 21, 60–78 (2022).
- 65. Li, M. et al. ITGB4 is a novel prognostic factor in colon cancer. J. Cancer 10, 5223–5233 (2019).
- 66. Li, G.-S. et al. ITGB4 serves as an identification and prognosis marker associated with immune infiltration in small cell lung carcinoma. Mol. Biotechnol. 66, 2956–2971 (2024).
- 67. Shadman, H. et al. A machine learning-based investigation of integrin expression patterns in cancer and metastasis. Arxiv 10.1101/2024.09.19.613933 (2024).