Journal of Biological Research. 2021 Feb 16;28:5. doi: 10.1186/s40709-021-00136-7

Genome-scale meta-analysis of breast cancer datasets identifies promising targets for drug development

Reem Altaf 1, Humaira Nadeem 1, Mustafeez Mujtaba Babar 2, Umair Ilyas 3, Syed Aun Muhammad 4
PMCID: PMC7885587  PMID: 33593445

Abstract

Background

Because breast cancer is highly heterogeneous, each subtype responds differently to treatment regimens. This has limited the therapeutic options for metastatic breast cancer and calls for the exploration of diverse therapeutic models that target tumor-specific biomarkers.

Methods

Differentially expressed breast cancer genes identified through extensive data mapping were studied for their interaction with other target proteins involved in breast cancer progression. The molecular mechanisms by which these signature genes are involved in breast cancer metastasis were also studied through pathway analysis. The potential drug targets for these genes were also identified.

Results

From 50 DEGs, 24 genes were identified based on fold change and p-value, and data curation of these genes shortlisted 8 potential gene signatures that can be used as candidate biomarkers for breast cancer. Network and pathway analysis clarified the role of these genes in breast cancer and their interaction with other signaling pathways involved in the progression of disease metastasis. The miRDB predictor identified potential miRNA targets for these genes that may be involved in breast cancer progression. Several FDA-approved drugs were identified for the signature genes, broadening the therapeutic options for breast cancer treatment.

Conclusion

The study clarifies the role of the signature genes and their interactions with other genes and signaling pathways. The miRNA predictions and the potential drugs identified will aid in assessing the role of these targets in breast cancer.

Keywords: Breast cancer, Microarray datasets, Pathway enrichment analysis, Gene ontology, miRNA, Drug-gene network

Background

Cancer has been one of the leading causes of death for the past several years and, according to American Cancer Society (ACS) statistics, is the second leading cause of mortality after cardiovascular, infectious and parasitic disorders. Breast cancer is one of the most commonly diagnosed life-threatening malignancies and remains the leading cause of cancer incidence and mortality in women globally [1].

Several factors contribute to the development of breast carcinoma. These include age, personal history of breast cancer, and reproductive, environmental and genetic factors. Increasing age raises the risk of breast cancer development [2]. A personal history of breast cancer also confers a greater risk of a second breast cancer, which can be ipsilateral or contralateral. A family history of breast cancer likewise raises the risk of cancer development in women. About 5–10% of women with breast cancer show an autosomal dominant inheritance, while 20–25% have a positive family history [3]. Genetic predisposition alleles conferring a 40–85% lifetime risk of breast cancer development include BRCA1 and BRCA2 mutations, TP53 mutations, PTEN, STK11, E-cadherin and neurofibromatosis (NF) [4].

The treatment strategies for breast cancer are largely determined by the status of the progesterone receptor, the estrogen receptor and the human epidermal growth factor receptor 2. Clinicopathological factors such as tumor grade, tumor size and lymph node status also determine the therapeutic plan. However, biomarkers of tumor invasion and metastasis are of profound importance for formulating new markers and treatment strategies for breast carcinomas. This will aid both current therapies and tumor prognosis [5].

With the aid of in silico bioinformatic approaches, the attainment of new treatment strategies has become easier. One such approach that has helped in identifying new markers in cancer therapy is cDNA differential analysis [6]. In this study, 12 datasets were downloaded to analyze gene expression profiles in breast cancer, and a functional analysis was performed to identify the differentially expressed genes (DEGs) between control and treated breast cancer samples. A genetic network was constructed, and pathway analysis and miRNA target identification were performed to understand the underlying molecular mechanisms and to identify potential therapeutic targets for breast cancer. Moreover, drug-gene network analysis was performed to identify potential drug targets for breast cancer.

Methods

Accession of gene expression data

The study focuses on the identification of potential breast cancer targets through a differential screening method. The breast cancer datasets were accessed from the Gene Expression Omnibus (GEO) database. The screening criteria were “organism: Homo sapiens” and “experiment type: expression profiling by array”. The Affymetrix GeneChip Human Genome U133 Plus 2.0 Array (CDF: Hs133P_Hs_ENST, version 10) (Affymetrix, Inc., Santa Clara, CA, 95051, USA) platform was used. Each dataset comprised a GEO accession number, platform, sample type, number of samples and gene expression data. The array platform and the hgu133plus2 probe annotation were used to identify the differentially expressed genes. The software R and the Bioconductor packages AffyQCReport, Affy, Annotate, AnnotationDbi, Limma, Biobase, AffyRNADegradation, hgu133plus2cdf, and hgu133a2cdf were used to perform the computational analysis [7].

Preprocessing and differential expression analysis of microarray datasets

The datasets were preprocessed by preparing the phenodata files for each dataset in a recognizable format [8]. Using R version 3.1.3, the Bioconductor arrayQualityMetrics package was utilized to normalize the data to a median expression level for each gene [7]. After normalization, background correction was done for perfect match (pm) and mismatch (mm) probes by Robust Multi-array Analysis (RMA). The method was used to eliminate artifacts and local noise. The expression value with a p-value < 0.15 was measured as marginal log transformation. Afterwards, summarization was performed by the RMA algorithm, which averages the probes within each probe set to obtain a summary of intensities.
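The Results section names quantile normalization as the normalization step. As an illustration only, the idea can be sketched in plain Python: every array is forced onto the same intensity distribution by replacing each value with the mean of the values that share its rank across arrays. This is not the Bioconductor implementation used in the study; the function name `quantile_normalize`, the toy intensities and the simple tie handling are assumptions made for the example.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of arrays (each a list of intensities).

    Illustrative sketch: ties are broken by position rather than averaged,
    which differs from production implementations.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must have the same number of probes")

    # Mean intensity at each rank, taken across all arrays.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[i] for col in sorted_cols) / len(columns)
                  for i in range(n)]

    normalized = []
    for col in columns:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two hypothetical arrays with three probes each.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(arrays))
```

After normalization both toy arrays contain the same set of values, only in different probe order, which is the property that removes array-wide systematic shifts.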

The quality of RNA in these microarray datasets was assessed by degradation analysis using the AffyRNADegradation package of Bioconductor [9]. Lastly, the DEGs in each dataset were identified by pairwise comparison, and the Benjamini–Hochberg method [10] was employed for multiple testing correction. The differentially expressed genes were shortlisted and ranked according to their p-values and resulting scores. The cutoff values applied to the moderated statistics were p-value ≤ 0.05, false discovery rate (FDR) < 0.05 and absolute log fold change |logFC| > 1 [11].
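The filtering step described above can be sketched as follows. The snippet takes per-gene p-values and log fold changes as given (it does not compute limma's moderated statistics), applies the Benjamini–Hochberg adjustment, and keeps genes that pass the three stated cutoffs. The function names and the toy gene records are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(rows, p_cut=0.05, fdr_cut=0.05, lfc_cut=1.0):
    """rows: list of (gene, logFC, p_value). Returns passing rows ranked by p-value."""
    fdr = benjamini_hochberg([p for _, _, p in rows])
    kept = [
        (gene, lfc, p, q)
        for (gene, lfc, p), q in zip(rows, fdr)
        if p <= p_cut and q < fdr_cut and abs(lfc) > lfc_cut
    ]
    return sorted(kept, key=lambda row: row[2])


# Hypothetical records: (gene, logFC, raw p-value).
toy = [
    ("GENE_A", 2.1, 0.001),
    ("GENE_B", -1.6, 0.004),
    ("GENE_C", 0.4, 0.003),   # fails the |logFC| cutoff
    ("GENE_D", 1.8, 0.20),    # fails the p-value cutoff
    ("GENE_E", -2.5, 0.012),
]
for gene, lfc, p, q in select_degs(toy):
    print(gene, lfc, p, round(q, 4))
```

Both up- and downregulated genes pass because the fold-change cutoff is applied to the absolute value.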

Data curation and cluster analysis

The shortlisted genes obtained through differential expression analysis were further screened to confirm their role in breast cancer using diverse data sources such as PubMed (http://www.ncbi.nlm.nih.gov/pubmed), MeSH (http://www.ncbi.nlm.nih.gov/mesh), OMIM (Online Mendelian Inheritance in Man) (http://www.ncbi.nlm.nih.gov/omim), and the PMC database (http://www.ncbi.nlm.nih.gov/pmc) [12]. Biomedical text mining helped in filtering significant disease-specific genes. The CIMminer tool was used to perform the cluster analysis based on the expression values in each dataset, using absolute Pearson correlation as the distance measure. The cluster analysis revealed variations in gene expression levels between control and treated replicates [13].
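To make the distance measure concrete, the sketch below computes a distance from the absolute Pearson correlation, taken here as 1 − |r|. This is one common definition and is assumed for illustration; the exact formula used internally by the clustering tool is not stated in this article. Under this definition, two genes with perfectly opposite expression profiles are treated as close, because only the strength of the correlation matters.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def abs_pearson_distance(x, y):
    """Assumed definition: distance = 1 - |r|."""
    return 1.0 - abs(pearson(x, y))


# Hypothetical expression profiles across four samples.
rising = [1.0, 2.0, 3.0, 4.0]
falling = [8.0, 6.0, 4.0, 2.0]
unrelated = [1.0, -1.0, -1.0, 1.0]

print(abs_pearson_distance(rising, falling))    # anti-correlated, so distance near 0
print(abs_pearson_distance(rising, unrelated))  # uncorrelated, so distance near 1
```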

Network analysis and identification of gene signatures

The protein–protein interaction network helped in identifying the interaction of each protein with other genes having different biological or molecular functions in a diseased state as compared to normal. The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) [14] and Human Annotated and Predicted Protein Interaction (HAPPI) databases [15] were used to evaluate the proteins that interacted with each other in breast cancer with a confidence score of 0.999. The visualization and analysis of molecular interactions of seeder genes with the target genes were done using Cytoscape (version 3.2.1, Temple Place, Suite 330, Boston, MA 02111-1307 USA) software. The role of target genes in breast cancer was mapped by OMIM, MeSH, and PMC databases to identify the breast cancer associated gene signatures whose dysregulation causes a pathological phenotype. A molecular sub-network of those genes that were associated with pathways of interest causing breast cancer was constructed. The topological network properties were calculated using Network Analyzer in Cytoscape [16]. The web-based tools Database for Annotation Visualization and Integrated Discovery (DAVID) [17] and FunRich [18] were used to study the biological functions of these genes including the gene ontology, functional annotation and pathway enrichment analysis [19, 20].
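As a rough illustration of the network step, the following sketch filters interaction edges by a confidence cutoff and computes node degree, one of the basic topological properties reported by network analysis tools. The gene pairs are interactions mentioned in this article, but the scores attached to them are placeholders invented for the example, and the function names are hypothetical.

```python
from collections import defaultdict


def build_network(edges, min_score):
    """Build an undirected adjacency map from (node_a, node_b, score) edges."""
    adjacency = defaultdict(set)
    for a, b, score in edges:
        if score >= min_score and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


def degree_table(adjacency):
    """Return (node, degree) pairs, highest degree first, ties by name."""
    return sorted(((node, len(nbrs)) for node, nbrs in adjacency.items()),
                  key=lambda item: (-item[1], item[0]))


# Placeholder scores; only the pairings come from the text.
edges = [
    ("ID4", "TCF4", 0.999),
    ("ID4", "NOTCH1", 0.999),
    ("NCOA1", "ESR1", 0.999),
    ("PDZK1", "ESR1", 0.999),
    ("NCOA1", "ESR2", 0.900),  # dropped by the 0.999 cutoff in this toy example
]

network = build_network(edges, min_score=0.999)
for node, degree in degree_table(network):
    print(node, degree)
```

A very strict cutoff keeps only the best-supported edges, at the cost of dropping nodes whose only interactions fall below it.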

miRNA target prediction

miRNAs are small non-coding RNAs considered as post-transcriptional regulators of several biological processes. Dysregulation of miRNAs leads to disruption of signaling pathways causing disease. The influence of miRNAs on gene targets is one beneficial approach to get a better understanding of disease etiology [21]. The miRNA targets of breast cancer related genes were predicted by miRDB target predictor (www.mirdb.org), an online database for miRNA target prediction and functional annotation. The miRNAs were selected based on the target score (≤ 99).

Integrated pathway modeling

The integrated and metabolic networks of breast cancer related source genes were analyzed and the correlation between test genes was observed. To recognize the underlying pathways involved in the progression of breast cancer, pathway analysis was performed to identify biomarkers of the disease. The curation and mapping of candidate biomarkers were done using the Kyoto Encyclopedia of Genes and Genomes (KEGG) [22], Reactome and WikiPathways. The PathVisio 3 tool was used to reconstruct the cellular and signaling pathways of potential biomarkers [23], and the potential mechanism of each marker in the pathway was studied based on evidence available in the literature and databases.

Drug-gene network analysis

The target genes associated with anti-breast cancer drugs were identified using the CTD database (http://ctdbase.org/), an open-source database that curates chemical–gene, gene–disease and chemical–disease interactions from the literature [24]. The chemical–gene interaction query was used to retrieve drugs for each breast cancer related gene. Drugs that were directly linked with breast cancer related genes were sorted in this interaction network. The FDA approval status of these drugs was also verified using the DrugBank database [25].
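The drug-gene network can be thought of as a bipartite edge list annotated with approval status. The sketch below, which uses a handful of rows copied from Table 5 of this article, keeps only approved drugs and then lists the drugs linked to more than one signature gene. It does not query CTD or DrugBank, and the function names are hypothetical.

```python
from collections import defaultdict

# (gene, drug, approval status): a small subset of Table 5.
ROWS = [
    ("PDZK1", "Afimoxifene", "Investigational"),
    ("PDZK1", "Valproic acid", "Approved"),
    ("ID4", "Valproic acid", "Approved"),
    ("ID4", "Tamoxifen", "Approved"),
    ("NCOA1", "Tamoxifen", "Approved"),
    ("RHEB", "Lonafarnib", "Investigational"),
]


def approved_edges(rows):
    """Keep gene-drug edges whose drug is marked as approved."""
    return [(gene, drug) for gene, drug, status in rows if status == "Approved"]


def shared_drugs(edges):
    """Map each drug linked to two or more genes to its sorted gene list."""
    genes_by_drug = defaultdict(set)
    for gene, drug in edges:
        genes_by_drug[drug].add(gene)
    return {drug: sorted(genes)
            for drug, genes in genes_by_drug.items() if len(genes) > 1}


edges = approved_edges(ROWS)
print(len(edges), "approved gene-drug edges")
print(shared_drugs(edges))
```

Drugs that connect several signature genes are the natural hubs of such a network and are the first candidates to examine for repurposing.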

Results

Gene expression analysis and normalization

Twelve breast cancer datasets were downloaded from the GEO database in CEL format. Each dataset had an AffyBatch object size of 1164 × 1164 or 732 × 732 features with related AffyIDs (Table 1). Quantile normalization and background correction were performed to remove systematic variation. The probe-level data obtained after normalization show the quality of the individual arrays of each dataset in the MA plots (Fig. 1). The severity of RNA degradation and its significance level were presented by the function plotAffyRNAdeg (Fig. 2), and a single summary statistic for each array in the batch was produced by the function summaryAffyRNAdeg (Additional file 1: Table S1). Additional file 2: Table S2 provides the list of databases, tools, and software used in this study.

Table 1.

List of cDNA datasets

| Dataset accession no. | Total samples | Tissue | Species | Conditions/type | Platform | Size of arrays | AffyIDs | Reference |
|---|---|---|---|---|---|---|---|---|
| GSE83325 | 4 | Breast cancer | Homo sapiens | Control vs. treated | GPL15207 [PrimeView] Affymetrix Human Gene Expression Array | 732 × 732 features | 49495 | [35] |
| GSE28645 | 14 | Breast cancer | Homo sapiens | Control vs. treated | GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array | 1164 × 1164 features | 54675 | [36] |
| GSE28448 | 11 | Breast cancer | Homo sapiens | Control vs. treated | GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array | 1164 × 1164 features | 54675 | [37] |
| GSE27444 | 14 | Breast cancer | Homo sapiens | Control vs. treated | GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array | 1164 × 1164 features | 54675 | [38] |
| GSE12791 | 16 | Breast cancer | Homo sapiens | Control vs. treated | GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array | 712 × 712 features | 22283 | [39] |
| GSE33658 | 22 | Breast cancer | Homo sapiens | Control vs. treated | GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array | 1164 × 1164 features | 54675 | [40] |
| GSE116781 | 6 | Breast cancer | Homo sapiens | Control vs. treated | GPL15207 [PrimeView] Affymetrix Human Gene Expression Array | 732 × 732 features | 49495 | [41] |
| GSE146911 | 11 | Breast cancer | Homo sapiens | Control vs. treated | GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array | 1164 × 1164 features | 54675 | [42] |
| GSE151635 | 12 | Breast cancer | Homo sapiens | Control vs. treated | GPL571 [HG-U133A_2] Affymetrix Human Genome U133A 2.0 Array | 732 × 732 features | 22277 | [43] |
| GSE71363 | 18 | Breast cancer | Homo sapiens | Control vs. treated | GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array | 1164 × 1164 features | 54675 | [44] |
| GSE99860 | 16 | Breast cancer | Homo sapiens | Control vs. treated | GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array | 1164 × 1164 features | 54675 | [45] |
| GSE99861 | 16 | Breast cancer | Homo sapiens | Control vs. treated | GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array | 1164 × 1164 features | 54675 | [46] |

Fig. 1.

MA plots showing normalization and analysis of quality array metrics. Plots of log intensity ratio (M) vs. log intensity averages (A). Normally, the mass of distribution in the MA plot is expected to be concentrated along the M = 0 axis
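For readers unfamiliar with MA plots, the two axes can be computed as below: M is the difference of the log2 intensities of an array and a reference, and A is their average. This is a minimal sketch with made-up intensities; which reference the quality-control package uses for each array is not stated here, so the `reference` argument is an assumption.

```python
import math


def ma_values(array, reference):
    """Return (M, A) lists for two equal-length lists of positive intensities."""
    if len(array) != len(reference):
        raise ValueError("array and reference must have the same length")
    m_vals, a_vals = [], []
    for x, r in zip(array, reference):
        log_x, log_r = math.log2(x), math.log2(r)
        m_vals.append(log_x - log_r)          # log intensity ratio
        a_vals.append((log_x + log_r) / 2.0)  # mean log intensity
    return m_vals, a_vals


# Hypothetical probe intensities.
m, a = ma_values([8.0, 4.0, 16.0], [2.0, 4.0, 16.0])
print(m)
print(a)
```

Probes whose intensity matches the reference give M = 0, which is why a well-normalized array concentrates along that axis.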

Fig. 2.

RNA degradation plots produced by plotAffyRNAdeg representing the quality of RNA and its severity of degradation

Identification and screening of differentially expressed genes

In each dataset the differential expression analysis provided 50 DEGs by pairwise comparison between biologically comparable groups. Out of these 50 DEGs, the top 24 genes were ranked and selected in each dataset. The selection was based on FDR (< 0.05), p-value (≤ 0.05) and |logFC| (> 1) parameters. These 24 DEGs were further shortlisted to eight common genes as potential biomarkers for breast cancer (Additional file 3: Table S3).
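The article does not spell out how "common" genes were determined across datasets. One plausible reading, sketched here purely as an assumption, is to count in how many per-dataset DEG lists each gene appears and keep those that recur in at least a chosen number of datasets. The dataset labels, gene labels and threshold below are all hypothetical.

```python
from collections import Counter


def recurrent_genes(deg_lists, min_datasets):
    """Return genes present in at least `min_datasets` of the DEG lists.

    deg_lists maps a dataset label to its list of shortlisted genes.
    """
    counts = Counter(
        gene
        for genes in deg_lists.values()
        for gene in set(genes)  # count each gene once per dataset
    )
    return sorted(gene for gene, n in counts.items() if n >= min_datasets)


# Hypothetical shortlists from three datasets.
deg_lists = {
    "dataset_1": ["G1", "G2", "G3", "G3"],
    "dataset_2": ["G2", "G3", "G4"],
    "dataset_3": ["G3", "G5"],
}
print(recurrent_genes(deg_lists, min_datasets=2))
```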

Data curation and cluster analysis

The gene mapping of 24 DEGs through the PubMed, OMIM, MeSH, and PMC databases provided eight significant breast cancer associated genes: ID4, NCOA1, RHEB, PDZK1, PLAUR, AKR1C2, ANXA1 and SLPI. The role of these genes in breast cancer was curated and counted (Table 2). The gene expression of breast cancer cell samples showed a clear difference between the control and treated replicates (Fig. 3).

Table 2.

The differentially expressed breast cancer associated genes curated from PubMed

| Sr. No. | Probe ID | Gene ID | UniProt ID | PubMed count | Protein name | Reference link |
|---|---|---|---|---|---|---|
| 1 | 11721688_at | ID4 | ID4_HUMAN | 50 | Inhibitor of DNA binding 4, HLH protein (ID4) | https://pubmed.ncbi.nlm.nih.gov/?term=ID4+and+breast+cancer |
| 2 | 209106_at | NCOA1 | NCOA1_HUMAN | 106 | Nuclear receptor coactivator 1 (NCOA1) | https://pubmed.ncbi.nlm.nih.gov/?term=ncoa1+and+breast+cancer |
| 3 | 211924_s_at | PLAUR | UPAR_HUMAN | 189 | Plasminogen activator, urokinase receptor (PLAUR) | https://pubmed.ncbi.nlm.nih.gov/?term=plaur+and+breast+cancer |
| 4 | 1555780_a_at | RHEB | RHEB_HUMAN | 14 | Ras homolog enriched in brain (RHEB) | https://pubmed.ncbi.nlm.nih.gov/?term=rheb+and+breast+cancer |
| 5 | 205380_at | PDZK1 | PDZ1I_HUMAN | 26 | PDZ domain containing 1 (PDZK1) | https://pubmed.ncbi.nlm.nih.gov/?term=pdzk1+and+breast+cancer |
| 6 | 11716033_at | SLPI | SLPI_HUMAN | 14 | Secretory leukocyte peptidase inhibitor (SLPI) | https://pubmed.ncbi.nlm.nih.gov/?term=slpi+and+breast+cancer |
| 7 | 11729101_a_at | AKR1C2 | Q1KXY7_HUMAN | 36 | Aldo–keto reductase family 1 member C2 (AKR1C2) | https://pubmed.ncbi.nlm.nih.gov/?term=akr1c2+and+breast+cancer |
| 8 | 201012_at | ANXA1 | ANXA1_HUMAN | 52 | Annexin A1 (ANXA1) | https://pubmed.ncbi.nlm.nih.gov/?term=anxa1+and+breast+cancer |

Fig. 3.

Cluster analysis of breast cancer related differentially expressed genes. Blue represents small distance and red shows large distance. Lines indicate the cluster boundaries in the level of the tree

miRNA target prediction analysis

The miRDB prediction algorithm identified multiple breast cancer associated miRNA targets for each gene, such as hsa-miR-650, hsa-miR-203a-3p, hsa-miR-4520-3p, hsa-miR-1185-1-3p, hsa-miR-15b-3p and hsa-miR-942-5p. The dysregulation of these signature genes is linked to the progression of breast cancer. The genes ID4, RHEB, AKR1C2, ANXA1 and PDZK1 yielded 191, 74, 108, 41 and 41 predicted miRNA hits, respectively (Table 3).

Table 3.

miRNA targets of breast cancer related genes

| UniProt ID | Gene symbol | miRNA | Target score | Total miRNA hits | Structure of predicted duplex |
|---|---|---|---|---|---|
| NCOA1_HUMAN | NCOA1 | hsa-miR-650 | 99 | 205 | AGGAGGCAGCGCUCUCAGGAC |
| ID4_HUMAN | ID4 | hsa-miR-203a-3p | 98 | 191 | GUGAAAUGUUUAGGACCACUAG |
| RHEB_HUMAN | RHEB | hsa-miR-4520-3p | 96 | 74 | UUGGACAGAAAACACGCAGGAA |
| ANXA1_HUMAN | ANXA1 | hsa-miR-1185-1-3p | 93 | 41 | AUAUACAGGGGGAGACUCUUAU |
| PDZ1I_HUMAN | PDZK1 | hsa-miR-15b-3p | 92 | 41 | CGAAUCAUUAUUUGCUGCUCUA |
| UPAR_HUMAN | PLAUR | hsa-miR-942-5p | 92 | 43 | UCUUCUCUGUUUUGGCCAUGUG |
| Q1KXY7_HUMAN | AKR1C2 | hsa-miR-185-5p | 90 | 108 | UGGAGAGAAAGGCAGUUCCUGA |
| SLPI_HUMAN | SLPI | hsa-miR-3173-3p | 72 | 8 | AAAGGAGGAAAUAGGCAGGCCA |

Protein network analysis

The protein–protein interaction analysis revealed the interaction of breast cancer related genes with other potential genes contributing to a pathological phenotype. The network showed a total of 207 nodes and 226 edges that were retrieved from the STRING [14] and HAPPI [15] databases. The network was categorized into three neighborhoods: red and blue nodes indicate the breast cancer associated potential biomarkers, while the remaining yellow nodes represent the non-breast cancer target proteins. The potential biomarkers were found to functionally interact with other biologically essential target proteins, some of which are TCF4, TP53, mTOR, NOTCH1, ESR1 and ESR2. The source protein ID4 showed interaction with TCF4, NOTCH1 and WNT, while NCOA1 and PDZK1 interacted with the ESR1 and ESR2 potential biomarker proteins. ANXA1 was also associated with the CCL5, CXCL10 and CXCL8 family of cytokines. NetworkAnalyzer was used to analyze the topological properties of the network. It also helped in classifying and improving the network performance (Fig. 4). The disease gene mapping of target genes using CTD showed that more than 50 genes have a functional relation with the source/seeder genes in breast cancer (Fig. 5). In gene enrichment analysis, the targeted genes were selected based on fold change and a p-value cut-off (< 0.05). The analysis revealed significant enrichment of these genes in the mTOR signaling pathway, TGF-β signaling pathway, PI3K-AKT signaling pathway, insulin signaling pathway, thyroid signaling pathway and complement coagulation cascade (Table 4). The transcription factors identified were RBPJ, NHLH1, HENMT1, PHOX2A, CACD and ISL2. The transcription factors (TFs) NHLH1 and HENMT1 showed 50% abundance with known breast cancer genes (Fig. 5).

Fig. 4.

Gene network of breast cancer related differentially expressed genes with 207 nodes and 226 edges. Red and blue nodes indicate the breast cancer associated potential biomarkers while yellow nodes represent the non-breast cancer target proteins

Fig. 5.

Transcription factors for breast cancer associated gene signatures that alter gene expression in a host cell to promote breast cancer resistance and progression

Table 4.

Pathway enrichment and gene ontology of breast cancer related DEGs

| Category | Term | Count | p-value |
|---|---|---|---|
| GOTERM_BP_FAT | Positive regulation of cell differentiation | 4 | 3.9 × 10⁻³ |
| GOTERM_BP_FAT | Gliogenesis | 3 | 4.1 × 10⁻³ |
| GOTERM_BP_FAT | Rhythmic process | 3 | 6.9 × 10⁻³ |
| GOTERM_BP_FAT | Cellular lipid metabolic process | 4 | 7.1 × 10⁻³ |
| GOTERM_BP_FAT | Estrous cycle | 2 | 7.1 × 10⁻³ |
| GOTERM_BP_FAT | Epithelium development | 4 | 7.4 × 10⁻³ |
| GOTERM_BP_FAT | Negative regulation of hydrolase activity | 3 | 1.1 × 10⁻² |
| GOTERM_BP_FAT | Response to drug | 3 | 1.2 × 10⁻² |
| GOTERM_BP_FAT | Regulation of oligodendrocyte differentiation | 2 | 1.2 × 10⁻² |
| GOTERM_BP_FAT | Reproductive structure development | 3 | 1.2 × 10⁻² |
| GOTERM_BP_FAT | Gland development | 3 | 1.2 × 10⁻² |
| GOTERM_BP_FAT | Prostaglandin metabolic process | 2 | 1.3 × 10⁻² |
| GOTERM_BP_FAT | Lipid metabolic process | 4 | 1.4 × 10⁻² |
| GOTERM_BP_FAT | Neurogenesis | 4 | 1.8 × 10⁻² |
| GOTERM_BP_FAT | Prostate gland development | 2 | 1.9 × 10⁻² |
| GOTERM_BP_FAT | Positive regulation of cell death | 3 | 2.5 × 10⁻² |
| GOTERM_CC_FAT | Extracellular exosome | 5 | 5.9 × 10⁻³ |
| GOTERM_CC_FAT | Extracellular vesicle | 5 | 6.0 × 10⁻³ |
| GOTERM_CC_FAT | Membrane-bound vesicle | 5 | 1.5 × 10⁻² |
| GOTERM_CC_FAT | Extracellular region | 5 | 3.8 × 10⁻² |
| GOTERM_CC_FAT | Extrinsic component of membrane | 2 | 8.8 × 10⁻² |
| GOTERM_MF_FAT | Receptor binding | 4 | 2.2 × 10⁻² |
| GOTERM_MF_FAT | Enzyme binding | 4 | 3.8 × 10⁻² |
| GOTERM_MF_FAT | Protein dimerization activity | 3 | 9.2 × 10⁻² |
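The Count and p-value columns above come from an over-representation test. DAVID reports an EASE score, which is a modified Fisher exact test; the sketch below shows the simpler, unmodified hypergeometric upper tail on which such tests are built, so its output will not reproduce the table values. All numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric.

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the query list, k: query genes annotated to the term.
    """
    total = comb(N, n)
    upper = min(n, K)
    # comb() returns 0 when n - i exceeds N - K, so impossible draws add nothing.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical: 20 background genes, 5 annotated, 4 in the list, 2 of them annotated.
print(hypergeom_upper_tail(k=2, n=4, K=5, N=20))
```

With a list of only eight signature genes, term counts of 2 to 5 are the most that can be observed, which is worth keeping in mind when reading the table.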

Pathway modeling

The isolated gene signatures were further studied to understand their role in the progression of breast cancer and their underlying molecular mechanism. The signature genes were analyzed for their interaction with other proteins in breast carcinogenesis through reconstruction of a network. The pathways involved in the progression of breast cancer were the mTOR signaling pathway, estrogen signaling pathway, PI3K-AKT signaling pathway, TGF-β signaling pathway and the insulin signaling pathway. The source genes interact with other target genes through these signaling pathways, leading to the occurrence of breast cancer. The network shows the heterogeneous nature of breast cancer, which is the major obstacle in defining therapies with desirable outcomes (Fig. 6).

Fig. 6.

Pathway analysis. Integrated gene signaling pathways involved in the progression of breast cancer. Gene signatures were mapped on KEGG pathways for signaling and metabolic reconstruction

Drug-gene network analysis

For drug-gene network analysis the toxicogenomic approach was used to further investigate the existing treatment options for breast cancer therapy. This was done to better understand the disease etiology. The publicly available database CTD identified 65 drugs that interacted with these signature genes. In total, 57 target drugs were FDA approved (Table 5). These drugs were found to interact with signature genes that are involved in the progression of breast cancer (Fig. 7).

Table 5.

Drug targets of identified differentially expressed genes

| Gene | Drug | DrugBank ID | FDA approval |
|---|---|---|---|
| PDZK1 | Afimoxifene | DB04468 | Investigational |
| PDZK1 | Raloxifene hydrochloride | DB00481 | Investigational |
| PDZK1 | Urethane | DB04827 | Removed |
| PDZK1 | Valproic acid | DB00313 | Approved |
| PDZK1 | Ciglitazone | DB09201 | Experimental |
| PDZK1 | Estradiol | DB00783 | Approved |
| PDZK1 | Fenofibric acid | DB13873 | Approved |
| PDZK1 | Ormosil | DB00742 | Approved |
| PDZK1 | Polyethylene glycols | DB09287 | Approved |
| PDZK1 | Zoledronic acid | DB00399 | Approved |
| ID4 | Acetaminophen | DB00316 | Approved |
| ID4 | Belinostat | DB05015 | Approved |
| ID4 | Dorsomorphin | DB08597 | Experimental |
| ID4 | Doxorubicin | DB00997 | Approved |
| ID4 | Estradiol | DB00783 | Approved |
| ID4 | Panobinostat | DB06603 | Approved |
| ID4 | Tamoxifen | DB00675 | Approved |
| ID4 | Valproic acid | DB00313 | Approved |
| RHEB | Cisplatin | DB00515 | Approved |
| RHEB | Lonafarnib | DB06448 | Investigational |
| RHEB | Tipifarnib | DB04960 | Investigational |
| SLPI | Copper | DB09130 | Approved |
| SLPI | Ormosil | DB00742 | Approved |
| SLPI | Polyethylene glycols | DB09287 | Approved |
| SLPI | Doxorubicin | DB00997 | Approved |
| SLPI | Cyclosporine | DB00091 | Approved |
| SLPI | Cisplatin | DB00515 | Approved |
| SLPI | Aspirin | DB00945 | Approved |
| AKR1C2 | Aspirin | DB00945 | Approved |
| AKR1C2 | Cloxazolam | DB01553 | Experimental |
| AKR1C2 | Diazepam | DB00829 | Approved |
| AKR1C2 | Estazolam | DB01215 | Approved |
| AKR1C2 | Flurbiprofen | DB00712 | Approved |
| AKR1C2 | Glipizide | DB01067 | Approved |
| AKR1C2 | Indomethacin | DB00328 | Approved |
| AKR1C2 | Meclofenamic acid | DB00939 | Approved |
| NCOA1 | Tamoxifen | DB00675 | Approved |
| NCOA1 | Calcitriol | DB00136 | Approved |
| NCOA1 | Rifampin | DB01045 | Approved |
| NCOA1 | Troglitazone | DB00197 | Approved |
| NCOA1 | Alitretinoin | DB00523 | Approved |
| PLAUR | Urokinase | DB00013 | Approved |
| PLAUR | Tenecteplase | DB00031 | Approved |
| PLAUR | Anistreplase | DB00029 | Approved |
| PLAUR | Filgrastim | DB00099 | Approved |
| PLAUR | Interferon gamma-1b | DB00011 | Approved |
| PLAUR | Reteplase | DB00015 | Approved |
| PLAUR | Alteplase | DB00009 | Approved |
| ANXA1 | Desonide | DB01260 | Approved |
| ANXA1 | Prednisone | DB00635 | Approved |
| ANXA1 | Trastuzumab | DB00072 | Approved |
| ANXA1 | Loteprednol etabonate | DB14596 | Approved |
| ANXA1 | Desoximetasone | DB00547 | Approved |
| ANXA1 | Hydrocortisone | DB00741 | Approved |
| ANXA1 | Hydrocortamate | DB00769 | Approved |
| ANXA1 | Triamcinolone | DB00620 | Approved |
| ANXA1 | Prednisolone | DB00860 | Approved |
| ANXA1 | Amcinonide | DB00288 | Approved |
| ANXA1 | Flumethasone pivalate | DB00663 | Approved |
| ANXA1 | Betamethasone | DB00443 | Approved |
| ANXA1 | Methylprednisolone | DB00959 | Approved |
| ANXA1 | Rimexolone | DB00896 | Approved |
| ANXA1 | Halobetasol propionate | DB00596 | Approved |
| ANXA1 | Dexamethasone | DB01234 | Approved |
| ANXA1 | Prednicarbate | DB01130 | Approved |

Fig. 7.

Drug-gene network. The network constructed between the reported drugs and their target signature genes shows 66 nodes and 65 edges. Color codes are given in the legends. The network shows potential drug targets for signature genes, curated using the PMC, CTD and DrugBank databases

Discussion

Due to its recurrence and heterogeneous nature, breast cancer is the leading cause of death in women globally. This calls for a better understanding of the molecular mechanisms of breast cancer in order to improve diagnosis and management.

This study focuses on the identification of several gene signatures, their functional annotation, potential protein–protein interactions and the reconstruction of biological pathways for a better understanding of the disease. Based on physicochemical and functional studies, the differential expression analysis revealed eight gene signatures out of 50 DEGs that play a role in breast carcinogenesis. ID4, NCOA1, RHEB, ANXA1, AKR1C2, PDZK1, PLAUR and SLPI are the identified DEGs, of which five are upregulated and three are downregulated. The gene ontology of these genes showed functional enrichment in cellular communication, signal transduction, protein metabolism, transport and steroid hormone receptor signaling, as well as essential roles in several important signaling pathways such as the mTOR, TGF-β, PI3K-Akt and insulin pathways. These pathways have been studied for their role in the occurrence and progression of several cancers. ID4 belongs to a family of four helix-loop-helix (HLH) transcriptional regulators termed inhibitor of differentiation (ID) proteins. These proteins are involved in the regulation of several cell processes such as differentiation, transcription and cell cycle progression. Emerging evidence has shown a proto-oncogenic role of ID4 in basal-like breast cancer (BLBC). Overexpression of this gene is observed in this subtype of breast cancer and is correlated with the expression of TP53 protein, which is associated with higher grade and metastasis risk. This makes ID4 a marker of poor prognosis in BLBC, as the proliferation of BLBC cell lines requires overexpression of ID4 [26]. The gene network analysis also revealed the interaction of ID4 with several other proteins such as TCF, WNT, TP53 and NOTCH1. This supports the previous evidence of a correlation of ID4 with TP53 in the proliferation of BLBC.

Overexpression of nuclear receptor coactivator 1 (NCOA1) has also shown a positive correlation with disease metastasis and recurrence in a subset of breast cancers. This gene belongs to the p160 SRC family and interacts with certain nuclear receptors and transcription factors (TFs) that play important roles in growth, development, reproduction and metabolism as well as in cancer. NCOA1 is overexpressed in 19–29% of breast tumors and has been associated with HER2 expression, metastasis, disease recurrence and poor survival [27]. Other interacting proteins identified through network analysis that show crosstalk with NCOA1 are ESR and PPAR (Fig. 5). The pathway analysis also clarifies the role of NCOA1 in the proliferation and metastasis of breast cancer through interaction with these proteins (Fig. 7). Another source protein identified through differential expression analysis is PDZ domain containing 1 (PDZK1), an adaptor protein that is expressed in the proximal tubules of the kidney and has a pivotal role in lipid metabolism. However, this protein is thought to be responsive to estrogen in breast cancer cell lines (MCF-7). A significant correlation between 17β-estradiol plasma levels and PDZK1 mRNA expression has been shown in ER-α (+) breast tumors, providing a link between ER-α and PDZK1 [28]. A potential candidate involved in the indirect link of this association is the insulin-like growth factor-1 receptor (IGF-1R). The gene ontology studies also revealed enrichment of these genes in the insulin signaling pathway, suggesting a role for this pathway in the cell proliferation of breast tumors.

Ras homolog enriched in brain (Rheb) is a small GTP-binding protein and a well-known regulator of mTOR. mTOR plays a pivotal role in cell proliferation, aging, protein synthesis and autophagy. Recent evidence has suggested hyperactivity of the Rheb-mTORC1 signaling axis in several human carcinomas [29]. Evidence also suggests an elevated expression of RHEB in epithelial cells of fibroadenomas, providing an association of RHEB with the insulin/AKT/TOR signaling pathway in benign tumor development [30]. The pathway analysis has also shown an association of Rheb with these proteins, suggesting its important role in cell cycle control and cell growth. Secreted proteins play a pivotal role in the metastasis of several types of cancer, including breast tumors. One of the secreted proteins identified through differential analysis is SLPI, which has a role in the development and progression of tumors. Several tumors, such as ovarian and lung cancer, have shown elevated gene expression levels of SLPI. A recent study has identified SLPI as a new target for anti-metastatic therapies because it is a pro-metastatic part of the breast cancer secretome, chiefly for TNBCs [31]. The two aldo–keto reductases AKR1C1 and AKR1C2 belong to the AKR superfamily and are involved in progesterone metabolism. The metabolites of progesterone are involved in the suppression of cell proliferation and adhesion. In tumorous breast tissues the expression of AKR1C1 and AKR1C2 is reduced, promoting tumor growth and progression [32]. The association of over-activation of PLAUR (uPAR) with more aggressive carcinoma is also well studied. A correlation has been observed between HER2 and uPAR mRNA in disseminated tumors, suggesting crosstalk between the HER2 and uPAR signaling pathways that causes recurrence or metastasis [33]. Moreover, Annexin A1 (AnxA1) is a candidate regulator of the oncogenic switch during which cancer cells change their phenotype from epithelial to migratory, mesenchymal-like. AnxA1 is an actin regulatory protein and its overexpression is associated with the BLBC subtype. It has a pro-angiogenic role in vascular endothelial cells, tumor growth and metastasis and is also involved in the regulation of TGFβ signaling. Evidence suggests AnxA1 as an additional marker for discriminating BLBC from other subtypes at diagnosis [34]. The drug-gene network analysis revealed that several common drugs, such as Tamoxifen, Cisplatin, Diazepam, Aspirin and Hydrocortisone, interact with these signature genes, opening the way for repurposing these drugs to better manage this disease.

Conclusion

This study offers new insights into potential targets for breast cancer, their relations with other signaling proteins, and their involvement in the development and progression of breast cancer through cross talk. The pathway analysis further clarifies the role of several genes and contributes to the efficient management of this disease.

Supplementary Information

40709_2021_136_MOESM1_ESM.xlsx (15.3KB, xlsx)

Additional file 1: Table S1. The function summaryAffyRNAdeg of the Bioconductor package produced a single summary statistic for each array in the batch dataset.

40709_2021_136_MOESM2_ESM.xlsx (8.9KB, xlsx)

Additional file 2: Table S2. List of Databases, software, and Tools used in this study.

40709_2021_136_MOESM3_ESM.xlsx (10.9KB, xlsx)

Additional file 3: Table S3. Preliminary investigation of common and related differentially expressed genes of each microarray dataset.

Acknowledgements

Not applicable.

Authors’ contributions

SA and HN conceived and designed the study. RA and UI carried out the research work. MMB and SZ provided guidance with study design. All authors contributed to manuscript writing and editing. All authors read and approved the final manuscript.

Funding

Not applicable.

Availability of data and materials

The data has been presented with the article.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1186/s40709-021-00136-7.

References

1. Friedenreich CM. Physical activity and breast cancer: review of the epidemiologic evidence and biologic mechanisms. Recent Results Cancer Res Fortschritte der Krebsforschung Progres dans les recherches sur le cancer. 2011;188:125–139. doi: 10.1007/978-3-642-10858-7_11.
2. Siegel R, Naishadham D, Jemal A. Cancer statistics, 2013. CA Cancer J Clin. 2013;63(1):11–30. doi: 10.3322/caac.21166.
3. Lynch HT, Lynch JF. Breast cancer genetics in an oncology clinic: 328 consecutive patients. Cancer Genet Cytogenet. 1986;22(4):369–371. doi: 10.1016/0165-4608(86)90032-4.
4. Sharif S, Moran A, Huson S, Iddenden R, Shenton A, Howard E, et al. Women with neurofibromatosis 1 (nf1) are at a moderately increased risk of developing breast cancer and should be considered for early screening. J Med Genet. 2007;44(8):481–484. doi: 10.1136/jmg.2007.049346.
5. Guarino M, Rubino B, Ballabio G. The role of epithelial–mesenchymal transition in cancer pathology. Pathology. 2007;39(3):305–318. doi: 10.1080/00313020701329914.
6. Ilyas U, uz Zaman S, Altaf R, Nadeem H, Muhammad SA. Genome wide meta-analysis of cDNA datasets reveals new target gene signatures of colorectal cancer based on systems biology approach. J Biol Res Thessaloniki. 2020;27(1):1–13. doi: 10.1186/s40709-020-00118-1.
7. Obenchain V, Lawrence M, Carey V, Gogarten S, Shannon P, Morgan M. VariantAnnotation: a bioconductor package for exploration and annotation of genetic variants. Bioinformatics. 2014;30(14):2076. doi: 10.1093/bioinformatics/btu168.
8. Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, et al. Missing value estimation methods for DNA microarrays. Bioinformatics. 2001;17(6):520–525. doi: 10.1093/bioinformatics/17.6.520.
9. Fasold M, Binder H. AffyRNADegradation: control and correction of RNA quality effects in GeneChip expression data. Bioinformatics. 2013;29(1):129–131. doi: 10.1093/bioinformatics/bts629.
10. Ferreira J, Zwinderman A. On the Benjamini–Hochberg method. Ann Stat. 2006;34(4):1827–1849.
11. Jin Y, Da W. RETRACTED ARTICLE: screening of key genes in gastric cancer with DNA microarray analysis. Eur J Med Res. 2013;18(1):37. doi: 10.1186/2047-783X-18-37. [Retracted]
12. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci. 1998;95(25):14863–14868. doi: 10.1073/pnas.95.25.14863.
13. Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L, et al. A gene expression database for the molecular pharmacology of cancer. Nat Genet. 2000;24(3):236. doi: 10.1038/73439.
14. Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, et al. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 2011;39:D561–D568. doi: 10.1093/nar/gkq973.
15. Chen JY, Mamidipalli S, Huan T. HAPPI: an online database of comprehensive human annotated and predicted protein interactions. BMC Genom. 2009;10(1):S16. doi: 10.1186/1471-2164-10-S1-S16.
16. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13(11):2498–2504. doi: 10.1101/gr.1239303.
17. Huang DW, Sherman BT, Tan Q, Kir J, Liu D, Bryant D, et al. DAVID bioinformatics resources: expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic Acids Res. 2007;35(suppl_2):W169–W175. doi: 10.1093/nar/gkm415.
18. Pathan M, Keerthikumar S, Ang CS, Gangoda L, Quek CY, Williamson NA, et al. FunRich: an open access standalone functional enrichment and interaction network analysis tool. Proteomics. 2015;15(15):2597–2601. doi: 10.1002/pmic.201400515.
19. Nam D, Kim S-Y. Gene-set approach for expression pattern analysis. Brief Bioinform. 2008;9(3):189–197. doi: 10.1093/bib/bbn001.
20. Muhammad SA, Ahmed S, Ali A, Huang H, Wu X, Yang XF, et al. Prioritizing drug targets in Clostridium botulinum with a computational systems biology approach. Genomics. 2014;104(1):24–35. doi: 10.1016/j.ygeno.2014.05.002.
21. Alshalalfa M, Alhajj R. Using context-specific effect of miRNAs to identify functional associations between miRNAs and gene signatures. BMC Bioinform. 2013;14(S12):S1. doi: 10.1186/1471-2105-14-S12-S1.
22. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30. doi: 10.1093/nar/28.1.27.
23. Kutmon M, van Iersel MP, Bohler A, Kelder T, Nunes N, Pico AR, et al. PathVisio 3: an extendable pathway analysis toolbox. PLoS Comput Biol. 2015;11(2):e1004085. doi: 10.1371/journal.pcbi.1004085.
24. Davis AP, Rosenstein MC, Wiegers TC, Mattingly CJ. DiseaseComps: a metric that discovers similar diseases based upon common toxicogenomic profiles at CTD. Bioinformation. 2011;7(4):154. doi: 10.6026/97320630007154.
25. Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006;34(suppl_1):D668–D672. doi: 10.1093/nar/gkj067.
26. Baker LA, Holliday H, Swarbrick A. ID4 controls luminal lineage commitment in normal mammary epithelium and inhibits BRCA1 function in basal-like breast cancer. Endocr Relat Cancer. 2016;23(9):R381–R392. doi: 10.1530/ERC-16-0196.
27. Qin L, Wu Y-L, Toneff MJ, Li D, Liao L, Gao X, et al. NCOA1 directly targets M-CSF1 expression to promote breast cancer metastasis. Can Res. 2014;74(13):3477–3488. doi: 10.1158/0008-5472.CAN-13-2639.
28. Kim H, Abd Elmageed ZY, Davis C, El-Bahrawy AH, Naura AS, Ekaidi I, et al. Correlation between PDZK1, Cdc37, Akt and breast cancer malignancy: the role of PDZK1 in cell growth through Akt stabilization by increasing and interacting with Cdc37. Mol Med. 2014;20:270–279. doi: 10.2119/molmed.2013.00166.
29. He L, Ren Y, Zheng Q, Wang L, Lai Y, Guan S, et al. Fas-associated protein with death domain (FADD) regulates autophagy through promoting the expression of Ras homolog enriched in brain (Rheb) in human breast adenocarcinoma cells. Oncotarget. 2016;7(17):24572. doi: 10.18632/oncotarget.8249.
30. Eom M, Han A, Lee MJ, Park KH. Expressional difference of RHEB, HDAC1, and WEE1 proteins in the stromal tumors of the breast and their significance in tumorigenesis. Korean J Pathol. 2012;46(4):324–330. doi: 10.4132/KoreanJPathol.2012.46.4.324.
31. Kozin SV, Maimon N, Wang R, Gupta N, Munn L, Jain RK, et al. Secretory leukocyte protease inhibitor (SLPI) as a potential target for inhibiting metastasis of triple-negative breast cancers. Oncotarget. 2017;8(65):108292. doi: 10.18632/oncotarget.22660.
32. Wenners A, Hartmann F, Jochens A, Roemer AM, Alkatout I, Klapper W, et al. Stromal markers AKR1C1 and AKR1C2 are prognostic factors in primary human breast cancer. Int J Clin Oncol. 2016;21(3):548–556. doi: 10.1007/s10147-015-0924-2.
33. Chandran VI, Eppenberger-Castori S, Venkatesh T, Vine KL, Ranson M. HER2 and uPAR cooperativity contribute to metastatic phenotype of HER2-positive breast cancer. Oncoscience. 2015;2(3):207. doi: 10.18632/oncoscience.146.
34. de Graauw M, van Miltenburg MH, Schmidt MK, Pont C, Lalai R, Kartopawiro J, et al. Annexin A1 regulates TGF-beta signaling and promotes metastasis formation of basal-like breast cancer cells. Proc Natl Acad Sci USA. 2010;107(14):6340–6345. doi: 10.1073/pnas.0913360107.
35. Kong S-Y, Kim K-S, Kim J, Kim MK, Lee KH, Lee J-Y, et al. The ELK3-GATA3 axis orchestrates invasion and metastasis of breast cancer cells in vitro and in vivo. Oncotarget. 2016;7(40):65137. doi: 10.18632/oncotarget.11427.
36. McCartan D, Bolger JC, Fagan A, Byrne C, Hao Y, Qin L, et al. Global characterization of the SRC-1 transcriptome identifies ADAM22 as an ER-independent mediator of endocrine-resistant breast cancer. Can Res. 2012;72(1):220–229. doi: 10.1158/0008-5472.CAN-11-1976.
37. Hesling C, Fattet L, Teyre G, Jury D, Gonzalo P, Lopez J, et al. Antagonistic regulation of EMT by TIF1γ and Smad4 in mammary epithelial cells. EMBO Rep. 2011;12(7):665–672. doi: 10.1038/embor.2011.78.
38. Lee J, Hirsh AS, Wittner BS, Maeder ML, Singavarapu R, Lang M, et al. Induction of stable drug resistance in human breast cancer cells using a combinatorial zinc finger transcription factor library. PLoS ONE. 2011;6(7):e21112. doi: 10.1371/journal.pone.0021112.
39. Luo W, Schork NJ, Marschke KB, Ng S-C, Hermann TW, Zhang J, et al. Identification of polymorphisms associated with hypertriglyceridemia and prolonged survival induced by bexarotene in treating non-small cell lung cancer. Anticancer Res. 2011;31(6):2303–2311.
40. Massarweh S, Tham YL, Huang J, Sexton K, Weiss H, Tsimelzon A, et al. A phase II neoadjuvant trial of anastrozole, fulvestrant, and gefitinib in patients with newly diagnosed estrogen receptor positive breast cancer. Breast Cancer Res Treat. 2011;129(3):819. doi: 10.1007/s10549-011-1679-8.
41. Cui B, Luo Y, Tian P, Peng F, Lu J, Yang Y, et al. Stress-induced epinephrine enhances lactate dehydrogenase A and promotes breast cancer stem-like cells. J Clin Invest. 2019;129(3):1030–1046. doi: 10.1172/JCI121685.
42. Jayaraman S, Hou X, Kuffel MJ, Suman VJ, Hoskin TL, Reinicke KE, et al. Antitumor activity of Z-endoxifen in aromatase inhibitor-sensitive and aromatase inhibitor-resistant estrogen receptor-positive breast cancer. Breast Cancer Res. 2020;22:1–12. doi: 10.1186/s13058-020-01286-7.
43. Hakim S, Craig JM, Koblinski JE, Clevenger CV. Inhibition of the activity of cyclophilin A impedes prolactin receptor-mediated signaling, mammary tumorigenesis, and metastases. Iscience. 2020;23(10):101581. doi: 10.1016/j.isci.2020.101581.
44. Sayar N, Karahan G, Konu O, Bozkurt B, Bozdogan O, Yulug IG. Transgelin gene is frequently downregulated by promoter DNA hypermethylation in breast cancer. Clin Epigenet. 2015;7(1):104. doi: 10.1186/s13148-015-0138-5.
45. Marchan R, Büttner B, Lambert J, Edlund K, Glaeser I, Blaszkewicz M, et al. Glycerol-3-phosphate acyltransferase 1 promotes tumor cell migration and poor survival in ovarian carcinoma. Can Res. 2017;77(17):4589–4601. doi: 10.1158/0008-5472.CAN-16-2065.
46. Lesjak MS, Marchan R, Stewart JD, Rempel E, Rahnenführer J, Hengstler JG. EDI3 links choline metabolism to integrin expression, cell adhesion and spreading. Cell Adhes Migr. 2014;8(5):499–508. doi: 10.4161/cam.29284.

Articles from Journal of Biological Research are provided here courtesy of BMC
