Abstract
Colorectal cancer (CRC) is the third leading cause of cancer deaths. Advances within bioinformatics, such as machine learning, can improve biomarker discovery and ultimately improve CRC survival rates. There are clear sex differences in CRC characteristics, but the impact of sex has not been considered with regards to CRC biomarkers. Our aim here was to investigate sex differences in the transcriptome of a normal colon and CRC, and between paired normal and tumor tissue. Next, we attempted to identify CRC diagnostic and prognostic biomarkers and investigate if they are sex-specific. We collected paired normal and tumor tissue, performed RNA-seq, and applied feature selection in combination with machine learning to identify the top CRC diagnostic biomarkers. We used The Cancer Genome Atlas (TCGA) data to identify sex-specific CRC diagnostic biomarkers and performed an overall survival analysis to identify sex-specific prognostic biomarkers. We found transcriptomic sex differences in both the normal colon tissue and in CRC. Forty-four of the top-ranked biomarkers were sex-specific and 20 biomarkers showed a sex-specific prognostic value. Our data show the importance of sex in the discovery of CRC biomarkers. We propose 20 sex-specific CRC prognostic biomarkers, including ESM1, GUCA2A, and VWA2 for males and CLDN1 and FUT1 for females.
Keywords: biomarkers, colorectal cancer, feature selection, machine learning, sex differences
1. Introduction
Colorectal cancer (CRC) is the third leading cause of cancer deaths among both women and men in the US [1]. In Sweden, it is the second most common form of cancer in both sexes [2]. The 5-year survival rate is 91% for stage I CRC patients and 82% for stage II. However, the majority of CRC are detected at later stages with a decline in survival to 12% for stage IV CRC [3]. The poor prognosis highlights the need for new diagnostic and prognostic biomarkers to avoid CRC-related deaths. Current screening efforts include sigmoidoscopy and colonoscopy, which have been shown to significantly reduce CRC mortality. However, this association is limited to deaths from left-sided CRC [4] and participation rates remain low. Non-invasive methods using blood and stool-based tests have been proven to increase the compliance to CRC screening [5].
Identification of biomarkers, which can improve the diagnosis and disease monitoring, could significantly improve the survival rates. The advances in bioinformatics tools provide opportunities to speed up biomarker discovery and have been integrated for several cancers, including CRC [6,7]. Transcriptome studies have potential to yield large amounts of data, but the standard differential gene expression analysis has limitations. It is, for example, not performed in a multivariate setting and does not consider inter-gene relationships. Feature selection in combination with machine learning can add a new layer to the differential expression analysis and substantially improve biomarker discovery. There is also an urgent need to investigate potential sex differences in biomarker discovery. The current lack of this perspective may be one contributor to why many biomarkers fail to reach the clinic.
Sex-specific CRC recurrence and survival rates have been reported [8]. The incidence and mortality among patients over 65 years are higher for women compared to men, and the 5-year OS rate is lower for women [8]. However, the reverse is seen in pre-menopausal women [9]. Women are also more prone to right-sided CRC, which is associated with a more aggressive type compared to left-sided, more common in men [10,11]. There are also molecular sex differences where women have a higher number of B-Raf proto-oncogene, serine/threonine kinase (BRAF) mutations and a higher microsatellite instability (MSI) status compared to men, whereas men have a higher number of NRAS proto-oncogene, GTPase (NRAS) mutations [12]. Recently, we identified that mice exhibit sex differences in their colon transcriptomes [13]. Some of these differences may be related to estrogen signaling [8,14,15].
Despite the sex differences seen in CRC, most research is done without considering sex in study designs or interpretations. Sex-specific strategies for screening, prevention, and treatment should be considered in order to reduce CRC mortality. In the present study, we evaluated sex differences in the transcriptome of both non-tumor colon epithelium and CRC. Additionally, we studied sex differences in relation to diagnostic and prognostic biomarkers. Our study highlights sex differences in the normal colon, related to bile acid secretion, vitamin digestion and absorption, and in the tumor, especially related to immune response. Moreover, our study shows the importance of sex in the discovery of prognostic biomarkers. We identified 20 sex-specific prognostic biomarkers, including previously proposed biomarkers (endothelial cell-specific molecule 1/ESM1, guanylate cyclase activator 2A/GUCA2A, claudin 1/CLDN1) and novel ones (fucosyltransferase 1/FUT1 and von Willebrand factor A domain containing 2/VWA2).
2. Results
2.1. Normal Colon and CRC Transcriptomes Exhibit Sex-Related Differences
We first validated our CRC patient cohort by exploring the expression of two well-known CRC biomarkers, early diagnostic biomarker fibronectin 1 (FN1) and prognostic biomarker cell migration inducing hyaluronidase 1 (CEMIP) [16,17]. We validated the upregulation of both FN1 and CEMIP by qPCR (Figure S1A). Next, we used the RNA-seq data to identify the tumors’ molecular subtypes and compared their distribution. The distribution of the molecular subtypes in our cohort was similar to what was observed by Phipps et al. [18] (Table S1). Next, we investigated if there are sex differences in the transcriptome of normal mucosa and CRC samples. The sex differences were slightly larger in the normal mucosa than in the CRC tissue, with 153 and 118 differentially expressed genes (DEG, with cutoff padj < 0.05, |log2FC| > 0.4, and fragments per kilobase of sequence per million mapped reads (FPKM) > 1), respectively (Figure 1A). The majority of the DEG were more highly expressed in males than in females (Figure 1B). Interestingly, only one gene, the mitochondrial enzyme carbamoyl-phosphate synthase 1 (CPS1), was differentially expressed between the sexes in both conditions (Figure 1A). Biological process (BP) and KEGG pathway enrichment analysis revealed that the sex differences in the normal colon were related to metabolism, inflammatory bowel disease (IBD), bile secretion, epithelial cell differentiation, and PPAR signaling (Figure 1C). The sex differences in the CRC tumors were related to immune response and cell proliferation (Figure 1C). Thus, we noted clear sex differences in our cohort, both in normal and tumor tissue.
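The three-part DEG cutoff used above (padj < 0.05, |log2FC| > 0.4, FPKM > 1) can be illustrated with a minimal Python sketch. The gene names and values below are hypothetical placeholders, not data from this study, and the sketch assumes that a single FPKM value per gene is compared against the threshold (how FPKM was summarized across samples is not specified here).

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str      # gene symbol (hypothetical placeholders below)
    padj: float    # multiple-testing adjusted p-value
    log2fc: float  # log2 fold change between the two groups
    fpkm: float    # expression level; a per-gene summary is assumed


def is_deg(r, padj_max=0.05, abs_log2fc_min=0.4, fpkm_min=1.0):
    """Apply the three cutoffs jointly; all must hold for a gene to count as a DEG."""
    return r.padj < padj_max and abs(r.log2fc) > abs_log2fc_min and r.fpkm > fpkm_min


# Toy results: each row fails or passes for a different reason.
results = [
    GeneResult("GENE_A", 0.001, 1.2, 15.0),  # passes all three cutoffs
    GeneResult("GENE_B", 0.20, 2.0, 30.0),   # fails: padj too high
    GeneResult("GENE_C", 0.01, -0.3, 8.0),   # fails: effect size too small
    GeneResult("GENE_D", 0.01, -0.9, 0.5),   # fails: expression too low
    GeneResult("GENE_E", 0.04, -0.6, 2.0),   # passes (downregulated)
]

degs = [r.gene for r in results if is_deg(r)]
print(degs)
```

Using the absolute value of the fold change keeps both up- and downregulated genes, which matches the two-sided cutoff stated in the text.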
2.2. Transcriptomic Sex Differences Independent of Subtype and Tumor Location
Since it is well known that the sexes present differences in tumor location and characteristics [10,11,12], we investigated the distribution of tumor location and CRC molecular subtypes 1–5 based on the classification proposed by Jass in 2007 [19], which may be confounding factors in the analysis. There was no significant difference in the tumor location between the sexes (Figure S1B), which suggests that the sex differences in the transcriptomic analysis were not attributable to differences in tumor location. The females presented all subtypes whereas the males only presented subtype 3 and subtype 4 (Figure S1C). Subtype 5 clustered apart from subtypes 3 and 4 in the principal component analysis (PCA) plot (Figure S1E). In order to exclude the effect of subtype differences in the analysis, we repeated the analysis with subtypes 3 and 4 only (Figure S1C,D). The majority (75%) of the DEG remained differentially expressed between the sexes (Figure S1F–G), and the predominant pathways were still related to immune response (Figure S1H). This suggests that the sex differences in the tumor transcriptome related to immune response were not attributable to differences between molecular subtypes.
2.3. Paired Normal–CRC Gene Expression Analysis Reveals Sex-Related Differences
Next, we compared alterations between the normal colon and CRC transcriptomes for each patient using pairwise comparisons and investigated if the sexes showed different profiles. In females, 7156 genes were differentially expressed between the paired normal colon and tumor, whereas 2611 genes were differentially expressed in males (Figure 2A). Nearly all genes regulated in male tumors (2352 out of 2611, or 90%) were also altered in female tumors. A smaller set of 259 genes appeared to have a male-specific tumor expression and a larger set (4804) a female-specific tumor expression (Figure 2A). There was an equal distribution of up- and downregulated genes in both females and males (Figure 2B). The genes regulated in both male and female tumors were related to typical CRC pathways, including PPAR signaling, bile secretion, proliferation, inflammatory response, apoptosis, TNF signaling, metabolic pathways, hypoxia, and angiogenesis (Figure 2C). Female-enriched pathways included NFκB signaling, WNT signaling, cell division, DNA repair, and response to glucose and insulin (Figure 2D). In males, response to cAMP, calcium ion, nutrient, and mechanical stimulus, as well as patterning of blood vessels, were regulated (Figure 2D). Overall, females and males differed in their gene expression in both the normal colon and CRC (Figure 1B). The changes in the tumor tissue (compared to the normal tissue of the same patient) largely involved the common CRC pathways in both sexes, but we also identified sex-specific differences.
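The overlap figures reported above follow from simple set arithmetic, which the short Python sketch below reproduces from the three counts given in the text (7156 female DEG, 2611 male DEG, 2352 shared). The helper `partition` and the toy gene names are illustrative only.

```python
def partition(female_genes, male_genes):
    """Split two DEG sets into shared, female-specific, and male-specific parts."""
    female_genes, male_genes = set(female_genes), set(male_genes)
    return (
        female_genes & male_genes,   # regulated in both sexes
        female_genes - male_genes,   # female-specific
        male_genes - female_genes,   # male-specific
    )


# Toy illustration with hypothetical gene names.
shared, f_only, m_only = partition({"G1", "G2", "G3"}, {"G2", "G3", "G4"})

# The same arithmetic applied to the counts reported in the text.
female_deg, male_deg, shared_deg = 7156, 2611, 2352
male_specific = male_deg - shared_deg            # 259
female_specific = female_deg - shared_deg        # 4804
shared_pct_of_male = 100 * shared_deg / male_deg # about 90%

print(male_specific, female_specific, round(shared_pct_of_male))
```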
2.4. Sex-Specific Features Independent of the Imbalanced Data
The higher numbers of DEG in females may be due to the imbalanced data (n = 18 for females and n = 6 for males). In order to exclude the effect of the imbalanced data we performed differential expression analysis on six randomly selected female tumor samples (from subtype 3 and 4) and matched normal samples in three individual runs (Figure S2A,B). The females still presented more DEG in the tumors compared to the males (Figure S2C). The common DEG between the sexes were still related to the same pathways (Figure 2C and Figure S2D) and the female-specific pathways were still related to NFκB signaling, WNT signaling, and cell division (Figure 2D and Figure S2E). The male-specific pathways were still related to response to cAMP, nutrient, and mechanical stimulus (Figure 2D and Figure S2E). Interestingly, 100% of the female-specific tumor expression in the balanced data (n = 6 for both females and males) overlapped with the female-specific tumor expression in the unbalanced data (n = 18 for females, Figure S2F). This supports that the female-specific tumor genes were not due to the imbalanced data.
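The balancing procedure described above (repeatedly drawing six female samples to match the six male samples) can be sketched as follows. This is a minimal illustration, not the authors' code: the sample identifiers, the size of the eligible pool, the random seed, and the helper names are all assumptions.

```python
import random


def balanced_subsamples(majority_ids, n, runs, seed=0):
    """Draw `runs` independent random subsets of size `n` without replacement."""
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    return [sorted(rng.sample(majority_ids, n)) for _ in range(runs)]


def overlap_fraction(subset_genes, reference_genes):
    """Fraction of `subset_genes` that also appear in `reference_genes`."""
    subset_genes = set(subset_genes)
    if not subset_genes:
        return 0.0
    return len(subset_genes & set(reference_genes)) / len(subset_genes)


# Hypothetical identifiers for the eligible female samples (pool size assumed).
female_ids = [f"F{i:02d}" for i in range(1, 19)]
draws = balanced_subsamples(female_ids, n=6, runs=3)

# Toy check mirroring the reported result: every female-specific gene found in
# the balanced analysis is also found in the unbalanced analysis.
frac = overlap_fraction({"G1", "G2"}, {"G1", "G2", "G3", "G4"})
print(draws, frac)
```

Differential expression would then be rerun on each draw together with its matched normal samples; that step depends on the analysis pipeline and is not shown.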
2.5. Biomarker Discovery Revealed Common and Sex-Specific CRC Biomarkers
To study whether sex differences impact data-driven diagnostic and prognostic biomarker discovery, we used feature selection methods separated by sex. The methods included the variable importance testing approach (Vita), minimum redundancy—maximum relevance (MRMR), and Boruta algorithm (Figure 3A). Due to the larger patient cohorts, we used The Cancer Genome Atlas (TCGA, COAD and READ) data for sex-specific biomarker discovery and combined the sexes for the Swedish cohort. With the selection criteria for Vita + Boruta and Vita + MRMR, 81, 56, and 37 features passed the selection for female TCGA, male TCGA, and Swedish mixed cohort, respectively (Figure 3B and Table 1). Next, we performed DESeq2 on the features to ensure that the selected features were significantly different between the CRC and paired normal samples (cutoff of padj < 0.05, |log2FC| > 2 and FPKM > 1). With the selected cutoff, 54, 46, and 19 of the features that passed the feature selection were differentially expressed for the female, male, and Swedish mixed cohorts, respectively (Table 1). The PCA plots on these differentially expressed features showed a clear separation between the non-cancerous and CRC groups in all three cohorts (Figure 3C). The biomarker discovery showed that females and males in addition to the common biomarkers also presented sex-specific ones (Figure 3D). In addition, the independent Swedish mixed cohort corroborated some biomarkers, even though the sexes were mixed, which strengthens the results (Figure 3D).
Table 1.
Cohort | Original Feature Numbers | Feature Selection Methods | Selected Features | Features in Common | Differentially Expressed Features |
---|---|---|---|---|---|
Female TCGA data | 56,719 | Vita + Boruta | 100 | 81 | 54 |
Female TCGA data | 56,719 | Vita + MRMR | 100 | 81 | 54 |
Male TCGA data | 56,719 | Vita + Boruta | 71 | 56 | 46 |
Male TCGA data | 56,719 | Vita + MRMR | 100 | 56 | 46 |
Swedish mixed | 63,678 | Vita + Boruta | 57 | 37 | 19 |
Swedish mixed | 63,678 | Vita + MRMR | 100 | 37 | 19 |
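The "Features in Common" column of Table 1 corresponds to the intersection of the two selection results per cohort. The sketch below shows that step with hypothetical feature names and then runs a simple consistency check on the counts from Table 1 (an intersection cannot exceed the smaller input, and the differentially expressed subset cannot exceed the intersection). The function name is illustrative and not part of any of the feature selection packages.

```python
def features_in_common(*selections):
    """Return the sorted features that every selection method agreed on."""
    common = set(selections[0])
    for selected in selections[1:]:
        common &= set(selected)
    return sorted(common)


# Toy example with hypothetical feature names.
vita_boruta = ["GENE_A", "GENE_B", "GENE_C"]
vita_mrmr = ["GENE_B", "GENE_C", "GENE_D"]
consensus = features_in_common(vita_boruta, vita_mrmr)

# Counts from Table 1: (Vita + Boruta, Vita + MRMR, in common, differentially expressed).
table1 = {
    "Female TCGA data": (100, 100, 81, 54),
    "Male TCGA data": (71, 100, 56, 46),
    "Swedish mixed": (57, 100, 37, 19),
}
consistent = all(
    de <= common <= min(boruta, mrmr)
    for boruta, mrmr, common, de in table1.values()
)
print(consensus, consistent)
```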
2.6. Top-Ranked Common and Sex-Specific Biomarkers
Our data demonstrate that there are both common and sex-specific biomarkers. In order to evaluate if the best biomarkers are common or sex-specific, we applied machine-learning techniques to rank the features according to importance (Figure 3A). Random forest (RF) and adaptive boosting (AdaBoost) were used for machine learning. While RF performed best (Figure S2G), both gave similar ranked features. For the top 20 RF-ranked features, males and females presented 10 genes in common and 10 specific for each sex (Figure 4A). Next, we compared the biomarkers to an Italian cohort (GSE8671) containing 32 paired adenomas and colonic mucosa samples in an effort to determine if our biomarkers were regulated in the early stages of CRC and therefore could be considered as diagnostic biomarkers. The majority of the biomarkers were indeed regulated in the early stages of CRC tumorigenesis (Figure 4B). For the Swedish mixed cohort, cadherin 3 (CDH3) and ESM1 were ranked as the top biomarkers and were both upregulated in the tumors (Figure 4C,D). In addition, for the biomarkers to be considered as ideal diagnostic biomarkers and potential therapeutic targets, they should present an increased expression in the diseased state. The majority of the CRC biomarkers were downregulated in the TCGA dataset (Figure 4A). In order to detect potential new therapeutic targets, we performed the feature selection on the upregulated genes with Boruta and ranked them according to their importance. Boruta detected 86, 84, and 55 important features for female TCGA, male TCGA, and the mixed Swedish cohort, respectively (Figure 5A). Reassuringly, 100% of the upregulated TCGA biomarkers, and all but one (thrombospondin-2, THBS2) of the upregulated Swedish mixed cohort biomarkers from the previous analysis remained. Eighteen biomarkers were found in all three cohorts (Figure 5A).
Twenty-eight newly identified upregulated biomarkers were sex-specific (Figure 5B,D) and six of the top-20 ranked biomarkers in the Swedish mixed cohort were common in the TCGA data (Figure 5C,D). Furthermore, diagnostic biomarkers secreted into body fluids are of specific interest for screening purposes. CEMIP, ESM1, inhibin subunit beta A (INHBA), matrix metallopeptidase 7 (MMP7), and collagen type XI alpha 1 chain (COL11A1) were identified as biomarkers in all cohorts, and are all secreted. Furthermore, cystatin SN (CST1) detected in female TCGA data, transcobalamin 1 (TCN1) detected in the Swedish mixed cohort, and palmitoleoyl-protein carboxylesterase (NOTUM) detected in female and male TCGA data are also secreted and thus of potential interest as screening biomarkers.
2.7. Biomarkers Have Sex-Specific Prognostic Value
Interestingly, although some of the top biomarkers were common in both sexes, the prognostic value of these could be sex-specific, and vice versa. We performed OS analysis with Kaplan–Meier plots and found that ESM1, an early biomarker and strong top candidate in all cohorts, showed a prognostic value when combining the sexes and had a clear unfavorable prognostic value specifically in males (Figure 5E and Table 2). CLDN1, a biomarker found in all three cohorts, had a clear unfavorable prognostic value for females specifically (Figure 5E and Table 2). Further down in the importance ranking lists we identified additional biomarkers with potential sex-specific prognostic values (Figure 6 and Table 2). Worth noting, solute carrier family 4 member 4 (SLC4A4) and kinesin family member 26B (KIF26B) were also significant when the sexes were combined, and showed a non-significant trend in the other sex (Figure S3A). Additional downregulated biomarkers presented a significant prognostic value when both sexes were combined but did not reach significance for either sex alone (e.g., prostaglandin D2 receptor 2/PTGDR2, aspartoacylase/ASPA, bestrophin 4/BEST4, and mineralocorticoid receptor/nuclear receptor subfamily 3 group C member 2/NR3C2; Figure S3B). None of the top-20 ranked upregulated biomarkers in CRC had prognostic value, except the previously identified biomarkers ESM1 and CLDN1. However, moving down in the importance-ranking list we identified seven new biomarkers with sex-specific prognostic values (Figure 6 and Table 2). Although epidermal growth factor-like domain-containing protein 6 (EGFL6), FUT1, and four-jointed box kinase 1 (FJX1) presented a sex-specific prognostic value, they presented a significant prognostic value when the sexes were combined and a non-significant trend in the other sex (data not shown). Overall, our data show that females and males indeed presented a number of sex-specific top biomarkers. 
Even more striking is that the prognostic value of the biomarkers was highly dependent on sex, with 20 biomarkers showing a sex-specific prognostic value. This suggests that some of the diagnostic biomarkers can have a profound impact on predicting CRC prognosis when sex is taken into account, and our results indicate that sex is an important factor when evaluating CRC biomarkers.
Table 2.
Biomarker | Cohort ¹ | Rank ² | Regulation ³ | Prognostic Value ⁴ |
---|---|---|---|---|
ESM1 | All | Top20 | Up | Unfavorable in males |
CLDN1 | All | Top20 | Up | Unfavorable in females |
TSPAN7 | Female TCGA | 39 | Down | Unfavorable in females |
SLC25A23 | Female TCGA | 35 | Down | Unfavorable in females |
C2orf88 | Female TCGA | 36 | Down | Favorable in males |
PKIB | Male TCGA | 27 | Down | Favorable in males |
P2RY1 | Male TCGA | 37 | Down | Favorable in females |
RSPO2 | Male TCGA | 29 | Down | Unfavorable in females |
GCNT2 | Male TCGA and Swedish | Top20 | Down | Unfavorable in females |
HPSE2 | TCGA | 35 M and 25 F | Down | Favorable in males |
GUCA2A | TCGA | 31 M and 44 F | Down | Favorable in males |
SLC4A4 | TCGA | 23 M and 43 F | Down | Favorable in males |
KIF26B | Swedish | 6 | Up | Unfavorable in males |
PTGDR2 | Swedish | 15 | Down | Favorable in males and females (combined) |
ASPA | Female TCGA | 23 | Down | Unfavorable in males and females (combined) |
BEST4 | Female TCGA | 17 | Down | Favorable in males and females (combined) |
NR3C2 | Male TCGA | 30 | Down | Favorable in males and females (combined) |
SMOX | Male TCGA | 37 | Up | Unfavorable in females |
FUT1 | All | 38 M, 78 F and 27 S | Up | Unfavorable in females |
EGFL6 | Female TCGA | 27 | Up | Favorable in females |
VWA2 | Male TCGA | 31 | Up | Unfavorable in males |
FJX1 | TCGA | 67 M and 83 F | Up | Unfavorable in males |
S100A2 | TCGA | 64 M and 45 F | Up | Unfavorable in males |
EPHX4 | Female TCGA | 75 | Up | Favorable in females |
¹ The cohort the biomarker was identified in. ² The rank of the biomarker after importance ranking with machine learning. ³ Whether the biomarker was up- or downregulated (in the tumor and specified cohort). ⁴ Whether the biomarker correlated to a favorable or unfavorable prognostic value when highly expressed in males, females, or when both sexes were combined.
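The overall survival analysis in Section 2.7 rests on Kaplan–Meier estimates computed separately for each sex. As a minimal sketch of the underlying product-limit estimator, the Python function below computes the survival curve for one group from follow-up times and event indicators. The toy data are hypothetical, and the grouping into high and low expression (for example by a median split within each sex) is an assumption that is not shown here; the actual procedure is described in the Supplementary Materials.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        n_at_risk -= leaving
    return curve


# Hypothetical follow-up data for one group (e.g., one sex, high expression).
times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 4))
```

Comparing the curves of the high- and low-expression groups within each sex (typically with a log-rank test) is what would classify a biomarker as favorable or unfavorable; that comparison is omitted from this sketch.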
3. Discussion
Our objective with this study was to evaluate if there are sex differences in the gene expression of a normal colon and CRC, and whether separating the sexes can improve the diagnostic and prognostic CRC biomarkers. Several studies have shown that there are sex differences in CRC, regarding incidence and mortality, tumor location, and mutation status [8,10,11,12]. However, very few studies consider sex differences in the analysis of tumors and biomarkers. Recently, Cai et al. demonstrated that there are sex-specific metabolic sub-phenotypes dependent on tumor location [20]. However, no studies have evaluated sex-specific CRC biomarkers at a large scale. In this study, we analyzed sex differences in the gene expression of a normal colon and CRC. Further, we analyzed if there are sex-specific diagnostic biomarkers using feature selection methods in combination with machine learning with RF to rank the selected features. To evaluate the prognostic value of the biomarkers, we performed survival analysis of the TCGA data separated by sex.
Our findings revealed significant sex differences, which, if incorporated into biomarker discovery and the clinic, could impact CRC patient outcome. First, we demonstrated sex differences in the normal colon, especially among pathways related to gluconeogenesis, bile secretion, and carbohydrate, vitamin, and lipid metabolism, all known to be dysregulated in CRC. The sex differences in the normal colon might shape the tumor characteristics and microenvironment. This can help explain the differences in male and female incidences of CRC. Estrogen menopausal hormone therapy has indeed been shown to correlate to a lower CRC incidence [21,22,23]. Although the majority of the females were in the post-menopausal state during surgery, the sex differences in the tumors may be explained by the sex differences in the normal colon. However, a larger study including normal colon tissue from both pre- and postmenopausal women would be needed to further explore the role of estrogen signaling on the colon transcriptome. Furthermore, the sex differences seen in CRC were mostly related to the immune cell response, including B-cell receptor signaling. The X chromosome contains the vast majority of immune-related genes [24], and genes that escape inactivation can influence the expression of X-linked genes and lead to sex biases in inflammatory diseases.
The sex-independent potential diagnostic biomarkers (CLDN1, CEMIP, keratin 80/KRT80, CDH3, and ESM1) were ranked as top features in our paired cohort. This further validates the results in a study published by Long et al., who found CLDN1, CEMIP, and CDH3 amongst the most important features and potential diagnostic biomarkers [6]. Both ESM1 and CEMIP are secreted and can be promising CRC diagnostic biomarkers. Additionally, we found that CLDN1 has potential as an unfavorable prognostic biomarker specifically in females. CLDN1 has previously been proposed both as a marker for CRC prognosis and as a therapeutic target [25,26], and we suggest that this may be particularly relevant for females. ESM1 showed an unfavorable prognostic value in males. ESM1 regulates CRC cell growth and metastasis by activation of NFκB and has been shown to be of prognostic value for disease recurrence, and to correlate with a worse survival outcome [27,28]. Additionally, ESM1 is upregulated by vascular endothelial growth factor (VEGF) and is involved in hypoxia-associated angiogenesis, and further proposed as a potential therapeutic target [28]. Interestingly, we found female-specific (FtsJ RNA 2’-O-methyltransferase 1/FTSJ1, CST1, and glutamate ionotropic receptor NMDA type subunit 2D/GRIN2D) and male-specific (NOTUM, pancreatic and duodenal homeobox 1 (PDX1), and cyclin P/CCNP/CNTD2) top-ranked features, based on the TCGA data. Of note, CST1 and NOTUM are secreted and can be potential sex-specific diagnostic markers.
Moreover, FJX1, identified as an important biomarker in both sexes (not top 20), presented an unfavorable prognostic value specifically in males. FJX1 has also been shown to be involved in angiogenesis and associated with an unfavorable prognosis of CRC [29]. The common sex biomarker GUCA2A was downregulated in CRC in both sexes and showed a favorable prognostic value in males. GUCA2A mRNA and protein loss is among the most common gene losses in CRC, occurring in more than 85% of tumors [30], and has been suggested as a marker for poor prognosis [31]. GUCA2A is a peptide hormone and endogenous ligand for the guanylate cyclase 2C (GUCY2C) receptor. The loss of the GUCY2C signaling cascade due to GUCA2A downregulation promotes tumorigenesis [32]. Ligand replacement therapies to reactivate GUCY2C have either been approved by the FDA or entered clinical trials [33]. Such interventions, however, rely on a maintained expression of GUCY2C. This suggests that GUCA2A can be a promising diagnostic biomarker in both sexes and may, together with the expression of GUCY2C, have a therapeutic value. Furthermore, the common sex upregulated biomarker S100 calcium-binding protein A2 (S100A2) was associated with an unfavorable prognostic value specifically in males. S100A2 has been shown to reprogram glycolysis and induce proliferation in CRC, and has been suggested as a therapeutic target [34]. High expression of S100A2 has been shown to correlate with a worse CRC OS [35].
Overall, in this study, we identified sex differences in the normal transcriptome, which may explain the sex differences in CRC susceptibility. Furthermore, we validated the previously proposed sex-independent diagnostic biomarkers CLDN1, CEMIP, and CDH3 and propose new potential biomarkers. Interestingly, we did not find a single significant biomarker showing a prognostic value independent of sex, while we identified 20 diagnostic features with a sex-specific prognostic value, in particular, ESM1, GUCA2A, FJX1, and S100A2 for males and CLDN1 for females. Importantly, our study highlights the need to take sex into account in CRC research, which may improve CRC mortality.
4. Materials and Methods
Patients and Samples
Clinical samples (colorectal tumors and matched noncancerous adjacent tissue) were collected from patients (n = 24, 18 women and 6 men) undergoing surgery in Stockholm after informed consent. The study was approved by the regional ethical review board in Stockholm (2016/957-31 and 2017/742-32). In addition, gene expression data for 641 (299 women and 342 men) colorectal cancer (CRC) and 51 (28 women and 23 men) noncancerous mucosal tissues were downloaded from TCGA. The COAD and READ data were combined. The data were downloaded on 31 January 2019 from the NCI Genomic Data Commons (GDC) data portal using the Bioconductor package TCGAbiolinks (version 3.8) in R (version 3.6.1). The molecular subtypes of the Swedish cohort were determined based on the MSI status and the BRAF and KRAS mutation status. The MSI status was determined using MSIsensor [36], and the BRAF and KRAS mutation status was analyzed using the Integrative Genomics Viewer (Broad Institute, Cambridge, MA, USA, version 2.5.2) [37]. A detailed description of the RNA isolation, quantitative PCR, gene expression analysis, feature selection, machine learning classification, and overall survival analysis can be found online in the Supplementary Materials.
Acknowledgments
The authors acknowledge support from the National Genomics Infrastructure in Stockholm funded by Science for Life Laboratory, the Knut and Alice Wallenberg Foundation and the Swedish Research Council, and SNIC/Uppsala Multidisciplinary Center for Advanced Computational Science for assistance with massively parallel sequencing and access to the UPPMAX computational infrastructure. We would like to acknowledge Victor Jonsson at the National Bioinformatics Infrastructure Sweden at SciLifeLab for bioinformatics advice.
Supplementary Materials
The following are available online at https://www.mdpi.com/1422-0067/22/3/1354/s1, Supplementary Material and Methods, Figure S1. Sex differences in the normal colon and CRC transcriptome independent of tumor location and molecular subtypes, Figure S2. Sex-specific features in tumors compared to paired normal not due to the imbalanced data, Figure S3. Overall survival analysis of the biomarkers, Table S1. Distribution of molecular subtypes, and Table S2: Upregulated biomarkers selected with Boruta.
Author Contributions
Conceptualization, C.W. and L.H.; methodology, L.H., A.I. and Y.L.; software, L.H. and Y.L.; validation, L.H. and C.W.; formal analysis, L.H. and Y.L.; investigation, L.H., A.I. and Y.L.; resources, C.W., X.C. and J.H.; data curation, L.H. and Y.L.; writing—original draft preparation, L.H.; visualization, L.H.; supervision, C.W. and J.H.; project administration, C.W.; funding acquisition, C.W. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the Swedish Cancer Society (CAN 2018/596), Swedish Research Council (2017-01658), and Stockholm County Council (2017-0578).
Institutional Review Board Statement
The study was approved by the regional ethical review board in Stockholm (2016/957-31 and 2017/742-32).
Informed Consent Statement
Informed consent was obtained from all subjects involved in this study.
Data Availability Statement
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Conflicts of Interest
The authors declare no conflict of interest.
References
- 1.Siegel R.L., Miller K.D., Jemal A. Cancer statistics, 2018. CA Cancer J. Clin. 2018;68:7–30. doi: 10.3322/caac.21442. [DOI] [PubMed] [Google Scholar]
- 2.Ferlay J., Colombet M., Soerjomataram I., Mathers C., Parkin D.M., Piñeros M., Znaor A., Bray F. Estimating the global cancer incidence and mortality in 2018: GLOBOCAN sources and methods. Int. J. Cancer. 2019;144:1941–1953. doi: 10.1002/ijc.31937. [DOI] [PubMed] [Google Scholar]
- 3.Miller K.D., Nogueira L., Mariotto A.B., Rowland J.H., Yabroff K.R., Alfano C.M., Jemal A., Kramer J.L., Siegel R.L. Cancer treatment and survivorship statistics, 2019. CA Cancer J. Clin. 2019;69:363–385. doi: 10.3322/caac.21565. [DOI] [PubMed] [Google Scholar]
- 4.Baxter N.N., Goldwasser M.A., Paszat L.F., Saskin R., Urbach D.R., Rabeneck L. Association of colonoscopy and death from colorectal cancer. Ann. Intern. Med. 2009;150:1–8. doi: 10.7326/0003-4819-150-1-200901060-00306. [DOI] [PubMed] [Google Scholar]
- 5.Adler A., Geiger S., Keil A., Bias H., Schatz P., deVos T., Dhein J., Zimmermann M., Tauber R., Wiedenmann B. Improving compliance to colorectal cancer screening using blood and stool based tests in patients refusing screening colonoscopy in Germany. BMC Gastroenterol. 2014;14:183. doi: 10.1186/1471-230X-14-183. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Long N.P., Park S., Anh N.H., Nghi T.D., Yoon S.J., Park J.H., Lim J., Kwon S.W. High-Throughput Omics and Statistical Learning Integration for the Discovery and Validation of Novel Diagnostic Signatures in Colorectal Cancer. Int. J. Mol. Sci. 2019;20:296. doi: 10.3390/ijms20020296. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Long N.P., Jung K.H., Yoon S.J., Anh N.H., Nghi T.D., Kang Y.P., Yan H.H., Min J.E., Hong S.S., Kwon S.W. Systematic assessment of cervical cancer initiation and progression uncovers genetic panels for deep learning-based early diagnosis and proposes novel diagnostic and prognostic biomarkers. Oncotarget. 2017;8:109436–109456. doi: 10.18632/oncotarget.22689. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Kim S.E., Paik H.Y., Yoon H., Lee J.E., Kim N., Sung M.K. Sex- and gender-specific disparities in colorectal cancer risk. World J. Gastroenterol. 2015;21:5167–5175. doi: 10.3748/wjg.v21.i17.5167. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Hendifar A., Yang D., Lenz F., Lurje G., Pohl A., Lenz C., Ning Y., Zhang W., Lenz H.J. Gender disparities in metastatic colorectal cancer survival. Clin. Cancer Res. 2009;15:6391–6397. doi: 10.1158/1078-0432.CCR-09-0877. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10. Pal S.K., Hurria A. Impact of age, sex, and comorbidity on cancer therapy and disease progression. J. Clin. Oncol. 2010;28:4086–4093. doi: 10.1200/JCO.2009.27.0579.
- 11. Hansen I.O., Jess P. Possible better long-term survival in left versus right-sided colon cancer—A systematic review. Dan. Med. J. 2012;59:A4444.
- 12. Tsai Y.J., Huang S.C., Lin H.H., Lin C.C., Lan Y.T., Wang H.S., Yang S.H., Jiang J.K., Chen W.S., Lin T.C., et al. Differences in gene mutations according to gender among patients with colorectal cancer. World J. Surg. Oncol. 2018;16:128. doi: 10.1186/s12957-018-1431-5.
- 13. Hases L., Archer A., Indukuri R., Birgersson M., Savva C., Korach-André M., Williams C. High-fat diet and estrogen impacts the colon and its transcriptome in a sex-dependent manner. Sci. Rep. 2020;10:16160. doi: 10.1038/s41598-020-73166-1.
- 14. DeCosse J.J., Ngoi S.S., Jacobson J.S., Cennerazzo W.J. Gender and colorectal cancer. Eur. J. Cancer Prev. 1993;2:105–115. doi: 10.1097/00008469-199303000-00003.
- 15. Hases L., Indukuri R., Birgersson M., Nguyen-Vu T., Lozano R., Saxena A., Hartman J., Frasor J., Gustafsson J., Katajisto P., et al. Intestinal estrogen receptor beta suppresses colon inflammation and tumorigenesis in both sexes. Cancer Lett. 2020;492:54–62. doi: 10.1016/j.canlet.2020.06.021.
- 16. Luo T., Wu S., Shen X., Li L. Network cluster analysis of protein-protein interaction network identified biomarker for early onset colorectal cancer. Mol. Biol. Rep. 2013;40:6561–6568. doi: 10.1007/s11033-013-2694-0.
- 17. Fink S.P., Myeroff L.L., Kariv R., Platzer P., Xin B., Mikkola D., Lawrence E., Morris N., Nosrati A., Willson J.K., et al. Induction of KIAA1199/CEMIP is associated with colon cancer phenotype and poor patient survival. Oncotarget. 2015;6:30500–30515. doi: 10.18632/oncotarget.5921.
- 18. Phipps A.I., Limburg P.J., Baron J.A., Burnett-Hartman A.N., Weisenberger D.J., Laird P.W., Sinicrope F.A., Rosty C., Buchanan D.D., Potter J.D., et al. Association between molecular subtypes of colorectal cancer and patient survival. Gastroenterology. 2015;148:77–87.e2. doi: 10.1053/j.gastro.2014.09.038.
- 19. Jass J.R. Classification of colorectal cancer based on correlation of clinical, morphological and molecular features. Histopathology. 2007;50:113–130. doi: 10.1111/j.1365-2559.2006.02549.x.
- 20. Cai Y., Rattray N.J.W., Zhang Q., Mironova V., Santos-Neto A., Hsu K.S., Rattray Z., Cross J.R., Zhang Y., Paty P.B., et al. Sex Differences in Colon Cancer Metabolism Reveal A Novel Subphenotype. Sci. Rep. 2020;10:4905. doi: 10.1038/s41598-020-61851-0.
- 21. Botteri E., Stoer N.C., Sakshaug S., Graff-Iversen S., Vangen S., Hofvind S., de Lange T., Bagnardi V., Ursin G., Weiderpass E. Menopausal hormone therapy and colorectal cancer: A linkage between nationwide registries in Norway. BMJ Open. 2017;7:e017639. doi: 10.1136/bmjopen-2017-017639.
- 22. Grodstein F., Newcomb P.A., Stampfer M.J. Postmenopausal hormone therapy and the risk of colorectal cancer: A review and meta-analysis. Am. J. Med. 1999;106:574–582. doi: 10.1016/S0002-9343(99)00063-7.
- 23. Lobo R.A. Hormone-replacement therapy: Current thinking. Nat. Rev. Endocrinol. 2017;13:220–231. doi: 10.1038/nrendo.2016.164.
- 24. Bianchi I., Lleo A., Gershwin M.E., Invernizzi P. The X chromosome and immune associated genes. J. Autoimmun. 2012;38:J187–J192. doi: 10.1016/j.jaut.2011.11.012.
- 25. Nakagawa S., Miyoshi N., Ishii H., Mimori K., Tanaka F., Sekimoto M., Doki Y., Mori M. Expression of CLDN1 in colorectal cancer: A novel marker for prognosis. Int. J. Oncol. 2011;39:791–796. doi: 10.3892/ijo.2011.1102.
- 26. Cherradi S., Ayrolles-Torro A., Vezzo-Vie N., Gueguinou N., Denis V., Combes E., Boissiere F., Busson M., Canterel-Thouennon L., Mollevi C., et al. Antibody targeting of claudin-1 as a potential colorectal cancer therapy. J. Exp. Clin. Cancer Res. 2017;36:89. doi: 10.1186/s13046-017-0558-5.
- 27. Kang Y.H., Ji N.Y., Han S.R., Lee C.I., Kim J.W., Yeom Y.I., Kim Y.H., Chun H.K., Kim J.W., Chung J.W., et al. ESM-1 regulates cell growth and metastatic process through activation of NF-kappaB in colorectal cancer. Cell Signal. 2012;24:1940–1949. doi: 10.1016/j.cellsig.2012.06.004.
- 28. Kim J.H., Park M.Y., Kim C.N., Kim K.H., Kang H.B., Kim K.D., Kim J.W. Expression of endothelial cell-specific molecule-1 regulated by hypoxia inducible factor-1alpha in human colon carcinoma: Impact of ESM-1 on prognosis and its correlation with clinicopathological features. Oncol. Rep. 2012;28:1701–1708. doi: 10.3892/or.2012.2012.
- 29. Al-Greene N.T., Means A.L., Lu P., Jiang A., Schmidt C.R., Chakravarthy A.B., Merchant N.B., Washington M.K., Zhang B., Shyr Y., et al. Four jointed box 1 promotes angiogenesis and is associated with poor patient survival in colorectal carcinoma. PLoS ONE. 2013;8:e69660. doi: 10.1371/journal.pone.0069660.
- 30. Wilson C., Lin J.E., Li P., Snook A.E., Gong J., Sato T., Liu C., Girondo M.A., Rui H., Hyslop T., et al. The paracrine hormone for the GUCY2C tumor suppressor, guanylin, is universally lost in colorectal cancer. Cancer Epidemiol. Biomark. Prev. 2014;23:2328–2337. doi: 10.1158/1055-9965.EPI-14-0440.
- 31. Zhang H., Du Y., Wang Z., Lou R., Wu J., Feng J. Integrated Analysis of Oncogenic Networks in Colorectal Cancer Identifies GUCA2A as a Molecular Marker. Biochem. Res. Int. 2019;2019:6469420. doi: 10.1155/2019/6469420.
- 32. Li P., Schulz S., Bombonati A., Palazzo J.P., Hyslop T.M., Xu Y., Baran A.A., Siracusa L.D., Pitari G.M., Waldman S.A. Guanylyl cyclase C suppresses intestinal tumorigenesis by restricting proliferation and maintaining genomic integrity. Gastroenterology. 2007;133:599–607. doi: 10.1053/j.gastro.2007.05.052.
- 33. Bryant A.P., Busby R.W., Bartolini W.P., Cordero E.A., Hannig G., Kessler M.M., Pierce C.M., Solinga R.M., Tobin J.V., Mahajan-Miklos S., et al. Linaclotide is a potent and selective guanylate cyclase C agonist that elicits pharmacological effects locally in the gastrointestinal tract. Life Sci. 2010;86:760–765. doi: 10.1016/j.lfs.2010.03.015.
- 34. Li C., Chen Q., Zhou Y., Niu Y., Wang X., Li X., Zheng H., Wei T., Zhao L., Gao H. S100A2 promotes glycolysis and proliferation via GLUT1 regulation in colorectal cancer. FASEB J. 2020. doi: 10.1096/fj.202000555R.
- 35. Masuda T., Ishikawa T., Mogushi K., Okazaki S., Ishiguro M., Iida S., Mizushima H., Tanaka H., Uetake H., Sugihara K. Overexpression of the S100A2 protein as a prognostic marker for patients with stage II and III colorectal cancer. Int. J. Oncol. 2016;48:975–982. doi: 10.3892/ijo.2016.3329.
- 36. Niu B., Ye K., Zhang Q., Lu C., Xie M., McLellan M.D., Wendl M.C., Ding L. MSIsensor: Microsatellite instability detection using paired tumor-normal sequence data. Bioinformatics. 2014;30:1015–1016. doi: 10.1093/bioinformatics/btt755.
- 37. Robinson J.T., Thorvaldsdóttir H., Winckler W., Guttman M., Lander E.S., Getz G., Mesirov J.P. Integrative genomics viewer. Nat. Biotechnol. 2011;29:24–26. doi: 10.1038/nbt.1754.
Data Availability Statement
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.