Colorectal cancer (CRC) is the third leading cause of cancer-related mortality in men and women in the US. Implementation of universal CRC screening for all adults over age 50 years has generated a steady trend of decreasing incidence and mortality for later-onset CRC (loCRC). Prior to the 1990s, the incidence of early-onset CRC (eoCRC) in those under 50 years was stable or declining. However, over the past three decades, there has been a persistent and alarming increase in the incidence of eoCRC [1,2]. This appears to be a worldwide phenomenon that follows a birth cohort effect, implicating environmental over genetic factors as the primary contributor. It is also associated with lifestyle-based risk factors such as obesity, sedentary lifestyle, tobacco use, alcohol use, and Western diet, most of which have concurrently become more prevalent in the US population over the past three decades [3,4]. Additionally, eoCRC tends to present at a more advanced stage with “red flag” signs such as rectal bleeding or bowel obstruction, with more aggressive histologic features and a distal location in the rectum or sigmoid colon. Unfortunately, the etiology of this increasing incidence and of the differences in clinical presentation and tumor characteristics is not well understood. In this study, we hypothesized that differences in tumor characteristics in eoCRC could manifest as alterations in the molecular biology of the tumor.
We used RNASeq data from The Cancer Genome Atlas (TCGA), a large-scale consortium with publicly available molecular biology data for multiple human cancers. We analyzed TCGA colon adenocarcinoma (TCGA-COAD) samples using Gene Set Enrichment Analysis (GSEA), with the Kyoto Encyclopedia of Genes and Genomes as the gene set database [5]. GSEA for eoCRC compared to loCRC showed enrichment in a number of gene sets. The top three gene sets ranked by normalized enrichment score were glycosaminoglycan biosynthesis (heparan sulfate and chondroitin sulfate) and retinol metabolism. No gene set reached statistically significant enrichment, but further analyses were performed on gene sets according to their rank.
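To illustrate how GSEA ranks a gene set, the following is a minimal sketch of the weighted running-sum enrichment score for a single gene set, assuming genes have already been ordered by a ranking metric (eoCRC versus loCRC). The function name, the gene labels `GENE_C` and `GENE_D`, and the scores are hypothetical and are not taken from the analysis above. The permutation step that produces the normalized enrichment score and its significance is omitted.

```python
from typing import Sequence, Set


def enrichment_score(ranked_genes: Sequence[str],
                     scores: Sequence[float],
                     gene_set: Set[str],
                     weight: float = 1.0) -> float:
    """Return the running-sum enrichment score for one gene set.

    ranked_genes must already be sorted by the ranking metric, from the
    most up-regulated gene in the group of interest to the most
    down-regulated; scores holds the matching metric values.
    """
    if len(ranked_genes) != len(scores):
        raise ValueError("ranked_genes and scores must have equal length")

    hits = [gene in gene_set for gene in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_total = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if n_miss == 0 or hit_total == 0:
        raise ValueError("need genes both inside and outside the set")

    running = 0.0
    best = 0.0  # running-sum value with the largest deviation from zero
    for score, is_hit in zip(scores, hits):
        if is_hit:
            # Step up, weighted by the gene's ranking metric.
            running += abs(score) ** weight / hit_total
        else:
            # Step down by a constant amount for genes outside the set.
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranking (hypothetical values): two set members at the top of the list.
toy_genes = ["ALDH1A1", "ADH1B", "GENE_C", "GENE_D"]
toy_scores = [4.0, 3.0, 2.0, 1.0]
toy_es = enrichment_score(toy_genes, toy_scores, {"ALDH1A1", "ADH1B"})
```

A positive score indicates that set members cluster toward the top of the ranked list; a negative score indicates clustering toward the bottom.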
Secondary analysis was conducted using additional data from TCGA rectal adenocarcinoma (TCGA-READ) as well as the Clinical Proteomic Tumor Analysis Consortium-2 (CPTAC-2) RNASeq data [6]. Genes meeting the enrichment threshold within the glycosaminoglycan biosynthesis and retinol metabolism gene sets were evaluated for differential expression in TCGA-COAD and TCGA-READ as well as CPTAC-2 RNASeq data. From the glycosaminoglycan biosynthesis gene set, no individual gene was statistically significantly upregulated in both the TCGA and CPTAC-2 datasets. However, within the retinol metabolism gene set, ALDH1A1, ADH1B and UGT2B7 were significantly enriched across both datasets (p < 0.05 by Fisher’s exact test). Specifically, ALDH1A1 expression in eoCRC was 1.26-fold and 3.64-fold higher than in loCRC in the TCGA and CPTAC-2 datasets, respectively (Figure 1A).
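The fold change and Fisher’s exact test reported above can be sketched as follows. The text does not state how the test was applied to expression values, so this sketch assumes, purely for illustration, that each sample is classed as high or low relative to the pooled median before a two-sided test on the resulting 2 x 2 table. All function names and expression values are hypothetical.

```python
from math import comb
from statistics import mean, median


def fold_change(group_a, group_b):
    """Ratio of mean expression in group_a to mean expression in group_b."""
    return mean(group_a) / mean(group_b)


def high_low_table(group_a, group_b):
    """Count samples above the pooled median (an assumed cutoff) per group.

    Returns (a_high, a_low, b_high, b_low).
    """
    cutoff = median(list(group_a) + list(group_b))
    a_high = sum(v > cutoff for v in group_a)
    b_high = sum(v > cutoff for v in group_b)
    return a_high, len(group_a) - a_high, b_high, len(group_b) - b_high


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Made-up expression values, not data from TCGA or CPTAC-2.
eo_values = [5.0, 6.0, 7.0, 8.0]
lo_values = [1.0, 2.0, 3.0, 4.0]
fc = fold_change(eo_values, lo_values)
table = high_low_table(eo_values, lo_values)
p_value = fisher_exact_two_sided(*table)
```

Dichotomizing continuous counts discards information, so a rank-based or model-based test is often preferred for RNASeq data; the sketch only mirrors the test named in the text.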
Figure 1. ALDH1A1 Expression in eoCRC and loCRC.

(A) RNASeq data of TCGA-COAD, TCGA-READ and combined COAD + READ as well as CPTAC-2 show increased mRNA counts of ALDH1A1 in the COAD + READ and CPTAC-2 analyses (p < 0.05 by Fisher’s exact test). (B) Quantitative IHC staining of Korean cohort samples shows that eoCRC has an increased prevalence of high ALDH1A1 protein expression compared to loCRC. (C) Representative IHC images and associated percentage of surface area staining positive for ALDH1A1 in Korean cohort samples. CRC=colorectal cancer; eo=early-onset; lo=later-onset; TCGA=The Cancer Genome Atlas; COAD=colon adenocarcinoma; READ=rectal adenocarcinoma; FC=fold change; CPTAC-2=Clinical Proteomic Tumor Analysis Consortium-2; IHC=immunohistochemistry
We chose to further evaluate ALDH1A1 because of its known role as a cancer stem cell marker in solid tumors, its effect on xenobiotic metabolism and its role as the rate-limiting step in retinoic acid biosynthesis. Protein expression of ALDH1A1 was evaluated with immunohistochemistry (IHC) staining on a cohort of 17 eoCRC and 19 loCRC Korean CRC specimens. Demographic and clinical data were available for gender, tumor stage, grade and tumor location. The eoCRC samples showed a statistically significant increase in ALDH1A1 expression, with on average 5.9% of tissue staining positive, compared to 1.4% in loCRC samples (Figure 1B,C). Samples with adjacent normal tissue were noted to have rare crypt epithelial cells with ALDH1A1 staining, though normal tissue was excluded from analysis. Interestingly, the distribution of ALDH1A1 expression across samples was biphasic, with specimens showing either nearly no expression or a high degree of expression. When samples were classified as above or below the median expression level and compared using multivariate analysis, age less than 50 years was independently associated with higher ALDH1A1 expression (OR = 11.5, 95% CI = 1.66 to 79.54, p = 0.013, Table S1).
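For readers less familiar with odds ratios, the sketch below shows how an unadjusted odds ratio and its Wald 95% confidence interval are obtained from a 2 x 2 table of age group by expression above or below the median. This is a simplification: the odds ratio reported above comes from a multivariate model that adjusts for other covariates, which is not reproduced here. The function name and the counts are hypothetical and do not correspond to the Korean cohort.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    Table layout (counts):
        a = younger group, high expression   b = younger group, low expression
        c = older group, high expression     d = older group, low expression
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Invented counts for illustration only.
example_or, example_lower, example_upper = odds_ratio_ci(10, 5, 4, 12)
```

With small cell counts the interval is wide, which is consistent with the broad confidence interval reported for this cohort.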
ALDH1A1 enrichment was validated at both the mRNA and protein level across multiple cohorts (TCGA, CPTAC-2 and the Korean cohort), with a significant increase in ALDH1A1 expression in eoCRC in all datasets. Unfortunately, most eoCRC datasets are limited by sample size, with only 5 subjects in the CPTAC-2 eoCRC group and only 17 in the Korean cohort eoCRC group. When analyzing mRNA expression or IHC staining by tumor site, we did not find a statistically significant difference, likely owing to lack of power. However, even with this limited sample size, we observed a statistically significant difference in expression between eoCRC and loCRC in these cohorts. Interestingly, overall ALDH1A1 expression tended to show a biphasic distribution across cancers: the bottom quartile of expression had on average less than 0.1% positive staining, compared to 12.2% on average in the top quartile. This implies that CRCs with high ALDH1A1 expression represent a subset of all CRCs and that eoCRCs are more likely to fall within this subset.
These findings indicate an increased prevalence of a high ALDH1A1 expression phenotype in eoCRC. ALDH1A1 has multiple proposed roles in cancer biology, including serving as a cancer stem cell and prognostic marker, xenobiotic metabolism, retinoic acid metabolism, and immune modulation [7,8]. While the isolated finding of increased ALDH1A1 expression in eoCRC does not necessarily point to a discrete alteration in cell biology, our data do suggest that ALDH1A1 overexpression is associated with altered retinoic acid metabolism in eoCRC, which has further implications in cancer biology [9,10]. Although the GSEA result was not statistically significant, our study does show enrichment of genes involved in retinol metabolism, which is a potential target for further evaluation. Additional genes (ADH1B and UGT2B7) showed increased expression as well and could be further evaluated. Lastly, although our datasets are demographically heterogeneous, with the Korean cohort being Asian, the consistency of the results suggests that these findings may generalize, as they appear to hold across different geographical regions and varying demographics.
Overall, these data show that eoCRC is associated with higher ALDH1A1 expression and imply a potential link between ALDH1A1 expression and alterations in retinoic acid metabolism in eoCRC.
Supplementary Material
Acknowledgements.
This work was supported by the United States Public Health Service (R01 CA258519 to JMC and T32 DK094775 supporting AV), and by funds from the Department of Internal Medicine (to EKS) and the Rogel Cancer Center (to JMC) of the University of Michigan. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. This work was presented in abstract form at the 2022 Digestive Disease Week held in San Diego, California.
Footnotes
Disclosure of potential conflicts of interest. The authors declare that they have no conflict of interest.
References
1. Siegel RL, et al. JAMA 2017;318:572–574.
2. Carethers JM. Dig Dis Sci 2016;61:2767–2769.
3. Venugopal A, et al. EXCLI J 2022;21:162–182.
4. Hussan H, et al. Clin Transl Gastroenterol 2020;11:e00160.
5. Subramanian A, et al. Proc Natl Acad Sci U S A 2005;102:15545–50.
6. Vasaikar S, et al. Cell 2019;177:1035–1049.e19.
7. Chen J, et al. PLoS One 2015;10:e0145164.
8. Tomita H, et al. Oncotarget 2016;7:11018–32.
9. Zhang T, et al. PLoS One 2020;15:e0239601.
10. Applegate CC, et al. World J Gastrointest Oncol 2015;7:184–203.
