Cells. 2023 Mar 18;12(6):935. doi: 10.3390/cells12060935

Table 2.

Data mining studies of multi-cancer early detection.

Biomarker Database Tumor Types Main Findings Ref.
DNA methylation
TCGA
GEO
26 tumor types Identified 7 informative CpG sites capable of discriminating tumor from normal samples. AUC of 0.986 was obtained in the training set.
Validation using GEO datasets of breast, colorectal, and prostate cancer obtained AUCs of 0.97, 0.95, and 0.93, respectively.
Validation set comprising the remaining cancer types obtained an AUC of 0.94.
Identified 12 CpG sites capable of discriminating each tumor type with an AUC of 0.98.
[168]
TCGA
GEO
27 tumor types Identified 12 CpG markers and 13 promoter markers and constructed diagnostic models by deep learning.
CpG marker model achieved 98.1% sensitivity, 99.5% specificity, and 98.5% accuracy on training set, while achieving 92.8% sensitivity, 90.1% specificity, and 92.4% accuracy on testing set.
Promoter marker model achieved 96.9% sensitivity, 99.9% specificity, and 97.8% accuracy on training set, while achieving 89.8% sensitivity, 81.1% specificity, and 88.3% accuracy on testing set.
[156]
TCGA
GEO
27 tumor types Developed the CAncer Cell-of-Origin (CACO) methylation panel comprising 2,572 cytosines that are significantly hypermethylated in tumor tissues compared with normal tissues and healthy blood samples.
CACO panel identified TOO with AUC ranging from 0.856 to 0.998 in discovery cohort and 0.854 to 0.998 in validation cohort.
CACO panel could identify TOO in liquid biopsies and unknown primary carcinoma samples.
[169]
TCGA 14 tumor types Combined genome-wide differential methylation profiling with machine learning to detect cancer and discriminate TOO.
Set of 4 CpGs detected cancer with an AUC of 0.95 in the discovery set and an AUC of 0.96 in the validation set.
Set of 20 CpGs discriminated TOO with AUC values ranging from 0.87 to 0.99; 12 out of 14 cancer types were discriminated with sensitivities and specificities above 90%.
[157]
TCGA
GEO
3 tumor types Developed a machine learning algorithm to detect and discriminate TOO in 3 urological cancers (prostate, bladder, and kidney) using 128 methylation markers.
99.1% accuracy in training set; 97.6% accuracy in 2 independent validation sets.
[170]
TCGA
GEO
33 tumor types Identified a 12-marker set that can detect all 33 cancers in TCGA database with AUCs > 0.84.
Identified sets of 6 markers that can discriminate TOO with AUCs ranging from 0.969 to 1.
[171]
TCGA 12 tumor types While performing genome-wide methylation analysis for pancreatic cancer biomarker discovery, identified SST as hypermethylated in pancreatic tumors compared to normal tissue and showed an AUC of 0.89 for pancreatic cancer detection in cfDNA.
SST methylation and expression in 11 other cancer types showed significant hypermethylation and downregulation of expression when compared to the respective normal tissue (p < 0.0001).
[154]
TCGA
GEO
14 tumor types Identified 6 CpGs in the GSDME gene differentially methylated between tumor and normal samples and used them for developing a machine learning algorithm for cancer identification.
98.8% sensitivity, 94.2% specificity, and AUC of 0.86 in the training set. AUC of 0.85 in validation set.
6 CpG model showed TOO discrimination capacity.
[172]
DNA methylation, gene expression, and somatic mutations
TCGA 13 tumor types Developed EAGLING, a model that expands the Illumina 450K array data to cover about 30% of CpGs in the genome. Used this expanded methylation data combined with gene expression and somatic mutation data to identify genes with differential patterns in various cancer types (triple-evidenced genes).
Developed a machine learning algorithm, using the identified triple-evidenced genes, for cancer detection. AUC of 0.85 was obtained; 95.3% accuracy was obtained for TOO discrimination.
TNXB, RRM2, CELSR3, SLC16A3, FANCI, MMP9, MMP11, SIK1, and TRIM59 showed great capacity for cancer diagnosis.
[158]
Gene mutations TCGA 5 tumor types Based on a tumor’s mutations and their respective GO terms and KEGG pathways, a machine learning algorithm was developed for TOO discrimination; 62% accuracy was obtained for discriminating TOO in 5 cancer types. [173]
Gene expression GEO 10 tumor types Developed a deep learning classifier for multi-cancer diagnosis using transcriptomic data termed DeepDCancer.
90% accuracy was obtained for distinguishing cancer from normal samples, while accuracies ranged from 86 to 98% (94% average) for discriminating individual cancer types.
96% accuracy was obtained for distinguishing cancer from normal samples using an improved classifier, DeepDCancer.
[159]
TCGA 40 tumor types Developed SCOPE, a machine learning algorithm that uses RNA-seq data for TOO prediction.
SCOPE achieved 97% accuracy in training set and 99% in testing set.
SCOPE showed the ability to identify TOO in cancers of unknown primary.
[174]
TCGA 11 tumor types Developed GeneCT, a deep learning algorithm that uses RNA-seq data for cancer identification and TOO prediction. Known cancer-related genes were used for cancer status identification and transcription factors for TOO prediction.
100% sensitivity and 99.6% specificity for cancer identification in training set. 96.0% sensitivity and 96.1% specificity for cancer identification in validation set.
99.6% accuracy for TOO prediction in training set and 98.6% in validation set.
[160]
TCGA 33 tumor types 5 machine learning algorithms were compared on their performance for cancer classification.
Linear support vector machine (SVM) showed the best accuracy of 95.8%.
[175]
TCGA 5 tumor types Developed a deep learning model for TOO discrimination using RNA-seq data among the 5 most common cancers in women. LASSO feature selection reduced all 14,899 genes to only 173 relevant genes.
99.45% accuracy was obtained for discriminating TOO in 5 cancer types.
[176]
TCGA
GTEx
28 tumor types Identified differentially expressed genes (DEGs) that were shared in various cancer types and constructed a diagnostic model using 10 upregulated DEGs (CCNA2, CDK1, CCNB1, CDC20, TOP2A, BUB1B, AURKB, NCAPG, CDC45, and TTK).
AUC of 0.894 was obtained for discriminating cancer from normal samples.
[177]
TCGA 15 tumor types MMP11 and MMP13 expression was significantly higher in most cancer types compared to tissue-matched controls.
Each cancer type featured at least one MMP with an AUC greater than 0.9, except prostate cancer; 6 cancer types featured 4 or more MMPs with AUC > 0.9.
If serum detection is possible, upregulated MMP11 or MMP13 could serve as a multi-cancer biomarker.
[178]
TCGA 9 tumor types Hsp90α expression was significantly higher in 8 cancers compared to tissue-matched controls, except for prostate cancer, which displayed significantly lower expression.
AUC values ranged from 0.63 to 0.94 for individual cancer types.
[155]
TCGA
GTEx
33 tumor types Claudin-6 was significantly overexpressed in 20 cancer types.
AUCs > 0.7 were obtained for detecting 15 cancer types.
AUCs > 0.9 were obtained for detecting acute myeloid leukemia, testicular, ovarian, and uterine cancer.
[179]
TCGA
GTEx
33 tumor types YTHDC2 expression was significantly downregulated in most cancers compared with normal tissues.
YTHDC2 displayed high diagnostic value (AUC > 0.90) for 7 cancer types and moderate diagnostic value (AUC > 0.723) in 8 cancer types.
[180]
TCGA
GTEx
24 tumor types PAFAH1B expression was significantly upregulated in most cancers compared with normal tissues.
PAFAH1B displayed high diagnostic value (AUC > 0.90) for 15 cancer types and moderate diagnostic value (AUC > 0.75) in 9 cancer types.
[181]
TCGA
GTEx
20 tumor types SHC1 expression was significantly upregulated in most cancers compared with normal tissues.
SHC1 displayed high diagnostic value (AUC > 0.90) for 4 cancer types and moderate diagnostic value (AUC > 0.70) in 16 cancer types.
Strong diagnostic capability for KICH (AUC = 0.92), LIHC (AUC = 0.95), and PAAD (AUC = 0.95).
[182]
TCGA
GTEx
29 tumor types GPC2 expression was significantly upregulated in 12 early-stage cancers compared with normal tissues.
GPC2 displayed high diagnostic value (AUC > 0.90) for 6 cancer types, moderate diagnostic value (AUC > 0.70) in 16 cancer types, and low diagnostic value (AUC > 0.50) in 7 cancer types.
[183]
ncRNA TCGA 26 tumor types Developed algorithms to remove all the factor effects (genetic, epidemiological, and environmental variables) from big data and revealed 56 ncRNAs as universal markers for 26 cancer types. Used these 56 ncRNAs as markers and employed machine learning algorithms to discriminate cancer from normal samples and identify TOO.
AUC of 0.963 for discriminating cancer from normal samples. AUC values ranged from 0.99 to 1 for detecting individual cancer types.
82.15% accuracy for discriminating TOO.
[161]
lncRNA TCGA
GEO
9 tumor types CRNDE expression was significantly higher in 9 cancers compared to tissue-matched controls.
AUC values ranged from 0.855 to 0.984, sensitivities from 70 to 97% and specificities from 75 to 100%.
Meta-analysis from 6 studies showed a pooled sensitivity of 77%, specificity of 90%, and AUC of 0.87.
[184]
TCGA
GEO
12 tumor types Identified 6 differentially expressed long intergenic noncoding RNAs (lincRNAs) (PCAN-1 to PCAN-6) and applied machine learning algorithms for cancer detection using 5 of them.
AUC of 0.947 was obtained in the training set. AUC of 0.947, 81.7% sensitivity, and 97% specificity were obtained in the testing set.
[185]
TCGA
GEO
8 tumor types Using RNA-seq and methylation data from TCGA, identified 9 epigenetically regulated lncRNAs (lncRNAs regulated by methylation) that can predict cancer. Developed a score based on expression and methylation data of these 9 genes (PVT1, PSMD5-AS1, FAM83H-AS1, MIR4458HG, HCP5, GAS5, CTD2201E18.3, HCG11, and AC016747.3) that was applied to all cancer and normal samples.
AUC values ranged from 0.741 to 0.992 for detecting 8 cancer types. AUC values ranged from 0.712 to 1 in an independent validation set.
[186]
TCGA 33 tumor types SNHG3 expression was significantly upregulated in 16 (out of 33) cancers compared with normal tissues.
72% sensitivity, 87% specificity, and an AUC of 0.89 were observed for cancer detection.
[187]
microRNA TCGA 21 tumor types Used machine learning algorithms to develop a multi-cancer diagnostic method based on microRNA expression. Support vector machine (SVM) classifier was chosen, since it provided the highest accuracy of 97.2%, sensitivities over 90%, and specificities of 100% for most cancers. [188]
GEO 11 tumor types Developed a computational pipeline for extracellular microRNA-based cancer detection and classification.
All classifiers showed accuracies over 95%. SVM classifier performed the best, with 99% accuracy.
Identified a 10-microRNA signature capable of TOO discrimination.
[162]
TCGA 4 tumor types Identified 3 differentially expressed miRNAs (miR-552, miR-490, and miR-133a-2) with diagnostic potential for digestive tract cancers.
3 miRNAs showed high diagnostic value in rectal cancer (AUC > 0.961) and moderate diagnostic value in esophageal (AUC > 0.826), gastric (AUC > 0.798), and colon cancer (AUC > 0.797).
[189]
GEO 12 tumor types Developed a serum-based 4-microRNA diagnostic model (hsa-miR-5100, hsa-miR-1343-3p, hsa-miR-1290, and hsa-miR-4787-3p) for cancer early detection.
Sensitivities ranged from 83.2 to 100% for biliary tract, bladder, colorectal, esophageal, gastric, glioma, liver, pancreatic, and prostate cancers, with lower sensitivities of 68.2% and 72.0% for ovarian cancer and sarcoma, respectively, at 99.3% specificity.
[190]
GEO 12 tumor types Developed an m6A-target miRNA serum signature, based on 18 microRNAs combined with machine learning, for cancer detection.
93.9% sensitivity, 93.3% specificity, and AUC of 0.979 in training set.
94.2% sensitivity, 91.6% specificity, and AUC of 0.976 in internal validation set.
90.8% sensitivity, 84.7% specificity, and AUC of 0.936 in external validation set.
[191]
Progenitorness score TCGA
GEO
17 tumor types Selected 77 progenitor genes and formulated a score to quantify the progenitorness of a sample using its expression profile data.
Tumor samples showed significantly higher progenitorness scores than normal tissues for all cancer types, with AUC ranging from 0.746 to 1.000. For the majority of cancers, AUC was above 0.90.
[192]
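
The performance figures reported throughout Table 2 (sensitivity, specificity, accuracy, and AUC) can be illustrated with a minimal sketch. The labels, scores, and 0.5 decision threshold below are hypothetical toy values, not data from any of the cited studies, and the function names are introduced here for illustration only. AUC is computed as the probability that a randomly chosen tumor sample receives a higher score than a randomly chosen normal sample, with ties counted as one half.

```python
from typing import Sequence, Tuple


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) for binary labels (1 = tumor, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one tumor and one normal sample")
    sensitivity = tp / (tp + fn)  # fraction of tumor samples correctly flagged
    specificity = tn / (tn + fp)  # fraction of normal samples correctly cleared
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of tumor and normal scores (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one tumor and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 tumor and 4 normal samples with classifier scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec, acc = confusion_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f} AUC={auc:.4f}")
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds. This is one reason the studies above often report both kinds of figure.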