Table 2.
Biomarker | Database | Tumor Types | Main Findings | Ref. |
---|---|---|---|---|
DNA methylation | TCGA, GEO | 26 tumor types | Identified 7 informative CpG sites capable of discriminating tumor from normal samples. AUC of 0.986 was obtained in the training set. Validation using GEO datasets of breast, colorectal cancer, and prostate cancer obtained AUCs of 0.97, 0.95, and 0.93, respectively. Validation set comprising the remaining cancer types obtained an AUC of 0.94. Identified 12 CpG sites capable of discriminating each tumor type with an AUC of 0.98. | [168] |
DNA methylation | TCGA, GEO | 27 tumor types | Identified 12 CpG markers and 13 promoter markers and constructed diagnostic models by deep learning. CpG marker model achieved 98.1% sensitivity, 99.5% specificity, and 98.5% accuracy on training set, while achieving 92.8% sensitivity, 90.1% specificity, and 92.4% accuracy on testing set. Promoter marker model achieved 96.9% sensitivity, 99.9% specificity, and 97.8% accuracy on training set, while achieving 89.8% sensitivity, 81.1% specificity, and 88.3% accuracy on testing set. | [156] |
DNA methylation | TCGA, GEO | 27 tumor types | Developed the CAncer Cell-of-Origin (CACO) methylation panel comprising 2,572 cytosines that are significantly hypermethylated in tumor tissues compared with normal tissues and healthy blood samples. CACO panel identified TOO with AUC ranging from 0.856 to 0.998 in discovery cohort and 0.854 to 0.998 in validation cohort. CACO panel could identify TOO in liquid biopsies and unknown primary carcinoma samples. | [169] |
DNA methylation | TCGA | 14 tumor types | Combined genome-wide differential methylation profiling with machine learning to detect cancer and discriminate TOO. Set of 4 CpGs detected cancer with an AUC of 0.95 in the discovery set and an AUC of 0.96 in the validation set. Set of 20 CpGs discriminated TOO with AUC values ranging from 0.87 to 0.99; 12 out of 14 cancer types were discriminated with sensitivities and specificities above 90%. | [157] |
DNA methylation | TCGA, GEO | 3 tumor types | Developed a machine learning algorithm to detect and discriminate TOO in 3 urological cancers (prostate, bladder, and kidney) using 128 methylation markers. 99.1% accuracy in training set; 97.6% accuracy in 2 independent validation sets. | [170] |
DNA methylation | TCGA, GEO | 33 tumor types | Identified a 12-marker set that can detect all 33 cancers in TCGA database with AUCs > 0.84. Identified sets of 6 markers that can discriminate TOO with AUCs ranging from 0.969 to 1. | [171] |
DNA methylation | TCGA | 12 tumor types | While performing genome-wide methylation analysis for pancreatic cancer biomarker discovery, identified SST as hypermethylated in pancreatic tumors compared to normal tissue and showed an AUC of 0.89 for pancreatic cancer detection in cfDNA. SST methylation and expression in 11 other cancer types showed significant hypermethylation and downregulation of expression when compared to the respective normal tissue (p < 0.0001). | [154] |
DNA methylation | TCGA, GEO | 14 tumor types | Identified 6 CpGs in the GSDME gene differentially methylated between tumor and normal samples and used them for developing a machine learning algorithm for cancer identification. 98.8% sensitivity, 94.2% specificity, and AUC of 0.86 in the training set. AUC of 0.85 in validation set. 6 CpG model showed TOO discrimination capacity. | [172] |
DNA methylation, gene expression and somatic mutations | TCGA | 13 tumor types | Developed EAGLING, a model that expands the Illumina 450K array data to cover about 30% of CpGs in the genome. Used this expanded methylation data combined with gene expression and somatic mutation data to identify genes with differential patterns in various cancer types (triple-evidenced genes). Developed a machine learning algorithm, using the identified triple-evidenced genes, for cancer detection. AUC of 0.85 was obtained; 95.3% accuracy was obtained for TOO discrimination. TNXB, RRM2, CELSR3, SLC16A3, FANCI, MMP9, MMP11, SIK1, and TRIM59 showed great capacity for cancer diagnosis. | [158] |
Gene mutations | TCGA | 5 tumor types | Based on a tumor’s mutations and their respective GO terms and KEGG pathways, a machine learning algorithm was developed for TOO discrimination; 62% accuracy was obtained for discriminating TOO in 5 cancer types. | [173] |
Gene expression | GEO | 10 tumor types | Developed a deep learning classifier for multi-cancer diagnosis using transcriptomic data termed DeepDCancer. 90% accuracy was obtained for distinguishing cancer from normal samples, while accuracies ranged from 86 to 98% (94% average) for discriminating individual cancer types. 96% accuracy was obtained for distinguishing cancer from normal samples using an improved classifier, DeepDCancer. | [159] |
Gene expression | TCGA | 40 tumor types | Developed SCOPE, a machine learning algorithm that uses RNA-seq data for TOO prediction. SCOPE achieved 97% accuracy in training set and 99% in testing set. SCOPE showed the ability to identify TOO in cancers of unknown primary. | [174] |
Gene expression | TCGA | 11 tumor types | Developed GeneCT, a deep learning algorithm that uses RNA-seq data for cancer identification and TOO prediction. Known cancer-related genes were used for cancer status identification and transcription factors for TOO prediction. 100% sensitivity and 99.6% specificity for cancer identification in training set. 96.0% sensitivity and 96.1% specificity for cancer identification in validation set. 99.6% accuracy for TOO prediction in training set and 98.6% in validation set. | [160] |
Gene expression | TCGA | 33 tumor types | 5 machine learning algorithms were compared on their performance for cancer classification. Linear support vector machine (SVM) showed the best accuracy of 95.8%. | [175] |
Gene expression | TCGA | 5 tumor types | Developed a deep learning model for TOO discrimination using RNA-seq data among the 5 most common cancers in women. LASSO feature selection reduced all 14,899 genes to only 173 relevant genes. 99.45% accuracy was obtained for discriminating TOO in 5 cancer types. | [176] |
Gene expression | TCGA, GTEx | 28 tumor types | Identified differentially expressed genes (DEGs) that were shared in various cancer types and constructed a diagnostic model using 10 upregulated DEGs (CCNA2, CDK1, CCNB1, CDC20, TOP2A, BUB1B, AURKB, NCAPG, CDC45, and TTK). AUC of 0.894 was obtained for discriminating cancer from normal samples. | [177] |
Gene expression | TCGA | 15 tumor types | MMP11 and MMP13 expression was significantly higher in most cancer types compared to tissue matched controls. Each cancer type featured at least one MMP with an AUC greater than 0.9, except prostate cancer; 6 cancer types featured 4 or more MMPs with AUC > 0.9. If serum detection is possible, upregulated MMP11 or MMP13 could serve as a multi-cancer biomarker. | [178] |
Gene expression | TCGA | 9 tumor types | Hsp90α expression was significantly higher in 8 cancers compared to tissue matched controls, except for prostate cancer, which displayed significantly lower expression. AUC values ranged from 0.63 to 0.94 for individual cancer types. | [155] |
Gene expression | TCGA, GTEx | 33 tumor types | Claudin-6 was significantly overexpressed in 20 cancer types. AUC > 0.7 were obtained for detecting 15 cancer types. AUC > 0.9 were obtained for detecting acute myeloid leukemia, testicular, ovarian, and uterine cancer. | [179] |
Gene expression | TCGA, GTEx | 33 tumor types | YTHDC2 expression was significantly downregulated in most cancers compared with normal tissues. YTHDC2 displayed high diagnostic value (AUC > 0.90) for 7 cancer types and moderate diagnostic value (AUC > 0.723) in 8 cancer types. | [180] |
Gene expression | TCGA, GTEx | 24 tumor types | PAFAH1B expression was significantly upregulated in most cancers compared with normal tissues. PAFAH1B displayed high diagnostic value (AUC > 0.90) for 15 cancer types and moderate diagnostic value (AUC > 0.75) in 9 cancer types. | [181] |
Gene expression | TCGA, GTEx | 20 tumor types | SHC1 expression was significantly upregulated in most cancers compared with normal tissues. SHC1 displayed high diagnostic value (AUC > 0.90) for 4 cancer types and moderate diagnostic value (AUC > 0.70) in 16 cancer types. Strong diagnostic capability for KICH (AUC = 0.92), LIHC (AUC = 0.95), and PAAD (AUC = 0.95). | [182] |
Gene expression | TCGA, GTEx | 29 tumor types | GPC2 expression was significantly upregulated in 12 early-stage cancers compared with normal tissues. GPC2 displayed high diagnostic value (AUC > 0.90) for 6 cancer types, moderate diagnostic value (AUC > 0.70) in 16 cancer types, and low diagnostic value (AUC > 0.50) in 7 cancer types. | [183] |
ncRNA | TCGA | 26 tumor types | Developed algorithms to remove all the factor effects (genetic, epidemiological, and environmental variables) from big data and revealed 56 ncRNAs as universal markers for 26 cancer types. Used these 56 ncRNAs as markers and employed machine learning algorithms to discriminate cancer from normal samples and identify TOO. AUC of 0.963 for discriminating cancer from normal samples. AUC values ranged from 0.99 to 1 for detecting individual cancer types. 82.15% accuracy for discriminating TOO. | [161] |
lncRNA | TCGA, GEO | 9 tumor types | CRNDE expression was significantly higher in 9 cancers compared to tissue matched controls. AUC values ranged from 0.855 to 0.984, sensitivities from 70 to 97% and specificities from 75 to 100%. Meta-analysis from 6 studies showed a pooled sensitivity of 77%, specificity of 90%, and AUC of 0.87. | [184] |
lncRNA | TCGA, GEO | 12 tumor types | Identified 6 differentially expressed long intergenic noncoding RNAs (lincRNAs) (PCAN-1 to PCAN-6) and applied machine learning algorithms for cancer detection using 5 of them. AUC of 0.947 was obtained in the training set. AUC of 0.947, 81.7% sensitivity, and 97% specificity were obtained in the testing set. | [185] |
lncRNA | TCGA, GEO | 8 tumor types | Using RNA-seq and methylation data from TCGA, identified 9 epigenetically regulated lncRNAs (lncRNAs regulated by methylation) that can predict cancer. Developed a score based on expression and methylation data of these 9 genes (PVT1, PSMD5-AS1, FAM83H-AS1, MIR4458HG, HCP5, GAS5, CTD2201E18.3, HCG11, and AC016747.3) that was applied to all cancer and normal samples. AUC values ranged from 0.741 to 0.992 for detecting 8 cancer types. AUC values ranged from 0.712 to 1 in an independent validation set. | [186] |
lncRNA | TCGA | 33 tumor types | SNHG3 expression was significantly upregulated in 16 (out of 33) cancers compared with normal tissues. 72% sensitivity, 87% specificity, and an AUC of 0.89 were observed for cancer detection. | [187] |
microRNA | TCGA | 21 tumor types | Used machine learning algorithms to develop a multi-cancer diagnostic method based on microRNA expression. Support vector machine (SVM) classifier was chosen, since it provided the highest accuracy of 97.2%, sensitivities over 90%, and specificities of 100% for most cancers. | [188] |
microRNA | GEO | 11 tumor types | Developed a computational pipeline for extracellular microRNA-based cancer detection and classification. All classifiers showed accuracies over 95%. SVM classifier performed the best, with 99% accuracy. Identified a 10-microRNA signature capable of TOO discrimination. | [162] |
microRNA | TCGA | 4 tumor types | Identified 3 differentially expressed miRNAs (miR-552, miR-490, and miR-133a-2) with diagnostic potential for digestive tract cancers. 3 miRNAs showed high diagnostic value in rectal cancer (AUC > 0.961) and moderate diagnostic value in esophageal (AUC > 0.826), gastric (AUC > 0.798), and colon cancer (AUC > 0.797). | [189] |
microRNA | GEO | 12 tumor types | Developed a serum-based 4-microRNA diagnostic model (hsa-miR-5100, hsa-miR-1343-3p, hsa-miR-1290, and hsa-miR-4787-3p) for early cancer detection. Sensitivities ranged from 83.2 to 100% for biliary tract, bladder, colorectal, esophageal, gastric, glioma, liver, pancreatic, and prostate cancers, while sensitivities for ovarian cancer and sarcoma were 68.2 and 72.0%, respectively, with 99.3% specificity. | [190] |
microRNA | GEO | 12 tumor types | Developed an m6A target miRNAs serum signature, based on 18 microRNAs combined with machine learning, for cancer detection. 93.9% sensitivity, 93.3% specificity, and AUC of 0.979 in training set. 94.2% sensitivity, 91.6% specificity, and AUC of 0.976 in internal validation set. 90.8% sensitivity, 84.7% specificity, and AUC of 0.936 in external validation set. | [191] |
Progenitorness score | TCGA, GEO | 17 tumor types | Selected 77 progenitor genes and formulated a score to quantify the progenitorness of a sample using its expression profile data. Tumor samples showed significantly higher progenitorness scores than normal tissues for all cancer types, with AUC ranging from 0.746 to 1.000. For the majority of cancers, AUC was above 0.90. | [192] |
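
The studies summarized in the table report sensitivity, specificity, accuracy, and AUC for separating tumor from normal samples. As a reading aid, the following is a minimal sketch of how these four metrics are conventionally computed for a binary classifier. It is not taken from any of the cited studies: the function names (`confusion_metrics`, `auc`), the toy labels and scores, and the 0.5 decision threshold are all illustrative assumptions.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy for binary labels (1 = tumor, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # tumors called tumor
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normals called normal
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normals called tumor
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # tumors called normal
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen tumor sample scores
    higher than a randomly chosen normal sample (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 tumor and 4 normal samples with classifier scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(confusion_metrics(y_true, y_pred))
print(auc(y_true, scores))
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds. This is why a model in the table can pair a high sensitivity at one operating point with a more modest AUC.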