Table 1.
Review of various cancer prediction techniques.
Sr. no. | Reference | Objective | Technique/tool | Dataset | Findings |
---|---|---|---|---|---|
1 | [8] | To design a method to classify and predict classes of cancer | Neighborhood analysis, DNA microarrays, self-organizing maps | 27 ALL samples from Dana-Farber Cancer Institute, 11 adult AML samples from the Cancer and Leukemia Group B (CALGB) leukemia cell bank | Feasible method. Proper experimental care is required |
2 | [2] | To classify cancer samples using gene expression data | Computational analysis, Affymetrix oligonucleotide arrays, neighborhood analysis, GeneCluster software | 38 leukemia training samples (11 AML, 27 ALL) and 34 test samples (14 AML, 20 ALL) | Genes with no correlation provide a better result, and the median prediction strength is 0.86
3 | [9] | To classify cancers into specific categories using their gene expression | ANNs, cDNA microarrays, DeArray software | NCI, ATCC, MSKCC, CHTN, DZNSG, National Institutes of Health | Handles nonlinear features, is robust, and achieves high sensitivity and specificity.
4 | [10] | To create a framework for predicting predefined classes of tumor | Compound covariate prediction, BRB ArrayTools | Hereditary breast cancer dataset of 22 patients [11] | A good setting for comparing prediction methods. Requires some improvements.
5 | [12] | To develop a classification system for DNA microarray gene expression data | SOMs, Cluster and TreeView software, PCA, KNN | Multiple datasets, including one with 99 samples and another with 42 samples | Gene expression provides an excellent way of diagnosing patients with medulloblastomas
6 | [13] | To propose a method that performs classification based on interval-scaled attributes | PCA, FA, fuzzy FA | 203 samples (a subset of the actual dataset used in [14]) | Successfully used in supervised learning. FA provides more information compared to surgical-pathological staging
7 | [15] | To propose a method for gene feature selection | Multiple SVM-RFE | Four gene expression datasets available on Kent Ridge Bio-Medical Data Set Repository | MSVM-RFE achieves better classification accuracy than SVM-RFE and improves SVM performance.
8 | [16] | To propose a framework for addressing the problem of integrating different data types | Generalized singular value decomposition | Fourteen breast cancer cell lines from American Type Culture Collection | Gene expression and copy number data are analyzed together. The framework could be extended to other data types.
9 | [17] | To propose a method for distinguishing tumor tissues using gene expression data | ssEAM, PSO | NCI60, acute leukemia, ALL dataset | ssEAM performs better than PNN, ANN, LVQ1, and KNN at a 0.05 significance level
10 | [18] | To present a selection method for analyzing gene expression data | RBF neural network, rough based feature selection method, naïve Bayes, linear SVM | ALL, AML, lung cancer and prostate cancer dataset (http://sdmc.lit.org.sg/GEDatasets/Datasets.) | Best classification accuracy of 99.8%
11 | [19] | To present a framework for discovering cancer classes | Permutation technique, cluster ensemble, cluster validity index (DAI) | 3 synthetic and 4 real datasets (leukemia [2], Novartis multitissue [20], lung cancer [14], St. Jude [21]) | DAI finds the number of classes correctly and outperforms existing methods
12 | [22] | To present a method based on gene expression for classifying NSCLC | Hierarchical clustering, Spotfire DecisionSite, proportional hazards model | 91 NSCLC samples and six normal lung tissues from GSE3526 (Duke University) | Gene signatures provide the best way for histopathological classification
13 | [23] | To propose a classifier predicting disease in CRC patients | Agilent 44K oligonucleotide arrays, Kaplan–Meier method, unsupervised hierarchical clustering | 188 training samples (NCI, LUMC, SGH) and 206 testing samples (Institut Català d'Oncologia, Spain) | 86% of patients in the validation dataset are identified as low risk. First prognostic technique for CRC
14 | [24] | To propose a framework that combines genome-wide copy number and expression data | L1-L2 constrained regression, local and global search strategies | 89 breast cancer samples (UC San Francisco and California Pacific Medical Center [25]) | Outperforms existing methods in accuracy
15 | [26] | To propose a framework that combines other models that describe gene interaction | Bayesian model, Gibbs distribution, ANOVA test, parallel programming with GPU/CPU | GSE4290, DREAM dataset | Specificity of 0.99 has been achieved. Better performance than Enet and VAR
16 | [27] | To propose an extended framework for segmentation of breast tumors | Multichannel MRFs, kinetic observation model, Gaussian mixture model | DCE MRI images of breast cancer | AUC of 0.9 has been achieved using multichannel MRF compared to AUC of 0.89 in single-channel MRF. Better segmentation results when applied to SVM
17 | [28] | To propose a gene selection method | LSLS, wrapper method, SVM | Six datasets available at Kent Ridge Biomedical Data repository | LSLS performs better than KW and SPFS |
18 | [29] | To present a novel method for classifying tumor samples | RPCA, LDA, SVM | Nine different publicly available datasets (acute leukemia data [2], colon cancer data, gliomas data, medulloblastoma data, prostate cancer data, 11_tumor data, and brain tumor data) | Performance is measured using LOO-CV, accuracy, and AUC. A feasible and effective method.
19 | [30] | To propose a method based on deep learning for inferring target gene expression | D-GEX | Microarray GEO dataset, RNA-Seq-based GTEx dataset | Outperforms linear regression (15.33% relative improvement) and KNN. Lower error rate in most of the genes (81.31%).
20 | [31] | To develop a fused network identifying KIRC stages | Gene expression and DNA methylation data, SNF, SNFTool, sparse partial least square regression, LASSO label prediction method | The Cancer Genome Atlas KIRC data (TCGA data portal) | Higher prediction accuracy than KNN, MLW, and WDC. It is robust.
21 | [32] | To classify widely and rarely expressed genes | Incremental feature selection method, mRMR, RNN | Gene expression dataset available at the Human Protein Atlas [33] | GO terms and KEGG are used at the functional level. Youden's indexes are 0.739 and 0.639 for normal and cancer tissues, respectively. |
22 | [34] | To develop a lightweight CNN for classifying breast cancer | CNN, array-array intensity correlation, RStudio, batch normalization | Breast cancer dataset from Pan-Cancer Atlas | Achieves 98.76% accuracy
23 | [35] | To propose a method for classifying different types of cancer | BPSO-DT, CNN, deep learning | Cancer types: RNA sequencing values from tumor samples/tissues available at Mendeley datasets | Achieves an accuracy of 96.90%. Also evaluated using recall, precision, and F1 score.
24 | [36] | To propose a method based on NMF to classify tumors | NMF, SNMF, SVM | Colon cancer dataset [37], acute leukemia dataset, medulloblastoma dataset | It is effective and efficient. The effect of sparseness is low.
25 | [38] | To propose a model for biclustering gene expression data | PCA, GLPCA, DHPCA | SRBCT, medulloblastoma, colon cancer, 11_Tumors | Compared with PCA, GLPCA, GNMF, ONMTF, and NMTFCoS, it provides better accuracy.
26 | [39] | To present a framework for predicting the expression of genes employing nonlinear features | Unsupervised clustering algorithm, L-GEPM, LSTM neural network | GEO data from LINCS cloud, GTEx, and 1000G RNA-Seq data | Performs better than D-GM, LR-L1, and KNN-R. Predicted target gene values are much closer to the actual gene expression. Flexible and superior for nonlinear features.
27 | [40] | To propose a multilayer framework to classify multiple cancer tissues | CNN, RNA sequencing, supervised learning, stochastic gradient descent optimization, back-propagation | 11,093 samples from The Cancer Genome Atlas | 98.93% overall accuracy and 0.99 AUC have been achieved
28 | [41] | To propose a gene selection method that can classify tissues in multicategory datasets | PLS, linear support vector classifier, MATLAB, OSU_SVM3.00 toolbox linear SVC, SVM | MIT AML and ALL dataset, SRBCT datasets | It is efficient and robust. It works well for both two-category and multicategory datasets. |
29 | [42] | To propose an ST model for finding the effects of CNAs | LST and NA, dynamic modeling, transcriptional bursting, transcriptional oscillation, circular binary segmentation | NCBI/GEO database | Shows how mathematical theory can be used to investigate the findings and to better understand cancer biology
30 | [43] | To propose a multi-fusion-based method for profiling gene expression under nonthermal plasma treatment | Dempster–Shafer method, fuzzy C-Means clustering method, MATLAB R2016b | NCBI Gene Expression Omnibus under GEO (GSE59997) | Reduces uncertainty and increases reliability. Fuzzy C-means identifies changes in genes across various nonthermal plasma treatments.
31 | [44] | To present a survey of 1D CNN and its applications. | NA | NA | 1D CNN works well with small data and where fewer computations are required. It also works where low-cost implementation is needed. |
32 | [45] | To propose a classification method for ECG signal images based on 2D CNN | CNN, Intel i7-5930K CPU, and NVIDIA GTX1080 GPU | MIT-BIH Arrhythmia database | 2D CNN outperforms 1D CNN. 2D CNN is more accurate and robust. 1D CNN works well with limited data.
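
The findings in Table 1 are reported with several different metrics (accuracy, sensitivity, specificity, and Youden's index), which makes direct comparison across studies harder. As a minimal sketch of how these quantities relate, the following Python snippet computes them from binary confusion-matrix counts. The `ConfusionCounts` container, the function names, and the example counts are hypothetical and chosen only for illustration; they are not taken from any of the reviewed studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container for illustration)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def sensitivity(c: ConfusionCounts) -> float:
    """True positive rate: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True negative rate: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all samples classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def youden_index(c: ConfusionCounts) -> float:
    """Youden's J = sensitivity + specificity - 1."""
    return sensitivity(c) + specificity(c) - 1.0


# Illustrative counts only; not results from any study in Table 1.
example = ConfusionCounts(tp=45, fp=5, tn=40, fn=10)

if __name__ == "__main__":
    print(f"sensitivity = {sensitivity(example):.3f}")
    print(f"specificity = {specificity(example):.3f}")
    print(f"accuracy    = {accuracy(example):.3f}")
    print(f"Youden's J  = {youden_index(example):.3f}")
```

The sketch assumes every denominator is nonzero, that is, the data contain at least one positive and one negative sample. Youden's index ranges from -1 to 1, and a value of 0 corresponds to a classifier that is no better than chance.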