Table 2.
Review of the studies, data sources, their purpose, and machine-learning algorithms reported from 2001 to 2015.
| No. | Study | Year | Tasks | Data source | Leukaemia types involved in the study | Purpose | Methods |
|---|---|---|---|---|---|---|---|
| 1 | Cho [82] | 2002 | Feature selection and classification | DNA microarray | AML, ALL | Classifying leukaemia types | Pearson's and Spearman's correlation coefficients, Euclidean distance, cosine coefficient, information gain, mutual information, and signal-to-noise ratio for feature selection |
| 2 | Inza et al. [83] | 2002 | Feature selection and classification | DNA microarray | AML, ALL | Classifying cancer and selecting genes related to cancer | Feature subset selection, case-based, and nearest neighbor classifier |
| 3 | Farag [84] | 2003 | Feature selection and classification | Blood cell images | AML, ALL | Classifying leukaemia types | A three-layer backpropagation neural network |
| 4 | Futschik et al. [85] | 2003 | Knowledge discovery | Gene expression | AML, ALL | Classifying leukaemia types and selecting gene expression | Knowledge-based neural networks, evolving fuzzy neural networks, adaptive learning, and rule extraction |
| 5 | Cho and Won [86] | 2003 | Feature selection, classification, and ensemble classifiers | DNA microarray | AML, ALL | Classifying leukaemia types and selecting genes related to cancer | Correlation coefficient, Euclidean distance, cosine coefficient, information gain, mutual information, a feed-forward multilayer perceptron, k-nearest neighbor, self-organizing map, and support vector machine. Majority voting, weighted voting, and Bayesian approach |
| 6 | Marx et al. [44] | 2003 | Feature selection and classification | DNA microarray | AML, ALL | Classifying leukaemia from nonleukaemia | Principal component analysis and clustering |
| 7 | Marohnic et al. [87] | 2004 | Feature selection and classification | DNA microarray | AML, ALL | Classifying leukaemia types | Mutual information and support vector machine |
| 8 | McCarthy et al. [88] | 2004 | Knowledge extraction, classification, feature selection, visualization | Proteomic mass spectroscopy data, and gene expression | Melanoma, leukaemia | Cancer detection, diagnosis, and management | Naïve Bayes, support vector machines, instance-based learning (K-nearest neighbor), logistic regression, and neural networks |
| 9 | Rowland [89] | 2004 | Classification | Gene expression | AML, ALL | Classifying leukaemia types | Genetic Programming |
| 10 | Markiewicz et al. [90] | 2005 | Feature selection and classification | Images of different blast cells | Myelogenous leukaemia | Classifying patients | Support vector machine |
| 11 | Tung and Quek [91] | 2005 | Classification | DNA microarrays | ALL | Classifying leukaemia types | A neural fuzzy system, NN, SVM and the K-nearest neighbor (K-NN) classifier |
| 12 | Nguyen et al. [92] | 2005 | Classification | DNA microarrays | AML, ALL | Classifying leukaemia types | Support vector machine (SVM) |
| 13 | Plagianakos et al. [93] | 2005 | Feature selection and classification | DNA microarrays | AML, ALL | Classifying leukaemia types | Artificial neural networks |
| 14 | Li and Yang [94] | 2005 | Feature selection and classification | DNA microarrays | AML, ALL | Classifying leukaemia types | SVM, ridge regression and Rocchio, feature selection in recursive and nonrecursive settings |
| 15 | Jinlian et al. [95] | 2005 | Knowledge extraction | DNA microarray | AML, ALL | Leukaemia gene association structure | Clusters |
| 16 | Diaz et al. [96] | 2006 | Feature selection and classification | DNA microarrays | Acute promyelocytic leukaemia (APL) | Distinguishing APL from non-APL leukaemia | Discriminant fuzzy pattern |
| 17 | Feng and Lipo [97] | 2006 | Feature selection and classification | DNA microarrays | AML, ALL | Classifying acute leukaemia types | t-statistics to rank genes, and support vector machines |
| 18 | Nguyen and Ohn [98] | 2006 | Feature selection and classification | DNA microarrays | AML, ALL | Classifying leukaemia types | Dynamic recursive feature elimination and random forest |
| 19 | Shulin et al. [99] | 2006 | Feature selection and classification | DNA microarrays | AML, ALL | Classifying leukaemia types | Independent component analysis and SVM |
| 20 | Chen et al. [100] | 2007 | Feature selection, rule extraction, and classification | DNA microarrays | AML, ALL | Classifying leukaemia types | A multiple kernel support vector machine |
| 21 | Ujwal et al. [43] | 2007 | Feature selection and classification | DNA microarray | ALL | Identifying functional cancer cell line classes, classifying leukaemia from nonleukaemia | p value and clustering |
| 22 | Perez et al. [101] | 2008 | Classification | Gene expression | AML, ALL | Classifying leukaemia types | Hybrid fuzzy-SVM |
| 23 | Yoo and Gernaey [42] | 2008 | Feature selection and classification | DNA microarray data | ALL | Classifying ALL origin cell lines from non-ALL leukaemia origin cell lines | Discriminant partial least squares, principal component and Fisher's linear discriminant analysis, linear discriminant function and SVM, and hierarchical clustering method |
| 24 | Avogadri et al. [102] | 2009 | Knowledge extraction | Gene expression | Myeloid leukaemia | Discovering significant clusters | Stability-based methods |
| 25 | Eisele et al. [49] | 2009 | Knowledge extraction | Gene expression | CLL | Prognostic markers | Multivariate model |
| 26 | Chaiboonchoe et al. [103] | 2009 | Classification | DNA microarray data | ALL | Identification of differentially expressed genes | Self-organizing maps (neural networks), emergent self-organizing maps (extension of neural networks), the short-time series expression miner (STEM), and fuzzy clustering by local approximation of membership (FLAME) |
| 27 | Oehler et al. [46] | 2009 | Knowledge extraction | Gene expression | CML | Identifying molecular markers | Bayesian model averaging |
| 28 | Corchado et al. [45] | 2009 | Decision support system preprocessing, filtering, classification, and extraction of knowledge | Exon arrays | ALL, AML, CLL, CML | Classifying patients who suffer from different forms of leukaemia at various stages | Principal components, clustering, CART |
| 29 | Glez-Peña et al. [104] | 2009 | Feature selection and classification | DNA microarray | AML | Classifying gene expression | Fuzzy pattern algorithm |
| 30 | He and Hui [105] | 2009 | Classification | DNA microarray | ALL, AML | Classifying leukaemia types | Ant-based clustering (Ant-C) and ant-based association rule mining (Ant-ARM) algorithms |
| 31 | Mukhopadhyay et al. [106] | 2009 | Feature selection and classification | DNA microarray | ALL, AML | Classifying leukaemia types | GA-based fuzzy clustering, neural network, and support vector machine |
| 32 | Torkaman et al. [107] | 2009 | Classification | Human leukaemia tissue | ALL, AML | Determining different CD markers | Cooperative game |
| 33 | Zheng et al. [108] | 2009 | Feature selection | DNA microarray | ALL | Gene ranking | Knowledge-oriented gene selection |
| 34 | Mehdi et al. [109] | 2009 | Knowledge acquisition | Gene expression | ALL, AML | Pattern clustering | K-nearest neighbor technique |
| 35 | Porzelius et al. [110] | 2011 | Feature selection, classification | Microarray and clinical data | ALL | Risk prediction | Feature selection approach for support vector machines as well as a boosting approach for regression models |
| 36 | Chen et al. [111] | 2011 | Feature selection, data fusion, class prediction, decision rule extraction, associated rule extraction, and subclass discovery | DNA microarray | ALL, AML | Selecting genes, classifying leukaemia types, and extracting rules | Multiple kernel SVM |
| 37 | Gonzalez et al. [112] | 2011 | Classification | Bone marrow cells images | ALL, AML | Classifying leukaemia subtypes | Segmentation method to obtain leukaemia cells and extract from them descriptive characteristics (geometrical, texture, statistical) and eigenvalues |
| 38 | Tong and Schierz [113] | 2011 | Feature selection and classification | DNA microarray | ALL, AML | Classifying two-class oligonucleotide microarray data for acute leukaemia | Hybrid genetic algorithm-neural network |
| 39 | Chauhan et al. [114] | 2012 | Classification | Genotype | ALL, AML | Identifying gene-gene interaction | Classification and regression tree |
| 40 | Escalante et al. [115] | 2012 | Feature selection and classification | The morphological properties of bone marrow images | ALL, AML | Classifying leukaemia subtypes | Ensemble particle swarm model selection |
| 41 | Yeung et al. [116] | 2012 | Feature selection and classification | Gene expression | CML | Selecting genes and predicting functional relationships | Integrating gene expression data with expert knowledge and predicted functional relationships using iterative Bayesian model averaging |
| 42 | Manninen et al. [117] | 2013 | Classification | Flow cytometry data | AML | Prediction method for diagnosis of AML | Sparse logistic regression |
| 43 | El-Nasser et al. [118] | 2014 | Classification | DNA microarrays | ALL, AML | Classifying leukaemia types | Enhanced classification algorithm (ECA), SMIG module, and ranking procedure |
| 44 | Singhal and Singh [119] | 2015 | Feature selection and classification | Image-based analysis of bone marrow samples | ALL | Classifying leukaemia subtypes | Multilayer perceptron (MLP), linear vector quantization (LVQ), k-nearest neighbor (k-NN), and SVM |
| 45 | Yao et al. [120] | 2015 | Feature selection and classification | DNA microarrays | ALL, AML, mixed-lineage leukaemia (MLL) | Classifying leukaemia subtypes | Random forests and ranking features |
| 46 | Rawat et al. [121] | 2015 | Computer-aided diagnostic system, feature selection, and classification | Bone marrow cells in microscopic images | ALL | Distinguishing lymphoblast cells from healthy lymphocytes | Support vector machine |
| 47 | Kar et al. [122] | 2015 | Feature selection and classification | DNA microarrays | ALL, AML, mixed-lineage leukaemia (MLL) | Classifying leukaemia subtypes | Particle swarm optimization (PSO) method along with adaptive K-nearest neighborhood (KNN) |
| 48 | Li et al. [123] | 2016 | Classification | Gene expression | AML | Identifying feature genes | Support vector machine (SVM) and random forest (RF) |
| 49 | Dwivedi et al. [124] | 2016 | Classification | Microarray gene expression | ALL, AML | Classifying leukaemia subtypes | Artificial neural network (ANN) |
| 50 | Krappe et al. [125] | 2016 | Classification | Image-based analysis of bone marrow samples | Leukaemia | Diagnosing leukaemia and classifying bone marrow cells into 16 different classes | Knowledge-based hierarchical tree classifier |
| 51 | Li et al. [123] | 2016 | Classification | DNA microarrays | AML, ALL | Classifying leukaemia subtypes | A weighted doubly regularized support vector machine |
| 52 | Ocampo-Vega et al. [126] | 2016 | Feature selection and classification | DNA microarrays | AML, ALL | Classifying leukaemia subtypes | Principal component analysis and logistic regression |
| 53 | Rajwa et al. [127] | 2016 | Classification | Flow cytometry data | AML | Determining progression of the disease | Nonparametric Bayesian framework |
| 54 | Ni et al. [128] | 2016 | Classification | Flow cytometry data | AML | Analyzing minimal residual disease | Support vector machines (SVM) |
| 55 | Savvopoulos et al. [48] | 2016 | Knowledge extraction | CLL cells in peripheral blood | CLL | Capturing disease pathophysiology across patient types | Temporally and spatially distributed model |