Abstract
The topic of sparse representation of samples in high dimensional spaces has attracted growing interest during the past decade. In this work, we develop sparse representation-based methods for classification of clinical imaging patterns into healthy and diseased states. We propose a spatial block decomposition method to address irregularities of the approximation problem and to build an ensemble of classifiers that we expect to yield more accurate numerical solutions than conventional sparse analyses of the complete spatial domain of the images. We introduce two classification decision strategies, based on maximum a posteriori probability (BBMAP) or a log likelihood function (BBLL), and an approach to adjusting the classification decision criteria. To evaluate the performance of the proposed approach, we used cross-validation techniques on imaging datasets with disease class labels. We first applied the proposed approach to diagnosis of osteoporosis using bone radiographs. In this problem we assume that changes in trabecular bone connectivity can be captured by intensity patterns. The second application domain is separation of breast lesions into benign and malignant categories in mammograms. The object classes in both of these applications are not linearly separable, and in the second application the classification accuracy may depend on the lesion size. Our results indicate that the proposed integrative sparse analysis addresses the ill-posedness of the approximation problem and produces very good class separation for trabecular bone characterization and for breast lesion characterization. Our approach yields higher classification rates than conventional sparse classification and previously published convolutional neural networks (CNNs) that we fine-tuned for our datasets or utilized for feature extraction.
The BBLL technique also produced higher classification rates than learners using hand-crafted texture features, and the Bag of Keypoints, which is a sophisticated patch-based method. Furthermore, our comparative experiments showed that the BBLL function may yield more accurate classification than BBMAP, because BBLL accounts for possible estimation bias.
Keywords: sparse representation, ensemble classifiers, computer-aided diagnosis
1. INTRODUCTION
The research fields of computer-aided tissue characterization, diagnosis, and prognosis have gained significant interest in the past few decades [1, 2]. These techniques combine concepts from image analysis, pattern recognition and machine learning to separate diseased from healthy subjects. Applications span a wide range of clinical areas and diseases such as detection of microcalcifications in mammography screening systems [3, 4], early diagnosis of Alzheimer’s disease [5, 6], cancer [7, 8], soft and hard tissue characterization for age-related diseases [9, 10, 11], and cardiovascular diseases. This popularity is mainly attributed to the potential for timely characterization of tissues that may reduce the mortality rate from diseases. Frequently, these automated diagnostic systems extract information from medical imaging modalities such as Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) scans to produce a binary decision or a likelihood score that characterizes the state of a lesion as healthy or diseased. Sometimes multiclass classification may be needed to characterize different lesion types as in cancer applications.
Tissue classification is typically achieved by supervised machine learning approaches. Among the numerous techniques that have proposed generative or discriminative models, the use of kernels, and linear or nonlinear approaches, sparse classification techniques have shown promise and applicability for characterizing visual patterns in Region of Interest (ROI) based analyses. Sparse representation techniques have been applied in a wide range of fields, including coding, feature extraction and classification, superresolution [12], and regularization of inverse problems [13]. Exploiting a signal’s sparsity may provide insight into the patterns that are important for prototyping the object categories. The sparse representation is concise for compression and naturally discriminative for classification [14]. Sparse representation techniques calculate a sparse linear combination of atoms for describing a vector sample using an overcomplete dictionary of prototypes. If the representations of these linear combinations are sufficiently sparse, then they can be used for object recognition and classification of imaging patterns.
Finding an accurate sparse representation of a dataset is an NP-hard problem [15], therefore only approximate solutions can be found. In [14], the authors proposed a sparse representation classification (SRC) method for face recognition that yielded a recognition rate higher than 90% for both the Yale B and AR datasets. The authors performed experiments using random noise corruption and varying levels of contiguous occlusion that demonstrated the robustness of the recognition rates. Other notable applications of sparse coding methods were published in [16, 17], reporting high levels of classification accuracy. A method based on regression and spectral graph analysis has been used for sparse representation, and compared with other methods such as PCA, Sparse PCA, and LDA in [18]. That method was evaluated on the CMU PIE and Yale-B datasets.
The sparsity preserving projections (SPP) technique was proposed in [17]. It solves a modified sparse representation problem to create a sparse reconstructive weight matrix. Then a low dimensional feature space is calculated as a minimizer of an objective function that includes the weight matrix. The advantage of this method is its invariance to rescaling, rotation and translation of the data. It also produces natural discriminant representations for supervised and unsupervised problems. The SPP method was applied to face recognition on the Yale, AR and extended Yale B datasets. It was compared with PCA, locality preserving projection (LPP) and neighborhood preserving embedding (NPE). SPP yielded the highest accuracy for these datasets among the compared methods [17].
In this work, we propose a method for finding sparse solutions by reducing the dimensionality and correcting the bias of estimation using ensembles of Bayesian decision learners. We introduce a classification method that calculates sparse representations of localized block structures for given ROIs and builds an ensemble model of sparse learners to make a decision on lesion category. We hypothesize that the combination of relative sparsity scores of multiple disjoint sparse representations computed from multiple dictionaries will yield a more robust decision function than the decision function derived from a single dictionary used in conventional sparse representation classification techniques. We also propose a block-based log likelihood (BBLL) decision system and a minimum Bayes error-based approach for determining the decision threshold that will address classification bias. The optimized parameters may be used to define probability decision scores (PDS) in order to determine confidence intervals for prediction. This approach is advantageous in constructing overdetermined linear systems and addressing numerical optimization problems, such as convergence to infeasible solutions. The development of a classifier ensemble learning approach and the introduction of two Bayesian decision functions aim to improve classification accuracy.
We tested the predictive and generalization capability of our system for two diverse and significant clinical applications: osteoporosis diagnosis and breast lesion characterization. Osteoporosis is an age-related systemic skeletal disorder characterized by reduction in bone mass and deterioration in bone structure [1]. Early diagnosis can effectively predict fracture risk and prevent the disease [11, 19]. Breast cancer is one of the leading causes of death for women [7]. Early detection and characterization of breast lesions is important for increasing the life expectancy and quality of health of women. Because of its significance, automated detection and diagnosis of breast cancer is a popular field of research [20, 3, 21, 22, 23, 24, 25, 26, 27]. An earlier version of this work appeared in [28, 29]. In this work we propose a new ensemble of sparse learners based on the log likelihood ratio of sparsity scores, denoted by BBLL, which improved the previous results produced by BBMAP [28]. Furthermore, we introduce a logistic function and a class separation optimization algorithm for the BBLL and BBMAP decision functions, investigate the properties of solutions produced by the block-wise analysis, and have significantly extended the performance evaluation experiments relative to [29]. Finally, we compared the performances of the two decision functions BBMAP and BBLL. The results suggest that our proposed approach has the potential to be used for computer-aided diagnosis.
2. Sparse Representation and Classification
The sparse representation technique calculates the linear representation of a test sample after constructing a dictionary of labeled training samples. This representation is used for determining the class label of the test sample. Suppose there are k distinct classes in a dataset, corresponding to groups of subjects characterized by health states, and s samples that typically are imaging patterns to be classified, so that s = s1 + s2 + … + sk, where si is the number of samples in the ith class. A dictionary matrix is formed from the training set and is defined as M = [v1,1, v1,2, … , vk,sk], where vi,h is a column vector for the hth sample from the ith class. In image classification applications, a p × q grayscale image or region of interest (ROI) forms a vector v of length l = p × q using lexicographical ordering.
A new test sample y, which is an imaging pattern of unknown health state in a diagnostic application, can be represented by a linear combination of the training imaging patterns, y = Σi,h αi,h vi,h, where the αi,h are scalar coefficients. Then the test sample y can be approximated by:

y = Mx0  (1)
where x0 is a sparse solution. This solution is unique if the number of nonzero entries in x0 is less than l/2 [30]. We can regularize this ill-posed problem by enforcing sparsity constraints. If the solution x0 is sparse enough, it is approximated by the solution of the l1-minimization problem via convex relaxation [31, 32]:

x̂ = arg min ∥x∥1 subject to Mx = y  (2)
In the presence of imaging artifacts, such as noise, partial voluming, or other sources of variability, such as anatomical variations between subjects in the same class (expressed by ϵ), an exact solution is not feasible, so the problem becomes:
x̂ = arg min ∥x∥1 subject to ∥Mx − y∥2 ≤ ϵ  (3)
We may represent (2) as a linear programming (LP) problem and utilize an interior point solver to find a solution. We may solve the convex optimization problem in (3) by utilizing second-order cone programming (SOCP) and setting nonlinear inequality constraints.
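As an illustration of the LP route for (2), the following minimal sketch (not the authors' solver; it assumes scipy is available) uses the standard substitution x = u − v with u, v ≥ 0, which turns the l1 objective into a linear one:

```python
import numpy as np
from scipy.optimize import linprog

def l1_min_equality(M, y):
    """Solve min ||x||_1 subject to M x = y as an LP via the split x = u - v, u, v >= 0."""
    l, n = M.shape
    c = np.ones(2 * n)                  # sum(u) + sum(v) equals ||x||_1 at the optimum
    A_eq = np.hstack([M, -M])           # M u - M v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n), method="highs")
    u, v = res.x[:n], res.x[n:]
    return u - v

# Toy overcomplete dictionary: y is an exact 1-sparse combination of the atoms.
rng = np.random.default_rng(0)
M = rng.standard_normal((8, 20))
x_true = np.zeros(20)
x_true[3] = 2.0
y = M @ x_true
x_hat = l1_min_equality(M, y)
```

The LP optimum can have l1 norm no larger than that of any feasible solution, so ∥x_hat∥1 ≤ ∥x_true∥1 here.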
To classify y, conventional sparse representation classification (SRC) methods minimize the residual between y and its class-wise reconstruction Mδi(x̂) [14]:

ri(y) = ∥y − Mδi(x̂)∥2, and class(y) = arg mini ri(y)  (4)

The quantity Mδi(x̂) indicates that only the components of x̂ in class i are used to reconstruct y. The function δi(⋅) sets the components of x that are not associated with class i to 0.
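The class-wise residual rule in (4) can be sketched in a few lines of numpy (an illustrative reading, not the paper's code); δi is implemented by zeroing the coefficients of all other classes:

```python
import numpy as np

def src_classify(M, labels, x_hat, y):
    """Assign y to the class whose atoms best reconstruct it: delta_i keeps only
    the coefficients of class i (all others set to 0), as in Eq. (4)."""
    residuals = {}
    for cls in np.unique(labels):
        x_cls = np.where(np.asarray(labels) == cls, x_hat, 0.0)  # delta_i(x_hat)
        residuals[cls] = np.linalg.norm(y - M @ x_cls)
    best = min(residuals, key=residuals.get)
    return best, residuals

# Toy check: the test sample is built purely from class-0 atoms.
M = np.array([[1.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
labels = np.array([0, 0, 1, 1])
x_hat = np.array([1.0, 2.0, 0.0, 0.0])
best, res = src_classify(M, labels, x_hat, M @ x_hat)
```

Here the class-0 residual is exactly zero, so `best` is class 0.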
3. Block Decomposition and Ensemble Classification
Conventional sparse representation techniques may not find a good approximation of the solution vector, if the pattern dimensionality is high and the number of training samples is small. This is a typical case for medical image classification applications that may include lesions of variable types and limitations in the availability of training samples. We propose to build an ensemble of sparse representation classifiers based on block decomposition of the input ROI to address these shortcomings.
3.1. Block Decomposition
We first divide each training ROI into non-overlapping blocks of size m × n. Thus, each ROI image is expressed as I = [B1, B2, … , BNBL], where NBL is the number of blocks in an image. The dictionary Dj, where j = 1, 2, … , NBL, corresponds to the block Bj at the same index within the image ROI. The dictionary Dj for all the s images can be represented as follows:
Dj = [vj1,1, vj1,2, … , vjk,sk]  (5)

where vji,h is the column vector denoting the hth sample of the ith class in the jth block Bj.
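The construction of the block dictionaries in (5) can be sketched as follows (illustrative numpy code; function and variable names are ours):

```python
import numpy as np

def block_dictionaries(rois, m, n):
    """Split each p x q ROI into non-overlapping m x n blocks (Section 3.1) and
    stack the vectorized j-th block of every training ROI as the columns of D_j."""
    p, q = rois[0].shape
    dicts = []
    for r0 in range(0, p - m + 1, m):
        for c0 in range(0, q - n + 1, n):
            cols = [roi[r0:r0 + m, c0:c0 + n].reshape(-1) for roi in rois]
            dicts.append(np.column_stack(cols))   # D_j has shape (m*n, s)
    return dicts

# Two 4x4 "ROIs" split into 2x2 blocks: N_BL = 4 dictionaries, each of shape (4, 2).
rois = [np.arange(16.0).reshape(4, 4), np.ones((4, 4))]
dicts = block_dictionaries(rois, 2, 2)
```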
3.2. Ensemble Classification
We propose to classify each test sample by constructing ensembles of classifiers that solve a set of sparse coding and classification problems, or hypotheses, corresponding to the block components. Given a test sample yj in the jth block, we find the solution xj of the regularized noisy l1-minimization problem:
x̂j = arg min ∥xj∥1 subject to ∥Djxj − yj∥2 ≤ ϵ  (6)
where j = 1, 2, … , NBL. The test sample yj will be assigned to the class ω̂ that has the minimum approximation error calculated by (4).
We propose ensemble learning techniques in a Bayesian probabilistic setting as weighted sums of classifier predictions. We propose a function that applies majority voting to individual hypotheses (BBMAP) and an ensemble of log likelihood scores computed from relative sparsity scores (BBLL). The concept of decomposition hypothesis is displayed in Figure 1 for a benign lesion. In this example, the weights of the benign base vectors form a more sparse solution vector than the weights of the malignant base vectors.
Figure 1:
First row, left to right: a test ROI, reconstructed sample using both classes, using benign atoms only, and using malignant atoms only. Second row: the corresponding solution vectors produced by the solver in (15). Third row, left to right: a 16 × 16 block of the test ROI, reconstructed block using all atoms, using benign atoms only, and using malignant atoms only. Fourth row: the corresponding solution vectors produced by the solver in (15). The benign base vectors form a more sparse solution vector than the malignant base vectors.
3.2.1. Maximum a Posteriori decision function (BBMAP)
The class label for each test sample is determined by voting over the ensemble of NBL block-based classifiers. The predicted class label is given by
ω̂ = arg maxi P(ωi∣F)  (7)

where F is the composite extracted feature from the test sample given by the solutions of (6). The probability for classifying the test sample into class ωi is

P(ωi∣F) = (1/NBL) Σj=1…NBL Ij(ωi)  (8)

Ij(ωi) = 1, if the jth block classifier assigns yj to class ωi; Ij(ωi) = 0, otherwise  (9)

where Ij(ωi) is an indicator function whose values are determined by the individual classifier decisions.
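The voting scheme amounts to assigning the ROI to the majority class over block classifiers; a minimal sketch (our reading of the BBMAP rule, with hypothetical names):

```python
import numpy as np

def bbmap_decision(block_votes, n_classes):
    """Each of the N_BL block classifiers casts one vote; the estimated posterior
    of class i is its vote fraction, and the ROI is assigned to the class with
    the maximum posterior."""
    votes = np.asarray(block_votes)
    posteriors = np.array([(votes == i).mean() for i in range(n_classes)])
    return int(np.argmax(posteriors)), posteriors

# Four block classifiers, three of which vote for class 0.
label, post = bbmap_decision([0, 0, 1, 0], n_classes=2)
```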
3.2.2. Log likelihood sparsity-based decision function (BBLL)
We define a likelihood score based on the relative sparsity scores ri(yj), calculated in the sparse representation stage of each classifier:

LLSj(yj) = ln [ r2(yj) / r1(yj) ]  (10)
We calculate the expectation of LLSj(yj), denoted by ELLS(y), over all classifiers, which is determined by the individual classification scores derived from (10):

ELLS(y) = (1/NBL) Σj=1…NBL LLSj(yj)  (11)
The introduction of the log-likelihood score accommodates the definition of a decision function for the class state ω̂. To determine the class, we apply a decision threshold τLLS to ELLS(y):

ω̂ = ω1, if ELLS(y) ≥ τLLS; ω̂ = ω2, otherwise  (12)
Finding optimal τLLS to reduce bias.
This threshold is expected to be equal to 0 if there is no estimation bias, but may be experimentally determined as the minimizer of a Bayes-type risk function. Hence the optimal τLLS value, which we denote by τ*LLS, can be determined by sampling the domain of τLLS and calculating the true positive (TPR) and true negative (TNR) rates. Next, the optimal value τ*LLS is determined by the intersection of the TPR and TNR curves. An example of this procedure for determining τ*LLS is displayed in Figure 2, left side.
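The threshold sweep can be sketched as follows (a minimal illustration, assuming positive-class samples tend to have higher ELLS scores):

```python
import numpy as np

def optimal_tau(ells_pos, ells_neg, grid):
    """Sweep candidate thresholds tau, build the TPR and TNR curves, and return
    the tau where the two curves are closest (their intersection)."""
    ells_pos = np.asarray(ells_pos)
    ells_neg = np.asarray(ells_neg)
    tpr = np.array([(ells_pos >= t).mean() for t in grid])
    tnr = np.array([(ells_neg < t).mean() for t in grid])
    return grid[np.argmin(np.abs(tpr - tnr))]

# Well-separated toy scores: the intersection lies between the two groups.
tau_star = optimal_tau([0.5, 1.0, 1.5, 2.0], [-2.0, -1.5, -1.0, -0.5],
                       np.linspace(-2.0, 2.0, 81))
```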
Figure 2:
An example of TPR and TNR curves versus τLLS for determining τ*LLS (left) and the sigmoid probability decision score PDS after calculating the parameters m, c for (13) (right).
Calculating probabilistic decision scores.
In the next stage, we aim to convert the log likelihood decision scores to bounded posterior probability values using a logistic function. This function is denoted by Probability Decision Score (PDS) and is expressed by
PDS(ELLS) = 1 / (1 + exp(−m(ELLS − c)))  (13)
To calculate the model parameter c, we require that this function be equal to 50% probability for ELLS = τ*LLS, hence c = τ*LLS. To estimate m, we set a fixed probability level PDSmin (e.g., 5%, 10%) for the smallest value ELLSmin:

m = ln(1/PDSmin − 1) / (c − ELLSmin)  (14)
In Figure 2 (right side) we display the graph of PDS versus ELLS for one experiment. We can use PDS to express margins of uncertainty for classification in percentiles.
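Under this reading (c fixed by the 50% condition at τ*LLS, and m fixed by the PDSmin constraint at ELLSmin), the calibration can be sketched as:

```python
import numpy as np

def fit_pds(tau_star, ells_min, pds_min=0.05):
    """Calibrate the logistic PDS: PDS(tau*) = 0.5 fixes c = tau*, and
    PDS(ELLS_min) = pds_min fixes the slope m."""
    c = tau_star
    m = np.log(1.0 / pds_min - 1.0) / (c - ells_min)
    return lambda e: 1.0 / (1.0 + np.exp(-m * (np.asarray(e, dtype=float) - c)))

pds = fit_pds(tau_star=0.0, ells_min=-3.0, pds_min=0.05)
```

By construction, pds(0.0) evaluates to 0.5 and pds(−3.0) to 0.05.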
3.3. Proposed Solver for BBMAP and BBLL
The optimization problem in equation (6) includes a non-linear inequality constraint and a non-linear objective function. The well-known linear programming formulation used by the conventional SRC method in equation (2) was designed for linear equality constraints. In our method we introduce a margin of error ϵ for the approximation, as shown in equation (6). We also constrain the entries of the solution vector to be nonnegative. We formulate the optimization problem as

x̂j = arg min ∥xj∥1 subject to ∥Djxj − yj∥2 ≤ ϵ, xj ≥ 0  (15)
We utilize second order cone programming (SOCP) to solve this optimization problem. SOCP allows us to define non-linear inequality conditions and non-linear objective functions to solve convex optimization problems. SOCP can be used to implement linear programming (LP), convex quadratic programs (QPs) and convex quadratically constrained quadratic programs (QCQPs) [33].
We utilize interior-point optimization to solve this problem. This algorithm optimizes a barrier function that incorporates the constraints and the objective function, and searches in the interior of the feasible region to obtain the optimal solution by applying a Newton step.
Stopping Criteria.
The following tolerance parameters serve as stopping criteria for SOCP. The constraint tolerance (TolCon), a positive scalar, is the tolerance on the constraint violation; if the solver returns a point x with c(x) > TolCon, then the constraints are violated at x. The iterative search will continue even if the constraint tolerance is not satisfied, unless some other criterion halts it. The optimality tolerance (TolFun) denotes the termination tolerance on the first-order optimality measure, which quantifies the distance of a point x from the optimal point; first-order optimality is a necessary, but not sufficient, condition for optimality. The step tolerance (TolX) is a positive scalar that is a lower bound on the step of the solver on x at iteration i, so the solver stops when ∥xi − xi+1∥ < TolX. The maximum function evaluations parameter (MaxFunEvals) sets the maximum number of function evaluations allowed, and the maximum iterations parameter (MaxIter) sets the maximum number of iterations. The algorithm may stop before reaching MaxIter because one of the tolerances halts the solver first.
4. Experiments and Comparisons to Other Methods
In this section we aim to investigate whether the proposed ensemble of block-based sparse classifiers improves the classification performance of conventional sparse representation. We also aim to compare the proposed technique with texture-based and patch-based classification methods. Finally, we compare the performances of our two decision functions denoted by BBMAP and BBLL.
We applied these methods to characterize tissues and lesions from medical imaging modalities. The application domains are: characterization of trabecular bone for osteoporosis diagnosis, and characterization of breast lesions. We evaluated the classification performance of our block-based sparse representation method by computing the true positive rate (TPR), true negative rate (TNR), classification accuracy (ACC), and area under the ROC curve (AUC) in leave-one-out cross-validation experiments.
Next, we introduce conventional texture-based classifiers and the Bag of Keywords method that we implemented and used for comparisons with our proposed method. Then we describe the validation experiments for trabecular bone characterization and breast lesion characterization, and discuss the results.
4.1. Conventional Texture-based Classifiers
These techniques calculate a pre-defined (or hand-crafted) feature set representing the texture content, then select a feature subset to reduce the dimensionality, and utilize machine learning techniques to make a decision [34].
4.1.1. Texture Features
We construct a feature set for texture description that consists of fractal dimension, discrete wavelet frames, wavelet Gabor filter bank, local binary patterns, discrete Fourier and Cosine transforms, Laws’ texture energy masks, edge histogram, and statistical co-occurrence indices.
Fractal Dimension.
The fractal dimension measures the roughness and granularity of the image intensity function. The box counting method is utilized to compute the fractal dimension [35].
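A minimal box-counting sketch (assumes a square binary image with power-of-two side; not the paper's implementation):

```python
import numpy as np

def box_count_dimension(mask):
    """Box-counting fractal dimension estimate: count boxes containing foreground
    at each dyadic box size s and fit the slope of log N(s) versus log(1/s)."""
    size = mask.shape[0]
    sizes, counts = [], []
    s = size
    while s >= 1:
        n = sum(mask[r:r + s, c:c + s].any()
                for r in range(0, size, s) for c in range(0, size, s))
        sizes.append(s)
        counts.append(n)
        s //= 2
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope

# A completely filled square should have dimension 2.
dim = box_count_dimension(np.ones((64, 64), dtype=bool))
```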
Discrete Wavelet Frames.
The Discrete Wavelet Frames technique employs a Haar filter bank for multi-scale decomposition. It computes the orthogonal projections and residues for a full discrete wavelet expansion. We form a texture descriptor by computing energy, variance, entropy, and kurtosis measures. The number of features depends on the number of levels of decomposition. We selected 3 levels, resulting in 10 features.
Wavelet Gabor Filter Bank.
The Gabor filter is a linear filter that can extract relevant characteristics for multiple frequencies and orientations. Gabor functions form a complete but non-orthogonal basis, and the dictionary is produced by dilations and rotations of the mother Gabor wavelet. The total number of Gabor filter-based features is 96.
Local Binary Patterns (LBP).
The Local Binary Patterns (LBP) feature computation technique compares the intensity of each pixel to the intensities of its eight neighbors, then constructs an eight-digit binary number and calculates the histogram of these numbers as a texture descriptor [36]. This procedure yields a feature dimensionality of 64.
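A basic LBP sketch follows; pooling the 256 raw codes into 64 bins is our assumption to match the quoted descriptor length, not necessarily the reduction used in the paper:

```python
import numpy as np

def lbp_histogram(img, bins=64):
    """Basic LBP: threshold each interior pixel's 8 neighbours against the centre,
    pack the comparisons into an 8-bit code, and histogram the codes."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    H, W = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros_like(center, dtype=np.uint8)
    for bit, (dr, dc) in enumerate(offsets):
        neigh = img[1 + dr:H - 1 + dr, 1 + dc:W - 1 + dc]
        codes |= ((neigh >= center).astype(np.uint8) << bit)
    hist, _ = np.histogram(codes, bins=bins, range=(0, 256))
    return hist / hist.sum()

hist = lbp_histogram(np.random.default_rng(0).integers(0, 256, (32, 32)))
```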
Discrete Fourier and Cosine Transforms.
We calculate DFT and DCT coefficients to capture the spectral characteristics of texture. We use the 8 × 8 coefficients corresponding to lower frequencies, therefore the feature dimensionality for each transform is 64.
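The DCT part can be sketched with scipy (illustrative; the DFT coefficients would be handled analogously with numpy's FFT):

```python
import numpy as np
from scipy.fft import dctn

def dct_lowfreq_features(roi):
    """2-D DCT of the ROI; keep the 8 x 8 lowest-frequency coefficients,
    giving a 64-dimensional descriptor."""
    coeffs = dctn(roi, norm="ortho")
    return coeffs[:8, :8].reshape(-1)

# For a constant 16x16 image, only the DC coefficient is nonzero.
feats = dct_lowfreq_features(np.ones((16, 16)))
```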
Laws’ Texture Energy Masks.
The Laws’ texture energy measures are computed by a set of 5 × 5 convolution masks that measure the amount of variation within a fixed-size window. The Laws’ texture descriptor has 129 components.
Edge Histogram.
We compute the intensity gradient magnitude ∣∇f∣ and then calculate its histogram h(k), k = 0, … , L − 1, by counting the gradient magnitude values that fall into each of L bins. In our experiments we set L = 16.
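A minimal sketch of this descriptor (the choice of bin edges spanning the observed magnitude range is ours):

```python
import numpy as np

def edge_histogram(img, L=16):
    """Histogram of the intensity gradient magnitude |grad f| over L bins,
    normalised to sum to 1."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    hist, _ = np.histogram(mag, bins=L, range=(0.0, mag.max() + 1e-12))
    return hist / hist.sum()

h = edge_histogram(np.random.default_rng(0).random((32, 32)))
```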
Gray Level Co-Occurrence Matrix (GLCM).
We create the GLCMs by calculating the frequency of occurrence of gray-level pairs in horizontal, vertical, or diagonal pixel adjacencies on the image plane. The dimensionality of the GLCM features is 256.
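One displacement of the GLCM can be sketched as follows (assumes intensities scaled to [0, 1] and non-negative offsets, both our simplifications):

```python
import numpy as np

def glcm(img, dr=0, dc=1, levels=16):
    """Grey-level co-occurrence matrix for one pixel offset (dr, dc >= 0):
    count co-occurring quantised grey-level pairs and normalise to frequencies.
    With 16 levels, the flattened matrix gives a 256-dim descriptor."""
    q = np.minimum((np.asarray(img, dtype=float) * levels).astype(int), levels - 1)
    H, W = q.shape
    mat = np.zeros((levels, levels))
    for r in range(H - dr):
        for c in range(W - dc):
            mat[q[r, c], q[r + dr, c + dc]] += 1
    return mat / mat.sum()

# Horizontal co-occurrences in a 2x2 image with two grey levels.
P = glcm(np.array([[0.0, 0.0], [1.0, 1.0]]), dr=0, dc=1, levels=2)
```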
4.1.2. Texture Feature Selection
The correlation-based feature selection (CFS) technique selects features that are highly correlated with the pattern classes, but have low correlation with the other features. The utilized optimization strategies were best first search (CFS-BF) and genetic algorithm-based search (CFS-GA). CFS-BF searches the space of feature subsets through greedy hill climbing that may include backtracking. CFS-GA uses genetic search to optimize a feature set-related objective function.
4.1.3. Classifiers and Discriminant Functions
Naïve Bayes (NB).
This model assumes conditional statistical independence of the features, p(x∣ωi) = p(x1∣ωi)p(x2∣ωi)⋯p(xD∣ωi), where D is the dimensionality of the feature space and x = (x1, x2, … , xD)T.
Bayes Network (BN).
A Bayes network is a probabilistic graphical model that uses a directed acyclic graph to represent a set of random variables and their conditional dependencies. We evaluated implementations with multiple search algorithms for the BN structure including Hill climbing, K2, Tabu search, and multiple estimators for calculating the conditional probabilities such as direct estimation from data, and Bayes model averaging.
Bagging.
This method generates subsets by sampling from the training set uniformly and with replacement. It trains a classifier on each bootstrap sample and obtains the prediction by majority voting over the individual decision tree learners [37]. We tested fast decision, alternating decision, and best first decision trees. In the parameter retuning experiments we also adjusted the size of each bag and the number of iterations.
Random Forest (RF).
Random Forest is an ensemble learning system that builds multiple decision trees from subsets of the training set and uses random feature selection for node splitting. Random forests address the overfitting propensity of the decision trees and have displayed robustness relative to noise [38]. We evaluated performance using multiple numbers of features, number of trees and maximum tree depths.
4.2. Bag of Keypoints Classifiers
In our experiments we also evaluated the classification performance of the Bag of Keypoints (BoK) method [39, 40], which is a patch-based technique comparable to our method. Bag of Features methods have been applied to image recognition and classification and have produced very good results. The Bag of Keypoints technique originates from the Bag of Features. This method applies feature detection, extraction, and clustering to find the most representative features in the training database. In the next step, it builds a vocabulary that consists of the frequencies of occurrence of these features. In the testing stage, features are extracted from the unlabeled image and encoded using the vocabulary that was built during training. Then a learning method is applied to classify the test pattern into one of the classes.
In this work we employed the support vector machine (SVM) classifier for learning a discriminant function from the encoded features and classifying unlabeled samples. In SVM we evaluated the use of linear or radial basis function kernels. We utilized radial basis function kernels for our experiments to address possible non-linearity of the decision boundary. The main parameters that we tuned were the fraction of features to keep for building the vocabulary, the vocabulary size, the penalty coefficient for misclassification of training samples in SVM, and the kernel scale.
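The full pipeline can be sketched with scikit-learn (hedged: random vectors stand in for the detected keypoint descriptors, and the KMeans/SVC hyperparameters are illustrative, not the tuned values from our experiments):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def encode(descriptors, vocab):
    """Encode one image as the normalised frequency histogram of its
    descriptors' nearest vocabulary words."""
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / hist.sum()

# Two classes of synthetic descriptors, separated by an offset so the
# pipeline has something to learn (30 descriptors per image, 8-dim each).
rng = np.random.default_rng(0)
train_descs = [rng.standard_normal((30, 8)) + off for off in (0, 0, 3, 3)]
labels = [0, 0, 1, 1]
vocab = KMeans(n_clusters=5, n_init=10, random_state=0).fit(np.vstack(train_descs))
X = np.array([encode(d, vocab) for d in train_descs])
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, labels)
pred = clf.predict([encode(rng.standard_normal((30, 8)) + 3, vocab)])[0]
```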
4.3. Deep Neural Networks
Deep learning methods and more specifically convolutional neural networks have recently re-emerged as powerful techniques for image segmentation, object recognition and classification [41, 42, 43, 44, 45, 46, 47]. These techniques simultaneously learn the set of features and the decision function. In contrast to traditional texture-based techniques, deep networks do not need to receive a hand-crafted feature set as input. Deep learning methods have been applied to biomedical image datasets and have produced very good results.
We employed sequential and residual networks of varying complexity such as Alexnet [41], Googlenet [43], Resnet18 [46], and Inceptionv3 [44]. Because our datasets are small, we employed transfer learning techniques to adjust the weights of pre-trained networks, instead of learning the decision function from the beginning as described in [45, 48]. We set the learning rates of the convolutional layers to much lower values than the final layers. In this way we largely preserved the pre-trained layer weights at the initial and intermediate convolutional stages. The main parameters that we tuned were the learning rate, learning rate drop, size of mini-batch, and the number of epochs. Another strategy that we tested was to utilize the pre-trained AlexNet for feature extraction, downsample the extracted features, and apply sparse-based classification to the downsampled features. We denote this method by AlexNet-Sp.
4.4. Bone Characterization
Our purpose is to distinguish between healthy and osteoporotic subjects. The Texture Characterization of Bone radiograph images (TCB) challenge dataset contains labeled digital radiographs of 87 healthy and 87 osteoporotic subjects for training and testing (available online at http://www.univ-orleans.fr/i3mto/data, last accessed 05/2018). The calcaneus trabecular bone images in the dataset have an ROI size of 400 × 400 pixels and the pixel size is 105 μm. A more detailed description of the dataset is provided in [11, 19]. The experimental procedures involving human subjects were approved by the Institutional Review Board of the institution that provided the data.
In the performance evaluation of conventional texture-based techniques of Section 4.1.1, we calculated 723 texture-related features. We selected features using correlation-based feature selection with best first search (CFS-BF) and correlation-based feature selection with genetic algorithm search (CFS-GA) as described in Section 4.1.2.
Next, we implemented block-wise ensembles for texture-based classification. For each block set, we apply dimensionality reduction and training, followed by ensemble classification of the test samples. We conducted leave-one-out cross-validation experiments and report all results produced by non-block-based and block-based techniques in Table 1. Regarding the non-block-based methods, we note that CFS-GA yields an overall better performance than CFS-BF and no feature selection. This implies that CFS-GA effectively selects distinguishing features from the entire set. Among the tested classifiers, Bagging accomplished the highest performance with an ACC of 67.8%. We performed ROC experiments using CFS-BF feature selection and display the graphs in Fig. 3. We note that NB yielded the largest area under the curve for the leave-one-out experiment, followed by BN. Among the block-based methods, Naïve Bayes classification produced the highest ACC at 59.2%, and the highest AUC was 61.5%.
Table 1:
Classification performance for bone characterization using individual texture-based classifiers, or their ensembles, as denoted by the block size.
| Method | Feat. Sel. | Block Side | TPR (%) | TNR (%) | ACC (%) | AUC (%) |
|---|---|---|---|---|---|---|
| NB | No | 400 | 57.5 | 64.4 | 60.9 | 63.5 |
| | | 100 | 31.3 | 56.0 | 44.3 | 38.0 |
| | | 50 | 45.6 | 54.7 | 51.7 | 52.3 |
| | | 25 | 44.3 | 52.9 | 47.7 | 48.4 |
| BN | No | 400 | 58.6 | 65.5 | 62.1 | 65.3 |
| | | 100 | 38.6 | 62.6 | 51.2 | 48.5 |
| | | 50 | 33.3 | 42.5 | 37.9 | 38.1 |
| | | 25 | 47.6 | 55.3 | 53.5 | 49.1 |
| Bagging | No | 400 | 70.1 | 64.4 | 67.3 | 68.1 |
| | | 100 | 28.9 | 67.9 | 43.8 | 55.2 |
| | | 50 | 41.5 | 65.6 | 52.1 | 49.6 |
| | | 25 | 44.7 | 53.9 | 49.4 | 46.4 |
| RF | No | 400 | 65.5 | 64.4 | 64.9 | 67.2 |
| | | 100 | 37.4 | 61.5 | 50.0 | 48.1 |
| | | 50 | 44.6 | 53.9 | 49.4 | 50.6 |
| | | 25 | 49.4 | 57.9 | 54.0 | 52.7 |
| NB | CFS-GA | 400 | 63.2 | 64.4 | 63.8 | 67.3 |
| | | 100 | 38.4 | 64.8 | 51.7 | 54.6 |
| | | 50 | 47.4 | 54.7 | 52.3 | 48.7 |
| | | 25 | 46.9 | 65.9 | 51.7 | 55.2 |
| BN | CFS-GA | 400 | 66.7 | 62.1 | 64.4 | 70.4 |
| | | 100 | 50.7 | 65.4 | 59.2 | 61.5 |
| | | 50 | 37.9 | 55.2 | 46.6 | 50.2 |
| | | 25 | 52.4 | 54.6 | 54.0 | 46.0 |
| Bagging | CFS-GA | 400 | 70.1 | 65.5 | 67.8 | 65.0 |
| | | 100 | 44.6 | 57.3 | 50.6 | 52.1 |
| | | 50 | 57.8 | 58.3 | 58.1 | 57.4 |
| | | 25 | 46.1 | 55.1 | 51.2 | 53.4 |
| RF | CFS-GA | 400 | 67.8 | 65.5 | 66.7 | 68.2 |
| | | 100 | 40.9 | 61.6 | 51.2 | 48.9 |
| | | 50 | 45.0 | 51.1 | 48.3 | 50.0 |
| | | 25 | 46.8 | 53.7 | 50.6 | 50.7 |
| NB | CFS-BF | 400 | 71.3 | 57.5 | 64.4 | 70.9 |
| | | 100 | 43.9 | 59.8 | 52.3 | 52.0 |
| | | 50 | 45.7 | 52.3 | 50.6 | 48.0 |
| | | 25 | 48.5 | 53.7 | 51.7 | 49.6 |
| BN | CFS-BF | 400 | 64.4 | 66.7 | 65.5 | 69.9 |
| | | 100 | 37.9 | 60.9 | 49.4 | 49.8 |
| | | 50 | 40.2 | 46.0 | 43.1 | 44.9 |
| | | 25 | 48.8 | 53.4 | 52.3 | 49.9 |
| Bagging | CFS-BF | 400 | 66.6 | 67.8 | 67.2 | 70.5 |
| | | 100 | 50.0 | 63.3 | 56.9 | 52.6 |
| | | 50 | 46.7 | 53.5 | 50.6 | 54.1 |
| | | 25 | 58.6 | 51.0 | 54.0 | 50.4 |
| RF | CFS-BF | 400 | 60.9 | 67.8 | 64.4 | 68.4 |
| | | 100 | 46.1 | 64.7 | 55.2 | 52.6 |
| | | 50 | 48.0 | 55.6 | 52.3 | 52.2 |
| | | 25 | 40.9 | 44.7 | 43.1 | 43.4 |
Figure 3:
ROC curves for bone characterization using conventional (non-sparse) texture-based techniques using Bagging, BN, NB, and RF (left), and the block-based counterpart using CFS-GA and Bagging (right).
The results of cross-validation experiments for the Bag of Keypoints technique showed that BoK was able to separate healthy from osteoporotic subjects with an ACC of 99.3% as displayed in Table 2. This very high accuracy may be attributed to the extraction of discriminant features from the textured areas. Also, the employed SVM model is known to address data complexity caused by non-linearity and high dimensionality.
Table 2:
Classification performance for bone characterization using Bag of Keypoints and Deep Learning techniques.
| Method | TPR (%) | TNR (%) | ACC (%) | AUC (%) |
|---|---|---|---|---|
| Bag of Keypoints | 98.6 | 100 | 99.3 | 100 |
| AlexNet | 65.5 | 57.5 | 61.5 | 63.1 |
| GoogleNet | 64.4 | 54.0 | 59.2 | 65.6 |
| Resnet18 | 80.5 | 48.3 | 64.4 | 67.5 |
| Inceptionv3 | 69.0 | 51.7 | 60.3 | 66.5 |
| AlexNet-Sp | 94.3 | 87.4 | 90.8 | 94.4 |
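The bag-of-keypoints pipeline behind Table 2 detects local keypoints, quantizes their descriptors against a learned visual codebook, and classifies the resulting occurrence histogram with an SVM. The encoding step can be sketched in a few lines of numpy; the function name, the toy codebook, and the synthetic descriptors below are illustrative assumptions, not the actual keypoint features used in these experiments.

```python
import numpy as np

def bok_histogram(descriptors, codebook):
    """Encode a set of local keypoint descriptors as a normalized
    histogram of visual-word occurrences (the bag-of-keypoints vector)."""
    # Squared Euclidean distance of every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # nearest codeword per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()   # normalize to unit mass

# Toy example: 2 codewords, descriptors clustered tightly around them.
rng = np.random.default_rng(0)
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
desc = np.vstack([rng.normal(0.0, 0.1, (30, 2)),
                  rng.normal(10.0, 0.1, (10, 2))])
h = bok_histogram(desc, codebook)  # 30 of 40 descriptors map to word 0
```

In the full method, the codebook is learned by clustering training descriptors (e.g., with k-means) and the normalized histograms are fed to the SVM.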
We then evaluated the performance of the conventional SRC method described in Section 2. We utilized multiple undersampling factors to address convergence to infeasible solutions, mostly caused by linearly dependent vectors belonging to different classes. In Table 3 we show results from the top performing experiments, producing 59.2% classification accuracy for a resampling factor of 1/20, corresponding to a feature dimensionality of 400. From the ROC curves in Fig. 4 we deduce that a higher degree of downsampling yields a lower, more numerically tractable feature dimensionality, but it also diffuses the textural information. We also applied conventional SRC to the texture feature set produced in Section 4.1.1 and the classification accuracy was 71.7%. This result also implies the limited separation capability of a generic texture feature set.
Table 3:
Classification performance for bone characterization using conventional sparse classifiers and ensembles of block-based sparse classifiers. The block size in the first two rows implies no block decomposition (as in conventional SRC).
| Method | Block Side | TPR (%) | TNR (%) | ACC (%) | AUC (%) |
|---|---|---|---|---|---|
| BBMAP | 400 (samp. 1/4) | 55.2 | 54.0 | 54.6 | 58.4 |
| | 400 (samp. 1/20) | 57.5 | 60.9 | 59.2 | 63.4 |
| | 100 | 65.5 | 67.8 | 66.7 | 71.4 |
| | 50 | 93.1 | 81.6 | 87.4 | 91.3 |
| | 25 | 100 | 100 | 100 | 100 |
| | 10 | 100 | 100 | 100 | 100 |
| | Mean±Std | 89.7±16.4 | 87.4±15.7 | 88.5±15.7 | 90.7±13.5 |
| BBLL () | 400 (samp. 1/4) | 55.2 | 54.0 | 54.6 | 58.4 |
| | 400 (samp. 1/20) | 57.5 | 60.9 | 59.2 | 63.4 |
| | 100 | 85.1 | 82.8 | 83.9 | 87.7 |
| | 50 | 98.6 | 90.8 | 94.8 | 97.3 |
| | 25 | 100 | 100 | 100 | 100 |
| | 10 | 100 | 100 | 100 | 100 |
| | Mean±Std | 95.9±7.2 | 93.4±8.3 | 94.7±7.6 | 96.3±5.8 |
Figure 4:
ROC curves for bone characterization using conventional SRC classification.
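For reference, conventional SRC solves an ℓ1-regularized coding problem over the training dictionary and assigns the class whose atoms yield the smallest reconstruction residual. The sketch below uses a basic ISTA solver for the unconstrained Lasso form; the paper's ϵ-constrained formulation and solver settings may differ, and the toy dictionary, function names, and parameters are illustrative assumptions.

```python
import numpy as np

def ista(A, y, lam=0.01, n_iter=500):
    """Iterative soft-thresholding for min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = x - A.T @ (A @ x - y) / L                          # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # shrinkage
    return x

def src_classify(A, labels, y):
    """Sparse representation classification: keep only the coefficients of
    each class in turn and pick the class with the smallest residual."""
    x = ista(A, y)
    residuals = {}
    for c in set(labels):
        xc = np.where(np.array(labels) == c, x, 0.0)  # class-c coefficients
        residuals[c] = np.linalg.norm(y - A @ xc)
    return min(residuals, key=residuals.get)

# Toy dictionary: class-0 atoms near e1, class-1 atoms near e2.
A = np.column_stack([e / np.linalg.norm(e) for e in
                     [np.array([1.0, 0.0, 0.1]), np.array([0.9, 0.1, 0.0]),
                      np.array([0.0, 1.0, 0.1]), np.array([0.1, 0.9, 0.0])]])
labels = [0, 0, 1, 1]
pred = src_classify(A, labels, np.array([1.0, 0.05, 0.05]))
```

A test sample aligned with the class-0 atoms is reconstructed almost entirely by those atoms, so its class-0 residual is small while the class-1 residual stays near the norm of the sample.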
When testing deep learning methods, we utilized transfer learning and feature extraction techniques because of the small number of samples in our datasets. For transfer learning, we initialized the networks using weights determined by training on ImageNet and assigned high learning rates to the new layers. Among the evaluated networks, Resnet18 yielded the top AUC of 67.5% and top ACC of 64.4%, while Inceptionv3 produced the second highest AUC of 66.5% (Table 2). The low performance is mainly because deep networks require large datasets for effective training and for avoiding overfitting. In addition, we used a deep network as a feature extractor, downsampled the features to reduce their dimensionality, and applied sparse representation for classification. This deep feature extraction method, AlexNet-Sp, yielded 90.8% ACC and 94.4% AUC.
Next, we evaluated the performance of our block-based ensemble of sparse classifiers. We utilized block sizes ranging from 100 × 100 pixels down to 10 × 10 pixels to observe the impact of this variable on the classification performance, repeating the experiments with both the BBMAP and BBLL decision functions. We show cross-validation results in Table 3. The experiment with a block size of 25 × 25 pixels, which led to 256 classifiers, achieved the best classification accuracy of 100% with both the BBMAP and BBLL techniques. The results in this table imply a 35.5% average improvement of our method over the traditional SRC method. Figure 5 displays the ROC graphs for varying block sizes using the BBMAP and BBLL decision functions. We observe that the largest AUC was obtained with 25 × 25 and 10 × 10 blocks. We also note the improvement in classification performance compared with the conventional SRC results depicted in Figure 4. These results suggest that the proposed approach finds more accurate sparse solutions than the conventional SRC approach and significantly improves classifier performance. A reason for the improved group separation may be that the ensemble technique employs multiple learners of over-complete dictionaries that are more amenable to sparse coding and representation. In addition, we estimated the statistical significance of the differences between the ROCs of BBLL with optimized threshold and BBMAP by applying DeLong's statistical test. The p-values for block sizes of 100 × 100, 50 × 50, 25 × 25 and 10 × 10 were 0.47, 0.66, 0 and 0 respectively, indicating significant differences for block sizes of 25 × 25 and 10 × 10. We note that BBLL achieves the top AUC for the 25 × 25 and 10 × 10 block sizes, which was also found to be significantly different from the corresponding BBMAP result.
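The block-based ensemble can be summarized as: decompose the image into blocks, run a sparse classifier per block, and fuse the per-block outputs with a MAP-style or log-likelihood decision. The sketch below abstracts the per-block classifiers into precomputed posteriors and log-likelihood ratios; the function names, fusion details, and toy numbers are illustrative assumptions rather than the exact decision functions of the paper.

```python
import numpy as np

def decompose(image, b):
    """Split a square image into non-overlapping b x b blocks."""
    n = image.shape[0] // b
    return [image[i*b:(i+1)*b, j*b:(j+1)*b]
            for i in range(n) for j in range(n)]

def bbmap_decision(posteriors):
    """MAP-style fusion: average the per-block class posteriors and
    pick the class with the maximum mean posterior."""
    return int(np.mean(posteriors, axis=0).argmax())

def bbll_decision(log_likelihood_ratios, threshold=0.0):
    """Log-likelihood fusion: sum the per-block log-likelihood ratios
    (class 1 vs class 0) and compare against an adjustable threshold."""
    return int(np.sum(log_likelihood_ratios) > threshold)

# Toy run: a 100 x 100 image split into 16 blocks of 25 x 25 pixels,
# with hypothetical per-block posteriors for classes (0, 1).
img = np.arange(100 * 100, dtype=float).reshape(100, 100)
blocks = decompose(img, 25)
post = np.array([[0.3, 0.7]] * 12 + [[0.8, 0.2]] * 4)
llrs = np.log(post[:, 1] / post[:, 0])
```

The adjustable threshold in the log-likelihood rule is what allows the classification decision criterion to be tuned, as described for BBLL with optimized threshold.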
We used the following values for the optimization parameters in our bone characterization and breast lesion characterization experiments: TolCon = 1e-6, TolX = 1e-6, TolFun = 1e-6, MaxIter = 10, ϵ = 0.1, MaxFunEvals = 8000.
Figure 5:
ROC curves for bone characterization using the proposed block-based ensemble method with BBMAP (left), and BBLL (right) decision function.
4.5. Breast Lesion Characterization
In our second experiment, we validated the separation of the breast lesion data set into two classes: malignant and benign. The training and testing data were obtained from the Mammographic Image Analysis Society (MIAS) database that is available online [3, 23]. The mammograms have a spatial resolution of 200 microns per pixel, and each image is 1024 × 1024 pixels after clipping/padding. MIAS contains 322 MLO scans from 161 subjects. The data is categorized into groups of healthy subjects, subjects with benign lesions, and subjects with malignant lesions and calcifications. The data annotations provide the center and radius of the area of interest (ROI) for each lesion. Our goal is to characterize the lesion type; therefore we utilized 68 benign and 51 malignant mammograms for performance evaluation.
Because our proposed method performs block-wise analysis, we need to ensure that the majority of the blocks cover the lesion to improve the accuracy. Hence we designed our system so that the lesion ROI sizes are greater than or equal to the analysis ROI size. In this experiment, we determined the centroid and radius of each lesion from the provided metadata. We used these two values to calculate a minimum bounding square ROI for each scan. We trained and tested all classifiers on these ROI patches centered at the lesion centroid. In order to evaluate the classification performance with respect to the lesion size, we performed validation experiments on variable minimum ROI sizes. The selected ROI sizes were 48 × 48, 60 × 60, 64 × 64, and 72 × 72. For each ROI size we selected subsets of the dataset that met the minimum lesion radius criteria described above.
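The ROI extraction step described above can be sketched as follows. The function name is hypothetical, and the clamping behavior at image borders is our assumption, since the text does not specify how lesions near the edge are handled.

```python
import numpy as np

def bounding_square_roi(image, cx, cy, radius, min_side):
    """Extract a square ROI centered at the lesion centroid (cx, cy).
    The side length is the larger of the lesion's bounding square
    (2 * radius) and the minimum analysis ROI size, and the patch is
    clamped so it stays inside the image (assumed edge handling)."""
    side = max(2 * radius, min_side)
    half = side // 2
    h, w = image.shape
    x0 = min(max(cx - half, 0), w - side)  # clamp horizontally
    y0 = min(max(cy - half, 0), h - side)  # clamp vertically
    return image[y0:y0 + side, x0:x0 + side]

# A small lesion near the top edge still yields the minimum 64 x 64 patch.
scan = np.zeros((1024, 1024))
roi_small = bounding_square_roi(scan, cx=100, cy=40, radius=20, min_side=64)
# A larger lesion yields its own bounding square of 80 x 80 pixels.
roi_large = bounding_square_roi(scan, cx=512, cy=512, radius=40, min_side=64)
```

Selecting only scans whose lesion radius meets the minimum side then reproduces the variable-ROI validation protocol.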
In Table 4 we report texture-based classification results computed for lesions with a minimum ROI size of 64 × 64 pixels, which performed better than the other ROI sizes. The feature dimensionality in this experiment is 451. The dimensionality of the texture feature set differs from that of the bone characterization experiments because (i) we did not utilize the co-occurrence features, as several ROIs were smaller than the required size, (ii) 14 edge histogram features were always zero and were excluded from the analysis, and (iii) 2 additional features produced numerical errors such as division by zero. As in the bone characterization experiments, we applied the block-based decomposition method to the texture-based classification system and display the cross-validation results in Table 4 and Fig. 6. CFS-GA coupled with Bagging classification produced the highest performance with 76.7% ACC and 76.2% AUC. Fig. 6 displays the ROC graphs for the CFS-BF feature selection. This figure confirms that Bagging produced the largest AUC for the leave-one-out experiment, followed by Random Forest.
Table 4:
Classification performance for breast lesion characterization using non-sparse texture-based classifiers either individually, or in an ensemble structure for 64 × 64 ROI size.
| Method | Feat. Sel. | Block Side | TPR (%) | TNR (%) | ACC (%) | AUC (%) |
|---|---|---|---|---|---|---|
| NB | No | 64 | 66.7 | 35.1 | 50.7 | 54.2 |
| | | 32 | 31.8 | 70.6 | 58.9 | 47.2 |
| | | 16 | 28.6 | 57.7 | 49.3 | 44.4 |
| | | 8 | 46.2 | 46.8 | 46.6 | 50.0 |
| BN | | 64 | 75.0 | 5.4 | 39.7 | 40.2 |
| | | 32 | 16.7 | 56.8 | 37.0 | 32.4 |
| | | 16 | 34.8 | 60.0 | 52.1 | 52.1 |
| | | 8 | 48.5 | 47.5 | 47.9 | 41.8 |
| Bagging | | 64 | 61.1 | 55.8 | 58.9 | 59.0 |
| | | 32 | 28.9 | 67.9 | 43.8 | 48.4 |
| | | 16 | 41.5 | 65.6 | 52.1 | 52.7 |
| | | 8 | 51.4 | 50.0 | 50.7 | 57.0 |
| RF | | 64 | 58.3 | 54.1 | 56.2 | 56.5 |
| | | 32 | 27.0 | 66.7 | 46.6 | 43.0 |
| | | 16 | 32.4 | 56.4 | 45.2 | 46.2 |
| | | 8 | 45.5 | 45.0 | 45.2 | 46.5 |
| NB | CFS-GA | 64 | 22.2 | 70.3 | 46.6 | 21.6 |
| | | 32 | 40.0 | 75.5 | 65.8 | 59.3 |
| | | 16 | 35.7 | 64.4 | 53.4 | 49.6 |
| | | 8 | 54.2 | 44.9 | 48.0 | 45.4 |
| BN | | 64 | 75.0 | 43.2 | 58.9 | 71.9 |
| | | 32 | 11.1 | 43.2 | 27.4 | 25.2 |
| | | 16 | 53.3 | 62.1 | 60.3 | 62.1 |
| | | 8 | 50.0 | 56.6 | 54.8 | 52.3 |
| Bagging | | 64 | 52.8 | 62.2 | 57.5 | 50.4 |
| | | 32 | 59.5 | 94.4 | 76.7 | 76.2 |
| | | 16 | 62.2 | 72.2 | 67.1 | 70.1 |
| | | 8 | 47.1 | 46.2 | 46.6 | 48.9 |
| RF | | 64 | 50.0 | 54.1 | 52.1 | 47.8 |
| | | 32 | 35.3 | 69.2 | 53.4 | 47.4 |
| | | 16 | 36.4 | 52.5 | 45.2 | 40.3 |
| | | 8 | 60.0 | 55.8 | 57.5 | 63.6 |
| NB | CFS-BF | 64 | 58.3 | 21.6 | 39.7 | 28.2 |
| | | 32 | 31.6 | 68.5 | 58.9 | 50.3 |
| | | 16 | 54.3 | 68.4 | 61.6 | 64.4 |
| | | 8 | 54.3 | 50.0 | 52.1 | 52.4 |
| BN | | 64 | 75.0 | 5.4 | 39.7 | 40.2 |
| | | 32 | 8.3 | 48.7 | 28.8 | 34.9 |
| | | 16 | 47.8 | 64.0 | 58.9 | 52.3 |
| | | 8 | 60.6 | 70.0 | 65.8 | 64.8 |
| Bagging | | 64 | 50.0 | 51.4 | 50.7 | 44.4 |
| | | 32 | 46.0 | 83.3 | 64.4 | 67.1 |
| | | 16 | 64.9 | 72.2 | 68.5 | 66.4 |
| | | 8 | 48.6 | 47.4 | 48.0 | 46.8 |
| RF | | 64 | 58.3 | 62.2 | 57.5 | 50.4 |
| | | 32 | 29.2 | 63.3 | 52.1 | 51.0 |
| | | 16 | 41.7 | 59.5 | 50.7 | 52.6 |
| | | 8 | 58.3 | 62.2 | 60.3 | 58.3 |
Figure 6:
ROC curves for breast lesion characterization using conventional (non-sparse) texture-based techniques with Bagging, BN, NB, and RF (left), and the block-based counterpart using CFS-GA and Bagging (right).
The cross-validation results of BoK for each ROI size are displayed in Table 5. This approach produces high classification rates for most ROI sizes, and the top class separation, with an ACC of 99.1%, was achieved for the 48 × 48 ROI size. We deduce that the extraction of discriminant features and the use of SVM classification drive the very good results.
Table 5:
Classification performance for breast lesion characterization using Bag of Keypoints classification on ROIs
| Method | ROI Side | TPR (%) | TNR (%) | ACC (%) | AUC (%) |
|---|---|---|---|---|---|
| Bag of Keypoints | 72 | 100 | 71.3 | 86.7 | 99.8 |
| | 64 | 77.3 | 100 | 88.5 | 99.6 |
| | 60 | 92.7 | 48.3 | 69.6 | 85.8 |
| | 48 | 100 | 98.3 | 99.1 | 100 |
| | Mean±Std | 92.5±10.7 | 79.5±24.6 | 86.0±12.2 | 96.3±7.0 |
Furthermore, Table 6 lists the results produced by the conventional SRC method. The top performance, 65.8% ACC, was obtained for lesion ROI sizes equal to or greater than 64 × 64 pixels. After comparing Tables 4 and 6 and Figures 6 and 7, we conclude that texture-based classification produces more accurate classification rates than conventional SRC. As in our bone characterization experiments, we applied conventional SRC to the texture feature set produced in subsection 4.1.1; the top classification accuracy was 56.7%, which indicates the limited separation capability of generic texture features.
Table 6:
Classification performance for breast lesion characterization using conventional SRC on ROIs of variable sizes
| Method | ROI side | TPR (%) | TNR (%) | ACC (%) | AUC (%) |
|---|---|---|---|---|---|
| SRC | 72 | 13.9 | 90.3 | 49.3 | 47.1 |
| | 64 | 51.4 | 80.6 | 65.8 | 64.3 |
| | 60 | 91.9 | 10.0 | 49.4 | 52.9 |
| | 48 | 71.1 | 54.2 | 62.4 | 60.7 |
| | Mean±Std | 57.1±33.2 | 58.8±35.9 | 56.7±8.6 | 56.3±7.7 |
Figure 7:
ROC curves for breast lesion characterization using conventional SRC classification.
Our experiments using deep networks with transfer learning resulted in moderate classification rates, as displayed in Table 7. The limited rates of the transfer learning techniques are due to the small sample size. The use of CNN-based features for sparse classification improved the classification rates significantly. The top ACC was 95.5% and the top AUC was 95.6% for AlexNet feature extraction and sparse classification on an ROI size of 72 × 72 pixels. Our results indicate that the use of deep networks for feature extraction is effective for datasets with very small sample sizes.
Table 7:
Classification performance for breast lesion characterization using deep learning techniques.
| Method | ROI Side | TPR (%) | TNR (%) | ACC (%) | AUC (%) |
|---|---|---|---|---|---|
| AlexNet | 72 | 67.7 | 58.1 | 62.9 | 62.2 |
| GoogleNet | | 48.4 | 64.5 | 56.5 | 53.9 |
| Resnet18 | | 64.5 | 54.8 | 59.7 | 62.9 |
| Inceptionv3 | | 58.1 | 54.8 | 56.6 | 52.1 |
| AlexNet-Sp | | 97.2 | 93.5 | 95.5 | 95.6 |
| AlexNet | 64 | 72.2 | 61.1 | 66.7 | 64.6 |
| GoogleNet | | 50.0 | 61.1 | 55.6 | 53.4 |
| Resnet18 | | 41.7 | 75.0 | 58.3 | 56.9 |
| Inceptionv3 | | 48.6 | 61.1 | 54.7 | 51.2 |
| AlexNet-Sp | | 86.5 | 33.3 | 60.3 | 55.6 |
| AlexNet | 60 | 70.3 | 64.9 | 67.6 | 67.3 |
| GoogleNet | | 51.4 | 62.2 | 56.8 | 51.9 |
| Resnet18 | | 40.5 | 72.5 | 57.1 | 51.6 |
| Inceptionv3 | | 37.8 | 75.0 | 57.1 | 50.0 |
| AlexNet-Sp | | 86.5 | 12.5 | 48.1 | 50.3 |
| AlexNet | 48 | 57.8 | 60.0 | 58.9 | 56.9 |
| GoogleNet | | 64.4 | 53.3 | 58.9 | 58.7 |
| Resnet18 | | 57.8 | 53.3 | 55.6 | 55.2 |
| Inceptionv3 | | 60.0 | 58.3 | 59.1 | 58.7 |
| AlexNet-Sp | | 75.6 | 25.0 | 49.5 | 45.7 |
In the last part of this experiment we validated our block-based ensemble classification system. In Table 8, we present the results for ROI sizes greater than or equal to 64 × 64 pixels, which correspond to 36 benign and 37 malignant lesions. In the first row of this table, the results were obtained from a single block, which is equivalent to conventional SRC analysis. The remaining rows correspond to classifier ensembles. Overall, the best accuracy achieved by our system using the BBLL approach was 97.3% for a block size of 8 × 8. We note that our method improved the accuracy by 31.5% compared to the traditional SRC method. This indicates that block decomposition and sampling combined with classifier decision fusion yields more accurate solutions than SRC. The ROC graphs in Fig. 8 confirm that the BBLL decision function using 8 × 8 blocks yielded the largest AUC. The BBLL approach contributes to the reduction of potential prediction bias. In addition, we applied DeLong tests between the ROC curves produced by BBLL and BBMAP to determine whether their differences are statistically significant. For breast lesion characterization, the p-values for a minimum ROI size of 64 × 64 and block sizes of 32 × 32, 16 × 16, 8 × 8 and 4 × 4 were 0.1, 0.24, 7.9 · 10−4 and 1.2 · 10−11 respectively, suggesting significant differences for block sizes of 8 × 8 and 4 × 4. Comparisons between Tables 4 and 8 indicate that the proposed BBLL ensemble learning approach outperformed the top performing non-sparse texture-based classifier by 20.7%. In addition, Fig. 9 displays a graph of the classification rates produced by texture-based Bagging, BoK, AlexNet, SRC, AlexNet-Sp, BBMAP and BBLL ensemble learners with respect to ROI size (left), and the average ACC for each method over the ROI sizes (right).
The summarized (μ ± σ) classification rates over multiple ROI sizes for texture-based Bagging, BoK, AlexNet, SRC, AlexNet-Sp, BBMAP and BBLL are 61.7±10.9, 86.0±12.2, 64.0±3.98, 56.7±8.6, 63.3±22.1, 58.9±14.4 and 94.9±7.6 respectively. From these experiments we observe that BBLL and BoK are the top performing approaches, and BBLL yields more consistent classification rates than BoK with respect to the ROI size.
Table 8:
Classification performance for breast lesion characterization using ensembles of block-based sparse classifiers (ROI size: 64 × 64)
| Method | Block Side | TPR (%) | TNR (%) | ACC (%) | AUC (%) |
|---|---|---|---|---|---|
| BBMAP | 64 | 51.4 | 80.6 | 65.8 | 64.3 |
| | 32 | 59.5 | 91.7 | 75.3 | 76.1 |
| | 16 | 70.3 | 94.4 | 82.2 | 81.3 |
| | 8 | 56.8 | 97.2 | 76.7 | 74.8 |
| | 4 | 54.1 | 50.0 | 52.1 | 50.5 |
| | Mean±Std | 60.2±7.1 | 83.3±22.3 | 71.6±13.3 | 70.7±13.7 |
| BBLL () | 64 | 51.4 | 80.6 | 65.8 | 64.3 |
| | 32 | 27.0 | 77.8 | 52.1 | 53.5 |
| | 16 | 86.5 | 97.2 | 91.8 | 89.9 |
| | 8 | 94.6 | 100 | 97.3 | 98.7 |
| | 4 | 94.6 | 86.1 | 90.4 | 90.8 |
| | Mean±Std | 75.7±32.7 | 90.3±10.3 | 82.9±20.7 | 83.2±20.2 |
Figure 8:
ROC curves for breast lesion characterization using the proposed block-based ensemble method with BBMAP (left), and BBLL (right) decision functions.
Figure 9:
Graphs of ACC values versus ROI size for breast lesion characterization produced by Texture, BoK, AlexNet, GoogleNet, Resnet18, Inceptionv3, SRC, AlexNet-Sp, BBMAP and BBLL (left) and the corresponding average ACC for each method over all ROI sizes (right).
We also measured the standardized execution times of our BBLL method versus the ROI size and the block size. For each method we ran cross-validation experiments and divided the total execution time by the number of experiments and the number of subjects. We then normalized all values by the maximum execution time. Overall, the average standardized execution times of conventional SRC for the MIAS dataset using ROI sizes of 72 × 72, 64 × 64, 60 × 60, and 48 × 48 were all approximately equal to 0.015, indicating that the execution time of conventional SRC does not depend on the ROI size of the lesion. When we applied classifier ensembles with block sizes of 32 × 32, 16 × 16, and 8 × 8 to 64 × 64 ROIs, we measured execution times of 0.038, 0.128, and 0.473 respectively. These results suggest that the computational time for BBLL increases linearly with the number of blocks. We also calculated the execution times for the BoK method for ROI sizes of 72 × 72, 64 × 64, 60 × 60, and 48 × 48; the standardized execution times were all approximately equal to 0.422. BoK applies a keypoint-feature extraction stage, so the execution time depends mostly on the number of keypoints and very little on the ROI size. We observe that the top performing BBLL method for 64 × 64 ROI size and 8 × 8 block size requires about the same execution time as the top performing BoK for 48 × 48 ROI size.
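The standardization procedure for execution times (divide the total time by the number of experiments and the number of subjects, then normalize by the maximum) can be expressed compactly. The function name and the timing values below are hypothetical, for illustration only.

```python
import numpy as np

def standardized_times(total_seconds, n_experiments, n_subjects):
    """Per-subject, per-experiment execution times, normalized by the
    maximum so the slowest configuration maps to 1.0."""
    per_unit = np.asarray(total_seconds, dtype=float) / (n_experiments * n_subjects)
    return per_unit / per_unit.max()

# Hypothetical total times (seconds) for four configurations,
# measured over 5 cross-validation experiments on 73 subjects.
times = standardized_times([10.0, 25.0, 85.0, 315.0],
                           n_experiments=5, n_subjects=73)
```

Under this normalization, relative comparisons such as "BBLL with 8 × 8 blocks costs about as much as BoK" are read directly off the resulting vector.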
5. CONCLUSIONS
We proposed integrative block-based sparse classification techniques for automated characterization of lesions. We introduced two Bayesian decision functions based on maximum a posteriori (MAP) and log likelihood (LL) estimates. We compared our ensemble of sparse classifiers to conventional SRC, texture-based classification, deep learning, and Bag of Keypoints approaches. We applied our method to diagnosis of osteoporosis in digital radiographs and breast lesion characterization in mammograms. We observed that the proposed approach has the potential for very good group separation in diverse applications such as bone characterization and breast lesion characterization. Our ensemble approach produced better performance than SRC, texture-based classifiers, and the tested deep learning methods. The main advantages of this technique over established classification methods are that (i) it divides the sparse coding problem into smaller ones and yields overdetermined systems that are essential for finding effective sparse approximations; (ii) the localization of blocks introduces spatial information that subsequently improves the approximation and characterization of the sample; (iii) the ϵ parameter, which represents a margin of approximation error, regularizes the solutions; and (iv) the use of ensembles further improves class prediction. The performance of Bag of Keypoints was very high, albeit slightly less consistent than BBLL with respect to ROI size in breast lesion characterization and slightly lower for bone characterization. Also, the BoK method may be slower than the BBLL learners, especially when the block sizes in BBLL are relatively large. An advantage of our method over deep learning techniques is that it allows visualization and interpretation of the discriminant patterns and the decision functions.
These capabilities can provide insight into the factors that drive separation between healthy and diseased states and may be used for training radiologists, as one application. Our results also indicate that BBLL produces more accurate classification than BBMAP. Our proposed system may be applicable to identification of subjects with higher risk of disease and computer-aided diagnosis.
Highlights of the manuscript "Integrative Blockwise Sparse Analysis for Tissue Characterization and Classification":
Integrative sparse coding and classification method
Decision function for ensemble classification based on relative sparsity
More sparse and regularized solutions compared to conventional sparse representation
Performance evaluation for osteoporosis diagnosis and breast lesion characterization
Method can be applied to diagnosis of other diseases with minor adjustments
Acknowledgment
This research was supported by the National Institute of General Medical Sciences of the National Institutes of Health (NIH) under Award Number SC3GM113754 and by the Intramural Research Program of NIA, NIH. We also acknowledge the support of the Center for Research and Education in Optical Sciences and Applications (CREOSA) of Delaware State University funded by NSF CREST-8763. The authors wish to thank Dr. Predrag Bakic for feedback on mammogram imaging and analysis.
Footnotes
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References
- [1].Bartl R, Frisch B, Osteoporosis: diagnosis, prevention, therapy, Springer Science & Business Media, 2009. [Google Scholar]
- [2].Smith RA, Cokkinides V, Eyre HJ, American cancer society guidelines for the early detection of cancer, 2003, CA: A Cancer Journal for Clinicians 53 (1) (2003) 27–43. doi: 10.3322/canjclin.53.1.27. [DOI] [PubMed] [Google Scholar]
- [3].Oliver A, Freixenet J, Marti J, Perez E, Pont J, Denton ER, Zwiggelaar R, A review of automatic mass detection and segmentation in mammographic images, Medical image analysis 14 (2) (2010) 87–110. [DOI] [PubMed] [Google Scholar]
- [4].Lee S-K, Lo C-S, Wang C-M, Chung P-C, Chang C-I, Yang C-W, Hsu P-C, A computer-aided design mammography screening system for detection and classification of microcalcifications, International journal of medical informatics 60 1 (2000) 29–57. [DOI] [PubMed] [Google Scholar]
- [5].Ramírez J, Górriz JM, Salas-Gonzalez D, Romero A, López M, Álvarez I, Gómez-Río M, Computer-aided diagnosis of alzheimer’s type dementia combining support vector machines and discriminant set of features, Inf. Sci 237 (2013) 59–72. doi: 10.1016/j.ins.2009.05.012. [DOI] [Google Scholar]
- [6].Makrogiannis S, Verma R, Davatzikos C, Anatomical equivalence class: A morphological analysis framework using a lossless shape descriptor, IEEE Trans. Med. Imaging 26 (4) (2007) 619–631. [DOI] [PubMed] [Google Scholar]
- [7].Ferlay J, Héry C, Autier P, Sankaranarayanan R, Global Burden of Breast Cancer, Springer New York, New York, NY, 2010, pp. 1–19. doi: 10.1007/978-1-4419-0685-41. [DOI] [Google Scholar]
- [8].Santamaria-Pang A, Dutta S, Makrogiannis S, Hara A, Pavlicek W, Silva A, Thomsen B, Robertson S, Okerlund D, Langan DA, Bhotika R, Automated liver lesion characterization using fast kvp switching dual energy computed tomography imaging, Vol. 7624, 2010, pp. 76240V–76240V-10. doi: 10.1117/12.844059. [DOI] [Google Scholar]
- [9].Lala D, Cheung AM, Lynch CL, Inglis D, Gordon C, Tomlinson G, Giangregorio L, Measuring apparent trabecular structure with pqct: a comparison with hr-pqct., J Clin Densitom 17 (1) (2014) 47–53. doi: 10.1016/j.jocd.2013.03.002. [DOI] [PubMed] [Google Scholar]
- [10].Erlandson M, Lorbergs A, Mathur S, Cheung A, Muscle analysis using pqct, dxa and mri, European Journal of Radiology 85 (8) (2016) 1505–1511. doi: 10.1016/j.ejrad.2016.03.001. [DOI] [PubMed] [Google Scholar]
- [11].Oulhaj H, Rziza M, Amine A, Toumi H, Lespessailles E, Hassouni ME, Jennane R, Anisotropic discrete dual-tree wavelet transform for improved classification of trabecular bone, IEEE Trans. Med. Imaging 36 (10) (2017) 2077–2086. [DOI] [PubMed] [Google Scholar]
- [12].Yang J, Wright J, Huang TS, Ma Y, Image super-resolution as sparse representation of raw image patches, Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on (2008) 1–8doi: 10.1109/CVPR.2008.4587647. [DOI] [Google Scholar]
- [13].Figueiredo MAT, Nowak R, Wright S, Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems, IEEE Journal on Selected Topics in Signal Processing 1 (4) (2007) 586–597. [Google Scholar]
- [14].Wright J, Yang AY, Ganesh A, Sastry SS, Ma Y, Robust face recognition via sparse representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (2) (2009) 210–227. doi: 10.1109/TPAMI.2008.79. [DOI] [PubMed] [Google Scholar]
- [15].Davis G, Mallat S, Avellaneda M, Adaptive greedy approximations, Constructive Approximation 13 (1) (1997) 57–98. doi: 10.1007/BF02678430. [DOI] [Google Scholar]
- [16].Zhao S-H, Hu Z-P, Occluded face recognition based on block-label and residual, International Journal on Artificial Intelligence Tools 25 (03) (2016) 1650019 arXiv:http://www.worldscientific.com/doi/pdf/10.1142/S0218213016500196, doi: 10.1142/S0218213016500196. [DOI] [Google Scholar]
- [17].Qiao L, Chen S, Tan X, Sparsity preserving projections with applications to face recognition, Pattern Recognition 43 (1) (2010) 331–341. doi: 10.1016/j.patcog.2009.05.005. [DOI] [Google Scholar]
- [18].Cai D, He X, Han J, Spectral regression: A unified approach for sparse subspace learning, in: Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on, IEEE, 2007, pp. 73–82. [Google Scholar]
- [19].Hassouni ME, Tafraouti A, Toumi H, Lespessailles E, Jennane R, Fractional brownian motion and rao geodesic distance for bone x-ray image characterization, IEEE J. Biomedical and Health Informatics 21 (5) (2017) 1347–1359. [DOI] [PubMed] [Google Scholar]
- [20].Heath M, Bowyer K, Kopans DB, K. P Jr., Moore RH, Chang K, Munishkumaran S, Current status of the digital database for screening mammography, in: Karssemeijer N, Thijssen M, Hendriks JHCL, van Erning L (Eds.), Digital Mammography / IWDM, Vol. 13 of Computational Imaging and Vision, Springer, 1998, pp. 457–460. [Google Scholar]
- [21].Oliver A, Lladó X, Pérez E, Pont J, Denton ERE, Freixenet J, Martí J, A statistical approach for breast density segmentation, Journal of Digital Imaging 23 (5) (2010) 527–537. doi: 10.1007/s10278-009-9217-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [22].Verma B, McLeod P, Klevansky A, Classification of benign and malignant patterns in digital mammograms for the diagnosis of breast cancer, Expert systems with applications 37 (4) (2010) 3344–3351. [Google Scholar]
- [23].Matheus BRN, Schiabel H, Online mammographic images database for development and comparison of cad schemes, Journal of digital imaging 24 (3) (2011) 500–506. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [24].Moreira IC, Amaral I, Domingues I, Cardoso A, Cardoso MJ, Cardoso JS, Inbreast: toward a full-field digital mammographic database, Academic radiology 19 (2) (2012) 236–248. [DOI] [PubMed] [Google Scholar]
- [25].Kulkarni P, Stranieri A, Kulkarni S, Ugon J, Mittal M, Hybrid technique based on ngram and neural networks for classification of mammographic images, in: Second International Conference on Signal, Image Processing and Pattern Recognition, 2014, pp. 297–306. [Google Scholar]
- [26].Pereira DC, Ramos RP, Do Nascimento MZ, Segmentation and detection of breast cancer in mammograms combining wavelet analysis and genetic algorithm, Computer methods and programs in biomedicine 114 (1) (2014) 88–101. [DOI] [PubMed] [Google Scholar]
- [27].Nagarajan R, Upreti M, An ensemble predictive modeling framework for breast cancer classification, Methods 131 (2017) 128–134, systems Approaches for Identifying Disease Genes and Drug Targets. doi: 10.1016/j.ymeth.2017.07.011. [DOI] [PubMed] [Google Scholar]
- [28].Zheng K, Makrogiannis S, Sparse representation using block decomposition for characterization of imaging patterns, in: Wu G, Munsell BC, Zhan Y, Bai W, Sanroma G, Coupé P (Eds.), Patch-Based Techniques in Medical Imaging: Third International Workshop, Patch-MI 2017, Held in Conjunction with MICCAI 2017, Quebec City, QC, Canada, September 14, 2017, Proceedings, Springer International Publishing, Cham, 2017, pp. 158–166. doi: 10.1007/978-3-319-67434-6_18. [DOI] [Google Scholar]
- [29].Zheng K, Jennane R, Makrogiannis S, Ensembles of sparse classifiers for osteoporosis characterization in digital radiographs, in: Medical Imaging 2019: Computer-Aided Diagnosis, Vol. 10950, International Society for Optics and Photonics, 2019, p. 1095024. [Google Scholar]
- [30].Donoho DL, Elad M, Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization, Proceedings of the National Academy of Sciences 100 (5) (2003) 2197–2202. arXiv:http://www.pnas.org/content/100/5/2197.full.pdf, doi: 10.1073/pnas.0437847100. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [31].Donoho DL, For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution, Comm. Pure Appl. Math 59 (6) (2004) 797–829. [Google Scholar]
- [32].Candès EJ, Romberg JK, Tao T, Stable signal recovery from incomplete and inaccurate measurements, Communications on Pure and Applied Mathematics 59 (8) (2006) 1207–1223. doi: 10.1002/cpa.20124. [DOI] [Google Scholar]
- [33].Alizadeh F, Goldfarb D, Second-order cone programming, MATHEMATICAL PROGRAMMING; 95 (2001) 3–51. [Google Scholar]
- [34].Zheng K, Makrogiannis S, Bone texture characterization for osteoporosis diagnosis using digital radiography, in: 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2016, pp. 1034–1037. doi: 10.1109/EMBC.2016.7590879. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [35].Costa AF, Humpire-Mamani G, Traina AJM, An efficient algorithm for fractal analysis of textures, in: 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images, 2012, pp. 39–46. doi: 10.1109/SIBGRAPI.2012.15. [DOI] [Google Scholar]
- [36].Shapiro L, Stockman G, Computer Vision, Prentice-Hall, Upper Saddle River, NJ, 2001. [Google Scholar]
- [37].Duda RO, Hart PE, Stork DG, Duda CRO, Hart PE, Stork DG, Pattern Classification, 2nd Ed, Wiley-Interscience, 2001. [Google Scholar]
- [38].Hastie T, Tibshirani R, Friedman J, The elements of statistical learning: data mining, inference and prediction, 2nd Edition, Springer, 2009.
- [39].Csurka G, Dance CR, Fan L, Willamowski J, Bray C, Visual categorization with bags of keypoints, in: Workshop on Statistical Learning in Computer Vision, ECCV, 2004, pp. 1–22.
- [40].Zhang J, Lazebnik S, Schmid C, Local features and kernels for classification of texture and object categories: a comprehensive study, International Journal of Computer Vision 73 (2) (2007) 213–238.
- [41].Krizhevsky A, Sutskever I, Hinton GE, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
- [42].Li Q, Cai W, Wang X, Zhou Y, Feng DD, Chen M, Medical image classification with convolutional neural network, in: 2014 13th International Conference on Control Automation Robotics Vision (ICARCV), 2014, pp. 844–848. doi: 10.1109/ICARCV.2014.7064414.
- [43].Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A, Going deeper with convolutions, in: Computer Vision and Pattern Recognition (CVPR), 2015. URL http://arxiv.org/abs/1409.4842
- [44].Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z, Rethinking the inception architecture for computer vision, in: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- [45].Tajbakhsh N, Shin JY, Gurudu SR, Hurst RT, Kendall CB, Gotway MB, Liang J, Convolutional neural networks for medical image analysis: Full training or fine tuning?, IEEE Transactions on Medical Imaging 35 (5) (2016) 1299–1312. doi: 10.1109/TMI.2016.2535302.
- [46].He K, Zhang X, Ren S, Sun J, Identity mappings in deep residual networks, in: Leibe B, Matas J, Sebe N, Welling M (Eds.), Computer Vision – ECCV 2016, Springer International Publishing, Cham, 2016, pp. 630–645.
- [47].Huynh BQ, Li H, Giger ML, Digital mammographic tumor classification using transfer learning from deep convolutional neural networks, Journal of Medical Imaging 3 (3) (2016) 034501. doi: 10.1117/1.JMI.3.3.034501.
- [48].Shin H, Roth HR, Gao M, Lu L, Xu Z, Nogues I, Yao J, Mollura D, Summers RM, Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning, IEEE Transactions on Medical Imaging 35 (5) (2016) 1285–1298. doi: 10.1109/TMI.2016.2528162.