Author manuscript; available in PMC: 2015 Jan 5.
Published in final edited form as: IEEE J Biomed Health Inform. 2014 Mar;18(2):585–593. doi: 10.1109/JBHI.2013.2278023

Max-AUC Feature Selection in Computer-Aided Detection of Polyps in CT Colonography

Jian-Wu Xu 1, Kenji Suzuki 1
PMCID: PMC4283828  NIHMSID: NIHMS648379  PMID: 24608058

Abstract

We propose a feature selection method based on a sequential forward floating selection (SFFS) procedure to improve the performance of a classifier in computerized detection of polyps in CT colonography (CTC). The feature selection method is coupled with a nonlinear support vector machine (SVM) classifier. Unlike the conventional linear method based on Wilks' lambda, the proposed method selects the most relevant features that maximize the area under the receiver operating characteristic curve (AUC), thereby directly maximizing the classification performance of the computer-aided detection (CADe) scheme as evaluated by the AUC value. We present two variants of the proposed method with different stopping criteria used in the SFFS procedure. The first variant searches all feature combinations allowed in the SFFS procedure and selects the subset that maximizes the AUC value. The second variant performs a statistical test at each step during the SFFS procedure and terminates when the increase in the AUC value is not statistically significant. The advantage of the second variant is its lower computational cost. To test the performance of the proposed method, we compared it against the popular stepwise feature selection method based on Wilks' lambda for a colonic-polyp database (25 polyps and 2624 nonpolyps). We extracted 75 morphologic, gray-level-based, and texture features from the segmented lesion candidate regions. The two variants of the proposed feature selection method chose 29 and 7 features, respectively. Two SVM classifiers trained with these selected features yielded a 96% by-polyp sensitivity at false-positive (FP) rates of 4.1 and 6.5 per patient, respectively. Experiments showed a significant improvement in the performance of the classifier with the proposed feature selection method over that with the popular stepwise feature selection based on Wilks' lambda, which yielded 18.0 FPs per patient at the same sensitivity level.

Index Terms: Colonic polyps, computer-aided detection (CADe), feature selection, support vector machines (SVMs)

I. Introduction

Colorectal cancer is one of the leading causes of cancer mortality in the United States [1]. Early detection is critical in reducing the risk of death due to colon cancer. However, early detection of polyps in CTC is difficult because various nonlesions mimic the appearance of polyps. Therefore, there has been great interest in the development of computer-aided detection (CADe) schemes for early detection of polyps in CTC to improve the detection sensitivity and specificity [2]–[4].

A CADe scheme generally consists of candidate detection followed by supervised classification [5]. The task of candidate detection is to achieve high detection sensitivity by including as many suspicious lesions as possible. A common approach to classification in a CADe scheme is to extract many texture, gray-level-based, geometric, and other features based on domain knowledge. However, not all of these extracted features might be helpful in discriminating lesions from nonlesions. Therefore, in the design of an effective classifier, it is critical to select the most discriminant features to differentiate lesions from nonlesions.

Feature selection has long been an active research topic in machine learning [6]–[8], because it is one of the main factors that determine the performance of a classifier. In the CADe research field, one of the most popular feature selection methods is stepwise feature selection based on Wilks' lambda coupled with linear discriminant analysis (LDA). The method has been applied in various CADe schemes because of its simplicity and effectiveness [9], [10]. Recently, feature ranking techniques have been applied for selection of relevant and informative features in CADe schemes [11], [12]. Campadelli et al. used the univariate Golub statistic to order individual features extracted from chest radiographs and chose a certain number of features with the highest positive and negative values [11]. Mutual information has been used to identify features that were highly correlated with the pathologic state of tissues in trans-rectal ultrasound images in CADe of prostate cancer [12]. On the other hand, deterministic and stochastic feature selection methods have been employed extensively for searching feature subsets in the machine learning field. Two of the most widely used deterministic feature-searching approaches are sequential forward floating selection (SFFS) and sequential backward floating selection (SBFS) [13]. SBFS has been used for selecting input features for artificial neural networks [14], [15]. SFFS has been used to search for relevant features in combination with various classifiers, such as naïve Bayes, a k-nearest-neighbor classifier, support vector machines (SVMs) [16], and AdaBoost [17], in different CADe schemes. Stochastic searching methodologies include genetic algorithms, particle swarm optimization, and others. A genetic algorithm has been used in lung nodule CADe [18] and in detecting pulmonary embolisms in CT images [19]. Mohamed and Salama applied particle swarm optimization in spectral multifeature analysis CADe of prostate cancer in trans-rectal ultrasound images [20].

Feature searching methods are classifier- and criterion-dependent. Different classifiers would select different sets of features given the same criterion. On the other hand, different selection criteria could result in distinct feature sets even with the same classifier. In the literature, classification accuracy [16], the false-positive (FP) elimination rate [18], the mean sensitivity of the free-response receiver operating characteristic (FROC) curve [21], the pseudo-loss in the AdaBoost algorithm [17], and other general performance measures have been employed as selection rules. However, a low FP rate at a high sensitivity is necessary for a CADe scheme to be useful in clinical practice. The AUC value has been widely used in the evaluation of CADe schemes in the literature [5]. The mean sensitivity criterion only measures the average sensitivity in a predefined specificity range [21], and thus does not quantify how a CADe scheme performs in general as the AUC criterion does. In the machine learning community, the AUC value has also been used as a criterion for optimizing classifiers. Rakotomamonjy proposed a novel form of an SVM that approximately maximizes the AUC value [22]. Marrocco et al. used a nonparametric linear classifier to maximize the AUC value [23]. These two methods did not involve feature selection; all features were used in the optimization of the classifiers. Feature selection based on ranking [24] and perturbation [25] has been employed for maximization of the AUC value in microarray and gene-expression applications. However, feature ranking and perturbation consider only individual feature characteristics and do not take into account the collective discriminative power of feature combinations. In classification, the collective discriminative power of combining multiple features matters most.

In this paper, we propose a feature-selection method that directly maximizes the AUC value for a CADe scheme coupled with a nonlinear SVM classifier. To test the performance of the proposed feature selection method, we compared it against the popular stepwise feature selection based on Wilks' lambda in CADe of polyps in CTC.

II. Materials

The CTC cases used in this study were acquired retrospectively at the University of Chicago Medical Center. The database consisted of 206 CTC datasets obtained from 103 patients. Each patient followed the standard CTC procedure with pre-colonoscopy cleansing and colon insufflation with room air or carbon dioxide. Fecal tagging was not employed in the CTC protocol. Both supine and prone positions were scanned with a multi-detector-row CT scanner (LightSpeed QX/i, GE Medical Systems, Milwaukee, WI) with collimations between 2.5 and 5.0 mm, reconstruction intervals of 1.25–5.0 mm, and tube currents of 60–120 mA at 120 kVp. Each reconstructed CT section had a matrix size of 512 × 512 pixels, with an in-plane pixel size of 0.5–0.7 mm. Optical colonoscopy was also performed for all patients. In this study, we used 5 mm as the lower limit on the clinically important size of polyps. The locations of polyps were confirmed by an expert radiologist based on CTC images as well as pathology and colonoscopy reports. Fourteen patients had 25 colonoscopy-confirmed polyps, 11 of which were 5–9 mm and 14 of which were 10–25 mm in size. The dataset has been used in a previous study [10].

A lesion candidate detection algorithm was applied to the database. The initial detection algorithm was composed of 1) automatic knowledge-guided colon segmentation and 2) detection of polyp candidates based on the shape index and curvedness of the segmented colon [10]. The initial detection step missed one polyp and detected two polyps in only one view (supine or prone), yielding 24 detected lesions in 46 views, while detecting 2624 nonlesions. The major sources of nonlesions included rectal tubes, stool, haustral folds, colonic walls, and the ileocecal valve. Therefore, the initial candidate detection algorithm achieved a 96% (24/25) by-polyp sensitivity with 25.5 (2624/103) FPs (i.e., nonlesions) per patient. Fig. 1 shows a representative polyp and two typical nonlesion detections with their corresponding segmented regions. Because the detection criterion was based on the shape index and curvedness, rectal tubes and haustral folds were typical FPs because of their similar shape appearances. A part of a rectal tube often exhibits a cap-like shape that is very similar in appearance to a part of a small polyp, as shown in Fig. 1(c). Part of the rectal tube was falsely detected as a polyp, with the segmented contour given in Fig. 1(d). A haustral fold produces large curvedness values, as does a polyp. Fig. 1(e) shows a typical haustral fold that was falsely detected as a polyp candidate because of its large curvedness. Fig. 1(f) presents the corresponding segmented region that has large curvedness values.

Fig. 1. Representative polyp and nonlesion detections and their corresponding segmented regions. (a) A true polyp; (c) a nonpolyp (rectal tube); (e) a nonpolyp (haustral fold); [(b), (d), and (f)] the corresponding segmented candidates.

III. Methods

The structure of our proposed feature selection method coupled with a linear/nonlinear classifier is depicted in Fig. 2. The classification step consists of three major components: feature extraction from lesion candidates, SFFS feature selection based on the maximal-AUC criterion, and an SVM classifier operating on the optimal feature subset. Feature selection occurred only in the design stage. Once the optimal feature set was selected, the classification stage consisted only of feature extraction and the SVM classifier. The SVM classifier classifies each suspicious candidate as a lesion or a nonlesion, so that nonlesions from the preceding detection step can be reduced while a high sensitivity is maintained.

Fig. 2. Proposed feature selection in classification and the initial lesion candidate detection in a CADe scheme.

A. Feature Extraction

Feature extraction is one of the most important steps in a classification stage. We extracted 75 two-dimensional (2-D) and three-dimensional (3-D) morphologic, gray-level-based, and texture features from detected lesion candidates in CT images to form an initial feature set. 2-D features were calculated in the axial slice where the segmented candidate region had the largest area. 3-D features were computed in the overall segmented volume.

To compute features such as the contrast between a segmented candidate region and its surroundings, we created a ring structure for the 2-D case and a shell structure for the 3-D case surrounding a detected candidate, denoted as the band region. We performed binary dilation operations on the detected candidates with square structuring elements of 21 × 21 and 11 × 11 pixels (cubes of 21 × 21 × 21 and 11 × 11 × 11 voxels for the 3-D case) [26]. The difference between the two dilated regions formed the final band region. Therefore, the outside region was defined as a ring (a shell in the 3-D case) with a width of 5 pixels, located 5 pixels away from the boundary of the detected candidate.
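As a concrete illustration, the following is a minimal sketch of the 2-D band-region construction using SciPy's binary dilation; the function name and the use of scipy.ndimage are our assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def band_region_2d(candidate_mask):
    """Sketch: build the 2-D band (ring) region described above.

    Dilating the candidate mask with a 21 x 21 square extends its boundary
    by 10 pixels, and with an 11 x 11 square by 5 pixels; the set
    difference is a ring 5 pixels wide, starting 5 pixels away from the
    candidate boundary.
    """
    outer = binary_dilation(candidate_mask, structure=np.ones((21, 21), bool))
    inner = binary_dilation(candidate_mask, structure=np.ones((11, 11), bool))
    return outer & ~inner
```

The 3-D shell would be obtained analogously with 21 × 21 × 21 and 11 × 11 × 11 cubic structuring elements.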

Gray-level features characterized lesion intensity. Shape features such as radial and tangential gradient indices inside the lesions and in the band regions were computed. For these features to be meaningful and discriminant, the delineation of the candidates must correspond closely to the real object boundaries. This, in turn, depends on the accuracy of the hysteresis thresholding and clustering method employed in the detection of polyps. Histogram-based features were extracted to characterize the range, distribution, and overlap of the voxel values in gray-level and edge-enhanced images inside and outside the delineated candidates.

B. Support Vector Machines

We used SVMs [27] as the classifier in our CADe scheme. SVMs are a machine-learning technique that maximizes the margin of separation between the positive and negative classes. Given a set of $N$ training data points $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where $\mathbf{x}_i \in \Re^L$ is the feature vector and $y_i \in \{-1, 1\}$ is the class label, the decision function of the SVM classifier can be written as

$$f(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + \alpha_0. \tag{1}$$

The parameters $\alpha_i \ge 0$ are Lagrange multipliers that are optimized through quadratic programming, and $K(\mathbf{x}_i, \mathbf{x}_j)$ is a symmetric nonnegative inner-product kernel. Popular kernel functions in applications of SVMs include the $d$th-degree polynomial kernel

$$K(\mathbf{x}_i, \mathbf{x}_j) = \left(1 + \mathbf{x}_i^T \mathbf{x}_j\right)^d \tag{2}$$

and the Gaussian kernel

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2\sigma^2}\right). \tag{3}$$

The optimal Lagrange multipliers $\alpha_i \ge 0$ in the decision boundary (1) are computed through maximization of the objective function

$$\max_{\alpha_i} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j) \tag{4}$$

subject to the constraints

$$\sum_{i=1}^{N} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C, \quad i = 1, 2, \ldots, N$$

where $C$ is a user-specified positive parameter. Because the SVM can be reformulated as a regularized function-estimation problem with a hinge-loss criterion [27], it can be shown that the SVM classifier has the large-margin property and is robust against outliers.
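For illustration, here is a minimal sketch of training such a classifier with scikit-learn's SVC, which solves the same dual problem (4); the library choice, data placeholders, and C value are our assumptions, and sklearn's gamma parameter maps onto the Gaussian kernel (3) via gamma = 1/(2σ²).

```python
import numpy as np
from sklearn.svm import SVC

sigma, C = 0.2, 1.0                      # sigma as in Table II; C illustrative
rng = np.random.default_rng(0)
X = rng.random((200, 75))                # placeholder: 75 normalized features
y = rng.choice([-1, 1], size=200)        # placeholder class labels

# gamma = 1/(2*sigma^2) reproduces the Gaussian kernel in (3).
svm = SVC(kernel="rbf", gamma=1.0 / (2 * sigma**2), C=C)
svm.fit(X, y)
scores = svm.decision_function(X)        # f(x) in (1), used for ROC analysis
```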

C. Maximal AUC SFFS Feature Selection

The proposed maximal AUC SFFS feature selection method adopted the wrapper approach, in which the search procedure is coupled with an SVM classifier that yields the AUC value for evaluation at each step [28]. We used the AUC value from the ROC curve as the selection criterion, because it directly measures how a CADe scheme performs in general. It has been shown that the AUC value corresponds to the probability of correctly distinguishing an abnormal case from a normal one [29]. From a statistical perspective, the AUC value is also equivalent to the well-known nonparametric Wilcoxon statistic [29]. These connections provide alternative views of the AUC value and make it a suitable measure of the performance of a CADe scheme. The SFFS procedure selects features based on the collective discriminative power of a combination of features. This differs from the feature ranking or perturbation approach, where the selection is based on the individual discriminative power of features [24], [25].
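The equivalence to the Wilcoxon (Mann-Whitney) statistic can be checked numerically; the following sketch, with made-up classifier outputs, shows that the rank-based U statistic normalized by the number of abnormal-normal pairs equals the AUC.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, 50)   # made-up scores for abnormal cases
neg = rng.normal(0.0, 1.0, 80)   # made-up scores for normal cases

# Mann-Whitney U from ranks: U = R_pos - n_pos * (n_pos + 1) / 2.
ranks = rankdata(np.concatenate([pos, neg]))
u = ranks[: len(pos)].sum() - len(pos) * (len(pos) + 1) / 2
auc_from_u = u / (len(pos) * len(neg))

auc = roc_auc_score([1] * len(pos) + [0] * len(neg), np.concatenate([pos, neg]))
assert np.isclose(auc_from_u, auc)
```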

In this study, we proposed two variants of the maximal AUC SFFS feature selection method. The first variant, denoted MaxAUCSVM, does not stop until all combinations of features allowed in the SFFS procedure have been examined. The second variant, denoted MaxAUCSVMStat, applies a statistical test to the AUC values obtained from adding or deleting features to determine the stopping criterion. We provide detailed descriptions in the following.

Table I outlines the main procedure of the first variant of the proposed feature selection method. MaxAUCSVM starts with an empty selected feature set F0. It then includes, one at a time, the feature that maximizes the AUC value of the selected feature subset, calculated via an SVM classifier, at each subset size. This is given in (5), where the criterion J(Fk + {x}) is the AUC value of the SVM classification with the selected feature set Fk + {x}. Therefore, (5) guarantees that the selected feature produces the maximal AUC value in combination with the features already in the subset. However, this step only includes features, without removing any existing ones. It might be possible to increase the AUC value by removing some features from the selected subset. This is realized in (6) and onward: starting from the selected feature subset, one feature at a time is removed if the remaining subset performs better than the one containing the feature to be removed. The procedure continues until the number of features in the selected subset reaches the total number of available features. The feature subset with the maximal AUC value is selected as the final output of the procedure.

Table I. Maximal AUC SFFS Feature Selection Method I.

Maximal AUC SFFS Feature Selection I
Initialization:
 Full feature set $X$ extracted from CT images; selected feature set $F_0 = \{\emptyset\}$; predefined feature number $l = 75$; $k = 0$.
while $k \le l$
 $x^+ = \arg\max_{x \in X - F_k} J(F_k + \{x\})$  (5)
 $F_{k+1} = F_k + \{x^+\}$
 $k = k + 1$
 if $k > 2$
  $x^- = \arg\max_{x \in F_k} J(F_k - \{x\})$  (6)
  while $J(F_k - \{x^-\}) > J(F_{k-1})$ and $k > 2$
   $F_{k-1} = F_k - \{x^-\}$
   $k = k - 1$
   if $k > 2$
    $x^- = \arg\max_{x \in F_k} J(F_k - \{x\})$
   end
  end
 end
end
Output:
 Selected feature subset with the maximal AUC value.
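To make the procedure in Table I concrete, here is a minimal Python sketch of MaxAUCSVM; the function names are hypothetical, and J(subset) is assumed to return the cross-validated AUC of the SVM classifier trained on that subset.

```python
def max_auc_sffs(all_features, J, max_size=75):
    """Sketch of the MaxAUCSVM procedure in Table I (illustration only)."""
    selected = []            # current subset F_k
    best_at_size = {}        # best (AUC, subset) recorded at each size
    while len(selected) < max_size:
        # Forward step (5): add the feature that maximizes the AUC.
        x_plus = max((f for f in all_features if f not in selected),
                     key=lambda f: J(selected + [f]))
        selected = selected + [x_plus]
        best_at_size[len(selected)] = (J(selected), list(selected))
        # Backward (floating) steps (6): remove a feature while doing so
        # beats the best subset previously recorded at the smaller size.
        while len(selected) > 2:
            x_minus = max(selected,
                          key=lambda f: J([g for g in selected if g != f]))
            reduced = [g for g in selected if g != x_minus]
            if J(reduced) > best_at_size[len(reduced)][0]:
                selected = reduced
                best_at_size[len(selected)] = (J(selected), list(selected))
            else:
                break
    # Output: the subset with the maximal AUC over all visited sizes.
    return max(best_at_size.values(), key=lambda t: t[0])[1]
```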

One characteristic of the first variant is that the inclusion or exclusion of a particular feature is judged by the difference between two AUC values, regardless of whether the difference is statistically significant. Therefore, the procedure does not stop until it has searched all combinations that are allowed in the SFFS framework. However, this approach inevitably increases the computational time. Moreover, it tends to include more features even when the increase in the AUC value is not statistically significant; the resulting feature set can make the classifier less reliable, given the relatively small datasets usually used in the development of CADe schemes. To mitigate these two issues, we proposed a second variant of our feature selection method, denoted MaxAUCSVMStat. The main difference is the criterion used to include a particular feature in the selected set: the method includes a feature only if the resulting increase in the AUC value is statistically significant. We do not impose this condition for feature deletion; that is, even if the decrease or increase in the AUC value is not statistically significant, we delete the feature from the selected subset, because doing so makes the feature subset more compact.

To perform statistical testing on the difference of two AUC values, we used a binormal model to estimate the AUC value from the outputs of the SVM classifier [30]. Under the null hypothesis that the two sets of SVM outputs, obtained with two different selected feature subsets, arose from ROC curves with equal areas beneath them, we calculated the z-score statistic [29], defined as

$$z = \frac{A_{z_1} - A_{z_2}}{\sqrt{\mathrm{var}(A_{z_1}) + \mathrm{var}(A_{z_2}) - 2\,\mathrm{cov}(A_{z_1}, A_{z_2})}} \tag{7}$$

where $A_{z_1}$ and $\mathrm{var}(A_{z_1})$ are the estimated AUC value and its variance for feature subset one, $A_{z_2}$ and $\mathrm{var}(A_{z_2})$ are the corresponding quantities for feature subset two, and $\mathrm{cov}(A_{z_1}, A_{z_2})$ is the estimated covariance between the two cases. These quantities were estimated via the maximum-likelihood estimation method [30]. The z-score statistic is then referred to tables of the standard normal distribution. A value of z above a threshold, e.g., z > 1.96, is considered evidence that the null hypothesis should be rejected, and hence that the difference between the two AUC values is statistically significant (two-tailed p-value < 0.05). We estimated the AUC value under the binormal model as [30]

$$A_z = \Phi\!\left(\frac{a}{\sqrt{1 + b^2}}\right) \tag{8}$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution, and $a$ and $b$ are the intercept and slope parameters, respectively, that specify the ROC curve in normal-deviate coordinates. The maximum-likelihood method was employed to estimate these two parameters.
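As an illustration, here is a short sketch of the test in (7) and the binormal AUC in (8); the function names and numeric inputs are hypothetical, with the AUC estimates, variances, and covariance assumed to come from maximum-likelihood binormal ROC fitting [30].

```python
import numpy as np
from scipy.stats import norm

def binormal_auc(a, b):
    # Eq. (8): Az = Phi(a / sqrt(1 + b^2)) for intercept a and slope b.
    return norm.cdf(a / np.sqrt(1.0 + b**2))

def auc_z_test(az1, var1, az2, var2, cov12):
    # Eq. (7): z-score for the difference of two correlated AUC estimates.
    z = (az1 - az2) / np.sqrt(var1 + var2 - 2.0 * cov12)
    p_two_sided = 2.0 * (1.0 - norm.cdf(abs(z)))
    return z, p_two_sided

# Hypothetical numbers: z = 0.02 / 0.01 = 2.0, p ~ 0.046 < 0.05, so the
# increase would be judged statistically significant.
z, p = auc_z_test(0.95, 1e-4, 0.93, 1e-4, 0.5e-4)
```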

The two variants of the proposed maximal AUC SFFS feature selection method have their own merits. The first variant, MaxAUCSVM, explores all feature combinations allowed in the SFFS procedure and selects the one that achieves the maximal AUC value, but at the expense of excessive computational time. In contrast, the second variant, MaxAUCSVMStat, aims at reducing the computational time, possibly at the cost of a feature subset with a smaller AUC value, because an increase in the AUC value that is initially insignificant can become statistically significant as more features are included later in the SFFS procedure. It is therefore a tradeoff between performance and computational time.

D. Study Design and Performance Evaluation Criteria

The proposed feature selection method involves parameter optimization for an SVM classifier in the training phase. Given the small sample size of our database, it is critical to apply appropriate strategies for training and testing the proposed method in order to avoid over-fitting. Cross-validation is a popular method for reducing bias and avoiding over-fitting in machine learning when the sample size is small. Based on the number of available cases in the database, we used a fivefold cross-validation method to estimate the AUC value for the candidate feature subsets chosen by the SFFS procedure at each step, and also to optimize the parameters of the SVM classifier. All lesions and nonlesions obtained from one case appeared in either the training data or the test data; no case (patient) belonged to both the training and the test samples. The purpose was to eliminate the bias that results from testing a classifier on data samples from a patient used in training.
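A sketch of this patient-grouped partitioning using scikit-learn's GroupKFold (our choice of tooling; the arrays are placeholders, with `groups` holding one patient ID per candidate):

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
X = rng.random((2670, 75))                # placeholder candidate features
y = rng.choice([0, 1], size=2670)         # placeholder lesion/nonlesion labels
groups = rng.integers(0, 103, size=2670)  # placeholder patient IDs

# Each patient's candidates land entirely in the training or the test fold.
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups):
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])
```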

After we optimized the kernel parameter for the SVM classifier, we applied the leave-one-lesion-out cross-validation method to perform feature selection and reported the final results of the trained SVM classifier with the selected feature set. We compared the proposed method to the popular stepwise feature selection method based on Wilks' lambda coupled with an LDA classifier. The proposed feature selection framework is very generic. In fact, other classifiers, such as an LDA classifier or an artificial neural network, can be used instead of an SVM classifier. Therefore, we replaced the SVM classifier with an LDA classifier in Fig. 2 and denoted the method as MaxAUCLDA, based on the first variant of the SFFS procedure [31].

IV. Results

A. Optimization of the SVM Classifier

We used a fivefold cross-validation method to choose an optimal kernel function with suitable parameters for the SVM classifier. In this study, we focused on the polynomial (2) and Gaussian (3) kernel functions. Feature vectors were normalized to the range between 0 and 1. Table II presents the AUC values indicating the performance of the SVM with different kernel functions and parameters for the colon database. For each parameter, we applied the feature selection method MaxAUCSVM to choose an optimal set of feature vectors. The AUC values were obtained for the SVM classifier with the optimal feature subset in the fivefold cross-validation procedure. We experimented with four values of the degree d in the polynomial kernel and seven kernel widths σ in the Gaussian kernel. The SVM classifier with a Gaussian kernel performed better than that with a polynomial kernel, which suggests that the decision boundaries between lesions and nonlesions are highly nonlinear. The AUC value reached its maximum with a Gaussian kernel with σ = 0.2. We used the Gaussian kernel with the optimal parameter for the SVM classifier in the following experiments.

Table II. AUC Values for Different Kernel Functions With Different Parameters in the SVM Classifier for the Colon Database.

Kernel      Parameter   AUC value
Polynomial  d = 2       0.89
            d = 4       0.90
            d = 6       0.91
            d = 8       0.87
Gaussian    σ = 0.05    0.92
            σ = 0.1     0.93
            σ = 0.2     0.94
            σ = 0.7     0.92
            σ = 1       0.90
            σ = 5       0.91
            σ = 10      0.85

B. Comparison of Different Feature Selection Methods

After the parameter optimization of the SVM classifiers, we applied the proposed feature selection methods in a leave-one-lesion-out cross-validation procedure. To have a fair comparison, we used the same cross-validation procedure for the feature selection based on Wilks' lambda and MaxAUCLDA. The feature selection procedure shown in Fig. 2 produced one set of features. Then, we applied cross-validation to report the classification performance.

Fig. 3 plots the AUC values versus the selected feature subset size for the first variant of our proposed feature selection method, MaxAUCSVM. As the number of selected features increases, the AUC value first increases, reaching its maximum at a feature subset size of 29. The AUC value then starts to decrease, which suggests that the added features cause the classifier performance to deteriorate. If we used all the extracted features without performing feature selection, the AUC value would be 0.51, which is only slightly better than random guessing. This clearly illustrates the importance and necessity of feature selection in classifier design for improving the overall performance of a CADe scheme. The first variant, MaxAUCSVM, explored all feature combinations in the SFFS procedure. The second variant, MaxAUCSVMStat, stopped the process at a feature subset size of 7, because the increase in the AUC value at a feature subset size of 8 was not statistically significant. The feature subset size and the corresponding AUC value achieved by the second variant were smaller than those of the first variant. However, the advantage of the second variant over the first is its much lower computational cost.

Fig. 3. AUC values versus different feature subsets selected by the proposed feature selection method, MaxAUCSVM.

To compare the feature subsets selected by the different feature selection methods, we present the individual selected features, subset sizes, AUC values, and nonlesion reduction rates without removal of any lesion in Table III. An "X" mark denotes an individual feature selected by the method. The first variant of the proposed feature selection method, MaxAUCSVM, chose 29 features in total, of which 25 were 3-D features and 4 were 2-D features. They include gray-level-based (e.g., feature numbers 3–7), shape-based (e.g., feature numbers 13 and 15), geometry-based (e.g., feature numbers 17 and 18), histogram-based (e.g., feature numbers 22 and 40), and other features. Of the 7 features selected by the second variant, MaxAUCSVMStat, all were 3-D features, which suggests that 3-D features contain the most relevant and discriminatory information for distinguishing polyps from nonpolyps in CTC. These seven 3-D features include gray-level-based features on the contour of the candidate, sphere irregularity, and features derived from the edge-enhanced CT images. Note that 14 features are common to the subsets selected by MaxAUCSVM and MaxAUCLDA, accounting for around half of the selected features. However, their performance in terms of AUC values and nonlesion reduction rate without removal of any lesion is very different, which shows the substantial difference between a nonlinear SVM classifier and an LDA classifier. Comparing the feature subsets selected by MaxAUCLDA and the method based on Wilks' lambda, we observe 11 common features in total. MaxAUCLDA selected more than twice as many features as the method based on Wilks' lambda. This observation suggests that different search procedures with different cost functions can have very different outcomes, even when the same classifier is used.

Table III. Comparison of Selected Feature Subsets by Different Feature Selection Methods for the Colon Database.

Feature # Features Feature subsets selected by different methods

MaxAUC SVM MaxAUC SVMStat MaxAUC LDA Wilks' lambda

1 Maximum gray levels inside the lesion X
3 Mean gray levels inside the lesion X
4 Median gray levels inside the lesion X
5 Standard deviation of gray levels inside the lesion X X X
6 Maximum gray levels on the contour of the lesion X X
7 Minimum gray levels on the contour of the lesion X X
8 Mean gray levels on the contour of the lesion X X
9 Median gray levels on the contour of the lesion X X
10 Standard deviation of gray levels on the contour of the lesion X X
11 Summation over perimeter values of each 2D slice X X
12 Sphericity X X
13 Segmented lesion volume X X
14 Surface area of the candidate X X
15 Ratio of the overlapping volume between the candidate and a sphere (of the same volume) to the overall volume X X
16 Radial gradient index (RGI) inside the lesion X X
17 Radial gradient index outside the lesion X X
18 Tangential gradient index inside the lesion X X
20 Thresholds of top 10% histogram inside the lesion X X
22 Thresholds of bottom 10% histogram inside the lesion X X X
25 Minimum range of the histogram inside the lesion X
26 Maximum range of the histogram outside the lesion X
27 Minimum range of the histogram outside the lesion X X
28/30 Maximum/minimum range of the histogram of pixel values in Sobel images inside the candidate X
31 Minimum range of the histogram of pixel values in Sobel images outside the candidate X
32 Full width at half of the histogram in gray scale image X X X
39 Full width at 10% maximum of the histogram in Sobel image X
40 Histogram overlap in the gray scale images X
41 Histogram overlap in the Sobel images X
42 Voxel intensity difference X
44 Absolute distance between normalized histograms X
45 Shannon entropy of normalized histogram X
47 Matsutsita distance of normalized histograms X X
49 Voxel intensity difference in Sobel image X X
50 Voxel separation in Sobel image X X
52 Shannon entropy of normalized histogram in Sobel image X X X
55 Mean voxel intensity in the Sobel image X X X
57 Relative standard deviation in the Sobel image X
59 Average Sobel power value inside the 2D contour X X X
60 Mean gray levels inside the 2D lesion X
61 Mean gray levels outside (band region) the 2D lesion X X
62 Standard deviation of gray levels inside the 2D lesion X
64/66 Area of the 2D contour and Circularity X
67 Ratio of overlapping area X
69/74 RGI and entropy texture feature inside the 2D lesion X X

Feature subset size 29 7 30 14

AUC value 0.96 0.95 0.92 0.89

Non-polyp reduction rate without removal of any polyp 83.9% 74.5% 60.7% 29.3%

C. Performance Comparisons Among Different Feature Selection Methods

Both variants of the proposed feature selection method yielded a much higher performance than did the methods based on Wilks' lambda and MaxAUCLDA. The two variants of the proposed method achieved AUC values of 0.96 and 0.95, respectively, whereas the popular feature selection method based on Wilks' lambda yielded an AUC value of 0.89, and MaxAUCLDA produced an AUC value of 0.92. We performed statistical tests among the different feature selection methods, as shown in Table IV. The results show that the differences in AUC values between the proposed feature selection methods and the other two (i.e., MaxAUCLDA and Wilks' lambda) were statistically significant (two-sided p-values < 0.05). However, the difference in AUC values between the two variants of the proposed method was not statistically significant (two-sided p-value = 0.06). The FROC analysis provides more insight into the performance of the different feature selection methods. Fig. 4 indicates that the first variant, MaxAUCSVM, was able to remove 83.9% (2202/2624) of nonpolyps without removing any of the 24 polyps in a leave-one-lesion-out cross-validation test, i.e., a 96% (24/25) by-polyp sensitivity was achieved at an FP rate of 4.1 (422/103) per patient, whereas the second variant, MaxAUCSVMStat, eliminated 74.5% of nonpolyps without removal of any polyps, yielding 6.5 (669/103) FPs per patient at the same sensitivity. Although the difference in AUC values between the two variants was not statistically significant, the first variant achieved a higher performance in terms of the FP rate per patient at the same sensitivity. The feature selection method based on Wilks' lambda yielded 18.0 (1854/103) FPs per patient, eliminating 29.3% of nonpolyps. MaxAUCLDA yielded a performance in between, i.e., 10.0 (1030/103) FPs per patient, reducing 60.7% of nonpolyps. It is evident from these results that our proposed feature selection performed much better than the popular method based on Wilks' lambda and MaxAUCLDA.

Table IV. Statistical Tests Among the Performance (AUC Values) of Four Different Feature Selection Methods in the Distinction Between Polyps and Nonpolyps.

                              MaxAUCSVMStat     MaxAUCLDA         Wilks' lambda
                              (AUC=0.95±0.01)   (AUC=0.92±0.01)   (AUC=0.89±0.02)
MaxAUCSVM (AUC=0.96±0.02)     0.06              0.03              0.02
MaxAUCSVMStat                 —                 0.02              0.04
MaxAUCLDA                     —                 —                 0.01

The AUC values with standard errors and two-sided p values are shown.

Fig. 4. FROC curves for the CADe schemes incorporating four different feature selection methods for the colon database. The performance of the initial candidate detection is shown on the far right: a 96.0% sensitivity at 25.5 FPs per patient.

We compared the computational costs of the proposed feature selection methods on a workstation (Intel Xeon, 2.7 GHz, 1 GB RAM). MaxAUCSVM took 23 h, whereas MaxAUCSVMStat took only 5 h. These results show that MaxAUCSVMStat saves computational cost during the training stage by performing a statistical test for early stopping. The difference in training time between the two variants is substantial; however, the longer training time of MaxAUCSVM is compensated for by its better performance.

V. Discussion

The novelty of our approach is the use of a nonlinear classifier to select, train, and test relevant features directly and consistently. Previous studies such as the one in [21] applied an LDA classifier for selection of features during training, but used a nonlinear neural network classifier for testing (i.e., actual classification). This is not a principled approach to feature selection, because the features chosen by a linear classifier are not necessarily optimal for a nonlinear classifier. This may be why their method failed to outperform Wilks' lambda-based feature selection in their test [21]. Our proposed technique is based on a consistent, principled approach to feature selection and classification in which both problems are handled at the same time. Other studies also used two different types of classifiers for feature selection and classifier testing (i.e., actual classification). For example, Bhooshan et al. [32] used the stepwise feature selection based on Wilks' lambda coupled with LDA for feature selection and Bayesian neural networks for classification. Lee et al. [33] applied a Gaussian kernel SVM for ranking individual features, but used a least-squares SVM for actual classification. Their approaches were not optimal in terms of classification performance and the consistency between the algorithms used for feature selection and those used for classification. Our technique provides a consistent, principled manner of feature selection and classification such that the selected features are indeed optimal for the final classifier used in the CADe scheme. Moreover, the AUC criterion we used in the selection of features reflects how a CADe scheme performs in general. It is more suitable than the mean sensitivity of the FROC curve used in [21], which only measures the performance of a CADe scheme in a certain specificity range. Li proposed FloatBoost to minimize the classification error directly based on a backtracking mechanism [34]. FloatBoost learning differs from our proposed feature selection method, in which an SVM is used to select an optimal feature subset by maximization of the AUC value. Another novelty of our approach compared with other studies, such as the ones in [21]–[25] and [33], is that the second variant of our method conducts statistical tests during the search procedure, which makes the feature selection step more reliable and efficient.

We used fivefold and leave-one-lesion-out cross-validation methods to optimize the parameters of the SVM and to report performance. This procedure provided a robust and principled way to select an optimal feature subset while minimizing the risk of over-fitting, given the relatively small number of true-positive samples in the dataset.

VI. Conclusion

We have developed a maximal AUC SFFS feature selection method coupled with a nonlinear SVM classifier for CADe of polyps in CTC. The proposed method selects the most relevant features, namely, those that maximize the AUC value of the ROC curve. We presented two variants of the proposed method. In a leave-one-lesion-out cross-validation test in a CADe scheme for detection of polyps in CTC, our feature selection method achieved a 96% by-polyp sensitivity at 4.1 and 6.5 FPs per patient for the two variants, whereas the conventional stepwise feature selection based on Wilks' lambda yielded the same sensitivity at 18.0 FPs per patient, and a maximal AUC SFFS method coupled with an LDA classifier achieved 10.0 FPs per patient at the same sensitivity level. One advantage of the second variant over the first is its much lower computational cost, by a factor of 4.6.

Acknowledgments

This work was supported in part by the National Cancer Institute/National Institutes of Health under Grant R01CA120549 and in part by the NIH under Grant S10 RR021039 and Grant P30 CA14599.

Biographies


Jian-Wu Xu (M'07) received the B.S. degree in electrical engineering from Zhejiang University, Hangzhou, China, in 2002, and the Ph.D. degree in electrical and computer engineering from the University of Florida, Gainesville, USA, in 2007.

He was an Intern at Siemens Medical Solutions, Malvern, PA, USA, and RIKEN Brain Science Institute, Japan. He is currently a Postdoctoral Scholar with the Department of Radiology, University of Chicago, Chicago, IL, USA. His current research interests include computer-aided detection, medical image analysis, and adaptive signal processing.

Dr. Xu is a member of Tau Beta Pi and Eta Kappa Nu.


Kenji Suzuki (SM'04) received his Ph.D. degree in information engineering from Nagoya University in 2001.

From 1993 to 2001, he was with Hitachi Medical Corporation and then with Aichi Prefectural University as a faculty member. In 2001, he joined the Department of Radiology, University of Chicago, Chicago, IL, USA, where, since 2006, he has been an Assistant Professor in Radiology, Medical Physics, and the Cancer Research Center. His current research interests include computer-aided diagnosis and machine learning. He has authored or coauthored 230 papers (including 95 peer-reviewed journal papers). He has been the Editor-in-Chief and an Associate Editor of 27 leading international journals, including Medical Physics, the International Journal of Biomedical Imaging, and Academic Radiology.

Dr. Suzuki has received the Paul Hodges Award, three RSNA Certificate of Merit Awards and Research Trainee Prize, the Cancer Research Foundation Young Investigator Award, the SPIE Honorable Mention Poster Award, the IEEE Outstanding Member Award, and the Kurt Rossmann Excellence in Teaching Award.

Contributor Information

Jian-Wu Xu, Email: jwxu@uchicago.edu.

Kenji Suzuki, Email: suzuki@uchicago.edu.

References

1. Jemal A, Siegel R, Ward E, Hao Y, Xu J, Thun MJ. Cancer statistics, 2010. CA Cancer J Clin. 2010;60:225–249. doi: 10.3322/caac.20006.
2. van Ravesteijn VF, van Wijk C, Vos FM, Truyen R, Peters JF, Stoker J, van Vliet LJ. Computer-aided detection of polyps in CT colonography using logistic regression. IEEE Trans Med Imag. 2010 Jan;29(1):120–131. doi: 10.1109/TMI.2009.2028576.
3. Zhu H, Liang Z, Pickhardt PJ, Barish MA, You J, Fan Y, Lu H, Posniak EJ, Richards RJ, Cohen HL. Increasing computer-aided detection specificity by projection features for CT colonography. Med Phys. 2010 Apr;37:1468–1481. doi: 10.1118/1.3302833.
4. Yao J, Summers RM, Hara AK. Optimizing the support vector machines (SVM) committee configuration in a colonic polyp CAD system. Presented at SPIE Med. Imag.; San Diego, CA, USA; 2005.
5. Giger ML, Chan HP, Boone J. Anniversary paper: History and status of CAD and quantitative image analysis: The role of medical physics and AAPM. Med Phys. 2008;35:5799–5820. doi: 10.1118/1.3013555.
6. Guyon I, Gunn S, Nikravesh M, Zadeh LA. Feature Extraction: Foundations and Applications. New York, NY, USA: Springer-Verlag; 2006.
7. Tan M, Wang L, Tsang IW. Learning sparse SVM for feature selection on very high dimensional datasets. Presented at the 27th Int. Conf. Mach. Learning; Haifa, Israel; 2010.
8. Xu Z, Jin R, Ye J, Lyu MR, King I. Non-monotonic feature selection. Presented at the Int. Conf. Mach. Learning; Montreal, Canada; 2009.
9. Sahiner B, Petrick N, Chan HP, Hadjiiski LM, Paramagul C, Helvie MA, Gurcan MN. Computer-aided characterization of mammographic masses: Accuracy of mass segmentation and its effects on characterization. IEEE Trans Med Imag. 2001 Dec;20(12):1275–1284. doi: 10.1109/42.974922.
10. Yoshida H, Nappi J. Three-dimensional computer-aided diagnosis scheme for detection of colonic polyps. IEEE Trans Med Imag. 2001 Dec;20(12):1261–1274. doi: 10.1109/42.974921.
11. Campadelli P, Casiraghi E, Artioli D. A fully automated method for lung nodule detection from postero-anterior chest radiographs. IEEE Trans Med Imag. 2006 Dec;25(12):1588–1603. doi: 10.1109/tmi.2006.884198.
12. Maggio S, Palladini A, De Marchi L, Alessandrini M, Speciale N, Masetti G. Predictive deconvolution and hybrid feature selection for computer-aided detection of prostate cancer. IEEE Trans Med Imag. 2010 Feb;29(2):455–464. doi: 10.1109/TMI.2009.2034517.
13. Pudil P, Novovicova J, Kittler J. Floating search methods in feature selection. Pattern Recog Lett. 1994;15:1119–1125.
14. Suzuki K. Determining the receptive field of a neural filter. J Neural Eng. 2004 Dec;1:228–237. doi: 10.1088/1741-2560/1/4/006.
15. Suzuki K, Horiba I, Sugie N. A simple neural network pruning algorithm with application to filter synthesis. Neural Process Lett. 2001 Feb;13:43–53.
16. Huang PW, Lee CH. Automatic classification for pathological prostate images based on fractal analysis. IEEE Trans Med Imag. 2009 Jul;28(7):1037–1050. doi: 10.1109/TMI.2009.2012704.
17. Takemura A, Shimizu A, Hamamoto K. Discrimination of breast tumors in ultrasonic images using an ensemble classifier based on the AdaBoost algorithm with feature selection. IEEE Trans Med Imag. 2010 Mar;29(3):598–609. doi: 10.1109/TMI.2009.2022630.
18. Boroczky L, Zhao L, Lee KP. Feature subset selection for improving the performance of false positive reduction in lung nodule CAD. IEEE Trans Inf Technol Biomed. 2006 Jul;10(3):504–511. doi: 10.1109/titb.2006.872063.
19. Park SC, Chapman BE, Zheng B. A multi-stage approach to improve performance of computer-aided detection of pulmonary embolisms depicted on CT images: Preliminary investigation. IEEE Trans Biomed Eng. 2011 Jun;58(6):1519–1527. doi: 10.1109/TBME.2010.2063702.
20. Mohamed SS, Salama MMA. Prostate cancer spectral multifeature analysis using TRUS images. IEEE Trans Med Imag. 2008 Apr;27(4):548–556. doi: 10.1109/TMI.2007.911547.
21. Hupse R, Karssemeijer N. The effect of feature selection methods on computer-aided detection of masses in mammograms. Phys Med Biol. 2010;55:2893–2904. doi: 10.1088/0031-9155/55/10/007.
22. Rakotomamonjy A. Optimizing area under ROC curve with SVMs. Presented at the Proc. ROC Anal. Artif. Intell.; Valencia, Spain; 2004.
23. Marrocco C, Duin RPW, Tortorella F. Maximizing the area under the ROC curve by pairwise feature combination. Pattern Recog. 2008;41:1961–1974.
24. Rui W, Ke T. Feature selection for maximizing the area under the ROC curve. Presented at the IEEE Int. Conf. Data Mining Workshops; Miami, FL, USA; 2009.
25. Canul-Reich J, Hall LO, Goldgof D, Eschrich SA. Feature selection for microarray data by AUC analysis. Presented at the IEEE Int. Conf. Syst., Man, Cybern.; Singapore; 2008.
26. Vincent L. Morphological transformations of binary images with arbitrary structuring elements. Signal Process. 1991;22:3–23.
27. Vapnik VN. The Nature of Statistical Learning Theory. 2nd ed. New York, NY, USA: Springer-Verlag; 1998.
28. Kohavi R, John G. Wrappers for feature subset selection. Artif Intell. 1997;97:273–324.
29. Hanley J, McNeil B. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–36. doi: 10.1148/radiology.143.1.7063747.
30. Metz CE, Herman BA, Shen JH. Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Statist Med. 1998 May 15;17:1033–1053. doi: 10.1002/(sici)1097-0258(19980515)17:9<1033::aid-sim784>3.0.co;2-z.
31. Xu J, Suzuki K. Computer-aided detection of hepatocellular carcinoma in hepatic CT: False positive reduction with feature selection. Presented at the 8th IEEE Int. Symp. Biomed. Imaging (ISBI 2011); Chicago, IL, USA; 2011.
32. Bhooshan N, Giger ML, Jansen SA, Li H, Lan L, Newstead GM. Cancerous breast lesions on dynamic contrast-enhanced MR images: Computerized characterization for image-based prognostic markers. Radiology. 2010 Mar;254:680–690. doi: 10.1148/radiol.09090838.
33. Lee SH, Kim JH, Cho N, Park JS, Yang Z, Jung YS, Moon WK. Multilevel analysis of spatiotemporal association features for differentiation of tumor enhancement patterns in breast DCE-MRI. Med Phys. 2010 Aug;37:3940–3956. doi: 10.1118/1.3446799.
34. Li SZ, Zhang Z. FloatBoost learning and statistical face detection. IEEE Trans Pattern Anal Mach Intell. 2004 Sep;26(9):1112–1123. doi: 10.1109/TPAMI.2004.68.
