Abstract
Breast masses due to benign disease and malignant tumors related to breast cancer differ in terms of shape, edge-sharpness, and texture characteristics. In this study, we evaluate a set of 22 features including 5 shape factors, 3 edge-sharpness measures, and 14 texture features computed from 111 regions in mammograms, with 46 regions related to malignant tumors and 65 to benign masses. Feature selection is performed by a genetic algorithm based on several criteria, such as alignment of the kernel with the target function, class separability, and normalized distance. Fisher’s linear discriminant analysis, the support vector machine (SVM), and our strict two-surface proximal (S2SP) classifier, as well as their corresponding kernel-based nonlinear versions, are used in the classification task with the selected features. The nonlinear classification performance of kernel Fisher’s discriminant analysis, SVM, and S2SP, with the Gaussian kernel, reached 0.95 in terms of the area under the receiver operating characteristics curve. The results indicate that improvement in classification accuracy may be gained by using selected combinations of shape, edge-sharpness, and texture features.
Key words: Breast masses, breast tumors, mammography, computer-aided diagnosis, feature selection, pattern classification, kernel-based classifiers, shape analysis, edge-sharpness analysis, texture analysis
Introduction
Worldwide, breast cancer is the most common form of cancer and the second most common cause of cancer deaths in females; the disease affects approximately 10% of all women at some stage of their life in the Western world.1 Breast cancer may be detected via a careful study of clinical history, physical examination, and imaging with either mammography or ultrasound. However, definitive diagnosis of a breast mass may require, in some cases, fine-needle aspiration biopsy, core needle biopsy, or excisional biopsy.2 Mammography has been shown to be effective in screening asymptomatic women by detecting occult breast cancers and by reducing mortality by as much as 35% in women aged between 50 and 69 years.3,4 To improve the accuracy and efficiency of mammographic screening programs for the detection of early signs of breast cancer, a number of research projects are focusing on developing methods for computer-aided diagnosis to assist radiologists in diagnosing breast cancer, including works on image analysis5–20 and computational intelligence21–27 for efficient detection of breast cancer.
Breast tumors and masses usually appear in the form of dense regions in mammograms. Benign masses generally possess smooth, round, and well-circumscribed boundaries, as opposed to malignant tumors, which usually have spiculated, rough, and blurry boundaries.28 Several shape features have been proposed for the classification of benign masses and malignant tumors.5,6,13–15 The need for measures to characterize the sharpness of a region of interest (ROI) in an image has also been recognized, leading to different algorithms for the computation of measures of edge sharpness.6,7,14 In addition, subtle textural differences have been observed between benign masses and malignant tumors, with the former being mostly homogeneous and the latter showing heterogeneous texture.6,28 Methods of computing texture features have been proposed using the mass margin7,8,14 or ribbons of pixels around masses obtained using the “rubber band straightening transform.”29
To study the incorporation of features representing multiple radiological characteristics in the analysis of breast masses, many combinations of shape, edge-sharpness, and texture features, formed on the basis of diagnostic significance and classification performance, have been evaluated using several pattern classification methods,6,14,23–25,27 including several linear classifiers, artificial neural networks (ANNs), and kernel-based classification methods. The classification accuracy of the relatively weak texture features reached a level comparable to that of shape features using ANNs with a selected topological structure23 but was much lower than that of shape features using a linear classifier.24 Nandi et al.25 applied genetic programming (GP), associated with sequential forward (and backward) selection and statistical tests, to select combinations of shape, edge-sharpness, and texture features that are important for the purpose of classification using the GP classifier. Previous research works demonstrate that feature combinations with high classification performance using a specific classifier may not always be extended to other classifiers. Thus, in this paper, we propose to select combinations of shape, edge-sharpness, and texture features independent of any classifier, so that the selected combinations are suitable for use with several different classifiers. A genetic algorithm (GA)30 is employed, instead of an exhaustive search of all possible subsets of features of the chosen cardinality, based on measures of data separability in the original feature space, such as alignment of the kernel with the target function,31 class separability,32 and normalized distance.33 We also propose advanced kernel-based pattern classification algorithms that can yield higher classification accuracy when the features used are not well-separated.
Aizerman et al.34 introduced the idea of using kernel functions in machine learning as inner products in a corresponding feature space. Kernel methods in pattern analysis embed the data in a suitable feature space and then use algorithms based on linear algebra, geometry, and statistics to discover patterns in the embedded data. Several different kernel-based classifiers have been proposed: Boser et al.35 combined kernel functions with large-margin hyperplanes, leading to kernel-based support vector machines (SVMs) that are highly successful in solving various nonlinear and nonseparable problems in machine learning. Fisher36 proposed a method, which is well known as Fisher’s linear discriminant analysis (FLDA), to seek separating hyperplanes that best separate two or more classes of samples based on the ratio of the between-class scatter to the within-class scatter. Mika et al.37 combined kernel functions with FLDA, leading to kernel Fisher’s discriminant analysis (KFDA). We recently proposed the strict two-surface proximal (S2SP) classifier38,39: a classifier that seeks two cross proximal planes to fit the distribution of the samples in the empirical feature space by maximizing two strict optimization objectives with a “square of sum” optimization factor; kernel functions are employed to incorporate nonlinearity. In the present paper, we address the problem of identifying malignant breast tumors using FLDA and the linear versions of the SVM and the S2SP classifier, as well as KFDA and the nonlinear versions of the SVM and the S2SP classifier with the Gaussian kernel, using selected combinations of shape, edge-sharpness, and texture features.
Image Database and Feature Extraction
The digitized mammographic image set used in this study contains 111 ROIs extracted from mammograms, with 65 related to benign masses and 46 to malignant tumors, obtained by combining two image databases. One set of images was obtained from “Screen Test: Alberta Program for the Early Detection of Breast Cancer,”24,40 with 37 ROIs related to benign masses and 20 ROIs related to malignant tumors. The images were digitized using the Lumiscan 85 scanner at a resolution of 50 μm and 12 bits per pixel. The other set was obtained by using images containing masses from the Mammographic Image Analysis Society (MIAS, UK) database41 and the teaching library of the Foothills Hospital in Calgary,14 with a total of 54 ROIs including 28 benign and 26 malignant types. The MIAS images were digitized at a resolution of 50 μm, whereas the Foothills Hospital images were digitized at a resolution of 62 μm. The diagnosis of each case was proven by biopsy. Mass or tumor ROIs were manually identified, and contours were drawn by a radiologist experienced in screening mammography. Twenty-two features were extracted from each ROI, including 5 shape features (C, Fcc, FF, SI, and FD), 3 edge-sharpness features (A, Co, and CV), and 14 texture features, which are explained in the following paragraphs.
Shape features: Five shape features are considered in this study, including compactness (C), fractional concavity (Fcc), Fourier factor (FF), spiculation index (SI), and fractal dimension (FD). C is a simple measure of the efficiency of a contour to contain a given area and is defined in a normalized form as

$$C = 1 - \frac{4\pi S}{D^2},$$

where D and S are the contour perimeter and area, respectively.15 Fcc is the ratio of the cumulative length of the concave parts to the total length of the contour.15 Benign masses, due to their round or oval contours, result in low values of C and Fcc. On the other hand, contours of microlobulated or spiculated malignant tumors may be expected to have several significant concave portions and, hence, large values of Fcc as well as C. FF is a measure related to the presence of roughness or high-frequency components in the contours.13,14 SI represents the degree of spicularity of a contour. Rangayyan et al.15 proposed an algorithm to compute SI based upon a polygonal model of the given contour and a combination of the segment lengths, base widths, and angles of possible spicules. Due to their effect on the surrounding tissues, most malignant tumors form narrow, stellate distortions around their boundaries and, hence, have higher values of SI than benign masses with smooth contours. FD can be used to characterize self-similarity, nested complexity, or space-filling properties, and was derived by using the two-dimensional ruler method.13

Edge-sharpness features: Three edge-sharpness features are used in this study, including acutance (A), contrast (Co), and coefficient of variation (CV). A is a measure of the sharpness or change in density across a mass margin.14 Co is a measure of contrast.7 CV is a feature based on the coefficient of variation of the edge strength computed at all points on the boundary of the ROI.7
Texture features: Fourteen texture features were computed according to the definitions of Haralick et al.,42 using a ribbon of pixels around the margin of each mass,7,8,24 including angular second moment of energy (f1), contrast (f2), correlation (f3), sum of squares (f4), inverse difference moment (f5), sum average (f6), sum variance (f7), sum entropy (f8), entropy (f9), difference variance (f10), difference entropy (f11), information measure of correlation (f12 and f13), and maximal correlation coefficient (f14). The texture features were computed using ribbons of a width of 8 mm obtained by dilating the mass boundaries after filtering and downsampling the mammograms to an effective resolution of 200 μm per pixel.
Some of the shape features used are invariant to scaling (size) and rotation by design; others are normalized to remove the effect of spatial resolution and size. The texture and edge-sharpness features include normalization and should not be affected by the small differences in the pixel size used (50 and 62 μm).
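As an illustration of the texture computation described above, the following Python sketch builds a gray-level co-occurrence matrix (GLCM) restricted to a masked region (standing in for the ribbon of pixels around the mass margin) and evaluates three of the 14 measures (f1, f2, and f9) following the definitions of Haralick et al.42 The quantization to 16 gray levels, the single pixel offset, and the function names are illustrative assumptions and not the exact settings used in this study.

```python
import numpy as np

def glcm(img, mask, levels=16, dx=1, dy=0):
    """Symmetric, normalized co-occurrence matrix computed over the masked region only."""
    q = np.floor(img.astype(float) / (img.max() + 1e-9) * levels).astype(int)
    q = np.clip(q, 0, levels - 1)                  # quantized gray levels
    P = np.zeros((levels, levels))
    rows, cols = img.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if r2 < rows and c2 < cols and mask[r, c] and mask[r2, c2]:
                P[q[r, c], q[r2, c2]] += 1
                P[q[r2, c2], q[r, c]] += 1         # count both directions (symmetric GLCM)
    return P / max(P.sum(), 1.0)

def texture_subset(P):
    """f1 (angular second moment), f2 (contrast), and f9 (entropy) from a GLCM."""
    eps = 1e-12
    i, j = np.indices(P.shape)
    f1 = np.sum(P ** 2)
    f2 = np.sum((i - j) ** 2 * P)
    f9 = -np.sum(P * np.log(P + eps))
    return f1, f2, f9

# Example on synthetic data standing in for a ribbon around a mass margin.
rng = np.random.default_rng(0)
img = rng.integers(0, 4096, (64, 64))              # 12-bit pixel values, as in the digitized films
mask = np.zeros((64, 64), dtype=bool)
mask[20:44, 20:44] = True
print(texture_subset(glcm(img, mask)))
```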
Feature Selection
Given a set of l labeled training samples $\{(\mathbf{x}_i, y_i)\}_{i=1}^{l}$, where $\mathbf{x}_i \in \mathbb{R}^n$ is a sample in the n-dimensional real feature space and $y_i \in Y$ is its label in the binary label space $Y = \{1, -1\}$, the purpose of feature selection is to select a subset of relevant features to build robust classifiers based on measures of separability constructed from the training samples.
Measures for Feature Analysis
Alignment
The alignment measure was introduced by Cristianini et al.31 to measure the similarity between two kernel functions or between a kernel and a target function. To perform feature selection, we employ the alignment between the inner product matrix (E) of features in the original feature space and the target label matrix, given as
$$A_E = \frac{\left\langle E, \mathbf{y}\mathbf{y}^T \right\rangle_F}{\sqrt{\left\langle E, E \right\rangle_F \left\langle \mathbf{y}\mathbf{y}^T, \mathbf{y}\mathbf{y}^T \right\rangle_F}}, \tag{1}$$

where y denotes the column vector of the labels of the training samples. The Frobenius product $\langle \cdot, \cdot \rangle_F$ between two Gram matrices M and N is defined as

$$\left\langle M, N \right\rangle_F = \mathrm{tr}\left(M^T N\right), \tag{2}$$
where “tr” denotes the trace of a matrix.31,32 This quantity captures the degree of agreement between the input features and the given learning task. A larger value of AE indicates a higher degree of agreement than a smaller value.
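As a concrete illustration, Eqs. 1 and 2 can be evaluated directly with NumPy. In this sketch, which is not part of the original study, the inner product matrix E is formed from a sample matrix X whose rows are samples described by the selected features, and y holds labels in {+1, −1}; the example data are synthetic.

```python
import numpy as np

def frobenius(M, N):
    """Frobenius product <M, N>_F = tr(M^T N), Eq. (2)."""
    return np.trace(M.T @ N)

def alignment(X, y):
    """Alignment A_E between the inner product matrix of the selected features
    and the target label matrix y y^T, Eq. (1)."""
    E = X @ X.T                      # inner product matrix of the samples
    Y = np.outer(y, y)               # target matrix with entries y_i * y_j
    return frobenius(E, Y) / np.sqrt(frobenius(E, E) * frobenius(Y, Y))

# Synthetic example: 111 samples described by a 7-feature subset.
rng = np.random.default_rng(0)
X = rng.standard_normal((111, 7))
y = np.where(rng.random(111) < 0.4, 1, -1)
print(alignment(X, y))
```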
Class Separability
A quantity to measure the class separability of the training samples in the original feature space32 that we employ for feature selection is given by
$$J = \frac{\mathrm{tr}(S_B)}{\mathrm{tr}(S_W)}, \tag{3}$$

where SB and SW are the between-class and within-class scatter matrices, respectively, given by

$$S_B = (\mathbf{m}_+ - \mathbf{m}_-)(\mathbf{m}_+ - \mathbf{m}_-)^T, \tag{4}$$

$$S_W = \sum_{\mathbf{x}_i^+ \in C_+} (\mathbf{x}_i^+ - \mathbf{m}_+)(\mathbf{x}_i^+ - \mathbf{m}_+)^T + \sum_{\mathbf{x}_i^- \in C_-} (\mathbf{x}_i^- - \mathbf{m}_-)(\mathbf{x}_i^- - \mathbf{m}_-)^T, \tag{5}$$

$$\mathbf{m}_+ = \frac{1}{l_+} \sum_{i=1}^{l_+} \mathbf{x}_i^+, \tag{6}$$

$$\mathbf{m}_- = \frac{1}{l_-} \sum_{i=1}^{l_-} \mathbf{x}_i^-, \tag{7}$$

where $\mathbf{x}_i^+$ and $\mathbf{x}_i^-$ represent samples belonging to the positive (+) and negative (−) classes, respectively; $\mathbf{m}_+$ and $\mathbf{m}_-$ are the corresponding class mean vectors; l+ and l− denote the number of positive and negative training samples, respectively; and C+ and C− denote the sets of positive and negative training samples, respectively. A larger value of J indicates better class separability of the training set than a smaller value.
Normalized Distance
We use a measure of normalized distance33 between the centers (mean vectors) of the two classes to evaluate their separability, given by
$$D = \frac{\left\|\mathbf{m}_+ - \mathbf{m}_-\right\|}{\sigma_+ + \sigma_-}, \tag{8}$$

where $\mathbf{m}_+$ and $\mathbf{m}_-$ are the class mean vectors defined in Eqs. 6 and 7, and σ+ and σ− denote the spread (standard deviation) of the positive and negative training samples about their respective class means. A larger value of D indicates better class separability of the training set than a smaller value.
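A minimal sketch of the class separability and normalized distance computations, following Eqs. 3 to 8 as given above, is shown below; the use of the average within-class standard deviation in normalized_distance is an assumption made for illustration.

```python
import numpy as np

def scatter_matrices(X, y):
    """Between-class and within-class scatter matrices and class means (Eqs. 4-7)."""
    Xp, Xn = X[y == 1], X[y == -1]
    mp, mn = Xp.mean(axis=0), Xn.mean(axis=0)
    SB = np.outer(mp - mn, mp - mn)
    SW = (Xp - mp).T @ (Xp - mp) + (Xn - mn).T @ (Xn - mn)
    return SB, SW, mp, mn

def class_separability(X, y):
    """J = tr(SB) / tr(SW), Eq. (3)."""
    SB, SW, _, _ = scatter_matrices(X, y)
    return np.trace(SB) / np.trace(SW)

def normalized_distance(X, y):
    """Normalized distance between the class centers, Eq. (8) (assumed normalization)."""
    _, _, mp, mn = scatter_matrices(X, y)
    sp = X[y == 1].std(axis=0).mean()    # average within-class spread (assumption)
    sn = X[y == -1].std(axis=0).mean()
    return np.linalg.norm(mp - mn) / (sp + sn)
```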
Genetic Algorithm
A GA operates on a number of potential solutions (a population), applying the principle of survival of the fittest to produce successively better approximations (individuals) to a solution (see the “Tutorial” of Chipperfield et al.43). Each individual is encoded as a string or chromosome composed over some alphabet, of which one commonly used representation is the binary alphabet {0,1}. The performance of each individual is assessed through an objective function and the fitness function. Highly fit individuals have a high probability of being selected for mating, whereas less fit individuals have a correspondingly low probability of being selected. The selected individuals are then recombined using crossover, applied with a probability Pc, which exchanges genetic information between pairs or larger groups of individuals to produce the next generation. A further genetic operator, called mutation, is then applied to the new individuals with a low probability Pm; mutation ensures that the probability of searching any particular subspace of the problem space is never zero. The fraction of the population that is replaced in each generation by new individuals produced by selection, crossover, and mutation is termed the generation gap. To maintain the size of the original population, the new individuals that have larger fitness values than those of the old individuals are brought into the new population; the new population therefore retains the old individuals that are fitter than the new ones, and the remaining places are taken by the comparatively fitter of the new individuals. The GA is terminated after a prespecified number of generations; a subsequent test may be applied to check the quality of the highly fit members of the population. If no acceptable solutions are found, the GA may be restarted or a fresh search may be initiated.
One of the main issues in applying GA to the problem of feature selection is to map the search space into a representation suitable for genetic search. In this study, we consider each feature in the candidate feature set as a binary gene. Each possible feature combination is encoded into an n-bit binary string, where n is the total number of features available. An n-bit individual corresponds to an n-dimensional binary feature vector X, where Xi = 0 represents elimination and Xi = 1 indicates inclusion of the ith feature. The objective function of GA is set as one of the measures of alignment, class separability, and normalized distance, as described in “Measures for Feature Analysis.”
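A short sketch of the chromosome-to-feature-subset mapping and of a constrained objective evaluation is given below; the ordering of the 22 features into the three category slices is an assumption, and measure stands for any of the three separability functions described in “Measures for Feature Analysis” (for example, the alignment or class_separability sketches given earlier).

```python
import numpy as np

N_FEATURES = 22
SHAPE, EDGE, TEXTURE = slice(0, 5), slice(5, 8), slice(8, 22)   # assumed feature ordering

def valid(chromosome):
    """Constraint used in this study: at least one feature from each category."""
    return bool(chromosome[SHAPE].any() and chromosome[EDGE].any() and chromosome[TEXTURE].any())

def objective(chromosome, X, y, measure):
    """Objective of a 22-bit individual: a separability measure on the selected columns."""
    if not valid(chromosome):
        return -np.inf                  # infeasible individuals never win selection
    return measure(X[:, chromosome], y)

# Example: a random boolean gene string; Xi = True means the ith feature is included.
rng = np.random.default_rng(0)
chromosome = rng.random(N_FEATURES) < 0.5
# objective(chromosome, X, y, class_separability)  # evaluated with any of the measures above
```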
Pattern Classification
The purpose of classification is to seek the best prediction of the label for an input sample x from the given set of labeled training samples $\{(\mathbf{x}_i, y_i)\}_{i=1}^{l}$. A linear classifier achieves this by making a classification decision based on the value of a linear combination of the features. A kernel-based classifier performs the same task by applying the linear classification method in a kernel-transformed feature space κ, obtained with a nonlinear mapping ϕ: $\mathbb{R}^n \to \kappa$, by using kernel functions as inner products in the feature space κ.
Linear Classifiers
Fisher’s Linear Discriminant Analysis
FLDA seeks the separating function
$$f(\mathbf{x}) = \boldsymbol{\omega}^T \mathbf{x} + b \tag{9}$$
by maximizing the following objective:
$$J(\boldsymbol{\omega}) = \frac{\boldsymbol{\omega}^T S_B \boldsymbol{\omega}}{\boldsymbol{\omega}^T S_W \boldsymbol{\omega}}, \tag{10}$$
where ω and b are the weight vector and the bias of the separating hyperplane, respectively, and SB and SW are the between-class and within-class scatter matrices, respectively (see Eqs. 4 and 5). The optimal values of ω and b can be calculated by solving a generalized eigenvalue problem.44 Letting f*(x) denote the derived optimal separating function, the label for an input sample is predicted by
$$\hat{y} = \mathrm{sgn}\left( f^*(\mathbf{x}) \right), \tag{11}$$

where sgn(x) is 1 when x ≥ 0 and −1 otherwise, and $\hat{y}$ is the estimate of the label for the input sample x.
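A compact sketch of FLDA via the generalized eigenvalue problem is given below; the small ridge added to SW and the placement of the bias at the midpoint of the projected class means are assumptions made for illustration, not necessarily the choices made in this study.

```python
import numpy as np
from scipy.linalg import eigh

def flda_fit(X, y):
    """Fisher's linear discriminant: w maximizes (w^T SB w) / (w^T SW w), Eq. (10)."""
    Xp, Xn = X[y == 1], X[y == -1]
    mp, mn = Xp.mean(axis=0), Xn.mean(axis=0)
    SB = np.outer(mp - mn, mp - mn)
    SW = (Xp - mp).T @ (Xp - mp) + (Xn - mn).T @ (Xn - mn)
    SW += 1e-6 * np.eye(X.shape[1])              # small ridge for numerical stability (assumption)
    _, vecs = eigh(SB, SW)                       # generalized eigenvalue problem
    w = vecs[:, -1]                              # eigenvector of the largest eigenvalue
    if w @ (mp - mn) < 0:                        # orient w toward the positive class
        w = -w
    b = -0.5 * w @ (mp + mn)                     # bias at the projected midpoint (assumption)
    return w, b

def flda_predict(X, w, b):
    """Eq. (11): sign of the separating function f*(x) = w^T x + b."""
    return np.where(X @ w + b >= 0, 1, -1)
```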
Support Vector Machines
SVMs seek the separating function based on the maximal margin rule45 by solving the following constrained quadratic programming (QP) problem:
$$\max_{\boldsymbol{\beta}} \; \sum_{i=1}^{l} \beta_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \beta_i \beta_j\, y_i y_j\, \mathbf{x}_i^T \mathbf{x}_j \tag{12}$$

subject to

$$\sum_{i=1}^{l} \beta_i y_i = 0,$$

$$0 \le \beta_i \le \xi, \quad i = 1, 2, \dots, l,$$
where $\boldsymbol{\beta} = [\beta_1, \beta_2, \dots, \beta_l]^T$ are the Lagrange multipliers and ξ is the regularization parameter set by the user. Letting $\boldsymbol{\beta}^* = [\beta_1^*, \beta_2^*, \dots, \beta_l^*]^T$ denote the optimal solution of the QP problem in Eq. 12, the optimal values of ω and b in the separating function given in Eq. 9 can be calculated as
$$\boldsymbol{\omega}^* = \sum_{i=1}^{l} \beta_i^* y_i \mathbf{x}_i, \tag{13}$$

$$b^* = -\frac{1}{2S}\, \boldsymbol{\omega}^{*T} \left( \sum_{\mathbf{x}_i \in S_+} \mathbf{x}_i + \sum_{\mathbf{x}_i \in S_-} \mathbf{x}_i \right), \tag{14}$$
where S+ and S− are two sets of support vectors with the same size S but different labels of 1 and −1. The label for an input sample x is then predicted by Eq. 11.
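Rather than re-deriving the QP of Eq. 12, a linear SVM can be trained with scikit-learn, whose SVC solves the same soft-margin dual; in this sketch the box constraint C plays the role of the regularization parameter ξ, and the data are synthetic.

```python
import numpy as np
from sklearn.svm import SVC

# A minimal sketch, assuming the features have already been normalized
# to zero mean and unit variance as described later in the paper.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((110, 7))
y_train = np.where(rng.random(110) < 0.4, 1, -1)

lsvm = SVC(kernel="linear", C=1.0)        # C corresponds to the regularization parameter xi
lsvm.fit(X_train, y_train)

x_test = rng.standard_normal((1, 7))
score = lsvm.decision_function(x_test)    # f*(x) = w^T x + b, Eq. (9)
label = np.where(score >= 0, 1, -1)       # Eq. (11)
```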
The Strict Two-Surface Proximal Classifier
Different from the discriminant classification methods, such as FLDA and SVMs, which predict the label based on one separating function, the S2SP classifier has been proposed to predict the label based on two proximal planes38,39:
$$f_1(\mathbf{x}) = \boldsymbol{\omega}_1^T \mathbf{x} + b_1 = 0, \tag{15}$$

$$f_2(\mathbf{x}) = \boldsymbol{\omega}_2^T \mathbf{x} + b_2 = 0, \tag{16}$$
where ω and b are the weight vector (direction) and the bias of the proximal planes, respectively, and the subscripts 1 and 2 denote the first and second planes, respectively. The first plane is as close as possible to the points of the positive class while being as far as possible from the points of the negative class, whereas the second plane is as close as possible to the points of the negative class while being as far as possible from the points of the positive class. Compared with the proximal classification method of multisurface proximal SVMs,46 the S2SP classifier eliminates the regularization term by employing a “square of sum” numerator, which takes into account the effect of the sign in cases of misclassification with large projections onto the separating plane.
To obtain the first proximal hyperplane, the objective function to be maximized is
$$\frac{\left[ \mathbf{e}^T \left( X_- \boldsymbol{\omega}_1 + b_1 \mathbf{e} \right) \right]^2}{\left\| X_+ \boldsymbol{\omega}_1 + b_1 \mathbf{e} \right\|^2}; \tag{17}$$
the second proximal hyperplane is obtained by maximizing
$$\frac{\left[ \mathbf{e}^T \left( X_+ \boldsymbol{\omega}_2 + b_2 \mathbf{e} \right) \right]^2}{\left\| X_- \boldsymbol{\omega}_2 + b_2 \mathbf{e} \right\|^2}, \tag{18}$$
where the l− × n matrix X− contains the samples of the negative class as its rows, the l+ × n matrix X+ contains the samples of the positive class, e denotes a column vector of the appropriate length with all elements equal to 1 (so that $\mathbf{e}^T\mathbf{v}$ is the sum of the elements of a vector v), and ‖·‖ denotes the Euclidean norm. The optimal values of ω1, b1, ω2, and b2 can be calculated by solving two generalized eigenvalue problems.38,39 For linear classification in the original feature space, the two proximal planes serve as two ridge-like distribution models to fit the samples in the two classes. Letting d1(x) and d2(x) denote the Euclidean distances between a given sample x and the two proximal planes, respectively, one way to predict the label of x is to consider the values of d1(x) and d2(x) together using linear discriminant analysis.44
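One way to implement the linear S2SP classifier, consistent with the description above but not necessarily identical to the authors' implementation, is to solve the two Rayleigh-quotient problems of Eqs. 17 and 18 as generalized eigenvalue problems over the augmented variable [ω; b]; the small ridge added for numerical stability is an assumption.

```python
import numpy as np
from scipy.linalg import eigh

def s2sp_plane(X_far, X_near):
    """One proximal plane: far from the rows of X_far in the 'square of sum' sense
    (numerator of Eqs. 17/18) and close to the rows of X_near (denominator)."""
    A = np.hstack([X_far, np.ones((X_far.shape[0], 1))])    # [X_far, e]
    B = np.hstack([X_near, np.ones((X_near.shape[0], 1))])  # [X_near, e]
    a = A.sum(axis=0)                                       # A^T e
    G = np.outer(a, a)                                      # "square of sum" numerator
    H = B.T @ B + 1e-6 * np.eye(B.shape[1])                 # ridge for numerical stability (assumption)
    _, vecs = eigh(G, H)
    z = vecs[:, -1]                                         # maximizer of the Rayleigh quotient
    return z[:-1], z[-1]                                    # (omega, b)

def s2sp_fit(X, y):
    Xp, Xn = X[y == 1], X[y == -1]
    w1, b1 = s2sp_plane(Xn, Xp)      # plane 1: close to +, far from -
    w2, b2 = s2sp_plane(Xp, Xn)      # plane 2: close to -, far from +
    return (w1, b1), (w2, b2)

def s2sp_distances(X, planes):
    """Euclidean distances d1(x) and d2(x) used for the final label decision."""
    (w1, b1), (w2, b2) = planes
    d1 = np.abs(X @ w1 + b1) / np.linalg.norm(w1)
    d2 = np.abs(X @ w2 + b2) / np.linalg.norm(w2)
    return d1, d2
```

The label of a test sample can then be assigned, for example, by a linear discriminant applied to d1(x) and d2(x), as described above.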
Kernel-Based Classifiers
Kernel Fisher’s Discriminant Analysis
In the kernel-transformed feature space κ, by expanding the weight vector ω of f(x) in Eq. 9 into a linear summation of all training samples, the kernel-based separating function f(x) becomes
$$f(\mathbf{x}) = \sum_{i=1}^{l} \alpha_i K(\mathbf{x}_i, \mathbf{x}) + b, \tag{19}$$
where $\alpha_i$, i = 1, 2, ..., l, denote the summation weights and K(·,·) is a kernel function used to compute the inner product matrix, the so-called kernel matrix, on pairs of samples in the kernel-transformed feature space κ. KFDA determines f*(x) by maximizing the Fisher criterion,47 as
$$J = \frac{\left( \mu_+ - \mu_- \right)^2}{\sigma_+^2 + \sigma_-^2}, \tag{20}$$

where

$$\mu_+ = \frac{1}{l_+} \sum_{\mathbf{x}_i \in C_+} f(\mathbf{x}_i), \qquad \mu_- = \frac{1}{l_-} \sum_{\mathbf{x}_i \in C_-} f(\mathbf{x}_i),$$

$$\sigma_+^2 = \frac{1}{l_+} \sum_{\mathbf{x}_i \in C_+} \left( f(\mathbf{x}_i) - \mu_+ \right)^2, \qquad \sigma_-^2 = \frac{1}{l_-} \sum_{\mathbf{x}_i \in C_-} \left( f(\mathbf{x}_i) - \mu_- \right)^2,$$
where μ+ and μ− denote the mean projections of the positive and negative samples, respectively, and σ+ and σ− are the corresponding standard deviations. By incorporating Eq. 19 into Eq. 20, the optimal values of $\boldsymbol{\alpha} = [\alpha_1, \alpha_2, \dots, \alpha_l]^T$ and b can be calculated by solving a generalized eigenvalue problem.47 The label for an input sample x is then predicted by Eq. 11.
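A sketch of a regularized KFDA in the spirit of Mika et al.,37 using the Gaussian kernel of Eq. 28, is shown below; the ridge term and the midpoint bias are assumptions added for numerical stability and illustration, not necessarily the choices made in this study.

```python
import numpy as np
from scipy.linalg import eigh

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix, Eq. (28)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kfda_fit(X, y, sigma=1.0, reg=1e-3):
    """Kernel Fisher discriminant: maximize Eq. (20) over the expansion weights of Eq. (19)."""
    K = rbf_kernel(X, X, sigma)
    idx_p, idx_n = np.where(y == 1)[0], np.where(y == -1)[0]
    mp = K[:, idx_p].mean(axis=1)                 # class means seen from each expansion point
    mn = K[:, idx_n].mean(axis=1)
    N = np.zeros_like(K)
    for idx in (idx_p, idx_n):                    # within-class scatter in the expansion
        Kc = K[:, idx]
        lc = len(idx)
        N += Kc @ (np.eye(lc) - np.ones((lc, lc)) / lc) @ Kc.T
    M = np.outer(mp - mn, mp - mn)
    _, vecs = eigh(M, N + reg * np.eye(len(y)))   # regularized generalized eigenvalue problem
    alpha = vecs[:, -1]
    if alpha @ (mp - mn) < 0:                     # orient toward the positive class
        alpha = -alpha
    b = -0.5 * alpha @ (mp + mn)                  # bias at the projected midpoint (assumption)
    return alpha, b

def kfda_predict(X_train, X_test, alpha, b, sigma=1.0):
    return np.where(rbf_kernel(X_test, X_train, sigma) @ alpha + b >= 0, 1, -1)
```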
Kernel-Based Support Vector Machines
An SVM with embedded kernel functions determines the optimal separating function in the kernel-transformed feature space by solving the following constrained QP problem instead of Eq. 12:
$$\max_{\boldsymbol{\beta}} \; \sum_{i=1}^{l} \beta_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \beta_i \beta_j\, y_i y_j\, K(\mathbf{x}_i, \mathbf{x}_j) \tag{21}$$

subject to

$$\sum_{i=1}^{l} \beta_i y_i = 0,$$

$$0 \le \beta_i \le \xi, \quad i = 1, 2, \dots, l.$$
Letting $\boldsymbol{\beta}^* = [\beta_1^*, \beta_2^*, \dots, \beta_l^*]^T$ denote the optimal solution of Eq. 21, the optimal value of the bias b can be calculated by
$$b^* = -\frac{1}{2S} \sum_{i=1}^{l} \beta_i^* y_i \left[ \sum_{\mathbf{x}_j \in S_+} K(\mathbf{x}_i, \mathbf{x}_j) + \sum_{\mathbf{x}_j \in S_-} K(\mathbf{x}_i, \mathbf{x}_j) \right]. \tag{22}$$
The optimal separating function f*(x) is given by
$$f^*(\mathbf{x}) = \sum_{i=1}^{l} \beta_i^* y_i K(\mathbf{x}_i, \mathbf{x}) + b^*. \tag{23}$$
The label for an input sample x is then predicted by Eq. 11.
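With scikit-learn, the kernel-based SVM of Eqs. 21 to 23 corresponds to SVC with an RBF kernel; under the parameterization of Eq. 28, gamma = 1/(2σ²). The data in this sketch are synthetic.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.standard_normal((110, 7))
y_train = np.where(rng.random(110) < 0.4, 1, -1)

sigma, xi = 2.0, 1.0                                  # kernel width and regularization parameter
ksvm = SVC(kernel="rbf", gamma=1.0 / (2 * sigma ** 2), C=xi)
ksvm.fit(X_train, y_train)
labels = ksvm.predict(rng.standard_normal((5, 7)))    # Eq. (11) applied to Eq. (23)
```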
Kernel-Based Strict Two-Surface Proximal Classifier
In the kernel-transformed feature space κ, by expanding the direction vector of the hyperplane into a linear summation of all of the training samples, the two proximal hyperplanes are given as
$$f_1(\mathbf{x}) = \sum_{i=1}^{l} \alpha_{1i} K(\mathbf{x}_i, \mathbf{x}) + b_1 = 0, \tag{24}$$

$$f_2(\mathbf{x}) = \sum_{i=1}^{l} \alpha_{2i} K(\mathbf{x}_i, \mathbf{x}) + b_2 = 0, \tag{25}$$
where $\boldsymbol{\alpha}_1 = [\alpha_{11}, \alpha_{12}, \dots, \alpha_{1l}]^T$ and $\boldsymbol{\alpha}_2 = [\alpha_{21}, \alpha_{22}, \dots, \alpha_{2l}]^T$ are the summation weight vectors of the two proximal hyperplanes. To obtain the first proximal hyperplane, the objective function to be maximized is
$$\frac{\left[ \mathbf{e}^T \left( K_- \boldsymbol{\alpha}_1 + b_1 \mathbf{e} \right) \right]^2}{\left\| K_+ \boldsymbol{\alpha}_1 + b_1 \mathbf{e} \right\|^2}; \tag{26}$$
the second proximal hyperplane is obtained by maximizing
$$\frac{\left[ \mathbf{e}^T \left( K_+ \boldsymbol{\alpha}_2 + b_2 \mathbf{e} \right) \right]^2}{\left\| K_- \boldsymbol{\alpha}_2 + b_2 \mathbf{e} \right\|^2}, \tag{27}$$
where the l+ × l matrix K+ represents the kernel matrix between the samples from the positive class and all of the training samples, and the l− × l matrix K− represents the kernel matrix between the samples from the negative class and all of the training samples. The optimal values of α1, b1, α2, and b2 can be calculated by solving two generalized eigenvalue problems.38,39 For nonlinear classification based on kernel functions, the two proximal planes serve as two Bayesian models to fit the samples in the two classes. The label for an input sample x is then predicted using the same method as with the linear S2SP classifier, by considering the values of d1(x) and d2(x) together (see “The Strict Two-Surface Proximal Classifier”).
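The kernel S2SP planes can be obtained by the same construction as in the linear case, with the class kernel matrices K+ and K− taking the place of the data matrices; the following sketch mirrors the linear version above and again adds a small ridge as an assumption rather than a feature of the original method.

```python
import numpy as np
from scipy.linalg import eigh

def ks2sp_plane(K_far, K_near):
    """Kernel proximal plane: maximizes the Rayleigh quotient of Eq. (26) or (27)."""
    A = np.hstack([K_far, np.ones((K_far.shape[0], 1))])    # [K_far, e]
    B = np.hstack([K_near, np.ones((K_near.shape[0], 1))])  # [K_near, e]
    a = A.sum(axis=0)
    G = np.outer(a, a)                                      # "square of sum" numerator
    H = B.T @ B + 1e-6 * np.eye(B.shape[1])                 # ridge for numerical stability (assumption)
    _, vecs = eigh(G, H)
    z = vecs[:, -1]
    return z[:-1], z[-1]                                    # (alpha, b)

def ks2sp_fit(K, y):
    """K is the l-by-l kernel matrix over all training samples."""
    Kp, Kn = K[y == 1, :], K[y == -1, :]                    # l+ x l and l- x l blocks
    a1, b1 = ks2sp_plane(Kn, Kp)                            # plane 1: close to +, far from -
    a2, b2 = ks2sp_plane(Kp, Kn)                            # plane 2: close to -, far from +
    return (a1, b1), (a2, b2)
```

Distances from a test sample to the two planes are then computed from its vector of kernel evaluations against the training set, and the label is assigned as in the linear case.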
Experiments, Results, and Comparative Analysis
Evaluation of Single Features
Distributions of the normalized values of the 22 features are shown in Figure 1, which illustrates the separability and (or) overlap between the benign and malignant categories for each of the features used. Five different measures were evaluated for each single feature: the area Az under the receiver operating characteristics (ROC) curve, calculated by applying a sliding threshold with the LS-SVMlab1.5 toolbox,48 the p value of the t test, derived by employing the function “ttest2” in MATLAB, as well as alignment, class separability, and normalized distance, as described in “Measures for Feature Analysis.” The “ttest2” program performs a t test of the hypothesis that two independent samples come from distributions with equal means and returns the result of the test in H, as well as the p value. H = 0 indicates that the null hypothesis (“means are equal”) cannot be rejected at the 5% significance level; H = 1 indicates that the null hypothesis can be rejected at the 5% level. The p value is the probability of observing the given result, or one more extreme, by chance if the null hypothesis is true. Larger values of Az, alignment, class separability, and normalized distance, and smaller p values, indicate stronger discriminating power. The calculated values of the five measures are recorded in Table 1; the features are ranked in Table 2 according to each measure. A score was calculated for each feature by averaging its ranking numbers over the five measures; the resulting ranking of the features is recorded in the last column of Table 2. To ensure that only two edge-sharpness features would be included, features with scores less than 14 were selected to form the feature combination FSbest1, comprising all five shape features, two edge-sharpness features (A and CV), and six texture features (f1, f2, f5, f9, f10, and f11). To ensure that only one edge-sharpness feature would be included, features with scores less than 11 were selected to form the feature combination FSbest2, comprising all five shape features, one edge-sharpness feature (A), and five texture features (f2, f5, f9, f10, and f11).
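The single-feature measurements reported above can be approximated outside of LS-SVMlab and MATLAB; the sketch below uses scikit-learn's ROC area and SciPy's two-sample t test as stand-ins for the sliding-threshold Az computation and “ttest2,” respectively.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.metrics import roc_auc_score

def evaluate_single_feature(values, y):
    """Az obtained by sliding a threshold over one feature, and the two-sample t-test p value.
    The ROC area is flipped if the feature happens to rank the classes in reverse."""
    az = roc_auc_score((y == 1).astype(int), values)
    az = max(az, 1.0 - az)
    _, p = ttest_ind(values[y == 1], values[y == -1])
    return az, p
```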
Fig 1.
Distribution of the normalized values of each feature for the 111 breast masses. The cross marks denote malignant tumors, and the circle marks denote benign masses.
Table 1.
Values of the Various Measures of Separability Studied for Each Feature
| Features | Az | Alignment | Separability | Distance | p value |
|---|---|---|---|---|---|
| C | 0.8980 | 0.4677 | 0.0345 | 5.2186 | 0.0000 |
| SI | 0.9134 | 0.5039 | 0.0401 | 4.8699 | 0.0000 |
| Fcc | 0.8866 | 0.4576 | 0.0331 | 5.5692 | 0.0000 |
| FF | 0.9100 | 0.4809 | 0.0364 | 5.2407 | 0.0000 |
| FD | 0.9137 | 0.4459 | 0.0315 | 3.9774 | 0.0000 |
| Co | 0.5843 | 0.0275 | 0.0011 | 0.1623 | 0.0776 |
| A | 0.6839 | 0.0913 | 0.0039 | 0.5414 | 0.0011 |
| CV | 0.7027 | 0.0550 | 0.0022 | 0.1903 | 0.0119 |
| f1 | 0.6863 | 0.0879 | 0.0037 | 0.4982 | 0.0013 |
| f2 | 0.7110 | 0.0922 | 0.0039 | 0.4584 | 0.0010 |
| f3 | 0.5575 | 0.0173 | 0.0007 | 0.1058 | 0.1624 |
| f4 | 0.6405 | 0.0402 | 0.0016 | 0.2393 | 0.0323 |
| f5 | 0.6829 | 0.0980 | 0.0042 | 0.8421 | 0.0007 |
| f6 | 0.5759 | 0.0242 | 0.0009 | 0.1779 | 0.0980 |
| f7 | 0.6398 | 0.0393 | 0.0016 | 0.2326 | 0.0341 |
| f8 | 0.6565 | 0.0712 | 0.0029 | 0.5038 | 0.0040 |
| f9 | 0.6853 | 0.0982 | 0.0042 | 0.7844 | 0.0007 |
| f10 | 0.7187 | 0.0971 | 0.0041 | 0.3577 | 0.0007 |
| f11 | 0.6896 | 0.0973 | 0.0041 | 0.8121 | 0.0007 |
| f12 | 0.5903 | 0.0272 | 0.0011 | 0.1882 | 0.0792 |
| f13 | 0.5082 | 0.0002 | 0.0000 | 0.0010 | 0.8794 |
| f14 | 0.5329 | 0.0064 | 0.0002 | 0.0347 | 0.3980 |
Table 2.
Ranking of the 22 Features Based on the Various Measures of Separability
| Features | Az | Alignment | Separability | Distance | p value | Average |
|---|---|---|---|---|---|---|
| C | 4 | 3 | 3 | 3 | 3 | 2 |
| SI | 2 | 1 | 1 | 4 | 1 | 1 |
| Fcc | 5 | 4 | 4 | 1 | 4 | 3 |
| FF | 3 | 2 | 2 | 2 | 2 | 4 |
| FD | 1 | 5 | 5 | 5 | 5 | 5 |
| Co | 18 | 17 | 17 | 19 | 17 | 17 |
| A | 12 | 11 | 11 | 9 | 11 | 11 |
| CV | 8 | 14 | 14 | 16 | 14 | 14 |
| f1 | 10 | 12 | 12 | 11 | 12 | 12 |
| f2 | 7 | 10 | 10 | 12 | 10 | 10 |
| f3 | 20 | 20 | 20 | 20 | 20 | 20 |
| f4 | 15 | 15 | 15 | 14 | 15 | 15 |
| f5 | 13 | 7 | 7 | 6 | 7 | 8 |
| f6 | 19 | 19 | 19 | 18 | 19 | 19 |
| f7 | 16 | 16 | 16 | 15 | 16 | 16 |
| f8 | 14 | 13 | 13 | 10 | 13 | 13 |
| f9 | 11 | 6 | 6 | 8 | 6 | 6 |
| f10 | 6 | 9 | 9 | 13 | 9 | 9 |
| f11 | 9 | 8 | 8 | 7 | 8 | 7 |
| f12 | 17 | 18 | 18 | 20 | 18 | 18 |
| f13 | 22 | 22 | 22 | 22 | 22 | 22 |
| f14 | 21 | 21 | 21 | 21 | 21 | 21 |
Independent Feature Selection
Feature selection was performed using the GA based on the three measures described in “Measures for Feature Analysis,” calculated using all of the 111 breast masses. Such a feature selection procedure is independent of any classifier. The “Genetic Algorithm Toolbox for use with MATLAB (version 1.2)”43 was employed to implement the GA. Each GA individual was required to include at least one feature from each of the feature categories of shape, edge sharpness, and texture. The objective function of the GA was set, in separate runs, as one of alignment, class separability, or normalized distance. The fitness function was calculated by using linear ranking. The population size was set as 50, and the generation gap was set as 0.6. Two-point crossover was employed, with the crossover rate set as 0.8. Bit flip was used for mutation, with the mutation rate set as 0.05. The maximum number of generations was set as 500. In the last generation, 50 feature combinations were derived, corresponding to 50 final individuals, each represented by a 22-bit binary string.
The more frequently a feature appears in the GA individuals of the final population, the more important the feature is in providing separability for classification. Thus, for each feature and each measure, a score of importance was calculated by counting the number of times the feature was included in the 50 individuals and dividing by 50; the result is recorded in Table 3. Features with higher scores are more important for the purpose of classification. It can be seen from Table 3 that there is a clear gap between the important and unimportant features, with scores above 0.85 and below 0.15, respectively. The five shape features and the texture feature f9 possess high scores for all three measures (see Table 3). By running the GA multiple times, the same three combinations of shape, edge-sharpness, and texture features, containing the features with scores higher than 0.85, were derived with the three objective functions: FSalignment with all the five shape features, one edge-sharpness feature (CV), and one texture feature (f9); FSseparability with all the five shape features, one edge-sharpness feature (A), and one texture feature (f9); and FSdistance with all the five shape features, one edge-sharpness feature (A), and two texture features (f6 and f9).
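A simplified, self-contained version of the GA run and of the importance scoring is sketched below; the selection and reinsertion schemes are simplifications of those provided by the GA Toolbox, the feature ordering is the same as in the earlier encoding sketch, and measure is any of the three separability functions.

```python
import numpy as np

def run_ga(X, y, measure, pop_size=50, n_gen=500, gap=0.6, pc=0.8, pm=0.05, seed=0):
    """Simplified GA for feature selection with the settings reported in the paper."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.5           # 22-bit binary individuals

    def fitness(ind):
        if not (ind[:5].any() and ind[5:8].any() and ind[8:].any()):
            return -np.inf                               # at least one feature per category
        return measure(X[:, ind], y)

    for _ in range(n_gen):
        scores = np.array([fitness(ind) for ind in pop])
        ranks = scores.argsort().argsort() + 1.0         # linear ranking
        probs = ranks / ranks.sum()
        n_off = int(gap * pop_size)                      # generation gap of 0.6 -> 30 offspring
        parents = rng.choice(pop_size, size=(n_off, 2), p=probs)
        offspring = []
        for i, j in parents:
            child = pop[i].copy()
            if rng.random() < pc:                        # two-point crossover
                a, b = np.sort(rng.choice(n_feat, 2, replace=False))
                child[a:b] = pop[j, a:b]
            flip = rng.random(n_feat) < pm               # bit-flip mutation
            child[flip] = ~child[flip]
            offspring.append(child)
        # Elitist reinsertion: keep the fittest individuals among old and new.
        combined = np.vstack([pop, np.array(offspring)])
        comb_scores = np.array([fitness(ind) for ind in combined])
        pop = combined[np.argsort(comb_scores)[-pop_size:]]

    # Score of importance: fraction of final individuals that include each feature.
    return pop.mean(axis=0)
```

For example, run_ga(X, y, alignment) returns a length-22 vector of importance scores; features with scores above 0.85 correspond to the selected combination for that objective function.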
Table 3.
Scores of Importance Derived by GA with Different Objective Functions for Feature Selection
| Features | Alignment | Separability | Distance |
|---|---|---|---|
| C | **0.95** | **0.92** | **0.88** |
| SI | **0.96** | **0.94** | **0.93** |
| Fcc | **0.91** | **0.89** | **0.92** |
| FF | **0.92** | **0.92** | **0.96** |
| FD | **0.89** | **0.89** | **0.94** |
| Co | 0.09 | 0.06 | 0.06 |
| A | 0.06 | **0.93** | **0.95** |
| CV | **0.92** | 0.03 | 0.06 |
| f1 | 0.06 | 0.07 | 0.08 |
| f2 | 0.08 | 0.07 | 0.10 |
| f3 | 0.07 | 0.10 | 0.05 |
| f4 | 0.07 | 0.05 | 0.11 |
| f5 | 0.05 | 0.07 | 0.05 |
| f6 | 0.08 | 0.12 | **0.91** |
| f7 | 0.08 | 0.09 | 0.06 |
| f8 | 0.05 | 0.06 | 0.08 |
| f9 | **0.91** | **0.89** | **0.91** |
| f10 | 0.08 | 0.08 | 0.08 |
| f11 | 0.03 | 0.05 | 0.09 |
| f12 | 0.06 | 0.10 | 0.10 |
| f13 | 0.06 | 0.11 | 0.05 |
| f14 | 0.07 | 0.09 | 0.08 |
The bold-faced numbers represent significantly better performance figures
Pattern Classification with Selected Features
The following experiments were conducted with FSbest1, FSbest2, FSalignment, FSseparability, and FSdistance; with the individual shape, edge-sharpness, and texture feature sets; and, for comparison, with all of the 22 features combined without performing feature selection. The input features were normalized to have zero mean and unit variance before being used by a classifier. Classification performance is shown in terms of the area Az under the ROC curve and the corresponding standard error. Each ROC curve was generated by applying a sliding threshold to the output of the classifier with the LS-SVMlab1.5 toolbox.48
Both the leave-one-out (LOO) and half-half-random (HHR) split procedures were used to evaluate the generalized performance of the classifiers with the features of the 111 breast masses. For the LOO procedure, each mass was used as the test sample once and the remaining 110 masses were used as the training samples; thus, 111 training-test trials were conducted. For the HHR split procedure, 100 training-test trials were conducted. In each trial, 25 benign masses and 25 malignant tumors were selected at random as the training samples (a total of 50 training samples), and the remaining 61 masses were used as the test samples.
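The two evaluation protocols can be sketched as follows; fit and decision are caller-supplied wrappers around any of the classifiers described earlier (for example, the FLDA sketch), decision returns an array of decision values for the given samples, and the Az computation by a sliding threshold is approximated with scikit-learn's ROC area.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def loo_az(X, y, fit, decision):
    """Leave-one-out: each mass is the test sample once; Az is computed by sliding
    a threshold over the collected decision values."""
    scores = np.empty(len(y), dtype=float)
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        model = fit(X[train], y[train])
        scores[i] = decision(model, X[i:i + 1])[0]
    return roc_auc_score((y == 1).astype(int), scores)

def hhr_az(X, y, fit, decision, n_trials=100, n_per_class=25, seed=0):
    """Half-half-random splits: 25 masses per class for training, the rest for testing."""
    rng = np.random.default_rng(seed)
    idx_p, idx_n = np.where(y == 1)[0], np.where(y == -1)[0]
    azs = []
    for _ in range(n_trials):
        tr = np.concatenate([rng.choice(idx_p, n_per_class, replace=False),
                             rng.choice(idx_n, n_per_class, replace=False)])
        te = np.setdiff1d(np.arange(len(y)), tr)
        model = fit(X[tr], y[tr])
        azs.append(roc_auc_score((y[te] == 1).astype(int), decision(model, X[te])))
    return np.mean(azs), np.std(azs)
```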
Linear Classification
Nine different feature sets, including the shape, edge-sharpness, and texture feature sets; the whole set of 22 features; and the five selected combinations of shape, edge-sharpness, and texture features, were evaluated using FLDA, linear SVM (LSVM), and the linear S2SP (LS2SP) classifier with both the LOO and HHR split procedures. The corresponding classification accuracies in Az values are recorded in Tables 4 and 5. The classification accuracies of FLDA, LSVM, and the LS2SP classifier are comparable; FLDA and LSVM performed slightly better than the LS2SP classifier. The best linear classification performance achieved is 0.93 in Az value (see Tables 4 and 5).
Table 4.
The Classification Performance of Different Feature Sets Using Linear Classifiers with the LOO Procedure
| Feature Sets | LS2SP Az | LS2SP SE | FLDA Az | FLDA SE | LSVM Az | LSVM SE |
|---|---|---|---|---|---|---|
| Shape | 0.90 | 0.03 | 0.91 | 0.03 | 0.91 | 0.03 |
| Gradient | 0.67 | 0.05 | 0.68 | 0.05 | 0.65 | 0.06 |
| Texture | 0.64 | 0.05 | 0.65 | 0.05 | 0.65 | 0.05 |
| All | 0.88 | 0.03 | 0.88 | 0.03 | 0.89 | 0.03 |
| FSalignment | 0.92 | 0.03 | 0.93 | 0.02 | 0.93 | 0.02 |
| FSseparability | 0.92 | 0.02 | 0.92 | 0.03 | 0.92 | 0.03 |
| FSdistance | 0.92 | 0.03 | 0.92 | 0.03 | 0.92 | 0.03 |
| FSbest1 | 0.87 | 0.04 | 0.92 | 0.03 | 0.91 | 0.03 |
| FSbest2 | 0.89 | 0.03 | 0.93 | 0.02 | 0.91 | 0.03 |
SE = standard error
Table 5.
The Classification Performance of Different Feature Sets Using Linear Classifiers with the HHR Split Procedure
| Feature Sets | LS2SP Az | LS2SP SE | FLDA Az | FLDA SE | LSVM Az | LSVM SE |
|---|---|---|---|---|---|---|
| Shape | 0.91 | 0.03 | 0.91 | 0.03 | 0.92 | 0.02 |
| Gradient | 0.67 | 0.06 | 0.66 | 0.07 | 0.66 | 0.07 |
| Texture | 0.62 | 0.06 | 0.61 | 0.07 | 0.65 | 0.06 |
| All | 0.83 | 0.09 | 0.84 | 0.05 | 0.89 | 0.04 |
| FSalignment | 0.92 | 0.03 | 0.92 | 0.03 | 0.93 | 0.02 |
| FSseparability | 0.92 | 0.03 | 0.92 | 0.03 | 0.93 | 0.03 |
| FSdistance | 0.92 | 0.03 | 0.92 | 0.03 | 0.92 | 0.03 |
| FSbest1 | 0.87 | 0.04 | 0.89 | 0.03 | 0.91 | 0.03 |
| FSbest2 | 0.89 | 0.04 | 0.91 | 0.02 | 0.92 | 0.03 |
Each Az value shown is the average of the Az values over 100 HHR split classification experiments
SE = standard error
From Tables 4 and 5, the following observations can be made:
Shape features are the most significant features, with higher classification accuracy than the other two sets of edge-sharpness and texture features.
Without feature selection, the addition of edge-sharpness and texture features does not improve the classification performance obtained with the shape features.
Feature selection using GA based on data-dependent measures improves the classification performance over using the whole set of 22 features without feature selection.
The feature combinations selected by GA (FSalignment, FSseparability, and FSdistance) perform better than the combinations selected by ranking the features (FSbest1 and FSbest2).
However, shape features on their own provide classification performance similar to that provided by the feature combinations selected by GA.
Kernel-Based Classification
The same nine feature sets as listed in “Linear Classification” were evaluated using KFDA and the nonlinear versions of the SVM and the S2SP classifier, denoted by KSVM and KS2SP, respectively. The Gaussian kernel was employed to incorporate nonlinearity, with the form
$$K(\mathbf{x}_a, \mathbf{x}_b) = \exp\left( -\frac{\left\| \mathbf{x}_a - \mathbf{x}_b \right\|^2}{2\sigma^2} \right) \tag{28}$$
between two vectors xa and xb, where σ is the kernel width set by the user. The value of σ for each classifier was determined by cross-validation in this experiment. Both the LOO and HHR split procedures were used to evaluate the three kernel-based classifiers; the classification accuracies in Az values are recorded in Tables 6 and 7. The classification accuracies of KFDA, KSVM, and the KS2SP classifier are comparable (see Tables 6 and 7). Observations similar to those described in “Linear Classification” can also be made from Tables 6 and 7. The best nonlinear classification performance achieved is 0.95 in Az value.
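A sketch of selecting the kernel width by cross-validation, shown here for the KSVM with scikit-learn, is given below; the σ grid, the grid over the regularization parameter, and the fivefold scheme are assumptions rather than the exact protocol used in this study.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in for the 111-sample feature matrix and labels.
rng = np.random.default_rng(0)
X = rng.standard_normal((111, 7))
y = np.where(rng.random(111) < 0.4, 1, -1)

sigmas = np.logspace(-1, 1, 9)
param_grid = {"gamma": 1.0 / (2.0 * sigmas ** 2),     # gamma = 1 / (2 sigma^2), Eq. (28)
              "C": [0.1, 1.0, 10.0]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid,
                      scoring="roc_auc", cv=StratifiedKFold(n_splits=5))
search.fit(X, (y == 1).astype(int))
print(search.best_params_)
```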
Table 6.
The Classification Performance of Different Feature Sets Using the Kernel-based Classifiers with the LOO Procedure
| Feature Sets | KS2SP Az | KS2SP SE | KFDA Az | KFDA SE | KSVM Az | KSVM SE |
|---|---|---|---|---|---|---|
| Shape | 0.92 | 0.03 | 0.93 | 0.02 | 0.91 | 0.03 |
| Gradient | 0.68 | 0.05 | 0.69 | 0.05 | 0.65 | 0.06 |
| Texture | 0.74 | 0.05 | 0.68 | 0.05 | 0.66 | 0.05 |
| All | 0.92 | 0.03 | 0.92 | 0.03 | 0.92 | 0.03 |
| FSalignment | 0.95 | 0.02 | 0.93 | 0.02 | 0.94 | 0.02 |
| FSseparability | 0.94 | 0.02 | 0.93 | 0.02 | 0.94 | 0.02 |
| FSdistance | 0.94 | 0.02 | 0.93 | 0.02 | 0.93 | 0.02 |
| FSbest1 | 0.94 | 0.02 | 0.93 | 0.02 | 0.93 | 0.02 |
| FSbest2 | 0.94 | 0.02 | 0.93 | 0.02 | 0.94 | 0.02 |
SE = standard error
Table 7.
The Classification Performance of Different Feature Sets Using the Kernel-based Classifiers with the HHR Split Procedure
| Feature Sets | KS2SP Az | KS2SP SE | KFDA Az | KFDA SE | KSVM Az | KSVM SE |
|---|---|---|---|---|---|---|
| Shape | 0.92 | 0.03 | 0.92 | 0.03 | 0.93 | 0.02 |
| Gradient | 0.65 | 0.08 | 0.67 | 0.07 | 0.68 | 0.06 |
| Texture | 0.68 | 0.06 | 0.64 | 0.07 | 0.69 | 0.05 |
| All | 0.93 | 0.02 | 0.93 | 0.02 | 0.93 | 0.03 |
| FSalignment | 0.93 | 0.02 | 0.93 | 0.02 | 0.94 | 0.03 |
| FSseparability | 0.94 | 0.03 | 0.93 | 0.03 | 0.94 | 0.02 |
| FSdistance | 0.94 | 0.03 | 0.93 | 0.03 | 0.94 | 0.03 |
| FSbest1 | 0.94 | 0.03 | 0.94 | 0.02 | 0.94 | 0.02 |
| FSbest2 | 0.94 | 0.02 | 0.94 | 0.02 | 0.94 | 0.02 |
Each Az value shown is the average of the Az values over 100 HHR split classification experiments
SE = standard error
Comparative Analysis
We compared the classification performance of the three pairs of linear classifiers and their corresponding nonlinear versions with the Gaussian kernel (LS2SP/KS2SP, FLDA/KFDA, and LSVM/KSVM) and the performance of the selected combinations of shape, edge-sharpness, and texture features with that of the shape features on their own. Comparison of the ROC curves for pairs of the linear vs. kernel-based classifiers is shown in Figure 2 using the feature set FSbest2, as well as the ROC curves of the feature set FSalignment and the five shape features on their own, all evaluated with the LOO procedure. Comparison of the ROC curves obtained by the three kernel-based classifiers is shown in Figure 3 using the feature set FSbest2, evaluated with the LOO procedure. The average Az values over the nine feature sets with the LOO and HHR procedures, the training time for the feature set FSseparability with the HHR split procedure, and the number of parameters required to be specified for each classifier are recorded in Table 8.
Fig 2.
ROC curves of different pairs of classifiers (or feature sets) with the LOO procedure.
Fig 3.
ROC curves of KFDA, KSVM, and the KS2SP classifier using the feature set FSbest2 with the LOO procedure.
Table 8.
Comparison of the Linear and Kernel-based Classifiers Studied
| Classifiers | Az (LOO) | Az (HHR) | Training Time (s) | Parameters |
|---|---|---|---|---|
| LS2SP → KS2SP | 0.84 → 0.89 | 0.84 → 0.87 | 0.80 → 0.93 | 0 → 1(σ) |
| FLDA → KFDA | 0.86 → 0.87 | 0.84 → 0.87 | 0.65 → 0.68 | 0 → 1(σ) |
| LSVM → KSVM | 0.85 → 0.87 | 0.86 → 0.88 | 2.21 → 2.56 | 1(ξ) → 2(ξ, σ) |
The symbol “→” indicates the advantage to be gained or the additional cost as the linear version of a classifier is replaced by its kernel-based version
By using the ROCKIT software,49 we also performed the area test50 with the feature set FSbest2 to test the statistical significance of the differences between two ROC curves, for the various kernel-based classifiers (KS2SP, KFDA, and KSVM), for the linear vs. kernel-based classifiers, and for the selected vs. the solely shape-based feature sets, based on the LOO procedure. The area test is a univariate z-score test of the difference between the areas under the two ROC curves (i.e., the difference in the overall diagnostic performance of the two tests), for which the null hypothesis is that the data sets arose from binormal ROC curves with equal areas beneath them.51,52 The computed value of the correlated test statistic (observed z score), the corresponding two-tailed p value, and the approximate 95% confidence interval for the difference are recorded in Tables 9, 10, and 11 for different pairs of classifiers and feature sets. The following observations can be made from the tables and figures mentioned above:
The Gaussian kernel improves the performance of the linear classifiers, such as the LS2SP and LSVM, which indicates that advantages in classification accuracy may be gained by embedding kernel functions in the classifier (see Tables 8 and 10 and Fig. 2).
The selected combinations of shape, edge-sharpness, and texture features improve the classification performance as compared to that of the shape features on their own (see Tables 6, 7, and 11 and Fig. 2).
There is no statistically significant difference between the performance of the three kernel-based classifiers (see Tables 8 and 9 and Fig. 3).
Kernel-based classifiers require longer training times than the corresponding linear classifiers (see Table 8).
The SVM is slower to train than the S2SP classifier and FLDA/KFDA (see Table 8).
The S2SP classifier and FLDA/KFDA are more convenient for users than classifiers with a regularization parameter, such as SVMs, as there are fewer parameters required to be specified or predetermined (see Table 8).
Table 9.
Statistical Analysis of the Difference Between the ROC Curves of Pairs of Various Kernel-based Classifiers
| | KS2SP (Group 1) vs. KFDA (Group 2) | KS2SP (Group 1) vs. KSVM (Group 2) | KFDA (Group 1) vs. KSVM (Group 2) |
|---|---|---|---|
| 95% CI1 | [0.89, 0.97] | [0.89, 0.97] | [0.86, 0.97] |
| 95% CI2 | [0.86, 0.97] | [0.88, 0.97] | [0.88, 0.97] |
| z-score | 0.90 | −0.57 | −0.85 |
| p value | 0.37 | 0.57 | 0.39 |
| 95% CId | [−0.012, 0.033] | [−0.008, 0.004] | [−0.031, 0.012] |
CI1 denotes the asymmetric 95% CI for Az of the classifiers in group 1. CI2 denotes the asymmetric 95% CI for Az of the classifiers in group 2. CId denotes the approximate 95% CI for the difference of Az between each pair of classifiers
CI = confidence interval
Table 10.
Statistical Analysis of the Difference Between the ROC Curves of Pairs of Linear vs. Kernel-based Classifiers
| | LS2SP (Group 1) vs. KS2SP (Group 2) | FLDA (Group 1) vs. KFDA (Group 2) | LSVM (Group 1) vs. KSVM (Group 2) |
|---|---|---|---|
| 95% CI1 | [0.81, 0.94] | [0.87, 0.96] | [0.85, 0.96] |
| 95% CI2 | [0.89, 0.97] | [0.86, 0.97] | [0.88, 0.97] |
| z-score | −2.23 | −0.10 | −2.23 |
| p value | 0.03 | 0.92 | 0.03 |
| 95% CId | [−0.102, −0.007] | [−0.021, 0.019] | [−0.047,−0.003] |
CI1 denotes the asymmetric 95% CI for Az of the classifiers in group 1. CI2 denotes the asymmetric 95% CI for Az of the classifiers in group 2. CId denotes the approximate 95% CI for the difference of Az between each pair of classifiers
CI = confidence interval
Table 11.
Statistical Analysis of the Difference Between the ROC Curves of Pairs of Various Feature Sets
| | Shape (Group 1) vs. All (Group 2) | FSalignment (Group 1) vs. Shape (Group 2) | FSbest2 (Group 1) vs. Shape (Group 2) |
|---|---|---|---|
| 95% CI1 | [0.84, 0.96] | [0.90, 0.98] | [ 0.89, 0.97] |
| 95% CI2 | [0.85, 0.96] | [0.84, 0.96] | [0.84, 0.96] |
| z-score | −1.37 | 1.36 | 0.96 |
| p value | 0.17 | 0.18 | 0.34 |
| 95% CId | [−0.017, 0.003] | [−0.012, 0.065] | [−0.018, 0.052] |
CI1 denotes the asymmetric 95% CI for Az of the classifiers in Group 1. CI2 denotes the asymmetric 95% CI for Az of the classifiers in Group 2. CId denotes the approximate 95% CI for the difference of Az between each pair of classifiers
CI = confidence interval
Overall, the KS2SP classifier, based upon the S2SP classifier that we have recently proposed, provides the best performance with low computational cost and complexity of training.
Discussion and Conclusion
We have investigated and presented the results of classification of breast masses with a set of 111 regions in mammograms, with 46 related to malignant tumors and 65 to benign masses, each represented with 22 features including 5 shape factors, 3 edge-sharpness measures, and 14 texture features. Before the classification stage, feature selection independent of any classifier was performed by GA based on three measures of data separability, including alignment of the kernel with the target function, class separability, and normalized distance. Three linear classifiers, including the classical pattern classification method of FLDA, the popular SVM based on the maximal margin rule, and the S2SP classifier that we have recently developed, as well as their corresponding nonlinear versions with the Gaussian kernel, were employed to perform the classification task.
Sahiner et al.6 obtained Az = 0.87 ± 0.02 with a set of 249 films from 102 patients using a combination of morphological and texture features with stepwise feature selection and linear discriminant analysis; the result was improved to Az = 0.91 ± 0.02 when the leave-one-case-out discriminant scores from different views of a mass were combined to obtain a summary score. One of our own previous studies27 with a subset of 57 masses (from the set of 111 masses used in the present study), using a combination of shape, edge-sharpness, and texture features with the kernel partial least squares transformation, achieved classification accuracy in the range of Az = [0.99, 1.0] using FLDA, but with low robustness with respect to variations in the associated parameters. In the present study, with a more diverse data set of 111 masses, with the GA-selected feature combination FSalignment, the linear classification performance reached 0.93 in Az value using SVM, and the nonlinear classification performance reached 0.95 in Az value using the S2SP classifier with good robustness around the associated parameters.
Principal component analysis (PCA) is a well-known feature dimensionality reduction tool, which concentrates on significant projections of features; however, PCA does not always improve the classification performance of most classifiers. Furthermore, PCA does not identify specific features as the best-performing features. Compared with PCA, the feature selection methods used in the present study not only reduce the dimensionality of the features provided, but they also improve the classification performance of several classifiers and identify the strong features. Feature selection based on the performance of a specific classifier can determine a set of feature combinations with high classification performance; however, the results cannot always be extended to other classifiers, especially to nonlinear classifiers with different logic and structure. The methods used for feature selection in the present study are independent of the classifier; the selected combinations are suitable for use with several different classifiers. A limitation of the proposed approaches to feature selection is that the correlation existing between the given features is not accounted for, and consequently, the selected combinations of features may contain more than the minimal set of features that could provide similar classification performance.

The increased classification accuracy values obtained indicate that the incorporation of features representing multiple radiological characteristics, such as edge sharpness, texture, and shape, instead of shape alone, could lead to improved representation and analysis of breast masses in mammograms by using advanced kernel-based classifiers associated with feature selection using GA based on measures of data separability. The proposed methods should find application in computer-aided detection and diagnosis of breast cancer.
Acknowledgment
T. Mu would like to acknowledge financial support from the Overseas Research Students Awards Scheme, UK; the Hsiang Su Coppin Memorial Scholarship Fund; and the University of Liverpool, UK. We thank the Catalyst Program of Research Services and the University Research Grants Committee of the University of Calgary, Canada, and the Medical Research Council (the Interdisciplinary Bridging Awards), UK, for financial support.
References
- 1.Duijm L, Groenewoud JH, Jansen FH, Fracheboud J, Beek M, Koning HJ. Mammography screening in the Netherlands: delay in the diagnosis of breast cancer after breast cancer screening. Br J Cancer. 2004;91:1795–1799. doi: 10.1038/sj.bjc.6602158. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Breast Expert Workgroup, Cancer Detection Section, California Department of Health Service: Breast Cancer Diagnostic Algorithms for Primary Care Providers, 2005, 3rd edition. http://qap.sdsu.edu/screening/breastcancer/bda/pdf/CDS_Algorithms_2005a.pdf
- 3.Cady B, Chung M. Mammographic screening: no longer controversial. Am J Clin Oncol. 2005;28(1):1–4. doi: 10.1097/01.coc.0000150720.15450.05. [DOI] [PubMed] [Google Scholar]
- 4.Elmore JG, Armstrong K, Lehman CD, Fletcher SW. Screening for breast cancer. J Am Med Assoc. 2005;293(10):1245–1256. doi: 10.1001/jama.293.10.1245. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Bruce LM, Adhami RR. Classifying mammographic mass shapes using the wavelet transform modulus-maxima method. IEEE Trans Med Imaging. 1999;18(12):1170–1177. doi: 10.1109/42.819326. [DOI] [PubMed] [Google Scholar]
- 6.Sahiner BS, Chan HP, Petrick N, Helvie MA, Hadjiiski LM. Improvement of mammographic mass characterization using spiculation measures and morphological features. Med Phys. 2001;28(7):1455–1465. doi: 10.1118/1.1381548. [DOI] [PubMed] [Google Scholar]
- 7.Mudigonda NR, Rangayyan RM, Desautels JEL. Gradient and texture analysis for the classification of mammographic masses. IEEE Trans Med Imaging. 2000;19(10):1032–1043. doi: 10.1109/42.887618. [DOI] [PubMed] [Google Scholar]
- 8.Mudigonda NR, Rangayyan RM, Desautels JEL. Detection of breast masses in mammograms by density slicing and texture flow field analysis. IEEE Trans Med Imaging. 2001;20(12):1215–1227. doi: 10.1109/42.974917. [DOI] [PubMed] [Google Scholar]
- 9.Pisano E Ed.: Proceedings of the 7th International Workshop on Digital Mammography. Durham, NC, June 2004
- 10.Guo Y, Sivaramakrishna R, Lu Ch, Suri JS, Laxminarayan S. Breast image registration techniques: a survey. Med Biol Eng Comput. 2006;44(1–2):15–26. doi: 10.1007/s11517-005-0016-y. [DOI] [PubMed] [Google Scholar]
- 11.Morton MJ, Whaley DW, Brandt KR, Amrami KK. Screening mammograms: interpretation with computer-aided detection - prospective evaluation. Radiology. 2006;239:375–383. doi: 10.1148/radiol.2392042121. [DOI] [PubMed] [Google Scholar]
- 12.Oliver A, Freixenet J, Marti R, Pont J, Perez E, Denton ERE, Zwiggelaar R: A novel breast tissue density classification methodology. IEEE Trans Inf Technol Biomed DOI: 10.1109/TITB.2007.903514, 2007 [DOI] [PubMed]
- 13.Rangayyan RM, Nguyen TM. Fractal analysis of contours of breast masses in mammograms. J Digit Imaging. 2007;20(3):223–237. doi: 10.1007/s10278-006-0860-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Rangayyan RM, El-Faramawy NM, Desautels JEL, Alim OA. Measures of acutance and shape for classification of breast tumors. IEEE Trans Med Imaging. 1997;16(6):799–810. doi: 10.1109/42.650876. [DOI] [PubMed] [Google Scholar]
- 15.Rangayyan RM, Mudigonda NR, Desautels JEL. Boundary modelling and shape analysis methods for classification of mammographic masses. Med Biol Eng Comput. 2000;38(5):487–496. doi: 10.1007/BF02345742. [DOI] [PubMed] [Google Scholar]
- 16.Cascio D, Fauci F, Magro R, Raso G, Bellotti R, Carlo F, Tangaro S, Nunzio G, Quarta M, Forni G, Lauria A, Fantacci ME, Retico A, Masala GL, Oliva P, Bagnasco S, Cheran SC, Torres EL. Mammogram segmentation by contour searching and mass lesions classification with neural network. IEEE Trans Nucl Sci. 2006;53(5):2827–2833. doi: 10.1109/TNS.2006.878003. [DOI] [Google Scholar]
- 17.Doi K. Diagnostic imaging over the last 50 years: research and development in medical imaging science and technology. Phys Med Biol. 2006;51:R5–R27. doi: 10.1088/0031-9155/51/13/R02. [DOI] [PubMed] [Google Scholar]
- 18.Doi K. Computer-aided diagnosis in medical imaging: historical review, current status and future potential. Comput Med Imaging Graph. 2007;31:198–211. doi: 10.1016/j.compmedimag.2007.02.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Rangayyan RM, Ayres FJ, Desautels JEL. A review of computer-aided diagnosis of breast cancer: toward the detection of subtle signs. J Franklin Inst. 2007;344(3–4):312–348. doi: 10.1016/j.jfranklin.2006.09.003. [DOI] [Google Scholar]
- 20.Domínguez AR, Nandi AK. Improved dynamic-programming-based algorithms for segmentation of masses in mammograms. Med Phys. 2007;34(11):4256–4269. doi: 10.1118/1.2791034. [DOI] [PubMed] [Google Scholar]
- 21.El-Naqa I, Yang Y, Wernick MN, Galatsanos NP, Nishikawa RM. A support vector machine approach for detection of microcalcifications in mammograms. IEEE Trans Med Imaging. 2002;21(12):1552–1563. doi: 10.1109/TMI.2002.806569. [DOI] [PubMed] [Google Scholar]
- 22.Wei L, Yang Y, Nishikawa RM, Wernick MN, Edwards A. Relevance vector machine for automatic detection of clustered microcalcifications. IEEE Trans Med Imaging. 2005;24(10):1278–1285. doi: 10.1109/TMI.2005.855435. [DOI] [PubMed] [Google Scholar]
- 23.André TCSS, Rangayyan RM. Classification of breast masses in mammograms using neural networks with shape, edge sharpness, and texture features. J Electron Imaging. 2006;15(1):1–10. doi: 10.1117/1.2178271. [DOI] [Google Scholar]
- 24.Alto H, Rangayyan RM, Desautels JEL. Content-based retrieval and analysis of mammographic masses. J Electron Imaging. 2005;14(2):1–17. doi: 10.1117/1.1902996. [DOI] [Google Scholar]
- 25.Nandi RJ, Nandi AK, Rangayyan RM, Scutt D. Classification of breast masses in mammograms using genetic programming and feature selection. Med Biol Eng Comput. 2006;44(8):693–694. doi: 10.1007/s11517-006-0077-6. [DOI] [PubMed] [Google Scholar]
- 26.Wei J, Chan H-P, Sahiner B, Hadjiiski LM, Helvie MA, Roubidoux MA, Zhou C, Ge J. Dual system approach to computer-aided detection of breast masses on mammograms. Med Phys. 2006;33(11):4157–4168. doi: 10.1118/1.2357838. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Mu T, Nandi AK, Rangayyan RM. Classification of breast masses via nonlinear transformation of features based on a kernel matrix. Med Biol Eng Comput. 2007;45(8):769–780. doi: 10.1007/s11517-007-0211-0. [DOI] [PubMed] [Google Scholar]
- 28.Homer MJ. Mammographic Interpretation: A Practical Approach. 2. Boston: McGraw-Hill; 1997. [Google Scholar]
- 29.Sahiner BS, Chan H-P, Petrick N, Helvie MA, Goodsitt MM. Computerized characterization of masses on mammograms: the rubber band straightening transform and texture analysis. Med Phys. 1998;25(4):516–526. doi: 10.1118/1.598228. [DOI] [PubMed] [Google Scholar]
- 30.Buckles BP. Genetic Algorithms. Los Alamitos: IEEE Computer Society Press; 1992. [Google Scholar]
- 31.Cristianini N, Kandola J, Elisseeff A, Shawe-Taylor J: On Optimizing Kernel Alignment. Technical Report NC-TR-01-087. London: Royal Holloway University of London, 2001.
- 32.Xiong H, Swamy MNS, Ahmad MO. Optimizing the kernel in the empirical feature space. IEEE Trans Neural Netw. 2005;16(2):460–474. doi: 10.1109/TNN.2004.841784. [DOI] [PubMed] [Google Scholar]
- 33.Swain PH. Fundamentals of pattern recognition in remote sensing. In: Swain PH, Davis SM, editors. Remote Sensing: The Quantitative Approach. New York: McGraw-Hill; 1978. pp. 136–187. [Google Scholar]
- 34.Aizerman M, Braverman E, Rozonoer L. Theoretical foundations of the potential function method in pattern recognition learning. Autom Remote Control. 1964;25:821–837. [Google Scholar]
- 35.Boser BE, Guyon IM, Vapnik VN: A training algorithm for optimal margin classifiers. In: Proc. of the 5th Annual ACM Workshop on Computational Learning Theory, 1992, pp 144–152
- 36.Fisher RA. The use of multiple measurements in taxonomic problems. Ann Eugen. 1936;7(2):179–188. [Google Scholar]
- 37.Mika S, Rätsch G, Weston J, Schölkopf B, Muller K: Fisher discriminant analysis with kernels. In: Proc. of IEEE Neural Networks for Signal Processing Workshop, 1999, pp 41–48
- 38.Mu T, Nandi AK, Rangayyan RM: Strict 2-surface proximal classifier with application to breast cancer detection in mammograms. In: Proc. of the 32nd Int’l Conf. on Acoustics, Speech, and Signal Processing, ICASSP, volume 2, Honolulu, HI, April 2007, pp 477–480
- 39.Mu T, Nandi AK, Rangayyan RM: Strict 2-surface proximal classification of knee-joint vibroarthrographic signals. In: Proc. of the 29th Annual Int’l Conf. of the IEEE Engineering in Medicine and Biology Society, EMBC, Lyon, France, August 2007, pp 4911–4914 [DOI] [PubMed]
- 40.Alberta Cancer Board: Screen Test: Alberta Program for the Early Detection of Breast Cancer. 2001/03 Biennial Report, Edmonton, Alberta, Canada, 2004. http://www.cancerboard.ab.ca/screentest/downloads/screentest_biennial_2001-03.pdf
- 41.The Mammographic Image Analysis Society digital mammogram database. Technical report. Imaging Science and Biomedical Engineering (ISBE): http://www.isbe.man.ac.uk
- 42.Haralick RM, Shanmugam K, Dinstein I. Textural features for image classification. IEEE Trans Syst Man Cybern SMC. 1973;3(6):610–622. doi: 10.1109/TSMC.1973.4309314. [DOI] [Google Scholar]
- 43.Chipperfield AJ, Fleming PJ, Pohlheim H, Fonseca CM. Genetic Algorithm Toolbox for use with MATLAB (version 1.2) Sheffield: University of Sheffield; 1994. [Google Scholar]
- 44.Duda RO, Hart PE, Stork DG. Pattern Classification. 2. New York: Wiley and Sons; 2001. [Google Scholar]
- 45.Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20(3):273–297. [Google Scholar]
- 46.Mangasarian OL, Wild EW. Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE Trans Pattern Anal Mach Intell. 2006;28:69–74. doi: 10.1109/TPAMI.2006.17. [DOI] [PubMed] [Google Scholar]
- 47.Shawe-Taylor J, Cristianini N. Kernel Methods for Pattern Analysis. Cambridge: Cambridge University Press; 2004. [Google Scholar]
- 48.Pelckmans K, Suykens JAK, Gestel TV, Brabanter JD, Lukas L, Hamers B, Moor BD, Vandewalle J: LS-SVM lab1.5: Least squares support vector machines. Katholieke University, Leuven, Belgium, 2003. Available at http://www.esat.kuleuven.ac.be/sista/lssvmlab/toolbox.html
- 49.Metz CE: ROCKIT 1.1B2 Beta version for WINDOWS operating system. Kurt Rossmann Laboratories, Department of Radiology, The University of Chicago, 2006. Available at http://www-radiology.uchicago.edu/krl/KRL_ROC/software_index6.htm
- 50.Metz CE, Wang P-L, Kronman HB. A new approach for testing the significance of differences between ROC curves measured from correlated data. In: Deconinck F, editor. Information Processing in Medical Imaging. The Hague: Martinus Nijhoff; 1984. pp. 432–445. [Google Scholar]
- 51.Metz CE, Herman BA, Shen J-H. Maximum-likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Stat Med. 1998;17:1033–1053. doi: 10.1002/(SICI)1097-0258(19980515)17:9<1033::AID-SIM784>3.0.CO;2-Z. [DOI] [PubMed] [Google Scholar]
- 52.Dorfman DD, Alf E. Maximum likelihood estimation of parameters of signal detection theory and determination of confidence intervals - rating-method data. J Math Psychol. 1969;6:487–496. doi: 10.1016/0022-2496(69)90019-4. [DOI] [Google Scholar]