Abstract
Background:
Developing computer-aided diagnosis (CAD) schemes of mammograms to classify between malignant and benign breast lesions has attracted broad research interest over the last several decades. However, unlike radiologists, who make diagnostic decisions based on the fusion of image features extracted from multi-view mammograms, most CAD schemes are single-view-based, which limits CAD performance and clinical utility.
Purpose:
This study aims to develop and test a novel CAD framework that optimally fuses information extracted from ipsilateral views of bilateral mammograms using both deep transfer learning and radiomics feature extraction methods.
Methods:
An image dataset including 353 benign and 611 malignant cases is assembled. Each case contains four images, the craniocaudal (CC) and mediolateral oblique (MLO) views of the left and right breasts. First, we extract four matching regions of interest (ROIs): two surrounding the centers of the suspicious lesion regions seen in the CC and MLO views, and two matching ROIs at the corresponding locations in the contralateral breast. Next, handcrafted radiomics features and VGG16 model-generated automated features are extracted from each ROI, resulting in eight feature vectors. Then, after reducing feature dimensionality and quantifying the bilateral asymmetry of the matched ROIs to yield four new feature vectors, we test four fusion methods to build three support vector machine (SVM) classifiers by optimal fusion of the asymmetrical image features extracted from the four view images.
Results:
Using a 10-fold cross-validation method, results show that an SVM classifier trained using the optimal fusion of all four view images yields the highest classification performance (AUC = 0.876±0.031), significantly outperforming SVM classifiers trained using one projection view alone (AUC = 0.817±0.026 and 0.792±0.026 for the CC and MLO views of bilateral mammograms, respectively; p<0.001).
Conclusions:
The study demonstrates that the shift from single-view CAD to four-view CAD and the inclusion of both deep transfer learning and radiomics features significantly increases CAD performance in distinguishing between malignant and benign breast lesions.
Keywords: Computer-aided diagnosis (CAD), multi-view CAD scheme, breast cancer, breast lesion classification, multi-view image feature analysis, deep learning, radiomics features, image feature fusion
1. Introduction
Early detection of breast cancer is critical for improving the efficacy of cancer treatment and patient survival. For the last several decades, routine population-based mammography screening has played a crucial role in early cancer detection and is one of the primary reasons for the decrease in breast cancer mortality.1 Despite the widespread use of mammography and, more recently, digital breast tomosynthesis (DBT), the efficacy of these population-based breast cancer screening exams remains low and controversial due to high false-positive recall and benign biopsy rates.2 As a result, decreasing false-positive rates is a pressing clinical issue, as false-positive recalls lead to unnecessary biopsies, which often have long-term psychological consequences for patients in addition to imposing an economic burden on society.3
Thus, to help radiologists more accurately detect suspicious breast lesions and distinguish between malignant and benign lesions, computer-aided detection (CADe) and diagnosis (CADx) schemes have been extensively developed over the last several decades. Currently, commercialized CADe schemes are used in clinical practice to assist radiologists in detecting suspicious lesions while reading mammograms.4 However, whether using CADe adds real clinical value remains questionable5 due to the high number of false-positive detections.6,7 On the other hand, CADx schemes, which aim to help radiologists more accurately classify malignant and benign breast lesions detected on mammograms and thereby reduce the false-positive recall rate and the number of benign biopsies, have not yet been accepted or applied in clinical practice. Difficulties with current mammogram-based CAD systems (both CADe and CADx schemes) arise from (a) the low contrast intrinsic to X-ray mammography, which requires various pre-processing methods, (b) drastic differences in the spatial location and appearance of suspicious lesions (i.e., soft tissue masses), which make it difficult to obtain a large and diverse training dataset, and (c) the plethora of breast segmentation schemes with no consensus on the best method to use.7 Therefore, there is still a need to improve the performance of mammography-based CAD systems and the manner in which these systems are employed.6, 7
In a typical mammography screening exam, two projection images are taken of each breast, namely a craniocaudal (CC) view and a mediolateral oblique (MLO) view, resulting in four images per screening exam (left-CC (LCC), right-CC (RCC), left-MLO (LMLO), and right-MLO (RMLO)) (Figure 1). When a radiologist reads mammograms from one screening exam, he/she combines information from all four view images to decide whether a suspicious lesion is present and whether the presented lesion is malignant or benign (or whether the patient should be recalled for an additional exam or biopsy). However, most existing CAD schemes, including commercialized CADe schemes, are either single image-based (CADe) or lesion-based (CADx) schemes that only analyze information or image features extracted from a single view image. Thus, while radiologists use information from multiple view images when interpreting mammograms, most CAD systems only consider information from one view. This is thought to be one major reason that limits the performance of current CAD schemes, particularly their capability to reduce false-positive detections (for CADe schemes) and classify lesions (for CADx schemes). As a result, this has led to an increase in research focused on exploring new technologies and approaches that effectively compute matched multi-view information or image features from multiple mammograms and optimally fuse the multi-view image features to build new multi-view image-based CAD schemes.6–8
Figure 1:
Example of the CC and MLO projection views taken in mammography, which are named as (A) RCC, (B) LCC, (C) RMLO, and (D) LMLO images, respectively.
Several approaches to developing multi-view CAD schemes have been proposed and reported in the literature, and they can be divided into three categories. The first and most popular method uses ipsilateral views by fusing information from the CC and MLO views of one breast, which allows for the extraction of image characteristics that may be obscured by dense overlapping fibro-glandular tissue in one projection view but visible in the other. The second method uses bilateral views by fusing information from the same projection view of the left and right breasts, which allows for quantification of breast tissue asymmetry (i.e., parenchymal distortions or changes in contrast). This method mimics how radiologists pay careful attention to the asymmetry between bilateral breasts, as high asymmetry is often a good indicator of breast cancer and the locations of asymmetry often contain suspicious lesions (i.e., masses).9 The third method uses both the ipsilateral and bilateral views by fusing information from all four images, which aims to take advantage of methods one and two.
However, developing multi-view image-based CAD schemes often faces several challenges, including a difficult image registration task. Mammography exams require the breast to be compressed; as a result, simple rigid or affine transformation techniques cannot properly model the anatomical deformations present in the compressed breast. One technique commonly used to accomplish this non-rigid registration task is a free-form deformation (FFD) field parametrized by a B-spline control point mesh.10 However, many studies bypass this difficult registration task by using basic subtraction techniques without image registration and ignoring the mismatch between breast regions, which results in inaccurate asymmetry quantification.11, 12 Other studies do not quantify the bilateral asymmetry at all and simply use the whole breast images of the bilateral breasts independently. Despite these difficulties, previous studies have shown that regardless of whether two view images (from only ipsilateral or only bilateral views) or a combination of four view images are used, multi-view CAD schemes consistently outperform single-view CAD schemes, indicating that the different projection views of the bilateral breasts provide additional useful information for detecting and classifying suspicious breast lesions on mammograms.13
The jump from single-view to multi-view CAD schemes introduces another issue, as developers must also consider how to efficiently extract and fuse information from multiple input images. Traditionally, a set of handcrafted radiomics features would be extracted from the input image and then used to train a machine learning classifier. More recently, deep learning models have been used to extract information directly from the input image based on learned representations of a target domain. While deep learning-based CAD schemes tend to outperform conventional machine learning-based CAD schemes, they require extremely large amounts of input data for adequate training and testing, which is often not available in the medical imaging domain. Hence, handcrafted radiomics feature extraction is still a relevant technique. Additionally, handcrafting specific radiomics features benefits from prior knowledge of the domain, meaning image characteristics that are known to be relevant to the task can be quantified through mathematical models and used as image features. On the other hand, deep learning-based features thrive where traditional features fall short, since they can capture patterns that may not be distinguishable by the human eye and therefore cannot be quantified by a handcrafted mathematical model. Several studies have investigated the potential advantages of combining handcrafted radiomics features with automated deep learning-based features to improve model classification performance.14–16 However, there is no consensus on the best way to fuse the information extracted from multiple input images using multiple feature extraction methods.
As outlined above, the three main considerations when developing CAD of mammograms are (1) single-view or multi-view schemes, (2) multi-view schemes based on ipsilateral-view analysis, bilateral-view analysis, or both, and (3) schemes using traditional radiomics features or deep learning-generated features. Previous studies demonstrate that multi-view CAD models tend to outperform single-view CAD models,8 that adding information from the contralateral breast to quantify bilateral asymmetry increases model performance,14–16 and that the fusion of handcrafted radiomics features and deep learning features outperforms either method alone when classifying suspicious breast lesions.17, 18 However, to the best of our knowledge, no existing work combines these three points into a single framework. To address these challenges, we hypothesize that (1) the automated features generated by a deep transfer learning model and handcrafted radiomics features contain complementary information, and optimal fusion of these two types of features can improve CAD performance in tumor classification, and (2) a four-view image-based CAD scheme can yield significantly higher tumor classification performance than one- or two-view image-based CAD schemes. In the rest of this paper, CAD scheme refers to a CADx scheme for lesion diagnosis or classification. To test our hypotheses, this study systematically investigates and compares the advantages and limitations of fusing deep learning-generated features and traditional radiomics-based features to develop multi-view CAD schemes.
Specifically, this study focuses on the following issues: (1) identifying and extracting matched regions of interest (ROIs) from four mammograms, (2) exploring and computing a new type of image feature to represent bilateral image feature asymmetry, and (3) training and testing machine learning classifiers using different image feature fusion methods, namely feature-level and output-level fusion techniques. Through these investigations, the goal of this study is to demonstrate the feasibility of developing a new optimal case-based CAD framework to classify suspicious breast lesions, which fuses both handcrafted radiomics (HCR) features and deep transfer learning (DTL) features computed from the four CC and MLO view mammograms of the left and right breasts. The detailed description of the technical development of this framework is presented in the following sections.
2. Materials and Methods
2.1. Image Dataset
The dataset used in this study is assembled from a large de-identified image database of full-field digital mammograms (FFDM). These FFDM images were acquired under an institutional review board approved image collection protocol using a Hologic Selenia digital mammography machine (Hologic Inc., Bedford, MA, USA) from 2008 to 2014. Detailed image and dataset characteristics can be found in our previous studies.11 In brief, the original FFDM images are either 3,325×4,095 or 2,555×3,325 pixels with a pixel size of 0.07 mm. To develop CAD schemes for mass-type lesion detection or classification, a 5×5 pixel averaging kernel is applied to subsample each original FFDM image. As a result, the FFDM images are reduced to 665×819 or 511×665 pixels with a pixel size of 0.35 mm. In this study, we selected all available cases that have four FFDM images representing the CC and MLO views of the left and right breasts. Each case contains one suspicious mass-based lesion that has been marked by a radiologist and proven by biopsy as malignant or benign. Cases were excluded if the mass was not visualized in both the CC and MLO views.
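As an illustration of this subsampling step, the sketch below implements block averaging with a 5×5 kernel on a 2-D NumPy array; the function name and the cropping behavior for image sizes that are not exact multiples of five are our assumptions.

```python
import numpy as np

def subsample_5x5(image: np.ndarray) -> np.ndarray:
    """Subsample an FFDM image by averaging non-overlapping 5x5 pixel blocks.

    A 3,325x4,095 (or 2,555x3,325) image with 0.07 mm pixels becomes
    665x819 (or 511x665) with 0.35 mm pixels, as described above.
    """
    h, w = image.shape
    # Crop to a multiple of 5 in each dimension so the 5x5 blocks tile exactly.
    h5, w5 = (h // 5) * 5, (w // 5) * 5
    blocks = image[:h5, :w5].reshape(h5 // 5, 5, w5 // 5, 5)
    return blocks.mean(axis=(1, 3))
```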
To confirm that the mass seen on the CC and MLO views is the same mass, an ipsilateral matching process is applied. Prior to ipsilateral matching, background artifacts are removed by first converting the image to a binary image using an Otsu thresholding method and then creating a mask based on the external contour.19 The mask is applied to the original image, resulting in an image of the whole breast region with all background artifacts removed. The first step of the ipsilateral matching process is to identify the location of the pectoral muscle in both views. The pectoral muscle is often not visible on the CC view; therefore, the location of the pectoral muscle on the CC view is defined as a vertical line parallel to the edge of the image. To identify the pectoral muscle on MLO images, a straight-line approximation is made based on the average gradient, as adopted from a previous study.20 The pectoral muscle location is then used to identify the nipple location in both CC and MLO images following the method developed in a previous study.21
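The exact contour-based masking follows reference 19; the snippet below is only a rough approximation of the background-removal step, assuming scikit-image and treating the largest connected component above the Otsu threshold as the breast region.

```python
import numpy as np
from skimage import filters, measure, morphology

def remove_background(image: np.ndarray) -> np.ndarray:
    """Suppress background artifacts: Otsu threshold, keep the largest
    connected component (assumed to be the breast), and mask everything else."""
    binary = image > filters.threshold_otsu(image)
    labels = measure.label(binary)
    counts = np.bincount(labels.ravel())
    counts[0] = 0                       # ignore the background label
    breast = labels == counts.argmax()  # largest foreground component
    mask = morphology.binary_closing(breast, morphology.disk(5))
    return np.where(mask, image, 0)
```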
Once these landmarks are identified, ipsilateral matching is conducted based on an existing method.22 Briefly, the centerline is first defined as the line perpendicular to the pectoral muscle that passes through the nipple. Next, the mass is projected onto the centerline and the distance between the projected mass and the nipple is calculated (Figure 2). If the absolute difference between this distance in the CC view and in the MLO view is less than 100 pixels, then the two masses are considered ipsilaterally matched. In the subsampled images, 100 pixels represents 35 mm, which should comfortably match both small and large lesions while accounting for bias introduced by the radiologist when marking the center of each lesion.23 Masses that could not be matched ipsilaterally are discarded.
Figure 2:
Example of the ipsilateral matching scheme. The location of the pectoral muscle is drawn in light gray, the location of the nipple is shown by a gray star, and the centerline is drawn in white. The LMLO and LCC images in this case each contained one suspicious lesion as marked with a white circle. After the centerline is drawn, the mass is projected onto the centerline (black circle) and the distance to the nipple is calculated. For this case, the distance was 157.93 pixels for the LMLO view and 155.00 pixels for the LCC view. Since the absolute difference between these values is less than 100, we consider this mass to be ipsilaterally matched.
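A minimal sketch of this matching rule is given below; it assumes 2-D pixel coordinates for the nipple and the radiologist-marked mass center plus a direction vector for the pectoral-muscle line, and the function names are hypothetical.

```python
import numpy as np

def nipple_to_mass_distance(nipple, pectoral_dir, mass):
    """Distance (pixels) from the nipple to the mass center projected onto the
    centerline, i.e. the line through the nipple perpendicular to the pectoral
    muscle line whose direction vector is `pectoral_dir`."""
    nipple, mass = np.asarray(nipple, float), np.asarray(mass, float)
    d = np.asarray(pectoral_dir, float)
    axis = np.array([-d[1], d[0]]) / np.linalg.norm(d)  # unit centerline direction
    return abs(np.dot(mass - nipple, axis))             # length of the scalar projection

def is_ipsilaterally_matched(dist_cc, dist_mlo, tol_pixels=100):
    """Two masses match if their centerline distances differ by <100 pixels (35 mm)."""
    return abs(dist_cc - dist_mlo) < tol_pixels
```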
2.2. A new multi-view CAD framework
Figure 3 is a visual representation of the proposed multi-view CAD framework developed and tested in this study. In this figure, we assume that a suspicious lesion is visually detected on the FFDM images of the right breast. Thus, two suspicious lesion regions (ROIs) are located and extracted on the CC and MLO view images of the right breast, defined as the RCC and RMLO images in the top row of the figure. Then, multiple CAD image processing and feature analysis steps are applied to build machine learning classifiers that predict the likelihood of the queried suspicious lesion being malignant. The details of all CAD steps are described below.
Figure 3:
Flowchart of the framework of this study
2.2.1. Extraction of matching regions in four views
As shown in Figure 3, after ipsilateral matching, each case is represented by four images: two ipsilateral views of the same suspicious lesion and two ipsilateral views of the contralateral breast without a lesion. To quantify the bilateral asymmetry of image features computed between a lesion region on one breast and the matched normal breast tissue region on the other breast, we perform bilateral image registration to identify and extract the matched regions of interest (ROIs) from the bilateral mammograms, yielding two pairs of registered ROIs (LCC-RCC and LMLO-RMLO).
Before registration, all right breast images are mirrored so that the orientations of the left and right breasts are the same. Bilateral registration is conducted using a multiresolution B-spline transformation that optimizes the Mattes mutual information metric.24 The registration method is implemented using SimpleITK of the Insight Toolkit (ITK) in Python.25 For this study, the mammogram containing a suspicious lesion is used as the fixed image while the contralateral mammogram is used as the moving image. Registration is conducted in this manner so that the annotations of the center of the suspicious lesions remain accurate. Once the images are bilaterally registered, four matched ROIs of 64×64 pixels are extracted: two surrounding the center of the lesion region on the two ipsilateral views of the same breast and two matching ROIs at the corresponding locations on the two ipsilateral views of the contralateral breast (Figure 4).
Figure 4:
Example of bilateral registration and ROI extraction. The top row displays the B-spline transformation via checkerboard visualization. (A) is the unregistered CC images, (B) is the registered CC images, (C) is the unregistered MLO images, and (D) is the registered MLO images. The middle row displays the registered bilateral images for the (E) RCC, (F) LCC, (G) RMLO and (H) LMLO view. The black bounding boxes show the 64×64 pixel ROI extracted surrounding the center of the lesion as marked by a radiologist. White boxes seen in the contralateral images show the location in which the corresponding ROI is extracted after bilateral registration. The bottom row shows the extracted ROIs from the corresponding view above it.
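The sketch below shows how such a registration can be set up in SimpleITK; the B-spline mesh size, histogram bins, optimizer, and pyramid settings are our assumptions for illustration and are not specified in the paper.

```python
import SimpleITK as sitk

def register_bilateral(fixed: sitk.Image, moving: sitk.Image) -> sitk.Image:
    """Multiresolution B-spline registration of the mirrored contralateral
    (moving) image onto the lesion-side (fixed) image, optimizing Mattes
    mutual information, then resampling the moving image onto the fixed grid."""
    fixed = sitk.Cast(fixed, sitk.sitkFloat32)
    moving = sitk.Cast(moving, sitk.sitkFloat32)

    # Coarse B-spline control-point mesh; the mesh size is assumed.
    tx = sitk.BSplineTransformInitializer(fixed, transformDomainMeshSize=[8, 8])

    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
    reg.SetOptimizerAsLBFGSB(gradientConvergenceTolerance=1e-5, numberOfIterations=100)
    reg.SetInterpolator(sitk.sitkLinear)
    reg.SetInitialTransform(tx, inPlace=True)
    reg.SetShrinkFactorsPerLevel([4, 2, 1])      # three-level multiresolution pyramid
    reg.SetSmoothingSigmasPerLevel([2, 1, 0])

    final_tx = reg.Execute(fixed, moving)
    return sitk.Resample(moving, fixed, final_tx, sitk.sitkLinear, 0.0)
```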
2.2.2. Handcrafted Radiomics Feature Extraction and Reduction
Forty-five handcrafted radiomics (HCR) features are first computed from each ROI independently. These include 7 first-order statistical features that describe the intensity distribution of the image with no spatial information, plus 12 gray-level co-occurrence matrix (GLCM) derived features and 22 gray-level run-length matrix (GLRLM) derived features that describe the spatial distribution of the varying intensities. Additionally, four Gabor features are extracted, since these features are known to be extremely useful for mammography texture analysis: Gabor filters have optimal joint resolution in the spatial and frequency domains (the Heisenberg limit), so the features are able to overcome the intrinsic low resolution and high noise of mammography images.26 A Gabor filter bank of 16 filters is created from combinations of the following parameters: spatial frequency of the harmonic function of 0.05 or 0.25, four orientations, and standard deviation of the Gaussian kernel of 1 or 3. Each image is convolved with each filter and the mean, variance, energy, and entropy are calculated from the filtered image. Then, the mean of each feature over the 16 filtered images is taken, resulting in four Gabor features per image.
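The following sketch shows one way to compute the four Gabor features with scikit-image; the choice of four evenly spaced orientations, the 64-bin entropy estimate, and the sum-of-squares energy definition are assumptions, while the frequencies and sigmas come from the text.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import gabor_kernel

def gabor_features(roi: np.ndarray) -> np.ndarray:
    """Mean, variance, energy, and entropy of the ROI filtered by a 16-filter
    Gabor bank (2 frequencies x 4 orientations x 2 sigmas), averaged over filters."""
    stats = []
    for frequency in (0.05, 0.25):
        for theta in np.arange(4) * np.pi / 4:            # assumed orientations
            for sigma in (1, 3):
                kernel = np.real(gabor_kernel(frequency, theta=theta,
                                              sigma_x=sigma, sigma_y=sigma))
                filtered = ndi.convolve(roi.astype(float), kernel, mode='wrap')
                counts, _ = np.histogram(filtered, bins=64)
                p = counts[counts > 0] / counts.sum()
                stats.append([filtered.mean(),
                              filtered.var(),
                              np.sum(filtered ** 2),       # energy
                              -np.sum(p * np.log2(p))])    # entropy
    return np.asarray(stats).mean(axis=0)                  # 4 Gabor features per ROI
```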
After HCR feature extraction, four feature vectors are created, namely LCC-HCR, RCC-HCR, LMLO-HCR, and RMLO-HCR, each containing 45 features. Next, each feature f is independently normalized using the following equation:

f_norm = (f − (μ − 2σ)) / (4σ)

where μ is the mean value of the feature over all cases and σ is the standard deviation. If f_norm > 1, it is assigned 1, while if f_norm < 0, it is assigned 0. In this way, we can avoid the possible impact of outlier feature values in the dataset.
Next, the bilateral asymmetry features are computed as the absolute difference between two matched features extracted from the left and right breasts of either the CC or MLO view, independently (i.e., to quantify the bilateral difference, or asymmetry, of breast tissue image features). Then, a variance thresholding method with an empirically selected threshold of 0.01 is applied to prescreen the computed bilateral asymmetry features and remove irrelevant or redundant features. Thus, the final CC-HCR and MLO-HCR feature vectors are generated.
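A compact sketch of the normalization, asymmetry, and prescreening steps is shown below; it assumes feature matrices of shape (n_cases, 45) for the left- and right-breast ROIs of one projection view, and the truncated normalization window follows the equation above.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

def normalize_features(X: np.ndarray) -> np.ndarray:
    """Normalize each feature column using its mean and standard deviation over
    all cases, truncating values outside [0, 1] (see the equation above)."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Xn = (X - (mu - 2.0 * sigma)) / (4.0 * sigma)
    return np.clip(Xn, 0.0, 1.0)

def bilateral_asymmetry(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Absolute difference of matched, normalized features from the left and
    right breasts for one projection view (CC or MLO)."""
    return np.abs(normalize_features(left) - normalize_features(right))

# Prescreen the asymmetry features with the 0.01 variance threshold; the
# variable names below (lcc_hcr, rcc_hcr) are hypothetical placeholders.
# cc_hcr = VarianceThreshold(0.01).fit_transform(bilateral_asymmetry(lcc_hcr, rcc_hcr))
```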
2.2.3. Deep Transfer Learning Feature Extraction and Reduction
To extract deep transfer learning (DTL) features directly from the image, we select and use a VGG16 network pretrained on the ImageNet database, exactly as in our previous study.27 Because this network is pretrained on three-channel color images from the ImageNet database, it takes three-channel images as input. We create pseudo-color ROIs by stacking the original image, a bilaterally filtered image, and a histogram-equalized image in the three channels and feed this image to the network. Details of creating pseudo-ROIs can be found in our previous work.27 Previous studies have demonstrated that using pseudo-ROIs as inputs to the deep transfer learning model produces features that contain more relevant information for the prediction task than directly stacking the original ROI into the three channels.27, 28 Since VGG16 takes a 224×224 image as input, all 64×64 pixel ROIs are resized using bilinear interpolation.
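As a sketch of the pseudo-color ROI construction (assuming OpenCV; the bilateral-filter parameters are our assumptions, not those of reference 27):

```python
import cv2
import numpy as np

def make_pseudo_color_roi(roi: np.ndarray) -> np.ndarray:
    """Build a 224x224x3 pseudo-color ROI for VGG16 by stacking the original,
    a bilaterally filtered, and a histogram-equalized version of the 64x64 ROI."""
    roi8 = cv2.normalize(roi, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    channels = np.dstack([
        roi8,                                                          # original intensities
        cv2.bilateralFilter(roi8, d=9, sigmaColor=75, sigmaSpace=75),  # edge-preserving smoothing
        cv2.equalizeHist(roi8),                                        # histogram equalization
    ])
    # VGG16 expects 224x224 inputs; upsample with bilinear interpolation.
    return cv2.resize(channels, (224, 224), interpolation=cv2.INTER_LINEAR)
```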
The architecture of the VGG16 network is made up of five blocks, each of which contains either two or three convolutional layers followed by a max-pooling layer, and the network ends with three fully connected layers. Since we use the VGG16 network as an automated feature extractor, the top three fully connected layers are removed. As a result, 25,088 automated image features are extracted after the final max-pooling layer and then normalized. The bilateral asymmetry features are quantified in the same manner as for the HCR features.
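A minimal sketch of this feature extractor using the Keras VGG16 implementation is shown below; the batch handling and preprocessing call are standard Keras usage rather than details taken from the paper.

```python
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input

# VGG16 without its three fully connected layers; the output of the final
# max-pooling layer (7 x 7 x 512) flattens to 25,088 features per ROI.
extractor = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

def extract_dtl_features(pseudo_rois: np.ndarray) -> np.ndarray:
    """pseudo_rois: array of shape (n, 224, 224, 3) holding pseudo-color ROIs."""
    x = preprocess_input(pseudo_rois.astype("float32"))
    features = extractor.predict(x, verbose=0)        # shape (n, 7, 7, 512)
    return features.reshape(len(features), -1)        # shape (n, 25088)
```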
To significantly reduce the dimensionality of this extremely large set of automated features, we take two steps. First, a variance thresholding method is applied to prescreen and remove all features that have a variance below an empirically selected threshold of 0.02, which reduces the number of automated features from 25,088 to ~6,000. Second, we use a sequential forward feature selection (SFFS) algorithm, implemented with a 4-fold cross-validation method wrapped around a linear support vector machine (SVM) and using the area under the receiver operating characteristic curve (AUC) as the evaluation metric,29 to obtain the final optimal CC-DTL and MLO-DTL feature vectors. This SFFS-based feature selection method has been applied and reported in our previous studies.27, 28
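One possible implementation of this selection step uses the mlxtend SequentialFeatureSelector, sketched below; the floating option mirrors the floating selection described in reference 29, and the remaining settings are assumptions.

```python
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.svm import SVC

def select_dtl_features(X, y):
    """Sequential forward (floating) feature selection wrapped around a linear
    SVM, scored by AUC with 4-fold cross-validation."""
    sfs = SFS(SVC(kernel="linear"),
              k_features="best",      # keep the subset with the highest CV AUC
              forward=True,
              floating=True,
              scoring="roc_auc",
              cv=4,
              n_jobs=-1)
    sfs = sfs.fit(X, y)
    return list(sfs.k_feature_idx_), sfs.k_score_
```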
2.3. Classification and Model Evaluation
In this study, we investigate four different feature and model fusion strategies to determine which method produced the best results (Figure 5).
Figure 5:
Schematic Diagram of the four fusion methods investigated
Method 1 is a feature level fusion followed by a two-stage classification system. In this method, for each projection view, the HCR and DTL feature vectors are first fused via concatenation. Next, the two fusion feature vectors are used to train two separate classifiers whose outputs are then fused and used to train a final classifier.
Method 2 is a two-stage classification system in which four separate classifiers are trained independently using either the HCR or DTL feature vector from either projection view. Then, the outputs of the four classifiers are fused and used to train a final classifier.
Method 3 is a three-stage classification system, which begins the same way as method 2. In the second stage, the outputs of the classifiers trained on the CC-HCR and CC-DTL feature vectors are concatenated and used to train one classifier and the outputs of the classifiers trained on the MLO-HCR and MLO-DTL feature vectors are concatenated and used to train another classifier. In the third stage the output of the two classifiers trained on either projection CC or MLO view are concatenated and used to train a final classifier.
Method 4 is identical to method 3 except in the second stage the two classifiers are trained using the concatenation of the outputs of the prior classifiers that were trained using either the CC-HCR and MLO-HCR feature vectors or the CC-DTL and MLO-DTL feature vectors.
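To make the structure of Method 1 concrete, a minimal sketch is given below; variable names are hypothetical, and for brevity the stage-two classifier is trained on in-sample scores, whereas a full implementation would use out-of-fold scores within the cross-validation protocol described in the next paragraph.

```python
import numpy as np
from sklearn.svm import SVC

def train_method1(cc_hcr, cc_dtl, mlo_hcr, mlo_dtl, y):
    """Method 1: feature-level fusion per view, then a two-stage classifier.

    Stage 1: SVM1 on the concatenated CC features and SVM2 on the concatenated
    MLO features. Stage 2: SVM3 on the two stage-1 prediction scores.
    """
    cc = np.hstack([cc_hcr, cc_dtl])       # CC fusion feature vector
    mlo = np.hstack([mlo_hcr, mlo_dtl])    # MLO fusion feature vector

    svm1 = SVC(kernel="linear", probability=True).fit(cc, y)
    svm2 = SVC(kernel="linear", probability=True).fit(mlo, y)

    stage2 = np.column_stack([svm1.predict_proba(cc)[:, 1],
                              svm2.predict_proba(mlo)[:, 1]])
    svm3 = SVC(kernel="linear", probability=True).fit(stage2, y)
    return svm1, svm2, svm3
```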
As shown in Figure 5, we select a linear support vector machine (SVM) as the machine learning classifier to fuse image features and generate a classification score predicting the likelihood that a testing case depicts a malignant lesion. Compared with many other types of machine learning classifiers, an SVM has a simple structure, is easy to train, and tends to be robust; SVMs are therefore commonly used in CAD-based breast lesion classification tasks, as described in a recent systematic review article.30 In this study, all SVMs are trained and tested using a stratified 10-fold cross-validation method in which all cases are randomly divided into 10 subsets; in each cross-validation cycle, nine subsets are used for training and one subset is used for testing the SVM classifier. To address the class imbalance of our dataset (36.6% benign versus 63.4% malignant cases, as reported in the Results section below), we use the synthetic minority oversampling technique (SMOTE) to oversample the benign cases so that each classifier is trained on a balanced number of malignant and benign cases.31 The SMOTE algorithm is embedded within each cross-validation fold and applied only to the training data, as reported in a previous study.32
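A sketch of this training protocol, assuming the imbalanced-learn and scikit-learn libraries, is shown below; wrapping SMOTE in an imblearn Pipeline guarantees that oversampling is applied only to the training portion of each fold (X and y are hypothetical placeholders for a feature matrix and the case labels).

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC

# SMOTE runs inside each cross-validation fold on the training cases only;
# the held-out testing fold is never oversampled.
model = Pipeline([
    ("smote", SMOTE(random_state=0)),
    ("svm", SVC(kernel="linear", probability=True)),
])
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
# scores = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
```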
Each SVM produces a prediction score between 0 and 1 for each testing case, where a higher score indicates a higher probability of being malignant. Prediction scores generated on the testing datasets over 10-fold CV are then used to generate receiver operating characteristic (ROC) curves using the publicly available ROC curve fitting program ROCKIT (http://metz-roc.uchicago.edu/MetzROC), which generates a smooth ROC curve based on maximum likelihood estimates from the SVM-generated prediction scores. The area under the ROC curve (AUC), along with its standard deviation, is computed and used as an evaluation metric. The statistical significance of differences between SVM classifiers (AUC values) is assessed using p-values computed by the ROCKIT program. Additionally, an operating threshold of 0.5 is applied to the prediction scores to divide the testing cases into malignant and benign classes: cases with prediction scores < 0.5 are classified as benign, while those with scores ≥ 0.5 are classified as malignant. The overall classification accuracy, precision, sensitivity, and specificity of each SVM, along with their standard deviations, are then computed and recorded as additional evaluation indices.
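For reference, the sketch below computes the threshold-based indices; the smooth ROC curves and AUC p-values in this study come from ROCKIT, which is external to Python, so the empirical AUC here is only an approximate stand-in.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

def evaluate(scores, y_true, threshold=0.5):
    """Accuracy, precision, sensitivity, and specificity at the 0.5 operating
    threshold, plus an empirical AUC (label 1 = malignant, 0 = benign)."""
    scores, y_true = np.asarray(scores), np.asarray(y_true)
    y_pred = (scores >= threshold).astype(int)
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    return {"AUC": roc_auc_score(y_true, scores),
            "accuracy": accuracy_score(y_true, y_pred),
            "precision": precision_score(y_true, y_pred),
            "sensitivity": recall_score(y_true, y_pred),
            "specificity": tn / (tn + fp)}
```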
3. Results
The initial dataset comprises 1,065 cases, each containing four FFDM images named the LCC, RCC, LMLO, and RMLO images. Each case depicts one biopsied soft tissue mass-type breast lesion. However, our ipsilateral matching scheme is unable to confirm that the lesion marked in the CC view is the same lesion marked in the MLO view in 66 cases, and the bilateral registration scheme fails to register another 35 cases. This results in a final dataset of 964 cases, of which 353 depict benign lesions and 611 depict malignant lesions, as confirmed by tissue biopsy. Therefore, the final true case-based dataset used in this study to train and test the CAD scheme contains 3,856 FFDM images, of which 1,412 images are associated with benign cases and 2,444 images with malignant cases.
After feature reduction, the HCR-CC and HCR-MLO feature vectors contain 26 and 22 features, respectively. The HCR features selected for the final feature sets are displayed in Table 1. The DTL-CC and DTL-MLO feature vectors contain 74 and 44 features, respectively. The HCR-CC feature vector and the DTL-CC feature vector are combined via concatenation to create the fusion feature vector used in Method 1, and the same process is repeated with the MLO feature vectors. To fuse the HCR and DTL features, the features included in the CC fusion and MLO fusion feature vectors are further analyzed and reduced using the SFFS method. After feature reduction, the CC fusion feature vector contains 43 features (14% HCR features and 86% DTL features) while the MLO fusion feature vector contains 31 features (13% HCR features and 87% DTL features). The HCR features selected in the final fusion feature vectors are shown in the last two columns of Table 1.
Table 1:
Handcrafted radiomic features selected after feature reduction. An X indicates that the feature was selected to be used in the final feature vector for the corresponding column.
Feature Type | Feature Name | | HCR-CC | HCR-MLO | Fusion (CC) | Fusion (MLO)
---|---|---|---|---|---|---
Statistical | Mean | X | X | X | ||
Max | X | X | ||||
Standard Deviation | X | X | X | |||
Energy | ||||||
Entropy | X | X | ||||
Skewness | ||||||
Kurtosis | X | X | X | |||
| ||||||
GLCM | Contrast | Max | ||||
Mean | X | |||||
Dissimilarity | Max | X | X | |||
Mean | X | X | ||||
Homogeneity | Max | X | ||||
Mean | X | X | ||||
ASM | Max | |||||
Mean | ||||||
Energy | Max | |||||
Mean | ||||||
Correlation | Max | X | X | X | ||
Mean | X | |||||
| ||||||
GLRLM | SRE | Max | X | |||
Mean | ||||||
LRE | Max | |||||
Mean | ||||||
GLN | Max | X | X | |||
Mean | X | X | ||||
RLN | Max | |||||
Mean | ||||||
RP | Max | |||||
Mean | ||||||
LGLRE | Max | |||||
Mean | ||||||
HGLRE | Max | X | X | |||
Mean | X | X | ||||
SRLGLE | Max | X | ||||
Mean | X | X | ||||
SRHGLE | Max | X | X | |||
Mean | X | X | ||||
LRLGLE | Max | |||||
Mean | ||||||
LRGHLE | Max | X | X | |||
Mean | X | X | X | |||
| ||||||
Gabor Features | Mean | X | X | X | ||
Variance | X | X | X | X | ||
Energy | X | X | ||||
Entropy | X | X | X |
The results of the four different fusion methods are shown in Figure 6 and Table 2, which include the four ROC curves generated by the four fusion methods (Figure 6) and the corresponding AUC values along with the standard deviations computed by the ROCKIT program (Table 2). The results show that the three SVMs of Method 1 yield significantly higher AUC values than the corresponding SVMs generated using Methods 2, 3, and 4 (all p < 0.005). Similar performance patterns (including classification accuracy, precision, sensitivity, and specificity) among the SVM classifiers of the four methods are also observed after applying the operating threshold to classify testing cases into malignant and benign classes. Thus, Method 1 is selected for further data analysis.
Figure 6:
Final ROC Curves of the four different fusion methods. ROC Curves are generated using a maximum likelihood estimation method in ROCKIT.
Table 2:
Results of the four fusion methods. Mean values and standard deviation over 10-fold CV.
Method | SVM | AUC | Accuracy | Precision | Sensitivity | Specificity |
---|---|---|---|---|---|---|
| ||||||
1 | SVM1 | 0.817 ± 0.026 | 0.745 ± 0.033 | 0.745 ± 0.116 | 0.633 ± 0.057 | 0.841 ± 0.053 |
SVM2 | 0.792 ± 0.026 | 0.721 ± 0.035 | 0.734 ± 0.048 | 0.600 ± 0.047 | 0.823 ± 0.027 | |
SVM3 | 0.876 ± 0.031 | 0.792 ± 0.044 | 0.773 ± 0.097 | 0.696 ± 0.059 | 0.863 ± 0.049 | |
| ||||||
2 | SVM1 | 0.664 ± 0.039 | 0.611 ± 0.030 | 0.694 ± 0.063 | 0.478 ± 0.027 | 0.763 ± 0.039 |
SVM2 | 0.642 ± 0.051 | 0.584 ± 0.046 | 0.677 ± 0.047 | 0.456 ± 0.039 | 0.738 ± 0.041 | |
SVM3 | 0.781 ± 0.030 | 0.726 ± 0.023 | 0.714 ± 0.110 | 0.609 ± 0.027 | 0.823 ± 0.052 | |
SVM4 | 0.741 ± 0.029 | 0.694 ± 0.034 | 0.694 ± 0.069 | 0.572 ± 0.049 | 0.800 ± 0.029 | |
SVM5 | 0.851 ± 0.025 | 0.782 ± 0.030 | 0.748 ± 0.095 | 0.691 ± 0.053 | 0.850 ± 0.043 | |
| ||||||
3 | SVM1 | 0.664 ± 0.039 | 0.611 ± 0.030 | 0.694 ± 0.063 | 0.478 ± 0.027 | 0.763 ± 0.039 |
SVM2 | 0.642 ± 0.051 | 0.584 ± 0.046 | 0.677 ± 0.047 | 0.456 ± 0.039 | 0.738 ± 0.041 | |
SVM3 | 0.781 ± 0.030 | 0.726 ± 0.023 | 0.714 ± 0.110 | 0.609 ± 0.027 | 0.823 ± 0.052 | |
SVM4 | 0.741 ± 0.029 | 0.694 ± 0.034 | 0.694 ± 0.069 | 0.572 ± 0.049 | 0.800 ± 0.029 | |
SVM5 | 0.800 ± 0.023 | 0.742 ± 0.042 | 0.714 ± 0.120 | 0.634 ± 0.049 | 0.825 ± 0.054 | |
SVM6 | 0.766 ± 0.032 | 0.709 ± 0.038 | 0.697 ± 0.072 | 0.595 ± 0.063 | 0.806 ± 0.032 | |
SVM7 | 0.852 ± 0.027 | 0.778 ± 0.035 | 0.742 ± 0.098 | 0.686 ± 0.057 | 0.846 ± 0.044 | |
| ||||||
4 | SVM1 | 0.664 ± 0.039 | 0.611 ± 0.030 | 0.694 ± 0.063 | 0.478 ± 0.027 | 0.763 ± 0.039 |
SVM2 | 0.642 ± 0.051 | 0.584 ± 0.046 | 0.677 ± 0.047 | 0.456 ± 0.039 | 0.738 ± 0.041 | |
SVM3 | 0.781 ± 0.030 | 0.726 ± 0.023 | 0.714 ± 0.110 | 0.609 ± 0.027 | 0.823 ± 0.052 | |
SVM4 | 0.741 ± 0.029 | 0.694 ± 0.034 | 0.694 ± 0.069 | 0.572 ± 0.049 | 0.800 ± 0.029 | |
SVM5 | 0.642 ± 0.051 | 0.581 ± 0.046 | 0.657 ± 0.033 | 0.452 ± 0.039 | 0.728 ± 0.036 | |
SVM6 | 0.829 ± 0.028 | 0.762 ± 0.029 | 0.734 ± 0.090 | 0.659 ± 0.037 | 0.838 ± 0.042 | |
SVM7 | 0.841 ± 0.028 | 0.772 ± 0.033 | 0.742 ± 0.057 | 0.676 ± 0.056 | 0.842 ± 0.028 |
Additionally, based on the data listed in Table 2, we can obtain further data analysis results. First, to further analyze the differences between feature-level fusion and output-level fusion, we compare the performance of the classifiers of Method 1 to the stage-two classifiers of Method 3. In Method 1, SVM1 and SVM2 are trained using feature vectors that fuse the HCR and DTL feature vectors computed from the CC view and MLO view, respectively. In Method 3, SVM5 and SVM6 are trained using the fusion of the outputs from classifiers independently trained on the HCR and DTL feature vectors computed from the CC and MLO views, respectively. We compare the performance between SVM1 of Method 1 and SVM5 of Method 3, as well as between SVM2 of Method 1 and SVM6 of Method 3, to determine whether direct fusion of features computed from multi-view images outperforms fusion of classifier output scores generated by multiple classifiers trained using image features computed from a single projection view. The results show that SVM1 and SVM2 of Method 1 yield significantly higher AUC values (AUC = 0.817±0.026 and 0.792±0.026) than SVM5 and SVM6 of Method 3 (AUC = 0.800±0.023 and 0.766±0.032), with p = 0.0327 for the two bilateral CC view images and p < 0.001 for the two bilateral MLO view images, respectively, which indicates that fusion of image features is better than fusion of the outputs of two classifiers separately trained using single-view image features.
Second, to determine whether fusion of the HCR and DTL feature vectors yields better results, we compare the performance of the SVMs trained on the fusion feature sets used in Method 1 to the SVMs trained on the HCR and DTL feature sets independently in stage one of all three output-level fusion methods. For both projection views, the SVMs trained using the HCR and DTL fusion feature vectors yield significantly higher classification performance (AUC = 0.817±0.026 and 0.792±0.026) than the SVMs trained using only the HCR or DTL feature vector (AUC = 0.664±0.039 and 0.781±0.030, with p < 0.001 and p = 0.0431, for the CC view; AUC = 0.642±0.051 and 0.741±0.029, with p < 0.001 and p = 0.0091, for the MLO view).
Third, besides the observation that the SVMs of Method 1 generally perform significantly better than the SVMs of the other three methods, we also compare the performance of the SVM trained using four images that combine two pairs of bilateral images (CC and MLO views) with the two SVMs trained using two images that combine one pair of bilateral images (either CC or MLO view), i.e., SVM3 versus SVM1 and SVM3 versus SVM2 in Method 1 of Figure 5. The results show that SVM3 yields an AUC of 0.876±0.031, which is significantly higher than the AUC of 0.817±0.026 yielded by SVM1 and the AUC of 0.792±0.026 yielded by SVM2 (both p < 0.001) (Table 2). The corresponding ROC curves are displayed in Figure 7. No statistically significant difference is observed in the ROC curves or AUC values between SVM1 and SVM2 (p = 0.3546). Additionally, Figure 8 displays the sum of the confusion matrices of SVM1, SVM2, and SVM3 computed from the classification of malignant and benign cases, which are used to compute the overall classification accuracy, precision, sensitivity, and specificity reported in Table 2.
Figure 7:
Smooth ROC curves of the single-view classifiers and the multi-view classifier, based on the maximum likelihood estimates of the prediction scores generated over 10-fold CV.
Figure 8:
Sum of each confusion matrix over 10-fold CV for the single view and multi-view classifiers.
4. Discussion
This paper reports a new study that combines three common analysis tools used in developing CAD of multi-view mammograms into a single framework for assisting in the diagnosis of suspicious breast lesions as malignant or benign. Unlike previous multi-view CAD schemes of mammograms that combine complementary image features computed from either ipsilateral or bilateral mammography views, or CAD schemes that use both HCR and DTL features computed from single images, this study has several unique characteristics compared with many previous studies in this research field.
First, this is a complete case-based CAD scheme that extracts two sets of matched ROIs from the four mammograms of one screening examination (including two lesion regions depicted on the two ipsilateral views and two negative regions on the images of the contralateral breast). Two types of image features (HCR and DTL) computed from these four matched ROIs are passed through the framework simultaneously, so that the final machine learning classifier fuses the clinically relevant and complementary information extracted from the ROIs of the different views when making a final predictive decision. However, accurate identification of four matched ROIs on both ipsilateral and bilateral mammograms by a CAD scheme is very difficult due to differences in breast compression when acquiring the four view images. Unlike previous studies (i.e., Khan et al.13) that manually determine the four matched ROIs from four mammograms, we develop and add two algorithms, ipsilateral view matching and bilateral image registration, prior to ROI extraction. As a result, applying the ipsilateral matching algorithm ensures that the lesion visualized in one projection view is the same lesion visualized in the other projection view. This is an extremely important step, as some cases may have a suspicious lesion marked in the CC view and a different lesion marked in the MLO view, meaning there are two distinct lesions within the breast and each is only visualized in one projection view. Additionally, our CAD framework also applies a bilateral registration algorithm to ensure that the ROIs extracted from the contralateral breast are from the same spatial location as the lesion on the two ipsilateral view images. By implementing these two algorithms, we developed a unique four-view image or case-based CAD framework.
Second, we chose to quantify the bilateral asymmetry features computed from the two ROIs in each projection (CC and MLO) view, as opposed to using the image features computed from the two bilateral ROIs independently, to build and train the machine learning classifiers. Our approach not only reduces the number of image features in the initial feature pool, which improves the efficacy of feature selection or feature dimensionality reduction, but also better mimics how radiologists diagnose breast lesions when reading mammograms. Since a radiologist visually inspecting a mammography exam for abnormalities often relies on bilateral asymmetry as a qualitative imaging marker, quantifying the bilateral asymmetry of the two pairs of matched ROIs in the CC and MLO views can generate effective quantitative imaging markers for CAD schemes. Previous studies have demonstrated the advantages of applying CAD schemes that analyze bilateral image feature asymmetry computed from the two mammograms of the left and right breasts to predict the short-term risk of developing breast cancer14–16 and the likelihood of having breast cancer depicted on mammograms.33, 34 However, these previous studies bypass the image registration step and the extraction of ROIs; thus, their prediction models or classifiers are developed based on bilateral image feature asymmetry computed from the whole breast. Our study computes bilateral image feature asymmetry from two matched ROIs, which can eliminate or significantly reduce the impact of heterogeneous normal breast tissue areas and thus help improve CAD performance in lesion classification. For example, one previous CAD scheme using bilateral image feature asymmetry of whole mammograms reported a macro-AUC of 0.733 in detecting breast cancer,34 while our CAD scheme yields an AUC of 0.876±0.031. Although the two studies use different image datasets and their performance cannot be directly compared, we believe that the classification performance of our new CAD scheme is encouraging, which is attributed to the quantification of bilateral image feature asymmetry of the targeted ROIs matched in pairs of bilateral mammograms.
Third, unlike many previous CAD schemes that use either traditional HCR features or automated DTL features separately, this study demonstrates the feasibility and advantages of fusing HCR and DTL features computed from the two pairs of bilateral ROIs extracted from four mammograms. By using the pretrained VGG16 network as a feature extractor, we are able to mix HCR and DTL features into one initial feature pool. Thus, the optimal fusion feature vectors include both HCR and DTL features, which provide lower correlation or complementary information. Additionally, we observe that in the fusion feature vectors the majority of features are DTL features (i.e., the CC fusion feature vector contains 6 HCR features (14%) and 37 DTL features (86%)), which shows that DTL features make a higher contribution in this CAD scheme. However, the minority of HCR features still contributes to improving the classification performance of the final fusion-based CAD scheme. In addition, although several other studies have fused HCR and DTL features to develop CAD schemes of breast lesion classification, these schemes are limited to single-view or faux case-based schemes, as ROIs are either extracted from all four views and classified independently or extracted from only two views, omitting the information contained in the contralateral breast.17, 18 Our study is the first to fuse the bilateral asymmetry of HCR and DTL features computed from two pairs of matched ROIs on the CC and MLO views.
Fourth, although many fusion methods have been previously investigated to help improve CAD performance, few studies have investigated and compared different fusion methods to identify the optimal method for the multi-level fusion problem. In this study, we address three fusion tasks in developing this CAD framework, namely bilateral image fusion, ipsilateral image fusion, and fusion of multiple feature types. The first level of fusion is handled through the quantification of the bilateral asymmetry, as this is where we fuse information extracted from the two bilateral mammograms. Our justification for this type of fusion is that the location of bilateral asymmetry in mammograms is an indicator of abnormalities. To determine the optimal way to fuse ipsilateral information and multiple feature types, we conduct several experiments with four different fusion methods. Results show that feature-level fusion of the different feature types prior to training classifiers on each projection view is superior to output-level fusion after training classifiers on each feature set independently.
Due to the above unique characteristics or innovations of this study, we also make several interesting observations that further support or validate important conclusions of previous studies. First, in our previous work, we developed a single-view CAD scheme that fused HCR and DTL features extracted from only the CC view of a lesion and concluded that a CAD scheme trained by fusion of HCR and DTL features could yield significantly higher performance than CAD schemes developed using only HCR or DTL features.27 This work is an extension of our previous study and solidifies this conclusion using both the CC and MLO projection views. Second, we observe that late fusion of information extracted from different projection views performs better than fusing this information earlier. This can be seen in the results of Method 1 and Method 3, as both methods keep the two projection views separate until the final classification step and yield the best classification performance in terms of AUC. We believe this is because fibroglandular tissue patterns often appear very different on the CC and MLO projection images, which makes the information contained in the feature vectors extracted from the two views very different. Hence, the superior result is obtained when the information extracted from the multiple projection views is used to train classifiers separately. Third, we also observe that multi-view CAD systems tend to outperform single-view CAD systems, as demonstrated in many previous studies.8, 27 This conclusion is further validated and expanded in this study using a combination of HCR and DTL features from all four view mammograms in a complete case-based manner. In this study, the four-view fusion CAD system yields a classification performance of AUC = 0.876±0.031 with an accuracy of 0.792±0.044, while the CAD schemes based on fusion of the two bilateral images of either the CC or MLO view only yield AUCs of 0.817±0.026 and 0.792±0.026, and accuracies of 0.745±0.033 and 0.721±0.035, respectively.
Although this unique case-based multi-view CAD framework yields encouraging breast lesion classification performance, we recognize the limitations of this study. First, we use a relatively simple ROI extraction technique to avoid introducing any potential bias or variability from an automated or semi-automated tumor segmentation scheme. This method may not be optimal; therefore, we should investigate other lesion segmentation techniques prior to feature extraction in future work.
Second, although many deep learning models have been used as feature extractors in the CAD field, we use a pretrained VGG16 network as the feature extractor to decrease the computational complexity of this framework, since using transfer learning for feature extraction does not require additional training of the network. We should test and compare different networks and methods for extracting DTL features from these deep networks (i.e., using another popular model such as ResNet50 in CAD schemes35). Additionally, we plan to investigate the effects of transfer learning using a network pretrained on radiological images from the RadImageNet database, as opposed to the natural images in the ImageNet database, since RadImageNet-pretrained models have outperformed ImageNet-pretrained models in some medical classification tasks.36
Third, we conduct the feature reduction and selection process to identify the optimal feature vectors using the whole dataset. To minimize the possible bias, we also apply a 4-fold cross-validation method during feature selection, as reported in previous CAD studies.37 The selected features are then used to build SVM classifiers using a 10-fold cross-validation method. Although this approach has the advantage of identifying the final optimal feature vectors, it may introduce a risk of bias in the classifiers because the testing cases are blind only to the classifier training process and may be involved in the feature selection process. To eliminate this possible bias, the feature selection process should be embedded within the cross-validation of the classifier training and testing process; however, this has the disadvantages of higher computational cost and the inability to identify final optimal feature vectors that can be applied as is to new independent datasets in future validation studies. For this study, we believe that the impact of the potential bias can be ignored because our objective is to compare the relative performance changes among several SVMs that are built using the same feature selection and classifier training and testing methods.
Fourth, this framework is developed and tested using a single dataset; therefore, it may not generalize to mammography images acquired at different centers on different machines. To further test and improve the generalizability and robustness of this new CAD framework, we will continue to expand our study dataset by collecting new images from our university medical center and utilizing publicly available databases in future studies.
Fifth, we recognize that in current clinical practice, more and more 2D mammograms are synthetic images generated from digital breast tomosynthesis (DBT) data, which may have slightly different image quality or characteristics compared with original FFDM images. Thus, a CAD scheme developed using FFDM images may need to be retrained to fit DBT-generated synthetic images. However, the concept demonstrated in this study is also valid for DBT-generated synthetic images.
Last, this study only includes soft-tissue mass-type lesions seen on both projection (CC and MLO) views, which excludes a small fraction of subtle or difficult lesions. Future work should include cases where a mass is only seen in one view by developing and adding a new CAD module to handle and process these difficult cases.
5. Conclusions
In summary, we develop and test a novel case-based CAD framework for breast lesion classification, which (1) extracts two sets of matched ROIs from the CC and MLO view mammograms, (2) computes a set of bilateral asymmetry HCR and DTL image features, (3) assembles two optimal fusion feature vectors mixing both HCR and DTL features, and (4) builds the final machine learning classifiers (SVMs) trained using the fusion feature vectors. By applying this new CAD framework to a diverse image dataset involving 964 cases and 3,856 FFDM images, we conduct a series of experiments to compare the advantages and lesion classification performance of different image feature and classification score fusion methods. The results demonstrate that (1) fusing HCR and DTL features for each projection view before training a classifier is a better choice than fusing the outputs of classifiers trained on each type of feature independently, and (2) CAD classification performance is enhanced through the addition and fusion of image features computed from the two ipsilateral (CC and MLO) views of the lesion. Overall, the study results fully support our hypotheses that (1) HCR and DTL features contain complementary information for lesion classification and (2) multi-view CAD outperforms single-view CAD for mammography lesion classification. The results also highlight the significance of optimally fusing HCR and DTL image features computed from all four matched mammograms to enhance the performance of the final CAD classifiers. However, this is a proof-of-concept study, and more work is needed to further optimize and validate this new case-based CAD framework in future studies.
6. Acknowledgements
This research is supported in part by the National Institute of General Medical Sciences (NIGMS), National Institutes of Health (NIH), USA, under grant number P20GM135009.
Conflict of Interest Statement
The authors have no relevant conflicts of interest to disclose.
8. References
1. DeSantis CE, Ma J, Gaudet MM, et al. Breast cancer statistics, 2019. CA: A Cancer Journal for Clinicians. 2019; 69(6):438–451.
2. Boyd NF, Guo H, Martin LJ, et al. Mammographic density and the risk and detection of breast cancer. New England Journal of Medicine. 2007; 356(3):227–236.
3. Brodersen J, Siersma VD. Long-term psychosocial consequences of false-positive screening mammography. The Annals of Family Medicine. 2013; 11(2):106–115.
4. Katzen J, Dodelzon K. A review of computer aided detection in mammography. Clin Imaging. 2018; 52:305–309.
5. Nishikawa RM, Gur D. CADe for early detection of breast cancer—current status and why we need to continue to explore new approaches. Acad Radiol. 2014; 21(10):1320–1321.
6. Berlin L, Hall FM. More mammography muddle: emotions, politics, science, costs, and polarization. Radiology. 2010; 255(2):311–316.
7. McCann J, Stockton D, Godward S. Impact of false-positive mammography on subsequent screening attendance and risk of cancer. Breast Cancer Research. 2002; 4(5):1–9.
8. Jouirou A, Baâzaoui A, Barhoumi W. Multi-view information fusion in mammograms: A comprehensive overview. Information Fusion. 2019; 52:308–321.
9. Scutt D, Lancaster GA, Manning JT. Breast asymmetry and predisposition to breast cancer. Breast Cancer Res. 2006; 8(2):R14.
10. Guo Y, Sivaramakrishna R, Lu C-C, Suri JS, Laxminarayan S. Breast image registration techniques: a survey. Medical and Biological Engineering and Computing. 2006; 44(1):15–26.
11. Tan M, Qian W, Pu J, Liu H, Zheng B. A new approach to develop computer-aided detection schemes of digital mammograms. Physics in Medicine & Biology. 2015; 60(11):4413.
12. Zheng B, Leader JK, Abrams GS, et al. Multiview-based computer-aided detection scheme for breast masses. Med Phys. 2006; 33(9):3135–3143.
13. Khan HN, Shahid AR, Raza B, Dar AH, Alquhayz H. Multi-view feature fusion based four views model for mammogram classification using convolutional neural network. IEEE Access. 2019; 7:165724–165733.
14. Tan M, Zheng B, Ramalingam P, Gur D. Prediction of near-term breast cancer risk based on bilateral mammographic feature asymmetry. Academic Radiology. 2013; 20(12):1542–1550.
15. Li Y, Fan M, Cheng H, Zhang P, Zheng B, Li L. Assessment of global and local region-based bilateral mammographic feature asymmetry to predict short-term breast cancer risk. Phys Med Biol. 2018; 63(2):025004.
16. Yang Q, Li L, Zhang J, Shao G, Zhang C, Zheng B. Computer-aided diagnosis of breast DCE-MRI images using bilateral asymmetry of contrast enhancement between two breasts. Journal of Digital Imaging. 2014; 27(1):152–160.
17. Antropova N, Huynh BQ, Giger ML. A deep feature fusion methodology for breast cancer diagnosis demonstrated on three imaging modality datasets. Med Phys. 2017; 44(10):5162–5171.
18. Cui Y, Li Y, Xing D, Bai T, Dong J, Zhu J. Improving the prediction of benign or malignant breast masses using a combination of image biomarkers and clinical parameters. Front Oncol. 2021; 11:629321.
19. Ashgan M, Omer ME. Preprocessing of digital mammogram image based on Otsu's threshold. American Scientific Research Journal for Engineering, Technology, and Sciences. 2017; 37(1):220–229.
20. Chakraborty J, Mukhopadhyay S, Singla V, Khandelwal N, Bhattacharyya P. Automatic detection of pectoral muscle using average gradient and shape based feature. J Digit Imaging. 2012; 25(3):387–399.
21. Jas M, Mukhopadhyay S, Chakraborty J, Sadhu A, Khandelwal N. A heuristic approach to automated nipple detection in digital mammograms. J Digit Imaging. 2013; 26(5):932–940.
22. Zheng B, Leader JK, Abrams GS, et al. Multiview-based computer-aided detection scheme for breast masses. Med Phys. 2006; 33(9):3135–3143.
23. Wienbeck S, Uhlig J, Fischer U, et al. Breast lesion size assessment in mastectomy specimens: Correlation of cone-beam breast-CT, digital breast tomosynthesis and full-field digital mammography with histopathology. Medicine (Baltimore). 2019; 98(37):e17082.
24. Díez Y, Oliver A, Llado X, et al. Revisiting intensity-based image registration applied to mammography. IEEE Transactions on Information Technology in Biomedicine. 2011; 15(5):716–725.
25. Lowekamp BC, Chen DT, Ibáñez L, Blezek D. The design of SimpleITK. Frontiers in Neuroinformatics. 2013; 7:45.
26. Wei CH, Li Y, Li CT. Effective extraction of Gabor features for adaptive mammogram retrieval. 2007; 1503–1506.
27. Jones MA, Faiz R, Qiu Y, Zheng B. Improving mammography lesion classification by optimal fusion of handcrafted and deep transfer learning features. Phys Med Biol. 2022; 67(5):054001.
28. Heidari M, Mirniaharikandehei S, Khuzani AZ, Danala G, Qiu Y, Zheng B. Improving the performance of CNN to predict the likelihood of COVID-19 using chest X-ray images with preprocessing algorithms. International Journal of Medical Informatics. 2020; 144:104284.
29. Tan M, Pu J, Zheng B. Optimization of breast mass classification using sequential forward floating selection (SFFS) and a support vector machine (SVM) model. Int J Comput Assist Radiol Surg. 2014; 9(6):1005–1020.
30. Yassin NIR, Omran S, El Houby EMF, Allam H. Machine learning techniques for breast cancer computer aided diagnosis using different image modalities: A systematic review. Comput Methods Programs Biomed. 2018; 156:25–45.
31. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research. 2002; 16:321–357.
32. Yan S, Qian W, Guan Y, Zheng B. Improving lung cancer prognosis assessment by incorporating synthetic minority oversampling technique and score fusion method. Med Phys. 2016; 43:2694–2703.
33. Hina I, Raza SA, Basit R, Hasan K. Multi-view attention-based late fusion (MVALF) CADx system for breast cancer using deep learning. Machine Graphics and Vision. 2020; 29(1/4):55–78.
34. Geras KJ, Wolfson S, Shen Y, et al. High-resolution breast cancer screening with multi-view deep convolutional neural networks. arXiv preprint arXiv:1703.07047. 2017.
35. Islam W, Jones MA, Faiz R, Sadeghipour N, Qiu Y, Zheng B. Improving performance of breast lesion classification using a ResNet50 model optimized with a novel attention mechanism. Tomography. 2022; 8(5):2411–2425.
36. Mei X, Liu Z, Robson PM, Marinelli B, Huang M, Doshi A, et al. RadImageNet: An open radiologic deep learning research dataset for effective transfer learning. Radiology: Artificial Intelligence. 2022; 4(5):e210315.
37. Li T, Liu Y, Guo J, Wang Y. Prediction of the activity of Crohn's disease based on CT radiomics combined with machine learning models. J Xray Sci Technol. 2022; 30(6):1155–1168.