Abstract
Purpose:
In this paper the authors propose a texton based prostate computer aided diagnosis approach which bypasses typical feature extraction processes, such as filtering and convolution, which can be computationally expensive. The study focuses on the peripheral zone because 75% of prostate cancers start within this region and the majority of prostate cancers arising within this region are more aggressive than those arising in the transitional zone.
Methods:
For the model development, square patches were extracted at random locations from malignant and benign regions. Subsequently, extracted patches were aggregated and clustered using k-means clustering to generate textons that represent both regions. All textons together form a texton dictionary, which was used to construct a texton map for every peripheral zone in the training images. Based on the texton map, histogram models for the malignant and benign tissue samples were constructed and used as feature vectors to train our classifiers. In the testing phase, five machine learning algorithms were employed to classify each unknown tissue sample based on its corresponding feature vector.
Results:
The proposed method was tested on 418 T2-W MR images taken from 45 patients. Evaluation results show that the best three classifiers were Bayesian network (Az = 92.8% ± 5.9%), random forest (89.5% ± 7.1%), and k-NN (86.9% ± 7.5%). These results are comparable to the state-of-the-art in the literature.
Conclusions:
The authors have developed a prostate computer aided diagnosis method based on textons using a single modality of T2-W MRI without the need for the typical feature extraction methods, such as filtering and convolution. The proposed method could form a solid basis for multimodality magnetic resonance imaging based systems.
Keywords: computer aided detection, prostate MRI and texton
1. INTRODUCTION
Prostate cancer is one of the most common cancers affecting men, with an estimated 1.1 × 10⁶ diagnoses and 307 000 deaths in the world in 2012.1 In the same year, there were 41 736 cases reported in the UK with 10 837 deaths. Prostate cancer rates in the UK have at least tripled over the last 35 years, making prostate cancer the most common cancer among British men. In 2015, there were an estimated 220 800 incidences and around 27 500 deaths in the United States, making it the second most deadly cancer there.2
The most common methods used for preliminary screening are the prostate-specific antigen (PSA) test, digital rectal examination (DRE), and transrectal ultrasound (TRUS) guided biopsy. However, these methods have limited sensitivity and specificity. For example, an elevated PSA level does not always indicate the presence of prostate cancer because several factors can increase PSA levels, such as a urine infection, vigorous exercise, and ejaculation in the 48 h before a PSA test.3 The DRE test is highly dependent on the experience of the examiner; a more experienced examiner can detect subtle abnormalities that less experienced clinicians may miss.3,4 For TRUS guided biopsy, because of its random sampling procedure, the needle can miss cancerous and clinically significant tissue, meaning the test can return incorrect results.5
Since there is still room for improvement in the reliability of clinical methods for screening and detecting prostate cancer, integrating magnetic resonance imaging (MRI) into clinical practice (e.g., MRI/ultrasound guided biopsy and multimodality image fusion) is becoming popular, as it has shown a significant improvement over PSA and TRUS alone.3,6 Unfortunately, such methods require substantial expertise from the radiologist. Previous studies have shown a high degree of interobserver variability,3,6 indicating a high risk of human error, and they are also time consuming. Computer aided diagnosis (CAD) can assist radiologists in the interpretation of medical images by providing a “second” opinion, which reduces variability among radiologists, speeds up the analysis of the images, and improves diagnostic decisions.7,8
Developing CAD systems is a difficult task due to variations in the appearance of anatomy in images produced by MRI scanners. Recent studies3,4,9–11 have reported the deficiencies of T2-weighted (T2-W) MRI, including weak texture descriptors that can be affected by noise. In fact, Tiwari et al.12,13 suggested that T2-W MRI texture features alone are insufficient to identify prostate malignancies. A popular recent approach to improving the performance of CAD systems is therefore to use multiparametric (multimodality) MRI. Although T2-W alone is deemed insufficient, T2-W classification can form a solid basis for a multimodality MRI based CAD system.
In 2015, Lemaître et al.14 conducted a review of CAD systems for prostate cancer detection and reported 42 studies in the literature from 2007 to 2014. Most of the methods described used typical feature extraction algorithms based on filtering and convolution, which can be computationally expensive. The large number of extracted features also led to the need for an additional step of feature selection or dimensionality reduction. None of the CADs reviewed in Ref. 14 used textons to discriminate benign and malignant tissues. Although the term texton was first introduced in the 1980s, it did not receive much attention until a study of texture classification by Leung and Malik15 in 2001. Later, similar studies showing promising results in texture classification were conducted by Varma and Zisserman16,17 in 2005 and 2009, respectively. In medical image analysis, textons have been used in retinal vessel segmentation18 and lung cancer detection.19
Textons can be seen as a representation of microstructures in natural images and are considered the atoms of preattentive human visual perception.20 In the original approach, textons were represented by means of a collection of filter bank responses obtained from large filter banks such as the MR8,21 LM,15 and S (Ref. 22) filter banks and the Gabor filter.23 All the response vectors were collected and clustered using k-means, and the resulting cluster centers were called textons (hence, in the simplest definition, textons are the k-means cluster centers). Nevertheless, the study in Ref. 24 showed that textons can be generated by directly clustering raw pixel values from image patches without the need for filter banks (hence speeding up the construction of the texton dictionary).
In Refs. 16 and 17, the authors made quantitative comparisons between a typical texton-based approach (using a filter bank) and a texton-based approach without a filter bank, which showed that the latter produced better classification results. The study in Ref. 24 suggested three reasons for the relatively strong performance of textons generated from raw image pixels in comparison to textons generated from convolved image pixels. First, the use of filter banks reduces the number of textons that can be extracted from a texture image.25 For example, the number of textons is significantly reduced when an image of 250 × 250 pixels is convolved with a 50 × 50 filter. This affects the ability of the histogram models to characterize a particular texture (i.e., there is insufficient information to model the actual representation of the image). Second, the large number of filters leads to small errors in edge localization which may significantly change the geometry of the textons, leading to errors in the estimation of the texton frequency histogram.25 Finally, most filter banks lead to some blurring of the texture, which might remove local details and hence result in different textons.25 For these reasons, the method proposed in this paper does not use any filter bank but takes the image’s pixels directly to generate textons.
The aim of this paper is to investigate the use of textons, without the need for filtering or convolution for feature extraction, in a prostate cancer CAD system using single-modality T2-W MRI. The novel contributions of our work are the following:
1. This is the first CAD method which has investigated the use of textons in classifying benign and malignant tissues within the prostate gland in MRI.
2. The proposed method learns directly from image pixels without the need to use a filter bank. In comparison, most prostate cancer CAD systems in the literature compute large numbers of texture descriptors, which are computationally expensive. In fact, computing a large number of texture descriptors also leads to an additional (essential) step of feature selection or dimensionality reduction.
The clinical motivations of our work are fourfold:
1. Manually finding cancer regions in each MR image is time consuming for a radiologist. A CAD system can potentially speed up this process by delineating cancerous regions automatically.
2. Computer algorithms are deterministic while radiologists’ results can be variable. For example, fatigue affects radiologists’ performance, potentially resulting in missed cancerous regions.
3. The accuracy of detecting prostate cancer among radiologists varies depending on the level of experience. In comparison, a CAD system can eliminate this issue as it provides consistent results.
4. Prostate cancer CAD acts as a second opinion which can significantly improve the performance of less experienced radiologists.
2. METHODOLOGY
Figure 1 shows a flowchart of our proposed method. For every input image, we roughly estimate the area of the prostate’s peripheral zone (PZ), followed by normalization and noise reduction. Subsequently, for every training image we randomly extract patches from benign and malignant regions within the PZ and employ k-means clustering to generate textons (the output of this stage is the texton dictionary). Each pixel in every training image is labeled with the texton to which it lies closest, producing texton maps. Using the texton maps, a histogram of textons (the frequency with which each texton occurs) is constructed for every pixel within the PZ. The texton histograms from all pixels are treated as feature vectors and used to train our classifiers. Finally, in the testing phase, every unseen PZ is processed in the same way and the trained classifiers are used to decide, for each pixel, whether it belongs to the benign or malignant class. Subsections 2.A–2.D explain this process in more detail.
FIG. 1.
A general overview of the proposed method.
2.A. Capturing the peripheral zone
We employed the 2D model developed by Rampun et al.26 to estimate the area of the PZ. The method uses a quadratic equation based on the central coordinates of the prostate gland and the left-most and right-most coordinates of the prostate gland boundary (each prostate boundary was provided by a radiologist). This allows us to model a priori general knowledge of radiologists, in a similar spirit to the methods of Makni et al.28 and Liu et al.27 A minimal geometric sketch of this model is given after Fig. 2. Figure 2 shows example MRI images with the ground truth location of the prostate gland, transitional/central zone (CZ), and tumor (T) represented in red, yellow, and green, respectively, while the magenta line is the estimated boundary of the PZ based on the method given in Ref. 26. Note that our study is developed only within the segmented PZ, which lies under the magenta line in Fig. 2. Our clinical justifications for currently focusing on the PZ are as follows4,29,30:
1. More than 75% of prostate cancers are located within this region.
2. The majority of prostate cancers arising within this region are more aggressive than those arising in the transitional zone.
3. Most prostate cancers start to develop in this zone before spreading to the transitional zone.
FIG. 2.
Example images of prostate MRI with ground truth delineated by an expert radiologist and the estimated PZ region under the magenta line. PZ (peripheral zone), CZ (central zone), and T (tumor).
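The exact quadratic used in Ref. 26 is not reproduced in this paper; the following minimal Python sketch shows one plausible reading of the model, fitting a parabola through the left-most, right-most, and central gland coordinates and keeping the gland pixels under the curve as the PZ (our own implementation was in matlab; all names and the border convention below are hypothetical):

```python
import numpy as np

def fit_pz_boundary(x_left, y_left, x_right, y_right, x_mid, y_mid):
    """Fit y = a*x**2 + b*x + c through the left-most and right-most
    gland boundary points and a central gland coordinate (all taken
    from the radiologist's gland delineation)."""
    coeffs = np.polyfit([x_left, x_mid, x_right],
                        [y_left, y_mid, y_right], deg=2)
    return np.poly1d(coeffs)

def pz_mask(gland_mask, boundary):
    """Keep only gland pixels lying below the fitted curve (image rows
    grow downward, so 'under the magenta line' means a larger row index)."""
    rows, cols = np.indices(gland_mask.shape)
    return gland_mask & (rows > boundary(cols))
```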
2.B. Preprocessing
The major problem with MR images is that specific tissues do not have fixed intensity values. The main causes are4,31–33: (a) corruption by thermal noise due to receiver coils, (b) different scanning protocols causing large intensity variations, and (c) poor radio frequency coil uniformity. These effects can significantly degrade discriminative performance and hence need to be corrected.33 Following the preprocessing procedure described in Refs. 4, 34, and 35, each image is median filtered to remove sharp noise while preserving edge boundaries. Subsequently, image intensities were normalized to zero mean and unit variance, and anisotropic diffusion filtering35,36 was applied to remove noise.
We used the anisotropic diffusion and median filters to eliminate low-level noise and sharp noise (e.g., bright spikes), respectively. Anisotropic diffusion is a robust filter; however, studies4,35 have shown that it is ineffective at eliminating sharp noise. The idea behind anisotropic diffusion is to use a diffusion function to prevent smoothing across edges, thereby preserving edges in the images. Unfortunately, the gradient of a noise element (e.g., sharp noise) may compete with edge responses, and the diffusion function cannot distinguish between image structure and noise. By combining both filters, they work in a complementary way to gradually eliminate the overall noise without blurring the edges and textures of the image. Both filters are of size 3 × 3 throughout the study.
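A minimal Python sketch of this preprocessing chain is given below (the study's code was written in matlab; the Perona–Malik parameters shown are illustrative assumptions, not values from the paper):

```python
import numpy as np
from scipy.ndimage import median_filter

def anisotropic_diffusion(img, n_iter=10, kappa=20.0, lam=0.2):
    """Basic Perona-Malik diffusion: the exponential conduction function
    suppresses smoothing across strong edges while flattening noise within
    regions (np.roll gives periodic borders, adequate for a sketch)."""
    img = img.astype(float).copy()
    for _ in range(n_iter):
        dN = np.roll(img, 1, axis=0) - img   # difference to north neighbor
        dS = np.roll(img, -1, axis=0) - img  # south
        dE = np.roll(img, -1, axis=1) - img  # east
        dW = np.roll(img, 1, axis=1) - img   # west
        cN, cS = np.exp(-(dN / kappa) ** 2), np.exp(-(dS / kappa) ** 2)
        cE, cW = np.exp(-(dE / kappa) ** 2), np.exp(-(dW / kappa) ** 2)
        img += lam * (cN * dN + cS * dS + cE * dE + cW * dW)
    return img

def preprocess(img):
    """3 x 3 median filter for sharp noise, zero-mean unit-variance
    normalization, then diffusion for the remaining low-level noise."""
    img = median_filter(img.astype(float), size=3)
    img = (img - img.mean()) / img.std()
    return anisotropic_diffusion(img)
```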
2.C. Texton dictionary
Figure 3 summarizes the steps used to construct the texton dictionary. Textons (i.e., m × n square windows, where m and n are the numbers of rows and columns, respectively) were retrieved from benign and malignant regions. To construct the texton dictionary, we followed the work of Varma and Zisserman.16,17,21,24 For every PZ area in the training images we randomly extracted m × n patches of raw pixels from benign and malignant regions. Subsequently, all patches extracted from benign regions were clustered using the k-means algorithm, and the same process was performed for all patches extracted from malignant regions. A summary of the k-means algorithm can be found in Ref. 37. The cluster centroids produced by the k-means algorithm are the textons. Once all textons from both classes (benign and malignant) were generated, they were combined to form the texton dictionary. As shown in Fig. 3, each texton is unique and has its own id (TX = tx1, tx2, tx3, …, txn) saved in a matrix, which is used to construct the texton map for each image (a code sketch of the dictionary construction follows Fig. 3).
FIG. 3.
Generating the texton dictionary. Patches from the same class are aggregated and clustered using the k-means algorithm. Red (black solid line) and blue (white dotted line) patches are malignant and benign samples, respectively.
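The dictionary construction can be sketched as follows, assuming flattened raw-pixel patches and per-class k-means as described above (a Python stand-in for our matlab code; the sampling parameters and function names are hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans

def sample_patches(img, mask, patch=9, n_samples=500, seed=0):
    """Randomly extract patch x patch raw-pixel windows whose centre lies
    inside `mask` (a benign or malignant region of the PZ), flattened."""
    rng = np.random.default_rng(seed)
    r = patch // 2
    rows, cols = np.where(mask)
    keep = ((rows >= r) & (rows < img.shape[0] - r)
            & (cols >= r) & (cols < img.shape[1] - r))
    rows, cols = rows[keep], cols[keep]
    idx = rng.choice(len(rows), size=min(n_samples, len(rows)), replace=False)
    return np.array([img[i - r:i + r + 1, j - r:j + r + 1].ravel()
                     for i, j in zip(rows[idx], cols[idx])])

def texton_dictionary(benign_patches, malignant_patches, k_per_class=8):
    """Cluster each class separately; the k-means cluster centres are the
    textons, and stacking both sets forms the texton dictionary TX."""
    tx_b = KMeans(n_clusters=k_per_class, n_init=10).fit(benign_patches)
    tx_m = KMeans(n_clusters=k_per_class, n_init=10).fit(malignant_patches)
    return np.vstack([tx_b.cluster_centers_, tx_m.cluster_centers_])
```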
Figure 4 shows textons extracted from both classes. Visually, it can be seen that textons generated from benign regions (bottom row) look smoother than those retrieved from malignant regions (top row).
FIG. 4.
Example of textons generated from malignant (top row) and benign (bottom row) regions.
2.D. Feature extraction
In this study, texton histograms are used to model benign and malignant tissues. These histograms are calculated in two main stages (Fig. 3). The first stage is to generate the texton map, where every pixel in the image (within the PZ) is assigned to the closest texton (using the Euclidean distance) in the texton dictionary. We used a sliding window WT of the same size as the textons and found the texton TX with the shortest Euclidean distance to the image patch under WT. The central pixel in WT is then assigned the id of that closest texton (Fig. 5). This process was repeated until every pixel in the image had been assigned its corresponding texton id. By the end of this stage, a texton map had been constructed for every PZ, which was used in the subsequent stages (a code sketch of this stage follows Fig. 5).
FIG. 5.
A graphical illustration on how to construct a texton value of an image patch.
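A simple, unoptimized sketch of this first stage, assigning each pixel the id of its nearest texton in Euclidean distance, might look like this (border handling is an assumption not specified above):

```python
import numpy as np
from scipy.spatial.distance import cdist

def texton_map(img, dictionary, patch=9):
    """Assign every pixel the id (row index in `dictionary`) of the texton
    with the shortest Euclidean distance to the window centred on it."""
    r = patch // 2
    padded = np.pad(img, r, mode='reflect')  # border handling: assumption
    tmap = np.zeros(img.shape, dtype=int)
    for i in range(img.shape[0]):
        # Gather the flattened window around every pixel on this row.
        wins = np.array([padded[i:i + patch, j:j + patch].ravel()
                         for j in range(img.shape[1])])
        tmap[i] = cdist(wins, dictionary).argmin(axis=1)
    return tmap
```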
Figure 6 shows examples of texton maps of three PZs generated in this phase. Each pixel within the PZ was replaced with its corresponding texton id. In the second stage, using the texton map we generated a histogram model for each pixel by using a sliding window of the same size. A histogram for each pixel was constructed based on the frequency of each texton within the neighborhood of the central pixel (including the central pixel itself).
FIG. 6.
Examples of texton maps of three PZs taken from three different prostates.
Figure 7 shows an example of constructing a histogram for each pixel. In this example, there are 10 textons (5 textons for each class) in the dictionary, and each tissue histogram was constructed using a 9 × 9 window; this means the histogram was based on the texton frequency within 81 pixels (25 pixels for a 5 × 5 window). Note that every histogram was normalized to unity. This yielded histograms for each tissue in the training images, which are used as feature vectors representing every pixel. Thus, each pixel is represented in a txt-dimensional feature space, where txt is the number of textons in the dictionary (10 in this example). Similarly, if there are 30 textons (15 textons/class), each pixel is represented in a 30 dimensional feature space. It should be noted that the data dimensionality is independent of WT. Finally, the constructed histogram for each pixel was treated as a feature vector in the training and testing phases (a sketch of this stage follows Fig. 7).
FIG. 7.
Constructing a histogram for each pixel from the texton map.
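The second stage can be sketched as follows; each row of the returned matrix is the normalized txt-dimensional histogram for one pixel (again, border handling is an assumption):

```python
import numpy as np

def texton_histograms(tmap, n_textons, ws=9):
    """Build, for every pixel, the normalized histogram of texton ids in
    its ws x ws neighbourhood (central pixel included); each row is one
    txt-dimensional feature vector."""
    r = ws // 2
    padded = np.pad(tmap, r, mode='edge')  # border handling: assumption
    feats = np.zeros((tmap.size, n_textons))
    k = 0
    for i in range(tmap.shape[0]):
        for j in range(tmap.shape[1]):
            window = padded[i:i + ws, j:j + ws].ravel()
            hist = np.bincount(window, minlength=n_textons)
            feats[k] = hist / hist.sum()  # normalize to unity
            k += 1
    return feats
```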
3. EXPERIMENTAL SETTINGS
3.A. Materials and dataset
Our dataset consists of 418 T2-W MR images (227 malignant and 191 normal slices, yielding 74 208 malignant pixels and 97 310 normal pixels) taken from 45 patients aged 54–74 (all patients had biopsy-proven prostate cancer). Each patient has between 6 and 13 slices covering the top to the bottom of the prostate gland. The prostate gland, malignant regions, and transitional zones were delineated by an expert radiologist with more than 10 yr experience in prostate MRI. All prostate cancer cases were confirmed malignancies based on TRUS biopsy reports. All annotated cases of malignant cancer were confirmed as clinically significant (Gleason score 7 and above). Vollmer38 conducted a study (526 patients) of the relationship between tumor length and Gleason score and found that tumor length was positively and significantly related to Gleason score: the median tumor length for Gleason scores 4–6 was 2.9 mm and the median length for Gleason scores 7–10 was 13.1 mm. These results are similar to the study conducted by Lee et al.,39 who found that most cancers with Gleason score ≥7 have longer cancer core length (based on more than 5000 patients). Another study, conducted by Billis et al.40 and covering 401 patients with Gleason score ≤6, found that the median core cancer length was 1.5 mm with a range of 0.5–3.0 mm. All patients underwent T2-W MR imaging at the Department of Radiology, Norfolk and Norwich University Hospital, Norwich, UK. MR acquisitions were performed prior to radical prostatectomy. All images were obtained on a 1.5 T magnet (Signa, GE Medical Systems, Milwaukee, USA) using a phased array pelvic coil, with a 24 × 24 cm field of view, 512 × 512 matrix, 3 mm slice thickness, 3.5 mm interslice gap, and 0.47 mm pixel spacing.
Approximately 60% (preprocessing, PZ modeling, construction of texton dictionary, feature extraction, and segmentation) of the source code was written in matlab 2012a and 40% (training and testing in machine learning) was written in java. All experiments were run under the Windows 7 operating system with an Intel Core i5 processor.
3.B. Training and testing
All pixels within the radiologist’s tumor annotation were extracted as prostate cancer samples. This area was delimited by the tumor mask, to ensure no pixels outside the tumor region were included in the malignant samples. All pixels outside the tumor region and within the PZ were considered benign samples. Similarly, this region was delimited by the tumor or prostate gland masks to ensure no pixels within the tumor region and outside the prostate gland were included as benign samples.
A stratified nine-run ninefold cross-validation (9-FCV) scheme was employed. The folds were populated on a patient basis to ensure that no samples from the same patient were used in both the training and testing phases (with 45 patients, each fold contains five patients). Each classifier was trained using the histograms of textons from the training partition for that fold. In the testing phase each unseen instance/pixel from the testing data (taken from five randomly selected patients) was classified as malignant or benign. During the cross-validation we set aside a small set of patients (in our case five patients) as the validation set and trained the classifiers on the remaining 40 patients. This process was randomized, and the results presented in our paper are the average over the 9-FCV.
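A patient-grouped cross-validation loop along these lines can be written with scikit-learn's GroupKFold, as sketched below (the study used weka classifiers, so the random forest here is a stand-in, and per-fold stratification is not reproduced):

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def patient_level_cv(X, y, patient_ids, n_splits=9):
    """Ninefold cross-validation populated per patient: pixels from one
    patient never appear in both the training and testing partitions."""
    aucs = []
    for tr, te in GroupKFold(n_splits=n_splits).split(X, y, groups=patient_ids):
        clf = RandomForestClassifier(n_estimators=100).fit(X[tr], y[tr])
        aucs.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))
    return np.mean(aucs), np.std(aucs)
```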
For classification, we employed five machine learning algorithms in weka (Ref. 41), namely the Bayesian network (BNet),42 alternating decision tree (ADTree),43 random forest (RF),44 linear discriminant analysis (LDA),45 and k-nearest neighbors (k-NN), with the following default settings, respectively: hill climbing search algorithm, 10 boosting iterations, 100 random trees, a ridge parameter of 1 × 10⁻⁶, and k = 1.
4. EXPERIMENTAL RESULTS
The performances were measured using the most popular metrics in the literature: area under the curve (Az, also known as AUC) and classification accuracy (CA). The Az indicates the trade-off between the true positive rate and the false positive rate, while CA represents the proportion of pixels classified correctly. The CA can be calculated as (TP + TN)/(TN + TP + FP + FN), where TP and FP denote the numbers of true positives and false positives, respectively, and TN and FN the numbers of true negatives and false negatives. Note that the Az values in this paper are presented as percentages (0–100) instead of the usual normalized range (0–1). The standard deviation measures the dispersion of both metrics across the ninefold cross-validation. The t-test was used to compare the best results (on both metrics) between the best classifier and the other classifiers; this determines whether the best classifier produced significantly better results than the others.
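Both metrics can be computed as a direct transcription of the definitions above, as the following sketch shows:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def classification_accuracy(y_true, y_pred):
    """CA = (TP + TN) / (TN + TP + FP + FN)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return (tp + tn) / (tn + tp + fp + fn)

def az_percent(y_true, y_scores):
    """Az (AUC) on the 0-100 scale used in the tables."""
    return 100.0 * roc_auc_score(y_true, y_scores)
```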
One of the main challenges in developing a texton based approach in texture classification is finding the best window size (ws) for WT and the number of textons (txt) as these parameters can influence the classification results. For this purpose, we used the following: ws = 5 × 5, 7 × 7, 9 × 9, 11 × 11, and 13 × 13. For txt, we tested the following values: 6, 10, 12, 16, 20, 24, and 30. Note that these numbers represent the number of textons for both classes (e.g., txt = 6, 3 benign textons and 3 malignant textons).
Tables I and II show Az and CA results, respectively, using the ADTree classifier, which produced best results of Az = 83.8% and CA = 75.3%. At the smallest ws, the classifier produced Az ranging from 74% to 77%. The results are slightly better at ws = 7 × 7, with Az ranging from 75% to 79%. It produced the best Az at 9 × 9, with average Az > 80% regardless of the number of textons. However, in terms of accuracy the ADTree classifier produced better results on average at 11 × 11 (CA > 70%).
TABLE I.
Az (%) values for ADTree classifier. Boldface means highest value for each metric.
| txt | 5 × 5 | 7 × 7 | 9 × 9 | 11 × 11 | 13 × 13 |
|---|---|---|---|---|---|
| 6 | 77.3 ± 12.1 | 79.4 ± 12.4 | **83.8 ± 10.8** | 83.2 ± 10.6 | 82.3 ± 10.7 |
| 10 | 76.5 ± 11.9 | 77.6 ± 13.0 | 82.9 ± 9.8 | 80.7 ± 10.7 | 80.2 ± 9.9 |
| 12 | 75.9 ± 11.9 | 77.3 ± 12.2 | 81.9 ± 9.7 | 80.2 ± 11.5 | 80.0 ± 10.7 |
| 16 | 76.2 ± 11.7 | 77.0 ± 12.0 | 82.1 ± 10.7 | 82.5 ± 9.6 | 81.3 ± 10.2 |
| 20 | 75.1 ± 11.3 | 76.1 ± 10.1 | 81.6 ± 10.2 | 76.5 ± 12.9 | 76.3 ± 11.9 |
| 24 | 75.6 ± 11.9 | 75.6 ± 11.8 | 80.9 ± 9.9 | 76.6 ± 12.1 | 75.9 ± 11.5 |
| 30 | 74.3 ± 12.0 | 75.1 ± 12.1 | 80.7 ± 10.9 | 76.3 ± 12.6 | 75.5 ± 11.7 |
TABLE II.
CA (%) values for ADTree classifier. Boldface means highest value for each metric.
| txt | 5 × 5 | 7 × 7 | 9 × 9 | 11 × 11 | 13 × 13 |
|---|---|---|---|---|---|
| 6 | 71.0 ± 13.3 | 70.3 ± 14.6 | 71.9 ± 15.1 | 72.7 ± 14.6 | 70.9 ± 15.1 |
| 10 | 69.5 ± 15.6 | 68.2 ± 14.4 | 69.7 ± 16.7 | 70.7 ± 16.3 | 70.1 ± 15.5 |
| 12 | 70.8 ± 15.4 | 67.4 ± 14.9 | 69.3 ± 12.8 | 70.5 ± 16.3 | 70.3 ± 16.1 |
| 16 | 68.6 ± 16.1 | 66.9 ± 15.4 | 69.3 ± 10.7 | **75.3 ± 13.6** | 74.3 ± 12.9 |
| 20 | 68.3 ± 16.2 | 68.5 ± 14.5 | 69.2 ± 14.9 | 69.8 ± 16.2 | 68.3 ± 15.3 |
| 24 | 68.1 ± 17.5 | 65.8 ± 16.3 | 70.0 ± 14.9 | 68.4 ± 16.8 | 67.1 ± 16.1 |
| 30 | 63.9 ± 21.1 | 65.3 ± 15.1 | 69.1 ± 15.1 | 69.7 ± 16.8 | 68.1 ± 15.3 |
The results for the BNet classifier can be found in Tables III and IV. The BNet classifier performed best when features were extracted using a larger window (e.g., 11 × 11). At ws = 11 × 11 and txt = 16 the BNet outperformed the other classifiers with Az = 92.8% and CA = 84%. At ws = 13 × 13 with the same txt, the BNet classifier's performance decreased by 1.5% on average; indeed, all Az and CA values decreased at 13 × 13 regardless of the number of textons. One noticeable pattern in Tables III and IV is that from ws = 5 × 5 to 11 × 11 the Az values increased by around 4%–5% at each window size and then gradually decreased at 13 × 13. The number of textons did not greatly affect either metric (variation between 1% and 3%).
TABLE III.
Az (%) values for BNet classifier. Boldface means highest value for each metric.
| txt | 5 × 5 | 7 × 7 | 9 × 9 | 11 × 11 | 13 × 13 |
|---|---|---|---|---|---|
| 6 | 79.2 ± 11.5 | 83.7 ± 10.7 | 89.5 ± 7.9 | 90.8 ± 7.8 | 89.4 ± 7.7 |
| 10 | 80.1 ± 11.4 | 84.3 ± 10.2 | 90.3 ± 6.7 | 92.0 ± 6.9 | 91.3 ± 7.5 |
| 12 | 80.2 ± 11.0 | 84.8 ± 10.1 | 90.0 ± 7.1 | 91.6 ± 6.1 | 90.7 ± 6.9 |
| 16 | 81.7 ± 10.6 | 85.0 ± 9.9 | 90.9 ± 7.1 | **92.8 ± 5.9** | 91.3 ± 5.7 |
| 20 | 80.5 ± 11.2 | 85.3 ± 9.7 | 90.8 ± 7.0 | 91.4 ± 6.0 | 90.7 ± 5.9 |
| 24 | 80.9 ± 11.5 | 85.4 ± 9.6 | 90.8 ± 6.8 | 91.5 ± 6.3 | 90.1 ± 6.1 |
| 30 | 81.0 ± 10.9 | 84.2 ± 8.9 | 90.9 ± 7.3 | 91.8 ± 5.9 | 91.1 ± 6.0 |
TABLE IV.
CA (%) values for BNet classifier. Boldface means highest value for each metric.
| txt | 5 × 5 | 7 × 7 | 9 × 9 | 11 × 11 | 13 × 13 |
|---|---|---|---|---|---|
| 6 | 70.0 ± 15.2 | 73.6 ± 13.2 | 78.6 ± 10.2 | 80.5 ± 10.8 | 78.1 ± 10.2 |
| 10 | 70.4 ± 14.1 | 74.0 ± 12.4 | 80.1 ± 9.9 | 83.0 ± 8.5 | 82.3 ± 9.5 |
| 12 | 70.4 ± 14.5 | 74.5 ± 11.7 | 80.3 ± 10.1 | 82.1 ± 10.1 | 81.7 ± 10.2 |
| 16 | 72.3 ± 14.2 | 75.1 ± 11.8 | 81.4 ± 8.9 | **84.0 ± 7.0** | 82.8 ± 7.5 |
| 20 | 70.2 ± 14.8 | 75.7 ± 12.6 | 81.9 ± 8.0 | 82.0 ± 8.9 | 81.5 ± 8.3 |
| 24 | 71.6 ± 14.6 | 75.6 ± 12.1 | 81.4 ± 8.9 | 83.0 ± 8.3 | 82.3 ± 8.9 |
| 30 | 70.8 ± 14.3 | 75.3 ± 11.9 | 81.8 ± 8.2 | 82.8 ± 8.7 | 81.9 ± 7.9 |
The k-NN classifier performed well with Az = 86.9% and CA = 80.2% as shown in Tables V and VI, respectively. In terms of the area under the curve, the classifier performed better with a smaller number of textons regardless of the ws. This can be seen in Table V where most Az values are above 80% at txt = 6, but as the txt value increases the Az value decreases to around 70%. The lowest accuracy was produced at the largest window of 13 × 13 with 30 textons in the texton dictionary.
TABLE V.
Az (%) values for k-NN classifier. Boldface means highest value for each metric.
| txt | 5 × 5 | 7 × 7 | 9 × 9 | 11 × 11 | 13 × 13 |
|---|---|---|---|---|---|
| 6 | 80.2 ± 11.3 | 82.7 ± 9.9 | **86.9 ± 7.5** | 85.2 ± 7.1 | 85.6 ± 7.4 |
| 10 | 80.0 ± 10.1 | 80.8 ± 9.1 | 83.5 ± 6.8 | 80.1 ± 7.9 | 79.1 ± 7.9 |
| 12 | 79.4 ± 9.6 | 78.3 ± 8.7 | 80.4 ± 7.5 | 79.0 ± 8.5 | 78.8 ± 8.3 |
| 16 | 77.0 ± 9.4 | 74.8 ± 8.4 | 77.3 ± 7.4 | 77.7 ± 8.2 | 76.6 ± 8.1 |
| 20 | 73.6 ± 8.6 | 72.2 ± 8.4 | 75.5 ± 7.4 | 72.8 ± 8.4 | 74.0 ± 8.9 |
| 24 | 71.7 ± 8.3 | 70.1 ± 7.8 | 74.5 ± 7.9 | 71.6 ± 9.2 | 69.9 ± 9.5 |
| 30 | 69.8 ± 8.1 | 69.8 ± 7.3 | 73.1 ± 7.4 | 70.7 ± 9.1 | 69.8 ± 9.3 |
TABLE VI.
CA (%) values for k-NN classifier. Boldface means highest value for each metric.
| txt | 5 × 5 | 7 × 7 | 9 × 9 | 11 × 11 | 13 × 13 |
|---|---|---|---|---|---|
| 6 | 74.8 ± 11.3 | 74.3 ± 11.2 | 78.1 ± 9.6 | **80.2 ± 9.1** | 78.9 ± 9.3 |
| 10 | 74.7 ± 11.7 | 74.2 ± 9.1 | 75.9 ± 8.5 | 74.3 ± 9.9 | 76.3 ± 9.4 |
| 12 | 75.5 ± 10.8 | 73.2 ± 8.7 | 74.3 ± 8.5 | 73.6 ± 10.1 | 72.1 ± 10.3 |
| 16 | 73.4 ± 10.3 | 71.1 ± 8.5 | 73.1 ± 8.3 | 74.3 ± 9.6 | 73.2 ± 9.8 |
| 20 | 72.2 ± 9.5 | 69.6 ± 8.2 | 71.8 ± 8.0 | 70.0 ± 9.6 | 69.3 ± 9.7 |
| 24 | 71.1 ± 9.6 | 68.7 ± 8.4 | 71.4 ± 8.4 | 68.9 ± 10.1 | 68.5 ± 10.4 |
| 30 | 69.9 ± 9.3 | 67.8 ± 8.5 | 70.6 ± 8.2 | 68.5 ± 10.5 | 67.7 ± 10.6 |
Tables VII and VIII show the results of the second best classifier in our experiments which is the RF. In terms of Az, the RF performed best at larger ws (e.g., 9 × 9 and 11 × 11) with txt = 6 or 10. In our experiment, using the maximum number of textons (txt = 30) decreased the Az from 89.5% to 81.6%, which is statistically significant (p < 0.001). On the other hand, both metrics are highest at ws = 11 × 11 and lowest at ws = 5 × 5.
TABLE VII.
Az (%) values for RF classifier. Boldface means highest value for each metric.
| txt | 5 × 5 | 7 × 7 | 9 × 9 | 11 × 11 | 13 × 13 |
|---|---|---|---|---|---|
| 6 | 80.3 ± 11.3 | 83.6 ± 9.8 | 88.2 ± 7.4 | **89.5 ± 7.1** | 86.2 ± 8.1 |
| 10 | 80.6 ± 9.9 | 83.0 ± 8.8 | 87.2 ± 6.6 | 86.7 ± 7.8 | 85.5 ± 8.2 |
| 12 | 80.4 ± 9.6 | 81.9 ± 8.7 | 85.7 ± 6.7 | 86.0 ± 8.0 | 82.7 ± 7.9 |
| 16 | 79.5 ± 9.3 | 80.4 ± 8.7 | 85.1 ± 7.5 | 86.3 ± 8.1 | 83.6 ± 8.2 |
| 20 | 77.3 ± 9.1 | 79.3 ± 8.9 | 84.4 ± 7.4 | 82.6 ± 8.9 | 81.1 ± 8.2 |
| 24 | 76.9 ± 8.9 | 78.8 ± 9.1 | 83.9 ± 7.6 | 82.0 ± 9.4 | 81.3 ± 9.5 |
| 30 | 75.9 ± 9.1 | 78.3 ± 9.9 | 83.4 ± 8.4 | 81.6 ± 9.7 | 80.9 ± 8.9 |
TABLE VIII.
CA (%) values for RF classifier. Boldface means highest value for each metric.
| txt | 5 × 5 | 7 × 7 | 9 × 9 | 11 × 11 | 13 × 13 |
|---|---|---|---|---|---|
| 6 | 74.9 ± 11.3 | 74.6 ± 10.9 | 78.4 ± 9.6 | **81.1 ± 9.3** | 77.8 ± 9.8 |
| 10 | 74.8 ± 11.7 | 75.2 ± 8.9 | 77.7 ± 8.7 | 77.5 ± 9.9 | 75.3 ± 9.5 |
| 12 | 75.9 ± 10.6 | 74.9 ± 8.9 | 76.9 ± 8.7 | 77.8 ± 9.0 | 74.3 ± 8.9 |
| 16 | 74.5 ± 10.1 | 73.9 ± 8.8 | 76.3 ± 8.7 | 77.0 ± 10.0 | 75.1 ± 9.5 |
| 20 | 74.0 ± 9.4 | 73.4 ± 8.6 | 75.3 ± 8.2 | 74.0 ± 9.7 | 73.0 ± 9.3 |
| 24 | 73.6 ± 9.7 | 72.7 ± 9.1 | 75.2 ± 8.4 | 73.0 ± 10.2 | 74.0 ± 10.3 |
| 30 | 73.1 ± 9.7 | 72.5 ± 9.3 | 74.3 ± 8.6 | 72.1 ± 11.0 | 72.2 ± 10.8 |
Tables IX and X show the results of the proposed method using the LDA classifier. The best Az value, 80.7%, was achieved at ws = 9 × 9. At the same ws, the proposed method achieved an average Az of 79.9%, with higher Az achieved at txt > 12. In terms of accuracy, the highest CA = 75.8% was achieved at ws = 11 × 11. Our results are within the range of existing studies3,10,46,47 in the literature.
TABLE IX.
Az (%) values for LDA classifier.
| txt | 5 × 5 | 7 × 7 | 9 × 9 | 11 × 11 | 13 × 13 |
|---|---|---|---|---|---|
| 6 | 74.6 ± 13.9 | 75.0 ± 15.4 | 79.1 ± 12.1 | 78.2 ± 13.8 | 77.3 ± 12.5 |
| 10 | 75.2 ± 12.9 | 74.9 ± 13.7 | 79.5 ± 11.3 | 79.2 ± 11.9 | 78.2 ± 12.3 |
| 12 | 74.9 ± 12.3 | 75.8 ± 13.3 | 79.7 ± 11.2 | 78.6 ± 12.4 | 77.5 ± 11.9 |
| 16 | 75.9 ± 12.6 | 75.9 ± 13.1 | 80.2 ± 11.7 | 79.1 ± 12.8 | 78.5 ± 12.5 |
| 20 | 75.2 ± 12.2 | 75.7 ± 12.7 | 80.3 ± 11.1 | 74.6 ± 14.6 | 73.6 ± 13.2 |
| 24 | 75.5 ± 12.7 | 75.5 ± 12.7 | 80.1 ± 11.5 | 75.8 ± 14.2 | 73.2 ± 13.7 |
| 30 | 75.3 ± 11.9 | 75.1 ± 12.6 | 80.7 ± 11.7 | 75.4 ± 13.1 | 73.2 ± 13.5 |
TABLE X.
CA (%) values for LDA classifier.
| txt | 5 × 5 | 7 × 7 | 9 × 9 | 11 × 11 | 13 × 13 |
|---|---|---|---|---|---|
| 6 | 65.2 ± 17.6 | 64.5 ± 18.3 | 67.7 ± 16.8 | 67.8 ± 16.9 | 67.2 ± 16.5 |
| 10 | 65.7 ± 16.3 | 65.9 ± 18.1 | 67.9 ± 16.5 | 69.7 ± 18.1 | 68.7 ± 15.7 |
| 12 | 64.7 ± 16.1 | 67.1 ± 15.8 | 67.6 ± 14.8 | 70.2 ± 16.6 | 69.5 ± 16.1 |
| 16 | 67.4 ± 15.9 | 66.0 ± 16.6 | 68.7 ± 15.7 | 75.1 ± 13.2 | 69.1 ± 13.5 |
| 20 | 64.2 ± 16.6 | 66.3 ± 16.5 | 68.8 ± 15.3 | 68.5 ± 17.1 | 67.3 ± 17.3 |
| 24 | 66.7 ± 16.2 | 66.6 ± 16.2 | 69.4 ± 14.3 | 75.8 ± 14.2 | 67.5 ± 15.8 |
| 30 | 66.0 ± 15.7 | 66.1 ± 16.3 | 70.1 ± 14.6 | 68.4 ± 15.3 | 67.1 ± 15.3 |
Overall, the BNet classifier with Az = 92.8% and CA = 84% outperformed the other classifiers on both metrics; the differences are statistically significant (p < 0.001) against all classifiers except the CA of the RF classifier (p = 0.095) and the k-NN classifier (p = 0.025). The p value between the best Az and CA of the BNet classifier and the best results of ADTree is p < 0.001, as is the p value against the best Az of k-NN. The RF classifier produced the second best results on both metrics, with Az = 89.5% and CA = 81.1% at ws = 11 × 11 and txt = 6. The k-NN and ADTree classifiers achieved Az > 83%, but in terms of accuracy the ADTree achieved CA < 80%. The overall results show that the best results for all classifiers were achieved using either ws = 9 × 9 or 11 × 11 and txt = 6 or 16. Considering the best Az values of all classifiers, the results suggest that the proposed method can achieve performances similar to other prostate cancer CAD systems in the literature. Furthermore, based on the best Az, our method qualitatively outperformed most of the existing methods. Nevertheless, in terms of accuracy there is room for improvement.
The RF classifier (Tables VII and VIII) performs better than the ADTree and k-NN because it is an ensemble classifier (it considers many decision trees and uses averaging to improve predictive accuracy). The k-NN classifier produced better results than the ADTree at txt = 6 because of the simplicity of its decision rule and of the data itself (low dimensionality). Nevertheless, the ADTree classifier performed better than the k-NN at txt > 10 because it employs a “boosting” approach (building decision trees iteratively based on the errors of previous trees). A larger number of textons benefits the ADTree classifier because a better representation of the problem domain can be created iteratively. Results in Tables III and IV show that the BNet classifier produced consistent results regardless of txt and outperformed the other classifiers. The BNet is expected to perform better due to its ability to map the relationships among variables (or features) in building a predictive model without being restricted by an independence assumption. Finally, the LDA produced the worst results in our experiments. Our explanation for this behavior is twofold. First, the LDA is a linear classifier, which assumes the data are linearly separable, whereas our training/test data are much more complex and the decision boundary between classes is expected to be nonlinear. Second, the decision rules in the LDA classifier are incapable of dealing with data as complex as prostate MRI. Existing studies3,10,46,47 employing the LDA classifier achieved similar results, ranging between Az = 75% and 84%, whereas our proposed method achieved Az = 80.7%, which is in line with these methods.
The top and bottom rows in Fig. 8 show the segmentation results of two different cases produced by the classifiers employed in this study. The BNet, RF, and k-NN classifiers produced good segmentation accuracy covering most areas of the malignant region. In the second case (bottom row), the RF classifier again produced the highest accuracy followed by the k-NN classifier whereas the BNet and ADTree classifiers generated reasonable segmentation results.
FIG. 8.
Segmentation results using different machine learning algorithms.
5. PARAMETER OPTIMIZATION
Since the performance of most machine learning algorithms depends on the parameters chosen by the user, we further investigated the performance of three of the classifiers employed in this study: the k-NN, RF, and ADTree classifiers (the BNet classifier does not have an adjustable parameter in weka). Performing parameter optimization for each classifier is time consuming due to the size of the dataset (more than 170 000 instances) and the complexity of the classifiers themselves. Note that in this section we used the data with features extracted using ws = 11 × 11, based on the results in Sec. 4, and have not tested features extracted using other ws. For the k-NN classifier we tested k = 1 up to k = 41 (in steps of 2 to ensure odd k values). For the RF and ADTree classifiers, we tested the number of random trees (rF) from 5 to 165 (in steps of 5) and the number of boosting iterations (nB) from 1 to 41 (in steps of 2; nB = 10 is the weka default), respectively. The purpose of these experiments is to demonstrate the stability of the proposed method when different parameters are used.
Figure 9 shows the Az and CA results when k is varied from 1 to 41. The data were extracted using ws = 11 × 11 and txt = 12 (6 textons/class). In terms of classification accuracy, no significant difference was noticed, as all CA values were between 72% and 74%. Nevertheless, there was a significant difference in Az between k = 1 and k = 5: the Az increased to just below 84% as k increased, compared to around 79% at k = 1. This indicates that the classifier tends to produce better results when comparing k neighbors instead of taking only the single nearest neighbor (a script sketch of this sweep follows Fig. 9).
FIG. 9.
The Az and CA values using different k values for the k-NN classifier. Default k = 1 in weka (Ref. 41).
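The sweep itself could be scripted as below (again using a scikit-learn k-NN as a stand-in for the weka implementation; X_train, y_train, X_val, and y_val are placeholders for one cross-validation fold):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_auc_score, accuracy_score

def sweep_k(X_train, y_train, X_val, y_val):
    """Evaluate odd k from 1 to 41 and record Az and CA (in %) for each."""
    results = {}
    for k in range(1, 42, 2):
        knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        az = 100 * roc_auc_score(y_val, knn.predict_proba(X_val)[:, 1])
        ca = 100 * accuracy_score(y_val, knn.predict(X_val))
        results[k] = (az, ca)
    return results
```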
Figure 10 shows the results for the ADTree classifier using 22 different nB values. The classifier produced Az ≤ 80% and CA ≤ 70% at nB ≤ 9. At the weka default,41 it produced around Az = 83% and CA = 73%. At nB ≥ 11 both metrics changed by around 1% before gradually increasing again at nB ≥ 17, approaching Az = 83% and CA = 73%. As shown in Fig. 10, the classifier produced the best Az and CA at nB = 10. Our explanation for this behavior is twofold: first, an optimal model for the data may have been reached at nB = 10, so that adding more iterations results in an overfitted model; second, adding more iterations decreases the training error but increases the test error, which affects the overall accuracy. In an early study, Freund and Mason43 showed that a significant test error was encountered at nB > 10.
FIG. 10.
The Az and CA values using different nB values for the ADTree classifier. Default nB = 10 in weka (Ref. 41).
In Fig. 11, the RF classifier achieved the highest Az = 91% at rF = 50, with CA just above 82% at rF = 85. Overall, there is no significant difference in either performance metric across rF = 5–165 (CA = 80%–82% and Az = 89%–91%). One visible pattern in all the classification results in this section is that after an optimal model (or optimal parameter) is reached, both metrics show very little change, which may be explained by the size of our data (more than 170 000 instances); for example, 100 misclassified instances have very little effect on the percentage. Oshiro et al.48 investigated the correlation between the number of trees in a random forest and Az values, and found that in most cases the classifier achieved a high AUC value with between 8 and 64 trees (similar to our results in Fig. 11); adding more trees only increases the computational time.
FIG. 11.
The Az and CA values using different rF values for the RF classifier. Default rF = 100 in weka (Ref. 41).
6. QUALITATIVE COMPARISON
Despite the promising results of CAD in assisting radiologists in diagnostic decision making, a major problem is the lack of publicly available datasets, with the result that each group of researchers has its own dataset. Several factors contribute to the difficulty of making quantitative comparisons in prostate cancer CAD:
1. Differences in modalities [e.g., T2-weighted (T2-W) MRI, diffusion-weighted (DWI) MRI, dynamic contrast enhanced (DCE) MRI, and magnetic resonance spectroscopy (MRS)] and protocols used in other studies, as the tissues’ numerical representations are inconsistent across modalities.
2. The absence of public datasets also makes quantitative comparisons among CADs in the literature difficult. Each team of researchers has its own dataset, which causes a huge range of variability in terms of noise, acquisition protocol, and image quality.
3. Studies were conducted within different regions of the prostate. Visually it is harder to detect and differentiate malignant regions within the CZ than within the PZ.
4. Finally, the basis of evaluation differs (e.g., volume, slice, region, or voxel/pixel). Pixel level evaluation is more challenging because the number of instances grows with the number of pixels, resulting in more complex data, whereas in region level evaluation the number of instances is limited to the number of regions annotated by the radiologists.
Nevertheless, qualitative comparisons can roughly indicate the relative performance of the proposed method in this paper.
All methods in Table XI achieved at least Az = 89%. The methods proposed by Vos et al.49 and Lv et al.50 achieved the highest Az = 97%. Vos et al.49 proposed a method using features extracted from quantitative pharmacokinetic (PK) maps and T2-W MRI before training an SVM to calculate the malignancy likelihood of each lesion. However, the method was tested on a small dataset of 87 regions of interest (ROIs) taken from 29 patients. On the other hand, Lv et al.50 used analysis of histogram fractal dimension (HFD) and texture fractal dimension (TFD) information on a single modality of T2-W MRI. Although the study covered 55 patients, the actual evaluation was based on 130 selected ROIs of 12 × 12 pixels (meaning only a small part of the PZ region was covered). Moreover, Lv et al.50 did not perform cross-validation to further evaluate their method. In our study, we performed 9-FCV and tested the proposed method on 418 PZ regions.
TABLE XI.
Qualitative comparison with eight of the existing CAD methods in the literature. Note that # and WP indicate the number of patients and whole prostate, respectively.
| Authors | # | Zone | Modality | Az |
|---|---|---|---|---|
| Vos et al. (Ref. 49) | 29 | PZ | T2-W, DCE | 97 |
| Lv et al. (Ref. 50) | 55 | PZ | T2-W | 97 |
| Peng et al. (Ref. 51) | 48 | WP | T2-W, DCE, DWI | 95 |
| Our method | 45 | PZ | T2-W | 93 |
| Vos et al. (Ref. 52) | 29 | PZ | T2-W, DCE | 91 |
| Tiwari et al. (Ref. 53) | 19 | WP | T2-W, MRS | 91 |
| Tiwari et al. (Ref. 12) | 36 | WP | T2-W, MRS | 90 |
| Litjens et al. (Ref. 3) | 347 | WP | T2-W, DCE, DWI | 89 |
| Kwak et al. (Ref. 54) | 244 | WP | T2-W, DWI | 89 |
| Niaf et al. (Ref. 10) | 30 | PZ | T2-W, DWI, DCE | 89 |
Peng et al.51 reported Az = 95% based on T2-W, DCE, and DWI using the following features: tenth percentile apparent diffusion coefficient (ADC), average ADC, and T2-W skewness. Individual image features were then combined using linear discriminant analysis (LDA) with leave-one-patient-out cross-validation. From an evaluation point of view, their study is similar to the studies in Refs. 49 and 50. Although Peng et al.51 reported that their study covered 48 patients, the actual evaluation was based on 104 ROIs (61 malignant, 43 normal). In comparison, our proposed method achieved qualitatively similar results to some of the methods in the literature regardless of dataset size, modality, and studied zones.
Another study by Vos et al.49 reported Az = 91% for malignant versus benign discrimination and 83% for suspicious malignant versus benign discrimination, similar to their earlier study.52 Niaf et al.10 extracted 140 texture features from 180 ROIs (30 patients) and achieved Az = 89%, similar to the method in Ref. 3. Niaf et al.10 compared the performance of four different classifiers (SVM, LDA, k-NN, and NB) based on four different feature selection methods. Their results showed that employing feature selection significantly improved the performance of their method and that gradient features showed high discriminant capability.
Litjens et al.3 conducted a study covering 347 patients and reported Az = 89%. Their method consisted of two stages: in the first stage the prostate gland was segmented using a multiatlas-based segmentation method and features based on intensity, anatomy, pharmacokinetics, texture, and blobness were calculated. Subsequently, each voxel was classified using GentleBoost and RF classifiers to generate a likelihood map, on which local maxima detection was performed to capture the ROIs with the highest probability of being malignant. A method by Tiwari et al.53 based on multikernel graph embedding in T2-W and MRS produced Az = 89% covering 29 patients. The method53 was also based on a two-stage classification approach: in the first stage, voxel based classification was performed by employing a random forest classifier in conjunction with the SeSMiK-GE based data representation and a probabilistic pairwise Markov random field (MRF) algorithm to identify malignant ROIs; subsequently, each segmented malignant ROI was classified as either high or low Gleason grade. Using the same method in a smaller study53 of 19 patients, Tiwari et al. reported Az = 91%. Later, Tiwari et al.12 proposed a data integration framework for T2-W and MRS for prostate cancer detection. Texture descriptors such as Gabor, gradient, and first and second order statistical features were extracted from T2-W images, and wavelet features were extracted from MRS images. Both sets of features were fused (via dimensionality reduction) using their proposed framework before employing probabilistic boosting tree (PBT), SVM, and RF classifiers. They reported an improvement of at least 5% in Az in comparison to the results without the proposed data integration framework.
Finally, a recent study by Kwak et al.54 combining texture descriptors (different variations of the local binary pattern) in T2-W with b-value information in DWI showed results similar to the studies of Litjens et al.3 and Niaf et al.10 However, their method involved a three-step feature selection process, which can be time consuming. Texture information extracted using local binary patterns and their variants yields a large number of features, many of them unnecessary, and both extraction and selection can be computationally expensive. In comparison, our proposed method achieves Az of approximately 90% using the RF classifier with only six textons (only six features, so feature selection is not necessary), and at smaller rF it achieved Az > 91% using the same classifier.
7. DISCUSSION
In our experiments, the results suggest that the number of textons in the dictionary has a significant effect on both Az and CA. For example, the ADTree, k-NN, and RF classifiers produced better results at smaller txt values, whereas the BNet classifier performed slightly better at txt = 16 or 20. Both metrics are strongly influenced by the ws used to construct the histograms (treated as feature vectors) from the texton maps. Furthermore, using the maximum value of ws (in our case 13 × 13) reduced performance on both metrics. In terms of selecting the best ws and txt, most classifiers performed well at 9 × 9 and 11 × 11 with 6 or 16 textons (3 and 8 textons/class, respectively).
All classifiers produced the best results when the size of the textons generated was either 9 × 9 or 11 × 11. Our explanations for this are fourfold. First, a small ws such as 5 × 5 does not provide sufficient information about the regions (e.g., limited intensity and gray level variation). Second, small textons which contain or are affected by noise are unable to characterize the actual representation of the region. Third, with a medium ws (e.g., 9 × 9), features tend to be more reliable because “noisy pixels” are outweighed by the dominant “reliable pixels” (e.g., malignant pixels). Finally, with a large ws (e.g., 13 × 13), performance tends to decrease because the chance of mixing pixels from the benign and malignant classes is higher, altering the actual feature representation of a particular class.
The number of textons affects the complexity of the predictive model built by the classifier. Most classifiers produced better results using a smaller number of textons (e.g., txt = 6) because it reduces the data complexity. This makes it much easier for the classifier to create decision boundaries between classes, which decreases error rates and increases the accuracy of the model on unseen cases. Nevertheless, for the BNet classifier a small number of textons (e.g., txt = 6) is insufficient to build a network model that can represent the problem; a larger number of textons (e.g., txt = 16) was needed to build an optimal network model.
From a computer vision point of view, the appearance of textons is determined by pixel intensities, which means they can be influenced by overall image intensity variations; textons built from filter responses (feature-based textons) are less affected. To overcome this issue, most texton-based approaches perform image normalization. In our study, we took a similar approach by normalizing image intensities to zero mean and unit variance. This standardization gives a consistent representation of normal and malignant regions across the whole prostate. From a data classification point of view, class imbalance between normal and malignant samples (which can bias the model toward the class with more samples) is another challenge in the development of prostate cancer CAD. Most CAD systems employ stratified cross-validation as a standard procedure to build a predictive model. Recently, Fehr et al.55 proposed an alternative procedure using data augmentation for classifying prostate cancer aggressiveness.
There are two main advantages of the proposed method: first, it bypasses typical feature extraction algorithms, such as filtering and convolution, which can be computationally expensive; second, it does not need the additional step of feature selection, as the number of textons is already small (e.g., txt = 6 or 16). With a large number of features, selecting the best ones can be time consuming, and although feature selection can significantly improve classification results, it requires a robust feature selection algorithm. Our results suggest that even at txt = 6 our method can produce Az > 90% with the BNet classifier, and even the simplest classifier (k-NN) can still achieve around 87%; indeed, all classifiers employed in this study produced Az > 80%. There are three reasons why most texton-based methods in the literature use larger numbers of textons than our proposed method. First, most filter banks blur the appearance of the texture, so a larger number of textons is needed to characterize its actual representation. Second, the number of textons needed depends on the variation (or complexity) of the textures. For example, a texton model for a grass image needs more textons because the variations are huge (different orientations, shapes, colors, sizes, etc.), in contrast to a specific cancer (e.g., prostate or lung cancer), which shows limited variation. In retinal vessel segmentation,18 the authors reported that 12 textons were sufficient for good segmentation results; similarly, the study in Ref. 19 achieved more than 85% accuracy in lung cancer detection using only 10 textons. Finally, most studies investigated a larger number of classes, which increases the number of textons in the codebook (texton dictionary) considerably. For example, the studies in Refs. 16, 17, 21, and 24 classified more than 60 different textures, resulting in more than a thousand textons in the codebook, whereas the studies in Refs. 18 and 19 classified only two classes (vessel versus nonvessel, and healthy versus nonhealthy), achieving their best accuracy with 12 and 10 textons, respectively.
The main limitation of this study is that we are unable to compare our results quantitatively with existing methods due to the absence of public datasets; this is a major problem for the research community in prostate cancer CAD. Second, we were unable to test the classifiers at different parameters with features extracted using different ws. Nevertheless, the experimental results presented in this paper show the potential of a CAD based on T2-W MRI to achieve results similar to CADs based on multiparametric MRI (and as such it would form an excellent basis for a multiparametric MRI based CAD) without the need for typical feature extraction methods such as filtering and convolution. In addition, preliminary results indicated that the proposed method could achieve better results when the classifiers' parameters were optimized. Although the proposed method achieved Az = 92.8%, we would like to emphasize that this does not indicate that our method is better than other prostate cancer CADs based on multiparametric MRI; several studies3,11,56 have shown that combining features from different modalities can increase a CAD's performance. It should also be noted that the classification accuracies reported in this study were based on the prostate PZ; the proposed method may not produce the same accuracy within the CZ due to different MR phenotypes (e.g., the tissue contrast differs between the PZ and CZ because of the higher water content of the PZ). Another limitation of our study is that our ground truth was based on TRUS biopsy reports and annotated by a radiologist. Due to the random procedure and limited access to the horns of the PZ, TRUS biopsies can miss cancerous tissue, particularly small lesions, which means some of our training samples might be inaccurate (e.g., some lesions labeled benign might be malignant). In contrast, template biopsy can help detect small lesions as its procedure tends to obtain more tissue samples from across the prostate, and radical prostatectomy allows the whole prostate to be examined thoroughly in the laboratory, giving definitive results about how aggressive the cancer is and how far it may have spread. Future work will investigate combining textons with features from other modalities such as DCE, MRS, and DWI, and using patches with deep learning, to see whether there is a significant effect on both performance metrics.
8. SUMMARY AND CONCLUSIONS
The proposed method consists of the following steps: (a) preprocessing, (b) construction of the texton dictionary, (c) feature extraction, and (d) training and testing. We used median and anisotropic diffusion filtering in the preprocessing phase. To construct the texton dictionary we did not use a filter bank as originally proposed by Varma and Zisserman;16 instead we followed their later study in Ref. 24, in which benign and malignant patches are clustered directly from the original image pixels. In the feature extraction phase, each pixel is represented as a histogram treated as a feature vector. The constructed histogram for each pixel consists of the frequencies of the neighboring textons occurring within the ws (or patch size), including the texton at the central pixel. Subsequently, we employed five classifiers to build predictive models and tested them on unseen cases.
Evaluation results show that the proposed method achieved results similar to the state of the art on all performance metrics. The texton based approach relies on two parameters, ws and txt; in our experiments we found that ws = 9 × 9 and 11 × 11 with 6 and 16 textons produced the best results for most classifiers. The BNet, RF, and k-NN classifiers were the best three machine learning algorithms, each producing Az > 86%.
In conclusion, we have developed a texton based CAD approach using a single modality (T2-W MRI) that achieves performance similar to methods based on multiparametric MRI and, as such, could form an excellent basis for a multiparametric MRI based CAD system. Experimental results suggest that textons can be used as robust texture descriptors to characterize benign and malignant tissues. The main advantages of the proposed method are as follows: first, it bypasses conventional feature extraction methods based on filtering; second, it avoids dimensionality reduction or feature selection, both of which can be time consuming. To our knowledge this is the first texton based CAD method in the literature applied to prostate cancer detection.
ACKNOWLEDGMENTS
Andrik Rampun is grateful for the awards given by Aberystwyth University under the Departmental Overseas Scholarship (DOS) and Doctoral Career Development Scholarships (DCDS). This work was funded in part by the NISCHR Biomedical Research Unit for Advanced Medical Imaging and Visualization.
CONFLICT OF INTEREST DISCLOSURE
The authors have no conflicts of interest to report.
REFERENCES
1. International Agency for Research on Cancer, GLOBOCAN 2012: Estimated cancer incidence, mortality and prevalence worldwide in 2012, http://globocan.iarc.fr/Pages/fact_sheets_cancer.aspx (accessed August 2, 2015).
2. American Cancer Society, Cancer facts & figures, 2015.
3. Litjens G., Debats O., Barentsz J., Karssemeijer N., and Huisman H., “Computer-aided detection of prostate cancer in MRI,” IEEE Trans. Med. Imaging (5), 1083–1092 (2014). 10.1109/TMI.2014.2303821
4. Artan Y. and Yetik I. S., “Prostate cancer localization using multiparametric MRI based on semi-supervised techniques with automated seed initialization,” IEEE Trans. Inf. Technol. Biomed. (6), 2986–2994 (2012). 10.1109/TITB.2012.2201731
5. Yacoub J. H., Verma S., Moulton J. S., Eggener S., and Oto A., “Imaging-guided prostate biopsy: Conventional and emerging techniques,” Radiographics (3), 819–837 (2012). 10.1148/rg.323115053
6. Schroder F. H., Hugosson J., Roobol M. J., Tammela T. L., Ciatto S., Nelen V., Kwiatkowski M., Lujan M., Lilja H., Zappa M., Denis L. J., Recker F., Berenguer A., Maattanen L., Bangma C. H., Aus G., Villers A., Rebillard X., van der Kwast T., Blijenberg B. G., Moss S. M., de Koning H. J., Auvinen A., and ERSPC Investigators, “Screening and prostate-cancer mortality in a randomized European study,” N. Engl. J. Med., 1320–1328 (2009). 10.1056/NEJMoa0810084
7. Doi K., “Computer-aided diagnosis in medical imaging: Historical review, current status and future potential,” Comput. Med. Imaging Graphics (4-5), 198–211 (2007). 10.1016/j.compmedimag.2007.02.002
8. Shiraishi J., Li Q., Appelbaum D., and Doi K., “Computer-aided diagnosis and artificial intelligence in clinical imaging,” Semin. Nucl. Med. (6), 449–462 (2011). 10.1053/j.semnuclmed.2011.06.004
9. Summers R. M., Liu J., Rehani B., Stafford P., Brown L., Louie A., Barlow D. S., Jensen D. W., Cash B., Choi J. R., Pickhardt P. J., and Petrick N., “CT colonography computer-aided polyp detection: Effect on radiologist observers of polyp identification by CAD on both the supine and prone scans,” Acad. Radiol., 948–959 (2010). 10.1016/j.acra.2010.03.024
10. Niaf E., Rouvière O., Mège-Lechevallier F., Bratan F., and Lartizien C., “Computer aided diagnosis of prostate cancer in the peripheral zone using multiparametric MRI,” Phys. Med. Biol. (12), 3833–3851 (2012). 10.1088/0031-9155/57/12/3833
11. Viswanath S., Bloch B. N., Chappelow J., Patel P., Rofsky N., Lenkinski R., Genega E., and Madabhushi A., “Enhanced multi-protocol analysis via intelligent supervised embedding (EMPrAvISE): Detecting prostate cancer on multi-parametric MRI,” Proc. SPIE, 79630U (2011). 10.1117/12.878312
12. Tiwari P., Viswanath S., Kurhanewicz J., Shridhar A., and Madabhushi A., “Multimodal wavelet embedding representation for data combination (MaWERiC): Integrating magnetic resonance imaging and spectroscopy for prostate cancer detection,” NMR Biomed., 607–619 (2012). 10.1002/nbm.1777
13. Barentsz J. O., Richenberg J., Clements R., Choyke P., Verma S., Villeirs G., Rouviere O., Logager V., and Futterer J. J., “European Society of Urogenital Radiology: ESUR prostate MR guidelines 2012,” Eur. Radiol. (4), 746–757 (2012). 10.1007/s00330-011-2377-y
14. Lemaître G., Marti R., Freixenet J., Vilanova J. C., Walker P. M., and Meriaudeau F., “Computer-aided detection and diagnosis for prostate cancer based on mono and multi-parametric MRI: A review,” Comput. Biol. Med., 8–31 (2015). 10.1016/j.compbiomed.2015.02.009
15. Leung T. and Malik J., “Representing and recognizing the visual appearance of materials using three-dimensional textons,” Int. J. Comput. Vision (1), 29–44 (2001). 10.1023/A:1011126920638
16. Varma M. and Zisserman A., “A statistical approach to texture classification from single images,” Int. J. Comput. Vision (1), 61–81 (2005). 10.1007/s11263-005-4635-4
17. Varma M. and Zisserman A., “A statistical approach to material classification using image patch exemplars,” IEEE Trans. Pattern Anal. Mach. Intell. (11), 2032–2046 (2009). 10.1109/TPAMI.2008.182
18. Zhang L., Fisher M., and Wang W., “Retinal vessel segmentation using Gabor filter and textons,” in Proceedings of the 18th Conference on Medical Image Understanding and Analysis, MIUA’14 (BMVA, Lincoln, UK, 2014), pp. 155–160.
19. Gangeh M., Sorensen L., Shaker S., Kamel M., de Bruijne M., and Loog M., “A texton-based approach for the classification of lung parenchyma in CT images,” in Proceedings of Medical Image Computing and Computer-Assisted Intervention–MICCAI (Springer-Verlag, Beijing, China, 2010), Vol. 6363, pp. 595–602. 10.1007/978-3-642-15711-0_74
20. Julesz B., “A theory of preattentive texture discrimination based on first-order statistics of textons,” Biol. Cybern. (2), 131–138 (1981). 10.1007/BF00335367
21. Varma M. and Zisserman A., “Classifying images of materials: Achieving viewpoint and illumination independence,” in Proceedings of the 7th European Conference on Computer Vision (Springer-Verlag, Copenhagen, Denmark, 2002), Vol. 3, pp. 255–271.
22. Schmid C., “Constructing models for content-based image retrieval,” in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (IEEE Computer Society, Kauai, HI, 2001), Vol. 2, pp. II-39–II-45.
23. Gabor D., “Theory of communication,” J. Inst. Electr. Eng., 429–457 (1946).
24. Varma M. and Zisserman A., “Texture classification: Are filter banks necessary?,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE Computer Society, Wisconsin, 2003), Vol. 2, pp. 691–698.
25. van der Maaten L. and Postma E. O., “Texton-based texture classification,” in Proceedings of the Belgian-Dutch Artificial Intelligence Conference (Utrecht, Netherlands, 2007), pp. 213–220.
26. Rampun A., Chen Z., Malcolm P., Tiddeman B., and Zwiggelaar R., “Computer-aided diagnosis: Detection and localization of prostate cancer within the peripheral zone,” Int. J. Numer. Methods Biomed. Eng. (5) (2015). 10.1002/cnm.2745
27. Liu X., Haider M. A., and Yetik S., “Automated prostate cancer localization with MRI without the need of manually extracted peripheral zone,” Med. Phys. (6), 2986–2994 (2011). 10.1118/1.3589134
28. Makni N., Iancu A., Colot O., Puech P., Mordon S., and Betrouni N., “Zonal segmentation of prostate using multispectral magnetic resonance images,” Med. Phys., 6093–6105 (2011). 10.1118/1.3651610
29. Ocak I., Bernardo M., Metzger G., Barrett T., Pinto P., Albert P. S., and Choyke P. L., “Dynamic contrast-enhanced MRI of prostate cancer at 3 T: A study of pharmacokinetic parameters,” Am. J. Roentgenol. (4), W192–W201 (2007). 10.2214/AJR.06.1329
30. Carlsson J., Helenius G., Karlsson M. G., Andren O., Klinga-Levan K., and Olsson B., “Differences in microRNA expression during tumor development in the transition and peripheral zones of the prostate,” BMC Cancer 13, 362 (2013). 10.1186/1471-2407-13-362
31. Madabhushi A., Udupa J., and Souza A., “Generalized scale: Theory, algorithms, and application to image inhomogeneity correction,” Comput. Vision Image Understanding (2), 100–121 (2006). 10.1016/j.cviu.2005.07.010
32. Madabhushi A. and Udupa J. K., “New methods of MR image intensity standardization via generalized scale,” Med. Phys. (9), 3426–3434 (2006). 10.1118/1.2335487
33. Madabhushi A., Udupa J. K., and Moonis G., “Comparing MR image intensity standardization against tissue characterizability of magnetization transfer ratio imaging,” J. Magn. Reson. Imaging (3), 667–675 (2006). 10.1002/jmri.20658
34. Artan Y., Haider M. A., Langer D. L., and Yetik I. S., “Semi-supervised prostate cancer segmentation with multiparametric MRI,” in Proceedings of the International Symposium on Biomedical Imaging (IEEE Engineering in Medicine and Biology Society, Rotterdam, Netherlands, 2010), pp. 648–651.
35. Liang J. and Bovik A., “Smoothing low-SNR molecular images via anisotropic median-diffusion,” IEEE Trans. Med. Imaging (4), 377–384 (2002). 10.1109/TMI.2002.1000261
36. Perona P. and Malik J., “Scale-space and edge detection using anisotropic diffusion,” IEEE Trans. Pattern Anal. Mach. Intell. (7), 629–639 (1990). 10.1109/34.56205
37. Hartigan J. A. and Wong M. A., “Algorithm AS 136: A k-means clustering algorithm,” J. R. Stat. Soc. Ser. C (Appl. Stat.) (1), 100–108 (1979). 10.2307/2346830
38. Vollmer R. T., “Tumor length in prostate cancer,” Am. J. Clin. Pathol., 77–82 (2008). 10.1309/PJNRHT63TP6FVC8B
39. Lee S., Lee J. K., Keun J., Jeong C. W., Jeong S. J., Hong S. K., Byun S. S., Lee S. E., and Lee H., “Core length as a predictor of Gleason score upgrading in men diagnosed with low risk prostate cancer by contemporary multicore prostate biopsy,” J. Urol. (4), e608–e618 (2013). 10.1016/j.juro.2013.02.2948
40. Billis A., Quintal M. M. Q., Freitas L. L. L., Costa L. B. E., and Ferreira U., “Predictive criteria of insignificant prostate cancer: What is the correspondence of linear extent to percentage of cancer in a single core?,” Int. Braz. J. Urol. (2), 367–372 (2015). 10.1590/S1677-5538.IBJU.2015.02.26
41. Hall M., Frank E., Holmes G., Pfahringer B., Reutemann P., and Witten I. H., “The WEKA data mining software: An update,” ACM SIGKDD Explor. Newsl. (1), 10–18 (2009). 10.1145/1656274.1656278
42. Bayes T., “An essay toward solving a problem in the doctrine of chances,” Philos. Trans. R. Soc. London, 370–418 (1763). 10.1098/rstl.1763.0053
43. Freund Y. and Mason L., “The alternating decision tree learning algorithm,” in Proceedings of the Sixteenth International Conference on Machine Learning (Bled, Slovenia, 1999), pp. 124–133.
44. Breiman L., “Random forests,” Mach. Learn. (1), 5–32 (2001). 10.1023/A:1010933404324
45. Friedman J. H., “Regularized discriminant analysis,” J. Am. Stat. Assoc. (405), 165–175 (1989). 10.1080/01621459.1989.10478752
46. Chan I., Wells W. III, Mulkern R. V., Haker S., Zhang J., Zou K. H., Maier S. E., and Tempany C. M. C., “Detection of prostate cancer by integration of line-scan diffusion, T2-mapping and T2-weighted magnetic resonance imaging; a multichannel statistical classifier,” Med. Phys. (30), 2390–2398 (2003). 10.1118/1.1593633
47. Vos P. C., Barentsz J. O., Karssemeijer N., and Huisman H. J., “Automatic computer-aided detection of prostate cancer based on multiparametric magnetic resonance image analysis,” Phys. Med. Biol., 1527–1542 (2012). 10.1088/0031-9155/57/6/1527
48. Oshiro T. M., Perez P. S., and Baranauskas J. A., “How many trees in a random forest?,” in Machine Learning and Data Mining in Pattern Recognition, Lecture Notes in Computer Science, edited by Perner P. (Springer, Berlin, Heidelberg, 2012), pp. 154–168.
49. Vos P. C., Hambrock T., Barentsz J., and Huisman H., “Computer-assisted analysis of peripheral zone prostate lesions using T2-weighted and dynamic contrast enhanced T1-weighted MR,” Phys. Med. Biol. (6), 1719–1734 (2010). 10.1088/0031-9155/55/6/012
50. Lv D., Guo X., Wang X., Zhang J., and Fang J., “Computerized characterization of prostate cancer by fractal analysis in MR images,” J. Magn. Reson. Imaging (1), 161–168 (2009). 10.1002/jmri.21819
51. Peng Y., Jiang Y., Yang C., Brown J. B., Antic T., Sethi I., Schmid-Tannwald C., Giger M. L., Eggener S. E., and Oto A., “Quantitative analysis of multiparametric prostate MR images: Differentiation between prostate malignant and normal tissue and correlation with Gleason score—A computer-aided diagnosis development study,” Radiology (3), 787–796 (2013). 10.1148/radiol.13121454
52. Vos P. C., Hambrock T., Barentsz J. O., and Huisman H. J., “Combining T2-weighted with dynamic MR images for computerized classification of prostate lesions,” Proc. SPIE 6915, 69150W-1–69150W-8 (2008). 10.1117/12.771970
53. Tiwari P., Kurhanewicz J., Rosen M., and Madabhushi A., “Semi supervised multi kernel (SeSMiK) graph embedding: Identifying aggressive prostate cancer via magnetic resonance imaging and spectroscopy,” in Proceedings of Medical Image Computing and Computer-Assisted Intervention (MICCAI) (Springer-Verlag, Beijing, China, 2010), pp. 666–673.
54. Kwak J. T., Xu S., Wood B. J., Turkbey B., Choyke P. L., Pinto P. A., Wang S., and Summers R. M., “Automated prostate cancer detection using T2-weighted and high-b-value diffusion-weighted magnetic resonance imaging,” Med. Phys. (5), 2368–2378 (2015). 10.1118/1.4918318
55. Fehr D., Veeraraghavan H., Wibmer A., Gondo T., Matsumoto K., Vargas H. A., Sala E., Hricak H., and Deasy J. O., “Automatic classification of prostate cancer Gleason scores from multiparametric magnetic resonance images,” Proc. Natl. Acad. Sci. U. S. A., E6265–E6273 (2015). 10.1073/pnas.1505935112
56. Tiwari P., Kurhanewicz J., and Madabhushi A., “Multi-kernel graph embedding for detection, Gleason grading of prostate cancer via MRI/MRS,” Med. Image Anal. (2), 219–235 (2013). 10.1016/j.media.2012.10.004