Evaluation of the Diagnostic Power of Thermography in Breast Cancer Using Bayesian Network Classifiers

Cruz-Ramírez Nicandro; Mezura-Montes Efrén; Ameca-Alducin María Yaneli; Martín-Del-Campo-Mena Enrique; Acosta-Mesa Héctor Gabriel; Pérez-Castro Nancy; Guerra-Hernández Alejandro; Hoyos-Rivera Guillermo de Jesús; Barrientos-Martínez Rocío Erandi

doi:10.1155/2013/264246

2013 May 22;2013:264246. doi: 10.1155/2013/264246


PMCID: PMC3674659 PMID: 23762182

Abstract

Breast cancer is one of the leading causes of death among women worldwide. There are a number of techniques used for diagnosing this disease: mammography, ultrasound, and biopsy, among others. Each of these has well-known advantages and disadvantages. A relatively new method, based on the temperature a tumor may produce, has recently been explored: thermography. In this paper, we will evaluate the diagnostic power of thermography in breast cancer using Bayesian network classifiers. We will show how the information provided by the thermal image can be used to characterize patients suspected of having cancer. Our main contribution is the proposal of a score, based on the aforementioned information, that could help distinguish sick patients from healthy ones. Our main results suggest the potential of this technique for this goal but also show its main limitations, which have to be overcome before it can be considered an effective complementary diagnostic tool.

1. Introduction

Breast cancer is one of the main causes of death among women worldwide [1]. Moreover, high specificity is required in the diagnosis of this disease, given that incorrectly classifying a sample as a false positive may lead to the surgical removal of the breast [2]. Nowadays, there are different techniques for carrying out the diagnosis: mammography, ultrasound, MRI, biopsies, and, more recently, thermography [3–6]. In fact, thermography started in 1956 [7] but was discarded some years later because of the poor quality of the thermal images [8] and the low specificity values it achieved. However, with the development of new thermal imaging technology, thermography has reappeared and is being seriously considered as a complementary tool for the diagnosis of breast cancer [9]. Because of the specificity required, it is essential to have as many tools as possible available, both to reduce the number of false positives and to achieve high sensitivity. Although open biopsy is regarded as the gold standard technique for diagnosing breast cancer, it is practically the last diagnostic resource used, since it is an invasive procedure with significant health, psychological, and economic implications [10]. Other techniques, which are not necessarily invasive, have implicit risks or limitations such as X-ray exposure, interobserver variability in interpretation, and difficult access to expensive high-tech equipment [11, 12]. Thermography is also noninvasive, but it has the advantage of using a cheaper device (an infrared camera), which is far more portable than those used in mammography, MRI, and ultrasound. Furthermore, it can be argued that some of the variables considered by thermography may be more easily interpreted than those of some of the aforementioned techniques.
In this paper, we will explore and assess this argument in order to measure the potential of thermography as a diagnostic tool for breast cancer. Our main contribution is the proposal of a score, based not only on thermographic variables but also on variables that convey more information than temperature alone, that might help differentiate sick patients from healthy ones. We will also explore the potential of thermography for diagnosing women below the age of 50, which would allow the disease to be detected in its early stages and thus reduce mortality.

The rest of the paper is organized as follows. In Section 2, we present related research that places our work in context and thus helps the reader appreciate our contribution. In Section 3, we explain the materials and methods used in our experiments. In Section 4, we present the methodology and the experimental results. In Section 5, we discuss these results and, finally, in Section 6, we conclude the paper and give directions for future research.

2. Related Research

In our review of the related literature, we divided the works into three categories: introductory, image-based, and data-based [13–17]. The introductory research mainly points out the potential of thermography as an alternative diagnostic tool for breast cancer, comparing its performance with that of other diagnostic methods such as mammography and biopsy [18, 19]. Unfortunately, because this research is intended as an introduction to the topic, it lacks important details about the data used and the analyses carried out.

The image-based works range from cluster analyses applied to thermal images (to differentiate healthy from sick breasts) [20] to fractal analyses (to characterize the geometry of malignant lesions) [21] and camera calibration for capturing thermal images [3, 22].

The data-based investigations present statistical analyses of patient databases (healthy and sick), such as nonparametric tests, correlation, and analysis of variance; artificial intelligence analyses, such as artificial neural networks and Bayesian analysis; and numerical models, such as physical and simulation models (bioheat equations) [8, 9, 23–26]. Only a small number of papers propose a score formed from thermographic data [27, 28], and they use at most 5 variables to form it. In our research, we propose 14 variables to calculate this score: this is the main contribution of the paper, alongside the analysis of the diagnostic power of the proposed variables. In Section 3, we present these variables in more detail and, in Section 4, we evaluate how informative they are in the diagnosis of breast cancer. To end this section, it is important to mention that, although the research in this category is very interesting, the methodology of some of these studies is not clear, which makes their experiments difficult to reproduce. We have done our best to present a clear methodology so that our results can be reproduced.

3. Materials and Methods

3.1. The Database

For our experiments, we used a real-world database provided by an oncologist who has specialized in the study of thermography since 2008. It consists of 98 cases: 77 patients with breast cancer (78.57%) and 21 healthy patients (21.43%). All results (either sick or healthy) were confirmed by open biopsy, which is considered the gold standard diagnostic method for breast cancer [29]. We include in this study 16 explanatory variables (attributes): 8 of them form our score (proposed by the expert), 6 are obtained from the thermal image, one variable is the score itself, and the final variable is age, which was discretized into three categories, as recommended for the selected algorithms [30–32]. In Table 1, we give the name, definition, and values of each of these variables. The dependent variable (class) is the outcome (cancer or no cancer).

Table 1.

Names, definitions, and values of the variables. In the experiments, the positive value is discretized to 1 and the negative value to 0. All values of qualitative variables are assigned by the image analyst.

Variable name	Definition	Variable value	Variable type
Asymmetry	Temperature difference (in °C) between the right and left breasts	If the difference is < 1°C, the value is 5; if it is between 1°C and 2°C, the value is 10; and if it is > 2°C, the value is 15	Nominal (5, 10, 15)
Thermovascular network	Number of veins with the highest temperature	If abundant vascularity is visualized, the value is 15; if it is moderate, the value is 10; and if it is slight, the value is 5	Nominal (5, 10, 15)
Curve pattern	Heat area under the breast	If the heat visualized is abundant, the value is 15; if it is moderate, the value is 10; and if it is slight, the value is 5	Nominal (5, 10, 15)
Hyperthermia	Hottest point of the breast	If there is at least one hottest point, the value is 20; otherwise, the value is 0	Binary (0, 20)
2c	Temperature difference between the hottest points of the two breasts	If the difference is between 1 and 10, the value is 10; between 11 and 15, the value is 15; between 16 and 20, the value is 20; and if it is > 20, the value is 25	Nominal (10, 15, 20, 25)
F unique	Number of hottest points	If the number is 1, the value is 40; if it is 2, the value is 20; if it is 3, the value is 10; and if it is > 3, the value is 5	Nominal (5, 10, 20, 40)
1c	Hottest point in only one breast	If the hottest point is in only one breast, the value is 40; if it is in both breasts, the value is 20	Binary (20, 40)
Furrow	Furrows under the breasts	If the furrow is visualized, the value is positive; if not, the value is negative	Binary (0, 1)
Pinpoint	Veins going to the hottest points of the breasts	If the veins are visualized, the value is positive; if not, the value is negative	Binary (0, 1)
Hot center	The center of the hottest area	If the center of the hottest area is visualized, the value is positive; if not, the value is negative	Binary (0, 1)
Irregular form	Geometry of the hot center	If the hot center is visualized as a nongeometrical figure, the value is positive; if not, the value is negative	Binary (0, 1)
Histogram	Histogram in the form of an isosceles triangle	If the histogram is visualized as a triangle, the value is positive; if not, the value is negative	Binary (0, 1)
Armpit	Temperature difference between the two armpits	If the difference is 0, the value for both armpits is negative; otherwise, one armpit is assigned a positive value and the other a negative value	Binary (0, 1)
Breast profile	Visually altered profile	If an abundantly altered profile is visualized, the value is 3; if it is moderate, the value is 2; if it is small, the value is 1; and if it does not exist, the value is 0	Nominal (0, 1, 2, 3)
Score	The sum of the values of the previous 14 variables	If the sum is < 160, the value is negative for cancer; if the sum is ≥ 160, the value is positive for cancer	Binary (0, 1)
Age	Age of the patient	If the age is < 51, the value is 1; if it is between 51 and 71, the value is 2; and if it is > 71, the value is 3	Nominal (1, 2, 3)
Outcome	The result obtained via open biopsy	The values are cancer or no cancer	Binary (0, 1)
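The scoring rules in Table 1 can be made concrete with a short sketch. It is our illustration in Python, not the authors' software: the function names, the patient values, and the assumption that each positive/negative variable contributes its 0/1 code (and breast profile its 0–3 code) to the sum are all hypothetical. Boundaries follow the table where it is explicit; for 2c we assume integer differences of at least 1, and for F unique at least one hottest point.

```python
# Hypothetical coding helpers for the point-valued variables of Table 1.

LEVEL_POINTS = {"slight": 5, "moderate": 10, "abundant": 15}  # thermovascular network, curve pattern


def asymmetry_points(diff_c: float) -> int:
    """Asymmetry: right-left temperature difference in degrees Celsius."""
    d = abs(diff_c)
    if d < 1:
        return 5
    return 10 if d <= 2 else 15


def hyperthermia_points(has_hottest_point: bool) -> int:
    return 20 if has_hottest_point else 0


def two_c_points(diff: int) -> int:
    """2c: difference between the hottest points (assumed integer >= 1)."""
    if diff <= 10:
        return 10
    if diff <= 15:
        return 15
    return 20 if diff <= 20 else 25


def f_unique_points(n_hottest: int) -> int:
    """F unique: number of hottest points (assumed >= 1); more than 3 gives 5."""
    return {1: 40, 2: 20, 3: 10}.get(n_hottest, 5)


def one_c_points(only_one_breast: bool) -> int:
    return 40 if only_one_breast else 20


def thermal_score(coded: dict) -> int:
    """Score: sum of the coded values of the 14 thermographic variables."""
    return sum(coded.values())


def score_label(total: int, threshold: int = 160) -> str:
    return "positive" if total >= threshold else "negative"


patient = {  # hypothetical, already-coded patient
    "asymmetry": asymmetry_points(2.5),                  # 15
    "thermovascular_network": LEVEL_POINTS["abundant"],  # 15
    "curve_pattern": LEVEL_POINTS["moderate"],           # 10
    "hyperthermia": hyperthermia_points(True),           # 20
    "2c": two_c_points(12),                              # 15
    "f_unique": f_unique_points(1),                      # 40
    "1c": one_c_points(True),                            # 40
    "furrow": 1, "pinpoint": 1, "hot_center": 1,
    "irregular_form": 0, "histogram": 1, "armpit": 0,
    "breast_profile": 2,
}

total = thermal_score(patient)
print(total, score_label(total))
```

With these made-up inputs the total is 161, just above the threshold of 160, so the sketch prints `161 positive`.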


3.2. Bayesian Networks

A Bayesian network (BN) [33, 34] is a graphical model that represents probabilistic relationships among variables of interest. Such networks consist of a qualitative part (structural model), which provides a visual representation of the interactions among variables, and a quantitative part (a set of local probability distributions), which permits probabilistic inference and numerically measures the impact of a variable or set of variables on others. Together, the qualitative and quantitative parts determine a unique joint probability distribution over the variables in a specific problem [33–35]. In other words, a Bayesian network is a directed acyclic graph consisting of [36] (a) nodes (circles), which represent random variables, and arcs (arrows), which represent probabilistic relationships among these variables, and (b) a local probability distribution attached to each node, which depends on the states of its parents.

Figures 3 and 4 (see Section 4) show examples of a BN. One of the great advantages of this model is that it allows the representation of a joint probability distribution in a compact and economical way by making extensive use of conditional independence, as shown in (1):

$$P(X_1, X_2, \dots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}(X_i)\big) \tag{1}$$

where $\mathrm{Pa}(X_i)$ represents the set of parent nodes of $X_i$, that is, the nodes with arcs pointing to $X_i$. Equation (1) also shows how to recover a joint probability from a product of local conditional probability distributions.
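Equation (1) can be checked numerically on a toy network. The Python sketch below uses a hypothetical three-node chain, Outcome → Furrow → Pinpoint, with made-up conditional probabilities that are not estimated from the study's data. It builds the joint distribution as the product of local distributions, confirms that it sums to 1, and shows that once Furrow is observed, also observing Pinpoint leaves the posterior of Outcome unchanged, the kind of conditional independence described for furrow in the learned networks (Figures 3 and 4). The chain needs only 1 + 2 + 2 = 5 parameters instead of the 2^3 − 1 = 7 of an unrestricted joint distribution over three binary variables.

```python
import itertools

# Hypothetical structure: Outcome -> Furrow -> Pinpoint (all binary, 1 = positive).
PARENTS = {"Outcome": (), "Furrow": ("Outcome",), "Pinpoint": ("Furrow",)}
# Made-up P(node = 1 | parent values); P(node = 0 | ...) is the complement.
CPT = {
    "Outcome": {(): 0.5},
    "Furrow": {(0,): 0.2, (1,): 0.9},
    "Pinpoint": {(0,): 0.3, (1,): 0.6},
}
NODES = list(PARENTS)


def joint(assignment):
    """Equation (1): product of P(X_i | Pa(X_i)) over all nodes."""
    p = 1.0
    for node, pa in PARENTS.items():
        p1 = CPT[node][tuple(assignment[q] for q in pa)]
        p *= p1 if assignment[node] == 1 else 1.0 - p1
    return p


ASSIGNMENTS = [dict(zip(NODES, bits))
               for bits in itertools.product((0, 1), repeat=len(NODES))]


def prob(evidence):
    """Probability of a partial assignment, obtained by summing the joint."""
    return sum(joint(a) for a in ASSIGNMENTS
               if all(a[k] == v for k, v in evidence.items()))


total = prob({})  # sums the joint over all 8 full assignments
p_furrow = prob({"Outcome": 1, "Furrow": 1}) / prob({"Furrow": 1})
p_both = (prob({"Outcome": 1, "Furrow": 1, "Pinpoint": 1})
          / prob({"Furrow": 1, "Pinpoint": 1}))
print(total, p_furrow, p_both)
```

Both posteriors equal 0.45 / 0.55 = 9/11 (up to floating-point rounding), because Pinpoint depends on Outcome only through Furrow in this structure.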

Bayesian network built by the Hill-Climber procedure using the 98-case database. Only the variable furrow is directly related to the outcome. Once furrow is known, all the other variables are independent of the class.

Bayesian network built by the Repeated Hill-Climber procedure using the 98-case database. Only the variable furrow is directly related to the outcome. Once furrow is known, all the other variables are independent of the class.

3.2.1. Bayesian Network Classifiers

Classification refers to the task of assigning class labels to unlabeled instances. In this task, given a set of unlabeled cases on the one hand and a set of labels on the other, the problem lies in finding a function that suitably matches each unlabeled instance to its corresponding label (class). The central research interest in this area is the design of automatic classifiers that can estimate this function from data (in our case, using Bayesian networks). This kind of learning is known as supervised learning [37–39]. For the sake of brevity, we do not reproduce here the code of the three procedures used in the tests carried out in this research; we only briefly describe them and refer the reader to their original sources. The procedures are (a) the Naïve Bayes classifier, (b) Hill-Climber, and (c) Repeated Hill-Climber [38, 40, 41].

The Naïve Bayes classifier (NB) is one of the most effective classifiers [38] and the benchmark against which state-of-the-art classifiers have to be compared. Its main appeal lies in its simplicity and accuracy: although its structure is always fixed (the class variable has an arc pointing to every attribute), this classifier has been shown to achieve high classification accuracy and optimal Bayes error (see Figure 3, Section 4). In simple terms, the NB learns, from a training data sample, the conditional probability of each attribute given the class. Then, when a new case arrives, the NB uses Bayes's rule to compute the conditional probability of the class given the set of attributes and selects the class value with the highest posterior probability.
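A minimal Naïve Bayes classifier for discrete attributes can be sketched in Python as below. It is not Weka's implementation: the add-one (Laplace) smoothing, the log-space computation, and the tiny dataset are our assumptions. Training counts class frequencies and attribute values per class; prediction applies Bayes's rule, treating attributes as independent given the class, and returns the class with the highest posterior.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and attribute-value frequencies per class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    domains = defaultdict(set)           # attribute index -> observed values
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(j, y)][v] += 1
            domains[j].add(v)
    return class_counts, value_counts, domains


def predict(model, row):
    """Return the class maximizing log P(class) + sum_j log P(x_j | class)."""
    class_counts, value_counts, domains = model
    n = sum(class_counts.values())

    def log_posterior(y):  # unnormalized: the evidence term is the same for all classes
        lp = math.log(class_counts[y] / n)
        for j, v in enumerate(row):
            # Add-one smoothing so an unseen value never yields probability 0.
            lp += math.log((value_counts[(j, y)][v] + 1)
                           / (class_counts[y] + len(domains[j])))
        return lp

    return max(class_counts, key=log_posterior)


# Hypothetical coded cases: (furrow, pinpoint) -> outcome.
rows = [(1, 1), (1, 0), (1, 1), (0, 0), (0, 0), (0, 1)]
labels = ["cancer", "cancer", "cancer", "no cancer", "no cancer", "no cancer"]
model = train_naive_bayes(rows, labels)
print(predict(model, (1, 1)), predict(model, (0, 0)))
```

On this toy data the sketch prints `cancer no cancer`: for (1, 1) the unnormalized posteriors are 0.5 · 0.8 · 0.6 = 0.24 for cancer versus 0.5 · 0.2 · 0.4 = 0.04 for no cancer.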
Hill-Climber is Weka's [41] implementation of a search-and-score algorithm, which uses greedy hill climbing [42] for the search part and one of several metrics for the scoring part, such as the Bayesian information criterion (BIC), Bayesian Dirichlet (BD), Akaike information criterion (AIC), and minimum description length (MDL) [43]. For the experiments reported here, we selected the MDL metric. The procedure takes an empty graph and a database as input and applies three operators to build a Bayesian network: addition, deletion, or reversal of an arc. In every search step, it computes the MDL score of the candidate structures and keeps the one with the best (minimum) score. It stops searching when no new structure improves on the MDL score of the current network.
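The search just described can be sketched in Python. This is a simplified illustration, not Weka's code: it uses the textbook MDL score (negative log-likelihood plus (log N)/2 per free parameter), which may differ from Weka's internal formulation in constants or sign convention, ignores options such as the maximum number of parents, and runs on a small hypothetical dataset in which Furrow agrees with Outcome in 90% of the rows and Noise is independent of both by construction.

```python
import itertools
import math
from collections import Counter


def family_mdl(data, child, parents, arity):
    """MDL term of one node: -log-likelihood + (log N / 2) * free parameters."""
    n = len(data)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    n_params = math.prod(arity[p] for p in parents) * (arity[child] - 1)
    return -loglik + 0.5 * math.log(n) * n_params


def mdl(data, graph, arity):
    """Total MDL of a DAG given as {node: set of parents}; lower is better."""
    return sum(family_mdl(data, v, sorted(ps), arity) for v, ps in graph.items())


def is_acyclic(graph):
    """Repeatedly strip nodes without remaining parents; a cycle blocks this."""
    remaining = {v: set(ps) for v, ps in graph.items()}
    while remaining:
        roots = [v for v, ps in remaining.items() if not ps]
        if not roots:
            return False
        for v in roots:
            del remaining[v]
        for ps in remaining.values():
            ps.difference_update(roots)
    return True


def neighbours(graph):
    """Graphs one operator away: add, delete, or reverse a single arc."""
    for a, b in itertools.permutations(graph, 2):
        g = {v: set(ps) for v, ps in graph.items()}
        if a in g[b]:                      # arc a -> b exists
            g[b].discard(a)
            yield g                        # deletion
            r = {v: set(ps) for v, ps in g.items()}
            r[a].add(b)
            yield r                        # reversal: b -> a
        elif b not in g[a]:                # no arc in either direction
            g[b].add(a)
            yield g                        # addition: a -> b


def hill_climb(data, arity):
    """Start from the empty graph; take the best move while it lowers the MDL."""
    current = {v: set() for v in arity}
    best = mdl(data, current, arity)
    while True:
        scored = [(mdl(data, g, arity), g) for g in neighbours(current) if is_acyclic(g)]
        if not scored:
            return current, best
        score, graph = min(scored, key=lambda t: t[0])
        if score >= best:
            return current, best
        current, best = graph, score


# Hypothetical binary data (100 rows).
DATA = [{"Outcome": i % 2,
         "Furrow": (i % 2) if i % 10 else 1 - (i % 2),
         "Noise": (i // 2) % 2} for i in range(100)]
ARITY = {"Outcome": 2, "Furrow": 2, "Noise": 2}

graph, score = hill_climb(DATA, ARITY)
print({v: sorted(ps) for v, ps in graph.items()})
```

On this toy data the search ends with a single arc between Outcome and Furrow and no arc touching Noise. The direction of that arc is a tie, since both orientations have the same MDL.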
Repeated Hill-Climber is Weka's [41] implementation of a search-and-score algorithm that uses repeated runs of greedy hill climbing [42] for the search part and one of several metrics (BIC, BD, AIC, or MDL) for the scoring part. For the experiments reported here, we selected the MDL metric. In contrast to the simple Hill-Climber algorithm, Repeated Hill-Climber takes a randomly generated graph as input. It also takes a database, applies the same operators (addition, deletion, or reversal of an arc), and returns the best structure found over the repeated runs of the Hill-Climber procedure. Repeating the runs reduces the risk of getting stuck in a local minimum [35].
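The benefit of restarting can be seen in isolation with the Python sketch below. It is not a Bayesian network search: the objective is a made-up function on the integers 0 to 99 with local minima at every multiple of 10 and its global minimum at 70, standing in for the MDL landscape over structures, and `random_start` stands in for generating a random initial graph. The defaults `runs=10` and `seed=1` merely mirror the values listed for Repeated Hill-Climber in Table 2.

```python
import random


def objective(x):
    """Toy landscape on 0..99: local minima at multiples of 10, global minimum at 70."""
    return x % 10 + 3 * abs(x // 10 - 7)


def toy_neighbours(x):
    return [y for y in (x - 1, x + 1) if 0 <= y <= 99]


def greedy_descent(f, x, moves):
    """Single hill-climbing run: move to the best neighbour while it strictly lowers f."""
    while True:
        best = min(moves(x), key=f)
        if f(best) >= f(x):
            return x
        x = best


def repeated_hill_climb(f, random_start, moves, runs=10, seed=1):
    """Restart the descent from `runs` random points and keep the best local optimum."""
    rng = random.Random(seed)
    results = [greedy_descent(f, random_start(rng), moves) for _ in range(runs)]
    return min(results, key=f)


best = repeated_hill_climb(objective, lambda rng: rng.randrange(100), toy_neighbours)
print(best, objective(best))
```

A single descent started at 35 stops at the local minimum 30; the repeated version returns the best of its ten local minima, and it reaches the global minimum 70 only if some start falls in that basin (69 to 78).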

3.3. Evaluation Method: Stratified k-Fold Crossvalidation

We followed the definition of the crossvalidation method given by Kohavi [37]. In k-fold crossvalidation, the database $D$ is split into $k$ mutually exclusive random samples, called folds, $D_1, D_2, \dots, D_k$, of approximately the same size. The classifier is trained and tested $k$ times: each time $i \in \{1, 2, \dots, k\}$, it is trained on $D \setminus D_i$ and tested on $D_i$, where $\setminus$ denotes set difference. The crossvalidation accuracy estimate is the total number of correct classifications divided by the sample size (the total number of instances in $D$). Thus, the k-fold crossvalidation estimate is as follows:

$$\mathrm{acc}_{cv} = \frac{1}{n} \sum_{(v_i, y_i) \in D} \delta\big(I(D \setminus D_{(i)}, v_i),\, y_i\big) \tag{2}$$

where $D_{(i)}$ is the fold that contains instance $v_i$, $I(D \setminus D_{(i)}, v_i)$ denotes the label assigned to the unlabeled instance $v_i$ by inducer $I$ trained on $D \setminus D_{(i)}$, $y_i$ is the class of instance $v_i$, $n$ is the size of the complete dataset, and $\delta(i, j) = 1$ if $i = j$ and $0$ if $i \neq j$. In other words, if the label assigned by the inducer to the unlabeled instance $v_i$ coincides with class $y_i$, the result is 1; otherwise, it is 0; that is, we use a 0/1 loss function in (2). In stratified k-fold crossvalidation, the folds contain approximately the same class proportions as the complete dataset $D$. A special case of crossvalidation occurs when $k = n$ (where $n$ is the sample size); this case is known as leave-one-out crossvalidation [37, 39].
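Equation (2) translates directly into code. The Python sketch below is not Weka's implementation: it builds stratified folds by shuffling each class with a fixed seed and dealing its instances round-robin over the k folds, then sums the 0/1 loss over every held-out instance. To keep it short, the inducer is a hypothetical majority-class rule rather than one of the classifiers of Section 3.2.1, and the data consist only of the class counts of the 98-case database.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=1):
    """Split instance indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for y in sorted(by_class):           # deal each class round-robin over the folds
        members = by_class[y]
        rng.shuffle(members)
        for i in members:
            folds[pos % k].append(i)
            pos += 1
    return folds


def cv_accuracy(X, y, k, inducer, seed=1):
    """Equation (2): correct test-fold predictions divided by all n instances."""
    correct = 0
    for test_idx in stratified_folds(y, k, seed):
        test = set(test_idx)
        train_X = [x for i, x in enumerate(X) if i not in test]
        train_y = [c for i, c in enumerate(y) if i not in test]
        classify = inducer(train_X, train_y)                       # I(D \ D_(i), .)
        correct += sum(classify(X[i]) == y[i] for i in test_idx)   # delta(., y_i)
    return correct / len(y)


def majority_inducer(train_X, train_y):
    """Trivial inducer: always predicts the most frequent training class."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label


# Class counts of the 98-case database (77 cancer, 21 healthy); attributes omitted.
y = ["cancer"] * 77 + ["healthy"] * 21
X = [()] * len(y)
print(cv_accuracy(X, y, k=10, inducer=majority_inducer))
```

Because every training set keeps far more cancer than healthy cases, this rule labels every test case as cancer, giving 77/98 ≈ 0.786 for both k = 10 and leave-one-out (k = n); this is the accuracy of always predicting the majority class on this database.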

For both evaluation methods, we assessed the performance of the classifiers presented in Section 3.2 using the following measures [44–47].

(a) Accuracy: the overall number of correct classifications divided by the size of the corresponding test set:
$a = \frac{cc}{n}$, (3)
where $cc$ represents the number of cases correctly classified and $n$ is the total number of cases in the test set.
(b) Sensitivity: the ability to correctly identify those patients who actually have the disease:
$S = \frac{TP}{TP + FN}$, (4)
where $TP$ represents true positive cases and $FN$ false negative cases.
(c) Specificity: the ability to correctly identify those patients who do not have the disease:
$Sp = \frac{TN}{TN + FP}$, (5)
where $TN$ represents true negative cases and $FP$ false positive cases.
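For a two-class problem, the number of correct classifications $cc$ in (3) is $TP + TN$, so all three measures follow from the four confusion-matrix counts. A minimal Python sketch (the function name and the all-cancer classifier are hypothetical):

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy (3), sensitivity (4) and specificity (5) from confusion-matrix counts."""
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n                      # cc = TP + TN for two classes
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity


# Hypothetical classifier that labels all 98 cases (77 cancer, 21 healthy) as cancer.
acc, sens, spec = diagnostic_metrics(tp=77, fn=0, tn=0, fp=21)
print(f"accuracy={acc:.4f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

It prints `accuracy=0.7857 sensitivity=1.00 specificity=0.00`, which shows why accuracy alone can hide a complete failure to recognize healthy patients.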

4. Methodology and Experimental Results

We used stratified 10-fold crossvalidation on the 98-case database described in Section 3.1. All the algorithms described in Section 3.2.1 used these data to learn a classification model, whose performance we then evaluated in terms of accuracy, sensitivity, and specificity. We used Weka [41] for the tests carried out here (see the parameter settings in Table 2). For comparison purposes, other classifiers were included: a multilayer perceptron (MLP) neural network and decision trees (ID3 and C4.5), with default parameters. The fundamental goal of this experiment was to assess the diagnostic power of the thermographic variables that form the score and the interactions among these variables. To illustrate how the variable values are obtained, we give one example.

Figure 1 shows the type of image obtained by the thermal imager, in this case a frontal breast thermogram. Using the ThermaCAM Researcher Professional 2.9 software [48], we detect the hottest areas of the breast, where the color passes from red to gray. The breast whose furrow displays the larger gray area is assigned a positive value and the other breast a negative one.
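The furrow rule can be sketched as follows in Python. This is only an illustration under stated assumptions: the study reads the gray area visually in ThermaCAM Researcher, whereas here each furrow region is a hypothetical grid of temperatures, the gray tone is approximated by a hypothetical temperature threshold, and ties (which the text does not cover) are marked negative for both breasts by analogy with the armpit rule in Table 1.

```python
def gray_area(region, threshold):
    """Number of pixels at or above `threshold` (stand-in for the gray tone)."""
    return sum(1 for row in region for t in row if t >= threshold)


def furrow_values(left_region, right_region, threshold):
    """(left, right) furrow values: 1 = positive for the breast with the larger gray area."""
    left = gray_area(left_region, threshold)
    right = gray_area(right_region, threshold)
    if left == right:            # tie: not specified in the text; both negative here
        return 0, 0
    return (1, 0) if left > right else (0, 1)


# Hypothetical temperatures (degrees Celsius) in the furrow region of each breast.
left = [[33.1, 34.6], [34.8, 35.0]]
right = [[33.0, 33.4], [34.7, 33.9]]
print(furrow_values(left, right, threshold=34.5))
```

Here the left furrow has three pixels at or above the threshold and the right furrow one, so the sketch prints `(1, 0)`.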

Table 2.

Parameter values for Hill-Climber and Repeated Hill-Climber.

Parameters	Hill-Climber	Repeated Hill-Climber
Initial structure is Naïve Bayes	False	False
Maximum number of parents	100,000	100,000
Runs	—	10
Score type	MDL	MDL
Seed	—	1
Arc reversal	True	True


Thermal image of the breasts, color-coded by temperature. The red and gray tones represent the hotter areas.

In Figure 2 we show a general overview of the procedure of breast thermography, from thermal image acquisition to the formation of the score.

Tables 3, 4, 5, 6, 7, 8, 9, and 10 show the numerical results of this experiment. Figures 3 and 4 show the structures resulting from running the Hill-Climber and Repeated Hill-Climber classifiers, and Figure 5 shows the decision tree (C4.5). We do not present the structure of the Naïve Bayes classifier since it is always fixed: there is an arc from the class to every attribute. For accuracy, the standard deviation is shown next to the result; for the remaining measures, the 95% confidence intervals (CI) are shown in parentheses.
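The text does not state how the 95% confidence intervals were computed. As a hedged illustration only, the Python sketch below assumes the common normal-approximation (Wald) interval for a proportion, clipped to [0, 1]; the counts in the example are hypothetical.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Proportion with a normal-approximation (Wald) 95% interval, clipped to [0, 1]."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical example: 7 of 21 healthy patients correctly classified.
p, lo, hi = wald_interval(7, 21)
print(f"{100 * p:.0f}% ({100 * lo:.0f}–{100 * hi:.0f})")
```

It prints `33% (13–53)`. With 0 successes the interval collapses to (0–0), and with only 21 healthy patients any specificity interval is necessarily wide.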

Table 3.

Accuracy, sensitivity, and specificity results for the three Bayesian network classifiers presented in Section 3.2.1.

	Naïve Bayes	Hill-Climber	Repeated Hill-Climber
Accuracy	71.88% (±12.61)	76.10% (±7.10)	76.12% (±7.19)
Sensitivity	82% (74–91)	97% (94–100)	99% (96–100)
Specificity	37% (15–59)	0% (0–0)	0% (0–0)


Table 4.

Accuracy, sensitivity, and specificity of the artificial neural network and the ID3 and C4.5 decision trees on the thermography data.

	Artificial neural network	Decision tree ID3	Decision tree C4.5
Accuracy	67.47% (±15.65)	73.19% (±12.84)	75.50% (±6.99)
Sensitivity	82% (73–91)	87% (79–94)	94% (88–99)
Specificity	33% (13–53)	29% (9–48)	0% (0–0)
