Author manuscript; available in PMC: 2016 Apr 11.
Published in final edited form as: Magn Reson Med. 2015 May 20;75(4):1708–1716. doi: 10.1002/mrm.25743

Multi-institutional validation of a novel textural analysis tool for preoperative stratification of suspected thyroid tumors on diffusion weighted MRI

Anna M Brown 1,2, Sidhartha Nagala 3, Mary A McLean 1, Yonggang Lu 4, Daniel Scoffings 5, Aditya Apte 4, Mithat Gonen 6, Hilda E Stambuk 7, Ashok R Shaha 8, R Michael Tuttle 9, Joseph O Deasy 4, Andrew N Priest 5, Piyush Jani 10, Amita Shukla-Dave 4,7, John Griffiths 1
PMCID: PMC4654719  NIHMSID: NIHMS680607  PMID: 25995019

Abstract

Purpose

Ultrasound-guided fine-needle-aspirate-cytology (FNAC) fails to diagnose many malignant thyroid nodules; thus, patients may undergo diagnostic lobectomy. This study assessed whether textural analysis (TA) could non-invasively stratify thyroid nodules accurately using diffusion-weighted (DW) MRI.

Methods

This multi-institutional study examined 3T DW-MRI images obtained with spin-echo echo-planar-imaging (DW-EPI) sequences. The training dataset included 26 patients from Cambridge, UK, and the test dataset included 18 thyroid cancer patients from Memorial Sloan-Kettering, USA. Apparent diffusion coefficients (ADCs) were compared over regions-of-interest (ROIs) defined on thyroid nodules. TA, feature reduction, and linear discriminant analysis (LDA) were performed using the 21 MaZda-generated texture parameters that best distinguished benign and malignant ROIs.

Results

Training dataset mean ADC values were significantly different for benign and malignant nodules (p=0.02) with sensitivity and specificity of 70% and 63%, respectively, and receiver-operator-characteristic (ROC) area-under-the-curve (AUC) of 0.73. The LDA model of the top 21 textural features correctly classified 89/94 DW-MRI ROIs with 92% sensitivity, 96% specificity, and AUC of 0.97. This algorithm correctly classified 16/18 (89%) patients in the independently obtained test set of thyroid DW-MRI scans.

Conclusion

TA classifies thyroid nodules with high sensitivity and specificity on multi-institutional DW-MRI datasets. This method needs further validation in a larger prospective study.

Keywords: Textural analysis, diffusion weighted MRI, thyroid tumors

Introduction

Thyroid cancer is the most common malignant endocrine tumor, with an annual incidence in the USA of 12.2 per 100,000 men and women (1). Thyroid nodules may have benign or malignant pathology, and are diagnosed before surgery using ultrasound-guided fine needle aspirate cytology (FNAC), the current gold standard. Thyroid nodules are common, and ultrasound is an excellent screening tool to determine which nodules need FNAC. Despite repeated aspirates, however, up to 7% of nodules yield non-diagnostic cytology, classified as Thy1 (2). A further 15–30% of FNACs yield an indeterminate cytology (Thy3), where a follicular or Hurthle cell neoplasm is reported (3). The risk of malignancy within these Thy1 and Thy3 nodules is 20–30% (4). These cytological categories, with management recommendations, are shown in Table 1.

Table 1.

Thyroid nodule cytology classification schema according to the British Thyroid Association guidelines, 2007

| | Thy1 | Thy2 | Thy3 | Thy4 | Thy5 |
|---|---|---|---|---|---|
| Definition | Non-diagnostic/cysts | Non-neoplastic | Indeterminate | Suspicious for malignancy | Malignant |
| Current management recommendations | Repeat FNAC and US at follow up | Repeat FNAC 3–6 months | Diagnostic lobectomy | Repeat FNAC, then either diagnostic lobectomy or radical treatment | Radical treatment |

FNAC = Fine needle aspiration cytology; US = ultrasound

A thyroid lobectomy may be therapeutic for Thy3 (indeterminate) patients if the histology is benign. However, if a malignant diagnosis is made, patients are likely to need completion thyroidectomy with central compartment lymph node dissection followed by radioiodine therapy. Accurate preoperative diagnosis would therefore improve surgical planning as well as reduce unnecessary operations, since patients with malignant tumors would receive one definitive operation. Thus, more research is needed into new modalities that discriminate between malignant and benign thyroid nodules.

Recent interest has centered on DW-MRI, which measures the apparent diffusivity of tissue water. When diffusion-sensitizing magnetic gradients are applied, Brownian motion of water protons creates a DW-MRI signal that can be used to generate maps of the apparent diffusion coefficient (ADC). Diffusion measurements can provide insight into tissue structure and organization, and can discriminate between benign and malignant tumors in organs such as the breast, liver, and uterus (5). It is hypothesized that because of the increased cell proliferation in malignant tumors, water protons undergo less Brownian motion, thus lowering ADC. Several recent studies of thyroid nodules in small cohorts of patients have supported this hypothesis, as delineated in Table 2 (6–12).

Table 2.

Comparison of thyroid tumor DW-MRI studies

| Study | Thyroid tissue type | N | Mean ADC (× 10−3 mm2/s) ± SD | Optimum ADC threshold (× 10−3 mm2/s) |
|---|---|---|---|---|
| Razek et al. (6) | Benign: adenomatous nodule | 42 | 1.8 ± 0.14 | 0.98 |
| | Benign: follicular adenoma | 6 | 1.7 ± 0.17 | |
| | Benign: cyst | 8 | 1.9 ± 0.38 | |
| | Malignant: papillary | 4 | 0.68 ± 0.23 | |
| | Malignant: follicular | 3 | 0.77 ± 0.17 | |
| Bozgeyik et al. (7) | Benign | 88 | 1.15 ± 0.43 | 0.62 |
| | Malignant | 5 | 0.30 ± 0.20 | |
| Schueller-Weidekamm et al. (8) | Benign | 20 | 1.93 ± 0.25 | 2.25 |
| | Malignant | 5 | 2.73 ± 0.65 | |
| | Contralateral normal (malignant patients) | 20 | 1.44 ± 0.65 | |
| Erdem et al. (9) | Benign | 52 | 2.75 ± 0.60 | N/A |
| | Malignant | 9 | 0.70 ± 0.31 | |
| | Control normal | 24 | 1.34 ± 0.28 | |
| Nakahira et al. (10) | Benign | 23 | 1.93 ± 0.37 | 1.60 |
| | Malignant | 19 | 1.20 ± 0.25 | |
| Mutlu et al. (11) | Benign | 46 | 1.6 ± 0.1 | 1.60 |
| | Malignant | 5 | 0.8 ± 0.2 | |
| Dilli et al. (12) | Benign | 40 | 1.98 ± 0.48 | N/A |
| | Malignant | 19 | 0.83 ± 0.18 | |

SD = standard deviation; ADC = apparent diffusion coefficient; N = number of participants in study

Textural analysis (TA) has become an attractive clinical tool, as it quantifies pixel intensity variation otherwise invisible to the naked eye and thus aids in characterizing underlying tissue structures. Several TA studies show good discrimination of thyroid nodules on ultrasound images (13–15) and better distinction between benign and malignant thyroid lesions on nuclear chromatin images (16), but none have utilized TA on DW-MRI scans of the thyroid. The aim of this study was to assess whether textural analysis could improve the accuracy, sensitivity, and specificity of DW-MRI for the stratification of malignancy in suspicious thyroid nodules.

Methods

Two cohorts of patients, from the Cambridge University Hospitals Foundation Trust, UK (Cambridge) and Memorial Sloan Kettering Cancer Center, USA (MSKCC), were included in this multi-institutional study. The clinical protocols and methods of analysis at each institution are described below.

Training dataset, Cambridge University Hospitals Foundation Trust, UK

Study Design and Patient Population

A total of 42 patients (11 men, mean age 57.1 years, range 29–79, and 31 women, mean age 42.9 years, range 18–78) with a preoperative cytological status that was indeterminate (Thy3), suspicious (Thy4) or diagnostic of thyroid cancer (Thy5) were prospectively recruited into this pilot study between February 2010 and January 2012, following ethical approval granted by the Local Research Ethics Committee (LREC) in January 2010. The inclusion criteria for the study were: (1) proven Thy 3–5 thyroid lesions on cytological classification; (2) a follicular neoplasm, suspected malignancy, or an inconclusive lesion on ultrasound-guided thyroid core biopsy; and (3) a plan for surgical excision. Exclusion criteria included the typical contraindications to MR imaging. Initially, FNAC or core biopsy was performed on all nodules and reported by an experienced cytologist or pathologist. Next, patients underwent preoperative MRI (protocol below). Two patients then opted out of surgical treatment and were excluded from the study. The remaining 40 patients underwent thyroid surgery. The type of thyroid surgery depended on the recommendation of the local thyroid multi-disciplinary team meeting, which followed the British Thyroid Association Guidelines, 2007 (see Table 1). The post-operative histology and nodule dimensions for the remaining 40/42 were reported by an experienced pathologist and correlated to the preoperative images.

Magnetic Resonance Imaging Protocol

Subjects were studied on a 3 Tesla HDx scanner (GE Healthcare, Waukesha WI, USA). Signal was transmitted using a body coil and received using two channels of a four-channel phased array surface coil (PACC, Machnet BV, Eelde, The Netherlands) designed for studies of the carotid arteries. One arm of the coil was centered over the area of interest (thyroid nodule) to maximize local sensitivity and secured by a soft cervical collar to reduce motion artifact. After a 3-plane localizer, the following sequences were performed:

  1. Fast spin echo axial T1: echo time (TE) = 12 ms; repetition time (TR) = 580 ms; field of view (FOV) 18 cm; matrix 256 × 192; 4 averages; 15 slices (slice thickness 5 mm, spacing 1 mm); scan duration 2:31.

  2. Fast spin echo axial T2: TE = 102 ms; TR = 3780 ms; FOV 18 cm; matrix 384 × 256; 2 averages; 15 slices (slice thickness 5 mm, spacing 1 mm); scan duration 1:38.

  3. Fast spin echo axial T2 with fat saturation: same as sequence 2, except TR = 3360 ms; matrix 320 × 192; and a chemical shift selective fat suppression pulse was used; scan duration 1:13.

  4. Diffusion-weighted dual-spin-echo echo-planar imaging (DW-EPI): TE = 81 ms; TR = 2200 ms; FOV 22 cm; matrix 128 × 128; 16 averages; slice thickness 5 mm; spacing 1 mm; scan duration 2:21; and b-values of 0 and 500 s/mm2 were acquired. Fat saturation was achieved using both a spectrally selective saturation pulse and a water-selective excitation pulse. Spatial saturation bands were also used to remove signal from overlying fat and other nearby tissues. The scanner software automatically interpolated the images to a reconstructed matrix of 256×256 by zero-filling k-space.

Image Analysis

The ADC maps were calculated by fitting the signal intensities in the images with b-values of 0 and 500 s/mm2 as follows:

$$S(500) = S(0)\,\exp(-500 \cdot \mathrm{ADC}) \qquad [1]$$
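As a minimal sketch of this two-point calculation, the mono-exponential model S(b) = S(0)·exp(−b·ADC) can be solved for ADC voxel by voxel, giving ADC = ln(S(0)/S(b))/b with b = 500 s/mm2. The function name `adc_two_point`, the nested-list image representation, and the handling of non-positive signals are illustrative assumptions, not the scanner or FuncTool implementation.

```python
import math

def adc_two_point(s0, sb, b=500.0):
    """Voxel-wise ADC (mm^2/s) from signals acquired at b = 0 and b (s/mm^2).

    Assumes the mono-exponential model S(b) = S(0) * exp(-b * ADC).
    Voxels with a non-positive signal are returned as None (an assumption;
    the source does not describe how such voxels were handled).
    """
    adc_map = []
    for row0, rowb in zip(s0, sb):
        out_row = []
        for v0, vb in zip(row0, rowb):
            if v0 > 0 and vb > 0:
                out_row.append(math.log(v0 / vb) / b)
            else:
                out_row.append(None)
        adc_map.append(out_row)
    return adc_map

# Hypothetical 1 x 2 image: the first voxel is simulated with
# ADC = 2.0e-3 mm^2/s, the second has no usable b = 0 signal.
s0 = [[1000.0, 0.0]]
s500 = [[1000.0 * math.exp(-500 * 2.0e-3), 10.0]]
adc = adc_two_point(s0, s500)
print(adc)
```

With only two b-values the fit reduces to this closed form; a noise-floor correction, as used for the test dataset, would need an additional step that is not shown here.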

An experienced neuroradiologist, blinded to the clinical data of the subjects, drew regions of interest (ROIs) around the thyroid lesions on each image slice containing a lesion, avoiding any obvious cysts or hematomata from previous biopsy. ROI measurements were defined on ADC maps, with reference to the T2-weighted images, using an Advantage Windows workstation and FuncTool software (GE Healthcare). Images where a thyroid nodule was not clearly identified (due to the small volume of non-cystic tissue sampled or to the severity of DW-MRI-related distortions) were excluded from analysis. Specifically, patients were excluded due to withdrawal from surgery (n=2), image distortion (n=4), nodule too small to be identified at <10 mm (n=3), and cystic nodule (n=7). This resulted in 26 of the 40 patients with reliable images for analysis. In this cohort, there were 10 patients with malignant nodules and 16 patients with benign pathology. To maintain consistency with the test dataset, which was a population of exclusively papillary carcinomas, the malignant nodules in the local dataset were limited to the 8 cases of papillary carcinoma. A total of 24 patients with 94 image slices were included in the final training set for analysis. The number of image slices per patient ranged from 1 to 7 in the training set (mean = 4).

Mean ADC values for each slice in the nodule were derived using the FuncTool software. The mean ADCs from multiple slices were then pooled as follows:

$$\bar{x} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} \qquad [2]$$

where $\bar{x}$ = overall weighted-mean ADC; $w_1$ = area of first ROI; $w_2$ = area of second ROI; $x_1$ = mean ADC of first ROI; $x_2$ = mean ADC of second ROI, etc.
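The pooling in Eqn 2 is an area-weighted average of the per-slice ROI means. A short sketch, in which the function name `weighted_mean_adc` and the example values are hypothetical:

```python
def weighted_mean_adc(roi_means, roi_areas):
    """Area-weighted mean of per-slice ROI mean ADC values (Eqn 2)."""
    if len(roi_means) != len(roi_areas) or not roi_means:
        raise ValueError("need one area per ROI mean, and at least one ROI")
    total_area = sum(roi_areas)
    if total_area <= 0:
        raise ValueError("total ROI area must be positive")
    return sum(w * x for w, x in zip(roi_areas, roi_means)) / total_area

# Hypothetical nodule with three slices
# (mean ADC in 10^-3 mm^2/s, ROI area in mm^2).
means = [2.0, 2.4, 1.8]
areas = [100.0, 300.0, 100.0]
pooled = weighted_mean_adc(means, areas)
print(pooled)
```

Weighting by area keeps small end slices from dominating the nodule-level value, which an unweighted mean of slice means would allow.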

Statistical Analysis

Weighted-mean ADC values were plotted against post-operative histology (benign and malignant thyroid tissue) and the ROI areas and 95% confidence intervals (CIs) were calculated using GraphPad Prism (ver. 5.00 for Windows, GraphPad Software, San Diego, California, USA). A two-sample t-test was used to compare mean values between benign and malignant cases.

Test dataset, Memorial Sloan Kettering Cancer Center

Study Design and Patient Population

Between January 2011 and March 2012, a convenience sample of 25 adult patients (≥18 years) undergoing surgical consultation for thyroidectomy on the basis of a thyroid nodule FNAC either (1) demonstrating papillary thyroid cancer or (2) suspicious for thyroid cancer was enrolled in a prospective clinical trial evaluating multi-parametric MRI including DW-MRI in the pre-operative evaluation of head and neck tumors. This prospective protocol was approved by the Memorial Sloan Kettering Cancer Center (MSKCC) local institutional review board. After providing appropriate informed consent, all subjects underwent research MRI prior to thyroid surgery. The exclusion criteria were: (i) presence of a contraindication to MRI, (ii) tumor size larger than 5 cm (detected by ultrasonography), and (iii) claustrophobia. Of the 25 patients initially enrolled in the study, 7 patients were excluded due to either distorted image quality (n=5) or small tumor size such that visualization was difficult on DW-MRI images (n=2). Eighteen patients were suitable for the final analysis.

Magnetic Resonance Imaging Protocol

MRI examination was performed on a 3 Tesla HDx scanner (GE Healthcare, Waukesha WI, USA) using an 8-channel neurovascular phased-array coil. The MRI study consisted of standard multi-planar (sagittal, axial, coronal) T1 and T2-weighted imaging scans followed by DW-MRI scans. The duration of the whole examination was approximately 30 minutes.

The T1 and T2-weighted MRI scans covered the whole thyroid gland with a slice thickness of 5 mm, field of view (FOV) of 20–24 cm, and acquisition matrix of 256×256. For the T1-weighted MRI, TR = 500 ms and TE = 15 ms; for the T2-weighted MRI, TR = 4000 ms and TE = 80 ms.

DW-MRI data were acquired using a single-shot echo planar imaging (SS-EPI) spin echo sequence (TR = 4000 ms; TE = 98–104 ms; number of excitations [NEX] = 4; 3 orthogonal directions) with b values of 0 and 500 s/mm2. Fat suppression, shimming (shimming FOV = 14–16 cm), and parallel imaging (acceleration factor = 2) techniques were used. The DW-MRI scans were focused on thyroid tumors with 4 to 8 slices of 5-mm thickness, 0-mm gap, 20–24-cm FOV, and 128 × 128 acquisition matrix, which was zero-filled and reconstructed to 256 × 256 pixels. Images were all obtained in axial planes.

Image Analysis

The ROIs for papillary thyroid cancers were placed within the thyroid gland images avoiding obvious cystic, hemorrhagic, or calcified portions. Based on the radiological and clinical information including ultrasound reports, they were drawn on the DW-MR images by a neuroradiologist with more than 10 years’ experience. The ROI encompassed the entire nodule of interest with a minimum two-dimensional ROI considered to be 17 mm2 (i.e., 17 pixels). The ADC values were calculated using Eqn 1 with b values of 0 and 500 s/mm2. A noise floor rectification scheme was used in the ADC calculation (17), which was performed on a voxel-by-voxel basis, generating an ADC map as well as averaged values for the ROIs.

Textural Analysis

Textural analysis (TA) was performed using MaZda (Institute of Electronics, Technical University of Łódź, Wólczańska, Poland), a freely available software package (18–20). Two-dimensional ROIs delineated by radiologists at each institution were transferred to MaZda by using binary masks in ImageJ (National Institutes of Health, Bethesda, MD, USA). An example of the ROI transfer process is shown in Figure 1.

Figure 1.


Apparent Diffusion Coefficient (ADC) images are shown for a patient with a follicular adenoma from the training set, depicting the neuroradiologist-defined region of interest (ROI) of the lesion on a bitmap-format ADC map in FuncTool (a) and the same ROI shown on the original resolution DICOM-format ADC map in ImageJ (b).

Training Dataset Analysis

The MaZda textural analysis resulted in a report with more than 300 texture parameters for each ROI in the training dataset. There were seven texture feature categories included in this analysis: run-length matrix, wavelet transform, gradient, geometric, histogram, and autoregressive model parameters in addition to features derived from co-occurrence matrices in four directions (0, 45, 90, and 135 degrees) at pixel pair distances ranging from 1 to 5 pixels in separation. Feature reduction was necessary to reduce the dimensionality. MaZda offers three feature reduction algorithms: mutual information (MI), Fisher coefficient, and probability of classification error combined with average correlation coefficients (POE+ACC). Each algorithm determined the 10 texture features that best distinguished the selected classes in the program (e.g. benign and malignant), such that a combined total of up to 30 parameters were identified for further investigation (21–23). This dimensionality was further reduced by exporting the selected features into the statistical package b11 (Institute of Electronics, Technical University of Łódź, Wólczańska, Poland). Within b11, subsets of the top 30 parameters were further evaluated by sequentially eliminating features of lower significance based on the MaZda-assigned rank (e.g. top 29, top 28, top 27, etc. down to the top 2 parameters). The misclassification rate for distinguishing benign and malignant nodules using linear discriminant analysis (LDA) for each of these subsets was then observed. The final subset achieving the lowest misclassification rate was selected for the LDA model. The resultant most discriminant factor 1 (MDF1) values in the LDA model of the training set were exported into GraphPad Prism to determine the sensitivity and specificity of the selected cutoff MDF1 value and to generate an ROC curve.
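The rank-based elimination described above amounts to scoring each top-k prefix of the ranked feature list and keeping the prefix with the lowest misclassification rate. The sketch below illustrates that loop only; `best_ranked_subset`, the caller-supplied `misclassification_rate` callable, and the toy error function are hypothetical stand-ins for the LDA evaluation performed in b11.

```python
def best_ranked_subset(ranked_features, misclassification_rate, min_size=2):
    """Evaluate the top-k prefixes of a ranked feature list, k = n .. min_size.

    `misclassification_rate` is a caller-supplied callable (hypothetical
    here) that fits a classifier such as LDA on the given features and
    returns the fraction of misclassified samples. On ties the larger
    subset is kept because it is evaluated first (an assumption; the
    source does not state a tie rule).
    """
    best_subset, best_rate = None, float("inf")
    for k in range(len(ranked_features), min_size - 1, -1):
        subset = ranked_features[:k]
        rate = misclassification_rate(subset)
        if rate < best_rate:
            best_subset, best_rate = subset, rate
    return best_subset, best_rate

# Toy stand-in for the LDA evaluation: error is lowest with three features.
features = ["f1", "f2", "f3", "f4", "f5"]

def toy_error(subset):
    return abs(len(subset) - 3) * 0.1 + 0.05

subset, rate = best_ranked_subset(features, toy_error)
print(subset, rate)
```

Because only prefixes of the ranking are tried, the search costs one model fit per subset size rather than one per feature combination, at the price of never testing a lower-ranked feature without all higher-ranked ones.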
Additional analysis included comparing the number of central slices and end slices that were misclassified in nodules containing at least 3 slices, and classifying thyroid nodules on the basis of the slice containing the lowest MDF1 value (lowest scoring slice). The lowest scoring slice was considered rather than the highest scoring slice in order to minimize false positive results.
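The lowest scoring slice rule can be sketched as follows. The cutoff is the training-set MDF1 value reported in the Results (slices with MDF1 > 0.03265 are called malignant); the function name and the example MDF1 values are hypothetical.

```python
CUTOFF = 0.03265  # training-set MDF1 cutoff reported in the Results

def classify_nodule(slice_mdf1, cutoff=CUTOFF):
    """Label a whole nodule from its lowest-scoring slice.

    A slice is called malignant when MDF1 > cutoff, so taking the minimum
    means every slice must exceed the cutoff for a malignant call.
    """
    if not slice_mdf1:
        raise ValueError("nodule has no slices")
    return "malignant" if min(slice_mdf1) > cutoff else "benign"

# Hypothetical MDF1 values for two nodules.
print(classify_nodule([0.9, 1.4, 0.2]))
print(classify_nodule([0.9, -0.5, 1.1]))
```

Using the minimum rather than the maximum trades sensitivity for specificity: a single low-scoring slice is enough to call the nodule benign.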

Test Dataset Analysis

The DW-EPI images and ROIs of the test dataset were imported into MaZda and processed in the same way as the training set to generate >300 texture features per ROI using the same seven texture classes as were considered for the training set. Next, the MDF1 was calculated using the same LDA model equation and final subset of parameters used for the training set. The resultant MDF1 values were used to classify the test set samples into either malignant or benign categories, based on the pre-defined training set MDF1 cutoff value. The additional comparisons of central versus end slice misclassification rates and lowest scoring slice analysis as described in the prior section were also performed.
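Applying a frozen model to the test set means reusing the training-set coefficients and cutoff without refitting. A minimal sketch, assuming MDF1 is a linear combination of the selected features: the weights, offset, and feature values below are hypothetical, and any feature standardization applied by b11 is not reproduced.

```python
def mdf1_score(features, weights, offset=0.0):
    """Linear discriminant score: offset + sum of weight * feature."""
    if len(features) != len(weights):
        raise ValueError("one weight is required per feature")
    return offset + sum(w * x for w, x in zip(weights, features))

def classify_slice(features, weights, cutoff, offset=0.0):
    """Apply a fixed, previously trained linear model and cutoff to one slice."""
    score = mdf1_score(features, weights, offset)
    return "malignant" if score > cutoff else "benign"

# Hypothetical two-feature model; the study's model used 21 features.
weights = [0.5, -0.25]
cutoff = 0.03265  # training-set cutoff reported in the Results
print(classify_slice([1.0, 2.0], weights, cutoff))
print(classify_slice([2.0, 1.0], weights, cutoff))
```

Keeping the weights and cutoff fixed is what makes the test-set result an independent check rather than a second fit.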

Results

Training Dataset

The T2-weighted and DW-EPI images were collected in 40 patients and achieved sufficient quality for reliable ROI definition in 26 patients with a variety of benign and malignant tumor subtypes. Figure 1 depicts an example of one patient’s ADC maps with ROIs drawn avoiding a cystic area. Each ROI was originally delineated by an experienced neuroradiologist using the FuncTool software (GE Healthcare) and subsequently carefully traced using ImageJ software onto the original resolution ADC maps so that binary masks of these ROIs could be imported into MaZda to preserve the original ROI locations. For each patient, the entire nodule was classified as benign or malignant, based on histological analysis. The maximum nodule diameter was determined, with a mean and standard deviation of 29.3 ± 8.0 mm for the benign nodules and 33.3 ± 10.4 mm for the malignant nodules.

The performance of ADC alone in distinguishing malignant and benign nodules was determined. Figure 2 displays the overall weighted-mean ADC values for benign and malignant tumors, with means for each patient and corresponding subtype of thyroid nodule. The overall weighted-mean ADC for benign tumors was 2.24 × 10−3 mm2/s (95% CI 2.09 – 2.39) and for malignant tumors was 1.92 × 10−3 mm2/s (95% CI 1.65 – 2.19). The difference between the means of the benign and malignant nodules was significant (P=0.02); however, there was overlap between the confidence intervals resulting in an AUC of 0.73 (95% CI 0.51 to 0.95), sensitivity of 70%, and specificity of 63% on ROC analysis using a cutoff ADC value of 2.16 × 10−3 mm2/s (Figure 2).

Figure 2.


(a) Overall weighted mean and 95% confidence interval of the ADC values of benign and malignant thyroid tumors for DW-EPI: p=0.02 for the difference between means. The overall weighted mean ADC for benign tumors was 2.24 × 10−3 mm2/s (95% CI 2.09 – 2.39) and for papillary carcinoma malignant tumors was 1.92 × 10−3 mm2/s (95% CI 1.65 – 2.19). The follicular carcinoma (n=1) and neuroendocrine (n=1) tumors shown in this graph were not included in the final analysis. (b) ROC curve for performance of ADC using a cutoff value of 2.16 × 10−3 mm2/s to distinguish benign and malignant nodules demonstrates an area-under-the-curve (AUC) of 0.73 (95% CI 0.51 to 0.95), sensitivity of 70%, and specificity of 63%.

For the training set malignant category, only nodules containing papillary carcinoma were included. Texture analysis on the DW-EPI images yielded higher sensitivity and specificity values (Figure 3) than the ADC analysis. Table 3 lists the original top 30 MaZda texture analysis parameters obtained by using the three feature-reduction algorithms (Fisher, POE+ACC, and MI), the final subset of the top 21 parameters used for the LDA model, and the corresponding texture classes for each parameter. This texture analysis LDA model used a cutoff MDF1 value of >0.03265 as the basis for classification as malignant. It correctly classified 89/94 thyroid nodule slices in the training set, resulting in a misclassification rate of 5.3%, an area-under-the-curve (AUC) of 0.97 (95% CI 0.92 – 1.0), a sensitivity of 92%, and a specificity of 96%. Of the five misclassified slices, one was a central slice (slice 2 of 7) and four were end slices (either the first or last slice). Distinguishing whole thyroid nodules on the basis of the slice per nodule with the lowest MDF1 value (lowest scoring slice) resulted in correct classification of 22/24 nodules in the training set based on the pre-defined cutoff value (Figure 3).

Figure 3.


Texture-based classification of individual images (a–c) or the nodule as a whole (d).

(a) Output from b11 for the linear discriminant analysis (LDA) classification most discriminant factor 1 (MDF1) values for all 94 slices of the training set. MDF1 values are shown for benign and malignant slices, with red 1=benign and green 2=malignant. 89/94 correctly classified, using a cutoff value of 0.03265. (b) Mean and standard deviation of the benign and malignant MDF1 values. (c) ROC curve for using this MDF1 cutoff as a diagnostic tool. P<0.0001 and AUC of 0.97 (95% CI 0.92 to 1.0). (d) LDA classification results for the slice with the lowest MDF1 value per patient (lowest scoring slice analysis). 22/24 nodules were correctly classified, using the same training set cutoff value of 0.03265. Mean and standard deviation values are shown along with separate points for each nodule. The two misclassified nodules were both malignant and are shown in red.

Table 3.

Top 30 texture parameters and top 21 feature subset for thyroid stratification model

| MaZda rank | Texture class | Top 30 texture parameters | Top 21 feature subset |
|---|---|---|---|
| 1 | Geometric | GeoY | GeoY |
| 2 | Geometric | GeoX | GeoX |
| 3 | Co-occurrence matrix | S(0,3)SumAverg | S(0,3)SumAverg |
| 4 | Co-occurrence matrix | S(0,4)SumAverg | S(0,4)SumAverg |
| 5 | Co-occurrence matrix | S(0,1)SumAverg | S(0,1)SumAverg |
| 6 | Co-occurrence matrix | S(0,2)SumAverg | S(0,2)SumAverg |
| 7 | Co-occurrence matrix | S(0,5)SumAverg | S(0,5)SumAverg |
| 8 | Co-occurrence matrix | S(2,0)SumOfSqs | S(2,0)SumOfSqs |
| 9 | Co-occurrence matrix | S(1,0)SumOfSqs | S(1,0)SumOfSqs |
| 10 | Co-occurrence matrix | S(2,2)Correlat | S(2,2)Correlat |
| 11 | Geometric | GeoM2xy | GeoM2xy |
| 12 | Co-occurrence matrix | S(1,0)SumVarnc | S(1,0)SumVarnc |
| 13 | Co-occurrence matrix | S(3,−3)DifVarnc | S(3,−3)DifVarnc |
| 14 | Geometric | GeoS2 | GeoS2 |
| 15 | Geometric | GeoXYo | GeoXYo |
| 16 | Autoregressive model | Teta1 | Teta1 |
| 17 | Co-occurrence matrix | S(2,0)SumAverg | S(2,0)SumAverg |
| 18 | Geometric | GeoYo | GeoYo |
| 19 | Wavelet transform | WavEnHH_s-3 | WavEnHH_s-3 |
| 20 | Co-occurrence matrix | S(5,5)DifEntrp | S(5,5)DifEntrp |
| 21 | Co-occurrence matrix | S(1,0)SumAverg | S(1,0)SumAverg |
| 22 | Co-occurrence matrix | S(1,1)SumAverg | |
| 23 | Wavelet transform | WavEnLL_s-3 | |
| 24 | Co-occurrence matrix | S(2,2)SumAverg | |
| 25 | Co-occurrence matrix | S(1,−1)SumAverg | |
| 26 | Co-occurrence matrix | S(2,−2)SumAverg | |
| 27 | Co-occurrence matrix | S(3,0)SumAverg | |
| 28 | Co-occurrence matrix | S(3,3)SumAverg | |
| 29 | Co-occurrence matrix | S(4,0)SumAverg | |
| 30 | Co-occurrence matrix | S(3,−3)SumAverg | |
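Most entries in Table 3 are co-occurrence matrix features. As an illustration of what such a feature measures, the sketch below builds a symmetric, normalized grey-level co-occurrence matrix for one pixel offset and computes Haralick's sum average from it. Reading a name such as S(1,0)SumAverg as "sum average at offset (dx, dy) = (1, 0)" is an assumption about MaZda's naming, and the ROI masking and grey-level quantization that MaZda applies are omitted.

```python
def cooccurrence_matrix(image, dx, dy, levels):
    """Symmetric, normalized grey-level co-occurrence matrix for offset (dx, dy).

    `image` is a list of rows of integer grey levels in 0 .. levels - 1.
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < cols and 0 <= y2 < rows:
                a, b = image[y][x], image[y2][x2]
                counts[a][b] += 1.0
                counts[b][a] += 1.0  # count each pair in both directions
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[c / total for c in row] for row in counts]

def sum_average(p):
    """Haralick sum average, with grey levels indexed from 1."""
    n = len(p)
    return sum((i + 1 + j + 1) * p[i][j] for i in range(n) for j in range(n))

# Hypothetical 2 x 2 image with two grey levels and a horizontal offset.
image = [[0, 1],
         [0, 1]]
p = cooccurrence_matrix(image, dx=1, dy=0, levels=2)
print(p, sum_average(p))
```

Sum average grows with the mean grey level of the paired pixels, which may help explain why it ranks highly on ADC maps where benign and malignant nodules differ in mean ADC.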

Test Dataset

Our LDA model was tested on an independent dataset from MSKCC containing papillary carcinoma thyroid nodules. The mean ADC value for this cohort was 1.80 × 10−3 mm2/s (95% CI 1.52 – 2.08). Using the same 21 texture parameters from the training set LDA model, 32/34 slices were classified correctly, resulting in an overall misclassification rate of 5.9% (Figure 4). Using the same cutoff MDF1 value of the training set (>0.03265), this resulted in a sensitivity of 89% [95% CI 65 to 99] and specificity of 97% [95% CI 74 to 100]. Of the two misclassified slices, one was a central slice (2 of 3) and one was an end slice. The lowest scoring slice analysis correctly classified 16/18 nodules in the test set (Figure 4).
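The misclassification rates quoted for both datasets follow directly from the reported counts, as the short check below shows. The counts are taken from the text; the function name is illustrative.

```python
def misclassification_rate(correct, total):
    """Fraction of samples that were classified incorrectly."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    return (total - correct) / total

# Counts reported in the Results.
reported = {
    "training slices": (89, 94),
    "training nodules (lowest scoring slice)": (22, 24),
    "test slices": (32, 34),
    "test nodules (lowest scoring slice)": (16, 18),
}
for label, (correct, total) in reported.items():
    rate = misclassification_rate(correct, total)
    print(f"{label}: {correct}/{total} correct, {100 * rate:.1f}% misclassified")
```

The slice-level rates of 5.3% and 5.9% match the text, and 16/18 corresponds to the 89% patient-level accuracy given in the Abstract.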

Figure 4.


(a) LDA classification most discriminant factor 1 (MDF1) results for all 34 slices of the test set, with median and interquartile ranges displayed alongside training set results. 32/34 slices classified correctly, using the training set cutoff value of 0.03265. (b) LDA classification results for the slice with the lowest MDF1 value per patient (lowest scoring slice analysis) for the test set. 16/18 nodules correctly classified, using the same training set cutoff value of 0.03265 for MDF1 values. Mean and standard deviation values are shown along with separate points for each nodule. One of the two misclassified test set nodules is shown in red, and the other was an outlier (not shown, MDF1 value −13.5).

Discussion and Conclusions

Comparisons with other studies

Our results comparing benign and malignant ADCs are consistent with recent reports, as shown in Table 2 (6–12). All except one (8) found lower ADCs in malignant thyroid nodules compared to benign nodules, supporting the hypothesis that increased cellularity and reduced extracellular extravascular space restrict water diffusion in malignant nodules (24). However, our results indicated poor sensitivity and specificity for using ADC alone to discriminate benign and malignant pathology. This could be due to cytological similarities, since both malignant and benign follicular thyroid tumors may be well-differentiated and exhibit significant cytological overlap (25). Additionally, in our study some small cystic and necrotic areas may have been included in the ROIs despite efforts to avoid them, which would have artifactually increased the mean ADC value of the nodule.

Strengths of the study

To our knowledge, this is the first attempt to use texture analysis (TA) for diffusion-weighted imaging of suspected thyroid tumor nodules. Validating this model on an independent dataset from another institution provides additional evidence that this tool can be implemented in a clinical setting and is robust against institutional differences in imaging equipment and technique. Our study demonstrates very high performance for both the training and test datasets as evidence of this robustness.

Limitations of DW-MRI Results

The DW-MRI images showed distortion at 3 Tesla, and, based on neuroradiologist exclusion criteria, only 26/40 patients (University of Cambridge) and 18/25 patients (MSKCC) had images that could be interpreted. MSKCC excluded 7 patients due to either distorted image quality (n=5) or small tumor size resulting in poor visualization on DW-MRI images (n=2). Of note, seven cystic nodules (University of Cambridge) were excluded. However, other common thyroid imaging techniques such as ultrasound elastography are also unable to image cystic nodules (26). Better pulse sequences are necessary to reduce image distortion and improve interpretability such that radiologists are able to draw reliable ROIs around small nodules. One potential method that merits further investigation is reduced field-of-view DW-EPI (27), which has previously shown less distortion in diffusion imaging of the kidneys (28).

Possible Methodological Improvements

The small sample size of this study (24 patients in the training set (Cambridge), 18 in the test set (MSKCC)) results in underrepresentation of several tumor subtypes. Moreover, our decision to limit the malignant pathology in our training set to only papillary carcinomas reduces its universal applicability to distinguish benign nodules from other types of malignant pathology. A larger study is required, including all common tumor pathologies.

Another concern is that the large number of texture parameters used for the LDA model may overfit the training set, as the 21 parameters were combined into a linear discriminant analysis model to represent the 94 slices in the training set. To reduce the risk of overfitting, the top 30 parameters from the three feature reduction tools of the original MaZda output were further examined in subsets in an attempt to reduce the dimensionality of the texture parameters while still achieving the lowest misclassification rate; this reduced the number of texture features from 30 to 21. It is encouraging that 32/34 slices in the independently obtained test set from another institution were classified correctly; however, the number of parameters is still quite large relative to the size of the dataset, so overfitting remains a risk. Testing this tool on larger datasets will better characterize its robustness.

One potential technical concern is that the image resolution of 256×256, obtained after scanner software zero-fill interpolation, may alter the image’s textural properties when compared to the original images in which the resolution was determined by pulse sequence parameters (128×128). Previous studies have shown that zero-fill interpolated images enhance physically distinct structures’ textural differences (29,30). However, it is routine clinical practice to use MR scanner software to interpolate images by automatic zero-filling of k-space to achieve a resolution of 256×256. Thus, our results are reflective of results obtained using routine clinical images.

An additional technical consideration is the difference in TR between the diffusion MRI sequences used at the two institutions. Variations in TE, TR, and other pulse sequence variables have been shown to affect texture features in phantom studies (31). Thus, ostensibly the variation in TR values used at the two institutions may impact the quality of the textural calculations. However, this concern may be partially alleviated by the dominance of co-occurrence matrix-derived texture parameters in our LDA model. Co-occurrence-based features were found to be the most robust of the texture categories examined by Mayerhoefer et al (31). This finding has been corroborated by recent studies that identified co-occurrence matrix features as superior to all other texture classes in distinguishing benign and malignant breast lesions (32) and certain co-occurrence matrix features as helpful in differentiating brain malignancies (33). Therefore, while MRI acquisition parameters certainly need to be taken into account in further clinical applications of this technique, it is encouraging that our model is primarily comprised of the co-occurrence features previously deemed robust in multiple clinical studies.

Furthermore, the low success rate of 26/40 reliable ROIs (Cambridge dataset) will not apply in future studies, since small and cystic lesions (n=10 in the training dataset) could be identified by standard ultrasound, and patients with inappropriate lesion characteristics would not be offered the DW-MRI test. Patients excluded from this study on the grounds of poor or distorted image quality (n=4 training set, n=5 test set) present another challenge. However, we anticipate that this problem will also be greatly reduced in the future due to the development of DW-MRI techniques with reduced distortion (27). In principle, if thinner slices were analyzed, the three-dimensional TA capability of MaZda could also improve the classification, as patterns in the z-axis direction could be detected.

In conclusion, our pilot study indicates that textural analysis of DW-MRI images has potential for non-invasively categorizing the malignancy of thyroid nodules in a single, definitive procedure, thus sparing patients the unnecessary operations and waiting times associated with a diagnostic lobectomy. The current multi-center study shows promise for the limited patient population represented by our investigations; the ability of the LDA method to classify images obtained at another institution using different imaging parameters suggests that it will be robust. A larger, prospective study is now needed to validate this model.

Acknowledgments

We thank Cancer Research UK (core funding award C14303/A17197), Addenbrooke’s Charitable Trust, University of Cambridge, Cambridge Experimental Cancer Medicine Centre, and NIHR Cambridge Biomedical Research Centre for funding, and Mr. Brian Fish (ENT Consultant), Addenbrooke’s Hospital, for help with patient recruitment. The MSKCC work was supported by the National Cancer Institute/National Institutes of Health (grant numbers 1R21CA176660-01A1 and P50 CA172012-01A1).

Footnotes

Prior abstract presentations:

American Medical Association Interim Meeting, November 8th – 10th, 2012

International Society of Magnetic Resonance in Medicine (ISMRM) Annual Symposium, April 24th – 26th, 2013

ENT UK Annual Conference, London, September 13th, 2013

American Medical Women’s Association Annual Meeting, March 13th–16th, 2014

Author Disclosure Statement

No competing financial interests exist for any of the authors of this manuscript.

References

  • 1. U.S. Department of Health and Human Services NIH. SEER Stat Fact Sheets: Thyroid Cancer. Surveill. Res. Program, Natl. Cancer Inst. 2014:1.
  • 2. Yeh MW, Demircan O, Ituarte P, Clark OH. False-negative fine-needle aspiration cytology results delay treatment and adversely affect outcome in patients with thyroid carcinoma. Thyroid. 2004;14:207–215. doi: 10.1089/105072504773297885.
  • 3. Hegedus L. Clinical practice. The thyroid nodule. N Engl J Med. 2004;351:1764–1771. doi: 10.1056/NEJMcp031436.
  • 4. Baloch Z, LiVolsi V, Asa S, Rosai J, Merino M, Randolph G, Vielh P, DeMay R, Sidawy M, Frable W. Diagnostic terminology and morphologic criteria for cytologic diagnosis of thyroid lesions: a synopsis of the National Cancer Institute Thyroid Fine-Needle Aspiration State of the Science Conference. Diagn Cytopathol. 2008;36:425–437. doi: 10.1002/dc.20830.
  • 5. Thoeny HC, De Keyzer F. Extracranial applications of diffusion-weighted magnetic resonance imaging. Eur Radiol. 2007;17:1385–1393. doi: 10.1007/s00330-006-0547-0.
  • 6. Razek AA, Sadek AG, Kombar OR, Elmahdy TE, Nada N. Role of apparent diffusion coefficient values in differentiation between malignant and benign solitary thyroid nodules. AJNR Am J Neuroradiol. 2008;29:563–568. doi: 10.3174/ajnr.A0849.
  • 7. Bozgeyik Z, Coskun S, Dagli AF, Ozkan Y, Sahpaz F, Ogur E. Diffusion-weighted MR imaging of thyroid nodules. Neuroradiology. 2009:193–198. doi: 10.1007/s00234-008-0494-3.
  • 8. Schueller-Weidekamm C, Kaserer K, Schueller G, Scheuba C, Ringl H, Weber M, Czerny C, Herneth A. Can quantitative diffusion-weighted MR imaging differentiate benign and malignant cold thyroid nodules? Initial results in 25 patients. AJNR Am J Neuroradiol. 2009;30:417–422. doi: 10.3174/ajnr.A1338.
  • 9. Erdem G, Erdem T, Karakas HM, Mutlu DY, Fırat AK, Sahin I, Alkan A. Diffusion-weighted images differentiate benign from malignant thyroid nodules. J. Magn Reson Imaging. 2010;31:94–100. doi: 10.1002/jmri.22000.
  • 10. Nakahira M, Saito N, Murata S, Sugasawa M, Shimamura Y, Morita K, Takajyo F, Omura G, Matsumura S. Quantitative diffusion-weighted magnetic resonance imaging as a powerful adjunct to fine needle aspiration cytology for assessment of thyroid nodules. Am J Otolaryngol. 2012;33:408–416. doi: 10.1016/j.amjoto.2011.10.013.
  • 11. Mutlu H, Sivrioglu A, Sonmez G, Velioglu M, Sildiroglu H, Basekim C, Kizilkaya E. Role of apparent diffusion coefficient values and diffusion-weighted magnetic resonance imaging in differentiation between benign and malignant thyroid nodules. Clin Imaging. 2012;36:1–7. doi: 10.1016/j.clinimag.2011.04.001.
  • 12. Dilli A, Ayaz UY, Cakir E, Cakal E, Gultekin SS, Hekimoglu B. The efficacy of apparent diffusion coefficient value calculation in differentiation between malignant and benign thyroid nodules. Clin Imaging. 2012;36:316–322. doi: 10.1016/j.clinimag.2011.10.006.
  • 13. Bibicu D, Moraru L, Biswas A. Thyroid Nodule Recognition Based on Feature Selection and Pixel Classification Methods. J. Digit. Imaging. 2012 doi: 10.1007/s10278-012-9475-5.
  • 14. Chen S-J, Chang C-Y, Chang K-Y, Tzeng J-E, Chen Y-T, Lin C-W, Hsu W-C, Wei C-K. Classification of the thyroid nodules based on characteristic sonographic textural feature and correlated histopathology using hierarchical support vector machines. Ultrasound Med. Biol. 2010;36:2018–2026. doi: 10.1016/j.ultrasmedbio.2010.08.019.
  • 15. Hirning T, Zuna I, Schlaps D, Lorenz D, Meybier H, Tschahargane C, van Kaick G. Quantification and classification of echographic findings in the thyroid gland by computerized B-mode texture analysis. Eur. J. Radiol. 1989;9:244–247.
  • 16. Ferreira RC, Ward LS, Adam RL, Leite NJ, Metze K, Matos PSde. Contribution of nuclear texture analysis for the differential diagnosis of follicular lesions of the thyroid: comparison to immunohistochemical markers. Arq. Bras. Endocrinol. Metabol. 2009;53:804–810. doi: 10.1590/s0004-27302009000700003.
  • 17. Prah DE, Paulson ES, Nencka AS, Schmainda KM. A simple method for rectified noise floor suppression: Phase-corrected real data reconstruction with application to diffusion-weighted imaging. Magn. Reson. Med. 2010;64:418–429. doi: 10.1002/mrm.22407.
  • 18. Strzelecki M, Szczypinski P, Materka A, Klepaczko A. A software tool for automatic classification and segmentation of 2D/3D medical images. Nucl. Instruments Methods Phys. Res. A. 2013;702:137–140.
  • 19. Szczypiński PM, Strzelecki M, Materka A, Klepaczko A. MaZda--a software package for image texture analysis. Comput. Methods Programs Biomed. 2009;94:66–76. doi: 10.1016/j.cmpb.2008.08.005.
  • 20. Szczypinski P, Strzelecki M, Materka A. MaZda - a Software for Texture Analysis. Proc. of ISITC. 2007:245–249. doi: 10.1016/j.cmpb.2008.08.005.
  • 21. Holli KK, Harrison L, Dastidar P, Wäljas M, Liimatainen S, Luukkaala T, Ohman J, Soimakallio S, Eskola H. Texture analysis of MR images of patients with mild traumatic brain injury. BMC Med. Imaging. 2010;10:8. doi: 10.1186/1471-2342-10-8.
  • 22. Zacharaki EI, Wang S, Chawla S, Soo Yoo D, Wolf R, Melhem ER, Davatzikos C. Classification of brain tumor type and grade using MRI texture and shape in a machine learning scheme. Magn. Reson. Med. 2009;62:1609–1618. doi: 10.1002/mrm.22147.
  • 23. Chen G, Jespersen S, Pedersen M, Pang Q, Horsman MR, Stødkilde Jørgensen H. Evaluation of anti-vascular therapy with texture analysis. Anticancer Res. 2005;25:3399–3405.
  • 24. Colagrande S, Carbone SF, Carusi LM, Cova M, Villari N. Magnetic resonance diffusion-weighted imaging: extraneurological applications. Radiol Med. 2006;111:392–419. doi: 10.1007/s11547-006-0037-0.
  • 25. Kelman AS, Rathan A, Leibowitz J, Burstein DE, Haber RS. Thyroid cytology and the risk of malignancy in thyroid nodules: importance of nuclear atypia in indeterminate specimens. Thyroid. 2001;11:271–277. doi: 10.1089/105072501750159714.
  • 26. Bhatia KSS, Rasalkar DP, Lee YP, Wong KT, King AD, Yuen HY, Ahuja AT. Cystic change in thyroid nodules: a confounding factor for real-time qualitative thyroid ultrasound elastography. Clin. Radiol. 2011;66:799–807. doi: 10.1016/j.crad.2011.03.011.
  • 27. Taviani V, Nagala S, Priest AN, McLean MA, Jani P, Graves MJ. 3T diffusion-weighted MRI of the thyroid gland with reduced distortion: preliminary results. Br. J. Radiol. 2013;86:20130022. doi: 10.1259/bjr.20130022.
  • 28. Jin N, Deng J, Zhang L, Zhang Z, Lu G, Omary RA, Larson AC. Targeted single-shot methods for diffusion-weighted imaging in the kidneys. J. Magn. Reson. Imaging. 2011;33:1517–1525. doi: 10.1002/jmri.22556.
  • 29. Mayerhoefer ME, Szomolanyi P, Jirak D, Berg A, Materka A, Dirisamer A, Trattnig S. Effects of magnetic resonance image interpolation on the results of texture-based pattern classification: a phantom study. Invest. Radiol. 2009;44:405–411. doi: 10.1097/RLI.0b013e3181a50a66.
  • 30. Mayerhoefer M, Schima W. Texture-based classification of focal liver lesions on MRI at 3.0 Tesla: A feasibility study in cysts and hemangiomas. J. Magn. Reson. Imaging. 2010;32:352–359. doi: 10.1002/jmri.22268.
  • 31. Mayerhoefer ME, Szomolanyi P, Jirak D, Materka A, Trattnig S. Effects of MRI acquisition parameter variations and protocol heterogeneity on the results of texture analysis and pattern discrimination: An application-oriented study. Med. Phys. 2009;36:1236. doi: 10.1118/1.3081408.
  • 32. Holli K, Laaperi A-L, Harrison L, Luukkaala T, Toivonen T, Rymin P, Dastidar P, Soimakallio S, Eskola H. Characterization of Breast Cancer Types by Texture Analysis of Magnetic Resonance Images. Acad Radiol. 2010;17:135–141. doi: 10.1016/j.acra.2009.08.012.
  • 33. Eliat P-A, Olivié D, Saïkali S, Carsin B, Saint-Jalmes H, de Certaines JD. Can dynamic contrast-enhanced magnetic resonance imaging combined with texture analysis differentiate malignant glioneuronal tumors from other glioblastoma? Neurol. Res. Int. 2012;2012:195176. doi: 10.1155/2012/195176.
