Data on MRI brain lesion segmentation using K-means and Gaussian Mixture Model-Expectation Maximization

Ju Qiao; Xuezhu Cai; Qian Xiao; Zhengxi Chen; Praveen Kulkarni; Craig Ferris; Sagar Kamarthi; Srinivas Sridhar

doi:10.1016/j.dib.2019.104628

. 2019 Oct 10;27:104628. doi: 10.1016/j.dib.2019.104628

Data on MRI brain lesion segmentation using K-means and Gaussian Mixture Model-Expectation Maximization

Ju Qiao ^a, Xuezhu Cai ^b, Qian Xiao ^c, Zhengxi Chen ^d, Praveen Kulkarni ^e, Craig Ferris ^e, Sagar Kamarthi ^a, Srinivas Sridhar ^f,^∗

PMCID: PMC6820303 PMID: 31687441

Abstract

The data in this article provide details about MRI lesion segmentation using K-means and Gaussian Mixture Model-Expectation Maximization (GMM-EM) algorithms. Both K-means and GMM-EM algorithms can segment lesion area from the rest of brain MRI automatically. The performance metrics (accuracy, sensitivity, specificity, false positive rate, misclassification rate) were estimated for the algorithms and there was no significant difference between K-means and GMM-EM. In addition, lesion size does not affect the accuracy and sensitivity for either method.

Keywords: Ischemic stroke, Lesion, Magnetic resonance image (MRI), Segmentation

Specifications Table

Subject area	Biology
More specific subject area	Magnetic Resonance Imaging Segmentation
Type of data	image, graph, figure
How data was acquired	Raw data were from ischemic stroke lesion segmentation online database. Segmentation data were acquired using K-means and Gaussian Mixture Model-Expectation Maximization algorithms.
Data format	analyzed data
Experimental factors	All images were normalized and co-registered for all subjects
Experimental features	The segmentation labels were determined using K-means and GMM-EM
Data source location	Raw data at: http://www.isles-challenge.org/ISLES2015/; owned by ISLES. Lubeck, Germany. Segmentation data: Northeastern University, Boston, MA, US; segmentation data is included in this article and can be downloaded from this article
Data accessibility	Segmentation data is included with this article

Open in a new tab

Value of the Data

•
These data provide automatic segmentation of lesion in MRI using K-means and GMM-EM.
•
These data evaluate the performance of K-means and GMM-EM algorithms regarding MRI segmentation.
•
The data show that lesion size does not affect the performance of K-means and GMM-EM in lesion segmentation

Open in a new tab

1. Data

Magnetic Resonance Imaging (MRI) data were pre-processed. Instead of using the conventional method to manually segment lesion [1] which is time-consuming, inaccurate, and subjective, K-means and Gaussian Mixture Model-Expectation Maximization (GMM-EM) algorithms were applied to automatically segment lesion regions from the rest of brain tissue in MRI. The data included here provides the lesion segmentation results using K-means (dataset as K-means estimated labels.mat) and GMM-EM (GMM-EM estimated labels.mat) as well as the ground truth mask (ground truth mask data.mat). These three datasets are the estimated labels and ground truth mask of brain regions for all 28 subjects.

Fig. 1 shows the brain lesion segmentation using K-means. The best performance (Fig. 1 top row) shows that the estimated lesion regions (light blue) and the ground truth (yellow) match very well with the accuracy of 99.27%. The accuracy of K-means varies from subject to subject. And for some subject, the accuracy is only 56.96% (Fig. 1 bottom row).

GMM-EM is applied to segment brain lesion, since in each MRI image modality, the intensity of four different brain tissues follows Gaussian distribution approximately as shown in Fig. 2. The segmentation shows GMM-EM works well with the average accuracy of 85%. The estimated lesion regions (light blue) matches the ground truth lesion regions (yellow) well for the best performance subject (Fig. 3 top row) with the accuracy of 95%. While for some subjects, GMM-EM does not segment lesion correctly with healthy regions misclassified as lesion regions. Fig. 3 bottom row shows representative subject with accuracy of 89.02% and the edge of the brain is misclassified as lesion.

Fig. 2 — Signal intensity histogram of GM, WM, CSF, and lesion (if present) in T1-weighted, T2-weighted, FLAIR, and DWI MRI of a representative subject.

Fig. 3 — GMM-EM brain segmentation visualization. The best performance (top row) of GMM-EM have the accuracy of 95% and representative subject with accuracy of 89.02% (bottom row) shows GMM-EM misclassifies edge of the brain as lesion.

The performance metrics (accuracy, misclassification rate, sensitivity, specificity, and false positive rate) were calculated for both K-means and GMM as shown in Fig. 4. The accuracy, sensitivity, and specificity of K-means are 85 ± 11%, 67 ± 24%, and 86 ± 11% specifically. The accuracy, sensitivity, and specificity of GMM-EM are 84 ± 9%, 64 ± 25%, and 84 ± 10% specifically. There is no significant difference between K-means performance and GMM-EM performance (p-values of accuracy, sensitivity and specificity are: 0.6645, 0.7647, 0.5479). In addition, both K-means and GMM-EM performance varies from subject to subject.

Fig. 4 — Algorithm performance evaluation and comparison. There is no significant difference between K-means and GMM-EM in accuracy, sensitivity, and specificity.

When the algorithms were first applied to perform lesion segmentation, the intuition might suggest that the bigger the lesion size, the better the algorithms performance. However, Fig. 5 shows that there is little correlation between algorithms performance accuracy (sensitivity, and specificity) and the lesion volume. In Fig. 5, the lesion volumes were calculated by counting the number voxels labeled as lesion in mask imaging.

Fig. 5 — Lesion size does not affect the performance of K-means and GMM-EM algorithms.

2. Experimental design, materials, and methods

2.1. Data and feature extraction

Raw data were acquired from ischemic stroke lesion segmentation 2015 online database [2] (http://www.isles-challenge.org/ISLES2015/), and data is one of the two sub-taskes: sub-acute ischemic stroke lesion segmentation (SISS) training data with 28 subjects. Each of the 28 subjects contains T1-weighted, T2-weighted, FLAIR, DWI images and a lesion mask labeled by experts as ground truth as shown in Fig. 6.

Fig. 6 — MRI data acquired from ischemic stroke lesion segmentation 2015 online database. Images from 4 MRI modalities, T1-weighted, T2-weighted, FLAIR, and DWI, are available (first row), along with mask (ground truth in yellow) overlaid on T2-weighted image in different views (second row).

The flowchart of the work is shown in Fig. 7. After data were acquired, pre-processing was performed to make sure different images are in the same space. Then, features were extracted and normalized. K-means and GMM-EM were used to segment lesion from the rest of the brain tissue. Algorithms performance were evaluated by comparing the estimated lesion region with mask (ground truth).

In the pre-processing step, all images were co-registered to the standard space using MNI152 1 mm symmetric human brain atlas. In addition, for each MRI modality, images were intensity-normalized based on the average across all subjects so that features were consistent.

For each voxel, 25 features are extracted to feed into K-means and GMM-EM algorithms. The first four features are the signal intensity from T1-weighted, T2-weighted, FLAIR, DWI images. The next four are the intensities from the smoothed T1-weighted, T2-weighted, FLAIR, DWI images using a Gaussian kernel with sigma of 3 mm. Then the local information of each voxel within the brain mask is obtained using an 11 mm × 11 mm x 11mm cubic window of neighboring voxels centered at this voxel. More specifically, among more than 1 million voxels per subject, the mean, median, variance, 10th percentile and 90th percentile are calculated as four individual features for each voxel from its ±5 mm neighbors of 1330 voxels. These parameters contribute to features 9th through 24th features. The last feature was the distance of each voxel to the image center.

2.2. K-means clustering

K-means classifies n observations $X (x_{1}, x_{2}, \dots, x_{n)}$ into $k$ clusters with the aim at minimizing the distance function:

D i s t a n c e = \sum_{i = 1}^{k} \sum_{j = 1}^{n} x_{i j} - {C_{i}}^{2}

Where $C_{i} = \frac{1}{N_{i}} \sum_{x \in x_{i}} x, i = 1,2, \dots, k$ represents the $i$ th.

Cluster center.

The K-means algorithm:

1.
Initialize cluster centroids $C_{i}$ with $k$ random samples;
2.
Assign each observation $x_{i}$ to the nearest cluster center;
3.
Recalculate and update each cluster center $C_{i} = \frac{1}{N_{i}} \sum_{x \in x_{i}} x, i = 1,2, \dots, k$ ; where $N_{i}$ is the number of elements in the $i$ th cluster;
4.
Repeat steps 2 and 3 until $C_{i}$ does not change.

Here, in this paper, we assign voxels into 4 groups: white matter (WM), gray matter (GM), cerebrospinal fluid (CSF), and lesion if present.

2.3. Gaussian mixture model-expectation maximization

In regions where MRI signal is present with signal-to-noise (SNR) ≥ 3, noise follows a Gaussian distribution approximately [[3], [4], [5]]. The histogram of brain MRI with noise in presence can be represented by a Gaussian Mixture Model in which each tissue type such as white matter, gray matter, cerebrospinal fluid, lesion if present follows a Gaussian distribution. In this model, each voxel is assigned to one of the classes.

Gaussian mixture model can be defined as:

p (x) = \sum_{k = 1}^{K} π_{k} N (x | μ_{k}, Σ_{k})

Where $x$ is a d-dimensional observation vector, $π_{k}, k = 1, \dots ., K$ are the mixture weights that satisfy $0 \leq π_{k} \leq 1$ and $\sum_{k}^{K} π_{k} = 1$ , and $N (x | μ_{k}, Σ_{k})$ is a d-variate Gaussian density for the $k$ th mixture component as given by the equation:

N (x | μ_{k}, Σ_{k}) = \frac{1}{\sqrt{{(2 π)}^{1 / d} | Σ_{k} |}} exp {- \frac{1}{2} {(x - μ_{k})}^{T} {Σ_{k}}^{- 1} (x - μ_{k})}

where $μ_{k}$ is the $k$ th mean vector and $Σ_{k}$ is the $k$ th covariance matrix.

The parameters (including means, covariances and weights of each component) can be determined by maximizing the likelihood function.

EM Algorithm.

1.
Initialize means, covariances and the mixing coefficients and evaluate the initial value of the log likelihood.
2.
E step. Evaluate the posterior probability using the current parameter

γ (z_{n k}) = \frac{π_{k} N (x_{n} | μ_{k}, Σ_{k})}{\sum_{j = 1}^{K} π_{j} N (x_{n} | μ_{k}, Σ_{k})}

3.
M step. Recalculate the parameters using the current posterior and update the parameters

μ_{k}^{n e w} = \frac{1}{N_{k}} \sum_{n = 1}^{N} γ (z_{n k}) x_{n}

{Σ_{k}^{n e w} = \frac{1}{N_{k}} \sum_{n = 1}^{N} γ (z_{n k}) (x_{n} - μ_{k}^{n e w}) (x_{n} - μ_{k}^{n e w})}^{T}

π_{k}^{n e w} = \frac{N_{k}}{N}

where

N_{k} = \sum_{n = 1}^{N} γ (z_{n k})

4.
Evaluate the log likelihood

ln p (X | μ, Σ, π) = \sum_{n = 1}^{N} ln {\sum_{k = 1}^{K} π_{k} N (x_{n} | μ_{k}, Σ_{k})}

5.
Repeat step 2, 3, and 4 until the convergence criterion is satisfied.

Acknowledgments

The authors acknowledge the invaluable support from Ischemic Stroke Lesion Segmentation.

Conflict of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1.van de Ven A.L. Nanoformulation of olaparib amplifies PARP inhibition and sensitizes PTEN/TP53-Deficient prostate cancer to radiation. Mol. Cancer Ther. 2017;16(7):1279–1289. doi: 10.1158/1535-7163.MCT-16-0740. [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Maier O. Isles 2015 – a public evaluation benchmark for ischemic stroke lesion segmentation from multispectral MRI. Med. Image Anal. 2017;35:250–269. doi: 10.1016/j.media.2016.07.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Despotovic I., Goossens B., Philips W. MRI segmentation of the human brain: challenges, methods, and applications. Comput. Math Methods Med. 2015;2015:450341. doi: 10.1155/2015/450341. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Gudbjartsson H., Patz S. The Rician distribution of noisy MRI data. Magn. Reson. Med. 1995;34(6):910–914. doi: 10.1002/mrm.1910340618. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Gharagouzloo C.A. Quantitative vascular neuroimaging of the rat brain using superparamagnetic nanoparticles: new insights on vascular organization and brain function. Neuroimage. 2017;163:24–33. doi: 10.1016/j.neuroimage.2017.09.003. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib1] 1.van de Ven A.L. Nanoformulation of olaparib amplifies PARP inhibition and sensitizes PTEN/TP53-Deficient prostate cancer to radiation. Mol. Cancer Ther. 2017;16(7):1279–1289. doi: 10.1158/1535-7163.MCT-16-0740. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib2] 2.Maier O. Isles 2015 – a public evaluation benchmark for ischemic stroke lesion segmentation from multispectral MRI. Med. Image Anal. 2017;35:250–269. doi: 10.1016/j.media.2016.07.009. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib3] 3.Despotovic I., Goossens B., Philips W. MRI segmentation of the human brain: challenges, methods, and applications. Comput. Math Methods Med. 2015;2015:450341. doi: 10.1155/2015/450341. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib4] 4.Gudbjartsson H., Patz S. The Rician distribution of noisy MRI data. Magn. Reson. Med. 1995;34(6):910–914. doi: 10.1002/mrm.1910340618. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib5] 5.Gharagouzloo C.A. Quantitative vascular neuroimaging of the rat brain using superparamagnetic nanoparticles: new insights on vascular organization and brain function. Neuroimage. 2017;163:24–33. doi: 10.1016/j.neuroimage.2017.09.003. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Data on MRI brain lesion segmentation using K-means and Gaussian Mixture Model-Expectation Maximization

Ju Qiao

Xuezhu Cai

Qian Xiao

Zhengxi Chen

Praveen Kulkarni

Craig Ferris

Sagar Kamarthi

Srinivas Sridhar

Abstract

1. Data

Fig. 1.

Fig. 2.

Fig. 3.

Fig. 4.

Fig. 5.

2. Experimental design, materials, and methods

2.1. Data and feature extraction

Fig. 6.

Fig. 7.

2.2. K-means clustering

2.3. Gaussian mixture model-expectation maximization

Acknowledgments

Conflict of Interest

References

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

Data on MRI brain lesion segmentation using K-means and Gaussian Mixture Model-Expectation Maximization

Ju Qiao

Xuezhu Cai

Qian Xiao

Zhengxi Chen

Praveen Kulkarni

Craig Ferris

Sagar Kamarthi

Srinivas Sridhar

Abstract

1. Data

Fig. 1.

Fig. 2.

Fig. 3.

Fig. 4.

Fig. 5.

2. Experimental design, materials, and methods

2.1. Data and feature extraction

Fig. 6.

Fig. 7.

2.2. K-means clustering

2.3. Gaussian mixture model-expectation maximization

Acknowledgments

Conflict of Interest

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases