Author manuscript; available in PMC: 2023 Oct 1.
Published in final edited form as: Magn Reson Imaging. 2022 Jun 7;92:45–57. doi: 10.1016/j.mri.2022.06.001

Automated Multimodal Segmentation of Acute Ischemic Stroke Lesions on Clinical MR Images

Hae Sol Moon a, Lindsay Heffron b, Ali Mahzarnia c, Barnabas Obeng-Gyasi c, Matthew Holbrook c, Cristian T Badea a,c, Wuwei Feng d, Alexandra Badea a,c,d,e,*
PMCID: PMC9949513  NIHMSID: NIHMS1868433  PMID: 35688400

Abstract

Magnetic resonance (MR) imaging (MRI) is commonly used to diagnose, assess, and monitor stroke. Accurate and timely segmentation of stroke lesions provides the anatomico-structural information that can aid physicians in predicting prognosis, as well as in decision making and triaging for various rehabilitation strategies. To segment stroke lesions, MR protocols, including diffusion-weighted imaging (DWI) and T2-weighted fluid attenuated inversion recovery (FLAIR) are often utilized. These imaging sequences are usually acquired with different spatial resolutions, due to time constraints. Within the same image, voxels may be anisotropic, with reduced resolution along slice direction, in particular, for diffusion scans. In this study, we evaluate the ability of 2D and 3D U-Net Convolutional Neural Network (CNN) architectures to segment ischemic stroke lesions using single contrast (DWI), and dual contrast images (T2w FLAIR, and DWI). The predicted segmentations correlate with post-stroke motor outcome measured by the National Institutes of Health Stroke Scale (NIHSS) and Fugl-Meyer Upper Extremity (FM-UE) index based on the lesion loads overlapping the corticospinal tracts (CST), which is a neural substrate for motor movement and function. Although the four methods performed similarly, the 2D multimodal U-Net achieved the best results with a mean Dice of 0.737 (95% CI: 0.705, 0.769) and a relatively high correlation between the weighted lesion load and the NIHSS scores (both at baseline, and at 90 days). A monotonically constrained quintic polynomial regression yielded R2 = 0.784 and 0.875 for weighted lesion load versus baseline and 90 days NIHSS respectively, and better Akaike information criterion (AICc) scores than those of the linear regression. In addition, using the quintic polynomial regression model to regress the weighted lesion load to the 90-Days FM-UE score results in an R2 of 0.570 with a better AICc score than that of the linear regression. 
Our results suggest that the multi-contrast information enhanced the accuracy of the segmentation, and the prediction accuracy for upper extremity motor outcomes. Expanding the training dataset to include different types of stroke lesions, and more data points, will help add a temporal longitudinal aspect and increase the accuracy. Adding patient-specific data may further improve the inference about the relationship between imaging metrics and functional outcomes.

Keywords: Multimodal MRI, Brain, Ischemic stroke, AI, Convolutional neural networks, Automatic lesion segmentation, Corticospinal tract, Lesion load

1. Introduction

Stroke remains one of the most pervasive and serious causes of long-term disability [1–3]. Of the patients who survive stroke, more than one-third remain disabled at different levels [4]. Studies have shown that rehabilitation and physical activity are important for recovery and for the prevention of recurrent strokes, by repairing or stimulating new neural connections in the brain through neuroplasticity [4, 5]. Personalized rehabilitation is important because each individual has distinct structural brain damage and functional deficits caused by the stroke. A better understanding of anatomical and structural information of stroke lesions can aid clinicians in predicting prognosis, making therapeutic decisions, and formulating rehabilitation plans. Adding knowledge on brain structural connectivity can help determine how stroke lesions affect remote brain regions connected by white matter tracts and predict patient recovery potential [6, 7].

Multimodal images are often acquired to distinguish stroke lesions from other acute or chronic injury [8]. These include CT images and MRI. CT scanners are widely available, and images can be acquired fast; however, it remains challenging to identify irreversibly damaged tissue on CT. MRI is versatile and offers multiple contrasts, including T1-weighted, T2-weighted, and diffusion-weighted images. T2-weighted FLAIR scans are relatively fast and can be used to study acute ischemic strokes [9]. Nevertheless, conventional MR sequences are not sensitive enough for an effective diagnosis of hyperacute infarction (within 6 hours after the onset of ischemia) or for correctly identifying the entire ischemic stroke lesion. Diffusion-weighted imaging (DWI) is currently the most sensitive sequence for identifying hyperacute ischemic lesions [10–13]. However, DWI is usually acquired at a lower resolution than the accompanying T2w-FLAIR, which leads to partial volume effects. Additional challenges are associated with distortions and T2 shine-through artifacts, which can easily be misidentified as acute stroke lesions [13–16].

While manual segmentation is considered the gold standard, this process is time-consuming and incurs inter- and intra-reader variability [17, 18]. Automated approaches can reduce bias and significantly speed up processing for individual subjects or large databases.

Various methods have been proposed to address the need for fast and accurate stroke lesion segmentation, based on both CT [19, 20] and MRI [21–24]. These methods include mathematical morphology, active contours, Gaussian mixture models (GMM), support vector machines (SVM), and random fields. While these methods work well with small, study-specific datasets, they generalize poorly [25]. Deep learning methods have expanded rapidly in recent years, offering multiple avenues to address the current challenges in stroke lesion segmentation [24–27]. Among these, we selected U-Nets because of their simplicity and pervasive usage for segmentation problems. The advantages of U-Net-based approaches include reliable segmentation results even with a limited training dataset, and the ability to effectively learn the variability of anatomical information and the shape of lesions [28, 29]. In addition, a patch-based method makes it possible to train on large MR images by reducing memory requirements, but may not be advantageous for clinical images with a limited number of axial slices. Recently, 2D and 3D U-Net-based methods have been proposed, where small patches of the images are selected for training in an encoder-decoder architecture [30, 31].

In this study, we evaluated single (DWI) and multimodal (DWI+FLAIR) 2D and 3D U-Net-based segmentation methods to assess whether multimodal methods overcome the physical shortcomings of single modality approaches. We only included DWI for singlemodal results because using only T2w-FLAIR to segment stroke lesions has been shown to be unsatisfactory in U-Net-based segmentation, as it is less sensitive than DWI to infarct lesion hyperintensity [32]. We explored whether a combination of both clinical MR protocols (T2w-FLAIR and DWI) can lead to a better segmentation of stroke lesions. We tested our methods with a cohort of ischemic stroke patients with both acute and chronic outcome information, and evaluated the correlation between image-based metrics, such as stroke lesion size and lesion overlap with the corticospinal tract, and motor impairment and global impairment outcomes. Beyond addressing an urgent need for a robust segmentation method for acute stroke lesions from diagnostic clinical scans, we provide insight into how this information can be used for patient motor outcome prediction [6, 10].

2. Methods

2.1. Data Source

In this prospective observational stroke outcomes study, brain scans of 79 acute ischemic stroke patients with clinical multimodality MRI and their outcome data at both baseline (2–7 days) and 90 days post-stroke were included. The main outcomes used in our study are the National Institutes of Health Stroke Scale (NIHSS) and the Fugl-Meyer Upper Extremity (FM-UE) index. Both outcome estimates were collected at baseline (2–7 days post-stroke), and subsequently at 90±15 days post-stroke. These scans were acquired at several local hospitals with different 1.5 T MR scanners. Each subject was imaged with clinical protocols including a T2w-FLAIR and a DWI sequence at baseline, which was typically within 72 hours from stroke onset. FLAIR images were acquired with matrix sizes of 424–480 × 512 × 24–30, and DWI images were acquired with matrix sizes of 166–192 × 166–192 × 25–30. The resolution of FLAIR images was 0.43 mm × 0.43 mm × 6 mm and the resolution of DWI was either 1.325 mm × 1.325 mm × 6 mm or 1.198 mm × 1.198 mm × 6 mm. The b values were 1000 s/mm². The clinico-demographic information of the patient population is shown in Table 1.

Table 1.

Clinico-demographic profile of the stroke patients.

| Items | N = 79 | (%) |
| --- | --- | --- |
| Gender | | |
| Female | 44 | 55.70 |
| Male | 35 | 44.30 |
| Ethnicity | | |
| African American | 36 | 45.57 |
| White/non-Hispanic | 22 | 27.85 |
| Hispanic | 13 | 16.46 |
| Others | 8 | 10.13 |
| Stroke hemisphere | | |
| Right | 41 | 51.90 |
| Left | 36 | 45.57 |
| Both | 2 | 2.53 |
| Stroke location | | |
| Cortical | 18 | 22.78 |
| Subcortical | 22 | 27.85 |
| Both | 39 | 49.37 |

| Items | Mean | SD |
| --- | --- | --- |
| Age (years) | 58.60 | 14.28 |
| Baseline NIHSS | 9.33 | 6.09 |
| 90 days NIHSS | 3.96 | 4.16 |
| Baseline FM-UE | 24.97 | 20.38 |
| 90 days FM-UE | 30.16 | 26.56 |
| Days after onset to first evaluation | 3.49 | 1.76 |
| Days of stay in hospital | 98.43 | 11.71 |
| Days of discharge to follow up visit | 87.78 | 17.97 |

2.2. Pre-processing

We registered all the images in the database to the same space and dimensions using the Advanced Normalization Tools (ANTs) [33]. The workflow for the pre-processing steps is shown in Fig. 1. First, all FLAIR images were skull-stripped using the Kirby/MMRR template [34] and included the cerebellum probability mask to provide anatomical priors [35]. Then, we selected a reference FLAIR image, with dimensions of 480 × 512 × 30 and this was zero-padded to change the dimensions uniformly to 512 × 512 × 32. All DWI images were registered to the corresponding FLAIR image of the same patient using rigid, affine, and diffeomorphic transforms, employing the symmetric image normalization option, with SyN=0.5 [36]. FLAIR images in the database were registered using a rigid transformation to the zero-padded reference FLAIR image. We applied the derived transformation to the DWI images, registering those to the reference FLAIR image, thus the entire dataset was brought into the same space. To compensate for the non-uniformity of intensity across FLAIR and DWI images for the different subjects in the database, through a whole-brain intensity standardization, we normalized the intensities of images to the same scale, by subtracting the mean voxel values from each voxel and dividing by the standard deviation (SD) of voxel values in an image. Such normalization is expected to speed up convergence, enhancing compatibility for network training and quantitative analysis.
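The intensity standardization step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `zscore_normalize` is a hypothetical helper, the image is represented as a flat list of voxel values, and the use of the population standard deviation (rather than the sample SD) is an assumption.

```python
from statistics import fmean, pstdev


def zscore_normalize(voxels):
    """Rescale voxel intensities to zero mean and unit standard deviation."""
    mean = fmean(voxels)
    sd = pstdev(voxels)  # assumption: population SD over all voxels in the image
    if sd == 0:
        raise ValueError("constant image: standard deviation is zero")
    return [(v - mean) / sd for v in voxels]


# Toy "image" with mean 5 and SD 2.
normalized = zscore_normalize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

After this step every image has the same intensity scale, which is what allows FLAIR and DWI volumes from different scanners to be fed to one network.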

Fig. 1.


Pre-processing flow chart. (A) A reference FLAIR image (subject 1) was zero-padded and all FLAIR images in the database were skull-stripped. (B) DWI image of the corresponding subject of reference FLAIR image was registered to the reference FLAIR image. (C) All FLAIR images in the database were registered to the zero-padded skull-stripped reference FLAIR image. (D) All DWI images in the database were registered to the corresponding FLAIR images of the same patient.

2.3. CNN Network Architecture Design

In this study, the manual labels for stroke lesion segmentation were delineated using ITK-SNAP [37] for all slices where lesion appeared. We evaluated four variations of 2D, and 3D single (DWI) and multimodal (DWI+FLAIR) U-Net-based encoder-decoder architectures, for a total of four architectures: 2D multimodal, 2D singlemodal, 3D multimodal, and 3D singlemodal. The U-Net architecture was chosen since it has been shown to work with fewer training images than other deep learning methods while maintaining reliable segmentation [25]. In the 3D multimodal U-Net network, both FLAIR and DWI images each with a size of 512 × 512 × 32 were provided as input to the network, together with the manually labeled lesion masks. The network consisted of two downsampling and two upsampling convolutional blocks. Each downsampling block consisted of a convolutional layer with stride size of 1 followed by a batch normalization layer and a rectified-linear unit (ReLU) activation layer, and then followed by another convolutional layer with stride size of 2 to downsample by a factor of 2, batch normalization, and a ReLU layer. Each upsampling block consisted of a convolutional layer with stride size of 1, batch normalization, ReLU, followed by a transposed convolution with stride size of 2 to up-sample by a factor of 2, and then followed by batch normalization and ReLU. Each convolutional layer with a stride size of 1 in the encoder was fused with a corresponding transposed convolution layer in the decoder. Finally, a convolutional layer, followed by a sigmoid layer was added to obtain predicted probabilistic segmentation masks. The schematic of the network is shown in Fig. 2 and a detailed summary can be found in Table 2. The difference between the 2D and 3D architecture is that in 2D, back-to-back convolutional layers with stride size of 1 are used in a convolutional block and the input is fed into the network slice by slice, whereas in 3D the input is a whole volume at a time.
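One way to check the feature-map sizes implied by this design is to trace the spatial dimensions through the two stride-2 downsampling blocks and the two stride-2 transposed convolutions. The sketch below assumes "same" padding, so that a stride-2 convolution outputs ceil(input / 2) samples per axis; the text does not state the padding mode, and `trace_shapes` is a hypothetical helper rather than part of the published code.

```python
def trace_shapes(spatial, down_blocks=2, up_blocks=2, stride=2):
    """Return the spatial shape after each down/up block, starting with the input."""
    shapes = [tuple(spatial)]
    current = list(spatial)
    for _ in range(down_blocks):
        # Strided convolution with 'same' padding: ceil(d / stride) per axis.
        current = [-(-d // stride) for d in current]
        shapes.append(tuple(current))
    for _ in range(up_blocks):
        # Transposed convolution with 'same' padding: d * stride per axis.
        current = [d * stride for d in current]
        shapes.append(tuple(current))
    return shapes


shapes_2d = trace_shapes((512, 512))      # one slice at a time
shapes_3d = trace_shapes((512, 512, 32))  # whole zero-padded volume
unpadded = trace_shapes((480, 512, 30))   # reference image before zero-padding
```

Under these assumptions the 512 × 512 × 32 volume round-trips exactly, whereas an unpadded 30-slice volume would come back with 32 slices (30 → 15 → 8 → 16 → 32), so encoder and decoder feature maps would no longer line up. This is consistent with, though not stated as the reason for, the zero-padding described in the pre-processing section.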

Fig. 2.


Schematic of multimodal U-Net architecture. BN: batch normalization; ReLU: rectified linear unit.

Table 2.

Summary of the four methods and their parameters used for evaluating Ischemic Stroke lesion segmentation

| Method | 2D Multimodal U-Net | 2D U-Net (DWI) | 3D Multimodal U-Net | 3D U-Net (DWI) |
| --- | --- | --- | --- | --- |
| Input Image Size | 512 × 512 × 2 | 512 × 512 × 1 | 512 × 512 × 32 × 2 | 512 × 512 × 32 × 1 |
| Starting Filter Number | 64 | 64 | 32 | 32 |
| Kernel Size | 5 × 5 | 5 × 5 | 5 × 5 × 5 | 5 × 5 × 5 |
| CNN Activation Type | ReLU | ReLU | ReLU | ReLU |
| Activation Layer | Sigmoid | Sigmoid | Sigmoid | Sigmoid |
| Layer Depth | 2 Down & 2 Up | 2 Down & 2 Up | 2 Down & 2 Up | 2 Down & 2 Up |
| Optimizer / learning rate | Adam / 1e-5 | Adam / 1e-5 | Adam / 8e-4 | Adam / 8e-4 |
| Loss Function Type | Binary Crossentropy | Binary Crossentropy | Binary Crossentropy | Binary Crossentropy |
| Multi-GPU | 4 | 4 | 4 | 4 |
| Training Batch Size | 50 | 50 | 8 | 8 |
| Epochs | 200 | 200 | 200 | 200 |
| Testing Method | 6-Fold Cross Validation | 6-Fold Cross Validation | 6-Fold Cross Validation | 6-Fold Cross Validation |

2.4. Experimental Settings and Implementation

We split the data in a proportion of 80:20 for training and testing, in a 6-fold-cross validation scheme to optimize hyperparameters of the network and evaluate results. The first fold includes 65 training sets and 14 test sets, and five folds include 66 training sets and 13 testing sets.
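The fold sizes quoted above follow directly from dividing 79 subjects into 6 test folds. The sketch below reproduces that arithmetic; `fold_sizes` is a hypothetical helper, and assigning the one leftover subject to the first fold is an assumption chosen to match the sizes reported in the text.

```python
def fold_sizes(n_subjects, n_folds):
    """Return a (train, test) size pair for each fold of a k-fold split."""
    base, extra = divmod(n_subjects, n_folds)
    # The first `extra` folds receive one additional test subject.
    test_sizes = [base + 1 if i < extra else base for i in range(n_folds)]
    return [(n_subjects - t, t) for t in test_sizes]


sizes = fold_sizes(79, 6)
```

Each subject appears in exactly one test fold, so the test sizes sum to 79.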

Due to the relatively small number of subjects, we did not hold out a validation set; the test data was used to evaluate the performance of the 4 networks. Once we determined the best network type, we optimized the weights of the final model using the whole data set. The best network type was selected based on average performance.

The hyperparameters for our networks, including the filter numbers, kernel sizes, layers, etc. were optimized experimentally, after running 200 epochs, and are shown in Table 2.

The method and models were implemented using Python 3.7, TensorFlow 2.4, and Keras 2.4. For computations, we used a Lambda compute system equipped with 4 NVIDIA Quadro RTX 8000 GPU cards, each with 48 GB memory, running CentOS 7. Training required ~24 hours for both 2D and 3D networks, and segmenting a single subject took only a few seconds, using a total of 192 GB of memory.

2.5. Segmentation Evaluation

The performance of the segmentation task was quantitatively evaluated by calculating the spatial overlap between the ground truth mask volumes and the predicted volumes. The predicted probabilistic lesion masks were binarized by thresholding. The decision thresholds were set at the intersection of the precision and recall curves, to find the optimum separation between our two classes: the lesion and the background. First, the Dice similarity coefficient (DSC) was used to evaluate pixel-wise segmentation performance. The other metric evaluated was the accuracy, defined as the proportion of true positives and true negatives among all evaluated cases. Higher values of these metrics indicate more accurate segmentation performance. We also evaluated precision, the ability of our models to identify only the relevant data points, and recall, the ability of a model to find all the relevant cases within a dataset; increasing precision typically decreases recall. Then, we calculated specificity, which is the true negative rate (the proportion of actual negatives that are correctly identified as negative).

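$$\text{Dice Similarity Coefficient} = \frac{2TP}{2TP + FP + FN} \tag{1-1}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1-2}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{1-3}$$
$$\text{Recall/Sensitivity} = \frac{TP}{TP + FN} \tag{1-4}$$
$$\text{Specificity} = \frac{TN}{TN + FP} \tag{1-5}$$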

where TP is true positive, TN is true negative, FP is false positive, and FN is false negative. Finally, we evaluated segmentation results using the 95th percentile of the Hausdorff distance (HD95), which summarizes the largest distances between the ground truth and the predicted segmentation boundaries while being less sensitive to isolated outlier points than the maximum distance [38].
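The overlap metrics in Eqs. 1-1 to 1-5 can be computed from voxel-wise counts as in the sketch below. This is an illustrative helper, not the authors' code: `segmentation_metrics` is a hypothetical name, masks are flat sequences of 0/1 values, and returning `None` for an undefined ratio (zero denominator) is a design choice made here.

```python
def segmentation_metrics(pred, truth):
    """Compute Dice, accuracy, precision, recall and specificity from binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)

    def ratio(num, den):
        # Undefined when the denominator is zero (e.g. no predicted lesion voxels).
        return num / den if den else None

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Toy masks: TP = 2, FP = 1, FN = 1, TN = 4.
metrics = segmentation_metrics(
    pred=[1, 1, 0, 0, 1, 0, 0, 0],
    truth=[1, 0, 0, 0, 1, 1, 0, 0],
)
```

Because lesions occupy a small fraction of the brain, TN dominates the counts, which is why accuracy and specificity sit near 1 for every method while Dice, precision, and recall are more discriminating.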

2.6. Comparison with Tractography and Clinical Outcomes

To evaluate the impact of lesion size and position relative to white matter tracts involved in voluntary motor function, i.e., the corticospinal tract (CST), we mapped an atlas-derived CST onto each subject. We extracted the CST from the Illinois Institute of Technology (IIT) atlas [39]. The IIT B0 map was registered to reference FLAIR images with rigid, affine, and diffeomorphic SyN transformations. The transformations were applied to the CST mask with B-spline interpolation to register the CST into our subject space. Then we calculated the lesion load and a weighted lesion load, which estimate the amount of lesion overlap with the CST. The weighted lesion load accounts for the narrowing of the CST in the lower brain by introducing weights to the lesion load. The lesion load and binary weighted lesion load are defined as follows:

$$\text{lesion load} = \sum_{n=1}^{N} L(n)\, C(n)\, v \tag{2-1}$$
$$W(n) = \frac{m^{*}}{m(n)} \tag{2-2}$$
$$\text{weighted lesion load} = \sum_{n=1}^{N} L(n)\, C(n)\, v\, W(n) \tag{2-3}$$

where n is the slice position, N is the number of slices, L(n) is the stroke lesion and C(n) is the CST mask in slice n, v is the voxel volume, W(n) are slice-based CST weights, m(n) is the CST mask area for slice location n, and m* is the maximum CST mask slice area [7].
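The two lesion load definitions can be illustrated with a small sketch. It makes several assumptions that are not in the text: `lesion_loads` is a hypothetical helper, each slice is a flat list of 0/1 voxel flags, the product L(n)·C(n) is read as the number of voxels in slice n that belong to both masks, and m(n) is measured as the number of CST voxels in the slice.

```python
def lesion_loads(lesion_slices, cst_slices, voxel_volume):
    """Return (lesion load, weighted lesion load) for slice-wise binary masks."""
    cst_areas = [sum(cst) for cst in cst_slices]  # m(n), in voxels
    max_area = max(cst_areas)                     # m*
    load = 0.0
    weighted = 0.0
    for lesion, cst, area in zip(lesion_slices, cst_slices, cst_areas):
        overlap = sum(1 for l, c in zip(lesion, cst) if l and c)
        if overlap == 0:
            continue  # also skips slices without CST, where m(n) would be zero
        load += overlap * voxel_volume
        weighted += overlap * voxel_volume * (max_area / area)  # W(n) = m*/m(n)
    return load, weighted


# Three toy slices of four voxels each; the CST narrows from 4 to 2 to 0 voxels.
cst = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0]]
lesion = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 1]]
load, weighted = lesion_loads(lesion, cst, voxel_volume=0.5)
```

In the toy example the second slice contributes twice its raw overlap to the weighted load because the CST there is half as wide as at its maximum, which is the intended effect of the weights.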

We asked whether lesion size and location predict motor function recovery after stroke, as measured through the NIHSS and FM-UE. The NIHSS is a standardized outcome measure used to measure the global stroke severity in clinical and research settings. Items on the NIHSS are scored on an ordinal scale and summed to a total score with a range of 0 to 42, where a higher score indicates greater stroke severity [40]. The NIHSS has adequate intra-rater and inter-rater reliability and validity [40–42]. The NIHSS was administered and rated at 2–7 days and at 90±15 days post-stroke by trained study staff [40]. The FM-UE is a standardized outcome measure that is designed to measure the recovery of upper extremity motor impairment after stroke [43]. The FM-UE has a maximum score of 66, with lower scores indicating more severe motor impairment [43]. It has excellent intra-rater and inter-rater reliability and acceptable levels of construct validity [44–46]. In this study, the data collected from both the NIHSS and the FM-UE were analyzed because the NIHSS has been universally collected in the acute stroke phase, and the FM-UE is used more widely as a metric of recovery after rehabilitation to quantify functional outcome [47–49].

In an imbalanced data set that is biased towards subjects with smaller lesions, we would expect lower severity as measured by NIHSS. We would also expect outliers to be more prominent among the patients with smaller lesions. Hence, we need a dynamic outlier removal strategy on the severity metric based on the lesion size.

To address the challenges of an imbalanced data set, with more subjects having smaller lesions, we developed an outlier removal strategy to better balance the distribution of lesion metrics. We used the lesion metrics, which include lesion size, lesion load, and weighted lesion load, as the independent variables in three separate regression analyses. The clinical outcome metrics, the baseline NIHSS and the 90 days FM-UE, served as the dependent variables in these regression analyses. We consider values of x above the 0.875 quantile (i.e., the 87.5th percentile) to be large. We denote this value of the distribution by 0.875Q in subsequent plots. The 0.875 quantile is defined as the median between the third quartile (0.75 quantile) and the maximum. For example, the distribution of x values (lesion size) shows that there are 9 values above the 0.875 quantile of x, and 66 data points below it. To achieve a balanced dataset, we separated the dataset into two subsets, those associated with large x values and those not associated with large x values, before applying a dynamic outlier removal strategy to each subset. The reason for this separation is that a uniform outlier removal strategy applied to the whole data set would have removed many more data points from the 9 high values (more than half of them), resulting in an imbalanced data set of 66 small-valued and 4 large-valued points in this example. Such an imbalanced distribution of lesion size is shown in the boxen plot in Fig. 3A, with the 0.875 quantile marked. The boxen plot is similar to a boxplot, with more quantiles included for better visualization. We applied a dynamic outlier removal strategy on the y values (e.g., NIHSS) on the left portion (small-valued x data points) and right portion (large-valued x data points).
The conventional Tukey inner fence for outlier removal detects and removes the outliers if the values satisfy one of the following criteria [50]:

$$Y < Q_1 - 1.5\,\mathrm{IQR}, \qquad Y > Q_3 + 1.5\,\mathrm{IQR} \tag{3}$$
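The fence in Eq. 3 can be sketched as a small helper that flags values outside a window around the quartiles. The same helper takes a weight and a centering option so that it also covers the median-centered variant the text goes on to define in Eq. 4. The names `fence_outliers` and `center_on_median` are hypothetical, and the choice of the "inclusive" quartile method in `statistics.quantiles` is an assumption; the paper does not say how quartiles were computed.

```python
from statistics import quantiles


def fence_outliers(values, weight=1.5, center_on_median=False):
    """Return the values that fall outside the fence.

    center_on_median=False: Tukey inner fence, Q1 - w*IQR and Q3 + w*IQR (Eq. 3).
    center_on_median=True:  median-centered window, Q2 -/+ w*IQR (Eq. 4).
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if center_on_median:
        low, high = q2 - weight * iqr, q2 + weight * iqr
    else:
        low, high = q1 - weight * iqr, q3 + weight * iqr
    return [v for v in values if v < low or v > high]


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
tukey_removed = fence_outliers(scores)  # conventional fence, w = 1.5
narrow_removed = fence_outliers(scores, weight=0.3, center_on_median=True)
```

On this evenly spread toy sample the conventional fence removes nothing, while a narrow median-centered window with weight 0.3 keeps only the values closest to the median. That is the behaviour the modified criterion relies on to thin out the densely populated small-lesion group.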

Fig. 3.


Outlier removal strategy. (A) Boxen plot of x (lesion size) shows the data distribution in reference to the 0.875 quantile. (B) Box plot of y (baseline NIHSS) for small-valued x relative to the 0.875 quantile. We applied a dynamic outlier removal strategy based on the y values on the left portion (small-valued x data points) and right portion (large-valued x data points).

Fig. 3B shows the box plot of y associated with small-valued x (the left portion of data that includes 66 data points). Since most values lie between Q1 and Q3 and all values lie within the traditional Tukey inner fence (Eq. 3), the conventional method fails to remove any data points from the left portion to achieve a spatial balance. Thus, we propose the following method as a modification of the conventional outlier removal method. We will detect outliers and remove such data points that satisfy any of the following criteria:

$$Y < Q_2 - w \cdot \mathrm{IQR}, \qquad Y > Q_2 + w \cdot \mathrm{IQR} \tag{4}$$

where w is the weight, defined differently for large-valued x (right portion) and small-valued x (left portion) so that the final distribution of x is balanced between large and small values. Although Q2 with a weight of 1.5 IQR has been used in robust statistical methods to separate negative and positive anomalies from the background [51], different weights are needed on each side to balance the final data distribution. To remove more data from the left side, where the data are much more clustered, we propose a narrower window there by assigning a smaller weight to the left side than to the right side. The weights of 0.3 and 2 were chosen for the left and right portions respectively, given that x is the lesion size and y is the baseline NIHSS. After outlier removal in this example, 26 data points remained in lesion size vs baseline NIHSS, split roughly evenly between high and low x values. To keep our methodology and regression analyses consistent across the different dependent and independent variables, we applied the same outlier removal strategy with the same windows to the other regression analyses.

We used first-order linear regression to regress the lesion metrics to the clinical metrics. A similar set of regression studies was performed in previous clinical studies [6, 7, 52]. However, if the population relationship between the independent variable (lesion metrics) and the dependent variable (clinical metrics) is nonlinear, then a linear regression on a sample from that population may explain less of the variability in the dependent variable than a nonlinear regression would [53]. Therefore, we also incorporated a monotonically increasing/decreasing constrained polynomial regression to fit lesion metrics to clinical metrics. We calculated R2 values for the regression fits as:

$$R^2 = 1 - \frac{RSS}{TSS} \tag{5}$$

where RSS is the sum of squares of residuals and TSS is the total sum of squares. Due to a limitation of interpreting differences between linear and polynomial regressions such as lack of p-value in polynomial regression, we also report the corrected Akaike information criterion (AICc) score to compare linear and polynomial regression models [54]. AIC can be used to compare different orders of regression models within the same dataset (after outlier removal) and the corrected AIC (AICc) accounts for potential overfitting of AIC from small sample size [55]. The equation of AICc is as follows:

$$\mathrm{AICc} = 2\ln(\hat{\sigma}^2) + 2k + \frac{2k(k+1)}{n-k-1} \tag{6}$$

where \hat{\sigma}^2 is RSS/n, n is the sample size, and k is the number of parameters estimated by the model. We chose the fifth-degree polynomial regression based on its higher AICc score relative to the lower-degree polynomial regression models. We did not use a degree higher than five because the AICc scores did not improve significantly, and to avoid possible overfitting.
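The two model comparison quantities in Eqs. 5 and 6 can be sketched as below. The helpers `r_squared` and `aicc` are hypothetical names. The small-sample correction term 2k(k+1)/(n−k−1) is the standard one; the leading term, 2 ln(RSS/n), is an assumption based on reading Eq. 6 as printed and may differ from what the authors actually computed (a common least-squares form uses n ln(RSS/n) instead).

```python
from math import log


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - RSS/TSS (Eq. 5)."""
    mean_y = sum(y) / len(y)
    rss = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
    tss = sum((obs - mean_y) ** 2 for obs in y)
    return 1 - rss / tss


def aicc(rss, n, k):
    """Corrected AIC, reading Eq. 6 literally with RSS/n as the hatted quantity."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return 2 * log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)


r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 3.0, 3.5])

# Penalty terms alone (RSS/n = 1, so the log term vanishes) for n = 26 points:
penalty_linear = aicc(26.0, 26, 2)   # 2 parameters
penalty_quintic = aicc(26.0, 26, 6)  # 6 parameters
```

With n = 26 the penalty for six parameters is roughly 16.4 against 4.5 for two, which shows how strongly the correction term weighs on higher-degree fits at this sample size.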

3. Results

We constructed four types of networks, namely 2D multimodal, 2D singlemodal, 3D multimodal, and 3D singlemodal (see Table 2), and evaluated their performance on a data set of N=79 subjects, with a mean stroke lesion volume of 56.25 ± 74.29 cc and volumes ranging from 0.37 to 289.42 cc, using a single MRI contrast (DWI) or dual MRI contrasts (DWI, T2w-FLAIR).

3.1. Optimizing Segmentation Performance

Training and testing loss plots are illustrated in Fig. 4. As shown in the plots, both multimodal and singlemodal 2D methods converged faster, at around 20 epochs, and were more robust with fewer spikes than 3D methods in training. The training and testing loss of 3D methods converged to a stable point after ~80 epochs. During testing, we found spikes at the beginning and around 50 to 70 epochs for the 3D methods, but the loss plots became stable towards the end of training. A more thorough exploration of the grid search optimization of network parameters is shown in Supplementary Table 1.

Fig. 4.


Training and validation errors over 6 folds, with 95% confidence intervals.

To binarize the predicted lesion masks, the decision thresholds were set where the precision and recall curves intersect, as shown in Fig. 5. The precision and recall curves for the 2D multimodal U-Net method yielded a decision threshold of 0.4, where values greater than or equal to 0.4 were classified as lesion and values smaller than 0.4 were classified as background. Similarly, the decision thresholds were 0.4, 0.6, and 0.5 for the 2D singlemodal (DWI) U-Net, 3D multimodal U-Net, and 3D singlemodal (DWI) U-Net respectively. All results were calculated after the binary classification using the decision threshold for each method.

Fig. 5.


Decision threshold optimizations. The intersections of precision/recall are selected as optimal thresholds.
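The threshold selection described above can be sketched as a scan over candidate thresholds that keeps the one where precision and recall are closest. This is an illustration under stated assumptions: `pick_threshold` is a hypothetical helper, the candidate grid is arbitrary, probabilities greater than or equal to the threshold count as lesion, and thresholds that yield no true positives are skipped.

```python
def pick_threshold(probs, truth, candidates):
    """Return the candidate threshold where |precision - recall| is smallest."""
    best, best_gap = None, None
    for threshold in candidates:
        pred = [p >= threshold for p in probs]
        tp = sum(1 for p, t in zip(pred, truth) if p and t)
        fp = sum(1 for p, t in zip(pred, truth) if p and not t)
        fn = sum(1 for p, t in zip(pred, truth) if not p and t)
        if tp == 0:
            continue  # precision or recall is undefined or zero here
        gap = abs(tp / (tp + fp) - tp / (tp + fn))
        if best_gap is None or gap < best_gap:
            best, best_gap = threshold, gap
    return best


# Toy voxel probabilities with their ground-truth labels.
probs = [0.1, 0.3, 0.45, 0.6, 0.8, 0.9]
truth = [0, 0, 1, 0, 1, 1]
threshold = pick_threshold(probs, truth, candidates=[0.2, 0.4, 0.5, 0.7])
```

Precision and recall share the numerator TP, so they are equal exactly when FP equals FN; the intersection point is therefore the threshold at which over-segmentation and under-segmentation errors balance.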

3.2. Segmentation Evaluation

Violin plots of the Dice coefficients for the four methods are shown in Fig. 6, and indicate their similar performance for the 2D multimodal, 2D DWI, and 3D multimodal methods. The one-way ANOVA test yielded (F(2, 79) = 0.12, p = 0.73), (F(2, 79) = 0.05, p = 0.83) and (F(2, 79) = 0.49, p = 0.48) when comparing 2D multimodal methods with 2D singlemodal, 3D multimodal and 3D singlemodal respectively.

Fig. 6.


Quantitative results for segmentation accuracy for the four methods we have tested.

The one-way ANOVA between the Dice scores of 2D multimodal U-Net and other methods indicated no significant differences. Based on its outperformance across various quantitative figures of merit as shown in Table 3, the 2D multimodal U-Net was chosen for the rest of the analysis in this paper.

Table 3.

Quantitative evaluation of four architectures used for lesion segmentation.

| Methods | Mean Dice (95% CI) | Median Dice | Min-Max Range Dice | HD95 | Accuracy | Precision | Sensitivity | Specificity |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2D Multimodal U-Net | 0.737 (0.705, 0.769) | 0.783 | 0.297 – 0.932 | 22.047 | 0.998 | 0.758 | 0.755 | 0.999 |
| 2D U-Net (DWI) | 0.729 (0.695, 0.762) | 0.750 | 0.281 – 0.928 | 32.254 | 0.998 | 0.729 | 0.763 | 0.999 |
| 3D Multimodal U-Net | 0.731 (0.697, 0.766) | 0.774 | 0.176 – 0.920 | 21.694 | 0.998 | 0.761 | 0.757 | 0.999 |
| 3D U-Net (DWI) | 0.719 (0.679, 0.759) | 0.773 | 0.001 – 0.931 | 20.503 | 0.998 | 0.749 | 0.759 | 0.999 |

Qualitative results of 2D multimodal U-Net are illustrated in Fig. 7. The first column shows DWI images, the second column shows FLAIR images, and the last column displays segmentation results and ground truth lesion labels overlaid on FLAIR images. Each row shows images of different types of lesions. The first three rows show lesions that are visible on both images and produced reliable segmentation results. The lesion at the bottom right region is clearly shown in DWI but not very clear on FLAIR images. Our method was able to segment the main lesion near the right striatum and the cortical lesion too. The second subject had a larger lesion on the right hemisphere which is clearly shown in both DWI and FLAIR. The method segmented the lesion accurately. The third patient had a very small stroke lesion on the left. There were periventricular lesions present as shown in the FLAIR image. The method was able to segment the small lesion correctly without including the periventricular lesions, which appeared brighter than the normal tissue in the FLAIR images. The last patient shown had a lesion in the brainstem, characterized by low contrast in DWI and this could not be distinguished in the FLAIR image. The CNN method was able to partially segment the lesion in the brainstem but also included T2 shine-through artifacts present in the DWI image in the right cerebellum. This is most likely due to the very limited number of lesions in the brainstem in the dataset.

Fig. 7.


Qualitative results. The first column shows DWI images, the second column shows FLAIR images, and the third column shows overlaid ground truth and predicted stroke lesion masks on FLAIR images. Each row shows images from a different patient.

Linear models assessing the relationships between the lesion sizes and Dice scores are displayed in Fig. 8. Ideally, a segmentation method should perform robustly regardless of the lesion size, but because Dice score is a measure of volume overlap, bigger regions generally yield higher Dice scores than smaller lesions.

Fig. 8.


Segmentation accuracy versus lesion volume for all four network architectures.

We examined the relation between the Dice score and lesion size and observed R2 = 0.195, p < 0.0001; R2 = 0.275, p < 0.0001; R2 = 0.246, p < 0.0001; and R2 = 0.184, p < 0.0001 for 2D multimodal, 2D singlemodal, 3D multimodal and 3D singlemodal respectively. Our results indicated a weak but significant relationship between Dice scores and the lesion size, suggesting other variables come into play (e.g., lesion type, contrast, location, shape).

As shown in Fig. 7, there were no Dice scores less than 0.25 in the 2D methods. However, there were several instances of very poor segmentation for both 3D methods (n=4). The general trends of the histograms are similar, and median values were ~0.8.

3.3. Comparison with Corticospinal Tractography and Clinical Outcomes

We evaluated the relation between motor outcome and the position of the lesion in relation to white matter tracts. Corticospinal tracts derived from DWI images in the IIT atlas were registered onto each subject, as illustrated in Fig. 9. Accurate registration is needed for a reliable calculation of lesion loads, a quantitative figure of merit that represents the amount of overlap between stroke lesions and CST. Lesion load values ranged from 0.13 to 25.17, with a mean and SD of 5.37 ± 5.71 cc. Weighted lesion load values ranged from 0.42 to 41.63 with mean and SD of 8.91 ± 9.49 cc [6].
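The two overlap metrics can be sketched as follows. This is a minimal illustration rather than the pipeline used here: it assumes binary lesion and CST masks already registered to the same grid and stored as (slices, rows, columns), and it assumes the weighting described in [6, 7], in which each slice's overlap is scaled by the ratio of the largest CST cross-section to the CST cross-section in that slice. The function name `lesion_loads` and the voxel volume in the example are hypothetical.

```python
import numpy as np

def lesion_loads(lesion, cst, voxel_volume_cc):
    """Return (lesion load, weighted lesion load) in cc for masks shaped (slices, rows, cols)."""
    lesion = np.asarray(lesion, dtype=bool)
    cst = np.asarray(cst, dtype=bool)

    # Voxels where lesion and tract overlap, counted slice by slice.
    overlap_per_slice = np.logical_and(lesion, cst).sum(axis=(1, 2))
    cst_area_per_slice = cst.sum(axis=(1, 2))

    # Assumed weighting: slices where the tract is narrow get larger weights.
    weights = np.zeros(cst_area_per_slice.shape, dtype=float)
    has_cst = cst_area_per_slice > 0
    weights[has_cst] = cst_area_per_slice.max() / cst_area_per_slice[has_cst]

    lesion_load = overlap_per_slice.sum() * voxel_volume_cc
    weighted_lesion_load = (weights * overlap_per_slice).sum() * voxel_volume_cc
    return lesion_load, weighted_lesion_load

# Toy example: a wide tract section (16 voxels) and a narrow one (4 voxels).
cst = np.zeros((2, 8, 8), dtype=bool)
cst[0, 2:6, 2:6] = True
cst[1, 3:5, 3:5] = True

lesion = np.zeros((2, 8, 8), dtype=bool)
lesion[0, 2, 2:4] = True   # 2 voxels inside the wide section
lesion[1, 3, 3:5] = True   # 2 voxels inside the narrow section
lesion[0, 0, 0] = True     # lesion voxel outside the tract, ignored by both metrics

ll, wll = lesion_loads(lesion, cst, voxel_volume_cc=0.001)  # hypothetical voxel volume
print(f"lesion load = {ll:.3f} cc, weighted lesion load = {wll:.3f} cc")
```

In the toy case the same two-voxel overlap counts four times as much in the narrow section, so the weighted value exceeds the raw lesion load, consistent with the larger weighted values reported above.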

Fig. 9.

Lesion loads were calculated from the overlap of the lesion (red) with the corticospinal tract (yellow), shown here as overlays on the DWI for one example subject.

Plots of baseline NIHSS score versus lesion metrics are shown in Fig. 10, and plots of 90 days FM-UE score versus lesion metrics are shown in Fig. 11. The baseline NIHSS and 90-days FM-UE were compared with lesion size, lesion load and weighted lesion load calculated from segmentation results of the 2D multimodal method which was chosen for its largest Dice coefficient.

Fig. 10.

Left to right: Baseline NIHSS score versus predicted lesion size, lesion load and weighted lesion load for 2D multimodal segmentation method. The first row shows linear regression results, and the second row shows the quintic regression results. For linear regression, the reports from left to right are: (R2 = 0.666, P < 0.0001 and AICc = 2.721); (R2 = 0.714, P < 0.0001 and AICc = 2.920); (R2 = 0.687, P < 0.0001 and AICc = 2.910). For the quintic regression, the readings are (R2 = 0.695 and AICc = 14.318); (R2 = 0.775 and AICc = 14.485); (R2 = 0.784 and AICc = 15.131) for baseline NIHSS as y vs lesion size as x, vs. lesion load as x and vs. weighted lesion load as x respectively.

Fig. 11.

Left to right: 90 days FM-UE score versus predicted lesion size, lesion load and weighted lesion load for 2D multimodal segmentation method. The first row shows linear regression results, and the second row shows the quintic regression results. For linear regression, the reports from left to right are: (R2 = 0.382, P < 0.01 and AICc = 0.191); (R2 = 0.495, P < 0.0001 and AICc = 0.0306); (R2 = 0.451, P < 0.001 and AICc = 0.336). For the quintic regression, the readings are (R2 = 0.557 and AICc = 12.859); (R2 = 0.541 and AICc = 11.635); (R2 = 0.570 and AICc = 12.918) for 90 days FM-UE as y vs lesion size as x, vs. lesion load as x and vs. weighted lesion load as x respectively.

A linear regression of baseline NIHSS versus lesion metrics yielded R2 values of 0.666, 0.714 and 0.687 for lesion size, lesion load and weighted lesion load respectively. Parametric two-tailed hypothesis tests on the slopes of these three regressions gave p-values smaller than 0.0001, indicating a significant correlation between the independent and dependent variables. The AICc values of these regressions were 2.721, 2.920 and 2.910 in the same order. Using the constrained quintic regression on these three data sets improved the R2 values to 0.695, 0.775 and 0.784, and yielded significantly better AICc scores of 14.318, 14.485 and 15.131 in the same order.

Regressions of 90 days NIHSS versus lesion metrics yielded R2 values of 0.704, 0.546 and 0.626 for the linear models, and 0.847, 0.683 and 0.875 for the quintic models, for lesion size, lesion load and weighted lesion load respectively. The AICc values were 3.844, 3.442 and 3.838 for the linear regression and 17.751, 17.088 and 18.546 for the quintic regression, in the same order.

The linear regression of baseline FM-UE versus lesion metrics did not yield significant results.

Regressions of 90 days FM-UE versus lesion metrics yielded R2 values of 0.382, 0.495 and 0.451 for the linear models, and 0.557, 0.541 and 0.570 for the quintic models, for lesion size, lesion load and weighted lesion load respectively. The AICc values were 0.191, 0.0306 and 0.336 for the linear regression and 12.859, 11.635 and 12.918 for the quintic regression, in the same order.

Our results support that baseline lesion metrics correlated well with baseline NIHSS and 90 days NIHSS as global impairment indices, and with 90 days FM-UE as a motor impairment recovery index.

The relationships between image derived parameters and clinical outcomes are presented in Supplementary Table 2.

4. Discussion

In this study, we examined the use of deep learning for stroke lesion segmentation of routine clinical MR scans with single and dual MRI contrasts, and the relation between derived imaging biomarkers (e.g., lesion size and lesion load on the corticospinal tract) and post-stroke functional outcomes. We implemented four U-Net approaches for automated lesion segmentation, using single (DWI) or multimodal (FLAIR and DWI) imaging protocols on one hand, and 2D or 3D approaches on the other hand. Our results yielded similar accuracies for all four approaches, with mean Dice coefficients of ~0.73. The times required to train the four models were 24.6 hours, 23.0 hours, 23.4 hours, and 28.0 hours for the 2D multimodal, 2D singlemodal, 3D multimodal, and 3D singlemodal methods, respectively. After training, the segmentation of a single subject's lesion takes only a few seconds using GPU computing, which makes our method a viable tool in clinical research or clinical practice. Although by a very small margin (i.e., 0.009), the 2D multimodal approach was the winner with a median Dice coefficient of 0.783, and the runner-up was the 3D multimodal method with 0.774. Our results are consistent with recently published results using routine clinical MR scans, which also found that 2D U-Net performed marginally better than 3D U-Net [25, 56–61].

Our main motivation was to overcome challenges associated with manual segmentation and qualitative-only approaches to lesion assessment. First, manual segmentation is time-consuming and prone to inter- and intra-reader variability [15, 16]. Second, because of the time-sensitive nature of ischemic stroke, quick and reliable segmentation of stroke lesions along with quantitative evaluation of the lesion state can assist clinicians in decision making for acute care and rehabilitation planning [6, 62, 63]. There has been remarkable progress in stroke detection and segmentation with the advent of deep learning methods. These methods are less time-consuming, less prone to inter- and intra-reader variability, and can be generalizable. The primary feature of stroke lesions in MR images is the hyperintense region caused by vessel occlusion; the area of restricted diffusion appears brighter in DWI and FLAIR images. There have been multiple developments of CNN architectures to detect and quantify ischemic stroke lesions, from U-Nets to more complex networks such as Ensemble U-Net and HyperDenseNets [64, 65] that can aid lesion segmentation. However, in situations where there are a limited number of images, limited axial slices, diverse imaging quality from different scanners, and heterogeneous lesion shapes, volumes, and locations, complex models are likely to overfit. Thus, it is necessary to choose an architecture that is simple and can work with multimodal images to incorporate both DWI and FLAIR. While DWI is considered the more sensitive protocol, partial volume effects and artifacts are present in DWI images. The FLAIR sequence adds the benefit of higher resolution. Thus, we tested and optimized U-Net-based architectures for single and multimodal contrast approaches. To overcome overfitting, we incorporated decision thresholds based on precision and recall [26, 66], by choosing where the precision and recall curves met for each method (Fig. 5).
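The idea of choosing the threshold at which the precision and recall curves meet can be sketched as below. This is an assumed, simplified procedure, not the implementation used in this work: the function name `pr_crossing_threshold`, the candidate threshold grid, and the toy probabilities are all hypothetical. It scans candidate thresholds and keeps the one where precision and recall are closest.

```python
import numpy as np

def pr_crossing_threshold(probs, truth, thresholds):
    """Return the candidate threshold where |precision - recall| is smallest."""
    probs = np.asarray(probs, dtype=float).ravel()
    truth = np.asarray(truth, dtype=bool).ravel()

    best_threshold, best_gap = None, np.inf
    for t in thresholds:
        pred = probs >= t
        tp = np.sum(pred & truth)    # true positives
        fp = np.sum(pred & ~truth)   # false positives
        fn = np.sum(~pred & truth)   # false negatives
        if tp == 0:
            continue  # precision or recall undefined or trivially zero; skip
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        gap = abs(precision - recall)
        if gap < best_gap:
            best_threshold, best_gap = float(t), gap
    return best_threshold

# Toy voxel-wise probabilities and labels (illustrative values only).
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
truth = [1, 1, 0, 1, 1, 0, 0, 0]
threshold = pr_crossing_threshold(probs, truth, thresholds=[0.25, 0.5, 0.75])
print("selected threshold:", threshold)
```

Precision and recall are equal exactly when the numbers of false positives and false negatives match, so this choice balances over- and under-segmentation rather than favoring either one.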

We expected that the 3D U-Net would outperform the 2D U-Net, but the 2D multimodal U-Net outperformed both the 3D multimodal and singlemodal U-Net methods, although the difference was not significant. However, our 2D methods had 5.9M parameters and the 3D methods had 3.8M parameters, which can increase the risk of overfitting in the 2D case. Moreover, there could be a significant loss of axial information from highly anisotropic voxels, with an in-slice pixel size of 0.43 mm and a large slice thickness of 6 mm in our clinical images. This likely made the 3D segmentation generalize less effectively than the 2D segmentation, as found in previous studies that used anisotropic MR images [57–61]. Also, the number of training images was greatly reduced during 3D training, which may hinder the learning process [57].

The 2D methods used 41 GB of GPU memory and the 3D methods used 192 GB. Because the 2D methods required significantly less GPU memory for training than the 3D methods, they were more time-efficient and cost-effective. Furthermore, the 2D methods can be advantageous in multi-institutional segmentation studies, where the number of axial slices might differ depending on the protocol adopted at each clinic. It would therefore be easier to register and fine-tune segmentation for 2D images than for 3D images. Fine-tuning and transfer learning may be necessary if we seek to use the trained model on a dataset from a new institution with different slice thicknesses or image dimensions. The quantitative difference between multimodal and singlemodal segmentation methods was not significant if we only retained the DWI scan. This means that the network was able to segment lesions while capturing the pattern of the white matter lesions in FLAIR images and T2 shine-through artifacts, and successfully compensated for partial volume errors present in DWI images. In our study, we did not include a network based only on FLAIR images, which have higher resolution; our rationale was based on the enhanced sensitivity of DWI scans.

Our results agree with previous publications relating lesion metrics with NIHSS (R2 = 0.483, at 90 days) [67]. In our study, the lesion metrics significantly correlate with baseline NIHSS (R2 ≥ 0.666, P < 0.0001), and 90 days NIHSS (R2 ≥ 0.546, P < 0.001), with a maximum R2 = 0.875 between the weighted lesion load and NIHSS at 90 days for the quintic regression. Our results support that all baseline lesion metrics predict the 90 days NIHSS well, with all R values greater than 0.8 for the quintic regression.

The quintic regression of 90 days FM-UE on weighted lesion load resulted in AICc = 12.918 and R2 = 0.570, better than the other metrics for which we tested a non-linear relationship. Neither the linear nor the non-linear regression models with baseline FM-UE performed well. These results are in overall agreement with previous studies, which reported larger R2 values between 90 days FM-UE and weighted lesion loads in linear regression (0.64 and 0.71) [6, 52]. However, in our case, lesion load had the highest correlation in linear regression, and weighted lesion load had the highest correlation in monotonically constrained polynomial regression of the 90 days FM-UE, with R2 = 0.570. In previous studies, the weighted lesion load performed better than the lesion load for linear regression because of its advantage of accounting for narrowing of the CST in the brain stem area, by assigning larger weights to slices where there are fewer pixels of CST [6, 7]. In our study, however, weighted lesion load yielded lower R2 values than raw lesion load for 90 days FM-UE in linear regression. A possible explanation is that the weighted lesion load assigns a weight to each slice, but we had limited axial resolution (6 mm) and no subject-specific CST, only a binarized atlas-derived CST. This may have reduced the performance of the weighted lesion load in linear regression. However, our monotonically constrained polynomial regression yielded higher correlation for the weighted lesion load relative to that of lesion load for all clinical metrics.

There are several limitations to our study. First, there were several cases where the network failed to segment correctly. For example, the network did not perform well with brain stem stroke lesions, as shown in the last row of Fig. 7. There were only two patients with brain stem lesions in our dataset. In addition, small lesions appear on only one or two slices. As a result, the network had a very limited training data set from which to learn the patterns of these cases. Second, as shown in Fig. 8, there was a wide distribution of lesion volumes with numerous small lesions. It is generally more challenging to segment smaller lesions than larger ones. This introduced an imbalanced distribution of stroke lesion anatomy in our dataset. There were instances where the lesions were not easily detected by a human observer. Third, the limited spatial resolution for diffusion in clinical scans resulted in much larger slice thicknesses and a smaller number of axial slices. Lastly, our methods were tested in patients with ischemic stroke only; we have not evaluated patients with hemorrhagic stroke.

Our proposed automated pipeline includes ischemic stroke lesion detection and segmentation with multimodality MRI scans obtained from clinical protocols, followed by quantitative evaluations of lesion sizes and loads (through the overlap on the corticospinal tract), and assessment of the ability to predict motor and global impairment and recovery. In addition, this study is a step towards building a larger database that may include MR images, delineations, and quantitative evaluations such as lesion loads, as well as clinical outcomes including NIHSS, FM-UE and other motor outcome metrics. Examining lesion properties from larger studies, including data at the subacute and chronic stages, could lead to more robust results and more personalized models. Recent efforts have led to the creation of large-scale and longitudinal stroke neuroimaging datasets. Mining resources such as ENIGMA Stroke Recovery [68] can lead to better models predicting infarct progression and functional recovery after stroke. Moreover, deep learning models utilizing image features coupled with clinico-demographic information about the patient could yield more accurate results, taking us closer towards personalized medicine. We used a 2D U-Net for simplicity and to minimize the chance of overfitting due to limited data, an approach that has also been shown to be effective in previous studies [25, 56], but other types of architectures might be effective too. In the future, we hope to explore and optimize other types of architectures. It is possible to add regional segmentation of lesions, such as the core and penumbra regions, to perform quantitative evaluations and see whether these specific lesion regions can help better predict patient outcomes. Adding longitudinal scans and subject-specific information can also help improve the prediction of lesion progression and functional outcomes.

5. Conclusion

A fast and reliable segmentation of acute ischemic stroke lesions can aid with outcome prediction and clinical decision-making. A pipeline that can quickly segment stroke lesions from clinical MRI scans and provide various quantitative evaluations will greatly help clinicians improve acute stroke care and better triage neurorehabilitation. In this study, we focused on building such a framework that includes pre-processing, segmentation, and outcome prediction. We tested 2D and 3D, singlemodal and multimodal U-Nets. 2D multimodal U-Net achieved the best performance with mean Dice, median Dice, and Dice range of 0.737 (95% CI: 0.705, 0.769), 0.783, and 0.297–0.932. Thus, 2D multimodal was chosen to calculate lesion metrics (lesion size, lesion load and weighted lesion load) and these metrics were compared with clinical metrics. The weighted CST lesion load correlated significantly with baseline NIHSS, 90 days NIHSS, and 90 days FM-UE using the proposed constrained quintic polynomial regression model. In the future, we aim to refine and validate our method in an independent cohort with a larger sample size and well represented lesion location and size, as well as try other, more complex deep learning networks. Also, we can foresee further benefits from longitudinal studies. This study is a stepping stone towards more personalized treatments.

Supplementary Material

Supplementary Table 1
Supplementary Table 2

Acknowledgment

This work was supported by the National Institutes of Health through RF1 AG057895, R01 AG066184, U24 CA220245, RF1 AG070149 and by the American Heart Association Scientist Development Grant 14SDG1829003.

Footnotes

Declarations of interest: none

6. Citations

  • 1. Zivin JA and Choi DW, Stroke therapy. Scientific American, 1991. 265(1): p. 56–63.
  • 2. Towfighi A and Saver JL, Stroke declines from third to fourth leading cause of death in the United States: Historical perspective and challenges ahead. Stroke, 2011. 42(8): p. 2351–2355.
  • 3. Johnson W, et al., Stroke: a global response is needed. Bulletin of the World Health Organization, 2016. 94(9): p. 634–634.
  • 4. Hankey GJ, et al., Long-term disability after first-ever stroke and related prognostic factors in the Perth Community Stroke Study, 1989–1990. Stroke, 2002. 33(4): p. 1034–1040.
  • 5. Hara Y, Brain Plasticity and Rehabilitation in Stroke Patients. Journal of Nippon Medical School, 2015. 82(1): p. 4–13.
  • 6. Feng W, et al., Corticospinal tract lesion load: An imaging biomarker for stroke motor outcomes. Annals of Neurology, 2015. 78(6): p. 860–870.
  • 7. Zhu LL, et al., Lesion load of the corticospinal tract predicts motor impairment in chronic stroke. Stroke, 2010. 41(5): p. 910–915.
  • 8. Bang OY, Multimodal MRI for Ischemic Stroke: From Acute Therapy to Preventive Strategies. Journal of Clinical Neurology, 2009. 5(3): p. 107–119.
  • 9. Aoki J, et al., FLAIR can estimate the onset time in acute ischemic stroke patients. Journal of the Neurological Sciences, 2010. 293(1–2): p. 39–44.
  • 10. Leiva-Salinas C and Wintermark M, Imaging of Acute Ischemic Stroke. Neuroimaging Clinics, 2010. 20(4): p. 455–468.
  • 11. Fiebach JB, et al., CT and diffusion-weighted MR imaging in randomized order: Diffusion-weighted imaging results in higher accuracy and lower interrater variability in the diagnosis of hyperacute ischemic stroke. Stroke, 2002. 33(9): p. 2206–2210.
  • 12. Saur D, et al., Sensitivity and Interrater Agreement of CT and Diffusion-Weighted MR Imaging in Hyperacute Stroke. AJNR: American Journal of Neuroradiology, 2003. 24(5): p. 878–878.
  • 13. Choi DS, Bright Intracranial Lesions on Diffusion-weighted Images: A Pictorial Review. Journal of the Korean Radiological Society, 2006. 55(1): p. 21–32.
  • 14. Schuleri KH, et al., Characterization of Peri-Infarct Zone Heterogeneity by Contrast-Enhanced Multidetector Computed Tomography. A Comparison With Magnetic Resonance Imaging. Journal of the American College of Cardiology, 2009. 53(18): p. 1699–1707.
  • 15. Ospel JM, et al., Spatial Resolution and the Magnitude of Infarct Volume Measurement Error in DWI in Acute Ischemic Stroke. American Journal of Neuroradiology, 2020. 41(5): p. 792–797.
  • 16. Burdette JH, Elster AD, and Ricci PE, Acute cerebral infarction: Quantification of spin-density and T2 shine-through phenomena on diffusion-weighted MR images. Radiology, 1999. 212(2): p. 333–339.
  • 17. Lambin P, et al., Radiomics: Extracting more information from medical images using advanced feature analysis. European Journal of Cancer, 2012. 48(4): p. 441–446.
  • 18. Heye T, et al., Reproducibility of dynamic contrast-enhanced MR imaging part II. Comparison of intra- and interobserver variability with manual region of interest placement versus semiautomatic lesion segmentation and histogram analysis. Radiology, 2013. 266(3): p. 812–821.
  • 19. Qiu W, et al., Machine Learning for Detecting Early Infarction in Acute Stroke with Non–Contrast-enhanced CT. Radiology, 2020. 294(2): p. 638–644.
  • 20. Cho J, et al., Improving Sensitivity on Identification and Delineation of Intracranial Hemorrhage Lesion Using Cascaded Deep Learning Models. Journal of Digital Imaging, 2019. 32(3): p. 450–461.
  • 21. Kwon Y, et al., Uncertainty quantification using Bayesian neural networks in classification: Application to biomedical image segmentation. Computational Statistics & Data Analysis, 2020. 142: p. 106816–106816.
  • 22. Subudhi A, Dash M, and Sabut S, Automated segmentation and classification of brain stroke using expectation-maximization and random forest classifier. Biocybernetics and Biomedical Engineering, 2020. 40(1): p. 277–289.
  • 23. Liu L, et al., A survey on U-shaped networks in medical image segmentations. Neurocomputing, 2020. 409: p. 244–258.
  • 24. Qi K, et al., X-Net: Brain Stroke Lesion Segmentation Based on Depthwise Separable Convolution and Long-Range Dependencies. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2019. 11766 LNCS: p. 247–255.
  • 25. Zhou Y, et al., D-UNet: A Dimension-Fusion U Shape Network for Chronic Stroke Lesion Segmentation. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2021. 18(3): p. 940–950.
  • 26. Inamdar MA, et al., A Review on Computer Aided Diagnosis of Acute Brain Stroke. Sensors, 2021. 21(24): p. 8507–8507.
  • 27. Kumar A, et al., CSNet: A new DeepNet framework for ischemic stroke lesion segmentation. Computer Methods and Programs in Biomedicine, 2020. 193: p. 105524–105524.
  • 28. Ronneberger O, Fischer P, and Brox T, U-Net: Convolutional Networks for Biomedical Image Segmentation. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2015. 9351: p. 234–241.
  • 29. Baldeon-Calisto M and Lai-Yuen SK, AdaResU-Net: Multiobjective adaptive convolutional neural network for medical image segmentation. Neurocomputing, 2020. 392: p. 325–340.
  • 30. Sekou TB, et al., From Patch to Image Segmentation using Fully Convolutional Networks -- Application to Retinal Images. arXiv preprint, 2019.
  • 31. Liu C-F, et al., Deep learning-based detection and segmentation of diffusion abnormalities in acute ischemic stroke. Communications Medicine, 2021. 1(1): p. 1–18.
  • 32. Zhu H, et al., An automatic machine learning approach for ischemic stroke onset time identification based on DWI and FLAIR imaging. NeuroImage: Clinical, 2021. 31: p. 102744–102744.
  • 33. Avants BB, Tustison N, and Johnson H, Advanced Normalization Tools (ANTS) Release 2.x. 2014.
  • 34. Landman BA, et al., Multi-Parametric Neuroimaging Reproducibility: A 3T Resource Study. NeuroImage, 2011. 54(4): p. 2854–2854.
  • 35. Landman BA, et al., Multi-parametric neuroimaging reproducibility: A 3-T resource study. NeuroImage, 2011. 54(4): p. 2854–2866.
  • 36. Avants BB, et al., Symmetric diffeomorphic image registration with cross-correlation: Evaluating automated labeling of elderly and neurodegenerative brain. Medical Image Analysis, 2008. 12(1): p. 26–41.
  • 37. Yushkevich PA, et al., User-Guided Segmentation of Multi-modality Medical Imaging Datasets with ITK-SNAP. Neuroinformatics, 2019. 17(1): p. 83–102.
  • 38. Huttenlocher DP, Klanderman GA, and Rucklidge WJ, Comparing Images Using the Hausdorff Distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1993. 15(9): p. 850–863.
  • 39. Zhang S and Arfanakis K, Evaluation of standardized and study-specific diffusion tensor imaging templates of the adult human brain: Template characteristics, spatial normalization accuracy, and detection of small inter-group FA differences. NeuroImage, 2018. 172: p. 40–50.
  • 40. Lyden P, et al., Improved reliability of the NIH Stroke Scale using video training. NINDS TPA Stroke Study Group. Stroke, 1994. 25(11): p. 2220–6.
  • 41. Lyden P, et al., National Institutes of Health Stroke Scale certification is reliable across multiple venues. Stroke, 2009. 40(7): p. 2507–11.
  • 42. Goldstein LB, Bertels C, and Davis JN, Interrater reliability of the NIH stroke scale. Arch Neurol, 1989. 46(6): p. 660–2.
  • 43. Fugl-Meyer AR, et al., A method for evaluation of physical performance. Scand J Rehabil Med, 1975. 7(1): p. 13–31.
  • 44. Duncan PW, Propst M, and Nelson SG, Reliability of the Fugl-Meyer assessment of sensorimotor recovery following cerebrovascular accident. Phys Ther, 1983. 63(10): p. 1606–10.
  • 45. Sanford J, et al., Reliability of the Fugl-Meyer assessment for testing motor performance in patients following stroke. Phys Ther, 1993. 73(7): p. 447–54.
  • 46. Gladstone DJ, Danells CJ, and Black SE, The fugl-meyer assessment of motor recovery after stroke: a critical review of its measurement properties. Neurorehabil Neural Repair, 2002. 16(3): p. 232–40.
  • 47. Hsieh YW, et al., Responsiveness and validity of three outcome measures of motor function after stroke rehabilitation. Stroke, 2009. 40(4): p. 1386–1391.
  • 48. Leira EC, et al., The NIHSS Supplementary Motor Scale: A Valid Tool for Multidisciplinary Recovery Trials. Cerebrovasc Dis, 2013. 36: p. 69–73.
  • 49. Chhatbar P, et al., Abstract TP152: Correlation of NIH Stroke Scale and Fugl-Meyer Motor Scales in a Longitudinal Stroke Recovery Study: Implication for Feasibility Survey for Stroke Rehabilitation Trial. Stroke, 2017. 48(suppl_1).
  • 50. Tukey JW, Exploratory data analysis. Vol. 2. 1977.
  • 51. Nguyen TT and Vu TD, Identification of multivariate geochemical anomalies using spatial autocorrelation analysis and robust statistics. Ore Geology Reviews, 2019. 111: p. 102985–102985.
  • 52. Doughty C, et al., Detection and Predictive Value of Fractional Anisotropy Changes of the Corticospinal Tract in the Acute Phase of a Stroke. Stroke, 2016. 47(6): p. 1520–1526.
  • 53. Senesh MR and Reinkensmeyer DJ, Breaking Proportional Recovery After Stroke. Neurorehabilitation and Neural Repair, 2019. 33(11): p. 888–901.
  • 54. Lin YL, et al., Stratifying chronic stroke patients based on the influence of contralesional motor cortices: An inter-hemispheric inhibition study. Clinical Neurophysiology, 2020. 131(10): p. 2516–2525.
  • 55. Hurvich CM and Tsai CLL, A CORRECTED AKAIKE INFORMATION CRITERION FOR VECTOR AUTOREGRESSIVE MODEL SELECTION. Journal of Time Series Analysis, 1993. 14(3): p. 271–279.
  • 56. Li X, et al., H-DenseUNet: Hybrid Densely Connected UNet for Liver and Tumor Segmentation from CT Volumes. IEEE Transactions on Medical Imaging, 2018. 37(12): p. 2663–2674.
  • 57. Baumgartner CF, et al., An Exploration of 2D and 3D Deep Learning Techniques for Cardiac MR Image Segmentation. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2017. 10663 LNCS: p. 111–119.
  • 58. Isensee F, et al., Automatic Cardiac Disease Assessment on cine-MRI via Time-Series Segmentation and Domain Specific Features. 2017. 10663.
  • 59. Isensee F, et al., nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation. Informatik aktuell, 2018: p. 22–22.
  • 60. Jang Y, et al., Automatic segmentation of LV and RV in cardiac MRI. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2018. 10663 LNCS: p. 161–169.
  • 61. De Feo R, et al., Automated joint skull-stripping and segmentation with Multi-Task U-Net in large mouse brain MRI databases. NeuroImage, 2021. 229: p. 117734–117734.
  • 62. Quinlan EB, et al., Neural function, injury, and stroke subtype predict treatment gains after stroke. Annals of Neurology, 2015. 77(1): p. 132–145.
  • 63. Milot MH and Cramer SC, Biomarkers of Recovery after Stroke. Current opinion in neurology, 2008. 21(6): p. 654–654.
  • 64. Sundaresan V, et al., Triplanar ensemble U-Net model for white matter hyperintensities segmentation on MR images. Medical Image Analysis, 2021. 73: p. 102184–102184.
  • 65. Dolz J, et al., HyperDense-Net: A Hyper-Densely Connected CNN for Multi-Modal Image Segmentation. IEEE Transactions on Medical Imaging, 2019. 38(5): p. 1116–1126.
  • 66. Zhang X, et al., Segmentation quality evaluation using region-based precision and recall measures for remote sensing images. ISPRS Journal of Photogrammetry and Remote Sensing, 2015. 102: p. 73–84.
  • 67. Habegger S, et al., Relating acute lesion loads to chronic outcome in ischemic stroke-an exploratory comparison of mismatch patterns and predictive modeling. Frontiers in Neurology, 2018. 9(SEP): p. 737–737.
  • 68. Liew SL, et al., The ENIGMA Stroke Recovery Working Group: Big data neuroimaging to study brain–behavior relationships after stroke. Human Brain Mapping, 2022. 43(1): p. 129–148.
