Radiology: Artificial Intelligence
. 2021 Jan 6;3(2):e200130. doi: 10.1148/ryai.2021200130

Automated Segmentation of Abdominal Skeletal Muscle on Pediatric CT Scans Using Deep Learning

James Castiglione 1, Elanchezhian Somasundaram 1,, Leah A Gilligan 1, Andrew T Trout 1, Samuel Brady 1
PMCID: PMC8043356  PMID: 33937859

Abstract

Purpose

To automate skeletal muscle segmentation in a pediatric population using convolutional neural networks that identify and segment the L3 level at CT.

Materials and Methods

In this retrospective study, two sets of U-Net–based models were developed to identify the L3 level in the sagittal plane and segment the skeletal muscle from the corresponding axial image. For model development, 370 patients (sampled uniformly across age groups from 0 to 18 years and including both sexes) were selected between January 2009 and January 2019, and ground truth L3 location and skeletal muscle segmentation were manually defined. Twenty percent (74 of 370) of the examinations were reserved for testing the L3 locator and muscle segmentation, with the remainder used for training. For the L3 locator models, maximum intensity projections (MIPs) from a fixed number of central sections of sagittal reformats (either 12 or 18 sections) were used as input, with or without transfer learning using an L3 localizer trained on an external dataset (four models total). For the skeletal muscle segmentation models, two loss functions (weighted Dice similarity coefficient [DSC] and binary cross-entropy) were used on models trained with or without data augmentation (four models total). Outputs from each model were compared with ground truth, and the mean relative error and DSC from each of the models were compared with one another.

Results

L3 section detection trained with an 18-section MIP model with transfer learning had a mean error of 3.23 mm ± 2.61 standard deviation, which was within the reconstructed image thickness (3 or 5 mm). Skeletal muscle segmentation trained with the weighted DSC loss model without data augmentation had a mean DSC of 0.93 ± 0.03 and mean relative error of 0.04 ± 0.04.

Conclusion

Convolutional neural network models accurately identified the L3 level and segmented the skeletal muscle on pediatric CT scans.

Supplemental material is available for this article.

See also the commentary by Cadrin-Chênevert in this issue.

© RSNA, 2021


Summary

Automated algorithms were developed using deep learning to identify the L3 level and segment the abdominal muscle from the corresponding axial CT image in a pediatric population.

Key Points

  • An automated deep learning algorithm correctly identified the L3 level in the sagittal plane to within 3.23 mm ± 2.61 on CT images from pediatric patients who were 0–18 years old.

  • A second automated deep learning algorithm segmented abdominal muscle in a pediatric population (0–18 years old) with a Dice coefficient of 0.93 ± 0.03.

  • The deep learning pipeline for skeletal muscle segmentation of a pediatric population had a mean relative error of 7% when compared with the manually segmented area.

Introduction

Sarcopenia, characterized by the loss of skeletal muscle mass, has been linked to outcomes for various disease states and is a marker of overall health (1,2). A commonly used method of skeletal muscle mass quantification is to calculate the skeletal muscle mass index (SMI), in which the cross-sectional area of the skeletal muscle at the L3 vertebral level is normalized to patient height squared (SMI = skeletal muscle area/height2) (3). The muscles included in this measurement are the psoas, paraspinal (erector spinae and quadratus lumborum), and abdominal wall muscles (external and internal abdominal obliques, transversus abdominis, and rectus abdominis) (example shown in Figure E1 [supplement]).
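The SMI formula above is a simple ratio; as an illustration only (the function name and example values are hypothetical, not from the study), it can be written as:

```python
def skeletal_muscle_index(muscle_area_cm2: float, height_m: float) -> float:
    """SMI = cross-sectional skeletal muscle area at L3 / height squared,
    reported in cm^2/m^2."""
    return muscle_area_cm2 / height_m ** 2

# Example: 70 cm^2 of L3 muscle area (the dataset mean) in a 1.40-m-tall child.
smi = skeletal_muscle_index(70.0, 1.40)  # 70 / 1.96 = ~35.7 cm^2/m^2
```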

Consensus normal ranges and cutoff values of SMI for the diagnosis of sarcopenia have been reported primarily for adult populations (1,2,4–7). For children, only a few studies have reported the diagnosis of sarcopenia using measurements of skeletal muscle mass, and the reference ranges used were derived from adult measurements without age-specific cutoff values (8,9). Availability of reference ranges from only adults is problematic, as skeletal muscle mass increases throughout childhood and adolescence and then decreases after age 30 (10,11). To accurately diagnose and study sarcopenia in children, reference ranges for muscle mass in children are needed (12).

Establishing reference ranges for SMI in the pediatric population will require measurement of SMI in a large number of healthy children. This is a tall order as, to date, segmentation of the skeletal muscle on cross-sectional imaging for calculation of the SMI has either been achieved manually, with the user tracing the outline of muscles, or in semiautomated fashion with manual adjustment of an initial segmentation on the basis of image thresholding (13,14).

As a first step toward addressing the need for skeletal muscle mass reference ranges in the pediatric population, the purpose of this study was to automate skeletal muscle segmentation in children using a convolutional neural network that identifies the L3 at routine CT and segments the abdominal wall muscle at that level. Similar studies focused on adult populations have been published recently (15,16), and in this study, the deep learning model for identifying the L3 level is derived from that previous work.

Materials and Methods

Study Design

This retrospective study was performed under institutional review board approval permitting review and use of existing clinical records without requiring patient consent. All study activities were performed in a Health Insurance Portability and Accountability Act–compliant manner. To achieve the aim of developing a fully automated pipeline to calculate skeletal muscle area at the L3 level, this study sought to: (a) develop ground truth data for L3 section detection and skeletal muscle segmentation for a pediatric population; (b) improve an L3 detection model, developed in an adult population (16) for pediatric patients (including training an in-house–developed model and fine-tuning a pretrained model); and (c) optimize a U-Net convolutional neural network architecture to achieve skeletal muscle segmentation at the L3 level in a pediatric dataset using two different loss functions and data-augmentation techniques.

A pediatric image dataset was developed by querying the Cincinnati Children’s Hospital Medical Center clinical picture archiving and communication system using Illuminate (Softek Illuminate 2020) for patients aged 0 to 18 years, inclusive, who underwent an abdominopelvic CT examination with intravenous contrast material as part of an emergency department encounter between January 1, 2009, and January 3, 2019. This query returned 10 268 examinations. For each year of age between 0 and 18, image sets for 10 unique boys and 10 unique girls (ie, 10 boys and 10 girls from 19 different age groups) were selected from the most recent examinations, resulting in a dataset of 380 images.

CT Image Acquisition

The CT examinations used in this study were performed on one of four possible CT scanners: an Aquilion 64 (2006–2017), Aquilion 64 (2008–2020), Aquilion One (2009–present), or Aquilion One Genesis (2017–present); all Aquilion scanners were manufactured by Toshiba (Canon Medical Systems). Patient protocols were optimized by weight: patients weighing less than 15 kg were imaged at 80 kV with images reconstructed at an axial thickness of 3 mm, patients weighing 15–70 kg were imaged at 100 kV with 5-mm axial images, and patients heavier than 70 kg were imaged at 120 kV with 5-mm axial images. All sagittal reformats were reconstructed from 0.5-mm raw data into 3-mm-thick images. The oral contrast medium was diluted iohexol (Omnipaque 300; GE Healthcare); the diluent was a clear, noncarbonated liquid. Oral contrast material dosage was based on patient weight, ranging from 2 mL (patient weight < 4.5 kg) to 18 mL (patient weight > 70 kg), at a diluent-to-Omnipaque 300 ratio of 45:1 (eg, for patients weighing 0–4.4 kg, 90 mL of diluent was mixed with 2 mL of Omnipaque 300), to be delivered within 1 hour. The intravenous contrast medium was ioversol (Optiray 320; Guerbet), dosed at 1.5 mL per kilogram of body weight, with a maximum dose of 100 mL.

Dataset Development

On the basis of the sagittal reformatted images from the selected patient examinations, a postgraduate year 1 research physician (L.A.G.) and a 2nd-year medical student (J.C.) identified and segmented the axial image closest to the midbody of the third lumbar vertebra above the sacrum (L3) using Seg3D (National Institutes of Health Center for Integrative Biomedical Computing, sci.utah.edu/cibc-software/seg3d.html) (17). The longitudinal distance from the most cephalad extent of the full CT volume to the identified image at the L3 level was then measured (in millimeters) and served as the ground truth for the L3 level localizer model. Manual segmentation of the skeletal muscle area in the identified axial image was performed by the same two reviewers (L.A.G., J.C.) and supervised by a board-certified pediatric radiologist (A.T.T.), and the segmentation mask (binary image) served as ground truth for the muscle segmentation model. Of the 380 pediatric patients selected for inclusion, 10 patients were excluded from the final dataset due to poor image quality for reasons such as severe scoliosis, dysraphism, and incomplete scans with missing L3 sections that were identified only during manual annotation. The final internal dataset used for development of both models comprised 370 patients with a mean age of 9 years ± 6 and a male:female ratio of 0.97 (182 male and 188 female). CT images for two of the 370 patients had been acquired prone, with the remainder acquired supine. The skeletal muscle area for the final dataset varied between 14 cm2 and 216 cm2, with a mean area of 70 cm2 ± 39. Figure 1 shows the flowchart of the data preparation for training and validation of the entire pipeline.

Figure 1:

Flowchart for the data sampling scheme used to develop the L3 section localization and skeletal muscle segmentation models. AI = artificial intelligence, PACS = picture archiving and communication system.


Training and Test Datasets for L3 Localization

A subset of the pediatric dataset was withheld for independent testing after model training and validation. The independent test set consisted of image sets from 20% (74 of 370) of patients, with a mean age of 9 years ± 6 and a male:female ratio of 0.85 (34 male, 40 female). An additional external dataset, which was used for model development in a previous work (16), was used for transfer learning of the L3 localizer model and consisted of 1006 publicly available patient examinations that were obtained from multiple different sources, including a liver tumor dataset from the Liver Tumor Segmentation challenge (18), three sets from the Cancer Imaging Archive (19), and a dataset of ovarian cancer from Hammersmith Hospital, London. The mean age of the combined dataset is unknown, but visual inspection revealed a majority to be adult patients.

Data Augmentation for Muscle Segmentation Models

Data augmentation using image transformations was used to improve the convolutional neural network model performance for muscle segmentation by making the model robust against variations in the input image due to field of view and patient position differences (20). In this study, models were trained with and without data augmentation. Augmented images were generated using image operations such as random flip, zoom (up to 50%), rotation (up to 35%), shear (up to 15%), and shift (horizontally or vertically up to 30%) on the original training set, resulting in 2960 training images (10 augmented samples for each original image).
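The augmentation scheme described above can be sketched as follows. This is a minimal NumPy-only illustration covering the flip and shift operations with the ranges stated in the text; zoom, rotation, and shear would be added the same way with an image-processing library. The function name and the use of `np.roll` (which wraps around rather than padding) are simplifications, not the study's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image: np.ndarray, mask: np.ndarray):
    """Apply one random geometric transform pair, keeping the CT image and
    its segmentation mask in register (same flip, same shift)."""
    if rng.random() < 0.5:                      # random horizontal flip
        image, mask = np.fliplr(image), np.fliplr(mask)
    max_shift = int(0.30 * image.shape[0])      # shift up to 30% of the size
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # np.roll wraps pixels around; a real pipeline would pad with background
    image = np.roll(image, (dy, dx), axis=(0, 1))
    mask = np.roll(mask, (dy, dx), axis=(0, 1))
    return image, mask

# Ten augmented samples per original image, as in the study.
original = (np.zeros((256, 256)), np.zeros((256, 256)))
augmented = [augment(*original) for _ in range(10)]
```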

Image Preprocessing

To increase the contrast of the spine for the L3 section localizer, two sets of maximum intensity projection (MIP) images were created; the first set of MIP images was derived from the central 12 images of the sagittal CT scan volume (± six images of midvolume), and the second was derived from the central 18 images (± nine images of midvolume). The MIP images were resized to 256 × 384 pixels, and the attenuation values (in Hounsfield units) were normalized to values between 0 and 1.
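The MIP computation described above reduces to a maximum over the central sagittal sections followed by intensity rescaling. The sketch below assumes a volume laid out as (sections, height, width) and a clipping window of −1000 to 1000 HU; the window bounds are illustrative assumptions (the text states only that values were normalized to 0–1), and the resizing to 256 × 384 pixels is omitted.

```python
import numpy as np

def central_mip(sagittal_volume: np.ndarray, n_sections: int) -> np.ndarray:
    """Maximum intensity projection over the central n sagittal sections
    (n_sections = 12 or 18 in the study)."""
    mid = sagittal_volume.shape[0] // 2
    half = n_sections // 2
    return sagittal_volume[mid - half : mid + half].max(axis=0)

def normalize_hu(mip: np.ndarray, lo: float = -1000.0, hi: float = 1000.0) -> np.ndarray:
    """Clip attenuation to a HU window and rescale to [0, 1]."""
    return (np.clip(mip, lo, hi) - lo) / (hi - lo)

# Toy sagittal volume standing in for a reformatted CT scan.
vol = np.random.default_rng(1).normal(0, 300, size=(40, 384, 256))
mip18 = normalize_hu(central_mip(vol, 18))   # values now in [0, 1]
```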

The CT scan table was removed from the axial images prior to segmentation of skeletal muscle (21), and the images were normalized using z score normalization (22). The axial CT images and the manual segmentation masks were resized to 256 × 256 resolution.
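The z score normalization applied to the axial images is the standard zero-mean, unit-variance rescaling:

```python
import numpy as np

def zscore(image: np.ndarray) -> np.ndarray:
    """Z score normalization: subtract the image mean, divide by its
    standard deviation, so the result has mean 0 and std 1."""
    return (image - image.mean()) / image.std()

# Toy axial CT image (table removal and resizing to 256 x 256 omitted here).
axial = np.random.default_rng(2).normal(40, 120, size=(512, 512))
norm = zscore(axial)
```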

Model Training and Validation

L3 localizer. For the L3 localizer, the same U-Net (23) convolutional neural network architecture that was developed for an adult population (16) was used. Four L3 localizer models were trained. Two models were trained using either the 12-section or 18-section MIP images calculated from the pediatric dataset (models termed 12sM and 18sM, respectively). Two additional models applied transfer learning to both the 12sM and 18sM models by starting with initial weights from a model trained using an external dataset (n = 1006) with 20-section MIPs (models called 12sM_TL and 18sM_TL, respectively). The Adam optimizer (24), with the learning rate set to 0.001, was chosen for the L3 localizer models. The models were trained until there was no improvement in validation loss for 20 consecutive epochs, and the model with the lowest validation loss was employed. Convergence of training and validation loss occurred within 30 epochs in all cases.
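The stopping rule used for training (stop after 20 epochs without validation-loss improvement, keep the best epoch) can be sketched framework-agnostically; the class below is an illustrative stand-in, not the Keras callback the study presumably used.

```python
class EarlyStopping:
    """Track validation loss; stop when it has not improved for `patience`
    consecutive epochs, and remember the best (lowest-loss) epoch."""

    def __init__(self, patience: int = 20):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.wait = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best_loss:
            self.best_loss, self.best_epoch, self.wait = val_loss, epoch, 0
        else:
            self.wait += 1
        return self.wait >= self.patience

# Toy loss curve: improves for three epochs, then plateaus.
stopper = EarlyStopping(patience=20)
losses = [1.0, 0.8, 0.7] + [0.75] * 25
for epoch, loss in enumerate(losses):
    if stopper.step(epoch, loss):
        break
# Training stops 20 epochs after the last improvement; the epoch-2 weights
# (lowest validation loss, 0.7) would be the ones kept.
```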

Skeletal muscle segmentation. A second convolutional neural network with the same U-Net architecture (Fig 2) was selected for the skeletal muscle segmentation model due to its previous success in medical image segmentation tasks (15,23). Two different loss functions, binary cross-entropy and a weighted Dice similarity coefficient (DSC) score (25), were evaluated. For each loss function, two iterations of the model, one without data augmentation and one with data augmentation, were trained. A total of four models were tested: model 1, binary cross-entropy without augmentation; model 2, binary cross-entropy with augmentation; model 3, weighted Dice score without augmentation; and model 4, weighted Dice score with augmentation. The Adam optimizer (24), with the learning rate set to 0.0001, was chosen. The models were trained until there was no improvement in validation loss for 20 consecutive epochs, and the model with the lowest validation loss was employed. The training and validation loss converged by 30 epochs in all cases.
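A Dice-based loss is minimized as the predicted and ground truth masks overlap more. The study's exact weighting scheme is not specified here, so the sketch below shows the standard unweighted soft Dice form in NumPy (a training framework would use tensor operations instead):

```python
import numpy as np

def dice_coefficient(y_true: np.ndarray, y_pred: np.ndarray, smooth: float = 1.0) -> float:
    """Soft Dice similarity coefficient; `smooth` avoids division by zero
    when both masks are empty."""
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)

def dice_loss(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Loss = 1 - DSC, so better overlap gives a lower loss."""
    return 1.0 - dice_coefficient(y_true, y_pred)

# Toy masks: the prediction misses a thin strip of the true muscle region.
mask = np.zeros((256, 256)); mask[100:150, 100:150] = 1
pred = np.zeros((256, 256)); pred[110:150, 100:150] = 1
loss = dice_loss(mask, pred)   # small, since the masks mostly overlap
```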

Figure 2:

Illustration of the two-dimensional (2D) U-Net used for skeletal muscle segmentation. Each contracting block on the left consisted of a 2D convolutional (Conv2D) layer followed by batch normalization, rectified linear unit activation, 2 × 2 max pool, and 0.5 dropout layers. The blocks on the expansion side are convolutional 2D transpose layers, followed by concatenation and 0.5 dropout layers.


Five iterations of the models were created from fivefold cross-validation during training; for each fold, 80% of the training data were used for training while the remaining 20% were used to calculate the validation loss and test for convergence. For final validation of the models, each of the five iterations of the model were used to generate predictions on the independent test dataset, and mean predictions of the models across all five iterations are reported as the final result.
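The cross-validation and ensembling scheme above amounts to five disjoint validation folds and a mean over the five resulting models' predictions. The sketch below uses constant-output callables as stand-ins for trained U-Nets; the function names are illustrative.

```python
import numpy as np

def five_fold_indices(n_samples: int, n_folds: int = 5, seed: int = 0):
    """Shuffle sample indices and split them into n_folds disjoint
    validation folds (each fold holds ~20% of the training data)."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    return np.array_split(idx, n_folds)

def ensemble_predict(models, x: np.ndarray) -> np.ndarray:
    """Mean prediction across the five cross-validation model iterations,
    as reported for the final result."""
    return np.mean([m(x) for m in models], axis=0)

folds = five_fold_indices(296)   # 80% of 370 patients = 296 training cases

# Toy "models": constant predictors standing in for the five trained U-Nets.
models = [lambda x, b=b: np.full_like(x, b, dtype=float)
          for b in (0.1, 0.2, 0.3, 0.4, 0.5)]
pred = ensemble_predict(models, np.zeros((4, 4)))   # every entry is the mean, 0.3
```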

Statistical Analysis

For the L3 localizer models, the mean absolute error in the L3 location relative to ground truth (in millimeters) was used to assess the model’s performance. For the skeletal muscle segmentation models, the mean DSC and the mean absolute relative error (|predicted area − ground truth area| / ground truth area) in skeletal muscle area on the manually determined L3 section were used for assessment. Two-sided t tests (α = .05) were used to compare the models against each other. In addition, the number of samples that had greater than 10-mm error for the L3 localizer and the number of samples with greater than 0.10 relative error for the muscle segmentation were reported. All models were implemented using Keras (26) with TensorFlow backend (27), and Microsoft Excel (2013) was used for statistical analysis. To quantify the combined performance of the entire pipeline, the best models were selected for each task, and the resulting error in skeletal muscle area is reported.
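The two segmentation metrics defined above can be computed directly from binary masks:

```python
import numpy as np

def dsc(gt: np.ndarray, pred: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    inter = np.logical_and(gt, pred).sum()
    return 2.0 * inter / (gt.sum() + pred.sum())

def relative_area_error(gt: np.ndarray, pred: np.ndarray) -> float:
    """Absolute relative error in segmented area: |A_pred - A_gt| / A_gt."""
    return abs(pred.sum() - gt.sum()) / gt.sum()

# Toy masks: the prediction overshoots the true region by a 2-pixel strip.
gt = np.zeros((64, 64), bool); gt[20:40, 20:40] = True   # 400 pixels
pr = np.zeros((64, 64), bool); pr[20:40, 20:42] = True   # 440 pixels
# relative_area_error(gt, pr) = 40/400 = 0.10, while dsc(gt, pr) ~ 0.95:
# the area error flags an overshoot that the DSC barely registers.
```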

Results

L3 Localizer Performance

Results for the performance of the L3 localizer model versus ground truth for the test dataset of 74 pediatric CT image sets are reported in Table 1. Use of transfer learning for the model trained on 18-section MIP images resulted in a lower mean error than for the model trained on pediatric data only (3.23 ± 2.61 [18sM_TL] vs 4.31 ± 3.33 [18sM]; P = .03); there was no significant difference for the models trained on 12-section MIP images (3.39 ± 2.75 [12sM_TL] vs 4.04 ± 3.06 [12sM]; P = .18). Note that the median and mean error for all models were within approximately one axial image distance (3–5 mm) of ground truth.

Table 1:

L3 Level Localization from Four Configurations of L3 Localizer Models


Overall, the 18sM_TL model achieved the lowest mean error of all of the L3 localizer models. Figure 3 shows examples for which the L3 localizer models diverged from, and agreed with, ground truth. For a patient with sclerotic (high-attenuation) cancer metastases to the spine and compression fractures of some of the vertebrae (Fig 3, A), the 12sM model had a mean error of 21.8 mm versus ground truth, while the 18sM_TL model had a mean error of 0.6 mm. In Figure 3, B, a patient’s spine shows relatively lower attenuation than normal on the 12-section MIP image due to off-center patient positioning along the lateral dimension of the scanner, resulting in erroneous localization by the 12sM model with an average error of 14.6 mm. Using the 18sM_TL model mitigated this effect, such that the model agreed to within 1.4 mm of ground truth. For a neonatal patient whose vertebral body (left, rectangular part of the vertebra) is small relative to the spinal canal and spinous processes (right, pointed part) (Fig 3, C), the 12sM model had a mean error of 19.6 mm across the five folds, but for the 18sM_TL model, the mean error improved to 11.6 mm. Another observation from visual inspection of the test data was that poor performance of the L3 section localizer occurred in patients whose sacrum was only partially visible, or not visible, on the MIP images.

Figure 3:

Three cases of inaccurate prediction of the L3 level with the 12-section maximum intensity projection (MIP) model without transfer learning (12sM, top row) compared with the predictions from the 18-section U-Net model with transfer learning (18sM_TL, bottom row). The green lines are the ground truth, and the red lines are the models’ predictions. Where the lines overlap, the line appears yellow. A (left column), Image in a boy aged 4 years 11 months (12sM error, 21.8 mm; 18sM_TL error, 0.6 mm). B (middle column), Image in a boy aged 7 years 8 months (12sM error, 14.6 mm; 18sM_TL error, 1.4 mm). C (right column), Image in a 26-day-old male neonate (12sM error, 19.6 mm; 18sM_TL error, 11.6 mm).


Abdominal Skeletal Muscle Segmentation Performance

Table 2 shows a performance summary of the skeletal muscle segmentation models on the independent pediatric test dataset. For all four models (binary cross-entropy and weighted DSC score loss functions, with and without data augmentation), there was no measured difference in mean DSC value or its deviation. Overall, the mean relative error in segmented muscle area was lower for the models trained without data augmentation (weighted DSC, 0.04 ± 0.04 [original] vs 0.09 ± 0.04 [augmented], P < .001; binary cross-entropy, 0.05 ± 0.04 [original] vs 0.08 ± 0.04 [augmented], P < .001). The mean relative errors for the DSC-weighted and binary cross-entropy models with data augmentation did not differ from one another (P = .45). The number of test samples with greater than 0.10 relative error in muscle area was small for the models without augmentation (four cases) compared with the models with augmentation (> 20 cases).

Table 2:

DSC Values and Relative Error in Muscle Area for Different Muscle Segmentation Models


Two example predictions of skeletal muscle regions made by the weighted DSC score loss model without and with data augmentation are shown in Figure 4. Both segmentation models scored relatively poorly (DSC ~ 0.8) for smaller patients with thin abdominal wall skeletal musculature (Fig 4, A). However, the model with augmentation had a small relative area error of 0.02 compared with 0.07 for the model without augmentation. Both segmentation models achieved a DSC value of 0.94 for patients with relatively greater skeletal muscle mass (Fig 4, B); however, the model with data augmentation had a greater relative error of 0.12 compared with a 0.04 error for the model without augmentation. Manual investigation of the segmentation masks predicted by both models revealed that the augmentation model slightly overpredicts the muscle segments, especially at the borders, and in a few cases incorrectly incorporated bowel in the segmentation (Fig 4, B). This also explains the smaller area error of the augmented model in a thin patient (Fig 4, A). Also, in patients with irregular psoas muscles, the models overpredicted the muscle area, but there were no other distinct patterns of segmentation failure observed.

Figure 4:

Examples of skeletal muscle segmentation performance by the weighted Dice similarity coefficient (DSC) loss model with and without data augmentation show the original transaxial section and overlays of ground truth and predicted muscle regions.


Combined Performance of L3 Localizer and Skeletal Muscle Segmentation

To estimate the performance of the entire pipeline in predicting the skeletal muscle area, the metric of interest for sarcopenia indexes, a fully automated pipeline using the best-performing L3 localizer (18sM_TL) and the best-performing segmentation model, the weighted Dice score segmentation model (model 3), was implemented. At each stage, the mean result of the five model iterations from cross-validation was computed as output. The segmented muscle areas of this automated pipeline were then compared with the manually annotated segmentation areas for the test dataset. The pipeline’s mean relative area error was 0.08 ± 0.05, with a range of 0–0.24. The percentage of test samples with greater than 0.10 relative error was 24% (18 of 74). The error using this pipeline was higher than that of the segmentations produced using model 3 on the manually determined axial L3 sections (0.04 ± 0.04, P = .0001); however, the pipeline still performed similarly to the weighted Dice score–augmented segmentation model on manually determined L3 sections (0.09 ± 0.04, P = .18) (Table 2). Figure 5 shows the Bland-Altman plot of the skeletal muscle area predictions on the test set using the full pipeline in comparison with manual annotation. The slight overprediction of the muscle area by the pipeline compared with manual annotation observed during visual inspection is confirmed in the plot.
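The quantities behind a Bland-Altman comparison (bias and 95% limits of agreement between predicted and manual areas) are straightforward to compute; the toy areas below are illustrative values, not the study's measurements.

```python
import numpy as np

def bland_altman(predicted: np.ndarray, manual: np.ndarray):
    """Return the bias (mean difference) and the 95% limits of agreement
    (bias +/- 1.96 SD of the differences) between two measurement methods."""
    diff = predicted - manual
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Toy muscle areas (cm^2); a real analysis would use the 74 test-set cases.
manual = np.array([50.0, 80.0, 120.0, 60.0])
predicted = np.array([52.0, 83.0, 125.0, 61.0])
bias, lo, hi = bland_altman(predicted, manual)
# A positive bias, as here, indicates systematic overprediction of the area.
```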

Figure 5:

Bland-Altman plot for the skeletal muscle area predictions on the test dataset (n = 74) using the automated pipeline with the 18-section maximum intensity projection model with transfer learning for L3 localizer and weighted Dice similarity coefficient loss model for skeletal muscle segmentation. The ordinate indicates the difference between predicted area and manually segmented area (ground truth). The abscissa indicates the mean area between predicted and true area. SD = standard deviation.


Discussion

An automated pipeline to calculate skeletal muscle cross-sectional area for pediatric patients from contrast material–enhanced CT scans was developed and described in this study. As the first element of the pipeline, the 18sM_TL L3 section localizer model demonstrated agreement with ground truth of 3.23 mm ± 2.61 on a pediatric test set of 74 patients. This mean localization error is smaller than the nominal 5-mm reconstructed image thickness and comparable to the 3-mm image thickness (used for smaller patients < 15 kg) routinely used for image reconstruction at our institution. This model demonstrated higher (P = .03) localization accuracy than the model trained without transfer learning.

The L3 section localizer results in this study are similar to those of Kanavati et al (16), who achieved mean agreement of better than 2 mm ± 5 with a model trained on sagittal input. Their model was trained on a dataset of 1006 CT images (the same dataset used for our transfer learning) of adult patients aggregated from multiple sources, and the test results were reported for threefold cross-validation (~335 test images per fold). Importantly, in the study by Kanavati et al, CT section thickness varied between 1 mm and 7 mm, with more than 400 examinations having 1-mm-thick images. Furthermore, we believe errors of ±3 mm in L3 section selection to be of unlikely clinical significance.

The second element of our pipeline allows automated skeletal muscle segmentation on a transaxial image at the identified L3 level. The segmentation model leveraged a U-Net semantic segmentation algorithm that outputs segmentations with a mean DSC of 0.93 ± 0.03 and a mean relative error of only 0.04 ± 0.04. To the best of our knowledge, currently published studies on automated segmentation of skeletal muscle have focused only on adult populations, not pediatric populations. In adults (mean age, 66.5 years ± 11), Weston et al (28) used a similar two-dimensional U-Net architecture to segment subcutaneous adipose tissue, muscle, viscera, and bone and achieved a mean DSC of 0.93 ± 0.06 for muscle at the L3 level in 270 patients. Also in adults (mean age, 68 years), Burn et al (15) showed that a similar U-Net model for sarcopenia estimation from abdominal CT had a DSC of 0.94 ± 0.03 at the L3 level on a dataset of 51 patients. Performance of our algorithm, and of those in the prior studies by Weston et al and Burn et al, is similar to the performance of human observers quantified by Weston et al (28), who reported an interobserver variability of 0.93 ± 0.02 DSC for two human observers performing manual segmentation in 30 patients, indicating that automated skeletal muscle segmentation algorithms are approaching human-level performance.

When comparing models trained with and without data augmentation, despite similar DSCs, the augmented models had larger relative errors in muscle area, apparently due to slight overprediction of muscle area around the borders, an effect to which the DSC is not sensitive but the relative error is. This discrepancy is a known limitation of the DSC and has been documented in the literature (29). The smaller area error of the nonaugmented models may also reflect greater similarity between samples in the test set and those in the training set. While the augmented models performed relatively poorly on our limited test set, they may generalize better to a more diverse dataset (with prone patients, different CT vendors, etc) because of the transformed images in their training data. The performance of the full pipeline using the best-performing models for L3 localization (18sM_TL) and muscle segmentation (weighted DSC loss without augmentation) revealed that errors in automatic L3 localization propagated to the segmented muscle area; however, the resulting area errors were still comparable to those of the augmented segmentation models on manually selected L3 sections.

One of the major limitations of this work arose from the limited availability of ground truth data for a pediatric population. Most publicly available CT datasets contain predominantly adult patients, and because tissue composition differs considerably between adults and children, such datasets may not be suitable for validating pediatric-specific models. The models developed in this work performed well on the smaller pediatric test dataset we developed; however, when applying the models to a larger pediatric population (> 2000 samples) to generate skeletal muscle mass indexes for sarcopenia diagnosis, it will be essential to verify the accuracy of the model predictions without manual intervention. To this end, Bayesian convolutional neural networks with Monte Carlo dropout, as proposed by Hiasa et al (30), could provide an uncertainty metric for each prediction made by the U-Net model and could be incorporated into future models to identify outlier predictions in larger datasets without ground truth. An additional potential limitation of this work is that we did not attempt to account for variant anatomy, including patients with only 11 rib-bearing thoracic-type vertebral bodies and patients with transitional vertebrae at the lumbosacral junction. Instead, we sought to identify the third lumbar vertebra above the sacrum. In patients with transitional vertebral anatomy, it is possible that this vertebra is not L3. While this potential discrepancy is relevant to annotation, it should not affect our results, as what matters is that the relative distance from the top of the pelvis is constant.
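The Monte Carlo dropout idea referenced above keeps dropout active at inference and treats the spread of repeated stochastic forward passes as an uncertainty estimate. The sketch below is a conceptual NumPy stand-in (the `forward` callable, its dropout-mask argument, and the toy network are all hypothetical), not the Bayesian U-Net of Hiasa et al.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_dropout_predict(forward, x: np.ndarray, n_passes: int = 20,
                       drop_rate: float = 0.5):
    """Run several stochastic forward passes with dropout left on and
    return the mean prediction plus a per-pixel uncertainty (std)."""
    preds = []
    for _ in range(n_passes):
        mask = rng.random(x.shape) > drop_rate   # random dropout mask
        preds.append(forward(x, mask))
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.std(axis=0)

# Toy stand-in for a segmentation network's forward pass.
def toy_forward(x, mask):
    return (x * mask) / 0.5                      # inverted-dropout scaling

x = np.full((8, 8), 0.7)
mean_pred, uncertainty = mc_dropout_predict(toy_forward, x)
# Cases with many high-uncertainty pixels could be flagged for manual review.
```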

Taken together, we have described models for automatic localization of the L3 level and segmentation of skeletal muscle area at that level. Automated quantification of skeletal muscle area from CT in children could streamline diagnostic workflows for sarcopenia. The model-predicted skeletal muscle area, combined with patient height extracted from the electronic health record, could be used to create nomograms for children aged 0 to 18 years to support the diagnosis of sarcopenia. In conclusion, automated algorithms were developed to identify the third lumbar vertebral level and to segment the abdominal muscle at that level at CT in a pediatric population.

SUPPLEMENTAL FIGURES

Figure E1:
ryai200130suppf1.jpg (63.2KB, jpg)

Acknowledgments

The authors would like to thank Timothy O’Connor for help with data curation and Joanne Lovelace for help with manuscript preparation.

Disclosures of Conflicts of Interest: J.C. disclosed no relevant relationships. E.S. disclosed no relevant relationships. L.A.G. disclosed no relevant relationships. A.T.T. disclosed no relevant relationships. S.B. Activities related to the present article: author received consulting fee or honorarium from Cincinnati Children’s Hospital. Activities not related to the present article: disclosed no relevant relationships. Other relationships: author has patent or copyright pending.

Abbreviations:

DSC = Dice similarity coefficient
MIP = maximum intensity projection
SMI = skeletal muscle mass index
12sM = 12-section MIP model
12sM_TL = 12-section MIP with transfer learning model
18sM = 18-section MIP model
18sM_TL = 18-section MIP with transfer learning model

References

1. Cruz-Jentoft AJ, Baeyens JP, Bauer JM, et al. Sarcopenia: European consensus on definition and diagnosis: Report of the European Working Group on Sarcopenia in Older People. Age Ageing 2010;39(4):412–423.
2. Cruz-Jentoft AJ, Bahat G, Bauer J, et al. Sarcopenia: revised European consensus on definition and diagnosis. Age Ageing 2019;48(1):16–31.
3. Prado CM, Wells JC, Smith SR, Stephan BC, Siervo M. Sarcopenic obesity: A critical appraisal of the current evidence. Clin Nutr 2012;31(5):583–601.
4. Chen LK, Lee WJ, Peng LN, et al. Recent Advances in Sarcopenia Research in Asia: 2016 Update From the Asian Working Group for Sarcopenia. J Am Med Dir Assoc 2016;17(8):767.e1–767.e7.
5. Chen LK, Liu LK, Woo J, et al. Sarcopenia in Asia: consensus report of the Asian Working Group for Sarcopenia. J Am Med Dir Assoc 2014;15(2):95–101.
6. Fielding RA, Vellas B, Evans WJ, et al. Sarcopenia: an undiagnosed condition in older adults. Current consensus definition: prevalence, etiology, and consequences. International working group on sarcopenia. J Am Med Dir Assoc 2011;12(4):249–256.
7. Studenski SA, Peters KW, Alley DE, et al. The FNIH sarcopenia project: rationale, study description, conference recommendations, and final estimates. J Gerontol A Biol Sci Med Sci 2014;69(5):547–558.
8. Mangus RS, Bush WJ, Miller C, Kubal CA. Severe Sarcopenia and Increased Fat Stores in Pediatric Patients With Liver, Kidney, or Intestine Failure. J Pediatr Gastroenterol Nutr 2017;65(5):579–583.
9. Steffl M, Chrudimsky J, Tufano JJ. Using relative handgrip strength to identify children at risk of sarcopenic obesity. PLoS One 2017;12(5):e0177006.
10. Woo J. Sarcopenia. Clin Geriatr Med 2017;33(3):305–314.
11. McCarthy HD, Samani-Radia D, Jebb SA, Prentice AM. Skeletal muscle mass reference curves for children and adolescents. Pediatr Obes 2014;9(4):249–259.
12. Gilligan LA, Towbin AJ, Dillman JR, Somasundaram E, Trout AT. Quantification of skeletal muscle mass: sarcopenia as a marker of overall health in children and adults. Pediatr Radiol 2020;50(4):455–464.
13. van Vugt JL, Levolger S, Gharbharan A, et al. A comparative study of software programmes for cross-sectional skeletal muscle and adipose tissue measurements on abdominal computed tomography scans of rectal cancer patients. J Cachexia Sarcopenia Muscle 2017;8(2):285–297.
14. Mitsiopoulos N, Baumgartner RN, Heymsfield SB, Lyons W, Gallagher D, Ross R. Cadaver validation of skeletal muscle measurement by magnetic resonance imaging and computerized tomography. J Appl Physiol (1985) 1998;85(1):115–122.
15. Burns JE, Yao J, Chalhoub D, Chen JJ, Summers RM. A Machine Learning Algorithm to Estimate Sarcopenia on Abdominal CT. Acad Radiol 2020;27(3):311–320.
16. Kanavati F, Islam S, Aboagye EO, Rockall A. Automatic L3 slice detection in 3D CT images using fully-convolutional networks. arXiv:1811.09244 [preprint]. https://arxiv.org/abs/1811.09244. Posted November 22, 2018. Accessed February 18, 2020.
17. Tate J, Burton B, Khan A. Seg3D Basic Functionality. http://www.sci.utah.edu/devbuilds/seg3d_docs/Seg3DBasicFunctionality.pdf. Published 2016. Accessed April 18, 2017.
18. Bilic P, Christ PF, Vorontsov E, et al. The Liver Tumor Segmentation Benchmark (LiTS). arXiv:1901.04056 [preprint]. https://arxiv.org/abs/1901.04056. Posted January 13, 2019. Accessed May 16, 2020.
19. Clark K, Vendt B, Smith K, et al. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. J Digit Imaging 2013;26(6):1045–1057.
20. Eaton-Rosen Z, Bragman F, Ourselin S, Cardoso MJ. Improving data augmentation for medical image segmentation. https://openreview.net/forum?id=rkBBChjiG. Published April 11, 2018. Accessed May 15, 2020.
21. Somasundaram E, Deaton J, Kaufman R, Brady S. Fully automated tissue classifier for contrast-enhanced CT scans of adult and pediatric patients. Phys Med Biol 2018;63(13):135009.
22. Patro S, Sahu KK. Normalization: A Preprocessing Stage. arXiv:1503.06462 [preprint]. https://arxiv.org/abs/1503.06462. Posted March 19, 2015. Accessed March 8, 2020.
23. Ronneberger O, Fischer P, Brox T. U-net: Convolutional Networks for Biomedical Image Segmentation. In: Navab N, Hornegger J, Wells W, Frangi A, eds. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Lecture Notes in Computer Science, vol 9351. Cham, Switzerland: Springer, 2015; 234–241.
24. Kingma DP, Ba J. Adam: A Method for Stochastic Optimization. arXiv:1412.6980 [preprint]. https://arxiv.org/abs/1412.6980. Posted December 22, 2014. Accessed February 18, 2020.
25. Sudre CH, Li W, Vercauteren T, Ourselin S, Cardoso MJ. Generalised Dice Overlap as a Deep Learning Loss Function for Highly Unbalanced Segmentations. In: Cardoso MJ, Arbel T, Carneiro G, et al, eds. Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. DLMIA 2017, ML-CDS 2017. Lecture Notes in Computer Science, vol 10553. Cham, Switzerland: Springer, 2017; 240–248.
26. Gulli A, Pal S. Deep Learning with Keras. Birmingham, England: Packt, 2017.
27. Abadi M, Barham P, Chen J, et al. TensorFlow: A system for large-scale machine learning. In: OSDI’16: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation. Berkeley, Calif: USENIX Association, 2016; 265–283. https://dl.acm.org/doi/10.5555/3026877.3026899.
28. Weston AD, Korfiatis P, Kline TL, et al. Automated abdominal segmentation of CT scans for body composition analysis using deep learning. Radiology 2019;290(3):669–679.
29. Crum WR, Camara O, Hill DL. Generalized overlap measures for evaluation and validation in medical image analysis. IEEE Trans Med Imaging 2006;25(11):1451–1461.
30. Hiasa Y, Otake Y, Takao M, Ogawa T, Sugano N, Sato Y. Automated Muscle Segmentation from Clinical CT Using Bayesian U-Net for Personalized Musculoskeletal Modeling. IEEE Trans Med Imaging 2020;39(4):1030–1040.

Articles from Radiology: Artificial Intelligence are provided here courtesy of the Radiological Society of North America
