See also the article by Castiglione et al in this issue.

Alexandre Cadrin-Chênevert, MD, is a diagnostic and interventional radiologist at CISSS Lanaudière affiliated with Laval University. He has previously served as chief of the medical imaging department. As a Kaggle competition master, he has successfully participated in many machine learning competitions. He is an early member of the Canadian Association of Radiologists artificial intelligence working group. His current research interests include deep learning, computer vision, object detection, self-supervised learning, model generalizability, and public medical imaging datasets.
“Is this chest radiograph normal?” Answering such a question may seem simple for an experienced radiologist who has integrated thousands of previously interpreted examinations into his or her own current subjective representation of normality (1). This representation, learned from experience and difficult to explain, is based on a set of intrinsic factors associated with the image and on extrinsic factors related to the patient’s demographics and symptoms. Producing a useful and accurate radiologic report for the referring physician necessarily involves defining clear and reproducible limits of image normality (2). The factors involved in assessing normality are often complex and cannot easily be broken down.
The use of the Greulich and Pyle atlas for the estimation of bone age in pediatric radiology is a typical example of the adaptation required to define thresholds of normality associated with the patient’s age. The application of such an atlas quickly comes up against hidden population biases such as the geographic location or genetic characteristics of the population under study (3). In medicine, multivariate nomograms attempt to compensate for these individual biases through simple mathematical relationships that are often linear. Deep learning, which learns the statistical distribution of a dataset through iterative optimization, is a very powerful tool for defining a normality specific to the context of a subpopulation. The development of a deep learning model of bone age, trained on chronological age from pediatric hand radiographs obtained in a clinical setting of trauma, demonstrated the flexibility of the method in defining a new statistical normality (4).
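To make this concrete, a minimal sketch of such a bone age regressor is shown below. It assumes PyTorch and torchvision, an off-the-shelf ResNet backbone rather than the architecture used in reference 4, and an external loader yielding hand radiographs paired with chronological age.

```python
# Minimal sketch (not the implementation from reference 4): regressing
# chronological age from hand radiographs so that the model absorbs the
# statistical age distribution of the training population.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=None)              # illustrative backbone choice
backbone.fc = nn.Linear(backbone.fc.in_features, 1)   # single output: age in months

optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()  # mean absolute error, a common choice for bone age

def train_step(images: torch.Tensor, ages: torch.Tensor) -> float:
    """One iterative-optimization step on a batch of (image, chronological age) pairs."""
    optimizer.zero_grad()
    preds = backbone(images).squeeze(1)
    loss = loss_fn(preds, ages)
    loss.backward()
    optimizer.step()
    return loss.item()
```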
The article “Automated Segmentation of Abdominal Skeletal Muscle on Pediatric CT Scans Using Deep Learning” by Castiglione et al, published in the current issue of Radiology: Artificial Intelligence, effectively explores the ability of deep learning to segment and quantify muscle tissue on L3-level CT images of a pediatric population (5). The use of two successive deep learning semantic segmentation models based on the U-Net architecture (6), the first to localize the L3 vertebra on the localizer image and the second to segment the muscle structures on the chosen axial section, yields excellent segmentation accuracy. Most importantly, the automation of this method, applied to larger pediatric datasets from several institutions, would make it possible to define clear quantitative boundaries for muscle mass and the skeletal muscle mass index across the pediatric age range.
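The two-stage pipeline can be summarized with the following minimal sketch, which is not the authors’ code: l3_locator and muscle_unet stand in for trained U-Net-style models, and the row-to-slice mapping is a deliberate simplification of the geometry handling used in practice.

```python
# Minimal sketch of the two-stage approach: stage 1 finds the L3 level on the
# localizer image, stage 2 segments skeletal muscle on that axial section.
import numpy as np

def select_l3_slice(localizer_image: np.ndarray, l3_locator) -> int:
    """Stage 1: predict an L3 probability map on the localizer and take the
    strongest row as the L3 level (simplified mapping from row to slice index)."""
    prob_map = l3_locator(localizer_image)        # (H, W) probabilities
    return int(prob_map.sum(axis=1).argmax())

def segment_l3_muscle(ct_volume: np.ndarray, slice_idx: int, muscle_unet) -> np.ndarray:
    """Stage 2: binary skeletal muscle mask on the chosen axial section."""
    axial = ct_volume[slice_idx]                  # ct_volume has shape (slices, H, W)
    return muscle_unet(axial) > 0.5

def muscle_area_cm2(mask: np.ndarray, pixel_spacing_mm: tuple) -> float:
    """Cross-sectional muscle area, the quantity normalized by height squared
    to obtain a skeletal muscle index."""
    pixel_area_cm2 = (pixel_spacing_mm[0] * pixel_spacing_mm[1]) / 100.0
    return float(mask.sum() * pixel_area_cm2)
```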
In this particular application, the diagnosis of sarcopenia, the generalized loss of muscle mass, is the clinical justification of the process. In the older population, but more broadly for the entire adult population, sarcopenia is associated with physical disability, poor quality of life, and death (7). In the pediatric population, muscle mass also serves as a quantitative marker of overall health (8). In this context, Weston et al trained a U-Net segmentation model of different tissues (fat, muscle, bone, visceral organs) on 2430 abdominal CT scans of an adult population (mean age, 66.5 years) and tested it on 270 examinations, with excellent segmentation accuracy (9).
Replication of the method developed in adults was a natural step for the pediatric population and proved equally effective. To define a normal distribution of pediatric muscle mass in a large population, the next step is to validate or retrain the pediatric model across different institutions, different machines, and different acquisition protocols. The strength of deep learning in faithfully representing the statistical distribution of data within an institution thus becomes its intrinsic weakness when the data shift from one institution to another for external validation. The challenge of generalizability is well known, but overcoming it is essential to advance toward a useful clinical application: increasing the amount of data analyzed by one or more orders of magnitude to define fair limits of normality across the pediatric age range.
The opportunity to quantify the range of normality in future imaging is not limited to the segmentation of pediatric muscle structures. It is a necessary route that can lead to the routine volumetric quantification of all anatomic structures observed in imaging and, more specifically, in cross-sectional imaging such as CT and MRI. These examinations are usually performed for a specific clinical question that the radiologist tries to answer clearly in the report. At the same time, these images contain a phenomenal amount of volumetric information about the patient’s condition, some of which is generally evaluated semiquantitatively by the radiologist (eg, splenomegaly based on a bipolar diameter of 15 cm). On the not-so-distant horizon, the volumes of a set of organs or anatomic structures could be measured automatically and transmitted to the radiologist before the interpretation of an examination. This quantitative information will have to be combined with numerical thresholds to define an appropriate normality range with minimal bias for the subpopulation. The curation of a large dataset of segmented organs from several institutions could enable such a clinical application. For example, CT-ORG is a recently available dataset of 140 CT scans with provided segmentations of six organ classes (10).
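A minimal sketch of what such automated volumetry could look like is shown below; the label map and the reference range are purely illustrative assumptions, not validated norms, and the multi-organ segmentation itself is taken as given.

```python
# Minimal sketch of automated organ volumetry from a multi-organ segmentation.
# The label map and reference range below are illustrative only.
import numpy as np

ORGAN_LABELS = {1: "liver", 2: "spleen", 3: "kidneys"}   # hypothetical label map
REFERENCE_RANGE_ML = {"spleen": (100.0, 300.0)}          # illustrative range, not a validated norm

def organ_volumes_ml(label_volume: np.ndarray, voxel_spacing_mm: tuple) -> dict:
    """Volume of each labeled organ in milliliters (1 mL = 1000 mm^3)."""
    voxel_ml = float(np.prod(voxel_spacing_mm)) / 1000.0
    return {name: float((label_volume == lbl).sum()) * voxel_ml
            for lbl, name in ORGAN_LABELS.items()}

def flag_outside_range(volumes_ml: dict) -> dict:
    """Compare measured volumes against the available normality thresholds."""
    flags = {}
    for organ, vol in volumes_ml.items():
        if organ in REFERENCE_RANGE_ML:
            low, high = REFERENCE_RANGE_ML[organ]
            flags[organ] = "within range" if low <= vol <= high else "outside reference range"
    return flags
```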
In the vast field of computer vision, the main tasks can be subdivided into classification, object detection, and segmentation (11). In medical imaging, the advent of deep learning has popularized the implementation of several image classification applications. The simplicity and speed of the classification labeling process with expert radiologists favored the development of supervised classification models. However, these classification models quickly run into the interpretability and explainability issues typical of deep learning (12). Methods to address this challenge in radiology all involve some form of spatial localization within the image, allowing the categorization of an image via feature or activation mapping. The mapped image allows a subjective appreciation of the features used to categorize the image in relation to the frequent pathognomonic signs used by the radiologist. However, segmentation models of anatomic or pathologic structures that use an intermediate volumetric step before classification, by applying thresholds of normality such as those suggested for sarcopenia, are fundamentally much more interpretable. Conceptually, one can even consider the Dice score used to evaluate segmentation accuracy as a pixel-level spatial correlation of interpretation between the radiologist’s opinion and that of the model. The quantification of this interpretive correlation via the Dice score can even become a measure of robustness, one that detects differences in classification performance more efficiently when the model is applied under distribution shift.
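For reference, the Dice score discussed above reduces to a few lines of code; the sketch assumes binary masks of the same shape from the radiologist and from the model.

```python
# Minimal sketch: Dice score as a pixel-level agreement measure between
# the radiologist's mask and the model's mask.
import numpy as np

def dice_score(radiologist_mask: np.ndarray, model_mask: np.ndarray) -> float:
    """Dice = 2 * |A intersect B| / (|A| + |B|); 1.0 means perfect spatial agreement."""
    a = radiologist_mask.astype(bool)
    b = model_mask.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as full agreement
    return 2.0 * np.logical_and(a, b).sum() / denom
```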
The future challenge of using artificial intelligence algorithms to define quantitative measures of normality for subpopulations of patients will be framed by the ability to gather enough data for a deep learning model to adequately represent the studied subpopulation while avoiding the injection of hidden biases that could harm an individual patient. Thus, in the current scenario, a robust and sufficiently validated model of skeletal muscle segmentation in the pediatric subpopulation is likely to succeed in detecting abnormal cases of sarcopenia that fall statistically outside the normal distribution for age.
Footnotes
Disclosures of Conflicts of Interest: A.C. disclosed no relevant relationships.
References
- 1. Robinson D, Bevan EA. Defining normality: art or science? Methods Inf Med 1993;32(3):225–228.
- 2. Hartung MP, Bickle IC, Gaillard F, Kanne JP. How to create a great radiology report. RadioGraphics 2020;40(6):1658–1670.
- 3. Soudack M, Ben-Shlush A, Jacobson J, Raviv-Zilka L, Eshed I, Hamiel O. Bone age in the 21st century: is Greulich and Pyle’s atlas accurate for Israeli children? Pediatr Radiol 2012;42(3):343–348.
- 4. Pan I, Baird GL, Mutasa S, et al. Rethinking Greulich and Pyle: a deep learning approach to pediatric bone age assessment using pediatric trauma hand radiographs. Radiol Artif Intell 2020;2(4):e190198.
- 5. Castiglione J, Somasundaram E, Gilligan LA, Trout AT, Brady S. Automated segmentation of abdominal skeletal muscle on pediatric CT scans using deep learning. Radiol Artif Intell 2021;3(2):e200130.
- 6. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF, eds. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Lecture Notes in Computer Science, vol 9351. Cham, Switzerland: Springer, 2015; 234–241.
- 7. Prado CMM, Wells JCK, Smith SR, Stephan BCM, Siervo M. Sarcopenic obesity: a critical appraisal of the current evidence. Clin Nutr 2012;31(5):583–601.
- 8. Gilligan LA, Towbin AJ, Dillman JR, Somasundaram E, Trout AT. Quantification of skeletal muscle mass: sarcopenia as a marker of overall health in children and adults. Pediatr Radiol 2020;50(4):455–464.
- 9. Weston AD, Korfiatis P, Kline TL, et al. Automated abdominal segmentation of CT scans for body composition analysis using deep learning. Radiology 2019;290(3):669–679.
- 10. Rister B, Yi D, Shivakumar K, Nobashi T, Rubin DL. CT-ORG, a new dataset for multiple organ segmentation in computed tomography. Sci Data 2020;7(1):381.
- 11. Chartrand G, Cheng PM, Vorontsov E, et al. Deep learning: a primer for radiologists. RadioGraphics 2017;37(7):2113–2131.
- 12. Reyes M, Meier R, Pereira S, et al. On the interpretability of artificial intelligence in radiology: challenges and opportunities. Radiol Artif Intell 2020;2(3):e190043.
