Abstract
The aim of our study was to evaluate the specific performance of an artificial intelligence (AI) algorithm for lung nodule detection in chest radiography for a larger number of nodules of different sizes and densities using a standardized phantom approach. A total of 450 nodules with varying density (d1 to d3) and size (3, 5, 8, 10 and 12 mm) were inserted in a Lungman phantom at various locations. Radiographic images with varying projections were acquired and processed using the AI algorithm for nodule detection. Computed tomography (CT) was performed for correlation. Ground truth (detectability) was established through a human consensus reading. Overall sensitivity and specificity of 0.978 and 0.812, respectively, were achieved for nodule detection. The false-positive rate was low, with an overall rate of 0.19. The overall accuracy was 0.84 for all nodules. While most studies evaluating AI performance in the detection of pulmonary nodules have evaluated a mix of varying nodules, these are the first results of a controlled phantom-based study using a balanced number of nodules of all sizes and densities. Such algorithms have an obvious benefit in a clinical scenario, as they can increase the radiologist’s diagnostic performance and minimize the risk of decision bias.
Keywords: artificial intelligence, chest x-ray, lung, pulmonary nodule, radiography, sensitivity and specificity
1. Introduction
Numerous artificial intelligence (AI) algorithms are currently flooding radiology workflows, including chest radiography interpretation. Their purposes range from the detection of abnormalities to triage and autonomous interpretation.[1–4] Many algorithms have recently focused on lung nodule detection, a task that is readily achievable using machine learning strategies.[5]
The low-dose CT lung cancer screening programs that have recently been widely introduced do not incorporate chest radiography as part of the process.[6] However, the impact of clinically relevant findings detected by AI in chest radiography, including nodules that are missed by human readers, remains unclear.[7]
Lung cancer remains the second most common cancer in the world, and metastatic pulmonary disease is observed in up to 54% of extrathoracic malignancies.[8] Radiography remains the most common diagnostic tool used in radiology.[9] Being accessible, low in cost, and low in radiation exposure, radiography is well suited to improving the early detection of cancer and metastatic disease through the detection of pulmonary nodules, and thereby overall survival.[10,11]
Because radiography is used so frequently in clinical practice, the sheer number of examinations could lead to a decrease in the sensitivity of the radiologist.[9,11] AI has been shown to simplify the detection of pathologic findings and make the work of radiologists more effective.[9,12] It may substantially reduce the cognitive bias of the radiologist during the interpretation of images and hence increase safety.[13] Therefore, computer-aided detection systems could play a significant role in pulmonary nodule detection. Key components of such systems usually include nodule candidate detection and false-positive reduction. Deep learning methods have been shown to be successful in nodule candidate detection in chest computed tomography (CT); however, they have limitations in chest radiography because of the large image size and obscuring anatomical structures, such as ribs.[12]
Feature selection is an important component in the creation of computer-assisted detection (CAD) systems, which extract geometric or contrast features to differentiate between normal tissue and pulmonary nodules. The early work of Wei et al defined an optimum of 216 features using the forward stepwise selection method. A true-positive rate of 80% was reached, with 5.4 false positives (FP) per image.[14] More recent work improved the performance of CAD systems to a sensitivity of 91.4% with 2.0 FP/image and of 97.1% with 5.0 FP/image, using segmentation of lung parenchyma via the multiple algorithm service model (M-ASM) algorithm, nodule enhancement techniques, watershed segmentation of lung nodules, synthetic training images, and transfer-learning methods for nodule classification.[12]
Liang et al compared 4 different deep learning algorithms and were able to detect pulmonary nodules with an average diameter of 4.872 cm, although they showed limitations with smaller and subsolid nodules.[11]
Baseline nodule density was not found to be a reliable discriminator between benign and malignant disease.[15] However, subsolid nodules may be difficult to detect because ground glass refers to a density at which the lung architecture is not obscured by the nodule.[16] Malignant subsolid pulmonary nodules grow at a slower pace than malignant solid nodules and are typically adenocarcinomas.[16]
The initial validation of the AI algorithm analyzed in our study, which used clinical images with varied nodularities, has been published previously.[10,12] The aim of our study was to evaluate the specific performance of the comprehensive algorithm for lung nodule detection in chest radiography for a larger number of different nodule sizes and densities, using a standardized phantom approach with a defined number of nodules for all densities and sizes available.
2. Materials and methods
This study was designed according to the MAIC-10 (Must AI criteria-10) checklist, as it represents the 10 essential criteria considered to be necessary in any AI publication with medical images.[17]
All phantom images met predefined image quality criteria such as the Digital Imaging and Communications in Medicine (DICOM) image format standard, coverage of entire lungs inside the imaged chest radiograph, and absence of image artifacts.
2.1. Phantom description
An anthropomorphic thoracic phantom (Lungman, Kyoto Kagaku, Tokyo, Japan) with an artificial thoracic wall, heart, mediastinum, diaphragm and lung with pulmonary vessels was used for image acquisition (Fig. 1). The phantom consists of an accurate life-size anatomical model of a male thorax with soft tissue substitute materials made of polyurethane resin composites and synthetic bones made of epoxy resin with X-ray absorption characteristics resembling human tissue. The space between the pulmonary vessels in the thoracic cavity was filled with air.
Figure 1.
Lungman phantom showing artificial chest cavity and vascular/mediastinal inlay.
The lung nodule phantom inlay consisted of synthetic spheres of 3, 5, 8, 10, and 12 mm in diameter, with each sphere having 3 densities: +100 HU (d1) made of polyurethane, SZ50 and hydroxyapatite, −630 HU (d2), and −800 HU (d3), both made of urethane foam (Fig. 2).
Figure 2.
Lung nodule phantom inlay demonstrating nodules of 5 different sizes and 3 different densities, respectively.
2.2. Scan protocol
All images of the phantom were acquired in the postero-anterior (pa) beam direction with standardized acquisition parameters using a Siemens Ysio digital radiography system (Siemens, Forchheim, Germany). The tube voltage was fixed at 125 kV for all acquisitions, SID (source-to-image distance) was 2 m, and the filter was fixed at 0.1 mmCu.
In a sequential approach, the empty phantom chest cavity was scanned first.
Second, each nodule of each density was inserted into the empty cavity for baseline detection and to determine whether the nodule was visible to the human eye.
Each sphere was scanned ten times with an additional mediastinum/heart/vessel inlay, with repositioning before each scan to obtain varying projections and angulations. The location of the nodule was randomized, avoiding the heart and the upper apex, and was varied 3 times, resulting in a total of 15 × 30 nodule images (n = 450). The phantom chest cavity with the inserted vessel inlay was scanned a further 3 × 10 times without any spheres inserted, and the inlay was repositioned for each sequence.
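The balanced design described above can be enumerated in a few lines. The following Python sketch is illustrative only: the field names and the ordering of the loops are assumptions, and only the factor levels (5 sizes, 3 densities, 3 locations, 10 repositioned scans) are taken from the protocol.

```python
from itertools import product

# Factor levels taken from the scan protocol.
SIZES_MM = (3, 5, 8, 10, 12)
DENSITIES = ("d1", "d2", "d3")  # +100 HU, -630 HU, -800 HU
N_LOCATIONS = 3                 # nodule location varied 3 times
N_SCANS = 10                    # ten repositioned scans per location


def build_acquisition_plan():
    """Return one record per nodule image of the balanced design.

    The record layout is hypothetical; it only illustrates how
    15 spheres x 30 images add up to the 450 nodule images.
    """
    plan = []
    for size, density, location, scan in product(
        SIZES_MM, DENSITIES, range(1, N_LOCATIONS + 1), range(1, N_SCANS + 1)
    ):
        plan.append(
            {
                "size_mm": size,
                "density": density,
                "location": location,
                "scan": scan,
            }
        )
    return plan


plan = build_acquisition_plan()
print(len(plan))  # 450
```

Each of the 15 size and density combinations contributes exactly 30 records, which is what makes the subgroup results directly comparable.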
Before changing the location of the nodules, CT scans were performed to ensure that the findings of the AI software were equivalent to the actual locations of the nodules.
2.3. AI algorithm
All images were processed using the AI-Rad Companion Chest X-ray algorithm VB10 (Siemens Healthineers AG, Erlangen, Germany). The distribution of cases used for training the algorithm was provided and has been extensively described previously.[10] The AI algorithm solely analyzes the posterior–anterior (pa) and anterior–posterior (ap) views of chest X-ray images and creates secondary capture DICOM objects reporting the results of the analysis.
2.4. Establishing the ground truth
To establish the ground truth for the study, 2 radiologists (TN [18 years of experience in thoracic radiology] and ME [2 years of experience]) assessed all unprocessed images in a consensus readout. Readers were provided with all available data including the corresponding CT images. To avoid any unintentional bias, the 2 radiologists establishing the ground truth did not have access to the AI results or annotations. They assigned a binary label for the absence or presence (scored as 0 or 1) of a nodule, with the constraint that the nodule had to be visible in the pa images. Lesions not visible on radiographs but seen on CT were deemed absent.
2.5. Image analysis
A total of 450 images were analyzed for detectability of the pulmonary spheres. These consisted of n = 30 images of each density (d1–d3) and size (3, 5, 8, 10, and 12 mm) in varying locations and projections.
The images were loaded onto a clinical viewing workstation and processed by the CAD system. Figure 3 shows a representative overlay of the AI algorithm results showing the tagging of a 10 mm sphere (d1) in the right lower lung.
Figure 3.
10 mm sphere (d1) in right lower lung tagged by the algorithm.
A subset of 30%, representing 10 images of each sphere and density, was processed a second time with the AI algorithm to evaluate the repeatability of the results. Image analysis was performed using the Picture Archiving and Communication System (PACS; GE Centricity version I6, GE Healthcare, Chalfont St Giles, UK).
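A repeatability check of this kind amounts to comparing the algorithm output of the first and second processing run image by image. The sketch below is a minimal illustration in Python; the image identifiers and the boolean "nodule tagged" representation are assumptions and do not reflect the actual export format of the software.

```python
def repeatability(first_run, second_run):
    """Return the fraction of images with identical results in both runs.

    Both arguments map a (hypothetical) image identifier to a boolean
    that states whether the algorithm tagged a nodule on that image.
    Returns None when there are no images to compare.
    """
    if first_run.keys() != second_run.keys():
        raise ValueError("both runs must cover the same images")
    if not first_run:
        return None
    identical = sum(first_run[key] == second_run[key] for key in first_run)
    return identical / len(first_run)


# Hypothetical example values, not study data.
run_1 = {"img_001": True, "img_002": False, "img_003": True}
run_2 = {"img_001": True, "img_002": False, "img_003": True}
print(repeatability(run_1, run_2))  # 1.0
```

A value of 1.0 means that reprocessing did not change any result.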
2.6. Statistics
Data processing and descriptive statistical analyses were performed using SPSS software (IBM Corp. Released 2021. IBM SPSS Statistics for Windows, Version 28.0.1.1, IBM Corp, Armonk). The sensitivity, specificity, negative predictive value, false discovery/positive rate (FPR) and false omission/negative rate (FNR) were calculated for the detection of each predefined finding. Accuracy (ACC) was calculated to evaluate the overall performance. The association between detection by the algorithm and the ground truth was tested using the chi-square test. The statistical significance threshold was set at P < .05.
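For orientation, the listed metrics can be derived from the four cells of a confusion matrix, as in the Python sketch below. This is not the SPSS procedure used in the study. Two assumptions are made: FPR is taken as FP/(FP + TN), which agrees with the reported pair of specificity 0.812 and false-positive rate 0.19, and FNR is taken as FN/(FN + TP). A zero denominator yields None, mirroring the "n/a" entries in the tables.

```python
def safe_div(numerator, denominator):
    """Divide, returning None (reported as n/a) for a zero denominator."""
    return numerator / denominator if denominator else None


def detection_metrics(tp, fp, tn, fn):
    """Compute detection metrics from confusion-matrix counts.

    tp, fp, tn, fn: counts of true positives, false positives,
    true negatives and false negatives against the ground truth.
    """
    return {
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "npv": safe_div(tn, tn + fn),
        "fpr": safe_div(fp, fp + tn),  # assumed definition: 1 - specificity
        "fnr": safe_div(fn, fn + tp),  # assumed definition: 1 - sensitivity
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only, not study data.
example = detection_metrics(tp=8, fp=2, tn=18, fn=2)
for name, value in example.items():
    print(name, "n/a" if value is None else round(value, 3))
```

With these definitions, FPR and specificity always sum to 1, so a specificity range maps directly onto a false-positive rate interval.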
3. Results
Taking into account all radiographs with spheres inserted into the phantom and considering what is detectable to the radiologist’s eye as the ground truth, an overall sensitivity and specificity of 0.978 and 0.812, respectively, were achieved for nodule detection. Accordingly, the overall false-positive rate was low at 0.19. The overall accuracy was 0.84 for all nodules. Table 1 provides an overview of the global results.
Table 1.
Results of performance for different densities and all sphere sizes. Division with divisor zero was noted as n/a.
The chi-square test showed a significant positive correlation between the detection probability and the ground truth for all spheres and all densities (P < .001, n = 534), and the same correlation was found for all individual sizes and densities. The subgroup analysis showed the best performance of the algorithm for the high-density nodules, with an accuracy of 0.85 across all sizes. In this subgroup, the algorithm yielded the best results for the smallest nodules detectable in the ground truth (i.e., 8 mm spheres), with an accuracy of 0.97. The lowest accuracy was obtained for the largest detectable nodules (0.81). See Tables 2 and 3 for detailed results of all sphere sizes and densities.
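A chi-square test of association between algorithm detection and ground truth can be sketched for a 2 × 2 table as follows. The Python code is a minimal illustration: it assumes the uncorrected Pearson statistic with 1 degree of freedom, because the variant used in the study is not stated, and the example counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Rows: ground truth positive / negative.
    Columns: algorithm positive / negative.
    Returns (chi2, p_value), or None if any marginal total is zero.
    """
    n = a + b + c + d
    row_1, row_2 = a + b, c + d
    col_1, col_2 = a + c, b + d
    if 0 in (row_1, row_2, col_1, col_2):
        return None
    chi2 = n * (a * d - b * c) ** 2 / (row_1 * row_2 * col_1 * col_2)
    # With 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts for illustration only, not study data.
chi2, p_value = chi_square_2x2(10, 0, 0, 10)
print(chi2, p_value < 0.001)  # 20.0 True
```

The test only indicates whether detection and ground truth are associated; the strength of agreement is described by sensitivity, specificity and accuracy.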
Table 2.
Results of performance for different densities relative to sphere size. Division with divisor zero was noted as n/a.
Table 3.
Results of performance for different sphere sizes relative to density. Division with divisor zero was noted as n/a.
The false-positive rate fluctuated over the different nodule sizes and densities without a clear tendency, and a subset of 30 images of the phantom with the vascular inlay but without any spheres inserted did not show any false-positive results. A subanalysis of 150 images that were processed a second time to assess repeatability did not show any variance in the results of the algorithm.
4. Discussion
While only spheres of the highest density that were equal to or larger than 8 mm could be detected in the human reader ground truth, the algorithm was able to detect 1 nodule of intermediate density that was not visible to the human eye. Low-density spheres (d3), representing pure ground-glass nodules, could be detected neither by the human reader nor by the algorithm. Our results showed the best performance for the smallest detectable spheres, which is in line with the results of a recently published study[5] that reported an AUC of 0.89 for nodules of 5 to 9 mm in size (n = 11) over all conspicuity levels, without further specifying the total number of nodules per conspicuity level. In our study, the number of comparable nodules was n = 30, or n = 90 over all densities, resulting in an accuracy of 0.86. Similar results were observed for the large spheres. Based on our results, a subgroup analysis of spheres of 10 to 12 mm showed an accuracy of 0.84 for n = 180, nearly the same as the AUC of 0.83 reported by van Leeuwen et al for the subgroup of 10 to 14 mm (n = 41).[5] Only high-density spheres (+100 HU) were reliably detected using the software. We attribute the single detection of an intermediate-density sphere that was not visible to the human eye to chance. Interestingly, repeated analysis of the same image always yielded the same results. Our data analysis demonstrated a clustering of data in high-specificity regions within a specificity range of 0.71 to 1 (corresponding to a false-positive rate interval of 0 to 0.29), associated with different levels of sensitivity. This is in line with the results of a previous study that initially validated the algorithm used.[10]
Our study had several limitations. First, the algorithm for the detection of pulmonary nodules was trained on human images and not on phantom data and artificial spheres. Because the Kyoto Lungman phantom used in our study is an anthropomorphic phantom that is known for its realistic simulation of human lungs and whose X-ray absorption characteristics are very close to those of human tissue, we consider this limitation to be negligible. Our results are consistent with other recently published human data.
Second, the phantom used for the analysis was built as a normal anthropomorphic chest of a standardized human model. The comparable published data were obtained from a study group with a mean age of 64 ± 11 years, without further specification of underlying lung disease (e.g., emphysema) or osteoporosis, which may result in higher transparency of lung tissue. Therefore, we believe that the relative probability of finding a sphere placed at the intersection of ribs and/or pulmonary vessels is higher in our phantom analysis, which may contribute to a somewhat lower performance. An additional analysis of all nodules placed in the empty chest cavity without a vessel inlay resulted in the reliable depiction of all nodules of the highest density. Only the 8 mm sphere of density d2 was additionally visible to the human reader, but it was not depicted by the algorithm. No other sphere of intermediate or low density was visible to either the human eye or the algorithm.
Third, we only evaluated nodules up to 12 mm in size. Although other authors evaluated nodules up to 30 mm in diameter,[5] we expected the highest yield of AI support in lung nodule detection in a clinical scenario for small and medium-sized nodules, which are usually incidental findings. Larger nodules or masses are usually present in symptomatic patients and require workup in addition to chest radiography.
Fourth, the ground truth was established as a consensus reading by radiologists. Low-density spheres representing ground-glass nodules were not visible to the human eye on chest radiography, even with CT correlation. Evidence on the definition of the density of ground-glass nodules is limited. Several authors have reported a mean density of −517 HU to −624 HU for adenocarcinoma presenting as a ground-glass nodule.[18,19] The spheres used for evaluation in our study had densities of −630 HU (d2) and −800 HU (d3), which are even below the mean values reported for adenocarcinoma in the literature. Ground-glass opacity tends to be difficult to identify radiographically, especially when it is mildly expressed, and the differential diagnosis of ground-glass nodules is mainly based on CT.
In conclusion, although most studies evaluating AI performance in the detection of pulmonary nodules in chest radiography have evaluated a mix of varying nodule sizes, densities, and conspicuity levels, these are the first results of a controlled phantom-based study for lung nodule detection in chest radiography using a balanced number of nodules of all sizes and densities. Such algorithms have an obvious benefit in a clinical scenario, as they can increase the radiologist’s diagnostic performance and minimize the risk of decision bias. Our data may contribute to further evaluation of how such algorithms can be used sensibly with the highest yield for patient benefit.
Acknowledgments
The Department of Radiology of the Kantonsspital Baden has research agreements with Siemens Healthcare. Siemens Healthineers provided technical support and the software tools for this study. One author is an employee of Siemens Healthineers AG (A.P.). He had no involvement in the study design; collection, analysis and interpretation of data; and in the decision to submit the article for publication.
Author contributions
Conceptualization: Mona El-Gedaily, Tilo Niemann.
Data curation: Mona El-Gedaily, Bastian Schulz, Tilo Niemann.
Investigation: Mona El-Gedaily, Mike Guldimann, Tilo Niemann.
Methodology: Mona El-Gedaily, André Euler, Bastian Schulz, Tilo Niemann.
Validation: Mona El-Gedaily, André Euler, Tilo Niemann.
Visualization: Mona El-Gedaily, Tilo Niemann.
Writing – original draft: Mona El-Gedaily, Tilo Niemann.
Writing – review & editing: Mona El-Gedaily, André Euler, Mike Guldimann, Bastian Schulz, Foroud Aghapour Zangeneh, Andreas Prause, Rahel A. Kubik-Huch, Tilo Niemann.
Project administration: André Euler, Rahel A. Kubik-Huch, Tilo Niemann.
Funding acquisition: Bastian Schulz, Rahel A. Kubik-Huch, Tilo Niemann.
Resources: Andreas Prause, Rahel A. Kubik-Huch.
Software: Andreas Prause.
Supervision: Rahel A. Kubik-Huch, Tilo Niemann.
Formal analysis: Tilo Niemann.
Abbreviations:
- ACC: accuracy
- AI: artificial intelligence
- AUC: area under the curve
- CAD: computer-aided diagnosis
- CT: computed tomography
- FN: false negative
- FNR: false-negative rate
- FP: false positive
- FPR: false-positive rate
- HU: Hounsfield units
- MAIC-10: Must AI criteria-10
- sens: sensitivity
- spec: specificity
- TN: true negative
- TP: true positive
ME has received a scientific grant from Guerbet AG, Switzerland.
No ethical approval was necessary for this phantom study.
The authors have no conflicts of interest to disclose.
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
How to cite this article: El-Gedaily M, Euler A, Guldimann M, Schulz B, Aghapour Zangeneh F, Prause A, Kubik-Huch RA, Niemann T. Phantom evaluation of feasibility and applicability of artificial intelligence based pulmonary nodule detection in chest radiographs. Medicine 2024;103:47(e40485).
Contributor Information
Mona El-Gedaily, Email: mona.elgedaily@hirslanden.ch.
André Euler, Email: andre.euler@ksb.ch.
Mike Guldimann, Email: mike.guldimann@ksb.ch.
Bastian Schulz, Email: schulzbastian123@gmail.com.
Foroud Aghapour Zangeneh, Email: Foroud.AghapourZangeneh@ksb.ch.
Andreas Prause, Email: andreas.prause@siemens-healthineers.com.
Rahel A. Kubik-Huch, Email: rahel.kubik@ksb.ch.
References
- [1] Miró Catalina Q, Vidal-Alaball J, Fuster-Casanovas A, Escalé-Besa A, Ruiz Comellas A, Solé-Casals J. Real-world testing of an artificial intelligence algorithm for the analysis of chest X-rays in primary care settings. Sci Rep. 2024;14:5199.
- [2] Bennani S, Regnard NE, Ventre J, et al. Using AI to improve radiologist performance in detection of abnormalities on chest radiographs. Radiology. 2023;309:e230860.
- [3] Plesner LL, Müller FC, Nybing JD, et al. Autonomous chest radiograph reporting using AI: estimation of clinical impact. Radiology. 2023;307:e222268.
- [4] Yoon SH, Park S, Jang S, et al. Use of artificial intelligence in triaging of chest radiographs to reduce radiologists’ workload. Eur Radiol. 2024;34:1094–103.
- [5] van Leeuwen KG, Schalekamp S, Rutten MJCM, et al.; Project AIR Working Group. Comparison of commercial AI software performance for radiograph lung nodule detection and bone age prediction. Radiology. 2024;310:e230981.
- [6] Veronesi G, Baldwin DR, Henschke CI, et al. Recommendations for implementing lung cancer screening with low-dose computed tomography in Europe. Cancers (Basel). 2020;12.
- [7] Topff L, Steltenpool S, Ranschaert ER, et al. Artificial intelligence-assisted double reading of chest radiographs to detect clinically relevant missed findings: a two-centre evaluation. Eur Radiol. 2024;34:5876–85.
- [8] Mohammed TLH, Chowdhry A, Reddy GP, et al.; Expert Panel on Thoracic Imaging. ACR appropriateness criteria® screening for pulmonary metastases. J Thorac Imaging. 2011;26:W1–3.
- [9] Rudolph J, Huemmer C, Ghesu FC, et al. Artificial intelligence in chest radiography reporting accuracy: added clinical value in the emergency unit setting without 24/7 radiology coverage. Invest Radiol. 2022;57:90–8.
- [10] Homayounieh F, Digumarthy S, Ebrahimian S, et al. An artificial intelligence-based chest X-ray model on human nodule detection accuracy from a multicenter study. JAMA Netw Open. 2021;4:e2141096.
- [11] Liang CH, Liu YC, Wu MT, Garcia-Castro F, Alberich-Bayarri A, Wu FZ. Identifying pulmonary nodules or masses on chest radiography using deep learning: external validation and strategies to improve clinical practice. Clin Radiol. 2020;75:38–45.
- [12] Chen S, Han Y, Lin J, Zhao X, Kong P. Pulmonary nodule detection on chest radiographs using balanced convolutional neural network and classic candidate detection. Artif Intell Med. 2020;107:101881.
- [13] Itri JN, Tappouni RR, McEachern RO, Pesch AJ, Patel SH. Fundamentals of diagnostic error in imaging. Radiographics. 2018;38:1845–65.
- [14] Wei J, Hagihara Y, Shimizu A, Kobatake H. Optimal image feature set for detecting lung nodules on chest X-ray images. In: Lemke HU, Inamura K, Doi K, Vannier MW, Farman AG, Reiber JHC, eds. CARS 2002 Computer Assisted Radiology and Surgery. Springer Berlin Heidelberg; 2002, pp. 706–11.
- [15] Xu DM, Van Klaveren RJ, De Bock GH, et al. Role of baseline nodule density and changes in density and nodule features in the discrimination between benign and malignant solid indeterminate pulmonary nodules. Eur J Radiol. 2009;70:492–8.
- [16] Mazzone PJ, Lam L. Evaluating the patient with a pulmonary nodule: a review. JAMA. 2022;327:264–73.
- [17] Cerdá-Alberich L, Solana J, Mallol P, et al. MAIC-10 brief quality checklist for publications using artificial intelligence and medical images. Insights Imaging. 2023;14:11.
- [18] Fu F, Zhang Y, Wang S, et al. Computed tomography density is not associated with pathological tumor invasion for pure ground-glass nodules. J Thorac Cardiovasc Surg. 2021;162:451–9.e3.
- [19] Heidinger BH, Anderson KR, Nemec U, et al. Lung adenocarcinoma manifesting as pure ground-glass nodules: correlating CT size, volume, density, and roundness with histopathologic invasion and size. J Thorac Oncol. 2017;12:1288–98.






