Shareable abstract
Artificial intelligence tools may help with spirometry interpretation and diagnosis of COPD in primary care, but further work is needed to improve calibration to a primary care population and model interpretability https://bit.ly/42mWXIR
Applications of artificial intelligence (AI) in respiratory medicine include AI-assisted interpretation of thoracic radiology, histopathology slides and physiological data [1], but most of these are not yet in widespread clinical use. Topalovic et al. [2] recently showed that AI software could outperform pulmonologists in the interpretation of full lung function tests including spirometry, static lung volumes and diffusing capacity of the lung. In this issue of ERJ Open Research, Sunjaya et al. [3] report the results of a primary care validation study, using a spirometry-only version of this software. The AI software took as its input raw spirometry data (time and flow–volume curves) as well as demographic information (age, sex, ethnicity, height, weight and smoking pack-years). The output of the software was the probability of each of the following categories: “COPD”, “asthma”, “other obstructive lung disease”, “interstitial lung disease”, “unidentified” and “normal”, with the AI-preferred diagnosis being the one with the highest probability. The software was tested against 1113 clinical cases in which spirometry had been performed in primary care clinics in northwest London between September 2015 and March 2019. The gold standard was the consensus diagnosis of an expert panel of pulmonologists who had access to spirometry as well as the primary and secondary care case notes. The AI-preferred diagnosis had a sensitivity of 84.0% and specificity of 86.8% for the diagnosis of COPD. This compared with a higher sensitivity of 90.6% but lower specificity of 67.5% when using the Global Initiative for Chronic Obstructive Lung Disease (GOLD) criterion of a forced expiratory volume in 1 s/forced vital capacity ratio <0.7 alone.
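The two decision rules being compared can be sketched as follows. This is an illustrative sketch only: the category probabilities below are invented for the example and are not outputs of the actual proprietary software.

```python
# Hypothetical category probabilities for one patient (illustrative only)
probs = {
    "COPD": 0.62,
    "asthma": 0.15,
    "other obstructive lung disease": 0.08,
    "interstitial lung disease": 0.05,
    "unidentified": 0.04,
    "normal": 0.06,
}

# AI-preferred diagnosis: the category with the highest probability
ai_diagnosis = max(probs, key=probs.get)

# GOLD fixed-ratio criterion: FEV1/FVC < 0.7 suggests airflow obstruction
def gold_obstruction(fev1_litres: float, fvc_litres: float) -> bool:
    return fev1_litres / fvc_litres < 0.7

print(ai_diagnosis)                # COPD
print(gold_obstruction(1.8, 3.0))  # True (ratio 0.6)
```

The fixed-ratio rule is a single threshold, which explains its higher sensitivity but lower specificity relative to a model that weighs the whole curve and demographic context.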
When interpreting these results, we should note that the AI was trained on a secondary care dataset, which is likely to have had a relatively low proportion of normal subjects compared with an undifferentiated primary care population, as well as a higher proportion of more unusual conditions such as neuromuscular disease, chest wall disease and cystic fibrosis. This will have directly influenced the probabilities produced by the model. Indeed, the AI software correctly identified only 33.3% of normal subjects, which could lead to unnecessary referrals to secondary care and anxiety for patients if this model were deployed in a primary care setting. This illustrates the broader point that the pre-test probability of a given diagnosis is just as important as the results of a diagnostic test in arriving at the final probability of a diagnosis. In the context of an AI model, this means that the prevalence of different conditions (including normality) should broadly align between the population used to train the model and that in which the model will be deployed clinically.
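The effect of pre-test probability can be made concrete with Bayes' theorem, using the reported sensitivity and specificity for COPD. The prevalence figures below are illustrative assumptions, not study data; they simply show how the same test performs very differently in a secondary-care-like versus a primary-care-like population.

```python
def positive_predictive_value(sens: float, spec: float, prevalence: float) -> float:
    """Post-test probability of disease given a positive result (Bayes' theorem)."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

SENS, SPEC = 0.840, 0.868  # AI-preferred diagnosis of COPD [3]

# Assumed prevalences for illustration: 40% (secondary-care-like case mix)
# versus 10% (undifferentiated primary care). PPV falls from ~81% to ~41%.
for prevalence in (0.40, 0.10):
    ppv = positive_predictive_value(SENS, SPEC, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}")
```

The same sensitivity and specificity thus yield a markedly less trustworthy positive call when the model is moved to a population with more normal subjects, which is the calibration concern raised above.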
There may be scope to improve the clinical utility of the AI software in primary care by allowing it to take account of a wider range of input data. For instance, the diagnostic accuracy for asthma could be improved by incorporating pre- and post-bronchodilator spirometry, fractional exhaled nitric oxide or blood eosinophil counts. The current model was not able to accurately diagnose patients with “other obstructive lung disease” (mainly bronchiectasis) or patients with “unidentified/other lung disease” (mainly heart failure and extrathoracic restriction) as separate categories. Therefore, in primary care, it may be better to remove these categories and focus on a simpler classification scheme, such as “COPD”, “asthma”, “restrictive” and “normal”.
There is increasing recognition that when it comes to the use of AI models in healthcare, statistical accuracy is not enough – models also need to be interpretable so that patients and clinicians can trust their outputs and have the confidence to act on them. This means that AI models should provide an understandable explanation for their outputs. In the case of AI-enabled spirometry interpretation, the explanation could include which spirometric and demographic characteristics were used to reach the preferred diagnosis. Complex AI models are often described as “black boxes” because their internal reasoning is opaque, even to those who develop them. Methods have been developed to provide post hoc explanations for the outputs of these models, but for critical decisions, it is preferable to use simpler models that are interpretable by design [4]. The situation is further complicated in this case by the fact that the underlying AI algorithm is proprietary and has not been described in detail. This means that even if the underlying model is interpretable to those who developed it, it remains a black box to the outside world. This makes it all the more important that explanations are built into future iterations of the software.
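What "interpretable by design" means in practice can be sketched with a toy linear model. The coefficients below are hypothetical, hand-set values for illustration only, not the actual software's parameters: the point is that each input's contribution to the final score can be read off directly, giving exactly the kind of explanation described above.

```python
import math

# Hypothetical coefficients (illustration only, not the real model)
coefficients = {
    "fev1_fvc_ratio": -8.0,   # lower ratio -> higher COPD score
    "pack_years": 0.05,       # heavier smoking -> higher COPD score
    "age_years": 0.03,
}
intercept = 2.0

def copd_probability(features: dict) -> float:
    score = intercept + sum(coefficients[k] * v for k, v in features.items())
    return 1 / (1 + math.exp(-score))  # logistic link

def explain(features: dict) -> dict:
    # Per-feature contributions to the score ARE the explanation:
    # no post hoc approximation is needed
    return {k: coefficients[k] * v for k, v in features.items()}

patient = {"fev1_fvc_ratio": 0.58, "pack_years": 30, "age_years": 67}
print(f"P(COPD) = {copd_probability(patient):.2f}")
print(explain(patient))
```

A complex model's outputs can only be approximated after the fact by such attributions; a model of this form exposes them exactly, which is the distinction Rudin [4] draws for high-stakes decisions.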
The authors of this study are already planning to test whether the use of AI decision support can improve the diagnostic performance of primary care clinicians when interpreting spirometry [5]. This will be an important addition to the evidence base and may bring AI-enabled spirometry one step closer to the clinic.
Footnotes
Provenance: Commissioned article, peer reviewed.
Conflict of interest: S. Gonem has no conflicts of interest to declare.
References
- 1. Gonem S, Janssens W, Das N, et al. Applications of artificial intelligence and machine learning in respiratory medicine. Thorax 2020; 75: 695–701. doi: 10.1136/thoraxjnl-2020-214556
- 2. Topalovic M, Das N, Burgel PR, et al. Artificial intelligence outperforms pulmonologists in the interpretation of pulmonary function tests. Eur Respir J 2019; 53: 1801660. doi: 10.1183/13993003.01660-2018
- 3. Sunjaya A, Edwards GD, Harvey J, et al. Validation of artificial intelligence spirometry diagnostic support software in primary care: a blinded diagnostic accuracy study. ERJ Open Res 2025; 11: 00116-2025. doi: 10.1183/23120541.00116-2025
- 4. Rudin C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 2019; 1: 206–215. doi: 10.1038/s42256-019-0048-x
- 5. Doe G, El-Emir E, Edwards GD, et al. Comparing performance of primary care clinicians in the interpretation of SPIROmetry with or without Artificial Intelligence Decision support software (SPIRO-AID): a protocol for a randomised controlled trial. BMJ Open 2024; 14: e086736.
