This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2025 Jun 30:rs.3.rs-6296752. [Version 1] doi: 10.21203/rs.3.rs-6296752/v1

Deep Learning of Suboptimal Spirometry to Predict Respiratory Outcomes and Mortality

Davin Hill 1,2, Max Torop 1, Aria Masoomi 1, Peter J Castaldi 2,3,4, Edwin K Silverman 2,4,5, Sandeep Bodduluri 6, Surya P Bhatt 6, Taedong Yun 7, Cory Y McLean 7, Farhad Hormozdiari 7, Jennifer Dy 1,*, Michael H Cho 2,4,5,*, Brian D Hobbs 2,4,5
PMCID: PMC12236919  PMID: 40630516

Abstract

Importance:

Obtaining spirometry requires repeated testing and using the maximal values based on quality control criteria. Whether the suboptimal efforts are useful for the prediction of respiratory outcomes is not clear.

Objective:

To determine whether a machine learning model could predict respiratory outcomes and mortality based on suboptimal spirometry.

Design:

Observational cohorts (UK Biobank and COPDGene).

Setting:

Multi-center; population, and disease-enriched.

Participants:

UK aged 40–69; US aged 45–80, >10 pack-years smoking, without respiratory diseases other than COPD or asthma.

Exposures:

Raw spirograms (volume-time).

Main outcomes and measures:

To create a combined representation of lung function we implemented a contrastive learning approach, Spirogram-based Contrastive Learning Framework (Spiro-CLF), which utilized all recorded volume-time curves per participant and applied different transformations (e.g. flow-volume, flow-time). We defined “maximal” efforts as those passing quality control (QC) with the maximum FVC; all other efforts, including submaximal and QC-failing efforts, were defined as “suboptimal”. We trained the Spiro-CLF model using both maximal and suboptimal efforts from the UK Biobank. We tested the model in a held-out 20% testing UK Biobank subset and COPDGene, on 1) binary predictions of FEV1/FVC <0.7, and FEV1 Percent Predicted (FEV1PP) <80%, 2) Cox regression for all-cause mortality, and 3) prediction of respiratory phenotypes.

Results:

We trained Spiro-CLF on 940,705 volume-time curves from 352,684 UKB participants with 2–3 spirometry efforts per individual (66.7% with 3 efforts) and at least one QC-passing spirometry effort. Of all spirometry efforts, 61.6% were suboptimal (37.5% submaximal and 24.1% QC-failing). In the UK Biobank, Spiro-CLF using QC-failing and submaximal efforts predicted FEV1/FVC < 0.7 with an area under the receiver operating characteristic curve (AUROC) of 0.956, mortality with a concordance index of 0.647, and asthma with a 9–42% improvement versus baseline models. In COPDGene (n=10,110 participants), adding QC-passing, submaximal efforts did not improve the prediction of lung function or mortality; however, Spiro-CLF representations predicted asthma and respiratory phenotypes (joint test P ≤ 2 × 10−3).

Conclusions and Relevance:

A machine-learning model can predict respiratory phenotypes using suboptimal spirometry; results from all spirometry efforts may contain valuable data. Additional studies are required to determine performance and utility in specific clinical scenarios.

1. Introduction

Spirometry is one of the most commonly used diagnostic tests in pulmonary medicine [1]. It is essential for diagnosing COPD and plays a critical role in assessing asthma, interstitial lung disease, neuromuscular conditions, preoperative pulmonary risk, and monitoring for drug toxicity and environmental exposures [2]. Annual spirometry is recommended for asthma and COPD [3], which together affect over 200 million people worldwide [4, 5].

Spirometry is performed by an individual taking a maximum inhalation followed by a maximum forced exhalation into a spirometry device, producing a graphical volume-time curve known as a spirogram. Spirograms are used to calculate key spirometry measurements, including forced expiratory volume in 1 second (FEV1), forced vital capacity (FVC), the FEV1/FVC ratio, and forced mid-expiratory flow (FEF) [6].

However, spirometry measurements are often noisy, with repeated efforts yielding variable measurements [7]. Unlike most biomedical measurements, spirometry depends heavily on effort and requires repeated attempts to ensure reliability. Clinical guidelines mandate at least two high-quality, reproducible efforts with minimal variability in measured volumes, though three or more attempts are commonly performed [6, 8]. The “best” spirometry measurements are selected as the highest values recorded across all high-quality, reproducible efforts. Spirometry quality control (QC) guidelines include factors such as minimal back-extrapolated volume (i.e., assuring the forced exhalation effort is maximal from the very start of the exhalation maneuver), sufficient time spent performing the forced exhalation, and a sufficient flow plateau at the end of the forced exhalation effort. Any spirometry efforts that are not reproducible or do not meet QC guidelines are discarded, since traditional spirometry measurements are only extracted from the QC-passing, maximal efforts.

We hypothesize that the suboptimal (i.e., submaximal and QC-failing) spirometry efforts carry valuable information about lung function and overall health outcomes, which can be effectively captured using deep learning methods trained on the entirety of the spirometry curves. While the suboptimal efforts may be less informative than the maximal effort, they may remain relevant for understanding overall lung function. For example, the suboptimal efforts contain information related to the variance and reproducibility of the reported spirometry measures; the existence of failed efforts may be correlated with underlying lung function and pulmonary disease. Additionally, the combined information from the suboptimal efforts may help reconstruct the information from the maximal effort, potentially reducing the number of efforts required during a clinical visit.

Therefore, the objective of this study was to evaluate whether information derived from suboptimal spirometry efforts could provide comparable or even superior relevance for lung function and mortality prediction. To do this, we developed a deep learning method, the Spirogram-based Contrastive Learning Framework (Spiro-CLF), to learn nonlinear vector representations of lung function from the entirety of an individual’s spirometry efforts, including suboptimal efforts. Spiro-CLF does not require any manual feature engineering or annotation of the spirograms, and is trained using a self-supervised, contrastive learning approach [9].

We trained the Spiro-CLF model on raw, pre-bronchodilator spirograms from the UK Biobank study (UKB) and assessed the quality of the learned vector representations on a held-out UK Biobank test dataset across three prediction tasks: (1) prevalence of lung function impairment, (2) mortality risk prediction, and (3) prediction of respiratory phenotypes. In each task, the vector representations of lung function served as substitutes for traditional spirometry measures. We additionally applied the Spiro-CLF model on the Genetic Epidemiology of COPD Study (COPDGene) to further validate the quality of the extracted representations on a separate cohort.

Our results showed that including the suboptimal spirometry efforts led to improved prediction of lung function impairment, mortality prediction, and phenotype prediction compared to both utilizing only maximal efforts and classic summary measures of lung function from spirometry. The Spiro-CLF model was able to be applied directly to the COPDGene dataset without any additional training, demonstrating the generalizability of the model to other datasets. These results indicate that Spiro-CLF is able to derive vector representations of lung function from suboptimal spirometry that are more predictive of clinical outcomes than traditional spirometry measures.

2. Methods

2.1. UKB Spirogram Preprocessing

UKB participants were recruited across 21 assessment centers in the United Kingdom for the UK Biobank database. Additional details regarding the UK Biobank study were previously published in [10]. A summary of participant characteristics is listed in Table 1. We extracted raw, pre-bronchodilator spirograms from UK Biobank field 3066. We utilized the spirograms from the initial visit for each participant, thus excluding multiple visits from the same participant. Spirograms were recorded as total exhaled lung volumes (mL) at 10 ms intervals over a median of 7.9 seconds. Each participant had between one and three spirometry efforts.

Table 1:

Characteristics of participants in the UK Biobank study.

Characteristic Training Set Validation Set Test Set
Participants, n 211,603 70,534 70,535
Spirometry Efforts, n 564,509 188,105 188,091
Participants with 3 Efforts, n (%) 141,279 (66.8) 47,037 (66.7) 47,021 (66.7)
Age, yr, mean (SD) 56.3 (8.1) 56.3 (8.1) 56.3 (8.1)
Sex, F, n (%) 118,064 (55.8) 39,528 (56.0) 39,454 (55.9)
Ethnicity, Self-Reported
 Asian or Asian British, n (%) 3,564 (1.7) 1,266 (1.8) 1,276 (1.8)
 Black or Black British, n (%) 2,752 (1.3) 939 (1.3) 938 (1.3)
 Chinese, n (%) 606 (0.3) 209 (0.3) 207 (0.3)
 Other, n (%) 2928 (1.4) 931 (1.3) 930 (1.3)
 White, n (%) 201,074 (95.3) 66,968 (95.2) 66,963 (95.2)
Ever Smoke, n (%) 96,277 (45.5) 32,039 (45.4) 32,171 (45.6)
FEV1 / FVC, mean (SD) 0.74 (0.06) 0.74 (0.06) 0.74 (0.06)
Death Events, n (%) 7,376 (3.5) 2,536 (3.6) 2,478 (3.5)
Time to Event, yr, mean (SD) 11.2 (1.5) 11.2 (1.5) 11.2 (1.5)

The following quality control filtering and preprocessing steps were applied to the spirogram samples. We used the spirometry quality control previously outlined by Shrine et al. [11], which keeps key components of the ATS and ERS guidelines with modifications to retain additional samples. Specifically, a more liberal threshold for repeatability was used, which is explained in further detail in Shrine et al. [11]. We considered efforts to fail QC according to the steps outlined in Table 3. We considered efforts passing these QC criteria, but not corresponding to the maximal effort, “submaximal”. For model validation purposes, we required all participants to have at least one QC-passing, reproducible effort. Effort reproducibility was defined as the presence of a second effort that achieved an FVC within 250 mL of the maximal FVC. Spirometry efforts were flagged as passing or failing each step of the QC process (Fig. 1). After each step, participants with no passing effort were removed from the dataset. After removing the participants who had no QC-passing effort or who failed the reproducibility criterion, all efforts from the remaining participants were included in the dataset. The distribution of remaining efforts is shown in Fig. 2.
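The effort labels defined above can be illustrated with a minimal sketch. It assumes the effort-level QC outcome is already available as a boolean, and it assumes that the second effort used for the reproducibility check must itself pass QC; the text does not state this explicitly. The names `Effort` and `label_efforts` are hypothetical and are not taken from the study code.

```python
from dataclasses import dataclass

REPRO_TOLERANCE_ML = 250  # reproducibility window stated in the text


@dataclass
class Effort:
    fvc_ml: float    # forced vital capacity of this effort, in mL
    passes_qc: bool  # outcome of the effort-level QC filters (assumed precomputed)


def label_efforts(efforts):
    """Label each effort, or return None if the participant would be excluded.

    Labels are "maximal", "submaximal" (QC-passing but not maximal),
    and "qc_failing".
    """
    passing = [e for e in efforts if e.passes_qc]
    if not passing:
        return None  # no QC-passing effort
    max_fvc = max(e.fvc_ml for e in passing)
    maximal_idx = next(
        i for i, e in enumerate(efforts) if e.passes_qc and e.fvc_ml == max_fvc
    )
    # Assumption: reproducibility needs another QC-passing effort within 250 mL.
    reproducible = any(
        e.passes_qc and max_fvc - e.fvc_ml <= REPRO_TOLERANCE_ML
        for i, e in enumerate(efforts)
        if i != maximal_idx
    )
    if not reproducible:
        return None
    labels = []
    for i, e in enumerate(efforts):
        if i == maximal_idx:
            labels.append("maximal")
        elif e.passes_qc:
            labels.append("submaximal")
        else:
            labels.append("qc_failing")
    return labels
```

Under these assumptions, a participant with efforts of 4000 mL (pass), 3850 mL (pass), and 4200 mL (fail) keeps all three efforts, labeled maximal, submaximal, and QC-failing respectively.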

Fig. 1:

Overview of the Quality Control (QC), Spiro-CLF training, and downstream prediction process. The QC process contains participant filters, which contain participant-level criteria, and blow filters, which contain effort-level criteria. In each blow filter, spirometry efforts are tested against the specified criteria and labeled as either passing or failing the given filter. Participants with no QC-passing effort or without a reproducible effort are removed from the dataset. QC criteria details are provided in Table 3.

Fig. 2:

Number of UKB efforts after preprocessing, listed by effort type. QC-Pass, Submaximal250mL denotes efforts that pass QC with FVC within 250mL of the maximal FVC.

After the QC process, the raw spirograms were downsampled from 10 ms to 60 ms intervals and the exhalation duration was limited to 30 s to improve model efficiency. Efforts were truncated after reaching FVC: we replaced the volume values recorded after FVC was achieved with the total spirogram FVC.
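As a rough illustration of this preprocessing step, the sketch below downsamples a volume-time curve, caps it at 30 s, and holds the volume at FVC once FVC is reached. Several details are assumptions rather than facts from the text: downsampling is done by keeping every sixth sample, FVC is taken as the maximum of the downsampled curve, and shorter curves are padded with FVC to a fixed length. The function name `preprocess_spirogram` is hypothetical.

```python
def preprocess_spirogram(volumes_ml, src_dt_ms=10, dst_dt_ms=60, max_duration_s=30):
    """Downsample, cap at max_duration_s, and hold the volume at FVC after it is reached."""
    step = dst_dt_ms // src_dt_ms                # 60 ms / 10 ms -> keep every 6th sample
    n_out = max_duration_s * 1000 // dst_dt_ms   # 30 s at 60 ms -> 500 samples
    down = list(volumes_ml[::step])[:n_out]

    # Assumption: FVC is the largest exhaled volume in the downsampled curve.
    fvc = max(down)
    fvc_idx = down.index(fvc)

    # Replace every value after FVC is first reached with FVC itself.
    flat = down[: fvc_idx + 1] + [fvc] * (len(down) - fvc_idx - 1)

    # Assumption: pad short efforts with FVC so all curves share one length.
    flat += [fvc] * (n_out - len(flat))
    return flat
```

With the default arguments, every effort becomes a 500-sample curve that is constant from the point where FVC is first reached.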

2.2. COPDGene Spirogram Preprocessing

The Genetic Epidemiology of COPD Study (COPDGene) is an observational study which contains detailed genetic, phenotyping, and spirogram data for over 10,000 participants [12]. A summary of participant characteristics is listed in Table 2. We utilized the spirograms recorded from each participant’s initial visit. The raw spirograms were recorded as total exhaled volume (mL) at 60 ms intervals. We followed the UKB preprocessing procedure by limiting spirograms to 30 s and replacing the volume values recorded after FVC was achieved with the total spirogram FVC. In contrast to UKB, the spirograms in COPDGene were already processed with automated and human annotation to remove QC-failing efforts. Therefore, each participant in COPDGene had 2–3 spirometry efforts (97.0% with 3 efforts), with all efforts (n=30,012) passing QC.

Table 2:

Characteristics of participants in the COPDGene study.

Characteristic Training Set Validation Set Test Set
Participants, n 6,066 2,022 2,022
Spirometry Efforts, n 18,004 6,000 6,008
 Maximal Efforts, n (%) 6,066 (33.7) 2,022 (33.7) 2,022 (33.7)
 Submaximal Efforts, n (%) 11,938 (66.3) 3,978 (66.3) 3,986 (66.3)
Participants with 3 Efforts, n (%) 5,880 (96.9) 1,959 (96.9) 1,965 (97.2)
Age, yr, mean (SD) 59.5 (9.0) 59.3 (8.9) 59.7 (9.3)
Sex, F, n (%) 2,807 (46.3) 980 (48.4) 945 (46.7)
Ethnicity, Self-Reported
 African American, n (%) 1,979 (32.6) 692 (34.2) 694 (34.3)
 Non-Hispanic White, n (%) 4,087 (67.4) 1,330 (65.8) 1,328 (65.7)
FEV1 / FVC, mean (SD) 0.65 (0.16) 0.66 (0.15) 0.65 (0.16)
Death Events, n (%) 1,596 (26.3) 510 (25.2) 544 (26.9)
Time to Event, yr, mean (SD) 8.7 (4.4) 8.7 (4.4) 8.8 (4.3)

2.3. Overview of Spiro-CLF

The Spiro-CLF framework consists of two stages (Fig. 3). First, the Spiro-CLF model was trained using a self-supervised, contrastive learning approach [9] (see footnote 1). The Spiro-CLF model consists of a Convolutional Neural Network (CNN) [13] encoder, which maps the spirogram inputs to a vector representation, and a Multi-Layer Perceptron (MLP) projection head, which maps the vector representation to a prediction space. During training, the volume-time spirograms were augmented with random transformations, including flow-time and flow-volume transformations, to increase the diversity of the training data. A binary encoding, indicating transformation type, was appended to the end of each spirogram sample. The Spiro-CLF model was then trained to cluster spirograms from the same participant together in feature space, while separating spirograms from different participants, using the NT-Xent contrastive loss function [9]. This training procedure encouraged the Spiro-CLF model to identify the unique characteristics of each participant’s lung function contained within the entirety of the participant’s spirometry efforts. Spiro-CLF was trained on a randomly selected 60% subset of the UKB dataset. Model hyperparameters were selected using a separate 20% validation set.
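To make the contrastive objective concrete, the following is a minimal pure-Python sketch of an NT-Xent-style loss in which efforts from the same participant are treated as positives. The averaging over all same-participant pairs and the temperature value of 0.1 are assumptions made for illustration; the exact batching and hyperparameters used for Spiro-CLF are described in the supplement, not here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(embeddings, participant_ids, temperature=0.1):
    """Average NT-Xent loss over all (anchor, positive) pairs in a batch.

    A pair is positive when both embeddings come from the same participant.
    """
    n = len(embeddings)
    sims = [
        [cosine(embeddings[i], embeddings[k]) / temperature for k in range(n)]
        for i in range(n)
    ]
    total, n_pairs = 0.0, 0
    for i in range(n):
        # Denominator: log-sum-exp over every other sample in the batch.
        others = [sims[i][k] for k in range(n) if k != i]
        peak = max(others)
        log_denominator = peak + math.log(sum(math.exp(s - peak) for s in others))
        for p in range(n):
            if p != i and participant_ids[p] == participant_ids[i]:
                total += log_denominator - sims[i][p]  # -log softmax of the positive
                n_pairs += 1
    if n_pairs == 0:
        raise ValueError("batch contains no same-participant pairs")
    return total / n_pairs
```

The loss is small when embeddings of the same participant are close together and far from those of other participants, which is the clustering behavior described above.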

Fig. 3:

Top: Schematic of the Spiro-CLF training process. We randomly applied flow-time, flow-volume, and identity (volume-time) transformations to each spirometry effort within the training batch. The Spiro-CLF model is trained with contrastive loss to predict high pairwise similarity between efforts from the same individual and low pairwise similarity between efforts from different individuals. Bottom: Once the Spiro-CLF model is trained, we applied the encoder network to the entirety of an individual’s efforts, including transformation, to generate a single feature representation for each individual. This representation can then be used in a variety of downstream predictive tasks.
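The three views named in the caption can be sketched as follows. This is an illustrative reconstruction, not the study implementation: flow is approximated by forward differences, the flow-volume view is obtained by linear interpolation onto a uniform volume grid (which assumes a non-decreasing volume curve), and the transformation indicator is assumed to be a three-element one-hot code. All function names are hypothetical.

```python
VIEWS = ("volume_time", "flow_time", "flow_volume")


def flow_time(volumes_ml, dt_s=0.06):
    """Flow (mL/s) per time step by forward differences; a final 0 keeps the length."""
    flows = [(b - a) / dt_s for a, b in zip(volumes_ml, volumes_ml[1:])]
    return flows + [0.0]


def flow_volume(volumes_ml, dt_s=0.06):
    """Flow (mL/s) resampled onto a uniform volume grid from 0 to FVC.

    Assumes a non-decreasing volume curve with at least two samples.
    """
    flows = flow_time(volumes_ml, dt_s)
    n = len(volumes_ml)
    fvc = max(volumes_ml)
    grid = [fvc * i / (n - 1) for i in range(n)]
    out, j = [], 0
    for v in grid:
        # Advance to the segment [volumes[j], volumes[j + 1]] that contains v.
        while j + 1 < n - 1 and volumes_ml[j + 1] < v:
            j += 1
        v0, v1 = volumes_ml[j], volumes_ml[j + 1]
        if v1 == v0:
            out.append(flows[j])
        else:
            w = (v - v0) / (v1 - v0)
            out.append(flows[j] + w * (flows[j + 1] - flows[j]))
    return out


def make_view(volumes_ml, view, dt_s=0.06):
    """Return the chosen view with a one-hot transformation code appended."""
    if view == "volume_time":
        signal = list(volumes_ml)
    elif view == "flow_time":
        signal = flow_time(volumes_ml, dt_s)
    elif view == "flow_volume":
        signal = flow_volume(volumes_ml, dt_s)
    else:
        raise ValueError(f"unknown view: {view}")
    code = [1.0 if name == view else 0.0 for name in VIEWS]
    return signal + code
```

During training, one of the three views would be drawn at random for each effort in a batch; at inference time all three can be generated for every effort.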

In the second stage, the trained encoder was used to extract the learned vector representations for each participant. The encoder network processed the entirety of an individual’s spirometry efforts, generating separate representations for each spirogram and transformation. These representations were then averaged to produce a single vector representation of lung function for each participant, which we refer to as the Spiro-CLF representation.
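The averaging step can be written in a few lines. In this sketch the `encoder` argument stands in for the trained CNN encoder and `views` for the transformation functions; both are hypothetical placeholders supplied by the caller, and the unweighted mean is taken directly from the description above.

```python
def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def participant_representation(efforts, views, encoder):
    """Encode every (effort, view) pair and average the resulting embeddings."""
    embeddings = [encoder(view(effort)) for effort in efforts for view in views]
    return mean_vector(embeddings)
```

A participant with three efforts and three views therefore contributes nine embeddings to the average, while a participant with two efforts contributes six.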

To evaluate the performance of Spiro-CLF, we utilized three separate test sets that were excluded from the training process: 1) a randomly selected 20% subset of the UK Biobank dataset, 2) samples from two held-out UK Biobank assessment centers, and 3) the COPDGene cohort. The latter two groups served as external replication datasets to validate the model’s generalizability. During Spiro-CLF validation, we generated vector representations for each participant in both training and test sets. For each prediction task, we then trained linear prediction models on the vector representations from the training set, then evaluated model performance on the respective test sets.

Further method details, background on contrastive learning, and additional results can be found in the supplement (App. D, E, F).

3. Results

3.1. Lung Function Impairment Prediction with Spiro-CLF Representations

Lung function impairment is often defined using FEV1/FVC and FEV1PP metrics, derived from the maximal, QC-passing effort [1]. We evaluated the ability of the Spiro-CLF vector representations to predict lung function impairment when restricted to using submaximal and QC-failing efforts. This simulates a scenario where only suboptimal spirometry data is available for a given individual. We included two separate prediction tasks: A) FEV1/FVC < 0.7 and B) FEV1PP < 80%. FEV1PP was calculated using GLI-2012 reference values [14] with covariates sex, height, age, and self-reported ethnicity. Covariates were excluded from the prediction model in order to directly evaluate the Spiro-CLF representations. Results are shown in Figure 4. We omitted COPDGene results from this section, since COPDGene does not include QC-failing spirograms.

Fig. 4:

Results of the lung function impairment prediction tasks: FEV1/FVC< 0.7 (Top), and FEV1PP< 80% (Bottom). The error bars represent 95% bootstrap confidence intervals (50 iterations).

We observed that Spiro-CLF representations computed from QC-failing and submaximal efforts largely recovered the predictive performance of the maximal-effort baseline on both the FEV1/FVC and FEV1PP prediction tasks. In the FEV1/FVC task, the Spiro-CLF representations for submaximal and QC-failing efforts achieved 0.956 AUROC. Exclusively using submaximal efforts yielded an AUROC of 0.975. Similarly, in the FEV1PP prediction task we compared the performance of the submaximal and QC-failing features to a baseline that used only features from maximal efforts. Note that the FEV1PP “maximal” baseline did not achieve a perfect AUROC because covariate information was omitted from the prediction model. Using the Spiro-CLF features achieved 0.870 (submaximal and QC-failing efforts) and 0.881 (submaximal efforts) AUROC compared to the baseline of 0.887 AUROC.

We further assessed the performance of the Spiro-CLF model using only efforts that failed QC. When restricted to efforts failing due to excessive time to peak expiratory flow (PEF), the model achieved an AUROC of 0.918 for task A and 0.839 for task B.
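For readers less familiar with the metric, the sketch below computes AUROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, together with a bootstrap interval using 50 resamples as in Fig. 4. The percentile method and the fixed random seed are assumptions; the text does not specify how the bootstrap intervals were constructed.

```python
import random


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_iter=50, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (percentile method is an assumption)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes to define AUROC
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * (n_iter - 1))]
    upper = stats[int(round((1 - alpha / 2) * (n_iter - 1)))]
    return lower, upper
```

Here `labels` would be the binary impairment indicator (for example FEV1/FVC < 0.7 from the maximal effort) and `scores` the output of the linear model fit on the Spiro-CLF representations.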

3.2. All-Cause Mortality Prediction with Spiro-CLF Features

We used a mortality prediction task to validate that the Spiro-CLF representation contains additional relevant information for disease progression and health outcomes. We trained Cox regression models (see footnote 2) on the Spiro-CLF representations for the UKB and COPDGene training partitions. The resulting model fit was evaluated on the respective test partitions using the concordance index (c-index). For comparison, we trained additional Cox models on alternative spirometry metrics. No additional covariates were included in the model training in order to compare the predictive power of each metric.
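The concordance index used for evaluation can be sketched as below, in the form usually attributed to Harrell. The risk score passed in would be the linear predictor of a fitted Cox model; fitting the Cox model itself is not shown. Pairs with tied event times are skipped in this simplified version, which is an assumption of the sketch rather than a detail from the study.

```python
def concordance_index(times, events, risks):
    """Harrell-style c-index: fraction of comparable pairs ordered correctly.

    A pair (i, j) is comparable when participant i had an observed death
    and a shorter follow-up time than participant j.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored participants cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # higher risk died earlier: concordant
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied risk scores count as half
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of survival times by predicted risk.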

Results are shown in Figure 5. In the UKB dataset, the Spiro-CLF Cox model achieved a c-index of 0.647, which surpassed the performance of the second-highest performing single metric, FEV1/FVC, with c-index 0.597 (P ≤ 1.4 × 10−37). By combining metrics FEV1, FVC, and FEV1/FVC, we increased the c-index to 0.616. Including all competing metrics in the Cox model further improved the c-index to 0.622, which was still exceeded by the Spiro-CLF features (P ≤ 8.3 × 10−23). We also evaluated Spiro-CLF performance using two UKB assessment centers (Bristol and Leeds) as the testing set. Results were consistent with the previous findings (App. F.6).

Fig. 5:

Results of a time-to-event mortality prediction task measured using c-index (higher is better). We trained the Cox regression model using different lung function representations to compare their predictive power with respect to mortality prediction. The error bars represent 95% bootstrap confidence intervals (50 iterations). **** indicates significance at the α = 10−3 significance level, with Holm-Bonferroni adjustment.
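The Holm-Bonferroni adjustment mentioned in the caption follows a standard step-down procedure, sketched here for a list of raw P values. The function name is hypothetical, and the example values in the usage note are illustrative only and are not results from this study.

```python
def holm_bonferroni(p_values):
    """Return Holm-Bonferroni adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest P value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted
```

For example, raw values of 0.01, 0.04, and 0.03 are adjusted to approximately 0.03, 0.06, and 0.06; a comparison is then declared significant when its adjusted value falls below the chosen α.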

The Spiro-CLF Cox model achieved a 0.702 c-index on the COPDGene dataset. In contrast to the UKB results, the Spiro-CLF Cox model did not show a statistically significant improvement in c-index over the Cox models with combined spirometry metrics. The Spiro-CLF Cox model exhibited a mild performance increase over the best-performing single-metric Cox models.

3.3. Phenotype Prediction

We used a phenotype prediction task to evaluate the predictive power of the Spiro-CLF representations with respect to various phenotypes related to lung function. We trained generalized linear models (GLM) on 13 different phenotypes (App. D.5.3) with 4 different sets of base predictors:

  1. Intercept-Only. No predictors were included. The linear model was fit using only the intercept term.

  2. Covariates. We included relevant covariates for each phenotype, as specified in Table 4. Covariates were selected based on prior work in COPDGene phenotype association [15].

  3. Spirogram Metrics. We included traditional spirometry metrics FEV1, FVC, and FEV1/FVC, taken from each participant’s maximal QC-passing effort.

  4. Covariates & Spirogram Metrics. All previously specified covariates and spirogram metrics are included.

For each combination of phenotype and set of predictors, we trained two GLMs, one including and one excluding Spiro-CLF features (m = 120 models in total). Model fit was evaluated using AUROC for binary phenotypes (Asthma, Chronic Bronchitis) and Mean Squared Error (MSE) for all other phenotypes. We then compared the improvement in model fit when including Spiro-CLF representations (Figure 6).

Fig. 6:

Improvement in model fit when combining Spiro-CLF vector representations with different sets of base predictors (Covariates, Spirogram Metrics, and both Covariates & Spirogram Metrics) to predict respiratory phenotypes. Model fit is evaluated using AUROC for Logistic Regression models and MSE for all other models. Error bars represent 95% bootstrap confidence intervals (50 iterations).

We observed that, using Spiro-CLF alone, we could improve prediction of all measured phenotypes by at least 11%. Spiro-CLF also improved prediction over standard spirogram metrics alone, and over spirogram metrics combined with covariates. In COPDGene, we found a 15% improvement for prediction of normal lung on CT scan. For prediction of asthma, inclusion of spirometry led to minimal improvement in COPDGene (2% with covariates), but a 9% improvement in UKB. Additionally, the Spiro-CLF features were jointly significant for all sets of predictors (P ≤ 2 × 10−3, App. F.5).

4. Discussion

The clinical test of spirometry involves repeating efforts and choosing specific features of the best results for clinical reporting. In this work, we show that a deep learning model, trained on repeated measures of the raw data, can improve prediction of relevant respiratory outcomes.

In the population-based UK Biobank sample, we demonstrate significant prediction of lung function impairment, mortality, and respiratory phenotypes using QC-failed efforts. In COPDGene, which only had QC-passed, reviewed, submaximal efforts, we demonstrate prediction of respiratory phenotypes. Our results suggest that the benefit of Spiro-CLF will depend on whether suboptimal efforts are available.

Our work complements other efforts in applying deep learning to raw spirometry curves for different tasks. Supervised deep learning models have been developed for applications such as predicting Chronic Obstructive Pulmonary Disease (COPD) risk [16, 17], COPD subtyping [18], prediction of upper airway obstruction [19], or acceptability criteria [20–22]. In contrast, our approach leverages unsupervised methods to learn a deep representation of lung function. Spiro-CLF uses a contrastive learning framework to generate a single representation using the entirety of an individual’s blows. This differs from other approaches [23] using Variational Autoencoders [24], which only use a single effort per individual.

Spiro-CLF features are a representation of lung function that are trained to be robust to noise inherent in the spirometry testing process. Therefore the Spiro-CLF features can be calculated for any effort from a given individual, including QC-failing and submaximal efforts. Our model is able to maintain high prediction of lung function impairment using QC-failing and submaximal efforts, and improve prediction of mortality when including the entirety of an individual’s efforts. In addition, the Spiro-CLF representation is generated from an unsupervised model which can be easily transferred across datasets. The resulting vector representation can directly replace traditional spirometry measures in any prediction task. Our findings suggest that Spiro-CLF or similar machine learning models could augment current clinical spirometry testing by providing a more accurate prediction of airflow limitation, lung function, and other outcomes, even in the absence of QC-passing, reproducible spirometry. Specifically, our model or similar models may be able to provide a prediction of reduced FEV1/FVC ratio or reduced FEV1 when QC-passing results are not available; in addition, our model can provide improved predictions of respiratory phenotypes, such as asthma or emphysema, beyond standard measures of lung function.

There are a number of possible causes for the improved predictive performance gained from using the Spiro-CLF representation. First, the Spiro-CLF encoding incorporates information from suboptimal spirograms; in particular, the QC-failing efforts may indicate underlying lung function impairment. For instance, elderly patients, or those with severe lung disease, may have difficulty performing a traditional FVC measurement appropriately and thus their spirometry efforts may be more likely to fail QC [25–28]. In addition, QC-passing, submaximal efforts contribute information about the variance relative to the maximal effort, offering insights not captured by maximal efforts alone.

The neural network parametrization of the Spiro-CLF model further improves predictive performance by encoding nonlinear effects from the raw spirograms. Previous works have shown that machine learning methods can be applied to raw spirograms to improve various prediction tasks related to COPD [16], COPD subtypes [18], and genetic association [17, 23].

Another possible factor in improved performance is the increase in training samples from utilizing the entirety of an individual’s efforts and additionally applying data transformations to obtain flow-time, flow-volume, and volume-time views of each effort. The combined efforts and transformations effectively increase the number of training samples per individual, up to 9x (up to three efforts, each in three views) compared with using only the maximal effort in a single representation. Deep learning models are known to require a significant number of data samples during training [29]. Increasing the number of samples, together with the variability in data views offered by transforming volume-time curves to flow-volume and flow-time, enables the use of more complex models and reduces the likelihood of overfitting the training data. We were additionally able to show that the trained Spiro-CLF model could be applied to smaller datasets, such as COPDGene, that may not be large enough to train a deep learning model from scratch with the same level of performance.

While we obtained QC-failing efforts from multiple testing centers in UK Biobank, we do not know whether these results are generalizable to QC-failing efforts using other spirometers in other scenarios of clinical care. In order to compare our models to “ground truth”, we excluded participants that were unable to perform at least one QC-passing spirometry effort. The UKB study was predominantly of self-identified white race, European genetic ancestry, and - as a volunteer cohort - has a “healthy volunteer” selection bias, while COPDGene is comprised of smokers with at least 10-pack-years. While we provide some assessment of features, showing that the more important features of our model are from the beginning of the blow, and we additionally predict normal (non-emphysematous or gas trapping lung), the specific physiologic parameters which Spiro-CLF is using for prediction are not known. Our results also do not suggest that existing lung function interpretation guidelines should change; in fact, the FEV1, FVC, and FEV1/FVC ratio as summative measures have advantages of simplicity and interpretability [30]. We did not evaluate performance of our model in other scenarios. Additional training may be needed when applying the Spiro-CLF model to settings including clinical spirometry. During training, additional validation may also be needed to ensure that model parameter and architecture choices are optimal with respect to the new dataset.

In conclusion, we developed a deep learning model leveraging repeated raw spirograms to predict lung impairment, mortality, and respiratory phenotypes. Our work underscores the richness of data embedded in spirometry beyond traditional metrics and highlights the potential of machine learning to maximize the use of this data.

Key Points.

Question:

Can suboptimal spirometry data predict respiratory outcomes?

Findings:

We analyzed raw spirogram curves, including efforts that failed quality control from the UK Biobank, as well as submaximal efforts from both the UK Biobank and COPDGene cohorts, using the Spirogram-based Contrastive Learning Framework (Spiro-CLF). Spiro-CLF predicted asthma and other respiratory phenotypes beyond standard lung function measures, and, in UK Biobank, predicted lung function impairment and all-cause mortality.

Meaning:

This study demonstrates that suboptimal spirometry efforts can significantly contribute to predicting respiratory outcomes.

Acknowledgements

This research has been conducted using the UK Biobank Resource under application number 20915.

Funding

MHC is supported by NIH R01HL137927, R01HL135142, HL147148, and HL089856.

BDH was supported by NIH K08HL136928, U01 HL089856, and an Alpha-1 Foundation Research Grant.

DH is supported by NIH 2T32HL007427-41.

MT and JD were supported by NIH U24CA264369 and DOD grant ME230206P1.

EKS is supported by NIH R01 HL152728, R01 HL147148, U01 HL089856, R01 HL133135, and P01 HL114501.

PJC is supported by NIH R01HL124233 and R01HL147326.

SPB is supported by NIH R01HL151421 and UH3HL155806.

TY, FH, and CYM are employees of Google LLC.

The COPDGene study (NCT00608764) was supported by NHLBI grants U01 HL089897 and U01 HL089856 and by NIH contract 75N92023D00011. The COPDGene study has also been supported by the COPD Foundation through contributions made to an Industry Advisory Committee that has included AstraZeneca, Bayer Pharmaceuticals, Boehringer-Ingelheim, Genentech, GlaxoSmithKline, Novartis, Pfizer, and Sunovion.

Disclosures

BDH received grant support from Bayer.

MHC has received grant support from GlaxoSmithKline and Bayer, consulting fees from Genentech and AstraZeneca, and speaking fees from Illumina.

EKS has received grant support from Bayer and Northpond Laboratories.

PJC has received grant support from Bayer.

SPB has received consulting fees from Sanofi/Regeneron and Boehringer Ingelheim, and CME fees from IntegrityCE. His institute has received funds from Sanofi and Nuvaira for the conduct of clinical trials.

TY, FH, and CYM are employees of Google LLC and own Alphabet stock.

The COPDGene study was supported by the COPD Foundation through contributions made to an Industry Advisory Board that has included AstraZeneca, Bayer Pharmaceuticals, Boehringer-Ingelheim, Genentech, GlaxoSmithKline, Novartis, Pfizer, and Sunovion.

Footnotes

1

We provide a review of contrastive learning methods in App. E.1.

2

We provide a review of Cox regression models and the concordance index in App. E.

References

  • [1].Stanojevic Sanja, Kaminsky David A., Miller Martin R., Thompson Bruce, Aliverti Andrea, Barjaktarevic Igor, Cooper Brendan G., Culver Bruce, Derom Eric, Hall Graham L., Hallstrand Teal S., Leuppi Joerg D., MacIntyre Neil, McCormack Meredith, Rosenfeld Margaret, and Swenson Erik R.. ERS/ATS technical standard on interpretive strategies for routine lung function tests. European Respiratory Journal, 60(1):2101499, July 2022. doi: 10.1183/13993003.01499-2021. [DOI] [PubMed] [Google Scholar]
  • [2].Vestbo Jørgen, Hurd Suzanne S, Agustí Alvar G, Jones Paul W, Vogelmeier Claus, Anzueto Antonio, Barnes Peter J, Fabbri Leonardo M, Martinez Fernando J, Nishimura Masaharu, et al. Global strategy for the diagnosis, management, and prevention of chronic obstructive pulmonary disease: GOLD executive summary. American journal of respiratory and critical care medicine, 187(4):347–365, 2013. [DOI] [PubMed] [Google Scholar]
  • [3].Agustí Alvar, Celli Bartolome R., Criner Gerard J., Halpin David, Anzueto Antonio, Barnes Peter, Bourbeau Jean, Han MeiLan K., Martinez Fernando J., de Oca Maria Montes, Mortimer Kevin, Papi Alberto, Pavord Ian, Roche Nicolas, Salvi Sundeep, Sin Don D., Singh Dave, Stockley Robert, López Varela M. Victorina, Wedzicha Jadwiga A., and Vogelmeier Claus F.. Global Initiative for Chronic Obstructive Lung Disease 2023 Report: GOLD Executive Summary. Respirology (Carlton, Vic.), 28(4):316–338, April 2023. ISSN 1440-1843 1323-7799. doi: 10.1111/resp.14486. [DOI] [PubMed] [Google Scholar]
  • [4].Wang Zhufeng, Li Yun, Gao Yi, Fu Yu, Lin Junfeng, Lei Xuedong, Zheng Jinping, and Jiang Mei. Global, regional, and national burden of asthma and its attributable risk factors from 1990 to 2019: A systematic analysis for the Global Burden of Disease Study 2019. Respiratory Research, 24(1):169, June 2023. ISSN 1465-993X. doi: 10.1186/s12931-023-02475-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [5].Adeloye Davies, Chua Stephen, Lee Chinwei, Basquill Catriona, Papana Angeliki, Theodoratou Evropi, Nair Harish, Gasevic Danijela, Sridhar Devi, Campbell Harry, Chan Kit Yee, Sheikh Aziz, and Rudan Igor. Global and regional estimates of COPD prevalence: Systematic review and metaanalysis. Journal of global health, 5(2):020415, December 2015. ISSN 2047-2978 2047-2986. doi: 10.7189/jogh.05.020415. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [6].Miller Martin R, Hankinson J, Brusasco Vito, Burgos F, Casaburi R, Coates A, Crapo R, Enright P, van der Grinten CPM, Gustafsson P, et al. Standardisation of spirometry. European respiratory journal, 26(2):319–338, 2005. [DOI] [PubMed] [Google Scholar]
  • [7].Janssens W., Liu Y., Liu D., Kesten S., Tashkin D.P., Celli B.R., and Decramer M.. Quality and reproducibility of spirometry in COPD patients in a randomized trial (UPLIFT®). Respiratory Medicine, 107(9):1409–1416, 2013. ISSN 0954-6111. doi: 10.1016/j.rmed.2013.04.015. [DOI] [PubMed] [Google Scholar]
  • [8].Graham Brian L., Steenbruggen Irene, Miller Martin R., Barjaktarevic Igor Z., Cooper Brendan G., Hall Graham L., Hallstrand Teal S., Kaminsky David A., McCarthy Kevin, McCormack Meredith C., Oropez Cristine E., Rosenfeld Margaret, Stanojevic Sanja, Swanney Maureen P., and Thompson Bruce R.. Standardization of Spirometry 2019 Update. An Official American Thoracic Society and European Respiratory Society Technical Statement. American journal of respiratory and critical care medicine, 200(8):e70–e88, October 2019. ISSN 1535-4970 1073-449X. doi: 10.1164/rccm.201908-1590ST. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [9].Chen Ting, Kornblith Simon, Norouzi Mohammad, and Hinton Geoffrey. A simple framework for contrastive learning of visual representations. In Daumé Hal III and Singh Aarti, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 1597–1607. PMLR, July 2020. [Google Scholar]
  • [10].Sudlow Cathie, Gallacher John, Allen Naomi, Beral Valerie, Burton Paul, Danesh John, Downey Paul, Elliott Paul, Green Jane, Landray Martin, Liu Bette, Matthews Paul, Ong Giok, Pell Jill, Silman Alan, Young Alan, Sprosen Tim, Peakman Tim, and Collins Rory. UK biobank: An open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLOS Medicine, 12(3):1–10, March 2015. doi: 10.1371/journal.pmed.1001779. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [11].Shrine Nick, Guyatt Anna L., Erzurumluoglu A. Mesut, Jackson Victoria E., Hobbs Brian D., Melbourne Carl A., Batini Chiara, Fawcett Katherine A., Song Kijoung, Sakornsakolpat Phuwanat, Li Xingnan, Boxall Ruth, Reeve Nicola F., Obeidat Ma’en, Zhao Jing Hua, Wielscher Matthias, Weiss Stefan, Kentistou Katherine A., Cook James P., Sun Benjamin B., Zhou Jian, Hui Jennie, Karrasch Stefan, Imboden Medea, Harris Sarah E., Marten Jonathan, Enroth Stefan, Kerr Shona M., Surakka Ida, Vitart Veronique, Lehtimäki Terho, Allen Richard J., Bakke Per S., Beaty Terri H., Bleecker Eugene R., Bossé Yohan, Brandsma Corry-Anke, Chen Zhengming, Crapo James D., Danesh John, DeMeo Dawn L., Dudbridge Frank, Ewert Ralf, Gieger Christian, Gulsvik Amund, Hansell Anna L., Hao Ke, Hoffman Joshua D., Hokanson John E., Homuth Georg, Joshi Peter K., Joubert Philippe, Langenberg Claudia, Li Xuan, Li Liming, Lin Kuang, Lind Lars, Locantore Nicholas, Luan Jian’an, Mahajan Anubha, Maranville Joseph C., Murray Alison, Nickle David C., Packer Richard, Parker Margaret M., Paynton Megan L., Porteous David J., Prokopenko Dmitry, Qiao Dandi, Rawal Rajesh, Runz Heiko, Sayers Ian, Sin Don D., Smith Blair H., Artigas María Soler, Sparrow David, Tal-Singer Ruth, Timmers Paul R. H. J., Van den Berge Maarten, Whittaker John C., Woodruff Prescott G., Yerges-Armstrong Laura M., Troyanskaya Olga G., Raitakari Olli T., Kähönen Mika, Polašek Ozren, Gyllensten Ulf, Rudan Igor, Deary Ian J., Probst-Hensch Nicole M., Schulz Holger, James Alan L., Wilson James F., Stubbe Beate, Zeggini Eleftheria, Jarvelin Marjo-Riitta, Wareham Nick, Silverman Edwin K., Hayward Caroline, Morris Andrew P., Butterworth Adam S., Scott Robert A., Walters Robin G., Meyers Deborah A., Cho Michael H., Strachan David P., Hall Ian P., Tobin Martin D., and Wain Louise V.. New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries. Nature Genetics, 51(3):481–493, March 2019. ISSN 1546-1718. doi: 10.1038/s41588-018-0321-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [12].Regan Elizabeth A., Hokanson John E., Murphy James R., Make Barry, Lynch David A., Beaty Terri H., Curran-Everett Douglas, Silverman Edwin K., and Crapo James D.. Genetic Epidemiology of COPD (COPDGene) Study Design. COPD: Journal of Chronic Obstructive Pulmonary Disease, 7(1):32–43, February 2011. ISSN 1541-2555. doi: 10.3109/15412550903499522. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [13].LeCun Yann, Bengio Yoshua, and Hinton Geoffrey. Deep learning. Nature, 521(7553):436–444, 2015. [DOI] [PubMed] [Google Scholar]
  • [14].Quanjer Philip H., Stanojevic Sanja, Cole Tim J., Baur Xaver, Hall Graham L., Culver Bruce H., Enright Paul L., Hankinson John L., Ip Mary S.M., Zheng Jinping, and Stocks Janet. Multi-ethnic reference values for spirometry for the 3–95-yr age range: The global lung function 2012 equations. European Respiratory Journal, 40(6):1324–1343, 2012. ISSN 0903-1936. doi: 10.1183/09031936.00080312. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [15].Sakornsakolpat Phuwanat, Prokopenko Dmitry, Lamontagne Maxime, Reeve Nicola F, Guyatt Anna L, Jackson Victoria E, Shrine Nick, Qiao Dandi, Bartz Traci M, Kim Deog Kyeom, et al. Genetic landscape of chronic obstructive pulmonary disease identifies heterogeneous cell-type and phenotype associations. Nature Genetics, 51(3):494–505, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [16].Bhattacharjee Sudipto, Saha Banani, Bhattacharyya Parthasarathi, and Saha Sudipto. Classification of obstructive and non-obstructive pulmonary diseases on the basis of spirometry using machine learning techniques. Journal of Computational Science, 63:101768, 2022. ISSN 1877-7503. doi: 10.1016/j.jocs.2022.101768. [DOI] [Google Scholar]
  • [17].Cosentino Justin, Behsaz Babak, Alipanahi Babak, McCaw Zachary R., Hill Davin, Schwantes-An Tae-Hwi, Lai Dongbing, Carroll Andrew, Hobbs Brian D., Cho Michael H., McLean Cory Y., and Hormozdiari Farhad. Inference of chronic obstructive pulmonary disease with deep learning on raw spirograms identifies new genetic loci and improves risk models. Nature Genetics, April 2023. ISSN 1546-1718. doi: 10.1038/s41588-023-01372-4. [DOI] [PubMed] [Google Scholar]
  • [18].Bodduluri Sandeep, Nakhmani Arie, Reinhardt Joseph M., Wilson Carla G., McDonald Merry-Lynn, Rudraraju Ramaraju, Jaeger Byron C., Bhakta Nirav R., Castaldi Peter J., Sciurba Frank C., Zhang Chengcui, Bangalore Purushotham V., and Bhatt Surya P.. Deep neural network analyses of spirometry for structural phenotyping of chronic obstructive pulmonary disease. JCI Insight, 5(13), July 2020. doi: 10.1172/jci.insight.132781. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [19].Wang Yimin, Li Yicong, Chen Wenya, Zhang Changzheng, Liang Lijuan, Huang Ruibo, Jian Wenhua, Liang Jianling, Zhu Senhua, Tu Dandan, Gao Yi, Zhong Nanshan, and Zheng Jinping. Deep Learning for Automatic Upper Airway Obstruction Detection by Analysis of Flow-Volume Curve. Respiration; international review of thoracic diseases, 101(9):841–850, 2022. ISSN 1423-0356 0025-7931. doi: 10.1159/000524598. [DOI] [PubMed] [Google Scholar]
  • [20].Das Nilakash, Verstraete Kenneth, Stanojevic Sanja, Topalovic Marko, Aerts Jean-Marie, and Janssens Wim. Deep-learning algorithm helps to standardise ATS/ERS spirometric acceptability and usability criteria. The European respiratory journal, 56(6):2000603, December 2020. ISSN 1399-3003 0903-1936. doi: 10.1183/13993003.00603-2020. [DOI] [PubMed] [Google Scholar]
  • [21].Wang Yimin, Li Yicong, Chen Wenya, Zhang Changzheng, Liang Lijuan, Huang Ruibo, Liang Jianling, Tu Dandan, Gao Yi, Zheng Jinping, and Zhong Nanshan. Deep learning for spirometry quality assurance with spirometric indices and curves. Respiratory Research, 23(1):98, April 2022. ISSN 1465-993X. doi: 10.1186/s12931-022-02014-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [22].Velickovski Filip, Ceccaroni Luigi, Marti Robert, Burgos Felip, Gistau Concepción, Alsina-Restoy Xavier, and Roca Josep. Automated spirometry quality assurance: Supervised learning from multiple experts. IEEE Journal of Biomedical and Health Informatics, 22(1):276–284, 2018. doi: 10.1109/JBHI.2017.2713988. [DOI] [PubMed] [Google Scholar]
  • [23].Yun Taedong, Cosentino Justin, Behsaz Babak, McCaw Zachary R., Hill Davin, Luben Robert, Lai Dongbing, Bates John, Yang Howard, Schwantes-An Tae-Hwi, Zhou Yuchen, Khawaja Anthony P., Carroll Andrew, Hobbs Brian D., Cho Michael H., McLean Cory Y., and Hormozdiari Farhad. Unsupervised representation learning on high-dimensional clinical data improves genomic discovery and prediction. Nature Genetics, 56(8):1604–1613, August 2024. ISSN 1546-1718. doi: 10.1038/s41588-024-01831-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [24].Kingma Diederik P and Welling Max. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013. [Google Scholar]
  • [25].Vandevoorde J., Verbanck S., Schuermans D., Broekaert L., Devroey D., Kartounian J., and Vincken W.. Forced vital capacity and forced expiratory volume in six seconds as predictors of reduced total lung capacity. European Respiratory Journal, 31(2):391–395, 2008. ISSN 0903-1936. doi: 10.1183/09031936.00032307. [DOI] [PubMed] [Google Scholar]
  • [26].Bellia V., Sorino C., Catalano F., Augugliaro G., Scichilone N., Pistelli R., Pedone C., and Antonelli-Incalzi R.. Validation of FEV6 in the elderly: Correlates of performance and repeatability. Thorax, 63(1):60–66, January 2008. ISSN 1468-3296 0040-6376. doi: 10.1136/thx.2007.080572. [DOI] [PubMed] [Google Scholar]
  • [27].Jing Ji-Yong, Huang Tian-Cha, Cui Wei, Xu Feng, and Shen Hua-Hao. Should FEV1/FEV6 replace FEV1/FVC ratio to detect airway obstruction? A metaanalysis. Chest, 135(4):991–998, April 2009. ISSN 1931-3543 0012-3692. doi: 10.1378/chest.08-0723. [DOI] [PubMed] [Google Scholar]
  • [28].Allen S. C., Charlton C., Backen W., Warwick-Sanders M., and Yeung P.. Performing slow vital capacity in older people with and without cognitive impairment–is it useful? Age and ageing, 39(5):588–591, September 2010. ISSN 1468-2834 0002-0729. doi: 10.1093/ageing/afq084. [DOI] [PubMed] [Google Scholar]
  • [29].Vapnik Vladimir. The Nature of Statistical Learning Theory. Springer Science & Business Media, 2013. [Google Scholar]
  • [30].Bhatt Surya P., Balte Pallavi P., Schwartz Joseph E., Cassano Patricia A., Couper David, Jacobs David R. Jr, Kalhan Ravi, O’Connor George T., Yende Sachin, Sanders Jason L., Umans Jason G., Dransfield Mark T., Chaves Paulo H., White Wendy B., and Oelsner Elizabeth C.. Discriminative Accuracy of FEV1:FVC Thresholds for COPD-Related Hospitalization and Mortality. JAMA, 321(24):2438–2447, June 2019. ISSN 1538-3598 0098-7484. doi: 10.1001/jama.2019.7233. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [31].Jaiswal Ashish, Babu Ashwin Ramesh, Zadeh Mohammad Zaki, Banerjee Debapriya, and Makedon Fillia. A survey on contrastive self-supervised learning. Technologies, 9(1):2, 2020. [Google Scholar]
  • [32].Shurrab Saeed and Duwairi Rehab. Self-supervised learning methods and applications in medical imaging analysis: A survey. PeerJ Computer Science, 8:e1045, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [33].Hoseinzade Ehsan and Haratizadeh Saman. CNNpred: CNN-based stock market prediction using a diverse set of variables. Expert Systems with Applications, 129:273–285, 2019. ISSN 0957-4174. doi: 10.1016/j.eswa.2019.03.029. [DOI] [Google Scholar]
  • [34].Wang Kang, Li Kenli, Zhou Liqian, Hu Yikun, Cheng Zhongyao, Liu Jing, and Chen Cen. Multiple convolutional neural networks for multivariate time series prediction. Neurocomputing, 360:107–119, 2019. ISSN 0925-2312. doi: 10.1016/j.neucom.2019.05.023. [DOI] [Google Scholar]
  • [35].Foster Adam, Pukdee Rattana, and Rainforth Tom. Improving transformation invariance in contrastive representation learning. In International Conference on Learning Representations, 2021. [Google Scholar]
  • [36].Kingma Diederik P and Ba Jimmy. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014. [Google Scholar]
  • [37].Holm Sture. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70, 1979. ISSN 03036898, 14679469. [Google Scholar]
  • [38].Vogelmeier Claus F., Criner Gerard J., Martinez Fernando J., Anzueto Antonio, Barnes Peter J., Bourbeau Jean, Celli Bartolome R., Chen Rongchang, Decramer Marc, Fabbri Leonardo M., Frith Peter, Halpin David M. G., Varela M. Victorina López, Nishimura Masaharu, Roche Nicolas, Rodriguez-Roisin Roberto, Sin Don D., Singh Dave, Stockley Robert, Vestbo Jørgen, Wedzicha Jadwiga A., and Agustí Alvar. Global Strategy for the Diagnosis, Management, and Prevention of Chronic Obstructive Lung Disease 2017 Report. GOLD Executive Summary. American journal of respiratory and critical care medicine, 195(5):557–582, March 2017. ISSN 1535-4970 1073-449X. doi: 10.1164/rccm.201701-0218PP. [DOI] [PubMed] [Google Scholar]
  • [39].McInnes Leland, Healy John, and Melville James. UMAP: Uniform manifold approximation and projection for dimension reduction, 2018.
  • [40].Guidotti Riccardo, Monreale Anna, Ruggieri Salvatore, Turini Franco, Giannotti Fosca, and Pedreschi Dino. A survey of methods for explaining black box models. ACM Computing Surveys, 51(5), August 2018. ISSN 0360-0300. doi: 10.1145/3236009. [DOI] [Google Scholar]
  • [41].Ribeiro Marco Tulio, Singh Sameer, and Guestrin Carlos. “Why Should I Trust You?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, pages 1135–1144, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 978-1-4503-4232-2. doi: 10.1145/2939672.2939778. [DOI] [Google Scholar]
  • [42].Lundberg Scott M and Lee Su-In. A unified approach to interpreting model predictions. In Guyon I., Von Luxburg U., Bengio S., Wallach H., Fergus R., Vishwanathan S., and Garnett R., editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. [Google Scholar]
  • [43].Hossain Md Imran, Zamzmi Ghada, Mouton Peter R., Salekin Md Sirajus, Sun Yu, and Goldgof Dmitry. Explainable AI for medical data: Current methods, limitations, and future directions. ACM Computing Surveys, December 2023. ISSN 0360-0300. doi: 10.1145/3637487. [DOI] [Google Scholar]
  • [44].Torop Max, Masoomi Aria, Hill Davin, Kose Kivanc, Ioannidis Stratis, and Dy Jennifer. SmoothHess: ReLU network feature interactions via Stein’s lemma. Advances in Neural Information Processing Systems, 36, 2024. [Google Scholar]
  • [45].Lundberg Scott M., Erion Gabriel, Chen Hugh, DeGrave Alex, Prutkin Jordan M., Nair Bala, Katz Ronit, Himmelfarb Jonathan, Bansal Nisha, and Lee Su-In. From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1):56–67, January 2020. ISSN 2522-5839. doi: 10.1038/s42256-019-0138-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [46].Hill Davin, Masoomi Aria, Torop Max, Ghimire Sandesh, and Dy Jennifer. Boundary-aware uncertainty for feature attribution explainers. In International Conference on Artificial Intelligence and Statistics, pages 55–63. PMLR, 2024. [Google Scholar]
  • [47].Frye Christopher, Rowat Colin, and Feige Ilya. Asymmetric Shapley Values: Incorporating causal knowledge into model-agnostic explainability. Advances in Neural Information Processing Systems, 33:1229–1239, 2020. [Google Scholar]
  • [48].Strumbelj Erik and Kononenko Igor. Explaining Prediction Models and Individual Predictions with Feature Contributions. Knowledge and Information Systems, 41(3):647–665, December 2014. ISSN 0219-1377. doi: 10.1007/s10115-013-0679-x. [DOI] [Google Scholar]
  • [49].Masoomi Aria, Hill Davin, Xu Zhonghui, Hersh Craig P., Silverman Edwin K., Castaldi Peter J., Ioannidis Stratis, and Dy Jennifer. Explanations of black-box models based on directional feature interactions. In 10th International Conference on Learning Representations, ICLR 2022. OpenReview.net, 2022. [PMC free article] [PubMed] [Google Scholar]
  • [50].Goodfellow Ian, Pouget-Abadie Jean, Mirza Mehdi, Xu Bing, Warde-Farley David, Ozair Sherjil, Courville Aaron, and Bengio Yoshua. Generative adversarial networks. Communications of the ACM, 63(11):139–144, 2020. [Google Scholar]

Articles from Research Square are provided here courtesy of American Journal Experts
