Abstract
Background:
Chronic obstructive pulmonary disease (COPD) is a heterogeneous condition with respect to onset, progression and response to therapy. Incorporating clinical- and imaging-based features to refine COPD phenotypes provides valuable information beyond that obtained from traditional clinical evaluations. We characterized the spectrum of COPD-related phenotypes in a sample of former and current smokers and evaluated how these sub-groups differ with respect to sociodemographic characteristics, COPD-related co-morbidities, and subsequent risk of lung cancer.
Methods:
Hierarchical clustering was used to define distinct COPD-related sub-groups among white (N=659) and African American (N=520) male and female INHALE participants without lung cancer (controls) who completed a chest CT scan, an interview and a spirometry test. Seven variables were used to define clusters: pack years, quit years, FEV1/FVC and % predicted FEV1, plus three quantitative CT imaging measures (% emphysema, % air trapping and mean lung density ratio). Cluster definitions were then applied to INHALE lung cancer cases (N=576) to evaluate lung cancer risk.
Results:
Five clusters were identified that differed significantly with respect to sociodemographic (e.g., race, age) and clinical (e.g., BMI, limitations due to breathing difficulties) characteristics. When clusters were ordered from most to least detrimental lung function, risk of lung cancer increased with increasingly detrimental clusters.
Conclusions:
Measures of lung function vary considerably among smokers, and are not fully explained by smoking intensity.
Impact:
Combining clinical (spirometry) and radiologic (quantitative CT) measures of COPD defines a spectrum of lung disease that differentially predicts lung cancer risk across patient clusters.
Keywords: COPD, lung cancer, quantitative CT, smoking, spirometry
Introduction
Chronic obstructive pulmonary disease (COPD) is the third most common cause of death in the United States and a major risk factor for lung cancer, the leading cause of cancer deaths. Spirometry is used to diagnose COPD but quantifies only a single aspect of its pathophysiology, even though COPD is heterogeneous with respect to onset, spatial distribution and progression. Other factors, such as exacerbations, co-morbidities and physical attributes like body mass index (BMI) or exercise endurance, have been explored as a means to further refine COPD sub-phenotypes.1–4 Quantitative imaging based on low-dose chest CT scans allows objective measurement of radiologic features of COPD, and the regional distribution of COPD derived from quantitative CT (qCT) measures has been shown to provide clinically relevant information beyond spirometry. Whole-lung qCT measures of emphysema and air trapping correlate with spirometric measures of lung function, even in pre-symptomatic individuals, and also predict risk of lung cancer.5–10
The INHALE study was designed to evaluate different measures of COPD in relation to lung cancer risk in lung cancer cases and population-based controls. In the present study, we explored COPD phenotyping in 1,179 ever-smoking controls (520 African American and 659 white) and 576 ever-smoking lung cancer cases (228 African American and 348 white) in INHALE who underwent spirometry and a chest CT scan and completed an interview. Hierarchical clustering was performed in controls to characterize the spectrum of lung phenotypes and to identify lung function sub-groups using both clinical (i.e., pack years, spirometry) and radiologic (qCT) measures. We then evaluated the association of these cluster-defined sub-groups with race, other sociodemographic and clinical characteristics, and COPD-related co-morbidities, including lung cancer.
Materials and Methods
Study Participants
The INHALE study was initiated in 2012 and has been previously described.10 Briefly, lung cancer cases were enrolled at the Karmanos Cancer Institute or Henry Ford Health System (HFHS) within 12 months of diagnosis. Volunteer controls were enrolled from the metropolitan Detroit area. Cases and controls were 21–89 years of age, were able to complete the CT scan, and had never taken amiodarone or been diagnosed with bronchiectasis or cystic fibrosis. Additionally, controls had never had surgical removal of any portion of either lung and had never been diagnosed with lung cancer, and they carried health insurance (in the event that medical follow-up was required based on a clinical finding on the CT scan or spirometry). Written informed consent was obtained from all subjects prior to participation. Participants completed an interview, a low-dose chest CT scan and a pulmonary function test (PFT). Spirometry was performed at enrollment; for some cases who were unable to complete spirometry at the time of interview, PFT results were abstracted from medical records from around the time of diagnosis. The Wayne State University (WSU) and HFHS Institutional Review Boards approved the procedures used in collecting and processing participant information, which were in accordance with the Declaration of Helsinki. The analyses presented here included only white and African American current or former smoking control participants and lung cancer cases.
Trait Definitions
Demographic data, smoking history, medication use and other clinical characteristics (e.g., weight/height, physical activity, diet) were ascertained from interviews. Pack years were calculated by multiplying the number of years smoked by the average number of cigarettes smoked per day divided by 20 (i.e., packs per day). Family history of lung cancer was recorded as “yes” if the participant reported at least one first-degree relative diagnosed with lung cancer. PFTs were performed by trained technicians in accordance with ATS guidelines.11 FEV1 and FVC were measured and the FEV1/FVC ratio was calculated. Predicted normal values were calculated according to sex, age, height and race using the Third National Health and Nutrition Examination Survey (NHANES III) reference equations.12 Board-certified pulmonologists, blinded to study group, reviewed the spirometry results for quality assurance. Severity of COPD was classified according to Global Initiative for Chronic Obstructive Lung Disease (GOLD) staging.13
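The two derived quantities above can be illustrated with a short Python sketch (the study's analyses were run in R; this is not the authors' code). The helper names are hypothetical, and the GOLD grading assumes the standard fixed-ratio rule (FEV1/FVC < 0.70) with % predicted FEV1 cut-points of 80, 50 and 30, treating grade 0 ("none") as no obstruction by that rule. The NHANES III prediction equations needed for % predicted FEV1 are not reproduced here.

```python
def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Pack years = years smoked x packs per day (20 cigarettes per pack)."""
    return years_smoked * (cigarettes_per_day / 20.0)


def gold_grade(fev1: float, fvc: float, fev1_pct_pred: float) -> int:
    """Hypothetical helper: spirometric GOLD grade from FEV1, FVC and % predicted FEV1.

    Assumes the fixed-ratio rule (FEV1/FVC < 0.70) for airflow obstruction and
    returns 0 ("none") when that criterion is not met.
    """
    if fev1 / fvc >= 0.70:
        return 0  # no obstruction by the fixed-ratio rule
    if fev1_pct_pred >= 80:
        return 1  # mild
    if fev1_pct_pred >= 50:
        return 2  # moderate
    if fev1_pct_pred >= 30:
        return 3  # severe
    return 4      # very severe


# Example: 15 cigarettes/day for 40 years; FEV1 = 1.9 L, FVC = 3.0 L, 62% predicted
print(pack_years(15, 40))          # 30.0
print(gold_grade(1.9, 3.0, 62.0))  # 2 (FEV1/FVC = 0.63)
```

Computing the ratio inside the helper keeps the 0.70 threshold applied to the same FEV1 and FVC values that were measured, rather than to a separately rounded ratio.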
CT scans were acquired at both full inspiration and full expiration under a protocol standardized across scanners.14 Radiologic (qCT) measures of COPD were generated with VIDA Diagnostics software (www.vidadiagnostics.com). Whole-lung qCT measures included percent air trapping (qCT % air trapping), defined as the percentage of lung voxels below −856 Hounsfield units (HU) on expiration; percent emphysema (qCT % emphysema), defined as the percentage of lung voxels below −950 HU on inspiration; and the mean lung density (MLD) ratio, defined as expiratory MLD divided by inspiratory MLD. The MLD ratio is considered a scanner-independent measure of air trapping associated with emphysema.15
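The three whole-lung definitions can be sketched as follows. This is a minimal illustration, not VIDA's implementation: it assumes the lungs have already been segmented, that both lungs are pooled into one array of HU values per scan, and that MLD is the mean HU of those voxels. The helper name `whole_lung_qct` is hypothetical.

```python
import numpy as np


def whole_lung_qct(insp_hu, exp_hu) -> dict:
    """Hypothetical helper computing whole-lung qCT summaries.

    insp_hu, exp_hu: HU values of the voxels inside the segmented lungs
    (both lungs pooled) on the inspiratory and expiratory scans.
    """
    insp = np.asarray(insp_hu, dtype=float)
    expi = np.asarray(exp_hu, dtype=float)
    return {
        # % of inspiratory lung voxels below -950 HU
        "pct_emphysema": 100.0 * np.mean(insp < -950),
        # % of expiratory lung voxels below -856 HU
        "pct_air_trapping": 100.0 * np.mean(expi < -856),
        # expiratory MLD / inspiratory MLD, with MLD taken as mean HU
        "mld_ratio": expi.mean() / insp.mean(),
    }


m = whole_lung_qct([-960, -900, -870, -850], [-870, -800, -760, -700])
print(m["pct_emphysema"], m["pct_air_trapping"], round(m["mld_ratio"], 3))  # 25.0 25.0 0.874
```

Because both mean densities are negative in HU, the ratio is positive and rises toward 1 as the expiratory scan stays nearly as lucent as the inspiratory one, which is why clusters with heavy air trapping show MLD ratios near 0.95.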
Statistical Analysis
Tests of homogeneity by race or by cluster were performed using chi-squared or Fisher’s exact tests for categorical variables and t-tests or ANOVA for continuous variables. Because percent emphysema was skewed, it was summarized by its median and IQR, and homogeneity was evaluated with the non-parametric Wilcoxon rank sum test. Hierarchical clustering was performed to identify groups of individuals who are more similar to each other than to members of other groups, based on seven lung disease-related variables (pack years, quit years, FEV1/FVC, FEV1 % predicted, qCT % emphysema, qCT % air trapping and MLD ratio). Because the measures differ in scale and variability, the analyses were performed on standardized variables. Between-cluster dissimilarities were calculated using Ward’s method, which measures the distance between two groups as the increase in the within-group sum of squares that results from merging them. This merging cost was used as an objective measure to aid in selecting the optimal number of clusters. Linear discriminant analysis was used to evaluate separation among clusters; only the first two linear discriminants were used to plot this separation. Statistical significance was defined as p < 0.05.
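The standardize-then-Ward step can be sketched with SciPy's `linkage` and `fcluster` functions. This is a Python illustration rather than the authors' R code, and the function name and toy data are hypothetical. For Ward linkage on Euclidean distances, SciPy reports merge heights whose square, halved, equals the increase in within-cluster sum of squares, i.e., the merging cost described above; the test checks this identity on the final merge.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def ward_cluster(X: np.ndarray, k: int):
    """Standardize each column, build a Ward tree, and cut it into k clusters."""
    z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # per-variable z-scores
    tree = linkage(z, method="ward", metric="euclidean")
    # Merging cost of each step: increase in within-cluster sum of squares,
    # which equals height**2 / 2 for SciPy's Ward heights.
    merge_cost = tree[:, 2] ** 2 / 2.0
    labels = fcluster(tree, t=k, criterion="maxclust")  # labels 1..k
    return z, tree, merge_cost, labels


# Toy data: 3 well-separated groups measured on 7 variables.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(40, 7)) for c in (0.0, 4.0, 8.0)])
z, tree, merge_cost, labels = ward_cluster(X, k=3)
# The last few (largest) merge costs; a sharp jump suggests where to cut the tree.
print(np.round(merge_cost[-5:], 1))
```

Standardizing first matters here: without it, pack years (SD around 20) would dominate FEV1/FVC (SD around 0.1) in every Euclidean distance.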
The optimal clustering result was ordered from ‘most detrimental’ to ‘least detrimental’ through a simple weighted average of the individual ‘lung disease’ Z-scores. Each variable was assigned a weight of 1 if higher values indicate worse lung disease (pack years and the qCT measures) or −1 if higher values indicate less lung disease (quit years and the spirometry measures), so that a higher score corresponded to a more detrimental lung disease profile. Clusters were then applied to lung cancer cases: each measure in cases was standardized using its distribution in controls, the Euclidean distance from each case to each cluster was estimated, and each case was assigned to the control-based cluster at the minimum distance. Logistic regression modeling was then used to estimate the risk of lung cancer associated with the (ordered) lung function sub-groups.
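The case-assignment step can be sketched as a nearest-centroid rule. We assume here that the distance "from each cluster" is the Euclidean distance to the cluster's mean in standardized space; the source does not state this explicitly, and `assign_cases` is a hypothetical name.

```python
import numpy as np


def assign_cases(controls: np.ndarray, control_labels: np.ndarray, cases: np.ndarray) -> np.ndarray:
    """Hypothetical helper: give each case the label of the nearest control cluster.

    Cases are standardized with the control means and SDs, and the distance to a
    cluster is taken (an assumption) as the Euclidean distance to its centroid.
    """
    mu = controls.mean(axis=0)
    sd = controls.std(axis=0, ddof=1)
    z_controls = (controls - mu) / sd
    z_cases = (cases - mu) / sd  # control distribution, not the case distribution
    ids = np.unique(control_labels)
    centroids = np.vstack([z_controls[control_labels == c].mean(axis=0) for c in ids])
    # dist[i, j] = Euclidean distance from case i to the centroid of cluster ids[j]
    dist = np.linalg.norm(z_cases[:, None, :] - centroids[None, :, :], axis=2)
    return ids[np.argmin(dist, axis=1)]


controls = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels = np.array([1, 1, 2, 2])
cases = np.array([[1.0, 0.0], [9.0, 12.0]])
print(assign_cases(controls, labels, cases))  # [1 2]
```

Standardizing cases on the control distribution keeps both groups on the same scale; standardizing cases on their own (sicker) distribution would shift them toward the middle of the control clusters.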
All analyses were performed using R v3.4 statistical software.
Results
Sample Description
A description of 659 white and 520 African American INHALE controls is presented in Tables 1 and 2. Whites and African Americans differed significantly by educational level (≤ high school/> high school), BMI, smoking status (former/current), frequent physical activity (< 3 times per week/≥ 3 times per week), reported limitations due to breathing difficulties (yes/no) and regular aspirin/NSAID use (yes/no). Of the seven variables used for clustering (Table 2), pack years, quit years, qCT % emphysema and MLD ratio differed significantly by race.
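The homogeneity tests for the binary rows of Table 1 can be sketched in plain Python. The source does not state whether a continuity correction was applied; this sketch omits it (an assumption), and with that choice it gives p values that round to those reported for gender (0.111) and family history of lung cancer (0.316). The function name is hypothetical.

```python
import math


def chisq_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p-value); with 1 degree of freedom the upper-tail
    probability of the chi-squared distribution is erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # expected count under homogeneity
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Gender by race: rows Male/Female, columns White/African American
print(round(chisq_2x2(312, 222, 347, 298)[1], 3))  # 0.111
# Family history of lung cancer (No/Yes) by race
print(round(chisq_2x2(547, 442, 112, 77)[1], 3))   # 0.316
```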
Table 1.
Description of INHALE controls used for clustering (N=1179). Categorical measures presented as N (%), continuous measures presented as mean (SD).
| Variable | Whites (n=659) | African Americans (n=520) | p (test of homogeneity) |
|---|---|---|---|
| Gender | | | 0.111 |
| Male | 312 (47.3) | 222 (42.7) | |
| Female | 347 (52.7) | 298 (57.3) | |
| Age (years) | 61.1 (9.6) | 60.1 (8.9) | 0.057 |
| Education | | | <0.001 |
| ≤ high school | 191 (29.0) | 233 (44.8) | |
| > high school | 468 (71.0) | 287 (55.2) | |
| BMI (kg/m²) | 28.6 (5.8) | 29.8 (6.7) | 0.002 |
| Smoking status | | | <0.001 |
| Former | 302 (45.8) | 138 (26.5) | |
| Current | 357 (54.2) | 382 (73.5) | |
| Family history of lung cancer | | | 0.316 |
| No | 547 (83.0) | 442 (85.2) | |
| Yes | 112 (17.0) | 77 (14.8) | |
| Frequent physical activityᵃ | | | 0.034 |
| < 3x/week | 288 (43.7) | 259 (49.9) | |
| ≥ 3x/week | 371 (56.3) | 260 (50.1) | |
| Limitations due to breathing difficultiesᵇ | | | <0.001 |
| No | 524 (79.6) | 356 (68.7) | |
| Yes | 134 (20.4) | 162 (31.3) | |
| Regular aspirin/NSAID useᶜ | | | 0.001 |
| No | 179 (27.3) | 189 (36.6) | |
| Yes | 478 (72.7) | 328 (63.4) | |
| Alcohol consumption (drinks/wk) | 4.4 (8.0) | 4.3 (10.6) | 0.835 |
| GOLD stage | | | 0.211 |
| 0 (none) | 461 (70.0) | 331 (63.6) | |
| 1 (mild) | 30 (4.5) | 30 (5.8) | |
| 2 (moderate) | 107 (16.2) | 107 (20.6) | |
| 3 (severe) | 54 (8.2) | 45 (8.6) | |
| 4 (very severe) | 7 (1.1) | 7 (1.4) | |
ᵃ Frequent physical activity defined as activities or exercises (other than work) performed ≥ 3x/week for one month or more in the past year.
ᵇ Any reported limitation in usual activities due to breathing difficulties or shortness of breath.
ᶜ Regular aspirin/NSAID use defined as taking adult aspirin, baby aspirin or an NSAID ≥ 3x/week for one month or more.
Table 2.
Description of variables used in hierarchical clustering of white and African American INHALE controls (N=1179). Variables summarized as mean (SD) except where noted.
| Variable | Whites (n=659) | African Americans (n=520) | p (test of homogeneity) |
|---|---|---|---|
| Pack years | 38.0 (26.2) | 27.2 (19.4) | <0.001 |
| Quit years (former smokers) | 17.0 (12.2) | 14.6 (11.2) | 0.049 |
| FEV1/FVC | 0.72 (0.11) | 0.72 (0.13) | 0.497 |
| % predicted FEV1 | 77.3 (20.1) | 76.6 (21.1) | 0.564 |
| qCT % emphysemaᵃ, median (IQR) | 1.2 (2.3) | 0.9 (2.2) | 0.001 |
| qCT % air trappingᵇ | 15.6 (15.8) | 15.9 (17.7) | 0.756 |
| qCT MLD ratioᶜ | 0.87 (0.06) | 0.88 (0.07) | 0.038 |
ᵃ Percent of lung voxels < −950 HU on inspiration across both lungs on quantitative CT (qCT).
ᵇ Percent of lung voxels < −856 HU on expiration across both lungs on qCT.
ᶜ Mean lung density (MLD) ratio = expiratory MLD / inspiratory MLD on qCT.
Sub-groups of smokers based on clustering in controls
We evaluated COPD phenotypes through clustering analysis in the 1,179 INHALE participants who were free of lung cancer (controls). Based on the increase in merging cost associated with each consecutive clustering step, both five- and seven-cluster solutions were considered (see Supplementary Figure S1). After examining how well each solution separated the spirometry- and qCT-based measures, k=5 was selected as the optimal clustering. Separation of the five clusters according to the first two linear discriminants is presented in Supplementary Figure S2; the eigenvalues of these first two discriminants accounted for 77.0% of the total discriminant variance.
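The share of separation carried by each linear discriminant can be sketched with NumPy alone. This assumes the reported 77.0% is the proportion of the discriminant eigenvalues (the eigenvalues of the within-cluster scatter inverse times the between-cluster scatter) captured by the first two discriminants; the function name and toy data are hypothetical, and the authors' R implementation may scale the matrices differently (which would not change the proportions).

```python
import numpy as np


def discriminant_proportions(X: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Share of the discriminant eigenvalues carried by each linear discriminant.

    Solves S_B v = lambda S_W v (between- vs within-cluster scatter) by
    whitening with the Cholesky factor of S_W, then normalizes the eigenvalues.
    """
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    s_within = np.zeros((p, p))
    s_between = np.zeros((p, p))
    for c in np.unique(labels):
        Xc = X[labels == c]
        centered = Xc - Xc.mean(axis=0)
        s_within += centered.T @ centered
        d = (Xc.mean(axis=0) - grand_mean)[:, None]
        s_between += len(Xc) * (d @ d.T)
    L_inv = np.linalg.inv(np.linalg.cholesky(s_within))  # s_within = L @ L.T
    M = L_inv @ s_between @ L_inv.T  # symmetric; same eigenvalues as inv(S_W) @ S_B
    eig = np.clip(np.sort(np.linalg.eigvalsh(M))[::-1], 0.0, None)  # drop round-off negatives
    return eig / eig.sum()


# Toy data: 5 clusters in 7 variables, mirroring the dimensions used here.
rng = np.random.default_rng(1)
centers = rng.normal(scale=3.0, size=(5, 7))
X = np.vstack([rng.normal(size=(50, 7)) + centers[g] for g in range(5)])
labels = np.repeat(np.arange(5), 50)
props = discriminant_proportions(X, labels)
print(np.round(props, 3))         # at most k - 1 = 4 non-zero entries
print(round(props[:2].sum(), 3))  # share captured by the first two discriminants
```

With five clusters there are at most four informative discriminants, so a two-dimensional plot necessarily omits some separation; the proportion printed last quantifies how much.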
To gain insight into the relative position of these clusters along the spectrum of lung disease, a weighted sum of variable Z-score means within each cluster was used to rank clusters from most detrimental (cluster 1) to least detrimental (cluster 5) with respect to lung disease. Cluster profiles are depicted in a mean trends chart in Figure 1, and actual mean values are listed in Table 3.
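The ranking rule can be sketched in Python as follows (an illustration, not the authors' code; the column order and function name are hypothetical). Averaging rather than summing the weighted Z-score means only rescales the score, so both give the same cluster order.

```python
import numpy as np

# Assumed column order of the standardized matrix; +1 = higher is more
# detrimental, -1 = higher is protective (weights as described in Methods).
VARIABLES = ["pack_years", "quit_years", "fev1_fvc", "fev1_pct_pred",
             "pct_emphysema", "pct_air_trapping", "mld_ratio"]
WEIGHTS = np.array([1, -1, -1, -1, 1, 1, 1], dtype=float)


def rank_clusters(z: np.ndarray, labels: np.ndarray) -> dict:
    """Map each original cluster label to a rank: 1 = most detrimental."""
    ids = np.unique(labels)
    # Detriment score: mean over variables of weight x cluster-mean z-score
    scores = np.array([np.mean(WEIGHTS * z[labels == c].mean(axis=0)) for c in ids])
    order = ids[np.argsort(-scores)]  # highest (worst) score first
    return {int(c): rank for rank, c in enumerate(order, start=1)}


# Toy example: one standardized profile per cluster (columns as in VARIABLES)
z = np.array([
    [2.0, -1.0, -1.5, -1.5, 2.0, 2.0, 1.5],   # heavy smoking, poor lung function
    [-0.5, 2.0, 0.5, 0.5, 0.0, 0.0, 0.0],     # long-term quitter
    [0.3, -0.5, 0.5, 0.5, -0.5, -0.5, -0.5],  # preserved function
])
labels = np.array([7, 3, 9])
print(rank_clusters(z, labels))  # {7: 1, 9: 2, 3: 3}
```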
Figure 1.
Hierarchical clustering results for 1179 INHALE controls. Mean trends are shown for the variables used in clustering. Clusters were ordered and numbered from most detrimental lung disease profile (cluster 1) to least detrimental profile (cluster 5), based on variable means within each cluster. Arrows indicate the standardized cluster mean of each variable relative to the overall mean (μ = 0): ↑/↓ = 0.2–1 SD above/below the overall mean; ↑↑/↓↓ = 1–1.9 SDs above/below; ↑↑↑/↓↓↓ = > 2 SDs above/below; -- = within 0.1 SD of the overall mean. Red indicates a detrimental trend and blue a beneficial trend.
Table 3.
Hierarchical clustering of INHALE current/former smokers with quantitative CT (qCT) and spirometry data (N=1179). Clustering was performed on standardized variables; actual values (cluster means and SDs) are presented. Clusters are ordered from most detrimental lung disease profile (cluster 1) to least detrimental (cluster 5).
| Variable | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 |
|---|---|---|---|---|---|
| N | 64 | 232 | 73 | 609 | 201 |
| Pack years | 47.5 (24.1) | 38.4 (31.4) | 26.0 (18.7) | 34.2 (20.9) | 22.5 (19.6) |
| Quit years | 2.6 (4.9) | 1.1 (3.1) | 6.2 (10.1) | 1.8 (3.6) | 25.9 (9.9) |
| FEV1/FVC | 0.53 (0.11) | 0.69 (0.10) | 0.47 (0.11) | 0.77 (0.07) | 0.77 (0.07) |
| % predicted FEV1 | 53.9 (18.6) | 67.0 (16.6) | 49.1 (16.2) | 83.5 (16.9) | 86.5 (17.6) |
| qCT % emphysemaᵃ | 16.3 (7.3) | 2.6 (2.4) | 1.5 (1.6) | 1.2 (1.4) | 2.4 (2.5) |
| qCT % air trappingᵇ | 51.3 (13.6) | 30.6 (16.9) | 11.2 (9.6) | 7.1 (6.7) | 14.9 (13.5) |
| qCT MLD ratioᶜ | 0.94 (0.04) | 0.95 (0.05) | 0.86 (0.06) | 0.84 (0.05) | 0.86 (0.06) |
ᵃ Percent of lung voxels < −950 HU on inspiration across both lungs.
ᵇ Percent of lung voxels < −856 HU on expiration across both lungs.
ᶜ Mean lung density (MLD) ratio = expiratory MLD / inspiratory MLD.
Cluster 1 (N=64, 5.4%) was defined by heavier smoking (higher pack years, lower quit years), very poor lung function on spirometry (FEV1/FVC < 0.70 and very low % predicted FEV1) and a very poor imaging phenotype (very high qCT % emphysema, % air trapping and MLD ratio).
Cluster 2 (N=232, 19.7%) included individuals with greater smoking intensity (above average pack years and very low average quit years), poor lung function on spirometry and poor imaging phenotype.
Cluster 3 (N=73, 6.2%) was defined by relatively lighter smokers with very poor lung function on spirometry but below average levels of qCT air trapping and emphysema.
Cluster 4 (N=609, 51.7%) was defined by moderate smoking intensity with little evidence of impaired lung function on either spirometry (FEV1/FVC > 0.7 and above average % predicted FEV1) or qCT (low % emphysema, % air trapping and MLD ratio).
Cluster 5 (N=201, 17.0%) included former smokers with low pack years, above average lung function on spirometry (FEV1/FVC > 0.7 and high mean % predicted FEV1) but average levels of qCT emphysema, air trapping and MLD ratio.
Sociodemographic and clinical cluster profiles
Cluster sociodemographic and clinical characteristics are presented in Table 4.
Table 4.
Sociodemographic and clinical characteristics by cluster of INHALE current/former smokers with quantitative CT (qCT) and spirometry data (N=1179). Clusters are ordered from most detrimental lung disease profile (cluster 1) to least detrimental (cluster 5). Dichotomous variables presented as N (%), continuous variables presented as mean (SD).
| Variable | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | p (homogeneity) |
|---|---|---|---|---|---|---|
| N | 64 | 232 | 73 | 609 | 201 | |
| Race (African American) | 31 (48.4) | 117 (50.4) | 41 (56.2) | 274 (45.0) | 57 (28.4) | <0.001 |
| Gender (female) | 27 (42.2) | 119 (51.3) | 36 (49.3) | 347 (57.0) | 116 (57.7) | 0.092 |
| Age (years) | 68.2 (7.7) | 62.9 (8.2) | 59.1 (9.4) | 58.1 (8.8) | 64.0 (9.4) | <0.001 |
| Education (> high school) | 36 (56.3) | 124 (53.5) | 50 (68.5) | 392 (64.4) | 153 (76.1) | <0.001 |
| Family history of lung cancer | 11 (17.2) | 34 (14.7) | 8 (11.0) | 101 (16.6) | 35 (17.4) | 0.703 |
| History of asthma | 7 (11.7) | 40 (17.5) | 10 (13.7) | 94 (15.5) | 53 (26.4) | 0.005 |
| Current smoker | 42 (65.6) | 195 (84.1) | 44 (60.3) | 458 (75.2) | 0 (0.0) | <0.001 |
| Alcohol consumption (drinks/wk) | 3.5 (5.3) | 4.1 (7.5) | 5.5 (9.6) | 4.9 (11.1) | 2.8 (4.7) | 0.043 |
The proportion of African Americans differed significantly across clusters, ranging from 28.4% in cluster 5 (N=57) to 56.2% in cluster 3 (N=41) (p<0.001). Age and education level (high school (HS) diploma or less versus more than a HS diploma) also differed significantly across clusters (p<0.001 for both), with the proportion of participants educated beyond high school generally higher in less detrimental clusters (from 53.5% in cluster 2 to 76.1% in cluster 5). The proportion of individuals reporting a history of asthma also differed across clusters (p=0.005): it was lowest in the most detrimental cluster (11.7% in cluster 1) and highest in the least detrimental cluster (26.4% in cluster 5). Consistent with the trends in quit years among clusters, the proportion of current smokers was lowest in cluster 5 (no current smokers; highest mean quit years, 25.9) and highest in cluster 2 (84% current smokers; lowest mean quit years, 1.1).
Trends in COPD-related co-morbidities among clusters
Among the COPD-related co-morbidities available from INHALE data, BMI, limitations due to breathing difficulty and inflammatory conditions (gout, lupus, rheumatoid arthritis and sarcoidosis) differed significantly across clusters, while chronic pain conditions (osteoarthritis and fibromyalgia) and diabetes differed marginally (Supplementary Table S1). Mean BMI was lowest in cluster 1 (mean = 24.6, SD = 3.8) and highest in cluster 4 (mean = 30.0, SD = 6.3). The proportion of individuals with physical limitations due to breathing difficulties decreased with less detrimental lung disease clusters, with the highest proportion in cluster 1 (42.2%, N=27) and the lowest in cluster 5 (12.9%, N=26).
Lung cancer risk among lung disease clusters
After cluster definitions were applied to INHALE lung cancer cases, the crude odds ratios (ORs) of lung cancer showed a trend across clusters, with higher odds of lung cancer in more detrimental lung disease clusters relative to cluster 5 (the reference group). These trends persisted after adjustment for age, race, gender and BMI (Table 5). The odds of lung cancer in cluster 1 were more than 2.6 times the odds in cluster 5 (OR=2.65, 95% CI: 1.71, 4.10), and the odds in cluster 2 were more than 2.3 times the odds in cluster 5 (OR=2.33, 95% CI: 1.66, 3.26). The odds of lung cancer were also significantly increased in cluster 3 compared with cluster 5 (OR=1.68, 95% CI: 1.04, 2.73), but did not differ significantly between cluster 4 and cluster 5 (OR=0.92, 95% CI: 0.67, 1.29). When cluster order was treated as a continuous variable, the odds of lung cancer decreased by 28% with each step toward a less detrimental cluster (p<0.0001).
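The odds ratios above come from logistic regression. Individual-level covariates are not available here, so the sketch below (a NumPy Newton-Raphson fit, not the authors' R code; `fit_logistic` is a hypothetical name) uses only the per-cluster control counts (Table 3) and case counts (Table 5). With cluster 5 as the reference it reproduces the crude ORs and their Wald 95% CIs; the adjusted ORs would additionally require age, race, gender and BMI columns in the design matrix. The second model codes cluster rank as a single numeric term, so exp(slope) is the OR per one-step move toward a less detrimental cluster; the reported 28% decrease per step corresponds to an adjusted OR of about 0.72.

```python
import numpy as np


def fit_logistic(X: np.ndarray, y: np.ndarray, n_iter: int = 50, tol: float = 1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X must include an intercept column. Returns coefficients and Wald SEs.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (prob * (1.0 - prob))[:, None])  # Fisher information
        step = np.linalg.solve(info, X.T @ (y - prob))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    prob = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (prob * (1.0 - prob))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))


# Lung cancer cases and controls per cluster (1 = most detrimental).
cases = {1: 77, 2: 205, 3: 40, 4: 177, 5: 77}
controls = {1: 64, 2: 232, 3: 73, 4: 609, 5: 201}
cluster = np.concatenate([np.full(cases[k] + controls[k], k) for k in cases])
y = np.concatenate([np.r_[np.ones(cases[k]), np.zeros(controls[k])] for k in cases])

# Model 1: indicator coding, cluster 5 as the reference group.
X_dummy = np.column_stack([np.ones_like(y)] + [(cluster == k).astype(float) for k in (1, 2, 3, 4)])
beta, se = fit_logistic(X_dummy, y)
odds_ratios = np.exp(beta[1:])
ci_low = np.exp(beta[1:] - 1.96 * se[1:])
ci_high = np.exp(beta[1:] + 1.96 * se[1:])
print(np.round(odds_ratios, 2))  # [3.14 2.31 1.43 0.76]

# Model 2: cluster rank as a continuous trend term (crude, no covariates).
X_trend = np.column_stack([np.ones_like(y), cluster.astype(float)])
beta_trend, _ = fit_logistic(X_trend, y)
or_per_step = np.exp(beta_trend[1])
print(round(or_per_step, 2))
```

With a single categorical predictor the fitted model is saturated, so each OR equals the 2x2 cross-product ratio against cluster 5 and each Wald SE equals the familiar sqrt(1/a + 1/b + 1/c + 1/d).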
Table 5.
Control cluster definitions applied to INHALE current/former smoker lung cancer cases with quantitative CT (qCT) and spirometry/PFT data (N=576). Clusters were assigned based on standardized values; actual values (means and SDs) are presented here.
| Variable | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 |
|---|---|---|---|---|---|
| N | 77 | 205 | 40 | 177 | 77 |
| Pack years | 54.4 (33.9) | 53.0 (31.4) | 34.9 (18.5) | 42.7 (25.7) | 22.3 (16.8) |
| Quit years | 5.5 (9.2) | 2.6 (5.9) | 5.3 (9.2) | 2.3 (4.1) | 29.4 (10.5) |
| FEV1/FVC | 0.51 (0.10) | 0.66 (0.08) | 0.52 (0.09) | 0.74 (0.06) | 0.74 (0.08) |
| % predicted FEV1 | 49.7 (17.4) | 67.4 (17.2) | 48.5 (11.4) | 80.4 (18.6) | 80.7 (19.2) |
| qCT % emphysemaᵃ | 16.4 (8.3) | 3.0 (2.6) | 2.7 (2.2) | 1.3 (1.7) | 2.0 (2.0) |
| qCT % air trappingᵇ | 54.9 (14.1) | 31.1 (15.2) | 17.3 (12.5) | 7.3 (5.4) | 16.2 (12.9) |
| qCT MLD ratioᶜ | 0.95 (0.04) | 0.94 (0.04) | 0.88 (0.06) | 0.84 (0.05) | 0.87 (0.07) |
| Crude OR (95% CI)ᵈ | **3.14 (2.06, 4.79)** | **2.31 (1.67, 3.19)** | 1.43 (0.90, 2.28) | 0.76 (0.56, 1.04) | 1.00 (reference) |
| Adjusted OR (95% CI)ᵉ | **2.65 (1.71, 4.10)** | **2.33 (1.66, 3.26)** | **1.68 (1.04, 2.73)** | 0.92 (0.67, 1.29) | 1.00 (reference) |
ᵃ Percent of lung voxels < −950 HU on inspiration across both lungs.
ᵇ Percent of lung voxels < −856 HU on expiration across both lungs.
ᶜ Mean lung density (MLD) ratio = expiratory MLD / inspiratory MLD.
ᵈ Unadjusted odds ratio comparing the odds of lung cancer in each cluster with the odds in cluster 5 (reference).
ᵉ Odds ratio comparing the odds of lung cancer in each cluster with the odds in cluster 5 (reference), adjusted for age, race, gender and BMI.
Bold indicates statistical significance at α = 0.05.
Discussion
The spectrum of lung disease in this population-based sample of former and current smokers was defined by five distinct combinations of smoking history, spirometry and quantitative imaging phenotypes. We found significant evidence of racial heterogeneity across these clusters, consistent with the overall differences by race in smoking history and qCT measures. Although African Americans reported fewer pack years, on average, than whites, the proportion of African Americans in cluster 1 (most detrimental, heaviest smoking) was similar to their overall proportion (48% in cluster 1 versus 44% overall), whereas the lightest smoking cluster (cluster 5, least detrimental) had the lowest proportion of African Americans (28%) among all clusters. The highest proportion of African Americans was found in cluster 3 (56%), which consisted of relatively light smokers (mean pack years = 26) with very poor spirometry measures. Cluster 3 was small (N=73) but notable because it included individuals with the poorest lung function on spirometry, yet its qCT measures were below average and similar to or slightly lower than those of cluster 5 (low % emphysema, % air trapping and MLD ratio). Participants in this cluster were more likely than those in cluster 5 to be younger and African American. These observations suggest that smoking intensity alone is not a sufficient indicator of overall lung disease, especially among African Americans, and that spirometric evidence of poor lung function may precede qCT evidence of disease among lighter-smoking African Americans. These findings are consistent with results from the National Emphysema Treatment Trial (NETT), which also found that African Americans had lower qCT measures of emphysema than whites despite similarly poor spirometry values.16 As in our study, African Americans in NETT were younger and smoked less, on average, than white participants, although African American enrollment in NETT was very limited (N=42).
We note that race accounts for approximately 1.3% of the variability in cluster assignment, similar to other covariates significantly associated with cluster membership (e.g., age, 1.7%; education, 1.6%), suggesting that race is only one of many factors contributing to the clustering result.
Cluster 4, which included mostly current (75%) and fairly heavy smokers (mean pack years = 34), had lung function on spirometry comparable to cluster 5 and better qCT measures than cluster 5. This was the largest cluster, comprising 52% of the total sample (609/1179), and its members were younger, on average, than those in cluster 5. This difference may indicate that airway damage as captured by qCT progresses with age, even after smoking cessation.
Although COPD is widely recognized as a heterogeneous disease, COPD sub-phenotyping continues to evolve. The most recent GOLD Executive Summary requires spirometry for diagnosis but recommends that other factors, such as symptoms and co-morbidities, be considered when sub-phenotyping patients into risk categories.13 CT imaging has been used in specific subsets of patients with severe disease when making treatment decisions, but has not been routinely performed to aid in the diagnosis or treatment of COPD.17 This is despite evidence that CT measures of emphysema, air trapping and airway morphology correlate strongly with COPD severity.8,18–20 The cluster results presented here suggest that imaging data can aid clinicians in identifying individuals at risk of developing COPD-related co-morbidities, even in the absence of traditional risk factors such as high pack years or low quit years (i.e., cluster 5). Additionally, these clusters may prove useful for stratifying patients with respect to treatment and disease-related outcomes in COPD, although this avenue of research is still in its infancy.
Other groups have employed clustering approaches among those with COPD or symptoms of airway obstruction using spirometry and qCT measures. COPDGene investigators identified a group of resistant smokers (low emphysema/airflow obstruction), a group of smokers with severe emphysema and two discordant groups in relation to emphysema and airflow obstruction.21 Another cluster analysis of NETT COPD patients also found four clusters with heterogeneous and discordant profiles.22 A study of 2,164 GOLD stage II-IV COPD patients found significant differences in mortality, hospitalizations and exacerbations among five distinct clusters using 13 variables including COPD symptoms and inflammation markers in addition to spirometry and qCT measures.23 These and other studies are consistent with our results in highlighting the complexity of COPD sub-phenotyping and the disparate information provided by qCT and spirometry measures.4 Our study advances these sub-phenotyping approaches by evaluating their contribution to lung cancer risk.
The cluster results were significantly predictive of lung cancer risk. Even after adjustment for covariates, the spectrum of lung disease across the clusters was strongly associated with lung cancer risk: the most detrimental cluster had the highest odds of lung cancer, and risk decreased across less detrimental clusters. In addition, the INHALE study tracks control subjects for subsequent diagnoses of lung cancer (diagnosis > 1 year post-CT). Only nine controls in this sample have subsequently been diagnosed with lung cancer; however, five of them were in cluster 2, a group with, on average, mild evidence of COPD on spirometry yet high levels of qCT air trapping and an elevated MLD ratio. Further, a disproportionate share of these control-to-case subjects (7/9 = 78%) belonged to the two most detrimental clusters (clusters 1 and 2), which represent only 25% of the total sample (Fisher’s exact test p=0.001). The remaining two subjects were in cluster 4, the largest cluster. These findings demonstrate that 1) quantitative CT measures contribute to predicting lung cancer risk beyond the information provided by spirometry/PFT and 2) lung cancer risk differs across COPD sub-phenotypes, suggesting that qCT measures could aid in identifying COPD patients at greatest risk of developing lung cancer.
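The disproportion test for the control-to-case subjects can be sketched with a two-sided Fisher's exact test written in plain Python. We assume the 2x2 layout below: rows are converted to lung cancer (yes/no) and columns are clusters 1–2 versus clusters 3–5, with 296 = 64 + 232 controls in clusters 1–2, 7 of whom converted. With these inputs the sketch returns a p value that rounds to the reported 0.001. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def pmf(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_obs = pmf(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted
    return sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-7))


# Converters: 7 in clusters 1-2, 2 elsewhere; non-converters: 289 and 881.
p = fisher_exact_two_sided(7, 2, 289, 881)
print(round(p, 3))  # 0.001
```

An exact test is the natural choice here because only nine events are involved, far too few for a chi-squared approximation to be trustworthy.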
We note that while our study includes a large, diverse, population-based sample of current and former smokers, clustering results are inherently data-dependent and difficult to generalize. To assess the validity of the selected clusters, we evaluated variables not used in the clustering, such as physical attributes (BMI, physical activity) and COPD-related co-morbidities, and found that they were associated with the clusters as expected. Validation of these results in additional smoking populations and continued monitoring for lung cancer diagnoses are critical next steps, and future efforts will focus on expanding the set of lung disease features considered in order to identify sub-phenotypes specific to whites and African Americans.
Sub-phenotyping of COPD is difficult due in part to the often-lengthy course of the disease. Utilizing a variety of measures related to COPD, including smoking intensity, spirometry and quantitative imaging, reveals a spectrum of lung disease in a population of current and former smokers. Evidence presented here suggests that radiologic measures based on quantitative imaging analysis add information on lung physiology beyond that captured by spirometry. We have demonstrated that these COPD sub-phenotypes are clinically relevant predictors of lung cancer risk.
Supplementary Material
Acknowledgments
Financial Support: This work was funded by the National Institutes of Health (grant numbers R01CA141769, P30CA022453, HHSN261201300011I) and the Herrick Foundation.
Footnotes
Conflicts of Interest: SMG reports personal fees from ARIAD, AstraZeneca, Genentech, Bristol-Myers Squibb and Pfizer, outside the submitted work. CML, ASW, DW, JCS, NR, GW, MP, CND, MF, TS, DS, MJS, AOS and AGS have no conflicts of interest to declare.
References
- 1. Agusti A, Calverley PM, Celli B, et al. Characterisation of COPD heterogeneity in the ECLIPSE cohort. Respir Res. 2010;11:122.
- 2. Burgel PR, Paillasseur JL, Peene B, et al. Two distinct chronic obstructive pulmonary disease (COPD) phenotypes are associated with high risk of mortality. PLoS One. 2012;7:e51048.
- 3. Vestbo J. COPD: definition and phenotypes. Clin Chest Med. 2014;35:1–6.
- 4. Weatherall M, Travers J, Shirtcliffe PM, et al. Distinct clinical phenotypes of airways disease defined by cluster analysis. Eur Respir J. 2009;34:812–18.
- 5. Martinez FJ, Foster G, Curtis JL, et al. Predictors of mortality in patients with emphysema and severe airflow obstruction. Am J Respir Crit Care Med. 2006;173:1326–34.
- 6. Kim WJ, Silverman EK, Hoffman E, et al. CT metrics of airway disease and emphysema in severe COPD. Chest. 2009;136:396–404.
- 7. Nambu A, Zach J, Schroeder J, et al. Quantitative computed tomography measurements to evaluate airway disease in chronic obstructive pulmonary disease: relationship to physiological measurements, clinical index and visual assessment of airway disease. Eur J Radiol. 2016;85:2144–51.
- 8. Schroeder JD, McKenzie AS, Zach JA, et al. Relationships between airflow obstruction and quantitative CT measurements of emphysema, air trapping, and airways in subjects with and without chronic obstructive pulmonary disease. AJR Am J Roentgenol. 2013;201:W460–470.
- 9. Xie X, de Jong PA, Oudkerk M, et al. Morphological measurements in computed tomography correlate with airflow obstruction in chronic obstructive pulmonary disease: systematic review and meta-analysis. Eur Radiol. 2012;22:2085–93.
- 10. Schwartz AG, Lusk CM, Wenzlaff AS, et al. Risk of lung cancer associated with COPD phenotype based on quantitative image analysis. Cancer Epidemiol Biomarkers Prev. 2016;25:1341–7.
- 11. Pellegrino R, Viegi G, Brusasco V, et al. Interpretative strategies for lung function tests. Eur Respir J. 2005;26:948–68.
- 12. Hankinson JL, Odencrantz JR, Fedan KB. Spirometric reference values from a sample of the general U.S. population. Am J Respir Crit Care Med. 1999;159:179–87.
- 13. Vestbo J, Hurd SS, Agusti AG, et al. Global strategy for the diagnosis, management, and prevention of chronic obstructive pulmonary disease: GOLD executive summary. Am J Respir Crit Care Med. 2013;187:347–65.
- 14. Sieren JP, Newell JD Jr, Barr RG, et al. SPIROMICS protocol for multicenter quantitative computed tomography to phenotype the lungs. Am J Respir Crit Care Med. 2016;194:794–806.
- 15. Kubo K, Eda S, Yamamoto H, et al. Expiratory and inspiratory chest computed tomography and pulmonary function tests in cigarette smokers. Eur Respir J. 1999;13:252–6.
- 16. Chatila WM, Hoffman EA, Gaughan J, et al. Advanced emphysema in African-American and white patients: do differences exist? Chest. 2006;130:108–18.
- 17. Kirby M, van Beek EJR, Seo JB, et al. Management of COPD: is there a role for quantitative imaging? Eur J Radiol. 2017;86:335–42.
- 18. Agusti A, Edwards LD, Celli B, et al. Characteristics, stability and outcomes of the 2011 GOLD COPD groups in the ECLIPSE cohort. Eur Respir J. 2013;42:636–46.
- 19. Lynch DA, Newell JD. Quantitative imaging of COPD. J Thorac Imaging. 2009;24:189–94.
- 20. Matsuoka S, Kurihara Y, Yagihashi K, et al. Airway dimensions at inspiratory and expiratory multisection CT in chronic obstructive pulmonary disease: correlation with airflow limitation. Radiology. 2008;248:1042–9.
- 21. Castaldi PJ, Dy J, Ross J, et al. Cluster analysis in the COPDGene study identifies subtypes of smokers with distinct patterns of airway disease and emphysema. Thorax. 2014;69:415–22.
- 22. Cho MH, Washko GR, Hoffmann TJ, et al. Cluster analysis in severe emphysema subjects using phenotype and genotype data: an exploratory investigation. Respir Res. 2010;11:30.
- 23. Rennard SI, Locantore N, Delafont B, et al. Identification of five chronic obstructive pulmonary disease subgroups with different prognoses in the ECLIPSE cohort using cluster analysis. Ann Am Thorac Soc. 2015;12:303–12.