JNCI Journal of the National Cancer Institute. 2022 Aug 5;114(11):1501–1510. doi: 10.1093/jnci/djac149

The Oral Microbiome and Lung Cancer Risk: An Analysis of 3 Prospective Cohort Studies

Emily Vogtmann, Xing Hua, Guoqin Yu, Vaishnavi Purandare, Autumn G Hullings, Dantong Shao, Yunhu Wan, Shilan Li, Casey L Dagnall, Kristine Jones, Belynda D Hicks, Amy Hutchinson, J Gregory Caporaso, William Wheeler, Dale P Sandler, Laura E Beane Freeman, Linda M Liao, Wen-Yi Huang, Neal D Freedman, Neil E Caporaso, Rashmi Sinha, Mitchell H Gail, Jianxin Shi, Christian C Abnet
PMCID: PMC9664178  PMID: 35929779

Abstract

Background

Previous studies suggested associations between the oral microbiome and lung cancer, but studies were predominantly cross-sectional and underpowered.

Methods

Using a case-cohort design, 1306 incident lung cancer cases were identified in the Agricultural Health Study; National Institutes of Health-AARP Diet and Health Study; and Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial. Referent subcohorts were randomly selected by strata of age, sex, and smoking history. DNA was extracted from oral wash specimens using the DSP DNA Virus Pathogen kit, the 16S rRNA gene V4 region was amplified and sequenced, and bioinformatics were conducted using QIIME 2. Hazard ratios and 95% confidence intervals were calculated using weighted Cox proportional hazards models.

Results

Higher alpha diversity was associated with lower lung cancer risk (Shannon index hazard ratio = 0.90, 95% confidence interval = 0.84 to 0.96). Specific principal component vectors of the microbial communities were also statistically significantly associated with lung cancer risk. After multiple testing adjustment, greater relative abundance of 3 genera and presence of 1 genus were associated with greater lung cancer risk, whereas presence of 3 genera was associated with lower risk. For example, every SD increase in Streptococcus abundance was associated with 1.14 times the risk of lung cancer (95% confidence interval = 1.06 to 1.22). Associations were strongest among squamous cell carcinoma cases and former smokers.

Conclusions

Multiple oral microbial measures were prospectively associated with lung cancer risk in 3 US cohort studies, with associations varying by smoking history and histologic subtype. The oral microbiome may offer new opportunities for lung cancer prevention.


Lung cancer has the second-highest incidence and highest mortality of all cancers globally, with more than 2.2 million new cases and nearly 1.8 million deaths per year (1). Cigarette smoking is the most important cause of lung cancer, but additional risk factors are also important, because an estimated 20% of lung cancer deaths in the United States are not attributable to smoking (2).

Growing evidence suggests that the oral microbiome may contribute to lung cancer risk. Lung infections, including pneumonia and tuberculosis, which are caused by bacteria or other microbes, are associated with lung cancer risk, even among never smokers (3). Periodontal disease, which is related to specific microbes including Porphyromonas gingivalis, has also been associated with lung cancer (4,5). In addition, tobacco use can modify the oral microbiome (6), so it is possible that the oral microbiome may mediate associations between tobacco use and lung cancer risk. Previous studies have suggested associations of the oral or sputum microbiome with lung cancer (7-12), but many of these studies had small sample sizes and were primarily of a case-control design, so reverse causation cannot be excluded.

Here, we prospectively examine the association between baseline oral microbiome and subsequent incident lung cancer in 3 US cohorts using a stratified case-cohort design.

Methods

A detailed description of the study methods is provided in the Supplementary Methods (available online). A brief description is provided below.

Study Populations

Oral wash specimens were collected in a subset of participants in the Agricultural Health Study (AHS), National Institutes of Health-AARP (NIH-AARP) Diet and Health Study, and the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO) (13–15). We included lung cancer cases from the AHS (N = 252), NIH-AARP (N = 386), and PLCO (N = 697) among participants who had provided an oral wash specimen before diagnosis. Follow-up was through 2013, 2011, and 2009 for each cohort, respectively.

Referent Subcohort Selection

A stratified case-cohort design was selected to provide a common reference group for multiple cancer sites of interest. For each cohort, we tabulated the number of individuals with an oral wash specimen within strata defined by age, sex, cigarette smoking history, and chewing tobacco use (in AHS only). Within each stratum, we sampled individuals without replacement and without regard to future cancer development. The subcohort size in each stratum was chosen to be at least as large as the maximum of the number of cases at any of the cancer sites in that stratum. Separately for each cohort, subcohort members were assigned stratum-specific sampling weights based on the inverse of the observed sampling fractions. In total, subcohorts of 1073, 1135, and 1266 individuals were selected from the AHS, NIH-AARP, and PLCO, respectively.
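The weighting scheme described above can be illustrated with a minimal Python sketch. It samples a subcohort without replacement within strata and assigns each sampled person the inverse of the observed sampling fraction in that stratum. The function names, the dictionary keys `id` and `stratum`, and the toy strata are hypothetical and are not taken from the study's code.

```python
import random
from collections import defaultdict


def draw_subcohort(participants, n_per_stratum, seed=0):
    """Sample within strata and return {id: sampling weight}.

    participants: list of dicts with hypothetical keys "id" and "stratum".
    n_per_stratum: dict mapping stratum -> number of people to sample.
    The weight is the inverse of the observed sampling fraction.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for person in participants:
        by_stratum[person["stratum"]].append(person["id"])

    weights = {}
    for stratum, ids in by_stratum.items():
        n_sample = min(n_per_stratum.get(stratum, 0), len(ids))
        if n_sample == 0:
            continue  # nothing sampled from this stratum
        sampled = rng.sample(ids, n_sample)  # without replacement
        weight = len(ids) / n_sample  # inverse sampling fraction
        for pid in sampled:
            weights[pid] = weight
    return weights


def weighted_proportion(weights, has_trait):
    """Weighted share of subcohort members for whom has_trait(id) is true."""
    total = sum(weights.values())
    hit = sum(w for pid, w in weights.items() if has_trait(pid))
    return hit / total


# Toy cohort: ids 0-99 are current smokers, ids 100-999 are never smokers.
toy = [{"id": i, "stratum": "current" if i < 100 else "never"}
       for i in range(1000)]
toy_weights = draw_subcohort(toy, {"current": 50, "never": 50})
toy_share = weighted_proportion(toy_weights, lambda pid: pid < 100)
```

In this toy cohort, smokers make up half of the sampled subcohort but receive a weight of 2, whereas never smokers receive a weight of 18, so the weighted share of smokers returns to the 10% seen in the full cohort. The same mechanism explains why unweighted and weighted percentages for smoking status differ in Table 1.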

Demographic and Lifestyle Information

Information on self-reported sex, race and ethnicity, education, alcohol consumption, and detailed smoking history was collected at baseline in all 3 cohorts using standardized questionnaires. Because smoking is an important risk factor for lung cancer, we excluded 116 individuals who did not provide complete smoking history information.

Laboratory Handling

The methods for DNA extraction, polymerase chain reaction (PCR) amplification, and sequencing from the buccal cell pellet were previously described in detail (16). In brief, DNA was extracted from all samples using the DSP DNA Virus Pathogen kit (Qiagen). PCR was performed using 16S rRNA gene V4 barcoded primers. DNA was sequenced using the MiSeq (Illumina) with 2 × 250-bp paired-end sequencing.

Bioinformatics

The sequencing data from the 3 cohorts were independently processed using QIIME 2 version 2018.4 (17) to generate amplicon sequence variants (ASVs) (18). Metrics for alpha diversity (observed ASVs, Shannon index, and Faith’s phylogenetic diversity) were generated based on rarefaction at 20 000 reads per sample. Because weighted samples are less amenable to beta diversity analyses, we conducted weighted principal component analysis (PCA) as a measure of the overall community structure for the relative abundance and presence of ASVs. PCA plots to visualize the vectors are presented in Supplementary Figure 1 (available online). After exclusions, 1254 (244 cases and 1010 subcohort members), 1448 (371 cases and 1077 subcohort members), and 1908 (691 cases and 1217 subcohort members) participants remained from the AHS, NIH-AARP, and the PLCO studies, respectively.
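To make two of the alpha diversity metrics concrete, the sketch below rarefies a sample to a fixed read depth and then computes observed ASVs and the Shannon index. It is a plain-Python illustration and not the QIIME 2 implementation. The function names are hypothetical, and the logarithm base is left as a parameter because the base used for the reported values is not stated in this section. Faith's phylogenetic diversity is omitted because it requires a phylogenetic tree.

```python
import math
import random


def rarefy(counts, depth, seed=0):
    """Subsample {asv: read count} to exactly `depth` reads.

    Sampling is without replacement. Returns None when the sample has
    fewer than `depth` reads, mirroring exclusion of shallow samples.
    """
    if sum(counts.values()) < depth:
        return None
    # Expand to one entry per read, then draw `depth` reads at random.
    pool = [asv for asv, n in counts.items() for _ in range(n)]
    picked = random.Random(seed).sample(pool, depth)
    rarefied = {}
    for asv in picked:
        rarefied[asv] = rarefied.get(asv, 0) + 1
    return rarefied


def observed_asvs(counts):
    """Number of ASVs with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def shannon_index(counts, base=2.0):
    """Shannon index: -sum(p * log(p)) over ASVs with nonzero reads."""
    total = sum(counts.values())
    index = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            index -= p * math.log(p, base)
    return index
```

For example, a sample with four equally abundant ASVs has a base-2 Shannon index of 2, and the index falls as one ASV comes to dominate the community.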

Statistical Analysis

Weighted descriptive statistics were calculated. Survival time was measured to the earliest of lung cancer incidence, death, or loss to or end of follow-up. Separately for each cohort, Cox proportional hazards models were fit for the associations of alpha diversity, the top 5 PCA vectors, and the relative abundance and presence of genera with overall lung cancer risk. For alpha diversity, the PCA vectors, and relative abundance, the hazard ratio (HR) represents a 1-SD change in the continuous predictor. The models also included adjustment for age, sex, race and ethnicity, education, alcohol, and detailed smoking categories. Using survey procedures, the model included the subcohort weight, selection strata, and any subcohort members who were also cancer cases. Cohort-specific hazard ratios and 95% confidence intervals (CIs) were meta-analyzed using fixed effects models, and a P value for heterogeneity (P heterogeneity) was calculated. We used the meta-analysis estimate as the primary measure of association whenever there was limited evidence of heterogeneity.
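The fixed effects meta-analysis step can be sketched as inverse-variance pooling on the log hazard ratio scale. The sketch below assumes that standard errors can be recovered from the reported 95% CIs and that heterogeneity is assessed with Cochran's Q; neither detail is stated in this section, and the function names are hypothetical. The input values are the cohort-specific Shannon index estimates from Table 2.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a 2-sided 95% interval


def log_hr_and_se(hr, lower, upper):
    """Recover log(HR) and its standard error from a reported 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return math.log(hr), se


def fixed_effects_meta(estimates):
    """Inverse-variance pooling of (hr, lower, upper) tuples.

    Returns (pooled_hr, pooled_lower, pooled_upper, q), where q is
    Cochran's Q statistic for between-study heterogeneity.
    """
    pairs = [log_hr_and_se(*e) for e in estimates]
    weights = [1.0 / se ** 2 for _, se in pairs]
    total_w = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, pairs)) / total_w
    se_pooled = math.sqrt(1.0 / total_w)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, pairs))
    return (math.exp(pooled),
            math.exp(pooled - Z_95 * se_pooled),
            math.exp(pooled + Z_95 * se_pooled),
            q)


def heterogeneity_p_three_studies(q):
    """P value for Q with 3 studies (2 degrees of freedom).

    The chi-square survival function with 2 df is exp(-q / 2).
    """
    return math.exp(-q / 2.0)


# Shannon index, cohort-specific HR (95% CI) from Table 2.
shannon = [(0.93, 0.80, 1.08),   # AHS
           (0.87, 0.77, 0.98),   # NIH-AARP
           (0.91, 0.82, 1.01)]   # PLCO
pooled_hr, pooled_lo, pooled_hi, q_stat = fixed_effects_meta(shannon)
p_het = heterogeneity_p_three_studies(q_stat)
```

Applied to these rounded inputs, the sketch gives a pooled hazard ratio and heterogeneity P value close to the reported meta-analysis values of 0.90 (95% CI = 0.84 to 0.96) and .76. Small discrepancies in the last digit are expected because the published estimates are rounded to 2 decimal places.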

To evaluate possible reverse causation, we excluded the first 2 years of follow-up. We examined associations stratified by lung cancer histologic subtype and cigarette smoking status. Finally, we created zero-inflated negative binomial models for the associations between smoking history and the genera statistically significantly associated with lung cancer risk. Statistical tests were 2-sided, and P values less than .05 were considered statistically significant, except for the taxonomic analyses, which were adjusted for multiple testing using Bonferroni correction.
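As a brief illustration of the Bonferroni correction used for the taxonomic analyses, the sketch below divides the significance level by the number of tests in the family. The genus labels and P values are invented for illustration, and the number of genera actually tested is not stated in this section.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag which tests pass the corrected threshold.

    p_values: dict mapping a test label to its P value. Every test in
    the family must be included, not only the small P values.
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return {name: p <= threshold for name, p in p_values.items()}


# Hypothetical family of 4 genus-level tests (illustrative values only).
example = {"genus_a": 0.001, "genus_b": 0.02, "genus_c": 0.01, "genus_d": 0.5}
flags = bonferroni_significant(example)
```

With 4 tests the threshold is .05/4 = .0125, so a P value of .02 that would be nominally significant no longer qualifies.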

Results

Population Characteristics

After a weighted average of 13.9, 6.4, and 7.9 years of follow-up in the AHS, NIH-AARP, and PLCO, respectively, 1306 incident lung cancers were identified. In the AHS, 33.2% of lung cancer cases were female compared with 38.0% in NIH-AARP and 44.1% in PLCO. Most participants in all 3 cohorts were non-Hispanic White individuals. For the lung cancer cases, a high proportion of individuals were current smokers (AHS = 44.3%, NIH-AARP = 36.1%, PLCO = 42.6%) (Table 1). A more detailed categorization of the smoking history is presented in Supplementary Tables 1–3 (available online).

Table 1.

Demographic characteristics of the oral microbiome case-cohort study participants by lung cancer case status

Characteristics (rows), with columns from left to right, each reported as No. (%) unless otherwise noted:
AHS: lung cancer cases [a] (n = 244); referent subcohort, unweighted (n = 1010); referent subcohort, weighted [b] (n = 31 156)
NIH-AARP: lung cancer cases [a] (n = 371); referent subcohort, unweighted (n = 1077); referent subcohort, weighted [b] (n = 23 554)
PLCO: lung cancer cases [a] (n = 691); referent subcohort, unweighted (n = 1217); referent subcohort, weighted [b] (n = 37 135)
Age at sample collection, y
 <50 23 (9.4) 403 (39.9) 13 125 (42.1)
 50-59 67 (27.5) 290 (28.7) 8231 (26.4) 39 (5.6) 109 (9.0) 3533 (9.5)
 60-64 43 (17.6) 114 (11.3) 3356 (10.8) 32 (8.6) 210 (19.5) 4112 (17.5) 152 (22.0) 312 (25.6) 11 136 (30.0)
 65-69 54 (22.1) 103 (10.2) 3353 (10.8) 66 (17.8) 267 (24.8) 6034 (25.6) 191 (27.6) 301 (24.7) 10 376 (27.9)
 70-74 39 (16.0) 67 (6.6) 2076 (6.7) 120 (32.4) 325 (30.2) 7169 (30.4) 196 (28.4) 303 (24.9) 7870 (21.2)
 ≥75 18 (7.4) 33 (3.3) 1016 (3.3) 153 (41.2) 275 (25.5) 6239 (26.5) 113 (16.4) 192 (15.8) 4219 (11.4)
Sex
 Male 163 (66.8) 608 (60.2) 17 515 (56.2) 230 (62.0) 645 (59.9) 14 024 (59.5) 386 (55.9) 589 (48.4) 17 127 (46.1)
 Female 81 (33.2) 402 (39.8) 13 642 (43.8) 141 (38.0) 432 (40.1) 9530 (40.5) 305 (44.1) 628 (51.6) 20 008 (53.9)
Race and ethnicity
 White 237 (97.1) 989 (97.9) 30 631 (98.3) 354 (95.4) 1012 (94.0) 22 102 (93.8) 629 (91.0) 1119 (92.0) 34 431 (92.7)
 Other race and ethnicity groups [c] 7 (2.9) 21 (2.1) 526 (1.7) 17 (4.6) 65 (6.0) 1451 (6.2) 62 (9.0) 98 (8.1) 2704 (7.3)
Education
 <High school 35 (14.3) 72 (7.1) 1913 (6.1) 29 (7.8) 37 (3.4) 807 (3.4) 79 (11.4) 84 (6.9) 1925 (5.2)
 High school 116 (47.5) 430 (42.6) 12 619 (40.5) 111 (29.9) 298 (27.7) 6485 (27.5) 274 (39.7) 446 (36.7) 12 941 (34.9)
 Some college 49 (20.1) 268 (26.5) 8402 (27.0) 92 (24.8) 254 (23.6) 5191 (22.0) 162 (23.4) 254 (20.9) 7888 (21.2)
 College graduate 31 (12.7) 190 (18.8) 6574 (21.1) 126 (34.0) 467 (43.4) 10 575 (44.9) 176 (25.5) 429 (35.3) 14 183 (38.2)
 Missing 13 (5.3) 50 (5.0) 1649 (5.3) 13 (3.5) 21 (2.0) 496 (2.1) 4 (0.3) 198 (0.5)
Body mass index, kg/m2
 <25 59 (24.2) 245 (24.3) 7698 (24.7) 138 (37.2) 422 (39.2) 8929 (37.9) 247 (35.8) 422 (34.7) 12 155 (32.7)
 25 to <30 55 (22.5) 266 (26.3) 8384 (26.9) 163 (43.9) 443 (41.1) 9944 (42.2) 297 (43.0) 517 (42.5) 16 001 (43.1)
 ≥30 36 (14.8) 126 (12.5) 3941 (12.7) 63 (17.0) 197 (18.3) 4303 (18.3) 140 (20.3) 254 (20.9) 8195 (22.1)
 Missing 94 (38.5) 373 (36.9) 11 135 (35.7) 7 (1.9) 15 (1.4) 378 (1.6) 7 (1.0) 24 (2.0) 783 (2.1)
Smoking status
 Current cigarette smoker 108 (44.3) 270 (26.7) 3265 (10.5) 134 (36.1) 290 (26.9) 2194 (9.3) 294 (42.6) 377 (31.0) 3236 (8.7)
 Former cigarette smoker 89 (36.5) 325 (32.2) 7935 (25.5) 209 (56.3) 594 (55.2) 11 821 (50.2) 338 (48.9) 425 (34.9) 15 633 (42.1)
 Never cigarette smoker who ever smoked pipe, cigar (or cigarillo or used chewing tobacco or snuff in AHS) 1 (0.4) 35 (3.5) 1409 (4.5) 3 (0.8) 14 (1.3) 693 (2.9) 5 (0.7) 38 (3.1) 1658 (4.5)
 Never smoker of cigarettes, pipe, cigar (or cigarillo or used chewing tobacco or snuff in AHS) 46 (18.9) 380 (37.6) 18 547 (59.5) 25 (6.7) 179 (16.6) 8845 (37.6) 54 (7.8) 377 (31.0) 16 608 (44.7)
Alcohol consumption
 0 drinks per day 105 (43.0) 326 (32.3) 11 221 (36.0) 60 (16.2) 306 (19.9) 5192 (22.0) 148 (21.4) 268 (22.0) 8504 (22.9)
 <1 drink per day [d] 191 (51.5) 801 (52.1) 12 618 (53.5) 282 (40.8) 618 (50.8) 19 276 (51.9)
 1–2 drinks per day 98 (40.2) 518 (51.3) 15 576 (50.0) 49 (13.2) 224 (14.6) 3181 (13.5) 91 (13.2) 120 (9.9) 3901 (10.5)
 ≥2 drinks per day 33 (13.5) 139 (13.8) 3744 (12.0) 71 (19.1) 208 (13.5) 2577 (10.9) 83 (12.0) 90 (7.4) 2114 (5.7)
 Missing 8 (3.3) 27 (2.7) 616 (2.0) 87 (12.6) 121 (9.9) 3339 (9.0)
Alpha diversity (mean, SD)
 Observed ASVs 115.05 (49.60) 126.45 (46.49) 126.67 (44.76) 116.22 (47.16) 126.44 (43.85) 129.1 (41.50) 109.29 (43.92) 119.97 (41.43) 123.38 (38.91)
 Faith’s PD 9.95 (2.83) 10.47 (2.59) 10.45 (2.48) 9.87 (2.79) 10.44 (2.50) 10.57 (2.34) 8.61 (2.35) 9.1 (2.13) 9.24 (1.94)
 Shannon Index 4.29 (0.81) 4.37 (0.75) 4.37 (0.75) 4.4 (0.78) 4.55 (0.69) 4.59 (0.65) 4.28 (0.77) 4.43 (0.67) 4.47 (0.63)
[a] The lung cancer cases all have a weight of 1 and therefore do not have weighted estimates. AHS = Agricultural Health Study; ASV = amplicon sequence variant; NIH-AARP = National Institutes of Health-AARP; PD = phylogenetic diversity; PLCO = Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial.

[b] The weighted counts for the referent subcohort are rounded down to the nearest whole number.

[c] In the AHS, the questionnaire included race and ethnicity category options of Asian, Black, Hispanic, White, or missing. For NIH-AARP, the questionnaire included race and ethnicity category options of Asian/Pacific Islander/American Indian/Alaskan Native, Hispanic, non-Hispanic Black, non-Hispanic White, or missing/unknown. For PLCO, the questionnaire had race and ethnicity category options of Asian, Pacific Islander, or American Indian, Hispanic, non-Hispanic Black, or non-Hispanic White. Because of small sample sizes (<15) in each race and ethnicity group for at least 1 of the cohorts, we were able to present data only for White individuals compared with individuals identifying in other race and ethnicity groups.

[d] In the AHS, less than 1 drink per day was not an option in the questionnaire.

Microbial Associations With Overall Lung Cancer Risk

As seen in Table 2, higher alpha diversity was associated with lower lung cancer risk in the meta-analysis from the 3 cohorts after adjustment for confounders. For example, for every SD increase in the Shannon index, the risk of lung cancer decreased by approximately 10% (HR = 0.90, 95% CI = 0.84 to 0.96), with no evidence of between-study heterogeneity (P heterogeneity = .76). Within each cohort, smoking history was the strongest confounder of the association (Supplementary Table 4, available online). For example, in PLCO, the hazard ratio for the Shannon index was 0.76 (95% CI = 0.68 to 0.84) without adjustment; 0.77 (95% CI = 0.69 to 0.86) after adjustment for age, sex, race and ethnicity, education, and alcohol; and 0.91 (95% CI = 0.82 to 1.01) after additional adjustment for detailed smoking history.
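The arithmetic behind these statements can be sketched in a few lines. The helpers below convert a per-SD hazard ratio into a percent change in risk, scale it to a difference of several SDs under the log-linear form of the Cox model, and express how much of a crude log hazard ratio is removed by adjustment. The last quantity is a common descriptive summary of confounding and is an assumption of this sketch, not a statistic reported by the authors. All function names are hypothetical.

```python
import math


def percent_change(hr):
    """Percent change in hazard implied by a hazard ratio."""
    return (hr - 1.0) * 100.0


def hr_for_k_sd(hr_per_sd, k):
    """Hazard ratio for a k-SD difference when log(HR) is linear in x."""
    return hr_per_sd ** k


def attenuation_percent(crude_hr, adjusted_hr):
    """Share of the crude log(HR) removed by adjustment, in percent."""
    crude = math.log(crude_hr)
    if crude == 0.0:
        raise ValueError("crude HR of 1 has no association to attenuate")
    return (crude - math.log(adjusted_hr)) / crude * 100.0


# Shannon index, meta-analysis estimate: HR = 0.90 per SD.
change_per_sd = percent_change(0.90)       # about -10%
two_sd_hr = hr_for_k_sd(0.90, 2)           # 0.90 squared

# PLCO Shannon index: crude 0.76, demographics-adjusted 0.77,
# and 0.91 after additional adjustment for detailed smoking history.
after_demographics = attenuation_percent(0.76, 0.77)
after_smoking = attenuation_percent(0.76, 0.91)
```

On this scale, adjustment for demographic factors removes only a few percent of the crude PLCO association, whereas adding detailed smoking history removes roughly two-thirds of it, which is consistent with smoking being the strongest confounder.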

Table 2.

Associations between alpha diversity, principal component vectors, and specific genera with lung cancer risk

Microbiome metrics (rows), with columns from left to right:
Meta-analysis (cases = 1306/subcohort = 3304): HR (95% CI), P, P heterogeneity
AHS [a] (cases = 244/subcohort = 1010): HR (95% CI), P
NIH-AARP [b] (cases = 371/subcohort = 1077): HR (95% CI), P
PLCO [c] (cases = 691/subcohort = 1217): HR (95% CI), P
Alpha diversity [d]
 Observed ASVs 0.90 (0.83 to 0.97) .005 .94 0.92 (0.79 to 1.08) .30 0.89 (0.78 to 1.02) .08 0.90 (0.80 to 1.00) .05
 Faith's PD 0.91 (0.85 to 0.98) .01 .92 0.94 (0.80 to 1.09) .41 0.91 (0.80 to 1.02) .12 0.90 (0.81 to 1.00) .06
 Shannon Index 0.90 (0.84 to 0.96) .003 .76 0.93 (0.80 to 1.08) .35 0.87 (0.77 to 0.98) .02 0.91 (0.82 to 1.01) .08
PCA vectors from relative abundance [d]
 PC1 (44.8%) 0.99 (0.91 to 1.07) .72 .93 0.98 (0.83 to 1.17) .86 1.01 (0.87 to 1.17) .91 0.97 (0.87 to 1.09) .64
 PC2 (15.2%) 1.11 (1.04 to 1.18) .003 .43 1.03 (0.90 to 1.18) .64 1.10 (0.98 to 1.24) .11 1.15 (1.04 to 1.27) .005
 PC3 (6.6%) 0.96 (0.89 to 1.04) .33 .66 1.02 (0.86 to 1.21) .82 0.97 (0.84 to 1.11) .65 0.93 (0.82 to 1.05) .22
 PC4 (4.7%) 1.04 (0.96 to 1.12) .32 .02 1.25 (1.05 to 1.50) .01 0.93 (0.83 to 1.05) .26 1.05 (0.94 to 1.17) .36
 PC5 (3.8%) 1.00 (0.92 to 1.08) .97 .32 1.13 (0.94 to 1.36) .18 0.96 (0.85 to 1.09) .56 0.98 (0.87 to 1.10) .70
PCA vectors from presence/absence [d]
 PC1 (7.9%) 1.17 (1.08 to 1.25) <.001 .92 1.14 (0.98 to 1.33) .09 1.16 (1.02 to 1.32) .03 1.18 (1.06 to 1.32) .003
 PC2 (4.4%) 0.96 (0.89 to 1.03) .25 .67 0.94 (0.80 to 1.11) .50 1.00 (0.88 to 1.14) >.99 0.93 (0.82 to 1.04) .20
 PC3 (2.4%) 1.03 (0.95 to 1.12) .41 .28 0.92 (0.78 to 1.08) .32 1.06 (0.93 to 1.22) .38 1.08 (0.95 to 1.22) .23
 PC4 (2.1%) 0.97 (0.89 to 1.05) .44 .74 1.01 (0.85 to 1.20) .88 0.98 (0.85 to 1.12) .77 0.93 (0.82 to 1.06) .30
 PC5 (1.3%) 1.04 (0.96 to 1.13) .33 .25 1.05 (0.87 to 1.27) .58 0.96 (0.84 to 1.09) .50 1.11 (0.99 to 1.25) .09
Genus-level relative abundance [d]
Abiotrophia 1.06 (1.03 to 1.10) .001 .13 1.05 (0.87 to 1.26) .63 0.89 (0.74 to 1.06) .19 1.07 (1.03 to 1.11) <.001
Lactobacillus 1.06 (1.03 to 1.09) <.001 .06 0.99 (0.92 to 1.06) .72 1.06 (1.01 to 1.12) .01 1.09 (1.05 to 1.14) <.001
Streptococcus 1.14 (1.06 to 1.22) .001 .78 1.12 (0.95 to 1.31) .18 1.10 (0.97 to 1.26) .13 1.17 (1.05 to 1.30) .005
Genus-level presence
Peptoniphilus 1.67 (1.33 to 2.10) <.001 .19 1.43 (0.86 to 2.38) .17 2.21 (1.52 to 3.21) <.001 1.41 (1.00 to 2.01) .05
Peptostreptococcus 0.74 (0.63 to 0.87) <.001 .64 0.84 (0.59 to 1.20) .34 0.76 (0.57 to 1.00) .05 0.69 (0.54 to 0.87) .002
Eubacterium yurii group 0.72 (0.60 to 0.85) <.001 .50 0.70 (0.49 to 1.01) .06 0.63 (0.46 to 0.85) .003 0.80 (0.61 to 1.04) .09
Aggregatibacter 0.74 (0.63 to 0.87) <.001 .71 0.85 (0.59 to 1.20) .35 0.73 (0.55 to 0.97) .03 0.71 (0.56 to 0.90) .005
[a] In the AHS data, the model includes adjustment for sex, age categories (≤49, 50–58, 59–68, and ≥69 years), continuous age, age squared, race and ethnicity (non-Hispanic White and other race), 13-level smoking variable (see Supplementary Table 1, available online, for details), education, and alcohol consumption. AHS = Agricultural Health Study; ASV = amplicon sequence variant; NIH-AARP = National Institutes of Health-AARP; PC = principal component; PCA = principal component analysis; PD = phylogenetic diversity; PLCO = Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial.

[b] In the NIH-AARP data, the model includes adjustment for sex, age categories (≤70, 71–75, and ≥76 years), continuous age, age squared, race and ethnicity (non-Hispanic White and other race), 18-level smoking variable (see Supplementary Table 2, available online, for details), education, and alcohol consumption.

[c] In the PLCO data, the model includes adjustment for sex, age categories (≤64, 65–69, 70–74, and ≥75 years), continuous age, age squared, race and ethnicity (non-Hispanic White and other race), 22-level smoking variable (see Supplementary Table 3, available online, for details), education, and alcohol consumption.

[d] The hazard ratio for alpha diversity, PCA vectors, and relative abundance represents the change for 1 SD in the continuous predictor.

Next, we examined PCA vectors generated from both the relative abundance and the presence tables at the ASV level as a measure of the overall community structure. For the relative abundance table, every SD increase in PC2, which explained 15.2% of the variability in the matrix, was associated with 1.11 (95% CI = 1.04 to 1.18) times higher risk of lung cancer. The relative abundance of an ASV assigned to the Streptococcus genus was most strongly correlated with PC2, with a correlation coefficient of 0.79. For the presence table, every SD increase in PC1, which explained 7.9% of the variability in the matrix, was associated with 1.17 (95% CI = 1.08 to 1.25) times higher risk of lung cancer (Table 2). The presence of multiple ASVs was strongly, inversely correlated with this vector, including ASVs assigned to the Fusobacterium (R = −0.60), Kingella (R = −0.51), Rothia (R = −0.55), Corynebacterium (R = −0.52), Prevotella (R = −0.57), Corynebacterium (R = −0.56), Capnocytophaga (R = −0.56), Dialister (R = −0.54), Peptostreptococcus (R = −0.59), Campylobacter (R = −0.51), Ruminococcaceae UCG-014 (R = −0.58), and Peptococcus (R = −0.54) genera, suggesting a lower prevalence of these genera in cases. No taxa were strongly, positively correlated with this vector.

The relative abundance of 3 genera and the presence of 4 genera were associated with lung cancer risk after adjustment for multiple comparisons (Table 2; Supplementary Tables 5 and 6, available online). Every SD increase in the relative abundance of Abiotrophia was associated with 1.06 times the risk of lung cancer (95% CI = 1.03 to 1.10). Although there was some indication of between-study heterogeneity (P heterogeneity = .06), every SD increase in the relative abundance of Lactobacillus was associated with 1.06 times the risk of lung cancer (95% CI = 1.03 to 1.09). This association with Lactobacillus was observed in the NIH-AARP and PLCO but not in the AHS. The relative abundance of the most common taxon, Streptococcus (average relative abundance of 40.6%, 36.6%, and 39.3% in AHS, NIH-AARP, and PLCO, respectively), was also associated with lung cancer risk, with a hazard ratio of 1.14 (95% CI = 1.06 to 1.22). The presence of Peptoniphilus was associated with 1.67 times the risk of lung cancer (95% CI = 1.33 to 2.10), and the presence of Peptostreptococcus, the Eubacterium yurii group, and Aggregatibacter was associated with 0.74 (95% CI = 0.63 to 0.87), 0.72 (95% CI = 0.60 to 0.85), and 0.74 (95% CI = 0.63 to 0.87) times the risk of lung cancer, respectively.

Associations for alpha diversity, PCA vectors, and genera were little changed after excluding the first 2 years of follow-up (Supplementary Table 7, available online).
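One common way to implement such a lagged sensitivity analysis is sketched below. Follow-up time is taken as the earliest of lung cancer diagnosis, death, or end of follow-up, and participants whose follow-up ended within the lag period are dropped, with the remaining participants entering the risk set at 2 years. This is an assumption about the general technique; the authors' exact procedure is described in their Supplementary Methods, and the function names and argument layout here are hypothetical.

```python
def survival_record(years_to_cancer=None, years_to_death=None,
                    years_to_end=None):
    """Return (time, event) measured in years from sample collection.

    Time is the earliest of lung cancer diagnosis, death, or loss to or
    end of follow-up. event is True only when lung cancer comes first.
    None means the corresponding outcome did not occur.
    """
    candidates = [(t, kind) for t, kind in (
        (years_to_cancer, "cancer"),
        (years_to_death, "death"),
        (years_to_end, "end"),
    ) if t is not None]
    if not candidates:
        raise ValueError("at least one follow-up time is required")
    # On ties, min keeps the first candidate, so cancer takes priority.
    time, kind = min(candidates, key=lambda c: c[0])
    return time, kind == "cancer"


def apply_lag(records, lag_years=2.0):
    """Exclude the first `lag_years` of follow-up.

    records: list of (time, event) tuples.
    Returns (entry_time, exit_time, event) tuples for participants still
    under follow-up after the lag, with delayed entry at `lag_years`.
    """
    return [(lag_years, time, event)
            for time, event in records if time > lag_years]
```

A case diagnosed 1.5 years after sample collection is removed entirely, which is the point of the exercise: cancers that may already have been present, and possibly altering the oral microbiome, at the time of collection no longer contribute to the estimate.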

Microbial Associations With Lung Cancer Histologic Subtypes

When we stratified by histologic subtype (Table 3), with 479 cases of adenocarcinoma, 256 cases of squamous cell carcinoma, and 171 cases of small cell lung cancer, all measures of alpha diversity were statistically significantly associated with lower squamous cell carcinoma risk (eg, observed ASVs HR = 0.79, 95% CI = 0.68 to 0.90). Although greater alpha diversity was associated with lower risk of adenocarcinoma and small cell lung cancer, the associations were weaker and not statistically significant. For the PCA vectors, the positive association between PC2 from the relative abundance table and lung cancer overall was generally seen for each of the histologic subtypes, but none reached statistical significance in these lower powered analyses. For PC1 from the presence table, a positive association was observed with all histologic subtypes, but the association was only statistically significant for squamous cell carcinoma (HR = 1.33, 95% CI = 1.16 to 1.53).

Table 3.

Meta-analyzed associations between alpha diversity, PC vectors, and specific genera with lung cancer subtypes

Microbiome metrics (rows), with columns from left to right, each histologic subtype reported as HR (95% CI), P, P heterogeneity:
Adenocarcinoma [a] (cases = 479)
Squamous cell carcinoma [a] (cases = 256)
Small cell lung cancer [a] (cases = 171)
Alpha diversity [b]
 Observed ASVs 0.95 (0.86 to 1.06) .37 .93 0.79 (0.68 to 0.90) <.001 .60 0.92 (0.77 to 1.08) .30 .72
 Faith's PD 0.95 (0.86 to 1.05) .29 .97 0.81 (0.72 to 0.93) .002 .40 0.94 (0.80 to 1.12) .49 .82
 Shannon Index 0.94 (0.85 to 1.05) .27 .51 0.82 (0.72 to 0.93) .002 .83 0.86 (0.74 to 1.01) .07 .84
PCA vectors from relative abundance [b]
 PC1 (44.8%) 1.05 (0.95 to 1.17) .35 .61 1.00 (0.86 to 1.17) .97 .30 0.95 (0.80 to 1.14) .59 .50
 PC2 (15.2%) 1.07 (0.97 to 1.18) .19 .24 1.13 (0.99 to 1.29) .06 .16 1.10 (0.96 to 1.25) .18 .74
 PC3 (6.6%) 1.01 (0.90 to 1.13) .86 .59 0.95 (0.81 to 1.12) .55 .62 0.80 (0.66 to 0.97) .02 .43
 PC4 (4.7%) 1.06 (0.96 to 1.17) .28 .18 0.94 (0.82 to 1.09) .42 .38 1.14 (0.96 to 1.35) .14 .28
 PC5 (3.8%) 0.97 (0.88 to 1.07) .56 .90 1.02 (0.89 to 1.18) .76 .42 1.20 (1.00 to 1.43) .046 .80
PCA vectors from presence/absence [b]
 PC1 (7.9%) 1.10 (0.99 to 1.21) .08 .79 1.33 (1.16 to 1.53) <.001 .74 1.17 (0.99 to 1.39) .06 .57
 PC2 (4.4%) 0.97 (0.87 to 1.08) .56 .33 0.99 (0.86 to 1.14) .89 .99 0.84 (0.71 to 1.00) .046 .52
 PC3 (2.4%) 0.99 (0.88 to 1.11) .84 .64 1.09 (0.94 to 1.27) .25 .61 1.16 (0.97 to 1.39) .09 .56
 PC4 (2.1%) 1.06 (0.95 to 1.18) .31 .34 1.01 (0.86 to 1.19) .87 .71 0.77 (0.65 to 0.92) .005 .26
 PC5 (1.3%) 1.02 (0.91 to 1.13) .78 .21 1.05 (0.91 to 1.22) .48 .02 1.10 (0.92 to 1.33) .30 .28
Genus-level relative abundance [b]
Abiotrophia 1.09 (1.06 to 1.11) <.001 .10 1.04 (0.86 to 1.26) .67 .14 1.17 (0.98 to 1.41) .09 .41
Lactobacillus 1.03 (0.97 to 1.09) .31 .35 1.07 (1.01 to 1.12) .01 .29 1.07 (0.99 to 1.15) .09 .06
Streptococcus 1.10 (0.98 to 1.22) .09 .41 1.17 (1.01 to 1.36) .03 .65 1.24 (1.05 to 1.45) .009 .62
Genus-level presence
Peptoniphilus 1.79 (1.29 to 2.47) <.001 .11 1.37 (0.89 to 2.10) .15 .53 1.97 (1.25 to 3.11) .003 .14
Peptostreptococcus 0.82 (0.65 to 1.03) .08 .21 0.56 (0.41 to 0.77) <.001 .80 0.73 (0.51 to 1.04) .08 .69
Eubacterium yurii group 0.71 (0.56 to 0.90) .005 .29 0.68 (0.48 to 0.96) .03 .97 0.74 (0.50 to 1.09) .12 .77
Aggregatibacter 0.85 (0.68 to 1.07) .17 .60 0.57 (0.42 to 0.79) <.001 .86 0.60 (0.41 to 0.86) .006 .12
[a] In the AHS data, the model includes adjustment for sex, age categories (≤49, 50–58, 59–68, and ≥69 years), continuous age, age squared, race and ethnicity (non-Hispanic White and other race), 13-level smoking variable (see Supplementary Table 1, available online, for details), education, and alcohol consumption. In the NIH-AARP data, the model includes adjustment for sex, age categories (≤70, 71–75, and ≥76 years), continuous age, age squared, race and ethnicity (non-Hispanic White and other race), 18-level smoking variable (see Supplementary Table 2, available online, for details), education, and alcohol consumption. In the PLCO data, the model includes adjustment for sex, age categories (≤64, 65–69, 70–74, and ≥75 years), continuous age, age squared, race and ethnicity (non-Hispanic White and other race), 22-level smoking variable (see Supplementary Table 3, available online, for details), education, and alcohol consumption. AHS = Agricultural Health Study; ASV = amplicon sequence variant; NIH-AARP = National Institutes of Health-AARP; PC = principal component; PCA = principal component analysis; PD = phylogenetic diversity; PLCO = Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial.

[b] The hazard ratio for alpha diversity, PCA vectors, and relative abundance represents the change for 1 SD in the continuous predictor.

For the genera statistically significantly associated with overall lung cancer, some associations appeared to differ by histologic subtype. For example, the association for the relative abundance of Abiotrophia appeared to be restricted to adenocarcinoma (HR = 1.09, 95% CI = 1.06 to 1.11) and small cell lung cancer (HR = 1.17, 95% CI = 0.98 to 1.41), although only the association with adenocarcinoma was statistically significant (Table 3). The associations between the relative abundance and presence of all genera with lung cancer risk by histologic subtype are presented in Supplementary Tables 8 and 9 (available online).

Microbial Associations With Lung Cancer Risk Stratified by Smoking History

By smoking status, associations with alpha diversity were largely restricted to former smokers (eg, observed ASVs HR = 0.85, 95% CI = 0.75 to 0.96; Table 4). Associations among current smokers were also inverse, albeit weaker and not statistically significant. The associations between alpha diversity and lung cancer risk among never smokers were weakly positive but also not statistically significant. For the PCA vectors, the positive association between PC2 from the relative abundance table and lung cancer risk was observed only among former smokers (HR = 1.20, 95% CI = 1.08 to 1.34) and was stronger than the overall association. Similarly, the positive association between PC1 from the presence table and lung cancer risk was detected only among former smokers (HR = 1.26, 95% CI = 1.12 to 1.42).

Table 4.

Meta-analyzed associations between alpha diversity, PC vectors, and specific genera with lung cancer risk stratified by smoking history

Microbiome metrics (rows), with columns from left to right, each smoking group reported as HR (95% CI), P, P heterogeneity:
Current smokers [a] (cases = 536/subcohort = 937)
Former smokers [a] (cases = 636/subcohort = 1344)
Never smokers [a,b] (cases = 125/subcohort = 936)
Alpha diversity [c]
 Observed ASVs 0.95 (0.84 to 1.08) .44 .42 0.85 (0.75 to 0.96) .007 .67 1.03 (0.82 to 1.30) .79 .98
 Faith’s PD 0.97 (0.85 to 1.10) .61 .54 0.86 (0.76 to 0.96) .01 .95 1.01 (0.81 to 1.27) .92 .83
 Shannon Index 0.91 (0.81 to 1.03) .13 .61 0.87 (0.77 to 0.97) .02 .54 1.03 (0.82 to 1.28) .82 .32
PCA vectors from relative abundance [c]
 PC1 (44.8%) 1.04 (0.91 to 1.18) .59 .75 0.93 (0.82 to 1.05) .25 .53 1.03 (0.85 to 1.24) .79 .52
 PC2 (15.2%) 1.06 (0.95 to 1.18) .31 .76 1.20 (1.08 to 1.34) <.001 .37 0.98 (0.81 to 1.18) .84 .06
 PC3 (6.6%) 0.92 (0.81 to 1.05) .23 .62 0.97 (0.85 to 1.10) .59 .54 0.99 (0.82 to 1.20) .92 .96
 PC4 (4.7%) 1.01 (0.90 to 1.14) .82 .17 1.04 (0.91 to 1.18) .58 .11 1.06 (0.84 to 1.34) .60 .25
 PC5 (3.8%) 1.07 (0.94 to 1.22) .30 .91 0.94 (0.83 to 1.07) .38 .25 1.00 (0.79 to 1.26) .97 .87
PCA vectors from presence/absence [c]
 PC1 (7.9%) 1.08 (0.96 to 1.23) .20 .41 1.26 (1.12 to 1.42) <.001 .86 1.00 (0.80 to 1.24) .98 .93
 PC2 (4.4%) 0.98 (0.86 to 1.12) .77 .95 0.91 (0.81 to 1.02) .10 .37 1.01 (0.80 to 1.28) .92 .66
 PC3 (2.4%) 1.07 (0.94 to 1.21) .32 .49 1.02 (0.91 to 1.15) .73 .84 0.96 (0.78 to 1.18) .68 .81
 PC4 (2.1%) 0.98 (0.85 to 1.12) .74 .80 0.96 (0.84 to 1.08) .49 .69 0.90 (0.75 to 1.09) .30 .91
 PC5 (1.3%) 1.08 (0.96 to 1.23) .20 .58 1.02 (0.90 to 1.15) .74 .05 1.02 (0.82 to 1.25) .88 .11
Genus-level relative abundance [c]
Abiotrophia 1.07 (0.96 to 1.19) .23 .04 1.05 (1.03 to 1.07) <.001 .19 1.03 (0.84 to 1.27) .79 .49
Lactobacillus 1.11 (1.02 to 1.22) .02 .08 1.06 (1.04 to 1.08) <.001 .12 0.87 (0.71 to 1.07) .18 .96
Streptococcus 1.09 (0.96 to 1.23) .19 .48 1.22 (1.08 to 1.37) .001 .70 1.03 (0.84 to 1.27) .76 .12
Genus-level presence
Peptoniphilus 1.50 (1.09 to 2.05) .01 .16 2.17 (1.47 to 3.19) <.001 .82 NA
Peptostreptococcus 0.78 (0.60 to 1.00) .05 .95 0.74 (0.58 to 0.96) .02 .89 0.83 (0.49 to 1.41) .50 .28
Eubacterium yurii group 0.71 (0.53 to 0.96) .03 .82 0.65 (0.49 to 0.85) .002 .51 1.05 (0.69 to 1.60) .83 .78
Aggregatibacter 0.91 (0.70 to 1.18) .46 .08 0.62 (0.49 to 0.80) <.001 .52 0.84 (0.53 to 1.33) .46 .25
a

In the AHS data, the model includes adjustment for sex, age categories (≤49, 50–58, 59–68, and ≥69 years), continuous age, age squared, race and ethnicity (non-Hispanic White and other race), 13 level smoking variable (see Supplementary Table 1, available online for details), education, and alcohol consumption. In the NIH-AARP data, the model includes adjustment for sex, age categories (≤70, 71–75, and ≥76 years), continuous age, age squared, race and ethnicity (non-Hispanic White and other race), 18 level smoking variable (see Supplementary Table 2, available online for details), education, and alcohol consumption. In the PLCO data, the model includes adjustment for sex, age categories (≤64, 65–69, 70–74 and ≥75 years), continuous age, age squared, race and ethnicity (non-Hispanic White and other race), 22 level smoking variable (see Supplementary Table 3, available online for details), education, and alcohol consumption. AHS = Agricultural Health Study; ASV = amplicon sequence variant; NIH-AARP = National Institutes of Health-AARP; PC = principal component; PCA = principal component analysis; PD = phylogenetic diversity; PLCO = Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial.

b. Never smokers include only individuals who never smoked cigarettes, pipes, or cigars (for AHS, also never users of cigarillos, chewing tobacco, or snuff).

c. The hazard ratio for alpha diversity, PCA vectors, and relative abundance represents the change for 1 SD in the continuous predictor.
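As a rough illustration of how a per-SD hazard ratio, its confidence interval, and the reported P value relate, the sketch below back-calculates a Wald-type z statistic from a hazard ratio and its 95% CI. It assumes the interval is symmetric on the log scale and that a normal approximation holds. The function name `wald_p_from_hr` is hypothetical and is not taken from the study's analysis code.

```python
import math

def wald_p_from_hr(hr, lower, upper, z_crit=1.959964):
    """Approximate two-sided P value from a hazard ratio and its 95% CI.

    Assumption: the CI is a Wald interval, symmetric on the log scale.
    """
    log_hr = math.log(hr)
    # The interval width on the log scale equals 2 * z_crit * SE.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_hr / se
    # Two-sided tail probability of a standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return log_hr, se, z, p

# Streptococcus among former smokers, as reported in Table 4.
log_hr, se, z, p = wald_p_from_hr(1.22, 1.08, 1.37)
print(f"log HR = {log_hr:.3f}, SE = {se:.3f}, z = {z:.2f}, P = {p:.4f}")
```

For this row the back-calculated P value is close to the tabulated .001. Small discrepancies are expected because the published estimates are rounded.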

For the genera statistically significantly associated with overall lung cancer, many of the associations were restricted to current and former smokers but generally only reached statistical significance in former smokers (Table 4). For example, for Abiotrophia, each SD increase in relative abundance was associated with a hazard ratio for lung cancer of 1.07 (95% CI = 0.96 to 1.19) among current smokers and 1.05 (95% CI = 1.03 to 1.07) among former smokers. The association among current smokers also appeared to be heterogeneous across the 3 cohorts (Pheterogeneity = .04). The associations between the relative abundance and presence of all genera with lung cancer risk stratified by smoking history are presented in Supplementary Tables 10 and 11 (available online).
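A heterogeneity P value asks whether cohort-specific estimates differ more than chance would allow. This section does not spell out the test used, so the sketch below shows Cochran's Q as one common choice, not necessarily the authors' method. The cohort-level log hazard ratios and standard errors are hypothetical placeholders, not values from the study.

```python
import math

def cochran_q_three_cohorts(log_hrs, ses):
    """Fixed-effect pooled log HR and Cochran's Q test for exactly 3 cohorts.

    With 3 cohorts Q has 2 degrees of freedom, and the chi-square
    survival function for 2 df has the closed form exp(-Q / 2).
    """
    if len(log_hrs) != 3 or len(ses) != 3:
        raise ValueError("this sketch handles exactly 3 cohorts")
    weights = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_hrs))
    p_heterogeneity = math.exp(-q / 2.0)
    return pooled, q, p_heterogeneity

# Hypothetical cohort-specific estimates (not from the paper).
pooled, q, p_het = cochran_q_three_cohorts([0.30, 0.02, -0.05],
                                           [0.10, 0.09, 0.12])
print(f"pooled HR = {math.exp(pooled):.2f}, Q = {q:.2f}, P = {p_het:.3f}")
```

A small heterogeneity P value suggests the pooled hazard ratio may hide real differences between cohorts, so it should be read with more caution.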

When we additionally stratified current and former smokers by smoking intensity (Supplementary Table 12, available online), the inverse alpha diversity associations were generally strongest among former light and moderate smokers. For PC2 from the relative abundance table, additional stratification by smoking intensity did not appear to modify the associations among current and former smokers, but the association for PC1 from the presence table was weakest among current heavy smokers (HR = 1.05, 95% CI = 0.86 to 1.30) and strongest among former light and moderate smokers (HR = 1.37, 95% CI = 1.14 to 1.66). At the genus level, the relative abundance of Lactobacillus and the presence of Peptoniphilus were positively associated with lung cancer risk across smoking history and intensity categories, whereas the associations for other genera tended to differ across strata. In particular, few statistically significant genus-level associations were observed among current heavy smokers.

Associations Between Smoking History and Lung Cancer–Related Genera in the Referent Subcohort

As seen in Supplementary Table 13 (available online), most lung cancer–related genera were associated with smoking history, and current smokers tended to have stronger associations than former smokers. For example, for Lactobacillus, the zero-inflated term for current smokers was not estimable, but current smokers had 3.21 (95% CI = 2.38 to 4.34) times greater relative abundances of Lactobacillus compared with never smokers. Former smokers had statistically significantly lower odds of having absent Lactobacillus in all 3 cohorts (odds ratio = 0.63, 95% CI = 0.51 to 0.79) and had a 1.24 (95% CI = 0.97 to 1.57) times greater relative abundance of Lactobacillus compared with never smokers.

Discussion

In this case-cohort study nested within 3 cohorts in the United States, we found that greater microbial diversity was associated with lower risk of developing lung cancer over follow-up. We also found that PCA vectors driven by the relative abundance of Streptococcus and vectors driven by the presence of multiple genera were strongly associated with risk of lung cancer. When tested as individual genera, a higher relative abundance of Abiotrophia, Lactobacillus, or Streptococcus in the oral wash was associated with a greater risk of developing lung cancer over follow-up. Individuals with detectable Peptoniphilus had a higher risk of developing lung cancer, whereas individuals with detectable Peptostreptococcus, Eubacterium yurii, or Aggregatibacter had a lower risk of developing lung cancer. The observed associations were usually strongest for squamous cell carcinoma and among former smokers.
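To make the notion of microbial (alpha) diversity concrete, the sketch below computes the Shannon index from per-taxon read counts. It uses the natural logarithm. The log base varies between tools, which rescales the index but does not change the ordering of samples. The counts are hypothetical and `shannon_index` is an illustrative helper, not the pipeline used in the study.

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one read")
    proportions = [c / total for c in counts if c > 0]
    return sum(-p * math.log(p) for p in proportions)

# Hypothetical samples: same number of taxa, different evenness.
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]
print(shannon_index(even_sample), shannon_index(skewed_sample))
```

A community dominated by a single taxon scores lower than an even community with the same number of taxa. This is the sense in which higher diversity was associated with lower risk here.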

Most previous studies of the association of the oral or sputum microbiome with lung cancer have been case-control studies with relatively small sample sizes (7-10). Case-control studies are subject to reverse causation because samples are collected from people who have already been diagnosed with the disease. However, 2 prospective studies have investigated this association. In 1 study conducted in China, 114 prospectively ascertained never-smoking lung cancer cases were matched to 114 never-smoking controls within the Shanghai Women’s and Men’s Health Studies. An inverse association was observed between alpha diversity and risk of lung cancer, which was statistically significant for the Shannon and Simpson indices but not for observed species. Greater relative abundances of Spirochaetia and Bacteroidetes were associated with lower lung cancer risk, whereas a greater relative abundance of Lactobacillales was associated with higher lung cancer risk (11). In a second study, conducted in the United States, 156 prospectively ascertained lung cancer cases were matched to 156 controls in the Southern Community Cohort Study. Alpha diversity tended to be lower in cases than in controls, but the difference did not reach statistical significance, and no taxa were statistically significantly associated with lung cancer risk after Bonferroni correction (12).
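The Bonferroni correction mentioned for the second prospective study can be sketched in a few lines: with m tests, each P value is compared against alpha / m instead of alpha. The P values below are hypothetical and `bonferroni_flags` is an illustrative name.

```python
def bonferroni_flags(p_values, alpha=0.05):
    """Return True for each test that passes the Bonferroni threshold alpha / m."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in p_values]

# Hypothetical P values for 4 taxa; the threshold is 0.05 / 4 = 0.0125.
print(bonferroni_flags([0.001, 0.02, 0.04, 0.3]))
```

Because the threshold shrinks as the number of taxa tested grows, modest associations in a study with a few hundred participants can easily fail to pass it.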

In our substantially larger study, higher relative abundances of Abiotrophia, Lactobacillus, or Streptococcus, or detectable Peptoniphilus were associated with a greater risk of lung cancer, whereas detectable Peptostreptococcus, Eubacterium yurii, or Aggregatibacter were associated with a lower risk of lung cancer. Most of these genera associated with lung cancer risk have been found to be associated with pneumonia (19,20) or oral health parameters such as periodontal disease (21-23) or dental caries and decay (24,25). And previous epidemiological studies have found associations between lung infections, periodontal disease, and lung cancer risk (3-5). As there are few previous oral microbiome studies, our genus-specific findings require replication.

When we stratified by histologic subtype, the alpha diversity and PCA vector associations in our study were primarily restricted to squamous cell carcinoma and were not observed in adenocarcinoma or small cell lung cancer cases. However, the associations with Abiotrophia and Peptoniphilus appeared to be restricted to individuals with adenocarcinoma and small cell lung cancer, and the association with Aggregatibacter appeared to be restricted to individuals with squamous cell carcinoma and small cell lung cancer. In contrast, the associations for Lactobacillus, Streptococcus, and Eubacterium yurii were relatively consistent across histologic subtypes. Etiologic differences, including a differing impact of cigarette smoking, between each histologic subtype are possible and may explain differing associations with the oral microbiome. However, these results require replication.

When the data were stratified by smoking history, the alpha diversity and PCA vector associations in our study were strongest for the former smokers, although most of the specific taxonomic associations were observed in both current and former smokers. Smoking intensity may also play a role in these associations because we observed some differences in associations by categories of smoking history and intensity. Cigarette smoking causes lung cancer (26) and is also associated with the oral microbiome (6). It has been shown that quitting smoking can greatly reduce the risk of developing lung cancer (27), and although oral microbial communities of former smokers have appeared to be more similar to never smokers than current smokers, the microbial communities of former smokers also appeared to be more heterogeneous than never smokers (6). Additional studies of effects of smoking cessation on the oral microbiome may help to understand these findings. Within the United States, lung cancer in never smokers is considerably less common than among smokers (28), and never-smoking lung cancer has a distinct etiology (29). Reflecting the far lower incidence of lung cancer in never smokers, we had considerably less statistical power in this group relative to former and current smokers. Future, larger studies of the oral microbiome in never-smoking lung cancer are needed to complement studies among smokers.

This study has multiple important strengths. First, to our knowledge, the inclusion of 1306 lung cancer cases represents the largest study of the oral microbiome with lung cancer to date, and each cohort comprehensively assessed smoking and other cancer risk factors. Furthermore, this study was conducted prospectively, so any observed associations should be less affected by reverse causation. And because associations were essentially unchanged after excluding the first 2 years of follow-up, it is unlikely that the associations we detected were due to undiagnosed, prevalent lung cancer. Finally, the 3 cohorts included in this study represent different populations within the United States, and the consistency of many of the detected associations provides more confidence in the validity of these associations.

This study also has limitations that need to be considered. We lacked information about the timing of the oral wash (eg, morning, afternoon, or evening) collection, participant activities before the collection (eg, eating, smoking, brushing teeth, or using antibiotics), and participants’ oral health. In addition, no questionnaires were obtained at the time of the mouthwash collection, so all confounders included in this study were from the baseline assessment. We also had only a single timepoint of the oral wash specimen per participant; however, for oral microbial metrics that are fairly stable over time, a specimen from a single timepoint is generally sufficient (16). Finally, the populations of all 3 cohorts were predominantly non-Hispanic White individuals, and it is essential that future studies include more diverse participants to further investigate this association.

In conclusion, using data from 3 prospective cohort studies in the United States, we found that the oral microbiome was associated with risk of lung cancer, particularly squamous cell carcinoma and among former smokers. Additional research is needed within diverse populations and to understand the mechanisms by which the oral microbiome may contribute to lung cancer risk. If these findings are robust across diverse populations, the oral microbiome may represent an important risk factor for lung cancer and may offer new opportunities for cancer prevention.

Funding

This work was supported by the Intramural Research Program of the National Institutes of Health, the National Institute of Environmental Health Sciences (Z01-ES049030), and the National Cancer Institute (Z01-CP010119). This work has also been supported in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under NCI Contract No. 75N910D00024. This work was also partially funded by the National Cancer Institute Informatics Technology for Cancer Research Award 1U24CA248454-01 to JGC.

Notes

Role of the funder: The study sponsor had no role in the study design; in the collection, analysis, or interpretation of data; in the writing of the report; or in the decision to submit the paper for publication.

Disclosures: We have no conflicts of interest to disclose. MHG, a JNCI Associate Editor and coauthor on this article, was not involved in the editorial review or decision to publish the manuscript.

Author contributions: Conceptualization (EV, GY, NEC, RS, MHG, JS, CCA), data curation (EV, XH, VP, AGH, DS, YW, WW, DPS, LEBF, LML, WYH, NDF, RS, JS), formal analysis (EV, XH, VP, AGH, DS, YW, SL, CLD, KJ, BDH, AH, WW, MHG, JS), funding acquisition (DPS, LEBF, RS, CCA), methodology (EV, YW, CLD, KJ, BDH, AH, JGC, RS, MHG, JS, CCA), writing—original draft (EV, CCA), and writing—review and editing (all authors).

Acknowledgements: We gratefully acknowledge the contributions of Dr Nathaniel Rothman for his role in initiating and designing the oral rinse collection in the AHS. We also acknowledge the research contributions of the Cancer Genomics Research Laboratory for their expertise, execution, and support of this research in the areas of project planning, wet laboratory processing of specimens, and bioinformatics analysis of generated data.

Disclaimers: The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government.

Prior presentations: This study was presented as a poster at the AACR Microbiome, Viruses, and Cancer Special Conference on February 22, 2020, and virtually presented as a short oral presentation at the 8th International Human Microbiome Consortium Congress on June 29, 2021.

Supplementary Material

djac149_Supplementary_Data

Contributor Information

Emily Vogtmann, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA.

Xing Hua, Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA.

Guoqin Yu, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA.

Vaishnavi Purandare, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA.

Autumn G Hullings, Nutrition Department, University of North Carolina, Chapel Hill, NC, USA.

Dantong Shao, Guangzhou Institute of Pediatrics, Guangzhou Women and Children’s Medical Center, Guangzhou Medical University, Guangzhou, China.

Yunhu Wan, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA; Frederick National Laboratory for Cancer Research/Leidos Biomedical Research Laboratory, Inc, Frederick, MD, USA.

Shilan Li, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA; Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University Medical Center, Washington, DC, USA.

Casey L Dagnall, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA; Frederick National Laboratory for Cancer Research/Leidos Biomedical Research Laboratory, Inc, Frederick, MD, USA.

Kristine Jones, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA; Frederick National Laboratory for Cancer Research/Leidos Biomedical Research Laboratory, Inc, Frederick, MD, USA.

Belynda D Hicks, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA; Frederick National Laboratory for Cancer Research/Leidos Biomedical Research Laboratory, Inc, Frederick, MD, USA.

Amy Hutchinson, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA; Frederick National Laboratory for Cancer Research/Leidos Biomedical Research Laboratory, Inc, Frederick, MD, USA.

J Gregory Caporaso, Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA.

William Wheeler, Information Management Services, Inc, Rockville, MD, USA.

Dale P Sandler, Epidemiology Branch, Chronic Disease Epidemiology Group, National Institute for Environmental Health Science, Research Triangle Park, NC, USA.

Laura E Beane Freeman, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA.

Linda M Liao, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA.

Wen-Yi Huang, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA.

Neal D Freedman, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA.

Neil E Caporaso, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA.

Rashmi Sinha, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA.

Mitchell H Gail, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA.

Jianxin Shi, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA.

Christian C Abnet, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA.

Data Availability

Microbiome sequencing data are available at the Sequence Read Archive (SRA) under project number PRJNA801882 with limited metadata (https://www.ncbi.nlm.nih.gov/sra/). For complete metadata, a data application will need to be approved by the Agricultural Health Study (www.aghealthstars.com), the NIH-AARP Diet and Health Study (www.nihaarpstars.com), and the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (www.cdas.cancer.gov).

References

  • 1. Ferlay J, Colombet M, Soerjomataram I, et al. Cancer statistics for the year 2020: an overview. Int J Cancer. 2021;149:778-789.
  • 2. American Cancer Society. Cancer Facts & Figures 2021. Atlanta: American Cancer Society; 2021.
  • 3. Brenner DR, McLaughlin JR, Hung RJ. Previous lung diseases and lung cancer risk: a systematic review and meta-analysis. PLoS One. 2011;6(3):e17479.
  • 4. Michaud DS, Liu Y, Meyer M, et al. Periodontal disease, tooth loss, and cancer risk in male health professionals: a prospective cohort study. Lancet Oncol. 2008;9(6):550-558.
  • 5. Yoon HS, Wen W, Long J, et al. Association of oral health with lung cancer risk in a low-income population of African Americans and European Americans in the Southeastern United States. Lung Cancer. 2019;127:90-95.
  • 6. Wu J, Peters BA, Dominianni C, et al. Cigarette smoking and the oral microbiome in a large study of American adults. ISME J. 2016;10(10):2435-2446.
  • 7. Hosgood HD, Sapkota AR, Rothman N, et al. The potential role of lung microbiota in lung cancer attributed to household coal burning exposures. Environ Mol Mutagen. 2014;55(8):643-651.
  • 8. Cameron SJS, Lewis KE, Huws SA, et al. A pilot study using metagenomic sequencing of the sputum microbiome suggests potential bacterial biomarkers for lung cancer. PLoS One. 2017;12(5):e0177062.
  • 9. Yang J, Mu X, Wang Y, et al. Dysbiosis of the salivary microbiome is associated with non-smoking female lung cancer and correlated with immunocytochemistry markers. Front Oncol. 2018;8:520.
  • 10. Zhang W, Luo J, Dong X, et al. Salivary microbial dysbiosis is associated with systemic inflammatory markers and predicted oral metabolites in non-small cell lung cancer patients. J Cancer. 2019;10(7):1651-1662.
  • 11. Hosgood HD, Cai Q, Hua X, et al. Variation in oral microbiome is associated with future risk of lung cancer among never-smokers. Thorax. 2021;76(3):256-263.
  • 12. Shi J, Yang Y, Xie H, et al. Association of oral microbiota with lung cancer risk in a low-income population in the Southeastern USA. Cancer Causes Control. 2021;32(12):1423. doi: 10.1007/s10552-021-01490-6.
  • 13. Alavanja MC, Sandler DP, McMaster SB, et al. The agricultural health study. Environ Health Perspect. 1996;104(4):362-369.
  • 14. Schatzkin A, Subar AF, Thompson FE, et al. Design and serendipity in establishing a large cohort with wide dietary intake distributions: the National Institutes of Health-American Association of Retired Persons Diet and Health Study. Am J Epidemiol. 2001;154(12):1119-1125.
  • 15. Prorok PC, Andriole GL, Bresalier RS, et al.; Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial Project Team. Design of the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. Control Clin Trials. 2000;21(suppl 6):273s-309s.
  • 16. Vogtmann E, Hua X, Zhou L, et al. Temporal variability of oral microbiota over 10 months and the implications for future epidemiologic studies. Cancer Epidemiol Biomarkers Prev. 2018;27(5):594-600.
  • 17. Caporaso JG, Kuczynski J, Stombaugh J, et al. QIIME allows analysis of high-throughput community sequencing data. Nat Methods. 2010;7(5):335-336.
  • 18. Callahan BJ, McMurdie PJ, Rosen MJ, et al. DADA2: high-resolution sample inference from Illumina amplicon data. Nat Methods. 2016;13(7):581-583.
  • 19. van der Poll T, Opal SM. Pathogenesis, treatment, and prevention of pneumococcal pneumonia. Lancet. 2009;374(9700):1543-1556.
  • 20. Bahrani-Mougeot FK, Paster BJ, Coleman S, et al. Molecular analysis of oral and respiratory bacterial species associated with ventilator-associated pneumonia. J Clin Microbiol. 2007;45(5):1588-1593.
  • 21. Margaret BS, Heath JR, Krywolap GN. Pathogenic potential of Eubacterium yurii subspecies. J Med Microbiol. 1990;31(2):103-108.
  • 22. Rams TE, Feik D, Listgarten MA, et al. Peptostreptococcus micros in human periodontitis. Oral Microbiol Immunol. 1992;7(1):1-6.
  • 23. Newman MG, Socransky SS, Savitt ED, et al. Studies of the microbiology of periodontosis. J Periodontol. 1976;47(7):373-379.
  • 24. Caufield PW, Schön CN, Saraithong P, et al. Oral lactobacilli and dental caries: a model for niche adaptation in humans. J Dent Res. 2015;94(suppl 9):110s-118s.
  • 25. Loesche WJ. Role of Streptococcus mutans in human dental decay. Microbiol Rev. 1986;50(4):353-380.
  • 26. US Department of Health and Human Services. Center for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Office on Smoking and Health. The Health Consequences of Smoking: A Report of the Surgeon General. Atlanta, GA: US Department of Health and Human Services; 2004.
  • 27. Godtfredsen NS, Prescott E, Osler M. Effect of smoking reduction on lung cancer risk. JAMA. 2005;294(12):1505-1510.
  • 28. Wakelee HA, Chang ET, Gomez SL, et al. Lung cancer incidence in never smokers. J Clin Oncol. 2007;25(5):472-478.
  • 29. Sun S, Schiller JH, Gazdar AF. Lung cancer in never smokers—a different disease. Nat Rev Cancer. 2007;7(10):778-790.


Articles from JNCI Journal of the National Cancer Institute are provided here courtesy of Oxford University Press
