American Journal of Human Genetics. 2015 May 7;96(5):832–840. doi: 10.1016/j.ajhg.2015.03.009

Low-Frequency Coding Variants at 6p21.33 and 20q11.21 Are Associated with Lung Cancer Risk in Chinese Populations

Guangfu Jin 1,2,10, Meng Zhu 1,10, Rong Yin 3, Wei Shen 1, Jia Liu 1, Jie Sun 1, Cheng Wang 1, Juncheng Dai 1, Hongxia Ma 1, Chen Wu 4, Zhihua Yin 5, Jiaqi Huang 6, Brandon W Higgs 6, Lin Xu 3, Yihong Yao 6, David C Christiani 7, Christopher I Amos 8, Zhibin Hu 1,2,3,11, Baosen Zhou 5,11, Yongyong Shi 9,11, Dongxin Lin 4,11, Hongbing Shen 1,2,11
PMCID: PMC4570553  PMID: 25937444

Abstract

Genome-wide association studies have successfully identified a subset of common variants associated with lung cancer risk. However, these variants explain only a fraction of lung cancer heritability. It has been proposed that low-frequency or rare variants might have strong effects and contribute to the missing heritability. To assess the role of low-frequency or rare variants in lung cancer development, we analyzed exome chips representing 1,348 lung cancer subjects and 1,998 control subjects during the discovery stage and subsequently evaluated promising associations in an additional 4,699 affected subjects and 4,915 control subjects during the replication stages. Single-variant and gene-based analyses were carried out for coding variants with a minor allele frequency less than 0.05. We identified three low-frequency missense variants in BAT2 (rs9469031, c.1544C>T [p.Pro515Leu]; odds ratio [OR] = 0.55, p = 1.28 × 10−10), FKBPL (rs200847762, c.410C>T [p.Pro137Leu]; OR = 0.25, p = 9.79 × 10−12), and BPIFB1 (rs6141383, c.850G>A [p.Val284Met]; OR = 1.72, p = 1.79 × 10−7); these variants were associated with lung cancer risk. rs9469031 in BAT2 and rs6141383 in BPIFB1 were also associated with the age of onset of lung cancer (p = 0.001 and 0.006, respectively). BAT2 and FKBPL at 6p21.33 and BPIFB1 at 20q11.21 were differentially expressed in lung tumors and paired normal tissues. Gene-based analysis revealed that FKBPL, in which two independent variants were identified, might account for the association with lung cancer risk at 6p21.33. Our results highlight the important role low-frequency variants play in lung cancer susceptibility and indicate that candidate genes at 6p21.33 and 20q11.21 are potentially biologically relevant to lung carcinogenesis.

Main Text

Lung cancer is among the most frequently diagnosed cancers and is the leading cause of cancer-related death worldwide.1 Tobacco smoking is the major cause of lung cancer, whereas genetic factors determine individual predisposition to lung cancer. We and others have identified a subset of loci that are associated with lung cancer risk through genome-wide association studies (GWASs).2–10 These variants generally occur at a high frequency (minor allele frequency [MAF] > 0.05) in populations, and the effect of single variants is modest (odds ratios [ORs] = 1.1–1.4 for risk alleles). To date, these known common loci explain only a small fraction of the familial risk of lung cancer, and the remaining missing heritability is uncertain.

GWASs mainly focus on common proxy SNPs that are based on the HapMap Project, so most low-frequency (defined here as a MAF of 0.5%–5%) and rare (MAF < 0.5%) variants were not evaluated in most GWASs. An alternative hypothesis is that, unlike common variants with low penetrance, some low-frequency or rare variants might have strong effects and might contribute to the missing heritability of complex diseases, including cancer. In support of this hypothesis, several genes containing known low-frequency or rare missense variants are associated with various cancers: ATM (MIM: 607585), BRIP1 (MIM: 605882), CHEK2 (MIM: 604373), and PALB2 (MIM: 610355) for breast cancer;11 RAD51D (MIM: 602954) and BRIP1 for ovarian cancer;12,13 and HOXB13 (MIM: 604607) for prostate cancer.14 More recently, Wang et al. implicated two large-effect, low-frequency variants, rs11571833 (c.9976A>T [p.Lys3326*]; GenBank: NM_000059) in BRCA2 (MIM: 600185) and rs17879961 (c.470T>C [p.Ile157Thr]; GenBank: NM_007194.3) in CHEK2 (MIM: 604373), in susceptibility to lung cancer in populations of European ancestry on the basis of existing GWAS imputation data;15 these findings suggest that low-frequency or rare variants in coding regions are important to the missing heritability of lung cancer.

Sequencing is an ideal approach for investigating low-frequency or rare variants but has been limited so far because of its cost. The Illumina HumanExome Beadchip (referred to as “exome chip” hereafter) platform has thus been developed to capture low-frequency or rare variants in coding regions on the basis of genetic variants discovered from the whole-exome sequencing of >12,000 individuals. Recently, several groups have validated this platform as an effective complementary approach for determining the genetic basis of complex diseases or traits.16–18 To address the role of low-frequency or rare variants in the development of lung cancer, we generated and analyzed exome-chip data for 1,348 lung cancer subjects and 1,998 control subjects and subsequently evaluated promising associations in an additional 4,699 affected subjects and 4,915 control subjects. As a result, we identified three low-frequency missense variants in BAT2 (MIM: 142580), FKBPL, and BPIFB1, which are associated with lung cancer risk in Chinese populations.

A three-stage case-control analysis was conducted, and the characteristics of the subjects are summarized in Table S1. In the discovery stage, 1,348 lung cancer subjects and 1,998 control subjects were recruited from Nanjing and the surrounding areas; some of these individuals were also included in our previous GWAS.7 In the first replication stage (replication I), 1,115 affected subjects and 1,246 control subjects were recruited according to the same standards as those used in the discovery stage during 2009–2013. In the second replication stage (replication II), a total of 3,584 affected subjects and 3,669 control subjects were recruited from northern China; 2,466 affected subjects and 2,423 control subjects were from Beijing, and 1,118 affected subjects and 1,246 control subjects were from Shenyang. All of these affected subjects were collected from local hospitals and were histopathologically or cytologically confirmed as having lung cancer by at least two pathologists. All control subjects were cancer-free subjects receiving a routine physical examination in a local hospital or participants in a community screening of noncommunicable diseases. All subjects were unrelated ethnic Han Chinese and gave informed consent at recruitment. Smoking information was obtained via interviews; individuals who had smoked an average of one or more cigarettes per day for at least 1 year before recruitment were defined as current smokers, whereas smokers who had quit more than 1 year before recruitment were considered former smokers; otherwise, subjects were considered non-smokers. Current and former smokers were divided into light and heavy smokers according to the median smoking level of 25 pack years (the number of packs of cigarettes smoked per day multiplied by the number of years a person has smoked) among control subjects. This study was approved by the institutional review board of each participating institution.
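The pack-year definition and the light/heavy split described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, the conversion of 20 cigarettes per pack is an assumption (the text does not state it), and treating exactly 25 pack years as "heavy" is also an assumption.

```python
def pack_years(cigarettes_per_day: float, years_smoked: float,
               cigarettes_per_pack: int = 20) -> float:
    """Packs smoked per day multiplied by the number of years smoked.

    cigarettes_per_pack=20 is an assumed conversion, not taken from the text.
    """
    return (cigarettes_per_day / cigarettes_per_pack) * years_smoked


def smoking_level(pack_yrs: float, cutoff: float = 25.0) -> str:
    """Classify by pack years; 25 is the control-group median given in the text.

    Assigning a value exactly equal to the cutoff to "heavy" is an assumption.
    """
    if pack_yrs <= 0:
        return "non-smoker"
    return "heavy" if pack_yrs >= cutoff else "light"


# One pack a day for 30 years gives 30 pack years (heavy);
# half a pack a day for 20 years gives 10 pack years (light).
example_heavy = smoking_level(pack_years(20, 30))
example_light = smoking_level(pack_years(10, 20))
```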

We successfully genotyped 1,348 lung cancer subjects and 1,998 control subjects by using the Illumina HumanExome Beadchip system, yielding a total of 247,870 variants. The lung cancer and control subjects were genotyped together, and the technicians were blinded to the sample status. Genotypes were called by Illumina GenomeStudio software, and the selected variants were re-called by zCall.19 Systematic quality control of the raw genotyping data was performed to filter unqualified genetic variants and samples (Figure S1). A total of 175,447 variants were excluded from subsequent analysis because they (1) were mitochondrial variants or were located on the X or Y chromosome, (2) were duplicated on the chip, (3) were monomorphic in our study subjects, (4) had a call rate of <95%, or (5) presented a p value < 1 × 10−4 in a Hardy-Weinberg equilibrium test among the control subjects. A total of 7 affected subjects and 16 control subjects were excluded because they (1) had an overall genotyping rate of <95%, (2) were duplicates or showed familial relationships (PI_HAT > 0.25), or (3) had an extreme heterozygosity rate more than 6 SDs from the mean. Population outliers and stratification were detected with a method based on principal-component analysis. As shown in Figure S2, no individuals were excluded as outliers, and the affected and control subjects were genetically matched. We assessed genotyping consistency on the basis of 37 replicate samples and found an overall concordance rate of 99.98%. Moreover, 1,369 subjects were also scanned with an Affymetrix Genome-Wide Human SNP Array 6.0 in a previous GWAS,7 and the concordance rate was 99.93% for 6,660 overlapping variants after quality control. Accordingly, 6 samples and 50 variants with a concordance rate <95% were also excluded. Finally, 72,423 variants in 1,341 affected subjects and 1,982 control subjects were retained for further association analysis.
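To make the Hardy-Weinberg filter concrete, the sketch below tests one variant's control genotype counts against the p < 1 × 10−4 exclusion threshold. The text does not say which form of the test was used, so the 1-degree-of-freedom chi-square test here is a stand-in assumption (an exact test is often preferred for rare variants), and the function name is hypothetical. The example counts are the discovery-stage control genotypes for rs9469031 from Table 1.

```python
import math


def hwe_chi2_pvalue(n_major_hom: int, n_het: int, n_minor_hom: int) -> float:
    """P value of a 1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium."""
    n = n_major_hom + n_het + n_minor_hom
    if n == 0:
        raise ValueError("no genotypes supplied")
    q = (n_het + 2 * n_minor_hom) / (2 * n)  # minor allele frequency
    p = 1.0 - q
    if q == 0.0 or p == 0.0:
        return 1.0  # monomorphic: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_major_hom, n_het, n_minor_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


HWE_THRESHOLD = 1e-4  # exclusion threshold stated in the text

# Discovery-stage control genotypes for rs9469031 (Table 1): 1,840/140/2.
p_rs9469031 = hwe_chi2_pvalue(1840, 140, 2)
passes_hwe = p_rs9469031 >= HWE_THRESHOLD
```

Under this approximation the variant is far from the exclusion threshold, which agrees with its retention after quality control.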

In this study, we mainly focused on the low-frequency or rare variants with MAFs between 0.1% and 5% when we could call at least six copies of the minor allele in our study samples. On the basis of the following items, we then selected promising variants for further genotyping in the replication stages: (1) variants were in nonsynonymous or splice sites, (2) the single-variant association p value was less than 0.001, (3) variant calling was visually inspected with a clear genotyping cluster, and (4) only one variant was selected when multiple variants were in linkage disequilibrium (LD; r2 ≥ 0.5). On the basis of the results from the discovery stage, we genotyped 21 variants in the replication I stage by using SNPscan technology (GeneSky). To obtain positive control samples of minor genotypes for low-frequency or rare variants, we included 192 discovery-stage samples in the replication I stage to ensure the presence of at least two heterozygotes or minor homozygotes, and the concordance rate was 99.4%. In the replication II stage, genotyping was performed with the TaqMan system (Applied Biosystems). Positive and negative control subjects were included in each 384-well plate for quality control. The average concordance rate between duplicate samples was >99%. Genotyping was performed by technicians who were blinded to sample status.
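The selection rules for replication candidates can be expressed as a simple predicate. This is an illustrative sketch only: the `Variant` record, its field names, and the consequence labels are hypothetical, the inclusive MAF bounds are an assumption, and the LD-pruning step (keeping one variant per group with r2 ≥ 0.5) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    rsid: str
    consequence: str         # hypothetical labels, e.g. "missense", "splice_site"
    maf: float               # minor allele frequency in the discovery samples
    minor_allele_count: int  # copies of the minor allele called
    p_value: float           # single-variant association p value
    clear_cluster: bool      # outcome of visual inspection of the cluster plot


ELIGIBLE_CONSEQUENCES = {"missense", "nonsense", "splice_site"}


def is_replication_candidate(v: Variant) -> bool:
    """Apply criteria (1)-(3) from the text plus the MAF and allele-count limits."""
    return (
        v.consequence in ELIGIBLE_CONSEQUENCES
        and 0.001 <= v.maf <= 0.05
        and v.minor_allele_count >= 6
        and v.p_value < 0.001
        and v.clear_cluster
    )


# rs9469031 at the discovery stage: 194 minor alleles among 3,323 subjects
# (Table 1 genotype counts), p = 1.54e-4.
rs9469031 = Variant("rs9469031", "missense", 194 / (2 * 3323), 194, 1.54e-4, True)
# A made-up synonymous variant that fails criterion (1).
dummy = Variant("rs_hypothetical", "synonymous", 0.02, 130, 1e-5, True)
```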

Assuming an additive genetic model, we performed a single-variant association analysis by using a logistic regression model as implemented in PLINK.20 At the discovery stage, we carried out principal-component analysis with EIGENSOFT21 to determine ancestry and population stratification on the basis of 4,604 autosomal ancestry-informative markers included on the exome chips. The top principal component was significant (p = 0.03), and we included it (together with age, gender, and smoking level by pack years) in the logistic regression model as a covariate when we estimated ORs and 95% confidence intervals (CIs). We also used the logistic score test22 and the Firth bias-corrected logistic likelihood-ratio test23 to assess the association results for rare or low-frequency variants. At the replication stages, we used age, gender, and smoking level as covariates. We performed joint analysis to combine the discovery and replication stages and used age, gender, smoking level, and study stage as covariates. We applied conditional analysis to test the independence of genetic variants in each region and used the predefined variant(s) as covariate(s). We performed two gene-based tests using nonsynonymous and splice-site variants with a MAF < 5% (n = 43,782): a simple burden test24 and a sequence kernel association test (SKAT).25 The SKAT was implemented in the sequence kernel association optimal test (SKAT-O).26 We defined statistical significance by using the Bonferroni correction and set the exome-wide significance levels at 2 × 10−7 for single-variant analysis (0.05/250,000 variants) and 2.83 × 10−6 for gene-based analysis (8,840 genes × 2 tests). The quantile-quantile plot was generated with R v.2.3.1, and regional plots were created with LocusZoom.27 We annotated variants according to GENCODE v.7 coding transcripts,28 dbNSFP v.2.0,29 or documentation files obtained from the Illumina Product Support Files.
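The two significance thresholds follow directly from the Bonferroni correction. The short check below reproduces them and compares the combined p values reported for the four variants against the single-variant threshold; the function and variable names are illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


single_variant_threshold = bonferroni_threshold(0.05, 250_000)  # 2e-7
gene_based_threshold = bonferroni_threshold(0.05, 8_840 * 2)    # about 2.83e-6

# Combined p values reported for the four variants in Table 1.
combined_p = {
    "rs9469031": 1.28e-10,
    "rs200847762": 9.79e-12,
    "rs6141383": 1.79e-7,
    "rs2298090": 2.95e-7,
}
exome_wide_significant = sorted(
    rsid for rsid, p in combined_p.items() if p < single_variant_threshold
)
```

Only rs2298090 (HIST1H1E) falls short of the single-variant threshold, which is why it is described as promising rather than significant.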

We obtained the normalized expression data and clinical information for lung cancer samples from The Cancer Genome Atlas (TCGA) on July 8, 2014. A total of 107 paired samples (lung tumor with adjacent normal tissues) were used in this analysis. The paired t test was used to test whether gene expression differed between tumors and the adjacent normal tissues. Seventy-nine out of the 107 individuals had clinical follow-up information and were included in the survival analysis. The mRNA expression ratio of the tumor and adjacent normal tissues was calculated with read counts normalized to RNA sequencing (RNA-seq) by expectation maximization (RSEM). The individuals were divided into two groups on the basis of the median value of the expression ratio for each gene. The Kaplan-Meier method and the log-rank test were used for evaluating the association between gene expression and survival.
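For readers unfamiliar with the survival analysis, the sketch below shows the median split on the tumor/normal expression ratio and a plain Kaplan-Meier estimator. It is a minimal illustration with hypothetical function names and made-up toy data; it does not use TCGA data, it assigns ties at the median to the "low" group by assumption, and the log-rank test used for group comparison is not included.

```python
from statistics import median


def median_split(ratios):
    """Label each expression ratio 'high' or 'low' relative to the median."""
    cut = median(ratios)
    return ["high" if r > cut else "low" for r in ratios]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the subject died at times[i] and 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy follow-up data (months), purely illustrative.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
toy_groups = median_split([0.5, 2.0, 1.0, 4.0])
```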

After quality control, 72,423 polymorphic variants on the exome chip (29.2% of the 247,870 variants) were retained for analysis in 3,323 Han Chinese subjects. The detailed distributions of these variants are summarized in Table S2. In the single-variant association analysis, the quantile-quantile plot revealed a good match between the distributions of the observed and expected p values (Figure S3). A small genomic-control inflation factor (λ) of 1.04, which decreased to 1.01 after the removal of variants that showed a cluster of association signals at 6p22.2–6p21.31, indicated a low possibility of false-positive associations resulting from population stratification. However, at the discovery stage, we did not find any variants associated with lung cancer risk at our predefined exome-wide significance level (p < 2 × 10−7) (Figure S4), which was probably due to limited statistical power, especially for low-frequency or rare variants (Figure S5).
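The genomic-control inflation factor is commonly computed as the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). The text does not spell out the estimator used, so the median-based definition below is an assumption, and the function name is illustrative.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Median-based genomic-control lambda from two-sided association p values.

    Each p value must lie in (0, 1].
    """
    nd = NormalDist()
    # A two-sided p value maps to a 1-df chi-square statistic via z = inv_cdf(p / 2).
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    expected_median = nd.inv_cdf(0.75) ** 2  # about 0.4549
    return median(chi2) / expected_median


# Perfectly uniform p values give lambda = 1, i.e. no inflation.
uniform_p = [(i - 0.5) / 1001 for i in range(1, 1002)]
lambda_null = genomic_inflation(uniform_p)
```

Values close to 1, such as the 1.04 and 1.01 reported here, indicate that the bulk of the test statistics behave as expected under the null.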

We then conducted a two-stage replication study for promising nonsynonymous or splice-site variants with a MAF from 0.1% to 5%. At the replication I stage, we genotyped 21 variants with clear cluster plots (Figure S6) in 1,115 lung cancer subjects and 1,246 control subjects (Table S3). Four variants had a p value <0.05 at the replication I stage and showed associations consistent with those found at the discovery stage (Table S3). At the replication II stage, we genotyped these four variants and found that the associations were consistent for all four variants and that two of them had a p value <0.05 (Table S3). When combining the results from the discovery and replication stages, we found three low-frequency missense variants at BAT2 (rs9469031, c.1544C>T [p.Pro515Leu]; OR = 0.55, p = 1.28 × 10−10), FKBPL (rs200847762, c.410C>T [p.Pro137Leu]; OR = 0.25, p = 9.79 × 10−12), and BPIFB1 (rs6141383, c.850G>A [p.Val284Met]; OR = 1.72, p = 1.79 × 10−7) to be associated with lung cancer risk at the exome-wide significance level (p < 2 × 10−7) (Table 1). We also found a promising HIST1H1E variant (rs2298090, c.455A>G [p.Lys152Arg]; OR = 0.51) with a combined p value of 2.95 × 10−7, just above this threshold. The MAFs of these four variants were also less than 0.05 in other populations, and two of the variants (rs9469031 and rs2298090) were polymorphic but not associated with lung cancer risk according to in silico replication in populations of European ancestry (Table S4).15

Table 1.

The Identified Low-Frequency Variants Associated with Lung Cancer Risk

| Chr | Gene | Variant ID | Major/Minor Allele | Variant | Stage | Affected Subjects (a) | Control Subjects (a) | MAF, Affected Subjects | MAF, Control Subjects | OR (95% CI) (b) | p Value (c) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6p21.33 | BAT2 | rs9469031 | C/T | c.1544C>T (p.Pro515Leu) (GenBank: NM_004638) | discovery | 1,291/50/0 | 1,840/140/2 | 0.019 | 0.036 | 0.52 (0.37–0.73) | 1.54 × 10−4 |
| | | | | | replication I | 1,066/42/6 | 1,150/85/10 | 0.024 | 0.042 | 0.61 (0.44–0.83) | 1.71 × 10−3 |
| | | | | | replication II | 3,429/78/1 | 3,502/129/0 | 0.011 | 0.018 | 0.62 (0.46–0.84) | 1.71 × 10−3 |
| | | | | | combined (d) | | | | | 0.55 (0.46–0.66) | 1.28 × 10−10 |
| 6p21.33 | FKBPL | rs200847762 | G/A | c.410C>T (p.Pro137Leu) (GenBank: NM_022110) | discovery | 1,329/12/0 | 1,908/73/1 | 0.004 | 0.019 | 0.21 (0.11–0.39) | 1.84 × 10−6 |
| | | | | | replication I | 1,094/6/0 | 1,206/32/1 | 0.003 | 0.014 | 0.19 (0.08–0.46) | 2.24 × 10−4 |
| | | | | | replication II | 3,566/15/0 | 3,642/23/1 | 0.002 | 0.003 | 0.66 (0.34–1.28) | 0.216 |
| | | | | | combined (d) | | | | | 0.25 (0.17–0.37) | 9.80 × 10−12 |
| 6p22.2 | HIST1H1E | rs2298090 | A/G | c.455A>G (p.Lys152Arg) (GenBank: NM_005321) | discovery | 1,325/16/0 | 1,904/77/1 | 0.006 | 0.020 | 0.32 (0.19–0.56) | 6.16 × 10−5 |
| | | | | | replication I | 1,073/27/1 | 1,178/54/3 | 0.013 | 0.024 | 0.56 (0.36–0.87) | 9.80 × 10−3 |
| | | | | | replication II | 3,385/44/0 | 3,492/59/0 | 0.006 | 0.008 | 0.67 (0.44–1.02) | 6.03 × 10−2 |
| | | | | | combined (d) | | | | | 0.51 (0.39–0.66) | 2.95 × 10−7 |
| 20q11.21 | BPIFB1 | rs6141383 | G/A | c.850G>A (p.Val284Met) (GenBank: NM_033197) | discovery | 1,277/62/2 | 1,934/48/0 | 0.025 | 0.012 | 2.00 (1.36–2.95) | 4.80 × 10−4 |
| | | | | | replication I | 1,065/42/3 | 1,209/31/0 | 0.022 | 0.013 | 1.68 (1.07–2.63) | 2.30 × 10−2 |
| | | | | | replication II | 3,374/133/0 | 3,368/87/0 | 0.019 | 0.013 | 1.64 (1.24–2.17) | 6.20 × 10−4 |
| | | | | | combined (d) | | | | | 1.72 (1.40–2.10) | 1.79 × 10−7 |

Abbreviations are as follows: Chr, chromosomal region; MAF, minor allele frequency; CI, confidence interval.

(a) Major homozygote/heterozygote/minor homozygote.

(b) Derived from the logistic regression model after adjustment for age, gender, pack years of smoking, and the top principal component (for the discovery stage only) under the assumption of an additive genetic model.

(c) Derived from the logistic regression model adjusting for age, gender, pack years of smoking, and the top principal component (for the discovery stage only) under the assumption of an additive genetic model.

(d) The joint analysis was performed to combine the discovery and replication stages with age, gender, smoking level, and study stage as covariates.
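The MAF columns of Table 1 can be recomputed from the genotype counts (major homozygote/heterozygote/minor homozygote). The sketch below does this for rs9469031 and rs6141383 at the discovery stage and also derives a crude allelic odds ratio. The crude OR is unadjusted, so it is shown only for orientation and need not match the covariate-adjusted logistic-regression ORs in the table; the function names are illustrative.

```python
def minor_allele_frequency(major_hom: int, het: int, minor_hom: int) -> float:
    """Minor allele count divided by the total number of alleles."""
    total_alleles = 2 * (major_hom + het + minor_hom)
    return (het + 2 * minor_hom) / total_alleles


def crude_allelic_or(case_counts, control_counts) -> float:
    """Unadjusted odds ratio for carrying the minor allele, counted per allele."""
    def allele_counts(major_hom, het, minor_hom):
        return het + 2 * minor_hom, 2 * major_hom + het  # (minor, major)

    case_minor, case_major = allele_counts(*case_counts)
    ctrl_minor, ctrl_major = allele_counts(*control_counts)
    return (case_minor / case_major) / (ctrl_minor / ctrl_major)


# rs9469031, discovery stage (Table 1).
maf_cases = minor_allele_frequency(1291, 50, 0)       # about 0.019
maf_controls = minor_allele_frequency(1840, 140, 2)   # about 0.036
crude_or = crude_allelic_or((1291, 50, 0), (1840, 140, 2))  # about 0.50

# rs6141383, discovery stage (Table 1).
maf_cases_bpifb1 = minor_allele_frequency(1277, 62, 2)     # about 0.025
maf_controls_bpifb1 = minor_allele_frequency(1934, 48, 0)  # about 0.012
```

For rs9469031 the crude estimate (about 0.50) is close to the adjusted discovery-stage OR of 0.52.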

We then analyzed the relationships between the four identified variants and the onset ages of the lung cancer case subjects. We observed that rs9469031 and rs6141383 were significantly associated with onset age after adjusting for gender and smoking level (p = 0.001 and 0.006, respectively; Figure 1). Lung cancer subjects carrying the protective allele (T) of rs9469031 had a higher onset age (62.12 ± 10.56 years) than those without the protective allele (59.52 ± 10.18 years), and those with the risk allele (A) of rs6141383 had a lower onset age (58.63 ± 9.23 years) than those without the risk allele (59.64 ± 10.23 years). In addition, we did not find significantly different associations between the subgroups divided by age, gender, smoking, or histology (Table S5).

Figure 1.

The Relationships between rs9469031 in BAT2 and rs6141383 in BPIFB1 and Age of Onset in Individuals with Lung Cancer

Individuals carrying rs9469031 CT/TT genotypes (A), which were associated with a decreased risk of lung cancer, were older at onset (62.12 ± 10.56 years) than those with CC genotypes (59.52 ± 10.18 years, p = 0.001 after adjustment for gender and smoking levels). Individuals carrying rs6141383 GA/AA genotypes (B), which were associated with an increased risk of lung cancer, were younger at onset (58.63 ± 9.23 years) than those with GG genotypes (59.64 ± 10.23 years, p = 0.006 after adjustment for gender and smoking levels).

We then carefully evaluated genetic variants in the flanking regions (1 Mb upstream or downstream) of rs9469031, rs200847762, rs2298090, and rs6141383. As shown in Figure S7, four variants had lower p values than rs9469031: the identified variant (rs200847762) at FKBPL; one variant (rs138097862) that failed to be replicated at the replication stages; and two variants (rs117160266, c.353A>G [p.Asn28Ser], in FKBPL and rs9469057, c.205G>C [p.Ala8Pro], in HSPA1L [MIM: 140559]) that were in strong LD with rs9469031 (r2 = 0.96 and 0.99, respectively) (Table S6). The associations of these two highly correlated variants were abolished after conditioning on rs9469031 (Table S6). We did not find any associations that were more prominent than those of rs200847762, rs2298090, and rs6141383 for their respective regions (Figure S7). There were no other variants in strong LD (r2 > 0.5) with these three variants as genotyped on the exome chip (Table S6).

We further conducted gene-based analysis by using the SKAT-O and burden tests for variants with a MAF < 0.05 and found a significant association between FKBPL and lung cancer risk in both tests (p = 1.29 × 10−9 and 2.0 × 10−10, respectively; Table 2). As shown in Figure S8, three coding variants of FKBPL were included in the gene-based analysis. In the single-variant analysis, the variants rs200847762 and rs117160266 (tagged by rs9469031 at BAT2 with r2 = 0.96) were in low LD (r2 < 0.1) and were independently associated with lung cancer risk (Table 1 and Table S6). After we conditioned on either of these two variants, the significance of FKBPL was partially decreased, and the signal was abolished after we conditioned on both variants (Table 2). These results suggest that the gene-based signal of FKBPL is driven by rs200847762 and rs117160266. We also replicated the association between FKBPL and lung cancer risk in the gene-based analyses of the replication stages (p < 0.001) by using genotyping data for rs200847762 and rs9469031 (Table 2).

Table 2.

Association between FKBPL, at 6p21.33, and Lung Cancer Risk according to Gene-Based Analysis

| Test | Stage | Variants Included | p Value (a) | Conditional Analysis: Variants Included (p Value for Single-Variant Test) | Conditional Analysis: p Value for Gene-Based Analysis (b) |
|---|---|---|---|---|---|
| SKAT-O | discovery | rs200847762, rs117160266, and rs142997752 | 1.29 × 10−9 | rs200847762 (1.84 × 10−6) | 3.44 × 10−5 |
| | | | | rs117160266 (4.91 × 10−5) | 3.26 × 10−6 |
| | | | | rs200847762 and rs117160266 | 0.106 |
| | replication I | rs200847762 and rs117160266 (c) | 1.15 × 10−6 | | |
| | replication II | rs200847762 and rs117160266 (c) | 5.55 × 10−4 | | |
| Burden | discovery | rs200847762, rs117160266, and rs142997752 | 2.00 × 10−10 | rs200847762 (1.84 × 10−6) | 2.75 × 10−5 |
| | | | | rs117160266 (4.91 × 10−5) | 1.36 × 10−7 |
| | | | | rs200847762 and rs117160266 | 0.081 |
| | replication I | rs200847762 and rs117160266 (c) | 4.81 × 10−7 | | |
| | replication II | rs200847762 and rs117160266 (c) | 9.61 × 10−4 | | |

(a) After adjustment for age, gender, pack years of smoking, and the top principal component (for the discovery stage only).

(b) After additional adjustment for the corresponding lead variant(s).

(c) The genotypes of rs9469031 were used here because rs9469031 was highly correlated with rs117160266 at the discovery stage (r2 = 0.96).

The missense variant rs9469031 in BAT2 is located at 6p21.33, which is part of the human leukocyte antigen (HLA) region that was initially identified as a lung cancer susceptibility region and was tagged by common variants (rs3117582 and rs3131379) in GWASs involving subjects of European ancestry.6 The association was confirmed in some,30,31 but not all,32 follow-up studies involving populations of European ancestry. However, studies based on East Asian populations consistently failed to replicate the association7,10 because none of the identified variants in European populations are polymorphic in East Asian populations. Notably, although the initially identified variant, rs3117582, was not associated with lung cancer risk in African Americans, the minor allele of missense variant rs2736158 (c.4140G>C [p.Gly1285Ala] in exon 16) in BAT2 was associated with a decreased risk of squamous cell lung cancer (OR = 0.64, 95% CI = 0.48–0.85).33 Of interest, this variant was consistently associated with lung cancer risk at our discovery stage (OR = 0.82, 95% CI = 0.70–0.96, p = 0.011; Table S6). In the samples examined during the discovery stage, the identified low-frequency variant rs9469031 (MAF = 0.036 in control subjects) was in low LD with the common variant rs2736158 (MAF = 0.139 in control subjects) as measured by r2 (0.19); however, all of the individuals carrying the protective allele of rs9469031 also had the protective allele of rs2736158, yielding a D′ = 1.00 (Table S6). The association of rs2736158 was abolished after we conditioned on rs9469031 (OR = 0.92, 95% CI = 0.78–1.09, p = 0.331), whereas the association of rs9469031 changed modestly (OR = 0.56, 95% CI = 0.39–0.81, p = 0.002). Collectively, these findings indicate that the association of the common variant rs2736158 might be driven by association of the low-frequency variant rs9469031.

In addition, two independent common variants rs3817963 and rs2395185 at 6p21.32 (837 kb and 772 kb away from rs9469031, respectively) are reported to be associated with lung cancer risk in Japanese10 and Chinese9 populations, respectively. The absence of LD between the low-frequency variant and these two common variants (r2 < 0.01) indicates that the signal of rs9469031 might be independent from the signals reported in East Asian populations.
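The combination of D′ = 1.00 with a low r2 for rs9469031 and rs2736158 is what one expects when a rare allele sits entirely on the background of a more common allele. The sketch below illustrates this with the rounded control MAFs given above (0.036 and 0.139). It assumes, at the haplotype level, that every chromosome carrying the rs9469031 minor allele also carries the rs2736158 minor allele; the text states this co-occurrence for individuals, not haplotypes, so this is an assumption. The function name is hypothetical.

```python
def ld_measures(p_a: float, p_b: float, p_ab: float):
    """Return (D', r2) for two biallelic loci.

    p_a, p_b: minor allele frequencies at the two loci (both strictly in (0, 1)).
    p_ab: frequency of the haplotype carrying both minor alleles.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Complete nesting: the rare-allele haplotype frequency equals the rare MAF.
d_prime, r2 = ld_measures(p_a=0.036, p_b=0.139, p_ab=0.036)
```

With these rounded frequencies D′ is 1 and r2 is roughly 0.23, the same order as the reported 0.19, which was estimated from the genotype data. The point of the sketch is only that r2 is capped well below 1 when the two allele frequencies differ this much.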

Because there are at least two variants (rs117160266 in FKBPL and rs9469057 in HSPA1L) in strong LD with rs9469031 in BAT2, it is important to determine which is the causal variant at 6p21.33. BAT2 (also known as PRRC2A) is in a cluster of HLA-B-associated transcripts (BAT1-5) in the human major histocompatibility complex class III region.34 FKBPL (FK506-binding protein-like), a divergent member of the immunophilin family, is implicated in the regulation of tumor growth and angiogenesis and might act as a cancer prognostic marker and a therapeutic target.35,36 HSPA1L encodes heat-shock 70-kDa protein 1-like, which in combination with other heat-shock proteins stabilizes existing proteins against aggregation and protects against DNA damage.37 Functional variants in HSPA1B have been associated with lung cancer risk and survival.38 RNA-seq data from TCGA indicated that BAT2 and FKBPL were upregulated in 84.1% (p = 8.50 × 10−16) and 91.6% (p = 2.25 × 10−18) of lung tumor tissues, respectively, and HSPA1L was significantly downregulated (79.4%, p = 1.56 × 10−10) (Figure S9). We also assessed the clinical relevance of these three genes and did not find significant associations between the mRNA levels and lung cancer survival (Figure S10). Of interest, for rs9469057 at HSPA1L, the substitution p.Ala8Pro is predicted to be damaging (Table S7). Nevertheless, gene-based analysis supported the conclusion that FKBPL, which has two independent risk-related variants, might be the lung-cancer-associated gene at 6p21.33.

The LD of common variants that were associated with lung cancer risk at chromosome 6 from 26–34 Mb (across 6p22.2–6p21.31) has been reported to extend over long distances.6 Genetic variants in this region have been associated with multiple diseases or traits, especially those relating to inflammation and/or immune-related diseases, such as allergies,39 chronic hepatitis B virus infection,40 ulcerative colitis,41 multiple myeloma,42 and diffuse large B cell lymphoma.43 Consistent with this finding, more than 100 low-frequency or rare variants in this region were observed to be associated with lung cancer risk (p < 0.01) in our study, yielding an obvious peak on the Manhattan plot (Figure S4). In addition to the two independent 6p21.33 loci that were described above, we also found another promising low-frequency variant: rs2298090 in HIST1H1E at 6p22.2. p.Lys152Arg, the amino acid change resulting from this variant, is predicted to be damaging (Table S7). HIST1H1E encodes a member of the linker histone H1 family, which interacts with linker DNA between nucleosomes and functions in the compaction of chromatin into higher-order structures.44 TCGA RNA-seq data showed that the mRNA levels of HIST1H1E were higher in lung tumors than in the paired normal tissues (83.2%, p = 2.12 × 10−7; Figure S9); however, the mRNA levels of HIST1H1E were not associated with lung cancer prognosis (Figure S10).

The variant rs6141383 in BPIFB1 localizes to 20q11.21, a region that has not previously been reported to be associated with lung cancer susceptibility. The substitution p.Val284Met is predicted to be damaging to BPIFB1 (Table S7). BPIFB1 (or LPLUNC1) is a secretory protein that is predominantly present in lung tissues and is present at low levels in other organs.45 BPIFB1 has been implicated in host innate immune defenses against pulmonary infection46 and in the pathogenesis of chronic lung diseases, such as cystic fibrosis and interstitial lung disease.47 Previous studies revealed that BPIFB1, which is downregulated in nasopharyngeal carcinoma (NPC),48 can inhibit inflammation and NPC growth by downregulating the STAT3 pathway.49 BPIFB1 was also significantly downregulated in lung tumors (67.3%, p = 0.0025; Figure S9), which is consistent with the prediction that the lung-cancer-associated risk allele of rs6141383 damages BPIFB1. However, high mRNA levels were associated with poor prognosis in individuals with lung cancer (p = 0.013; Figure S10), possibly suggesting that BPIFB1 has dual roles in the development and progression of lung cancer. Although evidence for a role of BPIFB1 in lung carcinogenesis is limited, these findings indicate that BPIFB1 might constitute part of the protective immune shield against pulmonary inflammation.

In the current study, we identified three low-frequency variants that were associated with lung cancer risk in Chinese populations; together with recent findings based on populations of European ancestry, these results show that low-frequency variants also contribute to lung cancer susceptibility. In particular, our results reveal that lung cancer susceptibility loci across 6p22.2–6p21.31, which have been reported in populations of European ancestry, are also present in East Asian populations and that FKBPL, for which two independent risk-related variants were found, might be a lung-cancer-associated gene at 6p21.33. We also observed a relationship between lung cancer and BPIFB1 at 20q11.21. These genetic associations, together with differences in expression in lung cancer tissues, indicate that genes at 6p21.33 and 20q11.21 might play key roles in lung carcinogenesis.

Acknowledgments

This work was funded by the National Key Basic Research Program (grants 2011CB503805, 2013CB910304, and 2013CB911400), the State Key Program of National Natural Science of China (grant 81230067), the National Distinguished Youth Science Foundation of China (grant 81225020), the National Outstanding Youth Science Foundation of China (grant 81422042), the Science Foundation for Distinguished Young Scholars of Jiangsu (grants BK2012042 and BK20130042), the National Natural Science Foundation of China (grant 81270044), the Jiangsu Specially Appointed Professor Project, the Natural Science Foundation of Jiangsu Province (grant BK20130060), the Key Grant of the Natural Science Foundation of Jiangsu Higher Education Institutions (11KJA330001), the National Program for Support of Top-Notch Young Professionals from the Organization Department of the Communist Party of China Central Committee, the Jiangsu Province Clinical Science and Technology Projects (grant BL2012008), and the Priority Academic Program for the Development of Jiangsu Higher Education Institutions (Public Health and Preventive Medicine). The authors wish to thank all the study participants, research staff, and students who participated in this work.

Published: April 30, 2015

Footnotes

Supplemental Data include ten figures and seven tables and can be found with this article online at http://dx.doi.org/10.1016/j.ajhg.2015.03.009.

Supplemental Data

Document S1. Figures S1–S10 and Tables S1–S5 and S7 (mmc1.pdf, 1.8 MB)
Table S6. 6p21.33, 6p22.2, and 20q11.21 Genetic Variants Genotyped on Exome Chip and Their Associations with Lung Cancer Risk for Those with p < 0.05 (mmc2.xlsx, 52.5 KB)
Document S2. Article plus Supplemental Data (mmc3.pdf, 2.3 MB)

Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics
