Carcinogenesis. 2012 Mar 1;33(5):1059–1064. doi: 10.1093/carcin/bgs116

Lung cancer and DNA repair genes: multilevel association analysis from the International Lung Cancer Consortium

Rémi Kazma 1,2,3, Marie-Claude Babron 2,4, Valérie Gaborieau 5, Emmanuelle Génin 2,4, Paul Brennan 5, Rayjean J Hung 6, John R McLaughlin 6, Hans E Krokan 7, Maiken B Elvestad 7, Frank Skorpen 8, Endre Anderssen 7, Tõnu Vooder 9, Kristjan Välk 10, Andres Metspalu 10,11, John K Field 12, Mark Lathrop 13,14, Alain Sarasin 1,15,16, Simone Benhamou 1,2,4,15,16,*, for the ILCCO consortium
PMCID: PMC3334518  PMID: 22382497

Abstract

Lung cancer (LC) is the leading cause of cancer-related death worldwide and tobacco smoking is the major associated risk factor. DNA repair is an important process that maintains genome integrity, and polymorphisms in DNA repair genes may contribute to susceptibility to LC. To explore the role of DNA repair genes in LC, we conducted a multilevel association study with 1655 single nucleotide polymorphisms (SNPs) in 211 DNA repair genes using 6911 individuals pooled from four genome-wide case–control studies. Single SNP association corroborates previous reports of association with rs3131379, located in the MSH5 gene (P = 3.57 × 10−5), and returns a similar risk estimate. The effect of this SNP is modulated by histological subtype. On the log-additive scale, the odds ratio per allele is 1.04 (0.84–1.30) for adenocarcinomas, 1.52 (1.28–1.80) for squamous cell carcinomas and 1.31 (1.09–1.57) for other histologies (heterogeneity test: P = 9.1 × 10−3). Gene-based association analysis identifies three repair genes associated with LC (P < 0.01): UBE2N, structural maintenance of chromosomes 1L2 (SMC1L2) and POLB. Two additional genes (RAD52 and POLN) are borderline significant. Pathway-based association analysis identifies five repair pathways associated with LC (P < 0.01): chromatin structure, DNA polymerases, homologous recombination, genes involved in human diseases with sensitivity to DNA-damaging agents and Rad6 pathway and ubiquitination. This first international pooled analysis of a large dataset unravels the role of specific DNA repair pathways in LC and highlights the importance of accounting for gene and pathway effects when studying LC.

Introduction

Lung cancer (LC) is the leading cause of cancer death worldwide (1) and tobacco smoking is the major risk factor (2,3). Genome-wide association studies identified several single nucleotide polymorphisms (SNPs) located at 15q25, 5p15 and 6p21 associated with LC (4–6). Recently, a study pooling 21 case–control samples from the International Lung Cancer Consortium (ILCCO) replicated two of these associations (7).

Several biological pathways may contribute to LC susceptibility, including pathways involved in DNA repair. They maintain genome integrity by reducing replication errors, removing DNA damage and minimizing deleterious rearrangements arising via aberrant recombination, and therefore reducing the mutation frequency of cancer-related genes. Genes coding for proteins of the DNA repair pathways are thus good candidates to test for association with LC.

Previous studies provide little insight on the role of DNA repair pathways in LC. Moreover, interactions between genetic variants and tobacco smoking are also important to investigate in LC. Indeed, in the presence of a gene–environment interaction, testing a single SNP may have less power to detect associations than testing an SNP and its interaction with the environment simultaneously (8). Besides, testing the association between LC and sets of SNPs with a biological meaning (e.g. gene or pathway) might also provide additional insight about the genetic architecture of LC (9).

To investigate the role of DNA repair pathways in LC and their interactions with tobacco smoking, we examined the association between LC and 1655 SNPs located in 211 DNA repair genes using a sample of 6911 individuals pooled from four case–control studies participating in the ILCCO (http://ilcco.iarc.fr/). This study reports results of association tests between LC and single SNPs, gene–environment interaction tests involving tobacco smoking, sex, age and histology as well as gene-based and pathway-based association tests.

Materials and methods

Study population

To study DNA repair genes, principal investigators of all case–control genome-wide association studies in the ILCCO were invited in 2008 to share their data and to participate in a combined analysis. Individual epidemiological and genotypic data from six studies were pooled, comprising a total of 3416 LC cases and 4374 controls. The recruitment sites were located in Central Europe, Canada, Norway, Estonia, the United Kingdom and France. Their study designs have already been described extensively in other publications (5,10–16) and are summarized in Supplementary Table I, available at Carcinogenesis Online. The United Kingdom sample included only cases and the France sample included only smokers. Since all analyses required adjusting for study site and smoking status, we excluded these two samples and the final pooling was completed with the four remaining studies, totaling 2683 cases and 4228 controls. Blood samples and clinicopathological information from patients and controls were collected with informed consent and ethical review board approval in each country.

Regarding smoking status, subjects were classified as never-smokers or ever-smokers. Ever-smokers were defined as individuals who smoked daily (for studies in Norway and Estonia) or >100 cigarettes in their lifetime (for studies in Central Europe and Canada). Ever-smokers were further categorized into former smokers (i.e. smokers who had stopped smoking >1 year before inclusion in the four studies) and current smokers. The average number of cigarettes smoked per day and duration of smoking were also collected.

Genotyping, quality control and pathway definition

In all studies, genotyping was performed using the HumanHap300 BeadChips (Illumina, San Diego, CA) on genomic DNA isolated from blood samples. Self-reported European ancestry was previously validated using the program Structure (17,18). The genotypes of the HapMap populations classified as Europeans, Africans and Asians were used as reference panel (5,10–13). We compiled a list of 222 genes coding for DNA repair proteins in a broad sense, i.e. proteins involved in classical DNA repair, in human diseases sensitive to DNA-damaging agents and in cell cycle regulation (19). A total of 1823 SNPs located inside these genes or within the 20 kb flanking regions were selected. Genotype distributions and minor allele frequencies among cases and controls are presented in Supplementary Table II, available at Carcinogenesis Online. Quality control procedures excluded 61 SNPs with no observed genotype in controls, 32 SNPs located on chromosome X, 62 SNPs with a minor allele frequency in controls <0.05 and 13 SNPs with genotype distributions among controls significantly different from Hardy–Weinberg proportions (Dunn-Šidák correction: P < 3 × 10−5) (20,21). The 168 SNPs excluded from the analysis are flagged in the last column of Supplementary Table II, available at Carcinogenesis Online.
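The two frequency-based quality control filters described above can be sketched as follows. This is a minimal illustration, assuming a plain 1-degree-of-freedom chi-square test of Hardy–Weinberg proportions on control genotype counts; the function names and the choice of chi-square test are assumptions, since the text does not state which Hardy–Weinberg test was used.

```python
import math

def maf_and_hwe_p(n_aa, n_ab, n_bb):
    """Return (minor allele frequency, Hardy-Weinberg chi-square P-value).

    n_aa, n_ab, n_bb are genotype counts among controls for one SNP.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no observed genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    maf = min(p, q)
    if maf == 0.0:
        return 0.0, 1.0  # monomorphic SNP: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom
    return maf, math.erfc(math.sqrt(chi2 / 2.0))

def passes_qc(n_aa, n_ab, n_bb, maf_min=0.05, hwe_alpha=3e-5):
    """Keep a SNP only if MAF >= 0.05 and the HWE P-value is >= 3e-5."""
    maf, p_hwe = maf_and_hwe_p(n_aa, n_ab, n_bb)
    return maf >= maf_min and p_hwe >= hwe_alpha
```

The default thresholds mirror the values quoted in the text; the genotype counts passed to these functions would come from the control samples only.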

Therefore, the analysis was completed on a total of 1655 SNPs (relating to 211 autosomal genes), 874 of them being located within the genes and 781 being located in the 20 kb flanking regions. These 211 genes were then classified in 13 pathways (Supplementary Table II is available at Carcinogenesis Online): 20 genes involved in base excision repair (BER), 22 genes involved in cell cycle regulation, 19 genes involved in chromatin structure (CHS), 3 genes involved in direct reversal of damage, 24 genes involved in homologous recombination (HR), 11 genes involved in mismatch repair, 6 genes involved in modulation of nucleotide pools, 23 genes involved in nucleotide excision repair, 10 genes involved in non-homologous end joining, 22 genes involved in DNA polymerases (POL), 7 genes involved in Rad6 pathway and ubiquitination (RPU), 16 genes involved in human diseases with sensitivity to DNA-damaging agents (SDA) and 28 genes coding for telomerases, topoisomerases and replicative accessory proteins (TEL).

Statistical analysis

Association between LC and the 1655 SNPs was individually assessed using multivariate unconditional logistic regression adjusted for smoking status (never/ever-smoker), age, sex and study site. Odds ratios (ORs) and 95% confidence intervals for the heterozygous and homozygous carriers of the minor allele were calculated using the homozygous carrier of the other allele genotype as reference. Assuming a codominant log-additive genetic model, OR per allele, 95% confidence interval and P-value for the 1-degree of freedom Cochran–Armitage trend test were also calculated.
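As an illustration of the 1-degree-of-freedom trend test mentioned above, the sketch below computes the unadjusted Cochran–Armitage statistic from genotype counts. It is only a simplified stand-in: the analysis in this study was adjusted for smoking status, age, sex and study site through logistic regression, which this sketch does not do, and the function name is hypothetical.

```python
import math

def cochran_armitage_trend(cases, controls, scores=(0, 1, 2)):
    """Unadjusted Cochran-Armitage trend test for a 2 x 3 genotype table.

    cases, controls: genotype counts ordered by number of minor alleles
    (0, 1, 2). Returns (chi-square statistic with 1 df, P-value).
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)   # all individuals
    r = sum(cases)    # all cases
    sum_rt = sum(c * t for c, t in zip(cases, scores))
    sum_nt = sum(m * t for m, t in zip(totals, scores))
    sum_nt2 = sum(m * t * t for m, t in zip(totals, scores))
    numerator = n * (n * sum_rt - r * sum_nt) ** 2
    denominator = r * (n - r) * (n * sum_nt2 - sum_nt ** 2)
    chi2 = numerator / denominator
    # Survival function of a chi-square variable with 1 degree of freedom
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```

The scores (0, 1, 2) encode the log-additive (per-allele) model: each additional copy of the minor allele shifts the log-odds by the same amount.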

To account for multiple testing, the effective number of tests was computed using the linkage disequilibrium matrix obtained with Haploview (22) and the method of Li and Ji (23). We determined that setting a threshold for significance at 4.00 × 10−5 ensured proper correction for multiple testing while maintaining overall type 1 error rate at 5%.
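The calculation behind this threshold can be sketched as below, assuming the eigenvalues of the SNP correlation (linkage disequilibrium) matrix have already been obtained. The Li and Ji estimator counts one test for each eigenvalue of at least 1 and adds the fractional part of every eigenvalue; the per-test threshold then follows from the Šidák formula. The function names are hypothetical, and the effective number of tests in this study is not reported in the text.

```python
import math

def li_ji_effective_tests(eigenvalues):
    """Effective number of tests from eigenvalues of the SNP correlation matrix.

    Each eigenvalue contributes 1 if it is >= 1, plus its fractional part.
    """
    m_eff = 0.0
    for lam in eigenvalues:
        x = abs(lam)
        m_eff += (1.0 if x >= 1.0 else 0.0) + (x - math.floor(x))
    return m_eff

def sidak_threshold(m_eff, alpha=0.05):
    """Per-test significance threshold keeping the overall type 1 error at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m_eff)

# Three uncorrelated SNPs count as three tests; three SNPs in perfect
# linkage disequilibrium (eigenvalues 3, 0, 0) count as one.
independent = li_ji_effective_tests([1.0, 1.0, 1.0])
redundant = li_ji_effective_tests([3.0, 0.0, 0.0])
```

As a rough consistency check, a threshold of 4.00 × 10−5 corresponds to about 1280 effective tests under the Šidák formula; this figure is back-calculated here from the threshold and is not taken from the paper.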

Potential heterogeneity was explored for the SNP found significantly associated with LC as a function of the other covariates (sex, age, smoking status and pack-years of smoking). To do so, multivariate unconditional logistic regression models were used adjusting for all other important covariates. Heterogeneity according to histological subtype was assessed through a multinomial logistic regression model using the control group as reference (8). Since only non-small cell lung carcinomas were included in the Estonian sample, the analysis stratifying on histology was conducted on the three other studies pooled.

Interaction with tobacco smoking status was evaluated for the 1655 SNPs through a multivariate logistic regression model that included an SNP × tobacco interaction term.

The 1655 SNPs studied relate to one of the 211 genes organized in 13 pathways. To analyze simultaneously sets of SNPs belonging to the same gene or pathway, we used the SNP-set Kernel Association Test as described in Wu et al. (24). Briefly, this method uses a logistic kernel machine model that takes into account the joint effect of the SNPs in the considered SNP set and can incorporate covariates. Missing genotypes are imputed based on Hardy–Weinberg proportions. To maximize the power of the test to detect complex and/or epistatic effects, we used the Identity-by-State kernel as suggested by the authors, adjusting for smoking status (never/ever-smoker), age, sex and study site.
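The Identity-by-State kernel used as the similarity measure in this test can be sketched as follows. The kernel value for two individuals is the number of alleles they share identical by state, summed over the SNPs of the set and divided by twice the number of SNPs. This sketch covers only the kernel matrix, not the score test itself, assumes genotypes are coded as minor-allele counts with missing values already imputed, and uses a hypothetical function name.

```python
def ibs_kernel(genotypes):
    """Identity-by-State kernel matrix for one SNP set.

    genotypes: list of per-individual lists of minor-allele counts
    (0, 1 or 2), with no missing values. Returns an n x n list of lists.
    """
    n = len(genotypes)
    p = len(genotypes[0])  # number of SNPs in the set
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Alleles shared IBS at each SNP: 2 - |g_i - g_j|
            shared = sum(2 - abs(a - b)
                         for a, b in zip(genotypes[i], genotypes[j]))
            kernel[i][j] = kernel[j][i] = shared / (2.0 * p)
    return kernel

# Hypothetical genotypes: individuals 0 and 1 are identical across the set,
# individual 2 is opposite homozygous at the first and third SNPs.
example = ibs_kernel([[0, 1, 2], [0, 1, 2], [2, 1, 0]])
```

Because the kernel depends on the full multi-SNP genotype rather than on a weighted sum of allele counts, it can capture non-additive and epistatic patterns, which is the motivation given above for choosing it.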

Single SNP and interaction tests were carried out using Stata SE 10 (25). Gene-based and pathway-based association tests were carried out using R (26) with the package SNP-set Kernel Association Test v0.4 (24).

Results

Sample characteristics

Compared with controls, cases were more often males (72 versus 63%) and older (mean age of 61.4 versus 55.7 years). These significant differences were observed in the Central European, Canadian and Estonian studies. As expected, the proportion of current smokers (68.9%) and ever-smokers (90.3%) in cases was higher than in controls and ever-smokers in cases had higher pack-years of smoking than ever-smokers in controls. Histology was available for 2304 cases (85.9% of the total sample) in three studies (Central Europe, Canada and Norway). They comprised 583 adenocarcinomas (25.3%), 911 squamous cell carcinomas (39.6%), 358 small cell carcinomas (15.5%) and 452 mixed cells or other histological types (19.6%) (Table I).

Table I.

Characteristics of LC cases and controls

| Characteristics | Pooled cases | Pooled controls | P | Central Europe cases | Central Europe controls | Canada cases | Canada controls | Norway cases | Norway controls | Estonia cases | Estonia controls |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Total sample | 2683 | 4228 | | 1841 | 2441 | 330 | 500 | 403 | 412 | 109 | 875 |
| Males (%) | 1931 (72) | 2653 (63) | 2.6 × 10−15 | 1435 (78) | 1773 (73) | 158 (48) | 177 (35) | 251 (62) | 230 (56) | 87 (80) | 473 (54) |
| Age (a), mean (SD) | 61.4 (9.6) | 55.7 (14.1) | 1.4 × 10−74 | 60.3 (8.7) | 59.6 (9.7) | 63.8 (11.5) | 52.3 (15.8) | 63.7 (10.6) | 63.2 (11.3) | 64.5 (10.3) | 43.0 (16.1) |
| Smoking status (%) | | | | | | | | | | | |
| Never | 255 (9.7) | 1614 (38.8) | | 137 (7.4) | 867 (35.5) | 77 (28.4) | 199 (43.6) | 27 (6.8) | 120 (31.1) | 14 (12.9) | 428 (49.0) |
| Ever | 2366 (90.3) | 2542 (61.2) | | 1704 (92.6) | 1572 (64.4) | 194 (71.6) | 258 (56.4) | 373 (93.2) | 266 (68.9) | 95 (87.1) | 446 (51.0) |
| Ever: former | 536 (20.5) | 1065 (25.6) | | 362 (19.7) | 628 (25.8) | 94 (34.7) | 139 (30.4) | 79 (19.7) | 149 (38.6) | 1 (0.9) | 149 (17.0) |
| Ever: current | 1806 (68.9) | 1433 (34.5) | | 1338 (72.7) | 937 (38.4) | 91 (33.6) | 86 (18.8) | 283 (70.7) | 113 (29.3) | 94 (86.2) | 297 (34.0) |
| Ever: unknown (b) | 24 (0.9) | 44 (1.1) | | 4 (0.2) | 7 (0.3) | 9 (3.3) | 33 (7.2) | 11 (2.8) | 4 (1.0) | 0 (0.0) | 0 (0.0) |
| Unavailable | 62 | 72 | 1.4 × 10−197 | 0 | 2 | 59 | 43 | 3 | 26 | 0 | 1 |
| Pack-years of smoking (c), mean (SD) | 37.5 (21.2) | 24.0 (19.2) | 8.4 × 10−110 | 38.1 (19.1) | 27.4 (18.2) | 48.9 (37.0) | 25.7 (28.3) | 27.3 (16.0) | 17.4 (14.8) | 39.8 (18.0) | 13.2 (13.1) |
| Histology (%) | | | | | | | | | | | |
| Adenocarcinoma | 583 (25.3) | | | 417 (22.6) | | 90 (55.5) | | 76 (25.2) | | | |
| Squamous cell | 911 (39.6) | | | 781 (42.4) | | 50 (30.9) | | 80 (26.6) | | | |
| Small cell | 358 (15.5) | | | 283 (15.4) | | 22 (13.6) | | 53 (17.6) | | | |
| Mixed/Other | 452 (19.6) | | | 360 (19.6) | | 0 (0.0) | | 92 (30.6) | | | |
| Unavailable | 379 | | | 0 | | 168 | | 102 | | | |

na, not available; SD, standard deviation; P, P-value of test of comparison between cases and controls (t-test or chi-square test).

(a) In years.

(b) The ‘Unknown’ category corresponds to ever-smokers whose former or current smoking status is undetermined.

(c) Among ever-smokers.

Single SNP association

Results for the top 30 SNPs ranked by lowest P-value are reported in Table II. Accounting for multiple testing, only the first SNP, rs3131379 (belonging to the MSH5 gene), is significantly associated with LC with a trend test P-value of 3.57 × 10−5. The OR per allele is 1.30 with a 95% confidence interval (1.15–1.47). Table III summarizes the results for rs3131379 of the overall analysis and when stratifying by sex, age, smoking status, pack-years of smoking and histology. In particular, the OR of association between rs3131379 and LC is modulated by histological subtype. On the log-additive scale, the OR associated with the A allele for this SNP was 1.04 (0.84–1.30) for adenocarcinomas, 1.52 (1.28–1.80) for squamous cell carcinomas and 1.31 (1.09–1.57) for other histological types. The test of heterogeneity between these three ORs had a P-value of 9.1 × 10−3.
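To illustrate what a heterogeneity test between the three histology-specific odds ratios measures, the sketch below applies Cochran's Q to the per-allele estimates quoted above, recovering each standard error from its 95% confidence interval. This is a swapped-in approximation, not the method of this study: the reported test comes from a multinomial logistic regression in which the three case groups share one control group, so the estimates are not independent and this sketch is not expected to reproduce the reported P-value. Function names are hypothetical.

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95% interval

def log_or_and_se(odds_ratio, lower, upper):
    """Recover log(OR) and its standard error from a reported 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * Z_95)
    return math.log(odds_ratio), se

def heterogeneity_q(estimates):
    """Cochran's Q for three (OR, lower, upper) estimates; returns (Q, P)."""
    if len(estimates) != 3:
        raise ValueError("this sketch handles exactly three strata (2 df)")
    pairs = [log_or_and_se(*e) for e in estimates]
    weights = [1.0 / se ** 2 for _, se in pairs]
    pooled = sum(w * b for w, (b, _) in zip(weights, pairs)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, pairs))
    # Survival function of a chi-square variable with 2 degrees of freedom
    return q, math.exp(-q / 2.0)

histology_estimates = [
    (1.04, 0.84, 1.30),  # adenocarcinoma
    (1.52, 1.28, 1.80),  # squamous cell carcinoma
    (1.31, 1.09, 1.57),  # other histological types
]
q_stat, p_het = heterogeneity_q(histology_estimates)
```

Most of the Q statistic comes from the contrast between adenocarcinoma and squamous cell carcinoma, which is the pattern described in the text.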

Table II.

Top 30 SNPs associated with LC among the 1655 SNPs in or near DNA repair genes

| SNP | Gene | Pathway | Chromosome | Minor allele | Other allele | MAF cases | MAF controls | Adjusted OR per allele | 95% CI | P |
|---|---|---|---|---|---|---|---|---|---|---|
| rs3131379 | MSH5 | MMR | 6 | A | G | 0.114 | 0.091 | 1.30 | 1.15–1.47 | 3.57 × 10−5 |
| rs249633 | MSH3 | MMR | 5 | G | A | 0.258 | 0.241 | 1.18 | 1.08–1.29 | 2.18 × 10−4 |
| rs1745335 | POLN | POL | 4 | A | G | 0.125 | 0.106 | 1.25 | 1.11–1.40 | 2.70 × 10−4 |
| rs529966 | POLN | POL | 4 | G | A | 0.125 | 0.106 | 1.24 | 1.10–1.40 | 3.04 × 10−4 |
| rs2736100 | TERT | POL | 5 | C | A | 0.515 | 0.476 | 1.15 | 1.07–1.24 | 3.07 × 10−4 |
| rs9328764 | POLN | POL | 4 | A | G | 0.126 | 0.108 | 1.24 | 1.10–1.40 | 3.13 × 10−4 |
| rs7659386 | POLN | POL | 4 | C | T | 0.126 | 0.108 | 1.24 | 1.10–1.40 | 3.31 × 10−4 |
| rs4678 | GTF2H4 | NER | 6 | A | G | 0.231 | 0.200 | 1.18 | 1.08–1.30 | 3.51 × 10−4 |
| rs10011549 | POLN | POL | 4 | T | C | 0.126 | 0.108 | 1.24 | 1.10–1.39 | 3.75 × 10−4 |
| rs10018786 | POLN | POL | 4 | G | T | 0.126 | 0.108 | 1.24 | 1.10–1.39 | 4.09 × 10−4 |
| rs6830513 | POLN | POL | 4 | C | T | 0.126 | 0.108 | 1.24 | 1.10–1.39 | 4.29 × 10−4 |
| rs9534262 | BRCA2 | HR | 13 | T | C | 0.486 | 0.450 | 1.15 | 1.06–1.24 | 4.89 × 10−4 |
| rs4942486 | BRCA2 | HR | 13 | T | C | 0.487 | 0.450 | 1.14 | 1.06–1.24 | 5.75 × 10−4 |
| rs7893335 | MMS19L | NER | 10 | G | T | 0.148 | 0.166 | 0.85 | 0.76–0.94 | 1.47 × 10−3 |
| rs206079 | BRCA2 | HR | 13 | A | G | 0.446 | 0.407 | 1.12 | 1.04–1.21 | 2.78 × 10−3 |
| rs2236184 | RAD51L1 | HR | 14 | T | C | 0.307 | 0.291 | 1.14 | 1.04–1.23 | 2.79 × 10−3 |
| rs6599418 | POLN | POL | 4 | A | G | 0.383 | 0.396 | 0.89 | 0.82–0.96 | 3.64 × 10−3 |
| rs1012130 | BRCA2 | HR | 13 | C | T | 0.247 | 0.282 | 0.88 | 0.81–0.96 | 3.98 × 10−3 |
| rs4899246 | RAD51L1 | HR | 14 | G | A | 0.219 | 0.204 | 1.15 | 1.04–1.26 | 3.99 × 10−3 |
| rs7003908 | PRKDC | NHEJ | 8 | C | A | 0.370 | 0.353 | 1.12 | 1.04–1.22 | 4.04 × 10−3 |
| rs10055011 | POLK | POL | 5 | A | G | 0.104 | 0.122 | 0.84 | 0.74–0.95 | 4.70 × 10−3 |
| rs916962 | RAD51L1 | HR | 14 | A | G | 0.249 | 0.273 | 0.88 | 0.81–0.97 | 5.75 × 10−3 |
| rs206319 | BRCA2 | HR | 13 | G | A | 0.344 | 0.317 | 1.12 | 1.03–1.21 | 5.95 × 10−3 |
| rs2213178 | PRKDC | NHEJ | 8 | A | G | 0.312 | 0.301 | 1.12 | 1.03–1.22 | 5.97 × 10−3 |
| rs618262 | POLN | POL | 4 | C | A | 0.061 | 0.051 | 1.26 | 1.07–1.49 | 6.79 × 10−3 |
| rs3737559 | BRCA1 | HR | 17 | T | C | 0.090 | 0.106 | 0.84 | 0.74–0.95 | 7.19 × 10−3 |
| rs13411119 | FANCL | SDA | 2 | A | G | 0.299 | 0.281 | 1.12 | 1.03–1.22 | 7.25 × 10−3 |
| rs524051 | POLN | POL | 4 | A | G | 0.109 | 0.098 | 1.19 | 1.05–1.34 | 7.76 × 10−3 |
| rs206120 | BRCA2 | HR | 13 | G | A | 0.186 | 0.166 | 1.14 | 1.03–1.26 | 8.62 × 10−3 |
| rs11937432 | POLN | POL | 4 | G | A | 0.063 | 0.053 | 1.24 | 1.05–1.45 | 9.61 × 10−3 |

CI, confidence interval; MAF, minor allele frequencies; MMR: mismatch repair; NER, nucleotide excision repair; NHEJ, non-homologous end joining; P, P-value of the trend test of association adjusting for study, age, sex and tobacco smoking; POL, DNA polymerases.

Table III.

Overall and stratified results for SNP rs3131379 located in the MSH5 gene using unconditional logistic regression

| Stratum | Cases | Controls | A/G versus G/G: OR | 95% CI | A/A versus G/G: OR | 95% CI | Log-additive model: OR | 95% CI | P |
|---|---|---|---|---|---|---|---|---|---|
| Overall (a) | 2607 | 4132 | 1.25 | 1.08–1.44 | 2.20 | 1.35–3.60 | 1.30 | 1.15–1.47 | 3.57 × 10−5 |
| Sex (b) | | | | | | | | | 0.27 |
| Male | 1887 | 2608 | 1.30 | 1.10–1.54 | 2.51 | 1.37–4.60 | 1.36 | 1.17–1.58 | |
| Female | 720 | 1524 | 1.12 | 0.87–1.46 | 1.69 | 0.72–3.96 | 1.17 | 0.93–1.47 | |
| Age (c) (years) | | | | | | | | | 0.98 |
| ≤50 | 379 | 1323 | 1.25 | 0.90–1.72 | 2.37 | 0.66–8.56 | 1.30 | 0.97–1.74 | |
| >50 | 2228 | 2806 | 1.24 | 1.06–1.45 | 2.16 | 1.27–3.65 | 1.29 | 1.13–1.48 | |
| Smoking status (d) | | | | | | | | | 0.99 |
| Never-smokers | 254 | 1608 | 1.24 | 0.87–1.76 | 2.47 | 0.75–8.14 | 1.31 | 0.95–1.79 | |
| Former smokers | 513 | 984 | 1.16 | 0.85–1.57 | 2.88 | 1.14–7.23 | 1.30 | 1.00–1.68 | |
| Current smokers | 1760 | 1389 | 1.28 | 1.04–1.57 | 2.11 | 1.01–4.41 | 1.32 | 1.10–1.58 | |
| Smoking quantity (d) | | | | | | | | | 0.22 |
| Non-smokers | 254 | 1608 | 1.22 | 0.86–1.73 | 2.45 | 0.74–8.04 | 1.29 | 0.95–1.77 | |
| ≤20 | 425 | 1180 | 1.02 | 0.75–1.38 | 1.39 | 0.51–3.85 | 1.06 | 0.81–1.38 | |
| >20 | 1863 | 1208 | 1.36 | 1.11–1.66 | 2.44 | 1.20–4.98 | 1.41 | 1.18–1.68 | |
| Histology (e) | | | | | | | | | 9.10 × 10−3 |
| Adenocarcinoma | 580 | 3260 | 1.03 | 0.81–1.32 | 1.20 | 0.49–2.96 | 1.04 | 0.84–1.30 | |
| Squamous cell | 907 | 3260 | 1.35 | 1.10–1.66 | 3.89 | 2.13–7.09 | 1.52 | 1.28–1.80 | |
| Others | 807 | 3260 | 1.29 | 1.04–1.58 | 2.00 | 0.99–4.05 | 1.31 | 1.09–1.57 | |

CI, confidence interval; P, P-value of trend test for the overall analysis and P-value for heterogeneity for the stratified analyses.

(a) Adjusting for study, age, sex and smoking status (never/ever-smokers).

(b) Adjusting for study, age and smoking status (never/ever-smokers).

(c) Adjusting for study, sex and smoking status (never/ever-smokers).

(d) Adjusting for study, age, sex and pack-years (coded quantitatively).

(e) Adjusting for study, age, sex and smoking status (never/ever-smokers).

Of the top 30 SNPs (Table II), 12 were located within or close to the POLN gene on chromosome 4. The ORs are all similar with an effect per allele of 1.24 (1.10–1.39), except for rs6599418 whose OR per allele is 0.89 (0.82–0.96). These SNPs are in strong linkage disequilibrium (D′ > 0.95) except rs524051 which has a D′ between 0.68 and 0.74 with the other SNPs (Supplementary Figure is available at Carcinogenesis Online).

SNP × tobacco interaction

The interaction analysis (Table IV) yielded potential differences in LC risk association with rs2930961 (inside the RAD54B gene) between never- and ever-smokers but the P-value was not significant (P = 2.31 × 10−4) for an interaction coefficient estimate of 1.52 (1.21–1.90). Compared with the T/T genotype, the ORs were 0.79 (0.59–1.05) for the T/C genotype and 0.43 (0.25–0.73) for the C/C genotype among never-smokers. These two ORs were respectively 1.10 (0.98–1.25) and 1.13 (0.94–1.36), among ever-smokers. Note that the association test of rs2930961 alone was not significant (P = 0.06). No interaction with tobacco smoking was significant with any other SNP.

Table IV.

Results for SNP rs2930961 on RAD54B gene using unconditional logistic regression

| Stratum | Cases | Controls | C/T versus T/T: OR | 95% CI | C/C versus T/T: OR | 95% CI | Log-additive model: OR | 95% CI | P |
|---|---|---|---|---|---|---|---|---|---|
| Smoking status | | | | | | | | | |
| Never-smokers | 255 | 1613 | 0.79 | 0.59–1.05 | 0.43 | 0.25–0.73 | 0.71 | 0.71–0.87 | |
| Ever-smokers | 2366 | 2535 | 1.11 | 0.98–1.25 | 1.13 | 0.94–1.36 | 1.08 | 0.99–1.17 | |
| Interaction | | | | | | | 1.52 | 1.21–1.90 | 2.31 × 10−4 |

CI, confidence interval; P, P-value for rs2930961 × tobacco interaction adjusting on study, age, sex and pack-years (coded quantitatively).
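The link between the stratum-specific estimates and the interaction coefficient in Table IV can be checked with a short calculation: under a multiplicative interaction model, the interaction odds ratio equals the per-allele OR among ever-smokers divided by the per-allele OR among never-smokers. The sketch below also derives an approximate Wald P-value from the reported confidence interval; because the published figures are rounded, that P-value is only approximate. Function names are hypothetical.

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95% interval

def interaction_or(or_exposed, or_unexposed):
    """Multiplicative interaction: ratio of the two stratum-specific ORs."""
    return or_exposed / or_unexposed

def wald_p_from_ci(odds_ratio, lower, upper):
    """Approximate two-sided Wald P-value from an OR and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * Z_95)
    z_stat = math.log(odds_ratio) / se
    return math.erfc(abs(z_stat) / math.sqrt(2.0))

# Log-additive ORs from Table IV: ever-smokers 1.08, never-smokers 0.71
ratio = interaction_or(1.08, 0.71)
# Interaction estimate and CI from Table IV: 1.52 (1.21-1.90)
approx_p = wald_p_from_ci(1.52, 1.21, 1.90)
```

The ratio of the two rounded log-additive ORs agrees with the tabulated interaction estimate to two decimals.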

Gene-based and pathway-based association

Among the 211 genes analyzed, 3 genes were associated with LC with a P-value <0.01 (after Dunn-Šidák correction (20,21)) using the SNP-set Kernel Association Test. These genes are UBE2N (P = 2 × 10−8), structural maintenance of chromosomes (SMC)1L2 (P = 8 × 10−7) and POLB (P = 10−6). The Dunn-Šidák-corrected P-values of two other genes, RAD52 and POLN, are between 0.01 and 0.05 (Table V).

Table V.

Top 15 genes associated with LC among the 211 DNA repair genes

| Gene | Pathway | Chromosome | Number of SNPs | P | Pcorrected |
|---|---|---|---|---|---|
| UBE2N | RPU | 12 | 7 | 3.25 × 10−8 | 6.86 × 10−6 |
| SMC1L2 | CHS | 22 | 11 | 1.70 × 10−6 | 3.59 × 10−4 |
| POLB | POL | 8 | 3 | 3.51 × 10−6 | 7.40 × 10−4 |
| RAD52 | HR | 12 | 9 | 6.14 × 10−5 | 1.29 × 10−2 |
| POLN | POL | 4 | 20 | 3.09 × 10−4 | 6.31 × 10−2 |
| BRCA2 | HR | 13 | 21 | 5.50 × 10−4 | 0.11 |
| MAD2L1 | CHS | 4 | 5 | 9.69 × 10−4 | 0.18 |
| FANCA | SDA | 16 | 9 | 2.72 × 10−3 | 0.44 |
| PRKDC | NHEJ | 8 | 5 | 2.75 × 10−3 | 0.44 |
| CHEK2 | CCR | 22 | 6 | 4.03 × 10−3 | 0.57 |
| TOPBP1 | TEL | 3 | 12 | 4.29 × 10−3 | 0.60 |
| DCLRE1B | NHEJ | 1 | 9 | 6.21 × 10−3 | 0.73 |
| RAD54L | HR | 1 | 4 | 7.02 × 10−3 | 0.77 |
| MSH4 | MMR | 1 | 11 | 7.10 × 10−3 | 0.78 |
| FANCJ | SDA | 17 | 11 | 8.10 × 10−3 | 0.82 |

CCR, cell cycle regulation; MMR, mismatch repair; NHEJ, non-homologous end joining; P, P-value of the gene-based association test adjusting for study, age, sex and tobacco smoking; Pcorrected, Dunn-Šidák-corrected P-value (211 tests); POL, DNA polymerases; TEL, telomerases, topoisomerases and accessory proteins.
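The Dunn-Šidák correction behind the Pcorrected column can be written in a few lines: for a raw P-value p and m tests, the corrected value is 1 − (1 − p)^m. The sketch below uses `log1p` and `expm1` so that very small P-values are not lost to rounding; the function name is hypothetical.

```python
import math

def sidak_corrected(p, n_tests):
    """Dunn-Sidak corrected P-value: 1 - (1 - p) ** n_tests."""
    return -math.expm1(n_tests * math.log1p(-p))

# Raw gene-based P-values from Table V, corrected for 211 tests
ube2n = sidak_corrected(3.25e-8, 211)
rad52 = sidak_corrected(6.14e-5, 211)
```

Applied to the raw values for UBE2N and RAD52, this gives the corrected values listed in Table V to three significant figures.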

Among the 13 pathways previously described, 5 were associated with LC with a Dunn-Šidák-corrected P-value <0.01 (Table VI). Ordered by increasing P-value, these five pathways are CHS, DNA polymerases (POL), HR, genes involved in human diseases with sensitivity to DNA-damaging agents (SDA) and RPU.

Table VI.

Pathway-based association test results

| Acronym | Pathway | Number of genes | Number of SNPs | P | Pcorrected |
|---|---|---|---|---|---|
| CHS | Chromatin structure | 19 | 160 | 8.6 × 10−6 | 1.12 × 10−4 |
| POL | DNA polymerases | 22 | 165 | 2.1 × 10−5 | 2.73 × 10−4 |
| HR | Homologous recombination | 24 | 221 | 2.7 × 10−5 | 3.51 × 10−4 |
| SDA | Genes defective in diseases associated with sensitivity to DNA-damaging agents | 16 | 129 | 3.7 × 10−4 | 4.80 × 10−3 |
| RPU | Rad6 pathway and ubiquitination | 7 | 41 | 3.8 × 10−4 | 4.93 × 10−3 |
| NHEJ | Non-homologous end joining | 10 | 82 | 1.2 × 10−3 | 1.55 × 10−2 |
| TEL | Telomerases, topoisomerases and replicative accessory proteins | 28 | 213 | 3.5 × 10−3 | 4.46 × 10−2 |
| MMR | Mismatch repair | 11 | 88 | 6.3 × 10−3 | 0.08 |
| NER | Nucleotide excision repair | 23 | 161 | 1.3 × 10−2 | 0.16 |
| CCR | Cell cycle regulation | 22 | 162 | 3.2 × 10−2 | 0.34 |
| BER | Base excision repair | 20 | 137 | 0.12 | 0.81 |
| DRD | Direct reversal of damage | 3 | 56 | 0.16 | 0.90 |
| MNP | Modulation of nucleotide pool | 6 | 40 | 0.20 | 0.95 |
| Total | | 211 | 1655 | | |

P: P-value of the pathway-based association test adjusting for study, age, sex and smoking status; Pcorrected, Dunn-Šidák-corrected P-value (13 tests).

Discussion

Genome-wide association studies that focus on single marker analysis have been successful in identifying loci associated with LC (4–7). However, the origin of such a complex disease is more likely due to several genes involved in pertinent biological pathways than to single polymorphisms. Several innovative statistical methods for analyzing sets of SNPs have recently been developed (24). We report the first international pooled analysis with a large dataset that investigates the association of LC with DNA repair genes on a multilevel scale: single SNP, gene–environment interaction, gene-based and pathway-based tests.

The analysis of 1655 SNPs belonging to 211 DNA repair genes confirms a previously reported association with an SNP located in the intronic region of the MSH5 gene on chromosome 6 with a similar risk estimate (27). This gene encodes a member of the MutS family of proteins that are involved in DNA mismatch repair and in mitotic or meiotic recombination processes. Moreover, the interaction with the squamous cell histological subtype, which is mainly attributable to smoking, corroborates the hypothesis of an effect of this variant linked to a failure to protect from tobacco carcinogens.

The rare allele (C) of SNP rs2930961, located in the intronic region of RAD54B, is associated with an increased risk of LC among ever-smokers and a decreased risk of LC among never-smokers. RAD54B, a DNA-dependent adenosine triphosphatase, is also involved in mitotic and meiotic homologous repair. This pathway is considered an error-free mode of removing DNA lesions that stall replication forks. Homozygous mutations of this gene are often found in various cancers (28,29).

The gene-based association reveals three genes (UBE2N, SMC1L2 and POLB) that are clearly associated with LC and two borderline significant genes (RAD52 and POLN).

The UBE2N gene (UBC13 homolog) encodes a member of the E2 ubiquitin-conjugating enzyme necessary for protein ubiquitination targeting abnormal or short-lived proteins for degradation but also modifying proteins through the Lys-63 residue in the non-proteolytic regulation of cellular signaling. This sequential ubiquitination process is composed of various ubiquitin cascades required for DNA repair and checkpoint signaling in response to DNA damage (30). For example, UBE2N is implicated in the Rad6/Rad18-dependent post-replication repair and translesion synthesis following UV and ionizing radiations. This pathway can be an error-prone and mutagenic way to cope with unrepairable DNA adducts. Cells deficient for this protein are compromised for double-strand repair by HR (31) and UBE2N is necessary for H2A-histone ubiquitination following DNA damage and for 53BP1 and BRCA1 foci formation to lesions (32). This protein is involved in numerous pathways linking DNA lesions and cell signaling, including a regulation of cell localization and activity of the tumor suppressor p53 protein (33).

The SMC1L2 (SMC1B) gene encodes a subunit of the multifunctional protein SMC1, which is a member of the SMC family. SMC1 is part of a hetero-tetrameric complex called cohesin. The cohesin complex encircles and holds together chromatids during the mitotic phase until their separation in anaphase. The proximity of sister chromatids facilitates the repair of chromosome lesions by HR. Cohesin also appears to be part of cell cycle checkpoint regulation following DNA damage induced by genotoxic agents (34,35). Based on animal studies, SMC1B is required for sister chromatid cohesion during mitosis and meiosis and for DNA recombination (31).

The POLB gene encodes the polymerase β protein (Polβ). Although we have classified it in the polymerase group, Polβ is the main DNA polymerase used for the short-gap filling step during BER, particularly during repair of an 8-oxo-G residue (36). Polβ knockout mice are not viable and cells with a low level of Polβ have reduced BER capacity, accelerated DNA damage and increased mutational response to carcinogens (37). Cigarette smoking gives rise to multiple types of damage, including oxidized bases such as 8-oxo-G, which are repaired by BER. Consequently, the association of LC with POLB may reflect a higher tendency to accumulate unrepaired DNA lesions and therefore a higher frequency of cancer. Several mutations in POLB have been associated with human cancers (38) and a given POLB haplotype has been associated with bladder cancer, another smoking-related cancer (39).

RAD52 and POLN lie just at the significance level of the gene-based association tests (Table V). The RAD52 gene encodes a major protein for DNA double-strand break repair by HR. The RAD52 protein binds to single-stranded DNA ends and mediates the DNA–DNA interaction, in collaboration with the RAD51 recombination protein, for the annealing of complementary strands of DNA bound by replication protein A (40). Interestingly, RAD52 and OGG1 have been shown to cooperate in repairing oxidative DNA damage that is removed by BER. RAD52 cellular depletion allows accumulation of oxidized bases (41). The Polβ polymerase is the enzyme that acts subsequently to the removal of 8-oxo-G by OGG1 during the BER cascade.

POLN encodes a newly discovered translesion DNA polymerase (Polν) able to replicate large bulky adducts such as, in an error-prone manner, a (+)-trans-anti-benzo(a)pyrene diolepoxide-deoxyadenosine adduct. This type of lesion is typically produced by cigarette smoking (42). This polymerase is also involved in repair of crosslinks by HR in association with the Fanconi anemia pathway (43).

The pathway-based association discloses five pathways of DNA repair that are clearly associated with LC: CHS, DNA polymerases (POL), HR, genes involved in human diseases with sensitivity to DNA-damaging agents (SDA) and RPU. Interestingly, the five genes significantly associated with LC in the gene-based tests belong to four of these pathways (CHS, POL, HR and RPU). Although the SDA pathway is significantly associated with LC, none of the genes in that pathway are associated with LC when tested individually. This suggests a different genetic model for the effect of this pathway on LC, probably involving many genes with small effects.

Genes of the CHS category are necessary either to signal the presence of DNA lesions and to start the DNA damage response or to allow DNA repair at the chromatid level, probably by HR. Modifications of the structure of these proteins may have important effects on the way cells are able to cope with carcinogens. The response to DNA damage involves signaling from chromatin proteins to recombination pathways via ubiquitination of numerous proteins. Repair of, or translesion synthesis across, oxidized bases or bulky adducts produced through cigarette smoking could be modified by genetic variation in these genes.

Many types of DNA damage induce lesions of the two DNA strands, such as interstrand crosslinks or double-strand breaks. These lesions are removed by recombination. Because the HR pathway is believed to be essentially error-free, its efficient activity is important as a cancer-reducing pathway.

In the SDA category, we included all genes known to be associated with human diseases with increased cellular sensitivity to a given genotoxic agent but for which the role of the gene product is not exactly known. However, most of these gene products are associated with DNA repair by HR, by translesion synthesis, or by starting DNA damage signaling. Among this list of 16 genes, 9 belong to the Fanconi anemia pathway known to form a large protein complex involved in the repair of bulky adducts and crosslinks probably by HR.

The large group of POL comprises not only replicative polymerases but also all translesion polymerases able to replicate through DNA adducts or during the replicative step of DNA repair. Because most of the gene products of this latter category are error-prone enzymes, their activity could easily be linked to increased mutation rate and cancer.

Finally, the RPU pathway comprises seven genes involved in post-replication repair. But ubiquitination is involved in such a large number of enzymatic reactions that identifying the major pathway involved in the context of this study is difficult. However, the Rad6 pathway is able to activate and regulate the Fanconi anemia pathway in which the FANCD2 protein is mono-ubiquitinated (44). Since the Fanconi anemia pathway also modulates translesion synthesis, the Rad6 and the Fanconi anemia pathways are intertwined to maintain chromosomal stability and to suppress the development of cancer.

In this study, we took advantage of the large-scale pooled dataset from the ILCCO to systematically investigate the association between hundreds of SNPs in DNA repair genes and LC risk using a multilevel approach. This approach explores virtually all known human DNA repair genes. This study is thus far the most comprehensive large-scale analysis focusing on the association between DNA repair genes and LC. Collectively, our findings emphasize the importance of accounting for gene and pathway effects in LC studies in addition to standard single SNP association tests. Furthermore, the pathways evidenced here are probably not specific to LC and may provide insight into the genetic basis of other smoking-related cancers.

Supplementary material

Supplementary Tables I and II and Figure 1 can be found at http://carcin.oxfordjournals.org/.

Funding

The study was financially supported by grants from the Programme Hospitalier de Recherche Clinique and from the Fondation de France. R.K. was supported by grants from the President of University Paris-Sud, from the Fondation pour la Recherche Médicale and from the National Institutes of Health (R25CA112355). The Estonian study was supported by a Targeted Financing grant (SF0180142s08) from the Estonian Ministry of Education and Research; an Estonian Science Foundation grant (6465); the EU FP7-REGPOT-2009-1 grant #245536 OPENGENE; and the European Union through the European Regional Development Fund, in the frame of the Centre of Excellence in Genomics.

Acknowledgments

Conflict of Interest Statement: None declared.

Glossary

Abbreviations

BER: base excision repair
CHS: chromatin structure
HR: homologous recombination
ILCCO: International Lung Cancer Consortium
LC: lung cancer
OR: odds ratio
RPU: Rad6 pathway and ubiquitination
SDA: sensitivity to DNA-damaging agents
SMC: structural maintenance of chromosomes
SNP: single nucleotide polymorphism

References

1. Ferlay J, et al. GLOBOCAN 2008. Cancer Incidence and Mortality Worldwide: IARC CancerBase No. 10 [Internet]. Lyon: International Agency for Research on Cancer; 2008. http://globocan.iarc.fr (December 2011, date last accessed).
2. Doll R, et al. The mortality of doctors in relation to their smoking habits: a preliminary report. Br. Med. J. 1954;1:1451–1455. doi: 10.1136/bmj.1.4877.1451.
3. Shopland DR, et al. Smoking-attributable cancer mortality in 1991: is lung cancer now the leading cause of death among smokers in the United States? J. Natl Cancer Inst. 1991;83:1142–1148. doi: 10.1093/jnci/83.16.1142.
4. Amos CI, et al. Genome-wide association scan of tag SNPs identifies a susceptibility locus for lung cancer at 15q25.1. Nat. Genet. 2008;40:616–622. doi: 10.1038/ng.109.
5. Hung RJ, et al. A susceptibility locus for lung cancer maps to nicotinic acetylcholine receptor subunit genes on 15q25. Nature. 2008;452:633–637. doi: 10.1038/nature06885.
6. Thorgeirsson TE, et al. A variant associated with nicotine dependence, lung cancer and peripheral arterial disease. Nature. 2008;452:638–642. doi: 10.1038/nature06846.
7. Truong T, et al. Replication of lung cancer susceptibility loci at chromosomes 15q25, 5p15, and 6p21: a pooled analysis from the International Lung Cancer Consortium. J. Natl Cancer Inst. 2010;102:959–971. doi: 10.1093/jnci/djq178.
8. Kazma R, et al. Genetic association and gene-environment interaction: a new method for overcoming the lack of exposure information in controls. Am. J. Epidemiol. 2011;173:225–235. doi: 10.1093/aje/kwq352.
9. Wang K, et al. Pathway-based approaches for analysis of genomewide association studies. Am. J. Hum. Genet. 2007;81:1278–1283. doi: 10.1086/522374.
10. Brenner DR, et al. Lung cancer risk in never-smokers: a population-based case-control study of epidemiologic risk factors. BMC Cancer. 2010;10:285. doi: 10.1186/1471-2407-10-285.
11. Holmen J, et al. The Nord-Trøndelag Health study 1995–7 (HUNT 2): objectives, contents, methods and participation. Norsk Epidemiologi. 2003;13:19–32.
12. Nelis M, et al. Genetic structure of Europeans: a view from the North-East. PLoS One. 2009;4:e5472. doi: 10.1371/journal.pone.0005472.
13. Scelo G, et al. Occupational exposure to vinyl chloride, acrylonitrile and styrene and lung cancer risk (Europe). Cancer Causes Control. 2004;15:445–452. doi: 10.1023/B:CACO.0000036444.11655.be.
14. Field JK, et al. The Liverpool Lung Project research protocol. Int. J. Oncol. 2005;27:1633–1645.
15. Välk K, et al. Gene expression profiles of non-small cell lung cancer: survival prediction and new biomarkers. Oncology. 2010;79:283–292. doi: 10.1159/000322116.
16. Feyler A, et al. Myeloperoxidase -463G→A polymorphism and lung cancer risk. Cancer Epidemiol. Biomarkers Prev. 2002;11:1550–1554.
17. Falush D, et al. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics. 2003;164:1567–1587. doi: 10.1093/genetics/164.4.1567.
18. Pritchard JK, et al. Inference of population structure using multilocus genotype data. Genetics. 2000;155:945–959. doi: 10.1093/genetics/155.2.945.
19. Kauffmann A, et al. High expression of DNA repair pathways is associated with metastasis in melanoma patients. Oncogene. 2008;27:565–573. doi: 10.1038/sj.onc.1210700.
20. Dunn OJ. Multiple comparisons among means. J. Am. Stat. Assoc. 1961;56:52–64.
21. Šidák Z. Rectangular confidence region for the means of multivariate normal distributions. J. Am. Stat. Assoc. 1967;62:626–633.
22. Barrett JC, et al. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21:263–265. doi: 10.1093/bioinformatics/bth457.
23. Li J, et al. Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity. 2005;95:221–227. doi: 10.1038/sj.hdy.6800717.
24. Wu MC, et al. Powerful SNP-set analysis for case-control genome-wide association studies. Am. J. Hum. Genet. 2010;86:929–942. doi: 10.1016/j.ajhg.2010.05.002.
25. StataCorp. Statistical Software: Release 10.0. College Station, TX: Stata Corporation; 2001.
26. R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2011.
27. Wang Y, et al. Common 5p15.33 and 6p21.33 variants influence lung cancer risk. Nat. Genet. 2008;40:1407–1409. doi: 10.1038/ng.273.
28. Hiramoto T, et al. Mutations of a novel human RAD54 homologue, RAD54B, in primary cancer. Oncogene. 1999;18:3422–3426. doi: 10.1038/sj.onc.1202691.
29. McManus KJ, et al. Specific synthetic lethal killing of RAD54B-deficient human colorectal cancer cells by FEN1 silencing. Proc. Natl Acad. Sci. USA. 2009;106:3276–3281. doi: 10.1073/pnas.0813414106.
30. Wang B, et al. Ubc13/Rnf8 ubiquitin ligases control foci formation of the Rap80/Abraxas/Brca1/Brcc36 complex in response to DNA damage. Proc. Natl Acad. Sci. USA. 2007;104:20759–20763. doi: 10.1073/pnas.0710061104.
31. Zhao GY, et al. A critical role for the ubiquitin-conjugating enzyme Ubc13 in initiating homologous recombination. Mol. Cell. 2007;25:663–675. doi: 10.1016/j.molcel.2007.01.029.
32. Stewart GS, et al. The RIDDLE syndrome protein mediates a ubiquitin-dependent signaling cascade at sites of DNA damage. Cell. 2009;136:420–434. doi: 10.1016/j.cell.2008.12.042.
33. Topisirovic I, et al. Control of p53 multimerization by Ubc13 is JNK-regulated. Proc. Natl Acad. Sci. USA. 2009;106:12676–12681. doi: 10.1073/pnas.0900596106.
34. Bauerschmidt C, et al. Cohesin phosphorylation and mobility of SMC1 at ionizing radiation-induced DNA double-strand breaks in human cells. Exp. Cell Res. 2011;317:330–337. doi: 10.1016/j.yexcr.2010.10.021.
35. Kitagawa R, et al. Phosphorylation of SMC1 is a critical downstream event in the ATM-NBS1-BRCA1 pathway. Genes Dev. 2004;18:1423–1438. doi: 10.1101/gad.1200304.
36. Sobol RW, et al. Requirement of mammalian DNA polymerase-beta in base-excision repair. Nature. 1996;379:183–186. doi: 10.1038/379183a0.
37. Cabelof DC, et al. Base excision repair deficiency caused by polymerase beta haploinsufficiency: accelerated DNA damage and increased mutational response to carcinogens. Cancer Res. 2003;63:5799–5807.
38. Bhattacharyya N, et al. A variant of DNA polymerase beta acts as a dominant negative mutant. Proc. Natl Acad. Sci. USA. 1997;94:10324–10329. doi: 10.1073/pnas.94.19.10324.
39. Michiels S, et al. Genetic polymorphisms in 85 DNA repair genes and bladder cancer risk. Carcinogenesis. 2009;30:763–768. doi: 10.1093/carcin/bgp046.
40. Grimme JM, et al. Human Rad52 binds and wraps single-stranded DNA and mediates annealing via two hRad52-ssDNA complexes. Nucleic Acids Res. 2010;38:2917–2930. doi: 10.1093/nar/gkp1249.
41. de Souza-Pinto NC, et al. The recombination protein RAD52 cooperates with the excision repair protein OGG1 for the repair of oxidative lesions in mammalian cells. Mol. Cell Biol. 2009;29:4441–4454. doi: 10.1128/MCB.00265-09.
42. Yamanaka K, et al. Novel enzymatic function of DNA polymerase nu in translesion DNA synthesis past major groove DNA-peptide and DNA-DNA cross-links. Chem. Res. Toxicol. 2010;23:689–695. doi: 10.1021/tx900449u.
43. Moldovan GL, et al. DNA polymerase POLN participates in cross-link repair and homologous recombination. Mol. Cell Biol. 2010;30:1088–1096. doi: 10.1128/MCB.01124-09.
44. Park HK, et al. Convergence of Rad6/Rad18 and Fanconi anemia tumor suppressor pathways upon DNA damage. PLoS One. 2010;5:e13313. doi: 10.1371/journal.pone.0013313.


Articles from Carcinogenesis are provided here courtesy of Oxford University Press
