Author manuscript; available in PMC: 2013 Sep 27.
Published in final edited form as: Twin Res Hum Genet. 2012 Jul 13;15(5):615–623. doi: 10.1017/thg.2012.38

Genome-wide Association Study for Ovarian Cancer Susceptibility using Pooled DNA

Yi Lu 1, Xiaoqing Chen 1, Jonathan Beesley 1, Sharon E Johnatty 1, Anna deFazio 2,3; AOCS Study group, Sandrina Lambrechts 4, Diether Lambrechts 5,6, Evelyn Despierre 4, Ignace Vergotes 4, Jenny Chang-Claude 7, Rebecca Hein 7, Stefan Nickels 7, Shan Wang-Gohrke 8, Thilo Dörk 9, Matthias Dürst 10, Natalia Antonenkova 11, Natalia Bogdanova 11,12, Marc T Goodman 13, Galina Lurie 13, Lynne R Wilkens 13, Michael E Carney 14, Ralf Butzow 15, Heli Nevanlinna 15, Tuomas Heikkinen 15, Arto Leminen 15, Lambertus A Kiemeney 16,17,18, Leon FAG Massuger 19, Anne M van Altena 19, Katja K Aben 17,18, Susanne Krüger Kjaer 20, Estrid Høgdall 20, Allan Jensen 21, Angela Brooks-Wilson 22,23, Nhu Le 24, Linda Cook 25, Madalene Earp 23, Linda Kelemen 26, Douglas Easton 27, Paul Pharoah 27, Honglin Song 27, Jonathan Tyrer 27, Susan Ramus 28, Usha Menon 29, Alexandra Gentry-Maharaj 29, Simon A Gayther 28, Elisa V Bandera 30,31, Sara H Olson 31, Irene Orlow 31, Lorna Rodriguez-Rodriguez 30, Stuart Macgregor 1,†,#, Georgia Chenevix-Trench 1,#
PMCID: PMC3785301  NIHMSID: NIHMS509308  PMID: 22794196

Abstract

Recent genome-wide association studies (GWAS) have identified four low-penetrance ovarian cancer susceptibility loci. We hypothesized that further moderate- or low-penetrance variants exist among the subset of SNPs not well tagged by the genotyping arrays used in the previous studies, and that these would account for some of the remaining risk. We therefore conducted a time- and cost-effective stage 1 GWAS on 342 invasive serous cases and 643 controls genotyped on pooled DNA using the high-density Illumina 1M-Duo array. We followed up 20 of the most significantly associated SNPs that are not well tagged by the lower-density arrays used by the published GWAS, genotyping them on individual DNA. Most of the top 20 SNPs were clearly validated by individually genotyping the samples used in the pools. However, none of the 20 SNPs replicated when tested for association in a much larger stage 2 set of 4,651 cases and 6,966 controls from the Ovarian Cancer Association Consortium. Given that most of the top 20 SNPs from pooling were validated in the same samples by individual genotyping, the lack of replication is likely to be due to the relatively small sample size in our stage 1 GWAS rather than to problems with the pooling approach. We conclude that there are unlikely to be any moderate or large effects on ovarian cancer risk untagged by the less dense arrays. However, our study lacked power to make clear statements on the existence of hitherto untagged small-effect variants.


Genome-wide association studies (GWAS) have been an unprecedented success in identifying common alleles with moderate to small effects associated with different diseases and phenotypes. In particular, more than a hundred common, low-penetrance loci have been uncovered by GWAS of different cancers (Varghese & Easton, 2010). The discovery of these susceptibility loci will provide significant insights into cancer aetiology and an improved understanding of the mechanisms of tumour biology. In addition, loci associated with tumour progression after treatment will offer targets for therapeutic intervention, and risk predictions based on accumulated knowledge of cancer genetics, together with environmental risk factors, will help to identify individuals with an elevated risk of cancer (Fletcher & Houlston, 2010). Although each of the common loci identified through GWAS accounts for only a small proportion of risk, collectively more than 20% of the familial risk of prostate cancer has been explained, and ~7%, ~6% and ~5% of the familial risk of lung cancer, colorectal cancer and breast cancer respectively can now be explained by GWAS results (Varghese & Easton, 2010). These estimates are likely to be conservative, as the effects of the causal variants are typically larger than the associations detected through tag SNPs (Fletcher & Houlston, 2010).

Ovarian cancer is the seventh leading cause of cancer mortality among women globally. Despite its relatively rare incidence, it has the same pattern of familial aggregation as other major cancers. Early twin studies have shown that most of the excess familial risk for ovarian cancer is due to genetic factors rather than shared environmental factors (Lichtenstein et al., 2000). It is well established that although rare mutations in BRCA1 and BRCA2, identified originally by linkage studies, are the most important genetic risk factors in terms of their high penetrance, they do not fully account for the excess ovarian cancer risk seen in families. To date, one large GWAS has been conducted on ovarian cancer susceptibility, aiming to identify some of the remaining unexplained familial risk. This GWAS used relatively low-density Illumina 610K and 550K arrays for cases and controls respectively [4]. The confirmed susceptibility loci reaching the genome-wide significance level (P < 5×10⁻⁸) uncovered by this GWAS are at 9p22.2 near the gene BNC2 (Song et al., 2009), 19p13.11 near C19orf62 (also known as MERIT40) (Bolton et al., 2010), 8q24 and 2q31; two other borderline significant loci at 3q25 and 17q21 were also identified (Goode et al., 2010). In addition, a candidate gene study has implicated the TERT locus, which has been found to contain susceptibility SNPs by many other cancer GWAS (Johnatty et al., 2010). All these loci confer small risks (per-allele relative risk less than 1.3), supporting the concept of a polygenic architecture underlying ovarian cancer susceptibility. As found in other cancer types, ovarian cancer also shows histological subtype variation. The associations identified so far are stronger for serous tumours than for other histological subtypes. To identify histology-specific risk loci, separate GWAS on different subtypes will be more powerful than a single GWAS including all subtypes. However, the high cost of GWAS limits the desirability of carrying out studies with individual genotyping for the less common subtypes, such as endometrioid, mucinous and clear cell ovarian cancers, which may not be well powered.

GWAS genotyping on pooled DNA has proved to be a time- and cost-effective alternative to conventional GWAS, which individually genotype all the study subjects (Craig et al., 2009; Macgregor, Visscher, & Montgomery, 2006; Macgregor et al., 2008; Norton, Williams, O’Donovan, & Owen, 2004; Sham, Bader, Craig, O’Donovan, & Owen, 2002; Visscher & Le Hellard, 2003). In this study we conducted a pooled GWAS on serous ovarian cancer risk, followed by validation of the pooled results and genotyping of SNPs of interest in an independent large dataset from the Ovarian Cancer Association Consortium. We used the high-density Illumina 1M-Duo array containing 1.2 million SNPs for our pooled GWAS because it has superior coverage, with 93% of common SNPs in the CEU population tagged at r² ≥ 0.8. The aim of this study was two-fold: to test the hypothesis that common SNPs with moderate or low risks, which are not well tagged by the lower-density arrays (the Illumina 550K and 610K arrays used in the previous GWAS), also account for some of the residual ovarian cancer risk; and to determine whether pooled GWAS can be effectively carried out on DNA quantified by spectrophotometry, as opposed to PicoGreen fluorescence, which we have used previously.

Materials and Methods

Ethics Statement

This study was conducted according to the principles expressed in the Declaration of Helsinki. The study was approved by the human ethics committee of Queensland Institute of Medical Research. All participants provided written informed consent.

Samples

We used samples from the Australian Ovarian Cancer Study (AOCS) for the pooled GWAS. AOCS ascertained ovarian cancer cases through the surgical treatment centres in Australia, and from the Cancer Registries of Queensland, South and Western Australia, New South Wales and Victoria, while controls were population-based and drawn from the Commonwealth Electoral roll (Burkey & Kanetsky, 2009). We selected 342 invasive serous cases and 643 controls for the pooled GWAS. All the study subjects were self-reported White of non-Hispanic origin. Age at diagnosis and age at interview were recorded for cases and controls respectively. Detailed clinical information was also available for ovarian cancer patients, including primary site of tumour, stage and grade, and overall survival time. Most of the DNAs had been isolated using salt extraction (Chang et al., 2009), but a subset had been isolated with Qiagen columns, and so these DNAs were kept separate.

DNA concentration was measured by spectrophotometry using a Nanodrop before the pools were made, and the samples were adjusted through serial dilution to 48-52 ng/μl. For each pool, 2 μl of each DNA sample was combined and the final concentration verified by Nanodrop. The salt-extracted DNAs from 303 invasive serous cases were further divided into tertiles according to their overall survival time. We made a separate pool of 39 DNAs from unselected cases that were isolated with Qiagen columns. The controls were randomly placed in seven pools, each with a size of 90-91 samples. We then matched each of the three salt-extracted case sets with two of the control sets. The smaller Qiagen-extracted DNA pool was matched with the remaining control pool. Thus we had four comparisons of case versus control pools, where each individual case was matched to approximately two individual controls (Table 1).

Table 1.

Design of case-control pool comparisons

| Pool comparison | Case-control statusᵃ | N | Mean age (±sd) | P-value of mean age difference between cases and controls |
|---|---|---|---|---|
| 1 | Case (good survival) | 101 | 58.9 (8.6) | 0.3038 |
| 1 | Control | 182 | 57.7 (10.4) | |
| 2 | Case (medium survival) | 101 | 59.4 (10.5) | 0.1223 |
| 2 | Control | 182 | 57.3 (11.2) | |
| 3 | Case (poor survival) | 101 | 62.2 (9.5) | 2.5e-4 |
| 3 | Control | 181 | 57.3 (12.5) | |
| 4 | Case (extracted by Qiagen columns) | 39 | 60.5 (9.9) | 0.0288 |
| 4 | Control | 90 | 56.2 (10.8) | |

ᵃ We stratified the cases in the three large pools by overall survival time. The small subset of 39 case DNAs isolated by Qiagen columns was kept together in one pool.

We used samples from 12 sites in the Ovarian Cancer Association Consortium (OCAC) in the replication stage (Table 2). We genotyped 13,779 samples in total, including 6,966 non-Hispanic White controls and 4,651 non-Hispanic White invasive cases, among which 2,245 were of serous histology.

Table 2.

Summary of OCAC samples used for the replication study

| OCAC site | Study name | Controls (non-Hispanic White) | Cases (non-Hispanic White) | Invasive cases (non-Hispanic White)ᵃ | Invasive serous cases (non-Hispanic White)ᵇ |
|---|---|---|---|---|---|
| AUSᶜ | Australia Ovarian Cancer Study/Australian Cancer Study | 576 (524) | 1276 (1207) | 921 | 502 |
| BEL | Belgium Ovarian Cancer Study | 428 (428) | 257 (253) | 197 | 128 |
| GER | German Ovarian Cancer Study | 420 (420) | 252 (251) | 223 | 105 |
| HJO/HMO | Hannover-Jena and Hannover-Minsk Ovarian Cancer Study | 913 (903) | 467 (463) | 246 | 70 |
| HAW | Hawaii Ovarian Cancer Study | 625 (166) | 417 (102) | 81 | 43 |
| HOCS | Helsinki Ovarian Cancer Study | 456 (456) | 262 (262) | 261 | 130 |
| NTH | Nijmegen Polygene Study & Nijmegen Biomedical Study | 599 (598) | 305 (300) | 297 | 101 |
| MAL | Danish Malignant Ovarian Tumour Study | 1075 (1075) | 263 (263) | 263 | 162 |
| NJO | New Jersey Ovarian Cancer Study | 189 (173) | 200 (177) | 176 | 105 |
| OVA | Ovarian Cancer in Alberta and British Columbia Study | 530 (460) | 834 (706) | 538 | 291 |
| SEA | UK SEARCH Ovarian Cancer Study | 1231 (1227) | 1172 (1160) | 972 | 377 |
| UKO | UK Ovarian Cancer Population Study | 542 (536) | 490 (476) | 476 | 231 |
| Total | | 7584 (6966) | 6195 (5620) | 4651 | 2245 |

ᵃ Cases eligible for secondary analysis in the replication.

ᵇ Cases eligible for primary analysis in the replication.

ᶜ AOCS cases and controls included in the pools used in stage 1 were excluded from analysis in the replication study.

Genotyping and Quality Control

All the DNA pools were genotyped on Illumina Human 1M-Duo arrays using standard protocols. All pools were genotyped in triplicate, with the exception of one control pool, which was genotyped in quadruplicate. A number of quality control (QC) steps described elsewhere (Lu et al., 2010) were also applied here: 1. SNPs must have less than 10% negative intensity values on each pool; 2. The number of working probes for the SNP on each pool must be larger than 20; 3. The sum of raw red and green intensity values must be more than 1200; 4. The minor allele frequency (MAF) in the HapMap CEU samples must be over 5%; 5. The SNP must not show a significant difference in variance between case and control pools. A number of additional checks were also applied. 6. The differential amplification parameter of the SNP must be between 1/3 and 3. “Differential amplification” refers to the phenomenon whereby the alleles at a locus are unequally amplified; in such cases the allele frequency estimates are biased by the imbalanced raw intensity values. However, the differential amplification cancels out to a good approximation when we assess the allele frequency difference between case and control pools. We discarded those SNPs with very extreme differential amplification (<1/3 or >3). This additional check is equivalent to discarding the SNPs with estimated allele frequencies that are very different from those of the reference samples, for example the HapMap CEU samples used here. 7. SNPs that passed QC for more than two pool pairs out of four were kept, because in general the more working pool pairs there are, the more reliable the results. 8. For the SNPs of interest, the proxies (linkage disequilibrium (LD) r² > 0.7) must have association results similar to those of the underlying SNP. We applied stringent QC to limit false positive results arising from the pooling design. In total, 914,948 SNPs were retained after the whole series of QC steps.
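The threshold-based criteria above (1 to 4, 6 and 7) can be sketched as a simple filter. This is a minimal illustration, not the pooling program used in the study: the class, field and function names are hypothetical, and the variance test (criterion 5) and the proxy check (criterion 8) are omitted because the text does not specify them in enough detail.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PoolSnpStats:
    """Per-SNP summary for one pool pair; all field names are hypothetical."""
    frac_negative_intensity: float  # fraction of probes with negative intensity
    n_working_probes: int           # number of working probes for the SNP
    raw_intensity_sum: float        # sum of raw red and green intensities
    ceu_maf: float                  # minor allele frequency in HapMap CEU
    diff_amplification: float       # differential amplification parameter


def passes_pool_qc(s: PoolSnpStats) -> bool:
    """Apply criteria 1-4 and 6 from the text to one pool pair."""
    return (
        s.frac_negative_intensity < 0.10
        and s.n_working_probes > 20
        and s.raw_intensity_sum > 1200
        and s.ceu_maf > 0.05
        and 1 / 3 <= s.diff_amplification <= 3
    )


def keep_snp(pair_pass_flags: Sequence[bool]) -> bool:
    """Criterion 7: keep a SNP that passed QC in more than two of the four pool pairs."""
    return sum(pair_pass_flags) > 2
```

A SNP would then be retained when `keep_snp` is true for the list of per-pair results returned by `passes_pool_qc`.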

Individual genotyping for 20 SNPs selected from the pooled GWAS was performed using MALDI-TOF spectrophotometric mass determination of allele-specific primer extension products using Sequenom’s MassARRAY system and iPLEX technology (Sequenom Inc.). The design of oligonucleotides was carried out according to the guidelines of Sequenom and performed using MassARRAY Assay Design software (version 4.0). Multiplex PCR amplification of amplicons containing SNPs of interest was performed using Qiagen HotStart Taq Polymerase on a Perkin Elmer GeneAmp 2400 thermal cycler with 5 ng genomic DNA. Primer extension reactions were carried out according to the manufacturer’s instructions for iPLEX chemistry. Assay data were analysed using Sequenom TYPER software (version 3.4). SNPs were required to pass standard quality control checks: 1. P-value for the Hardy-Weinberg equilibrium (HWE) test ≥ 0.05 in both cases and controls; 2. Call rate > 95%; 3. Concordance > 98% between duplicate pairs (at least 5% per study site). One SNP (rs12078260) failed the HWE test in controls.

Statistical Methods and Analytic tools

In the pooled GWAS, the allele frequencies at each locus were estimated from each pool, and the differences in allele frequencies between each pair of case and control pools were then assessed in the association test. Details of the pooling data analysis are described elsewhere (Lu et al., 2010). The four sets of association results from the pool pairs were then meta-analysed, with the allele frequency difference between each set of case and control pools weighted by its inverse variance (binomial variances in case and control pools plus pooling error variances) (Macgregor et al., 2006; Macgregor et al., 2008). A pooling program that incorporates the steps of estimating pooled allele frequency, mean normalization, quality control and, finally, an association test that takes account of pool-specific errors has been developed for pooled GWAS. This program is available on request.
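The inverse-variance weighting described above can be illustrated with a short sketch. It is not the authors' pooling program: the function names are hypothetical, the pooling-error variance is treated as a single additive term per pool pair (an assumption, since the text does not give its exact form), and the P-value assumes a normally distributed test statistic.

```python
import math
from statistics import NormalDist


def pool_pair_variance(p_case, n_case, p_ctrl, n_ctrl, pooling_var):
    """Binomial sampling variances (2N chromosomes per pool) plus an
    assumed additive pooling-error variance for the pool pair."""
    return (
        p_case * (1 - p_case) / (2 * n_case)
        + p_ctrl * (1 - p_ctrl) / (2 * n_ctrl)
        + pooling_var
    )


def meta_analyse(diffs, variances):
    """Inverse-variance weighted mean of case-control allele frequency
    differences across pool pairs, with its standard error and a
    two-sided P-value under a normal approximation."""
    weights = [1.0 / v for v in variances]
    pooled_diff = sum(w * d for w, d in zip(weights, diffs)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    p_value = 2 * NormalDist().cdf(-abs(pooled_diff / se))
    return pooled_diff, se, p_value
```

Pool pairs with noisier frequency estimates (smaller pools or larger pooling error) receive less weight, which is why the small Qiagen pool pair contributes less than the three larger comparisons.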

For individually genotyped data, the SNP association was assessed in a logistic regression model implemented in PLINK (Purcell et al., 2007). Assuming a log-additive model of inheritance, the per-allele risk was estimated by fitting the number of rare alleles as a continuous variable. We did not adjust for age in the IG validation, in order to allow a direct comparison of pooled and individually genotyped results on the same AOCS samples. However, the age-adjusted results were similar (results not presented). In the replication stage, both age and study site were adjusted for in the logistic regression model.
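To make the log-additive model concrete, the following sketch fits logit P(case) = b0 + b1 × (number of rare alleles) by Newton-Raphson and reports exp(b1) as the per-allele odds ratio. It is a stand-alone illustration without covariates, not PLINK's implementation; the function name and the toy genotype counts are hypothetical.

```python
import math


def fit_per_allele_or(genotypes, status, iterations=25):
    """Fit logit P(case) = b0 + b1 * g by Newton-Raphson.

    genotypes: rare-allele counts (0, 1 or 2); status: 1 for case, 0 for control.
    Returns (per-allele odds ratio, standard error of the log odds ratio).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        s0 = s1 = h00 = h01 = h11 = 0.0  # score vector and information matrix
        for g, y in zip(genotypes, status):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
            w = p * (1.0 - p)
            s0 += y - p
            s1 += (y - p) * g
            h00 += w
            h01 += w * g
            h11 += w * g * g
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} * score
        b0 += (h11 * s0 - h01 * s1) / det
        b1 += (h00 * s1 - h01 * s0) / det
    se_b1 = math.sqrt(h00 / det)  # from the inverse information matrix
    return math.exp(b1), se_b1


# Hypothetical toy data: 100 controls and 100 cases by genotype (0/1/2 rare alleles).
genotypes = [0] * 64 + [1] * 32 + [2] * 4 + [0] * 49 + [1] * 42 + [2] * 9
status = [0] * 100 + [1] * 100
odds_ratio, se_log_or = fit_per_allele_or(genotypes, status)
```

In this toy example the case-control ratio rises by the same factor for each additional rare allele, so the fitted per-allele odds ratio summarizes the data exactly; real data would also need the age and study-site covariates used in the replication stage.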

Results

To reduce heterogeneity, the invasive serous cases from the Australian Ovarian Cancer Study (AOCS) included in the pooled GWAS had tumours that originated in the ovary (except for one case whose tumour appeared to arise in the Fallopian tubes), and the majority were of high stage (>92% of cases with FIGO stage III or IV) and grade (>99% of cases with grade 2 or 3). Because age at diagnosis is a predictor of overall survival time, an age difference was observed in the comparison of cases with poor survival and their controls, the controls being younger than these cases. A nominally significant difference in mean age was also found in the comparison of cases extracted using Qiagen columns and controls (Table 1). After carrying out extensive quality control, we tested the association for 914,948 SNPs in each comparison of case-control pools (see Samples in the Materials and Methods section for details of the pooling design), and then meta-analysed the four sets of genome-wide association results using a standard weighting method in order to maximize statistical power.

We chose 20 SNPs from the pooled GWAS for individual genotyping (IG) in the same AOCS samples as a validation of the pooled results. These SNPs were among the top-ranked SNPs from the pooled GWAS that had evidence of association with ovarian cancer susceptibility, but none of them reached genome-wide significance. Moreover, they were selected for being in the subset of SNPs not well tagged by the Illumina 610K array, as one of our aims was to test the hypothesis that this pooled GWAS using denser SNP arrays could uncover additional risk SNPs not identified by the previous GWAS. These 20 SNPs were successfully genotyped for nearly all the AOCS samples included in the pooled GWAS (971 out of the 985 pooled samples), but one SNP failed quality control. Table 3 compares the odds ratios and P-values from the pooled GWAS and IG validation results. Despite the slight difference in samples, good concordance was observed in odds ratio (OR) estimates, with all the risk directions in agreement in both sets of results. For 15/19 SNPs, the putative associations found in the pooled GWAS were clearly validated in the IG results. Therefore, by comparing the results from pooled genotyping and individual genotyping on the same set of samples, we showed that GWAS using pooled DNA, quantified by spectrophotometry, has the potential to estimate allele frequencies accurately and to provide an efficient test of association.

Table 3.

Comparison of pooled GWAS and individual genotyping (IG) validation results for 19 SNPs in AOCS samples

| SNP ID | Chr | Coordinate | Allelesᵃ | Pooled GWAS OR | Pooled GWAS SE | P_poolᵇ | IG validation OR | IG validation SE | P_IG |
|---|---|---|---|---|---|---|---|---|---|
| rs7562599 | 2 | 153711404 | [T/C] | 0.595 | 0.122 | 2.14E-05 | 0.637 | 0.153 | 3.01E-03 |
| rs10792844 | 11 | 85658476 | [A/C] | 0.463 | 0.190 | 4.94E-05 | 0.886 | 0.176 | 4.89E-01 |
| rs1573110 | 10 | 9135501 | [A/G] | 1.744 | 0.141 | 8.46E-05 | 2.003 | 0.162 | 1.41E-05 |
| rs17759746 | 2 | 28247797 | [T/C] | 0.596 | 0.135 | 1.23E-04 | 0.649 | 0.158 | 5.73E-03 |
| rs8043748 | 16 | 11753732 | [A/G] | 0.686 | 0.098 | 1.28E-04 | 0.679 | 0.097 | 6.84E-05 |
| rs17353424 | 8 | 107892457 | [T/C] | 0.638 | 0.117 | 1.28E-04 | 0.565 | 0.267 | 3.02E-02 |
| rs7974375 | 12 | 117070081 | [A/C] | 1.608 | 0.125 | 1.43E-04 | 1.680 | 0.128 | 4.39E-05 |
| rs10818911 | 9 | 125854025 | [T/G] | 0.627 | 0.125 | 1.91E-04 | 0.656 | 0.150 | 4.65E-03 |
| rs4887515 | 15 | 85233018 | [T/C] | 0.610 | 0.133 | 1.95E-04 | 0.573 | 0.165 | 6.48E-04 |
| rs1903532 | 4 | 179644249 | [T/G] | 0.648 | 0.116 | 1.98E-04 | 0.751 | 0.136 | 3.46E-02 |
| rs11592097 | 10 | 2166806 | [A/C] | 0.645 | 0.121 | 2.85E-04 | 0.609 | 0.194 | 9.90E-03 |
| rs2798823 | 14 | 94490125 | [A/G] | 0.687 | 0.108 | 4.76E-04 | 0.701 | 0.130 | 6.17E-03 |
| rs2086545 | 11 | 13164955 | [C/G] | 1.450 | 0.109 | 6.55E-04 | 1.154 | 0.130 | 2.72E-01 |
| rs2499834 | 1 | 160236260 | [A/C] | 0.627 | 0.139 | 7.79E-04 | 0.623 | 0.175 | 6.28E-03 |
| rs1053495 | 10 | 71544054 | [T/C] | 0.611 | 0.148 | 8.66E-04 | 0.779 | 0.188 | 1.84E-01 |
| rs1566198 | 18 | 11249831 | [T/G] | 1.510 | 0.126 | 1.05E-03 | 1.373 | 0.128 | 1.27E-02 |
| rs16135 | 7 | 24294445 | [A/G] | 0.576 | 0.170 | 1.16E-03 | 0.769 | 0.197 | 1.82E-01 |
| rs16899823 | 5 | 81992978 | [T/C] | 0.665 | 0.133 | 2.20E-03 | 0.638 | 0.197 | 2.16E-02 |
| rs12027970 | 1 | 37537850 | [A/C] | 0.685 | 0.124 | 2.31E-03 | 0.609 | 0.213 | 1.89E-02 |

ᵃ The first allele was the risk allele of the SNP.

ᵇ The table was sorted by the strength of association found in the pooled GWAS (P_pool).
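As a reading aid for Table 3, the sketch below shows how an odds ratio and its standard error relate to a P-value under a two-sided Wald test. It assumes that the SE column is the standard error of the log odds ratio and that a normal approximation applies; the text does not state either point explicitly, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def wald_p_value(odds_ratio, se_log_or):
    """Two-sided Wald P-value, assuming se_log_or is the SE of log(OR)."""
    z = math.log(odds_ratio) / se_log_or
    return 2 * NormalDist().cdf(-abs(z))


# First row of Table 3 (pooled GWAS): OR = 0.595, SE = 0.122
p_first_row = wald_p_value(0.595, 0.122)
```

Under these assumptions the first row gives a value of about 2×10⁻⁵, in line with the tabulated P_pool of 2.14E-05; small differences are expected because the tabulated OR and SE are rounded.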

In addition, we sought independent replication for these 19 SNPs by individually genotyping a total of 13,779 samples collected from 12 study sites in the Ovarian Cancer Association Consortium (OCAC) (Table 2). Among 4,651 eligible White invasive cases of non-Hispanic origin, the majority (>95%) were classified as having had the primary tumour in the ovary, as opposed to the Fallopian tube or peritoneum. Unlike the AOCS cases included in the pooled GWAS, the OCAC cases on the whole were evenly distributed over all tumour stages and grades (~54% in high stage and ~58% in high grade). Two sets of analyses were performed according to the histology: in the primary analysis we restricted to White non-Hispanic cases with the serous subtype, which allowed for a direct replication of the SNPs found in the first-stage GWAS on serous ovarian cancer cases; whereas in the secondary analysis we included cases with all histological subtypes, to determine whether these SNPs show association with ovarian cancer regardless of histological types. The association results adjusted for age and study site are presented in Table 4. The results showed no replication for any of the 19 SNPs in the analyses restricted to serous cases only (primary analysis) or in the analyses combining all histological subtypes (secondary analysis).

Table 4.

Replication results of 19 SNPs from the pooled GWAS by individual genotyping of 13,779 OCAC samples

| SNP | Chr | Coordinate | Allelesᵃ | Primary analysisᶜ Nᵇ | Primary OR | Primary SE | Primary P | Secondary analysisᶜ Nᵇ | Secondary OR | Secondary SE | Secondary P |
|---|---|---|---|---|---|---|---|---|---|---|---|
| rs7562599 | 2 | 153711404 | [T/C] | 9185 | 1.06 | 0.05 | 0.26 | 11580 | 1.03 | 0.04 | 0.48 |
| rs10792844 | 11 | 85658476 | [A/C] | 9188 | 0.98 | 0.06 | 0.76 | 11588 | 1.06 | 0.05 | 0.27 |
| rs1573110 | 10 | 9135501 | [A/G] | 9184 | 1.00 | 0.06 | 0.96 | 11581 | 0.99 | 0.04 | 0.88 |
| rs17759746 | 2 | 28247797 | [T/C] | 9175 | 1.03 | 0.06 | 0.62 | 11572 | 0.99 | 0.05 | 0.80 |
| rs8043748 | 16 | 11753732 | [A/G] | 9178 | 1.03 | 0.04 | 0.40 | 11574 | 1.03 | 0.03 | 0.37 |
| rs17353424 | 8 | 107892457 | [T/C] | 9198 | 0.89 | 0.09 | 0.24 | 11599 | 0.89 | 0.07 | 0.11 |
| rs7974375 | 12 | 117070081 | [A/C] | 9147 | 0.98 | 0.05 | 0.61 | 11547 | 1.00 | 0.04 | 0.97 |
| rs10818911 | 9 | 125854025 | [T/G] | 9194 | 1.00 | 0.05 | 0.96 | 11597 | 0.98 | 0.04 | 0.61 |
| rs4887515 | 15 | 85233018 | [T/C] | 9175 | 0.98 | 0.06 | 0.77 | 11566 | 1.03 | 0.05 | 0.50 |
| rs1903532 | 4 | 179644249 | [T/G] | 9167 | 1.08 | 0.05 | 0.17 | 11551 | 1.01 | 0.04 | 0.84 |
| rs11592097 | 10 | 2166806 | [A/C] | 9152 | 0.97 | 0.06 | 0.60 | 11535 | 0.95 | 0.05 | 0.27 |
| rs2798823 | 14 | 94490125 | [A/G] | 9184 | 0.95 | 0.05 | 0.34 | 11583 | 0.94 | 0.04 | 0.08 |
| rs2086545 | 11 | 13164955 | [C/G] | 9184 | 1.03 | 0.05 | 0.57 | 11581 | 1.06 | 0.04 | 0.19 |
| rs2499834 | 1 | 160236260 | [A/C] | 9152 | 0.93 | 0.06 | 0.20 | 11547 | 0.96 | 0.04 | 0.33 |
| rs1053495 | 10 | 71544054 | [T/C] | 9200 | 1.01 | 0.06 | 0.93 | 11600 | 1.02 | 0.05 | 0.72 |
| rs1566198 | 18 | 11249831 | [T/G] | 9193 | 1.03 | 0.05 | 0.44 | 11593 | 1.04 | 0.04 | 0.30 |
| rs16135 | 7 | 24294445 | [A/G] | 9187 | 1.04 | 0.07 | 0.55 | 11589 | 1.02 | 0.05 | 0.76 |
| rs16899823 | 5 | 81992978 | [T/C] | 9187 | 1.04 | 0.06 | 0.51 | 11586 | 1.07 | 0.05 | 0.14 |
| rs12027970 | 1 | 37537850 | [A/C] | 9184 | 0.95 | 0.07 | 0.44 | 11583 | 0.97 | 0.05 | 0.59 |

ᵃ Same risk alleles as listed in Table 3.

ᵇ The number of samples with non-missing genotypes for each SNP.

ᶜ Primary analysis was restricted to serous cases; secondary analysis included all histological subtypes.

Discussion

To date, one ovarian cancer GWAS has revealed several SNPs associated with susceptibility. None of the identified loci showed large effects (OR: 0.76-1.30 depending on the histological subtype), but the study was well powered to find common alleles with moderate effects (Song et al., 2009). In contrast, our pooled GWAS on serous ovarian cancer susceptibility was under-powered to detect alleles with moderate effects because of its small sample size. In our pooled GWAS, the published risk SNPs rs3814113 at 9p22.2, rs2072590 at 2q31 and rs2665390 at 3q25 showed ORs similar to, and in the same direction as, those previously reported (Goode et al., 2010; Song et al., 2009), but they reached nominal or borderline significance only (Table 5). The other three SNPs (rs8170 and rs2363956 at 19p13, rs10088218 at 8q24) identified by the published GWAS (Bolton et al., 2010; Goode et al., 2010) were not significantly associated with risk in our results, and the 17q21 SNP rs9303542 (Goode et al., 2010) was not on the Illumina Human 1M-Duo array (Table 5). We found a similar OR for rs6504172, which is in high linkage disequilibrium with rs9303542 (r² = 0.841), but this SNP was not significantly associated with risk (P = 0.49). We therefore found no support for our hypothesis that additional common SNPs represented on the 1M-Duo arrays contribute to ovarian cancer risk, probably because of insufficient power in the first stage of this pooled GWAS.

Table 5.

Pooled GWAS results on the published loci known to be associated with serous ovarian cancer susceptibility

| Position | SNP ID | Publication | A1 | Reported per-allele ORᵃ | Reported p-value | Pooled per-allele OR | Pooled p-value |
|---|---|---|---|---|---|---|---|
| 9p22.2 | rs3814113 | Song et al. (2009) | C | 0.77 (0.73-0.81) | 4.1e-21 | 0.86 (0.72-1.05) | 0.082 |
| 19p13 | rs8170 | Bolton et al. (2010) | C | 1.18 (1.12-1.25) | 2.7e-09 | 0.97 (0.77-1.31) | 0.175 |
| 19p13 | rs2363956 | Bolton et al. (2010) | G | 1.16 (1.11-1.21) | 3.8e-11 | 0.92 (0.78-1.12) | 0.60 |
| 2q31 | rs2072590 | Goode et al. (2010) | T | 1.20 (1.14-1.25) | 3.8e-14 | 1.21 (1.01-1.41) | 0.03 |
| 3q25 | rs2665390 | Goode et al. (2010) | C | 1.24 (1.15-1.34) | 7.1e-08 | 1.22 (0.93-1.79) | 0.060 |
| 8q24 | rs10088218 | Goode et al. (2010) | A | 0.76 (0.70-0.81) | 8.0e-15 | 1.28 (0.89-1.66) | 0.882 |
| 17q21 | rs9303542 | Goode et al. (2010) | G | 1.14 (1.09-1.20) | 1.4e-07 | NAᵇ | NAᵇ |

ᵃ For a direct comparison of results, reported per-allele ORs are the results from the published GWAS restricted to serous cases.

ᵇ rs9303542 was not on the Illumina Human 1M-Duo array, but rs6504172 is in high linkage disequilibrium with rs9303542 (r² = 0.841). It had a per-allele OR in the pooled GWAS of 1.10 (0.84-1.56) (P = 0.49).

In the pooling design, we divided the serous ovarian cancer cases into four case pools according to overall survival time and/or the method by which the DNAs were isolated. In theory it would be possible to test for association of SNPs with survival time by comparing the good, medium and poor survival pools, but we would have even less power to detect reliable association with survival, so these results are not presented.

Although we were under-powered to locate any common SNPs with weak effects, our study had the potential to identify common SNPs with moderate to large effects on ovarian cancer risk if any existed. A notable example in cancer genetics is the common variant in KITLG with a per-allele risk of 2.5 for testicular cancer, which was identified from an initial GWAS on ~300 cases and ~900 controls (Kanetsky et al., 2009). This empirical example suggests that although most loci exhibit smaller effect sizes, common SNPs with moderate to large effects do exist in cancer genetics, and it is therefore of interest to test similar hypotheses in different cancer types. GWAS genotyping on pooled DNA does not suffer from substantial power loss compared to a conventional study using individual genotyping. For example, in our pooled GWAS, assuming an additive-effect risk allele with 20% frequency that confers a relative risk of 2, power was 80% even after scaling the original sample size by 10% in order to account for the additional variance due to pooling errors (Macgregor et al., 2008), in comparison with 88% power using individual genotyping. An empirical study with examples of successful identification of known variants, including the eye colour locus at OCA2/HERC2 (15q11.2-q12), the age-related macular degeneration locus at CFH (1q32), and the locus for pseudoexfoliation syndrome at LOXL1 (15q22), clearly showed that common alleles with large effects are not likely to be missed in a pooled GWAS (Craig et al., 2009). Therefore, our results suggest that there are probably no hitherto poorly tagged common SNPs with moderate to large effects still to be identified.
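The effect of pooling error on power can be illustrated with a rough normal-approximation calculation for an allelic test. This sketch is not the calculation behind the 80% and 88% figures quoted above: the significance threshold is not stated in the text (the `alpha` below is an assumed genome-wide level), the relative risk is treated as an allelic odds ratio, and "scaling by 10%" is read as reducing the effective sample size to 90%. All names are hypothetical.

```python
import math
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, control_freq, allelic_or,
                       alpha, efficiency=1.0):
    """Approximate power of a two-sided test of the case-control allele
    frequency difference. `efficiency` scales both sample sizes to mimic
    the extra variance introduced by pooling errors."""
    odds = allelic_or * control_freq / (1 - control_freq)
    case_freq = odds / (1 + odds)
    n_ca = n_cases * efficiency
    n_co = n_controls * efficiency
    se = math.sqrt(case_freq * (1 - case_freq) / (2 * n_ca)
                   + control_freq * (1 - control_freq) / (2 * n_co))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(case_freq - control_freq) / se
    # The negligible probability of rejecting in the wrong direction is ignored.
    return NormalDist().cdf(z_effect - z_alpha)


alpha = 5e-8  # assumed threshold; not given in the text
power_individual = allelic_test_power(342, 643, 0.20, 2.0, alpha)
power_pooled = allelic_test_power(342, 643, 0.20, 2.0, alpha, efficiency=0.9)
```

Whatever threshold is chosen, the pooled design loses some power relative to individual genotyping because the effective sample size shrinks, but the loss is modest for an effect of this size.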

This study also demonstrates that it is not always necessary to measure DNA concentration by PicoGreen fluorescence prior to making the DNA pools. At least for the set of DNAs we used, which were largely isolated by salt extraction, this study demonstrated highly consistent results between pooled genotyping and individual genotyping. However, it is worth noting that we have previously found that the correlation between the concentrations measured by Nanodrop spectroscopy and PicoGreen fluorescence is high (r² = 0.5107) for a related set of 200 DNAs.

It should be noted that the lack of replication in this study was not due to problems with the DNA pooling method. Because additional errors such as pool construction errors and pool measurement errors could be involved (Sham et al., 2002), we implemented careful experiments and rigorous analysis to address this concern. First, we took care to ensure that individual samples contributed equal quantities of DNA during the formation of the pools; second, all the pools were genotyped at least three times to yield better allele frequency estimates, and we applied stringent QC to limit the number of possible false positives; lastly, we accounted for the additional variance due to pooling errors in the association tests. We also validated the pooling results using individual genotyping. Given that most of the top 20 SNPs from the pooled GWAS were validated in the same samples by individual genotyping, the lack of replication is most likely due to the relatively small sample size in our stage 1 GWAS rather than to problems with the pooling approach.

In order to improve power for GWAS using pooled DNA, larger sample sizes and higher-density microarrays are required. However, to accommodate a large sample size properly, a balance between statistical power and the accuracy of the allele frequency estimates is needed. A number of empirical studies have investigated the impact of pool size (up to 1,000 samples in the pool) on the accuracy of allele frequency estimates, and have usually found no obvious relationship between pool size and the accuracy of allele frequency estimation (Jawaid & Sham, 2009; Le Hellard et al., 2002; Macgregor, 2007). As indicated by Macgregor (2007), most variation from pooled DNA genotyping is attributable to array error rather than pool construction error. Therefore, constructing large pools is not likely to yield a great loss of power. The optimal pool design for a limited research budget will be a few large pools which are then genotyped multiple times. One major criticism of pooled GWAS is that there is no information on individual genotypes or linkage disequilibrium, so it is generally impossible to impute missing genotypes, to evaluate haplotypes or to fine-map the regions of interest. However, given the cost advantages, more expensive SNP arrays with better coverage can be used to partially compensate for the power loss due to imperfect linkage disequilibrium between the genetic markers and causal variants. Furthermore, fine mapping of loci identified by GWAS is usually performed in a second or third stage of genotyping once the loci have been confirmed in additional samples. Here we have investigated the use of dense Illumina 1M-Duo arrays in locating variants that were poorly tagged by previous arrays. We found that moderate to large effects on ovarian cancer risk are unlikely to exist among the SNPs on this array, but we are not able to make a clear statement about the possible existence of additional SNPs with small effects because of the limited study power.

In summary, we have carried out a pooled GWAS on 342 invasive serous cases and 643 controls. The accuracy of the estimated odds ratios was then validated by individually genotyping the same subjects as were included in the pools. We showed that pooled genotyping using DNA quantified by Nanodrop spectroscopy, together with the analytical tools for pooled data, works well in terms of achieving accurate odds ratio estimates and providing reasonable association signals. We therefore propose that pooled GWAS be used for rarer subtypes of cancer or orphan diseases where research funds are limited. In addition, we have developed an analytical tool for analysing pooled GWAS data, which will be available on request.

Acknowledgement

The Australian Ovarian Cancer Study (AOCS) Management Group (D. Bowtell, G. Chenevix-Trench, A. deFazio, D. Gertig, A. Green and P.M. Webb) gratefully acknowledges the contribution of all the clinical and scientific collaborators (see http://www.aocstudy.org/). The Australian Cancer Study Management Group (A. Green, P. Parsons, N. Hayward, P.M. Webb, and D. Whiteman) thank all of the project staff, collaborating institutions and study participants. NJ Ovarian Cancer Study (NJO) investigators (EV Bandera, SH Olson, I Orlow, L Rodriguez-Rodriguez) would like to thank Melony Williams-King and the staff at the New Jersey State Cancer Registry, in particular, Lisa Paddock. BELOCS investigators would like to thank Gilian Peuteman, Thomas Van Brussel and Dominiek Smeets for technical assistance. SEARCH investigators thank the study participants, the Eastern Cancer Registration and Information Centre, the collaborating general practitioners, the SEARCH team for patient recruitment, and Caroline Baynes, Don Conroy and Craig Luccarini for sample preparation. The German Ovarian Cancer Study (GER) thanks Ursula Eilber and Tanja Koehler for competent technical assistance. The Hannover-Jena Ovarian Cancer Study (HJO) gratefully acknowledges the contribution of our clinical collaborators Frauke Kramer, Wen Zheng, Peter Hillemanns and Ingo Runnebaum. HAWAII investigators thank all study participants and all members of the research teams of all the participating studies, including research nurses, research scientists, data entry personnel and consultant gynecological oncologists.

Grant Support: Y. Lu is partly supported by Australian National Health and Medical Research Council (NHMRC) grant 496675. S. Macgregor is supported by a Career Development Award from the NHMRC. NJO was funded by grants from the U.S. National Cancer Institute (K07CA095666, R01CA83918, and K22CA138563) and the Cancer Institute of New Jersey. BELOCS was funded by the Nationaal Kankerplan initiatief 29. SEARCH is funded by programme grants from Cancer Research UK (A490/10119 and A490/10124). The German Ovarian Cancer Study (GER) was supported by the German Federal Ministry of Education and Research, Programme of Clinical Biomedical Research (01 GB 9401); genotyping in part by the state of Baden-Württemberg through the Medical Faculty, University of Ulm (P.685); and data management by the German Cancer Research Center. HAWAII is funded by the U.S. National Institutes of Health (R01 CA58598, N01-CN-55424, N01-PC-67001, N01-PC-95001-20). UKO is funded by Cancer Research UK, the Eve Appeal, the OAK Foundation and the UK Department of Health’s NIHR UCL/UCLH Biomedical Research Centre funding scheme.

References

  1. Bolton KL, Tyrer J, Song H, Ramus SJ, Notaridou M, Jones C, Gayther SA. Common variants at 19p13 are associated with susceptibility to ovarian cancer. Nat Genet. 2010;42(10):880–884. doi: 10.1038/ng.666.
  2. Burkey AR, Kanetsky PA. Development of a novel location-based assessment of sensory symptoms in cancer patients: preliminary reliability and validity assessment. J Pain Symptom Manage. 2009;37(5):848–862. doi: 10.1016/j.jpainsymman.2008.05.013.
  3. Chang YM, Newton-Bishop JA, Bishop DT, Armstrong BK, Bataille V, Bergman W, Barrett JH. A pooled analysis of melanocytic nevus phenotype and the risk of cutaneous melanoma at different latitudes. Int J Cancer. 2009;124(2):420–428. doi: 10.1002/ijc.23869.
  4. Craig JE, Hewitt AW, McMellon AE, Henders AK, Ma LJ, Wallace L, MacGregor S. Rapid inexpensive genome-wide association using pooled whole blood. Genome Research. 2009;19(11):2075–2080. doi: 10.1101/gr.094680.109.
  5. Fletcher O, Houlston RS. Architecture of inherited susceptibility to common cancer. Nat Rev Cancer. 2010;10(5):353–361. doi: 10.1038/nrc2840.
  6. Goode EL, Chenevix-Trench G, Song H, Ramus SJ, Notaridou M, Lawrenson K, Pharoah PD. A genome-wide association study identifies susceptibility loci for ovarian cancer at 2q31 and 8q24. Nat Genet. 2010;42(10):874–879. doi: 10.1038/ng.668.
  7. Jawaid A, Sham P. Impact and Quantification of the Sources of Error in DNA Pooling Designs. Annals of Human Genetics. 2009;73:118–124. doi: 10.1111/j.1469-1809.2008.00486.x.
  8. Johnatty SE, Beesley J, Chen X, Macgregor S, Duffy DL, Spurdle AB, Chenevix-Trench G. Evaluation of candidate stromal epithelial cross-talk genes identifies association between risk of serous ovarian cancer and TERT, a cancer susceptibility “hot-spot”. PLoS Genet. 2010;6(7):e1001016. doi: 10.1371/journal.pgen.1001016.
  9. Kanetsky PA, Mitra N, Vardhanabhuti S, Li M, Vaughn DJ, Letrero R, Nathanson KL. Common variation in KITLG and at 5q31.3 predisposes to testicular germ cell cancer. Nat Genet. 2009;41(7):811–815. doi: 10.1038/ng.393.
  10. Le Hellard S, Ballereau SJ, Visscher PM, Torrance HS, Pinson J, Morris SW, Evans KL. SNP genotyping on pooled DNAs: comparison of genotyping technologies and a semi automated method for data storage and analysis. Nucleic Acids Research. 2002;30(15). doi: 10.1093/nar/gnf070.
  11. Lichtenstein P, Holm NV, Verkasalo PK, Iliadou A, Kaprio J, Koskenvuo M, Hemminki K. Environmental and heritable factors in the causation of cancer--analyses of cohorts of twins from Sweden, Denmark, and Finland. N Engl J Med. 2000;343(2):78–85. doi: 10.1056/NEJM200007133430201.
  12. Lu Y, Dimasi DP, Hysi PG, Hewitt AW, Burdon KP, Toh T, Mackey DA. Common genetic variants near the Brittle Cornea Syndrome locus ZNF469 influence the blinding disease risk factor central corneal thickness. PLoS Genet. 2010;6(5):e1000947. doi: 10.1371/journal.pgen.1000947.
  13. Macgregor S. Most pooling variation in array-based DNA pooling is attributable to array error rather than pool construction error. European Journal of Human Genetics. 2007;15(4):501–504. doi: 10.1038/sj.ejhg.5201768.
  14. Macgregor S, Visscher PM, Montgomery G. Analysis of pooled DNA samples on high density arrays without prior knowledge of differential hybridization rates. Nucleic Acids Research. 2006;34(7). doi: 10.1093/nar/gkl136.
  15. Macgregor S, Zhao ZZ, Henders A, Martin NG, Montgomery GW, Visscher PM. Highly cost-efficient genome-wide association studies using DNA pools and dense SNP arrays. Nucleic Acids Research. 2008;36(6). doi: 10.1093/nar/gkm1060.
  16. Norton N, Williams NM, O’Donovan MC, Owen MJ. DNA pooling as a tool for large-scale association studies in complex traits. Annals of Medicine. 2004;36(2):146–152. doi: 10.1080/07853890310021724.
  17. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Sham PC. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–575. doi: 10.1086/519795.
  18. Sham P, Bader JS, Craig I, O’Donovan M, Owen M. DNA pooling: A tool for large-scale association studies. Nature Reviews Genetics. 2002;3(11):862–871. doi: 10.1038/nrg930.
  19. Song H, Ramus SJ, Tyrer J, Bolton KL, Gentry-Maharaj A, Wozniak E, Gayther SA. A genome-wide association study identifies a new ovarian cancer susceptibility locus on 9p22.2. Nat Genet. 2009;41(9):996–1000. doi: 10.1038/ng.424.
  20. Varghese JS, Easton DF. Genome-wide association studies in common cancers--what have we learnt? Curr Opin Genet Dev. 2010;20(3):201–209. doi: 10.1016/j.gde.2010.03.012.
  21. Visscher PM, Le Hellard S. Simple method to analyze SNP-based association studies using DNA pools. Genetic Epidemiology. 2003;24(4):291–296. doi: 10.1002/gepi.10240.
