Introduction
Early efforts to localize and identify genes that contribute to risk of common chronic diseases often used either candidate gene studies or family-based linkage studies, which suffered from low statistical power, lack of replication, and low precision (1). Although there were successes, progress was generally slow. Recently, genome-wide association studies (GWAS) have proven to be productive when they have adequate sample sizes and replication opportunities. Their primary aim is to identify novel genetic loci associated with inter-individual variation in the levels of risk factors, measures of subclinical disease, or the risk of clinical events. The method does not require assumptions about a priori biologic involvement, is precise in its ability to localize genetic effects to relatively small regions of the genome, and can be extended to evaluate potential gene-environment interactions.
GWAS have successfully identified genetic loci associated with a variety of conditions such as type 2 diabetes (2) and coronary disease (3–5). The large number of statistical tests required in GWAS poses a special challenge because few studies that have DNA and high-quality phenotype data are sufficiently large to provide adequate statistical power for detecting small to modest effect sizes (6). Meta-analyses combining previously published findings have improved the ability to detect new loci (2). Even before the era of GWAS (7), the requirement for large sample sizes and the importance of replication have served as powerful incentives for collaboration.
Our understanding of the risk factors for common chronic diseases has benefited from large population-based cohort studies. Although these studies are costly and time consuming, they are generally free of the survival and recall biases typically encountered in case-control studies. The cohort design, with its prospective standardized data collection, is often the preferred method for estimating disease incidence and evaluating risk factors. The Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium was formed to facilitate GWAS meta-analyses and replication opportunities among multiple large and well-phenotyped cohort studies. The design of the CHARGE Consortium includes five prospective cohort studies from the US and Europe: the Age, Gene/Environment Susceptibility (AGES)--Reykjavik Study, the Atherosclerosis Risk in Communities (ARIC) Study, the Cardiovascular Health Study (CHS), the Framingham Heart Study (FHS), and the Rotterdam Study (RS).
With genome-wide data on about 38,000 individuals, these cohort studies have a large number of phenotypes measured in a similar way, and a prospective meta-analysis of within-study association data from the 5 studies, with a properly selected level of genome-wide significance, is a powerful approach to finding genuine phenotypic associations with novel genetic loci. The CHARGE Consortium provides a unique opportunity for collaborative investigation of the genetic determinants of risk factors, measures of subclinical disease, and clinical events.
Design
Cohorts
Participating studies (8–14) were prospective cohort studies that had multiple cardiovascular and aging phenotypes in common and that had genome-wide scans completed or in progress in 2007–2008 (Table 1). Briefly, the AGES-Reykjavik Study represents a sample drawn from the established population-based cohort, the Reykjavik Study (8). The original Reykjavik Study comprised a random sample of 30,795 men and women living in Reykjavik in 1967 and born between 1907 and 1935. Over the years 1967–1996, 6 examinations were conducted in 6 subcohorts. Between 2002 and 2006, the AGES-Reykjavik study re-examined 5764 survivors of the original cohort. The ARIC study is a population-based prospective cohort study of cardiovascular disease and its risk factors sponsored by the National Heart, Lung, and Blood Institute (NHLBI). ARIC included 15,792 individuals aged 45–64 years at baseline (1987–89), chosen by probability sampling from four US communities (9). Cohort members completed four clinic examinations, conducted three years apart between 1987 and 1998. Follow-up for clinical events is annual. The CHS is a population-based NHLBI-funded cohort study of risk factors for cardiovascular disease in adults 65 years of age or older conducted at four field centers (10). The original predominantly white cohort of 5201 persons was recruited in 1989–1990 from random samples of the Medicare lists. An additional 687 African-Americans were enrolled in 1992–93. CHS participants completed standardized clinical examinations and questionnaires at study baseline and at nine annual follow-up visits. Follow-up for clinical events occurs every 6 months. The FHS began in 1948 with the recruitment of an original cohort of 5209 men and women who were 28 to 62 years of age at entry (11). Clinic examinations were performed approximately every two years. In 1971, a second generation of study participants, 5124 children and spouses of children of the original cohort, was enrolled (12). With two exceptions, clinic examinations took place approximately every four years. Enrollment of the third generation cohort of 4095 children of offspring cohort participants began in 2002 (13). The RS is a prospective population-based cohort study comprising 7983 subjects aged 55 years or older. A trained interviewer visited the individuals at home for a computerized questionnaire, and individuals were subsequently examined at a research center. Baseline data were collected between 1990 and 1993 (14). The original cohort underwent 3 additional examinations. In 2000–2001, an additional 3011 individuals aged 55 or older (mainly 55–64) were recruited and examined. Since 2006, an additional cohort of individuals aged 45 years or older (mainly 45–59 years) is being recruited, comprising 3236 subjects as of May 1, 2008. All of the CHARGE cohort studies were approved by their respective institutional review committees, and the subjects from all the cohorts provided written informed consent.
Table 1.
Characteristic | AGES | ARIC | CHS | FHS | RS |
---|---|---|---|---|---|
Baseline exam | 1967–91 | 1987–89 | 1989–90 | 1948 | 1990–93 |
 | 1991–96 | | 1992–93 | 1971 | 2000–01 |
 | | | | 2002 | 2006–08 |
N | 19,381 | 15,792 | 5888 | 14,428 | 14,626* |
Mean age, yr | 53 | 54 | 72 | 40 | 66 |
Women, % | 52 | 55 | 57 | 53 | 59 |
Race/ethnicity, % | |||||
Caucasian | 100 | 73 | 84 | 100 | 91 |
African American | 0 | 27 | 16 | 0 | 0 |
Other | 0 | 0 | 0 | 0 | 9 |
Sites, N | 1 | 4 | 4 | 1 | 1 |
GWAS plans | |||||
Date of DNA | 2002–2006 | 1987–94 | 1989–93 | 1971–2002 | 1990–2006 |
Other criteria | None | None | CVD at baseline | None | None |
Eligible, N | 3664 | 15,637 | 3980 | 9274 | 11,689 |
GWAS data available | |||||
Subjects, N | 3219 | 11,433 | 3865 | 8482 | 10,958 |
Mean age, yr | 77 | 54 | 72 | 55 | 65 |
Women, % | 58 | 55 | 61 | 53 | 58 |
Race/ethnicity, % | |||||
Caucasian | 100 | 77 | 85 | 100 | 94 |
African American | 0 | 23 | 15 | 0 | 0 |
Other | 0 | 0 | 0 | 0 | 6 |
Event follow-up Through | 6/2008 | 12/2005 | 6/2006 | 12/2007 | 1/2007 |
Successfully genotyped SNPs, N (autosomes) | 325,094 | 869,224 | 306,655 | 502,197 | 530,683 |
Total SNPs, N | 2,533,153 | 2,837,224 | 2,543,877 | 2,540,223 | 2,543,877 |
Abbreviations: AGES, Age, Gene/Environment Susceptibility (AGES)--Reykjavik Study; ARIC, The Atherosclerosis Risk in Communities Study; CHS, The Cardiovascular Health Study; FHS, The Framingham Heart Study; RS, the Rotterdam Study; N = number. Total SNPs = sum of successfully genotyped plus successfully imputed using the criteria in Table 3.
* Number recruited through May 1, 2008.
Relationships among the studies
Each cohort study has its own administrative structure and set of investigators. Although investigators from several cohorts had occasionally collaborated on analyses, there was little precedent for consortia of cardiovascular epidemiology cohorts. In late 2007, it became clear that because all cohorts shared both a common prospective population-based design and a large number of phenotypes assessed by similar data-collection methods (Table 2), a cohort-level collaboration would facilitate a series of prospectively planned joint meta-analyses. The resulting CHARGE consortium represents a voluntary federation of 5 large complex studies. Between October 2007 and February 2008, the principles and procedures for the CHARGE consortium were developed and approved by the parent studies (public website: http://web.chargeconsortium.com).
Table 2.
Selected phenotype | AGES | ARIC | CHS | FHS | RS |
---|---|---|---|---|---|
Incident events | |||||
MI | A | A | A | A | A |
Stroke | A | A | A | A | A |
TIA | A | A | A | A | A |
Heart failure | A | A | A | A | A |
PVD | ND | A | A | A | A |
Mortality/Longevity | A | A | A | A | A |
Dementia | A | A | A | A | A |
Subclinical measures | |||||
Electrocardiography | MV | MV | MV | MV | MV |
Holter monitor | ND | ND | MV | MV | ND |
Echocardiography | P | P | MV | MV | SV |
Cardiac MRI | P | ND | ND | SV | ND |
Carotid IMT | SV | MV | MV | SV | MV |
Cerebral MRI | SV | P | MV | SV | SV |
Coronary calcium | SV | P | P | SV | SV |
Ankle-brachial index | ND | MV | MV | MV | MV |
Abd aortic ultrasound | ND | ND | SV | ND | SV |
Bone density | SV | ND | P | MV | MV |
Spine-hip CT | SV | ND | ND | SV | ND |
Endothelial function | ND | ND | SV | SV | ND |
Vessel wall stiffness | SV | SV | ND | SV | SV |
Pulmonary function | P | MV | MV | MV | SV |
Sleep studies | SV | P | P | P | SV |
Retinal photography | SV | MV | SV | SV | MV |
Traditional risk factors | |||||
Diabetes | MV | MV | MV | MV | MV |
Hypertension | MV | MV | MV | MV | MV |
Atrial fibrillation | MV | MV | MV | MV | MV |
Blood pressure | MV | MV | MV | MV | MV |
Fasting lipids | MV | MV | MV | MV | MV |
Fasting glucose | MV | MV | MV | MV | MV |
Glucose tolerance test | MV | SV | SV | MV | ND |
Behaviors (smoking) | P | MV | MV | MV | MV |
Medication use | MV | MV | MV | MV | MV |
Height, weight | MV | MV | MV | MV | MV |
Cognitive function | SV | MV | MV | MV | MV |
Depression | SV | MV | MV | MV | MV |
Quality of life | SV | ND | MV | MV | SV |
Physical activity | MV | MV | MV | MV | SV |
Renal function | MV | MV | MV | MV | MV |
Biomarkers | MV | MV | MV | MV | MV |
Abbreviations: AGES, Age, Gene/Environment Susceptibility (AGES)--Reykjavik Study; ARIC, The Atherosclerosis Risk in Communities Study; CHS, The Cardiovascular Health Study; FHS, The Framingham Heart Study; RS, the Rotterdam Study; MI = myocardial infarction; TIA = transient ischemic attack; PVD = peripheral vascular disease; IMT = intimal medial thickness; MRI = magnetic resonance imaging; CT = computed tomography; A = assessed; MV = measures at multiple visits; SV = measures at single visit; and P = measured once on part of the cohort; ND = not done.
CHARGE goals and organization
The primary aim is the conduct of high-quality analyses that produce, in an efficient and timely manner, reliable and valid findings across multiple cardiovascular and aging-related phenotypes. The organizational structure is simple and comprises a Research Steering Committee (RSC), an Analysis Committee, a Genotyping Committee, and approximately 20 phenotype-specific working groups. The RSC, which has 2 representatives from each cohort, is responsible for establishing the other committees, for nominating working group members, and for developing general guidelines for collaboration, authorship, sharing of results, publication, and timely participation. The Analysis Committee develops guidelines that the working groups are encouraged to adopt or adapt, and the Genotyping Committee coordinates requests for follow-up genotyping.
The main scientific work takes place in the phenotype-specific working groups, which have responsibility for developing and executing the scientific plans. Working groups standardize phenotypes across the cohorts, decide whether and how to include other non-member studies with similar phenotypes, and agree on analysis plans, often with input from the Analysis Committee. The phenotype working groups also develop plans for authorship and manuscripts, evaluate results, write manuscripts, and decide on the need for follow-up genotyping.
For each manuscript, the working-group investigators establish pre-specified plans for analysis and timelines for participation. Before results are shared, each cohort must formally opt in or opt out of participation. For any phenotype, each cohort may work with other studies or consortia rather than CHARGE, and individual cohorts remain free to publish cohort-specific findings for any phenotype. The decision to opt in represents a commitment to collaborate only with the CHARGE working group for that particular analysis until the manuscript is accepted for publication. Only investigators from cohorts that have opted in have access to shared results. After the results have been shared, investigators cannot opt out to publish their findings on their own. Working group members agree not to share the GWAS findings with outside groups without the permission of the members who generated the data. Transparency, disclosure, and communications about all collaborations, additional follow-up experiments, or efforts to obtain additional funding have been essential to developing, ensuring, and maintaining trust within the consortium.
In practice, many of the CHARGE phenotype working groups have already engaged investigators from non-member studies as collaborators, including at least a dozen other studies from the US and Europe. Collaborating non-member studies either agree to the overall CHARGE principles, or the CHARGE working group develops and negotiates a new CHARGE-compatible agreement with the non-member studies or consortia.
Using traditional authorship criteria (15), the CHARGE RSC encourages the designation of multiple co-equal first and last authors so that the authorship matches the scientific contributions of conducting and coordinating analyses from five complex studies. Special efforts are made to provide opportunities for young investigators. The original CHARGE consortium agreement calls for posting shared results on a public website once a manuscript is published in a journal. A recent change in the NIH GWAS policy may affect this plan (16). The consortium remains committed to the NIH GWAS policy on intellectual property (17).
Genotyping methods
The CHARGE consortium was developed after each cohort study had contracted for its genotyping platform and decided on the selection of the individuals to be included in the GWAS. Indeed, the five cohorts used four different platforms (Table 3), which have fewer than about 60,000 SNPs in common. To maximize the availability of comparable genetic data and coverage of the genome, each cohort used recently developed methods (18,19) to impute, for Europeans and European Americans, genotypes at each of the 2.5 million autosomal CEPH HapMap SNPs. Prior to imputation, individuals were excluded for low call rates or sex mismatches (Table 3). Next, criteria such as high levels of missingness, highly significant departures from Hardy-Weinberg equilibrium, or low minor allele frequencies (MAF) were used to determine which SNPs to include in the imputation step. All the remaining individuals and SNPs entered the imputation process, which provided estimates for all the HapMap SNPs, including any that may have failed the data-cleaning criteria.
Table 3.
Study | AGES | ARIC | CHS | FHS | RS |
---|---|---|---|---|---|
Array type | Illumina 370CNV | Affymetrix 6.0 | Illumina 370CNV | Affymetrix 500K and MIPS 50K combined | version 3 Illumina Infinium II HumanHap550 SNP chip array |
Genotyping center | NIA Genetic Laboratory | Broad Institute | Cedars-Sinai Medical Center | Affymetrix Core Laboratory | Erasmus Medical Center |
Genotype calling | Illumina BeadStudio | Birdseed | Illumina BeadStudio | BRLMM | Beadstudio v.3.1.14 |
Exclusion on SNPs used for imputation | call rate < 97%, HWE p < 1e-6, MAF < 1%, Mishap p < 1e-9, A/T and G/C SNPs, mismatches between Illumina, dbSNP and/or HapMap position | call rate < 90%, MAF < 1%, HWE p < 1e-6 | call rate <= 97%, HWE p < 1e-5, 2 replicate errors or Mendelian inconsistencies (for reference CEPH trios), heterozygote frequency = 0, not in HapMap | call rate < 97%, HWE p < 1e-6, Mishap p < 1e-9, Mendelian errors > 100 | call rate < 90%, no MAF/HWE filter |
Exclusion on a per sample basis | Sex mismatch, Sample failure, Genotype mismatch with reference panel | call rate < 95%, sex mismatch, 1st degree relatives, genotype mismatch with reference panel, outliers based on IBS clustering or Eigenstrat | call rate < 95% sex mismatch sample failure | Call rate < 97%, subject heterozygosity > 5 SD away from the mean, large Mendelian error rate | call rate<97.5% sex mismatch, excess autosomal heterozygosity >0.336, outliers identified by the IBS clustering analysis |
Imputation | MACH (version 1.0.16) | Mach1 v1.0.16 | BIMBAM10 v0.99 | MACH (version 1.0.15) | Mach 1.0 |
Imputation Backbone (NCBI build) | Hapmap release 22 CEU(build 36) | July 2006, phased haplotypes, HapMap release 21 (build 35) | HapMap CEU using release 21A, build 36 | HapMap (release 22, build 36, CEU) | HapMap release 22 CEU(build 36) |
Data handling and statistical tests | PLINK and R | PLINK, Mach2QTL | R | R packages kinship, gee, coxph | PLINK, ProbABEL, Mach2QTL |
Abbreviations: AGES, Age, Gene/Environment Susceptibility (AGES)--Reykjavik Study; ARIC, The Atherosclerosis Risk in Communities Study; CHS, The Cardiovascular Health Study; FHS, The Framingham Heart Study; RS, the Rotterdam Study; MAF, minor allele frequency; HWE, Hardy-Weinberg equilibrium.
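The per-SNP exclusions in Table 3 share three ingredients: call rate, MAF, and a test of Hardy-Weinberg proportions. The sketch below shows one way such a filter could be written. It is illustrative only: the function names, the default thresholds (borrowed from the kind of values listed in Table 3), and the use of a 1-df chi-square test in place of an exact test are assumptions, not a description of any cohort's pipeline.

```python
from math import erfc, sqrt


def hwe_p_value(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test of Hardy-Weinberg proportions.

    Assumption: a chi-square approximation is used here for brevity;
    an exact test is a common alternative.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2.0))


def passes_snp_qc(n_aa, n_ab, n_bb, n_missing,
                  min_call_rate=0.97, min_maf=0.01, min_hwe_p=1e-6):
    """Return True if a SNP survives all three hypothetical filters."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False                      # nothing genotyped
    call_rate = called / (called + n_missing)
    p = (2 * n_aa + n_ab) / (2 * called)
    maf = min(p, 1.0 - p)
    if call_rate < min_call_rate or maf < min_maf:
        return False
    return hwe_p_value(n_aa, n_ab, n_bb) >= min_hwe_p


# Hypothetical genotype counts for three SNPs.
in_equilibrium = passes_snp_qc(250, 500, 250, n_missing=0)
no_heterozygotes = passes_snp_qc(500, 0, 500, n_missing=0)
poorly_called = passes_snp_qc(250, 500, 250, n_missing=100)
```

In this toy input, only the first SNP is retained: the second fails the Hardy-Weinberg test and the third fails the call-rate filter.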
The ratio of the observed dosage variance to the expected binomial variance, the dosage-variance ratio, has proved to be a useful metric of imputation quality. To assess accuracy of imputed genotypes, cohorts compared the imputation output to SNPs that had been previously genotyped on other platforms and that had not been used in the imputation process. In an internal analysis that compared the imputed SNPs to the actually genotyped SNPs in the RS, the mean concordance (number of concordant individuals/total number of individuals) between the imputed and the genotyped SNPs was 0.989 for imputed SNPs with a dosage-variance ratio >= 0.9. For ratios between 0.5 and 0.9, the concordance was 0.937; and for ratios <= 0.5, it was 0.889. Validation efforts produced similar results in other cohorts.
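As a rough illustration of the dosage-variance ratio described above, the following sketch computes it for a single SNP. It assumes that the expected binomial variance is 2p(1 − p), with p estimated as half the mean dosage, and that the population variance of the dosages is used as the observed variance; the function name and the example dosages are hypothetical.

```python
from statistics import fmean, pvariance


def dosage_variance_ratio(dosages):
    """Observed dosage variance divided by the binomial variance 2p(1-p)."""
    p = fmean(dosages) / 2.0            # estimated allele frequency
    expected = 2.0 * p * (1.0 - p)      # variance of a Binomial(2, p) count
    if expected == 0.0:
        return float("nan")             # monomorphic SNP: ratio undefined
    return pvariance(dosages) / expected


# Hypothetical dosages for two SNPs with the same allele frequency (0.5).
well_imputed = [0.0, 1.0, 2.0, 1.0, 0.0, 2.0, 1.0, 1.0]    # integer-like calls
poorly_imputed = [0.9, 1.0, 1.1, 1.0, 0.9, 1.1, 1.0, 1.0]  # shrunk to the mean

ratio_good = dosage_variance_ratio(well_imputed)
ratio_poor = dosage_variance_ratio(poorly_imputed)
```

When the imputation is uncertain, the dosages shrink toward the mean and the ratio falls well below 1, which is why low ratios flag SNPs whose imputed genotypes agree less often with direct genotyping.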
Analysis methods
The CHARGE Analysis Committee developed a set of general plans as guidelines for all working groups. The issues include quality control of genotype data, decisions about what results to share across cohorts, formats for sharing data, strand alignments, coding of alleles, choice of covariates for adjustments, detection of and correction for population structure, within-study phenotype analysis plans, between-study meta-analysis methods, and the importance of written analysis plans prior to sharing the results. The goal was to provide a flexible plan that could be adapted or adopted by working groups. For each stage in the analysis, there are several valid options available, and some are summarized briefly.
Special features of the CHARGE consortium are the large overall sample size, the population-based recruitment of cohort members, the standardized methods of data collection, and the prospective follow-up for clinical events. In case-control studies, it is not usually possible to obtain DNA from fatal cases. With the cohort design, DNA is generally available for all events, including the fatal ones; and failure-time models are recommended for associations with incident disease.
For most traits, the additive or the 1 degree-of-freedom regression model is used to assess the association between the phenotype and the number of copies of a specified allele. For many patterns of ‘true’ associations, tests derived from this model have good power compared with other approaches (20). The single regression coefficient is readily interpreted and easily used in meta-analysis. When imputed genotypes are used, the observed allele count is simply replaced by the imputation’s “estimated dosage.” Standard errors for the regression estimates are usually calculated with model-robust (‘sandwich’) methods. Routine adjustment is anticipated for age and sex though specific studies may also adjust for site (CHS, ARIC), for family relationships (FHS), or for cohort (FHS, RS). When necessary, principal components analysis is used to correct for within-study population structure (21). Additionally, the method of genomic control is used to correct both within-study and meta-analyzed GWAS results for possible stratification (22).
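A minimal sketch of the additive model for a continuous trait is given below. It regresses the phenotype on the allele dosage, reports a model-robust standard error, and computes a genomic-control inflation factor. To stay short it omits all covariates (age, sex, site, principal components) and assumes the simplest (HC0) form of the sandwich estimator; the function names and data are hypothetical.

```python
from math import sqrt
from statistics import fmean, median


def additive_fit(dosage, phenotype):
    """Slope of phenotype on dosage with an HC0 'sandwich' standard error."""
    x_bar, y_bar = fmean(dosage), fmean(phenotype)
    dx = [x - x_bar for x in dosage]
    sxx = sum(d * d for d in dx)
    beta = sum(d * (y - y_bar) for d, y in zip(dx, phenotype)) / sxx
    resid = [(y - y_bar) - beta * d for d, y in zip(dx, phenotype)]
    # Sandwich variance of the slope: sum(dx^2 * e^2) / sxx^2.
    robust_var = sum((d * e) ** 2 for d, e in zip(dx, resid)) / sxx ** 2
    return beta, sqrt(robust_var)


def genomic_control_lambda(chi2_stats):
    """Median 1-df chi-square statistic divided by its null median (~0.4549).

    Test statistics are then divided by lambda when it exceeds 1.
    """
    return median(chi2_stats) / 0.4549


# Hypothetical data: allele counts (or imputed dosages) and a trait value.
dosage = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
phenotype = [1.0, 1.2, 1.4, 1.8, 2.1, 2.3]
beta, robust_se = additive_fit(dosage, phenotype)
```

Here `beta` is the estimated difference in the trait per extra copy of the coded allele, and the pair (`beta`, `robust_se`) is what each cohort would contribute to a meta-analysis.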
For the additive model, the regression coefficients estimate the difference in phenotype associated with each extra copy of the minor allele. Due to low power and potentially misleading results, meta-analyses are not reported for those SNPs for which the MAF or the effective sample size, across CHARGE, is too small (23). The acceptable lower threshold of MAF depends on the total sample size for continuous traits or on the total number of events across all cohorts for dichotomous traits.
The analysis of 2.5 million SNPs across the genome poses an obvious multiple-testing problem. Before sharing results, working groups select a p-value threshold to identify a set of genotype-phenotype associations, almost all of which can be expected to replicate in similar populations. With 2.5 million tests, the use of a Bonferroni correction to control the Family-Wise Error rate (FWER) at 0.05 yields a threshold p-value of 2E-8. Another way to interpret this threshold is to estimate the expected number of false-positive (EFP) tests: if there are no true associations, each test contributes on average 2E-8 false positives and, across the genome, yields an expected total of 0.05 false-positive results. Similarly, a threshold of 1/2.5 million, which equals 4E-7, gives an expectation of one false-positive result for all tests. Unlike the FWER interpretation, the control of EFP is not “conservative” for correlated tests (24). The CHARGE Analysis Committee recommends pre-specifying a fixed p-value threshold as well as a number of tests, but the decision about the exact threshold to use is left up to the working groups. The Analysis Committee has also provided power calculations for both continuous (Supplemental Figure 1) and binary phenotypes (Supplemental Figure 2).
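The threshold arithmetic above can be checked with a few lines of code. The sketch below is only a worked example of the two formulas in the text (Bonferroni threshold = alpha / number of tests; EFP = threshold × number of tests); the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that controls the FWER at alpha."""
    return alpha / n_tests


def expected_false_positives(p_threshold, n_tests):
    """Expected number of false positives if no true associations exist."""
    return p_threshold * n_tests


N_TESTS = 2_500_000                       # imputed HapMap SNPs

fwer_threshold = bonferroni_threshold(0.05, N_TESTS)               # about 2E-8
efp_at_fwer = expected_false_positives(fwer_threshold, N_TESTS)    # about 0.05
efp_at_one_per_scan = expected_false_positives(1 / N_TESTS, N_TESTS)  # about 1
```

Expressing a candidate threshold as an EFP makes the trade-off explicit: relaxing the threshold from 2E-8 to 4E-7 raises the expected number of false positives per genome scan from 0.05 to 1.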
When promising results from GWAS meta-analyses arise from SNPs that were imputed in some or all of the cohorts, genotyping the imputed markers in a sample of the existing cohort members serves to validate the imputation process. For the purpose of replication, genotyping high-signal SNPs in independent samples provides additional evidence about the presence or absence of an association. The number of SNPs and the number of independent individuals to be genotyped depend on the available resources and populations. Key follow-up efforts--resequencing high signal areas, fine mapping and functional studies--are likely to require new resources.
Example of coronary heart disease (CHD)
The cohort-study methods papers provide detail about many of the phenotypes listed in Table 2. For CHD, investigators knowledgeable about the phenotype in each study decided to focus on fatal and non-fatal myocardial infarction (MI) as the primary outcome because the MI criteria differed in only trivial ways among the studies. There were some minor differences in the definition of the composite outcome of MI, fatal CHD, and sudden death, which became the secondary outcome. Only subjects at risk for an incident event were included in the analysis. MI survivors whose DNA was drawn after the event were not eligible. The primary analysis was restricted to Europeans or European Americans. Patients entered the analysis at the time of the DNA blood draw, and were followed until an event, death, loss to follow up, or the last visit. The main recommendations of the Analysis Committee were adopted, and a threshold of 5E-8 was selected for genome-wide statistical significance. Analyses in progress include about 1700 MIs and 2300 CHD events among about 29,000 eligible patients. Each cohort conducted its own analysis, and results were uploaded to a secure share site for the fixed-effects meta-analysis. Even with this number of events (Supplemental Figure 2), power is good only for relatively high MAFs (> 0.25) and large relative risks (> 1.3).
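The fixed-effects step can be sketched as follows. The text does not state which weighting scheme was used, so this example assumes the common inverse-variance approach, in which each cohort's regression coefficient (for example, a log hazard ratio from a failure-time model) is weighted by the reciprocal of its squared standard error. The function name and the per-cohort numbers are hypothetical.

```python
from math import exp, sqrt


def fixed_effects_meta(estimates):
    """Inverse-variance weighted pooling of (beta, se) pairs, one per cohort.

    Returns the pooled beta, its standard error, and the z statistic.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = sqrt(1.0 / total)
    return beta, se, beta / se


# Hypothetical per-cohort log hazard ratios and standard errors for one SNP.
cohort_results = [(0.20, 0.10), (0.10, 0.10), (0.15, 0.05)]

pooled_beta, pooled_se, z = fixed_effects_meta(cohort_results)
pooled_hazard_ratio = exp(pooled_beta)
```

Because only per-SNP coefficients and standard errors are exchanged, this kind of pooling needs no individual-level data from any cohort.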
Discussion
In thousands of published papers, the five CHARGE cohort studies and many of the collaborating studies have already characterized the risk factors for and the incidence and prognosis of a variety of aging-related and cardiovascular conditions. The analysis of the incident myocardial infarction, for instance, is free from the survival bias typically associated with cross-sectional or case-control studies. The methodologic advantages of the prospective population-based cohort design, the similarity of phenotypes across five studies, the availability of genome-wide genotyping data in each cohort, and the need for large sample sizes to provide reliable estimates of genotype-phenotype associations have served as the primary incentives for the formation of the CHARGE consortium, which includes GWAS data on about 38,000 individuals. The consortium effort relies on collaborative methods that are similar to those used by the individual contributing cohorts.
Phenotype experts who know the studies and the data well are responsible for phenotype-standardization across cohorts. The coordinated prospectively planned meta-analyses of CHARGE provide results that are virtually identical to a cohort-adjusted pooled analysis of individual level data. This approach--the within-study analysis followed by a between-study meta-analysis--avoids the human subjects issues associated with individual-level data sharing.
Editors, reviewers, and readers expect replication as the standard in science (6). The finding of a genetic association in one population with evidence for replication in multiple independent populations provides moderate assurance against false-positive reports and helps to establish the validity of the original finding. In a single experiment, the discovery-replication structure is traditionally embodied in a two-stage design. The CHARGE consortium includes up to five independent replicate samples as well as additional collaborating studies for some phenotype working groups, so that it would have been possible to set up analysis plans within CHARGE to mimic the traditional two-stage design for replication. For instance, the two largest cohorts could have served as the discovery set and the others as the replication set. However, attaining the extremely small p-values expected in GWAS requires large sample sizes. For any phenotype, a prospective meta-analysis of all participating cohorts, with a properly selected level of genome-wide statistical significance to minimize the chance of false positives, is the most powerful approach to finding new genuine associations for genetic loci (25). When findings narrowly miss the pre-specified significance threshold, genotyping individuals in other independent populations provides additional evidence about the association. For findings that substantially exceed pre-established significance thresholds, the results of a CHARGE meta-analysis effectively provide evidence of a multi-study replication.
The effort to assemble and manage the CHARGE consortium has provided some interesting and unanticipated challenges. Participating cohorts often had relationships with outside study groups that pre-dated the formation of CHARGE. Timelines for genotyping and imputation have shifted. Purchases of new computer systems for the volume of work were sometimes necessary. Each cohort came to the consortium with their own traditions for methods of analysis, organization, and authorship policies that, while appropriate for their own work, were not always optimal for collaboration with multiple external groups. Within each cohort, the investigators had often formed working groups that divided up the large number of available phenotypes in ways that made sense locally but did not necessarily match the configuration that had been adopted by other cohorts. The RSC has attempted to create a set of CHARGE working groups that accommodate the needs and the conventions of the various cohorts. Transparency, disclosure, and professional collaborative behavior by all participating investigators have been essential to the process.
Resource limitations are another challenge. Grant applications that funded the original single-study genome-wide genotyping effort typically imagined a much simpler design. The CHS whole-genome study had as its primary aim, for instance, the analysis of data on three endpoints, coronary disease, stroke and heart failure. With a score of active phenotype working groups, the CHARGE collaboration broadened the scope of the short-term work well beyond initial expectations for all the participating cohorts.
One of the premier challenges has been communications among scores of investigators at a dozen sites. CHS and ARIC are themselves multi-site studies. To be successful, the CHARGE collaboration has required effective communications: (1) within each cohort; (2) between cohorts; (3) within the CHARGE working groups; and (4) among the major CHARGE committees. In addition to the traditional methods of conference calls and email, the CHARGE “wiki,” set up by Dr J Bis (Seattle, WA), has provided a crucial and highly functional user-driven website for calendars, minutes, guidelines, working group analysis plans, manuscript proposals, and other documents. In the end, there is no substitute for face-to-face meetings, especially at the beginning of the collaboration, and this complex meta-organization has benefited from several CHARGE-wide meetings.
The major emerging opportunity is the collaboration with other studies and consortia. Many working groups have already incorporated non-member studies into their efforts. Several working groups have coordinated submissions of initial manuscripts with the parallel submission of manuscripts from other studies or consortia. Several working groups have embarked on plans for joint meta-analyses between CHARGE and other consortia. CHARGE has tried to acknowledge and reward the efforts of champions, who assume leadership responsibility for moving these large complex projects forward and who are often hard-working young investigators, the key to the future success of population science.
The CHARGE Consortium represents an innovative model of collaborative research conducted by research teams that know well the strengths, the limitations, and the data from five prospective population-based cohort studies. By leveraging the dense genotyping, deep phenotyping and the diverse expertise, prospective meta-analyses are underway to identify and replicate the major common genetic determinants of risk factors, measures of subclinical disease, and clinical events for cardiovascular disease and aging.
Supplementary Material
Acknowledgments
The authors thank Drs Josh Bis, Nicole Glazer, and Ken Rice for comments on earlier drafts. A full list of investigators from the CHARGE cohorts appears at: http://web.chargeconsortium.com.
Funding sources: The Age, Gene/Environment Susceptibility--Reykjavik Study has been funded by NIH contract N01-AG-12100, the NIA Intramural Research Program, Hjartavernd (the Icelandic Heart Association), and the Althingi (the Icelandic Parliament).
The Atherosclerosis Risk in Communities Study is carried out as a collaborative study supported by National Heart, Lung, and Blood Institute contracts N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022 and R01HL087641; National Human Genome Research Institute contract U01HG004402; and National Institutes of Health contract HHSN268200625226C. The authors thank the staff and participants of the ARIC study for their important contributions. Infrastructure was partly supported by Grant Number UL1RR025005, a component of the National Institutes of Health and NIH Roadmap for Medical Research.
Cardiovascular Health Study: The research reported in this article was supported by contract numbers N01-HC-85079 through N01-HC-85086, N01-HC-35129, N01 HC-15103, N01 HC-55222, N01-HC-75150, N01-HC-45133, grant numbers U01 HL080295 and R01 HL087652 from the National Heart, Lung, and Blood Institute, with additional contribution from the National Institute of Neurological Disorders and Stroke. A full list of principal CHS investigators and institutions can be found at http://www.chs-nhlbi.org/pi.htm.
Framingham Heart Study: From the Framingham Heart Study of the National Heart, Lung, and Blood Institute of the National Institutes of Health and Boston University School of Medicine. This work was supported by the National Heart, Lung, and Blood Institute’s Framingham Heart Study (Contract No. N01-HC-25195) and its contract with Affymetrix, Inc for genotyping services (Contract No. N02-HL-6-4278), and by grants from the National Institute of Neurological Disorders and Stroke (NS17950; PAW) and the National Institute on Aging (AG08122, AG16495; PAW). Analyses reflect intellectual input and resource development from the Framingham Heart Study investigators participating in the SNP Health Association Resource (SHARe) project.
Rotterdam Study: The GWA database of the Rotterdam Study was funded through the Netherlands Organisation of Scientific Research NWO (nr. 175.010.2005.011). The Rotterdam Study is supported by the Erasmus Medical Center and Erasmus University, Rotterdam; the Netherlands Organization for Scientific Research (NWO), the Netherlands Organization for Health Research and Development (ZonMw), the Research Institute for Diseases in the Elderly (RIDE), the Ministry of Education, Culture and Science, the Ministry for Health, Welfare and Sports, the European Commission (DG XII), and the Municipality of Rotterdam.
Footnotes
Disclosures and conflicts: None.
The authors had full access to and take full responsibility for the integrity of the data. All authors have read and agree to the manuscript as written.
References
- 1. Ioannidis JPA, Ntzani EE, Trikalinos TA, Contopoulos-Ioannidis DG. Replication validity of genetic association studies. Nat Genet. 2001;29:306–309. doi: 10.1038/ng749.
- 2. Zeggini E, Scott L, Saxena R, Voight BF, for the Diabetes Genetics Replication and Meta-analysis (DIAGRAM) Consortium. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet. 2008;40:638–645. doi: 10.1038/ng.120.
- 3. The Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3000 shared controls. Nature. 2007;447:661–678. doi: 10.1038/nature05911.
- 4. McPherson R, Pertsemlidis A, Kavaslar N, Stewart A, Roberts R, Cox DR, Hinds DA, Pennacchio LA, Hansen AT, Folsom AR, Boerwinkle E, Hobbs HH, Cohen JC. A common allele on chromosome 9 associated with coronary heart disease. Science. 2007;316:1488–91. doi: 10.1126/science.1142447.
- 5. Helgadottir A, Thorleifsson G, Manolescu A, Gretarsdottir S, Blondal T, Jonasdottir A, Jonasdottir A, Sigurdsson A, Baker A, Palsson A, Masson G, Gudbjartsson D, Magnusson KP, Andersen K, Levey AI, Backman VM, Matthiasdottir S, Jonsdottir T, Paulson S, Einarsdottir H, Gunnarsdottir S, Gylfason A, Vaccarino V, Hooper WG, Reilly MP, Granger CB, Austin H, Rader DJ, Shah SH, Quyyumi AA, Gulcher JR, Thorgeirsson G, Thorsteinsdottir U, Kong A, Stefansson K. A common variant on chromosome 9p21 affects the risk of myocardial infarction. Science. 2007;316:1491–3. doi: 10.1126/science.1142842.
- 6. NCI-NHGRI Working Group on Replication in Association Studies. Replicating genotype-phenotype associations. Nature. 2007;447:655–60. doi: 10.1038/447655a.
- 7. Ioannidis JP, Boffetta P, Little J, O’Brien TR, Uitterlinden AG, Vineis P, Balding DJ, Chokkalingam A, Dolan SM, Flanders W, Higgins JPT, McCarthy M, McDermott DH, Page GP, Rebbeck TR, Seminara D, Khoury MJ. Assessment of cumulative evidence on genetic associations: interim guidelines. Int J Epidemiol. 2008;37:120–32. doi: 10.1093/ije/dym159.
- 8. Harris T, Launer L, Eiriksdottir G, Kjartansson O, Jonsson PV, Sigurdsson G, Thorgeirsson G, Aspelund T, Garcia MF, Hoffman HJ, Gudnason V. Age, Gene/Environment Susceptibility-Reykjavik Study: multidisciplinary applied phenomics. Am J Epidemiol. 2007;165:1076–87. doi: 10.1093/aje/kwk115.
- 9. The ARIC Investigators. The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives. Am J Epidemiol. 1989;129:687–702.
- 10. Fried LP, Borhani NO, Enright P, Furberg C, Gardin J, Kronmal R, Kuller LH, Manolio T, Mittelmark M, Newman A, O’Leary DH, Psaty B, Rautaharju P, Tracy RP, Weiler P. The Cardiovascular Health Study: design and rationale. Ann Epidemiol. 1991;1:263–276. doi: 10.1016/1047-2797(91)90005-w.
- 11. Dawber TR, Meadors GF, Moore FE. Epidemiological approaches to heart disease: The Framingham study. Am J Public Health. 1951;41:279–286. doi: 10.2105/ajph.41.3.279.
- 12. Feinleib M, Kannel W, Garrison R, McNamara P, Castelli W. The Framingham Offspring Study: design and preliminary data. Prev Med. 1975;4:518–25. doi: 10.1016/0091-7435(75)90037-7.
- 13. Splansky G, Corey D, Yang Q, Atwood LD, Cupples LA, Benjamin E, D’Agostino RB, Sr, Fox CS, Larson MG, Murabito JM, O’Donnell CJ, Vasan RS, Wolf PA, Levy D. The Third Generation Cohort of the National Heart, Lung, and Blood Institute’s Framingham Heart Study: design, recruitment, and initial examination. Am J Epidemiol. 2007;165:1328–35. doi: 10.1093/aje/kwm021.
- 14. Hofman A, Breteler MMB, van Duijn CM, Krestin GP, Pols HA, Stricker BHC, Tiemeier H, Uitterlinden AG, Vingerling JR, Witteman JCM. The Rotterdam Study: objectives and design update. Eur J Epidemiol. 2007;22:819–29. doi: 10.1007/s10654-007-9199-x.
- 15. International Committee of Medical Journal Editors. Uniform requirements for manuscripts submitted to biomedical journals: writing and editing for biomedical publication, updated October 2007. http://www.icmje.org [accessed Sept 19, 2008].
- 16. National Institutes of Health. Modification to genome-wide association studies (GWAS) data access, Aug 28, 2008. http://www.grant.nih.gov/grants/gwas/data_sharing_policy_modifications_20080828.pdf [accessed Aug 29, 2008].
- 17. National Institutes of Health. Policy for sharing of data obtained in NIH supported or conducted genome-wide association studies (GWAS). Federal Register. 2007;72(166):4920–7.
- 18. Servin B, Stephens M. Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet. 2007;3:e114. doi: 10.1371/journal.pgen.0030114.
- 19. Li Y, Abecasis GR. Mach 1.0: rapid haplotype reconstruction and missing genotype inference. Am J Hum Genet. 2006;S79:2290. http://www.sph.umich.edu/csg/abecasis/MACH/ [accessed Sept 19, 2008].
- 20. Lettre G, Lange C, Hirschhorn JN. Genetic model testing and statistical power in population-based association studies of quantitative traits. Genet Epidemiol. 2007;31:358–62. doi: 10.1002/gepi.20217.
- 21. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38:904–9. doi: 10.1038/ng1847.
- 22. Devlin B, Roeder K. Genomic control for association studies. Biometrics. 1999;55:997–1004. doi: 10.1111/j.0006-341x.1999.00997.x.
- 23. McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, Ioannidis JPA, Hirschhorn JN. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet. 2008;9:356–69. doi: 10.1038/nrg2344.
- 24. Gordon A, Glazko G, Qiu X, Yakovlev A. Control of the mean number of false discoveries, Bonferroni and stability of multiple testing. Ann Applied Statistics. 2007;1:179–190.
- 25. Skol AD, Scott LJ, Abecasis GR, Boehnke M. Joint analysis is more efficient than replication-based analysis for two-stage genome wide association studies. Nat Genet. 2006;38:209–13. doi: 10.1038/ng1706.