Stroke. Author manuscript; available in PMC: 2014 Oct 1.
Published in final edited form as: Stroke. 2013 Sep 10;44(10):2694–2702. doi: 10.1161/STROKEAHA.113.001857

Stroke Genetics Network (SiGN) Study: Design and rationale for a genome-wide association study of ischemic stroke subtypes

James F Meschia, Donna K Arnett, Hakan Ay, Robert D Brown Jr, Oscar Benavente, John W Cole, Paul IW de Bakker, Martin Dichgans, Kimberly F Doheny, Myriam Fornage, Raji Grewal, Katrina Gwinn, Christina Jern, Jordi Jimenez Conde, Julie A Johnson, Katarina Jood, Cathy C Laurie, Jin-Moo Lee, Arne Lindgren, Hugh S Markus, Patrick F McArdle, Leslie A McClure, Braxton D Mitchell, Reinhold Schmidt, Kathryn M Rexrode, Stephen S Rich, Jonathan Rosand, Peter M Rothwell, Tatjana Rundek, Ralph L Sacco, Pankaj Sharma, Alan R Shuldiner, Agnieszka Slowik, Sylvia Wassertheil-Smoller, Cathie Sudlow, Vincent Thijs, Daniel Woo, Bradford B Worrall, Ona Wu, Steven J Kittner, on behalf of the NINDS SiGN Study
PMCID: PMC4056331  NIHMSID: NIHMS515620  PMID: 24021684

Abstract

Background and Purpose

Meta-analyses of extant genome-wide data illustrate the need to focus on subtypes of ischemic stroke for gene discovery. The NINDS Stroke Genetics Network (SiGN) contributes substantially to meta-analyses that focus on specific subtypes of stroke.

Methods

The NINDS Stroke Genetics Network (SiGN) includes ischemic stroke cases from 24 Genetic Research Centers (GRCs), 13 from the US and 11 from Europe. Investigators harmonize ischemic stroke phenotyping using the web-based Causative Classification of Stroke (CCS) system, with data entered by trained and certified adjudicators at participating GRCs. Through the Center for Inherited Disease Research (CIDR), SiGN plans to genotype 10,296 carefully phenotyped stroke cases using genome-wide SNP arrays and to add another 4,253 previously genotyped cases, for a total of 14,549 cases. To maximize power for subtype analyses, the study allocates genotyping resources almost exclusively to cases. Publicly available studies provide most of the control genotypes. CIDR-generated genotypes and corresponding phenotypic data will be shared with the scientific community through dbGaP, and brain MRI studies will be centrally archived.

Conclusions

The SiGN consortium, with its emphasis on careful and standardized phenotyping of ischemic stroke and stroke subtypes, provides an unprecedented opportunity to uncover genetic determinants of ischemic stroke.

Keywords: ischemic stroke, genetics, genomics

Introduction

Genome-wide association studies have been remarkably successful in identifying loci contributing to the genetic basis of human disease and complex phenotypes. Just over five years ago, knowledge of the genetic variants influencing disease risk was largely restricted to rare familial conditions that could be linked to rare, highly penetrant mutations in single genes. While heritability and family studies consistently pointed to a substantial genetic contribution to common complex conditions such as type 2 diabetes mellitus, coronary heart disease and ischemic stroke, the genetic loci accounting for a substantial component of risk for these disorders remained almost completely undiscovered. By applying technologies that allow accurate genotyping of hundreds of thousands of variants across the genome in thousands of individuals, investigators have since discovered many loci (and in some cases, the underlying causal genes within a locus) contributing to risk of common diseases (http://genome.gov/gwastudies/). The genotyping arrays were constructed from common variants chosen to capture regions of genomic variation, and the variants significantly associated with disease risk were common in the population and exerted relatively small effects. Thus, thousands of well-phenotyped individuals were required to identify these risk loci.

Founded in 2007, the International Stroke Genetics Consortium (ISGC) facilitates assembly of genome-wide data in thousands of cases, controls and families with ischemic stroke for the purpose of collaborative meta-analyses. Although no loci have been consistently replicated for ischemic stroke as a whole, the ISGC has identified several loci associated with ischemic stroke subtypes. Variants on chromosomes 9p21 and 6p21.1 and near the HDAC9 gene have been related to large vessel atherosclerotic stroke.1–3 Genetic variants associated with atrial fibrillation, a condition strongly predisposing to ischemic stroke, have also been identified, including mutations in several ion channels (reviewed in Lubitz, et al.4), and at a locus on chromosome 4 identified through genome-wide association study (GWAS).5 It soon became clear that much larger sample sizes would be required to detect further risk loci for both ischemic stroke overall and stroke subtypes. In addition, the heterogeneous etiology of ischemic stroke (including small vessel, cardioembolic, and large vessel mechanisms) suggests that careful, systematic phenotyping and subtype-specific analyses are essential for successful gene discovery.

The success of subtype-specific genetic studies of ischemic stroke faces another substantial hurdle: the lack of agreement across centers on subtype assignment.6 To address these challenges, the U.S. National Institute of Neurological Disorders and Stroke (NINDS) has established the Stroke Genetics Network (SiGN) with the goal of assembling the largest possible sample of individuals with ischemic stroke for genetic studies, where each individual has been uniformly and thoroughly characterized for stroke subtyping. Like the earlier Wellcome Trust Case-Control Consortium-2 (WTCCC2) GWAS of ischemic stroke, SiGN has grown out of the ISGC, and is committed to the widest possible sharing of data among all investigators dedicated to discovering the role of genetic variation in risk of ischemic and hemorrhagic stroke and related phenotypes and to exploiting this knowledge for the benefit of patients. This manuscript describes and explains the rationale for key aspects of the design of SiGN.

Methods

SiGN responded to a Request for Applications (RFA) issued by the NINDS that proposed establishing a genome-wide association study consortium focused on identifying genes or genomic regions that affect either the susceptibility to, or outcome of, ischemic stroke. The RFA specified that multiple Genetic Research Centers (GRCs) be established, each with access to well-characterized ischemic stroke cases for whom extensive phenotype, covariate, and exposure data are available and high-quality DNA is banked or can be isolated from stored specimens. It also specified that standardized, validated, and easily replicated methods be used to assign stroke subtypes. Finally, the RFA specified that investigators submit the harmonized phenotype data used for stroke subtyping and the newly generated genotype data to the NIH-supported database of Genotypes and Phenotypes (dbGaP), creating a national resource of high-quality information for data mining, replication studies, and future hypothesis generation.

Structure of SiGN

SiGN consists of 24 Genetic Research Centers (GRCs), 13 from the US and 11 from Europe (see Table 1 for summary and Online Supplement 1 for descriptions). The GRCs are centers that have existing collections of DNA samples from ischemic stroke cases and that agree to characterize all cases for stroke subtype using a single standardized protocol requiring detailed imaging and clinical information. GRCs are required to have informed consent for data sharing. Figure 1 shows the administrative structure. The Scientific Steering Committee leads SiGN. Its members include Co-Principal Investigators, the Analysis Committee, and NINDS staff. The Scientific Steering Committee is responsible for scientific direction and policy decisions. It also oversees the Publications and Data Access Committee, which develops guidelines for publication and authorship, prioritizes analysis resources for manuscript proposals, and recommends approval of proposals and manuscripts to the Scientific Steering Committee. The study has four Cores: Administrative, Data Management, Imaging, and Genotyping. The Administrative Core and Data Management Core monitor study progress, maintain efficient interactions among the Cores and the participating GRCs, ensure regulatory compliance, and are responsible for submitting the genotype and phenotype data to dbGaP. The Data Management Core also works closely with the Analysis Committee in the preparation of publications. The Analysis Committee, composed of genetic epidemiologists and statistical geneticists from four different institutions, advises the Scientific Steering Committee on design issues and is responsible for the genetic analyses (Online Supplement 2). The Phenotype Committee, detailed below, is responsible for training and quality assurance of ischemic stroke subtyping at the GRCs. The Imaging Core, also detailed below, is the centralized repository for clinically obtained MRI data from the GRCs.
The Genotyping Core is the NINDS-designated Center for Inherited Disease Research (CIDR, Baltimore). The Genotyping Core performs quality control of the submitted DNA as well as initial quality control of the GWAS and exome-enriched genotyping. The Center for Biomedical Statistics (CBS) at the University of Washington (Seattle) provides more extensive quality control of the genotype data through a subcontract with CIDR. CIDR, CBS, and the Analysis Committee jointly decided on the design of the study, including choice of controls, and selection and use of within- and cross-study duplicates.

Table 1.

Cases of ischemic stroke (N = 17,298) classified using the Causative Classification of Stroke (CCS) system as of April 3, 2013 and their demographic characteristics for each Genetic Research Center (GRC).

GRC | Location | Recruitment source | Recruitment years | Cases (n) | Age range (yrs) | Female (%) | European descent (%) | African descent (%)
BASICMAR | Barcelona, Spain | Hospital-based | 2005–2012 | 1088 | 30–101 | 47 | 97 | 0
BRAINS | London, England | Hospital-based | 2005–2012 | 598 | 19–98 | 41 | 92 | 3
EDIN | Edinburgh, Scotland | Hospital-based | 2002–2005 | 626 | 29–97 | 45 | 100 | 0
GASROS | Boston, USA | Hospital-based | 2003–2009 | 686 | 18–100 | 36 | 90 | 4
GCNKSS | Greater Cincinnati region, USA | Population-based | 1999–2006 | 642 | 20–104 | 50 | 75 | 24
GEOS | Greater Baltimore region, USA | Population-based | 1992–2008 | 891 | 16–50 | 41 | 51 | 42
GRAZ | Graz, Austria | Hospital-based | 1992–2011 | 685 | 19–101 | 41 | 100 | 0
ISGS | Multi-center, USA | Hospital-based | 2002–2008 | 675 | 19–94 | 43 | 71 | 26
KRAKOW | Krakow, Poland | Hospital-based | 2001–2011 | 1487 | 19–100 | 48 | 100 | 0
LEUVEN | Leuven, Belgium | Hospital-based | 2005–2009 | 524 | 18–97 | 42 | 100 | 0
LUND | Lund, Sweden | Hospital-based | 2006–2010 | 818 | 22–99 | 49 | 100 | 0
MCISS | New Jersey, USA | Hospital-based | 1999–2009 | 876 | 19–98 | 49 | 68 | 12
MIAMISR | Miami, USA | Hospital-based | 2008–2011 | 331 | 18–92 | 36 | 15 | 29
MUNICH* | Munich, Germany | Hospital-based | 2002–2009 | 524 | 17–97 | 41 | 100 | 0
NHS | National sample, USA | Cohort study | 1989–1992 | 470 | 45–85 | 100 | 93 | 1
NOMAS(S) | Manhattan, USA | Population-based and cohort (2 sources) | 1993–2001 | 578 | 33–104 | 55 | 20 | 25
OXVASC | Oxfordshire, England | Population-based | 2002–2010 | 554 | 33–96 | 51 | 100 | 0
REGARDS | National sample, USA | Cohort study | 2003–2007 | 555 | 46–93 | 46 | 57 | 43
SAHLSIS | Gothenburg, Sweden | Hospital-based | 1998–2012 | 1085 | 16–69 | 36 | 100 | 0
SPS3* | Multi-center; USA, Latin America, Spain | Hospital-based | 2003–2011 | 1139 | 32–89 | 37 | 47 | 14
ST GEORGE’S | London, England | Hospital-based | 1995–2008 | 684 | 18–102 | 47 | 100 | 0
SWISS | Multi-center, USA | Cases with affected siblings | 1999–2011 | 407 | 21–93 | 47 | 91 | 2
WHI | National sample, USA | Nested case-control study within a cohort study | 1993–1998 | 840 | 54–87 | 100 | 86 | 8
WUSTL* | St. Louis, USA | Hospital-based | 2008–2012 | 535 | 20–90 | 42 | 47 | 28

The proportions of self-identified Hispanic cases were: BASICMAR (0%), BRAINS (0%), EDIN (0%), GASROS (4.1%), GCNKSS (0.3%), GEOS (2.4%), SAHLSIS (0%), GRAZ (0.2%), ISGS (1.2%), KRAKOW (0%), LEUVEN (0%), LUND (0%), MCISS (6.5%), MIAMISR (54.1%), MUNICH (0%), NHS (1.3%), NOMAS(S) (52.9%), OXVASC (0%), REGARDS (0%), SPS3 (45%), ST GEORGE’S (0%), SWISS (1.5%), WHI (2.0%), WUSTL (0%).

* Cases being phenotyped.

Figure 1.

Organizational structure of SiGN study. GRC denotes Genetic Research Center; CIDR, Center for Inherited Disease Research; CBS, Center for Biomedical Statistics at the University of Washington.

Phenotyping methods

SiGN uses the Causative Classification of Stroke (CCS) system for phenotyping of ischemic stroke cases. CCS incorporates multiple aspects of present-day diagnostic stroke evaluation (diffusion-weighted imaging, perfusion-weighted imaging, CT- and MR-angiography of extracranial and intracranial arteries, transthoracic and transesophageal echocardiography, and ambulatory electrocardiography) in a standardized manner to identify both likely causative and phenotypic subtypes. Web-based, semi-automated CCS software assigns the most likely causative mechanism.7 The CCS divides ischemic stroke into five causative subtypes based on a framework that is well defined, easily replicable, and evidence-based: supra-aortic large artery atherosclerosis, cardio-aortic embolism, small artery occlusion, other uncommon causes, and undetermined causes.4 The system distinguishes patients with symptomatic intracranial atherosclerosis from those with symptomatic extracranial atherosclerosis. The web-based CCS allows remote data entry, as well as structuring and archiving of individual data elements such as diagnostic test findings. In an international multicenter study, a high degree of reliability (kappa statistic, 0.80) was found among 20 raters from 13 centers in 8 countries when applying the web-based CCS to the same set of 50 consecutive abstracted case summaries.8
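The reliability result above is a chance-corrected agreement statistic for many raters scoring the same cases. The text does not say which multi-rater kappa was computed, so the sketch below assumes Fleiss' kappa, one standard choice when every case is scored by the same number of raters. Row i of the input counts how many raters placed case i in each category (for the CCS, the five causative subtypes); the function name and the example tables are illustrative only.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for subjects each rated by the same number of raters.

    counts[i][j] is the number of raters who placed subject i in category j.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("need at least two raters, and the same number for every subject")
    n_categories = len(counts[0])

    # Observed agreement: share of agreeing rater pairs, averaged over subjects.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects

    # Chance agreement from the overall share of ratings in each category.
    totals = [sum(row[j] for row in counts) for j in range(n_categories)]
    p_e = sum((t / (n_subjects * n_raters)) ** 2 for t in totals)
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating falls in one category")

    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy example: 3 cases, 4 raters, 5 subtypes, perfect agreement.
    print(fleiss_kappa([[4, 0, 0, 0, 0], [0, 4, 0, 0, 0], [0, 0, 4, 0, 0]]))
```

For a design like the one described (50 case summaries, 20 raters, 5 subtypes), `counts` would be a 50 × 5 table whose rows each sum to 20.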

With the exception of ST GEORGE’S, BASICMAR (early cases), and the SPS3 trial, physician adjudicators from each GRC review clinical histories, physical examination findings, and the results of diagnostic testing and enter the information into the web-based CCS system. Every adjudicator must have completed formal training and certification in use of the CCS. The CCS website contains an interactive training module with 10 training cases. A Phenotype Committee trainer gives training lectures to adjudicators at scheduled study meetings. Finally, the Committee presents a 90-minute webinar for additional training. The webinar reviews case numbering conventions, data entry, data submission, and archiving, as well as standardized consensus responses to frequently asked questions about specific CCS items. Trainers allow sufficient time to answer learners' questions. All GRC adjudicators and members of the Phenotype Committee are required to take and pass an online CCS certification examination. The examination consists of 5 clinical vignettes (randomly selected from a pool of 15) from which the test taker abstracts data and enters them into web forms. The CCS weights test items by their importance in determining the subtype diagnosis, and the total score is on a 40 to 100 scale. Points are deducted when critical data elements are missed or when data elements not present in the vignette are entered. The minimum passing score is 80 points, and the Phenotype Committee allows test takers up to five attempts to pass. Individuals who achieve certification in CCS receive online confirmation of having passed, and the GRC and Phenotype Committee administrators retain copies of the certificate.
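The certification rules stated above (five vignettes drawn from a pool of 15, a 40 to 100 score scale, a passing score of 80, and at most five attempts) can be expressed as a small check. This is a hypothetical sketch: the CCS item weights and deductions are not described here, so the function takes final scores as given, and the vignette IDs 1 to 15 are placeholders.

```python
import random
from typing import List, Tuple

VIGNETTE_POOL_SIZE = 15   # vignettes available for the exam
VIGNETTES_PER_EXAM = 5    # vignettes drawn for each attempt
MIN_SCORE, MAX_SCORE = 40, 100
PASSING_SCORE = 80
MAX_ATTEMPTS = 5


def draw_exam(rng: random.Random) -> List[int]:
    """Randomly choose 5 distinct vignette IDs (hypothetical IDs 1-15)."""
    return sorted(rng.sample(range(1, VIGNETTE_POOL_SIZE + 1), VIGNETTES_PER_EXAM))


def certification_outcome(attempt_scores: List[float]) -> Tuple[bool, int]:
    """Return (certified, attempts_used) for scores listed in the order taken.

    Scoring stops at the first passing attempt; attempts beyond the fifth
    are ignored because they are not permitted.
    """
    for attempt, score in enumerate(attempt_scores[:MAX_ATTEMPTS], start=1):
        if not MIN_SCORE <= score <= MAX_SCORE:
            raise ValueError(f"score {score} is outside the {MIN_SCORE}-{MAX_SCORE} scale")
        if score >= PASSING_SCORE:
            return True, attempt
    return False, min(len(attempt_scores), MAX_ATTEMPTS)


if __name__ == "__main__":
    print(draw_exam(random.Random(1)))
    print(certification_outcome([72, 78, 86]))  # passes on the third attempt
```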

ST GEORGE’S entered data into the publicly available version of CCS and is electronically transferring the data to the study-specific version of CCS. For the first set of cases classified by BASICMAR, investigators at BASICMAR mapped pre-collected stroke research data stored in an electronic database to the study-specific CCS. For the SPS3 trial, pre-collected clinical trial data, captured on case report forms and stored electronically, are being mapped to the CCS using decision rules authored by the Phenotype Committee in collaboration with the trial's Principal Investigator.

The Phenotype Committee tracks progress in CCS adjudication center by center in a weekly conference call. The Committee also monitors data quality by assessing inter-rater reliability of case adjudication. An independent 10% random sample of cases is re-adjudicated for each GRC. For US centers, vascular neurologist members of the Phenotype Committee re-adjudicate cases. For non-US centers, a CCS-certified member of the local investigative team re-adjudicates cases. Raters perform all re-adjudications blinded to the results of the initial adjudication, and for any given case the adjudicator and the re-adjudicator are different individuals. When complete agreement falls below 50%, the Phenotype Committee reviews which aspects of the CCS adjudication appear to be most problematic for the adjudicator, provides retraining, and requires re-adjudication at that center.
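The quality-assurance loop described above has two computational pieces: drawing an independent 10% random sample of cases within each GRC, and measuring complete agreement between the original and the blinded re-adjudicated subtypes against the 50% retraining trigger. A minimal sketch, assuming cases are available as (case ID, GRC) pairs and subtypes as strings; the function names, the fixed seed, and the "at least one case per GRC" rounding rule are assumptions for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

RETRAIN_BELOW = 0.50  # complete agreement below 50% triggers review and retraining


def sample_for_readjudication(
    cases: Iterable[Tuple[str, str]], fraction: float = 0.10, seed: int = 2013
) -> Dict[str, List[str]]:
    """Draw an independent random sample of about `fraction` of cases per GRC.

    `cases` holds (case_id, grc) pairs; at least one case is drawn per GRC.
    """
    rng = random.Random(seed)  # fixed seed only so the draw is reproducible
    by_grc: Dict[str, List[str]] = defaultdict(list)
    for case_id, grc in cases:
        by_grc[grc].append(case_id)
    return {
        grc: rng.sample(ids, max(1, round(fraction * len(ids))))
        for grc, ids in by_grc.items()
    }


def complete_agreement(original: Dict[str, str], readjudicated: Dict[str, str]) -> float:
    """Share of re-adjudicated cases whose subtype matches the original exactly."""
    matches = sum(original[case] == subtype for case, subtype in readjudicated.items())
    return matches / len(readjudicated)


def needs_retraining(original: Dict[str, str], readjudicated: Dict[str, str]) -> bool:
    """Apply the retraining trigger to one center's re-adjudicated sample."""
    return complete_agreement(original, readjudicated) < RETRAIN_BELOW
```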

The SiGN Imaging Platform

A distinctive feature of SiGN is its imaging platform, which assembles in a central location all available brain MRI images obtained at the time of stroke or during follow-up for genotyped subjects. De-identified images are stored on a central server. The goal is for this resource to be used in future investigations by members of SiGN, the ISGC, or other investigators to advance understanding of the role of genetic variation in stroke and/or MRI-derived phenotypes. The imaging platform is based on the Extensible Neuroimaging Archive Toolkit (XNAT),9 open-source software specifically designed to facilitate common management and productivity tasks for neuroimaging and associated metadata, such as image capture, quality control, automation, local use, collaborative use and public access. The Imaging Core has integrated XNAT with Plone (http://www.plone.org), a production-ready, open-source content management framework that provides an easily customizable front end and a streamlined interface for imaging and clinical data management. Data dictionaries from each site, consisting of SiGN ID, gender, race, ethnicity, age and infarct location, are uploaded along with the imaging data. Images can be viewed in web browsers using a Java-based image viewer. Search queries are supported through a concise interface similar to PubMed's advanced search. Users can query the data stored in the imaging repository, including metadata in the data dictionary, using field-based keywords. For example, a site can find all female patients younger than 65 years for whom imaging exists in the repository. Search results can be exported to an Excel-compatible file, which can then be used to request detailed genetic information and/or raw images.
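The field-based search described above amounts to filtering data-dictionary records and exporting the hits in a spreadsheet-friendly format. The sketch below is not the XNAT or Plone API; it uses Python's standard `csv` module over an in-memory list, and the field names (including the `has_imaging` flag) and the records themselves are hypothetical stand-ins for the data-dictionary fields listed in the text.

```python
import csv
import io
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Hypothetical data-dictionary rows; real SiGN field names may differ.
RECORDS: List[Record] = [
    {"sign_id": "S0001", "gender": "F", "race": "White", "ethnicity": "Not Hispanic",
     "age": 58, "infarct_location": "left MCA", "has_imaging": True},
    {"sign_id": "S0002", "gender": "M", "race": "Black", "ethnicity": "Not Hispanic",
     "age": 60, "infarct_location": "pons", "has_imaging": True},
    {"sign_id": "S0003", "gender": "F", "race": "White", "ethnicity": "Hispanic",
     "age": 71, "infarct_location": "right PCA", "has_imaging": True},
    {"sign_id": "S0004", "gender": "F", "race": "Asian", "ethnicity": "Not Hispanic",
     "age": 49, "infarct_location": "left thalamus", "has_imaging": False},
    {"sign_id": "S0005", "gender": "F", "race": "White", "ethnicity": "Not Hispanic",
     "age": 62, "infarct_location": "right MCA", "has_imaging": True},
]


def query(records: List[Record], **criteria: Callable[[Any], bool]) -> List[Record]:
    """Keep records for which every field-level predicate returns True."""
    return [r for r in records if all(test(r[field]) for field, test in criteria.items())]


def to_csv(records: List[Record]) -> str:
    """Serialize records to CSV text, which spreadsheet programs such as Excel can open."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# The example from the text: female patients younger than 65 with imaging on file.
hits = query(RECORDS, gender=lambda g: g == "F", age=lambda a: a < 65,
             has_imaging=lambda present: present)
print(to_csv(hits))
```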

Overall Genotyping and Analysis Strategy

The support from NINDS to the SiGN study allowed 11,644 samples to be genotyped at CIDR using the Illumina Infinium Omni5 genotype array with Exome content. To maximize power to detect associations with stroke subtypes, a strategic decision was made to genotype primarily ischemic stroke cases for comparison with publicly available previously genotyped controls where possible. Cases from participating GRCs were prioritized for genotyping based on CCS subtyping: (1) cases with a determined CCS subtype excluding certain known rare causes (migraine-related stroke, acute arterial dissection, dilated cardiomyopathy, infective endocarditis, papillary fibroelastoma, left atrial myxoma, cerebral vasculitis, cerebral venous thrombosis, acute disseminated intravascular coagulation, drug-induced, heparin-induced thrombocytopenia type II, cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy, iatrogenic causes, mitochondrial encephalopathy with lactic acidosis and stroke-like episodes, meningitis, primary infection of the arterial wall, and sickle cell disease) and (2) cryptogenic CCS subtype despite adequate evaluation. Additional cases were prioritized independent of CCS subtyping based on the availability of specific other desirable phenotypic information including digital MRI data and longitudinal outcome data.
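The prioritization rules above can be read as a tiered filter. The sketch below is one possible encoding, assuming each case record carries a flag for a determined CCS subtype, any specific rare cause, a flag for cryptogenic stroke with adequate evaluation, and the availability of digital MRI or outcome data. The field names, tier numbers, abbreviated cause labels, and the choice to rank MRI/outcome-based additions last are assumptions, not the study's documented procedure.

```python
from dataclasses import dataclass
from typing import List, Optional

# Rare causes excluded from the first priority group (labels paraphrase the text).
EXCLUDED_RARE_CAUSES = frozenset({
    "migraine-related stroke", "acute arterial dissection", "dilated cardiomyopathy",
    "infective endocarditis", "papillary fibroelastoma", "left atrial myxoma",
    "cerebral vasculitis", "cerebral venous thrombosis",
    "acute disseminated intravascular coagulation", "drug-induced",
    "heparin-induced thrombocytopenia type II", "CADASIL", "iatrogenic", "MELAS",
    "meningitis", "primary infection of the arterial wall", "sickle cell disease",
})


@dataclass
class Case:
    case_id: str
    determined_subtype: bool              # CCS assigned a determined causative subtype
    specific_cause: Optional[str] = None  # specific uncommon cause, if recorded
    cryptogenic: bool = False
    adequate_evaluation: bool = False
    has_digital_mri: bool = False
    has_outcome_data: bool = False


def priority_tier(case: Case) -> Optional[int]:
    """1 = determined subtype without an excluded rare cause,
    2 = cryptogenic despite adequate evaluation,
    3 = added for digital MRI or longitudinal outcome data,
    None = not prioritized."""
    if case.determined_subtype and case.specific_cause not in EXCLUDED_RARE_CAUSES:
        return 1
    if case.cryptogenic and case.adequate_evaluation:
        return 2
    if case.has_digital_mri or case.has_outcome_data:
        return 3
    return None


def genotyping_queue(cases: List[Case]) -> List[str]:
    """Case IDs ordered by tier (ties broken by ID); unprioritized cases are left out."""
    ranked = []
    for case in cases:
        tier = priority_tier(case)
        if tier is not None:
            ranked.append((tier, case.case_id))
    return [case_id for _, case_id in sorted(ranked)]
```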

The total number of CCS-phenotyped cases in SiGN is 16,411, contributed by 24 sites in the U.S. and Europe. A total of 10,296 cases have been prioritized for genotyping. An additional 4,253 CCS-phenotyped cases from 10 GRCs across Europe and the US were genotyped before SiGN began (BRAINS, EDIN, GASROS, GEOS, ISGS, NHS, MUNICH, OXVASC, ST GEORGE’S, and SWISS). In total, 14,549 cases will have genotypes from an Illumina platform with genome-wide coverage of at least 610,000 SNPs available for analysis.

Selection of control subjects for SiGN

Where possible, controls with publicly available genotype data were selected to ancestry-match cases at each GRC. A key criterion for selection of these control groups was that they had been genotyped on an Illumina Omni series GWAS array in order to minimize technical artifacts between cases and controls.

Table 2 summarizes the key features of the control groups selected for each GRC. For the US samples, we identified 2 large, previously genotyped population studies to serve as control groups: (1) the Health and Retirement Study (HRS),10 a nationally representative sample of ~22,000 adults over the age of 50 years, launched in 1992 to provide information about health and social issues relating to retirement (n = 12,507 genotyped subjects); and (2) the Osteoarthritis Initiative (OAI), a prospective study of ~5,000 adults with the primary objective of identifying risk factors for incidence and progression of tibiofemoral knee osteoarthritis (http://oai.epi-ucsf.org/datarelease/default.asp) (n = 4,130 genotyped subjects). HRS and OAI used the Illumina Omni Quad 2.5M array. These studies, which include European Caucasian, African American, and Hispanic participants, were selected to provide controls for all stroke cases to be genotyped through the 13 US GRCs. The GEOS GRC had previously genotyped both patients and controls.

Table 2.

Genotyping platforms used for cases for each Genetic Research Center and its associated control population(s). Platforms used to genotype cases and controls prior to SiGN that will be used as part of the analysis appear in italics.

Cases | Platform(s) for cases | Controls | Platform for controls
BASICMAR | Illumina Infinium Omni5 | INMA Project | Illumina Omni1 array
 | | ADHD Study | Illumina Omni1 array
BRAINS | Illumina Infinium Omni5; Illumina 650Q | Wellcome Trust | Illumina Human1M-Duo
EDINBURGH | Illumina Human660W-Quad | Wellcome Trust | Illumina Human1M-Duo
GASROS | Illumina Infinium Omni5; Illumina 610 | HRS | Illumina Omni Quad 2.5M
 | | OAI | Illumina Omni Quad 2.5M
 | | GASROS | Illumina 610
GCNKSS | Illumina Infinium Omni5 | HRS | Illumina Omni Quad 2.5M
 | | OAI | Illumina Omni Quad 2.5M
GEOS | Illumina Omni1-Quad | GEOS | Illumina 1M
GRAZ | Illumina Infinium Omni5 | Austria | Illumina 610
ISGS | Illumina Infinium Omni5; Illumina 610/660 | HRS | Illumina Omni Quad 2.5M
 | | OAI | Illumina Omni Quad 2.5M
KRAKOW | Illumina Infinium Omni5 | KRAKOW | Illumina Infinium Omni5
LEUVEN | Illumina Infinium Omni5 | LEUVEN | Illumina Infinium Omni5
LUND | Illumina Infinium Omni5 | South Sweden Study | Pending
MCISS | Illumina Infinium Omni5 | HRS | Illumina Omni Quad 2.5M
 | | OAI | Illumina Omni Quad 2.5M
MIAMISR | Illumina Infinium Omni5 | Study of Latinos | Pending
MUNICH | Illumina Human660W-Quad | Wellcome Trust | Illumina Human1M-Duo
NHS | Illumina Infinium Omni5 | HRS | Illumina Omni Quad 2.5M
 | | OAI | Illumina Omni Quad 2.5M
NOMAS(S) | Illumina Infinium Omni5 | HRS | Illumina Omni Quad 2.5M
 | | OAI | Illumina Omni Quad 2.5M
OXVASC | Illumina Human660W-Quad | Wellcome Trust | Illumina Human1M-Duo
REGARDS | Illumina Infinium Omni5 | HRS | Illumina Omni Quad 2.5M
 | | OAI | Illumina Omni Quad 2.5M
SAHLSIS | Illumina Infinium Omni5 | South Sweden Study | Pending
SPS3 | Illumina Infinium Omni5 | HRS | Illumina Omni Quad 2.5M
 | | OAI | Illumina Omni Quad 2.5M
 | | INMA Project | Illumina Omni1 array
 | | ADHD Study | Illumina Omni1 array
ST GEORGE’S | Illumina Human660W-Quad | Wellcome Trust | Illumina Human1M-Duo
SWISS | Illumina Infinium Omni5; Illumina 610/660 | HRS | Illumina Omni Quad 2.5M
 | | OAI | Illumina Omni Quad 2.5M
WHI | Illumina Infinium Omni5 | HRS | Illumina Omni Quad 2.5M
 | | OAI | Illumina Omni Quad 2.5M
WUSTL | Illumina Infinium Omni5 | HRS | Illumina Omni Quad 2.5M
 | | OAI | Illumina Omni Quad 2.5M

SiGN has identified separate control groups for the 11 European GRCs. These include previously genotyped control groups from Sweden, Spain, the UK, and Austria, to be paired with newly genotyped cases from the two Swedish GRCs (LUND and SAHLSIS) and from the GRCs in Spain (BASICMAR), the UK (BRAINS), and Austria (GRAZ). Because suitable control groups with available GWAS genotyping could not be identified in Poland and Belgium, controls from the KRAKOW and LEUVEN GRCs will also be genotyped as part of SiGN.

Genotyping methods

The Illumina Infinium Omni5 genotype array with Exome content was selected as the genotyping platform in consultation with NINDS and CIDR. This array includes ~4.3 million single nucleotide polymorphisms (SNPs) across the genome, with excellent coverage of common and infrequent variants (minor allele frequency, MAF > 1%). It also includes ~240,000 rare but polymorphic variants, selected from more than 12,000 individually sequenced exomes, as well as 475 mitochondrial markers. CIDR will perform all genotyping.

The 11,644 samples allocated for genotyping will include ischemic stroke cases (n = 10,296), controls from the KRAKOW and LEUVEN sites (n = 1,282), and 66 additional samples selected for quality control (see below). The total sample of cases includes slightly more women than men, because two GRCs (WHI and NHS) enrolled only women. All GRCs provide subjects of European ancestry; in addition, 9 of the US sites provide African American subjects and 5 provide Hispanic subjects. No whole-genome-amplified DNA samples are used in genotyping.
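The sample counts in this paragraph, together with the case totals given earlier in the Methods, can be cross-checked with simple arithmetic. The snippet only restates figures from the text and confirms that they are internally consistent.

```python
# Figures as stated in the text; the asserts only check internal consistency.
genotyping_allocation = {
    "ischemic stroke cases": 10_296,
    "KRAKOW and LEUVEN controls": 1_282,
    "additional quality-control samples": 66,
}
samples_at_cidr = sum(genotyping_allocation.values())
assert samples_at_cidr == 11_644  # samples funded for genotyping at CIDR

previously_genotyped_cases = 4_253
cases_for_analysis = genotyping_allocation["ischemic stroke cases"] + previously_genotyped_cases
assert cases_for_analysis == 14_549  # total cases with genome-wide genotypes

print(f"{samples_at_cidr:,} samples at CIDR; {cases_for_analysis:,} cases for analysis")
```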

Three types of samples will be genotyped for quality control purposes: (1) cross-study duplicates (previously genotyped control samples re-genotyped by CIDR to assess genotype concordance rates and identify SNPs that perform differently); (2) a 2% sample of duplicates from each GRC; and (3) HapMap controls. HRS, OAI, GRAZ, BASICMAR, and LUND will each provide 30 duplicates from their control groups for replicate genotyping.
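Duplicate samples are used to estimate genotype concordance and to spot SNPs that behave differently between runs. A minimal sketch, assuming genotypes are coded as strings such as "AA", "AB", "BB" with `None` for a missing call; the 1% discordance cut-off for flagging a SNP is a placeholder, not a threshold stated in the text.

```python
from typing import List, Optional, Sequence, Tuple

Genotype = Optional[str]  # "AA", "AB", "BB", or None for no call


def concordance(calls_a: Sequence[Genotype], calls_b: Sequence[Genotype]) -> float:
    """Fraction of SNPs called in both replicates that have identical genotypes."""
    if len(calls_a) != len(calls_b):
        raise ValueError("replicates must cover the same SNPs")
    both_called = [(a, b) for a, b in zip(calls_a, calls_b) if a is not None and b is not None]
    if not both_called:
        return float("nan")
    return sum(a == b for a, b in both_called) / len(both_called)


def discordant_snps(
    pairs: List[Tuple[Sequence[Genotype], Sequence[Genotype]]],
    snp_ids: Sequence[str],
    max_discordance: float = 0.01,  # placeholder threshold
) -> List[str]:
    """SNPs whose discordance rate across duplicate pairs exceeds max_discordance.

    Each pair holds (original calls, re-genotyped calls) for one duplicated sample.
    """
    flagged = []
    for j, snp in enumerate(snp_ids):
        compared = discordant = 0
        for first, second in pairs:
            if first[j] is not None and second[j] is not None:
                compared += 1
                discordant += first[j] != second[j]
        if compared and discordant / compared > max_discordance:
            flagged.append(snp)
    return flagged
```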

For allocation to genotyping plates, DNA samples sent for genotyping at CIDR will be separated into 21 groups, based upon GRC (19 sites that contribute ischemic stroke DNA samples) and disease status (2 sites contributing control DNA samples). DNA samples will be randomized within these levels, and genotyped in 48-sample batches. Each batch will contain one HapMap control and one study duplicate. The duplicates will be randomly selected and be representative of the overall sample distribution. The 30 cross-study duplicates contributed by the five control groups will be distributed evenly across plates.
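The text specifies the ingredients of the plate layout (21 groups, randomization within groups, 48-sample batches each holding one HapMap control and one study duplicate) but not the exact algorithm. The sketch below shows one way to meet those constraints under stated assumptions: samples are shuffled within each group and then dealt round-robin across batches so every batch receives a similar mix of groups, and each batch's duplicate is drawn from a different batch. The sample IDs and the `_DUP` and `HAPMAP_CONTROL` labels are hypothetical.

```python
import random
from collections import defaultdict
from typing import Iterable, List, Tuple

BATCH_SIZE = 48
QC_PER_BATCH = 2  # one HapMap control plus one study duplicate per batch


def layout_batches(samples: Iterable[Tuple[str, str]], seed: int = 0) -> List[List[str]]:
    """Assign (sample_id, group) pairs to randomized 48-sample batches."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)

    # Randomize within each group, then concatenate the groups.
    ordered: List[str] = []
    for group in sorted(by_group):
        ids = by_group[group][:]
        rng.shuffle(ids)
        ordered.extend(ids)

    # Deal round-robin so each group is spread evenly over the batches.
    per_batch = BATCH_SIZE - QC_PER_BATCH
    n_batches = -(-len(ordered) // per_batch)  # ceiling division
    batches: List[List[str]] = [[] for _ in range(n_batches)]
    for i, sample_id in enumerate(ordered):
        batches[i % n_batches].append(sample_id)

    # Pick each batch's duplicate from the next batch (wrapping around) before
    # any QC wells are added, so a duplicate is never copied from a QC well.
    dup_sources = [rng.choice(batches[(b + 1) % n_batches]) for b in range(n_batches)]
    for batch, source in zip(batches, dup_sources):
        batch.extend([f"{source}_DUP", "HAPMAP_CONTROL"])
        rng.shuffle(batch)  # randomize well positions within the batch
    return batches
```

Dealing round-robin after shuffling within groups is a simple way to keep case groups and control groups from being confounded with batch, which is the point of randomizing within levels.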

Prior to genotyping with the Illumina Infinium Omni5 genotype array with Exome content, all samples will be genotyped with a 96-SNP ‘barcode’ panel composed of autosomal, X- and Y-chromosome markers. This pretesting ensures proper sample and file tracking throughout the data generation and release processes (through concordance between pretesting genotypes and genotypes generated from the 5M plus exome array data; confirmation of expected relationships and duplicates; and identification of file creation and/or aliquoting errors, primarily gender discrepancies and unexpected first-degree relatives among subjects). DNA aliquots that perform unexpectedly or yield poor data quality in the pretesting assay will be flagged for possible replacement or removal from the study. After the final sample set is determined, the GWAS assay will be performed. Poorly performing samples, usually those with a call rate < 98%, will be run a second time. In the GWAS data processing, genotype cluster definitions will be determined using the Illumina GenTrain algorithm version 1.0 in Illumina GenomeStudio software (Illumina, Inc., San Diego, CA). This software will initially be used to determine cluster boundaries including all samples for a project, and sample call rates and quality metrics will then be evaluated. Based on prior CIDR experience, a small proportion of samples is expected to be excluded from the project release because of poor data quality (call rate generally < 97 to 98% for genomic DNAs). After poor-quality samples are excluded, the clustering algorithm will be run again to determine final cluster positions, because accurate clustering requires that only high-quality raw data be included. Any genotype with a quality score below 0.15 will not be provided for analysis. Genotype cluster boundaries will be manually reviewed for all XY, Y and mitochondrial SNPs and adjusted as necessary. Additional SNP filtering will be performed, with the goal of removing genotypes only for markers that are complete assay failures.
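The sample-level rules above combine a per-genotype quality cut-off (scores below 0.15 are withheld) with call-rate thresholds for re-running and excluding samples. The sketch below chains these together for one sample. It assumes genotypes and quality scores are already available as parallel lists, uses 98% as the re-run threshold and, as an assumed choice within the stated 97 to 98% range, 97% as the exclusion threshold after a repeat; it does not reproduce GenomeStudio's clustering.

```python
from typing import List, Optional, Tuple

Genotype = Optional[str]  # "AA", "AB", "BB", or None for no call
GC_SCORE_MIN = 0.15       # genotypes scoring below this are not released
RERUN_BELOW = 0.98        # call rate prompting a second attempt
EXCLUDE_BELOW = 0.97      # assumed exclusion threshold after the repeat


def mask_low_quality(calls: List[Genotype], scores: List[float]) -> List[Genotype]:
    """Replace genotypes whose quality score is below GC_SCORE_MIN with no-calls."""
    return [call if score >= GC_SCORE_MIN else None for call, score in zip(calls, scores)]


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of SNPs with a genotype call."""
    return sum(call is not None for call in calls) / len(calls)


def triage_sample(calls: List[Genotype], scores: List[float],
                  already_repeated: bool = False) -> Tuple[str, float]:
    """Return ("pass" | "rerun" | "exclude", call rate after masking)."""
    rate = call_rate(mask_low_quality(calls, scores))
    if rate >= RERUN_BELOW:
        return "pass", rate
    if not already_repeated:
        return "rerun", rate
    return ("pass" if rate >= EXCLUDE_BELOW else "exclude"), rate
```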

Phenotype data and corresponding genotypes generated through SiGN will be made available on dbGaP. For dbGaP posting purposes, data for all SNPs will be provided for all samples that pass QC at CIDR and for which no sample identity issues arise during QC. The released datasets will include the raw data files (.idat files); genotypes for forward, A/B, design and top alleles; quality scores and intensity values (raw and normalized); SNP and sample summary tables, including quality flags and comments; SNP cluster definition files; and project summary and quality statistics. Quality statistics reported will include sample success rates, missing data rates, Mendelian consistency rates, investigator duplicate reproducibility rates and HapMap concordance rates.

The Center for Biomedical Statistics (CBS) at the University of Washington will perform additional post-release data processing as described previously.11 This group will assist the Analysis Committee with data cleaning and, if requested, with posting datasets to dbGaP, as well as with imputation using reference data from the 1000 Genomes Project.12 The GWAS data cleaning process typically focuses first on resolving any sample identity problems identified at release (e.g., sex mismatch, unexpected sample duplicates, and cryptic relatedness). Some samples, such as those with unexpected relatedness, will be flagged for removal from certain analyses but may be retained in the dbGaP posting. Chromosomal anomalies will be identified, and genotypes in anomalous regions will be filtered. Batch effects (samples processed together, DNA source or extraction method, study) will be checked, and the analysis will control for differences in ethnicity.

Principal components analysis will be used to identify ethnic outliers and to adjust for population stratification in association analyses. SNP filters will be developed based on missing data, duplicate errors, minor allele frequency, and Hardy-Weinberg equilibrium. The CBS typically performs a relatively simple association (“pre-compute”) analysis to determine whether there is a problematic level of genomic inflation suggesting false positives. Given the complexity of the SiGN dataset, with its multiple GRCs and control groups, this pre-compute will be performed within multiple strata to ensure proper matching of cases and control groups (e.g., cases from US GRCs vs. US control groups; Swedish cases vs. Swedish controls). The pre-compute will also allow investigators who access the data to verify, by reproducing its results, that they were able to download the data, merge the genotype and phenotype datasets, and apply the filters correctly. A quality control report describing the dataset and results of data cleaning will be posted on dbGaP. In addition, the CBS will impute untyped variants across the genome using 1000 Genomes Project data as a reference and post the results on dbGaP.
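Two of the checks named above are easy to state in code: a Hardy-Weinberg equilibrium test used as a SNP filter, and the genomic inflation factor (lambda) that summarizes whether a pre-compute's association statistics are inflated. The sketch below uses the simple 1-degree-of-freedom chi-square HWE test (many pipelines prefer an exact test) and the median-based lambda; it is an illustration, not the CBS pipeline, and any filtering thresholds would be study-specific.

```python
import math
import statistics
from typing import Iterable

# Median of a chi-square distribution with 1 degree of freedom
# (the squared 75th percentile of the standard normal).
CHI2_1DF_MEDIAN = 0.4549364231195724


def hwe_chisq_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of the 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped samples")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: nothing to test
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def genomic_inflation(chi2_stats: Iterable[float]) -> float:
    """Lambda_GC: median observed 1-df statistic divided by its null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
```

Because the pre-compute is run within strata, lambda would be computed separately on each stratum's statistics (for example, US cases versus US controls); a value close to 1 suggests little inflation.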

Analysis Strategy

The Data Management Core will store cleaned genotype data for distribution to the Analysis Committee, which will conduct the primary GWAS analysis. The analytic strategy will initially use logistic regression models adjusting for GRC, country, or principal components to assess the overall behavior of the test statistic. If the data show unacceptable levels of statistical inflation, the Analysis Committee will likely adopt linear mixed models to account for hidden structure in the case-control data. This approach has worked successfully in a genome-wide study with similarly heterogeneous sources of case and control samples.13 The Analysis Committee will adjust for age and sex in the final association analysis. Analyses will be performed for total ischemic stroke and for each subtype.

Because most of the available control groups have not yet been genotyped for exome content, the exome analysis will be a secondary project that will proceed as ancestry-matched exome data become available in large control populations.

Power Estimates

Most cases are of European Caucasian ancestry, although some U.S. sites also contributed African American and Hispanic cases. Power estimates indicate that the available number of European Caucasian cases (n ≈ 10,633), together with an equivalent number of controls, would provide 80% power to detect stroke-associated SNPs with odds ratios of 1.05 to 1.09 across allele frequencies ranging from 0.10 to 0.50. For the two most common CCS-defined stroke subtypes, lacunar and cardioembolic stroke, there would be 80% power to detect odds ratios ranging from 1.10 to 1.17.
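Power figures of this kind can be approximated with a simple allelic test. The Python sketch below converts a control allele frequency and an allelic odds ratio into the implied case allele frequency, then applies a normal approximation for a two-sided test of the difference in allele frequencies. The significance level (α = 5×10⁻⁸, a conventional genome-wide threshold), the allelic model, and the function names are assumptions for illustration. Because the threshold and genetic model behind the figures quoted above are not stated in this section, this sketch is not expected to reproduce them.

```python
"""Approximate power of a two-sided allelic case-control test (sketch).

Assumptions: an allelic (per-allele) model, alleles treated as independent
(two per person), and a normal approximation; alpha=5e-8 is a conventional
genome-wide threshold chosen here for illustration.
"""
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def case_allele_freq(p_control, odds_ratio):
    """Risk-allele frequency in cases implied by an allelic odds ratio."""
    return odds_ratio * p_control / (1 + p_control * (odds_ratio - 1))


def allelic_test_power(p_control, odds_ratio, n_cases, n_controls, alpha=5e-8):
    """Power to detect the allele-frequency difference at two-sided alpha."""
    p_case = case_allele_freq(p_control, odds_ratio)
    # Standard error of the difference in allele frequencies (2n alleles/group).
    se = (p_case * (1 - p_case) / (2 * n_cases)
          + p_control * (1 - p_control) / (2 * n_controls)) ** 0.5
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(p_case - p_control) / se
    # Probability that the test statistic lands in either rejection region.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)
```

For example, allelic_test_power(0.30, 1.10, 10_633, 10_633) gives the approximate power for a SNP with control allele frequency 0.30 and an odds ratio of 1.10 in a balanced design of the size described above. Changing alpha or the genetic model can shift the result substantially.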

Conclusion

The NINDS-supported Stroke Genetics Network (SiGN) is a large-scale international collaboration aimed at discovering genetic determinants of ischemic stroke and its subtypes. SiGN is uniquely positioned to accomplish this objective because of its sufficient power to detect genetic associations, its standardized approach to classifying ischemic stroke subtypes, its centralized approach to genotyping, and its large collection of clinical, phenotypic, and imaging data, as well as genotypes, for future discoveries. SiGN investigators have emphasized quality control of the phenotype data, including blinded re-adjudication of ischemic stroke subtypes and other quality control checks. Consistent with U01 Cooperative Agreement funding, the NINDS provides considerable management and scientific input to the Scientific Steering Committee and the SiGN investigators. The SiGN organizational structure, with the management leadership team represented through the Scientific Steering Committee, is open, flexible, and transparent with a collaborative spirit, which has been essential to resolving scientific issues and accomplishing project tasks in a timely manner.

The next challenge for SiGN will be to develop open collaborations with other studies and consortia. An ongoing collaboration with CHARGE14 has already been established to coordinate future proposals and analyses. It would be scientifically advantageous to collaborate with groups that have access to large cohorts of Asian descent, because genetic associations that are validated across diverse populations are more likely to reflect functional variants.15 The SiGN Publications and Data Access Committee has adopted policies that are open to collaboration with all researchers, with the goal of maximizing progress toward understanding the role of genetic variation in stroke risk. The committee also helps the Scientific Steering Committee prioritize analyses and publications and ensures recognition of the scientific efforts of all investigators involved in SiGN.

Supplementary Material

Acknowledgements

NINDS Program Officials are Katrina Gwinn and Roderick A Corriveau.

Sources of Funding: SiGN: A cooperative agreement grant from the National Institute of Neurological Disorders and Stroke NINDS U01 NS069208 funds the SiGN study. BASICMAR: The BASICMAR Genetic Study was supported by the Ministerio de Sanidad y Consumo de España, Instituto de Salud Carlos III (ISC III) with the grants: “Registro BASICMAR” Funding for Research in Health (PI051737); Contract for Research Training for Professionals with Specialty (CM06100067); Grant for GWALA project from Fondos de Investigación Sanitaria ISC III (PI10/02064); and Grant ISC III FEDER (RD12/0042/0020). Additional support was provided by the Fundació la Marató TV3 with the grant “GOD’s project. Genestroke Consortium” (76/C/2011). Genotyping services were provided by the Johns Hopkins University Center for Inherited Disease Research (CIDR), which is fully funded through a federal contract from the National Institutes of Health to the Johns Hopkins University (contract number HHSN268200782096C). Assistance with data cleaning was provided by the Research in Cardiovascular and Inflammatory Diseases Program of the Institute of Medical Investigations Mar, Hospital del Mar, and the Barcelona Biomedical Research Park.

BRAINS: BRAINS is supported by the British Council (UKIERI), Henry Smith Charity and Dept of Health (UK). Dr P Sharma is supported by a Dept of Health Senior Fellowship.

EDIN: The Edinburgh Stroke Study was supported by the Wellcome Trust (clinician scientist award to Dr. Sudlow) and the Binks Trust. Sample processing occurred in the Genetics Core Laboratory of the Wellcome Trust Clinical Research Facility, Western General Hospital, Edinburgh. Much of the neuroimaging occurred in the Scottish Funding Council Brain Imaging Research Centre (www.sbirc.ed.ac.uk), Division of Clinical Neurosciences, University of Edinburgh, a core area of the Wellcome Trust Clinical Research Facility and part of the SINAPSE (Scottish Imaging Network – A Platform for Scientific Excellence) collaboration (www.sinapse.ac.uk), funded by the Scottish Funding Council and the Chief Scientist Office. Genotyping was performed at the Wellcome Trust Sanger Institute in the UK and funded by the Wellcome Trust as part of the Wellcome Trust Case Control Consortium 2 project (085475/B/08/Z and 085475/Z/08/Z and WT084724MA).

GASROS: The Massachusetts General Hospital Stroke Genetics Group has been supported by the National Institutes of Health Genes Affecting Stroke Risks and Outcomes Study (GASROS) Grant K23 NS042720, the American Heart Association/Bugher Foundation Centers for Stroke Prevention Research 0775010N and NINDS K23NS042695, R01NS059727, the Deane Institute for Integrative Research in Atrial Fibrillation and Stroke and by the Keane Stroke Genetics Fund. Genotyping services were provided by the Broad Institute Center for Genotyping and Analysis, supported by Grant U54 RR020278 from the National Center for Research Resources.

GCNKSS: The Greater Cincinnati/Northern Kentucky Stroke Study is supported by the National Institutes of Health (NS 030678).

GEOS: The GEOS Study was supported by the National Institutes of Health Genes, Environment and Health Initiative (GEI) Grant U01 HG004436, as part of the GENEVA consortium under GEI, with additional support provided by the Mid-Atlantic Nutrition and Obesity Research Center (P30 DK072488); and the Office of Research and Development, Medical Research Service, and the Baltimore Geriatrics Research, Education, and Clinical Center of the Department of Veterans Affairs. Genotyping services were provided by the Johns Hopkins University Center for Inherited Disease Research (CIDR), which is fully funded through a federal contract from the National Institutes of Health to the Johns Hopkins University (contract number HHSN268200782096C). Assistance with data cleaning was provided by the GENEVA Coordinating Center (U01 HG 004446; PI Bruce S Weir). Study recruitment and assembly of datasets were supported by a Cooperative Agreement with the Division of Adult and Community Health, Centers for Disease Control and by grants from the National Institute of Neurological Disorders and Stroke (NINDS) and the NIH Office of Research on Women’s Health (R01 NS45012, U01 NS069208-01).

GRAZ: The Austrian Stroke Prevention Study was supported by the Austrian Science Fund (FWF) grant numbers P20545-P05 and P13180 and I904-B13 (Era-Net). The Medical University of Graz supports the databases of the Graz Stroke Study and the Austrian Stroke Prevention Study.

ISGS and SWISS: The Ischemic Stroke Genetics Study (ISGS) was supported by the National Institute of Neurological Disorders and Stroke (R01 NS42733; PI James F. Meschia). The Sibling with Ischemic Stroke Study (SWISS) was supported by the National Institute of Neurological Disorders and Stroke (R01 NS39987; PI James F. Meschia). Both SWISS and ISGS received additional support in part from the Intramural Research Program of the National Institute on Aging (Z01 AG000954-06; PI Andrew Singleton). SWISS and ISGS used samples and clinical data from the NIH-NINDS Human Genetics Resource Center DNA and Cell Line Repository (http://ccr.coriell.org/ninds), human subjects protocol numbers 2003-081 and 2004-147. SWISS and ISGS used stroke-free participants from the Baltimore Longitudinal Study of Aging (BLSA) as controls with the permission of Dr. Luigi Ferrucci. The inclusion of BLSA samples was supported in part by the Intramural Research Program of the National Institute on Aging (Z01 AG000015-50), human subjects protocol number 2003-078. This study utilized the high-performance computational capabilities of the Biowulf Linux cluster at the NIH (http://biowulf.nih.gov).

KRAKOW: Collection of phenotypic data and genetic specimens was funded by a grant from the Polish Ministry of Science and Higher Education for Leading National Research Centers (KNOW) and by a grant from the Medical College, Jagiellonian University in Krakow, Poland (K/ZDS/002848).

LEUVEN: The Leuven Stroke genetics study was supported by personal research funds from the Department of Neurology of the University Hospitals Leuven. Vincent Thijs is supported by a Fundamental Clinical Research grant from FWO Flanders (numbers 1.8.009.08.N.00 and 1800913N).

LUND: The Lund Stroke Register was supported by the Swedish Research Council (K2010-61X-20378-04-3), Region Skåne, the Freemasons Lodge of Instruction EOS in Lund, King Gustaf V’s and Queen Victoria’s Foundation, Lund University, and the Swedish Stroke Association. Biobank services were provided by Region Skåne Competence Centre (RSKC Malmö), Skåne University Hospital, Malmö, Sweden; and Biobank, LabmedicinSkåne, University and Regional Laboratories Region Skåne, Sweden.

MCISS: The Middlesex County Ischemic Stroke Study (MCISS) was supported by intramural funding from the New Jersey Neuroscience Institute/JFK Medical Center, Edison, NJ and The Neurogenetics Foundation, Cranbury, NJ. We acknowledge Dr. Souvik Sen for his advice and encouragement in the initiation and design of this study.

MIAMISR and NOMAS(S): The Northern Manhattan Study (NOMAS) was supported by grants from the National Institute of Neurological Disorders and Stroke (R37 NS029993, R01 NS27517). The Cerebrovascular Biorepository at University of Miami/Jackson Memorial Hospital (The Miami Stroke Registry, IRB#20070386) was supported by the Department of Neurology at University of Miami Miller School of Medicine and Evelyn McKnight Brain Institute. Biorepository and DNA extraction services were provided by the Hussmann Institute for Human Genomics (HIHG) at the Miller School of Medicine.

MUNICH: The MUNICH study was supported by the Vascular Dementia Research Foundation and the Jackstaedt Stiftung.

NHS: The Nurses’ Health Study work on stroke is supported by grants from the National Institutes of Health including HL088521 and HL34594 from the National Heart Lung and Blood Institute, as well as grants from the National Cancer Institute funding the questionnaire follow-up and blood collection: CA87969 and CA49449.

OXVASC: The Oxford Vascular Study was supported by the Stroke Association, the Medical Research Council, the Wellcome Trust, the Dunhill Medical Trust, the National Institute for Health Research (NIHR), and the NIHR Oxford Biomedical Research Centre based at Oxford University Hospitals NHS Trust and University of Oxford. Dr. Rothwell is in receipt of Senior Investigator Awards from the Wellcome Trust and the NIHR.

REGARDS: Reasons for Geographic and Racial Differences in Stroke Study (REGARDS) [The Etiology of Geographic and Racial Differences in Stroke] for the University of Alabama at Birmingham, School of Public Health. This research project is supported by a cooperative agreement U01 NS041588 from the National Institute of Neurological Disorders and Stroke, National Institutes of Health, Department of Health and Human Services. A full list of participating REGARDS investigators and institutions can be found at “http://www.regardsstudy.org”.

SAHLSIS: SAHLSIS was supported by the Swedish Research Council (K2011-65X-14605-09-6), the Swedish Heart and Lung Foundation (20100256), the Swedish state/Sahlgrenska University Hospital (ALFGBG-148861), the Swedish Stroke Association, the Swedish Society of Medicine, and the Rune and Ulla Amlöv Foundation.

SPS3: The Secondary Prevention of Small Subcortical Strokes trial is funded by the US National Institute of Neurological Disorders and Stroke (grant U01NS38529-04A1; principal investigator, O.R.B.; co-principal investigator, R.G.H.). The SPS3 Genetic Substudy (SPS3-GENES) is funded by R01 NS073346 (co-principal investigators, J.A.J., O.R.B., A.R.S.).

ST. GEORGE’S: The principal funding for this study was provided by the Wellcome Trust, as part of the Wellcome Trust Case Control Consortium 2 project (085475/B/08/Z and 085475/Z/08/Z and WT084724MA). Collection of some of the St George’s stroke cohort was supported by project grant support from the Stroke Association.

WHI: The Women’s Health Initiative (WHI) program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, U.S. Department of Health and Human Services through contracts N01WH22110, 24152, 32100-2, 32105-6, 32108-9, 32111-13, 32115, 32118-32119, 32122, 42107-26, 42129-32, and 44221. The Hormones and Biomarkers Predicting Stroke (HaBPS) study was supported by a grant from the National Institute of Neurological Disorders and Stroke (R01NS042618). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NINDS or the NIH.

WUSTL: The collection, extraction of DNA from blood, and storage of specimens was supported by the Washington University SPOTRIAS Center grant (P50 NS055977, NINDS, NIH). Basic demographic and clinical characterization of stroke phenotype was prospectively collected in the Cognitive Rehabilitation and Recovery Group (CRRG) registry. The ReGenesIS (Recovery Genomics after Ischemic Stroke) study was supported by a grant from the Barnes-Jewish Hospital Foundation.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Conflicts of Interest: None.

References

  • 1. Gschwendtner A, Bevan S, Cole JW, Plourde A, Matarin M, Ross-Adams H, et al. Sequence variants on chromosome 9p21.3 confer risk for atherosclerotic stroke. Ann Neurol. 2009;65:531–539. doi: 10.1002/ana.21590.
  • 2. International Stroke Genetics Consortium, Wellcome Trust Case Control Consortium, Bellenguez C, Bevan S, Gschwendtner A, Spencer CC, et al. Genome-wide association study identifies a variant in HDAC9 associated with large vessel ischemic stroke. Nat Genet. 2012;44:328–333. doi: 10.1038/ng.1081.
  • 3. Holliday EG, Maguire JM, Evans TJ, Koblar SA, Jannes J, Sturm JW, et al. Common variants at 6p21.1 are associated with large artery atherosclerotic stroke. Nat Genet. 2012;44:1147–1151. doi: 10.1038/ng.2397.
  • 4. Lubitz SA, Yi BA, Ellinor PT. Genetics of atrial fibrillation. Heart Fail Clin. 2010;6:239–247. doi: 10.1016/j.hfc.2009.12.004.
  • 5. Gudbjartsson DF, Arnar DO, Helgadottir A, Gretarsdottir S, Holm H, Sigurdsson A, et al. Variants conferring risk of atrial fibrillation on chromosome 4q25. Nature. 2007;448:353–357. doi: 10.1038/nature06007.
  • 6. Meschia JF. Addressing the heterogeneity of the ischemic stroke phenotype in human genetics research. Stroke. 2002;33:2770–2774. doi: 10.1161/01.str.0000035261.28528.c8.
  • 7. Ay H, Benner T, Arsava EM, Furie KL, Singhal AB, Jensen MB, et al. A computerized algorithm for etiologic classification of ischemic stroke: The Causative Classification of Stroke system. Stroke. 2007;38:2979–2984. doi: 10.1161/STROKEAHA.107.490896.
  • 8. Arsava EM, Ballabio E, Benner T, Cole JW, Delgado-Martinez MP, Dichgans M, et al. The Causative Classification of Stroke system: An international reliability and optimization study. Neurology. 2010;75:1277–1284. doi: 10.1212/WNL.0b013e3181f612ce.
  • 9. Marcus DS, Olsen TR, Ramaratnam M, Buckner RL. The Extensible Neuroimaging Archive Toolkit: An informatics platform for managing, exploring, and sharing neuroimaging data. Neuroinformatics. 2007;5:11–34. doi: 10.1385/ni:5:1:11.
  • 10. Juster FT, Suzman R. An overview of the Health and Retirement Study. J Hum Resour. 1995;30:S7–S56.
  • 11. Laurie CC, Doheny KF, Mirel DB, Pugh EW, Bierut LJ, Bhangale T, et al. Quality control and quality assurance in genotypic data for genome-wide association studies. Genet Epidemiol. 2010;34:591–602. doi: 10.1002/gepi.20516.
  • 12. Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, et al. A map of human genome variation from population-scale sequencing. Nature. 2010;467:1061–1073. doi: 10.1038/nature09534.
  • 13. Vogelzangs N, Beekman AT, Kritchevsky SB, Newman AB, Pahor M, Yaffe K, et al. Psychosocial risk factors and the metabolic syndrome in elderly persons: Findings from the Health, Aging and Body Composition Study. J Gerontol A Biol Sci Med Sci. 2007;62:563–569. doi: 10.1093/gerona/62.5.563.
  • 14. Psaty BM, O'Donnell CJ, Gudnason V, Lunetta KL, Folsom AR, Rotter JI, et al. Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium: Design of prospective meta-analyses of genome-wide association studies from 5 cohorts. Circ Cardiovasc Genet. 2009;2:73–80. doi: 10.1161/CIRCGENETICS.108.829747.
  • 15. Chen LS, Saccone NL, Culverhouse RC, Bracci PM, Chen CH, Dueker N, et al. Smoking and genetic risk variation across populations of European, Asian, and African American ancestry: A meta-analysis of chromosome 15q25. Genet Epidemiol. 2012;36:340–351. doi: 10.1002/gepi.21627.
