The study by Ritchie et al. 1 in this issue uses electronic health record data and DNA biobanks to identify several genomic variants previously implicated 2,3 in variation in ECG parameters of cardiac conduction and in diseases of cardiac conduction. So why is this study worthy of note?
Ever since Einthoven first named the QRS complex 4, investigators have sought to define what constitutes a normal complex and the diagnostic and prognostic significance of deviations from the norm. The growing understanding that there is no categorical set of normal values prompted population studies of (typically white and male) subjects numbering in the hundreds 5 and eventually tens of thousands 6. These studies generated a more robust set of reference values, emphasized that a binary notion of normal versus abnormal QRS was not appropriate, and argued instead for “an index of the possibility of normals or abnormals occurring at various levels” and “variations in electrocardiograms … considerably greater than the present standards would lead one to expect…” 5 Subsequent, larger population studies, including clinical trial populations 7,8 with broader age and gender distributions, revealed that variation in QRS characteristics in healthy individuals was larger than suspected. In parallel, several studies analyzed the clinical correlates of ECG features. For example, in 1967, Pipberger et al. 9 conducted what might today be called a “phenome scan” 10,11. For each of the identified ECG measures, they scanned multiple constitutional features (e.g., obesity) and ethnicity to assess bias and correlation. Among their findings were significant differences in QRS measures in African Americans, even after correcting for differences in the other constitutional features. Fifty years later, in the era of commodity-priced genotyping, cohort studies with tens of thousands of subjects have identified dozens of SNPs that appear to be associated with reproducible and highly significant variation in QRS duration as well as with several disorders of cardiac conduction (e.g., atrioventricular block) 2,3. Several of these SNPs implicated SCN10A, a gene encoding a subunit of one of the voltage-gated sodium channels, which was also found in the study of Ritchie et al. 1
Also, over the last three decades, with the deployment of electronic health records (EHRs), informaticians at leading healthcare systems demonstrated how ECG data could be integrated with other clinical data obtained in the course of healthcare delivery and used to predict outcomes, such as mortality, in the very same populations being cared for 12. These early efforts laid the foundations for exploiting the low incremental costs of using EHR data to rapidly characterize and select study populations. As phenotyping and sample acquisition became the major costs in disease genomics studies 13, the use of EHRs to create an instrumented health enterprise for genetic discovery research, drawing on the informational byproducts of healthcare delivery (i.e., clinical documentation) and on banked or discarded clinical blood samples, has become increasingly attractive. Multiple studies have shown this EHR-driven approach to be feasible, accurate and cost-effective 14, and several national funding agencies now support these studies internationally.
In this context, the contribution of the study by Ritchie et al. 1 is twofold. The first is the demonstration that EHR-driven phenotyping can be used to accurately select patients and reproduce genomic associations (principally pointing to the genes SCN5A and SCN10A) for conduction disorders in a manner that scales cost-effectively to much larger population studies. Specifically, the EHRs were used to identify healthy individuals, to quantify their ECG measures and to select the corresponding genotypes for the same individuals. The second is the set of insights provided by the hypothesis-free inversion of conventional genome-wide studies. That is, the investigators selected the most significant SNPs (with respect to QRS variation) and scanned the entirety of the diagnoses of all patients in the EHR to determine which diagnoses were significantly correlated with those common genomic variants. These included a variety of cardiac arrhythmias. Moreover, they used the EHR to track longitudinally the patients originally identified as healthy in their QRS study, and found that 3% of patients developed atrial fibrillation or atrial flutter at some point at least one month following the normal ECG and 11% were coded as having a variety of subsequent arrhythmias. This in silico cohort selection and longitudinal study is in many ways a model case of precision medicine as defined in a recent Institute of Medicine report 15. That report argued for the creation of an “information commons” with multiple layers of measurement all linked to individual patients to accelerate the acquisition of biomedical knowledge. The report also emphasized the importance of population studies that include the full complexity of our patient populations, including their ethnic heterogeneity, polypharmacy and comorbidities.
The report also anticipated redrawing the current categorical diagnostic or disease boundaries as multidimensional and/or probabilistic measures that draw directly from the quantitative measures available in the information commons. The work of Ritchie et al. provides additional evidence of the feasibility and efficacy of the precision medicine model.
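To make the inverted scan described above concrete (fix one variant, then test every diagnosis code in the EHR for association with it), the following is a minimal sketch on toy data. It is not the analysis pipeline of Ritchie et al.: the choice of a two-sided Fisher exact test with a Bonferroni correction, the function names, and the code labels `arrhythmia_code` and `unrelated_code` are all assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that x of the row-1 subjects fall in column 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def phenome_scan(carrier, diagnoses, alpha=0.05):
    """Test one variant against every diagnosis code.

    carrier:   dict mapping patient id -> True if the patient carries the variant
    diagnoses: dict mapping patient id -> set of diagnosis codes (hypothetical labels)
    Returns (code, p_value, significant) tuples sorted by p-value.
    """
    codes = sorted(set().union(*diagnoses.values()))
    if not codes:
        return []
    threshold = alpha / len(codes)  # Bonferroni correction for the number of codes tested
    results = []
    for code in codes:
        a = b = c = d = 0
        for patient, is_carrier in carrier.items():
            has_code = code in diagnoses.get(patient, set())
            if is_carrier and has_code:
                a += 1
            elif is_carrier:
                b += 1
            elif has_code:
                c += 1
            else:
                d += 1
        p = fisher_exact_two_sided(a, b, c, d)
        results.append((code, p, p < threshold))
    return sorted(results, key=lambda r: r[1])


# Toy cohort: 10 carriers (p0..p9) and 10 non-carriers (p10..p19).
carrier = {f"p{i}": i < 10 for i in range(20)}
diagnoses = {
    f"p{i}": ({"arrhythmia_code"} if i < 10 else set())
    | ({"unrelated_code"} if i % 2 == 0 else set())
    for i in range(20)
}
results = phenome_scan(carrier, diagnoses)
```

In this toy cohort every carrier has `arrhythmia_code` and no non-carrier does, so that code ranks first and passes the corrected threshold, while `unrelated_code` is evenly split and does not. A real scan would also need to adjust for covariates such as age, sex and ancestry, which this sketch omits.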
There remain several important loose ends in this study. For example, members of underrepresented minorities were specifically excluded, even though ethnicity-specific variation in ECG characteristics has been documented for at least fifty years. Because these same underrepresented minorities are often overrepresented in academic health centers, the same EHR-driven approach could be readily and rapidly used to study the genomic basis of those differences 16. Also, this study relied heavily on billing codes rather than the fine-grained diagnostic assessment of clinicians. The systematic application of natural language processing techniques to codify the content of clinical notes in EHRs should reduce the biases and lack of granularity that come from the use of billing data 17. Most ambitiously, restructuring the phenome scan to include broader processes such as inflammation or thrombosis may help speed the genomic characterization of the endopathotypes 18 that underlie multiple diseases.
Acknowledgments
Conflict of Interest Disclosures: I.K. is funded by the NIH eMERGE network that also funded the work of Ritchie et al. (under a separate grant).
References
- 1. Ritchie MD, Denny JC, Zuvich RL, Crawford DC, Schildcrout JS, Bastarache L, Ramirez AH, Mosley JD, Pulley JM, Basford MA, Bradford Y, Rasmussen LV, Pathak J, Chute CG, Kullo IJ, McCarty C, Chisholm RL, Kho AN, Carlson CS, Larson EB, Jarvik GP, Sotoodehnia N, Manolio TA, Li R, Masys DR, Haines JL, Roden DM. Genome- and phenome-wide analysis of cardiac conduction identifies markers of arrhythmia risk. Circulation. 2013;127:XX–XXX. doi: 10.1161/CIRCULATIONAHA.112.000604.
- 2. Sotoodehnia N, Isaacs A, de Bakker PIW, Dörr M, Newton-Cheh C, Nolte IM, van der Harst P, Müller M, Eijgelsheim M, Alonso A, Hicks AA, Padmanabhan S, Hayward C, Smith AV, Polasek O, Giovannone S, Fu J, Magnani JW, Marciante KD, Pfeufer A, Gharib SA, Teumer A, Li M, Bis JC, Rivadeneira F, Aspelund T, Köttgen A, Johnson T, Rice K, Sie MPS, Wang YA, Klopp N, Fuchsberger C, Wild SH, Leach IM, Estrada K, Völker U, Wright AF, Asselbergs FW, Qu J, Chakravarti A, Sinner MF, Kors JA, Petersmann A, Harris TB, Soliman EZ, Munroe PB, Psaty BM, Oostra BA, Cupples LA, Perz S, de Boer RA, Uitterlinden AG, Völzke H, Spector TD, Liu F-Y, Boerwinkle E, Dominiczak AF, Rotter JI, van Herpen G, Levy D, Wichmann H-E, van Gilst WH, Witteman JCM, Kroemer HK, Kao WHL, Heckbert SR, Meitinger T, Hofman A, Campbell H, Folsom AR, van Veldhuisen DJ, Schwienbacher C, O’Donnell CJ, Volpato CB, Caulfield MJ, Connell JM, Launer L, Lu X, Franke L, Fehrmann RSN, te Meerman G, Groen HJM, Weersma RK, van den Berg LH, Wijmenga C, Ophoff RA, Navis G, Rudan I, Snieder H, Wilson JF, Pramstaller PP, Siscovick DS, Wang TJ, Gudnason V, van Duijn CM, Felix SB, Fishman GI, Jamshidi Y, Stricker BHC, Samani NJ, Kääb S, Arking DE. Common variants in 22 loci are associated with QRS duration and cardiac ventricular conduction. Nat Genet. 2010;42:1068–1076. doi: 10.1038/ng.716.
- 3. Holm H, Gudbjartsson DF, Arnar DO, Thorleifsson G, Thorgeirsson G, Stefansdottir H, Gudjonsson SA, Jonasdottir A, Mathiesen EB, Njølstad I, Nyrnes A, Wilsgaard T, Hald EM, Hveem K, Stoltenberg C, Løchen M-L, Kong A, Thorsteinsdottir U, Stefansson K. Several common variants modulate heart rate, PR interval and QRS duration. Nat Genet. 2010;42:117–122. doi: 10.1038/ng.511.
- 4. Bjerregaard P, Gussak I. Naming of the waves in the ECG with a brief account of their genesis. Circulation. 1999;100:e148. doi: 10.1161/01.cir.100.25.e148.
- 5. Stewart CB, Manning GW. A detailed analysis of the electrocardiograms of 500 RCAF aircrew. Am Heart J. 1944;27:502–523.
- 6. Hiss RG, Lamb LE, Allen MF. Electrocardiographic findings in 67,375 asymptomatic subjects. X. Normal values. Am J Cardiol. 1960;6:200–231. doi: 10.1016/0002-9149(60)90047-3.
- 7. Dmitrienko AA, Sides GD, Winters KJ, Kovacs RJ, Rebhun DM, Bloom JC, Groh W, Eisenberg PR. Electrocardiogram reference ranges derived from a standardized clinical trial population. Drug Information Journal. 2005;39:395–405.
- 8. Mason JW, Ramseth DJ, Chanter DO, Moon TE, Goodman DB, Mendzelevski B. Electrocardiographic reference ranges derived from 79,743 ambulatory subjects. J Electrocardiol. 2007;40:228–234. e228. doi: 10.1016/j.jelectrocard.2006.09.003.
- 9. Pipberger HV, Goldman MJ, Littmann D, Murphy GP, Cosma J, Snyder JR. Correlations of the orthogonal electrocardiogram and vectorcardiogram with constitutional variables in 518 normal men. Circulation. 1967;35:536–551. doi: 10.1161/01.cir.35.3.536.
- 10. Jones R, Pembrey M, Golding J, Herrick D. The search for genotype/phenotype associations and the phenome scan. Paediatr Perinat Epidemiol. 2005;19:264–275. doi: 10.1111/j.1365-3016.2005.00664.x.
- 11. Denny J, Ritchie M, Basford M, Pulley J, Bastarache L, Brown-Gentry K, Wang D, Masys D, Roden D, Crawford D. PheWAS: Demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics. 2010;26:1205–1210. doi: 10.1093/bioinformatics/btq126.
- 12. Tierney WM, Takesue BY, Vargo DL, Zhou X-H. Using electronic medical records to predict mortality in primary care patients with heart disease. J Gen Intern Med. 1996;11:83–91. doi: 10.1007/BF02599583.
- 13. Murphy S, Churchill S, Bry L, Chueh H, Weiss S, Lazarus R, Zeng Q, Dubey A, Gainer V, Mendis M, Glaser J, Kohane I. Instrumenting the health care enterprise for discovery research in the genomic era. Genome Res. 2009;19:1675–1681. doi: 10.1101/gr.094615.109.
- 14. Kohane IS. Using electronic health records to drive discovery in disease genomics. Nat Rev Genet. 2011;12:417–428. doi: 10.1038/nrg2999.
- 15. Toward precision medicine: Building a knowledge network for biomedical research and a new taxonomy of disease. Washington, DC: National Academies Press; 2011.
- 16. Kurreeman F, Liao K, Chibnik L, Hickey B, Stahl E, Gainer V, Li G, Bry L, Mahan S, Ardlie K, Thomson B, Szolovits P, Churchill S, Murphy SN, Cai T, Raychaudhuri S, Kohane I, Karlson E, Plenge RM. Genetic basis of autoantibody positive and negative rheumatoid arthritis risk in a multi-ethnic cohort derived from electronic health records. Am J Hum Genet. 2011;88:57–69. doi: 10.1016/j.ajhg.2010.12.007.
- 17. Carroll RJ, Thompson WK, Eyler AE, Mandelin AM, Cai T, Zink RM, Pacheco JA, Boomershine CS, Lasko TA, Xu H, Karlson EW, Perez RG, Gainer VS, Murphy SN, Ruderman EM, Pope RM, Plenge RM, Kho AN, Liao KP, Denny JC. Portability of an algorithm to identify rheumatoid arthritis in electronic health records. JAMIA. 2012;19:e162–9. doi: 10.1136/amiajnl-2011-000583. Epub 2012 Feb 28.
- 18. Loscalzo J, Kohane I, Barabasi AL. Human disease classification in the postgenomic era: A complex systems approach to human pathobiology. Mol Syst Biol. 2007;3:124. doi: 10.1038/msb4100163.