Journal of the American Medical Informatics Association : JAMIA. 2013 Sep 3;20(e2):e232–e238. doi: 10.1136/amiajnl-2013-001932

Defining a comprehensive verotype using electronic health records for personalized medicine

Mary Regina Boland 1, George Hripcsak 1, Yufeng Shen 1,2, Wendy K Chung 3,4,5, Chunhua Weng 1,5
PMCID: PMC3861934  PMID: 24001516

Abstract

The burgeoning adoption of electronic health records (EHR) introduces a golden opportunity for studying individual manifestations of myriad diseases, which is called ‘EHR phenotyping’. In this paper, we break down this concept by: relating it to phenotype definitions from Johannsen; comparing it to cohort identification and disease subtyping; introducing a new concept called ‘verotype’ (Latin: vere = true, actually) to represent the ‘true’ population of similar patients for treatment purposes through the integration of genotype, phenotype, and disease subtype (eg, specific glucose value pattern in patients with diabetes) information; analyzing the value of the ‘verotype’ concept for personalized medicine; and outlining the potential for using network-based approaches to reverse engineer clinical disease subtypes.

Keywords: Electronic Health Records, Phenotype, Genotype, Genetics

Introduction

During the seminal days of genetic research, researchers sought ways of describing and defining the complex topic of heritability while pondering over the transmission of traits1–3 before discovering the ‘gene’.4 Genetics research introduced the concept of genotypes, phenotypes, enterotypes (‘types’ based on composition of gut microbiota),5 endophenotypes (proximal disease-related phenotype with a clear genetic component regardless of disease presence)6–8 and deep phenotypes (detailed phenotype),9 10 enabling us to define human characteristics that reflect myriad disease states. The rapid adoption of electronic health records (EHR)11 introduces a new opportunity for disease characterization.

In this paper, we take a historical perspective to break down the concept of ‘EHR phenotyping’ by comparing the concept to those outlined by Johannsen.1 We also discuss the value of disease subtyping using EHR to identify related groups of patients useful for developing personalized medical treatment regimens. Then, we outline the value of network-based approaches for reverse engineering disease subtypes from EHR.

Breaking down EHR phenotyping using Johannsen's definitions

Historical background

Mendel described the pattern of transmission of ‘characters’ (or alleles) from parent to offspring (ie, genotype) as either dominant or recessive.2 3 A dominant allele controls the expression of a trait even if an individual is heterozygous (ie, possessing only one of two copies at a single locus). A recessive allele will not affect an individual's trait unless they are homozygous. Consequently, recessively inherited traits disappear in a generation and then reappear in subsequent generations.2 3 Later, Johannsen coined the terms phenotype, genotype, and biotype.1 These concepts were described before discovering that DNA transmits heritable characteristics to individuals.4

We illustrate the interrelationship among these concepts using eye color, a complex trait.12 Eye color can change as a result of health status13 and access to medical treatment,14 with 16 genes contributing to its heritability (genotype) in humans. Interestingly, in Nile tilapia, individuals with lower social status developed darker eyes than those with high social status,15 suggesting that other factors may also affect eye color. We use Johannsen's term biotype to describe individuals with the same genotype and phenotype1 as opposed to other slightly modified definitions.16–18 One example biotype consists of individuals with a genotype for blue eyes, but possessing green eyes (darkening of eye color was found in women with many pregnancies);19 while another example biotype consists of individuals with a genotype for green eyes and possessing green eyes (normal phenotype). Interestingly, certain individuals have a different phenotype in each of their eyes (heterochromia), but their underlying genotype is the same,20 which is a third example biotype.

Hippocrates described the identification of disease subtypes.21 The characterization of disease subtypes is called ‘deep phenotyping’ by some researchers22 while others reserve it for genetic information.23 Because we focus on clinical data stored in EHR, we use ‘clinical disease subtype’21 24 25 throughout this paper. A ‘clinical disease subtype’ is any ‘type’ that stratifies a diseased population into subpopulations. Table 1 provides a summary of definitions with medical examples.

Table 1.

Adaption of traditional phenotyping terminology to the EHR context

Source | Term | Genetic definition (the genetic phenotyping context) | Clinical data redefinition (adaption to the EHR context) | Examples (adaption to the EHR context)
Johannsen | Genotype | “We do not know a ‘genotype’, but we are able to demonstrate ‘genotypical’ differences or accordances… ‘Genotype’…is the sum total of the potentialities of the zygotes in question. That these potentialities are partly separable (‘segregating’ after hybridization) is adequately expressed by the ‘genotype’ as composed of ‘genes’.”1 | NA | BRCA1 alleles, TCF7L2, glucokinase, HLA alleles
Johannsen | Phenotype | “We may easily find out that the organisms in question resemble each other so much that they belong to the same ‘type’… or we may in other cases state that they present a disparity so considerable that two or more different ‘types’ may be discerned. All ‘types’ of organisms, distinguishable by direct inspection or only by finer methods of measuring or description, may be characterized as ‘phenotypes.’”1 | Any phenotype, for example, diabetes, height, weight, that has related data elements extractable from EHR data | Height, diabetes, atherosclerosis
Johannsen | Biotype | A group of organisms characterized by having the same phenotype and genotype.1 | NA | Breast cancer and BRCA1
Hippocrates | Clinical disease subtype | Heterogeneous diseases can be classified into smaller disease ‘subtypes’ when the subtypes have different characteristics (eg, tissue-based biomarker, mutation, and symptom).21 24 25 | Using Johannsen's definition for phenotype,1 we define ‘clinical disease subtype’ to be any set of characteristics that distinguishes a subset of diseased patients from the overall diseased population | Chronic, benign, malignant

EHR, electronic health record; HLA, human leucocyte antigen.

Phenotypic variance

Johannsen describes two factors that introduce phenotypic variance: environmental and genetic (table 2).1 In EHR, phenotypic variance can also be introduced by variability in healthcare practice and medical decision-making among care providers26 27 or by varying documentation behaviors,28 which adds two factors that may contribute to phenotype variance: the healthcare process and documentation behavior (table 2).1

Table 2.

Factors that introduce phenotypic variance in genetic and EHR data

Type of phenotype variance | Example
Genetic | Genetic trait could result in organisms with shorter than expected heights
Environmental (non-inherited) | Food shortage could result in organisms with shorter than expected heights
Healthcare process | Presence or absence of insurance could result in individuals with unreported or underreported height
Healthcare documentation | Presence or absence of clinician experience could result in inadequate or inconsistent measurement of individuals’ height

EHR, electronic health record.

In figure 1, we illustrate influential factors that affect the traditional and EHR-based phenotypes, respectively. Many factors affect EHR phenotypes including clinicians’ documentation behavior. The experience of the person documenting can affect the degree of detail contained in the documentation. For example, a medical student may include more details on certain less relevant items and then miss critical items. To detect this in EHR, notes can be compared across clinicians of varying experience levels for the same set of patients. Agreement could be assessed and outliers identified (eg, highly skilled clinicians). If outcome prediction is the goal, then the documentation of highly skilled clinicians (ie, outliers) may be more useful as skill and predictive ability are related. Some factors, such as lifestyle, are recorded by many EHR. However, these data are not always stored in a standard form, and may require specialized extraction methodologies. For example, smoking status can be assessed in multiple ways including using clinical notes,29 30 and billing codes.31 These factors can all introduce phenotypic variance. Differences between EHR data and the ‘true patient state’ are described elsewhere.32
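The comparison of documentation across clinicians described above can be sketched in a few lines. The following is a minimal illustration, not a validated method: it assumes that each clinician's notes for the same patients have already been reduced to binary flags (item documented or not), it uses Cohen's kappa as the agreement measure (our choice for the sketch; no specific statistic is prescribed in the text), and all clinician names and values are hypothetical.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length lists of 0/1 labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    # Chance agreement: both say 1, or both say 0.
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1.0:  # both raters constant and identical
        return 1.0
    return (observed - expected) / (1 - expected)


def mean_agreement(docs):
    """Average pairwise kappa per clinician.

    docs maps a clinician id to 0/1 flags over the SAME ordered list of
    (patient, item) pairs; 1 means the item was documented.
    """
    scores = {clinician: [] for clinician in docs}
    for c1, c2 in combinations(docs, 2):
        k = cohen_kappa(docs[c1], docs[c2])
        scores[c1].append(k)
        scores[c2].append(k)
    return {c: sum(v) / len(v) for c, v in scores.items()}


# Hypothetical documentation flags for eight (patient, item) pairs.
docs = {
    "attending_a": [1, 1, 0, 1, 0, 1, 1, 0],
    "attending_b": [1, 1, 0, 1, 0, 1, 0, 0],
    "student_c": [0, 1, 1, 0, 1, 1, 0, 1],
}

agreement = mean_agreement(docs)
# The clinician who agrees least with everyone else is flagged for review.
outlier = min(agreement, key=agreement.get)
print(agreement, outlier)
```

An outlier flagged this way is only a candidate for review; as noted above, an outlier may equally be a highly skilled clinician whose documentation is more useful for prediction.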

Figure 1.


Factors that influence ‘phenotype’ identification in genetic and clinical data. Various factors introduce phenotypic variance in the traditional genetics model and the clinical data model (that utilizes EHR). Places where EHR can be utilized to assess each factor are highlighted in light orange. Thicker arrows show the main path for factors. EHR, electronic health record; VE, variance due to environment; VG, variance due to genetics; VHD, variance due to healthcare documentation; VHP, variance due to healthcare process. We include ‘well-controlled’, ‘stable’ and ‘critical’ condition as examples of patient status. For disease status, we include ‘early (eg, stage i)’, and ‘advanced (eg, stage iii)’ as examples. A patient may have a disease status indicating that their breast cancer is ‘advanced or stage iii’. If that same patient is later admitted to the hospital due to a car accident and is in a ‘critical condition’ then their patient status would be ‘critical’ while their disease status would remain unaffected (advanced breast cancer still present). The loop at the top of disease status indicates that a disease's status can affect the status of a second disease. For example, if a patient has advanced diabetes then their status for a second disease—retinopathy—could be affected.

Phenotyping with EHR

Phenotyping as cohort identification

Cohort identification, namely identifying patients with or without a given disease, for example, type 2 diabetes mellitus, is a popular use of EHR33 called ‘EHR-based phenotyping’34 35 or ‘EHR-driven phenotyping’.36 37 In phenome-wide association studies,38 EHR-based phenotyping is used to identify patient cohorts that possess a phenotype, for example, diabetes mellitus, hypothyroidism, cataracts,39 before associating phenotypes with genetic markers.38 EHR-based phenotyping is also used for electronic prescreening to determine patients’ eligibility for clinical trials.40 41 Importantly, cohort identification existed before EHR and therefore EHR are not necessarily required.42 43 However, there are situations in which cohort identification would be impractical without utilizing EHR. This is particularly true for identifying cohorts with a rare disease or outcome. Using an integrated EHR, over 33 000 patients with HIV (a rare disease) were identified.44 A cohort of that size would be impractical to identify without the use of EHR. In general, EHR facilitate the process of cohort identification43 and often result in studies with greater power and lower cost;45 but they also possess their own unique set of challenges.46 47
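To make cohort identification concrete, the sketch below shows one possible rule-based selection of a type 2 diabetes cohort. It is an illustration under stated assumptions rather than any of the published algorithms cited above: the record layout, the ‘two of three evidence sources’ rule, the 6.5% hemoglobin A1C cut-off and the medication list are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one patient's EHR data."""
    patient_id: str
    icd9_codes: set = field(default_factory=set)
    hba1c_values: list = field(default_factory=list)  # percent
    medications: set = field(default_factory=set)


def has_t2dm_evidence(rec, hba1c_threshold=6.5, min_sources=2):
    """Return True when enough independent evidence sources agree."""
    # ICD-9 250.x0 / 250.x2: fifth digit 0 or 2 denotes type 2 or unspecified.
    has_code = any(
        len(c) == 6 and c.startswith("250.") and c[-1] in "02"
        for c in rec.icd9_codes
    )
    has_lab = any(v >= hba1c_threshold for v in rec.hba1c_values)
    has_med = "metformin" in rec.medications  # illustrative list of one drug
    return sum([has_code, has_lab, has_med]) >= min_sources


records = [
    PatientRecord("p1", {"250.00"}, [7.2], {"metformin"}),
    PatientRecord("p2", {"250.01"}, [8.0], {"insulin"}),  # type 1 code
    PatientRecord("p3", set(), [5.4], set()),
    PatientRecord("p4", {"250.02"}, [6.1], {"metformin"}),
]

cohort = [r.patient_id for r in records if has_t2dm_evidence(r)]
print(cohort)
```

Requiring agreement between sources is one simple way to reduce the effect of documentation variance on cohort membership; a real algorithm would need validation against chart review.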

Phenotyping as disease subtype discovery

Another use case is identifying novel disease subtypes using clinical data from EHR. This ‘disease subtyping’ depends on the identification of a higher-level ‘parent’ phenotype, that is, the disease. Before EHR, identifying disease subtypes was challenging42 48–51 and in many cases it required glaring phenotypic differences. For example, types of diabetes were initially distinguished by age, namely juvenile (type 1) and adult (type 2) diabetes. Over time, these categories were made more descriptive with insulin dependent (type 1) and non-insulin dependent (type 2), which eventually gave way to ‘type 1’ and ‘type 2’. Therefore, disease subtyping was possible before EHR using clinical data (eg, observations, chart review). However, it was challenging as initial observations (juvenile vs adult) regarding a disease subtype were often incomplete. In genetics, disease subtyping often occurs by identifying genetic or molecular ‘biomarkers’ (ie, disease subtype) that segregate a diseased population into subpopulations.24 52 53 An example of EHR-based clinical disease subtyping is identifying a patient subpopulation with an interesting glucose value pattern (ie, disease subtype) among patients with diabetes (ie, disease).54

Verotype: the patient's ‘true’ type

Learning from Johannsen's definition of biotype as a group of organisms with the same phenotype and genotype,1 we introduce a new concept called ‘verotype’ from the Latin word vere, meaning truly or actually. This higher level ‘type’ defines a unique combination of genotype, phenotype, and disease subtype for an individual. We named it verotype because it indicates the true subpopulation that a patient belongs to, for example, diabetic with unique glucose pattern,54 and is related to the ‘true patient state’.32

  • Verotype: A group of organisms characterized by having the same phenotype, genotype and clinical disease subtype (eg, phenotype, breast cancer; genotype, BRCA1; clinical disease subtype, estrogen response pattern).

An example of what we would consider a complete verotype is a group of patients with type 2 diabetes mellitus (phenotype), a shared daily glucose pattern (clinical disease subtype), and identical genetic risk factor (genotype). The phenotype can be identified either using a non-EHR approach (eg, chart review, diagnostic criteria, clinical examination)55 or an EHR-based approach (eg, cohort identification algorithm).40 56 Figure 2 illustrates how the genotype contributes to the phenotype, which in turn contributes to the clinical disease subtype. Each unique combination of the three contributes to the patient's overall ‘verotype’. We hypothesize that identifying the entire ‘verotype’ will promote precision medicine57 as it characterizes not only the patient's disease, but also other important clinical characteristics (eg, post-prandial and fasting glycemia—a clinical disease subtype), and genetic underpinnings related to the disease.
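The relationship between biotype and verotype can be expressed as a simple grouping operation. The sketch below is an assumption-laden illustration: the `Verotype` tuple, the patient identifiers and the label strings are hypothetical, and in practice each of the three components would itself come from a genotyping assay, a phenotyping algorithm, or a subtype classifier.

```python
from collections import defaultdict
from typing import NamedTuple


class Verotype(NamedTuple):
    """Hypothetical triple: phenotype, genotype, clinical disease subtype."""
    phenotype: str
    genotype: str
    clinical_subtype: str


def group_by(patients, key):
    """Group patient ids by key(verotype)."""
    groups = defaultdict(list)
    for patient_id, v in patients.items():
        groups[key(v)].append(patient_id)
    return dict(groups)


patients = {
    "p1": Verotype("type 2 diabetes", "TCF7L2 risk allele", "elevated fasting glucose"),
    "p2": Verotype("type 2 diabetes", "TCF7L2 risk allele", "elevated fasting glucose"),
    "p3": Verotype("type 2 diabetes", "TCF7L2 risk allele", "elevated post-prandial glucose"),
    "p4": Verotype("type 2 diabetes", "no known risk allele", "elevated fasting glucose"),
}

# Biotype (Johannsen): same phenotype and genotype.
biotypes = group_by(patients, key=lambda v: (v.phenotype, v.genotype))
# Verotype: same phenotype, genotype and clinical disease subtype.
verotypes = group_by(patients, key=lambda v: v)

print(len(biotypes), len(verotypes))
```

In this toy data, p1, p2 and p3 share a biotype, but only p1 and p2 share a verotype: adding the clinical disease subtype splits a biotype into finer groups of more similar patients.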

Figure 2.


A semantic network illustrating the relationship between genotype, phenotype, clinical disease subtype, biotype and verotype. Places where electronic health records can be utilized are highlighted in light orange.

Reverse engineering clinical disease subtypes

The conventional approach

Before large-scale data mining of EHR, clinical disease subtyping was performed by collecting clinical data from patients with a given disease, for example, Parkinson's disease (PD).58 Patients were then clustered based on their observed clinical findings58 and statistically significant clusters were considered PD subtypes.58 Afterwards, the relationship between each PD subtype and outcome had to be established and verified.59 Using EHR enables researchers to develop algorithms for disease subtype classification,48 60 and to identify clinical features associated with a disease subtype, for example, estrogen/progesterone negative breast cancer.61

The high-throughput approach

EHR offer the opportunity to develop novel methods for investigating new disease characteristics using clinical data,32 for example, laboratory values.62 63 Furthermore, novel disease subtypes identified from EHR have the potential for predicting patient outcomes64 more accurately than predefined subtypes. This is particularly true for poorly characterized mental diseases,65 for example, PD,58 depression50 and amyotrophic lateral sclerosis.49 51

A proposal to apply a network-based approach to clinical disease subtyping

This led us to look to ‘network medicine’66 for a solution. Network medicine involves integrating knowledge from various sources including genes, biological pathways, protein–protein interaction complexes, and so on, to identify tailored biomarkers for disease treatment.66 Network approaches were used in genetics to identify regulatory pathways from gene expression.67 Not limited to genetics, some researchers have applied network approaches to demonstrate that social influences contributing to the development of obesity are as strong as genetic factors.68 Leveraging this expertise,69 we can apply network medicine methodologies to EHR32 to reverse engineer67 disease subtypes by integrating various data sources within EHR (eg, laboratory results, medications, visits), and linking them to external sources (eg, PubMed). To achieve this, we can treat each medical entity70 71 as a ‘marker’ for a clinical disease subtype. These markers can then be associated with various diseases or disease severities (eg, chronic, acute), using a high-throughput approach similar to those used in genetics.24 72–74 We can use laboratory values (for laboratory test entities),62 63 dosage level (for medication entities) or the frequency of specialist visits (for specialist entities) and so on. These EHR markers and their expression values are related to typical gene expression data used in genetics studies.75 76 Similar work was performed using the National Health and Nutrition Examination Survey.77
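As a rough illustration of treating EHR entities as ‘markers’ with expression values, the sketch below builds a small marker network in the style of a gene co-expression network. No particular network construction is specified in the text, so the choices here are assumptions: markers are linked when the absolute Pearson correlation of their values across patients reaches a threshold of 0.8, and the marker names and values are invented for the example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # constant marker: no defined correlation
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def marker_network(expression, threshold=0.8):
    """Return {(marker1, marker2): r} for strongly correlated marker pairs."""
    edges = {}
    for m1, m2 in combinations(sorted(expression), 2):
        r = pearson(expression[m1], expression[m2])
        if abs(r) >= threshold:
            edges[(m1, m2)] = r
    return edges


def degrees(edges):
    """Number of edges per marker; high-degree markers are candidate hubs."""
    deg = {}
    for m1, m2 in edges:
        deg[m1] = deg.get(m1, 0) + 1
        deg[m2] = deg.get(m2, 0) + 1
    return deg


# Hypothetical 'expression' values: one entry per patient, same patient order.
expression = {
    "hba1c": [5.0, 6.0, 7.0, 8.0],               # laboratory value (%)
    "metformin_mg": [0, 500, 1000, 1500],        # medication dosage
    "endocrinology_visits": [0, 1, 1, 3],        # specialist visit count
    "sodium": [140, 138, 141, 139],              # unrelated laboratory value
}

edges = marker_network(expression)
print(edges)
print(degrees(edges))
```

Here the three diabetes-related markers form a connected group while the unrelated laboratory value stays isolated, which is the kind of structure a network approach would look for at much larger scale.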

In genetics, network-based approaches were used78–81 to attain meaningful results because non-network-based association studies often lacked statistical power to analyze individual genes.82 Some network approaches search for hubs of interesting genes within a network,83 while others integrate various types of data (protein–protein interactions, gene expression) to find genes considered ‘important’ to the disease of interest.84 85 For clinical disease subtyping using EHR, a network approach would be useful to identify EHR ‘markers’ or combinations of EHR markers associated with a certain disease.

Figure 3 shows how each marker's expression pattern indicates a certain patient state: presence/absence of diabetes, presence/absence of difficult to manage diabetes and so on. If we integrate the analysis of EHR markers then a distinct diabetes pattern (clinical disease subtype) emerges. However, if markers are analyzed in isolation from one another then the resulting conclusion may differ drastically from the ‘true patient state’. For example, if only hemoglobin A1C is analyzed then it may appear that the patient does not have diabetes (figure 3), because hemoglobin A1C values can be misleading when the patient is being treated for diabetes. When EHR markers are integrated, a distinctive diabetes pattern emerges and the likelihood of the patient having diabetes depends on the expression of each marker. Integrating markers86 87 is applicable for both EHR-based phenotyping (eg, to identify that a patient has type 2 diabetes) and for clinical disease subtyping. Importantly, for a set of markers (or an integrated pattern of markers) to be considered a clinical disease subtype, it must be able to stratify the diseased population into a subpopulation with some distinguishing characteristics.58

Figure 3.


The expression of markers for electronic health record (EHR) entities can be used to enable clinical disease subtyping. The highly diverse types of ‘markers’ stored in the EHR, for example, laboratory test, medication, specialist visits, can be utilized to reverse engineer clinical disease subtypes using a flexible ‘expression’ approach based on the type of marker. Each pattern indicates the most likely patient state based on the marker's expression level. HbA1C, hemoglobin A1C.

However, because of the diversity among EHR entities, the type of ‘expression’ must be based on the entity type of the marker.88 For instance, EHR entities contain Boolean values (eg, presence/absence of International Classification of Diseases, Ninth Revision codes), numerical values (eg, hemoglobin A1C) and nominal values (eg, medication name). A network-based approach would be useful because each marker's expression pattern would then either increase or decrease the probability that a patient belongs to a certain disease subpopulation (figure 3). Once a novel subtype is characterized, the expression of EHR markers could be used to identify the subtype.
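The idea that each marker's expression raises or lowers the probability of belonging to a subpopulation can be sketched with a simple additive log-odds model. This is a stand-in chosen for brevity, not the network method itself, and every number in it is an assumption: the reference range, the weights and the bias are hypothetical and would have to be learned from data.

```python
import math


def expression(marker_type, value, reference=None):
    """Map a raw EHR value to an expression level in [0, 1] by entity type."""
    if marker_type == "boolean":      # eg, presence/absence of a code
        return 1.0 if value else 0.0
    if marker_type == "numeric":      # eg, a laboratory value
        low, high = reference
        return min(1.0, max(0.0, (value - low) / (high - low)))
    if marker_type == "nominal":      # eg, a medication name
        return 1.0 if value in reference else 0.0
    raise ValueError(f"unknown marker type: {marker_type}")


# Hypothetical marker definitions: (type, reference, weight in log-odds).
MARKERS = {
    "hba1c": ("numeric", (5.0, 9.0), 3.0),
    "medication": ("nominal", {"metformin", "insulin"}, 2.0),
    "icd9_250": ("boolean", None, 2.0),
}
BIAS = -2.0  # hypothetical prior log-odds of diabetes


def diabetes_probability(observed):
    """Combine the expression of the observed markers into one probability."""
    logit = BIAS
    for name, value in observed.items():
        marker_type, reference, weight = MARKERS[name]
        logit += weight * expression(marker_type, value, reference)
    return 1.0 / (1.0 + math.exp(-logit))


# A treated patient: hemoglobin A1C looks near normal on its own.
p_alone = diabetes_probability({"hba1c": 6.0})
p_integrated = diabetes_probability(
    {"hba1c": 6.0, "medication": "metformin", "icd9_250": True}
)
print(round(p_alone, 2), round(p_integrated, 2))
```

With these illustrative weights, hemoglobin A1C alone suggests that diabetes is unlikely, while the integrated markers point strongly toward it, mirroring the treated-patient example in figure 3.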

Recommendations

We envision that more effective and personalized disease treatment regimens will be possible when each patient's genotype, phenotype, and clinical disease subtype information is integrated89 to form the patient's complete ‘verotype’. We posit that a network approach would be useful for integrating genetic and EHR markers because it would reduce the computational complexity90 introduced by having multiple EHR markers and multiple genes (polygenic) associated with one disease. When a patient's true disease subtype is known, clinicians can devise a more effective and personalized treatment plan for that patient. Identifying the entire verotype can also benefit future outcomes researchers by allowing the efficacy of two treatments to be compared within a subset of truly related patients.

Conclusions

We break down the concept of ‘EHR phenotyping’ by relating it to the definitions given by Johannsen.1 We relate it to cohort identification and disease subtyping. We also coin a new term, verotype, to group patients who have the same genotype, phenotype, and clinical disease subtype. We recommend using a patient's verotype to develop personalized medical treatment regimens. Finally, we outline the potential for a network-based approach to reverse engineer clinical disease subtypes using EHR markers.

Acknowledgments

The authors would like to thank: Gregory Hruby, Drashko Nakikj, Silis Jiang, Junfeng Gao, Nicole Weiskopf and Riccardo Miotto for useful discussions on the topic of phenotyping during various laboratory meetings.

Footnotes

Contributors: MRB reviewed and compiled literature from various domains, developed ideas and wrote the manuscript. GH, YS and WKC provided useful insights, discussion and feedback. CW is principal investigator, developed ideas, and wrote the manuscript. All authors read and approved the final manuscript.

Funding: The research described was supported by grant R01LM009886 from the National Library of Medicine, grant U01 HG006380 from the Human Genome Research Institute, and grant UL1 TR000040 from the National Center for Advancing Translational Sciences.

Competing interests: None.

Provenance and peer review: Not commissioned; externally peer reviewed.

References

  • 1.Johannsen W. The genotype conception of heredity. Am Nat 1911;45:129–59 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Mendel G. Versuche über Plflanzenhybriden. Verhandlungen des naturforschenden Vereines in Brünn 1866;4:3–47 [Google Scholar]
  • 3.Mendel G, Bateson W, Blumberg R.1996. Experiments in Plant Hybridization (1865) translated into English (1901) http://www.esp.org/foundations/genetics/classical/gm-65.pdf.
  • 4.Watson JD, Crick FC. Genetical implications of the structure of deoxyribonucleic acid. JAMA 1993;269:1967–9 [PubMed] [Google Scholar]
  • 5.Arumugam M, Raes J, Pelletier E, et al. Enterotypes of the human gut microbiome. Nature 2011;473:174–80 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Almasy L, Blangero J. Endophenotypes as quantitative risk factors for psychiatric disease: rationale and study design. Am J Med Genet 2001;105:42–4 [PubMed] [Google Scholar]
  • 7.Dick D, Jones K, Saccone N, et al. Endophenotypes successfully lead to gene identification: results from the collaborative study on the genetics of alcoholism. Behav Genet 2006;36:112–26 [DOI] [PubMed] [Google Scholar]
  • 8.Gur RE, Nimgaonkar VL, Almasy L, et al. Neurocognitive endophenotypes in a multiplex multigenerational family study of schizophrenia. Am J Psychiatry 2007;164:813–19 [DOI] [PubMed] [Google Scholar]
  • 9.Altshuler D, Daly MJ, Lander ES. Genetic mapping in human disease. Science 2008;322:881–8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Robinson PN. Deep phenotyping for precision medicine. Hum Mutat 2012;33:777–80 [DOI] [PubMed] [Google Scholar]
  • 11.Jha AK, Ferris TG, Donelan K, et al. How common are electronic health records in the United States? A summary of the evidence. Health Aff 2006;25:w496–507 [DOI] [PubMed] [Google Scholar]
  • 12.Sturm RA, Frudakis TN. Eye colour: portals into pigmentation genes and ancestry. Trends Genet 2004;20:327–32 [DOI] [PubMed] [Google Scholar]
  • 13.Wiersinga WM, Prummel MF, Mourits MP, et al. Classification of the eye changes of Graves’ disease. Thyroid 1991;1:357–60 [DOI] [PubMed] [Google Scholar]
  • 14.Imesch PD, Wallow IHL, Albert DM. The color of the human eye: a review of morphologic correlates and of some conditions that affect iridial pigmentation. Surv Ophthalmol 1997;41(Suppl. 2):S117–23 [DOI] [PubMed] [Google Scholar]
  • 15.Vera Cruz EM, Brown CL. The influence of social status on the rate of growth, eye color pattern and insulin-like growth factor-I gene expression in Nile tilapia, Oreochromis niloticus. Horm Behav 2007;51:611–19 [DOI] [PubMed] [Google Scholar]
  • 16.Clark HL. Biotypes and phylogeny. Am Nat 1912;46:139–50 [Google Scholar]
  • 17.Shull G. Genetic definitions in the new standard dictionary. Am Nat 1915;49:52–9 [Google Scholar]
  • 18.Downie D. Baubles, bangles, and biotypes: a critical review of the use and abuse of the biotype concept. J Insect Sci 2010;10:1–18 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Stjernschantz JW, Albert DM, Hu D-N, et al. Mechanism and clinical significance of prostaglandin-induced iris pigmentation. Surv Ophthalmol 2002;47(Suppl. 1):S162–S75 [DOI] [PubMed] [Google Scholar]
  • 20.White D, Rabago-Smith M. Genotype-phenotype associations and human eye color. J Hum Genet 2011;56:5–7 [DOI] [PubMed] [Google Scholar]
  • 21.Nguyen DX, Massague J. Genetic determinants of cancer metastasis. Nat Rev Genet 2007;8:341–52 [DOI] [PubMed] [Google Scholar]
  • 22.Banerjee P, Choi B, Shahine LK, et al. Deep phenotyping to predict live birth outcomes in in vitro fertilization. Proc Natl Acad Sci U S A 2010;107:13570–5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Tracy RP. ‘Deep phenotyping’: characterizing populations in the era of genomics and systems biology. Curr Opin Lipidol 2008;19:151–7 [DOI] [PubMed] [Google Scholar]
  • 24.Köbel M, Kalloger SE, Boyd N, et al. Ovarian carcinoma subtypes are different diseases: implications for biomarker studies. PLoS Med 2008;5:e232 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Froguel P, Zouali H, Vionnet N, et al. Familial hyperglycemia due to mutations in glucokinase—definition of a subtype of diabetes mellitus. N Engl J Med 1993;328:697–702 [DOI] [PubMed] [Google Scholar]
  • 26.Fitzpatrick AL, Powe NR, Cooper LS, et al. Barriers to health care access among the elderly and who perceives them. Am J Public Health 2004;94:1788–94 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Kennedy J, Rhodes K, Walls CA, et al. Access to emergency care: restricted by long waiting times and cost and coverage concerns. Ann Emerg Med 2004;43:567–73 [DOI] [PubMed] [Google Scholar]
  • 28.Tabak N, Bar-Tal Y, Cohen-Mansfield J. Clinical decision making of experienced and novice nurses. West J Nurs Res 1996;18:534–47 [DOI] [PubMed] [Google Scholar]
  • 29.Clark C, Good K, Jezierny L, et al. Identifying smokers with a medical extraction system. J Am Med Inform Assoc 2008;15:36–9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Uzuner Ö, Goldstein I, Luo Y, et al. Identifying patient smoking status from medical discharge records. J Am Med Inform Assoc 2008;15:14–24 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Wiley LK, Shah A, Xu H, et al. ICD-9 tobacco use codes are effective identifiers of smoking status. J Am Med Inform Assoc 2013;20:652–8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Hripcsak G, Albers DJ. Next-generation phenotyping of electronic health records. J Am Med Inform Assoc 2013;20:117–21 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Turchin A, Kohane IS, Pendergrass ML. Identification of patients with diabetes from the text of physician notes in the electronic medical record. Diabetes Care 2005;28:1794–5 [DOI] [PubMed] [Google Scholar]
  • 34.Overby C, Weng C, Haerian K, et al. Evaluation considerations for EHR-based phenotyping algorithms: a case study for drug induced liver injury. AMIA Summits on Translational Science Proceedings 2013:130–4 [PMC free article] [PubMed] [Google Scholar]
  • 35.Peissig PL, Rasmussen LV, Berg RL, et al. Importance of multi-modal approaches to effectively identify cataract cases from electronic health records. J Am Med Inform Assoc 2012;19:225–34 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Yao L, Zhang Y, Li Y, et al. Electronic health records: implications for drug discovery. Drug Discov Today 2011;16:594–9 [DOI] [PubMed] [Google Scholar]
  • 37.Thompson WK, Rasmussen LV, Pacheco JA, et al. An evaluation of the NQF quality data model for representing electronic health record driven phenotyping algorithms. AMIA Annual Symposium Proceedings 2012:911–20 [PMC free article] [PubMed] [Google Scholar]
  • 38.Denny JC, Ritchie MD, Basford MA, et al. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene–disease associations. Bioinformatics 2010;26:1205–10 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Kho AN, Pacheco JA, Peissig PL, et al. Electronic medical records for genetic research: results of the eMERGE consortium. Sci Transl Med 2011;3:79re1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Weng C, Batres C, Borda T, et al. A real-time screening alert improves patient recruitment efficiency. AMIA Annual Symposium Proceedings 2011:1489–98 [PMC free article] [PubMed] [Google Scholar]
  • 41.Li L, Chase HS, Patel CO, et al. Comparing ICD9-encoded diagnoses and NLP-processed discharge summaries for clinical trials pre-screening: a case study. AMIA Annual Symposium Proceedings 2008:404–8 [PMC free article] [PubMed] [Google Scholar]
  • 42.Pipberger HV, Goldman MJ, Littmann D, et al. Correlations of the orthogonal electrocardiogram and vectorcardiogram with consitutional variables in 518 normal men. Circulation 1967;35:536–51 [DOI] [PubMed] [Google Scholar]
  • 43.Kohane IS. Automating the study of population variation of electrocardiographic features. Circulation 2013;127:1357–8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Fultz SL, Skanderson M, Mole LA, et al. Development and verification of a “virtual” cohort using the national VA health information system. Medical Care 2006;44(8 Suppl. 2):S25–30 [DOI] [PubMed] [Google Scholar]
  • 45.Kohane IS. Using electronic health records to drive discovery in disease genomics. Nat Rev Genet 2011;12:417–28 [DOI] [PubMed] [Google Scholar]
  • 46.Rea S, Pathak J, Savova G, et al. Building a robust, scalable and standards-driven infrastructure for secondary use of EHR data: the SHARPn project. J Biomed Inform 2012;45:763–71 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Chute CG, Pathak J, Savova GK, et al. The SHARPn project on secondary use of electronic medical record data: progress, plans, and possibilities. AMIA Annual Symposium Proceedings 2011:248–56 [PMC free article] [PubMed] [Google Scholar]
  • 48.Marras C, Lang A. Parkinson's disease subtypes: lost in translation? J Neurol Neurosurg Psychiatry 2013;84:409–15 [DOI] [PubMed] [Google Scholar]
  • 49.Camu W, Billiard M, Baldy-Moulinier M. Fasting plasma and CSF amino acid levels in amyotrophic lateral sclerosis: a subtype analysis. Acta Neurol Scand 1993;88:51–5 [DOI] [PubMed] [Google Scholar]
  • 50.Parker G, Wilhelm K, Mitchell P, et al. Subtyping depression: testing algorithms and identification of a tiered model. J Nerv Ment Dis 1999;187:610–17 [DOI] [PubMed] [Google Scholar]
  • 51. Murphy J, Henry R, Lomen-Hoerth C. Establishing subtypes of the continuum of frontal lobe impairment in amyotrophic lateral sclerosis. Arch Neurol 2007;64:330–4
  • 52. Hoshida Y, Brunet J-P, Tamayo P, et al. Subclass mapping: identifying common subtypes in independent disease data sets. PLoS One 2007;2:e1195
  • 53. Ritchie MD, Denny JC, Zuvich RL, et al. Genome- and phenome-wide analysis of cardiac conduction identifies markers of arrhythmia risk. Circulation 2013;127:1377–85
  • 54. Albers DJ, Hripcsak G, Schmidt M. Population physiology: leveraging electronic health record data to understand human endocrine dynamics. PLoS One 2012;7:e48058
  • 55. Group DS. Will new diagnostic criteria for diabetes mellitus change phenotype of patients with diabetes? Reanalysis of European epidemiological data. BMJ 1998;317:371–5
  • 56. Navaneethan SD, Jolly SE, Schold JD, et al. Development and validation of an electronic health record-based chronic kidney disease registry. Clin J Am Soc Nephrol 2011;6:40–9
  • 57. National Research Council. Committee on A Framework for Developing a New Taxonomy of Disease. Toward precision medicine: building a knowledge network for biomedical research and a new taxonomy of disease. Washington, DC: National Academies Press, 2011
  • 58. Lewis SJG, Foltynie T, Blackwell AD, et al. Heterogeneity of Parkinson's disease in the early clinical stages using a data driven approach. J Neurol Neurosurg Psychiatry 2005;76:343–8
  • 59. Selikhova M, Williams DR, Kempster PA, et al. A clinico-pathological study of subtypes in Parkinson's disease. Brain 2009;132:2947–57
  • 60. van Rooden SM, Colas F, Martinez-Martin P, et al. Clinical subtypes of Parkinson's disease. Mov Disord 2011;26:51–8
  • 61. Colleoni M, Rotmensz N, Robertson C, et al. Very young women (<35 years) with operable breast cancer: features of disease at presentation. Ann Oncol 2002;13:273–9
  • 62. Chen DP, Weber SC, Constantinou PS, et al. Clinical arrays of laboratory measures, or “clinarrays”, built from an electronic health record enable disease subtyping by severity. AMIA Annual Symposium Proceedings 2007:115–19
  • 63. Chen D, Dudley J, Butte A. Latent physiological factors of complex human diseases revealed by independent component analysis of clinarrays. BMC Bioinformatics 2010;11(Suppl. 9):S4
  • 64. Jensen PB, Jensen LJ, Brunak S. Mining electronic health records: towards better research applications and clinical care. Nat Rev Genet 2012;13:395–405
  • 65. John ER, Prichep LS, Almas M. Subtyping of psychiatric patients by cluster analysis of QEEG. Brain Topogr 1992;4:321–6
  • 66. Barabasi A-L, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat Rev Genet 2011;12:56–68
  • 67. Basso K, Margolin AA, Stolovitzky G, et al. Reverse engineering of regulatory networks in human B cells. Nat Genet 2005;37:382–90
  • 68. Barabási A-L. Network medicine—from obesity to the “diseasome”. N Engl J Med 2007;357:404–7
  • 69. Zanzoni A, Soler-López M, Aloy P. A network medicine approach to human disease. FEBS Lett 2009;583:1759–65
  • 70. Cimino JJ, Hripcsak G, Johnson SB, et al. Designing an introspective, multipurpose, controlled medical vocabulary. AMIA Annual Symposium Proceedings 1989:513–18
  • 71. Cimino JJ. From data to knowledge through concept-oriented terminologies: experience with the medical entities dictionary. J Am Med Inform Assoc 2000;7:288–97
  • 72. Phillips HS, Kharbanda S, Chen R, et al. Molecular subclasses of high-grade glioma predict prognosis, delineate a pattern of disease progression, and resemble stages in neurogenesis. Cancer Cell 2006;9:157–73
  • 73. Maher EA, Brennan C, Wen PY, et al. Marked genomic differences characterize primary and secondary glioblastoma subtypes and identify two distinct molecular and clinical secondary glioblastoma entities. Cancer Res 2006;66:11502–13
  • 74. Desmedt C, Haibe-Kains B, Wirapati P, et al. Biological processes associated with breast cancer clinical outcome depend on the molecular subtypes. Clin Cancer Res 2008;14:5158–65
  • 75. Welsh JB, Zarrinkar PP, Sapinoso LM, et al. Analysis of gene expression profiles in normal and neoplastic ovarian tissue samples identifies candidate molecular markers of epithelial ovarian cancer. Proc Natl Acad Sci U S A 2001;98:1176–81
  • 76. Ramaswamy S, Tamayo P, Rifkin R, et al. Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci U S A 2001;98:15149–54
  • 77. Gordon MM, Moser AM, Rubin E. Unsupervised analysis of classical biomedical markers: robustness and medical relevance of patient clustering using bioinformatics tools. PLoS One 2012;7:e29578
  • 78. Gilman SR, Iossifov I, Levy D, et al. Rare de novo variants associated with autism implicate a large functional network of genes involved in formation and function of synapses. Neuron 2011;70:898–907
  • 79. Califano A, Butte AJ, Friend S, et al. Leveraging models of cell regulation and GWAS data in integrative network-based association studies. Nat Genet 2012;44:841–7
  • 80. Chen JL, Li J, Stadler WM, et al. Protein-network modeling of prostate cancer gene signatures reveals essential pathways in disease recurrence. J Am Med Inform Assoc 2011;18:392–402
  • 81. Li H, Lee Y, Chen JL, et al. Complex-disease networks of trait-associated single-nucleotide polymorphisms (SNPs) unveiled by information theory. J Am Med Inform Assoc 2012;19:295–305
  • 82. Teo YY. Common statistical issues in genome-wide association studies: a review on power, data quality control, genotype calling and population structure. Curr Opin Lipidol 2008;19:133–43
  • 83. Benson M, Breitling R. Network theory to understand microarray studies of complex diseases. Curr Mol Med 2006;6:695–701
  • 84. Sieberts SK, Schadt EE. Moving toward a system genetics view of disease. Mamm Genome 2007;18:389–401
  • 85. Wen Z, Liu Z-P, Liu Z, et al. An integrated approach to identify causal network modules of complex diseases with application to colorectal cancer. J Am Med Inform Assoc 2013;20:659–67
  • 86. McCormick RK. Osteoporosis: integrating biomarkers and other diagnostic correlates into the management of bone fragility. Altern Med Rev 2007;12:113–45
  • 87. Chen DP, Weber SC, Constantinou PS, et al. Novel integration of hospital electronic medical records and gene expression measurements to identify genetic markers of maturation. Pacific Symposium on Biocomputing 2008;13:243–54
  • 88. Schmidt CW. Signs of the times: biomarkers in perspective. Environ Health Perspect 2006;114:A700–5
  • 89. Nevins JR, Huang ES, Dressman H, et al. Towards integrated clinico-genomic models for personalized medicine: combining gene expression signatures and clinical factors in breast cancer outcomes prediction. Hum Mol Genet 2003;12(Suppl. 2):R153–R7
  • 90. Cheng J, Bell DA, Liu W. Learning belief networks from data: an information theory based approach. Proceedings of the sixth international conference on Information and knowledge management, Las Vegas, Nevada, USA. 266920: ACM, 1997:325–31

Articles from Journal of the American Medical Informatics Association : JAMIA are provided here courtesy of Oxford University Press
