Abstract
Electronic Health Record (EHR) systems typically define laboratory test results using the Logical Observation Identifiers Names and Codes (LOINC) and can transmit them using Fast Healthcare Interoperability Resources (FHIR) standards. LOINC has not yet been semantically integrated with computational resources for phenotype analysis. Here, we provide a method for mapping LOINC-encoded laboratory test results transmitted in FHIR standards to Human Phenotype Ontology (HPO) terms. We annotated the medical implications of 2923 commonly used laboratory tests with HPO terms. Using these annotations, our software assesses laboratory test results and converts each result into an HPO term. We validated our approach with EHR data from 15,681 patients with respiratory complaints and identified known biomarkers for asthma. Finally, we provide a freely available SMART on FHIR application that can be used within EHR systems. Our approach allows readily available laboratory tests in EHRs to be reused for deep phenotyping and exploits the hierarchical structure of HPO to integrate distinct tests that have comparable medical interpretations for association studies.
INTRODUCTION
Electronic health records (EHRs) have been widely adopted in US hospitals since the Health Information Technology for Economic and Clinical Health Act (HITECH) was passed in 2009, and offer an unprecedented opportunity to accelerate translational research because of advantages of scale and cost efficiency as compared with traditional cohort-based studies.1 In particular, EHRs contain rich phenotype information that can be utilized to stratify diseases and to develop hypotheses. For instance, phenome-wide association studies (PheWAS) can exploit EHR data to define case–control cohorts for disease diagnoses or laboratory traits and then analyze associations with hundreds of thousands of genetic variants.2–4 Despite the great potential of EHR data, patient phenotyping from EHRs is still challenging because the phenotype information is distributed in many EHR locations (laboratories, notes, problem lists, imaging data, etc.) and because EHRs have vastly different structures across sites. This lack of integration represents a substantial barrier to widespread use of EHR data in translational research.
Laboratory tests provide a critical resource for phenotype extraction. Deep phenotyping, i.e., comprehensive and precise phenotyping of individual disease manifestations, is an essential component of precision medicine and could potentially extend the reach of PheWAS studies.5,6 Laboratory tests have broad applicability for translational research, but EHR-based research using laboratory data has been challenging because of the diversity of the tests and the lack of standardization in reporting their results. For instance, some tests measure nitrite level in urine using an automated machine, whereas others use a test strip. Some report the value in mg/dL, whereas others report a qualitative value of positive/negative. If any of these tests were abnormal, the medical interpretation would be that nitrituria is present, yet current informatics frameworks do not easily support such inferences. Therefore, substantial challenges exist for standardization and integration of laboratory data for deep phenotyping and EHR-based translational research.
Recent advances in the standardization of EHR systems and phenotype ontologies make it feasible to extract patient phenotypes from laboratory tests at a large scale. Fast Healthcare Interoperability Resources (FHIR) was introduced in 2013 and provides a standardized interface to individual EHR systems for healthcare-related data.7 FHIR separates healthcare-related data into granular components called “resources”, such as observation, medication, patient identity, and insurance claims. Resources have a standard definition and associated semantic bindings, so they can be computationally integrated even when they are created by different methods and organizations. Laboratory tests, encoded as observations in FHIR, are uniquely identified with Logical Observation Identifiers Names and Codes (LOINC), which is a universal code system that defines various kinds of clinical laboratory tests and other measurements (~86,000 entries).8 The outcome of a FHIR observation can be represented by a term in the Human Phenotype Ontology (HPO), which is a logically defined vocabulary for describing medically relevant abnormal phenotypes.9 The HPO has become the de facto standard for computational phenotype analysis in genomics and rare disease.10–12 The HPO currently contains 14,184 terms (February, 2019) including a comprehensive representation of laboratory abnormalities such as hyperglycemia, thrombocytopenia, nitrituria, and increased urine alpha-ketoglutarate concentration. Here, we present a computational method that semantically harmonizes FHIR, LOINC, and HPO. The software rolls up LOINC terms for tests whose outcomes are medically comparable into common categories and interprets the outcome as HPO terms, thereby automatically extracting detailed, deep phenotypic profiles of laboratory results for downstream studies.
RESULTS
Overview of strategy
We present an approach for mapping the outcomes of laboratory tests to HPO terms, where the tests are encoded in EHRs with LOINC terms and the test results are represented by FHIR Observation resources. A LOINC term by itself does not specify the outcome of a test. But if the outcome of a test (such as “high” or “low”) and the nature of the test are known, we can then infer the phenotypic abnormality. For example, LOINC 32710-6 “Nitrite [Presence] in Urine” together with the outcome “positive” implies the phenotypic abnormality Nitrituria (HP:0031812).
LOINC-coded laboratory tests can be grouped broadly into three categories: those with a quantitative outcome (Qn), those with an ordered categorical outcome (ordinal, or Ord), and those with an unordered categorical outcome (nominal, or Nom). A quantitative test for an analyte has a normal range, and there are three types of mappings depending on the result of the test: L (lower than normal), N (normal), and H (higher than normal). Take, for instance, a test for the concentration of potassium in the blood (LOINC:6298-4, Fig. 1a). If the result is high, our procedure infers the corresponding HPO term for Hyperkalemia (HP:0002153). Analogously, a low result is mapped to Hypokalemia (HP:0002900). The HPO is an ontology of abnormal phenotypes, and thus there is no term that specifically represents a normal test result. However, computational analysis can record negated HPO terms, and the normal test result is represented as NOT Abnormal blood potassium concentration (HP:0011042).
Ordinal tests can have a series of ordered outcomes. The majority of the ordinal LOINC tests were mapped to two possible outcomes, POS (positive) or NEG (negative). For instance, the result of the test Nitrite in urine by test strip can be positive (present) or negative (absent) (Fig. 1a). If present, then our approach infers the HPO term Nitrituria (HP:0031812); if absent, our approach infers NOT Nitrituria (HP:0031812).
Nominal tests have a series of outcomes that lack a natural ordering. Yet, some nominal result values are considered abnormal. One example is LOINC 5778-6, color of urine. Currently, nine abnormal results of this test are mapped to the nine child terms of abnormal urinary color (HP:0012086), including red urine (HP:0040318) and dark urine (HP:0040319).
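The three kinds of mapping described above can be pictured as a lookup table keyed by the LOINC code and the test outcome. The following is a minimal sketch in Python, not the authors' implementation: the names `HpoAnnotation`, `LOINC2HPO`, `to_hpo`, and `describe` are hypothetical, and the outcome key `"red"` used for the nominal test is an illustrative assumption. The LOINC codes and HPO identifiers are the ones quoted in the text.

```python
from typing import NamedTuple, Optional


class HpoAnnotation(NamedTuple):
    hpo_id: str
    label: str
    negated: bool  # True means "NOT <term>", used for normal or negative results


# Toy mapping library: (LOINC code, outcome) -> HPO annotation.
LOINC2HPO = {
    # Quantitative (Qn): potassium in blood
    ("6298-4", "H"): HpoAnnotation("HP:0002153", "Hyperkalemia", False),
    ("6298-4", "L"): HpoAnnotation("HP:0002900", "Hypokalemia", False),
    ("6298-4", "N"): HpoAnnotation(
        "HP:0011042", "Abnormal blood potassium concentration", True
    ),
    # Ordinal (Ord): nitrite presence in urine
    ("32710-6", "POS"): HpoAnnotation("HP:0031812", "Nitrituria", False),
    ("32710-6", "NEG"): HpoAnnotation("HP:0031812", "Nitrituria", True),
    # Nominal (Nom): color of urine; the key "red" is an assumed outcome label
    ("5778-6", "red"): HpoAnnotation("HP:0040318", "Red urine", False),
}


def to_hpo(loinc_code: str, outcome: str) -> Optional[HpoAnnotation]:
    """Return the annotation for a test outcome, or None if it is unmapped."""
    return LOINC2HPO.get((loinc_code, outcome))


def describe(annotation: HpoAnnotation) -> str:
    """Render an annotation as text, prefixing negated terms with NOT."""
    prefix = "NOT " if annotation.negated else ""
    return f"{prefix}{annotation.label} ({annotation.hpo_id})"


if __name__ == "__main__":
    for key in [("6298-4", "H"), ("6298-4", "N"), ("32710-6", "NEG")]:
        print(key, "->", describe(to_hpo(*key)))
```

Keeping the negation flag next to the term, rather than inventing "normal" terms, mirrors the point made above that the HPO only contains abnormal phenotypes.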
A LOINC to HPO mapping library
We have mapped 2923 LOINC terms to HPO terms. In all, 80.4% of the mapped LOINC tests are Qn, 18.8% Ord, and 0.8% Nom (Fig. 2a). Taken together, these LOINC terms mapped to a total of 719 distinct HPO terms. We analyzed the distribution of the number of distinct LOINC terms that were mapped to an individual HPO term. In 54.8% of the cases, two or more LOINC terms are mapped to the same HPO term (mean 7.5) (Fig. 2b), reflecting the fact that multiple laboratory tests (and associated LOINC terms) have outcomes that we consider to have an equivalent clinical interpretation.
Algorithm for converting LOINC-coded laboratory tests into HPO-coded phenotypes
We designed an algorithm that inspects elements of a FHIR resource for laboratory tests and converts the outcome into an HPO term. A standard FHIR resource for laboratory tests (a FHIR Observation) contains patient information, test identification, test result, normal reference ranges, and interpretations (Fig. 1b). The algorithm compares the numerical result with the normal reference ranges to assign an interpretation code such as “L” or “POS” (Table 1), or makes use of the interpretation codes when they are present, to map the result to the corresponding HPO term (Supplementary Fig. 1). Overall, the algorithm handles all three major types of LOINC-coded laboratory tests (Qn, Ord, and Nom) when combined with the LOINC to HPO annotation data.
Table 1.
Primary code | Other FHIR codes mapped | Meaning |
---|---|---|
A | AA, W | Abnormal |
L | <, D, LL, LU | Lower than normal |
N | B, I | Normal |
H | >, HH, HU, U | Higher than normal |
NEG | ND, NR | Not present |
POS | AC, DET, RR, TOX, WR | Present |
U | HM, IE, IND, MS, NS, null, OBX, QCF, R, S, SDD, SYN-R, SYN-S, VS | Unknown |
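The normalization in Table 1 and the comparison against a reference range can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the published algorithm (which is given in Supplementary Fig. 1): the function names are hypothetical, the choice to prefer a supplied interpretation code over the reference range is an assumption, and any code not listed in the first six rows of Table 1 is treated as Unknown ("U").

```python
from typing import Optional

# Rows of Table 1, excluding the Unknown group (anything else falls back to "U").
PRIMARY_TO_OTHERS = {
    "A": ["AA", "W"],
    "L": ["<", "D", "LL", "LU"],
    "N": ["B", "I"],
    "H": [">", "HH", "HU", "U"],
    "NEG": ["ND", "NR"],
    "POS": ["AC", "DET", "RR", "TOX", "WR"],
}

# Reverse lookup: every FHIR code points at its primary code.
# Note: the incoming FHIR code "U" maps to "H" as listed in Table 1,
# whereas the primary code "U" returned below means Unknown.
TO_PRIMARY = {}
for primary, others in PRIMARY_TO_OTHERS.items():
    TO_PRIMARY[primary] = primary
    for code in others:
        TO_PRIMARY[code] = primary


def normalize(code: Optional[str]) -> str:
    """Map a FHIR interpretation code to a primary code ("U" if unknown)."""
    if code is None:
        return "U"
    return TO_PRIMARY.get(code, "U")


def interpret(
    value: Optional[float],
    low: Optional[float],
    high: Optional[float],
    interpretation: Optional[str] = None,
) -> str:
    """Return a primary code for a quantitative result.

    Assumption: a recognized interpretation code wins; otherwise the value
    is compared with the reference range.
    """
    if interpretation is not None:
        primary = normalize(interpretation)
        if primary != "U":
            return primary
    if value is None or (low is None and high is None):
        return "U"
    if low is not None and value < low:
        return "L"
    if high is not None and value > high:
        return "H"
    return "N"


if __name__ == "__main__":
    # Hypothetical potassium result with a made-up reference range.
    print(interpret(5.9, 3.5, 5.0))
    print(interpret(4.0, 3.5, 5.0, interpretation="LL"))
```

The resulting primary code, together with the LOINC code of the test, is what would then be looked up in the LOINC to HPO annotation data.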
HPO on FHIR
To demonstrate conversion of FHIR-encoded LOINC tests into HPO, we created a SMART on FHIR application that uses the mapping library. SMART (Substitutable Medical Applications, Reusable Technologies) on FHIR is an application platform for EHRs that allows applications to run on different FHIR-enabled EHR systems.13 Our application, HPO on FHIR, transforms a bundle of laboratory observations for a patient into a list of HPO codes (Fig. 3). We have also developed a command-line application that can iterate through all laboratory tests in a FHIR-enabled server, convert each into an HPO term and store them in a relational database for translational research.
LOINC to HPO demonstration with asthma
To test our method for semantic integration of laboratory tests, we analyzed a de-identified EHR dataset from the University of North Carolina (UNC) comprising 15,681 patients who had a history of asthma or asthma-like symptoms. The cohort is skewed toward female (58.9%) and older patients (median age: 61.5 years, Fig. 4a). The median tracking period of patients in this cohort is 3.1 years. The dataset contains ~54 million records of LOINC-encoded clinical test results, medication prescriptions, diagnosis codes, procedure codes, patient information, and other supporting records (Fig. 4b). Using our LOINC to HPO conversion algorithm, we successfully transformed 9.9 out of 11 million (88.6%) laboratory tests into HPO terms (Fig. 4c). For the entire cohort, on average, each HPO term was mapped from 1.8 distinct types of laboratory tests (Fig. 4d), indicating that the transformation successfully integrated distinctly coded laboratory tests that have the same clinical interpretation. The mapping procedure assigned an average of 633 laboratory test-derived HPO terms per individual patient, many of which were from the same laboratory tests performed at different visits. The tests corresponded to a mean of 57.7 unique HPO terms, of which 20.8 were abnormalities and the remainder were normal phenotypes (Fig. 4e). The hierarchical structure of the HPO allows inferences to be propagated up to parent terms and their ancestors;14 using this method, we inferred an additional 51.2 HPO terms (total 73.5) based on 22.2 abnormalities for each patient (Supplementary Fig. 2).
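The upward propagation of abnormal findings through the HPO hierarchy can be illustrated with a small graph traversal. The sketch below is a hedged illustration rather than the authors' code: `PARENTS`, `ancestors`, and `propagate` are hypothetical names, and the toy hierarchy contains only three terms whose identifiers are quoted elsewhere in this article, with the direct parent links assumed for the purpose of the example.

```python
from typing import Dict, Iterable, Set

# Toy fragment of an is-a hierarchy: child term -> set of parent terms.
# Assumed edges, for illustration only.
PARENTS: Dict[str, Set[str]] = {
    "HP:0001880": {"HP:0020064"},  # Eosinophilia -> Abnormal eosinophil count
    "HP:0020064": {"HP:0001879"},  # -> Abnormal eosinophil morphology
    "HP:0001879": set(),
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Collect all ancestors of a term by walking parent links."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def propagate(observed: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the observed abnormal terms plus every inferred ancestor."""
    result = set(observed)
    for term in list(result):
        result |= ancestors(term, parents)
    return result


if __name__ == "__main__":
    print(sorted(propagate({"HP:0001880"}, PARENTS)))
```

Only abnormal (non-negated) terms are propagated upward in this sketch; a negated term does not imply that its ancestors are absent.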
As a proof-of-principle, we tested the ability of our procedure to identify phenotypic abnormalities associated with a diagnosis of asthma or with frequent prednisone use. About one-third of the patients in this cohort had an ICD 9/10 diagnosis of asthma, and the remaining patients had ICD 9/10 codes reflecting other, potentially asthma-like, respiratory complaints. In all, 14.2% of patients who had a diagnosis of asthma were administered or prescribed prednisone >3 times within a tracking period between 2004 and 2016; 8.5% of the remaining patients had been administered prednisone more than three times. Prednisone is a corticosteroid drug used for severe asthma treatment with multiple other indications.15 We reasoned that both the diagnosis of asthma and the history of treatment with prednisone would likely be correlated with different but overlapping sets of laboratory abnormalities. Using logistic regression, we assessed the contribution of frequent prednisone prescription and the presence of acute asthma diagnosis to each phenotypic abnormality.
Prednisone usage was significantly associated with an increased odds ratio for exhibiting many abnormal phenotypes that are consistent with the known effects of prednisone (Table 2), such as hypoalbuminemia (HP:0003073),16 neutrophilia (HP:0011897),17 monocytosis (HP:0012311),18 leukocytosis (HP:0001974),18 hypokalemia (HP:0002900),19 and elevated serum creatine phosphokinase (HP:0003236).20 An acute asthma diagnosis was significantly associated with seven phenotypes: abnormal metabolism (HP:0032245), abnormality of vitamin metabolism (HP:0100508), increased red blood cell count (HP:0020059), increased VLDL cholesterol concentration (HP:0003362), eosinophilia (HP:0001880), and two ancestor terms of eosinophilia, abnormal eosinophil count (HP:0020064) and abnormal eosinophil morphology (HP:0001879). Eosinophilia is a well-established marker for acute allergic asthma.21 Several studies have linked vitamins A, B, C, D, and E with asthma.22–24 In this study, we applied a threshold minimum number of patients before performing statistical analysis, and none of the specific subtypes of abnormality of vitamin metabolism (HP:0100508; n = 111 patients) passed this threshold. However, a number of patients were found to have increased blood folate (HP:0040087; n = 33 patients), vitamin B12 deficiency (HP:0200502; n = 6 patients), low serum calciferol (HP:0012053; n = 56 patients), and low serum calcitriol (HP:0012052; n = 6 patients). Thus, the hierarchical structure of HPO allowed us to infer the parent phenotype (Abnormality of vitamin metabolism) and aggregate enough data to find that it is associated with acute asthma diagnosis (Supplementary Fig. 3). The term abnormal metabolism (HP:0032245) was also flagged, but this was solely related to the 111 patients annotated to abnormality of vitamin metabolism, which is a child term of abnormal metabolism.
Although there have been some conflicting results,25 a number of studies have shown a positive correlation between increased total, high- or low-density lipoprotein cholesterol, or triglycerides (Supplementary Fig. 2) and asthma.26–30 An increased red blood cell count is not a recognized biomarker of asthma, but could conceivably reflect a number of factors including hypoxemia (11.1% with an acute asthma diagnosis also had a chronic obstructive pulmonary disease diagnosis), or hemoconcentration resulting from acute dehydration during an asthma attack. The retrospective nature of this study does not allow us to consult the full medical records to investigate this.
Table 2.
HPO | Frequent prednisone prescription: odds ratio | Prednisone: confidence interval (95%) | Prednisone: P-value | Prednisone: significance | Acute asthma diagnosis: odds ratio | Asthma: confidence interval (95%) | Asthma: P-value | Asthma: significance |
---|---|---|---|---|---|---|---|---|
Abnormal metabolism | 0.56 | [0.26–1.23] | 1.45 × 10−1 | - | 1.72 | [1.16–2.55] | 6.78 × 10−3 | ** |
Abnormality of vitamin metabolism | 0.56 | [0.26–1.23] | 1.45 × 10−1 | - | 1.72 | [1.16–2.55] | 6.78 × 10−3 | ** |
Increased red blood cell count | 2.48 | [2–3.07] | 5.42 × 10−17 | ** | 1.5 | [1.25–1.79] | 9.24 × 10−6 | ** |
Increased VLDL cholesterol concentration | 0.77 | [0.38–1.53] | 4.47 × 10−1 | - | 1.49 | [1–2.23] | 4.84 × 10−2 | * |
Abnormal VLDL cholesterol concentration | 0.72 | [0.36–1.44] | 3.50 × 10−1 | - | 1.42 | [0.96–2.1] | 7.91 × 10−2 | - |
Increased hematocrit | 2.42 | [1.89–3.11] | 2.21 × 10−12 | ** | 1.23 | [0.99–1.53] | 5.35 × 10−2 | - |
Abnormal eosinophil count | 3.72 | [3.17–4.37] | 1.42 × 10−59 | ** | 1.17 | [1.01–1.36] | 3.06 × 10−2 | * |
Abnormal eosinophil morphology | 3.72 | [3.17–4.37] | 1.42 × 10−59 | ** | 1.17 | [1.01–1.36] | 3.06 × 10−2 | * |
Eosinophilia | 3.74 | [3.19–4.39] | 7.58 × 10−60 | ** | 1.17 | [1.01–1.36] | 3.14 × 10−2 | * |
Reduced blood urea nitrogen | 2.35 | [2.01–2.76] | 6.46 × 10−27 | ** | 1.08 | [0.95–1.24] | 2.40 × 10−1 | - |
Increased LDL cholesterol concentration | 0.81 | [0.57–1.15] | 2.28 × 10−1 | - | 1.07 | [0.86–1.33] | 5.39 × 10−1 | - |
Hypercholesterolemia | 2.99 | [2.58–3.47] | 5.62 × 10−48 | ** | 1.05 | [0.93–1.19] | 4.48 × 10−1 | - |
Abnormal LDL cholesterol concentration | 0.85 | [0.61–1.19] | 3.33 × 10−1 | - | 1.02 | [0.82–1.26] | 8.71 × 10−1 | - |
** P < 0.01; * P < 0.05; - P ≥ 0.05. The table is sorted by the odds ratio for acute asthma diagnosis. Only HPO terms with an odds ratio > 1 for acute asthma diagnosis are shown. Refer to Supplementary Table 3 for all terms.
DISCUSSION
In this report, we present an approach to the semantic integration of laboratory tests and results in EHR data. Our approach connects a widely used system for denoting laboratory tests, LOINC, with a current standard for transmitting healthcare information, FHIR, and a computational resource for deep phenotyping, HPO, that was previously used mainly in the context of rare disease research and diagnostics. Previous work such as OntoServer provides lookup services of different terminologies and maps similar concepts that originate from different terminologies.31 The focus of our tool in contrast is to provide a means of interpreting the outcomes of laboratory tests using an ontology of phenotypic abnormalities. Normalizing laboratory tests with HPO terms is an effective solution for two fundamental issues in clinical research: data integration and deep phenotyping. Laboratory test results support a large proportion of medical decisions.32 It is common that different laboratory tests may lead to results that have very similar or identical clinical interpretations. These different tests are recorded in the EHR using distinct codes (for instance, currently, there are four different LOINC terms for different tests of urine nitrite). This level of granularity can create difficulties for the semantic integration of comparable test results. By converting the results of laboratory tests to HPO-encoded phenotypes, our method provides an effective way for integrating laboratory tests that have the same clinical interpretation but different LOINC codes. 
Extracted patient phenotypes can be directly utilized for PheWAS studies, which is important because phenotyping patients is a major bottleneck for conducting PheWAS studies.33 The Electronic Medical Records and Genomics (eMERGE) network develops EHR-derived phenotyping algorithms by combining diagnosis codes, procedure codes, medication, narratives, and subsets of laboratory tests, and iteratively refines them to identify control and disease cohorts for genome-wide association studies and PheWAS.1,3,33–35 Our method complements existing phenotyping algorithms because it extracts additional phenotypic information by systematically interrogating the vast amount of data in laboratory tests.
The analysis of UNC EHR data demonstrated the potential of combining deep phenotypes from our tool with EHR data for biomarker discovery. Our current mapping library allowed us to convert the majority of the laboratory tests into HPO terms and assign an average of 57.7 unique phenotypes to each patient. The statistical analysis identified phenotypic abnormalities that are associated with frequent prescriptions of prednisone and/or acute asthma diagnosis. The cohort used for this analysis is biased toward senior and female patients and may not be reflective of asthma patient distributions, but the fact that our analysis identified numerous abnormalities that are associated with either prednisone use or asthma suggests that our approach can be useful for the investigation of EHR data for laboratory-based biomarkers of diseases and conditions. We have demonstrated the utility of our approach on the UNC dataset using a simple logistic regression approach as a proof-of-principle; we envision that our mapping approach could be used together with a variety of statistical and algorithmic analysis strategies to address a variety of topics in EHR-based translational research, and we have therefore coded our foundational approach in a way that can easily be integrated into other statistical analysis pipelines. A particularly attractive direction is to incorporate temporal information to build predictive models based on longitudinal phenotypic timelines.36,37
Some practical issues need to be considered when adopting our approach. Although LOINC has been widely adopted by healthcare providers and increasingly mandated by various federal agencies, it is still not a universal system. Since we used LOINC for the mapping, locally coded laboratory tests cannot be mapped to HPO terms with our tools. Similarly, the SMART on FHIR tool reported here can only be utilized in FHIR-enabled hospital systems. However, our annotation file and the algorithmic approach we adopted can be used independently of FHIR.
Several other use cases for our approach are conceivable. Rule-based algorithms could be applied to infer HPO terms from the primary phenotypic abnormalities. For instance, the combination of decreased hemoglobin concentration (HP:0020062) and decreased mean corpuscular volume (HP:0025066) implies microcytic anemia (HP:0001935). The HPO is widely used in rare disease diagnostics, but one bottleneck is that in many settings, HPO terms need to be entered manually into the analysis software. A recent study used text-mining to extract detailed patient phenotypes through natural language processing of clinical narratives in EHR, and used the resulting lists of HPO terms for genomic diagnostics.11 Our tool could supplement such approaches by providing a computational representation of laboratory findings to genomic diagnostic software such as Exomiser.38–40 In principle, our tool could be used to support other tasks related to EHR data, including decision support and cohort recruitment. In the future, we anticipate that semantic integration of a wider range of EHR data will become the norm to support data-driven translational research and precision medicine.
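The rule-based inference mentioned above could take a form like the following Python sketch. It is purely illustrative and not part of the software described in this article: `RULES` and `apply_rules` are hypothetical names, and the single rule encodes the microcytic anemia example given in the text.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Each rule: if all premise terms are present, add the conclusion term.
Rule = Tuple[FrozenSet[str], str]

RULES: List[Rule] = [
    # Decreased hemoglobin concentration + decreased mean corpuscular volume
    # implies microcytic anemia.
    (frozenset({"HP:0020062", "HP:0025066"}), "HP:0001935"),
]


def apply_rules(terms: Iterable[str], rules: List[Rule] = RULES) -> Set[str]:
    """Add inferred terms until no rule produces anything new."""
    inferred = set(terms)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= inferred and conclusion not in inferred:
                inferred.add(conclusion)
                changed = True
    return inferred


if __name__ == "__main__":
    print(sorted(apply_rules({"HP:0020062", "HP:0025066"})))
```

Iterating to a fixed point lets one rule's conclusion serve as another rule's premise, which would matter once more than one rule is defined.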
METHODS
Mapping LOINC terms to HPO terms
We performed manual biocuration to construct a mapping library from each potential outcome of a LOINC test to the corresponding HPO term (Fig. 1a). The test outcome is represented using a subset of FHIR codes (Table 1, primary code), such as “lower than normal”, “normal”, or “higher than normal”. For quantitative tests that report a numeric measurement, we use FHIR interpretation codes “L” and “H” to indicate lower or higher than normal, and “N” and “A” to indicate that the result is normal or abnormal. For ordinal tests that have a binary outcome, i.e., presence or absence of the test target, we use FHIR interpretation code “POS” to indicate present and “NEG” to indicate absent. In addition, other interpretation codes defined by FHIR are first mapped to primary codes. For example, FHIR codes “LL” (critically low) and “<” (off scale low) are both mapped to “L” (Table 1).
The value for a map entry is an HPO term accompanied by a boolean value to indicate whether it should be negated. That is, while an abnormal test outcome is mapped to a particular HPO term, the normal outcome for that test is mapped to the negated form, since the HPO contains only terms for abnormal phenotypes. Figure 1a shows three examples of mappings for Qn, Ord, or Nom LOINC terms.
In order to efficiently perform the biocuration needed to generate the LOINC mappings, we developed a JavaFX-based annotation tool that recommends candidate HPO terms for a LOINC test based on lexical matching between HPO term definitions and the name of a laboratory test. The recommended HPO terms were then manually vetted by one of five biocurators (i.e., one MD and four PhDs who have biomedical training and are major contributors to the HPO project) and cross-validated by a different annotator. Mapping problems were tracked by GitHub issues (https://github.com/TheJacksonLaboratory/loinc2hpoAnnotation/issues) and discussed during regular meetings. Source code and an executable version of the biocuration application are freely accessible at https://github.com/monarch-initiative/loinc2hpo. In addition, a subset (n = 160) of pediatric-specific laboratory tests was independently validated by five domain experts (i.e., three pediatric clinicians, a PhD-level molecular biologist, and a master’s-level epidemiologist). To perform this validation, a Qualtrics survey was designed so that each question featured a laboratory test description and a set of reasonable HPO concepts. The survey was completed by all experts between October and December (2019). After completion, any laboratory test mapping that did not meet agreement by at least one clinician and both the biologist and the epidemiologist was re-evaluated with one clinician until consensus was reached. The pediatric terms were additionally vetted on the loinc2hpoAnnotation GitHub tracker by the entire team of biocurators.
LOINC to HPO mapping file
The LOINC to HPO mapping file contains records of mapping from LOINC test outcomes to the corresponding HPO terms. The annotation data are serialized as a tab-separated values (TSV) file. Each line records the LOINC code, test outcome, the mapped HPO term, and whether the mapped term should be negated. The annotation file is deposited at GitHub and can be accessed at https://w3id.org/loinc2hpo/annotations. An excerpt is shown in Supplementary Table 1.
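Reading such a file into a lookup table is straightforward. The sketch below uses Python's standard `csv` module on a small inline sample; the column names (`loinc_id`, `outcome`, `hpo_id`, `negated`) and the `true`/`false` encoding are assumptions made for illustration and may differ from the actual annotation file, whose layout is shown in Supplementary Table 1.

```python
import csv
import io
from typing import Dict, TextIO, Tuple

# Inline sample with hypothetical column names; the rows reuse the potassium
# example from the Results section.
SAMPLE_TSV = (
    "loinc_id\toutcome\thpo_id\tnegated\n"
    "6298-4\tH\tHP:0002153\tfalse\n"
    "6298-4\tL\tHP:0002900\tfalse\n"
    "6298-4\tN\tHP:0011042\ttrue\n"
)


def load_annotations(handle: TextIO) -> Dict[Tuple[str, str], Tuple[str, bool]]:
    """Parse the TSV into {(LOINC code, outcome): (HPO id, negated)}."""
    table: Dict[Tuple[str, str], Tuple[str, bool]] = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["loinc_id"], row["outcome"])
        negated = row["negated"].strip().lower() == "true"
        table[key] = (row["hpo_id"], negated)
    return table


if __name__ == "__main__":
    annotations = load_annotations(io.StringIO(SAMPLE_TSV))
    for key, value in annotations.items():
        print(key, "->", value)
```

With a real file, `io.StringIO(SAMPLE_TSV)` would be replaced by an open file handle, and the column names adjusted to match the header of the downloaded file.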
HPO on FHIR
We created a SMART on FHIR application, HPO on FHIR, to query FHIR-enabled EHR servers and return patient laboratory results with LOINC codes and their corresponding HPO terms. The web interface of the application aggregates identical HPO terms together for visualization and also allows users to display source laboratory tests including subject, LOINC code, FHIR resource id, effective time, and the corresponding HPO term. The application was written in the Java language with the Spring framework. The application implements the LOINC to HPO conversion algorithm described in Supplementary Fig. 1. The application is deposited at GitHub and can be accessed at https://github.com/OCTRI/poc-hpo-on-fhir.
Command-line application for gathering FHIR server statistics
We created a command-line application that finds all laboratory tests for a patient on a FHIR server and attempts to convert them to HPO. The conversion results, both successes and failures, are stored in a relational database to aid in translational research. We ran the application on seven common FHIR sandboxes and gathered statistics about the LOINC codes encountered, the rate of success in conversion, and the underlying causes of failure. The application was written in the Java language with the Spring framework. Source code, results, and a backup of the database can be accessed at https://github.com/OCTRI/f2hstats.
Analysis of UNC data on patients with asthma or an asthma-like condition
For the purposes of demonstrating the potential utility of our library, we examined a de-identified EHR dataset extracted from the Carolina Data Warehouse for Health (CDWH) at the UNC. The data were accessed under a fully executed Data Use Agreement between The Jackson Laboratory and UNC. The CDWH is UNC Health Care System’s (UNCHCS) enterprise data warehouse, and contains EHR data for all UNCHCS patients from 2004 through 2016. The sample used for this investigation contains 15,681 patients with one or more encounters at UNCHCS with an asthma or asthma-like diagnosis (Supplementary Table 2). The data were exported from the UNC EHR system as eight separate comma-separated value (CSV) files containing clinical observations in a variety of data domains, including demographics, encounter details, diagnoses, procedures, medications, vital signs, and LOINC-coded lab results. Prior to transmission from UNC, the dataset was de-identified according to the Safe Harbor method of the Health Insurance Portability and Accountability Act (HIPAA), and all dates were shifted ±50 days. The project methods and use of the de-identified EHR-derived dataset were reviewed by The Jackson Laboratory Institutional Review Board and confirmed to be compliant with relevant guidelines and regulations and approved for data access on 19 December 2017.
Using the extracted laboratory data, we converted each LOINC-coded test into an HPO term. We note, however, that not every laboratory test result was captured in the available dataset. For each patient, we combined test records mapped to the same HPO terms and recorded the counts of observations for each HPO term. Then, we inferred additional phenotypic abnormalities based on the hierarchical structure of HPO, i.e., if a patient was assigned an HPO term, we inferred that the patient also had the phenotypic abnormalities encoded by its parent and other ancestor terms (Supplementary Fig. 2). We reasoned that an isolated abnormal measurement might represent an artifact or might not be typical of the clinical course of the patient, and therefore used a threshold of three observations over the entire observation period in order to classify a patient with the corresponding HPO-encoded phenotypic abnormality. We classified a patient as not having an HPO-encoded phenotypic abnormality only when the patient had never been assigned the HPO term in question. Patient age was calculated by subtracting the birth date from the date of the last hospital visit and is subject to an inaccuracy of ±50 days due to the de-identification procedure (see above). Patients who rarely visited hospitals were less likely to receive laboratory tests and thus had fewer phenotypes, so we excluded those who had medical encounters on <10 days. Patients who received >3 prednisone prescriptions were considered frequent users.
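The per-patient counting and thresholding described above can be sketched as follows. This is a minimal Python illustration, not the analysis code in the HUSHDataAnalysis repository: `classify_patient` is a hypothetical name, the threshold is read as "at least three observations" (an assumption), and patients with one or two observations of a term are left unclassified for that term, reflecting that they meet neither criterion stated in the text.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

MIN_OBSERVATIONS = 3  # threshold quoted in the text, read here as ">= 3"


def classify_patient(
    abnormal_terms: Iterable[str],
    terms_of_interest: Iterable[str],
    min_obs: int = MIN_OBSERVATIONS,
) -> Dict[str, Optional[int]]:
    """Classify one patient for each HPO term.

    abnormal_terms: one HPO id per abnormal observation (repeats allowed).
    Returns 1 (has the abnormality), 0 (never observed), or None (observed
    fewer than min_obs times, so neither criterion is met).
    """
    counts = Counter(abnormal_terms)
    status: Dict[str, Optional[int]] = {}
    for term in terms_of_interest:
        n = counts.get(term, 0)
        if n >= min_obs:
            status[term] = 1
        elif n == 0:
            status[term] = 0
        else:
            status[term] = None
    return status


if __name__ == "__main__":
    # Hypothetical patient: eosinophilia seen three times, hypokalemia once.
    observations = ["HP:0001880"] * 3 + ["HP:0002900"]
    terms = ["HP:0001880", "HP:0002900", "HP:0002153"]
    print(classify_patient(observations, terms))
```

The 0/1 values produced this way correspond to the binary outcome variable used in the logistic regression described in the Statistics section.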
Statistics
We applied a logistic regression model to estimate the weights of being a frequent prednisone user (values 0 or 1) and having an acute asthma diagnosis (values 0 or 1) in determining whether a patient has an HPO-encoded phenotype (values 0 or 1). We excluded from the analysis HPO terms for which the majority (95%) of the cohort had the same value (all 0 or all 1). The weights and the weights ± 1.98 standard deviations were exponentiated to obtain the odds ratio and the 95% confidence interval for each variable.
Data cleaning, normalization, wrangling, and table joining were conducted with a combination of the “tidyverse” and “RSQLite” packages in R, SQLite, and Java. Logistic regression was conducted with the “glm” function in R. All source code is deposited at GitHub and can be accessed through https://github.com/TheJacksonLaboratory/HUSHDataAnalysis.
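The conversion from a fitted weight to an odds ratio and confidence interval amounts to exponentiating the weight and its bounds. The Python sketch below illustrates this arithmetic only; `odds_ratio_ci` is a hypothetical helper, the input values are made up, and the multiplier defaults to the 1.98 quoted above (1.96 is the more commonly used value for a 95% interval under a normal approximation).

```python
import math
from typing import Tuple


def odds_ratio_ci(
    weight: float, std_error: float, z: float = 1.98
) -> Tuple[float, float, float]:
    """Return (odds ratio, lower bound, upper bound) for one coefficient.

    weight: logistic regression coefficient (log odds ratio).
    std_error: its estimated standard error.
    z: multiplier for the interval; 1.98 follows the text.
    """
    odds_ratio = math.exp(weight)
    lower = math.exp(weight - z * std_error)
    upper = math.exp(weight + z * std_error)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Made-up coefficient and standard error, for illustration only.
    estimate, low, high = odds_ratio_ci(0.5, 0.1)
    print(f"OR = {estimate:.2f}, CI = [{low:.2f}, {high:.2f}]")
```

Because the interval is symmetric on the log scale, the lower and upper bounds are not equidistant from the odds ratio, which matches the asymmetric intervals listed in Table 2.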
ACKNOWLEDGEMENTS
We acknowledge colleagues from the Monarch Initiative for comments on this project. Research reported in this work was supported by the National Institutes of Health’s National Center for Advancing Translational Sciences, Grant Number U24TR00230, the Biomedical Data Translator program (awards OT3TR002019 and OT3TR002020), and the Clinical and Translational Science program (award UL1TR002489, UL1TR002369). The project also received support from the Intramural Research Program within the National Library of Medicine, National Institutes of Health and the National Human Genome Research Institute, National Institutes of Health (award NR24OD011883). This work was also supported by the U.S. National Library of Medicine contract HHSN276201400008C. Dr. Feinstein was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development of the National Institutes of Health under Award Number K23HD091295. Dr. Hunter was supported by National Institute of Health R01LM008111. Tiffany Callahan was supported by Colorado Biomedical Informatics Training Program T15LM009451. Dr. Peden was supported by EPA Cooperative Agreement CR 83578501. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health, nor should any endorsements be inferred by NIH or the U.S. Government. This material contains content from LOINC® (http://loinc.org), which is copyright © 1995-2018, Regenstrief Institute, Inc. and the Logical Observation Identifiers Names and Codes (LOINC) Committee and is available at no cost under the license at http://loinc.org/license.
Footnotes
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
DATA AVAILABILITY
The patient EHR dataset can be obtained from UNC under a Data Use Agreement.
CODE AVAILABILITY
The computer code used in this study is openly accessible via the links provided in the Methods section.
ADDITIONAL INFORMATION
Supplementary Information accompanies the paper on the npj Digital Medicine website (https://doi.org/10.1038/s41746-019-0110-4).
Competing interests: D.J.V. is the President of Blue Sky Premise, LLC and participates in the development, maintenance, and distribution of LOINC. The remaining authors declare no competing interests.
REFERENCES
- 1. Denny JC, Bastarache L & Roden DM Phenome-wide association studies as a tool to advance precision medicine. Annu. Rev. Genom. Hum. Genet. 17, 353–373 (2016).
- 2. Verma A et al. PheWAS and beyond: the landscape of associations with medical diagnoses and clinical measures across 38,662 individuals from Geisinger. Am. J. Hum. Genet. 102, 592–608 (2018).
- 3. Denny JC et al. Variants near FOXE1 are associated with hypothyroidism and other thyroid conditions: using electronic medical records for genome- and phenome-wide studies. Am. J. Hum. Genet. 89, 529–542 (2011).
- 4. Dewey FE et al. Distribution and clinical impact of functional variants in 50,726 whole-exome sequences from the DiscovEHR study. Science 354, aaf6814 (2016).
- 5. Freimer N & Sabatti C The human phenome project. Nat. Genet. 34, 15–21 (2003).
- 6. Robinson PN Deep phenotyping for precision medicine. Hum. Mutat. 33, 777–780 (2012).
- 7. Leroux H, Metke-Jimenez A & Lawley MJ Towards achieving semantic interoperability of clinical study data with FHIR. J. Biomed. Semant. 8, 41 (2017).
- 8. McDonald CJ et al. LOINC, a universal standard for identifying laboratory observations: a 5-year update. Clin. Chem. 49, 624–633 (2003).
- 9. Köhler S et al. Expansion of the human phenotype ontology (HPO) knowledge base and resources. Nucleic Acids Res. 47, D1018–D1027 (2019).
- 10. Posey JE et al. Resolution of disease phenotypes resulting from multilocus genomic variation. N. Engl. J. Med. 376, 21–31 (2017).
- 11. Son JH et al. Deep phenotyping on electronic health records facilitates genetic diagnosis by clinical exomes. Am. J. Hum. Genet. 103, 58–73 (2018).
- 12. Köhler S et al. The human phenotype ontology in 2017. Nucleic Acids Res. 45, D865–D876 (2017).
- 13. Mandel JC, Kreda DA, Mandl KD, Kohane IS & Ramoni RB SMART on FHIR: a standards-based, interoperable apps platform for electronic health records. J. Am. Med. Inform. Assoc. 23, 899–908 (2016).
- 14. Robinson PN & Bauer S Introduction to Bio-Ontologies. (CRC Press Inc., Boca Raton, FL, 2011).
- 15. Krishnan JA, Davis SQ, Naureckas ET, Gibson P & Rowe BH An umbrella review: corticosteroid therapy for adults with acute asthma. Am. J. Med. 122, 977–991 (2009).
- 16. Aplasca EC & Rammohan M The effect of prednisone on the levels of serum albumin of 20 patients with renal transplants. J. Am. Diet. Assoc. 86, 1404–1405 (1986).
- 17. Dale DC, Fauci AS, Guerry D IV & Wolff SM Comparison of agents producing a neutrophilic leukocytosis in man. Hydrocortisone, prednisone, endotoxin, and etiocholanolone. J. Clin. Invest. 56, 808–813 (1975).
- 18. Shoenfeld Y, Gurewich Y, Gallant LA & Pinkhas J Prednisone-induced leukocytosis. Influence of dosage, method and duration of administration on the degree of leukocytosis. Am. J. Med. 71, 773–778 (1981).
- 19. Veltri KT & Mason C Medication-induced hypokalemia. Pharm. Ther. 40, 185–190 (2015).
- 20. Smithson J et al. Drug induced muscle disorders. Aust. Pharm. 28, 1056 (2009).
- 21. Price DB et al. Blood eosinophil count and prospective annual asthma disease burden: a UK cohort study. Lancet Respir. Med. 3, 849–858 (2015).
- 22. Allen S, Britton JR & Leonardi-Bee JA Association between antioxidant vitamins and asthma outcome measures: systematic review and meta-analysis. Thorax 64, 610–619 (2009).
- 23. Jolliffe DA et al. Vitamin D supplementation to prevent asthma exacerbations: a systematic review and meta-analysis of individual participant data. Lancet Respir. Med. 5, 881–890 (2017).
- 24. Thuesen BH et al. Atopy, asthma, and lung function in relation to folate and vitamin B(12) in adults. Allergy 65, 1446–1454 (2010).
- 25. Yiallouros PK et al. Low serum high-density lipoprotein cholesterol in childhood is associated with adolescent asthma. Clin. Exp. Allergy 42, 423–432 (2012).
- 26. Ramaraju K, Krishnamurthy S, Maamidi S, Kaza AM & Balasubramaniam N Is serum cholesterol a risk factor for asthma? Lung India 30, 295–301 (2013).
- 27. Ko S-H et al. Lipid profiles in adolescents with and without asthma: Korea National Health and nutrition examination survey data. Lipids Health Dis. 17, 158 (2018).
- 28. Chen YC et al. Lipid profiles in children with and without asthma: interaction of asthma and obesity on hyperlipidemia. Diabetes Metab. Syndr. 7, 20–25 (2013).
- 29. Al-Shawwa B, Al-Huniti N, Titus G & Abu-Hasan M Hypercholesterolemia is a potential risk factor for asthma. J. Asthma 43, 231–233 (2006).
- 30. Cottrell L, Neal WA, Ice C, Perez MK & Piedimonte G Metabolic abnormalities in children with asthma. Am. J. Respir. Crit. Care Med. 183, 441–448 (2011).
- 31. Metke-Jimenez A, Steel J, Hansen D & Lawley M Ontoserver: a syndicated terminology server. J. Biomed. Semant. 9, 24 (2018).
- 32. Badrick T Evidence-based laboratory medicine. Clin. Biochem. Rev. 34, 43–46 (2013).
- 33. Pathak J, Kho AN & Denny JC Electronic health records-driven phenotyping: challenges, recent advances, and perspectives. J. Am. Med. Inform. Assoc. 20, e206–e211 (2013).
- 34. Ritchie MD et al. Genome- and phenome-wide analyses of cardiac conduction identifies markers of arrhythmia risk. Circulation 127, 1377–1385 (2013).
- 35. Karnes JH et al. Phenome-wide scanning identifies multiple diseases and disease severity phenotypes associated with HLA variants. Sci. Transl. Med. 9, 389 (2017).
- 36. Glueck M et al. PhenoLines: phenotype comparison visualizations for disease subtyping via topic models. IEEE Trans. Vis. Comput. Graph. 24, 371–381 (2018).
- 37. Rajkomar A et al. Scalable and accurate deep learning with electronic health records. npj Digital Med. 1, 18 (2018).
- 38. Robinson PN et al. Improved exome prioritization of disease genes through cross-species phenotype comparison. Genome Res. 24, 340–348 (2014).
- 39. Smedley D et al. Next-generation diagnostics and disease-gene discovery with the Exomiser. Nat. Protoc. 10, 2004–2015 (2015).
- 40. Smedley D et al. A whole-genome analysis framework for effective identification of pathogenic regulatory variants in Mendelian disease. Am. J. Hum. Genet. 99, 595–606 (2016).