Building Bridges Across Electronic Health Record Systems Through Inferred Phenotypic Topics

You Chen; Joydeep Ghosh; Cosmin Adrian Bejan; Carl A Gunter; Siddharth Gupta; Abel Kho; David Liebovitz; Jimeng Sun; Joshua Denny; Bradley Malin

doi:10.1016/j.jbi.2015.03.011

Author manuscript; available in PMC: 2016 Jun 1.

Published in final edited form as: J Biomed Inform. 2015 Apr 1;55:82–93. doi: 10.1016/j.jbi.2015.03.011


PMCID: PMC4464930 NIHMSID: NIHMS677489 PMID: 25841328

Abstract

Objective

Data in electronic health records (EHRs) is being increasingly leveraged for secondary uses, ranging from biomedical association studies to comparative effectiveness. To perform studies at scale and transfer knowledge from one institution to another in a meaningful way, we need to harmonize the phenotypes in such systems. Traditionally, this has been accomplished through expert specification of phenotypes via standardized terminologies, such as billing codes. However, this approach may be biased by the experience and expectations of the experts, as well as the vocabulary used to describe such patients. The goal of this work is to develop a data-driven strategy to 1) infer phenotypic topics within patient populations and 2) assess the degree to which such topics facilitate a mapping across populations in disparate healthcare systems.

Methods

We adapt a generative topic modeling strategy, based on latent Dirichlet allocation, to infer phenotypic topics. We utilize a variance analysis to assess the projection of a patient population from one healthcare system onto the topics learned from another system. The consistency of learned phenotypic topics was evaluated using 1) the similarity of topics, 2) the stability of a patient population across topics, and 3) the transferability of a topic across sites. We evaluated our approaches using four months of inpatient data from two geographically distinct healthcare systems: 1) Northwestern Memorial Hospital (NMH) and 2) Vanderbilt University Medical Center (VUMC).

Results

The method learned 25 phenotypic topics from each healthcare system. The average cosine similarity between matched topics across the two sites was 0.39, a remarkably high value given the very high dimensionality of the feature space. The average stability of VUMC and NMH patients across the topics of the two sites was 0.988 and 0.812, respectively, as measured by the Pearson correlation coefficient. In addition, the VUMC and NMH topics exhibited smaller variance in characterizing the patient populations of the two sites than standard clinical terminologies (e.g., ICD-9), suggesting they may be more reliably transferred across hospital systems.

Conclusions

Phenotypic topics learned from EHR data can be more stable and transferable than billing codes for characterizing the general status of a patient population. This suggests that EHR-based research may be able to leverage such phenotypic topics as variables when pooling patient populations in predictive models.

Keywords: Clinical phenotype modeling, Computers and information processing, Data mining, Electronic medical records, Medical information systems, Pattern recognition

1. Introduction

There is mounting evidence to suggest that data derived from electronic health records (EHRs) can be applied in a secondary fashion to support a wide range of activities. There are indications, for instance, that EHR data can facilitate novel clinical decision support [1–2], conduct biomedical association studies [3–9], improve auditing and EHR security [10–12], and assess the cost effectiveness of treatments [13]. It is further anticipated that EHR data can be utilized to efficiently support a learning healthcare system, where information about care and operations is translated into knowledge for evidence-based clinical practice and positive change [14–15]. At the same time, there are significant challenges to reusing EHR data, including a lack of common standards to merge clinical data and translate clinical concepts between disparate healthcare systems [1, 15–16]. As such, it is critical to develop scalable methods to learn clinical concepts (or phenotypes) that can be translated across disparate healthcare systems.

In recognition of such a challenge, the past several years have witnessed a movement towards strategies to engineer and implement processes that standardize EHRs and derived concepts [17–22]. These strategies are driven by both rule-based models that are specified by experts and data-driven methods that attempt to learn patterns from the information within EHRs. With respect to rule-based models, researchers often rely upon billing codes (e.g., International Classification of Diseases, or ICD) or modified versions of such vocabularies (e.g., Phenome-Wide Association Study, or PheWAS, codes [23–24]) to characterize the diagnoses of patients in disparate healthcare systems (e.g., [19–20, 22]). Since billing codes can be inaccurate, other EHR data, such as medication and laboratory data, are often combined with billing data to form more accurate phenotypes [25]. However, these rule-based methods are limited by the significant amount of manual effort (e.g., physician chart reviews) required to implement them. Furthermore, these types of studies are only appropriate for known phenotypes. As a result, the process of investigating phenotypes across disparate healthcare systems is often quite slow and hampered in the discovery of new phenotypes. By contrast, data-driven methods rely upon techniques to learn phenotypic patterns from databases of EHRs (e.g., [17–18]). Yet these methods are also limited in that they learn patterns from healthcare systems independently.

This paper introduces a method to automatically learn phenotypic topics and evaluate their consistency across disparate healthcare systems. For this study, we limit our analysis to billing code data as a demonstration project to investigate the method, recognizing that if successful, the method could be applied to other discrete EHR data. Such topics can be leveraged as control variables to align patient populations across multiple systems. After validation by knowledgeable domain experts, such topics may become novel phenotypes that are worthy of further investigation.

The proposed method is composed of two primary steps. First, it infers phenotypic topics from the EHRs of each healthcare system through a generative model. Second, it measures the consistency of the learned topics for characterizing the patient populations across disparate systems. To the best of our knowledge, this is the first approach to automatically infer and test the alignment of phenotypic topics from the EHR data of multiple healthcare systems. To demonstrate feasibility, we perform an analysis on four months of inpatient billing data from two geographically distinct systems: i) the Northwestern Memorial Hospital (NMH) and ii) Vanderbilt University Medical Center (VUMC). The results demonstrate that learned phenotypic topics that appear to have a high degree of similarity can be found in two different healthcare systems.

The remainder of this paper is structured as follows. Section 2 introduces data-driven and expert-based phenotypic topic learning algorithms. Models of phenotypic topic learning and evaluation criteria for their consistency across multiple systems are introduced in Section 3. The design of the experimental environment is described in Section 4, while Section 5 reports on the corresponding results. A discussion of the findings, as well as their limitations, is provided in Section 6, and Section 7 provides a conclusion and next steps.

2. Background

Methods for modeling phenotypic topics through EHR data can roughly be categorized into 1) expert- and 2) data-driven. The former is based on the experience of clinically-knowledgeable individuals. As such, the process is often limited to known phenotypes and can be slow, particularly when validating the specification of a phenotype across disparate systems. The latter incorporates automation, which leads to significant gains in efficiency and strives to minimize manual attention. However, to date, phenotypic topics have been learned from healthcare systems independently, such that their ability to serve as common variables across healthcare systems is unknown.

2.1. Expert-based Phenotypes

A significant number of healthcare organizations have implemented commercial EHR systems [26]. At times, systems are implemented, or adapted, in multiple sites according to standardized policies [27]. However, EHR systems remain highly diverse due to the fact that EHR (and terminology) utilization, as well as business processes, is often site-specific [28–29]. As a result, it is difficult to perform investigations across sites [28–33]. Challenges remain in reusing such data for research, such as the mapping of the data to a common standard that can enable research across one large cohort [1, 28]. As such, the research community is only beginning to use phenotypic concepts to merge patients with similar conditions (or specific diseases) from disparate systems [19, 21–22, 29–33].

Here, we consider several representative works for illustration. First, Tanpowpong and colleagues [31] evaluated the value of ICD-9 codes for identifying a specific phenotype in the form of celiac disease. To do so, they identified all adults with an ICD-9 code of 579.0 at three hospitals and stratified the cohort according to the presence/absence of relevant serology and endoscopy codes into four groups. Coloma and colleagues [32] moved beyond billing codes and demonstrated the potential for the integration of information from clinical narratives. Using the phenotype of acute myocardial infarction in EHR data from three European countries, it was shown that an approach using the combination of billing codes and free text yielded a better positive predictive value than an approach using codes alone.

Beyond identifying specific diseases, EHR-based phenotyping algorithms have been utilized to measure the similarity of patients from different sites. Schildcrout and colleagues [20] quantified the variability in comorbid ICD9 codes for six phenotypes across five sites, including type 2 diabetes and peripheral arterial disease. They found that patients with the same phenotype at disparate institutions appeared to exhibit more similar comorbidity profiles than those representing different phenotypes; however, there was still variability within the same phenotype at different institutions.

While a phenotyping algorithm can be specified using various terminologies, the application of an algorithm on patient cohorts in disparate settings can often yield differing results. In an attempt to address this challenge, it was indicated that standardized information modeling and meaningful use standards could be leveraged for the presentation of a phenotyping algorithm across institutions [29–30]. It was shown that a consensus model can be more effective than a single site’s specification for phenotype discovery across sites.

While these studies illustrate the potential for data derived from EHRs and the need for harmonization of phenotype definitions, they have several limitations. First, all of these studies require significant manual effort. This indicates that the speed of learning phenotypic topics across sites can be slow, lacking scalability to a large number of phenotypes. Second, such expert-based methods are restricted to known phenotypes, which limits their utility in discovery-based research.

2.2. Data-driven Phenotypes

By contrast, data-driven methods aim to automate the mining of phenotypic topics from EHR data. There has been a flurry of activity in various automated learning methods for high-throughput phenotyping over the past several years.

First, it was recently shown that inductive logic programming (ILP) can be applied to EHR data to learn ICD-9 code based phenotypes [34]. However, in preparation for ILP, which was applied to identify phenotype features, the investigators needed to review and assign labels to a set of patient records that were representative of a larger corpus.

While the previous work relies upon supervised learning, more recent methods have focused on the unsupervised setting. Lasko et al. [17], for instance, introduced an unsupervised algorithm, based on deep learning methods, to discover phenotypic features from EHR data. This method relies upon Gaussian process regression, followed by a feature discovery step based on deep learning, to learn phenotypic features from sequences of serum uric acid measurements. It was shown that the learned features could accurately distinguish between the uric-acid signatures of gout and acute leukemia. Other approaches have applied matrix (or, more generally, tensor) factorization methods to derive phenotypic topics in temporal settings [35]. With respect to the latter, variations of unsupervised nonnegative tensor factorization methods have been introduced to decompose combinations of diagnoses, medications, and procedures [18, 36]. This approach was applied, for instance, on a cohort of approximately 30,000 heart failure patients and illustrated that the top 40 phenotypic topics could outperform the original 640 features (which consisted of 169 diagnosis categories and 471 medication types) in learning patient clusters.

Beyond their application for mining phenotypic topics, data-driven methods on EHR data have also been leveraged to mine communities of care providers [10–11], semantic concepts of patients [37] and clinical pathway patterns through the activity logs of healthcare systems [38–40]. For example, Huang and colleagues [39] used an altered latent Dirichlet allocation (LDA) model to infer patterns of clinical pathways from EHR activity logs. Specifically, they applied an altered LDA model on two cohorts: 1) patients treated for unstable angina and 2) patients treated in an oncological setting. The model inferred five clinical pathways for each of the two settings. Though a pilot study, it was demonstrated that learned pathway patterns can enable decision support and greater efficiency in coordinated clinical treatments. Bouarfa and Dankelman [38] derived a workflow consensus from clinical activity logs to detect outlying workflows without prior knowledge from experts. They adopted a tree-guided multiple sequence alignment approach to model the consensus of workflows. This strategy was validated over the workflow processes associated with laparoscopic cholecystectomy, where the results indicated the derived consensus conforms to the main steps of the surgical procedure as described in best practice guidelines.

The above data-driven research indicates the automated learning of concepts in the clinical domain can be efficient and scalable. However, the existing methods are limited in that they only learn phenotypic topics from the EHR of a single institution.

3. Methods

The general framework for the proposed method is depicted in Figure 1. The framework is composed of two parts: 1) a Topic Learning Model, which extracts phenotypic topics for each site (as depicted in the top part of the figure) and 2) three Topic consistency Measurements, which evaluate the consistency of phenotypic topics across disparate sites (as depicted in the bottom part of the figure).

Figure 1. A high-level overview of the architecture for extracting phenotypic topics and evaluating their consistency across healthcare systems.

We now provide a high-level overview of the models and then proceed with a deeper dive into each component. For reference, a legend of the notation used throughout this paper is provided in Table 1.

Table 1.

Common notation and the corresponding definitions.

Notation	Description
X	A healthcare system
P_X = {p_X,1, …, p_X,m}	A set of patients from X
C_X = {c_X,1, …, c_X,n}	A set of clinical terms defining patients in P_X
T_X = {t_X,1, …, t_X,k}	A set of k phenotypic topics retrieved from P_X defined by n clinical terms in C_X
G_Y,Z = {g₁, …, g_k}	A set of patient groups in P_Z clustered using k topics in T_Y
ψ_Y,Z (size k × m)	A probability matrix of k topics in T_Y to characterize m patients in P_Z
R_Y,Z (size 1 × k)	A vector of rates of patients in P_Z characterized by topics in T_Y


For illustration, we assume there are two healthcare systems, A and B. We let P_A represent the set of patients from site A, where each patient is defined over a set of clinical terms in C_A. A clinical term corresponds to a phenomenon associated with the patient in the clinical domain. For instance, a clinical term could be a diagnostic billing code, a medication, a diagnosis extracted from natural language processing, or the finding of a laboratory test. The set of phenotypic topics T_A are learned in this space, and are characterized as a probability matrix of topics over clinical terms. Specifically, a topic corresponds to a pattern of co-occurring clinical terms, defined by their probability distribution given (or “conditioned on”) that topic. A topic may or may not have an obviously clinical basis, but nevertheless can be useful for characterizing patients. We use ψ_A,A and ψ_B,A to represent matrices of probabilities that specify the likelihood that the patients in P_A are characterized by the topics in T_A and T_B, respectively. The terms P_B, C_B, T_B, ψ_A,B and ψ_B,B are defined similarly.

As mentioned earlier, there are numerous ways to learn from EHR data. In this work, we rely upon a general topic modeling strategy because it has a natural probabilistic interpretation. Once the phenotypic topics have been learned from each site, we evaluate their consistency from three perspectives: 1) similarity of topics of disparate sites, 2) stability of a population in the topics of disparate sites, and 3) transferability of a topic between disparate populations.

3.1. Topic Learning Model

We assume a patient is characterized by various clinical terms, such as diagnostic billing codes, and invoke an LDA model [41] to infer phenotypic topics. LDA is a probabilistic graphical model that was first developed to discover topics in natural language documents. It is a generative model that explains observations with hidden, or latent, patterns. Conceptually, patients can be thought of as documents, where the clinical terms constitute the vocabulary and the specific terms assigned to a patient’s record are the “words”. As such, given P_A characterized by clinical terms, we apply an LDA model to infer latent phenotypic topics T_A, each of which is composed of a probability distribution over the set of clinical terms.

The set of topics T_X is inferred from a matrix M_X (of size m × n), where m is the number of patients in P_X and n is the number of distinct clinical terms in C_X. Here, M_X(i, j) corresponds to the number of times clinical term c_X,j in C_X was assigned to a patient p_X,i in P_X.
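The construction of M_X can be sketched as follows. This is a minimal illustration only; the function name `build_count_matrix`, the input layout (a list of patient and term pairs), and the toy identifiers are assumptions made for the example and do not come from the study.

```python
from collections import Counter

def build_count_matrix(records):
    """Build M_X from (patient_id, clinical_term) assignments.

    `records` may contain repeated pairs; M[i][j] counts how often
    term j was assigned to patient i. Input layout is hypothetical.
    """
    patients = sorted({p for p, _ in records})
    terms = sorted({c for _, c in records})
    row = {p: i for i, p in enumerate(patients)}
    col = {c: j for j, c in enumerate(terms)}
    matrix = [[0] * len(terms) for _ in patients]
    for (p, c), n in Counter(records).items():
        matrix[row[p]][col[c]] = n
    return patients, terms, matrix

# Toy data: three patients, three clinical terms (placeholder names).
records = [("p1", "c1"), ("p1", "c1"), ("p1", "c2"), ("p2", "c2"), ("p3", "c3")]
patients, terms, M = build_count_matrix(records)
```

Each row of `M` then plays the role of one "document" that a topic model would consume.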

LDA is applied to learn k latent topics T_X = {t_X,1, t_X,2, …, t_X,k}. It is often the case that perplexity [42], an information theoretic measure, is applied to assess the fitness of an LDA model and set k. However, low perplexity is insufficient to indicate if the learned LDA model is a good fit [41–42]. In our situation, we aim to find the k value that yields the best separation between the phenotypic topics. To do so, we calculate the average similarity of the topics:

γ (T_{X}) = \frac{2}{k \times (k - 1)} \sum_{i = 1}^{k - 1} \sum_{j = i + 1}^{k} cos (t_{X, i}, t_{X, j})

(1)

where cos(t_X,i, t_X,j) is the cosine similarity [43] of topics t_X,i and t_X,j.¹
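As a concrete reading of Equation 1, the sketch below averages the cosine similarity over all unordered topic pairs. The representation of each topic as a plain list of term probabilities, and the names `cosine` and `average_topic_similarity`, are assumptions for illustration.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def average_topic_similarity(topics):
    """Equation 1: mean cosine similarity over the k(k-1)/2 topic pairs."""
    k = len(topics)
    if k < 2:
        raise ValueError("need at least two topics")
    total = sum(cosine(topics[i], topics[j])
                for i, j in combinations(range(k), 2))
    return 2.0 * total / (k * (k - 1))

# Fully separated topics give 0; identical topics give 1.
separated = average_topic_similarity([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
identical = average_topic_similarity([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]])
```

A smaller value indicates better-separated topics, which is the property sought when setting k.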

3.2. Measures of Topic consistency

We evaluated the consistency of the inferred topics using three quantitative measures: 1) similarity of topics, 2) stability of patient cohorts across topics, and 3) transferability of topics across sites.

3.2.1 Topic Similarity

The first topic consistency measure directly assesses the similarity of the inferred topics from disparate sites. Note, however, that T_A and T_B have a different number of rows (i.e., diagnoses). So to compare learned phenotypic topics, we substitute T_A and T_B with a vector U_AB (size n_U × 1) that represents the union of diagnoses, such that U_AB = C_A ∪ C_B. Thus, topics T_A and T_B are rewritten as T′_A (size n_U × k_A) and T′_B (size n_U × k_B). Based on this representation, the similarity of two phenotypic topics is calculated using the cosine similarity of the vectors:

β (t_{A, i}, t_{B, j}) = \frac{u_{i} \cdot u_{j}}{| u_{i} | | u_{j} |} (u_{i} \in {T'}_{A}, 1 \leq i \leq k_{A}; u_{j} \in {T'}_{B}, 1 \leq j \leq k_{B})

(2)

The larger the β, the stronger the similarity of the phenotypic topics.
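Equation 2 can be illustrated with a short sketch. It assumes that each topic is stored as a mapping from clinical term to probability and that a term absent from a site's vocabulary contributes a probability of zero once the topic is re-indexed over the union; both the storage format and the zero-filling are assumptions of this example. Coordinates that are zero in both topics do not change the cosine, so the union of the two topics' own terms suffices here.

```python
import math

def beta(topic_a, topic_b):
    """Cosine similarity of two topics over the union of their terms.

    topic_a, topic_b: dicts mapping clinical term -> probability.
    Terms missing from one topic are treated as probability 0 (assumption).
    """
    union_terms = sorted(set(topic_a) | set(topic_b))
    u = [topic_a.get(term, 0.0) for term in union_terms]
    v = [topic_b.get(term, 0.0) for term in union_terms]
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Placeholder terms: the two topics share only "y".
similarity = beta({"x": 0.6, "y": 0.4}, {"y": 0.4, "z": 0.6})
```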

Our aim is to find the largest average cosine similarity, where each topic in T_A matches a topic in T_B and vice versa. We use the Hungarian algorithm [45] to perform such matches.² To do so, let Ω be a matrix (sized k_A × k_B) that conveys the costs of matching topics between sites A and B, where cell Ω(i,j) indicates the cost of matching topic t_A,i and topic t_B,j. We assume that if the cosine similarity of a pair of topics is 1, then the cost of this matching is 0, such that the cost of a topic matching is defined as:

Ω (t_{A, i}, t_{B, j}) = | β (t_{A, i}, t_{B, j}) - 1 |

(3)

The topic similarity is thus defined as the minimum sum of costs for the maximum matching of topics between T_A and T_B. The higher the topic similarity between two sites, the smaller the cost.
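To make the matching objective concrete, the sketch below builds the cost matrix of Equation 3 and finds the minimum-cost one-to-one matching. Note that it uses an exhaustive search over permutations rather than the Hungarian algorithm; for a tiny toy matrix both return the same optimum, but exhaustive search is not practical for 25 topics. The function name, the square-matrix assumption (k_A = k_B), and the toy similarity values are all illustrative.

```python
from itertools import permutations

def match_topics(similarity):
    """Minimum-cost one-to-one matching of topics (brute force, not Hungarian).

    similarity[i][j] is the cosine similarity of topic i at site A and
    topic j at site B; the matrix is assumed square (k_A == k_B).
    Returns (assignment, total_cost), where assignment[i] is the index of
    the site-B topic matched to site-A topic i.
    """
    k = len(similarity)
    cost = [[abs(s - 1.0) for s in row] for row in similarity]  # Equation 3
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(k)):
        total = sum(cost[i][perm[i]] for i in range(k))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total

# Toy similarity matrix for three topics per site (made-up values).
toy_similarity = [
    [0.1, 0.9, 0.2],
    [0.3, 0.2, 0.8],
    [0.7, 0.1, 0.1],
]
assignment, total_cost = match_topics(toy_similarity)
```

In this toy case the first site-A topic is matched to the second site-B topic, the second to the third, and the third to the first.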

3.2.2 Population Stability

The second consistency measure assesses the stability of a patient population across the topics derived from disparate sites. When the stability of a patient population is high, it is likely that the topics from one site will characterize the patients from another site.

ψ_Y,Z is defined as a matrix of probabilities of patients in P_Z characterized by topics in T_Y. ψ_Y,Z is obtained by applying the LDA model already learned at site Y to infer topic probabilities for the patients of site Z. According to the definition of ψ_Y,Z, ψ_A,A (size k_A × m_A) and ψ_B,A (size k_B × m_A) represent the probabilities that the topics in T_A and T_B, respectively, characterize the patients in P_A. Specifically, a cell ψ_A,A(i,j) corresponds to the probability that topic t_A,i in T_A characterizes patient p_A,j in P_A. When a patient in P_A is characterized by a phenotypic topic t_A,i (or t_B,i) with a probability greater than a predefined threshold, we assign the patient to the topic. Thus, ψ_A,A and ψ_B,A can be invoked to group patients in P_A. In doing so, each phenotypic topic has a corresponding group of patients.³
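The thresholding step can be sketched as follows. The threshold value of 0.4 and the toy matrix are placeholders chosen for this example only; the value used in the study is not reproduced here.

```python
def group_patients(psi, threshold):
    """Assign patients to topics from a k x m probability matrix.

    psi[i][j] is the probability that topic i characterizes patient j.
    A patient joins every topic whose probability exceeds `threshold`,
    so one patient may belong to several groups.
    """
    return [[j for j, prob in enumerate(row) if prob > threshold]
            for row in psi]

# Two topics, three patients (made-up probabilities).
psi = [
    [0.7, 0.1, 0.5],
    [0.3, 0.9, 0.5],
]
groups = group_patients(psi, threshold=0.4)   # hypothetical threshold
group_sizes = [len(g) for g in groups]
```

The resulting group sizes are the quantities compared across sites in the stability measure.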

Let T_Align,A and T_Align,B be a reordering of topics in T_A and T_B, respectively, such that T_Align,A(i) most closely matches T_Align,B(i) as per the Hungarian algorithm. For example, imagine T_A = {t_A,1, t_A,2, t_A,3} and T_B = {t_B,1, t_B,2, t_B,3}, and the Hungarian algorithm matches t_A,1 with t_B,2, t_A,2 with t_B,3, and t_A,3 with t_B,1. Then, T_Align,A = {t_A,1, t_A,2, t_A,3} and T_Align,B = {t_B,2, t_B,3, t_B,1}.

Now, let G_A,A = {g₁, …, g_{k_A}} and G_B,A = {g₁, ⋯, g_{k_B}} be the sets of groups for the patients in P_A associated with the topics in T_Align,A and T_Align,B, respectively. Moreover, let G′_A,A = [|g₁|, …, |g_{k_A}|] and G′_B,A = [|g₁|, …, |g_{k_B}|] represent the vectors with the number of patients per group. Population stability focuses on the relationship of the set of matched proportions (i.e., where each point is the rate at which patients in population A are characterized by the matched topics of sites A and B). So, we apply the Pearson correlation coefficient [46] to G′_A,A and G′_B,A:

ρ ({G'}_{A, A}, {G'}_{B, A}) = \frac{cov ({G'}_{A, A}, {G'}_{B, A})}{σ_{{G'}_{A, A}} σ_{{G'}_{B, A}}},

(4)

where cov(·,·) is the covariance and σ_{G′_A,A} and σ_{G′_B,A} are the standard deviations of G′_A,A and G′_B,A, respectively. The correlation of G′_A,B and G′_B,B is defined similarly. The stability of a population on topics of two sites is measured via the Pearson correlation coefficient, as indicated by Equation 4.
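Equation 4 is the standard Pearson correlation, which can be written directly from its definition. The group-size vectors in the usage lines are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    # The 1/n factors of covariance and standard deviations cancel.
    return cov / (spread_x * spread_y)

# Hypothetical group sizes for one population under aligned topics
# from its own site and from the other site.
sizes_own_topics = [120, 80, 40]
sizes_other_topics = [240, 160, 80]
stability = pearson(sizes_own_topics, sizes_other_topics)
```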

3.2.3 Topic Transferability

The third consistency measure assesses how phenotypic topics transfer from one site to another. We aim to learn topics that characterize patients at a similar rate across the sites. This is because similar rates suggest that the sites manage similar populations.

To assess the transferability of topics in T_A, we define the following regression model:

log R_{A, A} = log I + α log R_{A, B},

(5)

where R_A,A (R_A,B) is a vector of the rates at which patients from P_A (P_B) are characterized by the learned phenotypic topics in T_A, α is the slope of the regression and I is the intercept.⁴
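The regression of Equation 5 amounts to an ordinary least-squares line fitted to log-transformed rates. The sketch below assumes base-10 logarithms and strictly positive rates; it returns the fitted constant term on the log scale (the quantity written log I in Equation 5) together with the slope α. The function name and toy rates are illustrative.

```python
import math

def fit_loglog(rates_b, rates_a):
    """Least-squares fit of log10(rates_a) = log_intercept + alpha * log10(rates_b).

    All rates must be strictly positive. Base-10 logs are an assumption.
    """
    xs = [math.log10(r) for r in rates_b]
    ys = [math.log10(r) for r in rates_a]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    log_intercept = mean_y - alpha * mean_x
    return log_intercept, alpha

# Toy rates in which site A expresses every topic at twice the rate of site B.
rates_b = [0.01, 0.02, 0.04]
rates_a = [0.02, 0.04, 0.08]
log_intercept, alpha = fit_loglog(rates_b, rates_a)
```

For these toy rates the slope is 1 and the constant term equals log10(2), since every site-A rate is exactly double the site-B rate.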

Transferability of topics within a site is defined as the mean and standard deviation of distances for all of its phenotypic topics to the regressed line. To illustrate, consider a topic t_A,i in T_A. The distance of this topic to the line is:

dis (r_{i, B}, r_{i, A}) = \frac{max (r_{i, A}, 10^{I} \times {r_{i, B}}^{α})}{min (r_{i, A}, 10^{I} \times {r_{i, B}}^{α})} log (| r_{i, A} - 10^{I} \times {r_{i, B}}^{α} | + 1)

(r_{i, A} \in R_{A, A}, r_{i, B} \in R_{A, B}; 1 \leq i \leq k_{A}),

(6)

where r_i,A and r_i,B are the rates at which a learned topic t_A,i is expressed by patients at sites A and B, respectively. (r_i,B, 10^I × r_i,B^α) is the corresponding point on the regressed line for (r_i,B, r_i,A). The term $\frac{max (r_{i, A}, 10^{I} \times {r_{i, B}}^{α})}{min (r_{i, A}, 10^{I} \times {r_{i, B}}^{α})}$ is a scaling factor that magnifies the effect of outliers on the transferability of phenotypic topics; its justification is provided in Appendix A1. A logarithmic transformation is applied for normalization and ensures that the distance of a point that falls on the regressed line is equal to zero.
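A direct transcription of Equation 6 is sketched below. It assumes a base-10 logarithm (suggested by the 10^I term, but not stated explicitly) and summarizes the distances with the population standard deviation; both choices, along with the function and parameter names, are assumptions of this example.

```python
import math
import statistics

def topic_distance(r_a, r_b, intercept, alpha):
    """Equation 6: scaled log-distance of the point (r_b, r_a) to the fitted line.

    `intercept` is the fitted constant on the log10 scale, exponentiated
    as 10**intercept to obtain the predicted rate at site A.
    """
    predicted = (10 ** intercept) * (r_b ** alpha)
    scale = max(r_a, predicted) / min(r_a, predicted)  # magnifies outliers
    return scale * math.log10(abs(r_a - predicted) + 1)

def transferability(rates_a, rates_b, intercept, alpha):
    """Mean and (population) standard deviation of the topic distances."""
    distances = [topic_distance(a, b, intercept, alpha)
                 for a, b in zip(rates_a, rates_b)]
    return statistics.mean(distances), statistics.pstdev(distances)

on_line = topic_distance(0.05, 0.05, intercept=0.0, alpha=1.0)
off_line = topic_distance(0.10, 0.05, intercept=0.0, alpha=1.0)
```

A point on the line yields a distance of zero, while the off-line point is penalized both by the log term and by the ratio of observed to predicted rate.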

4. Experimental Design

4.1. Datasets

We evaluate the reliability of phenotypic topics on de-identified data from the EHRs of two distinct healthcare systems. The first dataset corresponds to four months of inpatient records from the StarPanel EHR system of the Vanderbilt University Medical Center (VUMC) [47]. The second dataset corresponds to four months of inpatient records from the EHR system of Northwestern Memorial Hospital (NMH) [37]. There are 14,606 and 17,947 inpatients at NMH and VUMC, respectively. Additional summary information about the datasets is provided in Table 2.

Table 2.

Summary information for four months of inpatient data derived from the EHRs.


	Northwestern	Vanderbilt
Patients	14,606	17,947
Unique ICD-9 codes	4,543	5,176
Unique PheWAS codes	1,447	1,413
Unique 〈ICD9 code, patient〉 assignments	114,133	84,331
Unique 〈PheWAS code, patient〉 assignments	90,732	74,192


While we recognize that the clinical status of a patient is complex, this work focuses on a proof of concept and relies upon the billing codes to learn phenotypic topics. Such codes do not provide a complete picture of the status of a patient, but they are common in biomedical research and can provide insight into the capability of such a strategy. Nonetheless, multiple billing codes can be used to describe the same clinical disease [48–49], such that various EHR-driven phenotyping investigations (e.g., [19, 29–30]) have instead adopted the Phenome-Wide Association Study (PheWAS) vocabulary [23]. PheWAS codes correspond to groups of ICD-9 codes that more closely match clinical or genetic understandings of diseases and reduce variability in identifying diseases. Based on this expectation, and to be in accordance with prior work in phenotyping, we translate a patient’s ICD-9 codes to PheWAS codes. All of the learned phenotypic topics reported in this paper are based on the PheWAS codes.
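The translation step can be pictured as a many-to-one lookup. The sketch below uses placeholder code strings rather than real ICD-9 or PheWAS codes, collapses duplicates into unique code and patient pairs, and drops ICD-9 codes without a mapping; the handling of unmapped codes is an assumption of this example, not a description of the study's procedure.

```python
def translate_to_phewas(patient_icd9, icd9_to_phewas):
    """Map each patient's ICD-9 codes to a sorted list of unique PheWAS codes.

    patient_icd9: dict of patient id -> list of ICD-9 codes.
    icd9_to_phewas: dict of ICD-9 code -> PheWAS code (many-to-one).
    Codes without a mapping are skipped (assumption).
    """
    translated = {}
    for patient, codes in patient_icd9.items():
        phewas = {icd9_to_phewas[c] for c in codes if c in icd9_to_phewas}
        translated[patient] = sorted(phewas)
    return translated

# Placeholder codes only; these are not real ICD-9 or PheWAS identifiers.
mapping = {"icd_a1": "phe_A", "icd_a2": "phe_A", "icd_b1": "phe_B"}
patients = {"p1": ["icd_a1", "icd_a2", "icd_b1"], "p2": ["icd_a2", "icd_zz"]}
phewas_by_patient = translate_to_phewas(patients, mapping)
```

Because several ICD-9 codes collapse onto one PheWAS code, the number of unique code and patient pairs can only shrink or stay the same, which is consistent with the counts reported in Table 2.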

4.2. Setting the Number of Phenotypic Topics

We train the LDA model using Gibbs sampling, a typical technique for parameter estimation, and check the negative log-likelihood at each iteration to judge when the model has converged upon a solution. To set the number of phenotypic topics for the LDA model, we minimize 1) the perplexity score and 2) the average similarity of the topics within a site. Based on these measures, we set the number of topics to 25 for each site. Further details of this process can be found in Appendix A2.
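For readers unfamiliar with perplexity, the sketch below computes it from its usual definition, the exponential of the negative average per-term log-likelihood, given a fitted model. The argument names `theta` (patient-by-topic proportions) and `phi` (topic-by-term probabilities) are conventional labels assumed for this example; the exact procedure used to choose k is described in Appendix A2 and is not reproduced here.

```python
import math

def perplexity(counts, theta, phi):
    """Perplexity of a topic model on a patient-by-term count matrix.

    counts[d][w]: times term w was assigned to patient d.
    theta[d][t]:  proportion of topic t for patient d.
    phi[t][w]:    probability of term w under topic t.
    """
    log_likelihood = 0.0
    n_tokens = 0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                p = sum(theta[d][t] * phi[t][w] for t in range(len(phi)))
                log_likelihood += n * math.log(p)
                n_tokens += n
    return math.exp(-log_likelihood / n_tokens)

# A single uniform topic over four terms has perplexity equal to four.
uniform = perplexity(counts=[[1, 2, 0, 1]],
                     theta=[[1.0]],
                     phi=[[0.25, 0.25, 0.25, 0.25]])
```

Lower perplexity indicates that the model assigns higher probability to the observed terms.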

4.3 Consistency of NMH and VUMC Topics

For topic similarity, we calculate the cost of matching NMH and VUMC topics using the Hungarian algorithm on the cost matrix defined in Equation 3. For each VUMC phenotypic topic, we match an NMH topic and vice versa. If each phenotypic topic in one site has a matching topic in the other site with a low cost, it implies that the topics are common across the sites.

For stability, we calculate the Pearson correlation coefficient (Equation 4) between the sizes of the patient groups obtained when a population is characterized by the NMH topics and by the VUMC topics, respectively. The higher the coefficient, the more stable the population is across the topics of the two sites.

For transferability, we learn a regression model for the NMH and VUMC phenotypic topics, respectively (Equation 5). We then compute the distance of a topic to the regressed line (Equation 6). We use the variance of the regression line to characterize the transferability of the corresponding model. To demonstrate the transferability of learned topics, we also conduct an analysis that compares the transferability of the learned phenotypic topics with ICD-9 and PheWAS codes. The rates at which ICD-9 and PheWAS codes occur among patients span a wide range (as discussed below), such that we use a binning approach to reduce the standard error of the linear regression [50]. Specifically, we use 50 bins, where each bin maps the set of points in a rectangular area of the distribution to a mean value, which is supplied to the linear regression.
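One plausible reading of the binning step is sketched below: points are grouped into equal-width bins along the horizontal axis and each non-empty bin is replaced by the mean of its points. The equal-width layout, the choice of axis, and the function name are all assumptions of this example; the binning scheme actually used follows [50] and may differ.

```python
def bin_means(points, n_bins=50):
    """Replace (x, y) points by one mean point per non-empty equal-width x bin."""
    xs = [x for x, _ in points]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins or 1.0  # guard against identical x values
    buckets = {}
    for x, y in points:
        idx = min(int((x - lo) / width), n_bins - 1)  # clamp the right edge
        buckets.setdefault(idx, []).append((x, y))
    return [
        (sum(x for x, _ in b) / len(b), sum(y for _, y in b) / len(b))
        for _, b in sorted(buckets.items())
    ]

# Toy points (e.g., log-transformed rates), reduced to two bins.
binned = bin_means([(0.0, 1.0), (0.1, 3.0), (1.0, 5.0)], n_bins=2)
```

The binned means, rather than the raw per-code points, would then be passed to the regression.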

5. Results

To orient the reader, this section begins with a depiction of several learned phenotypic topics. We then report on the similarity, stability, and transferability of the derived topics.

5.1. Learned Phenotypic Topics

To better understand our experimental results, this section exemplifies a selected set of phenotypic topics inferred by the LDA model from the NMH and VUMC datasets. In our framework, each topic is expressed as a probability distribution over approximately 1500 PheWAS codes. To illustrate each topic in a succinct manner, we show the top five most probable PheWAS codes that best describe the corresponding topic. Details on topics not listed in this section are provided in Appendix A3 (for VUMC) and Appendix A4 (for NMH).

Figures 2 through 4 depict several notable groups of topics. Figure 2 shows a pair of topics (N₁₃ and V₇) that exhibits high similarity. Figure 3 shows three NMH topics (N₂, N₄ and N₁₇), that are similar to the same VUMC topic (V₄), and generally correspond to a collection of conditions associated with pregnancy and birth. Figure 4 shows four topics (V₁, V₁₅, V₁₈, and N₂₄), each of which lacks a corresponding topic at the other site.

Figure 2. The top five PheWAS codes in the pair of phenotypic topics with the highest similarity (a score of 0.86).

Figure 3. Three phenotypic topics from Northwestern that are well matched with topic 4 from Vanderbilt.

Figure 4. Three phenotypic topics from Vanderbilt and one topic from Northwestern that lack a corresponding topic at the other site with a similarity greater than 0.2.

5.2. Consistency of Phenotypic Topics

5.2.1 Similarity of Topics

The similarity of each phenotypic topic pair from NMH and VUMC is depicted in the heatmap in Figure 5(a). It can be seen that, for the majority of the topics, the similarity is high for the best match. To show the pairs with strong relations more clearly, Figure 5(b) displays a bipartite network of the similarity scores with values larger than 0.2. Here, it can be seen that almost every NMH phenotypic topic has at least one corresponding VUMC phenotypic topic. The only NMH topic that fails to have a partner is topic N₂₄, which is primarily associated with thrombosis. Similarly, almost every VUMC topic has a corresponding NMH topic. The exceptions are V₁, V₁₅, and V₁₈, which are most associated with perinatal conditions, internal injuries to organs, and burns.

Similarity of NMH and VUMC topics in (a) heatmap form and (b) network form (for scores ≥ 0.2). Lines in (b) are drawn only between pairs of topics at different sites. A thicker line indicates a stronger relation between a pair of topics.
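The construction of the similarity network can be sketched as follows. This is a minimal illustration, assuming each topic is stored as a dictionary from code to probability; the topic names, codes, and probabilities are invented, and `similarity_edges` is a hypothetical helper. The 0.2 cut-off follows the figure caption (scores ≥ 0.2).

```python
import math


def cosine(p, q):
    """Cosine similarity between two topics given as {code: probability} dicts."""
    dot = sum(value * q.get(code, 0.0) for code, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def similarity_edges(site_a, site_b, threshold=0.2):
    """List (topic_a, topic_b, score) for cross-site pairs scoring at least threshold."""
    edges = []
    for name_a, p in site_a.items():
        for name_b, q in site_b.items():
            score = cosine(p, q)
            if score >= threshold:
                edges.append((name_a, name_b, score))
    return edges


# Toy topics (invented): two per site, over four placeholder codes c1..c4.
nmh = {"N_a": {"c1": 0.7, "c2": 0.3}, "N_b": {"c3": 1.0}}
vumc = {"V_a": {"c1": 0.6, "c2": 0.4}, "V_b": {"c4": 1.0}}

edges = similarity_edges(nmh, vumc)
for name_a, name_b, score in edges:
    print(name_a, name_b, round(score, 3))
```

In this toy example only `N_a` and `V_a` share codes, so `N_b` and `V_b` are left without a partner, which mirrors the situation of topics such as N₂₄ in the network.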

The results and corresponding cost of the alignment of the topics are reported in Appendix A5. The total cost for a maximum matching⁵ of topics between NMH and VUMC is 15.26. The average cost for each pair of phenotypic topics is 0.61, which indicates that the average cosine similarity for a pair of aligned phenotypic topics is 0.39 (Equation 3). The cost of alignment for the learned phenotypic topics is statistically significantly smaller than that of alignments for phenotypic topics in a random setting (details of the hypothesis test are in Appendix A5).
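To make the relation between alignment cost and similarity concrete, the sketch below aligns a toy 3 × 3 similarity matrix, taking the cost of a pair to be one minus its cosine similarity (consistent with the 0.61 and 0.39 reported above). The matrix values are invented. The exhaustive search over permutations is a stand-in used only for illustration: it is feasible for tiny matrices, whereas an assignment solver such as the Hungarian method [45] would be used in practice.

```python
from itertools import permutations


def min_cost_alignment(similarity):
    """Find the one-to-one alignment of row topics to column topics with least total cost.

    similarity[i][j] is the cosine similarity of row topic i and column topic j;
    the cost of pairing them is taken to be 1 - similarity[i][j].
    """
    n = len(similarity)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):  # brute force: n! candidates
        cost = sum(1.0 - similarity[i][perm[i]] for i in range(n))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(best_perm), best_cost


# Toy similarity matrix (invented): rows are topics of one site, columns of the other.
toy_similarity = [
    [0.86, 0.10, 0.05],
    [0.20, 0.15, 0.60],
    [0.05, 0.40, 0.30],
]

alignment, total_cost = min_cost_alignment(toy_similarity)
average_cost = total_cost / len(alignment)
average_similarity = 1.0 - average_cost
print(alignment, round(total_cost, 2), round(average_similarity, 2))
```

Here row topic 0 is paired with column 0, row 1 with column 2, and row 2 with column 1; the average similarity of the aligned pairs is one minus the average cost.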

To illustrate the result of the alignment, Figure 6 compares matched phenotypic topics N₁₃ and V₇. This relationship appears natural because both topics are primarily associated with “coronary atherosclerosis” and “myocardial infarction” (each exhibits a high probability within the topics). At the same time, it should be noted that these topics include additional terms, such as “chronic airway obstruction”, “pulmonary heart disease”, “hyperlipidemia” and “peripheral arterial disease”. Yet these terms exhibit lower probabilities, suggesting the topics consist of a core set and an ancillary set of concepts, the latter of which are nuanced and may be driven by population-specific issues.

Comparison of the top PheWAS codes associated with topics N₁₃ and V₇.

5.2.2 Stability of a Patient Population Over Topics

The second consistency measure assesses the stability of a patient population (e.g., the VUMC patient population) on phenotypic topics learned from the NMH and VUMC datasets. The goal of this portion of the investigation is to measure the relation between a patient population characterized by its own phenotypic topics and the same population characterized by the corresponding topics of the other site. To do so, we aligned the VUMC and NMH topics and obtained the corresponding clusters of patients from a site (e.g., VUMC). The alignment is shown in Table A1 of Appendix A5 and the resulting size of the clusters is shown in Figure 7.

Pearson correlation between the rate at which (a) NMH and (b) VUMC patients are characterized by phenotypic topics derived from the two sites.

The Pearson correlation coefficients for the VUMC and NMH populations are 0.957 and 0.649, respectively. This indicates there is generally high stability in the learned phenotypic topics across the sites. While the correlation for the NMH patients is clearly smaller than that observed for the VUMC patients, this is mainly because NMH has a higher volume of patients with certain conditions: 278.1 – Obesity; 649 – Mother Complicating Pregnancy; 665 – Obstetrical/Birth Trauma; and 645 – Late Pregnancy and Failed Induction, which are captured by three NMH topics (N₂, N₄ and N₁₇), but only one VUMC topic (V₄). The composition of these topics is summarized in Figure 3.
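The stability measure can be illustrated with a short sketch. It computes the Pearson correlation between the fraction of one site's patients assigned to each of its own topics and the fraction assigned to the aligned topics of the other site. All rates below are invented, and the second computation only illustrates one way in which a single dominant, poorly matched topic pair can lower the coefficient.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Fraction of patients per aligned topic pair (invented numbers).
rate_own_topics = [0.10, 0.05, 0.20, 0.08, 0.12]
rate_other_topics = [0.11, 0.04, 0.18, 0.09, 0.13]
r = pearson(rate_own_topics, rate_other_topics)

# Add one pair in which the other site's topic absorbs far more patients
# than the own-site topic it is aligned with.
r_with_outlier = pearson(rate_own_topics + [0.10], rate_other_topics + [0.35])

print(round(r, 3), round(r_with_outlier, 3))
```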

Note that, as depicted in Figure 7(a), phenotypic topic V₄ is expressed by over 30% of the NMH patients. Based on this observation, we performed a sub-analysis on the patient population that was not explained by {N₂, N₄, N₁₇} and their corresponding aligned topics {V₁₄, V₄, V₁}, as depicted in Table A1 of Appendix A5. The correlation marginally increases for the VUMC patients (0.988), and substantially increases for the NMH patients (0.812). This suggests that the distribution of a patient population over the learned phenotypic topics may be more stable when the sites are focused on a broad variety of patients (i.e., beyond several specific conditions).

To illustrate the stability of a patient population more specifically, let us consider a brief case study of N₁₃ and V₇. Figure 8 illustrates the intersection of NMH (a) and VUMC (b) patients assigned to these topics. It can be seen that both topics are expressed by most of the patients with a probability larger than 0.5.⁶

Extent to which (a) NMH and (b) VUMC patients are expressed by phenotypic topics N₁₃ and V₇. Each point corresponds to the probability a specific patient is characterized by a topic.

We calculate the rate of patients in common for these two phenotypic topics using the Jaccard measure:

r(V_{7}, N_{13}) = \frac{\lvert E_{V_{7},\mathrm{VUMC}} \cap E_{N_{13},\mathrm{VUMC}} \rvert}{\lvert E_{V_{7},\mathrm{VUMC}} \cup E_{N_{13},\mathrm{VUMC}} \rvert},

(7)

where E_V7,VUMC and E_N13,VUMC are the sets of VUMC patients assigned to topics V₇ and N₁₃, respectively. The degree of commonality for the NMH and VUMC patients is 0.35 and 0.51, respectively, which indicates a relatively high rate of patients in common.
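Equation 7 can be sketched in a few lines. The example assumes each patient has a probability for V₇ (from the VUMC model) and for N₁₃ (from the NMH model), and that a patient is assigned to a topic when the probability exceeds 0.5, as stated in the footnotes. The patient identifiers and probabilities are invented, and `assigned` and `jaccard` are hypothetical helper names.

```python
def assigned(patient_topic_probs, topic, threshold=0.5):
    """Set of patient ids whose probability for `topic` exceeds the threshold."""
    return {
        patient_id
        for patient_id, probs in patient_topic_probs.items()
        if probs.get(topic, 0.0) > threshold
    }


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Invented probabilities for five VUMC patients under the two models.
vumc_patients = {
    "p1": {"V7": 0.9, "N13": 0.8},
    "p2": {"V7": 0.7, "N13": 0.3},
    "p3": {"V7": 0.2, "N13": 0.6},
    "p4": {"V7": 0.6, "N13": 0.7},
    "p5": {"V7": 0.1, "N13": 0.1},
}

e_v7 = assigned(vumc_patients, "V7")
e_n13 = assigned(vumc_patients, "N13")
rate_in_common = jaccard(e_v7, e_n13)
print(sorted(e_v7), sorted(e_n13), rate_in_common)
```

Two of the four patients assigned to either topic are assigned to both, so the rate in common for this toy population is 0.5.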

5.2.3 Transferability of Topics

To determine if phenotypic topics are more transferable than expert-derived vocabularies for characterizing patient populations, we compared their variance of transferability to that of ICD-9 and PheWAS codes. For illustration, the distribution of the rate at which codes from the expert-derived vocabularies are expressed by patients is depicted in Figure 9.

Rate at which (a) ICD-9 and (b) PheWAS codes are expressed in the VUMC and NMH inpatient populations.

Notably, certain codes associated with common chronic diseases, such as ICD-9 401.9 and PheWAS 401.1 (a translation of ICD-9 401, 401.1 and 401.9), which are both associated with hypertension, are stable across the VUMC and NMH patient populations. However, there are certain instances where the codes exhibit a large variance in the population. Clear examples of this case are ICD-9 codes V05.3 - need for prophylactic vaccination and inoculation against viral hepatitis and V30.0 - Single liveborn, born in hospital, delivered without mention of cesarean, as well as PheWAS codes 656 - Other perinatal conditions and 637 - Short gestation; low birth weight; and fetal growth retardation.

The regression models for assessing transferability are summarized in Appendix A6. The average distance (and its corresponding standard deviation) of the ICD-9 and PheWAS codes to their regressed lines is depicted in Figure 11. It can be seen that the ICD-9 codes (0.0109 ± 0.2215) exhibit a larger variance than the PheWAS codes (0.0108 ± 0.1299). This is due, in part, to the fact that most of the codes that are rare at one site (i.e., the upper left and bottom right of the plots in Figure 9) have a wider variance around the regressed line. By contrast, the codes that are more common (i.e., the upper right of the plots in Figure 9), such as essential hypertension, exhibit low variance and, thus, are more stable for expressing the patient population than those located in the bottom-left corner. The PheWAS codes exhibit a smaller variance than the ICD-9 codes, which suggests the codes are consistently utilized to represent patients with a particular clinical notion across disparate sites.

Average distance (+/− 1 standard deviation) to the regressed line of vocabulary-based and learned phenotypic topic model.
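The kind of computation summarized in Figure 11 can be sketched as follows. This is only an illustration under stated assumptions, since the actual regression models are specified in Appendix A6: here the rates at the two sites are log10-transformed, a line is fitted by ordinary least squares, and the distance of an item to the line is taken to be its signed vertical residual. All rates are invented, and `fit_line` and `residual_spread` are hypothetical helpers.

```python
import math
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual_spread(rates_a, rates_b):
    """Mean and sample standard deviation of residuals around the fitted line (log10 scale)."""
    x = [math.log10(r) for r in rates_a]
    y = [math.log10(r) for r in rates_b]
    slope, intercept = fit_line(x, y)
    residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]
    return statistics.fmean(residuals), statistics.stdev(residuals)


# Invented occurrence rates of five items (codes or topics) at two sites.
site_a = [0.20, 0.10, 0.05, 0.02, 0.01]
stable = residual_spread(site_a, [0.22, 0.09, 0.05, 0.02, 0.011])
noisy = residual_spread(site_a, [0.05, 0.30, 0.01, 0.08, 0.002])
print(stable, noisy)
```

Under these assumptions the mean residual is zero by construction, so the standard deviation is the quantity that distinguishes a representation whose rates transfer well between sites from one whose rates do not.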

For the learned phenotypic topics, we compute the regression models (which we refer to as N-Topic and V-Topic) and calculate the distance of topics to the regressed line. Figure 10 depicts the rate at which the phenotypic topics occur in the NMH and VUMC populations. It can be seen that the distribution of phenotypic topics exhibits smaller variance than the ICD-9 and PheWAS codes. This is more formally confirmed in Figure 11, which shows that the N-Topic (0.0055 ± 0.07) and V-Topic (0.00202 ± 0.06) models have smaller standard deviations than those of ICD-9 (0.0109 ± 0.2215) and PheWAS codes (0.0108 ± 0.13).

Rate at which phenotypic topics learned from (a) NMH and (b) VUMC occur in the NMH and VUMC patient populations.

However, there are three outliers in the NMH topics (Figure 10(a)) and one in the VUMC topics (Figure 10(b)). This is because a large proportion of the NMH patients are associated with obstetrical/birth trauma conditions. As alluded to earlier, these conditions are expressed by topics N₂, N₄, N₁₇, and V₄, which form a community. The proportion of patients characterized by these four topics is high and therefore dominates the variance of the other topics. We thus removed the outliers and remodeled the patients as N-Topic-Reduced (0.00087 ± 0.0021) and V-Topic-Reduced (0.0011 ± 0.001), the results for which are also shown in Figure 11. It can be seen that these models exhibit the smallest variance, suggesting they are the most transferable for characterizing the patients across the sites.

6. Discussion

In general, the experimental results suggest that phenotypic topics, learned through a generative topic modeling strategy (i.e., LDA) in the inpatient populations of two distinct healthcare systems, exhibit high consistency. This finding has several notable implications. First, the learned phenotypic topics could be invoked as covariates when investigating expert-defined phenotypes across healthcare systems. For example, in a diabetes-related investigation, the phenotypic topics V₁₁ and V₂₂, which capture aspects of coronary heart disease, may serve as control variables that represent the complexity of such confounding clinical problems. Second, the learned topics may enable novel quality-based studies across systems in their own right. For instance, it would be possible to investigate how the quality of outcomes varies across systems for phenotypes associated with a complex pregnancy (e.g., V₄ integrates delivery, obesity, and fetal heart rate).

At the same time, there are several limitations to this investigation. First, our notion of transferability is based on the premise that a topic should occur at the same rate at disparate healthcare organizations. However, if a topic occurs at varying rates, it does not imply that the topic is useless. Rather, it could imply that the organizations have different types of populations. The topics themselves may still be notable and worthy of further investigation, but we stress that they limit the extent to which population-based results at each institution are directly relatable.

Second, we acknowledge that this is a pilot study, which only focuses on the phenotypic topics that can be discovered through the ICD-9 (and PheWAS) codes assigned to patients while they are admitted to the hospital. While diagnosis codes do not provide a complete view of a patient, they are common in biomedical research. However, it should be noted that the methodological component of this work is not dependent upon diagnosis codes, or any particular clinical vocabulary, such that it can readily be extended to create more complex and robust phenotypes. As this work is extended, it will be necessary to enhance the approach and account for the semantics of the patient and healthcare setting (e.g., inpatient vs. outpatient), where such terms may be utilized at different rates.

Third, while the model proposed in this paper is scalable to other types of structured data (e.g., medications, clinical concepts extracted from natural language notes, laboratory test findings), it is not trivial to combine other data types across different EHR systems. More work is needed to determine how such additional factors can be combined with diagnoses to learn more comprehensive and nuanced phenotypic topics.

Fourth, as this approach is rolled out to a larger number of healthcare systems, it will be critical to devise and apply reliability measures that account for more than two sites. We anticipate that this may be accomplished through the extension of the basic bivariate correlation to a multiple correlation model.

Fifth, our work focuses on the development of a methodology to learn phenotypic topics to align disparate patient populations. However, we did not validate the clinical meaning of such topics nor the semantics of the similarity between identified groups of patients. If such topics are to be used in association studies, their meaning must be interpreted by clinically knowledgeable experts.

Finally, the case study was performed with two healthcare systems only, which themselves may cover different types of patients. As such, it is not clear if the phenotypic topics, or the transferability of the topics discovered in this study, are directly applicable to other healthcare systems. Moreover, the case study focused on all inpatients in the system simultaneously. At VUMC, this population includes patients from multiple hospitals, including the primary hospital, children’s hospital, as well as psychiatric and rehabilitation hospitals. In doing so, we incorporated neonatal, pediatric, and adult populations, which may confound the learning process. Furthermore, NMH does not have a focus on birth or children, such that the VUMC and NMH populations are not quite the same. We suspect that the learning process has the ability to discover phenotypic topics that are specific to certain demographics (age and gender), but note that this warrants further investigation.

7. Conclusions

Data derived from electronic health record (EHR) systems has the potential to enable large studies that incorporate disjoint healthcare providers, as well as support learning healthcare systems. However, it is challenging to automate learning across such systems due to a lack of standards in the use of clinical vocabularies. In this paper, we investigated the extent to which an automated learning approach, based on latent Dirichlet allocation, could be leveraged to infer phenotypic topics that are consistently defined across healthcare systems.

Specifically, we evaluated the approach with four months of inpatient data from two large geographically distinct hospital systems. The results illustrate that latent topics can reduce dimensionality and increase the stability and transferability of phenotypic topics studied across such sites. In particular, the findings suggest such an approach can enable the characterization of complex phenotypic topics that could be invoked as covariates in multi-site studies or analyzed in comparative consistency assessments for healthcare systems. Nonetheless, we stress that there are several opportunities for enhancing the proposed strategy. In particular, the current study focused solely on diagnosis codes, but more comprehensive and nuanced phenotypic topics should be discoverable via an expansion of the vocabulary to contain additional phenomena, such as procedures, medications, and laboratory tests.

Supplementary Material

NIHMS677489-supplement.pdf (443.5 KB, PDF)

HIGHLIGHTS.

A topic modeling strategy to translate EHR data into phenotypic topics
Approaches to assess consistency of phenotypic topics across healthcare systems
An evaluation on over 32,000 inpatient events from two disparate environments

Acknowledgements

We would like to thank Daniel Schneider and Prasanth Nannapaneni for gathering and supplying the data from Northwestern Memorial Hospital and Steve Nyemba from Vanderbilt University. We also thank Dr. Daniel Fabbri and Dr. Tom Lasko for helpful discussions in the early days of this research. This work is supported by the National Institutes of Health, under grants R01LM010207 and R01LM010685, the National Science Foundation, under grants CCF-0424422, CNS-0964063, and SCH1418504, and the Office of the National Coordinator for Health IT, under grant HHS-90TR0003/01.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Appendices

Appendix A1: Weighting a Distance Function to Account for Outliers

Appendix A2: Setting the Number of Phenotypic topics

Appendix A3: VUMC Phenotypic Topics

Appendix A4: NMH Phenotypic Topics

Appendix A5: Cost of Phenotypic Topic Alignment

Appendix A6: Regression Results for Transferability of Phenotypic Topics

¹

Kullback-Leibler divergence (KLD) is often applied to measuring the divergence between two probability distributions [44] because of its sound basis in information theory. However, there are several problems. First, KLD is asymmetric with respect to the distributions. Second, the topics should be well separated and hopefully sparse, but, unless the estimated probability distributions are smoothed (e.g., via Laplace smoothing), this can lead to KLD becoming unbounded. The cosine, by contrast, is not subject to these limitations.
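The two problems with KLD noted in this footnote can be seen on a toy pair of distributions. The sketch below is illustrative only: the distributions are invented, and the convention that a zero-probability term contributes nothing to the sum is the usual one for KLD.

```python
import math


def kld(p, q):
    """Kullback-Leibler divergence D(p || q) for two aligned probability lists."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # a term with p_i = 0 contributes nothing
        if q_i == 0.0:
            return math.inf  # p places mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total


def cosine(p, q):
    """Cosine similarity of two aligned probability lists."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))


# A sparse topic p (zero mass on the third code) and a denser topic q.
p = [0.7, 0.3, 0.0]
q = [0.5, 0.3, 0.2]

print(kld(p, q))                   # finite
print(kld(q, p))                   # unbounded without smoothing
print(cosine(p, q), cosine(q, p))  # symmetric and bounded
```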

²

This algorithm is efficient when the cost matrix is small.

³

The value for such a threshold is dependent on the application. A value of 0.5, which we use in this work, signifies that the majority of the patient’s status is captured by a single concept.

⁴

This model builds on the observation in [20] that the rate of occurrence for billing codes in disparate sites is distributed around a centered line in the log scale.

⁵

A maximum matching transpires when every topic in NMH has a corresponding topic in VUMC and vice versa.

⁶

Recall, a patient is considered assigned to a phenotypic concept when the probability is greater than 0.5.

References

1. Prokosch HU, Ganslandt T. Perspectives for medical informatics - reusing the electronic medical record for clinical research. Methods Inf Med. 2009;48:38–44.
2. Ramick DC. Data warehousing in disease management programs. J Healthc Inf Manag. 2001;15:99–105.
3. Lang M, Kirpekar N, Bürkle T, Laumann S, Prokosch HU. Results from data mining in a radiology department: the relevance of data quality. Medinfo. 2007;12:576–580.
4. Lyman JA, Scully K, Harrison JH Jr. The development of health care data warehouses to support data mining. Clin Lab Med. 2008;28:55–71. doi: 10.1016/j.cll.2007.10.003.
5. Tusch G, Müller M, Rohwer-Mensching K, Heiringhoff K, Klempnauer J. Data warehouse and data mining in a surgical clinic. Stud Health Technol Inform. 2000;77:784–789.
6. Silver M, Sakata T, Su HC, Herman C, Dolins SB, O’Shea MJ. Case study: how to apply data mining techniques in a healthcare data warehouse. J Healthc Inf Manag. 2001;15:155–164.
7. Lang M, Bürkle T, Laumann S, Prokosch HU. Process mining for clinical workflows: challenges and current limitations. Stud Health Technol Inform. 2008;136:229–334.
8. Melamed RD, Khiabanian H, Rabadan R. Data-driven discovery of seasonally linked diseases from an Electronic Health Records system. BMC Bioinformatics. 2014;15(Suppl 6):S3. doi: 10.1186/1471-2105-15-S6-S3.
9. Crawford DC, et al. eMERGEing progress in genomics—the first seven years. Front. Genet. 2014 doi: 10.3389/fgene.2014.00184.
10. Chen Y, Nyemba S, Malin B. Detecting anomalous insiders in collaborative information systems. IEEE Trans Dependable Secure Comput. 2012;9:332–344. doi: 10.1109/TDSC.2012.11.
11. Chen Y, Nyemba S, Malin B. Auditing medical record accesses via healthcare interaction networks. Proc AMIA Symp. 2012:93–102.
12. Chen Y, Lorenzi N, Nyemba S, Schildcrout JS, Malin B. We work with them? health workers interpretation of organizational relations mined from electronic health records. Int J Med Inform. 2014;83:495–506. doi: 10.1016/j.ijmedinf.2014.04.006.
13. Muranaga F, Kumamoto I, Uto Y. Development of site data warehouse for cost analysis of DPC based on medical costs. Methods Inf Med. 2007;46:679–685.
14. Etheredge LM. Rapid learning: a breakthrough agenda. Health Aff. 2014;33:1155–1162. doi: 10.1377/hlthaff.2014.0043.
15. Weiner M, Embi P. Toward reuse of clinical data for research and quality improvement: the end of the beginning. Ann Intern Med. 2009;151:359–360. doi: 10.7326/0003-4819-151-5-200909010-00141.
16. Weiskopf N, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc. 2013;20:144–151. doi: 10.1136/amiajnl-2011-000681.
17. Lasko TA, Denny JC, Levy MA. Computational phenotype discovery using unsupervised feature learning over noisy sparse, and irregular clinical data. PLoS One. 2013;8:e66341. doi: 10.1371/journal.pone.0066341.
18. Ho JC, Ghosh J, Steinhubl S, Stewart W, Denny JC, Malin B, Sun J. Limestone: High-throughput candidate phenotype generation via tensor factorization. J Biomed Inform. doi: 10.1016/j.jbi.2014.07.001. In press.
19. Overby CL, Pathak J, Gottesman O. A collaborative approach to developing an electronic health record phenotyping algorithm for drug-induced liver injury. J Am Med Inform Assoc. 2013;20:e243–e252. doi: 10.1136/amiajnl-2013-001930.
20. Schildcrout JS, Basford M, Pulley J, Masys DR, Roden DM, Wang D, et al. An analytical approach to characterize morbidity profile dissimilarity between distinct cohorts using electronic medical records. J Biomed Inform. 2010;43:914–923. doi: 10.1016/j.jbi.2010.07.011.
21. Pathak J, Bailey KR, Beebe CE, Bethard S, Carrell DC, Chen PJ, et al. Normalization and standardization of electronic health records for high-throughput phenotyping: the SHARPn consortium. J Am Med Inform Assoc. 2013;20:e341–e348. doi: 10.1136/amiajnl-2013-001939.
22. Springate DA, Kontopantelis E, Ashcroft DM, Olier I, Parisi R, Chamapiwa E, Reeves D. ClinicalCodes: an online clinical codes repository to improve the validity and reproducibility of research using electronic medical records. PLoS One. 2014;9:e99825. doi: 10.1371/journal.pone.0099825.
23. Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, et al. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics. 2010;26:1205–1210. doi: 10.1093/bioinformatics/btq126.
24. Kho AN, et al. Electronic Medical Records for Genetic Research: Results of the eMERGE Consortium. Sci Transl Med. 2011;3(79):79re1. doi: 10.1126/scitranslmed.3001807.
25. Denny JC, et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat Biotechnol. 2013;31(12):1102–1110. doi: 10.1038/nbt.2749.
26. Jha AK, DesRoches CM, Campbell EG, Donelan K, Rao SR, Ferris TG, et al. The use of electronic health records in U.S. hospitals. N Engl J Med. 2009;360:1628–1638. doi: 10.1056/NEJMsa0900592.
27. Takian A, Petrakaki D, Cornford T, Sheikh A, Barber N. Building a house on shifting sand: methodological considerations when evaluating the implementation and adoption of national electronic health record systems. BMC Health Services Research. 2012;12(1):105. doi: 10.1186/1472-6963-12-105.
28. Sun JY, Fang YG. Cross-domain data sharing in distributed electronic health record systems. IEEE Transactions on Parallel and Distributed Systems. 2010;21:754–764.
29. Pathak J, Kho AN, Denny JC. Electronic health records-driven phenotyping: challenges, recent advances, and perspectives. J Am Med Inform Assoc. 2013;20:e206–e211. doi: 10.1136/amiajnl-2013-002428.
30. Ludvigsson JF, Pathak J, Murphy S. Use of computerized algorithm to identify individuals in need of testing for celiac disease. J Am Med Inform Assoc. 2013;20:e306–e310. doi: 10.1136/amiajnl-2013-001924.
31. Tanpowpong P, Broder-Fingert S, Obuch JC, Rahni DO, Katz AJ, Leffler DA, et al. Multicenter study on the value of ICD-9-CM codes for case identification of celiac disease. Ann Epidemiol. 2013;23:136–142. doi: 10.1016/j.annepidem.2012.12.009.
32. Coloma PM, Valkhoff VE, Mazzaglia G, Nielsson MS, Pedersen L, Molokhia M, et al. Identification of acute myocardial infarction from electronic healthcare records using different disease coding systems: a validation study in three European countries. BMJ Open. 2013;3:e002862. doi: 10.1136/bmjopen-2013-002862.
33. Newton KM, Peissig PL, Kho AN, Bielinski SJ, Berg RL, Choudhary V, et al. Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network. J Am Med Inform Assoc. 2013;20:e147–e154. doi: 10.1136/amiajnl-2012-000896.
34. Peissig P, Santos Costa V, Caldwell MD, Rottscheit C, Berg RL, Mendonca EA, Page D. Relational machine learning for electronic health record-driven phenotyping. J Biomed Inform. 2014 doi: 10.1016/j.jbi.2014.07.007. In press.
35. Zhou J, Wang F, Hu J, Ye J. From micro to macro: data driven phenotyping by densification of longitudinal electronic medical records; Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2014. pp. 135–144.
36. Ho J, Ghosh J, Sun J. Marble: high-throughput phenotyping from electronic health records via sparse nonnegative tensor factorization; Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2014. pp. 115–124.
37. Gupta S, Hanson C, Gunter C, Frank M, Liebovitz D, Malin B. Modeling and detecting anomalous topic access; Proceedings of the IEEE International Conference on Intelligence and Security Informatics; 2013. pp. 100–105.
38. Bouarfa L, Dankelman J. Workflow mining and outlier detection from clinical activity logs. J Biomed Inform. 2012;45:1185–1190. doi: 10.1016/j.jbi.2012.08.003.
39. Huang ZX, Dong W, Ji L, Gan CX, Lu XD, Duan HL. Discovery of clinical pathway patterns from event logs using probabilistic topic models. J Biomed Inform. 2014;47:39–57. doi: 10.1016/j.jbi.2013.09.003.
40. Zhang H, Mehotra S, Gunter C, Liebovitz D, Malin B. Mining deviations from patient care pathways via electronic medical record system audits. ACM Transactions on Management Information Systems. 2014;4:17.
41. Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. Journal of Machine Learning Research. 2003;3:993–1022.
42. Newman D, Asuncion A, Smyth P, Welling M. Distributed inference for latent Dirichlet allocation; Proceedings of Neural Information Processing Systems (NIPS); 2007. pp. 1–9.
43. Chuang J, Gupta S, Manning CD, Heer J. Topic model diagnostics: assessing domain relevance via topical alignment; Proceedings of the International Conference on Machine Learning (ICML); 2013. pp. 612–620.
44. Kullback S, Leibler RA. On Information and Sufficiency. Annals of Mathematical Statistics. 1951;22(1):79–86.
45. Kuhn HW. The Hungarian Method for the assignment problem. Naval Research Logistics Quarterly. 1955;2:83–97.
46. Lin LI. A concordance correlation coefficient to evaluate reproducibility. Biometrics. 1989;45:255–268.
47. Giuse D. Supporting communication in an integrated patient record system. AMIA Annu Symp Proc. 2003:1065.
48. Chan M, Lim PL, Chow A, Win MK, Barkham TM. Surveillance for Clostridium difficile infection: ICD-9 coding has poor sensitivity compared to laboratory diagnosis in site patients. PLoS One. 2011;6:e15603. doi: 10.1371/journal.pone.0015603.
49. Deych EB, Waterman AD, Yan Y, Nilasena DS, Radford MJ, Gage BF. Accuracy of ICD-9-CM codes for identifying cardiovascular and stroke risk factors. Med Care. 2005;43:480–485. doi: 10.1097/01.mlr.0000160417.39497.a9.
50. Perkins E, Williams JR. Generalized spatial binning of bodies of different sizes. Discrete Element Methods. 2002:52–55.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

NIHMS677489-supplement.pdf^{(443.5KB, pdf)}

[R1] 1.Prokosch HU, Ganslandt T. Perspectives for medical informatics - reusing the electronic medical record for clinical research. Methods Inf Med. 2009;48:38–44. [PubMed] [Google Scholar]

[R2] 2.Ramick DC. Data warehousing in disease management programs. J Healthc Inf Manag. 2001;15:99–105. [PubMed] [Google Scholar]

[R3] 3.Lang M, Kirpekar N, Bürkle T, Laumann S, Prokosch HU. Results from data mining in a radiology department: the relevance of data quality. Medinfo. 2007;12:576–580. [PubMed] [Google Scholar]

[R4] 4.Lyman JA, Scully K, Harrison JH., Jr The development of health care data warehouses to support data mining. Clin Lab Med. 2008;28:55–71. doi: 10.1016/j.cll.2007.10.003. [DOI] [PubMed] [Google Scholar]

[R5] 5.Tusch G, Müller M, Rohwer-Mensching K, Heiringhoff K, Klempnauer J. Data warehouse and data mining in a surgical clinic. Stud Health Technol Inform. 2000;77:784–789. [PubMed] [Google Scholar]

[R6] 6.Silver M, Sakata T, Su HC, Herman C, Dolins SB, O’Shea MJ. Case study: how to apply data mining techniques in a healthcare data warehouse. J Healthc Inf Manag. 2001;15:155–164. [PubMed] [Google Scholar]

[R7] 7.Lang M, Bürkle T, Laumann S, Prokosch HU. Process mining for clinical workflows: challenges and current limitations. Stud Health Technol Inform. 2008;136:229–334. [PubMed] [Google Scholar]

[R8] 8.Melamed RD, Khiabanian H, Rabadan R. Data-driven discovery of seasonally linked diseases from an Electronic Health Records system. BMC Bioinformatics. 2014;15(Suppl 6):S3. doi: 10.1186/1471-2105-15-S6-S3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R9] 9.Crawford DC, et al. eMERGEing progress in genomics—the first seven years. Front. Genet. 2014 doi: 10.3389/fgene.2014.00184. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R10] 10.Chen Y, Nyemba S, Malin B. Detecting anomalous insiders in collaborative information systems. IEEE Trans Dependable Secure Comput. 2012;9:332–344. doi: 10.1109/TDSC.2012.11. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] 11.Chen Y, Nyemba S, Malin B. Auditing medical record accesses via healthcare interaction networks. Proc AMIA Symp. 2012:93–102. [PMC free article] [PubMed] [Google Scholar]

[R12] 12.Chen Y, Lorenzi N, Nyemba S, Schildcrout JS, Malin B. We work with them? health workers interpretation of organizational relations mined from electronic health records. Int J Med Inform. 2014;83:495–506. doi: 10.1016/j.ijmedinf.2014.04.006. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R13] 13.Muranaga F, Kumamoto I, Uto Y. Development of site data warehouse for cost analysis of DPC based on medical costs. Methods Inf Med. 2007;46:679–685. [PubMed] [Google Scholar]

[R14] 14.Etheredge LM. Rapid learning: a breakthrough agenda. Health Aff. 2014;33:1155–1162. doi: 10.1377/hlthaff.2014.0043. [DOI] [PubMed] [Google Scholar]

[R15] 15.Weiner M, Embi P. Toward reuse of clinical data for research and quality improvement: the end of the beginning. Ann Intern Med. 2009;151:359–360. doi: 10.7326/0003-4819-151-5-200909010-00141. [DOI] [PubMed] [Google Scholar]

[R16] 16. Weiskopf N, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc. 2013;20:144–151. doi: 10.1136/amiajnl-2011-000681.

[R17] 17. Lasko TA, Denny JC, Levy MA. Computational phenotype discovery using unsupervised feature learning over noisy, sparse, and irregular clinical data. PLoS One. 2013;8:e66341. doi: 10.1371/journal.pone.0066341.

[R18] 18. Ho JC, Ghosh J, Steinhubl S, Stewart W, Denny JC, Malin B, Sun J. Limestone: High-throughput candidate phenotype generation via tensor factorization. J Biomed Inform. In press. doi: 10.1016/j.jbi.2014.07.001.

[R19] 19. Overby CL, Pathak J, Gottesman O. A collaborative approach to developing an electronic health record phenotyping algorithm for drug-induced liver injury. J Am Med Inform Assoc. 2013;20:e243–e252. doi: 10.1136/amiajnl-2013-001930.

[R20] 20. Schildcrout JS, Basford M, Pulley J, Masys DR, Roden DM, Wang D, et al. An analytical approach to characterize morbidity profile dissimilarity between distinct cohorts using electronic medical records. J Biomed Inform. 2010;43:914–923. doi: 10.1016/j.jbi.2010.07.011.

[R21] 21. Pathak J, Bailey KR, Beebe CE, Bethard S, Carrell DC, Chen PJ, et al. Normalization and standardization of electronic health records for high-throughput phenotyping: the SHARPn consortium. J Am Med Inform Assoc. 2013;20:e341–e348. doi: 10.1136/amiajnl-2013-001939.

[R22] 22. Springate DA, Kontopantelis E, Ashcroft DM, Olier I, Parisi R, Chamapiwa E, Reeves D. ClinicalCodes: an online clinical codes repository to improve the validity and reproducibility of research using electronic medical records. PLoS One. 2014;9:e99825. doi: 10.1371/journal.pone.0099825.

[R23] 23. Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, et al. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics. 2010;26:1205–1210. doi: 10.1093/bioinformatics/btq126.

[R24] 24. Kho AN, et al. Electronic Medical Records for Genetic Research: Results of the eMERGE Consortium. Sci Transl Med. 2011;3(79):79re1. doi: 10.1126/scitranslmed.3001807.

[R25] 25. Denny JC, et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat Biotechnol. 2013;31(12):1102–1110. doi: 10.1038/nbt.2749.

[R26] 26. Jha AK, DesRoches CM, Campbell EG, Donelan K, Rao SR, Ferris TG, et al. The use of electronic health records in U.S. hospitals. N Engl J Med. 2009;360:1628–1638. doi: 10.1056/NEJMsa0900592.

[R27] 27. Takian A, Petrakaki D, Cornford T, Sheikh A, Barber N. Building a house on shifting sand: methodological considerations when evaluating the implementation and adoption of national electronic health record systems. BMC Health Services Research. 2012;12(1):105. doi: 10.1186/1472-6963-12-105.

[R28] 28. Sun JY, Fang YG. Cross-domain data sharing in distributed electronic health record systems. IEEE Transactions on Parallel and Distributed Systems. 2010;21:754–764.

[R29] 29. Pathak J, Kho AN, Denny JC. Electronic health records-driven phenotyping: challenges, recent advances, and perspectives. J Am Med Inform Assoc. 2013;20:e206–e211. doi: 10.1136/amiajnl-2013-002428.

[R30] 30. Ludvigsson JF, Pathak J, Murphy S. Use of computerized algorithm to identify individuals in need of testing for celiac disease. J Am Med Inform Assoc. 2013;20:e306–e310. doi: 10.1136/amiajnl-2013-001924.

[R31] 31. Tanpowpong P, Broder-Fingert S, Obuch JC, Rahni DO, Katz AJ, Leffler DA, et al. Multicenter study on the value of ICD-9-CM codes for case identification of celiac disease. Ann Epidemiol. 2013;23:136–142. doi: 10.1016/j.annepidem.2012.12.009.

[R32] 32. Coloma PM, Valkhoff VE, Mazzaglia G, Nielsson MS, Pedersen L, Molokhia M, et al. Identification of acute myocardial infarction from electronic healthcare records using different disease coding systems: a validation study in three European countries. BMJ Open. 2013;3:e002862. doi: 10.1136/bmjopen-2013-002862.

[R33] 33. Newton KM, Peissig PL, Kho AN, Bielinski SJ, Berg RL, Choudhary V, et al. Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network. J Am Med Inform Assoc. 2013;20:e147–e154. doi: 10.1136/amiajnl-2012-000896.

[R34] 34. Peissig P, Santos Costa V, Caldwell MD, Rottscheit C, Berg RL, Mendonca EA, Page D. Relational machine learning for electronic health record-driven phenotyping. J Biomed Inform. 2014. In press. doi: 10.1016/j.jbi.2014.07.007.

[R35] 35. Zhou J, Wang F, Hu J, Ye J. From micro to macro: data driven phenotyping by densification of longitudinal electronic medical records; Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2014. pp. 135–144.

[R36] 36. Ho J, Ghosh J, Sun J. Marble: high-throughput phenotyping from electronic health records via sparse nonnegative tensor factorization; Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2014. pp. 115–124.

[R37] 37. Gupta S, Hanson C, Gunter C, Frank M, Liebovitz D, Malin B. Modeling and detecting anomalous topic access; Proceedings of the IEEE International Conference on Intelligence and Security Informatics; 2013. pp. 100–105.

[R38] 38. Bouarfa L, Dankelman J. Workflow mining and outlier detection from clinical activity logs. J Biomed Inform. 2012;45:1185–1190. doi: 10.1016/j.jbi.2012.08.003.

[R39] 39. Huang ZX, Dong W, Ji L, Gan CX, Lu XD, Duan HL. Discovery of clinical pathway patterns from event logs using probabilistic topic models. J Biomed Inform. 2014;47:39–57. doi: 10.1016/j.jbi.2013.09.003.

[R40] 40. Zhang H, Mehrotra S, Gunter C, Liebovitz D, Malin B. Mining deviations from patient care pathways via electronic medical record system audits. ACM Transactions on Management Information Systems. 2014;4:17.

[R41] 41. Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. Journal of Machine Learning Research. 2003;3:993–1022.

[R42] 42. Newman D, Asuncion A, Smyth P, Welling M. Distributed inference for latent Dirichlet allocation; Proceedings of Neural Information Processing Systems (NIPS); 2007. pp. 1–9.

[R43] 43. Chuang J, Gupta S, Manning CD, Heer J. Topic model diagnostics: assessing domain relevance via topical alignment; Proceedings of the International Conference on Machine Learning (ICML); 2013. pp. 612–620.

[R44] 44. Kullback S, Leibler RA. On Information and Sufficiency. Annals of Mathematical Statistics. 1951;22(1):79–86.

[R45] 45. Kuhn HW. The Hungarian Method for the assignment problem. Naval Research Logistics Quarterly. 1955;2:83–97.

[R46] 46. Lin LI. A concordance correlation coefficient to evaluate reproducibility. Biometrics. 1989;45:255–268.

[R47] 47. Giuse D. Supporting communication in an integrated patient record system. AMIA Annu Symp Proc. 2003:1065.

[R48] 48. Chan M, Lim PL, Chow A, Win MK, Barkham TM. Surveillance for Clostridium difficile infection: ICD-9 coding has poor sensitivity compared to laboratory diagnosis in hospital patients. PLoS One. 2011;6:e15603. doi: 10.1371/journal.pone.0015603.

[R49] 49. Deych EB, Waterman AD, Yan Y, Nilasena DS, Radford MJ, Gage BF. Accuracy of ICD-9-CM codes for identifying cardiovascular and stroke risk factors. Med Care. 2005;43:480–485. doi: 10.1097/01.mlr.0000160417.39497.a9.

[R50] 50. Perkins E, Williams JR. Generalized spatial binning of bodies of different sizes. Discrete Element Methods. 2002:52–55.
