AMIA Annual Symposium Proceedings. 2014 Nov 14;2014:1980–1989.

On Learning and Visualizing Practice-based Clinical Pathways for Chronic Kidney Disease

Yiye Zhang 1, Rema Padman 1, Larry Wasserman 1
PMCID: PMC4419909  PMID: 25954471

Abstract

Chronic Kidney Disease (CKD) is a costly and complex disease affecting 20 million US adults. Recent studies suggest that care delivery changes may improve clinical outcomes and quality of patient experience while reducing costs. This study analyzes the treatment data of 8,553 CKD patients to learn practice-based clinical pathways. Patients’ visit history is modeled as sequences of visits containing information on visit type, date, procedures and diagnoses. We use hierarchical clustering based on longest common subsequence (LCS) distance to discover six patient subgroups, with each subgroup differing in the distribution of demographics and health conditions. Transitions of visits with high probabilities are elicited from each patient subgroup to learn common clinical pathways and treatment durations. Insights from this study can potentially result in new evidence to support patient-centered treatment approaches, empower CKD patients to better manage their disease and its complications, and provide a review guide for clinicians.

1. Introduction

Chronic Kidney Disease (CKD) is a costly, complex and high mortality health condition affecting 20 million US adults1. Prevalence is estimated to be 8-16% worldwide2. Most patients are unaware of their disease, with more than 40% with end-stage renal disease (ESRD) requiring emergency hospitalization and dialysis for acute kidney failure3. CKD patients make up only 1.5% of US Medicare population, but incur almost 10% of Medicare costs annually1. These adverse outcomes are potentially preventable or can be mitigated through early identification and treatment of individuals at risk as they progress through the five stages of CKD. Typically, the treatment of CKD is focused on preventing the worsening of condition, such as maintaining patients in their current disease stage and delaying the progression from stage 5 to ESRD and dialysis. CKD treatment currently follows opinion-based, consensus guidelines developed by nephrologists. The two widely known guidelines are Kidney Disease Outcomes Quality Initiative (KDOQI) and Kidney Disease: Improving Global Outcomes (KDIGO), which is a collection of global clinical practice guidelines for different complications of kidney disease4,5. However, recent studies suggest that care delivery changes may improve clinical outcomes, enhance quality of patient experience, and reduce annual total per capita health spending1,6. Hence, there is a need to review the actual clinical events performed on patients and discover practices that may lead to improved outcomes. At the same time, patient education plays a vital role in effective CKD treatment. For example, pre-ESRD education may enhance patient satisfaction, delay dialysis, and increase cost-effectiveness7. Engagement and education using personalized clinical pathways and tools that support shared decision-making with their clinicians are likely to be a valuable approach in delaying disease progression. 
Leveraging the availability of a rich and unique clinical dataset extracted from the Electronic Health Record (EHR) of a community nephrology practice in Western Pennsylvania, we analyze the treatment of 8,553 CKD patients to learn practice-based clinical pathways that have the potential to meet these objectives.

2. Background

CKD is a condition where the kidney gradually loses its function1. Patients with high blood pressure and diabetes are especially at risk for CKD, but other health problems such as cardiovascular disease and high cholesterol are also known risk factors1. Due to the close monitoring requirements of the disease, a large proportion of the Medicare budget is spent every year on the treatment of CKD patients1. In fact, the average cost per person per year for CKD patients is $23,128, more than twice the $11,103 average for non-CKD Medicare patients, and the cost increases as the disease condition deteriorates1. The best estimate of kidney function is the glomerular filtration rate (GFR), or the amount of blood that passes through the glomeruli per minute8. Patients are divided into 5 stages based on their GFR level8: ≥ 90 mL/min as stage 1, 60-89 mL/min as stage 2, 30-59 mL/min as stage 3, 15-29 mL/min as stage 4, and <15 mL/min as stage 5. Conditions worse than stage 5 include ESRD, which requires costly dialysis and has a significant negative impact on patients’ lifestyles and health outcomes9. Studies have suggested that the risk of CKD increases with age, with half of the CKD stage 3 cases developing after the age of 70 years10,11. Interactions between age, sex, and race (especially between African-American and White patients) and the risk of CKD and ESRD have also been found in multiple studies12–14, but there have been limited efforts in discovering pathways of care from actual treatment data.
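The staging rule quoted above can be written as a short lookup. This is only an illustrative sketch: the function name `ckd_stage` is hypothetical, and treating the integer ranges as half-open intervals (so that a value such as 89.5 mL/min falls in stage 2) is an assumption, not something the text specifies.

```python
def ckd_stage(gfr_ml_per_min: float) -> int:
    """Map a GFR value (mL/min) to CKD stage 1-5 using the cut-offs quoted in the text."""
    if gfr_ml_per_min < 0:
        raise ValueError("GFR cannot be negative")
    if gfr_ml_per_min >= 90:
        return 1
    if gfr_ml_per_min >= 60:
        return 2
    if gfr_ml_per_min >= 30:
        return 3
    if gfr_ml_per_min >= 15:
        return 4
    return 5  # below 15 mL/min


# Hypothetical example values, one per stage.
for gfr in (95, 72, 45, 20, 10):
    print(gfr, "-> stage", ckd_stage(gfr))
```

In the study itself, stages come from recorded ICD9 diagnosis codes rather than from GFR values, so a mapping like this would only matter if laboratory results were added to the data.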

In this study, we elicit practice-based clinical pathways of CKD treatments using 4 years of office visit data containing detailed information on 8,553 patients who were treated at a community nephrology practice in Western Pennsylvania. Practice-based clinical pathways reflect patients’ disease progression over time in terms of the treatments provided during office visits. Specifically, we aim to identify the different patient types and the distinct paths along which their treatments may evolve. Office visit data is modeled as a sequence of visits, such that we can capture the chronological changes in visit type, visit date, procedures and diagnoses. Each patient is represented by one and only one sequence. We apply hierarchical clustering, a cluster analysis method frequently used in biomedical research to identify underlying structures15,16, to patients’ visit sequences in order to discover distinct patterns in the sequences that separate one subgroup of patients from another. To visualize the clinical pathway in each subgroup, we incorporate a stochastic model to connect pairs of visits that have high transition probabilities. This learned model can be instantiated for any patient, to allow the patient to visualize a projected clinical pathway and engage with the clinicians for shared decision making. It can also be used as a practice management tool for clinicians who wish to review their practices against current consensus guidelines for CKD.

Prior research has studied methods for identifying clinical pathways for a few health conditions17–19. Huang et al. developed a clinical pathway mining algorithm and applied it to hospital inpatient care for bronchial lung cancer, gastric cancer, cerebral hemorrhage, breast cancer, infarction, and colon cancer17. Lin et al. adopted a Hidden Markov Model for mining clinical pathways and applied it to the normal spontaneous delivery process18. Lin et al. also applied Bayesian networks to study the causal relationships between medical treatments and transitions of patients’ physiological states in the hemodialysis process19. Yet there is limited research specifically targeting the five CKD stages prior to ESRD, for several reasons. First, CKD treatments last for years as patients progress from stages 1 to 5, completing multiple office visits during this period. Second, CKD treatment is laboratory-test driven; before each visit to the clinic, patients are asked to complete specific laboratory tests and bring the results for physician review. Results from laboratory tests largely determine the next steps in treatment, and few procedures are performed as part of CKD treatment. Finally, CKD patients tend to suffer from a variety of comorbidities such as hypertension and diabetes1, so patients need to be treated for those conditions as well. In fact, patients in the same CKD stage can evolve along distinctly different clinical pathways depending on their comorbidities and lifestyle choices. Therefore, clinical pathway mining algorithms designed for other conditions do not apply well to CKD, due to the differences in the nature of the disease and the many dimensions associated with its treatment. In Section 2, we describe our data set and illustrate the methods used to mine clinical pathways. In Section 3, the generated pathways and their components are presented using figures and tables. We discuss limitations and future steps in Section 4, and finally summarize our conclusions in Section 5.

2. Data and Methods

2.1. Data

De-identified data for this study were obtained from a large, forward-thinking nephrology practice in Western Pennsylvania that implemented an Electronic Health Record (EHR) system in 1994. The community practice provides care to patients at multiple clinics dispersed over a three-county geographic region. A four-year extract of the data, from March 2009 to May 2013, is drawn from the EHR. There are 8,553 patients in our study dataset, 4,195 female and 4,358 male (Table 1). The majority of the patients are White or African-American, with the ethnicity ratio remaining steady over the years. In this study, we mainly focus on patients’ office visits, which are categorized as new patient visit, follow up, extended follow up, hospital follow up, pre-ESRD education, and kidney biopsy review. Excluding vascular access center visits and check-ups for dialysis patients, 8,500 patients had at least one office visit. To reduce noise in the data set, diagnoses captured in our pathways include only CKD stage 1 through stage 5 (ICD9 = 585.1-585.5) and its commonly known comorbidities: diabetes (ICD9 = 250.xx), hypertension (ICD9 = 401.xx or ICD9 = 405.xx) and acute kidney failure (AKF, ICD9 = 584.xx). We chose these 3 conditions because AKF is frequently associated with increases in CKD stages, and the prevalence of CKD is high among patients with diabetes and hypertension1,20,21. Procedures are captured using Current Procedural Terminology (CPT) codes. We include 27 types of procedures, such as renal ultrasound and renal artery Doppler.

Table 1.

Patient demographics

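| Sex | Race | 70 and above | Between 50 and 69 | Under 50 | Total |
|---|---|---|---|---|---|
| Female | White | 2229 | 1080 | 337 | 3646 |
| Female | African-American | 101 | 93 | 31 | 225 |
| Female | Other | 8 | 11 | 12 | 31 |
| Female | Unknown | 157 | 97 | 39 | 293 |
| Male | White | 2328 | 1221 | 293 | 3842 |
| Male | African-American | 74 | 110 | 43 | 227 |
| Male | Other | 15 | 10 | 11 | 36 |
| Male | Unknown | 134 | 84 | 35 | 253 |
| Total | | 5046 | 2706 | 801 | 8553 |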

Table 2 lists the number of occurrences of diagnoses and procedures recorded during visits over the 4-year study period. Based on a previous study22, we suspect that a clinical pathway for CKD may contain 4 to 6 office visits, so we extracted data on patients who started from a specific point in CKD stages 1 to 5 and had at least 8 visits, to ensure that an adequate number of visits is included in the learned pathways. A total of 2,511 patients fulfilled the criteria. Our data show that procedures are not commonly performed for CKD patients during office visits. Out of 52,349 visits in the data, only 201 visits had some type of procedure performed, for a total of 262 procedures. Patients in CKD stage 4 received the largest number of associated procedures, the majority of which were referrals to pre-ESRD education (Table 2). In this analysis, we exclude patients who only had hospital visits, but do not exclude hospital visits of patients who are also office patients, so that we can capture the complete history of each patient’s treatments. In addition, since CKD is a chronic condition, patients may not start and complete treatments during the four-year study period. Therefore, the clinical history we see may start from a follow-up visit and end before patients complete treatment, move on to dialysis, or die. Mortality information is not captured in the dataset.

Table 2.

Occurrence of diagnoses and procedures recorded during visits over the 4-year period

| Condition | Number of occurrences | Number of associated procedures |
|---|---|---|
| CKD stage 1 | 407 | 3 |
| CKD stage 2 | 2290 | 2 |
| CKD stage 3 | 18698 | 22 |
| CKD stage 4 | 9363 | 141 |
| CKD stage 5 | 1493 | 8 |
| Acute Kidney Failure | 4298 | 13 |
| Diabetes | 10269 | 19 |
| Hypertension | 30926 | 54 |

2.2. Methods

We represent each patient’s clinical history as one and only one sequence of office visits ordered chronologically, with each visit in the sequence categorized by visit type, visit date, procedure(s) performed, and diagnoses noted (Figure 1). The length of a sequence is the number of visits in the sequence. Unfortunately, data on medications was incomplete, so we do not include medications in the visit node in this analysis, but this extension is straightforward using the current representation. Each variable in the visit, such as diagnosis, is stored as an independent table in a secure relational database. Patients are characterized by a de-identified patient ID; age: under 50, above 50 and below 70, 70 and above; sex: female/male; and race: White, African-American, Other. A patient may receive no, one, or multiple procedures and diagnoses during a visit. Given the data set, we first identify all combinations of unique procedures performed during a single visit, such as ‘renal Doppler, renal ultrasound’, and all combinations of unique diagnoses noted during a single visit, such as ‘CKD stage 4’ or ‘CKD stage 3, hypertension.’ Each combination of procedures is given a label ‘PCx’, where x is a number from 1 up to the total number of combinations. We apply the same to the combinations of diagnoses, labeling each as ‘Dy’, where y is a number from 1 up to the total number of combinations. Then, each visit can be represented as ‘type’, ‘date’, ‘PCx’, and ‘Dy.’ We scan all visits and identify unique combinations of ‘type’, ‘PCx’, and ‘Dy,’ so we can label each visit as ‘Vz’, where z is a number from 1 up to the total number of combinations. For each patient, we use ‘date’ to chronologically order each ‘Vz’ into a sequence. A sequence may contain the same visit label more than once, such as V1-V3-V3-V2-V1, when visits share the same type, procedures and diagnoses. The numbers in the labels are used to distinguish the labels from one another, not to indicate temporal order. Table 3 lists examples of how we label combinations of procedures and diagnoses, and visits.

Figure 1.

Illustration of a patient’s sequence of visits

Table 3.

Examples of procedure, diagnosis and visit labels

| Label | Description |
|---|---|
| D1 | CKD stage 3, hypertension |
| D2 | CKD stage 4 |
| PC1 | renal ultrasound |
| V1 | type: follow up, date: 2011/1/1, procedure: PC1, diagnosis: D2 |
| V2 | type: new patient visit, date: 2012/1/1, procedure: N/A, diagnosis: D1 |
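The labeling scheme can be sketched in a few lines of Python. The records below are hypothetical toy data, and the helper `make_labeler` and the tuple layout are illustrative assumptions; the study stores these variables in relational database tables, and its actual numbering of labels is not known.

```python
from datetime import date

# Hypothetical toy records: (patient_id, visit_date, visit_type, procedures, diagnoses)
visits = [
    ("p1", date(2012, 1, 1), "new patient visit", (), ("CKD stage 3", "hypertension")),
    ("p1", date(2012, 6, 1), "follow up", ("renal ultrasound",), ("CKD stage 4",)),
    ("p1", date(2013, 1, 1), "follow up", ("renal ultrasound",), ("CKD stage 4",)),
    ("p2", date(2011, 3, 1), "follow up", (), ("hypertension", "CKD stage 3")),
]


def make_labeler(prefix):
    """Return (label_fn, table); label_fn assigns prefix1, prefix2, ... to unseen keys."""
    table = {}

    def label(key):
        if key not in table:
            table[key] = f"{prefix}{len(table) + 1}"
        return table[key]

    return label, table


label_pc, pc_table = make_labeler("PC")  # combinations of procedures
label_d, d_table = make_labeler("D")     # combinations of diagnoses
label_v, v_table = make_labeler("V")     # combinations of (type, PCx, Dy)

sequences = {}
# Sorting by patient and date puts each patient's visits in chronological order.
for pid, day, vtype, procs, dxs in sorted(visits, key=lambda v: (v[0], v[1])):
    pc = label_pc(tuple(sorted(procs))) if procs else "N/A"
    d = label_d(tuple(sorted(dxs))) if dxs else "N/A"
    # The date orders the visits but is not part of the visit label itself.
    sequences.setdefault(pid, []).append(label_v((vtype, pc, d)))

print(sequences)
```

Sorting the procedures and diagnoses inside each visit makes the combination independent of the order in which codes were entered, which is an assumption about how "combination" should be read.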

After visit sequences are properly represented, we apply hierarchical clustering23, based on a distance measure called longest common subsequence (LCS) distance24, to cluster sequences by similarity. LCS has been widely applied in biomedical research as a similarity measure in trajectory analysis and protein sequence analysis25. The LCS of 2 sequences is the longest subsequence they have in common; its elements must appear in the same order in both sequences but need not be contiguous. For instance, if patient 1 has a sequence V1-V2-V1-V3-V1-V2, and patient 2 has a sequence V1-V2-V3-V1-V4, then their LCS is of length 4, being V1-V2-V3-V1.

$$LCS(x,y) = \max\{|u| : u \in S(x,y)\}$$

is the length of the LCS, where $|u|$ is the length of a common subsequence $u$ of the pair of sequences $(x, y)$, and $S(x,y)$ is the nonempty set of common subsequences of $x$ and $y$. The LCS distance ($d_{LCS}$) is defined as

$$d_{LCS}(x,y) = |x| + |y| - 2\,LCS(x,y)$$
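As a concrete illustration of the two definitions, the sketch below computes the LCS length with the standard dynamic-programming recurrence and derives the distance from it. The function names are hypothetical, and this is not necessarily how the authors computed the distances.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence of x and y (row-by-row dynamic programming)."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)             # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))   # skip an element of x or of y
        prev = curr
    return prev[-1]


def lcs_distance(x, y):
    """LCS distance: |x| + |y| - 2 * LCS(x, y)."""
    return len(x) + len(y) - 2 * lcs_length(x, y)


# The two example sequences from the text.
patient_1 = ["V1", "V2", "V1", "V3", "V1", "V2"]
patient_2 = ["V1", "V2", "V3", "V1", "V4"]
print(lcs_length(patient_1, patient_2))    # 4
print(lcs_distance(patient_1, patient_2))  # 6 + 5 - 2 * 4 = 3
```

The distance counts the elements that would have to be dropped from both sequences to leave only their common subsequence, so identical sequences are at distance 0.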

Each patient’s visit sequence is compared with every other patient’s sequence, and each comparison yields a dLCS value. Since our sample size is 2,511, the dLCS matrix is 2,511 by 2,511. To enhance the result of hierarchical clustering, we apply a data transformation and square each dLCS in the matrix. Hierarchical clustering is applied to this transformed matrix to cluster similar sequences into subgroups. The optimal number of clusters is determined using the Silhouette measure26, which is commonly used in cluster analysis.
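The clustering step can be sketched end to end on toy data using only the standard library. Several things here are assumptions rather than details from the paper: the linkage criterion (average linkage is used below because the text does not name one), the computation of silhouette widths on the squared distances, the convention of width 0 for singleton clusters, and all function names and toy sequences.

```python
from itertools import combinations


def lcs_length(x, y):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            curr.append(prev[j - 1] + 1 if xi == yj else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def squared_lcs_matrix(seqs):
    """Symmetric matrix of squared LCS distances between all pairs of sequences."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist = len(seqs[i]) + len(seqs[j]) - 2 * lcs_length(seqs[i], seqs[j])
        d[i][j] = d[j][i] = float(dist ** 2)
    return d


def average_linkage(d, k):
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(d))]

    def gap(a, b):  # mean pairwise distance between two clusters
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: gap(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def mean_silhouette(d, clusters):
    """Average silhouette width; singleton clusters contribute 0 by convention."""
    widths = []
    for ci, members in enumerate(clusters):
        for i in members:
            if len(members) == 1:
                widths.append(0.0)
                continue
            a = sum(d[i][j] for j in members if j != i) / (len(members) - 1)
            b = min(sum(d[i][j] for j in other) / len(other)
                    for cj, other in enumerate(clusters) if cj != ci)
            widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)


# Hypothetical toy sequences forming two obvious groups.
seqs = [
    ["V1", "V1", "V2", "V2"], ["V1", "V1", "V2", "V3"], ["V1", "V2", "V2", "V2"],
    ["V5", "V6", "V6", "V7"], ["V5", "V6", "V7", "V7"], ["V5", "V5", "V6", "V7"],
]
d = squared_lcs_matrix(seqs)
best_k = max(range(2, len(seqs)), key=lambda k: mean_silhouette(d, average_linkage(d, k)))
print(best_k, average_linkage(d, best_k))
```

On the full 2,511 by 2,511 matrix a dedicated clustering library would be far more efficient; this pure-Python version is only meant to make the procedure explicit.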

In order to create clinical pathways for each cluster, we elicit all transitions seen in the visit sequences of patients. The visit taking place first is called the source and its successor is called the target. For example, given a sequence V3-V2-V2-V5, there are 3 transitions: (V3, V2), (V2, V2), (V2, V5), where the source is V3 and the target is V2 in the first transition. To build pathways, we connect transitions that exceed a certain threshold so that they form an overall path. For example, given 3 transitions: (V1, V2), (V3, V4), (V2, V4), we can connect the 1st and 3rd transitions to form a path V1-V2-V4, because the target in the 1st transition is the source in the 3rd transition. In order to ensure that the transitions we use to build pathways occur in the data beyond a certain probability, we define a measure called weight:

$$weight = \frac{|N_j = \text{target} \mid N_i = \text{source}|}{|N_i = \text{source}|}$$

where |Ni = source| is the number of times Ni appears as a source, and |Nj = target | Ni = source| is the number of times Nj appears as a target given that Ni is the source. Weight is thus the conditional probability that Ni will transition to Nj. For example, in the sequence V3-V2-V2-V5, (V3, V2) has a weight of 1, and (V2, V2) and (V2, V5) each have a weight of 1/2. In order to reduce noise, we include only transitions with weight above 0.3, a threshold that we think will give rise to a map of pathways that contains sufficient information and is easily interpretable by clinicians. We also capture the difference in days between source and target and categorize it as a gap: 1) less than 1 week, 2) 1-2 weeks, 3) 2-3 weeks, 4) 3 weeks-1 month, 5) 1-2 months, 6) 2-3 months, 7) 3-4 months, 8) 4-6 months, 9) 6-12 months, and 10) at least 1 year. If the same transition has more than one gap observed across multiple sequences, we take the average of all the gaps for that transition in days, then assign it to one of the above categories. A pathway is not necessarily created from one patient’s visit sequence. It can be built from multiple transitions by multiple patients, if the target of one patient’s transition is identical to the source of another patient’s transition. The length of a pathway is the number of visits included in the pathway.
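The weight, the gap categories, and the chaining of transitions can be illustrated with a short sketch. Several choices below are assumptions, since the text does not state them: a month is taken as 30 days, boundary values fall into the later category, and `chain_pathways` ignores self-transitions and never revisits a visit label within one path. All function names are hypothetical.

```python
from collections import Counter


def transition_weights(sequences):
    """weight(Ni -> Nj) = count(Ni followed by Nj) / count(Ni appearing as a source)."""
    pair_counts, source_counts = Counter(), Counter()
    for seq in sequences:
        for source, target in zip(seq, seq[1:]):
            pair_counts[(source, target)] += 1
            source_counts[source] += 1
    return {pair: n / source_counts[pair[0]] for pair, n in pair_counts.items()}


# Upper bounds in days for the first nine gap categories (1 month = 30 days is an assumption).
GAP_BOUNDS = [
    (7, "less than 1 week"), (14, "1-2 weeks"), (21, "2-3 weeks"),
    (30, "3 weeks-1 month"), (60, "1-2 months"), (90, "2-3 months"),
    (120, "3-4 months"), (180, "4-6 months"), (365, "6-12 months"),
]


def gap_category(mean_days):
    """Categorize the average gap (in days) observed for one transition."""
    for upper, name in GAP_BOUNDS:
        if mean_days < upper:
            return name
    return "at least 1 year"


def chain_pathways(edges):
    """Join edges target-to-source into maximal paths (simplified: no self-loops, no revisits)."""
    successors = {}
    for source, target in edges:
        if source != target:
            successors.setdefault(source, []).append(target)
    targets = {t for ts in successors.values() for t in ts}
    paths = []

    def walk(path):
        nexts = [t for t in successors.get(path[-1], []) if t not in path]
        if not nexts:
            paths.append(path)
        for t in nexts:
            walk(path + [t])

    for start in (s for s in successors if s not in targets):
        walk([start])
    return paths


weights = transition_weights([["V3", "V2", "V2", "V5"]])
kept = {pair: w for pair, w in weights.items() if w > 0.3}  # threshold from the text
print(weights)
print(chain_pathways([("V1", "V2"), ("V3", "V4"), ("V2", "V4")]))
```

The learned pathways in the paper do contain repeated visits, so the simplified chaining rule here should be read as one possible way to keep the enumeration finite, not as the authors' procedure.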

In each cluster, we visualize clinical pathways of length more than 3 as a graph of nodes using Gephi27. Each node in the graph represents a visit label ‘Vz’, and 2 nodes that have a transitional relationship are connected by an edge. The thickness of the edge is determined by weight. The node colors in the pathway graph range from purple to white and are determined by degree, the number of directed edges connected to a node, both inbound and outbound28. The more purple a node appears, the more connecting edges it has; conversely, the whiter a node appears, the fewer connecting edges it has. We also keep track of high out-degree nodes, which are the nodes with a high number of outbound edges, because they are points where one pathway splits into two or more pathways. In order to check how many of the actual transitions are captured by the learned clinical pathways, we define a measure called average maximum LCS. Given all the pathways learned in a cluster, we calculate the length of the LCS between each pathway and each patient’s sequence in the cluster. For example, given 3 pathways and 5 patients, we will have a 3 by 5 matrix of LCS lengths between each of the 3 pathways and each of the 5 patients’ sequences. For each patient, we call the maximum LCS length across all pathways the maximum LCS (mLCS). Average maximum LCS is the average of the mLCS values over all patients in the cluster.
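A minimal sketch of the average maximum LCS follows. The pathways and patient sequences are hypothetical toy data, and the function names are illustrative.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            curr.append(prev[j - 1] + 1 if xi == yj else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def average_maximum_lcs(pathways, patient_sequences):
    """Mean over patients of the best LCS length against any learned pathway."""
    # mLCS for one patient: the column maximum of the pathway-by-patient LCS matrix.
    mlcs = [max(lcs_length(p, s) for p in pathways) for s in patient_sequences]
    return sum(mlcs) / len(mlcs)


# Hypothetical learned pathways and patient sequences within one cluster.
pathways = [["V1", "V2", "V4"], ["V3", "V4", "V4", "V5"]]
patients = [["V1", "V2", "V2", "V4"], ["V3", "V4", "V5"], ["V6", "V7"]]
print(average_maximum_lcs(pathways, patients))  # (3 + 3 + 0) / 3 = 2.0
```

A value close to the typical pathway length suggests that most patients in the cluster follow at least one learned pathway almost completely.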

3. Results

There are 2,468 unique sequences among 2,511 sequences, whose lengths range from 8 to 41, with an average of 12. Of 2,511 patients in the sample, only 65 patients shared the same clinical history with one or more other patients. Thirty-nine patients with CKD stage 3 and hypertension had follow up visits only while maintaining the same diagnoses. Four patients started as new patients with CKD stage 3, diabetes, and hypertension, and continued with the same diagnoses for 7 more follow up visits. Another 4 patients with CKD stage 4 and hypertension had 10 follow up visits with the same diagnosis. Aside from these patients, our data show that 98% of the patients have evolved in their own unique ways regardless of their initial diagnoses, an indication of the unpredictability and diversity in the conditions of CKD patients, and of variations in practice patterns. A total of 281 distinct visit labels (V’s) were found in the data, and the total number of visits is 30,780. Table 4 lists the 10 most frequent V’s and their occurrences in the data. Follow up visits with CKD stage 3 or CKD stage 4, and hypertension are the most common, and ESRD education sessions for CKD stage 4 patients are also among the most frequent.

Table 4.

Frequent visit contents

| Visit Type | Diagnoses | Procedures | Count | % of Total |
|---|---|---|---|---|
| FUP | Chronic Kidney Disease Stage 3, Hypertension | N/A | 5050 | 16.4% |
| FUP | Chronic Kidney Disease Stage 4, Hypertension | N/A | 3837 | 12.5% |
| FUP | Chronic Kidney Disease Stage 3, Diabetes, Hypertension | N/A | 2849 | 9.3% |
| FUP | Chronic Kidney Disease Stage 4, Diabetes, Hypertension | N/A | 1833 | 6.0% |
| FUP | Chronic Kidney Disease Stage 5, Hypertension | N/A | 751 | 2.4% |
| FUP | Chronic Kidney Disease Stage 2, Hypertension | N/A | 396 | 1.3% |
| FUP | Acute Kidney Failure, Chronic Kidney Disease Stage 3, Hypertension | N/A | 381 | 1.2% |
| ESRD | Chronic Kidney Disease Stage 4 | N/A | 376 | 1.2% |
| FUP | Chronic Kidney Disease Stage 5, Diabetes, Hypertension | N/A | 319 | 1.0% |
| FUP | Chronic Kidney Disease Stage 2, Diabetes, Hypertension | N/A | 213 | 0.7% |

We identified 6 clusters of patient sequences based on the data of 2,511 patients who had at least 8 visits (Figure 2). The length of pathways is between 4 and 6, and the number of pathways per cluster ranges from 16 to 67 (Table 5). As Figure 2 shows, the graphs of clusters 1 and 6 are relatively sparse compared to the graphs of clusters 2 through 5. More than half of the pathways in cluster 1 ended with follow up visits with diagnoses of CKD stage 4, diabetes and hypertension. The majority of the cluster 2 pathways ended with follow up visits with diagnoses of CKD stage 3, diabetes and hypertension. All pathways in cluster 3 ended with follow up visits with CKD stage 3 and hypertension. Pathways in cluster 4 predominantly ended with CKD stage 4 and hypertension. All pathways in cluster 5 ended with hospital treatment, and diagnoses are not noted in the data because office and hospital visits have separate billing systems. Finally, cluster 6 seems to contain many non-compliant patients. Patients in cluster 6 have cancelled or not shown up for more than 56% of the visits, causing the data to contain empty visits without a noted diagnosis. Given that most of the pathways have a length of 4, the average maximum LCS values in clusters 2 through 5 suggest that pathways in these clusters represent a fair amount of information about their patients. On the other hand, clusters 1 and 6 may contain patients whose conditions are more complex and difficult to generalize. In fact, as Table 5 shows, the most common end point in cluster 1 has diagnoses of CKD stage 4, diabetes and hypertension, the most severe set of conditions among the 6 clusters. None of the most common end points or high out-degree points had procedures performed.

Figure 2.

Pathway for identified clusters

Table 5.

Summary of Clusters

| Cluster | Number of patients | Number of pathways | Most common end point: type | Most common end point: diagnosis | Most common end point: % of all pathways | Average maximum LCS | High out-degree point: type | High out-degree point: diagnosis |
|---|---|---|---|---|---|---|---|---|
| 1 | 433 | 17 | FUP | CKD stage 4, diabetes, hypertension | 65% | 1.8 | HOSP | AKF, CKD stage 2 |
| 2 | 462 | 40 | FUP | CKD stage 3, diabetes, hypertension | 95% | 2.8 | FUP | AKF, CKD stage 4 |
| 3 | 582 | 67 | FUP | CKD stage 3, hypertension | 100% | 3.0 | NEW | CKD stage 2, hypertension |
| 4 | 433 | 45 | FUP | CKD stage 4, hypertension | 93% | 3.0 | FUP | AKF, CKD stage 3, diabetes, hypertension |
| 5 | 233 | 50 | HOSP | N/A | 100% | 3.2 | FUP | CKD stage 5, diabetes |
| 6 | 368 | 16 | FUP | N/A | 75% | 2.0 | FUP | CKD stage 5, diabetes |

FUP: follow up; HOSP: hospital follow up; NEW: new patient visit; LCS: longest common subsequence; AKF: acute kidney failure.

We examined whether the clusters differ in terms of age, race, and sex (Figure 3). Applying the Chi-square test with Monte Carlo methods against the frequencies in the entire sample of 2,511 patients, we found significant differences between the expected and observed frequencies in clusters 2 through 6 (p-value < 0.05 using chisq.test in R)26. For example, prior to clustering, 1.7% of the sample population are African-American males between 50 and 70 years of age, but the percentage increases to 2.8% in cluster 6, and decreases to 0.4% in cluster 5 after clustering. A similar trend can also be seen among White females under 50 years of age. Also, the percentage of White females above 70 years of age in cluster 5 nearly doubles compared to the original percentage in the overall sample. A similar pattern can be seen with African-American males and African-American females above 70 years old. These subgroups may have a higher likelihood of receiving hospital treatment.
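To make the test concrete, the sketch below reimplements a goodness-of-fit Chi-square test with a Monte Carlo p-value in plain Python. It mirrors the general idea behind `chisq.test(..., simulate.p.value = TRUE)` in R, but it is an independent illustration: the function names, the seed, and the counts and proportions are all hypothetical and are not taken from the study.

```python
import random


def chi_square_stat(observed, expected_probs):
    """Pearson Chi-square statistic of observed counts against expected proportions."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_probs))


def monte_carlo_chisq(observed, expected_probs, n_sim=2000, seed=0):
    """Return (statistic, p-value), simulating counts under the expected proportions."""
    rng = random.Random(seed)
    n = sum(observed)
    stat = chi_square_stat(observed, expected_probs)
    categories = list(range(len(observed)))
    hits = 0
    for _ in range(n_sim):
        simulated = [0] * len(observed)
        for c in rng.choices(categories, weights=expected_probs, k=n):
            simulated[c] += 1
        if chi_square_stat(simulated, expected_probs) >= stat:
            hits += 1
    # Adding 1 to numerator and denominator keeps the p-value strictly positive.
    return stat, (hits + 1) / (n_sim + 1)


# Hypothetical demographic proportions in the whole sample (three categories)...
overall = [0.6, 0.3, 0.1]
# ...and hypothetical counts of the same categories inside two clusters.
stat_same, p_same = monte_carlo_chisq([60, 30, 10], overall)
stat_diff, p_diff = monte_carlo_chisq([30, 40, 30], overall)
print(round(stat_same, 3), round(p_same, 3))
print(round(stat_diff, 3), round(p_diff, 4))
```

A cluster whose composition matches the overall sample gives a statistic near 0 and a p-value near 1, while a strongly skewed cluster gives a small p-value.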

Figure 3.

Distribution of patients by age, race and sex

We illustrate 3 example pathways below. Figure 4 shows a pathway belonging to cluster 1. It starts with a follow up visit where the patient is diagnosed with CKD stage 4, diabetes and hypertension. The patient receives a chest X-ray during the visit. On average at least 1 month later, the patient visits the office for pre-ESRD education, which is held by a nurse to prepare and educate patients about the potential onset of ESRD. The patient comes for a follow up visit at least a month later on average, without changes in diagnoses. The patient is able to maintain the condition for at least a year on average, and visits the office for another follow up. This pathway’s duration is on average 1 year and 2 months. Figure 5 shows a pathway in cluster 2, which starts with a CKD stage 3 patient attending an ESRD education session. After an average of 2 weeks, the patient is seen for a hospital follow up, where the patient is diagnosed with AKF in addition to CKD stage 3. Then, after an average of 1 year, the patient comes for an office follow up visit and receives diagnoses of diabetes and hypertension, in addition to CKD stage 3. The same visit is repeated one year later. On average, it takes at least 2 years 1 month and 2 weeks to complete the pathway. In Figure 5, the patient avoids worsening of CKD for more than 2 years, and in Figure 4, patients avoid worsening for a year; both pathways are consistent with the known role of pre-ESRD education in helping patients to maintain their current conditions7. In fact, cluster 1’s high out-degree point (Table 5) divides one pathway into two, with one ending with the same CKD stage upon pre-ESRD education, and another progressing to CKD 3, diabetes and hypertension. One of the visits in cluster 2 with high out-degree is a follow-up visit where the patient is diagnosed with AKF and CKD stage 4, which leads to a visit of the same type and diagnoses, or in another case to an improvement to CKD stage 3 (Table 5). In both cases, the pathways eventually end with a follow up visit with CKD stage 3, diabetes and hypertension. One pathway from cluster 3 is shown in Figure 6. It starts with a follow up visit with a CKD stage 1 patient, who maintains the condition for at least 2 years on average. The patient then develops hypertension, but maintains the same condition for at least 2 years on average. At least 4 years later on average, the patient progresses to CKD stage 3, while still suffering from hypertension. The same diagnoses are noted on a follow up visit at least 1 year later on average. In total, it takes at least 5 years to complete the learned pathway. This is an example where the duration of the learned pathway is longer than the actual time range of the data, since it is created using high-likelihood transitions of multiple patients. Pathways like this may serve as a projection for future patients who share the same earlier visits.

Figure 4.

An example pathway in cluster 1

*CKD4: CKD stage 4, DM: diabetes, HP: hypertension,** FUP: follow up, ESRD: pre-ESRD education

Figure 5.

An example pathway in cluster 2

*CKD3: CKD stage 3, AKF: acute kidney failure, DM: diabetes, HP: hypertension ** ESRD: pre-ESRD education, HOSP: hospital follow up, FUP: follow up

Figure 6.

An example pathway in cluster 3

*CKD1: CKD stage 1, CKD3: CKD stage 3, HP: hypertension,** FUP: follow up

4. Discussion

Learning clinical pathways from EHR data allows us to discover improvement, deterioration or maintenance of the status quo in the course of disease evolution, so we may re-visit the historical data and clinicians’ notes to identify interventions that may have played a role in the evolution. For instance, examining pathways with pre-ESRD education may allow us to compare the time it takes for patients to deteriorate with, or without, pre-ESRD education. Also, finding patterns in pathways where improvements were seen in patients’ conditions may help clinicians to identify promising care models for a subgroup of the population. In addition, comparing the pathways of compliant and non-compliant patients may help clinicians to engage non-compliant patients in treatments using new approaches. Exploring clinical pathways also allows us to discover significant differences in the distribution of age, race and gender among clusters of patients. Learning about these differences may enable clinicians to design personalized treatment plans for patients according to demographics and health conditions, and develop hypotheses that can be tested in large populations.

In this paper, we chose to show all transitions of visits with weights above 0.3, but different thresholds of weight may produce pathways of different complexity. For example, the maximum length of pathways is 6, and the total number of pathways is 235 in this study. If we reduce the threshold of weight, we may be able to identify pathways of longer lengths, or larger total number of pathways. Developing a measure that takes into account the number of patients and their diversity may determine an optimal threshold of weight that allows most reasonable clinical interpretation of the pathways. Although clustering was applied to separate patients into subgroups, differences in clinical history still exist, even within one cluster, due to the diversity and complexity of patients’ conditions and practice variations. Perhaps instead of using weight to elicit a sample population, a more individualistic approach, where we learn the clinical pathways based on patients’ biochemical data, demographics and lifestyle choices, can help us to create personalized pathways and detect granular but important differences that separate one patient’s sequence from another. Furthermore, in this study we studied only CKD stage 1 through 5, AKF, diabetes and hypertension as diagnoses to control for noise. The patient subgroups may be further divided into smaller subgroups if we included more conditions such as anemia and proteinuria, also commonly seen among CKD patients1.

In this study we investigated 6 visit types, 8 conditions, and 27 procedures. Theoretically, there can be up to 1,296 combinations of visit types, conditions and procedures reflected in the visits. We observed 281 combinations in the data, as some combinations did not occur in actual visits. However, as the number of components we include increases, the number of possible combinations may grow explosively. For example, if we include 10 medications in the study, the number of possible combinations increases 10-fold. Alternatively, if we increase the conditions by 2, we double the number of possible combinations. As Table 4 shows, only a few of the ‘Vz’s are observed with high frequencies, making the distribution of ‘Vz’s right skewed with a long tail. In future studies where more components in the pathway need to be explored, we may focus only on the most frequent combinations, to avoid including too much noise in the pathway creation.
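The counting argument can be checked with a few lines of arithmetic. The sketch follows the text's own simplification of one visit type, one condition and one procedure per visit; the variable names are illustrative.

```python
visit_types, conditions, procedures = 6, 8, 27

# Upper bound on distinct visit labels under the one-of-each simplification.
possible = visit_types * conditions * procedures
print(possible)  # 1296

# Adding a component with 10 possible values multiplies the bound by 10.
with_medications = possible * 10
print(with_medications)  # 12960

# Share of the theoretical combinations actually observed in the data (281 labels).
observed = 281
print(round(observed / possible, 3))  # 0.217
```

Roughly a fifth of the theoretical combinations occur in practice, which supports the suggestion to focus on the most frequent visit labels when more components are added.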

A major challenge in this study is incomplete documentation in the EHR. For example, clinicians may mention an existing condition only in their notes, and patients’ medication lists may not be up to date. To gain the most value from EHR data, it is vital that treatments are accurately recorded with help from healthcare providers. Once curated data are available, we should investigate complete treatment data including medications, and also patient demographics and lifestyle choices if possible, since these are also crucial factors in altering the evolution of the disease. Similarly, we recognize that the types of laboratory tests ordered characterize patients’ current conditions and future treatment plans. Hence, results from important laboratory examinations such as GFR, serum creatinine level, and serum albumin level should also be included in the study data to provide a more accurate view of patients’ clinical history. This way, we can use laboratory results to check for discrepancies between the diagnostic codes and actual conditions, and track pathway evolution through the sequence of lab observations. Moreover, if a large enough sample is available, we should limit study subjects to patients whose treatment data include all visits from the first to the last before the patient progresses to ESRD. This will likely allow us to build more comprehensive clinical pathways. Furthermore, although we did capture the duration of pathways in this study, temporal factors were not used to cluster patients. Temporal factors, such as pathway duration and the differences in timing between actual practice and the consensus guidelines, should be explored in future analysis. This is an important objective that should be pursued with guidance from clinicians, so that we may learn the dividing points in the pathways that differentiate treatment outcomes.
Finally, and most importantly, rigorous evaluation by clinicians of the learned pathways is crucial to identify unrealistic transitions and courses of disease progression.
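
The proposed cross-check between diagnostic codes and laboratory results could look roughly like the following sketch. It maps an eGFR value to a CKD stage using the standard GFR category cut-offs (90, 60, 30 and 15 mL/min/1.73 m²) and flags visits where the coded stage differs. The record layout, field names and example values are hypothetical. A single eGFR value does not establish a stage on its own, since stages 1 and 2 also require other evidence of kidney damage and CKD requires persistence over time, so flagged records would only be candidates for clinician review.

```python
def stage_from_egfr(egfr):
    """Map eGFR (mL/min/1.73 m^2) to a CKD stage 1-5 by GFR category alone."""
    if egfr >= 90:
        return 1
    if egfr >= 60:
        return 2
    if egfr >= 30:
        return 3
    if egfr >= 15:
        return 4
    return 5

def find_discrepancies(records):
    """Return records whose coded stage disagrees with the eGFR-based stage."""
    return [r for r in records if stage_from_egfr(r["egfr"]) != r["coded_stage"]]

# Hypothetical visit records (not data from the study).
records = [
    {"patient": "A", "coded_stage": 3, "egfr": 45.0},
    {"patient": "B", "coded_stage": 2, "egfr": 28.0},
    {"patient": "C", "coded_stage": 5, "egfr": 12.0},
]

flagged = find_discrepancies(records)
for r in flagged:
    print(r["patient"], "coded stage", r["coded_stage"],
          "eGFR-based stage", stage_from_egfr(r["egfr"]))
```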

5. Conclusion

CKD is a costly and complex health condition with high mortality that affects millions of adults worldwide. In this study, we aimed to learn practice-based clinical pathways for CKD, through which we seek an understanding of the major treatment pathways in the evolution of the disease. Our analysis of patient data from an EHR yielded 6 patient subgroups after applying hierarchical clustering to the sequences of office visits. Each subgroup has distinct characteristics and captures commonly known combinations of comorbidities in CKD, such as CKD stage 3, hypertension and diabetes. Patients who may undergo hospital treatment, or who progress faster than others, have also been identified in subgroups. We observe that multiple paths of disease evolution exist even within each patient subgroup, confirming CKD’s diverse course of development. Moreover, significant differences in the distribution of age, race and sex were found among subgroups. Clinical pathways learned from our study can be instantiated for any patient to enable shared decision making between clinicians and patients, so that patients may gain insights into their projected clinical pathway based on demographics and earlier visits. At the same time, our study can serve as a practice management tool for clinicians to review their practice against current consensus guidelines for CKD and to identify care models that may lead to improved outcomes.

Acknowledgments

We are grateful to the physicians and staff of the community nephrology practice who generously provided data from their Electronic Health Record for this study. We particularly thank Dr. Teredesai, MD, Dr. Xie, MD, PhD, Dr. Patel, MD, and staff, L. Smith and A. Barletta, who gave us important clinical and technical information about the data and the key characteristics of CKD and its treatment. This study was designated as Exempt by the Institutional Review Board at Carnegie Mellon University.

References

1. US Renal Data System. USRDS 2013 Annual Data Report: Atlas of Chronic Kidney Disease and End-Stage Renal Disease in the United States. Bethesda, MD: National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases; 2013.
2. Jha V, Garcia-Garcia G, Iseki K, Li Z, Naicker S, Plattner B, Saran R, Wang AY, Yang CW. Chronic kidney disease: global dimension and perspectives. Lancet. 2013 Jul 20;382(9888):260–72. doi: 10.1016/S0140-6736(13)60687-X.
3. Arulkumaran N, Montero RM, Singer M. Management of the dialysis patient in general intensive care. British journal of anaesthesia. 2012 Feb;108(2):183–92. doi: 10.1093/bja/aer461. PubMed PMID: 22218752.
4. National Kidney Foundation. K/DOQI clinical practice guidelines for chronic kidney disease: evaluation, classification, and stratification. American journal of kidney diseases : the official journal of the National Kidney Foundation. 2002 Feb;39(2 Suppl 1):S1–266. PubMed PMID: 11904577.
5. Stevens PE, Levin A, Kidney Disease: Improving Global Outcomes Chronic Kidney Disease Guideline Development Work Group Members. Evaluation and management of chronic kidney disease: synopsis of the kidney disease: improving global outcomes 2012 clinical practice guideline. Annals of internal medicine. 2013 Jun 4;158(11):825–30. doi: 10.7326/0003-4819-158-11-201306040-00007. PubMed PMID: 23732715.
6. Abra G, Patel M, Moore D. Trend-bending Chronic Kidney Disease Care Model: Stanford University School of Medicine, Clinical Excellence Research Center. 2013. [cited 2014 03/01]. Available from: http://cerc.stanford.edu/fellowships/docs/CERC4modelsummary2.11.2013_3PMpdf.pdf.
7. Hayslip DM, Suttle CD. Pre-ESRD patient education: a review of the literature. Advances in renal replacement therapy. 1995 Jul;2(3):217–26. doi: 10.1016/s1073-4449(12)80055-0. PubMed PMID: 7614358.
8. Israni A, Kasiske B. Laboratory assessment of kidney disease: glomerular filtration rate, urinalysis, and proteinuria. In: Taal MW CG, Marsden PA, editors. Brenner and Rector’s The Kidney. 9th ed. Philadelphia, PA: Elsevier Saunders; 2011.
9. The National Kidney Foundation. Dialysis: The National Kidney Foundation. 2013. [cited 2014 3/10]. Available from: http://www.R-project.org/
10. Grams ME, Chow EK, Segev DL, Coresh J. Lifetime incidence of CKD stages 3–5 in the United States. American journal of kidney diseases : the official journal of the National Kidney Foundation. 2013 Aug;62(2):245–52. doi: 10.1053/j.ajkd.2013.03.009. PubMed PMID: 23566637. PubMed Central PMCID: 3723711.
11. Coresh J, Selvin E, Stevens LA, Manzi J, Kusek JW, Eggers P, et al. Prevalence of chronic kidney disease in the United States. JAMA : the journal of the American Medical Association. 2007 Nov 7;298(17):2038–47. doi: 10.1001/jama.298.17.2038.
12. Sud M, Tangri N, Levin A, Pintilie M, Levey AS, Naimark DM. CKD Stage at Nephrology Referral and Factors Influencing the Risks of ESRD and Death. American journal of kidney diseases : the official journal of the National Kidney Foundation. 2014 Jan 28. doi: 10.1053/j.ajkd.2013.12.008. PubMed PMID: 24485146.
13. Tarver-Carr ME, Powe NR, Eberhardt MS, LaVeist TA, Kington RS, Coresh J, et al. Excess risk of chronic kidney disease among African-American versus white subjects in the United States: a population-based study of potential explanatory factors. Journal of the American Society of Nephrology : JASN. 2002 Sep;13(9):2363–70. doi: 10.1097/01.asn.0000026493.18542.6a.
14. Lopes AA, Hornbuckle K, James SA, Port FK. The joint effects of race and age on the risk of end-stage renal disease attributed to hypertension. American journal of kidney diseases : the official journal of the National Kidney Foundation. 1994 Oct;24(4):554–60. doi: 10.1016/s0272-6386(12)80211-3. PubMed PMID: 7942809.
15. Xu R, Wunsch DC 2nd. Clustering algorithms in biomedical research: a review. IEEE Rev Biomed Eng. 2010;3:120–54. doi: 10.1109/RBME.2010.2083647.
16. Yoo I, Alafaireet P, Marinov M, Pena-Hernandez K, Gopidi R, Chang JF, Hua L. Data mining in healthcare and biomedicine: a survey of the literature. J Med Syst. 2012 Aug;36(4):2431–48. doi: 10.1007/s10916-011-9710-5.
17. Huang Z, Lu X, Duan H. On mining clinical pathway patterns from medical behaviors. Artif Intell Med. 2012;56(1). doi: 10.1016/j.artmed.2012.06.002.
18. Lin F, Hsieh L, Pan S. Learning Clinical Pathway Patterns by Hidden Markov Model; HICSS; 01/03/2005; Hawaii, USA. 2005. p. 142a.
19. Lin F, Chiu C, Wu S. Using Bayesian networks for discovering temporal-state transition patterns in Hemodialysis; HICSS; 1/7/2002; Hawaii, USA. 2002. pp. 1995–2002.
20. Hsu CY, Chertow GM, McCulloch CE, Fan D, Ordonez JD, Go AS. Nonrecovery of kidney function and death after acute on chronic renal failure. Clinical journal of the American Society of Nephrology : CJASN. 2009 May;4(5):891–8. doi: 10.2215/CJN.05571008. PubMed PMID: 19406959. PubMed Central PMCID: 2676192.
21. Islam TM, Fox CS, Mann D, Muntner P. Age-related associations of hypertension and diabetes mellitus with chronic kidney disease. BMC nephrology. 2009;10:17. doi: 10.1186/1471-2369-10-17. PubMed PMID: 19563681.
22. Zhang Y, Padman R, Wasserman L. On Learning Clinical Pathways for Chronic Kidney Disease from Electronic Health Record Data. A Preliminary Graphical Approach. 2014.
23. Kaufman L, Rousseeuw PJ. Finding Groups in Data: An Introduction to Cluster Analysis. 1st ed. New York: John Wiley; 1990.
24. Elzinga CH. Sequence analysis: Metric representations of categorical time series. Sociological Methods and Research. 2008.
25. Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol. 1995 Apr 7;247(4):536–40. doi: 10.1006/jmbi.1995.0159.
26. Rousseeuw PJ. Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis. Computational and Applied Mathematics. 1987;(20):53–65.
27. Bastian M, Heymann S, Jacomy M. Gephi: an open source software for exploring and manipulating networks. International AAAI Conference on Weblogs and Social Media; 2009.
28. Harary F. Graph Theory. Reading, MA: Addison-Wesley; 1994.
29. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2012.

Articles from AMIA Annual Symposium Proceedings are provided here courtesy of American Medical Informatics Association
