AMIA Annual Symposium Proceedings. 2017 Feb 10;2016:1737–1746.

Multi-Trajectory Models of Chronic Kidney Disease Progression

Philipp Burckhardt 1, Daniel S Nagin 1, Rema Padman 1
PMCID: PMC5333229  PMID: 28269932

Abstract

An ever-increasing number of people are affected by chronic kidney disease (CKD). A better understanding of the progression of CKD and its complications is needed to address what is becoming a major burden for health-care systems worldwide. Utilizing a rich data set consisting of the Electronic Health Records (EHRs) of more than 33,000 patients from a leading community nephrology practice in Western Pennsylvania, we applied group-based trajectory modeling (GBTM) in order to detect patient risk groups and uncover typical progressions of CKD and related comorbidities and complications. We have found distinct risk groups with differing trajectories and are able to classify new patients into these groups with high accuracy (up to ≈ 90%). Our results suggest that multitrajectory modeling via GBTM can shed light on the developmental course of CKD and the interactions between related complications.

1. Introduction

Chronic Kidney Disease (CKD) is a growing burden for the national health-care sector. Today, it is estimated that more than 11% of the US adult population have some degree of CKD,1 and projections indicate that more than 50% of those aged 30 to 64 years will develop CKD.2 With costs amounting to $49.2 billion in the United States in the year 2011 for the treatment of End Stage Renal Disease (ESRD), the final stage of CKD,3 it is paramount to gain deeper insights into the progression of the disease in order to facilitate the development of new preventive care approaches.

The increasing adoption of Electronic Health Record Systems (EHRs) in recent years,4 fueled by the promise of cost savings, increased efficiency and better communication between the various healthcare providers, has resulted in the accumulation of a massive amount of structured data on patients and their disease progressions. It has been argued that the identification and the quality of care of CKD patients could be improved by an effective utilization of EHRs.5

Using a rich data set consisting of the EHRs for more than 33,000 patients from a leading community nephrology practice in Western Pennsylvania, this is the first study that applies group-based trajectory modeling (GBTM) in order to uncover typical progressions of CKD and related comorbidities and detect patient risk groups. By modeling biomarkers not only for CKD, but also complications typically linked to it, we aim to obtain a fuller, multi-dimensional picture of its progression. Specifically, we use the estimated glomerular filtration rate (eGFR) as a biomarker for CKD and other appropriate laboratory-based biomarkers for its complications.

Historically, CKD progression was assessed via patients’ serum creatinine levels. However, serum creatinine is not a good measure of kidney function, so laboratories nowadays report eGFR in addition.6 eGFR can be calculated from the level of serum creatinine and patient characteristics such as age, gender and race. This change in reporting was associated with an increase in first nephrologist visits, but eGFR by itself is not sufficient for guiding decision-making on the care of kidney patients.7

A multi-dimensional approach is required since CKD patients tend to have numerous comorbidities. In fact, a large fraction of patients suffering from CKD do not progress to the later stages of the disease but die prematurely due to these comorbidities and complications. Thus, any treatment should take into account the contemporaneous progression of CKD and its varying complications at differing levels of severity that patients experience. This poses a major challenge, though, since care coordination amongst various medical professionals is involved. Discerning disease patterns via interpretable and accessible statistical models has the potential to alleviate the communication challenges between these stakeholders. To facilitate this, we develop and test a statistical model which can be used to track not only the progression of CKD but also the development of several complications. This can be accomplished by monitoring the levels of biomarkers for the considered complications.

In clinical research, group-based trajectory models (GBTM) are increasingly used to model the development of a clinically important indicator over time, with the goal of identifying groups of individuals sharing a common trajectory.

Originally devised by Nagin and colleagues as a criminological tool for classifying criminal careers, this approach lends itself towards usage in a wide range of disciplines, including clinical research.8 At their core, GBTMs are an example of a class of statistical models called finite mixture models. More precisely, GBTMs are mixtures of regression models applied to longitudinal data, where the likelihood of a time series for an individual is assumed to be a mixture of linear models that include both an explicit time variable as well as other covariates. This way of analyzing biomarker trajectories is different from other approaches such as latent growth curve modeling, which rests upon the assumption of a common functional form for the trajectories for all individuals, with its parameters varying randomly. Growth curve models, which can be formulated either as structural equation models or as multilevel models, effectively assume all individuals to belong to a single class. In contrast, GBTM have a finite number of classes, each of which is characterized by its own parametric trajectory. Given that our interest lies not in the study of the variability between subjects, but in discovering the typical marker trajectories that we suspect to exist, GBTMs are more appropriate for our needs.

The data set spans four years of patient data and includes diagnoses, lab results and patient characteristics. After restricting the data set to all patients who were diagnosed with CKD Stage III during the covered time span, we removed all transplant patients as their trajectories differ significantly from the rest of the patients. The final cohort consists of 1,944 individuals.

We fit a single, joint multi-group trajectory-model for five biomarker time series, and use an appropriate model selection procedure to identify a parsimonious and interpretable model. Covariates such as patient characteristics (age, weight, gender, race, etc.) and binary indicators on whether the patient suffers from the comorbidities of diabetes and hypertension are included in addition to the biomarker time series. By fitting a joint model for all markers, we build upon earlier work on single and dual trajectories, whose results hinted at the usefulness of GBTMs for risk stratification of CKD patient populations, even though only one or two markers were used at a time.9 The constructed trajectory model is necessarily a simplification of the real world, but ideally one which captures the main distinctive trajectories of disease progression one would typically encounter in patients. Its use could enable better screening of patient populations for high-risk individuals and lead to insights about the interplay between the various complications that are suffered by CKD patients.

2. Data

We are working with a rich clinical data set of patients from a leading nephrology practice in southwestern Pennsylvania. The total number of unique patient records is 33,882. The data set contains information about patient characteristics (age, gender, weight and height) as well as all of their lab measurements with associated dates.

The lab results span a total of 18 quarters, ranging from the years 2009 to 2013. The patient population is split almost evenly among female and male patients. Most patients are retirees, with median age of 70 years. 94% of the patients are white. About half the patients have a diagnosis of CKD Stage III while the remaining are in more advanced stages of CKD.

For our analysis, patients diagnosed with CKD Stage III between January 1, 2009 and November 19, 2012 were selected on the grounds that this diagnosis usually marks the point when patients start showing first disease symptoms and are referred to a nephrologist. Patients in CKD Stages I and II show almost normal kidney function, and are therefore rarely diagnosed. A doctor will only make these diagnoses by linking additional evidence like proteinuria or haematuria, which are not sufficient indicators of CKD.

Patients who received a kidney transplant on or after January 1, 2009 were removed from the analysis since their level of kidney function differs sharply from those who did not receive a transplant and keeping them will likely distort the identified disease progressions.

Historically, serum creatinine levels have been used as a marker for CKD: As kidney function deteriorates, blood levels of creatinine typically rise. Tracking the estimated Glomerular Filtration Rate (eGFR) has replaced the creatinine test as a more reliable means to detect early kidney damage and is a method suggested by the National Kidney Foundation (NKF). eGFR can be derived from the creatinine level and additional variables such as age, sex and race and possibly height and weight. For this study, eGFR was calculated from serum creatinine levels using the CKD-EPI equation.

The Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) developed and validated this equation, which is more accurate than the previously used Modification of Diet in Renal Disease (MDRD) Study equation,10 although it is not clear whether it improves risk prediction.11
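As an illustration of how eGFR follows from serum creatinine and patient characteristics, the following minimal sketch assumes the 2009 creatinine-based form of the CKD-EPI equation with its published coefficients. The paper does not list the coefficients it used, and the function name and argument names are hypothetical.

```python
def ckd_epi_2009(scr_mg_dl, age_years, female, black):
    """eGFR (mL/min/1.73 m^2) from the 2009 CKD-EPI creatinine equation.

    Assumption: this is the equation variant used; the study only states
    that "the CKD-EPI equation" was applied to serum creatinine.
    """
    kappa = 0.7 if female else 0.9          # sex-specific creatinine scale
    alpha = -0.329 if female else -0.411    # sex-specific exponent below kappa
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr
```

For a hypothetical 70-year-old white male with a serum creatinine of 1.5 mg/dl, the function returns a value in the CKD Stage III range (30 to 59), which matches the range of average eGFR values reported for the cohort.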

Creatinine values larger than 10 mg/dl were removed because these values were likely entered incorrectly. We only keep measurements collected on or after the date the patient was diagnosed with CKD Stage III. Typical complications of CKD include Anemia,12 Secondary Hyperparathyroidism,13 Hyperphosphatemia14 and Metabolic Acidosis.15 Our data set allows us to track each of these via corresponding lab measurements. The respective lab measurements of Hemoglobin (HGB), parathyroid hormone (PTH), phosphate (PO4) and carbon dioxide (CO2) can be used as markers for the considered complications. In the cohort, 1,367 patients develop Anemia, while 1,476 of them are diagnosed with Secondary Hyperparathyroidism. 438 patients have Acidosis, but only a hundred suffer from Hyperphosphatemia.

All outcome variables were averaged by quarter to deal with the relative data scarcity and infrequency of the lab measurements. In addition, this procedure yields measurements on a discrete time scale as required by the group-based trajectory model. For each biomarker, Table 1 displays the average value in each period as well as its standard deviation.
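The quarterly averaging step can be sketched as follows. The sketch assumes that quarters are counted from each patient's Stage III diagnosis date and have a fixed length of 365.25/4 days; the paper does not spell out either convention, and all names are hypothetical.

```python
from collections import defaultdict
from datetime import date

DAYS_PER_QUARTER = 365.25 / 4  # assumption: fixed-length quarters


def quarterly_means(measurements, diagnosis_date):
    """Average one patient's lab values for one marker by quarter.

    measurements: iterable of (date, value) pairs.
    Returns {period: mean}, where period 1 is the first quarter after diagnosis.
    """
    buckets = defaultdict(list)
    for day, value in measurements:
        elapsed = (day - diagnosis_date).days
        if elapsed < 0:
            continue  # keep only measurements on or after the diagnosis date
        period = int(elapsed // DAYS_PER_QUARTER) + 1
        buckets[period].append(value)
    return {p: sum(v) / len(v) for p, v in sorted(buckets.items())}
```

Quarters without any measurement simply do not appear in the result, which is how the missing periods discussed below arise.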

Table 1.

Averages for the five considered biomarkers across time periods (quarterly). Standard deviations are displayed in parentheses. The overall data availability is displayed in the second column.

| Period | Availability | eGFR | HGB | CO2 | PTH | PO4 |
|---|---|---|---|---|---|---|
| 1 | 0.267 | 37.87 (12.55) | 12.11 (2.59) | 26.08 (3.34) | 75.36 (57.58) | 3.51 (0.63) |
| 2 | 0.43 | 38.33 (11.07) | 12.33 (1.98) | 26.08 (3.26) | 72.63 (50.56) | 3.52 (0.63) |
| 3 | 0.546 | 38.64 (11.34) | 12.52 (1.67) | 26.19 (3.15) | 71.80 (54.86) | 3.52 (0.62) |
| 4 | 0.45 | 38.55 (11.18) | 12.57 (1.58) | 26.19 (3.23) | 70.96 (47.75) | 3.53 (0.59) |
| 5 | 0.529 | 38.70 (11.24) | 12.65 (1.84) | 26.13 (3.20) | 70.78 (49.27) | 3.51 (0.59) |
| 6 | 0.409 | 38.42 (11.30) | 12.66 (1.69) | 26.05 (3.14) | 70.85 (51.00) | 3.52 (0.58) |
| 7 | 0.412 | 38.54 (11.54) | 12.69 (1.65) | 26.11 (3.19) | 71.27 (52.27) | 3.52 (0.59) |
| 8 | 0.338 | 38.47 (11.45) | 12.70 (1.66) | 26.13 (3.23) | 71.07 (51.94) | 3.54 (0.60) |
| 9 | 0.405 | 38.29 (11.51) | 12.69 (1.78) | 26.21 (3.30) | 70.48 (53.21) | 3.54 (0.61) |
| 10 | 0.292 | 38.18 (11.56) | 12.65 (1.68) | 26.26 (3.34) | 69.24 (47.86) | 3.53 (0.58) |
| 11 | 0.289 | 38.13 (11.77) | 12.67 (1.67) | 26.27 (3.31) | 68.13 (42.69) | 3.54 (0.61) |
| 12 | 0.255 | 37.84 (11.85) | 12.66 (1.77) | 26.15 (3.31) | 69.52 (44.57) | 3.53 (0.58) |
| 13 | 0.287 | 37.64 (11.99) | 12.65 (1.68) | 25.96 (3.29) | 71.09 (48.65) | 3.55 (0.59) |
| 14 | 0.211 | 36.91 (11.80) | 12.61 (1.66) | 25.73 (3.27) | 73.76 (50.13) | 3.56 (0.63) |
| 15 | 0.196 | 36.92 (11.35) | 12.69 (1.68) | 25.76 (3.06) | 73.63 (45.57) | 3.55 (0.61) |
| 16 | 0.149 | 35.95 (11.45) | 12.73 (1.70) | 25.85 (3.26) | 79.63 (54.93) | 3.56 (0.64) |
| 17 | 0.13 | 35.29 (11.56) | 12.61 (1.75) | 25.60 (3.18) | 82.91 (58.09) | 3.58 (0.70) |
| 18 | 0.035 | 33.71 (10.86) | 12.30 (1.83) | 25.65 (3.37) | 85.94 (55.42) | 3.74 (0.77) |

Patients who do not have at least four observations in the considered time range were removed. Even for the remaining patients, observations are scarce: On average, for each patient we have only measurements for 34.29% of all of his or her relevant quarters. To deal with this missing data, we have linearly interpolated observations between any two quarters whenever there was a missing quarter between them. Doing so increased the percentage of available data points to 62.42%.
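The interpolation step might look as follows. This is a sketch under the assumption that every interior gap between two observed quarters is filled, regardless of its length, and that gaps before the first or after the last observation are left missing; the function name is hypothetical.

```python
def interpolate_gaps(series):
    """Fill interior gaps (None) by linear interpolation between observed quarters.

    Leading and trailing gaps stay None because there is no second
    observation to interpolate towards.
    """
    filled = list(series)
    observed = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(observed, observed[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled
```

For example, a series observed at 40.0 in one quarter and 34.0 three quarters later receives the values 38.0 and 36.0 for the two quarters in between.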

After data cleaning and processing, 1944 of the patients diagnosed with CKD Stage III on or after January 1, 2009 were used for model fitting.

3. Methods

Single Trajectory Model

Let $Y_j$ denote the multivariate response variable for the $j$-th marker, for example eGFR. Then, $Y_j = (Y_{j,1}, \ldots, Y_{j,18})^T$ is a vector of length 18, which holds the quarterly lab results for the marker in question. For a single response, the group-based trajectory model posited by Nagin16 assumes the following density for a sequence of longitudinal measurements $y_j = (y_{j,1}, \ldots, y_{j,18})$:

$$f(y_j) = \sum_{k=1}^{K} p_k \, f_{Y_j}(y_j \mid C = k), \tag{1}$$

where $K$ denotes the total number of groups, $p_k$ the probability of belonging to group $k$ and $f_{Y_j}(y_j \mid C = k)$ the conditional density of the observed data given class $k$.

The probabilities $p_k$ of this mixture model are not estimated directly, but related via the softmax function to a $K$-dimensional vector $\theta$ of class coefficients and time-stable covariates $x$ with associated weight vectors $w_k$:

$$p_k = \frac{e^{\theta_k + x^T w_k}}{\sum_{l=1}^{K} e^{\theta_l + x^T w_l}} \tag{2}$$
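A minimal sketch of Equation (2) is given below. The function and argument names are hypothetical, and subtracting the largest score before exponentiating is a standard numerical safeguard rather than something stated in the paper.

```python
import math


def group_probabilities(theta, weights, x):
    """Softmax of theta_k + x^T w_k over the groups k = 1, ..., K.

    theta:   list of K class coefficients
    weights: list of K weight vectors, one per group
    x:       time-stable covariates of one patient
    """
    scores = [t + sum(xi * wi for xi, wi in zip(x, w))
              for t, w in zip(theta, weights)]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

With all coefficients set to zero, every group receives probability 1/K; a positive weight on a covariate shifts probability towards the corresponding group as that covariate increases.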

The response vector for each outcome is modelled as a multivariate normal random variable

$$Y_j \mid C = k \sim N(\mu_{jk}, \sigma_j^2 I), \tag{3}$$

where the elements of the mean vector are related to the period t (= time in quarters since diagnosis of CKD Stage III) of the individual patient as follows:

$$\mu_{jkt} = \beta_{jk0} + \beta_{jk1} t + \beta_{jk2} t^2 + \beta_{jk3} t^3. \tag{4}$$

As can be seen, the group-based trajectory model is based on the assumption that the trajectories in each group have a simple polynomial form. From our experience, a polynomial order above three is rarely necessary, which is why we have constrained the model to have cubic terms at most.
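The cubic form of Equation (4) can be evaluated for all 18 quarters as in the sketch below. The function name is hypothetical, and any coefficient values passed in are illustrative rather than estimates from the study.

```python
def trajectory_mean(beta, periods=range(1, 19)):
    """Evaluate beta0 + beta1*t + beta2*t^2 + beta3*t^3 for each period t.

    beta: tuple (beta0, beta1, beta2, beta3) for one marker and one group.
    Lower-order trajectories are obtained by setting higher coefficients to 0.
    """
    b0, b1, b2, b3 = beta
    # Horner's scheme keeps the evaluation compact
    return [b0 + t * (b1 + t * (b2 + t * b3)) for t in periods]
```

For instance, `trajectory_mean((38.0, -0.5, 0.0, 0.0))` describes a hypothetical group whose mean eGFR starts at 37.5 in the first quarter and declines by 0.5 per quarter.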

Multi-Trajectory Model

This is one of the first works in which the group-based trajectory model devised by Nagin16 is used not to model just a single time series, but jointly the trajectories of multiple outcomes.17 In this model extension, the density for J outcomes becomes

$$f(y) = f(y_{j=1,\ldots,J}) = \sum_{k=1}^{K} p_k \prod_{j=1}^{J} f_{Y_j}(y_j \mid C = k). \tag{5}$$
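To make Equation (5) concrete, the following sketch evaluates the log of the joint density for one patient with fully observed data. It assumes the Gaussian group-conditional densities of Equation (3); the data layout and all names are hypothetical, and the log-sum-exp step is a numerical convenience, not part of the model.

```python
import math


def normal_logpdf(y, mean, sd):
    """Log density of N(mean, sd^2) at y."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((y - mean) / sd) ** 2


def log_mixture_density(y, means, sds, probs):
    """log f(y) for one patient under the multi-trajectory mixture.

    y[j][t]        observed value of marker j in period t
    means[k][j][t] mean of marker j in period t for group k
    sds[j]         standard deviation of marker j
    probs[k]       group membership probability p_k
    """
    log_terms = []
    for p_k, group_means in zip(probs, means):
        log_term = math.log(p_k)
        for y_j, mu_jk, sd_j in zip(y, group_means, sds):
            log_term += sum(normal_logpdf(v, m, sd_j) for v, m in zip(y_j, mu_jk))
        log_terms.append(log_term)
    top = max(log_terms)  # log-sum-exp over the K groups
    return top + math.log(sum(math.exp(t - top) for t in log_terms))
```

Summing this quantity over all patients gives the log-likelihood that is maximized during model fitting.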

Model fitting and inference is carried out via the traj procedure from the Stata package of the same name,18 which implements a Newton-Raphson optimization algorithm for maximum likelihood estimation. For N observations y(1), ..., y(N) of all outcomes, the maximum-likelihood optimization problem is

$$\max_{\theta,\beta,\sigma,w} L = \max_{\theta,\beta,\sigma,w} \prod_{i=1}^{N} f_{Y^{(i)}}(y^{(i)}; \theta, \beta, \sigma, w). \tag{6}$$

Following the suggestion given by Jones et al., the Bayesian information criterion (BIC) is used to perform model selection and to determine the number of groups K.18
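BIC-based selection of the number of groups can be sketched as below. The sketch assumes the convention BIC = log L − 0.5 · q · log N, in which larger values are better, with q the number of free parameters and N the number of patients; whether N should count patients or individual observations is a modeling choice not settled here. The log-likelihoods in the usage example are invented for illustration.

```python
import math


def bic(log_likelihood, n_parameters, n_subjects):
    """BIC in the 'larger is better' convention: log L - 0.5 * q * log N."""
    return log_likelihood - 0.5 * n_parameters * math.log(n_subjects)


def select_number_of_groups(fits, n_subjects):
    """Pick the number of groups K with the largest BIC.

    fits: {K: (log_likelihood, n_parameters)} for each candidate model.
    """
    return max(fits, key=lambda k: bic(*fits[k], n_subjects))


# Hypothetical fits for K = 2, 3, 4 groups (values are not from the study)
example_fits = {2: (-1000.0, 10), 3: (-950.0, 15), 4: (-948.0, 20)}
best_k = select_number_of_groups(example_fits, n_subjects=1944)
```

In this made-up example the four-group model barely improves the log-likelihood over the three-group model, so the penalty term favors three groups.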

Model Predictions

The aforementioned procedure yields a multi-response model, which fits trajectories for all five biomarkers and incorporates time-stable covariates, namely demographic variables and indicators for the existence of diabetes and hypertension. New patients can be classified using the posterior probabilities of group membership, which can be computed by using Bayes’ rule as

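$$\Pr(C = l \mid \{Y_j = y_j\}_{j=1,\ldots,J}) = \frac{p_l \prod_{j=1}^{J} f_{Y_j}(y_j \mid C = l)}{\sum_{k=1}^{K} p_k \prod_{j=1}^{J} f_{Y_j}(y_j \mid C = k)}, \tag{7}$$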

where the number of outcomes J is equal to five in our case. For the conditional densities, it follows from Equation (3) that

$$f_{Y_j}(y_j \mid C = k) = \prod_{t=1}^{T} \phi\left(\frac{y_{jt} - \mu_{jkt}}{\sigma_j}\right), \tag{8}$$

in which φ is the density function of the standard normal distribution. The normalizing factor $1/\sigma_j$ of each term is omitted here; it does not depend on the group and therefore cancels in Equation (7). For the data used for model fitting, T = 18. However, in order to form predictions, we cannot calculate the group membership conditional on $Y_j$ for all time periods as they might not be available yet. By conditioning only on measurements up to the observed time, we can calculate posterior probabilities that take all currently available information into account.

Similarly, if for any outcome the time series $Y_j$ contains missing values or starts at a later time, the product will only be taken over the observed data. This allows us to generate predictions for new data points for which we have not yet observed the trajectories for 18 time periods, which is important given the goal of detecting high-risk patients early on and not when it may already be too late.
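The posterior computation with partially observed trajectories can be sketched as follows. Missing values are encoded as `None` and skipped, and an optional cutoff restricts the calculation to the first few periods. The data layout and all names are hypothetical; this is an illustration of Equations (7) and (8), not the implementation used in the study.

```python
import math


def normal_logpdf(y, mean, sd):
    """Log density of N(mean, sd^2) at y."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((y - mean) / sd) ** 2


def posterior_probabilities(y, means, sds, probs, up_to=None):
    """Posterior group membership probabilities from the observed values only.

    y[j][t]        value of marker j in period t, or None if unobserved
    means[k][j][t] mean of marker j in period t for group k
    sds[j]         standard deviation of marker j
    probs[k]       prior group probability p_k
    up_to          if given, use only the first `up_to` periods
    """
    log_post = []
    for p_k, group_means in zip(probs, means):
        total = math.log(p_k)
        for y_j, mu_jk, sd_j in zip(y, group_means, sds):
            for value, mean in list(zip(y_j, mu_jk))[:up_to]:
                if value is not None:  # the product runs over observed data only
                    total += normal_logpdf(value, mean, sd_j)
        log_post.append(total)
    top = max(log_post)  # normalize in log space for stability
    unnormalized = [math.exp(v - top) for v in log_post]
    norm = sum(unnormalized)
    return [u / norm for u in unnormalized]
```

When no measurement is available yet, the function returns the prior probabilities $p_k$; each additional observed quarter then sharpens the assignment, and the patient can be assigned to the group with the highest posterior probability.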

4. Results

The estimated trajectories for eGFR from the multi-trajectory model are displayed in Figure 1. While all patients in our cohort have been diagnosed with CKD Stage III and thus suffer from kidney damage, it becomes clear that some patients belong to groups characterized by trajectories that show almost no change in eGFR values (groups 5-8), whereas the kidney function of patients in other groups deteriorates significantly after they were first diagnosed (groups 1-4).

Figure 1.


Fitted trajectories of the eight-group model for eGFR.

The number of groups was determined using Bayesian information criterion (BIC) as the model selection criterion, as was the order of the polynomial terms and the inclusion of time-stable covariates. Using BIC in the model search is supposed to help build a sparse but at the same time sufficiently large model to accurately reflect the data at hand. Starting from a full model with cubic polynomials and all available covariates, variables were removed from the model if they were statistically insignificant and their exclusion improved the BIC score. Eight groups were selected in total, and of the available time-stable covariates, gender, BMI, age, and indicators for being black, being diagnosed with hypertension and having diabetes all ended up in the model. First-order interaction terms between the indicators were not significant and hence we dropped them from the model.

Table 2 shows that the groups are roughly equal in size, but differ with respect to some demographic variables. The differences are most pronounced for the racial makeup, with Blacks being over-represented in groups one and seven. Even though the differences might be more nuanced for some of the other variables, the null hypothesis that there are no differences among the groups is rejected for each variable at a significance level of 5% when conducting an ANOVA. As one can also see from Table 2, patients with diabetes are more likely to belong to the high-risk groups compared to groups such as seven and eight, which are characterized by stable and better eGFR values.

Table 2.

Displayed are the group sizes as a proportion of the entire population in the second column of the table, as well as the averages of the included time-stable covariates inside of each group in columns three to eight.

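| Group | Proportion | Females | Black | Mean Age | Mean BMI | Hypertension | Diabetes |
|---|---|---|---|---|---|---|---|
| 1 | 0.10 | 0.52 | 0.09 | 68.71 | 31.50 | 0.99 | 0.62 |
| 2 | 0.12 | 0.69 | 0.03 | 73.96 | 30.98 | 0.99 | 0.53 |
| 3 | 0.15 | 0.46 | 0.04 | 71.09 | 30.04 | 0.96 | 0.45 |
| 4 | 0.13 | 0.60 | 0.07 | 76.44 | 32.16 | 0.98 | 0.52 |
| 5 | 0.16 | 0.31 | 0.02 | 70.89 | 31.64 | 0.96 | 0.43 |
| 6 | 0.09 | 0.46 | 0.02 | 68.49 | 31.69 | 0.98 | 0.56 |
| 7 | 0.13 | 0.50 | 0.11 | 73.27 | 30.97 | 0.98 | 0.39 |
| 8 | 0.12 | 0.18 | 0.05 | 67.27 | 31.27 | 0.93 | 0.38 |
| Overall | 1.00 | 0.46 | 0.05 | 71.45 | 31.26 | 0.97 | 0.48 |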

Since patient characteristics seem to tell only part of the story, inspecting the estimated trajectories for the biomarkers might reveal deeper insights into the risk factors associated with CKD. All estimated trajectories are displayed together in Figure 2. Background shading indicates a ranking of the group trajectories in terms of whether marker values are better or worse than those for the other groups. As emphasized by the color coding, groups one to four have worse values compared to the rest for most markers, with group one being uniformly worse and the other three each having one or more markers for which values are better, indicating that patients might not (yet) be afflicted by the corresponding complication.

Figure 2.


Fitted trajectories of the eight-group multi-trajectory model for the five considered biomarkers

For example, groups one, two and four are all characterized by patients suffering from Anemia, as evidenced by their low hemoglobin values. It has been argued that there are “potentially severe consequences of anemia in CKD”.12 Indeed, these three groups develop worse in terms of eGFR than the rest, except for group three. Yet, the remaining markers also show signs for the complications, which furthermore illustrates that patients suffer not only from CKD, but often a variety of other chronic conditions.

The complications are associated with how fast CKD itself progresses: For example, high-risk group one shows the worst trajectories for all considered markers, and a majority of its members do progress to at least CKD Stage IV, if not ESRD, as can be gleaned from Table 3.

Table 3.

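Contingency table for group membership and the final CKD diagnosis of the patient. A patient can experience either no progression of his or her CKD status or advance to either Stage IV or Stage V, which is also called End-Stage Renal Disease (ESRD). Each cell shows the number of patients; percentages in the diagnosis columns are shares within the group, and percentages in the Total column are shares of the whole cohort.

| Group | No Decline | Stage IV | ESRD | Total |
|---|---|---|---|---|
| 1 | 16 (8.2%) | 106 (54.4%) | 73 (37.4%) | 195 (10.0%) |
| 2 | 118 (52.2%) | 97 (42.9%) | 11 (4.9%) | 226 (11.6%) |
| 3 | 131 (46.1%) | 138 (48.6%) | 15 (5.3%) | 284 (14.6%) |
| 4 | 111 (43.0%) | 131 (50.8%) | 16 (6.2%) | 258 (13.3%) |
| 5 | 246 (78.6%) | 64 (20.4%) | 3 (1.0%) | 313 (16.1%) |
| 6 | 140 (78.7%) | 37 (20.8%) | 1 (0.6%) | 178 (9.2%) |
| 7 | 244 (95.3%) | 9 (3.5%) | 3 (1.2%) | 256 (13.2%) |
| 8 | 234 (100.0%) | 0 (0.0%) | 0 (0.0%) | 234 (12.0%) |
| Total | 1240 | 582 | 122 | 1944 |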

As expected, groups with worse trajectories for the considered markers (such as groups one to four) have a larger percentage of patients ending up in a more serious disease stage.

For each diagnosis displayed in Table 3, a chi-squared test rejects the null hypothesis of independence between the variable in question and group membership (p < 0.001). The risk group to which a patient most likely belongs is therefore strongly associated with observable outcomes and diagnoses.
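As a rough check of this association, the Pearson chi-squared statistic can be computed directly from the counts in Table 3. Note the assumption: the sketch below tests the full 8 × 3 table at once, whereas the text reports one test per diagnosis, so it illustrates the technique rather than reproducing the reported tests. The function name is hypothetical.

```python
# Counts from Table 3: rows are groups 1-8, columns are
# No Decline, Stage IV, ESRD.
TABLE_3 = [
    [16, 106, 73],
    [118, 97, 11],
    [131, 138, 15],
    [111, 131, 16],
    [246, 64, 3],
    [140, 37, 1],
    [244, 9, 3],
    [234, 0, 0],
]


def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, r_tot in zip(table, row_totals):
        for obs, c_tot in zip(row, col_totals):
            expected = r_tot * c_tot / n  # expected count under independence
            stat += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


statistic, dof = chi_squared_statistic(TABLE_3)
```

With 14 degrees of freedom, the critical value at the 0.001 level is approximately 36.12; the statistic for Table 3 exceeds it by a wide margin, in line with the reported p < 0.001.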

One promising use of trajectory models for health outcomes is to detect sub-populations of patients who are particularly at risk and might benefit from early interventions. In Figure 3, the changes in misclassification rate are displayed when not all time periods of the data are used, but only those up to a chosen period. As expected, the misclassification error for the full training data approaches zero when all periods of data are used, which is tautological because the group memberships are determined by assigning each patient to the group with the highest posterior probability. However, when we use fewer and fewer data points, the misclassification rate goes up since the patients cannot be placed definitively in one of the groups yet. The negative slopes of the two plotted lines are large in magnitude, though, indicating that the model does dramatically better as more data becomes available. To demonstrate that this pattern occurs also on previously unseen data, we set aside 20% of the observations as a test set and refit the model on the remaining 80% of the data. The misclassification rate on the test set is displayed as the solid line in the plot. It shows the same curvature, with the main difference being that the error does not approach zero but instead converges to 10% after all periods are taken into account. The group memberships deemed as ground truth are obtained from fitting a multi-trajectory model on the entire data set. These results are encouraging evidence that the model has useful predictive power.

Figure 3.


Model performance of the multi-group trajectory model as a function of the number of time periods.
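The misclassification rate underlying Figure 3 compares predicted group labels with the reference labels from the model fitted on all data. A minimal sketch follows, with an optional collapse of the eight groups into the high-risk (1-4) and low-risk (5-8) sets used later in the paper. The function names and the example labels are hypothetical.

```python
def risk_level(group):
    """Collapse the eight groups into high-risk (1-4) and low-risk (5-8)."""
    return "high" if group <= 4 else "low"


def misclassification_rate(predicted, reference, collapse=None):
    """Share of patients whose predicted label differs from the reference label.

    collapse: optional function mapping a group label to a coarser label.
    """
    if collapse is not None:
        predicted = [collapse(g) for g in predicted]
        reference = [collapse(g) for g in reference]
    wrong = sum(p != r for p, r in zip(predicted, reference))
    return wrong / len(reference)


# Made-up labels for five patients, for illustration only
predicted = [1, 2, 5, 8, 3]
reference = [1, 3, 6, 8, 7]
full_rate = misclassification_rate(predicted, reference)
coarse_rate = misclassification_rate(predicted, reference, collapse=risk_level)
```

Confusions between neighboring groups within the same risk set no longer count as errors after collapsing, which is why the coarse rate is never higher than the full rate.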

Given our interest in obtaining predictions for individual patients, it is convenient that the laws of probability lead naturally to individualized posterior probabilities of group membership for a patient given his or her lab measurements and demographic variables, which we can track over time. In Figure 4, posterior probabilities are displayed for an average patient with the values for the biomarkers set to the overall means of the entire cohort. The average patient has an age of 71.45 years and a BMI of 31.26. All other covariates are set to their baseline levels, i.e. the patient is male, has neither diabetes nor hypertension and is white. Starting with the second quarter, the posterior probability of belonging to group three exceeds all others. The final group label emerges as the top choice after only two time periods, suggesting that the trajectory model quickly converges to assign a patient to one of the groups as new data comes in. This is an encouraging observation, as it shows that trajectory modeling might be useful not only in analyzing outcomes ex post, but also beforehand.

Figure 4.


Posterior probabilities as a function of the number of time periods for the average patient in our patient population.

Profiling the average patient might not be of much interest in itself, but is illustrative of the potential uses of the model in predicting group membership for individual patients and sub-populations that are of special concern.

5. Conclusion

Using group-based trajectory modeling (GBTM) to detect patient risk groups for CKD sharing comparable trajectories for the estimated glomerular filtration rate (eGFR) and markers for four complications of CKD, we have identified eight groups in a multi-trajectory model for the five outcomes. The results are consistent with the previous study by Padman et al.,9 but build on those findings with the application of the multi-trajectory model to the five jointly considered outcome variables, thus offering a more detailed perspective. The eight groups are characterized by distinct trajectories and patient demographics. For the high-risk groups 1-4, diabetes is prevalent, whereas it occurs less often for patients in the lower-risk groups 5-8. Interestingly, the proportion of black patients is largest in groups one and seven (see Table 2), which are characterized by among the worst and among the best development of the tracked biomarkers, respectively.

It is a well known fact that Blacks are more prone to develop CKD than non-Blacks. Yet, at the same time it has been observed that Blacks have a survival advantage in the ESRD population, the reasons of which are not entirely clear.19 Against this background, one might have expected the larger proportion of Blacks in the high-risk group, but not necessarily in group seven.

In an analysis of the progression of renal failure, Hannedouche et al. found gender to be correlated with disease progression, with male patients progressing significantly faster.20 We did find some large differences in the gender makeup of the eight groups, with group eight standing out in particular: Only 18% of patients assigned to this group were females. While patients in this group are younger than in the others, in line with the finding that males develop CKD faster, their eGFR is surprisingly stable and only few of its members progress to Stage IV or Stage V of the disease. This is a somewhat unexpected finding, which could be explained by the fact that groups seven and eight might be separated just because of the difference in hemoglobin levels between the sexes.

It is known that CKD prevalence increases drastically with age and it has been speculated that elderly people might be more susceptible to CKD.1 Yet, Hallan et al. found low eGFR to be “independently associated with mortality and ESRD regardless of age across a wide range of populations” in patient cohorts selected for CKD.21 This is consistent with our results, which do not enable us to use age to discriminate between the high-risk and low-risk groups.

All in all, it seems that the value of demographic variables for risk stratification is not high, and that more emphasis should be placed on the trajectories of the different biomarkers. At the same time, given the complex nature of CKD, additional biomarkers such as Urine-Albumin Creatinine Ratio (UACR) may need to be included in future studies.

We have shown that patients can be classified into the detected groups with an accuracy of more than 50% after a year of data. Collapsing the groups into the high-risk and low-risk communities of groups 1-4 and 5-8, this number increases to approximately 82%. For previously unseen patients, accuracy increased to 90% after using all time periods as evaluated against a gold standard data set obtained by fitting the GBTM on all observations. These are encouraging signs, which indicate that GBTM could help with population risk stratification as well as with assessing individual patients’ risk. For example, our model could be used for predicting stage progression (see the work by Perotte et al.22) or another outcome like the need for a transplant. To avoid distorting the results, this latter case would require the inclusion of transplant patients into the cohort, which could be achieved by incorporating all their lab values until the day of their transplant. Here a strength of GBTMs becomes apparent: the ability to handle missing data.

One limitation of this study is a lack of mortality data for the patient population, which might have caused a bias in our results. Also, results might not generalize to other populations, given that the patients in our data set are fairly homogeneous in terms of demographic variables such as race and age. To establish that the identified patterns are externally valid, analyses based on samples with different demographics should be conducted. In this study, we had to deal with several data quality issues of the EHR such as erroneous patient identifiers and wrongly encoded lab values. Since it is not possible to rule out that some issues might have persisted, further analyses on similar data sets should be undertaken to replicate the results.

GBTMs could be extended in several ways, too. Assuming a polynomial trajectory may not reflect reality well, but is a convenient assumption since it makes model fitting tractable and improves interpretability of the model. Alternatively, one could consider a non-parametric approach, at the expense of interpretability. In addition, the model assumes that conditional on group membership, the elements of $Y_j$ are uncorrelated with each other. Weakening this assumption and permitting the off-diagonal elements of the covariance matrix $\sigma_j^2 I$ to be non-zero might yield a more realistic picture of the true underlying data generating process. Yet, the assumption is not as restrictive as it might appear at first glance: Outcomes are not modelled to be conditionally independent at the population level. The model merely assumes that conditional on the latent group membership, the Gaussian noise added to the trend line stays constant over time.

The analysis could also be enhanced by exploiting the rich data available inside the EHRs of each patient, most of which is unstructured. For example, clinical narratives written after each patient’s visit might reveal insights which the biomarkers themselves would not give away. Alternatively, one could investigate whether medications are associated with a decline or increase in one of the trajectories. This was not done in the present study because of significant difficulties posed by the high-dimensionality of the problem, which would require dimensionality reduction by mapping individual drug names to drug classes. This is the subject of a follow-up study conducted at the Heinz College at CMU.

While group-based trajectory modeling (GBTM) has seen increasing adoption in clinical research, to the best of our knowledge, this study is the first to jointly model multiple outcomes, thereby providing a fuller picture of patients’ disease progression. Given our results, we believe that multi-trajectory models provide a simple but powerful tool for risk stratification of patients as they allow identification of a finite number of groups with distinct trajectories for the developmental course of a disease and relevant predictors of group membership. Hence, GBTMs provide interpretable summaries of typical disease progressions and might help in the development of targeted, proactive interventions for patients in all risk groups.

Acknowledgement

We would like to thank the physicians and staff from Teredesai, McCann & Associates for providing the data for this study and their valuable input and suggestions on our analysis.

This study was designated as Exempt by the Institutional Review Board at Carnegie Mellon University.

References

  • 1. El Nahas AM, Bello AK. Chronic kidney disease: The global challenge. Lancet. 2005;(365):331–340. doi: 10.1016/S0140-6736(05)17789-7.
  • 2. Hoerger TJ, Simpson SA, Yarnoff BO, Pavkov ME, Rios Burrows N, Saydah SH, et al. The Future Burden of CKD in the United States: A Simulation Model for the CDC CKD Initiative. Am J Kidney Dis. 2015 Mar;65(3):403–11. doi: 10.1053/j.ajkd.2014.09.023.
  • 3. ECLRJC P, AMKC P, AMSPQCQ P. Costs of end-stage renal disease. 2013. Available from: http://www.usrds.org/2013/pdf/v2_ch11_13.pdf.
  • 4. Charles D, Gabriel M, Furukawa MF. Adoption of Electronic Health Record Systems among U.S. Non-federal Acute Care Hospitals: 2008–2013. Heal Inf Technol. 2014;2008(16):2–7.
  • 5. Navaneethan SD, Jolly SE, Sharp J, Jain A, Schold JD, Schreiber MJ, Jr, et al. Electronic health records: A new tool to combat chronic kidney disease? Clin Nephrol. 2013;79(3):175–183. doi: 10.5414/CN107757.
  • 6. Jain AK, McLeod I, Huo C, Cuerden MS, Akbari A, Tonelli M, et al. When laboratories report estimated glomerular filtration rates in addition to serum creatinines, nephrology consults increase. Kidney Int. 2009;76(3):318–23. doi: 10.1038/ki.2009.158. Available from: http://www.ncbi.nlm.nih.gov/pubmed/19436331.
  • 7. James MT, Hemmelgarn BR, Tonelli M. Early recognition and prevention of chronic kidney disease. Lancet. 2010 Apr;375(9722):1296–309. doi: 10.1016/S0140-6736(09)62004-3.
  • 8. Nagin DS. Group-Based Modeling of Development. Cambridge, MA: Harvard University Press; 2005.
  • 9. Padman R, Nagin DS, Xie Q. Disease Progression and Risk Prediction for Chronic Kidney Disease: Analysis of Electronic Health Record Data using Group-Based Trajectory Models. In: Proc. Work. Inf. Syst. Technol.; Auckland. 2014.
  • 10. Levey AS, Stevens LA. Estimating GFR Using the CKD Epidemiology Collaboration (CKD-EPI) Creatinine Equation: More Accurate GFR Estimates, Lower CKD Prevalence Estimates, and Better Risk Predictions. Am J Kidney Dis. 2010;55(4):622–627. doi: 10.1053/j.ajkd.2010.02.337.
  • 11. Matsushita K, Selvin E, Bash LD, Astor BC, Coresh J. Risk Implications of the New CKD Epidemiology Collaboration (CKD-EPI) Equation Compared With the MDRD Study Equation for Estimated GFR: The Atherosclerosis Risk in Communities (ARIC) Study. Am J Kidney Dis. 2010;55(4):648–659. doi: 10.1053/j.ajkd.2009.12.016.
  • 12. O'Mara NB. Anemia in Patients With Chronic Kidney Disease. Diabetes Spectr. 2008;21(1):12–19.
  • 13. Tomasello S. Secondary Hyperparathyroidism and Chronic Kidney Disease. Diabetes Spectr. 2008;21(1):19–25.
  • 14. Hruska KA, Mathew S, Lund R, Qiu P, Pratt R. Hyperphosphatemia of chronic kidney disease. Kidney Int. 2008;74(2):148–157. doi: 10.1038/ki.2008.130.
  • 15. Kraut JA, Kurtz I. Metabolic acidosis of CKD: Diagnosis, clinical characteristics, and treatment; 2005.
  • 16. Nagin DS. Analyzing developmental trajectories: A semiparametric, group-based approach. Psychol Methods. 1999;4(2):139–157. doi: 10.1037/1082-989x.6.1.18.
  • 17. Nagin DS, Jones BL, Lima Passos V, Tremblay RE. Group-Based Multi-Trajectory Modeling; 2016.
  • 18. Jones BL, Nagin DS, Roeder K. A SAS procedure based on mixture models for estimating developmental trajectories. Sociol Methods Res. 2001;29(3):374–393.
  • 19. Mehrotra R, Kermah D, Fried L, Adler S, Norris K. Racial differences in mortality among those with CKD. J Am Soc Nephrol. 2008;19(7):1403–1410. doi: 10.1681/ASN.2007070747.
  • 20. Hannedouche T, Chauveau P, Kalou F, Albouze G, Lacour B, Jungers P. Factors affecting progression in advanced chronic renal failure. Clin Nephrol. 1993;39(6):312–320.
  • 21. Hallan SI, Matsushita K, Sang Y, Mahmoodi BK, Black C, Ishani A, et al. Age and Association of Kidney Measures With Mortality and End-stage Renal Disease. JAMA. 2012;308(22). doi: 10.1001/jama.2012.16817.
  • 22. Perotte A, Ranganath R, Hirsch JS, Blei D, Elhadad N. Risk prediction for chronic kidney disease progression using heterogeneous electronic health record data and time series analysis. J Am Med Informatics Assoc. 2015;22(4):872–880. doi: 10.1093/jamia/ocv024.

Articles from AMIA Annual Symposium Proceedings are provided here courtesy of American Medical Informatics Association
