Introduction
Precision medicine aims to prevent and treat diseases by tailoring management to individual-specific characteristics. Such approaches are especially promising in diseases with heterogeneous presentation, where treatment plans must vary with the individual’s disease course. For example, patients with scleroderma, a systemic autoimmune disorder, show a variety of patterns of disease activity: individuals can develop complications of the lungs, skin, GI tract, heart, and kidneys to varying extents and at varying rates of progression [1]. Clinicians must therefore guess which organs should be targeted with aggressive therapy, which can carry harmful side effects. In many diseases, few biomarkers currently exist that accurately predict an individual’s course. We therefore propose a computational framework that integrates diverse (static and time-varying) markers to predict the trajectory of a patient’s scleroderma-related lung disease. Because its predictions are updated as new data arrive, the framework both tailors predictions to the individual and implements the vision of a learning health system that adapts and adjusts itself over time.
Methods
We develop a probabilistic framework that represents disease activity trajectories at multiple resolutions. The population model captures effects shared across all individuals in the population; the subpopulation model captures effects specific to a subtype [1,2], that is, a group of individuals with similar disease presentation; and the individual model captures effects specific to a given individual. Population and subpopulation model parameters are learned offline, while the individual-specific parameter estimates are refined as more data about the individual are observed. Personalized predictions are computed in real time by using all of the data in the individual’s clinical history and marginalizing over the individual-specific parameters. Lung health is measured using a clinical marker called PFVC (percent of predicted forced vital capacity). Our model uses the individual’s PFVC history to dynamically update predictions. In addition, it uses individual characteristics, including demographic variables (gender and race) and serologic measurements (ACA and Scl-70 antibody positivity).
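To make the idea of marginalizing over individual-specific parameters concrete, the following is a minimal sketch under strong simplifying assumptions, not the model used in this work. It assumes each subtype has a known mean PFVC curve, that the individual effect is a single Gaussian random intercept, and that measurement noise is Gaussian. The function name, the linear subtype curves, and the values of `sigma_b` and `sigma_e` are all hypothetical. Covariates such as demographics and serology are omitted.

```python
import numpy as np

def subtype_posterior_and_forecast(t_obs, y_obs, t_new, subtype_means,
                                   subtype_prior, sigma_b=5.0, sigma_e=3.0):
    """Toy multi-resolution prediction (illustrative assumption, not the paper's model).

    y(t) = subtype_mean_k(t) + b + noise, with b ~ N(0, sigma_b^2) an
    individual-specific intercept and noise ~ N(0, sigma_e^2).
    Returns the posterior over subtypes and one forecast curve per subtype.
    """
    t_obs = np.asarray(t_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    t_new = np.asarray(t_new, dtype=float)
    n = len(t_obs)

    # Integrating out b gives a Gaussian marginal with this covariance.
    cov = sigma_e**2 * np.eye(n) + sigma_b**2 * np.ones((n, n))
    _, logdet = np.linalg.slogdet(cov)

    log_post, forecasts = [], []
    for mean_fn, prior in zip(subtype_means, subtype_prior):
        resid = y_obs - mean_fn(t_obs)
        quad = resid @ np.linalg.solve(cov, resid)
        log_lik = -0.5 * (quad + logdet + n * np.log(2.0 * np.pi))
        log_post.append(np.log(prior) + log_lik)

        # Posterior mean of the individual intercept given this subtype.
        precision = 1.0 / sigma_b**2 + n / sigma_e**2
        b_mean = (resid.sum() / sigma_e**2) / precision
        forecasts.append(mean_fn(t_new) + b_mean)

    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())  # subtract max for stability
    return post / post.sum(), np.array(forecasts)

# Hypothetical subtype curves: one stable, one declining (PFVC vs. years).
stable = lambda t: 80.0 + 0.0 * t
declining = lambda t: 80.0 - 3.0 * t

post, curves = subtype_posterior_and_forecast(
    t_obs=[0.0, 0.5, 1.0], y_obs=[80.0, 78.4, 77.1], t_new=[2.0, 4.0],
    subtype_means=[stable, declining], subtype_prior=[0.5, 0.5])
print("subtype probabilities:", post)
print("forecast per subtype:", curves)
```

In this toy version, observing more PFVC values both sharpens the subtype probabilities and tightens the estimate of the individual intercept, which mirrors the way the individual-level estimates are described as being refined over time.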
Results
Figure 1 shows the predictions obtained using the proposed framework for two individuals. Initial PFVC levels are comparable across both patients. After one year of follow-up, indicated by points (A) and (B), our model correctly predicts that the PFVC of the individual in the first row will remain stable, while the other will experience PFVC decline. With 4 years of data, the confidence in each prediction is strengthened despite the sharp drop that both individuals subsequently exhibit, indicated by points (C) and (D). In particular, after a transient decrease in PFVC in the top patient at point (C) resulting from an episode of cholecystitis, the model appropriately continued to predict a largely stable course of lung disease. Using 10-fold cross-validation on 672 patients, our model achieves mean absolute errors of 10.37, 8.95, and 6.98 when predicting PFVC values between 8 and 12 years of follow-up using 1, 2, and 4 years of data, respectively.
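The evaluation protocol described above, mean absolute error under k-fold cross-validation with folds formed over patients, can be sketched as follows. This is an illustrative outline only: the helper names and the `fit_predict` callback are hypothetical, each patient is reduced to a single held-out target value for brevity, and the toy data are synthetic rather than taken from the study.

```python
import numpy as np

def patient_folds(n_patients, n_folds=10, seed=0):
    """Randomly partition patient indices into n_folds disjoint test sets."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_patients), n_folds)

def cross_validated_mae(y_true, fit_predict, n_folds=10, seed=0):
    """Mean absolute error over all held-out patients.

    fit_predict(train_idx, test_idx) is a hypothetical callback that trains
    on the training patients and returns predictions for the test patients.
    """
    y_true = np.asarray(y_true, dtype=float)
    all_idx = np.arange(len(y_true))
    abs_errors = []
    for test_idx in patient_folds(len(y_true), n_folds, seed):
        train_idx = np.setdiff1d(all_idx, test_idx)
        pred = np.asarray(fit_predict(train_idx, test_idx), dtype=float)
        abs_errors.append(np.abs(y_true[test_idx] - pred))
    return float(np.mean(np.concatenate(abs_errors)))

# Synthetic example: a baseline that predicts the training-set mean.
toy_rng = np.random.default_rng(1)
toy_pfvc = toy_rng.normal(loc=75.0, scale=10.0, size=50)
baseline = lambda train_idx, test_idx: np.full(len(test_idx),
                                               toy_pfvc[train_idx].mean())
print("baseline MAE on synthetic data:", cross_validated_mae(toy_pfvc, baseline))
```

Splitting by patient rather than by measurement keeps all of a patient's data on one side of each split, so no individual contributes to both training and testing within a fold.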
Figure 1:
Top row shows a white male, around 40 years old, Scl-70 pos, ACA neg, diffuse skin. Bottom row shows an African American male, around 50 years old, Scl-70 pos, ACA neg, diffuse skin. Black dots indicate observed points at time of prediction; red dots indicate actual future points. The blue area represents the most likely predicted trajectory. The light green area represents the second most likely trajectory. Pr(*) = confidence in trajectory expressed as a probability.
Discussion
We have introduced a principled framework for integrating diverse clinical marker data to provide personalized prognosis of an individual’s disease trajectory. The proposed framework refines predictions by modeling deviations at multiple resolutions: deviations from the population that are common to a subpopulation, and deviations across individuals within a subpopulation.
References
- [1] Varga J, Denton CP, Wigley FM. Scleroderma: From Pathogenesis to Comprehensive Management. Springer; 2012.
- [2] Saria S, Goldenberg A. Subtyping: What It Is and Its Role in Precision Medicine. IEEE Intelligent Systems. 2015.
- [3] Schulam PF, Wigley FM, Saria S. Clustering Longitudinal Clinical Marker Trajectories from Electronic Health Data: Applications to Phenotyping and Endotype Discovery. Proc. of Association for the Advancement of Artificial Intelligence. 2015.

