Hierarchical Gaussian Processes and Mixtures of Experts to Model COVID-19 Patient Trajectories

Sunny Cui; Elizabeth C Yoo; Didong Li; Krzysztof Laudanski; Barbara E Engelhardt

Author manuscript; available in PMC: 2025 Sep 24.

Published in final edited form as: Pac Symp Biocomput. 2022;27:266–277.

PMCID: PMC12456735 NIHMSID: NIHMS1887226 PMID: 34890155

Abstract

Gaussian processes (GPs) are a versatile nonparametric model for nonlinear regression and have been widely used to study spatiotemporal phenomena. However, standard GPs offer limited interpretability and generalizability for datasets with naturally occurring hierarchies. With large-scale, rapidly-updating electronic health record (EHR) data, we want to study patient trajectories across diverse patient cohorts while preserving patient subgroup structure. In this work, we partition our cohort of over 2000 COVID-19 patients by sex and ethnicity. We develop and apply a hierarchical Gaussian process and a mixture of experts (MOE) hierarchical GP model to fit patient trajectories on clinical markers of disease progression. A case study for albumin, an effective predictor of COVID-19 patient outcomes, highlights the predictive performance of these models. These hierarchical spatiotemporal models of EHR data bring us a step closer toward our goal of building flexible approaches to capture patient data that can be used in real-time systems^*.

Keywords: COVID-19, electronic health record, Gaussian processes, patient trajectories

1. Introduction

The highly contagious nature of the emergent coronavirus (COVID-19) and limited knowledge of treatment methods necessitate decision support tools that can efficiently estimate and predict patient trajectories in order to measure disease progression. Notably, recent findings report considerable disparities in manifestations of COVID-19 across racial minorities within the United States, with a disproportionately high frequency of hospitalizations among African American, Hispanic, and Native American populations.¹ Higher rates of obesity, a known high-risk comorbidity, are observed in marginalized groups, which contribute to more severe illnesses and higher mortality rates for these patients.² Worse outcomes arise due to a complex combination of physiological, socioeconomic, behavioral, and cultural factors. A model that can account for group structures that arise both inherently and environmentally is necessary in order to develop clinical recommendations tailored to individual patients and to mitigate bias in treatment procedures; at the same time, that model should also allow for the sharing of signal across groups when patient group sample sizes are small.

The Hospitals at the University of Pennsylvania (HUP) COVID-19 dataset contains clinical observations of 2069 patients who tested positive for COVID-19 via a PCR test between April 2020 and August 2020 at the University of Pennsylvania Medical Center (UPMC) hospital in Philadelphia, PA.

This anonymized dataset includes the following patient information:

patient demographic information including age, sex and ethnicity;
labs and vital sign measurements, including blood serum creatinine, partial pressure of oxygen, and total urine output;
procedural information, including details of mechanical ventilation, nasal cannula, and liters of oxygen flow; and
medication information including type, dosage, and time of administration.

With an emergent disease like COVID-19, we want a model that is robust to missing and noisy patient data, and also computationally tractable to allow continuous data updates. Known for their flexibility, interpretability, and uncertainty quantification, Gaussian processes (GPs) have proven useful in machine learning,³ spatiotemporal statistics,⁴ and functional data analysis.⁵ Among their applications, GP regression is a nonparametric regression model that places a distribution on arbitrary nonlinear functions with smoothness modulated by the selected kernel function.⁶ Updated by observations, the GP posterior enables predictions and uncertainty estimates at unobserved locations on sequences, such as the time or space domain, including the future. Due to the Gaussian assumption of the joint distributions over observations, the posterior is Gaussian with closed-form mean and variance terms.

Previous work has exploited the flexibility of GPs to obtain insights into problems in healthcare, including early detection of sepsis through multi-output GPs,⁷,⁸ online updates of patient vital signals with sparse multi-output GPs,⁹ and reliable prediction of adverse hospital events by jointly modeling longitudinal trajectories and time-to-event data.¹⁰

For the task of modeling disease trajectories, particularly for a large patient cohort, using standard GP regression is insufficient because many complex diseases such as lupus and pneumonia manifest heterogeneously in patients across different demographic and clinical subgroups.¹¹,¹² Noting this heterogeneity, prior work placed a hierarchy on scleroderma patients at the population, subgroup, and individual levels.¹⁰ B-splines were used to model each subgroup trajectory and a GP was used to capture noise.

Although the MedGP approach⁹ combined information across patients using an empirical Bayes approach, allowing subgroups to be captured via kernel parameters, it lacks a rigorous approach to evaluating group structure and posteriors. Motivated by the need to explicitly account for group structure, our framework builds on the premise of a group structure in the patient population and provides a fully Bayesian treatment of hierarchical disease trajectory modeling.

The contributions of this work are as follows: At a high level, we develop a flexible Gaussian process that is able to capture sparse, noisy, electronic health record (EHR) time-series data. More specifically, we build a hierarchical mixture of experts (MOE) Gaussian process (GP) regression model that allows sharing of strength across patient samples with known group structure. The MOE allows each sample to participate in multiple patient groups simultaneously, such as inclusion in both the female (sex) and Black (race) patient groups. Furthermore, our fast closed-form inference method allows us to apply this framework to hundreds of COVID-19 patient trajectories to show its robustness in fitting a variety of clinically important covariates.

This paper is organized as follows: In Section 2, we discuss the background for standard, hierarchical, and MOE Gaussian process regression models. We introduce our framework of MOE hierarchical Gaussian process regression in Section 3. We demonstrate the performance of our framework on COVID-19 patient EHR data and discuss the implications of these results in Section 4. We conclude by exploring future directions in Section 5.

2. Background

In this section, we provide a brief summary of GP regression and its extension to a Bayesian hierarchical setting.

2.1. Gaussian process regression (GPR)

We consider the Bayesian analysis of standard linear regression $f(x_{i}) = β^{⊤} x_{i}$, where $β$ is the vector of weights of the linear model, $x_{i}$ are regressors, and $f(x_{i})$ is the noiseless function. Given observed data $D = (X, Y)$, where $X = \{x_{i}\}_{i=1}^{n}$ are regressors such as time across $n$ total observations and $Y = \{y_{i}\}_{i=1}^{n}$ are noisy, scalar responses, we can write each response as $y_{i} = f(x_{i}) + ϵ_{i}$, where $ϵ_{i} \sim N(0, σ^{2})$ is Gaussian white noise. Given a new set of regressors $X_{*} = \{x_{*}\}$, the goal is to predict the responses $Y_{*} = f(X_{*})$.

We can extend these linear models to nonlinear regression functions using Gaussian processes. A Gaussian process is a probability distribution over arbitrary smooth functions such that any finite realization is a multivariate Gaussian random variable. For any observations $X = [x_{1}, \dots, x_{n}]$,

$$[f(x_{1}), \dots, f(x_{n})]^{⊤} \sim N\left([m(x_{1}), \dots, m(x_{n})]^{⊤}, (κ(x_{i}, x_{j}))_{i,j=1}^{n}\right),$$

where $m (\cdot)$ is the mean function and $κ (\cdot, \cdot)$ is a positive definite kernel function. As in prior work, the mean function $m$ is assumed to be zero.⁹ There are many possible positive definite kernel functions $κ$ , including exponential (Ornstein-Uhlenbeck), squared exponential, and Matérn covariance functions. These covariance functions include parameters that control the spatial variance and decay of the dependency over the domain; these kernel parameters are often estimated by maximizing the log likelihood (MLE):

$$\log p(Y ∣ X) = \log N(Y ∣ 0, Γ + σ^{2} I) = -\frac{1}{2} Y^{⊤} (Γ + σ^{2} I)^{-1} Y - \frac{1}{2} \log |Γ + σ^{2} I| - \frac{n}{2} \log(2π),$$

where $Γ_{ij} = κ(x_{i}, x_{j})$. Let $Γ_{*} = κ(X, X_{*})$ and $Γ_{**} = κ(X_{*}, X_{*})$; then the posterior of $Y_{*}$ is given by

$$\begin{aligned} p(Y_{*} ∣ X_{*}, X, Y) &= N(Y_{*} ∣ μ_{*}, Σ_{*}), \\ μ_{*} &= Γ_{*}^{⊤} (Γ + σ^{2} I)^{-1} Y, \\ Σ_{*} &= Γ_{**} - Γ_{*}^{⊤} (Γ + σ^{2} I)^{-1} Γ_{*}. \end{aligned}$$

A point estimate of $Y_{*}$ is given by the posterior mean $μ_{*}$, while $Σ_{*}$ is the posterior covariance, which quantifies the uncertainty of that estimate.
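The posterior equations above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the squared exponential kernel, the function names, and the toy data are assumptions chosen for brevity, and a Cholesky factorization stands in for the explicit inverse of $Γ + σ^{2} I$.

```python
import numpy as np

def sq_exp_kernel(a, b, variance=1.0, lengthscale=1.0):
    """Squared exponential kernel between 1-D input arrays a and b (illustrative choice)."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=0.1, **kern):
    """Posterior mean and covariance of a zero-mean GP at x_test."""
    K = sq_exp_kernel(x_train, x_train, **kern)      # Gamma
    K_s = sq_exp_kernel(x_train, x_test, **kern)     # Gamma_*
    K_ss = sq_exp_kernel(x_test, x_test, **kern)     # Gamma_**
    # Factor (Gamma + sigma^2 I) once instead of forming its inverse.
    L = np.linalg.cholesky(K + noise_var * np.eye(len(x_train)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha          # mu_*
    cov = K_ss - v.T @ v          # Sigma_*
    return mean, cov

# Toy data (synthetic, for illustration only).
x_train = np.array([0.0, 1.0, 2.0, 4.0])
y_train = np.sin(x_train)
x_test = np.array([1.0, 3.0])
mean, cov = gp_posterior(x_train, y_train, x_test, noise_var=1e-2)
```

In this toy setup the posterior variance at the observed input (1.0) is smaller than at the unobserved input (3.0), which is the behavior the uncertainty estimate is meant to capture.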

The computational complexity of inference for GPR is $O(n^{3})$ because of the need to invert $Γ + σ^{2} I$, an $n$ by $n$ matrix. Fortunately, there is an immense literature on scalable inference algorithms for GPs, including tapering.¹³ The idea of tapering is to impose zero correlation between two points that are not close to each other by multiplying $κ$ by a tapering function $T$: $κ_{T}(x, y) := κ(x, y) T(x, y)$. For example, when $T(x, y) = 1_{\{‖x - y‖ < ϵ\}}$, $κ_{T}(x, y) = 0$ if $‖x - y‖ \geq ϵ$, resulting in a sparse covariance matrix that is banded when the observations are ordered in time.
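As a small illustration of the indicator taper described above, the following sketch multiplies an exponential kernel matrix elementwise by $T$ and reports how many entries survive. The kernel choice, the value of `eps`, and the evenly spaced time grid are assumptions made for the example. Note also that a hard indicator taper is not guaranteed to preserve positive definiteness; the covariance tapering literature¹³ typically uses compactly supported correlation functions such as spherical or Wendland tapers instead.

```python
import numpy as np

def exp_kernel(a, b, variance=1.0, lengthscale=2.0):
    """Exponential (Ornstein-Uhlenbeck) kernel on 1-D inputs."""
    return variance * np.exp(-np.abs(a[:, None] - b[None, :]) / lengthscale)

def indicator_taper(a, b, eps):
    """T(x, y) = 1 if |x - y| < eps, else 0."""
    return (np.abs(a[:, None] - b[None, :]) < eps).astype(float)

x = np.arange(10, dtype=float)                    # sorted, evenly spaced time points
K = exp_kernel(x, x)
K_tapered = K * indicator_taper(x, x, eps=2.5)    # elementwise product kappa * T

# Fraction of entries that remain nonzero after tapering.
nonzero_fraction = np.count_nonzero(K_tapered) / K_tapered.size
```

With sorted inputs the surviving entries lie in a band around the diagonal, so sparse or banded solvers can be used in place of a dense factorization.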

2.2. Hierarchical Gaussian process (HGP) regression

One of the main challenges in predicting future values of a disease trajectory or imputing unobserved values within a trajectory is that biological and environmental factors lead to high variance in patient state and disease progression. For instance, many diseases include one or more disease subtypes, and the progression and severity of a disease can vary across patients with different ages, sexes, or chronic conditions.

For datasets with known subgroups, hierarchical models are a natural choice because they allow the sharing of information across and within subgroups. The use of hierarchical models allows precise modeling of each subgroup and sharing of signal across all of the subgroups; it is particularly beneficial in the case where each subgroup has a small sample size.

Hierarchical structure can be enforced through the mean function, the covariance function, or a structured prior. Prior work¹⁴ placed a hierarchy on the mean function parameters to model PM2.5 levels, a measurement of air quality, much like the spline model for individualized disease prediction.¹⁰ Other work¹⁵ placed a hierarchy on gene expression at two levels (each experiment and each replicate gene) to model heterogeneity. Conjugate inverse gamma priors have also been placed on the kernel parameters to model the relationships between low and high accuracy experiments.¹⁶ Variants of the hierarchical model include hierarchical MOE models that impose a tree structure when computing parameter values,¹⁷ deep GPs in which the inputs to each GP have their own GP prior,¹⁸ and an approach that uses subsets of inducing points to fit experts, which hold information at the group and individual levels.¹⁹

3. Hierarchical Gaussian process regression for patient trajectories

In the context of prior work, we develop a Bayesian hierarchical GP regression model for patient data. We group the patient population by attributes including sex and ethnicity. We impose a hierarchy on these trajectories at the group and individual levels by letting the mean of each level in the hierarchy be distributed by a Gaussian process parameterized for the level above. We use $k = 1, \dots, K$ as the group-level subscript and $i = 1, \dots, N_{k}$ as the patient-level subscript in group $k$ . All patients in the $k$ th subgroup share an underlying trajectory modeled by $g_{k} (x)$ . Patient $i$ in subgroup $k$ is associated with a unique trajectory, denoted by $f_{k, i} (x)$ , that is influenced by various factors including demographics, lifestyle choices, genetic predispositions, and pre-existing conditions. Then,

$$\begin{aligned} g_{k} &\sim GP(0, κ_{g}), \\ f_{k,i} &\sim GP(g_{k}, κ_{f}). \end{aligned}$$

Let $Y_{k} = \{y_{k,i}\}_{i=1}^{N_{k}}$ be the collection of noisy observations of clinical markers of $N_{k}$ patients in subgroup $k$ at time points $X_{k} := \{X_{k,i}\}_{i=1}^{N_{k}}$. The covariance between the data $Y$ and the functions $f(\cdot)$, $g(\cdot)$ is

$$\begin{aligned} \mathrm{Cov}(y_{k,i}(x), g_{k}(x')) &= κ_{g}(x, x'), \\ \mathrm{Cov}(y_{k,i}(x), f_{k,i'}(x')) &= \begin{cases} κ_{g}(x, x') + κ_{f}(x, x') & \text{if } i = i', \\ κ_{g}(x, x') & \text{otherwise.} \end{cases} \end{aligned}$$
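To make the two-level covariance concrete, the sketch below assembles the covariance of all patient-level function values within one group: every pair of observations shares the group kernel $κ_{g}$, and pairs from the same patient additionally receive $κ_{f}$. The squared exponential kernel, the helper names, and the toy time points are assumptions for illustration only.

```python
import numpy as np

def sq_exp(a, b, variance, lengthscale):
    """Squared exponential kernel on 1-D inputs (stand-in for kappa_g and kappa_f)."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def hgp_group_cov(patient_times, kg_params, kf_params):
    """Covariance of the f_{k,i} values for all patients i in a single group k."""
    x_all = np.concatenate(patient_times)
    # owner[j] is the index of the patient that observation j belongs to.
    owner = np.concatenate([np.full(len(t), i) for i, t in enumerate(patient_times)])
    shared = sq_exp(x_all, x_all, **kg_params)        # kappa_g for every pair
    same_patient = owner[:, None] == owner[None, :]   # True where i == i'
    own = sq_exp(x_all, x_all, **kf_params) * same_patient
    return shared + own

# Two hypothetical patients observed at different times.
patient_times = [np.array([0.0, 1.0, 2.0]), np.array([0.5, 1.5])]
kg = dict(variance=1.0, lengthscale=2.0)
kf = dict(variance=0.5, lengthscale=1.0)
cov = hgp_group_cov(patient_times, kg, kf)
```

The resulting matrix has dense diagonal blocks (one per patient) on top of a shared group-level background, which is the block structure the hierarchy induces.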

3.1. HGP kernel functions and tapering

Our model uses an additive hierarchical kernel, similar to that introduced in prior work,¹⁵ with tapering that further enforces sparsity. For flexibility in the smoothness of the inferred functions, we choose the Matérn kernel with parameter $ν$ that controls the smoothness of the GP:²⁰

$$κ(x, x') = \frac{σ^{2}}{Γ(ν) 2^{ν - 1}} \left(\frac{\sqrt{2ν}}{γ} d(x, x')\right)^{ν} K_{ν}\left(\frac{\sqrt{2ν}}{γ} d(x, x')\right),$$

where $Γ(\cdot)$ here denotes the gamma function (not the covariance matrix of Section 2.1), $d(x, x')$ is the distance between $x$ and $x'$, $σ^{2}$ is the variance, $γ$ is the length scale, and $K_{ν}$ is the modified Bessel function of the second kind with order $ν$. In our model, we set kernel parameter $ν = \frac{5}{2}$ at the group level and $ν = \frac{3}{2}$ at the individual level; in practice, we estimate the remaining kernel parameters by maximizing the likelihood.
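For the two smoothness values used here, the Matérn kernel reduces to well-known closed forms that avoid evaluating the Bessel function. The sketch below implements both closed forms and the general expression so they can be compared numerically; the function names and the grid of distances are illustrative assumptions.

```python
import numpy as np
from scipy.special import gamma, kv

def matern_general(d, nu, variance=1.0, lengthscale=1.0):
    """General Matern kernel as a function of distance d >= 0."""
    d = np.asarray(d, dtype=float)
    z = np.sqrt(2.0 * nu) * d / lengthscale
    out = np.full_like(z, variance)      # the limit as d -> 0 is the variance
    pos = z > 0
    out[pos] = (variance / (gamma(nu) * 2.0 ** (nu - 1.0))
                * z[pos] ** nu * kv(nu, z[pos]))
    return out

def matern32(d, variance=1.0, lengthscale=1.0):
    """Closed form for nu = 3/2 (individual level in the text)."""
    z = np.sqrt(3.0) * np.asarray(d, dtype=float) / lengthscale
    return variance * (1.0 + z) * np.exp(-z)

def matern52(d, variance=1.0, lengthscale=1.0):
    """Closed form for nu = 5/2 (group level in the text)."""
    z = np.sqrt(5.0) * np.asarray(d, dtype=float) / lengthscale
    return variance * (1.0 + z + z ** 2 / 3.0) * np.exp(-z)

d = np.linspace(0.0, 3.0, 7)
agree_32 = np.allclose(matern_general(d, 1.5), matern32(d))
agree_52 = np.allclose(matern_general(d, 2.5), matern52(d))
```

The $ν = \frac{5}{2}$ kernel yields smoother sample paths than $ν = \frac{3}{2}$, which matches the intent of a smooth group trend with rougher patient-specific deviations.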

With this kernel function, we model the data distribution as multivariate normal:

$$Y_{k} ∣ X_{k}, θ \sim N(0, Σ_{k}).$$

The parameters $θ$ are $\{α^{T}, β^{T}, γ^{T}\}$. The covariance matrix $Σ_{k}$ is written as

$$Σ_{k}(i, i') = \begin{cases} Γ_{g}(x_{k,i}, x_{k,i'}) + Γ_{f}(x_{k,i}, x_{k,i'}) + β \cdot I, & \text{if } i = i', \\ Γ_{g}(x_{k,i}, x_{k,i'}), & \text{otherwise.} \end{cases}$$

Both $Γ_{g}$ and $Γ_{f}$ are matrices formed by evaluating $κ_{g}$ and $κ_{f}$ , respectively, on $x_{k, i}$ and $x_{k, i^{'}}$ . These covariance matrices inherit a natural block structure from the kernels (Fig. 2). To scale up the HGP with computational complexity $O (n^{3})$ , we further perform tapering to enforce relationships only between close time points. Tapering encodes sparsity in the covariance matrix on the off-diagonal elements that are more distant from each other in time, which improves inference tractability.¹³
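Because the data distribution is multivariate normal, kernel parameters can be fit by minimizing the negative log likelihood with a general-purpose optimizer. The sketch below does this for a single GP with an exponential kernel on synthetic data; it is a simplified stand-in rather than the hierarchical model itself, and the parameter names, bounds, and data are assumptions. A hierarchical version would replace `cov` with the block-structured $Σ_{k}$.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_params, x, y):
    """Negative log N(y | 0, K + noise * I) for an exponential kernel."""
    variance, lengthscale, noise_var = np.exp(log_params)   # log scale keeps values positive
    d = np.abs(x[:, None] - x[None, :])
    cov = variance * np.exp(-d / lengthscale) + noise_var * np.eye(len(x))
    L = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # 0.5 * log|cov| equals the sum of the log-diagonal of the Cholesky factor.
    return (0.5 * y @ alpha + np.sum(np.log(np.diag(L)))
            + 0.5 * len(x) * np.log(2.0 * np.pi))

# Synthetic trajectory, for illustration only.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, size=40))
y = np.sin(x) + 0.1 * rng.standard_normal(40)

init = np.log([1.0, 1.0, 0.1])
result = minimize(neg_log_marginal_likelihood, init, args=(x, y),
                  method="L-BFGS-B", bounds=[(-5.0, 5.0)] * 3)
variance_hat, lengthscale_hat, noise_hat = np.exp(result.x)
```

Bounding the log parameters keeps the covariance well conditioned during the search, so the Cholesky factorization does not fail at extreme values.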

Fig. 2: Model setup (left) and block structure of HGP covariance matrix (right).

3.2. Mixture of experts

Although the HGP allows us to model group structure and individual patient trajectories that differ from the group, its exponential cost with respect to the number of groups renders it impractical for large patient cohorts with many groups. Because each patient belongs to multiple groups simultaneously – sex, ethnicity, and disease subtype for instance – we want a tractable way to combine information from all of the patient’s group attributes, i.e., an additive kernel. Thus, we extend the HGP with mixture of experts (MOE) kernels at the group level (Fig. 3). Originally developed to handle multiple modalities in large datasets,²¹ MOE GPs can be adapted to a hierarchical setting such that the group-level kernel is the sum of attribute kernels of patients belonging to that group. An ensemble of local experts allows the kernel function to adapt to each observation,²² which in our case corresponds to a patient. Again, we use a tapered Matérn 5/2 kernel at the group level and a tapered Matérn 3/2 kernel at the patient level. We perform efficient closed-form inference using the SciPy optimizer.
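One way to read the additive group-level kernel is sketched below: each attribute value has its own Matérn 5/2 kernel, two patients share the kernels of the attributes they have in common, and observations from the same patient additionally receive a Matérn 3/2 term. This reading, the parameter values, and all names are assumptions for illustration; tapering is omitted to keep the example short.

```python
import numpy as np

def matern52(a, b, variance, lengthscale):
    z = np.sqrt(5.0) * np.abs(a[:, None] - b[None, :]) / lengthscale
    return variance * (1.0 + z + z ** 2 / 3.0) * np.exp(-z)

def matern32(a, b, variance, lengthscale):
    z = np.sqrt(3.0) * np.abs(a[:, None] - b[None, :]) / lengthscale
    return variance * (1.0 + z) * np.exp(-z)

# Hypothetical kernel parameters, one "expert" per attribute value.
attribute_params = {
    "female": dict(variance=0.5, lengthscale=5.0),
    "male": dict(variance=0.4, lengthscale=5.0),
    "Black": dict(variance=0.3, lengthscale=3.0),
    "white": dict(variance=0.2, lengthscale=3.0),
}
patient_params = dict(variance=0.1, lengthscale=1.0)

def moe_cov(x_a, attrs_a, x_b, attrs_b, same_patient):
    """Covariance between observations of two patients under the additive kernel."""
    cov = np.zeros((len(x_a), len(x_b)))
    for name in set(attrs_a) & set(attrs_b):       # experts shared by both patients
        cov += matern52(x_a, x_b, **attribute_params[name])
    if same_patient:                               # patient-level deviation
        cov += matern32(x_a, x_b, **patient_params)
    return cov

t = np.array([0.0, 2.0])
within = moe_cov(t, ["female", "Black"], t, ["female", "Black"], same_patient=True)
across = moe_cov(t, ["female", "Black"], t, ["female", "white"], same_patient=False)
```

Under this reading, the cost of adding an attribute is one more kernel in the sum rather than a new set of group combinations.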

Fig. 3: Model setup (left) and MOE HGP covariance matrix (right).

4. Experiments

We first benchmark our MOE HGP model, using HUP patient trajectories, against standard GPR and an HGP. We then present examples of fitted and predicted trajectories of cluster representatives, or patients whose trajectory minimizes the Wasserstein distance to all other patients in their subgroup. Intuitively, the cluster representative corresponds to the patient who best captures the canonical trajectory of that group.
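The selection of a cluster representative can be sketched as a medoid search under the Wasserstein distance. The text does not specify how the distance is computed between trajectories, so this example assumes the one-dimensional Wasserstein distance between the empirical distributions of each patient's observed values (ignoring time), as provided by `scipy.stats.wasserstein_distance`; the toy values are synthetic.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def cluster_representative(trajectories):
    """Index of the trajectory with the smallest total distance to all others."""
    n = len(trajectories)
    total = np.zeros(n)
    for i in range(n):
        for j in range(i + 1, n):
            d = wasserstein_distance(trajectories[i], trajectories[j])
            total[i] += d
            total[j] += d
    return int(np.argmin(total)), total

# Three hypothetical patients in one subgroup (values are made up).
trajectories = [
    np.array([3.0, 3.2]),
    np.array([3.4, 3.6]),
    np.array([4.0, 4.2]),
]
rep_index, total_distance = cluster_representative(trajectories)
```

Here the middle trajectory is selected because it is closest, in aggregate, to the other two.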

We evaluate the performance of our MOE HGP on COVID-19 patient trajectories from the Hospitals at the University of Pennsylvania (HUP). For the purposes of model fitting, we only consider patient trajectories with over 25 observations corresponding to unique time points. We group patients based on attributes of sex (male and female) and ethnicity (Black and white). We create balanced patient cohorts with 30 patients per combination of groups (i.e., 30 Black women, 30 Black men, 30 white women, 30 white men).

For each patient and each covariate, we select 25% of the measurements randomly as the test set and use the remaining measurements as the training set. It is also possible to include future time points in the test set, albeit at the expense of GP model performance as test points extend further into the future, meaning there is greater uncertainty in the predictions.²⁰ To evaluate performance, we use mean squared error (MSE) and $R^{2}$ metrics to compare the train and test sets to predicted values.
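The per-patient evaluation protocol can be sketched as follows: hold out a random 25% of a patient's measurements, then score predictions with MSE and $R^{2}$. The helper names and the fixed seed are assumptions for illustration.

```python
import numpy as np

def random_split(n_obs, test_fraction=0.25, seed=0):
    """Return sorted (train_idx, test_idx) for one patient's measurements."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_obs)
    n_test = int(round(test_fraction * n_obs))
    return np.sort(perm[n_test:]), np.sort(perm[:n_test])

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

train_idx, test_idx = random_split(28)       # a patient with 28 observations
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 4.0])
example_mse = mse(y_true, y_pred)
example_r2 = r_squared(y_true, y_pred)
```

Note that $R^{2}$ can be negative on held-out points when the predictions are worse than predicting the mean of those points.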

We also evaluate the 95% confidence intervals (CIs) to measure model calibration for GPR, HGP, and MOE HGP. In our discussion, the values reported for 95% CI calibration refer to the percentage of points that fall outside the 95% confidence interval. We focus on albumin as our covariate of interest, as it has been shown to be a clinical marker of COVID-19 progression.²³ The results for albumin are representative of trends across covariates in the dataset (see Supplementary material for details).
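The calibration value defined above (the percentage of points falling outside the 95% interval) can be computed directly from the predictive mean and variance. This sketch assumes a Gaussian predictive distribution, so the interval is the mean plus or minus 1.96 standard deviations; the function name and toy data are illustrative.

```python
import numpy as np

def pct_outside_95(y_true, pred_mean, pred_var):
    """Percentage of observations outside the 95% predictive interval."""
    half_width = 1.96 * np.sqrt(pred_var)
    outside = np.abs(y_true - pred_mean) > half_width
    return 100.0 * float(np.mean(outside))

# Eight hypothetical test points, two of which lie far from the predictive mean.
y_true = np.array([0.0, 0.1, -0.2, 3.0, 0.3, -3.0, 0.0, 0.5])
pred_mean = np.zeros(8)
pred_var = np.ones(8)
calibration = pct_outside_95(y_true, pred_mean, pred_var)
```

For a well-calibrated model this value should be close to 5% when averaged over many points; with only a handful of test points per patient it moves in coarse steps (12.5% per point when there are eight).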

The shapes of the patient trajectories for albumin vary greatly (Fig. 4). GPR cannot, for example, capture the trajectory of patient 11, but the HGP and MOE HGP are able to do so. For patient 38, the more granular trends for the first few time points are captured by the MOE HGP, but not the HGP. The average train MSE across patients for the covariate albumin is the lowest for the MOE HGP. The average test MSE across patients is comparable across the three models. However, the train and test $R^{2}$ values, and the 95% CI calibration, are much better across patients for the HGP and MOE HGP as compared to GPR (Table 1).

Table 1: Model metrics for covariate albumin. Percentage columns are computed across patients, relative to GPR.

Model	Train MSE	Test MSE	Train $R^{2}$ better than GPR (% of patients)	Test $R^{2}$ better than GPR (% of patients)	95% CI calibration better than GPR (% of patients)	95% CI calibration same as GPR (% of patients)
GPR	0.04	0.21	n/a	n/a	n/a	n/a
HGP	0.03	0.23	73.17	58.54	21.95	60.98
MOE HGP	0.02	0.21	60.98	53.66	21.95	68.29

We find substantial overlap in the patient trajectories that benefit from the MOE HGP and HGP over GPR. Patient 7’s trajectory is a canonical case in which the $R^{2}$ value is greatly improved with the HGP and MOE HGP (Fig. 5, Top). The mean function for GPR appears to be a running average in the first half of the observed time points. The HGP and MOE HGP both provide better fits where GPR cannot. Similar to patient 7, patient 2’s trajectory has higher variance with GPR (Fig. 5, Bottom). This large variance has negative consequences on the 95% CI calibration. This patient has eight test points; GPR gives a 95% CI calibration value of 0% because no test point falls outside its wide interval, but the HGP and MOE HGP give values of 25% since they each have two “outlier” test points. Taken together, these empirical results suggest that the two hierarchical models are more effective on these complex patient trajectories.

Fig. 5: Exemplars of patient trajectories benefiting from HGP and MOE HGP.

The importance of group structure becomes more evident when we examine the kernel parameters at the patient level. The MOE HGP has lower spatial variance across all patients, as reflected in the distribution of the patient-level kernel variance parameters. GPR, lacking a group structure, defaults to learning a higher variance parameter. The structure of the MOE HGP is also useful for comparison across groups. When partitioning the patient cohort by ethnicity and sex, we see that Black patients have higher variance parameters than white patients do (Fig. 6). We do not observe meaningful differences in these parameters between male and female patients.

Fig. 6: Patient-level kernel parameters for GPR versus the MOE HGP for albumin.

Next, we fit the three models to the following clinical markers of COVID-19 disease progression for a randomly selected patient: anion gap, creatinine, partial pressure of oxygen (PO₂ Arterial), blood carbon dioxide levels (CO₂), fraction of inspired oxygen (FIO₂) and blood oxygen saturation (Arterial O₂ Content) (Fig. 7). Our experiments suggest that the MOE HGP effectively fits these markers for any randomly selected patient in the cohort.

Fig. 7: Covariate trajectories for a randomly selected patient in the cohort to demonstrate the robustness of the MOE HGP.

Across patients and groups, we see that the HGP and MOE HGP consistently outperform GPR in fitting patient trajectories for albumin, blood CO₂, fraction of inspired oxygen (FIO₂), and lactic acid (Supplementary material Fig. 1–3, 5). These covariates – albumin as an indicator of kidney function and the remaining covariates as indicators of cardiovascular function – can inform immediate treatment decisions. Furthermore, the MOE HGP demonstrates superior uncertainty quantification over the HGP by giving the best 95% CI calibration at no observed cost to the test MSE, as reported for albumin, blood CO₂, and FIO₂ (Table 1, Supplementary material Tables 2–4). The MOE HGP’s strong performance, particularly in capturing complex trajectories with low spatial variance, can be attributed to its incorporation of group structures.

5. Conclusion

We propose a hierarchical mixture of experts Gaussian process (MOE HGP) model to fit and predict COVID-19 patient trajectories for clinically relevant covariates. We show that our MOE HGP model is effective in analyzing covariates and provide an in-depth analysis for albumin. We demonstrate the robustness of our model for an individual patient on indicators of blood oxygen levels like arterial PO₂, CO₂ and FIO₂. These covariates are noisy yet useful for monitoring patient state in ICUs. Overall, the MOE HGP allows us to model groups separately while sharing signal across groups to enable more precise modeling of the natural group structure in patient populations without losing statistical power.

A natural extension of this work is to generalize the model to perform multi-output predictions. Because clinical covariates are often correlated, a multi-output GP that captures correlations between disparate covariates, in addition to correlations between observations within a single covariate, would be useful for more accurate modeling of clinical markers across time. With a multi-output model, we may include larger patient cohorts that are more diverse with respect to group attributes that could serve as proxies of socioeconomic status such as zip code, marriage status, and insurance status. We anticipate that we would be able to leverage such group structure to explore differences in disease trajectory or biases in treatment. Other group attributes like age, body mass index (BMI), and estimated glomerular filtration rate (eGFR) inform our understanding of how comorbidities such as obesity and renal disease impact disease progression within a certain socioeconomic or ethnic subpopulation.

Another direction of future work may be to apply contrastive learning, or methods that capture differences between the groups using parameters present in one but not the other.²⁴ Contrastive modeling has been applied to linear dimension reduction²⁵ and formalized to a probabilistic model-based alternative.²⁶,²⁷ With an extension of probabilistic contrastive modeling to Gaussian processes, we could improve the group-based prior for our model with information regarding differences between patients from traditionally marginalized populations, the “foreground” group, and their majority counterparts, the “background” group.

Fig. 1: Patient cohort breakdown. Cohort size (top left); patient mortality by ethnicity (top right); patient mortality by age and sex (males bottom left and females bottom right).

6. Acknowledgments and Appendices

We would like to thank the University of Pennsylvania Medical Center for providing the data and consultation regarding clinical domain knowledge. This work was funded in part by a COVID-19 grant from the Fast Grants program, a grant from the Helmsley Trust, a grant from the NIH Human Tumor Atlas Research Program, NIH NHLBI R01 HL133218, and NSF CAREER AWD1005627.

Footnotes

The code and supplementary material are available at: https://github.com/bee-hive/HGP-MOE

References

1. Karaca-Mandic P, Georgiou A and Sen S, Assessment of COVID-19 hospitalizations by race/ethnicity in 12 states, JAMA Internal Medicine 181, 131 (2021).
2. Rodriguez F, Solomon N, de Lemos JA, Das SR, Morrow DA, Bradley SM, Elkind MS, Williams JH, Holmes D, Matsouaka RA et al., Racial and ethnic differences in presentation and outcomes for patients hospitalized with COVID-19: findings from the American Heart Association’s COVID-19 Cardiovascular Disease Registry, Circulation 143, 2332 (2021).
3. Rasmussen CE, Gaussian processes in machine learning, in Summer school on machine learning, 2003.
4. Banerjee S, Carlin BP and Gelfand AE, Hierarchical modeling and analysis for spatial data (CRC Press, 2014).
5. Shi JQ and Choi T, Gaussian process regression analysis for functional data (CRC Press, 2011).
6. Ghosal S and Van der Vaart A, Fundamentals of nonparametric Bayesian inference (Cambridge University Press, 2017).
7. Futoma J, Hariharan S, Heller K, Sendak M, Brajer N, Clement M, Bedoya A and O’Brien C, An improved multi-output Gaussian process RNN with real-time validation for early sepsis detection, in Machine Learning for Healthcare Conference, 2017.
8. Futoma J, Hariharan S and Heller K, Learning to detect sepsis with a multitask Gaussian process RNN classifier, in International Conference on Machine Learning, 2017.
9. Cheng L-F, Dumitrascu B, Darnell G, Chivers C, Draugelis M, Li K and Engelhardt BE, Sparse multi-output Gaussian processes for online medical time series prediction, BMC Medical Informatics and Decision Making 20, p. 152 (2020).
10. Schulam P and Saria S, A framework for individualizing predictions of disease trajectories by exploiting multi-resolution structure, arXiv preprint arXiv:1601.04674 (2016).
11. Tan TC, Fang H, Magder LS and Petri MA, Differences between male and female systemic lupus erythematosus in a multiethnic population, The Journal of Rheumatology (2012).
12. Gutiérrez F, Masiá M, Mirete C, Soldán B, Rodríguez JC, Padilla S, Hernández I, Royo G and Martin-Hidalgo A, The influence of age and gender on the population-based incidence of community-acquired pneumonia caused by different microbial pathogens, Journal of Infection 53, 166 (2006).
13. Kaufman CG, Schervish MJ and Nychka DW, Covariance tapering for likelihood-based estimation in large spatial data sets, Journal of the American Statistical Association 103, 1545 (2008).
14. Yu W, Liu Y, Ma Z and Bi J, Improving satellite-based PM2.5 estimates in China using Gaussian processes modeling in a Bayesian hierarchical setting, Scientific Reports 7, p. 7048 (2017).
15. Hensman J, Lawrence ND and Rattray M, Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters, BMC Bioinformatics 14, p. 252 (2013).
16. Qian PZG and Wu CFJ, Bayesian hierarchical modeling for integrating low-accuracy and high-accuracy experiments, Technometrics 50, 192 (2008).
17. Ng JW and Deisenroth MP, Hierarchical mixture-of-experts model for large-scale Gaussian process regression, arXiv preprint arXiv:1412.3078 (2014).
18. Damianou A and Lawrence ND, Deep Gaussian processes, in Artificial Intelligence and Statistics, 2013.
19. Lee B-J, Lee J and Kim K-E, Hierarchically-partitioned Gaussian process approximation, in Artificial Intelligence and Statistics, 2017.
20. Stein ML, Interpolation of spatial data: some theory for kriging (Springer Science & Business Media, 2012).
21. Rasmussen CE and Ghahramani Z, Infinite mixtures of Gaussian process experts, Advances in Neural Information Processing Systems 2, 881 (2002).
22. Zhao X, Fu Y and Liu Y, Human motion tracking by temporal-spatial local Gaussian process experts, IEEE Transactions on Image Processing 20, 1141 (2010).
23. Huang J, Cheng A, Kumar R, Fang Y, Chen G, Zhu Y and Lin S, Hypoalbuminemia predicts the outcome of COVID-19 independent of age and co-morbidity, Journal of Medical Virology (2020).
24. Zou JY, Hsu DJ, Parkes DC and Adams RP, Contrastive learning using spectral methods, Advances in Neural Information Processing Systems 26, 2238 (2013).
25. Abid A, Zhang MJ, Bagaria VK and Zou J, Exploring patterns enriched in a dataset with contrastive principal component analysis, Nature Communications 9, p. 2134 (May 2018).
26. Li D, Jones A and Engelhardt B, Probabilistic contrastive principal component analysis, arXiv preprint arXiv:2012.07977 (2020).
27. Jones A, Townes FW, Li D and Engelhardt BE, Contrastive latent variable modeling with application to case-control sequencing experiments, arXiv preprint arXiv:2102.06731 (2021).

