Author manuscript; available in PMC: 2022 Nov 29.
Published in final edited form as: J Biomed Inform. 2022 Aug 23;134:104176. doi: 10.1016/j.jbi.2022.104176

SurvMaximin: Robust federated approach to transporting survival risk prediction models

Xuan Wang a, Harrison G Zhang b, Xin Xiong a, Chuan Hong b, Griffin M Weber b, Gabriel A Brat b, Clara-Lea Bonzel b, Yuan Luo c, Rui Duan d, Nathan P Palmer b, Meghan R Hutch c, Alba Gutiérrez-Sacristán b, Riccardo Bellazzi e, Luca Chiovato f, Kelly Cho g,h, Arianna Dagliati e, Hossein Estiri i, Noelia García-Barrio j, Romain Griffier k,l, David A Hanauer m, Yuk-Lam Ho h, John H Holmes n, Mark S Keller b, Jeffrey G Klann MEng i, Sehi L’Yi b, Sara Lozano-Zahonero o, Sarah E Maidlow p, Adeline Makoudjou o, Alberto Malovini q, Bertrand Moal k, Jason H Moore r, Michele Morris s, Danielle L Mowery n, Shawn N Murphy t, Antoine Neuraz u, Kee Yuan Ngiam v, Gilbert S Omenn w, Lav P Patel x, Miguel Pedrera-Jiménez j, Andrea Prunotto o, Malarkodi Jebathilagam Samayamuthu s, Fernando J Sanz Vidorreta y, Emily R Schriver z, Petra Schubert h, Pablo Serrano-Balazote j, Andrew M South aa, Amelia LM Tan b, Byorn WL Tan ab, Valentina Tibollo o, Patric Tippmann o, Shyam Visweswaran s, Zongqi Xia ac, William Yuan b, Daniela Zöller o, Isaac S Kohane b, Paul Avillach b,1, Zijian Guo ad,1, Tianxi Cai b,1; The Consortium for Clinical Characterization of COVID-19 by EHR 4CEb
PMCID: PMC9707637  NIHMSID: NIHMS1850166  PMID: 36007785

Abstract

Objective:

For multi-center heterogeneous Real-World Data (RWD) with time-to-event outcomes and high-dimensional features, we propose the SurvMaximin algorithm to estimate Cox model feature coefficients for a target population by borrowing summary information from a set of health care centers without sharing patient-level information.

Materials and Methods:

For each of the centers from which we want to borrow information to improve the prediction performance for the target population, a penalized Cox model is fitted to estimate feature coefficients for the center. Using estimated feature coefficients and the covariance matrix of the target population, we then obtain a SurvMaximin estimated set of feature coefficients for the target population. The target population can be an entire cohort comprised of all centers, corresponding to federated learning, or a single center, corresponding to transfer learning.

Results:

Simulation studies and a real-world international electronic health records application study, with 15 participating health care centers across three countries (France, Germany, and the U.S.), show that the proposed SurvMaximin algorithm achieves accuracy comparable to or higher than that of estimators using only information from the target site and of other existing methods. The SurvMaximin estimator is robust to variations in sample sizes and estimated feature coefficients across centers, which yields substantially improved estimates for target sites with fewer observations.

Conclusions:

The SurvMaximin method is well suited for both federated and transfer learning in the high-dimensional survival analysis setting. SurvMaximin requires only a one-time exchange of summary information from participating centers and accommodates settings where the estimated regression vectors are highly heterogeneous across centers. SurvMaximin provides robust Cox feature coefficient estimates without requiring outcome information in the target population and is privacy-preserving.

1. Introduction

Electronic health records (EHR) have been widely adopted in the U.S. and other countries [1-5]. The EHR contains a wealth of patient medical information collected over time by health care providers; common structured data types include demographics, diagnoses, laboratory test results, medications, and vital signs. Given their longitudinal nature, EHR data have been utilized for various research purposes, including survival analysis [6-8]. For example, the Cox proportional hazards model is commonly used and has been applied to EHR-based risk prediction [9].

With the increasing availability of EHR data, there is a great interest in integrating knowledge from a diverse range of health care centers to improve generalizability and accelerate discoveries. There now exist multiple collaborative consortia each composed of diverse health care centers seeking to leverage their EHR data in unison. For example, the Consortium for Clinical Characterization of COVID-19 by EHR (4CE consortium) is an international research collaborative that collects patient-level EHR data to study the epidemiology and clinical course of COVID-19 [10]. The consortium comprises more than 300 hospitals across seven countries with 83,178 patients, representing a broad range of multi-national health care centers serving diverse patient populations.

However, EHR data obtained from multiple diverse health care centers often exhibit a high degree of heterogeneity due to variability in EHR and data warehouse platforms, patient populations, health care practices, coding, and documentation. Further, patient-level data often cannot be shared directly between health care centers in a timely manner due to patient and institutional privacy laws [11]. Thus, there is a need for robust analytic strategies to overcome the barriers to conducting multi-center EHR studies.

Our objective is to jointly leverage multi-center, high-dimensional EHR data to make more precise inferences for a target population in the survival analysis setting by sharing only summary statistics obtained from each center, such as Cox feature coefficients and covariance matrices. The target population may be the entire population inclusive of all centers, a subset of centers, or a new, separate population. Integrative analysis approaches that only require individual sites to share summary statistics are often referred to as federated learning [12-14].

Most existing federated learning methods focus on settings with a small number of predictors and/or homogeneous settings where the underlying predictive models are shared across sites [12-14]. In addition, existing methods generally require several rounds of communication between sites, which can be inefficient and labor-intensive. To ensure transportability of models across sites, transfer learning methods have been proposed to transfer knowledge from separate but related centers to provide robust and precise estimates for patients in a new center. This approach has widespread applications in medical studies such as drug sensitivity prediction, integrative analysis of "multi-omics" data, and natural language processing [15-18]. However, most transfer learning methods require outcome labels from the target population, which may be difficult and expensive to obtain, and do not consider the federated learning scenario where individual-level data cannot be shared across sites. In the absence of outcome labels in the target population, transfer learning methods require the stringent assumption that the target and source populations share the same underlying risk model, leading to potential transfer failure when the risk model for the target population is similar to only a subset of the source populations [19-21].

With heterogeneous training datasets from multiple centers, one potential limitation of existing federated transfer learning methods is that the performance of the prediction model can vary substantially across centers. Thus, although the overall performance may be satisfactory, the performance of the model in a particular center might be low. Moreover, when trained models are applied to a new population, transferability and portability are not guaranteed. To improve the robustness of prediction models, the maximin effect approach was first proposed in [22-24] and has been used as a metric to build a robust prediction model for continuous outcomes across heterogeneous training datasets. Instead of optimizing the average performance across all training datasets, the maximin effect method aims to train a model that maximizes the minimum gain over the null model among all training datasets. The maximin approach was further extended to a setting that allows for covariate shift between the source and target populations [25]. The group distributionally robust optimization of [26,27] is closely related to the maximin effect: it builds a robust prediction model by minimizing the worst-case training loss over a class of distributions. The maximin projection was developed in [28] to construct optimal treatment regimens for new patients by leveraging training data from different groups with heterogeneity in the optimal treatment decision.

In this paper, motivated by the maximin algorithm for continuous outcomes in [22,25], we propose a maximin transfer learning algorithm for predicting a survival outcome (SurvMaximin) in a target population with high-dimensional features by robustly combining multiple prediction models trained in different source populations. This algorithm only requires sharing of summary statistics across centers and can easily accommodate high-dimensional features. SurvMaximin can be viewed as a robust federated approach to transfer models trained at multiple external centers to a target population, so we refer to it as a federated transfer learning method. SurvMaximin differs from existing transfer learning methods in that it does not require the target population to share the same underlying model with the source population, a highly desirable property when learning with multiple heterogeneous health care systems. The training of the SurvMaximin algorithm also does not require the target population to have gold-standard outcome labels.

2. Methods

2.1. SurvMaximin: Federated robust transfer learning for survival outcomes

The main aim of the SurvMaximin algorithm is to derive a robust risk prediction model for an unlabeled target population based on labeled data from L source populations under data sharing constraints. Suppose there are L source populations, indexed by $l \in \{1, \dots, L\}$, representing L studies, and one target population, denoted by Q. The observed data from the $l$th source population consist of $n_l$ independent and identically distributed random vectors $\mathcal{D}_l = \{(Z_{li}, X_{li}, \delta_{li})\}_{1 \le i \le n_l}$, where $Z_{li}$ denotes the p-dimensional standardized baseline risk factors, with p potentially large relative to the sample sizes; the censored survival times are observed as $X_{li} = \min(T_{li}, C_{li})$ and $\delta_{li} = I(T_{li} \le C_{li})$, with $T_{li}$ and $C_{li}$ denoting the survival time and follow-up time for the $i$th subject in the $l$th population, respectively; and $I(\cdot)$ is the indicator function, which equals 1 if its argument is true and 0 otherwise. In the target population Q, only the baseline features $\{Z_{Qi}\}_{1 \le i \le n_Q}$ are observed. We assume that the survival time in the $l$th source population, $T_l$, given the baseline features $Z_l$, follows a Cox proportional hazards model, $\Lambda_l(t \mid Z_{li}) = \Lambda_{0,l}(t) \exp(b_l^\top Z_{li})$, which can be equivalently expressed [29] as

$$\tilde{T}_{li} \equiv \log \Lambda_{0,l}(T_{li}) = -b_l^\top Z_{li} + \epsilon_{li}, \quad \text{with } \epsilon_{li} \perp Z_{li} \text{ and } P(\epsilon_{li} > x) = \exp\{-\exp(x)\}, \qquad (1)$$

where $\Lambda_l(t \mid Z_{li})$ is the conditional cumulative hazard function given $Z$ for the $l$th population, $\Lambda_{0,l}(t)$ is the cumulative baseline hazard function, $b_l \in \mathbb{R}^p$ denotes the vector of unknown log hazard ratio parameters associated with the risk factors $Z$, $\perp$ denotes independence, and $\epsilon_{li}$ is a random variable with the distribution specified in (1). We assume that the distributions of the baseline risk factors, the hazard ratio parameters, and the baseline hazard functions may vary across the source populations due to study heterogeneity. Similarly, we assume that the survival time from the target population, $T_Q$, follows a Cox model with unknown baseline cumulative hazard $\Lambda_{0,Q}(\cdot)$ and feature effect vector $\beta_Q$:

$$\tilde{T}_{Qi} \equiv \log \Lambda_{0,Q}(T_{Qi}) = -\beta_Q^\top Z_{Qi} + \epsilon_{Qi}, \quad \text{with } \epsilon_{Qi} \perp Z_{Qi} \text{ and } P(\epsilon_{Qi} > x) = \exp\{-\exp(x)\}. \qquad (2)$$
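As a brief verification of the transformation-model representation in (1) and (2) (a standard argument; see [29]), note that under the Cox model $\Lambda_{0,l}(T_{li})\exp(b_l^\top Z_{li})$ given $Z_{li}$ follows a unit exponential distribution. Setting $\epsilon_{li} = \log\{\Lambda_{0,l}(T_{li})\exp(b_l^\top Z_{li})\}$ therefore gives

$$P(\epsilon_{li} > x \mid Z_{li}) = P\big(\Lambda_{0,l}(T_{li})\exp(b_l^\top Z_{li}) > e^{x} \mid Z_{li}\big) = \exp\{-\exp(x)\},$$

which does not depend on $Z_{li}$, so $\epsilon_{li} \perp Z_{li}$ and $\log \Lambda_{0,l}(T_{li}) = -b_l^\top Z_{li} + \epsilon_{li}$, as stated in (1).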

The SurvMaximin algorithm aims to identify a robust approximation to $\beta_Q$ based on the estimated hazard ratio parameters trained from $\{\mathcal{D}_l\}_{1 \le l \le L}$ as well as the target feature distribution. Due to the lack of gold-standard labels on Q and the unspecified heterogeneity among $\{b_l\}_{1 \le l \le L}$ and $\beta_Q$, the target $\beta_Q$ cannot be identified from the observed data. Instead of targeting $\beta_Q$ directly, the central idea of the SurvMaximin algorithm is to identify an approximation to $\beta_Q$ that maximizes the minimum reward across all L source populations. Following [25], we define the hypothetical outcomes for the $n_Q$ subjects in Q generated from the $l$th source model as

$$\tilde{T}_{li} \equiv -b_l^\top Z_{Qi} + \epsilon_{li}, \quad \text{with } \epsilon_{li} \perp Z_{Qi} \text{ and } P(\epsilon_{li} > x) = \exp\{-\exp(x)\}, \quad \text{for } 1 \le l \le L, \qquad (3)$$

where $\tilde{T}_{li}$ can be viewed as the hypothetical outcome (transformed survival time) if the individual with features $Z_{Qi}$ were assigned to the $l$th source population. Then we define a robust prediction model as

$$\beta_Q^* = \arg\max_{\beta \in \mathbb{R}^p} R_Q(\beta), \quad \text{with } R_Q(\beta) = \min_{1 \le l \le L} \left\{ E(\tilde{T}_{li})^2 - E(\tilde{T}_{li} + \beta^\top Z_{Qi})^2 \right\}, \qquad (4)$$

where the expectation is taken with respect to $Z_{Qi}$ and $\{\tilde{T}_{li}\}_{1 \le l \le L}$ defined in (3). Such a covariate-shift maximin effect was defined in [25] for the linear model and is extended here to the Cox regression model. Note that $E(\tilde{T}_{li})^2 - E(\tilde{T}_{li} + \beta^\top Z_{Qi})^2$ is a reward function of $\beta$ that represents the variance of $\tilde{T}_{li}$ explained by the linear prediction $-\beta^\top Z_{Qi}$. The targeted maximin effect maximizes the adversarial reward $R_Q(\beta)$ across the L groups. The SurvMaximin estimate $\beta_Q^*$ leads to a robust prediction model since the optimization in (4) guards against the worst-case scenario. The maximin effect can be interpreted from an adversarial perspective [23]: in a two-player game, we select an effect vector $\beta$ and the counter agent then chooses the most challenging scenario for this $\beta$, that is, the source population for which $\beta$ has the worst predictive performance. Our goal is to choose $\beta$ such that the worst-case reward with respect to predicting the transformed survival time returned by the counter agent is maximized.

As shown in the Supplementary Materials, RQ(β) can be equivalently expressed as:

$$R_Q(\beta) = \min_{1 \le l \le L} \left\{ E(b_l^\top Z_{Qi})^2 - E(b_l^\top Z_{Qi} - \beta^\top Z_{Qi})^2 \right\} = \min_{b \in \mathbb{B}} \left\{ 2 b^\top \Sigma_Q \beta - \beta^\top \Sigma_Q \beta \right\}, \qquad (5)$$

where $\Sigma_Q = E[Z_{Qi} Z_{Qi}^\top]$ and $\mathbb{B} = \{b_l, l = 1, \dots, L\}$. Following [25], we may show that the maximin effect $\beta_Q^*$ defined by (4) can be expressed as a weighted average of $\{b_l\}_{1 \le l \le L}$,

$$\beta_Q^* = B \gamma_Q^*, \quad \text{with } \gamma_Q^* = \arg\min_{\gamma: \|\gamma\|_1 = 1, \, \gamma_l \ge 0} \gamma^\top \Gamma_Q \gamma, \qquad (6)$$

where $B = [b_1, \dots, b_L] \in \mathbb{R}^{p \times L}$, $\|\cdot\|_q$ denotes the $L_q$ norm, $\Gamma_Q = B^\top \Sigma_Q B$ is a similarity matrix, and the minimization above is restricted to the simplex in L-dimensional space. The optimal aggregation weight $\gamma_Q^*$ in (6) depends on both $\{b_l\}_{1 \le l \le L}$ and the covariance matrix $\Sigma_Q$ of the target population. The identification equation (6) reveals an important geometric interpretation of the maximin effect: $\beta_Q^*$ is the point that lies in the convex hull of the regression vectors $\{b_l\}_{1 \le l \le L}$ and has the smallest distance to the origin [22]. The maximin estimator tends to shrink toward zero the components of $\{b_l\}_{1 \le l \le L}$ whose estimated coefficients vary with different signs across studies, and it is not overly sensitive to the inclusion of sites with an extreme hazard ratio regression vector [22]. In the transfer learning setting, we incorporate the target distribution of $Z_Q$ into the definition of the distance.
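For completeness, the second equality in (5) follows by expanding the quadratic form and using $E(v^\top Z_{Qi})^2 = v^\top \Sigma_Q v$ for any fixed vector $v$ (the full argument is given in the Supplementary Materials):

$$E(b_l^\top Z_{Qi})^2 - E\{(b_l - \beta)^\top Z_{Qi}\}^2 = b_l^\top \Sigma_Q b_l - (b_l - \beta)^\top \Sigma_Q (b_l - \beta) = 2 b_l^\top \Sigma_Q \beta - \beta^\top \Sigma_Q \beta.$$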

2.2. Implementation of the SurvMaximin algorithm

The SurvMaximin algorithm involves three key steps: (I) locally train the prediction model at each of the L source sites to obtain $\{\hat{b}_l\}_{1 \le l \le L}$; (II) estimate the covariance matrix $\Sigma_Q$ and obtain a similarity matrix among $\{\hat{b}_l^\top Z_Q, l = 1, \dots, L\}$, denoted by $\hat{\Gamma}_Q$; and (III) obtain the final SurvMaximin estimator as an optimal linear combination of $\{\hat{b}_l\}_{1 \le l \le L}$ according to (6). The schema of the SurvMaximin algorithm is shown in Fig. 1.

Fig. 1. Schematic of the SurvMaximin algorithm for federated transfer learning.

Step I: Training L local risk prediction models.

We first obtain $\hat{b}_l$ as the maximizer of the penalized partial likelihood:

$$\hat{b}_l = \arg\max_b \left\{ \ell_l(b) - \lambda_l \mathcal{P}(b) \right\}, \qquad (7)$$

where $\ell_l(b)$ is the log partial likelihood associated with $\mathcal{D}_l$ and $\mathcal{P}(b) = \alpha \|b\|_1 + (1-\alpha)\|b\|_2^2$ is the elastic net penalty function, which is frequently used to overcome high dimensionality and collinearity of features, with $\alpha = 1$ corresponding to the standard LASSO penalty and $\alpha = 0$ corresponding to the ridge penalty [30]. The non-negative penalty parameter $\lambda_l$ can be selected via standard tuning criteria such as AIC, BIC, or cross-validation.
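As an illustration of Step I, the sketch below fits an elastic-net penalized Cox model at a single source site. It uses the Python lifelines package rather than any software specified by the authors; the data frame layout, column names, and default tuning values are assumptions for illustration only.

```python
import pandas as pd
from lifelines import CoxPHFitter

def fit_local_cox(df: pd.DataFrame, feature_cols, penalizer=0.1, l1_ratio=1.0):
    """Step I sketch: fit a penalized Cox model at one source site.

    df must contain standardized features, a follow-up time column 'time',
    and an event indicator column 'event' (1 = event observed, 0 = censored).
    l1_ratio=1.0 corresponds to the LASSO penalty, 0.0 to the ridge penalty.
    """
    cph = CoxPHFitter(penalizer=penalizer, l1_ratio=l1_ratio)
    cph.fit(df[feature_cols + ["time", "event"]],
            duration_col="time", event_col="event")
    # Only these p-dimensional coefficient estimates (and, for Section 2.3,
    # the empirical feature covariance matrix) need to leave the site.
    return cph.params_.values  # estimated log hazard ratios, \hat{b}_l
```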

Step II: Estimate the similarity matrix among $\{\hat{b}_l^\top Z_Q, l = 1, \dots, L\}$.

We estimate the similarity matrix $\Gamma_Q = B^\top \Sigma_Q B$ of $\{\hat{b}_l^\top Z_Q, l = 1, \dots, L\}$ by $\hat{\Gamma}_Q = \hat{B}^\top \hat{\Sigma}_Q \hat{B}$, where $\hat{B} = [\hat{b}_1, \dots, \hat{b}_L] \in \mathbb{R}^{p \times L}$ and $\hat{\Sigma}_Q$ is the empirical variance-covariance matrix of $Z_Q$ estimated from the unlabeled target population data.

Step III: Maximin aggregation via (6).

Finally, we obtain the SurvMaximin aggregated log hazard ratio estimator as

$$\hat{\beta}_Q = \hat{B} \hat{\gamma}_Q, \quad \text{with } \hat{\gamma}_Q = \arg\min_{\gamma: \|\gamma\|_1 = 1, \, \gamma_l \ge 0} \gamma^\top \hat{\Gamma}_Q \gamma + \eta \|\gamma\|_2^2, \qquad (8)$$

where $\eta \ge 0$ is a tuning parameter and the ridge penalty is included to account for potential high collinearity among $\{\hat{b}_l\}_{1 \le l \le L}$. See the Supplementary Materials for a data-adaptive approach to selecting $\eta$. In practice, we find that when some heterogeneity is present, as in our 4CE studies, setting $\eta = 0$ works well, and the results are not sensitive to the choice of $\eta$ when a relatively small value is chosen.
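The following sketch illustrates Steps II and III: it forms $\hat{\Gamma}_Q$ from the shared coefficient matrix and the unlabeled target features, then solves the simplex-constrained quadratic program in (8) with a generic solver. The function and variable names are ours, and the use of SciPy's SLSQP solver is an implementation choice, not the authors' software.

```python
import numpy as np
from scipy.optimize import minimize

def survmaximin(B_hat: np.ndarray, Z_Q: np.ndarray, eta: float = 0.0) -> np.ndarray:
    """Steps II-III sketch: aggregate local estimates {b_l} into beta_hat_Q.

    B_hat : (p, L) matrix whose columns are the local Cox coefficient estimates.
    Z_Q   : (n_Q, p) matrix of centered, standardized target-site features.
    eta   : ridge tuning parameter in (8); eta = 0 is often adequate.
    """
    p, L = B_hat.shape
    Sigma_Q = np.cov(Z_Q, rowvar=False)      # empirical covariance of Z_Q
    Gamma_Q = B_hat.T @ Sigma_Q @ B_hat      # similarity matrix (L x L)

    def objective(gamma):
        return gamma @ Gamma_Q @ gamma + eta * gamma @ gamma

    # Minimize over the L-dimensional probability simplex {gamma >= 0, sum = 1}.
    gamma0 = np.full(L, 1.0 / L)
    res = minimize(objective, gamma0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * L,
                   constraints=[{"type": "eq", "fun": lambda g: g.sum() - 1.0}])
    gamma_hat = res.x
    return B_hat @ gamma_hat                 # SurvMaximin estimator beta_hat_Q
```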

2.3. Transfer to a target site with missing features

A substantial challenge in transfer learning across different health care centers is that certain risk predictors, such as laboratory test results or demographic information, may be available in one center but not in another. For example, in the 4CE consortium, all U.S. centers report data on race while European centers do not, causing race data to be entirely missing for European centers. To transport a risk prediction model to a target center Q with only a subset of features available, one may fit a reduced model limited to the available features at each source center and transport the reduced risk models from the source centers. Essentially, whenever the target center changes, the model at each source center would need to be retrained according to the feature availability of that target center. Such an approach is not computationally efficient, as each center needs to fit multiple models, and it also increases the number of communications required across centers.

To enable transfer learning in the context of differential feature availability, we propose a simple projection approach that only requires each source center to additionally compute the empirical covariance matrix of its features, $\{\hat{\Sigma}_l\}_{l=1,\dots,L}$, where $\hat{\Sigma}_l = n_l^{-1} \sum_{i=1}^{n_l} Z_{li} Z_{li}^\top$. Let $\mathcal{A} \subset \{1, \dots, p\}$ index the features that are available at the target site and $\mathcal{A}^c = \{1, \dots, p\} \setminus \mathcal{A}$. Let $Z[\mathcal{A}]$ denote the subvector of $Z$ corresponding to $\mathcal{A}$. The key step is to project $\hat{b}_l^\top Z_l$ onto the subspace spanned by $Z_l[\mathcal{A}]$, i.e., $\hat{b}_l^\top Z_l = \hat{\theta}_l^\top Z_l[\mathcal{A}] + e_l$, and to predict $T_l$ based on $\hat{\theta}_l^\top Z_l[\mathcal{A}]$. Since the features are all assumed to be centered, we obtain $\hat{\theta}_l = \hat{b}_l[\mathcal{A}] + \hat{\alpha}_l \hat{b}_l[\mathcal{A}^c]$ with $\hat{\alpha}_l = (\hat{\Sigma}_l[\mathcal{A}, \mathcal{A}])^{-1} \hat{\Sigma}_l[\mathcal{A}, \mathcal{A}^c]$, where $\hat{\Sigma}_l[\mathcal{A}, \mathcal{A}]$ and $\hat{\Sigma}_l[\mathcal{A}, \mathcal{A}^c]$ denote the submatrices of $\hat{\Sigma}_l$ corresponding to $\{\mathcal{A}, \mathcal{A}\}$ and $\{\mathcal{A}, \mathcal{A}^c\}$. The final SurvMaximin estimator for the effects of the features $Z_Q[\mathcal{A}]$ is then constructed by replacing $\{\hat{b}_l\}_{1 \le l \le L}$ with $\{\hat{\theta}_l\}_{l=1,\dots,L}$ and $\hat{\Sigma}_Q$ with $\hat{\Sigma}_Q[\mathcal{A}, \mathcal{A}]$. If $Z_l[\mathcal{A}]$ is high dimensional or its components are collinear, in which case $\hat{\Sigma}_l[\mathcal{A}, \mathcal{A}]$ is not invertible, regularization methods [31,32] can be applied to stably invert $\hat{\Sigma}_l[\mathcal{A}, \mathcal{A}]$ and construct $\hat{\alpha}_l$.
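A minimal sketch of this projection step, assuming centered features and an invertible $\hat{\Sigma}_l[\mathcal{A}, \mathcal{A}]$ (the function and variable names are illustrative, not from the authors' software):

```python
import numpy as np

def project_to_available(b_hat: np.ndarray, Sigma_hat: np.ndarray,
                         avail_mask: np.ndarray) -> np.ndarray:
    """Project a full p-dimensional coefficient vector onto the features
    available at the target site (Section 2.3 sketch).

    b_hat      : (p,) local Cox coefficient estimate from a source site.
    Sigma_hat  : (p, p) empirical covariance of that site's centered features.
    avail_mask : boolean array of length p; True for features in the set A.
    """
    A = np.where(avail_mask)[0]
    Ac = np.where(~avail_mask)[0]
    Sigma_AA = Sigma_hat[np.ix_(A, A)]
    Sigma_AAc = Sigma_hat[np.ix_(A, Ac)]
    alpha = np.linalg.solve(Sigma_AA, Sigma_AAc)   # (|A|, |A^c|) projection weights
    theta = b_hat[A] + alpha @ b_hat[Ac]           # projected coefficients theta_hat_l
    return theta
```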

2.4. Validation of SurvMaximin algorithm

We validated the performance of SurvMaximin in federated transfer learning using both simulation studies and a real-world study where we transported COVID-19 mortality risk prediction models to target centers using EHR data from hospitalized patients with COVID-19.

2.4.1. Simulation studies

Simulation studies were conducted to assess the performance of SurvMaximin and to compare it against existing federated learning methods. Since SurvMaximin transports a risk prediction model to a future target center without observed survival outcomes, we used as comparators other federated learning methods that also do not require supervised training on the target data. Specifically, we considered the standard random-effects meta-analysis estimator (herein referred to as Meta); the One-shot Distributed Algorithm for the Cox model (ODAC) [12]; and locally trained risk prediction models with training sizes of $n_Q$ = 200, 400, and 600. We considered simulation scenarios with L = 15 centers, each with sample size $n_l = 300\lceil l/3 \rceil$, and p = 20 or 50 features in the risk prediction model.

We generated $Z_l$ from a multivariate normal distribution MVN(0, Σ), where Σ is either the first-order autoregressive (AR(1)) correlation matrix $\Sigma = [0.5^{|j-j'|}]_{j,j'=1,\dots,p}$ or a compound symmetry covariance matrix with variance 1 and covariance 0.5. We then generated $Z_Q$ from MVN(0, $\Sigma_Q$) with $\Sigma_Q$ = 0.1 + Σ. Subsequently, we generated $T_l$ and $T_Q$ from:

$$2\log T_{li} = \log\{0.125(1 + 0.05\,l)\} - b_l^\top Z_{li} + \epsilon_{li}, \quad l = 1, \dots, L; \qquad 2\log T_{Qi} = \log 0.225 - b_Q^\top Z_{Qi} + \epsilon_{Qi},$$

where $\epsilon_{li}$ and $\epsilon_{Qi}$ were generated from extreme value distributions. We let

$$b_Q = [\beta_{8\times1}, 0_{1\times(p-8)}]^\top, \quad b_l = [\beta_{8\times1} + e_l, 0_{1\times(p-8)}]^\top, \quad e_l = [e_{l1}, \dots, e_{l8}]^\top,$$

and considered a range of scenarios for β and $\{e_l\}_{l=1,\dots,L}$ to explore how the signal strength, the heterogeneity among the source sites, and the degree of similarity between the target site and the source sites affect the performance of SurvMaximin relative to other methods. Specifically, we considered β = [0.5, 0.4, 0.3, 0.2, −0.2, −0.3, −0.4, −0.5]^T and β = [0.25, 0.2, 0.15, 0.1, −0.1, −0.15, −0.2, −0.25]^T to represent moderate and weak signals, respectively. We considered two settings for $\{e_l\}$, each with three levels of heterogeneity among the source sites. In setting (I), we let $e_{lj} = \tau\{I(l \le 5) + 3I(l > 5)\}(-1)^j$, which makes the first 5 sites more similar to the target site than the remaining 10 sites. In setting (II), we let $e_{lj} = \tau(l-1)$, so that the majority of the source sites are substantially different from the target site. We let τ = 0.05, 0.1, and 0.2 to reflect low, medium, and high degrees of heterogeneity among the source sites; as τ increases, the target site also becomes more dissimilar to the source sites. We generated censoring times from an Exponential(1) distribution, leading to event rates of about 20% to 30% across the L source sites under setting (I) and 20% to 40% under setting (II). A sketch of this data-generating mechanism is given below.
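For concreteness, the following sketch generates one source site's data under setting (I) with the AR(1) covariance, following the model described above; the exact form of $n_l$ and all function names are our assumptions for illustration.

```python
import numpy as np

def simulate_site(l, p=20, tau=0.05, beta=None, rng=None):
    """Generate one source site's data under setting (I) with AR(1) covariance."""
    rng = np.random.default_rng() if rng is None else rng
    n_l = 300 * int(np.ceil(l / 3))                    # assumed site sample size
    if beta is None:                                   # moderate-signal beta
        beta = np.array([0.5, 0.4, 0.3, 0.2, -0.2, -0.3, -0.4, -0.5])
    e_l = tau * (1 if l <= 5 else 3) * (-1.0) ** np.arange(1, 9)
    b_l = np.concatenate([beta + e_l, np.zeros(p - 8)])

    Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # AR(1)
    Z = rng.multivariate_normal(np.zeros(p), Sigma, size=n_l)

    # Extreme-value errors: eps = log(E) with E ~ Exponential(1),
    # so that P(eps > x) = exp{-exp(x)}.
    eps = np.log(rng.exponential(1.0, size=n_l))
    T = np.exp(0.5 * (np.log(0.125 * (1 + 0.05 * l)) - Z @ b_l + eps))
    C = rng.exponential(1.0, size=n_l)                 # censoring times
    X, delta = np.minimum(T, C), (T <= C).astype(int)  # observed time, event indicator
    return Z, X, delta
```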

To evaluate the performance of SurvMaximin in the presence of missing features, we considered setting (I) with moderate signal, Σ being AR(1), p = 20, and τ = 0.05, 0.1, 0.2. We let the first feature of the target site be missing and calculated the projected SurvMaximin estimator described in Section 2.3, denoted by SurvMaximin$_{\text{project}}$. For comparison, we also fitted penalized Cox models at each site with covariates $Z[\mathcal{A}]$, $\mathcal{A} = \{2, \dots, p\}$, to obtain the corresponding effect estimates $\hat{b}_l[\mathcal{A}]$, and then constructed SurvMaximin based on $\{\hat{b}_l[\mathcal{A}]\}_{l=1,\dots,L}$. As naive benchmarks, we additionally constructed the ODAC and Meta models based on the full Z and transported these models to the target site by removing the component associated with the first covariate. Such naive approaches are often adopted in practice due to the inability to refit the reduced models at the source sites.

We also generated censoring times from an Exponential(2.5) distribution to consider scenarios with rare events, with event rates ranging from 4% to 10% across the 15 sites under setting (I) and from 3% to 35% under setting (II).

We evaluated the overall performance of the estimated risk score from each method in predicting the survival time $T_Q$ at the target site using the survival C-statistic with a truncation time close to the largest observed survival time in the target site [33]. We estimated the C-statistics based on an independent validation dataset of size $N_Q$ = 2,000 generated from the target distribution. For each configuration, we summarized results over 500 iterations.
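A sketch of this evaluation step, assuming the truncated C-statistic of [33] as implemented in the scikit-survival package; reusing the validation data to estimate the censoring distribution is a simplification, and the variable names are ours.

```python
import numpy as np
from sksurv.util import Surv
from sksurv.metrics import concordance_index_ipcw

def truncated_c_statistic(beta_hat, Z_val, X_val, delta_val, tau):
    """Estimate the truncated survival C-statistic [33] on validation data.

    beta_hat  : (p,) estimated log hazard ratios (e.g., the SurvMaximin estimator).
    Z_val     : (N_Q, p) validation features.
    X_val     : (N_Q,) observed follow-up times; delta_val: event indicators.
    tau       : truncation time near the largest observed survival time.
    """
    y_val = Surv.from_arrays(event=delta_val.astype(bool), time=X_val)
    risk_score = Z_val @ beta_hat            # higher score = higher predicted risk
    cindex, *_ = concordance_index_ipcw(y_val, y_val, risk_score, tau=tau)
    return cindex
```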

2.4.2. Improving cross-system portability of COVID-19 mortality risk prediction models with SurvMaximin

We further validated the performance of SurvMaximin by deriving robust and transportable mortality risk prediction models for patients hospitalized with COVID-19 using international, multi-institutional EHR data from the 4CE consortium [10,34]. Baseline risk factors and mortality information were available for 83,178 patients from L0 = 17 participating health care centers of the consortium across three countries: France, Germany, and the U.S. Eligibility criteria for the study included a positive SARS-CoV-2 reverse transcription polymerase chain reaction (PCR) test result; an admission date between March 1, 2020 and January 31, 2021; and the admission occurring 7 days before to 14 days after the date of their first positive PCR test result recorded in their EHR. Each health care center performed analyses locally and then reported summary results to the central institution. We considered each of the individual health care centers as a potential target population and sought to derive a mortality risk prediction model that is transportable to this population from multiple external models. Given the multinational nature of our data, we anticipated a significant amount of between-health care center heterogeneity in their mortality risk models.

Baseline risk predictors considered included: age groups (18–25, 26–49, 50–69, 70–80, 80+), sex, and race (White, Black, Asian, Hispanic, and other); the pre-admission Charlson comorbidity index (CCI) derived from diagnostic codes; and laboratory test values at admission [35]. We focused on 10 commonly measured laboratory tests (with missing rates < 30 %), including C-reactive protein (CRP), albumin, aspartate aminotransferase (AST), AST to alanine aminotransferase ratio (AST/ALT), total bilirubin, creatinine, d-dimer, white blood cell count (WBC), lymphocyte count, and neutrophil count. Values of AST, d-dimer, and CRP were log-transformed due to their skewed distributions. Missing baseline laboratory values and CCI were imputed via the multivariate imputation by chained equation method and averaged over five imputed sets [36]. In total, we considered p = 19 potential risk predictors. A few predictors, including race data for the European centers, were not available (Supplementary Figure S4). When a variable was not ascertained at a site, the local Cox model fitting excluded it. We derived and evaluated prediction models for all-cause mortality by 3, 7, and 14 days after the admission date. We excluded patients who died on the day of admission in the survival analysis.

For each of the L0 = 17 health care centers, we transported mortality risk prediction models trained from external analyses via SurvMaximin to the patient population of that center. Specifically, for the $l$th health care center, we fit LASSO-penalized Cox models to estimate the coefficients $\hat{b}_l$, $l = 1, \dots, L_0$, of $Z_l$ on the survival outcome. For $l = 1, \dots, L_0$, we let the $l$th site be the target site and trained the SurvMaximin algorithm based on the source data from all remaining sites that have the predictors $Z_{\mathcal{A}_l}$ ascertained, where $Z_{\mathcal{A}_l}$ denotes the covariate vector available at site l. We used the proposed projection method when the target center had an incomplete set of features. After obtaining the SurvMaximin risk model for the target center, we compared it against each of the $L_0$ supervised locally trained models with respect to accuracy in predicting $t_0$ = 3-, 7-, and 14-day mortality in the target population. We quantified the accuracy of predicting $t_0$-day mortality using the area under the receiver operating characteristic curve (AUC). We repeated this analysis for all $L_0$ = 17 centers, each time considering one of them as the target population Q.

3. Results

3.1. Results for simulation studies

Simulation results for the moderate signal scenario are summarized in Fig. 2. In setting (I), where 5 source sites have feature coefficients similar to the target site, SurvMaximin produces models with accuracy comparable to those from ODAC and Meta when the heterogeneity is low (τ = 0.05, 0.1) and outperforms the other methods when the heterogeneity is high (τ = 0.2). Since 5 source sites are relatively similar to the target site, the transported model from SurvMaximin attained accuracy higher than the locally trained model with nQ = 200 and comparable to models trained with nQ = 600. When p or the correlation among the features increases, the estimated models generally attain lower prediction performance. Nevertheless, SurvMaximin continues to attain robust performance relative to the other federated learning methods across different levels of heterogeneity.

Fig. 2. Average C-statistics under settings (I) and (II) with Σ being either AR(1) or compound symmetry; p = 20 or 50; and τ = 0.05, 0.10, or 0.20 (heterogeneity of the local coefficients) for predicting survival in the target population with risk models trained by SurvMaximin, Meta, and ODAC, as well as supervised penalized Cox regression with nQ = 200, 400, or 600 labeled target data (Local200, Local400, Local600).

In setting (II), where only one other site has feature coefficients similar to the target site, SurvMaximin exhibits substantially better predictive performance than Meta and ODAC across all settings, further highlighting the robustness of SurvMaximin to varying degrees of similarity between the target site and the source sites. Across all levels of heterogeneity, the Meta and ODAC estimators suffer from very small C-statistics, indicating poor predictive performance. We observed similar trends regardless of whether p = 20 or 50 features were used and regardless of the covariance matrix structure. Further, the performance of SurvMaximin remains better than that of the supervised model trained with nQ = 200 labeled target site data and comparable to the locally trained models with nQ = 400 and 600. This suggests that SurvMaximin may improve estimation performance when the target population sample size is small.

With weaker signals, the cross-site heterogeneity is more pronounced, leading to more apparent distinctions between SurvMaximin and the other federated methods (Figure S1 of the Supplement). Only when the heterogeneity is very low (τ = 0.05) under setting (I) do all methods perform similarly. Under all other settings, SurvMaximin substantially outperforms ODAC and Meta. With weaker signals, locally trained models also require a larger sample size to attain performance comparable to SurvMaximin. This further illustrates the advantage of transporting existing models in a robust fashion over training a supervised model when the training sample size is not large relative to the feature dimension. Results for low event rates, shown in Figure S2, exhibit similar trends.

Results assessing the performance of the projected SurvMaximin algorithm in the presence of missing features are summarized in Figure S3. The projected SurvMaximin model attains prediction performance comparable to the SurvMaximin model trained by aggregating the locally refit sub-models based on the reduced feature set. Thus, the projection method provides a comparable alternative SurvMaximin estimator when features are missing for some sites, without the need to unify the set of features across all centers for every new target. The projected SurvMaximin estimator also outperforms the naive approach of removing the component associated with the first covariate from the ODAC or Meta estimators.

3.2. Results for transporting COVID mortality risk models

For each covariate, we compared the L0 local estimates of its log hazard ratio to those based on SurvMaximin in Fig. 3. While these two sets of estimators are generally consistent, SurvMaximin estimators tend to be more concentrated at the center, while local estimators exhibit higher variability in part due to unstable estimates from some sites. For example, the log hazard ratio (HR) of the age group (18–25) ranges from −6.58 to 0 for the local estimates while the SurvMaximin estimates range from −1.43 to −0.7.

Fig. 3. Density plots of local versus SurvMaximin effect estimators for healthcare systems l = 1, …, L0 = 15.

The AUC estimates for the risk models obtained via SurvMaximin and via local supervised training for predicting 3-, 7-, and 14-day mortality are shown in Fig. 4. For each site, we also compared the AUC of the models trained at each of the external sites, the locally trained model, and the SurvMaximin model for predicting 14-day mortality (Figure S5). The accuracy of the risk models transported by SurvMaximin, which do not use the outcome information of the target site, is comparable to, and sometimes higher than, that of the locally trained models. The AUCs of SurvMaximin are also more concentrated at comparatively higher values, suggesting the robustness of the SurvMaximin approach.

Fig. 4. Density plots of estimated AUCs for models trained via local estimation versus SurvMaximin for healthcare systems l = 1, …, L0 = 17.

4. Discussion

We proposed the SurvMaximin approach for deriving a robust risk prediction model for a target population by synthesizing estimated risk models from multiple sites. For the target site, the SurvMaximin estimator $\hat{\beta}_{maximin}$ is a linear combination of the coefficient estimators of the local sites $\{\hat{b}_l\}_{1 \le l \le L}$; equivalently, it lies in the convex hull of $\{\hat{b}_l\}_{1 \le l \le L}$ and is the point closest to the origin with respect to a distance determined by the target population. The method enables us to safely transport a set of existing risk models to a target population in the presence of high cross-site heterogeneity.

Compared with existing federated learning methods, such as Meta and the federated learning method proposed in [12], the proposed maximin method can handle high-dimensional covariates and is robust to heterogeneity between sites. It is also robust to sample size differences and improves inference when the sample size of the target population is small, as seen in the simulation studies. The SurvMaximin algorithm is efficient in time and cost, as it requires only a one-time sharing of summary statistics. Compared with existing transfer learning methods, the proposed maximin method helps preserve the privacy and confidentiality of patients across centers. Further, it requires only limited information: the feature information of the target site and the feature effect estimates from the other sites. Thus, SurvMaximin is flexible and general, adapting to a variety of scenarios while achieving high accuracy with limited information.

5. Conclusion

In this paper, we developed the SurvMaximin covariate effect estimator for multi-center survival data with high-dimensional covariates. Simulation studies and real EHR data analyses show that the proposed estimator achieves high accuracy across a range of settings with different levels of between-site heterogeneity and different sample sizes. SurvMaximin is a highly flexible and robust approach for multi-center survival analysis, enabling federated learning, transfer learning, and federated transfer learning.

Supplementary Material

S-Fig3
S-Fig1
S-Fig2
S-Fig4
S-Fig5
S-Fig6
S-Fig7
S-Fig8
S-Fig9
S-Fig10

Acknowledgement

GMW is supported by National Institutes of Health (NIH)/ National Center for Advancing Translational Sciences (NCATS) UL1TR002541, NIH/NCATS UL1TR000005, NIH/National Library of Medicine (NLM) R01LM013345, NIH/ National Human Genome Research Institute (NHGRI) 3U01HG008685-05S2. YL is supported by NIH/NCATS U01TR003528, and NLM 1R01LM013337. KC is supported by VA MVP000 and CIPHER. NGB is supported by PI18/00981, funded by the Carlos III Health Institute. DAH is supported by NCATS UL1TR002240. MSK is supported by NHGRI 5T32HG002295-18. JHM is supported by NLM 010098. MM is supported by NCATS UL1TR001857. DLM is supported by NIH/NCATS CSTA Award #UL1-TR001878. SNM is supported by NCATS 5UL1TR001857-05 and NHGRI 5R01HG009174-04. GSO is supported by NIH U24CA210867 and P30ES017885. LPP is supported by NCATS Clinical and Translational Science Award (CTSA) Award #UL1TR002366. FSJV is supported by NIH/NCATS UL1TR001881. AMS is supported by NIH/ National Heart, Lung, and Blood Institute (NHLBI) K23HL148394 and L40HL148910, and NIH/NCATS UL1TR001420. SV is supported by NCATS UL1TR001857. ZX is supported by National Institute of Neurological Disorders and Stroke (NINDS) R01NS098023. WY is supported by NIH T32HD040128.

IRB Approval was obtained at Assistance Publique - Hôpitaux de Paris, Beth Israel Deaconess Medical Center, Bordeaux University Hospital, Istituti Clinici Scientifici Maugeri Hospitals, University of Kansas Medical Center, Massachusetts General Brigham, Northwestern University, Medical Center University of Freiburg, University of Pittsburgh, and VA North Atlantic, Southwest, Midwest, Continental and Pacific. An exempt determination was made by Institutional Review Boards at Hospital Universitario 12 de Octubre, University of California Los Angeles, University of Michigan, and University of Pennsylvania.

Footnotes

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. Supplementary material

Supplementary data to this article can be found online at https://doi.org/10.1016/j.jbi.2022.104176.

References

[1] Torda P, Han ES, Scholle SH, Easing the adoption and use of electronic health records in small practices, Health Aff. 29 (4) (2010) 668–675.
[2] Decker SL, Jamoom EW, Sisk JE, Physicians in nonprimary care and small practices and those age 55 and older lag in adopting electronic health record systems, Health Aff. 31 (5) (2012) 1108–1114.
[3] Kim Y-G, Jung K, Park Y-T, Shin D, Cho SY, Yoon D, Park RW, Rate of electronic health record adoption in South Korea: a nationwide survey, Int. J. Med. Inf. 101 (2017) 100–107.
[4] Tavares J, Oliveira T, Electronic health record portal adoption: a cross country analysis, BMC Med. Inform. Decis. Mak. 17 (1) (2017) 1–17.
[5] Kose I, Rayner J, Birinci S, Ulgu MM, Yilmaz I, Guner S, Mahir SK, Aycil K, Elmas BO, Volkan E, Altinbas Z, Gencyurek G, Zehir E, Gundogdu B, Ozcan M, Vardar C, Altinli B, Hasancebi JS, Adoption rates of electronic health records in Turkish Hospitals and the relation with hospital sizes, BMC Health Serv. Res. 20 (1) (2020).
[6] Murphy SN, et al., Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2), J. Am. Med. Inform. Assoc. 17 (2) (2010) 124–130.
[7] Hagar Y, Albers D, Pivovarov R, Chase H, Dukic V, Elhadad N, Survival analysis with electronic health record data: experiments with chronic kidney disease, Stat. Anal. Data Min. 7 (5) (2014) 385–403.
[8] Singal G, Miller PG, Agarwala V, Li G, Kaushik G, Backenroth D, Gossai A, Frampton GM, Torres AZ, Lehnert EM, Bourque D, O'Connell C, Bowser B, Caron T, Baydur E, Seidl-Rathkopf K, Ivanov I, Alpha-Cobb G, Guria A, He J, Frank S, Nunnally AC, Bailey M, Jaskiw A, Feuchtbaum D, Nussbaum N, Abernethy AP, Miller VA, Association of patient characteristics and tumor genomics with clinical outcomes among patients with non–small cell lung cancer using a clinicogenomic database, JAMA 321 (14) (2019) 1391.
[9] Cox DR, Regression models and life-tables, J. Roy. Statist. Soc. Ser. B (Methodological) 34 (2) (1972) 187–202.
[10] Brat GA, et al., International electronic health record-derived COVID-19 clinical course profiles: the 4CE consortium, npj Digital Med. 3 (1) (2020) 1–9.
[11] Wolfson M, et al., DataSHIELD: resolving a conflict in contemporary bioscience? Performing a pooled analysis of individual-level data without sharing the data, Int. J. Epidemiol. 39 (5) (2010) 1372–1382.
[12] Duan R, Luo C, Schuemie MJ, Tong J, Liang CJ, Chang HH, Boland MR, Bian J, Xu H, Holmes JH, Forrest CB, Morton SC, Berlin JA, Moore JH, Mahoney KB, Chen Y, Learning from local to global: an efficient distributed algorithm for modeling time-to-event data, J. Am. Med. Inform. Assoc. 27 (7) (2020) 1028–1036.
[13] Wu Y, Jiang X, Kim J, Ohno-Machado L, Grid Binary LOgistic REgression (GLORE): building shared models without sharing data, J. Am. Med. Inform. Assoc. 19 (5) (2012) 758–764.
[14] Lu C-L, Wang S, Ji Z, Wu Y, Xiong L, Jiang X, Ohno-Machado L, WebDISCO: a web service for distributed Cox model learning without patient-level data sharing, J. Am. Med. Inform. Assoc. 22 (6) (2015) 1212–1219.
[15] Bastani H, Predicting with proxies: transfer learning in high dimension, Manage. Sci. 67 (5) (2021) 2964–2984.
[16] Turki T, Wei Z, Wang JTL, Transfer learning approaches to improve drug sensitivity prediction in multiple myeloma patients, IEEE Access 5 (2017) 7381–7393.
[17] Sun YV, Hu Y-J, Integrative analysis of multi-omics data for discovery and functional studies of complex human diseases, Adv. Genet. 93 (2016) 147–190.
[18] Daumé III H, Frustratingly easy domain adaptation, arXiv preprint arXiv:0907.1815 (2009).
[19] Cai TT, Wei H, Transfer learning for nonparametric classification: minimax rate and adaptive classifier, Ann. Statist. 49 (1) (2021) 100–128.
[20] Li S, Cai TT, Li H, Transfer learning for high-dimensional linear regression: prediction, estimation, and minimax optimality, arXiv preprint arXiv:2006.10593 (2020).
[21] Cai T, Liu M, Xia Y, Individual data protected integrative regression analysis of high-dimensional heterogeneous data, J. Am. Stat. Assoc. (2021).
[22] Bühlmann P, Meinshausen N, Magging: maximin aggregation for inhomogeneous large-scale data, arXiv preprint arXiv:1409.2638 (2014).
[23] Meinshausen N, Bühlmann P, Maximin effects in inhomogeneous large-scale data, Ann. Statist. 43 (4) (2015) 1801–1830.
[24] Rothenhäusler D, Meinshausen N, Bühlmann P, Confidence intervals for maximin effects in inhomogeneous large-scale data, in: Statistical Analysis for High-Dimensional Data, Springer, 2016, pp. 255–277.
[25] Guo Z, Inference for high-dimensional maximin effects in heterogeneous regression models using a sampling approach, arXiv preprint arXiv:2011.07568 (2020).
[26] Hu W, et al., Does distributionally robust supervised learning give robust classifiers? Int. Conf. Mach. Learn., PMLR (2018) 2029–2037.
[27] Sagawa S, et al., Distributionally robust neural networks for group shifts: on the importance of regularization for worst-case generalization, arXiv preprint arXiv:1911.08731 (2019).
[28] Shi C, Song R, Lu W, Fu B, Maximin projection learning for optimal treatment decision with heterogeneous individualized treatment effects, J. Roy. Statist. Soc. Ser. B (Statistical Methodol.) 80 (4) (2018) 681–702.
[29] Cheng SC, Wei LJ, Ying Z, Analysis of transformation models with censored data, Biometrika 82 (4) (1995) 835–845.
[30] Hastie T, Tibshirani R, Wainwright M, Statistical Learning with Sparsity: The Lasso and Generalizations, Chapman and Hall/CRC, 2019.
[31] Cai T, Liu W, Luo X, A constrained ℓ1 minimization approach to sparse precision matrix estimation, J. Am. Stat. Assoc. 106 (494) (2011) 594–607.
[32] Cai TT, Ren Z, Zhou HH, Estimating structured high-dimensional covariance and precision matrices: optimal rates and adaptive estimation, Electron. J. Stat. 10 (1) (2016) 1–59.
[33] Uno H, Cai T, Pencina MJ, D'Agostino RB, Wei LJ, On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data, Stat. Med. 30 (10) (2011) 1105–1117.
[34] Weber GM, et al., International changes in COVID-19 clinical trajectories across 315 hospitals and 6 countries: a 4CE consortium study, J. Med. Internet Res. (2021).
[35] Deyo RA, Cherkin DC, Ciol MA, Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases, J. Clin. Epidemiol. 45 (6) (1992) 613–619.
[36] Van Buuren S, Groothuis-Oudshoorn K, mice: Multivariate imputation by chained equations in R, J. Stat. Softw. 45 (1) (2011) 1–67.
