PLoS One. 2021 Dec 10;16(12):e0260885. doi: 10.1371/journal.pone.0260885

Identification of patients at risk of new onset heart failure: Utilizing a large statewide health information exchange to train and validate a risk prediction model

Son Q Duong 1,*, Le Zheng 1,2, Minjie Xia 3, Bo Jin 3, Modi Liu 3, Zhen Li 4,5, Shiying Hao 1,2, Shaun T Alfreds 6, Karl G Sylvester 7, Eric Widen 3, Jeffery J Teuteberg 8, Doff B McElhinney 1,2, Xuefeng B Ling 2,7,*
Editor: Dylan A Mordaunt
PMCID: PMC8664210  PMID: 34890438

Abstract

Background

New-onset heart failure (HF) is associated with poor prognosis and high healthcare utilization. Early identification of patients at increased risk of incident HF may allow for focused allocation of preventative care resources. Health information exchange (HIE) data span the entire spectrum of clinical care, but there are no HIE-based clinical decision support tools for diagnosis of incident HF. We applied machine-learning methods to model the one-year risk of incident HF using data from the Maine statewide HIE.

Methods and results

We included subjects aged ≥ 40 years without prior HF ICD-9/10 codes during a three-year period from 2015 to 2018, with incident HF defined as assignment of two outpatient or one inpatient HF code within a year. A tree-boosting algorithm was used to model the probability of incident HF in year two from data collected in year one, and was then validated in year three. In the validation cohort, 5,668 of 521,347 patients (1.09%) developed incident HF. The model c-statistic in the validation cohort was 0.824, and at a clinically predetermined risk threshold, 10% of patients identified by the model developed incident HF and 29% of all incident-HF cases in the state of Maine were identified.

Conclusions

Utilizing machine learning modeling techniques on passively collected clinical HIE data, we developed and validated an incident-HF prediction tool that performs on par with other models that require proactively collected clinical data. Our algorithm could be integrated into other HIEs to leverage EMR resources, providing individuals, health systems, and payors with a risk stratification tool that allows targeted resource allocation to reduce the incident-HF disease burden on individuals and health care systems.

Introduction

The estimated age-adjusted annual incidence of heart failure (HF) is 0.72% in men and 0.47% in women aged 45 or greater, and among 40 year-olds, the estimated lifetime risk of developing HF is 1 in 5 [1,2]. Once diagnosed, HF has a poor prognosis, with one study estimating median survival of 2.3 years and 1.7 years in men and women, respectively, after a first HF hospitalization [3]. HF imposes a large burden on the healthcare system, with at least 20% of hospital admissions in adults >65 years due to HF [4]. There are several validated risk models available to practitioners to predict progression of disease once chronic HF has been diagnosed, such as the Seattle Heart Failure Model [5], but there is a lack of commonly used models to predict onset of heart failure. Given the major role HF plays in the utilization and cost of healthcare, as well as the potential for risk factor modification to delay progression of disease [6], there is a clinical need to develop tools to predict the onset of first diagnosis of HF in order to identify high-risk patients for targeted early interventions and resource allocation. The widespread adoption of the electronic medical record (EMR) and the linking of these records in health information exchanges (HIEs) allows for widespread collection of administrative and clinical data across multiple settings of clinical care, including the clinic, emergency room, hospital, pharmacy, and laboratory settings. These repositories represent a rich source of data with the potential to apply “big-data” machine learning techniques to aid in the risk stratification of individual patients in an automated fashion that may be implemented in the EMR system itself [7]. The objective of this study was to develop and validate a model to predict the individual one-year risk of developing a first-time diagnosis of HF in the adult population by applying machine learning methodology to a large, statewide HIE database that captures 97% of all EMR encounters in the state of Maine [8].

Methods

Database and subject selection criteria

This study was approved by the institutional review board of Stanford University. The dataset was derived from the Maine HIE network, which provides real-time point-of-care access for practitioners to records from patients who visited any of the 35 hospitals, 34 federally qualified health centers, and more than 400 ambulatory care practices in the state. The HIE covers nearly 95% of the population of the state of Maine and is managed by the HealthInfoNet organization [9]. The model was designed to predict a patient’s 1-year risk of receiving a first-time diagnosis of HF based on their prior 1 year of EMR data. Three years of data from November 1, 2015 to October 31, 2018 were analyzed.

Subjects in the discovery cohort were enrolled between November 1, 2015 and October 31, 2016, and their clinical outcomes over the following year were tracked. Only patients ≥40 years of age were considered in the analysis. Patients with any prior ICD-9 or ICD-10 code indicative of HF (S1 Table) were excluded. We limited subjects to individuals with at least one recorded encounter before the beginning of the observation period and excluded patients with missing demographic and income data (see feature selection below). Subjects in the discovery cohort were randomly split into 1/3 training, 1/3 calibration, and 1/3 performance-testing groups. The modeling, calibration, and blind performance-testing processes were used to minimize over-optimism of the test performance characteristics. The model was then tested in a validation cohort consisting of subjects meeting inclusion criteria in the subsequent year, from November 1, 2016 to October 31, 2017. Validation cohort subjects’ clinical outcomes over the following year were tracked to determine final model performance.
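As a minimal sketch of the cohort split described above (assuming a data frame named discovery with one row per eligible subject; the object name is a hypothetical placeholder, not taken from the study code), the random assignment into thirds could be expressed as:

```r
# Randomly assign each discovery-cohort subject to one of three equal groups:
# training, calibration, and blind performance testing.
set.seed(2016)                                   # arbitrary seed for reproducibility
split <- sample(rep(c("train", "calibrate", "test"), length.out = nrow(discovery)))
train_set     <- discovery[split == "train", ]
calibrate_set <- discovery[split == "calibrate", ]
test_set      <- discovery[split == "test", ]
```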

Feature standardization, reduction, and selection

We collected the following features from the database for consideration as potential input model predictors: ICD-9 and ICD-10 billing codes, laboratory data in the last 12 months, medications prescribed in the last 12 months, CPT® codes assigned in the last 12 months, and average income of the home ZIP code. We used ICD-10 codes throughout the learning process; ICD-9 codes were used only for historic records to identify past HF events as exclusion criteria. Average income in the home ZIP code was calculated from 2010 US Census data [10]. We collected all ICD-10 codes that were assigned to each patient during the observation period, as well as all laboratory data coded in the LOINC system [11]. All outpatient medication prescriptions during the observation period were collected. Finally, all CPT-4® codes, representing billing codes for outpatient procedures, were collected as well. This raw data collection resulted in a massive number of potential coding features, which required data reduction techniques to reduce dimensionality. Medications were mapped to medication class using the Established Pharmacologic Class coding system [12]. Laboratory data were provided from the HIE as binary “abnormal” and “normal” categories due to data interoperability challenges: raw test values were converted to binary abnormal/normal categorical variables by comparing each test result against the corresponding care provider’s normal reference range. These aggregated data sources provided 43,906 unique potential model features for inclusion. Given the large dimensionality of this dataset, we performed an experiment to determine whether aggregating the 5-digit ICD-10-CM codes into the 3-digit code to the left of the decimal (henceforth known as the ICD-10 subheader code) would improve model performance and reduce dimensionality. Finally, we performed a univariate filtering step to eliminate features whose association with the outcome had a chi-squared test p-value >0.2. These features were then utilized as candidate features for selection in the XGBoost algorithm. Of note, the algorithm additionally eliminated unimportant features by only considering features with an importance gain greater than 0. The “gain” represents the relative contribution of the corresponding feature to the model, calculated by aggregating each feature’s contribution across all trees in the model. A higher value of this metric for one feature compared to another implies it is more important for generating a prediction.
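As an illustration of these two reduction steps, a minimal sketch is given below, assuming a binary feature matrix X (patients × coded features) and an outcome vector y (1 = incident HF); both names are hypothetical placeholders rather than objects from the study code.

```r
# 1. Collapse full ICD-10-CM codes to the 3-character subheader left of the decimal,
#    e.g. "I50.23" -> "I50", "E11.65" -> "E11".
to_subheader <- function(icd10) sub("\\..*$", "", icd10)
to_subheader(c("I50.23", "E11.65", "J44.1"))   # returns "I50" "E11" "J44"

# 2. Univariate pre-filter: keep only features whose chi-squared association
#    with the outcome has p <= 0.2 (constant features are dropped).
keep <- apply(X, 2, function(f) {
  p <- tryCatch(suppressWarnings(chisq.test(table(f, y))$p.value),
                error = function(e) NA_real_)
  !is.na(p) && p <= 0.2
})
X_filtered <- X[, keep, drop = FALSE]
```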

Outcome definition

Development of HF was defined as new assignment of an ICD-10 code for HF (S1 Table) during either 1 inpatient or 2 separate outpatient encounters within the prediction time period. This case definition has been described by others in large EMR studies [13,14], and we observed a roughly similar incidence of HF in our cohort to that described in the prospectively collected and physician-adjudicated Framingham Heart Study [2] (S2 Table).
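A minimal sketch of how this case definition could be applied is shown below, assuming an encounters data frame with columns patient_id, setting ("inpatient" or "outpatient"), and hf_code (TRUE when the encounter carries an HF ICD-10 code from S1 Table); the data frame and column names are hypothetical.

```r
# Label a patient as incident HF if, during the prediction year, they have
# >= 1 inpatient or >= 2 separate outpatient encounters coded for HF.
hf_enc  <- encounters[encounters$hf_code, ]
n_inpt  <- tapply(hf_enc$setting == "inpatient",  hf_enc$patient_id, sum)
n_outpt <- tapply(hf_enc$setting == "outpatient", hf_enc$patient_id, sum)
case_ids <- union(names(n_inpt)[n_inpt >= 1], names(n_outpt)[n_outpt >= 2])
```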

Model construction and tuning

XGBoost [15–17], a supervised machine learning and data mining tool used in several biomedical studies, was applied to develop the prediction model. XGBoost uses a gradient boosting technique based on a strategy of additive decision trees. In each iteration, a decision tree-based model is trained to predict the errors of the models trained in previous iterations. The decision tree-based model is optimized by an objective function, which consists of a loss function to minimize the error and a regularization term to avoid overfitting. The final prediction is the sum of the predictions of all the trees. The technical details of this XGBoost procedure have been described elsewhere [8]. A hyper-parameter fine-tuning process was applied to improve the performance of the system on the training set. The hyper-parameters learning rate (eta), maximum tree depth, and number of estimators were tuned using a grid search over all combinations of candidate values to identify the combination that most improved performance. Default regularization parameters were used (L1 = 0, L2 = 1, gamma = 0). In this process, the discovery (training) set was divided into 10 folds for cross-validation, and parallel processing of the grid search was used to increase efficiency and minimize the time spent on parameter tuning. After the hyper-parameter fine-tuning process, the optimal combination was selected based on the best model performance: the learning rate was set to 0.3, the maximum tree depth to 5, and the number of estimators to 500. These optimized parameters were applied to XGBoost on the training set to derive the final model.
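A minimal sketch of this tuning loop using the xgboost R package is shown below; the training matrix X_train and 0/1 label vector y_train are assumed placeholders, and the candidate grid values are illustrative rather than the exact grid used in the study.

```r
library(xgboost)

dtrain <- xgb.DMatrix(data = as.matrix(X_train), label = y_train)

# Grid of candidate hyper-parameters (regularization left at package defaults).
grid <- expand.grid(eta = c(0.1, 0.3), max_depth = c(3, 5, 7), nrounds = c(100, 500))

# 10-fold cross-validated AUC for each combination.
cv_auc <- apply(grid, 1, function(g) {
  cv <- xgb.cv(params = list(objective = "binary:logistic", eval_metric = "auc",
                             eta = g["eta"], max_depth = g["max_depth"]),
               data = dtrain, nrounds = g["nrounds"], nfold = 10, verbose = 0)
  max(cv$evaluation_log$test_auc_mean)
})

best <- grid[which.max(cv_auc), ]   # the study reports eta = 0.3, depth = 5, 500 rounds

final_model <- xgb.train(params = list(objective = "binary:logistic",
                                       eta = best$eta, max_depth = best$max_depth),
                         data = dtrain, nrounds = best$nrounds)
```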

During training, ten-fold cross validation was used. After training, the prediction results were calibrated to the positive predictive value (PPV) to provide a universal standardized risk measurement. We constructed 2 models, 1 utilizing specific ICD-10 codes and the other utilizing ICD-10 subheader codes. We hypothesized that the ICD-10 subheader codes would outperform specific ICD codes, as they would reduce dimensionality and the noise introduced by variations in provider billing practices.
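One simple way such a calibration can be implemented is to bin the raw scores on the held-out calibration set and map each bin to its observed event rate (PPV); the sketch below assumes vectors score_cal and y_cal for the calibration third, and the exact calibration procedure used in the study may differ.

```r
# Bin calibration-set scores into ventiles and compute the observed HF rate per bin.
breaks  <- unique(quantile(score_cal, probs = seq(0, 1, by = 0.05), na.rm = TRUE))
bins    <- cut(score_cal, breaks = breaks, include.lowest = TRUE)
bin_ppv <- tapply(y_cal, bins, mean)            # observed risk (PPV) per score bin

# Map new raw scores onto the calibrated risk scale.
calibrate <- function(new_score) {
  b <- cut(new_score, breaks = breaks, include.lowest = TRUE)
  as.numeric(bin_ppv[as.character(b)])
}
```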

Model evaluation

Risk scores were expressed as the predicted probability of development of HF, which is equivalent to the positive predictive value (PPV) of the model. Global model performance was evaluated by constructing the receiver operating characteristic (ROC) curve and calculating the area under the curve (AUC), with the 95% confidence interval (CI) calculated through bootstrapping. We generated observed-versus-expected calibration curves to examine model performance across all risk scores. For application of the model to clinical practice, we selected a risk score of ≥0.05 (≥5% probability of development of HF) as the threshold at which patients would be considered high-risk and flagged as “test positive” by the model. At this threshold we calculated the sensitivity, specificity, PPV, and negative predictive value (NPV). For purposes of reproducibility and standardization, the TRIPOD reporting checklist [18] was utilized and is available for review in S3 Table.
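A minimal sketch of this evaluation, assuming validation-cohort vectors score and y (both hypothetical names) and using the pROC package for the bootstrap AUC interval:

```r
library(pROC)

# Discrimination: ROC AUC with a bootstrap 95% confidence interval.
roc_obj <- roc(response = y, predictor = score)
auc(roc_obj)
ci.auc(roc_obj, method = "bootstrap")

# Test characteristics at the clinically pre-set threshold of 0.05.
pred_pos <- score >= 0.05
tp <- sum(pred_pos & y == 1); fp <- sum(pred_pos & y == 0)
fn <- sum(!pred_pos & y == 1); tn <- sum(!pred_pos & y == 0)
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
ppv         <- tp / (tp + fp)
npv         <- tn / (tn + fn)
```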

Software and hardware

R version 3.5.0 was used with packages including, but not limited to, xtable, XML, xgboost, whoami, whisker, xlsx, tidyverse, tidyselect, yaml, and xlsxjars. Windows Server 2012 R2+ was used on computing nodes with 96 vCPU cores, 1 TB of memory, a 120 GB drive for the OS, and a 4 TB drive for data mart storage.

Exploratory model analyses

We explored whether modifying the outcome definition to require only one inpatient or outpatient diagnosis code substantially changed model results. We also performed an analysis in which we changed the data dimensionality reduction approach by removing the univariate chi-squared filtering step. Finally, a class-weighted XGBoost method with fine-tuning of the associated parameters was explored. Because the incidence rate of heart failure is low and the dataset is highly imbalanced, we applied class-weighted XGBoost, which increases the weight given to misclassification of the minority class, in order to achieve better performance on a prediction problem with severe class imbalance. In this process, the positive class weight hyper-parameter, which scales the gradient for the positive class, was tuned by grid search over a range of class weightings (90, 95, 100, 110, 150) to identify the best ROC AUC score; as a result, the positive class weight was set to 90.
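A minimal sketch of this experiment using the scale_pos_weight parameter of the xgboost R package is shown below; dtrain is the assumed training DMatrix from the earlier sketch, and the fixed tree settings are illustrative.

```r
library(xgboost)

# Compare candidate positive-class weights by 10-fold cross-validated AUC.
weights <- c(90, 95, 100, 110, 150)
cv_auc <- sapply(weights, function(w) {
  cv <- xgb.cv(params = list(objective = "binary:logistic", eval_metric = "auc",
                             eta = 0.3, max_depth = 5, scale_pos_weight = w),
               data = dtrain, nrounds = 500, nfold = 10,
               early_stopping_rounds = 20, verbose = 0)
  max(cv$evaluation_log$test_auc_mean)
})
best_weight <- weights[which.max(cv_auc)]   # the study reports 90 as the optimum
```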

Results

In the discovery cohort, 497,470 patients met criteria for study inclusion, and 521,347 patients met criteria in the validation cohort (Fig 1). The baseline characteristics of the discovery and validation groups are shown in Table 1. Incident HF was diagnosed in 6,816 (1.37%) individuals in the discovery cohort and 5,668 (1.09%) in the validation cohort. Of the 43,906 possible data features before feature reduction techniques were performed, the algorithm selected 339 for inclusion in the final model (S4 Table). As an example, the top-ranked 25 features included several known to be associated with HF, including age; respiratory disorders such as chronic obstructive pulmonary disease; prescriptions for anticoagulation, anti-hypertensive, diuretic, or pulmonary medications; and laboratory markers of abnormal kidney function or glucose homeostasis (Table 2). The use of ICD subheader codes substantially improved model characteristics compared to specific ICD codes, resulting in an increase in the prospective AUC (median, 95% CI) from 0.797 [0.790–0.803] to 0.824 [0.818–0.830]. Thus, ICD subheaders were included in the final model (see Fig 2 for performance characteristics). Model calibration is illustrated in S1 Fig. Calibration from risk scores 0 to 0.05 appeared adequate, but in patients with a risk score >0.05 (5% predicted risk), the model tended to underestimate the risk of developing HF. We therefore used a risk score of 0.05 and higher as a relevant test threshold to classify patients into a higher risk category. This yielded a test sensitivity of 29.2% [95% CI 28.1–30.4%], specificity of 97.1% [95% CI 97.1–97.2%], positive predictive value of 10.0% [95% CI 9.7–10.4%], and negative predictive value of 99.2% [95% CI 99.1–99.2%]. The risk of developing HF in the test-positive group was 9.17 times the baseline incidence of 1.09%.
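For reference, this relative risk follows directly from the positive predictive value at the threshold and the baseline incidence: RR = PPV / baseline incidence = 10.0% / 1.09% ≈ 9.17.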

Fig 1. Cohort identification, discovery and validation cohorts.


Discovery cohort utilized to generate the prediction model, which was subsequently validated on the patient cohort one year after the discovery period.

Table 1. Baseline characteristics of discovery and validation cohorts.

                                 Discovery cohort                    Validation cohort
                                 HF group        Non-HF group        HF group        Non-HF group
                                 N = 6,816       N = 490,654         N = 5,668       N = 515,679
Age, mean (SD)                   76 (10.49)      65 (13.42)          78 (11.16)      63 (13.05)
Gender, N (%)
  Male                           3,953           214,775             2,864           278,742
  Female                         2,863           275,879             2,804           236,937
Type 2 diabetes
  Yes                            1,022           50,722              893             59,643
  No                             5,794           439,932             4,775           456,036
Essential hypertension
  Yes                            843             110,547             693             134,278
  No                             5,973           380,107             4,975           381,401
Chronic kidney disease (CKD)
  Yes                            750             36,789              653             45,381
  No                             6,066           453,865             5,015           470,298

Table 2. Top 25 most important features from final model (of 339 total features selected).

Importance Rank Feature Feature Class
1 Loop diuretic medication prescribed Medication
2 Beta-Adrenergic Blocker prescribed Medication
3 Age Group (> = 85) Demographics
4 Age Group (75–84) Demographics
5 Long term (current) drug therapy ICD10 Subheader
6 Other chronic obstructive pulmonary disease ICD10 Subheader
7 Age Group (35–49) Demographics
8 Age Group (50–64) Demographics
9 Essential (primary) hypertension ICD10 Subheader
10 Presence of cardiac and vascular implants and grafts ICD10 Subheader
11 Age Group (65–74) Demographics
12 Vitamin K Antagonist prescribed Medication
13 Abnormalities of breathing ICD10 Subheader
14 Beta2-Adrenergic Agonist prescribed Medication
15 Patient had abnormal blood glucose laboratory test Laboratory
16 Hypertensive chronic kidney disease ICD10 Subheader
17 Male Demographics
18 Encounter for screening for malignant neoplasms ICD10 Subheader
19 Angiotensin Converting Enzyme Inhibitor prescribed Medication
20 Abnormal Blood Urea Nitrogen laboratory test Laboratory
21 Encounter for general exam without complaint ICD10 Subheader
22 Patient’s Zip Code area has a very low median Income Demographics
23 HMG-CoA Reductase Inhibitor prescribed Medication
24 Other peripheral vascular diseases ICD10 Subheader
25 Abnormal serum creatinine laboratory test Laboratory

Fig 2. Final model (discovery and validation) characteristics.


Model test characteristics from the discovery and validation cohorts. The blue shaded area represents the 95% confidence interval. Test characteristics are shown at a clinically pre-set threshold risk score of 0.05 or greater.

Exploratory analyses

An additional analysis was performed in which the outcome definition was modified to require only 1 outpatient encounter with coding for HF; this yielded minor increases in sensitivity, PPV, and AUC (S2 Fig). Another exploratory analysis was performed in which the univariate filtering step was eliminated and further model fine-tuning was added by incorporating class-weighting methods. This model selected 234 features (of the original 43,906) as important and showed a modest, statistically significant improvement in the AUC compared to the original model (validation cohort AUC 0.858 vs. 0.824; p = 0.01). Sensitivity at the clinical threshold was increased; however, the specificity was decreased, and overall PPV and NPV were unchanged (S3 Fig).

Discussion

In this study, we used data from a large, state-wide EMR clinical information exchange with aggregated demographic, medication, laboratory, medical procedure, and socioeconomic data to develop a model to predict the 1-year risk of developing HF in adults ≥40 years of age. The validated model exhibits good discrimination ability (AUC = 0.824), and by incorporating a clinically relevant threshold of a predicted 5% risk or greater, we were able to capture 29% of incident HF cases in the state of Maine from 2017–2018 with a PPV of 10%. This algorithm identifies a population with an over ninefold greater risk of developing HF compared to the baseline population. Put another way, approximately one in ten patients who test positive in this model are predicted to go on to develop heart failure in the next year, compared to one in 100 at baseline. A strength of this model is that it was built upon extant information in the EMR and was designed to be immediately applicable to the EMR as an “early warning” tool for clinicians and patients. It is possible that such a tool could prompt practitioners to screen for asymptomatic left ventricular dysfunction, or to control modifiable risk factors for HF such as hypertension [19] and lipid disorders [6], which may reduce progression of disease.

Our model is unique in that it requires only a prior year’s worth of patient data and utilizes information passively collected within the EMR from several sources, including demographic, pharmacy, inpatient, and outpatient encounters. This is unlike traditional risk prediction tools that rely on actively collected historical or laboratory markers. The boosting algorithm is by nature able to incorporate more features than other reported models derived from typical regression methods, and a total of 339 predictors were utilized as classifiers in our study. The model identified many of the traditionally described epidemiological risk factors for HF, including age, hypertension, diabetes, chronic obstructive pulmonary disease, arrhythmia, atherosclerosis, cerebrovascular disease, kidney disease, and obesity. Interestingly, the model also agnostically identified more recently reported novel associations in the HF literature, such as abnormal iron [20] and vitamin D levels [21]. This demonstrates how machine-learning derived tools applied to large clinical data sets can detect subtle associations, though further exploration is required to determine whether these are truly causative or contributing factors.

Systematic reviews of clinical HF prediction models [22,23] reported AUC values ranging from 0.71–0.92. These studies utilized logistic regression methods for prediction, and all of them relied on actively measured risk factors. There was wide variation in outcome definition, ranging from ICD coding to the “gold standard” Framingham criteria. In the machine-learning literature, the reported AUC values for prediction of HF are comparable to our findings [14,24–29]. Direct comparison of models is limited due to differences in study design; namely, the prior studies all used a case-control design, whereas our model was validated on a cohort of “all-comers” meeting inclusion criteria, making it more clinically applicable. Wu et al. [29] trained a model to predict incident HF over a 3-year period from case and control cohorts in a large multi-site outpatient group in Pennsylvania. Using a boosting algorithm similar to ours, they reported a median model AUC of 0.78, but did not present a validation cohort. Ng et al. [26] used a matched case-control population of primary care clinics across a large practice in central and northeastern Pennsylvania to model incident HF using random forest modeling. They performed similar feature aggregation techniques to reduce data sparsity, but they used US Center for Medicare & Medicaid Services-derived hierarchical condition categories, which aggregate diagnoses into much more general categories than ICD-10 subheaders. They reported an AUC of 0.78 and did not perform validation. Choi et al. [24] and Rasmy et al. [27] both applied a recurrent neural network (RNN) deep learning algorithm to a case-control cohort within large, multi-hospital EMR systems and achieved an AUC of 0.77–0.79 in a validation test subset. Interestingly, Rasmy et al. reported that the use of a different hierarchical clinical classification ontology of ICD codes, from the US Agency for Healthcare Research and Quality Clinical Classification Software (CCS), was inferior to ICD codes, whereas Choi reported that the use of grouped codes, including CCS, improved prediction. Our results suggest that the ICD-10 subheader organizational system yields better prediction than specific ICD codes, which is consistent with our hypothesis that utilizing this system can provide both feature and noise reduction in the face of variability in provider coding practices. To our knowledge this has not been reported previously in the HF risk prediction literature. Wang et al. [28] reported on models of HF diagnosis using gradient boosting on a matched group of case and control patients in an outpatient EMR system. Over a shorter prediction window of 180 days and utilizing only ICD-9 codes and medications, they achieved an AUC of 0.71. Of note, they reported that the use of principal component analysis to aggregate meaningful input features worked well for small training sets, but that as the training set size increased, aggregation hurt prediction performance, whereas we found aggregation using ICD subheader codes to improve performance.

Potential limitations

As these are real-world data, there are continued opportunities to improve model development. We show in additional exploratory analyses that the model has variable performance characteristics depending on the choice of outcome definition and data reduction techniques. Removal of the univariate prefiltering step and the addition of methods to deal with class imbalance resulted in a small increase in the model AUC, but at the expense of model specificity. Decisions on which model is “best” ultimately depend on the clinical needs of the practitioner or health system utilizing the tool. These exploratory analyses show, though, that iterative optimization of our model is important for continued application to real-world data.

The use of ICD coding is both a strength and a weakness of this modeling approach: it allows for effortless computer-driven data collection, but it is subject to incomplete data and inaccurate diagnosis. Importantly, the AUC estimates in this study are comparable to other studies that used ICD coding for case definition [13,30–34]. As with all studies based on clinical and administrative information, coding can be incomplete or inaccurate. This may manifest as variation in provider coding of similarly related conditions, incorrect diagnosis, and implicit treatment for HF without explicit coding for HF. Physician coding practices may vary due to a litany of factors, including incomplete or nonspecific clinical documentation, lack of commonality between coding terminology and disease processes, and discrepancies between coders and health care providers performing other forms of clinical documentation. The effect of this variation on individual risk assessments is difficult to predict and is likely variable across different disease processes. We attempted to address variation in provider coding practices by aggregating 5-digit ICD-10-CM codes into 3-digit ICD subheader codes, and found greater capture of relevant features. Although feature engineering based on manual curation of domain knowledge, such as hierarchical grouping of ICD-10 codes, can increase feature density, it may miss true risk factors at lower levels of the hierarchical structure. In the future, neural network-based dimensionality reduction, such as autoencoder methods [35,36] for unsupervised feature discovery, may improve model prediction performance. To address the potential for bias due to incorrect diagnosis of HF, we incorporated a more stringent case definition of 2 outpatient or 1 inpatient HF codes, which has been validated by other groups [13,14]; notably, in additional exploratory analysis we did not see a significant change in model performance with a less stringent case definition. Additionally, our observed incidence of ~1% in adults aged over 40 years reasonably approximates that described by the gold-standard, physician-adjudicated, prospective Framingham Heart Study [2] (S2 Table). Furthermore, any subjects who moved into the Maine HIE system with a prior diagnosis of HF made outside of the system might be erroneously included as at risk for HF development. Other limitations included the inability to collect some clinically relevant details due to incomplete coding, such as race/ethnicity, BMI, smoking status, and vital signs. However, some of these data may be captured indirectly in our model through ICD coding or medication prescription. Finally, implicit treatment for HF without explicit coding of the disease was likely present to some degree in our data set. In other words, we cannot assess the impact of patients who were being treated for HF by practitioners without explicitly being labeled as having HF. The presence of these patients may have inflated the apparent performance of the model without representing clinically meaningful prediction.

Our model was not designed to be applicable to patients <40 years of age. However, given the vanishingly low incidence of HF in this population, we believe our performance characteristics are estimated more conservatively by reducing the class imbalance that the high proportion of negative classifications in this group would introduce. Our model was also not designed to account for interactions between features, which has the potential to vastly improve the modeling but would exponentially increase the dimensionality of the feature set beyond our computational capacity. Additionally, our model can only generate predictions for patients who access the healthcare system on at least a yearly basis, and it will not work for patients who seek care outside of the CCHIE database. Our model is only designed to predict a one-year risk, which might limit the clinical window for intervention in patients. Finally, because we did not attempt to perform an adjusted analysis, it is unclear whether some of the algorithm-selected features were important in their own right or simply correlated and collinear with other features. As an example, cataract surgery and eye disorders were identified as predictive, but from a clinical perspective this seems more likely to be correlated with age than to be an independent risk factor for HF; further exploration is required.

Conclusion and further studies

In conclusion, we report the development of a risk prediction tool for the development of HF in adults using a large state-wide CCHIE from the state of Maine. As this CCHIE is based on Orion Health’s Rhapsody HL7 integration engine and associated stack, which is widely used in the US, we envision that this model could be automatically incorporated into CCHIEs to analyze the vast troves of data present in the modern EMR and identify, without any active provider intervention, a set of patients >40 years of age who are at substantially higher risk for development of HF than the general population. We envision that this system could be built directly into the EMR to allow healthcare providers of all types to adjust their recommendations, with the goal of possibly delaying progression of disease. Other interested parties, such as payors or managed care systems, may use this tool for targeted resource allocation, and even patients themselves could use such a tool to monitor their own disease risk profiles and encourage lifestyle modification. Before deployment in clinical practice, this model will need to be validated and refined in other large datasets and patient populations. Further work is needed to identify the clinical utility and cost effectiveness of screening with this tool.

Supporting information

S1 Fig. Model calibration curve.

The observed versus expected model predictions across all risk score assignments. The yellow shaded area represents the 95% confidence interval.

(TIF)

S2 Fig. Exploratory analysis: Modification of outcome criteria to only 1 inpatient or outpatient heart failure code.

(TIF)

S3 Fig. Exploratory analysis: Model performance after addition of class weights and elimination of feature prefiltering.

(TIF)

S1 Table. ICD codes for heart failure outcome.

(DOCX)

S2 Table. Comparison of incidence of HF using study definition (2 outpatient or 1 inpatient ICD code assignment; 2b) versus observed incidences from the Framingham Heart Study (2a).

(DOCX)

S3 Table. TRIPOD prediction algorithm checklist.

(XLSX)

S4 Table. All features considered in final model.

(XLSX)

Data Availability

The work was performed under a business arrangement between HealthInfoNet (http://www.hinfonet.org), the operators of the Maine Health Information Exchange, and HBI Solutions, Inc. (HBI), located in California. HIN is a steward of the data on behalf of its members, which include health systems, hospitals, medical groups, and federally qualified health centers. The data are owned by the HIN members, not HIN. HIN is responsible for security and access to its members' data and has established data service agreements (DSAs) restricting unnecessary exposure of information. HIN and its board (comprised of a cross section of its members) authorized the use of the de-identified data for this research, as the published research helps promote the value of the HIE and value to Maine residents. The research was conducted on HIN technology infrastructure, and the researchers accessed the de-identified data via secure remote methods. All data analysis and modeling for this manuscript was performed on HIN servers, and data were accessed via secure connections controlled by HIN. Access to the data used in the study requires a secure connection to HIN servers and should be requested directly from HIN. Researchers may contact Shaun T. Alfreds at salfreds@hinfonet.org to request data. Data will be available upon request to all interested researchers. HIN agrees to provide access to the de-identified data on a per-request basis to interested researchers. Future researchers will access the data through exactly the same process as the authors of the manuscript.

Funding Statement

HBI Solutions, Inc. (HBI) is a private commercial company, and several authors are employed by HBI. HBI provided support in the form of salaries for the authors employed by HBI: MX, BJ, ML, and EW. HealthInfoNet also provided a salary for STA. The funders did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.

References

1. Ho KK, Pinsky JL, Kannel WB, Levy D. The epidemiology of heart failure: the Framingham Study. J Am Coll Cardiol. 1993;22(4 Suppl A):6A–13A. doi: 10.1016/0735-1097(93)90455-a
2. Lloyd-Jones DM, Larson MG, Leip EP, Beiser A, D’Agostino RB, Kannel WB, et al. Lifetime risk for developing congestive heart failure: the Framingham Heart Study. Circulation. 2002;106(24):3068–72. doi: 10.1161/01.cir.0000039105.49749.6f
3. Jhund PS, Macintyre K, Simpson CR, Lewsey JD, Stewart S, Redpath A, et al. Long-term trends in first hospitalization for heart failure and subsequent survival between 1986 and 2003: a population study of 5.1 million people. Circulation. 2009;119(4):515–23. doi: 10.1161/CIRCULATIONAHA.108.812172
4. Jessup M, Brozena S. Heart failure. N Engl J Med. 2003;348(20):2007–18. doi: 10.1056/NEJMra021498
5. Levy WC, Mozaffarian D, Linker DT, Sutradhar SC, Anker SD, Cropp AB, et al. The Seattle Heart Failure Model: prediction of survival in heart failure. Circulation. 2006;113(11):1424–33. doi: 10.1161/CIRCULATIONAHA.105.584102
6. Yancy CW, Jessup M, Bozkurt B, Butler J, Casey DE Jr., Drazner MH, et al. 2013 ACCF/AHA guideline for the management of heart failure: executive summary: a report of the American College of Cardiology Foundation/American Heart Association Task Force on practice guidelines. Circulation. 2013;128(16):1810–52. doi: 10.1161/CIR.0b013e31829e8807
7. Hao S, Wang Y, Jin B, Shin AY, Zhu C, Huang M, et al. Development, Validation and Deployment of a Real Time 30 Day Hospital Readmission Risk Assessment Tool in the Maine Healthcare Information Exchange. PLoS One. 2015;10(10):e0140271. doi: 10.1371/journal.pone.0140271
8. Ye C, Fu T, Hao S, Zhang Y, Wang O, Jin B, et al. Prediction of Incident Hypertension Within the Next Year: Prospective Study Using Statewide Electronic Health Records and Machine Learning. J Med Internet Res. 2018;20(1):e22. doi: 10.2196/jmir.9268
9. HealthInfoNet. 2018 [accessed 11/01/2019]. Available from: http://hinfonet.org/.
10. US Census Bureau. American FactFinder [accessed 11/1/19]. Available from: https://factfinder.census.gov/.
11. Regenstrief Institute. LOINC. 2019 [accessed 3/20/2019]. Available from: https://loinc.org/.
12. US Food and Drug Administration. Pharmacologic Class. 2018 [updated 03/27/2018; accessed 11/01/19]. Available from: https://www.fda.gov/industry/structured-product-labeling-resources/pharmacologic-class.
13. Goyal A, Norton CR, Thomas TN, Davis RL, Butler J, Ashok V, et al. Predictors of incident heart failure in a large insured population: a one million person-year follow-up study. Circ Heart Fail. 2010;3(6):698–705. doi: 10.1161/CIRCHEARTFAILURE.110.938175
14. Sun J, Hu J, Luo D, Markatou M, Wang F, Edabollahi S, et al. Combining knowledge and data driven insights for identifying risk factors using electronic health records. AMIA Annu Symp Proc. 2012;2012:901–10.
15. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; San Francisco, California, USA. ACM; 2016. p. 785–94.
16. Do DT, Le NQK. Using extreme gradient boosting to identify origin of replication in Saccharomyces cerevisiae via hybrid features. Genomics.
17. Le NQK, Do DT, Chiu FY, Yapp EKY, Yeh HY, Chen CY. XGBoost Improves Classification of MGMT Promoter Methylation Status in IDH1 Wildtype Glioblastoma. J Pers Med. 2020;10(3):128. doi: 10.3390/jpm10030128
18. Collins GS, Reitsma JB, Altman DG, Moons KG; TRIPOD Group. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Circulation. 2015;131(2):211–9. doi: 10.1161/CIRCULATIONAHA.114.014508
19. Gueyffier F, Bulpitt C, Boissel JP, Schron E, Ekbom T, Fagard R, et al. Antihypertensive drugs in very old people: a subgroup meta-analysis of randomised controlled trials. INDANA Group. Lancet. 1999;353(9155):793–6. doi: 10.1016/s0140-6736(98)08127-6
20. von Haehling S, Ebner N, Evertz R, Ponikowski P, Anker SD. Iron Deficiency in Heart Failure: An Overview. JACC Heart Fail. 2019;7(1):36–46. doi: 10.1016/j.jchf.2018.07.015
21. Brinkley DM, Ali OM, Zalawadiya SK, Wang TJ. Vitamin D and Heart Failure. Curr Heart Fail Rep. 2017;14(5):410–20. doi: 10.1007/s11897-017-0355-7
22. Echouffo-Tcheugui JB, Greene SJ, Papadimitriou L, Zannad F, Yancy CW, Gheorghiade M, et al. Population risk prediction models for incident heart failure: a systematic review. Circ Heart Fail. 2015;8(3):438–47. doi: 10.1161/CIRCHEARTFAILURE.114.001896
23. Sahle BW, Owen AJ, Chin KL, Reid CM. Risk Prediction Models for Incident Heart Failure: A Systematic Review of Methodology and Model Performance. J Card Fail. 2017;23(9):680–7. doi: 10.1016/j.cardfail.2017.03.005
24. Choi E, Schuetz A, Stewart WF, Sun J. Using recurrent neural network models for early detection of heart failure onset. J Am Med Inform Assoc. 2017;24(2):361–70. doi: 10.1093/jamia/ocw112
25. Li J, Tan X, Xu X, Wang F. Efficient Mining Template of Predictive Temporal Clinical Event Patterns From Patient Electronic Medical Records. IEEE J Biomed Health Inform. 2019;23(5):2138–47. doi: 10.1109/JBHI.2018.2877255
26. Ng K, Steinhubl SR, deFilippi C, Dey S, Stewart WF. Early Detection of Heart Failure Using Electronic Health Records: Practical Implications for Time Before Diagnosis, Data Diversity, Data Quantity, and Data Density. Circ Cardiovasc Qual Outcomes. 2016;9(6):649–58. doi: 10.1161/CIRCOUTCOMES.116.002797
27. Rasmy L, Wu Y, Wang N, Geng X, Zheng WJ, Wang F, et al. A study of generalizability of recurrent neural network-based predictive models for heart failure onset risk using a large and heterogeneous EHR data set. J Biomed Inform. 2018;84:11–6. doi: 10.1016/j.jbi.2018.06.011
28. Wang Y, Ng K, Byrd RJ, Hu J, Ebadollahi S, Daar Z, et al. Early detection of heart failure with varying prediction windows by structured and unstructured data in electronic health records. Annu Int Conf IEEE Eng Med Biol Soc. 2015;2015:2530–3. doi: 10.1109/EMBC.2015.7318907
29. Wu J, Roy J, Stewart WF. Prediction modeling using EHR data: challenges, strategies, and a comparison of machine learning approaches. Med Care. 2010;48(6 Suppl):S106–13. doi: 10.1097/MLR.0b013e3181de9e17
30. Agarwal SK, Chambless LE, Ballantyne CM, Astor B, Bertoni AG, Chang PP, et al. Prediction of incident heart failure in general practice: the Atherosclerosis Risk in Communities (ARIC) Study. Circ Heart Fail. 2012;5(4):422–9. doi: 10.1161/CIRCHEARTFAILURE.111.964841
31. Choi EY, Bahrami H, Wu CO, Greenland P, Cushman M, Daniels LB, et al. N-terminal pro-B-type natriuretic peptide, left ventricular mass, and incident heart failure: Multi-Ethnic Study of Atherosclerosis. Circ Heart Fail. 2012;5(6):727–34. doi: 10.1161/CIRCHEARTFAILURE.112.968701
32. Kalogeropoulos A, Georgiopoulou V, Psaty BM, Rodondi N, Smith AL, Harrison DG, et al. Inflammatory markers and incident heart failure risk in older adults: the Health ABC (Health, Aging, and Body Composition) study. J Am Coll Cardiol. 2010;55(19):2129–37. doi: 10.1016/j.jacc.2009.12.045
33. Nambi V, Liu X, Chambless LE, de Lemos JA, Virani SS, Agarwal S, et al. Troponin T and N-terminal pro-B-type natriuretic peptide: a biomarker approach to predict heart failure risk—the atherosclerosis risk in communities study. Clin Chem. 2013;59(12):1802–10. doi: 10.1373/clinchem.2013.203638
34. Smith JG, Newton-Cheh C, Almgren P, Struck J, Morgenthaler NG, Bergmann A, et al. Assessment of conventional cardiovascular risk factors and multiple biomarkers for the prediction of incident heart failure and atrial fibrillation. J Am Coll Cardiol. 2010;56(21):1712–9. doi: 10.1016/j.jacc.2010.05.049
35. Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science. 2006;313(5786):504–7. doi: 10.1126/science.1127647
36. Kiarashinejad Y, Abdollahramezani S, Adibi A. Deep learning approach based on dimensionality reduction for designing electromagnetic nanostructures. npj Computational Materials. 2020;6(1):12. doi: 10.1038/s41524-020-0276-y

Decision Letter 0

Dylan A Mordaunt

26 May 2021

PONE-D-21-10920

A prospectively validated novel risk prediction model for new onset heart failure utilizing a large statewide health information exchange

PLOS ONE

Dear Dr. Duong,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Thank you for the opportunity to edit this. It's a good piece of work, and will add to the literature in this rapidly expanding area. The more detail that can be added for translatability/reproducibility, the better. Similarly, ensuring the manuscript meets a systematic review checklist would increase the assessed quality. All suggestions are optional and intended to add value and potential impact.

Please submit your revised manuscript by Jul 10 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Dylan A Mordaunt

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

3. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. Please see the following video for instructions on linking an ORCID iD to your Editorial Manager account: https://www.youtube.com/watch?v=_xcclfuvtxQ

4. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

5.Thank you for stating the following in the Competing Interests section:

"I have read the journal's policy and the authors of this manuscript have the following competing interests: Dr. Ling, Mr. Widen, and Dr. Sylvester are co-founders and shareholders of HBI Solutions "

We note that one or more of the authors are employed by a commercial company: HBI Solutions

a) Please provide an amended Funding Statement declaring this commercial affiliation, as well as a statement regarding the Role of Funders in your study. If the funding organization did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please review your statements relating to the author contributions, and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. You can update author roles in the Author Contributions section of the online submission form.

Please also include the following statement within your amended Funding Statement.

“The funder provided support in the form of salaries for authors [insert relevant initials], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.”

If your commercial affiliation did play a role in your study, please state and explain this role within your updated Funding Statement.

b) Please also provide an updated Competing Interests Statement declaring this commercial affiliation along with any other relevant declarations relating to employment, consultancy, patents, products in development, or marketed products, etc.  

Within your Competing Interests Statement, please confirm that this commercial affiliation does not alter your adherence to all PLOS ONE policies on sharing data and materials by including the following statement: "This does not alter our adherence to  PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests) . If this adherence statement is not accurate and  there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Please include both an updated Funding Statement and Competing Interests Statement in your cover letter. We will change the online submission form on your behalf.

Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://journals.plos.org/plosone/s/competing-interests

Additional Editor Comments :

- One of the editors has asked for expansion on details of the HIE, as this would be of interest both for reproducibility but also in terms of translation. I understand that the Maine HIE was based on Orion Health's Rhapsody HL7 integration engine and associated stack, which is a very similar stack to both HealthENet in NSW and CalIndex in California.

- I'm aware that previously Maine HIE had operationalized readmissions predictions based on daily extracts from the HIE, returning predictions to case managers. Although this was quality driven, it's worth considering that this was motivated by CMS funding. This track record would be worth citing.

- I can see how this model would be useful and I think the authors should be commended. The features are somewhat teleological for the purpose, but it may be that in translation the application is different. Just something for consideration.

- This is a good piece of work. I would suggest that if this were to be included in a systematic review or critically analysed, it would be worth the authors undertaking the TRIPOD ML checklist or similar (https://www.tripod-statement.org/), so as to increase the impact, quality, etc.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Please include a more detailed description of the Health Information Exchange (HIE) - is it designed to provide real-time point-of-care access to practitioners, or is it a passive repository for secondary analysis purposes?

I would like to see a sensitivity analysis that ascertains HF based on only a single outpatient HF code (rather than two outpatient codes) - does this improve sensitivity, PPV and AUC?

Perhaps could add to the discussion potential to use newer survival analysis extensions to deep learning models which might require less dimensionality reduction.

Could expand a bit more on why there is variation in provider billing practices, how this influences ICD coding, and how this might impact the predictive models.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Louisa R Jorm

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Dec 10;16(12):e0260885. doi: 10.1371/journal.pone.0260885.r002

Author response to Decision Letter 0


18 Jul 2021

Please see the Editor and Reviewer response below

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

3. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. Please see the following video for instructions on linking an ORCID iD to your Editorial Manager account: https://www.youtube.com/watch?v=_xcclfuvtxQ

4. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

The work was performed under a business arrangement between HealthInfoNet (http://www.hinfonet.org), the operators of the Maine Health Information Exchange and HBI Solutions, Inc. (HBI) located in California. By business arrangement we mean HBI is a contracted vendor to HealthInfoNet (HIN), and HBI is under contract to deploy its proprietary applications and risk models on the HIN data for use by HIN members. HIN is a steward of the data on behalf of its members which includes health systems, hospitals, medical groups and federally qualified health centers. The data is owned by the HIN members, not HIN. HIN is responsible for security and access to its members' data and has established data service agreements (DSAs) restricting unnecessary exposure of information. HIN and its board (comprised from a cross section of its members) authorized the use of the de-identified data for this research, as the published research helps promote the value of the HIE and value to Maine residents.

HBI receives revenue for providing this service, which is performed remotely. HBI does not own or have access to the data outside of providing services to HIN. HIN manages and controls the data within its technology infrastructure. The research was conducted on HIN technology infrastructure, and the researchers accessed the de-identified data via secure remote methods. All data analysis and modeling for this manuscript was performed on HIN servers and data was accessed via secure connections controlled by HIN.

Access to the data used in the study requires a secure connection to HIN servers and should be requested directly from HIN. Researchers may contact Shaun T. Alfreds at salfreds@hinfonet.org to request data. Data will be available upon request to all interested researchers. HIN agrees to provide access to the de-identified data on a per-request basis to interested researchers. Future researchers will access the data through exactly the same process as the authors of the manuscript.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

As this is analysis of a third-party dataset, we are unable to provide a dataset for upload (see above)

We will update your Data Availability statement on your behalf to reflect the information you provide.

5.Thank you for stating the following in the Competing Interests section:

"I have read the journal's policy and the authors of this manuscript have the following competing interests: Dr. Ling, Mr. Widen, and Dr. Sylvester are co-founders and shareholders of HBI Solutions "

We note that one or more of the authors are employed by a commercial company: HBI Solutions

a) Please provide an amended Funding Statement declaring this commercial affiliation, as well as a statement regarding the Role of Funders in your study. If the funding organization did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please review your statements relating to the author contributions, and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. You can update author roles in the Author Contributions section of the online submission form.

Please also include the following statement within your amended Funding Statement.

“The funder provided support in the form of salaries for authors [insert relevant initials], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.”

If your commercial affiliation did play a role in your study, please state and explain this role within your updated Funding Statement.

- Funding Statements:

The authors received no specific funding for this work.

HBI Solutions, Inc. (HBI) is a private commercial company, and several authors are employed by HBI. HBI provided support in the form of salaries for the authors employed by HBI: MX, BJ, ML, and EW. HBI did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.

b) Please also provide an updated Competing Interests Statement declaring this commercial affiliation along with any other relevant declarations relating to employment, consultancy, patents, products in development, or marketed products, etc.

Within your Competing Interests Statement, please confirm that this commercial affiliation does not alter your adherence to all PLOS ONE policies on sharing data and materials by including the following statement: "This does not alter our adherence to PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests) . If this adherence statement is not accurate and there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Conflict of Interest Disclosures:

We have the following interests: KGS, EW and XBL are co-founders and equity holders of HBI Solutions, Inc., which is currently developing predictive analytics solutions for healthcare organizations. MX, BJ, ML, and EW are employed by HBI Solutions, Inc.

From the Departments of Surgery, Stanford University School of Medicine, Stanford, California, KGS and XBL conducted this research as part of a personal outside consulting arrangement with HBI Solutions, Inc. The research and research results are not, in any way, associated with Stanford University.

There are no patents, further products in development or marketed products to declare. This does not alter our adherence to all the PLOS ONE policies on sharing data and materials, as detailed online in the guide for authors.

Please include both an updated Funding Statement and Competing Interests Statement in your cover letter. We will change the online submission form on your behalf.

Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://journals.plos.org/plosone/s/competing-interests

Additional Editor Comments :

- One of the editors has asked for expansion on details of the HIE, as this would be of interest both for reproducibility but also in terms of translation. I understand that the Maine HIE was based on Orion Health's Rhapsody HL7 integration engine and associated stack, which is a very similar stack to both HealthENet in NSW and CalIndex in California.

Thank you for the helpful suggestion. We have incorporated this comment into the discussion section line 251.

“As this CCHIE is based on Orion Health's Rhapsody HL7 integration engine and associated stack, which is widely used in the US, we envision that this model could be automatically incorporated into CCHIEs…”

- I'm aware that previously Maine HIE had operationalized readmissions predictions based on daily extracts from the HIE, returning predictions to case managers. Although this was quality driven, it's worth considering that this was motivated by CMS funding. This track record would be worth citing.

Thank you for the valuable input. We have included this citation in our manuscript on line 38.

- I can see how this model would be useful and I think the authors should be commended. The features are somewhat teleological for the purpose, but it may be that in translation the application is different. Just something for consideration.

Thank you, we agree that in translation of the model into clinical practice, some of the proposed applications might change.

- This is a good piece of work. I would suggest that if this were to be included in a systematic review or critically analysed, it would be worth the authors undertaking the TRIPOD-ML checklist or similar (https://www.tripod-statement.org/), so as to increase the impact and quality of the work.

Thank you for the suggestion. Our understanding is that the TRIPOD-ML checklist has not yet been released; however, we have included the TRIPOD checklist as a supplemental table and included a reference to the table in the methods section on line 111.

“For purposes of reproducibility and standardization, the TRIPOD reporting checklist was utilized and is available for review in the Supplemental Table”

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Please include a more detailed description of the Health Information Exchange (HIE) - is it designed to provide real-time point-of-care access to practitioners or is it a passive repository for secondary analysis purposes?

- The HIE is designed to provide real-time point of care access to practitioners. This is clarified in the methods section line 45.

“The dataset was derived from the Maine HIE network, which provides real-time point-of-care access for practitioners to records from patients who visited any of the 35 hospitals, 34 federally qualified health centers, and more than 400 ambulatory care practices.”

I would like to see a sensitivity analysis that ascertains HF based on only a single outpatient HF code (rather than two outpatient codes) - does this improve sensitivity, PPV and AUC?

- Thank you for the suggestion. We have performed this analysis and found minor increases in sensitivity, PPV, and AUC, but overall the performance is not significantly changed. Please see addition to results on line 134 and Supplementary Figure 2.

“A sensitivity analysis was performed in which the outcome definition was modified to require only 1 outpatient encounter with coding for HF, which found minor increases in sensitivity, PPV, and AUC (Supplemental Figure 2).”

Perhaps could add to the discussion potential to use newer survival analysis extensions to deep learning models which might require less dimensionality reduction.

- Thank you for this excellent suggestion. We have added discussion of deep learning techniques for handling high-dimensional data (line 217)

“Although domain knowledge manual curation-based feature engineering like ICD-10 code hierarchical grouping can increase feature density, it may miss true risk factors at a lower level of the hierarchical structure. In the future, deep learning neural network-based dimensionality reduction, such as autoencoder methods [Hinton; Kiarashinejad, 2020] for unsupervised training for dimensionality reduction and feature discovery, may improve model prediction performance”
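A minimal Python/PyTorch sketch of the kind of autoencoder-based dimensionality reduction mentioned above; this is not the authors' implementation, and the layer sizes, latent dimension, and variable names are illustrative assumptions.

    # Minimal sketch (not the authors' pipeline): an autoencoder that compresses a
    # high-dimensional binary EHR feature matrix into a low-dimensional embedding,
    # which could stand in for manual ICD-10 hierarchical grouping.
    import torch
    import torch.nn as nn

    class FeatureAutoencoder(nn.Module):
        def __init__(self, n_features: int, n_latent: int = 64):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(n_features, 512), nn.ReLU(),
                nn.Linear(512, n_latent),
            )
            self.decoder = nn.Sequential(
                nn.Linear(n_latent, 512), nn.ReLU(),
                nn.Linear(512, n_features), nn.Sigmoid(),  # binary features in [0, 1]
            )

        def forward(self, x):
            z = self.encoder(x)
            return self.decoder(z), z

    # Toy training loop on random binary data standing in for the EHR feature matrix.
    x = (torch.rand(1000, 5000) < 0.05).float()
    model = FeatureAutoencoder(n_features=x.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCELoss()
    for epoch in range(5):
        opt.zero_grad()
        x_hat, _ = model(x)
        loss = loss_fn(x_hat, x)
        loss.backward()
        opt.step()

    # The learned low-dimensional embedding would then feed the downstream classifier.
    with torch.no_grad():
        _, embedding = model(x)
    print(embedding.shape)  # torch.Size([1000, 64])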

Could expand a bit more on why there is variation in provider billing practices, how this influences ICD coding, and how this might impact the predictive models.

Thank you for the suggestion. We have added the following to the discussion/limitations to address this concern (line 211)

“Physician coding practices may be variable due to a litany of factors including: incomplete or nonspecific clinical documentation, lack of commonality between coding terminology and disease processes, and discrepancies between coders and health care providers performing other forms of clinical documentation. The effect of this variation on individual risk assessments is difficult to predict, and is likely variable across different disease processes.”

Attachment

Submitted filename: Response reviewers 2.docx

Decision Letter 1

Dylan A Mordaunt

23 Aug 2021

PONE-D-21-10920R1

A prospectively validated novel risk prediction model for new onset heart failure utilizing a large statewide health information exchange

PLOS ONE

Dear Dr. Duong,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Oct 07 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Dylan A Mordaunt

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

Thank you for your submission and amendments. We have received some additional feedback with regards to methods as per the reviewers. Whether these are minor or major suggestions is a matter of perspective, but they are all worth considering. In particular it would be useful to describe how and why this model is different from previous models in the field. All suggestions are addressable. I look forward to receiving your resubmission.

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: (No Response)

Reviewer #3: (No Response)

Reviewer #4: (No Response)

Reviewer #5: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Partly

Reviewer #3: Partly

Reviewer #4: Partly

Reviewer #5: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: N/A

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: I Don't Know

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: No

Reviewer #3: No

Reviewer #4: Yes

Reviewer #5: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: Some previous comments have been addressed. In my point of view, there are some major points that still need to be addressed to meet the quality for publication:

1. The manuscript itself lacks a lot of literature review on related works.

2. The authors should compare the performance results to previous studies on the same dataset.

3. The authors should propose more feature selection techniques to find out the optimal ones.

4. How did the authors perform hyperparameter optimization of the models?

5. Machine learning-based model (i.e., XGBoost) has been used in previously biomedical studies such as PMID: 31987913 and PMID: 32942564. Thus, the authors are suggested to refer to more works in this description to attract a broader readership.

6. There must be a space before the reference number.

Reviewer #3: Aim:

This paper developed a risk prediction tool to detect incident heart failure in adults using a large state-wide CCHIE from the state of Maine, USA.

A tree-boosting algorithm was trained in order to model the probability of incident heart failure in year two from data collected in year one, and then prospectively validated in year three.

This paper tackles a very important problem that could be solved by using existing routinely collected data. In addition, it shows how difficult it could be for an algorithm to predict HF based on administrative and billing codes such as ICD-10. I enjoyed reading the paper.

The model obtains a high specificity but a very low sensitivity. This could be expected as the classes are highly imbalanced, making this problem a difficult one. If the algorithm classifies everyone as healthy, it will already have a very high specificity. I would invite the authors to tackle this problem by using some of the machine learning techniques that deal with imbalanced datasets, for example, using a class-weighted XGBoost or cost-sensitive XGBoost.

In addition, given the vast amount of data (labs, procedures, medicines,...) the authors could use other markers in addition to ICD10 to better define HF. This will include some work with clinicians.

Finally, the authors seem to have just "plugged" XGBoost into the data without tuning any hyper-parameters. I would also invite the authors to re-visit this and tune some of the most important hyper-parameters of XGBoost. This could significantly improve the performance of the predictive algorithm.

Please find some other suggestions below:

Abstract:

1. Methods and Results: "A tree-boosting algorithm was developed": A tree-boosting algorithm was trained, rather than developed.

2. Conclusions: See my comment below regarding the term "prospectively validated".

Methods

Database and subject selection criteria:

1. I would use the terms "training and test" sets for the sets that you use for training and testing. I am not 100% clear which part of the data you use for training and which one for testing. Explain this in detail with dates and number of records. Table 1 will be ideal for that.

Every machine learning algorithm is tested on some data that the model hasn't seen during training, but I wouldn't call it "prospectively validated" unless you collect the data prospectively, which as far as I understood is not the case here. This paper may be of interest: https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2760438 (Prospective and External Evaluation of a Machine Learning Model to Predict In-Hospital Mortality of Adults at Time of Admission, Nathan Brajer). They used the terms training and test sets to build the model, and then they prospectively validated the model with real-time data: "The model was integrated into the production electronic health record system and prospectively validated on a cohort of 5273 hospitalizations representing 4525 unique adult patients admitted to hospital A between February 14, 2019, and April 15, 2019." Therefore, in my opinion, you use the data as if it was collected prospectively, which is exactly the idea behind any test set, but you didn't integrate the model into production and validate prospectively.

I would remove the term "prospectively validated" from the manuscript and replace it with "test the algorithm" or validate it in the test set.

2. "Subjects in the model building/training cohort were randomly split into a 2/3 training and 1/3 retrospective prediction group, which was used to train separate models under differing feature sets (see below)." I was confused with the definition of the training and test set. In addition ,Table 1 doesn't specify the dates or the amount of patients (2/3 and 1/3) that falls in each group.

Table 1 says that the training cohort contains 497,470 patients, but Figure 1 says the 497,470 were divided into "observation period" (usually this is the training set) and prediction period (this is NOT usually part of the training set).

Finally, the term "validation cohort" is a bit confusing too, since "validation or development set" is commonly used for hyperparameter tuning. As stated above, I would change it to training and test. If validation cohort is preferred, please state clearly, with dates and number of patients, what sets you used for training, test and external testing or "validation". But I don't think it is either external (since the data comes from the same system) or prospective (as it is not collected prospectively).

Feature selection and preprocessing:

1. Did you convert the ICD9 to ICD10 codes? You talked about using 5 or 3 digits with the ICD10. How do you deal with the ICD9 codes?

2. "Finally, we performed a univariate filtering step to eliminate features associated

78 with the outcome with a chi-squared test p-value >0.2." You could use XGBoost for both purposes, 1) feature/dimensionality reduction (https://machinelearningmastery.com/feature-importance-and-feature-selection-with-xgboost-in-python/) and 2) Final model. It would be interesting to see which features the XGBoost presents as the most relevant.

3. Did you not include patient characteristics such as gender? I think males tend to have a higher incidence of HF.

4. "Medications were mapped to medication class using the Established Pharmacologic Class coding system". Were medications treated as binary 1-Medication was taken, 0 -Wasn't used, or did you use the amount of units, tablets,...?

In supplementary Table 3, the medications contain this text: Patient had *** medications , which makes me think you consider a continuous number. Please clarify.

5. Exactly the same for lab tests, for example "Patient had *** abnormal laboratory tests (INR in Blood by Coagulation assay) in the last 12 months". Maybe a binary flag will be enough.

6. In addition, I think it would help to present this supplementary table 3 by groups: medicines, laboratory test, demographics, ICD10 Umbrella, CPT4 Code. It would be very interesting to see what features are relevant for the algorithm.

Outcome definition:

1. "Development of HF was defined as new assignment of an ICD-10 code for HF". Therefore, no ICD9 codes were considered for the definition of HF. am I right? I assumed all the systems were using ICD10 then. Supplementary table 1 includes some ICD9 codes, which was confusing.

2. As stated above, please clarify the training and test sets.

Model construction:

1. Hyper-parameter tuning: I haven't seen any reference to the hyper-parameters of the model. As per the API, there are many hyper-parameters that can be tuned: https://xgboost.readthedocs.io/en/latest/parameter.html. That is, a "development" set (in addition to the training and test set) should be put aside to tune these hyper-parameters. Alternatively, cross validation could be used in the training set using techniques such as grid search or random search to tune those hyperparameters.

a. Why did you decide to change the max_depth from the default value (6) to 5?

b. Why didn't you tune any of the others hyper-parameters or a subset of them? This could change the performance by much. For example, learning rate, number of estimators, type of regularization (L1 or L2), ...

2. Reproduction of the results: Which library, version and software did you use? Which type of machine?

Results:

1. In the methods you wrote: "Finally, we performed a univariate filtering step to eliminate features associated with the outcome with a chi-squared test p-value >0.2." but in the results, it seems that the features were chosen by the XGBoost algorithm: "Of the 43,906 possible data features before feature reduction techniques were performed, the boost algorithm selected 339 for inclusion in the final model (Supplemental Table 3)." Please clarify.

2. I didn't understand this sentence: "The model also selected as weak classifiers features such as undergoing eye surgery, laxative use, abnormal iron levels, and vitamin D use, to name a few."

Figure 2:

The confusion matrix was confusing. Please use standard names: Predicted versus True.

Reviewer #4: The authors report the development of risk prediction tool for development of HF in adults using a large state-wide CCHIE from the state of Maine. In terms of data, the author collects enough data for modeling and analysis; in terms of algorithm, the author uses the classical machine learning algorithm xgboost to model the probability of incident-HF in year two from data collected in year one, and then prospectively validated in year three. Here are some questions that may need to be explained:

1. Throughout the data set, the positive cases (disease +) are far fewer than the negative cases (disease -), only about 1%, so the data are highly imbalanced. For this kind of binary classification problem with data imbalance, the (disease -) data should first be down-sampled to keep the two classes as balanced as possible before modeling. Because of the data imbalance, the sensitivity and PPV shown in the confusion matrix of Figure 2 are not high enough. How do the authors think about this problem?

2. On page 6, starting from line 89, the authors use the XGBoost algorithm to build two models, but the reason for choosing this algorithm needs to be supplemented: why was no other algorithm used, such as an SVM or a fully connected neural network?

3. Among the excluded patients, only diagnoses or data from the previous year were excluded. If patients did not see a doctor for HF in the previous year but had a previous history of HF (such as a diagnosis made earlier), how were they excluded? Will they be mistakenly included?

4. The Table 1 baseline data are too simple; the baseline data should be compared between the training group and the validation group, and it should be assessed whether there is any difference between the heart failure group and the non-heart failure group with respect to the most important features found in Table 2.

Reviewer #5: The manuscript describes an original application of machine learning for new-onset heart failure prediction in a 1-year timeframe on a large cohort of subjects.

Based on the manuscript in its current form, I do not have sufficient elements to know whether statistical analysis (especially machine learning) has been performed appropriately. The level of detail in which the methods and experimental procedures are described should be increased.

Most importantly, please add more details related to the machine learning experiments. For example:

- Is data reduction applied before the classification? If so, was it done only on the training set to avoid information leakage?

- What kind of cross-validation was used for model development?

- How was the final model derived?

- How were the XGBoost parameters chosen?

- The data set classes appear to be highly imbalanced: was this taken into account in model development?

Please provide further details about the "data reduction techniques to reduce dimensionality" mentioned in the Methods section.

The initial number of features is reported only in the Results section: I suggest mentioning it also in the Methods section, "Feature selection and preprocessing", together with the number of features resulting after dimensionality reduction and the univariate filtering step.

Please specify what criterion was followed to aggregate lab data into abnormal/normal.

When the AUC values are first reported, e.g. "0.797 [0.790-0.803]" etc, the "95% CI" statement inside the brackets is missing. Moreover, are the results reported as mean or median plus CI?

Minors:

- Please check the manuscript for proper spacing between words and references, spelling ("others models", "this algorithm could integrate into the HIE", etc.), and punctuation.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: Yes: Oscar Perez-Concha

Reviewer #4: No

Reviewer #5: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: Review.pdf

PLoS One. 2021 Dec 10;16(12):e0260885. doi: 10.1371/journal.pone.0260885.r004

Author response to Decision Letter 1


3 Oct 2021

PONE-D-21-10920R1 REBUTTAL LETTER

Title “A prospectively validated novel risk prediction model for new onset heart failure utilizing a large statewide health information exchange”

RESPONSE: Title has been revised as “Case finding for patients at risk of new onset heart failure: utilizing a large statewide health information exchange EHR data to train and validate a risk prediction model”.

===================================

RESPONSE: We logged on to https://www.editorialmanager.com/pone/, selected the 'Submissions Needing Revision' folder, and included the following items when submitting our revised manuscript:

• A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter was uploaded as a separate file labeled 'Response to Reviewers'.

• A marked-up copy of the manuscript that highlights changes made to the original version. A separate file labeled 'Revised Manuscript with Track Changes' was uploaded.

• An unmarked version of the revised paper without tracked changes. A separate file labeled 'Manuscript' without tracked changes was uploaded.

RESPONSE to REVIEWER 2:

• The manuscript itself lacks a lot of literature review on related works.

RESPONSE: Additional literature of related works has been added to the METHODS per suggestion

• The authors should compare the performance results to previous studies on the same dataset.

RESPONSE: Our HIE dataset has not yet been analyzed by other groups; therefore, it would be technically challenging to compare performance results on the same HIE dataset.

• The authors should propose more feature selection techniques to find out the optimal ones.

How did the authors perform hyperparameter optimization of the models?

RESPONSE: A new method section “Model revisited” was added to address these points.

• Machine learning-based model (i.e., XGBoost) has been used in previously biomedical studies such as PMID: 31987913 and PMID: 32942564. Thus, the authors are suggested to refer to more works in this description to attract a broader readership.

RESPONSE: The method section was revised to reference PMID: 31987913 and PMID: 32942564.

• There must be a space before the reference number.

RESPONSE: The text was revised to remove space before the reference number.

RESPONSE to REVIEWER 3:

I would invite the authors to tackle this problem by using some of the machine learning techniques that deal with imbalanced datasets, for example, using a class-weighted XGBoost or cost-sensitive XGBoost.

Finally, the authors seem to have just "plugged" XGBoost into the data without tuning any hyper-parameters. I would also invite the authors to re-visit this and tune some of the most important hyper-parameters of XGBoost. This could significantly improve the performance of the predictive algorithm.

RESPONSE: A new section under “Exploratory Model Analyses” was added to address these points.

Please find some other suggestions below:

Abstract:

1. Methods and Results: "A tree-boosting algorithm was developed": A tree-boosting algorithm was trained, rather than developed.

RESPONSE: Revisions were made throughout the main text.

Methods

Database and subject selection criteria:

1. I would use the terms "training and test" sets for the sets that you use for training and testing. I am not 100% clear which part of the data you use for training and which one for testing. Explain this in detail with dates and number of records. Table 1 will be ideal for that.

Every machine learning algorithm is tested on some data that the model hasn't seen during training, but I wouldn't call it "prospectively validated" unless you collect the data prospectively, which as far as I understood is not the case here. This paper may be of interest: https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2760438 (Prospective and External Evaluation of a Machine Learning Model to Predict In-Hospital Mortality of Adults at Time of Admission, Nathan Brajer). They used the terms training and test sets to build the model, and then they prospectively validated the model with real-time data: "The model was integrated into the production electronic health record system and prospectively validated on a cohort of 5273 hospitalizations representing 4525 unique adult patients admitted to hospital A between February 14, 2019, and April 15, 2019." Therefore, in my opinion, you use the data as if it was collected prospectively, which is exactly the idea behind any test set, but you didn't integrate the model into production and validate prospectively.

I would remove the term "prospectively validated" from the manuscript and replace it with "test the algorithm" or validate it in the test set.

2. "Subjects in the model building/training cohort were randomly split into a 2/3 training and 1/3 retrospective prediction group, which was used to train separate models under differing feature sets (see below)." I was confused with the definition of the training and test set. In addition ,Table 1 doesn't specify the dates or the amount of patients (2/3 and 1/3) that falls in each group.

Table 1 says that the training cohort contains 497,470 patients, but Figure 1 says the 497,470 were divided into "observation period" (usually this is the training set) and prediction period (this is NOT usually part of the training set).

Finally, the term "validation cohort" is a bit confusing too, since "validation or development set" is commonly used for hyperparameter tuning. As stated above, I would change it to training and test. If validation cohort is preferred, please state clearly, with dates and number of patients, what sets you used for training, test and external testing or "validation". But I don't think it is either external (since the data comes from the same system) or prospective (as it is not collected prospectively).

RESPONSE:

All the above points relate to the clarification of the discovery and validation analytics; therefore, we address them collectively here. We summarize the reviewer's input into two perspectives: (1) confusion as to the meaning of "prospective"; (2) the need for additional clarification of the retrospective (training) and prospective (validation) cohorts (Table 1).

We agree with the reviewer that the manuscript needs additional clarification. Overall, we intended "prospective" to mean that the algorithm was validated on data from a subsequent time period, but we acknowledge that this is an unclear and potentially inaccurate definition. We therefore have removed all references to prospective validation and instead simply refer to this process as "validation".

1. Title “A prospectively validated novel risk prediction model for new onset heart failure utilizing a large statewide health information exchange” was revised as “Identification of patients at risk of new onset heart failure: utilizing a large statewide health information exchange to train and validate a risk prediction model”.

2. In the method section, we used the terms "discovery cohort" (i.e., the previous retrospective cohort) and "validation cohort" (i.e., the previous prospective cohort) to clarify. The discovery cohort was randomly partitioned into 1/3 for training and cross-validation, 1/3 for calibration, and 1/3 for blind testing for performance evaluation. The final model performance was then measured on the subsequent year of data: the "validation" cohort.

3. We added an "Exploratory analysis" section to the methods and results. In these analyses we incorporated several reviewer suggestions on model development and tuning and reported the improved results.

4. Table 1 was updated in the main text to include more subject information.

1. Did you convert the ICD9 to ICD10 codes? You talked about using 5 or 3 digits with the ICD10. How do you deal with the ICD9 codes?

RESPONSE: The method section was revised to clarify that we only used ICD-9 codes to exclude patients with a historical record of heart failure.

2. "Finally, we performed a univariate filtering step to eliminate features associated

78 with the outcome with a chi-squared test p-value >0.2." You could use XGBoost for both purposes, 1) feature/dimensionality reduction (https://machinelearningmastery.com/feature-importance-and-feature-selection-with-xgboost-in-python/) and 2) Final model. It would be interesting to see which features the XGBoost presents as the most relevant.

RESPONSE: We incorporated this suggestion and reported this under additional exploratory analyses. The method text was revised to add a new section and the results were summarized in the limitation section and the supplementary materials. A table in the supplemental file (S3) is provided with all features selected by their data type and relative importance rank as requested.

3. Did you not include patient characteristics such as gender? I think males tend to have a higher incidence of HF.

RESPONSE: Gender is one of the discriminative features (Table 2, importance rank 17) used by XGBoost in the trained model.

4. "Medications were mapped to medication class using the Established Pharmacologic Class coding system". Were medications treated as binary 1-Medication was taken, 0 -Wasn't used, or did you use the amount of units, tablets,...?

In supplementary Table 3, the medications contain this text: Patient had *** medications , which makes me think you consider a continuous number. Please clarify.

RESPONSE: Medications were treated as features with binary values. Supplementary Table 3 was revised for clarity.

5. Exactly the same for lab tests, for example "Patient had *** abnormal laboratory tests (INR in Blood by Coagulation assay) in the last 12 months". Maybe a binary flag will be enough.

RESPONSE: Laboratory tests were treated as features with binary values. Supplementary Table 3 was revised for clarity.

6. In addition, I think it would help to present this supplementary table 3 by groups: medicines, laboratory test, demographics, ICD10 Umbrella, CPT4 Code. It would be very interesting to see what features are relevant for the algorithm.

RESPONSE: Supplementary table 3 was revised as suggested.

Outcome definition:

1. "Development of HF was defined as new assignment of an ICD-10 code for HF". Therefore, no ICD9 codes were considered for the definition of HF. am I right? I assumed all the systems were using ICD10 then. Supplementary table 1 includes some ICD9 codes, which was confusing.

RESPONSE: Only ICD-10 was used in the modeling process. However, when enrolling the patients, we excluded patients with historical HF events encoded by ICD-9. We used ICD-9 codes under this context.

2. As stated above, please clarify the training and test sets.

Model construction:

1. Hyper-parameter tuning: I haven't seen any reference to the hyper-parameters of the model. As per the API, there are many hyper-parameters that can be tuned: https://xgboost.readthedocs.io/en/latest/parameter.html. That is, a "development" set (in addition to the training and test set) should be put aside to tune these hyper-parameters. Alternatively, cross validation could be used in the training set using techniques such as grid search or random search to tune those hyperparameters.

a. Why did you decide to change the max_depth from the default value (6) to 5?

b. Why didn't you tune any of the others hyper-parameters or a subset of them? This could change the performance by much. For example, learning rate, number of estimators, type of regularization (L1 or L2), ...

RESPONSE: We have expanded the methods section to include the hyper-parameter tuning process using a grid search method and the parameters chosen, under "Model construction and tuning".
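A rough sketch of a cross-validated grid search of this kind is given below; the grid, data, and variable names are assumptions rather than the exact settings used.

    # Illustrative sketch only: grid search over a few XGBoost hyper-parameters
    # with 10-fold cross-validation on the training partition.
    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from xgboost import XGBClassifier

    rng = np.random.default_rng(0)
    X_train = rng.random((5_000, 50))                 # stand-in feature matrix
    y_train = (rng.random(5_000) < 0.05).astype(int)  # stand-in incident-HF labels

    param_grid = {
        "learning_rate": [0.1, 0.3],
        "max_depth": [4, 5, 6],
        "n_estimators": [200, 500],
    }
    search = GridSearchCV(
        XGBClassifier(eval_metric="auc"),
        param_grid,
        scoring="roc_auc",
        cv=10,       # 10-fold cross-validation, as noted in the responses below
        n_jobs=-1,
    )
    search.fit(X_train, y_train)
    print(search.best_params_, round(search.best_score_, 3))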

2. Reproduction of the results: Which library, version and software did you use? Which type of machine?

RESPONSE: The method section was revised to include a new subsection to address these points.

Results:

1. In the methods you wrote: "Finally, we performed a univariate filtering step to eliminate features associated with the outcome with a chi-squared test p-value >0.2." but in the results, it seems that the features were chosen by the XGBoost algorithm: "Of the 43,906 possible data features before feature reduction techniques were performed, the boost algorithm selected 339 for inclusion in the final model (Supplemental Table 3)." Please clarify.

RESPONSE: The method section was revised to clarify that the first step was to remove features with a univariate chi-squared test, and the second step was to allow the XGBoost algorithm to select important features with an importance gain greater than 0.
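A rough sketch of the two-step selection described in this response; apart from the stated p-value cutoff, the data and settings are assumptions.

    # Step 1: drop features whose univariate chi-squared p-value exceeds 0.2.
    # Step 2: keep only features to which the fitted XGBoost model assigns
    #         a positive importance gain (features never split on are dropped).
    import numpy as np
    from sklearn.feature_selection import chi2
    from xgboost import XGBClassifier

    rng = np.random.default_rng(1)
    X = (rng.random((5_000, 1_000)) < 0.1).astype(float)  # binary EHR-style features
    y = (rng.random(5_000) < 0.05).astype(int)

    _, pvals = chi2(X, y)
    keep = np.where(pvals <= 0.2)[0]
    X_filtered = X[:, keep]

    clf = XGBClassifier(n_estimators=200, max_depth=5)
    clf.fit(X_filtered, y)
    gain = clf.get_booster().get_score(importance_type="gain")   # non-zero gains only
    final_idx = [keep[int(name[1:])] for name in gain]           # map "f12" -> column 12
    print(f"{X.shape[1]} -> {len(keep)} -> {len(final_idx)} features")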

2. I didn't understand this sentence: "The model also selected as weak classifiers features such as undergoing eye surgery, laxative use, abnormal iron levels, and vitamin D use, to name a few."

RESPONSE: This sentence was removed.

Figure 2:

The confusion matrix was confusing. Please use standard names: Predicted versus True

RESPONSE: The figure was revised as suggested.

Reviewer #4: The authors report the development of risk prediction tool for development of HF in adults using a large state-wide CCHIE from the state of Maine. In terms of data, the author collects enough data for modeling and analysis; in terms of algorithm, the author uses the classical machine learning algorithm xgboost to model the probability of incident-HF in year two from data collected in year one, and then prospectively validated in year three. Here are some questions that may need to be explained:

1. Throughout the data set, the positive cases (disease +) are far fewer than the negative cases (disease -), only about 1%, so the data are highly imbalanced. For this kind of binary classification problem with data imbalance, the (disease -) data should first be down-sampled to keep the two classes as balanced as possible before modeling. Because of the data imbalance, the sensitivity and PPV shown in the confusion matrix of Figure 2 are not high enough. How do the authors think about this problem?

RESPONSE: The goal of our study was to create a clinically applicable model using real-world EMR data to model future clinical outcomes. We agree that it is common in data mining to have an imbalanced dataset. However, in our study we felt that the PPV and sensitivity were actually very good for a real-world test meant to identify potential cases of heart failure, not to definitively diagnose heart failure. The goal here is NOT binary classification but finding patients at risk for targeted clinical outcomes. However, we agree that class imbalances do need to be accommodated in our models, and therefore we included additional analysis to do so under “Exploratory analyses”. The results were summarized in the supplementary section. Indeed, we observed a modest improvement (p-value 0.01) in the overall predictive performance (ROC AUC), though we note that the PPV was unchanged and the specificity of the test actually decreased. We elaborate in the limitations section that the choice of optimal test ultimately lies in the clinical use of the test, and that continued tuning/optimization of a prediction model to meet the need in question is required for implementation.
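One standard way to weight the rare positive class in XGBoost is the scale_pos_weight parameter; the snippet below is a generic sketch in that spirit, not the exploratory analysis actually reported.

    # Cost-sensitive / class-weighted XGBoost via scale_pos_weight.
    import numpy as np
    from xgboost import XGBClassifier

    rng = np.random.default_rng(2)
    X = rng.random((10_000, 50))
    y = (rng.random(10_000) < 0.01).astype(int)   # ~1% positive prevalence

    # Common heuristic: ratio of negative to positive examples.
    spw = (y == 0).sum() / max((y == 1).sum(), 1)

    clf = XGBClassifier(
        n_estimators=200,
        max_depth=5,
        scale_pos_weight=spw,   # up-weights the minority (incident-HF) class
        eval_metric="auc",
    )
    clf.fit(X, y)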

2. On page 6, starting from line 89, the authors use the XGBoost algorithm to build two models, but the reason for choosing this algorithm needs to be supplemented: why was no other algorithm used, such as an SVM or a fully connected neural network?

RESPONSE: We have tried different modeling algorithms (data not shown) including SVM and ultimately chose XGBoost for its modeling performance and computing efficiency.

3. Among the excluded patients, only diagnoses or data from the previous year were excluded. If patients did not see a doctor for HF in the previous year but had a previous history of HF (such as a diagnosis made earlier), how were they excluded? Will they be mistakenly included?

RESPONSE: In our study we excluded all patients with any prior history of HF in their record (even before the one-year observation window). However, if a patient were new to the system and carried a diagnosis of HF from another system, then there is a potential for including them as at risk of developing HF when in fact they already had that diagnosis. We have added this clarification to the limitation section.
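A minimal sketch of this exclusion step (removing any patient with a historical HF code anywhere in the record); the table layout and code prefixes are illustrative assumptions.

    # Exclude patients with any historical HF code (e.g. ICD-9 428.x or ICD-10 I50.x).
    import pandas as pd

    HF_PREFIXES = ("428", "I50")   # example heart failure code prefixes

    history = pd.DataFrame({
        "patient_id": [1, 1, 2, 3],
        "code":       ["I10", "I50.9", "E11.9", "428.0"],
    })
    has_prior_hf = (
        history.assign(hf=history["code"].str.startswith(HF_PREFIXES))
               .groupby("patient_id")["hf"].any()
    )
    eligible = has_prior_hf[~has_prior_hf].index.tolist()
    print("eligible patient_ids:", eligible)   # patients 1 and 3 are excluded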

4. The Table 1 baseline data are too simple; the baseline data should be compared between the training group and the validation group, and it should be assessed whether there is any difference between the heart failure group and the non-heart failure group with respect to the most important features found in Table 2.

RESPONSE: Table 1 was revised as suggested.

Reviewer #5: The manuscript describes an original application of machine learning for new-onset heart failure prediction in a 1-year timeframe on a large cohort of subjects.

Based on the manuscript in its current form, I do not have sufficient elements to know whether statistical analysis (especially machine learning) has been performed appropriately. The level of detail in which the methods and experimental procedures are described should be increased.

Most importantly, please add more details related to the machine learning experiments. For example:

- Is data reduction applied before the classification? If so, was it done only on the training set to avoid information leakage?

RESPONSE:

We applied a feature engineering pipeline of preprocessing steps that transform raw data into features usable by machine learning algorithms on both the training set and the test set, and we also performed feature selection on the transformed features to eliminate features with low variation using only the training set. There should be no information leakage, since no subject-level data reduction was applied. For feature engineering, because raw data collection resulted in a massive number of potential coding features, domain knowledge mapping was applied: medications were mapped to medication class using the Established Pharmacologic Class coding system; laboratory data were aggregated into “abnormal” and “normal” binary categories; and we performed an experiment to determine whether aggregating the 5-digit ICD-10-CM codes into the 3-digit code to the left of the decimal (henceforth known as the ICD-10 subheader code) would improve model performance and reduce dimensionality. Finally, we performed a univariate filtering step to eliminate features associated with the outcome with a chi-squared test p-value >0.2.

We also revised the Methods section to add a subsection under “Exploratory Analysis”, following this and other reviewers’ input, to fine-tune the model; the results are described in the Potential Limitations section to compare the improvement and effectiveness.
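To make the prefiltering step concrete, the following is a minimal sketch in Python of a univariate chi-squared filter over binary features, fit on the training set only to avoid leakage; the variable names (X_train, y_train) are hypothetical and this is not the authors' production code.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def chi2_prefilter(X_train: pd.DataFrame, y_train: pd.Series, p_cutoff: float = 0.2) -> list:
    """Keep features whose association with the outcome has a chi-squared p-value <= p_cutoff."""
    kept = []
    for col in X_train.columns:
        table = pd.crosstab(X_train[col], y_train)  # feature value vs. incident-HF outcome
        if table.shape[0] < 2:                      # feature has no variation; skip it
            continue
        _, p_value, _, _ = chi2_contingency(table)
        if p_value <= p_cutoff:
            kept.append(col)
    return kept

# Example use (training set only, to avoid information leakage):
# selected = chi2_prefilter(X_train, y_train)
# X_train, X_test = X_train[selected], X_test[selected]
```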

- What kind of cross-validation was used for model development?

RESPONSE: Ten-fold cross-validation was used on the 1/3 of the discovery cohort used to train the model. The relevant Methods section was revised.

- How was the final model derived?

RESPONSE: The Methods section was revised to clarify: “Subjects in the discovery cohort were randomly split into 1/3 training, 1/3 calibration, and 1/3 performance testing groups. The modeling, calibration, and blinded performance testing processes were used to minimize over-optimism of the test performance characteristics.”
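For illustration, a minimal sketch of such a three-way random split, assuming the discovery cohort is held in a pandas DataFrame with one row per subject (the function name and seed are illustrative, not the authors' code):

```python
import numpy as np
import pandas as pd

def split_discovery_cohort(df: pd.DataFrame, seed: int = 42):
    """Randomly split subjects into 1/3 training, 1/3 calibration, and 1/3 blind-testing groups."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(df))
    third = len(df) // 3
    training = df.iloc[idx[:third]]
    calibration = df.iloc[idx[third:2 * third]]
    testing = df.iloc[idx[2 * third:]]
    return training, calibration, testing
```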

- How were the XGBoost parameters chosen?

RESPONSE: We have clarified this in the Methods section under “Model construction and tuning”: after the hyper-parameter fine-tuning process, the optimized hyper-parameter combination was selected based on the best model performance (the learning rate was set to 0.3, the depth of each tree was set to 5, and the number of estimators was set to 500). These optimized parameters were then used with the XGBoost algorithm on the entire training set to derive the final model.
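As a sketch of what fitting the final model with these reported hyper-parameters might look like in Python, assuming the xgboost package; the toy data below stand in for the HIE feature matrix, which is not publicly available:

```python
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X_train = rng.integers(0, 2, size=(1000, 20))  # toy binary features standing in for HIE features
y_train = rng.integers(0, 2, size=1000)        # toy incident-HF labels

model = XGBClassifier(
    learning_rate=0.3,            # reported learning rate (eta)
    max_depth=5,                  # reported depth of each tree
    n_estimators=500,             # reported number of boosted trees
    objective="binary:logistic",
)
model.fit(X_train, y_train)
risk_scores = model.predict_proba(X_train)[:, 1]  # predicted probability of incident HF
```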

- The data set classes appear to be highly imbalanced: was this taken into account in model development?

RESPONSE: This was raised by another reviewer as well. We addressed this in exploratory analyses.

Please provide further details about the "data reduction techniques to reduce dimensionality" mentioned in the Methods section. The initial number of features is reported only in the Results section: I suggest mentioning it also in the Methods section, "Feature selection and preprocessing", together with the number of features resulting after dimensionality reduction and the univariate filtering step.

RESPONSE: The Methods section was revised as suggested under “Feature standardization, reduction, and selection”.


Please specify what criterion was followed to aggregate lab data into abnormal/normal.

RESPONSE: The Methods section was revised to clarify this: “Laboratory data were provided from the HIE as “abnormal” and “normal” binary categories due to data interoperability challenges requiring raw test values to be converted to binary abnormal/normal categorical variables via comparing the test result value against the corresponding care provider’s test normal reference range.”
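As a simple illustration of what such a reference-range conversion looks like (a hypothetical helper; in this study the mapping was performed upstream by the HIE against each care provider's own reference ranges):

```python
def lab_to_binary(value: float, ref_low: float, ref_high: float) -> str:
    """Flag a lab result as 'abnormal' if it falls outside the provider's reference range."""
    return "normal" if ref_low <= value <= ref_high else "abnormal"

# Example: a potassium of 5.6 mmol/L against a 3.5-5.0 mmol/L reference range -> "abnormal"
print(lab_to_binary(5.6, 3.5, 5.0))
```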

When the AUC values are first reported, e.g. "0.797 [0.790-0.803]" etc, the "95% CI" statement inside the brackets is missing. Moreover, are the results reported as mean or median plus CI?

RESPONSE: The text was revised as suggested. The results are reported as medians with 95% CIs.

Minors:

- Please check the manuscript for proper spacing between words and references, spelling ("others models", "this algorithm could integrate into the HIE", etc.), and punctuation.

RESPONSE: The text was revised accordingly.

________________________________________

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: Yes: Oscar Perez-Concha

Reviewer #4: No

Reviewer #5: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: 9 2021 PONE rebuttal letter v2.docx

Decision Letter 2

Dylan A Mordaunt

19 Nov 2021

Identification of patients at risk of new onset heart failure: utilizing a large statewide health information exchange to train and validate a risk prediction model

PONE-D-21-10920R2

Dear Dr. Ling,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Dylan A Mordaunt, MB ChB, FRACP, FAIDH

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Thank you for your resubmission. As you will have seen, we have had some changes in reviewers, which often produces variability. I will address these as follows. The reviewers have provided some valuable feedback, and although my decision is to accept (which I detail below), I would suggest considering whether to include these points in your final submission. Dr Perez-Concha has given a detailed discussion, and it is a shame we do not have an editorial format in which we could enable Dr Perez-Concha to expand on these points, as I think they are valuable but should not prevent publication under the PLOS ONE format.

With specific reference to PLoS One's criteria for publication (https://journals.plos.org/plosone/s/criteria-for-publication):

1. The study appears to present the results of original research.

2. Results appear not to have been published elsewhere.

3. Experiments, statistics, and other analyses are performed to a high technical standard and are described in sufficient detail. There are some additional comments from reviewers that do not represent critical flaws and are perhaps something to be addressed in post-publication review.

4. Conclusions are presented in an appropriate fashion and are supported by the data.

5. The article is presented in an intelligible fashion and is written in standard English.

6. The research meets all applicable standards for the ethics of experimentation and research integrity.

7. The article adheres to appropriate reporting guidelines and community standards for data availability.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: (No Response)

Reviewer #3: (No Response)

Reviewer #4: (No Response)

Reviewer #5: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: No

Reviewer #3: Partly

Reviewer #4: Partly

Reviewer #5: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: I Don't Know

Reviewer #3: I Don't Know

Reviewer #4: I Don't Know

Reviewer #5: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: No

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: Some of my previous comments have been addressed. However, there are still some concerns as follows:

1. I previously asked for a literature review in the Introduction showing previous works that focused on the same problem, not the related works discussed in the Methods.

2. The authors should compare their performance results to previous studies on the same dataset. ==> If the authors aim to use their data and argue that their methods are good, they should replicate the other methods on their data to prove it. There are currently related works focusing on this prediction problem, so the authors should try them and compare.

3. The authors should explore additional feature selection techniques to find the optimal one.

Reviewer #3: Thank you for the opportunity to review this paper again. Thank you for having addressed my previous suggestions.

Abstract

•Conclusions: Instead of "passively" I would use the word routinely.

Methods

•Lines 74-77: “Laboratory data were provided from the HIE as “abnormal” and “normal” binary categories due to data interoperability challenges requiring raw test values to be converted to binary abnormal/normal categorical variables via comparing the test result value against the corresponding care provider’s test normal reference range”.

Questions:

a.Does this sentence mean that you do not know the criteria which were followed to aggregate lab data into abnormal/normal?

b.Did all the health providers (hospitals, outpatient clinics, …) follow the same criteria to convert numbers to normal and abnormal categories?

•The section “Exploratory Model Analyses” should be included in the section “Model construction and tuning”, as both deal with hyper-parameters. Please create a table or otherwise summarize all the hyper-parameters that you have tuned, rather than leaving this information scattered across the paper. Some of the steps that you followed were not clear to me, which is why I answered “I don’t know” to question 3, “Has the statistical analysis been performed appropriately and rigorously?”

•It would be very useful to include a diagram or plot with the exact definitions of the “discovery” and “validation” cohorts.

•Line 143: “the positive class, grid search a range of different class weightings (90, 95, 100, 110, 150)”. What was the value of the weight for the negative class? A value of 1? What is the meaning of a weight of 150?

Results

•Table 1. Reviewer 4, comment 4 suggested: “The Table 1 baseline data are too simple. The baseline data should be compared between the training group and the validation group, and any differences between the heart failure and non-heart failure groups should be examined with respect to the most important features identified in Table 2.” You said you addressed this, but I do not see that you listed the important features of Table 2 within Table 1. I agree, and I would add more features to the table and percentages alongside the numbers. You could add the top-ranked 25 known features associated with HF.

I think the main result of this study should be how difficult it is to predict HF even with large amounts of data. I do not think the main finding is a predictive algorithm per se, and I wonder how many clinicians would trust it. A sensitivity of 29.2% is very low. I do not think AUC or specificity are very informative in this case, as the problem is highly imbalanced. I would suggest that the authors frame the question in terms of the methodology for predicting HF, what can be done with this method, and what needs to be done in the future to predict HF more accurately.

For future work, it might be worth exploring the nature of the question, that is, prediction of HF within the next year. Perhaps we need a better understanding of which predictors/features will predict HF more accurately, instead of feeding everything directly to the model. In addition, it could be beneficial to use several models and compare them.

Reviewer #4: Whether the model is used for diagnosis or screening, the nature of machine learning is the same. If the data are highly imbalanced, the model will learn more about the characteristics of the “majority” samples and ignore the “minority” samples. I therefore suggest adding an experiment that undersamples the “majority” class so that the classes are balanced and the model is fit again; the PPV and sensitivity of the new model will then be more valuable.
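For reference, a minimal sketch of the undersampling experiment the reviewer suggests (one of several possible implementations; the function and variable names are hypothetical and this is not part of the authors' pipeline):

```python
import numpy as np
import pandas as pd

def undersample_majority(X: pd.DataFrame, y: pd.Series, seed: int = 0):
    """Randomly drop non-HF ('majority') subjects so the classes are balanced before refitting."""
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y.to_numpy() == 1)
    neg_idx = np.flatnonzero(y.to_numpy() == 0)
    neg_keep = rng.choice(neg_idx, size=len(pos_idx), replace=False)
    keep = np.concatenate([pos_idx, neg_keep])
    return X.iloc[keep], y.iloc[keep]
```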

Reviewer #5: I would like to thank the authors for replying to my previous comments.

I suggest rephrasing the new paragraph “Laboratory data were provided from the HIE as “abnormal” and “normal” binary categories due to data interoperability challenges requiring raw test values to be converted to binary abnormal/normal categorical variables via comparing the test result value against the corresponding care provider’s test normal reference range.”, since it is hard to read.

Other newly added parts may also benefit from proofreading. I have no further issues.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: Yes: Khanh N.Q. Le

Reviewer #3: Yes: Oscar Perez-Concha

Reviewer #4: No

Reviewer #5: No

Acceptance letter

Dylan A Mordaunt

3 Dec 2021

PONE-D-21-10920R2

Identification of patients at risk of new onset heart failure: utilizing a large statewide health information exchange to train and validate a risk prediction model

Dear Dr. Ling:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Dylan A Mordaunt

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Model calibration curve.

    The observed versus expected model predictions across all risk score assignments. Yellow shaded area represents 95% confidence interval.

    (TIF)

    S2 Fig. Exploratory analysis: Modification of outcome criteria to only 1 inpatient or outpatient heart failure code.

    (TIF)

    S3 Fig. Exploratory analysis: Model performance after addition of class weights and elimination of feature prefiltering.

    (TIF)

    S1 Table. ICD codes for heart failure outcome.

    (DOCX)

    S2 Table. Comparison of incidence of HF using study definition (2 outpatient or 1 inpatient ICD code assignment; 2b) versus observed incidences from the Framingham Heart Study (2a).

    (DOCX)

    S3 Table. TRIPOD prediction algorithm checklist.

    (XLSX)

    S4 Table. All features considered in final model.

    (XLSX)

    Attachment

    Submitted filename: Response reviewers 2.docx

    Attachment

    Submitted filename: Review.pdf

    Attachment

    Submitted filename: 9 2021 PONE rebuttal letter v2.docx

    Data Availability Statement

    The work was performed under a business arrangement between HealthInfoNet (http://www.hinfonet.org), the operators of the Maine Health Information Exchange, and HBI Solutions, Inc. (HBI), located in California. HIN is a steward of the data on behalf of its members, which include health systems, hospitals, medical groups, and federally qualified health centers. The data are owned by the HIN members, not HIN. HIN is responsible for security and access to its members' data and has established data service agreements (DSAs) restricting unnecessary exposure of information. HIN and its board (comprised of a cross section of its members) authorized the use of the de-identified data for this research, as the published research helps promote the value of the HIE and its value to Maine residents. The research was conducted on HIN technology infrastructure, and the researchers accessed the de-identified data via secure remote methods. All data analysis and modeling for this manuscript were performed on HIN servers, and data were accessed via secure connections controlled by HIN. Access to the data used in the study requires a secure connection to HIN servers and should be requested directly from HIN. Researchers may contact Shaun T. Alfreds at salfreds@hinfonet.org to request data. Data will be available upon request to all interested researchers. HIN agrees to provide access to the de-identified data on a per-request basis to interested researchers. Future researchers will access the data through exactly the same process as the authors of this manuscript.

