Author manuscript; available in PMC: 2020 May 7.
Published in final edited form as: J Am Coll Cardiol. 2019 May 7;73(17):2195–2205. doi: 10.1016/j.jacc.2019.01.074

Accelerating Biomarker Discovery through Electronic Health Records, Automated Biobanking, and Proteomics

Quinn S Wells a,b,*, Deepak K Gupta a,b,*, J Gustav Smith c,*, Sean P Collins d, Alan B Storrow d, Jane Ferguson a,b, Maya Landenhed Smith e, Jill M Pulley f, Sarah Collier f, Xiaoming Wang f, Dan M Roden b,g, Robert E Gerszten h, Thomas J Wang a,b
PMCID: PMC6501811  NIHMSID: NIHMS1526156  PMID: 31047008

Abstract

Background:

Circulating biomarkers can facilitate diagnosis and risk stratification for complex conditions such as heart failure (HF). Newer molecular platforms can accelerate biomarker discovery, but they require significant resources for data and sample acquisition.

Objectives:

To test a pragmatic biomarker discovery strategy integrating automated clinical biobanking with proteomics.

Methods:

Using the EHR, we identified patients with and without HF, retrieved their discarded plasma samples, and screened these specimens using a DNA aptamer-based proteomic platform (1,129 proteins). Candidate biomarkers were validated in three different prospective cohorts.

Results:

In an automated manner, plasma samples from 1,315 patients (31% with HF) were collected. Proteomic analysis of a 96-patient subset identified nine candidate biomarkers (p < 4.42 × 10^−5). Two proteins, angiopoietin-2 and thrombospondin-2, were associated with HF in three separate validation cohorts. In an emergency department-based registry of 852 dyspneic patients, the 2 biomarkers improved discrimination of acute HF compared with a clinical score (p < 0.0001) or clinical score plus B-type natriuretic peptide (p = 0.02). In a community-based cohort (n = 768), both biomarkers predicted incident HF independent of traditional risk factors and NTproBNP (HR per SD increment, 1.35 [95% CI, 1.14 to 1.61, p = 0.0007] for angiopoietin-2, and 1.37 [1.06–1.79, p = 0.02] for thrombospondin-2). Among 30 advanced HF patients, concentrations of both biomarkers declined (80–84%) following cardiac transplant (p < 0.001 for both).

Conclusions:

A novel strategy integrating EHRs, discarded clinical specimens, and proteomics identified two biomarkers that robustly predict HF across diverse clinical settings. This approach could accelerate biomarker discovery for many diseases.

Keywords: Heart failure, biomarkers, proteomics, electronic health records

CONDENSED ABSTRACT:

Biomarker discovery is frequently limited by resource requirements and tendencies to focus on a few candidate molecules. We used automated, electronic health record (EHR)-based biobanking to collect discarded plasma samples from 1,315 patients with and without heart failure. A subset of samples was analyzed using a DNA aptamer-based proteomic platform (1,129 proteins) which identified nine candidate biomarkers. In three validation cohorts, two proteins, angiopoietin-2 and thrombospondin-2, improved acute HF diagnosis, predicted incident HF, and declined following cardiac transplant. This novel strategy integrating EHRs, clinical biobanking, and proteomics identified two robust HF biomarkers and could accelerate biomarker discovery for many diseases.

Introduction

Circulating biomarkers can aid diagnosis, risk stratification, and selection of therapies. Identification of novel biomarkers has, therefore, been of substantial interest for many conditions. Nonetheless, biomarker discovery is often hampered by the time and expense of subject identification, data collection, and biosample acquisition. Moreover, the scope of discovery is often constrained by known biology and the tendency to focus on candidate biomarkers from a limited number of pathophysiologic pathways (1,2).

The broad adoption of electronic health records (EHRs) has led to rapid accumulation of large, detailed, longitudinal datasets that are enriched for clinically relevant phenotypes and outcomes. The development of computational methods to accurately extract data from clinical databases and link the data to DNA repositories has established EHRs as an efficient platform for clinical and genetic research (3–5). In parallel, advances in proteomic technologies have enabled simultaneous interrogation of many proteins, opening the possibility of biomarker discovery at scale and with less constraint by existing knowledge (6–8).

Thus, we developed a pragmatic biomarker discovery strategy that integrates EHR-based subject identification, automated biobanking, and high-throughput proteomics. Our approach links patient-level EHR data, from which many phenotypes can be identified, to an automated system for extraction of plasma from discarded clinical blood specimens. This paradigm has been effective for genetic studies but has not been extended to studies of circulating biomarkers (3,9–11).

We tested the strategy by applying it to the identification of heart failure (HF) biomarkers. HF is an increasingly common condition that can be difficult to diagnose. It is also a complex syndrome with multiple potential etiologies. Existing HF biomarkers such as the natriuretic peptides, soluble ST-2, and galectin-3, may be helpful in clinical practice but they have limitations. For instance, the natriuretic peptides are more effective for ruling out acute HF than for ruling it in. Furthermore, pathophysiologic understanding of the HF syndrome can be expedited by identification of biomarkers in new pathways.

Methods

Discovery sample

The discovery cohort was developed using a de-identified version of the Vanderbilt University Medical Center (VUMC) EHR. The EHR is linked to BioVU, the VUMC biorepository that houses DNA extracted from residual blood samples collected during routine clinical care that would otherwise be discarded (12,13). For the current project, BioVU was adapted to include collection of residual clinical plasma (Central Illustration) (14). We developed bioinformatic algorithms (Online Appendix) to identify both ambulatory and hospitalized patients from 3 groups: 1) patients with prevalent HF; 2) patients with cardiovascular disease but without HF; and 3) patients with neither cardiovascular disease nor HF. The positive predictive value for all algorithms was ≥95%.

Central Illustration. Automated Plasma Collection and Proteomic Analysis Procedures.

A) Electronic algorithms are deployed in an electronic health record (EHR) database for real-time identification of subjects with and without the phenotype of interest, e.g. heart failure. B) Discarded clinical blood samples from eligible subjects are flagged for automated collection, plasma extraction, and sample storage. C) Plasma samples are linked with de-identified EHRs and incorporated into a biobank. D) Discovery of novel candidate biomarkers by analysis of plasma from cases and controls using a DNA aptamer-based proteomic platform. E) Candidate biomarkers are validated in conventional cohorts. Numbered solid lines indicate the sequential strategy used in the current study. Other validation strategies (i.e., parallel validation) are also possible, as indicated by dashed lines.

We had several goals with the automated collection of discarded clinical blood samples. One goal was to deploy the bioinformatics algorithms in the EHR to determine the rate at which we could build a biobank. Algorithms were deployed for 110 days between September 2014 and September 2015 for real-time identification of patients. Subjects meeting one of the three phenotype definitions were flagged in the computer system, and then, in the clinical pathology laboratory, their residual blood samples were retrieved after the clinically-indicated storage period of three days at 4°C. The samples were robotically processed for plasma extraction and stored at −80°C. The only constraints to the number of discarded blood samples collected per day were the eligibility criteria.

The second goal was to test the feasibility of obtaining robust proteomic measurements from discarded clinical blood samples, while the third goal was to demonstrate that we could identify biomarkers efficiently and cost-effectively with this approach. Based on these considerations, a random subset of 96 samples (one plate on the SOMAscan platform) was selected for the discovery cohort. The 96 samples were selected in a 1:1 ratio from groups one (HF) and three (no HF or cardiovascular disease). The extreme phenotypes of HF and non-CVD controls were chosen to maximize contrast for biomarker discovery, on the assumption that if differential protein levels could not be detected on the SOMAscan platform between these two extremes, the approach would be unlikely to work well with an intermediate phenotype. Manual chart review of these 96 subjects confirmed 100% accuracy in categorization as either HF cases or controls. The Vanderbilt Institutional Review Board approved this study and all subjects consented to participate in BioVU.
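The 1:1 draw described above can be sketched as follows. This is a minimal illustration only: the record layout, the function name `draw_discovery_subset`, the fixed seed, and the synthetic sample IDs are assumptions made for the example and are not taken from the study's software. The group sizes are those reported in the Results.

```python
import random

def draw_discovery_subset(samples, per_group=48, seed=0):
    """Randomly draw `per_group` samples each from group 1 (HF) and group 3
    (no HF or CVD). `samples` is a list of (sample_id, group) tuples.
    Hypothetical helper, not the study's actual selection code."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    hf_cases = [s for s in samples if s[1] == 1]
    controls = [s for s in samples if s[1] == 3]
    # random.sample draws without replacement, so no sample is picked twice
    return rng.sample(hf_cases, per_group) + rng.sample(controls, per_group)

# Synthetic biobank with the reported group sizes (412 / 571 / 332).
groups = [1] * 412 + [2] * 571 + [3] * 332
biobank = [(f"S{i:04d}", g) for i, g in enumerate(groups)]

plate = draw_discovery_subset(biobank)  # 96 samples, one assay plate
```

Group 2 (cardiovascular disease without HF) is deliberately excluded from the draw, which mirrors the extreme-phenotype contrast described in the text.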

Validation samples

Robust biomarkers are typically informative across multiple clinical settings; therefore, we evaluated biomarker candidates from the discovery phase in 3 separate validation cohorts: an emergency department (ED) cohort of patients with dyspnea, a community-based cohort of individuals at risk for HF, and a sample of patients undergoing heart transplantation. Detailed descriptions of these 3 cohorts are in the Online Appendix. Briefly, we first evaluated the diagnostic performance of candidate biomarkers in a subset (n = 852, Online Table 1) of subjects in the Improving Heart Failure Risk Stratification in the ED study (STRATIFY), a multi-center, prospective, observational cohort of adult patients with dyspnea and suspected acute HF from 4 EDs in Nashville, TN and Cincinnati, OH between 2007 and 2011 (NCT00508638) (15). In STRATIFY, the presence of acute HF was determined by an adjudication committee of three board-certified cardiologists with access to clinical data from the entire hospitalization (ED encounter plus inpatient stay).

Biomarkers that validated in STRATIFY then underwent testing in two additional cohorts (Online Appendix). First, we used a subset (n = 768) of the Malmö Diet and Cancer Study (MDCS), a population-based, prospective cohort to assess the biomarkers’ ability to predict incident HF. Participants were enrolled between 1991 and 1996, with follow up through December 31, 2013 (16,17). Second, we assessed the change in biomarker levels in response to advanced HF therapies in 30 patients with established HF undergoing heart transplantation or left ventricular assist device (LVAD) implantation. Advanced HF patients were recruited from Skane University Hospital in Lund, Sweden, between 2012 and 2016.

Laboratory methods

Biomarker measurements in the discovery sample, MDCS, and transplant/LVAD patients were performed using the SOMAscan platform (SOMALogic Inc., Boulder, CO) (18,19). The SOMAscan technology uses single-stranded DNA aptamers that target 1,129 proteins with antibody-like specificity (full list in reference 20). In STRATIFY, biomarkers were measured using commercially available ELISAs (Online Table 2). Detailed protocols are provided in the Online Appendix.

Statistical analysis

In the discovery sample, proteins were considered for validation if all of the following criteria were met: 1) there was a significant difference between HF cases and non-HF controls (Wilcoxon rank-sum test) using a Bonferroni-corrected p-value threshold (4.42 × 10^−5 = 0.05/1,129 proteins), 2) median concentrations for cases and controls differed by >50%, and 3) the proteins were associated with HF in a multivariable logistic regression model adjusted for age, sex, blood pressure, and estimated glomerular filtration rate (eGFR). Detailed descriptions of the analyses in each of the validation cohorts are in the Online Appendix. DKG, QSW, and JGS had access to all of the data in the study and all authors had final responsibility for the decision to submit for publication.
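As a rough illustration of the first two selection criteria, the sketch below computes the Bonferroni threshold and applies it together with the >50% median-difference rule. The function name is hypothetical, the use of an absolute difference relative to the control median is an assumption, and criterion 3 (the adjusted logistic regression) is not modeled here.

```python
N_PROTEINS = 1129          # proteins assayed on the platform
ALPHA = 0.05               # family-wise error rate before correction
# 0.05 / 1,129, reported in the text as 4.42 x 10^-5
BONFERRONI_THRESHOLD = ALPHA / N_PROTEINS

def passes_univariate_screen(p_value, median_cases, median_controls):
    """Apply criteria 1 and 2 only (hypothetical helper).

    Criterion 1: rank-sum p-value below the Bonferroni threshold.
    Criterion 2: medians differ by more than 50%, measured relative to the
    control median (an assumption about how the difference is expressed).
    """
    pct_diff = 100.0 * (median_cases - median_controls) / median_controls
    return p_value < BONFERRONI_THRESHOLD and abs(pct_diff) > 50.0

# Angiopoietin-2 values as reported in Table 2 of this article.
angpt2_selected = passes_univariate_screen(3.51e-7, 593, 393)
```

A protein with a very small p-value but only a 40% difference in medians would fail this screen, which is why both conditions are checked together.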

Results

Automated biobanking and biomarker discovery

During the automated collection phase, 1,315 discarded plasma samples were collected (group one, HF [n = 412]; group two, cardiovascular disease without HF [n = 571]; group three, no HF or cardiovascular disease [n = 332]), at a mean rate of 12 samples per day. The median age of patients was 64 years, and 52% of patients were female. The majority of patients were white (86%). N-terminal pro B-type natriuretic peptide (NTproBNP) levels rose in the expected manner across a random sample of subjects from each clinical group (Online Figure 1). Full characteristics of the sample are shown in Table 1.
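The collection rate and group shares quoted above follow directly from the reported counts. The short check below reproduces that arithmetic. Variable names are illustrative, and the 110 days of active collection is taken from the Methods.

```python
# Group sizes as reported for the automated collection phase.
group_counts = {
    "HF": 412,
    "CVD without HF": 571,
    "No CVD or HF": 332,
}
collection_days = 110  # days the algorithms were deployed (from the Methods)

total_samples = sum(group_counts.values())
samples_per_day = total_samples / collection_days

# Percentage of the biobank contributed by each group, rounded to whole numbers.
group_shares = {
    name: round(100 * count / total_samples)
    for name, count in group_counts.items()
}
```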

Table 1: Characteristics of 1,315 subjects in the BioVU plasma biobank

| Characteristic | Overall, N = 1,315 (100) | Group 1: HF, N = 412 (31) | Group 2: No HF, N = 571 (43) | Group 3: No CVD or HF, N = 332 (25) |
| --- | --- | --- | --- | --- |
| Age, years | 64 (54, 73) | 68 (60, 76) | 65 (57, 73) | 54 (48, 65) |
| Male | 625 (48) | 249 (60) | 290 (51) | 86 (26) |
| Race: White | 1,129 (86) | 345 (84) | 488 (86) | 296 (89) |
| Race: Black | 148 (11) | 59 (14) | 67 (12) | 22 (7) |
| Race: Other | 38 (3) | 8 (2) | 16 (2) | 14 (4) |
| Diabetes | 300 (23) | 149 (36) | 147 (26) | 4 (1) |
| CAD | 580 (44) | 361 (88) | 215 (38) | 0 (0) |
| Anti-platelet med | 651 (50) | 301 (73) | 292 (51) | 58 (18) |
| Lipid lowering med | 642 (49) | 309 (75) | 308 (54) | 25 (8) |
| Hypertension | 1,023 (78) | 401 (97) | 481 (84) | 141 (43) |
| Anti-hypertensive med | 897 (68) | 375 (91) | 404 (71) | 118 (36) |
| BMI, kg/m^2 | 28 (25, 32) | 28 (25, 32) | 28 (25, 32) | 27 (23, 31) |
| Systolic BP, mmHg | 124 (116, 133) | 120 (109, 130) | 128 (121, 135) | 123 (115, 131) |
| Diastolic BP, mmHg | 70 (64, 76) | 65 (60, 70) | 71 (66, 77) | 73 (69, 79) |
| eGFR, ml/min/1.73 m^2 | 73 (52, 91) | 51 (35, 69) | 75 (58, 93) | 86 (75, 97) |
| LVEF, % | 55 (45, 60) | 52 (32, 55) | 55 (55, 60) | 55 (55, 60) |

Values presented as n (%) or median (25th, 75th percentile). HF = heart failure, CVD = cardiovascular disease, BMI = body mass index, BP = blood pressure, eGFR = estimated glomerular filtration rate, LVEF = left ventricular ejection fraction, CAD = coronary artery disease.

The proteomic assay was successfully completed in 95 of 96 samples (99%) from the random patient subset. Compared with those free of HF, individuals with HF were older (median 66 vs. 59 years, p = 0.03), more commonly male (56 vs. 23%, p = 0.002), and had a higher burden of comorbidities such as hypertension, diabetes, and coronary artery disease (Online Table 3). In the assay of 1,129 proteins, levels of previously established biomarkers, such as ST2, galectin-3, and troponin, were higher in HF cases (p = 0.1 to 0.0008) but did not meet all of the pre-specified selection criteria (Online Table 4). A total of nine proteins (0.8%) met selection criteria as candidate biomarkers (Table 2). Because the top two proteins (cystatin C and renin) had well-established associations with HF, the next four proteins (thrombospondin-2, insulin-like growth factor binding protein-6, angiopoietin-2, and interleukin-17 receptor C) were selected for subsequent validation.

Table 2. Candidate heart failure biomarkers based upon DNA-aptamer proteomic results from discarded clinical plasma specimens

| Protein (levels in relative fluorescence units) | HF, N = 48 | No HF, N = 47 | p | % Difference |
| --- | --- | --- | --- | --- |
| Cystatin-C | 2272 (1896, 2963) | 1434 (1263, 1772) | 2.04 × 10^−10 | 58 |
| Renin | 1238 (709, 1945) | 393 (289, 610) | 4.74 × 10^−10 | 215 |
| Thrombospondin-2 | 21260 (13500, 32150) | 9094 (7242, 13680) | 9.60 × 10^−9 | 134 |
| Insulin-like growth factor binding protein-6 | 697 (516, 884) | 442 (406, 514) | 1.46 × 10^−8 | 58 |
| Angiopoietin-2 | 593 (461, 791) | 393 (331, 468) | 3.51 × 10^−7 | 51 |
| Interleukin-17 receptor C | 1729 (1331, 2329) | 1065 (882, 1389) | 4.24 × 10^−7 | 62 |
| Kallikrein-11 | 1866 (1206, 2279) | 1073 (901, 1339) | 4.58 × 10^−7 | 74 |
| Macrophage colony-stimulating factor-1 | 2166 (1663, 2733) | 1292 (1138, 1553) | 6.87 × 10^−7 | 68 |
| Leukotriene A-4 hydrolase | 1763 (1165, 2344) | 955 (796, 1187) | 4.60 × 10^−6 | 85 |

Values presented as median (25th, 75th percentile) relative fluorescence units (RFU). HF = heart failure. P value from Wilcoxon rank-sum test using a Bonferroni-corrected p-value threshold of 4.42 × 10^−5 = 0.05 / 1,129 proteins assayed. The % difference in median values was calculated as 100 × (median RFU in HF cases − median RFU in controls) / median RFU in controls.
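The percent-difference column can be recomputed from the medians reported in Table 2 using the formula in the footnote. The sketch below is only a consistency check on the published values. The dictionary layout and the function name are assumptions made for the example.

```python
# (median RFU in HF cases, median RFU in controls, reported % difference)
TABLE2 = {
    "Cystatin-C": (2272, 1434, 58),
    "Renin": (1238, 393, 215),
    "Thrombospondin-2": (21260, 9094, 134),
    "IGF binding protein-6": (697, 442, 58),
    "Angiopoietin-2": (593, 393, 51),
    "Interleukin-17 receptor C": (1729, 1065, 62),
    "Kallikrein-11": (1866, 1073, 74),
    "Macrophage colony-stimulating factor-1": (2166, 1292, 68),
    "Leukotriene A-4 hydrolase": (1763, 955, 85),
}

def percent_difference(median_hf, median_controls):
    """100 x (HF median - control median) / control median."""
    return 100.0 * (median_hf - median_controls) / median_controls

# Recompute each row and round to the nearest whole percent.
recomputed = {
    name: round(percent_difference(hf, ctrl))
    for name, (hf, ctrl, _reported) in TABLE2.items()
}
```

Every recomputed value matches the reported column after rounding, and all nine exceed the 50% cutoff used in the selection criteria.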

Biomarker validation in emergency department cohort of dyspneic patients

In STRATIFY, 405 of 852 patients (48%) had cardiologist-adjudicated acute HF (Online Table 5). Concentrations of thrombospondin-2 and angiopoietin-2, but not the other candidate biomarkers, were significantly higher in patients with acute HF (p <0.001 for both, Figure 1). In multivariable models adjusted for age, sex, race, Framingham HF criteria, prior HF, BMI, eGFR, and BNP, both angiopoietin-2 and thrombospondin-2 remained associated with acute HF. The odds ratios per SD increase in biomarker were 1.36 (95% confidence interval, 1.09–1.69) for angiopoietin-2, and 1.50 (1.22–1.84) for thrombospondin-2 (Online Table 6A). The results of the multivariable-adjusted models did not substantially change with inclusion of medications (Online Table 6B).

Figure 1: Circulating thrombospondin-2 and angiopoietin-2 levels in patients with and without acute heart failure presenting to the emergency department.

Box plots of circulating thrombospondin-2 (TSP2) and angiopoietin-2 (ANGPT2) levels in the emergency department-based validation cohort (STRATIFY); Wilcoxon rank-sum p < 0.001 for both. AHF = acute heart failure. AHF, n = 405; no AHF, n = 447.

The potential diagnostic value of adding angiopoietin-2 and thrombospondin-2 to clinical variables and BNP was assessed in several ways. First, in ROC curve analyses, the addition of angiopoietin-2 and thrombospondin-2 levels to a clinical score and BNP significantly improved the C-statistic to 0.78 (95% CI, 0.75–0.82), compared with a clinical score alone (0.71 [0.68–0.75], p < 0.0001) or clinical score plus BNP (0.77 [0.73–0.80], p = 0.02). Second, among patients in whom acute HF had not been ruled out (i.e., had BNP >100 pg/ml), levels of angiopoietin-2 and thrombospondin-2 provided additional stratification of acute HF across a wide range (probability of HF as high as 99% and as low as 40%) (Figure 2). Third, we identified angiopoietin-2 and thrombospondin-2 values that optimized sensitivity and specificity for differentiating acute HF from non-acute HF as 228 pg/ml and 33 ng/ml, respectively. A simple score equal to the number of biomarkers (0–3, includes BNP) above their cut-points yielded a C-statistic of 0.73 (95% CI, 0.70–0.76), which was significantly better than that for BNP >100 pg/ml alone (0.65, 95% CI, 0.63–0.67; p < 0.001) (Online Table 7). At a threshold of any one positive biomarker, the sensitivity of the score was 99% (95% CI, 97–100), though the specificity was only 24% (21–29). With three positive biomarkers, the specificity increased to 76% (72–80), at the expense of lower sensitivity, 57% (52–61). Fourth, using the continuous net reclassification index, angiopoietin-2 and thrombospondin-2 levels correctly reclassified 25% of cases and 22% of controls, for an overall net reclassification of 47%, p < 0.001, compared with BNP alone (Online Table 8).
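The simple three-biomarker score described above can be written down in a few lines. This is a sketch under stated assumptions: the function and constant names are hypothetical, the inputs are assumed to be in the units quoted in the text (pg/ml for BNP and angiopoietin-2, ng/ml for thrombospondin-2), and a value exactly equal to a cut-point is treated as not above it.

```python
# Cut-points quoted in the text.
BNP_CUT_PG_ML = 100.0
ANGPT2_CUT_PG_ML = 228.0
TSP2_CUT_NG_ML = 33.0

def biomarker_score(bnp_pg_ml, angpt2_pg_ml, tsp2_ng_ml):
    """Count how many of the three biomarkers exceed their cut-points (0-3).
    Hypothetical helper for illustration, not the study's analysis code."""
    return sum([
        bnp_pg_ml > BNP_CUT_PG_ML,
        angpt2_pg_ml > ANGPT2_CUT_PG_ML,
        tsp2_ng_ml > TSP2_CUT_NG_ML,
    ])

def is_positive(score, min_positive_markers):
    """Classify as test-positive when at least `min_positive_markers` are elevated."""
    return score >= min_positive_markers

# Made-up patient values, for illustration only.
example_score = biomarker_score(bnp_pg_ml=450.0, angpt2_pg_ml=300.0, tsp2_ng_ml=20.0)
```

Requiring only one elevated marker (`min_positive_markers=1`) corresponds to the high-sensitivity operating point reported in the text, while requiring all three corresponds to the higher-specificity one.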

Figure 2: Predicted probability of acute heart failure according to thrombospondin-2 and angiopoietin-2 levels among 619 patients with suspected heart failure and B-type natriuretic peptide > 100 pg/ml in the emergency department.

Among patients presenting to the emergency department in whom the diagnosis of acute heart failure has not been excluded, i.e. B-type natriuretic peptide > 100 pg/ml, additional knowledge of both thrombospondin-2 and angiopoietin-2 levels further stratifies the probability of acute heart failure across a broad range beyond either marker alone. 364/619 (58.8%) patients had adjudicated acute heart failure.

The results for angiopoietin-2 and thrombospondin-2 were consistent even when examined according to reduced or preserved left ventricular ejection fraction (Online Figure 2). Similarly, when the ROC curve analysis was restricted to individuals with preserved ejection fraction, angiopoietin-2 and thrombospondin-2 levels significantly improved the C-statistic to 0.77 (95% CI, 0.73–0.80), compared with a clinical score alone (0.71 [0.68–0.75], p <0.0001) or clinical score plus BNP (0.75 [0.71–0.78], p = 0.004).

Biomarker validation in longitudinal, community-based study

In the MDCS, over a median follow-up of 20.2 years, 185 individuals developed new-onset HF. Characteristics of the 185 individuals with incident HF and 583 randomly sampled population-representative MDCS participants without HF are shown in Online Table 9. Baseline angiopoietin-2 and thrombospondin-2 levels were higher among individuals who went on to develop HF than those who did not (p < 0.001 for both). The risk of HF according to tertiles of each biomarker is shown in Figure 3. In a Prentice-weighted, multivariable Cox regression adjusted for traditional risk factors, anti-hypertensive medication use, and NTproBNP, both biomarkers were associated with risk of incident HF, with hazard ratios per SD biomarker increase of 1.36 (95% CI, 1.13–1.64) for angiopoietin-2, and 1.29 (95% CI, 1.02–1.62) for thrombospondin-2 (Online Table 10, model 5).

Figure 3: Cumulative incidence of heart failure according to tertiles of plasma angiopoietin-2 and thrombospondin-2 levels in the Malmö Diet and Cancer Study.

Higher plasma levels of both angiopoietin-2 and thrombospondin-2 are associated with greater risk of incident heart failure, which was defined through validated International Classification of Diseases codes. The Kaplan-Meier plots are simple curves of the 768 participants followed over time.

Biomarker validation in the advanced HF, transplant, LVAD cohort

In patients with advanced HF, circulating levels of angiopoietin-2 and thrombospondin-2 were assessed before and after cardiac transplantation (Online Tables 11 and 12). Transplantation was associated with reductions in both angiopoietin-2 (change: −84% [95% CI, −89 to −77]) and thrombospondin-2 (change: −80% [95% CI, −87 to −70]), as well as NTproBNP (change: −72% [95% CI, −80 to −60]), p ≤ 0.001 for all (Figure 4). Levels of both angiopoietin-2 and thrombospondin-2 also decreased after LVAD (p = 0.04 for both, Online Figure 3).

Figure 4: Angiopoietin-2, thrombospondin-2, and NTproBNP levels in advanced heart failure patients before and after heart transplant.

Among advanced heart failure patients undergoing cardiac transplantation (n = 30), levels of angiopoietin-2, thrombospondin-2, and NTproBNP are lower after compared with before transplant, p <0.001 for all. ANGPT2 = angiopoietin-2, TSP2 = thrombospondin-2, NTproBNP = N-terminal pro B-type natriuretic peptide. RFU = relative fluorescence units.

Discussion

We developed a pragmatic biomarker discovery strategy integrating EHRs, automated collection of discarded specimens, and high-throughput proteomics, and applied it to HF. Our findings highlight the utility of discarded clinical specimens for biomarker discovery, and in doing so, identify angiopoietin-2 and thrombospondin-2 as robust biomarkers of both acute and pre-clinical HF. This approach could accelerate biomarker discovery across a variety of diseases.

Challenges of biomarker studies include the personnel, time, and resources required to collect biospecimens prospectively, especially in acute clinical settings. Although previously-frozen biospecimens are available from clinical trials and epidemiologic cohorts, limitations exist due to selection bias, lack of appropriate clinical context, and finite quantities of stored specimens. The use of blood specimens originally collected for clinical purposes has several advantages. First, it leverages the clinical laboratory infrastructure available at every hospital. Second, it ensures clinical applicability because the biospecimens are collected during the course of actual clinical care. Third, it reduces biases in patient selection or endpoint ascertainment as no investigators are involved in data collection. Fourth, it has the potential to reduce cost without restricting power or generalizability, as the global platform (e.g., proteomics) can be applied to a smaller set of cases and controls from the discarded samples, followed by targeted measurement of selected molecules in larger, well-characterized cohorts (Central Illustration).

The effectiveness of this approach is attributable, in part, to the transferability of biomarkers between various clinical settings. For instance, the most frequently measured biomarkers of cardiovascular disease were originally described in acutely ill patients, e.g. C-reactive protein, BNP, and cardiac troponins. Once assays became available to detect the low concentrations of these biomarkers found in ambulatory individuals, each biomarker was validated as a robust predictor of incident disease in apparently healthy people. Thus, hospital-collected specimens should be a reasonable resource for performing initial biomarker screens, provided that specimens from more generalizable cohorts exist for targeted follow-up studies, as in the present investigation.

For this study, we performed sequential validation of candidate biomarkers, first in an ED based registry of dyspneic subjects with and without acute HF, and then in an epidemiologic sample and an advanced HF cohort. Other validation strategies are possible. For example, it may be that a biomarker that has relatively modest accuracy for differentiating HF in patients with dyspnea has very good prognostic accuracy for identifying those at risk for incident HF in a general population cohort. Therefore, depending on the scientific goals, one could identify candidates using less conservative criteria and/or implement parallel rather than sequential validation.

Our findings also suggest that use of real-time EHR algorithms for identification of patients for biomarker studies is pragmatic and efficient. We accrued a large number of specimens (>1,000) in a short time frame (~4 months of active collection). Thus, the plasma biobanking methods employed herein could be applied to a wide range of clinical phenotypes, including rare ones, to facilitate biomarker discovery.

Prior examples of biomarker discovery through application of proteomics to discarded clinical samples are lacking. Proteomic methods such as mass spectrometry are presently hampered by low throughput and analytic sensitivity (6,7), whereas multiplex platforms such as those utilized in the current study allow simultaneous quantification of hundreds to thousands of proteins (18). Routine clinical laboratory practice and storage conditions of blood samples (i.e., held at 4°C for ~3 days) may raise concern for analyte degradation. However, circulating peptides are broadly stable in blood samples under these conditions (14). Prior work regarding the impact of pre-analytic storage conditions on sample quality using liquid chromatography-tandem mass spectrometry indicates few significant changes in peptides in plasma stored for up to a week at 4°C or room temperature prior to plasma isolation (21). The feasibility of using discarded clinical blood samples and the potential of high-throughput multiplex proteomics are further highlighted by the fact that we identified two newer biomarkers using only 96 subjects in the initial screen. Thus, by combining automated collection of discarded plasma with a multiplexed proteomic assay, we put forward a pragmatic, scalable model for biomarker discovery that is generalizable to a range of clinical phenotypes.

We applied the clinical biomarker discovery strategy to HF as it is a complex condition that can be challenging to diagnose, and is associated with substantial morbidity and mortality. In the discovery phase, we observed the expected higher levels of several established HF biomarkers, such as ST2, galectin-3, and troponin, thereby supporting the validity of the approach. Though these known HF biomarkers did not reach the pre-defined criteria for selection of candidate biomarkers, cystatin-C and renin were among the top candidates, again supporting the validity of the approach. Angiopoietin-2 and thrombospondin-2 were among the top candidates in the discovery phase, demonstrating stronger associations than those observed for ST2 and galectin-3. While associations of angiopoietin-2 and thrombospondin-2 with HF have been previously reported, these are limited to a handful of smaller studies; therefore, we selected them for further validation (22–30).

Angiopoietin-2 and thrombospondin-2 both have biologically plausible roles in HF. Angiopoietin-2 is an endothelial cell-derived factor linked to the regulation of vascular permeability (31,32). Thrombospondin-2 is a fibroblast-derived protein involved in maintaining myocardial matrix integrity in response to increased loading (33–36). In contrast to the prior studies of these proteins in HF, we evaluated the diagnostic performance of these biomarkers in acutely symptomatic patients presenting to the ED, where the diagnosis of acute HF is most commonly made (26–30). For the first time, we show that these 2 proteins in combination with BNP provide additional value for diagnosing acute HF beyond the currently accepted clinical standard. Further, the prognostic association between circulating levels of these proteins and the risk of incident HF has not been previously reported. Finally, the finding that levels of these biomarkers fall after transplantation or LVAD is novel. The consistency of the findings for angiopoietin-2 and thrombospondin-2 across multiple different cohorts for discovery and validation and using different assay techniques (DNA-aptamer based proteomics and conventional immunoassays) lends credence to their robustness as HF biomarkers.

Limitations

False-negative and false-positive results are inherent to the selection of candidate biomarkers from a discovery cohort. The DNA-aptamer based proteomic assay covered a broad, but incomplete range of proteins; some unmeasured proteins may be more strongly associated with HF than those we identified. Given that the main objective of the study was to demonstrate that we could successfully identify clinical biomarkers using discarded specimens, we elected to use conservative criteria for selecting candidates, reducing the chance of false positives. That said, false-negative results are possible due to analyte degradation, test characteristics for each analyte on the SOMAscan research platform, and/or the use of conservative statistical criteria, i.e. the Bonferroni correction. As dictated by the unique study design, blood samples were not utilized until they were no longer needed for clinical use (held at 4°C for ~3 days). Analyte degradation may have led to some false-negative findings, creating a conservative bias. Several biomarkers with established prognostic roles in HF, e.g. ST2, galectin-3, and troponin, trended in the expected direction, though they did not meet the Bonferroni-corrected level of significance. Findings for these proteins, analyte degradation, and Bonferroni correction do not, however, explain positive results, such as those for renin, cystatin-C, angiopoietin-2, and thrombospondin-2. Further, given the goal to discover new HF biomarkers, we did not validate ST2, galectin-3, troponin, or other established HF biomarkers, i.e. renin and cystatin-C.

Though we did not perform ELISAs for candidate biomarkers within the discovery cohort, prior work has demonstrated concordance between DNA aptamer and ELISA or mass spectrometry based approaches for proteins present on the SOMAscan platform (18,19,37). Moreover, the use of conventional ELISA methods for validation in STRATIFY confirms the specificity of the two proteins identified by the aptamer-based platform.

Candidate proteins from the discovery cohort were selected for validation in STRATIFY in order of p-value and then by the availability of established ELISAs for use with human plasma. At the time of the study, we could not find a reliable ELISA for kallikrein-11; therefore, this and subsequent proteins (macrophage colony stimulating factor-1 and leukotriene-A4 hydrolase) have not been evaluated in STRATIFY. Thus, we focused on a subset (four) of the top proteins from the discovery phase for subsequent validation in STRATIFY. Though we cannot exclude significant associations between the other candidate proteins and HF, this does not negate the robust associations of angiopoietin-2 and thrombospondin-2 with HF across all three validation cohorts. Future studies are planned to evaluate other potential candidates.

We also acknowledge that the performance of some biomarkers may differ by HF etiology or HF subtype. While left ventricular ejection fraction data were not available at the time of incident HF in the MDCS, the results from STRATIFY demonstrate similar performance of angiopoietin-2 and thrombospondin-2 for the diagnosis of acute HF regardless of reduced or preserved ejection fraction. Nevertheless, future studies should investigate subgroup-specific associations.

Conclusions

We demonstrate the feasibility of integrating real-time EHR phenotyping, automated retrieval of discarded plasma specimens, and proteomic analysis for biomarker discovery. In support of this concept, we showed that angiopoietin-2 and thrombospondin-2 are robust HF biomarkers with potential application for diagnosis and risk assessment. This pragmatic approach has the potential to accelerate future biomarker discovery across a range of diseases.

Supplementary Material

1

CLINICAL PERSPECTIVES.

Competency in Medical Knowledge: Biomarkers such as plasma proteins can be useful in managing patients with a variety of cardiovascular diseases. Angiopoietin-2 and thrombospondin-2 are associated with heart failure, and future studies should evaluate their utility as clinical biomarkers for this condition.

Translational Outlook: Linking high-throughput proteomic platforms that measure a vast array of proteins to samples from patients identified through electronic medical record systems may be an efficient approach to accelerated discovery of robust biomarkers for many diseases.

Acknowledgements:

The authors would like to acknowledge the support of The Vanderbilt Institute for Clinical and Translational Research (VICTR).

Tweet: Using discarded clinical blood samples for proteomics leads to the identification of new heart failure biomarkers.

Funding: NIH grants K12 HL109019, K23 HL128928–01A1, R01HL133870–01A1, R01HL132320–01, UL1TR000dd5, and R01HL140074; Vanderbilt University Medical Center institutional instrumentation awards 1S10OD017985–01 and R-1306–0d869; European Research Council; Swedish Heart-Lung Foundation; Wallenberg Center for Molecular Medicine at Lund University; Swedish Research Council; Crafoord Foundation; governmental support of the Swedish National Health Service, Skane University Hospital in Lund, and Scania county.

Disclosures: SPC receives research funding from the NIH, PCORI, AHRQ, AHA, and Novartis. SPC is a consultant for Roche and Novartis. QSW receives funding from Abbott. ABS reports research funding from or consulting for NIH, PCORI, Beckman Coulter, Siemens, and Alere. A patent application has been filed for angiopoietin-2 and thrombospondin-2 as HF biomarkers. All other authors report no disclosures.

ABBREVIATIONS

EHR = electronic health record

HF = heart failure

ED = emergency department

MDCS = Malmö Diet and Cancer Study

LVAD = left ventricular assist device

BNP = B-type natriuretic peptide

eGFR = estimated glomerular filtration rate

BMI = body mass index

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

REFERENCES

1. Dieplinger B, Gegenhuber A, Haltmayer M, Mueller T. Evaluation of novel biomarkers for the diagnosis of acute destabilised heart failure in patients with shortness of breath. Heart 2009;95:1508–13.
2. van Kimmenade RR, Januzzi JL Jr., Ellinor PT et al. Utility of amino-terminal pro-brain natriuretic peptide, galectin-3, and apelin for the evaluation of patients with acute heart failure. J Am Coll Cardiol 2006;48:1217–24.
3. Bowton E, Field JR, Wang S et al. Biobanks and electronic medical records: enabling cost-effective research. Sci Transl Med 2014;6:234cm3.
4. Frankovich J, Longhurst CA, Sutherland SM. Evidence-based medicine in the EMR era. N Engl J Med 2011;365:1758–9.
5. Roque FS, Jensen PB, Schmock H et al. Using electronic patient records to discover disease correlations and stratify patient cohorts. PLoS Comput Biol 2011;7:e1002141.
6. Lam MP, Ping P, Murphy E. Proteomics Research in Cardiovascular Medicine and Biomarker Discovery. J Am Coll Cardiol 2016;68:2819–2830.
7. Lindsey ML, Mayr M, Gomes AV et al. Transformative Impact of Proteomics on Cardiovascular Health and Disease: A Scientific Statement From the American Heart Association. Circulation 2015;132:852–72.
8. Smith JG, Gerszten RE. Emerging Affinity-Based Proteomic Technologies for Large-Scale Plasma Profiling in Cardiovascular Disease. Circulation 2017;135:1651–1664.
9. Wells QS, Veatch OJ, Fessel JP et al. Genome-wide association and pathway analysis of left ventricular function after anthracycline exposure in adults. Pharmacogenet Genomics 2017;27:247–254.
10. Van Driest SL, Wells QS, Stallings S et al. Association of Arrhythmia-Related Genetic Variants With Phenotypes Documented in Electronic Medical Records. JAMA 2016;315:47–57.
11. Denny JC, Crawford DC, Ritchie MD et al. Variants near FOXE1 are associated with hypothyroidism and other thyroid conditions: using electronic medical records for genome- and phenome-wide studies. Am J Hum Genet 2011;89:529–42.
12. Roden DM, Pulley JM, Basford MA et al. Development of a large-scale de-identified DNA biobank to enable personalized medicine. Clin Pharmacol Ther 2008;84:362–9.
13. McGregor TL, Van Driest SL, Brothers KB et al. Inclusion of pediatric samples in an opt-out biorepository linking DNA to de-identified medical records: pediatric BioVU. Clin Pharmacol Ther 2013;93:204–11.
14. Bowton EA, Collier SP, Wang X et al. Phenotype-Driven Plasma Biobanking Strategies and Methods. J Pers Med 2015;5:140–52.
15. Collins SP, Lindsell CJ, Jenkins CA et al. Risk stratification in acute heart failure: rationale and design of the STRATIFY and DECIDE studies. Am Heart J 2012;164:825–34.
16. Smith JG, Newton-Cheh C, Almgren P et al. Assessment of conventional cardiovascular risk factors and multiple biomarkers for the prediction of incident heart failure and atrial fibrillation. J Am Coll Cardiol 2010;56:1712–9.
17. Ingelsson E, Arnlov J, Sundstrom J, Lind L. The validity of a diagnosis of heart failure in a hospital discharge register. Eur J Heart Fail 2005;7:787–91.
18. Gold L, Ayers D, Bertino J et al. Aptamer-based multiplexed proteomic technology for biomarker discovery. PLoS One 2010;5:e15004.
19. Ngo D, Sinha S, Shen D et al. Aptamer-Based Proteomic Profiling Reveals Novel Candidate Biomarkers and Pathways in Cardiovascular Disease. Circulation 2016;134:270–85.
20. Benson MD, Yang Q, Ngo D et al. Genetic Architecture of the Cardiovascular Risk Proteome. Circulation 2018;137:1158–1172.
21. Zimmerman LJ, Li M, Yarbrough WG, Slebos RJ, Liebler DC. Global stability of plasma proteomes for mass spectrometry-based analyses. Mol Cell Proteomics 2012;11:M111 014340.
22. Eleuteri E, Di Stefano A, Tarro Genta F et al. Stepwise increase of angiopoietin-2 serum levels is related to haemodynamic and functional impairment in stable chronic heart failure. Eur J Cardiovasc Prev Rehabil 2011;18:607–14.
23. Berezin AE, Kremzer AA, Samura TA. Circulating thrombospondine-2 in patients with moderate-to-severe chronic heart failure due to coronary artery disease. J Biomed Res 2015;30.
24. Eleuteri E, Di Stefano A, Giordano A et al. Prognostic value of angiopoietin-2 in patients with chronic heart failure. Int J Cardiol 2016;212:364–8.
25. Kimura Y, Izumiya Y, Hanatani S et al. High serum levels of thrombospondin-2 correlate with poor prognosis of patients with heart failure with preserved ejection fraction. Heart Vessels 2016;31:52–9.
26. Chen S, Guo L, Chen B, Sun L, Cui M. Association of serum angiopoietin-1, angiopoietin-2 and angiopoietin-2 to angiopoietin-1 ratio with heart failure in patients with acute myocardial infarction. Experimental therapeutic med. 2013;5:937–941.
27. Chong AY, Caine GJ, Freestone B, Blann AD, Lip GY. Plasma angiopoietin-1, angiopoietin-2, and angiopoietin receptor tie-2 levels in congestive heart failure. J Am Coll Cardiol 2004;43:423–8.
28. Hanatani S, Izumiya Y, Takashio S et al. Circulating thrombospondin-2 reflects disease severity and predicts outcome of heart failure with reduced ejection fraction. Circulation J. 2014;78:903–10.
29. Link A, Poss J, Rbah R et al. Circulating angiopoietins and cardiovascular mortality in cardiogenic shock. Eur Heart J 2013;34:1651–62.
30. Poss J, Ukena C, Kindermann I et al. Angiopoietin-2 and outcome in patients with acute decompensated heart failure. Clinical research cardiol. 2015;104:380–7.
31. Benest AV, Kruse K, Savant S et al. Angiopoietin-2 is critical for cytokine-induced vascular leakage. PLoS One 2013;8:e70459.
32. Saharinen P, Eklund L, Miettinen J et al. Angiopoietins assemble distinct Tie2 signalling complexes in endothelial cell-cell and cell-matrix contacts. Nat Cell Biol 2008;10:527–37.
33. Papageorgiou AP, Swinnen M, Vanhoutte D et al. Thrombospondin-2 prevents cardiac injury and dysfunction in viral myocarditis through the activation of regulatory T-cells. Cardiovascular research 2012;94:115–24.
34. van Almen GC, Swinnen M, Carai P et al. Absence of thrombospondin-2 increases cardiomyocyte damage and matrix disruption in doxorubicin-induced cardiomyopathy. J molecular cellular cardiol. 2011;51:318–28.
35. Swinnen M, Vanhoutte D, Van Almen GC et al. Absence of thrombospondin-2 causes age-related dilated cardiomyopathy. Circulation 2009;120:1585–97.
36. Schroen B, Heymans S, Sharma U et al. Thrombospondin-2 is essential for myocardial matrix integrity: increased expression identifies failure-prone cardiac hypertrophy. Circulation research 2004;95:515–22.
37. Han Z, Xiao Z, Kalantar-Zadeh K et al. Validation of a Novel Modified Aptamer-Based Array Proteomic Platform in Patients with End-Stage Renal Disease. Diagnostics. 2018;8.
