Abstract
Background
The HEART score is a risk stratification aid that may safely reduce chest pain admissions for emergency department (ED) patients. However, differences in interpretation of subjective components potentially alter the performance of the score. We compared agreement between HEART scores during clinical practice with research generated scores and estimated their accuracy in predicting 30-day major adverse cardiac events (MACE).
Methods
We prospectively enrolled adult ED patients with symptoms concerning for acute coronary syndrome at a single tertiary center. ED clinicians submitted their clinical HEART score during the patient encounter. Researchers then independently interviewed patients to generate a research HEART score. Patients were followed by phone and chart review for MACE. Weighted (WK) and unweighted Cohen's kappa, prevalence adjusted, bias adjusted kappa (PABAK) and test probabilities were calculated.
Results
From November 2016 to June 2019, 336 patients were enrolled; 261 (77.7%) were admitted and 30 (8.9%) had a MACE. Dichotomized HEART score agreement was 78% (Kappa 0.48, 95CI 0.37–0.58, PABAK 0.57, 95CI 0.48–0.65), with the lowest agreement in the history (72%, WK 0.14, 95CI 0.06–0.22) and electrocardiogram (85%, WK 0.4, 95CI 0.3–0.49) components. Compared to researchers, clinicians had 100% sensitivity (95CI 88.4%−100%, versus 86.7%, 95CI 69.3%−96.2%) and 27.8% specificity (95CI 22.8%−33.2%, versus 34.6%, 95CI 29.3%−40.3%) for MACE. Four participants with a low research HEART score had MACE.
Conclusion
ED clinicians had only moderate agreement with research HEART scores. Combined with uncertainties regarding accuracy in predicting MACE, we urge caution in the widespread use of the HEART score as the sole determinant of ED disposition.
INTRODUCTION
At 6.5 million visits per year, chest pain is the second most common presenting symptom to US emergency departments (EDs), with hospital costs exceeding 10 billion dollars annually.(1) Despite the high cost of hospitalization, the incidence of major adverse cardiac events (MACE), including percutaneous coronary intervention (PCI), myocardial infarction (MI), coronary artery bypass graft (CABG) surgery and death, in patients with non-ischemic electrocardiograms (ECGs) and normal serum cardiac troponins is low.(2) Yet, provider overestimation of risk, combined with the personal and legal fears of misdiagnosis, continues to reinforce potentially unnecessary low-risk chest pain admissions.(3)
The HEART score is a risk stratification aid designed to estimate the probability of MACE within 30 days for patients who present to the ED with symptoms concerning for acute coronary syndrome (ACS). The HEART score uses a mix of subjective and objective variables to stratify patients into a low risk category, with an estimated 30-day MACE rate between 0.9% and 1.7%, or to moderate to high risk, with an estimated 30-day MACE rate between 12% and 65%.(4,5) The components used in calculating a HEART score include the patient history, past medical risk factors, age, electrocardiogram interpretation and serum troponin results. The HEART score has been validated and endorsed by clinical practice guidelines, and its use has expanded throughout the US and beyond.(6–10)
However, the utility of the HEART score when used by ED clinicians outside a research setting remains unclear. Specifically, because the HEART score is comprised of components that require subjective interpretation, it is unknown if clinicians are calculating the HEART score in a manner consistent with previous validation studies, or if persistent overestimation of risk, uncertainty, or lack of knowledge may alter the calculation and subsequent accuracy of the HEART score in predicting MACE. The primary objective of our study was to evaluate agreement between HEART scores calculated during clinical practice (clinician scores) compared to scores generated using standardized research methods similar to previous validation studies (research scores). In addition, we estimated the accuracy of clinician and research scores in predicting 30-day MACE.
METHODS
We conducted a prospective observational study at a single tertiary, academic, STEMI receiving center with an annual ED volume of over 115,000 visits and a 3-year emergency medicine residency program. We enrolled patients presenting to the ED with symptoms concerning for ACS for whom clinicians reported using the HEART score as part of their usual care, and assessed agreement between clinician scores and research generated HEART scores. The study was approved by the Baystate Medical Center Institutional Review Board. The reporting of this observational study follows the STROBE guidelines.
Selection of Participants
Adult patients (18+) who presented to the ED with a chief complaint of chest pain, pressure or discomfort, or other symptoms for which the treating emergency clinician both considered ACS among their top three diagnoses and was using the HEART Score to risk stratify the patient were eligible for inclusion. Exclusion criteria included patients who the treating ED clinician determined were clinically unstable, altered, or unable to complete an interview, patients with a STEMI or other dynamic ECG changes concerning for active ischemia, patients who were pregnant, and patients with an alternate diagnosis confirmed by the treating clinician through objective testing, including but not limited to aortic dissection, pneumothorax, pneumonia, esophageal rupture, pulmonary embolism, congestive heart failure or arrhythmia.
Emergency Medicine clinicians were included in the study if they were attending physicians, senior residents (years 2 and 3) or advanced practitioners who self-identified that they used the HEART score as part of their regular clinical practice. Given their relative inexperience with the HEART score, emergency medicine interns, medical students, and off service rotating providers were excluded. Because no special training in the HEART score was offered during the study, emergency clinicians who stated they were not familiar with the HEART score or who did not use it in clinical practice were excluded. Finally, emergency clinicians who inherited the patient as a sign out, who were not the provider of record for the patient, or who were study authors were excluded.
Study Design
Trained research associates, available Monday to Friday, 7AM to 9PM with occasional weekend coverage, screened eligible patients by chief complaint and the presence of a laboratory 4th generation troponin order on the ED tracking board. When a patient was identified, the research associate approached the treating ED clinician to determine if both the provider and patient were eligible for the study according to the protocol (see Supplement 1 for the patient screening form). If both were eligible, the ED clinician was asked to record the HEART score used during the patient encounter on a brief data collection form.
The research associate then approached the patient for enrollment. Following written consent, the patient completed an in-person structured interview with the research associate. The interview collected information necessary to calculate the history and risk factor components of the research-generated HEART score, which included a detailed description of the patient’s current symptoms, past medical history, family history and history of smoking. The results of the patient interview were not discussed with the treating ED clinician, and did not lead to any changes in the care of the patient. (Figure 1)
Figure 1. Emergency Department (ED) HEART score study design
All research associates involved in the study received a one-hour training on the HEART score and the data collection forms. Further, research associates were required to observe 5 interviews performed by study authors, then complete 5 proctored interviews with study authors before independently conducting patient interviews. To ensure consistency, research associates were periodically observed at least once a semester during interviews by study authors KEP and SCM throughout the study period.
Patient Follow up
Research associates conducted patient follow up 6–8 weeks after the index ED visit using both clinical chart review and structured phone follow up. Patients were screened for troponin laboratory values, hospitalizations, future healthcare visits and MACE, including MI, PCI, CABG and death. MI was defined by final hospital discharge diagnosis and a rise in 4th generation cardiac troponin, with at least one value above the 99th percentile as described in previous literature.(11) PCI was defined as coronary catheterization with documented balloon angioplasty and / or stent placement; diagnostic coronary catheterizations without intervention were not included. Any MACE diagnoses discovered on phone interview were confirmed in the clinical chart.
Research associates were required to attend mandatory chart review training by study authors, and abstracted at least 5 charts in parallel with authors prior to independent data abstraction. Phone interviews and chart reviews were abstracted into standardized data entry forms. (See Supplement 4 for the 30 day chart review template) To ensure consistency, completed chart reviews of research associates were periodically audited at least once a semester during interviews by study authors KEP and SCM throughout the study period. At the completion of follow up, study author WES abstracted and confirmed all MACE, as well as an additional 5% of randomly selected charts; no changes or missed occurrences of MACE or hospitalization were discovered.
Instruments
The ED clinician data collection form was created by the research team and included basic demographic information (gender, role), the provider HEART score (components and final score), the method the ED clinician used to calculate the HEART score (Online Resource, Phone Application, Hospital Specific Protocol, No external resource, Other), and a self-assessment of ED workload (1-not busy, 10-busiest you have ever been). The research team provided the ED clinician with no instructions or prompts on how to determine their HEART score. (See Supplement 2 for the ED clinician data form)
The research-generated HEART score data collection form used to guide the research associate interview was modeled after previously published HEART score validation data collection forms, and included detailed yes/no questions to elicit the pertinent presenting symptoms, past medical risk factors, previous surgeries, and smoking history required for HEART score calculation.(12) (See Supplement 3 for the research participant interview form).
Research HEART score Derivation
After study enrollment and follow up were complete, research team members (WS, SG, TM, AK) not involved in patient interviews or primary data collection convened to calculate a research generated HEART score for each study participant. Age and index Troponin were confirmed in the clinical chart. Risk factors were scored per primary HEART score literature using information gathered in the structured patient interview.
The History score was calculated by assigning scores to low and high-risk chest pain features previously described.(8,12) High risk features were each scored as 1 point and included: middle or left sided pain, heavy chest pain, diaphoresis, radiation, nausea or vomiting, exertional pain and relief of symptoms using sublingual nitrates. Low risk features were each scored as −1 point and included: well localized pain, sharp pain, non-exertional pain, no diaphoresis and no nausea or vomiting. Scores were totaled and assigned a HEART score value as follows: −5 to −2 points (mostly low risk features) scored as 0, −1 to 3 points (mix of high and low risk features) scored as 1, and 4 to 7 points (mostly high-risk features) scored as 2. (See Supplement 5 for further details on derivation of the research generated score)
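The History scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's analysis code: the feature labels and function names are hypothetical, and the point ranges are taken directly from the preceding paragraph.

```python
# Hypothetical feature labels; the study forms may name these differently.
HIGH_RISK = {
    "middle_or_left_sided_pain", "heavy_chest_pain", "diaphoresis",
    "radiation", "nausea_or_vomiting", "exertional_pain",
    "relief_with_sublingual_nitrates",
}
LOW_RISK = {
    "well_localized_pain", "sharp_pain", "non_exertional_pain",
    "no_diaphoresis", "no_nausea_or_vomiting",
}


def history_points(features):
    """Add 1 per high-risk feature and subtract 1 per low-risk feature."""
    present = set(features)
    unknown = present - HIGH_RISK - LOW_RISK
    if unknown:
        raise ValueError(f"unrecognized features: {sorted(unknown)}")
    return len(present & HIGH_RISK) - len(present & LOW_RISK)


def history_component(points):
    """Map the point total (-5 to 7) to the History component value (0, 1 or 2)."""
    if not -5 <= points <= 7:
        raise ValueError("point total must lie between -5 and 7")
    if points <= -2:
        return 0  # mostly low-risk features
    if points <= 3:
        return 1  # mix of high- and low-risk features
    return 2      # mostly high-risk features
```

For example, a patient reporting heavy chest pain and diaphoresis but also sharp pain totals 1 point, which maps to a History component of 1.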
All ECGs were de-identified and independently scored over two rounds by research team members and content experts WS and TM, who were blinded to participant outcomes. Final inter-rater agreement was 87% (Weighted Kappa (WK) 0.76). Remaining discrepancies were resolved through consensus discussion. (See Supplement 6 for further details on the ECG scoring)
Outcomes
The primary outcome was the agreement between the clinicians' dichotomized HEART score and the research generated dichotomized HEART score, defined as low-risk (HEART score 0–3) and moderate to high risk (HEART score 4–10). We also compared agreement of HEART scores as a continuous score (0–10), agreement between HEART score components (History, Age, ECG, Risk Factors, Troponin) and agreement of scores stratified by provider position.
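As a minimal sketch of the dichotomization used for the primary outcome, the snippet below sums the five components and applies the low-risk threshold of 3. It assumes the standard HEART convention that each component is scored 0, 1 or 2; the dictionary keys and function names are hypothetical.

```python
COMPONENTS = ("history", "ecg", "age", "risk_factors", "troponin")


def heart_total(scores):
    """Sum the five component scores, each assumed to be 0, 1 or 2."""
    missing = [name for name in COMPONENTS if name not in scores]
    if missing:
        raise ValueError(f"missing components: {missing}")
    for name in COMPONENTS:
        if scores[name] not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1 or 2")
    return sum(scores[name] for name in COMPONENTS)


def risk_category(total):
    """Dichotomize at the threshold described above: 0-3 is low risk."""
    return "low" if total <= 3 else "moderate_to_high"
```

Under this rule a one-point change at a total of 3 moves a patient across the threshold, which is why agreement on the dichotomized score can be lower than agreement on individual components.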
As a secondary study outcome, we evaluated the diagnostic accuracy of clinical and research generated HEART scores on 30 day MACE, overall and stratified by provider role.
Analysis
Descriptive statistics (frequencies and percentages) were used to summarize differences in HEART scores overall and stratified by predefined groups. The frequency of ED clinician participation was summarized to ensure that a minority of ED providers were not overrepresented in the study population.
Cohen's kappa was used to evaluate the primary outcome, agreement between clinician and research dichotomous HEART scores, with a score of 0.01 to 0.2 representing poor agreement, 0.21 to 0.4 fair, 0.41 to 0.6 moderate, 0.61 to 0.8 substantial and 0.81 to 1 near perfect agreement.(13) Intraclass Correlation Coefficient (ICC) was used to evaluate agreement between clinician and research continuous HEART scores, and weighted kappa (WK) was used to evaluate HEART score components with more than two categories.
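For readers unfamiliar with the statistic, the following sketch computes unweighted Cohen's kappa from a square contingency table and applies the interpretation bands listed above. It is an illustration in plain Python, not the Stata code used in the study; the function names are hypothetical, and the label returned for kappa at or below zero is an assumption, since the text does not name that range.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square table (rows: rater A, columns: rater B)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Map kappa to the bands used in this paper."""
    if kappa <= 0:
        return "no agreement beyond chance"  # assumed label, not from the text
    bands = [(0.2, "poor"), (0.4, "fair"), (0.6, "moderate"),
             (0.8, "substantial"), (1.0, "near perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Counts from Table 2 (rows: ED clinician low/high; columns: research low/high).
table_2 = [[61, 24], [49, 202]]
kappa = cohen_kappa(table_2)
print(round(kappa, 2), interpret_kappa(kappa))
```

Applied to the Table 2 counts, the calculation gives a kappa that rounds to the reported 0.48 and falls in the moderate band.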
Because variations in prevalence influence kappa, and because the troponin component of the HEART score had a very low prevalence of abnormal values, we conducted a sensitivity analysis and estimated a prevalence adjusted, bias adjusted kappa (PABAK) to help understand the range of kappa based on differences in prevalence of scores. (14,15)
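A commonly used form of PABAK depends only on the observed proportion of agreement and the number of rating categories. The sketch below assumes that form; the function name is hypothetical, and the exact computation used for the ordinal components in this study is not described here and may differ.

```python
def pabak(agreements, total, categories=2):
    """Prevalence adjusted, bias adjusted kappa: (k * Po - 1) / (k - 1).

    For two categories this reduces to 2 * Po - 1.
    """
    if categories < 2:
        raise ValueError("need at least two categories")
    observed = agreements / total
    return (categories * observed - 1) / (categories - 1)


# Dichotomous HEART score: 263 of 336 pairs agreed (see Primary Outcome).
print(round(pabak(263, 336), 2))
```

Because this form ignores the marginal totals, it is unaffected by how rare a category is, which is the motivation given above for reporting it alongside kappa.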
The assessment of diagnostic accuracy included sensitivity, specificity, predictive values and likelihood ratios, which were calculated for clinician and research generated HEART scores based on 30-day MACE outcomes.
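The point estimates for these measures follow directly from a two by two table of HEART score category against 30-day MACE. The sketch below is illustrative only: the function name is hypothetical, confidence intervals are omitted, and the cell counts are back-calculated from totals reported in the Results (30 MACE among 336 participants, 85 clinician low-risk scores with no MACE, 110 research low-risk scores with 4 MACE).

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Point estimates from a 2x2 table; 'positive' means a moderate to high risk score."""
    def ratio(num, den):
        return num / den if den else float("nan")

    sens = ratio(tp, tp + fn)
    spec = ratio(tn, tn + fp)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "sensitivity": sens,
        "specificity": spec,
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "lr_positive": ratio(sens, 1 - spec),
        "lr_negative": ratio(1 - sens, spec),
    }


# Back-calculated counts (an assumption, see the paragraph above).
clinician = diagnostic_accuracy(tp=30, fp=221, fn=0, tn=85)
research = diagnostic_accuracy(tp=26, fp=200, fn=4, tn=106)

for name, result in (("clinician", clinician), ("research", research)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

With these counts the sensitivity and specificity match the values reported for clinician and research scores; small differences in the likelihood ratios can arise from rounding.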
Study data were collected and managed using REDCap electronic data capture tools. All analyses were performed using Stata MP v15.1.
Sample Size
Based on prior literature demonstrating an ICC statistic of 0.6 between two attending physicians using the TIMI cardiac risk stratification score (12% absolute difference in scores with 2 or more points), we assumed a similar baseline ICC of two providers using the HEART score of 0.6.(16) We estimated a sample of 260 patients would be needed to be able to calculate an ICC with 95% confidence intervals within 0.1 (half width CI) using alpha of 0.05 and power of 0.8. Given the frequency of chest pain ED visits, uncertainty surrounding the enrollment of both providers and patients, as well as a desire to include MACE rates, we included an additional 20% enrollment for a final sample size of 312 participants.
RESULTS
Between November 2016 and June 2019, 3,335 patients were screened for inclusion, of whom 815 were approached for enrollment and 336 were included in the study. (Figure 2)
Figure 2. Emergency Department (ED) HEART score enrollment flow diagram.
Participants
Fifty-three unique ED clinicians used the HEART score to evaluate the 336 patients included in the study, with each ED clinician evaluating a median of 10 patients during the study period (IQR 7–12). Participants enrolled in the study were 53% male (178/336) with a median age of 59 (IQR 52 to 68). In comparison, of the 479 patients who were approached but not enrolled in the study, 44% (209/479) were male with a median age of 60 (IQR 51–72). Participants' past medical history included diabetes (27.7%, 93/336), hypertension (61.3%, 206/336), and hypercholesterolemia (57.1%, 192/336). A diagnosis of atherosclerotic disease, which included a reported history of MI, PCI, CABG or stroke, was present in 36.6% (123/336) of patients.
Over half of participants (51.8%, 174/336) were seen by a female ED provider, with 27% (91/336) seen primarily by an attending ED physician, 31.3% (105/336) by an advanced practitioner, 22% (74/336) by a 3rd year resident and 19.6% (66/336) by a 2nd year resident. Seventy eight percent (261/336) were admitted to the hospital on the index ED visit. (Table 1)
Table 1.
Demographics of the 336 ED patients enrolled in the study.
| ED Patient Demographics (N=336) | |
|---|---|
| Age (Median (IQR)) | 59 (52, 68) |
| Gender | |
| Male | 178 (53.0%) |
| Female | 158 (47.0%) |
| Race / Ethnicity | |
| Black | 33 (9.8%) |
| Hispanic | 45 (13.4%) |
| White | 243 (72.3%) |
| Past Medical History | |
| Hypertension | 206 (61.3%) |
| Diabetes | 93 (27.7%) |
| Hypercholesterolemia | 192 (57.1%) |
| Atherosclerotic Disease | 123 (36.6%) |
| Current Smoker | 71 (21.1%) |
| Emergency Severity Index (ESI) Level | |
| 2 | 279 (83.0%) |
| 3 | 51 (15.2%) |
| 4 | 1 (0.3%) |
| Treating ED Clinician Gender | |
| Male | 162 (48.2%) |
| Female | 174 (51.8%) |
| ED Clinician Role | |
| 2nd Year Resident | 66 (19.6%) |
| 3rd Year Resident | 74 (22.0%) |
| Advanced Practitioner | 105 (31.3%) |
| Fellow or Attending Physician | 91 (27.1%) |
| Used to Calculate HEART score | |
| Online Calculator (e.g. MDCalc) | 243 (72.3%) |
| Phone Application | 20 (6.0%) |
| Hospital Specific Protocol | 1 (0.3%) |
| Memory | 65 (19.3%) |
| Other | 7 (2.1%) |
| Perceived ED Clinician Workload (1–10) (mean/SD) | 5.3 (2.1) |
| ED Patient Outcomes | |
| Present within 30 days after index ED visit: | |
| Hospitalization during index ED visit | 261 (77.7%) |
| Hospitalization after ED discharge | 28 (8.3%) |
| Repeat ED visit | 22 (6.5%) |
| 30-day MACE event* | 30 (8.9%) |
| Myocardial Infarction | 14 (4.2%) |
| Percutaneous Coronary Intervention | 14 (4.2%) |
| Coronary Artery Bypass Grafting | 10 (3%) |
| Death | 1 (0.3%) |
*Each patient may have had multiple MACE qualifying events
Primary Outcome
Dichotomous HEART score agreement between clinicians and researchers was 78% (263/336) with a kappa of 0.48 (95CI 0.37–0.58, PABAK 0.57, 95CI 0.48–0.65). ED clinicians scored 49 patients as high-risk who had low-risk research HEART scores, and 24 patients as low-risk who had high-risk research HEART scores. (Table 2) Agreement in the continuous HEART score (0–10) between ED clinicians and researchers was moderate (ICC 0.65, 95CI 0.59–0.71).
Table 2.
Two by two contingency table of the frequency of emergency department (ED) clinician and research generated high and low HEART score risk stratification with total percent for each cell.
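| | Research Low-Risk HEART score (0–3) | Research High-Risk HEART score (4–10) | Total |
|---|---|---|---|
| ED Clinician Low-Risk HEART score (0–3) | 61 (18%) | 24 (7%) | 85 (25%) |
| ED Clinician High-Risk HEART score (4–10) | 49 (15%) | 202 (60%) | 251 (75%) |
| Total | 110 (33%) | 226 (67%) | 336 (100%) |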
Evaluating components of the HEART score, agreement was highest for Age (agreement 96.7%, WK 0.89, 95CI 0.85–0.94, PABAK 0.93, 95CI 0.90–0.96) and Troponin (agreement 98%, WK 0.46, 95CI 0.26–0.67, PABAK 0.96, 95CI 0.93–0.98). Agreement was lowest for History (agreement 72%, WK 0.14, 95CI 0.06–0.22, PABAK 0.37, 95CI 0.30–0.43). Compared to research HEART scores, ED clinicians assigned higher mean History scores for participants (1.2, SD 0.7 versus 1.0, SD 0.6). (Table 3)
Table 3.
Agreement between emergency department (ED) clinician and research generated HEART scores. Interrater reliability (IRR) includes intraclass correlation coefficient (ICC) for continuous variables, kappa (K) for dichotomous variables, weighted kappa (WK) and prevalence adjusted, bias adjusted kappa (PABAK) for ordinal variables.
| Variable | ED Clinician Score, Mean (SD) | Research Score, Mean (SD) | Agreement | IRR Statistic | Value | 95CI |
|---|---|---|---|---|---|---|
| HEART score (continuous) | 4.3 (1.4) | 4.1 (1.4) | n/a | ICC | 0.65 | 0.59–0.71 |
| HEART score (dichotomous) | n/a | n/a | 78% | K | 0.48 | 0.37–0.58 |
| | | | | PABAK | 0.57 | 0.48–0.65 |
| Components | | | | | | |
| History | 1.2 (0.7) | 1 (0.6) | 72% | WK | 0.14 | 0.06–0.22 |
| | | | | PABAK | 0.37 | 0.30–0.43 |
| ECG | 0.3 (0.5) | 0.4 (0.6) | 84.8% | WK | 0.40 | 0.30–0.49 |
| | | | | PABAK | 0.66 | 0.60–0.72 |
| Age | 1.2 (0.6) | 1.1 (0.6) | 96.7% | WK | 0.89 | 0.85–0.94 |
| | | | | PABAK | 0.93 | 0.90–0.96 |
| Risk Factors | 1.5 (0.6) | 1.5 (0.6) | 85.1% | WK | 0.51 | 0.42–0.59 |
| | | | | PABAK | 0.67 | 0.61–0.72 |
| Troponin | 0 (0.2) | 0 (0.2) | 98.1% | WK | 0.46 | 0.26–0.67 |
| | | | | PABAK | 0.96 | 0.93–0.98 |
Of the 73 participants with discordant HEART scores, 59% (43/73) differed by 1 point, 37% (27/73) by 2 points, and 4% (3/73) by 3 points. The most common difference in discordant scores was between a threshold score of 3 and 4 (58.9%, 43/73), followed by a score of 3 and 5 (20.5%, 15/73). The most common component disagreement in discordant scores was history at 44.7% (51/114), followed by risk factors at 32.5% (37/114) and ECG interpretation at 19.3% (22/114). (see Supplements 7 and 8 for details on discordant HEART scores)
HEART score Accuracy in Predicting 30-day MACE Events
In addition to the structured chart review, 78.0% (262/336) of participants were able to be reached for phone follow up. During the study period, the 30-day participant MACE rate was 8.9% (30/336), which included 1 STEMI, 13 non-STEMI, 14 PCI, 10 CABG, and 1 death. All patients with 30-day MACE were admitted during the index ED visit. Four MACE events occurred after discharge from the index hospital admission.
Compared to research generated HEART scores, ED clinicians’ HEART scores had a 100% sensitivity (95CI 88.4%−100%, versus 86.7%, 95CI 69.3%−96.2%), and a 27.8% specificity (95CI 22.8%−33.2%, versus 34.6%, 95CI 29.3% – 40.3%) in predicting 30-day MACE. This trend was consistent across all ED clinician roles (Table 4).
Table 4.
Diagnostic Accuracy of the HEART score for predicting 30-day major adverse cardiac events (MACE). Sensitivity (Sen), Specificity (Spec), Positive Predictive Value (PPV), Negative Predictive Value (NPV), Positive Likelihood Ratio (+LR) and Negative Likelihood Ratio (−LR) with 95% Confidence interval (95CI)
| HEART score | Accuracy | Sen (95CI) | Spec (95CI) | PPV (95CI) | NPV (95CI) | +LR (95CI) | −LR (95CI) |
|---|---|---|---|---|---|---|---|
| Research-Generated HEART score | 39.3% | 86.7% (69.3%–96.2%) | 34.6% (29.3%–40.3%) | 11.5% (7.7%–16.4%) | 96.4% (91%–99%) | 1.33 (1.13–1.56) | 0.39 (0.15–0.97) |
| Clinician HEART score | 34.2% | 100.0% (88.4%–100%) | 27.8% (22.8%–33.2%) | 12.0% (8.2%–16.6%) | 100% (95.8%–100%) | 1.38 (1.29–1.48) | 0 |
| ED Provider Position | | | | | | | |
| Senior Resident (N=140) | 36.4% | 100% (75.3%–100%) | 29.9% (22.1%–38.7%) | 12.7% (7%–20.8%) | 100% (90.7%–100%) | 1.43 (1.28–1.6) | 0 |
| Advanced Practitioner (N=105) | 32.4% | 100% (63.1%–100%) | 26.8% (18.3%–36.8%) | 10.1% (4.5%–19%) | 100% (86.8%–100%) | 1.37 (1.21–1.54) | 0 |
| Attending Physician (N=91) | 33% | 100% (66.4%–100%) | 25.6% (16.6%–36.4%) | 12.9% (6.1%–23%) | 100% (83.9%–100%) | 1.34 (1.18–1.53) | 0 |
Based on index admission to the hospital, ED providers adhered to their own clinical HEART score recommendations in 87.5% of cases (294/336, 95CI = 83.5% to 90.8%). Of the 85 patients with a low-risk clinical HEART score, 33% (26, 95CI = 23.1 to 44.0) were admitted to the hospital. None of the patients classified by clinicians as low-risk went on to have MACE. Of 251 patients with a high-risk clinical HEART score, 6.4% (16, 95CI = 3.7 to 10.1) were discharged home, with 8 leaving against medical advice as documented by the clinical chart.
In contrast, 4 patients classified as low risk by research-generated scores subsequently had a MACE, which included one PCI, one CABG, two non-STEMI events and one death. (see Supplement 9 for case details)
LIMITATIONS
Our study has multiple limitations. Sampling bias may impact the generalizability of results. Specifically, the screening of patients, determination of eligibility by the treating ED clinician, and the high decline to participate rate limit generalizability. With respect to screening, due to available resources, trained undergraduate research associates screened the electronic health record for eligible participants. As such, we were unable to screen during overnights and some weekends. Further, because research associates were not medical providers, screening was based on typical chief complaint symptoms and may have missed atypical presentations of ACS.
After screening, the treating ED clinician determined if the patient met study criteria. As our goal was to capture HEART scores used in clinical practice, it was important that the ED clinician determined if ACS was on their differential and if using the HEART Score was appropriate. It is possible that this method of selection resulted in a slightly healthier patient population, reflected in the low 30-day MACE rate.
Finally, many eligible patients declined to participate in the study. Patients were often approached toward the end of their ED workup, given the need for a troponin to calculate the HEART score, and we were told many did not want to spend extra time to participate in the study. Additionally, although similar in age, a higher ratio of females declined to participate compared to our study population.
To address sampling bias, we attempted to capture and present inclusion and exclusion criteria for all screened patients. Of patients who consented to participate in the study, all patients but 4 completed the study. Further, we found no additional MACE on phone follow up that was not already discovered on chart review. Therefore, although the characteristics of our study population may be slightly different from the general population, our adherence to rigorous study methods and transparency in reporting strengthens the validity of our study.
Second, if ED clinicians became aware of the study aims, it could have changed their approach to calculating the HEART Score. In order to avoid bias secondary to the Hawthorne effect, we maintained multiple safeguards. First, our provider data entry sheet did not contain any prompts or guidelines on how to calculate the HEART Score. Second, ED clinicians were not present during the patient interview to generate data for the research HEART Score. Third, research HEART Scores were not calculated until the end of the study, in part so that providers could not compare the research score with their own clinical score. Finally, the study took place over 3 years involving 53 unique ED clinicians, limiting the likelihood that the ED clinician adapted practice due to participating multiple times in the study.
Finally, our study was not powered to detect a significant difference in MACE. Because MACE was a secondary outcome, we hope that our MACE results help to inform the creation of future studies to determine the performance of the HEART score in clinical practice.
DISCUSSION
Understanding the real-world application of risk stratification aids outside of a research setting is important, especially when components of the tool require clinician interpretation. Poor agreement, differences in accuracy or mis-application may change the performance of the tool and the subsequent impact on patient care.
Previous studies examining agreement among providers using the HEART score have been mixed. An evaluation of 88 patients with suspected ischemic chest pain found excellent agreement among ED physicians and nurses, with an ICC of 0.91 (95% CI 0.87–0.93). However, clinicians had access to educational initiatives including posters, YouTube videos and pocket-sized HEART score reference cards not normally available to the larger emergency medicine community.(17) Additionally, a prospective evaluation of 311 ED patients with chest pain found good agreement on dichotomized HEART score rating between attending and resident emergency physicians (Kappa 0.68, 95%CI 0.60–0.77), though it was unclear how much communication about the patient presentation occurred prior to HEART score calculation.(18) On the other hand, studies comparing agreement between emergency physicians and prehospital nursing and cardiology attendings found substantially lower rates of agreement (Kappa 0.514 and 0.13, respectively).(19,20) In each study, inter-rater reliability was worse for the subjective components of the HEART score.
In order to evaluate HEART scores in practice, we prospectively enrolled a diverse group of ED clinicians who were using the HEART score in real time during an actual patient encounter. We provided no study specific education or aid to the ED provider, to avoid influencing their HEART score calculation and subsequent patient care. As a comparator, we generated a standardized research score independent of the clinicians’ HEART score. Although simple agreement was relatively high at 78%, inter-rater agreement was only moderate for the dichotomized HEART scores, even when adjusting for differences in prevalence. Our results are similar to Mahler et al., who demonstrated moderate agreement (kappa 0.63) between treating ED providers and physician study investigators in 282 patients randomized to the HEART Pathway.(12)
Further, there were multiple concerning trends in the 22% of discordant clinician and research HEART scores. While most discordant scores differed by only one point, it was common to differ at the critical HEART score test threshold between 3 and 4 points. Additionally, differences in scores were most frequently due to the history component, which had a low kappa value. These results raise concerns that whatever methods our ED clinicians were using to score the history component of the HEART score (and, to a lesser extent, risk factors and ECG interpretation) do not appear consistent with the methods described in prior validation studies. The poor agreement in subjective components may subsequently change the overall performance of the HEART score in predicting MACE. It remains unclear if the differences in agreement between the subjective components of the HEART score could be improved with education, clinical decision support or other aids.
With respect to accuracy, though the differences were not significant, ED clinicians had higher raw sensitivities and lower specificities for MACE compared to research scores, with no observed MACE events in patients whom ED clinicians assigned as low risk. ED clinicians also had higher overall HEART and history component scores, suggesting our clinicians may have been more risk averse, potentially reducing the stated benefit of safely discharging low risk chest pain patients. On the other hand, the research generated HEART Score sensitivity was low, with a confidence interval of 69%−96%. Although wide, our confidence intervals are consistent with prior meta-analyses, with pooled HEART score sensitivities of 96% (95CI 93% to 98%).(21,22) Our results raise concerns that the HEART score, even when collected in a standardized fashion, may miss MACE at a rate too high for broad clinical implementation as standard of care. Large scale, clinical outcomes studies are needed to assess the accuracy of the HEART score in clinical practice compared to clinical gestalt in predicting MACE.
Finally, we were surprised to find a high rate of nonadherence by clinicians in the application of their own HEART scores. Specifically, 33% of the clinician assigned low-risk HEART score patients were admitted on the index visit, with none having a subsequent MACE. While adherence is likely determined by hospital culture and policies, it does raise questions of variability in implementation and adherence to the HEART score. In the HEART Pathway, Mahler et al. discovered ED provider nonadherence in 20% (28/141) of patients, including over-testing in 13% (19/141), and 10 additional admissions.(23) Few studies have addressed the prevalence of non-adherence to the HEART score and the subsequent influence on outcomes. (24)
In conclusion, ED clinicians demonstrated only moderate agreement with research generated dichotomized HEART scores, with most discrepancies occurring at the test threshold and involving the history component, which had low agreement. Even when using standardized methods to generate a research HEART score, sensitivity was low. With uncertainties in agreement, accuracy and adherence, we urge caution in the widespread use of the HEART score in isolation as a standard of care to determine the disposition of ED patients with chest pain.
Grants:
This work is supported by a grant from the Agency for Healthcare Research and Quality (AHRQ) Health Effectiveness and Outcomes Research (5R03HS024815-02), as well as the Tufts REDCap grant UL1TR002544. WES is supported by grant 5K08DA045933-03 from the National Institute on Drug Abuse. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Footnotes
Conflicts of Interest: The authors have no financial conflicts or competing interests to disclose.
Meetings: Results were presented at the 2019 SAEM National Meeting in Las Vegas, NV. Interim results were presented at the 2018 SAEM National Meeting in Indianapolis, IN.
References
1. Owens PL, Barrett ML, Gibson TB, Andrews RM, Weinick RM, Mutter RL. Emergency department care in the United States: a profile of national data sources. Ann Emerg Med. 2010 Aug;56(2):150–65.
2. Weinstock MB, Weingart S, Orth F, VanFossen D, Kaide C, Anderson J, et al. Risk for Clinically Relevant Adverse Cardiac Events in Patients With Chest Pain at Hospital Admission. JAMA Intern Med. 2015 Jul 1;175(7):1207.
3. Lin GA, Redberg RF. Addressing Overuse of Medical Services One Decision at a Time. JAMA Intern Med. 2015 Jul 1;175(7):1092.
4. Six AJ, Backus BE, Kelder JC. Chest pain in the emergency room: value of the HEART score. Neth Heart J. 2008 Jun;16(6):191–6.
5. Backus BE, Six AJ, Kelder JC, Mast TP, van den Akker F, Mast EG, et al. Chest Pain in the Emergency Room: A Multicenter Validation of the HEART Score. Critical Pathways in Cardiology: A Journal of Evidence-Based Medicine. 2010 Sep;9(3):164–9.
6. Backus BE, Six AJ, Kelder JC, Bosschaert MAR, Mast EG, Mosterd A, et al. A prospective validation of the HEART score for chest pain patients at the emergency department. International Journal of Cardiology. 2013 Oct;168(3):2153–8.
7. American College of Emergency Physicians Clinical Policies Subcommittee (Writing Committee) on Suspected Non–ST-Elevation Acute Coronary Syndromes; Tomaszewski CA, Nestler D, Shah KH, Sudhir A, Brown MD. Clinical Policy: Critical Issues in the Evaluation and Management of Emergency Department Patients With Suspected Non-ST-Elevation Acute Coronary Syndromes. Ann Emerg Med. 2018;72(5):e65–106.
8. Six AJ, Cullen L, Backus BE, Greenslade J, Parsonage W, Aldous S, et al. The HEART Score for the Assessment of Patients With Chest Pain in the Emergency Department: A Multinational Validation Study. Critical Pathways in Cardiology. 2013 Sep;12(3):121–6.
9. Oliver JJ, Streitz MJ, Hyams JM, Wood RM, Maksimenko YM, Long B, et al. An external validation of the HEART pathway among Emergency Department patients with chest pain. Intern Emerg Med. 2018 Dec;13(8):1249–55.
10. Mahler SA, Miller CD, Hollander JE, Nagurney JT, Birkhahn R, Singer AJ, et al. Identifying patients for early discharge: Performance of decision rules among patients with acute chest pain. International Journal of Cardiology. 2013 Sep;168(2):795–802.
11. Luepker RV, Apple FS, Christenson RH, Crow RS, Fortmann SP, Goff D, et al. Case definitions for acute coronary heart disease in epidemiology and clinical research studies: a statement from the AHA Council on Epidemiology and Prevention; AHA Statistics Committee; World Heart Federation Council on Epidemiology and Prevention; the European Society of Cardiology Working Group on Epidemiology and Prevention; Centers for Disease Control and Prevention; and the National Heart, Lung, and Blood Institute. Circulation. 2003 Nov 18;108(20):2543–9.
12. Mahler SA, Riley RF, Hiestand BC, Russell GB, Hoekstra JW, Lefebvre CW, et al. The HEART Pathway Randomized Trial: Identifying Emergency Department Patients With Acute Chest Pain for Early Discharge. Circ Cardiovasc Qual Outcomes. 2015 Mar;8(2):195–203.
13. McHugh ML. Interrater reliability: the kappa statistic. Biochem Med (Zagreb). 2012;22(3):276–82.
14. Gwet KL. Handbook of Inter-Rater Reliability: The Definitive Guide to Measuring the Extent of Agreement Among Raters. 4th ed. Gaithersburg, MD: Advanced Analytics; 2014.
15. Brennan RL, Prediger DJ. Coefficient kappa: Some uses, misuses, and alternatives. Educational and Psychological Measurement. 1981;41:687–99.
16. Taylor B, Mancini M. Discrepancy Between Clinician and Research Assistant in TIMI Score Calculation (TRIAGED CPU). West J Emerg Med. 2015 Jan 1;16(1):24–33.
17. Niven WGP, Wilson D, Goodacre S, Robertson A, Green SJ, Harris T. Do all HEART Scores beat the same: evaluating the interoperator reliability of the HEART Score. Emerg Med J. 2018 Dec;35(12):732–8.
18. Gershon CA, Yagapen AN, Lin A, Yanez D, Sun BC. Inter-rater Reliability of the HEART Score. Acad Emerg Med. 2019;26(5):552–5.
19. van Meerten KF, Haan RMA, Dekker IMC, van Zweden HJJ, van Zwet EW, Backus BE. The interobserver agreement of the HEART-score, a multicentre prospective study. Eur J Emerg Med. 2020 Oct 29.
20. Wu WK, Yiadom MYAB, Collins SP, Self WH, Monahan K. Documentation of HEART score discordance between emergency physician and cardiologist evaluations of ED patients with chest pain. Am J Emerg Med. 2017 Jan;35(1):132–5.
21. Laureano-Phillips J, Robinson RD, Aryal S, Blair S, Wilson D, Boyd K, et al. HEART Score Risk Stratification of Low-Risk Chest Pain Patients in the Emergency Department: A Systematic Review and Meta-Analysis. Ann Emerg Med. 2019;74(2):187–203.
22. Fernando SM, Tran A, Cheng W, Rochwerg B, Taljaard M, Thiruganasambandamoorthy V, et al. Prognostic Accuracy of the HEART Score for Prediction of Major Adverse Cardiac Events in Patients Presenting With Chest Pain: A Systematic Review and Meta-analysis. Acad Emerg Med. 2019;26(2):140–51.
23. Mahler SA, Riley RF, Russell GB, Hiestand BC, Hoekstra JW, Lefebvre CW, et al. Adherence to an Accelerated Diagnostic Protocol for Chest Pain: Secondary Analysis of the HEART Pathway Randomized Trial. Baumann B, editor. Acad Emerg Med. 2016 Jan;23(1):70–7.
24. Westafer LM, Kunz A, Bugajska P, Hughes A, Mazor KM, Schoenfeld EM, et al. Provider Perspectives on the Use of Evidence-based Risk Stratification Tools in the Evaluation of Pulmonary Embolism: A Qualitative Study. Courtney DM, editor. Acad Emerg Med. 2020 Jun;27(6):447–56.