Abstract
Objective:
Clinicians order laboratory tests in an effort to reduce diagnostic or therapeutic uncertainty. Information theory provides the opportunity to quantify the degree to which a test result is expected to reduce diagnostic uncertainty. We sought to apply information theory toward the evaluation and optimization of a diagnostic test threshold and to determine if the results would differ from those of conventional methodologies. We used a heparin/PF4 immunoassay (PF4 ELISA) as a case study.
Materials and Methods:
The laboratory database was queried for PF4 ELISA and serotonin release assay (SRA) results during the study period, with the latter serving as the gold standard for the disease heparin-induced thrombocytopenia (HIT). The optimized diagnostic threshold of the PF4 ELISA test was compared using conventional versus information theoretic approaches under idealized (pretest probability = 50%) and realistic (pretest probability = 2.4%) testing conditions.
Results:
Under ideal testing conditions, both analyses yielded a similar optimized optical density (OD) threshold of OD > 0.79. Under realistic testing conditions, information theory suggested a higher threshold, OD > 1.5 versus OD > 0.6. Increasing the diagnostic threshold improved the global information value, the value of a positive test and the noise content with only a minute change in the negative test value.
Discussion:
Our information theoretic approach suggested that the current FDA-approved cutoff (OD > 0.4) is overly permissive, leading to loss of test value and injection of noise into an already complex diagnostic dilemma. Because our approach is purely statistical and takes as input data that are readily accessible in the clinical laboratory, it offers a scalable and data-driven strategy for optimizing test value that may be widely applicable in the domain of laboratory medicine.
Conclusion:
Information theory provides more meaningful measures of test value than the widely used accuracy-based metrics.
Keywords: Information theory, Clinical laboratory techniques, Medical information sciences, Diagnostic errors, Medical informatics
1. Introduction
A central task in clinical laboratory medicine is to evaluate, compare, and optimize diagnostic assays in order to offer tests that are of the greatest possible value to clinicians and patients. A major hurdle to this process is the difficulty associated with quantifying the clinical value of a test. Clinicians are encouraged to use Bayesian inference to estimate the posttest disease probability for a patient using the pretest probability and considering the likelihood ratio associated with the test result. Combined with clinical acumen, such analyses can give a gestalt as to whether the test result changed the disease probability estimate in a clinically actionable way. Unfortunately, humans often struggle to accurately apply Bayesian inference[1-2]. Further, Bayesian inference is difficult to apply within the boundaries of the clinical laboratory for the following reasons: 1) the clinical data needed to estimate patient-specific pretest probabilities are rarely available in a readily usable format, 2) the clinical relevance of a change in disease probability can be subjective, 3) Bayesian inference does not provide a clear objective function for test optimization, and 4) Bayesian inference is cumbersome to scale to a high-volume clinical testing center.
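The odds-form Bayes update that clinicians are encouraged to use can be sketched in a few lines. This is purely illustrative and not part of the study's methods; the example likelihood ratio is hypothetical:

```python
def post_test_probability(pretest_p: float, likelihood_ratio: float) -> float:
    """Odds-form of Bayes' rule: posttest odds = pretest odds x likelihood ratio."""
    pretest_odds = pretest_p / (1.0 - pretest_p)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)

# Example: a 2.4% pretest probability and a hypothetical positive-test
# likelihood ratio of 10 yield a posttest probability of roughly 20%.
p = post_test_probability(0.024, 10.0)
```

Even this simple calculation requires a patient-specific pretest probability, which, as noted above, is rarely available to the laboratory in a usable form.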
For these reasons, laboratories routinely turn to accuracy-based metrics, such as sensitivity and specificity, because they are relatively simple to calculate given an acceptable reference standard. While it is true that an inaccurate test is unlikely to provide clinical value, it is not always true that an accurate test can be expected to provide clinical value. For example, when considering a patient who is not at risk for a disease, it is generally accepted that it would be wasteful to spend money on even the most accurate of laboratory tests and that one could actually cause harm to the patient should the test come back as positive. This extreme example illustrates a critical limitation of accuracy-based measures; they do not consider disease prevalence.
Given that clinicians typically order laboratory testing in an effort to reduce diagnostic or therapeutic uncertainty, we hypothesized that information theory, a branch of statistics that permits the quantitation of uncertainty[3-4], could provide more direct and useful measures of test value when compared to conventional accuracy-based measures of test performance. Claude Shannon, a pioneer in information theory, developed a model of a communication system and a theory of how to design such a system in order to optimize the encoding, transmission, and recovery of messages produced by a stochastic information source[3]. The diagnostic process can be described using Shannon’s model[3,5] (Supplementary Fig. 1). The tested patient population is the stochastic information source emitting messages regarding the underlying state of each patient’s health. This message is encoded into the concentration of the measured analyte. The laboratory assay is the communication channel outputting a human interpretable signal that is a noisy reflection of the analyte concentration. Finally, the interpretation of the laboratory result is analogous to the decoding step. As a result of this parallelism, it is reasonable to assume that the mathematical constructs of information theory can be applied toward optimizing laboratory testing.
Information theory offers several concepts that have been shown to be useful in the evaluation of a clinical test. For example, the information theoretic constructs of mutual information (MI) and information gain (IG) have been proposed previously as metrics of clinical test value[6-12]. MI quantifies the degree to which performing the test can be expected to reduce uncertainty regarding the underlying disease. MI is a measure of global test value that considers the reduction in uncertainty associated with each possible test result and the probabilities of observing each test result. IG (aka ‘relative entropy’) quantifies the expectation that a specific test result, i.e. positive (IG+) or negative (IG−), will reduce diagnostic uncertainty. IG is a useful construct when clinical considerations disproportionately weight a particular test result. For example, if clinicians are more concerned about the potential harms of a missed diagnosis, they may focus on IG−, the information value of a negative test result to help confidently rule out the disease. Finally, noise is an information theoretic construct that has received less attention in the diagnostic testing literature. Noise quantifies uncertainty in the test result that is unrelated to the underlying disease process. Noise should be minimized because it leads to false negative or false positive diagnoses and because clinicians may learn to ignore a noisy test. The literature provides several excellent primers on applying information theoretic concepts to diagnostic testing[6,9,13]; however, these approaches have not been widely adopted in practice, and we found only a few examples of applications towards evaluating and optimizing the diagnostic threshold of a clinical test[7,8,14].
To provide a real-world case study, we evaluated and optimized the performance of the heparin-PF4 immunoassay (PF4 ELISA), a commonly ordered clinical test that aids in the diagnosis of heparin-induced thrombocytopenia (HIT), using either conventional or information theoretic methods.
HIT is a rare but severe complication of exposure to the commonly used anticoagulant drug heparin. Patients become immunized to neoantigens consisting of exogenous heparin and endogenous platelet factor 4 (PF4). The resulting antibodies activate platelets in vivo, leading to a consumptive thrombocytopenia and a substantially elevated risk of arterial and venous thrombosis[15]. Due to the acuity of the thrombotic risk, initial management decisions must be guided by clinical evaluation alone. For this purpose, risk-stratification using the “4Ts score”, a validated clinical assessment tool, is recommended by expert guidelines [16-17]. HIT is considered ruled out in low-risk patients (4Ts = 0–3), while intermediate-risk (4Ts = 4–6) or high-risk (4Ts = 7–8) patients should be empirically treated prior to and during confirmatory testing.
Confirmatory testing typically involves a sequential algorithm[17]. Patients are first screened with an immunoassay that detects reactivity against PF4/heparanoid antigens. These immunoassays are inexpensive and widely available, but they also have limited specificity due to the detection of antibodies that are not functionally relevant, i.e. that do not activate platelets in vivo[18]. Patients who screen positive are then subjected to confirmatory testing by a reference-standard functional assay, such as the serotonin release assay (SRA)[19]. Functional assays determine if the patient’s serum can activate donor platelets in vitro. Functional assays are only available at select reference laboratories due to their technical complexity, which usually precludes their use as a first-line test.
Currently, the U.S. Food and Drug Administration (FDA) has approved the use of the Stago Asserachrom heparin/PF4 immunoassay (PF4 ELISA; Diagnostica Stago, Parsippany, NJ) as a qualitative test using an internally normalized diagnostic threshold of OD > 0.4. However, several studies have demonstrated that the quantitative OD value provides additional information about the likelihood of HIT. Specifically, increasing PF4 ELISA OD values beyond 0.4 are positively correlated with the average 4Ts score, the probability of a positive confirmatory functional assay, and the probability of a thrombotic complication[20-28].
In this study, we used information theory to evaluate and optimize the PF4 ELISA OD threshold under ideal and realistic testing conditions by maximizing the mutual information. We analyzed the impact of this optimization on the IG+, IG−, and noise. Our results suggested that the FDA-approved cutoff performs poorly in practice and that increasing the diagnostic threshold from OD > 0.4 to OD > 1.5 would greatly increase the MI, increase the IG+, and decrease the noise, with only a small decrease in the IG−. However, the results also indicated that the most impactful way to improve the information content of the PF4 ELISA is to refine patient selection for testing via the recommended use of the 4Ts score. None of these valuable insights were brought forth when we performed conventional optimization using accuracy-based metrics.
2. Materials and methods
2.1. Assays
PF4 ELISA reactivity was measured using the Asserachrom HPIA (Diagnostica Stago, Parsippany, NJ) according to the manufacturer’s instructions. Although clinical results are typically reported as “indeterminate” if the OD value falls within +/− 5% of an internally standardized cutoff, all samples with OD > 0.4 were taken to be nominally positive in this study. The SRA was performed at the Blood Center of Wisconsin (Milwaukee, Wisconsin). Only samples reported as “positive” were considered SRA-positive.
2.2. Patient samples
The clinical laboratory database at Barnes-Jewish Hospital (St Louis, MO) was queried for all results of the PF4 ELISA ordered between 1/1/2015 and 3/16/2017 and results of any SRA testing ordered within 24 h of the PF4 ELISA.
2.3. Estimating entropies and information
The entropy in bits, H(X), of a discrete random variable X with probability distribution p(x) was calculated as[3-4]:

$$H(X) = -\sum_{x} p(x)\,\log_{2} p(x)$$
The entropy of a continuous-valued random variable, Xc, with probability density function f(x) was estimated as the entropy H(X^Δ) of X^Δ, a quantized version of Xc with bins of length Δ, where the probability of element x_i is p(x_i) = f(x_i)Δ. The Δ used was 0.25 OD units:

$$H(X^{\Delta}) = -\sum_{i} f(x_i)\Delta\,\log_{2}\!\big(f(x_i)\Delta\big)$$
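As an illustrative sketch (not the authors' actual code), the discrete entropy and the Δ-binned entropy of a continuous value described above can be estimated as follows, using the stated Δ = 0.25 binning:

```python
import math
from collections import Counter

def entropy_bits(probs):
    """Shannon entropy H(X) = -sum p(x) * log2 p(x), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def quantized_entropy(values, delta=0.25):
    """Entropy of a continuous variable after binning into widths of delta."""
    counts = Counter(math.floor(v / delta) for v in values)
    n = len(values)
    return entropy_bits(c / n for c in counts.values())

# A fair coin carries exactly one bit of entropy.
h = entropy_bits([0.5, 0.5])  # -> 1.0
```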
2.4. Mutual information
The mutual information shared between the PF4 ELISA and the SRA was calculated as follows[3-4]:

$$MI(PF4, SRA) = H(PF4) + H(SRA) - H(PF4, SRA)$$
where H(PF4) is the entropy of the marginal distribution of PF4 ELISA results, H(SRA) is the entropy of the marginal distribution of SRA results, and H(PF4,SRA) is the entropy of the joint probability distribution of the PF4 ELISA and SRA results.
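The decomposition MI(X,Y) = H(X) + H(Y) − H(X,Y) can be computed directly from a contingency table of test and gold-standard results. The sketch below is illustrative, assuming binary outcomes:

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint_counts):
    """MI(X, Y) = H(X) + H(Y) - H(X, Y) from a joint count table
    (rows = test results, columns = gold-standard results)."""
    total = sum(sum(row) for row in joint_counts)
    pxy = [[c / total for c in row] for row in joint_counts]
    px = [sum(row) for row in pxy]        # marginal of the test
    py = [sum(col) for col in zip(*pxy)]  # marginal of the gold standard
    h_joint = entropy_bits(p for row in pxy for p in row)
    return entropy_bits(px) + entropy_bits(py) - h_joint

# A test independent of disease carries no mutual information,
# while a perfectly concordant 2x2 table carries a full bit.
```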
2.5. Noise
Noise was defined as the uncertainty surrounding the PF4 ELISA result that was not contained within the mutual information shared between the PF4 ELISA and the SRA[3-4]:

$$H(PF4 \mid SRA) = H(PF4) - MI(PF4, SRA)$$
where H(PF4) is the entropy of the marginal distribution of the PF4 ELISA results, MI(PF4,SRA) is the mutual information shared between the PF4 ELISA and the SRA, and H(PF4∣SRA) is the noise.
2.6. Information gain
Information gain associated with a positive PF4 ELISA result, IG+, was calculated as the Kullback-Leibler divergence between the SRA outcome distribution for all samples ignoring PF4 ELISA results (i.e. prior-probability distribution) and the SRA outcome distribution within the group of samples with a positive PF4 ELISA result (i.e. posterior-probability distribution), as described by the following equation[4]:

$$IG^{+} = D_{KL}\big(q^{+}(s)\,\|\,p(s)\big) = \sum_{s} q^{+}(s)\,\log_{2}\frac{q^{+}(s)}{p(s)}$$
where p(s) is the probability distribution function of SRA outcomes in the entire population agnostic to the PF4 ELISA results and q+(s) is the probability distribution function of SRA outcomes among the patients with a positive PF4 ELISA result (i.e. p(s∣PF4+)).
In a similar manner, the information gain associated with a negative test result, IG−, was calculated as:

$$IG^{-} = D_{KL}\big(q^{-}(s)\,\|\,p(s)\big) = \sum_{s} q^{-}(s)\,\log_{2}\frac{q^{-}(s)}{p(s)}$$

where q−(s) is the probability distribution function of SRA outcomes among the patients with a negative PF4 ELISA result (i.e. p(s∣PF4−)).
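The Kullback-Leibler divergence described above can be computed directly from outcome frequencies. The sketch below is illustrative; the prior and posterior probabilities shown are the marginal SRA-positive rates reported in Section 3.1, used here only as example inputs:

```python
import math

def kl_divergence_bits(q, p):
    """D_KL(q || p) = sum q(s) * log2(q(s) / p(s)), in bits."""
    return sum(qi * math.log2(qi / pi) for qi, pi in zip(q, p) if qi > 0)

# Prior: P(SRA+) = 2.4% across all orders; posterior among PF4-positive
# samples: P(SRA+) = 12.9% (both figures from the dataset description).
prior = [0.024, 0.976]
posterior_pos = [0.129, 0.871]
ig_pos = kl_divergence_bits(posterior_pos, prior)  # > 0: a positive result is informative
```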
2.7. Statistics
Data are presented as the mean +/− 2*std of 100 bootstraps. Non-parametric pairwise comparisons were made by Mann-Whitney U test.
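A minimal sketch of the bootstrap procedure behind the reported error bars (illustrative, not the study code):

```python
import random
import statistics

def bootstrap_mean_2sd(data, statistic, n_boot=100, seed=0):
    """Return (mean, 2 * std) of a statistic across bootstrap resamples,
    matching the 'mean +/- 2*std of 100 bootstraps' presentation."""
    rng = random.Random(seed)
    estimates = [
        statistic([rng.choice(data) for _ in data]) for _ in range(n_boot)
    ]
    return statistics.mean(estimates), 2 * statistics.pstdev(estimates)
```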
2.8. Chart review
51 cases were randomly selected for chart review by assigning a consecutive unique index to each sample and then randomly selecting 51 unique indices using the randi function in Matlab[29]. A 4Ts score was calculated by a single author (MAZ) using retrospective chart review and considering only data in the electronic health records that were documented prior to the time of the PF4 ELISA order or in the daily note from the day of the PF4 ELISA order[16].
This study was approved by the Institutional Review Board at Washington University in St. Louis (HRPO #201709086).
3. Results
3.1. Dataset characteristics
During the study period of 26.5 months, the clinical laboratory at Barnes-Jewish Hospital performed 2,594 in-house PF4 ELISA tests, of which 474 (18.3%) were positive using the diagnostic cutoff of OD > 0.4. In total, sixty-one samples were confirmed to be positive by the SRA performed at Blood Center of Wisconsin, accounting for 2.4% of the total PF4 ELISA orders and 12.9% of PF4-positive samples. Consistent with previous reports[20-28], the PF4 ELISA OD values for the confirmed SRA-positive samples were significantly greater than those of the remaining samples (Median [IQR]: 2.56 [1.82–2.84] versus 0.13 [0.07–0.26], p < 0.001 by Mann-Whitney U test).

The information content of a diagnostic test cannot exceed the pretest uncertainty. Therefore, the ideal testing scenario with respect to information yield is one that maximizes pretest uncertainty. For a single-disease test, a case-control study design models this ideal scenario, providing a pretest probability of 50% and maximizing uncertainty as to whether a randomly selected patient has the disease of interest. In contrast, the disease prevalence in real-life clinical testing is typically much lower. Therefore, two datasets were assembled to simulate idealized and realistic testing conditions, respectively. The ‘idealized’ dataset followed a case-control design by including all 61 SRA-positive samples as cases and 61 randomly selected samples as controls, giving a pretest probability of 50% (Fig. 1A). The ‘realistic’ dataset included all PF4 ELISA orders on an intention-to-test basis (Fig. 1B): the 61 SRA-positive samples as cases and all remaining 2,533 samples as controls, giving a pretest probability of 2.4%. Note that all samples that were not subjected to SRA testing were classified as controls to avoid the following ordering bias: clinicians are unlikely to order SRA testing for patients with a negative PF4 ELISA result.
Fig. 1. Conventional optimization of the PF4 ELISA using accuracy-based metrics.
Results for the “idealized” dataset (prevalence = 50%) are shown in the left panels (A,C,E,G). Results for the “realistic” dataset (prevalence = 2.4%) are shown in the right panels (B,D,F,H). (A,B) Histograms of PF4 ELISA OD values for SRA-positive (filled bars) and SRA negative (open bars) samples. (C,D) Sensitivity (Se, solid line) and specificity (Sp, dashed line) versus OD cutoff. (E,F) Positive predictive value (PPV, solid line) or negative predictive value (NPV, dashed line) versus OD cutoff. (G,H) Receiver operator characteristics curves (ROCC) with area under the curve (AUC). Data are presented as mean +/− 2*std of 100 bootstraps. The FDA-approved cutoff (red) and conventionally optimized cutoffs (blue) are indicated in each panel.
3.2. Optimization of analytical accuracy
The sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV), and receiver operating characteristic curves (ROCC) were estimated considering all possible diagnostic thresholds (Fig. 1). The sensitivity and specificity curves intersected at a cutoff of OD > 0.78 and OD > 0.6 for the idealized and realistic datasets, respectively (Fig. 1C,D). For the idealized dataset, the NPV and PPV curves intersected at a cutoff of OD > 0.78 (Fig. 1E). For the realistic dataset, the NPV and PPV curves did not intersect because the NPV was > 0.975 and above the PPV at all cutoffs (Fig. 1F). The area under the ROCC was 0.89 +/− 0.03 and 0.92 +/− 0.02 for the idealized and realistic datasets, respectively, indicating that the PF4 ELISA has high analytical accuracy under high- and low-prevalence testing scenarios (Fig. 1G,H).
3.3. Optimization of mutual information
Mutual information (MI) was used to estimate the average amount by which the PF4 ELISA result can be expected to reduce uncertainty regarding the presence or absence of HIT. For the idealized dataset (prevalence = 50%), the MI versus OD curve exhibited a bell shape with the peak MI of 0.61 +/− 0.07 bits occurring at a threshold of OD > 0.79 (Fig. 2C). The MI at a cutoff of OD > 0.4 was 0.47 +/− 0.08 bits, an information loss of 23% compared to the optimized threshold. For the realistic dataset (prevalence = 2.4%), the MI versus OD curve again exhibited a bell shape; however, the peak MI was only 0.09 +/− 0.01 bits and occurred at a threshold of OD > 1.5 (Fig. 2D). In this case, OD > 0.4 provided 0.06 bits of MI, a loss of 33.3% compared to the optimized cutoff. These results demonstrate that, 1) although the accuracy remains high, the PF4 ELISA loses the majority of its information value under the low pretest probability conditions encountered in practice, and 2) the maximum-MI OD threshold increases when the pretest probability decreases.
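The threshold optimization described above amounts to a sweep over candidate cutoffs, keeping the one that maximizes MI. The toy sketch below uses assumed data and is not the study's implementation:

```python
import math

def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mi_at_cutoff(od_values, labels, cutoff):
    """MI between the dichotomized result (OD > cutoff) and disease state."""
    n = len(od_values)
    joint = {}
    for od, y in zip(od_values, labels):
        key = (od > cutoff, y)
        joint[key] = joint.get(key, 0) + 1
    p_test_pos = sum(od > cutoff for od in od_values) / n
    p_disease = sum(labels) / n
    h_test = entropy_bits([p_test_pos, 1 - p_test_pos])
    h_disease = entropy_bits([p_disease, 1 - p_disease])
    h_joint = entropy_bits(c / n for c in joint.values())
    return h_test + h_disease - h_joint

def mi_optimal_cutoff(od_values, labels, candidates):
    """Return the candidate cutoff that maximizes mutual information."""
    return max(candidates, key=lambda c: mi_at_cutoff(od_values, labels, c))

# Toy data: controls cluster at low OD, cases at high OD; the middle
# cutoff separates them perfectly and wins the sweep.
od = [0.1] * 5 + [2.0] * 5
y = [0] * 5 + [1] * 5
best = mi_optimal_cutoff(od, y, [0.05, 1.0, 3.0])  # -> 1.0
```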
Fig. 2. Information theoretic based optimization of the PF4 ELISA using metrics of mutual information, noise, and information gain.
Results for the “idealized” dataset (prevalence = 50%) are shown in the left panels (A,C,E,G). Results for the “realistic” dataset (prevalence = 2.4%) are shown in the right panels (B, D,F,H). (A,B) Histograms of PF4 ELISA OD values for SRA-positive (filled bars) and SRA-negative (open bars) samples. (C,D) Mutual information (MI) versus OD cutoff. (E,F) Noise versus OD cutoff. (G,H) Information gain for a positive (IG+, solid line) or negative (IG−, dashed line) test result versus OD cutoff. Data are presented as mean +/− 2*std of 100 bootstraps. The FDA-approved cutoff (red) and mutual information optimized cutoffs (blue) are indicated for each dataset.
3.4. Evaluation of noise
Noise was estimated as the uncertainty in the PF4 ELISA result that was unrelated to whether or not HIT was present. In practice, noise can lead to over-diagnosis or under-diagnosis of HIT, both of which have the potential to cause significant patient harm. Under idealized conditions, the MI-maximizing threshold of OD > 0.79 also corresponded to a local minimum of 0.39 +/− 0.07 bits in the noise versus threshold curve (Fig. 2E). At OD > 0.4 the noise increased by 30.8% to 0.51 +/− 0.07 bits. Under realistic conditions, the noisiest cutoff was OD > 0.13, which provided 0.98 +/− 0.002 bits of noise (Fig. 2F). Beyond this threshold, the noise decreased sharply before leveling off. The MI-maximizing threshold (OD > 1.5) provided 0.12 +/− 0.01 bits of noise. In comparison, the FDA-approved cutoff (OD > 0.4) provided 0.50 +/− 0.02 bits of noise, an increase of 317%. Thus, when applying the OD > 0.4 cutoff under realistic conditions, the vast majority (0.50/0.59 bits or 85%) of the total uncertainty surrounding the PF4 ELISA result is noise and will not provide useful diagnostic information.
3.5. Evaluation of information gain
In certain clinical settings, the information provided by a positive or a negative result may not be weighted equally. For example, when the potential adverse health outcomes associated with a falsely negative test result can be severe, e.g., a life- or limb-threatening thrombotic complication of untreated HIT, it may be acceptable to sacrifice some amount of global test performance in order to maximize the amount of information provided by a negative result. In this way, HIT can be more confidently ruled out in a patient that has a negative test result.
The expected amount of information gained by a negative PF4 ELISA result was estimated by calculating the Kullback-Leibler divergence (relative entropy) between the probability distribution of SRA results agnostic to the PF4 ELISA result (prior distribution) and that of the negatively testing patients (posterior distribution). The value of a positive PF4 ELISA result was quantified in an analogous way. For the idealized dataset (prevalence = 50%), the information gain provided by a negative PF4 ELISA was 0.64 +/− 0.11 bits and 0.57 +/− 0.09 bits for OD > 0.4 and OD > 0.79, respectively, while the information gain provided by a positive PF4 ELISA was 0.31 +/− 0.08 bits and 0.65 +/− 0.01 bits, respectively (Fig. 2G). Thus, under idealized conditions, the application of the FDA-approved cutoff instead of the MI-maximizing cutoff sacrifices 0.34 bits of the information gain of a positive result in order to garner an additional 0.07 bits of information gain of a negative result.
Next, the value of a positive or negative test result was evaluated for the realistic dataset (prevalence = 2.4%). The information gain provided by a positive test was 0.31 +/− 0.03 bits and 2.30 +/− 0.27 bits for OD > 0.4 and OD > 1.5, respectively, while the information gain provided by a negative PF4 ELISA result was 0.027 +/− 0.003 bits and 0.017 +/− 0.003 bits, respectively (Fig. 2H). Therefore, under realistic conditions, applying the FDA-approved cutoff, instead of the MI-optimized cutoff, sacrifices 1.99 bits of the information gain of a positive test for 0.01 bits of additional information gain of a negative test.
Strikingly, for the realistic dataset (prevalence = 2.4%) the information gain associated with a negative PF4 result was no higher than 0.03 bits over the entire range of OD cutoffs (Fig. 2H, dashed line), suggesting that, in practice, a negative PF4 ELISA result provides very little novel information to help rule out HIT and that many patients could have been ruled out for HIT by clinical evaluation alone. To explore this hypothesis, a 4Ts score was calculated for 51 randomly chosen samples based on retrospective chart review (Supplemental Table 1). 40/51 (78.4%) of the sampled cases had a low-risk 4Ts score (4Ts = 0–3), a category for which laboratory testing is not recommended[17]. These results provided independent evidence for a low HIT prevalence in the tested patient population.
3.6. Impact of prevalence estimate
A sensitivity analysis was performed to address the concern that a biased estimate of the true HIT prevalence in the tested population could lead to an inaccurate estimation of the information content of the PF4 ELISA. The confirmed positive cases were re-sampled to simulate various pretest probabilities up to 10% and the information content of the PF4 ELISA was reanalyzed. At every simulated prevalence, 1) the MI-optimized threshold was OD > 1.5; 2) the IG− was <0.078 bits; and 3) compared to the MI-optimized threshold, application of the FDA-approved threshold garnered very little additional IG− (<0.04 bits) while sacrificing >1 bit of IG+ (Supplementary Fig. 2). These observations indicate that the main results of this study are not sensitive to variation in the assumed HIT prevalence.
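The re-sampling scheme used in this sensitivity analysis can be sketched as follows; this is an illustrative reconstruction under assumptions, not the authors' code, and the pooled OD values are hypothetical:

```python
import random

def resample_to_prevalence(cases, controls, target_prevalence, n_total, seed=0):
    """Assemble a synthetic dataset with a chosen disease prevalence by
    sampling case and control values with replacement."""
    rng = random.Random(seed)
    n_cases = round(target_prevalence * n_total)
    sampled_cases = [rng.choice(cases) for _ in range(n_cases)]
    sampled_controls = [rng.choice(controls) for _ in range(n_total - n_cases)]
    return sampled_cases, sampled_controls

# Simulate a 10% prevalence from pools of case/control OD values.
cases, controls = resample_to_prevalence([2.5, 1.8], [0.1, 0.2], 0.10, 1000)
```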
3.7. MI of the continuous OD value
It is well established that dichotomization of a continuous test result decreases the power and information content of a diagnostic test[8,30]. To quantify the degree of information loss, we estimated the MI of the continuous-valued PF4 ELISA result. Under idealized conditions, the continuous-valued PF4 ELISA result provided 0.93 +/− 0.01 bits of MI, an increase of 53% compared to the optimal dichotomized threshold of OD > 0.79. Under realistic conditions, the continuous-valued PF4 ELISA result provided 0.15 +/− 0.00 bits of MI, an increase of 66.7% compared to the optimized threshold of OD > 1.5.
4. Discussion
4.1. Accuracy versus information
In the present study, it was observed that when the prevalence of HIT was decreased from 50% to 2.4%, the PF4 ELISA remained an accurate test (AUROCC = 0.89 +/− 0.03 vs 0.92 +/− 0.02, respectively) but lost most of its information value (MI = 0.61 +/− 0.07 bits vs 0.09 +/− 0.01 bits, respectively). These results are consistent with those of Reibnegger and Schrabmair, who found that MI is sensitive to disease prevalence while accuracy-based metrics are not[14]. These findings illustrate an important difference between accuracy-based and information-based measures: accuracy metrics consider only the assay in isolation, while information theoretic measures consider the properties of both the tested population (the source) and the assay (the channel), providing a more holistic assessment of the testing process.
4.2. The optimal PF4 ELISA OD cutoff
The choice of a diagnostic threshold is an encoding step where the signal is transformed into a binary output for communication to the physician. A key concept in information theory is that encoding must be carefully chosen to match the statistical properties of the information source in order to maximize information transmission and minimize errors. In this study, it was observed that the MI-optimized cutoff increased from OD > 0.79 to OD > 1.5 when the HIT prevalence was reduced from 50% to 2.4%. These findings suggest that diagnostic thresholds should be chosen to match local physician ordering practices rather than using a single diagnostic threshold determined in case-control studies with an artificially inflated disease prevalence.
4.3. When is a negative PF4 ELISA useful?
Clinicians are understandably cautious when ruling out HIT, because the consequences of undiagnosed and untreated HIT can be severe. Many healthcare providers may prefer to utilize a lower diagnostic threshold, reasoning that a more sensitive test should be better for ruling out the disease and that a false positive can be remedied by confirmatory testing. Indeed, the NPV for the PF4 ELISA was 0.998 when using the FDA-approved cutoff under realistic testing conditions (prevalence = 2.4%), which gives the impression that a negative PF4 ELISA result is very useful for ruling out HIT. However, under realistic conditions, the NPV was excellent regardless of which OD threshold was applied. For example, the NPV was 0.995 at the MI-optimized cutoff of OD > 1.5. These results illustrated that, when using accuracy-based metrics, it is not always simple to ascertain if and when a negative PF4 ELISA result actually provides novel information to help rule out HIT. Information theory enabled the direct quantification of IG−, the information value of a negative test result. IG− increased by only 0.01 bits when using OD > 0.4 instead of OD > 1.5 under realistic testing conditions. Thus, when using the information theoretic approach, it became readily apparent that choosing the more permissive threshold under these conditions was unlikely to strengthen the rule-out value of a negative PF4 ELISA in a clinically relevant way.
4.4. What is the harm in being cautious?
In contrast to the marginal change in IG−, choosing the more permissive threshold of OD > 0.4 resulted in 1.99 fewer bits of IG+ and 0.38 additional bits of noise under realistic conditions. The additional noise likely contributes to the previously reported high rate of overdiagnosis and overtreatment of HIT, which unnecessarily exposes patients to expensive alternative anticoagulants that carry an elevated risk of bleeding[18]. An additional potential harm is that clinicians may become conditioned to expect that a positive PF4 ELISA result is most likely a false-positive and therefore wait for confirmatory testing results before making difficult therapeutic decisions. Due to the acute thrombotic risk in HIT, such a ‘wait and see’ approach is not consistent with expert recommendations and should be avoided[17]. It will be important to investigate in future studies if lowering the diagnostic threshold to prevent underdiagnosis of HIT can paradoxically lead to undertreatment of HIT due to the tremendous loss of information associated with a positive test result.
4.5. Information content of sequential HIT testing algorithms
Information theoretic approaches have been developed to evaluate the information associated with the application of multiple diagnostic tests[6,13]. The standard-of-care HIT testing algorithm employs three different tests in sequence: a clinical scoring tool (e.g., 4Ts), a screening antigen test (e.g., PF4 ELISA), and a confirmatory functional assay (e.g., SRA). Application of information theory to this multiple-testing strategy would yield valuable insights into the interactions between these diagnostic steps, could determine the total information yield, and could elucidate when confirmatory testing is necessary versus redundant. These analyses were not addressed in the present study because a suitable (non-SRA) gold standard was not available. Furthermore, in the case of HIT, the total information yield may be less relevant because management decisions cannot be delayed to complete the testing algorithm due to the acute risk of severe complications.
4.6. The benefit of a continuous valued OD result
In this study, the continuous-valued PF4 ELISA result yielded at least 50% more MI under both idealized and realistic testing conditions. These results are consistent with the work of Diamond and colleagues, who reported a 41% increase in the MI of an echocardiographic stress test when the magnitude of ST segment changes was discretized instead of dichotomized[8]. To leverage this additional information, clinicians would have to feel comfortable using a Bayesian approach that considers both the specific OD value and the patient’s pretest probability to estimate post-test probability. The treatment threshold could then be set as the post-test probability value at which the likelihood and severity of negative outcomes associated with untreated HIT outweigh those associated with falsely treating for HIT.
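One way such a Bayesian interpretation could work is with interval likelihood ratios estimated from the OD histograms of SRA-positive and SRA-negative samples, followed by an odds-form update. The histogram counts below are hypothetical and the sketch is illustrative only:

```python
def interval_likelihood_ratio(od, pos_counts, neg_counts, delta=0.25):
    """LR for the OD bin containing `od`:
    P(bin | disease) / P(bin | no disease). Histograms map bin index -> count."""
    b = int(od // delta)
    p_pos = pos_counts.get(b, 0) / sum(pos_counts.values())
    p_neg = neg_counts.get(b, 0) / sum(neg_counts.values())
    return float("inf") if p_neg == 0 else p_pos / p_neg

def posttest_probability(pretest_p, lr):
    """Odds-form Bayes update using the interval LR."""
    odds = pretest_p / (1 - pretest_p) * lr
    return odds / (1 + odds)

# Hypothetical histograms (bin index = OD // 0.25): most SRA-positive mass
# at high OD, most SRA-negative mass at low OD.
pos_hist = {0: 1, 8: 9}
neg_hist = {0: 9, 8: 1}
lr = interval_likelihood_ratio(2.0, pos_hist, neg_hist)  # bin 8 -> LR ~ 9
p_post = posttest_probability(0.024, lr)
```

The post-test probability could then be compared against a treatment threshold reflecting the relative harms of untreated versus falsely treated HIT.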
4.7. Navigating a complex diagnostic dilemma
HIT testing represents an example of a diagnostic dilemma that is fraught with complexity. The estimation of post-test probability is a function of patient-specific factors, assay-specific factors, and the specific PF4 ELISA OD value. To further complicate matters, these different pieces of information may only become available over a period of time that spans multiple handoffs of the patient’s care between different members of the clinical team. For these reasons, it may be hard for clinicians to quantitatively estimate post-test probability in real time. Therefore, HIT testing may represent an excellent opportunity to apply a machine learning approach to build a classification or regression model that would output a predicted diagnosis or post-test probability, respectively. The literature provides examples of clinical support tools based on machine learning algorithms that enable predicting unknown lab results using available laboratory observables[31], reducing unnecessary test ordering[32,33], identifying potential laboratory errors[34], and reducing time-to-diagnosis[35,36]. Future studies would be needed to explore the application of machine learning towards diagnosing HIT. The information theoretic approach presented here may aid this endeavor in several ways. First, low test information yield may identify opportunities for improvement through application of machine learning. Second, selection of clinical variables sharing a high degree of mutual information with the reference standard can be used as a constraint upon the feature space that is input into the machine learning model. Third, improved information yield could be used as an objective and clinically relevant measure of success. Finally, the information shared between the output of a machine learning algorithm and the eventual treatment decisions made by treating physicians would provide insight into how clinicians interact with the output of an opaque high-dimensional model.
4.8. Limitations of the current study
Because of the retrospective observational study design, we had to assume that SRA-untested samples were negative in order to address an ordering bias, i.e., PF4 ELISA-negative samples are not typically sent out for SRA testing. It is possible that some clinicians felt comfortable diagnosing HIT without sending for an SRA when confronted with a patient with a high pretest probability and a positive PF4 ELISA. Such cases would be falsely classified as negative in our study design, potentially leading to underestimation of the true HIT prevalence. To address this concern, we provided independent evidence of a low pretest probability via chart review (Supplemental Table 1), and we performed a sensitivity analysis (Supplementary Fig. 2), which demonstrated that our main conclusions are not sensitive to even a 4-fold increase in the HIT prevalence estimate.
5. Conclusions and recommendations
We found that using information theory to evaluate and optimize a clinical laboratory test provided a more effective and complete evaluation than conventional metrics of analytical accuracy. On the basis of our information theoretic analyses, we offer three strategies for improving the value of the PF4 ELISA. First and foremost is improving patient selection: the loss of information attributable to the low pretest probability far exceeded the differences in information content between diagnostic thresholds. Second, avoiding dichotomization altogether would increase the information content and eliminate the need to find an optimal threshold value. Third, if clinician ordering or test interpretation practices cannot be changed, then the diagnostic threshold should be optimized to match practice using information theory. At our institution, this would mean adopting a cutoff of OD > 1.5, although this value may differ at other institutions. This study demonstrates proof-of-principle that deploying information theory in clinical laboratories can improve diagnostic test performance in the case of HIT. The constructs offered by information theory are not limited to any particular type of signal or data distribution, and their estimation requires only the results of an assay and a suitable reference standard. We therefore posit that the approach illustrated by the HIT case study may generalize to diagnostic tests for other diseases. Additional research will be needed to validate this hypothesis.
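The third recommendation, optimizing the cutoff to match practice, amounts to sweeping candidate OD thresholds and keeping the one that maximizes the empirical mutual information between the dichotomized assay and the reference standard at the observed pretest probability. The OD distributions below are synthetic stand-ins, so the resulting cutoff is illustrative only; the study's computations used MATLAB [29].

```python
import math
import random
from collections import Counter

def empirical_mi_bits(x, y):
    """Plug-in mutual information estimate, in bits, from paired samples."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum((c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in joint.items())

# Synthetic cohort at a low pretest probability (~2.4%); the OD distributions
# are invented stand-ins, not the study's data.
random.seed(0)
n_pos, n_neg = 48, 1952
labels = [1] * n_pos + [0] * n_neg
ods = ([max(0.0, random.gauss(1.8, 0.6)) for _ in range(n_pos)] +
       [max(0.0, random.gauss(0.2, 0.15)) for _ in range(n_neg)])

# Sweep candidate cutoffs; keep the one with maximal information yield.
cutoffs = [round(0.1 * k, 1) for k in range(2, 25)]  # OD 0.2 ... 2.4

def info_at(cut):
    return empirical_mi_bits([1 if od > cut else 0 for od in ods], labels)

best = max(cutoffs, key=info_at)
print(f"information-optimal cutoff: OD > {best} ({info_at(best):.3f} bits)")
```

On real PF4 ELISA data the synthetic arrays would be replaced by the laboratory's own OD values and SRA calls, which is precisely why the information-optimal cutoff can differ between institutions.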
Supplementary Material
Acknowledgments
We thank Dr. Arjun Raman (University of Chicago) for valuable discussions and insights.
Funding details
JRB is supported by the NIH Office of the Director (DP5 OD028125) and Burroughs Wellcome Fund (CAMS 1019648).
Footnotes
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.jbi.2021.103756.
References
- [1] Operskalski JT, Barbey AK, Risk literacy in medical decision making, Science 352 (6284) (2016) 413–414.
- [2] Henriquez RR, Korpi-Steiner N, Bayesian inference dilemma in medical decision-making: a need for user-friendly probabilistic reasoning tools, Clin. Chem. 62 (2016) 1285–1286.
- [3] Shannon CE, A mathematical theory of communication, Bell Syst. Tech. J. 27 (1948) 379–423.
- [4] Cover TM, Thomas JA, Elements of Information Theory, Wiley and Sons, New Jersey, 2006, ISBN-13 978-0-471-24195-9.
- [5] Eiseman NA, Bianchi MT, Westover MB, The information theoretic perspective on medical diagnostic inference, Hosp. Pract. (1995) 42 (2) (2014) 125–138.
- [6] Good IJ, Card WI, The diagnostic process with special reference to errors, Methods Inf. Med. 10 (3) (1971) 176–188.
- [7] Metz CE, Goodenough DJ, Rossmann K, Evaluation of receiver operating characteristic curve data in terms of information theory, with applications in radiography, Radiology 109 (1973) 297–303.
- [8] Diamond GA, Hirsch M, Forrester JS, Staniloff HM, Vas R, Halpern SW, Swan HJ, Application of information theory to clinical diagnostic testing: the electrocardiographic stress test, Circulation 63 (1981) 915–921.
- [9] Somoza E, Mossman D, Comparing and optimizing diagnostic tests: an information-theoretical approach, Med. Decis. Mak. 12 (1992) 179–188.
- [10] Benish WA, Relative entropy as a measure of diagnostic information, Med. Decis. Mak. 19 (1999) 202–206.
- [11] Benish WA, Mutual information as an index of diagnostic test performance, Methods Inf. Med. 42 (2003) 260–264.
- [12] Benish WA, Intuitive and axiomatic arguments for quantifying diagnostic test performance in units of information, Methods Inf. Med. 48 (2009) 552–557.
- [13] Benish WA, A review of the application of information theory to clinical diagnostic testing, Entropy 22 (2020) 97.
- [14] Reibnegger G, Schrabmair W, Optimum binary cut-off threshold of a diagnostic test: comparison of different methods using Monte Carlo technique, BMC Med. Inf. Decis. Making 14 (2014) 99.
- [15] Pishko A, Cuker A, Heparin-induced thrombocytopenia, in: Transfusion Medicine and Hemostasis, Elsevier Science, 2019, pp. 627–680, ISBN 978-0-12-813726-0.
- [16] Cuker A, Gimotty PA, Crowther MA, Warkentin TE, Predictive value of the 4Ts scoring system for heparin-induced thrombocytopenia: a systematic review and meta-analysis, Blood 120 (2012) 4160–4167.
- [17] Linkins LA, Dans AL, Moores LK, Bona R, Davidson BL, Schulman S, Crowther M, Treatment and prevention of heparin-induced thrombocytopenia: antithrombotic therapy and prevention of thrombosis, 9th ed: American College of Chest Physicians evidence-based clinical practice guidelines, Chest 141 (2012) e495S–e530S.
- [18] Cuker A, Heparin-induced thrombocytopenia (HIT) in 2011: an epidemic of overdiagnosis, Thromb. Haemost. 106 (2011) 993–994.
- [19] Warkentin TE, Arnold DM, Nazi I, Kelton JG, The platelet serotonin-release assay, Am. J. Hematol. 90 (2015) 564–572.
- [20] Warkentin TE, Sheppard JI, Moore JC, Sigouin CS, Kelton JG, Quantitative interpretation of optical density measurements using PF4-dependent enzyme-immunoassays, J. Thromb. Haemost. 6 (2008) 1304–1312.
- [21] Nazi I, Arnold DM, Moore JC, Smith JW, Ivetic N, Horsewood P, Warkentin TE, Kelton JG, Pitfalls in the diagnosis of heparin-induced thrombocytopenia: a 6-year experience from a reference laboratory, Am. J. Hematol. 90 (2015) 629–633.
- [22] Chan CM, Woods CJ, Warkentin TE, Sheppard JI, Shorr AF, The role for optical density in heparin-induced thrombocytopenia: a cohort study, Chest 148 (2015) 55–61.
- [23] Baroletti S, Hurwitz S, Conti NAS, Fanikos J, Piazza G, Goldhaber SZ, Thrombosis in suspected heparin-induced thrombocytopenia occurs more often with high antibody levels, Am. J. Med. 125 (2012) 44–49.
- [24] Nellen V, Sulzer I, Barizzi G, Bernhard L, Alberio L, Rapid exclusion or confirmation of heparin-induced thrombocytopenia, Haematologica 97 (2012) 89–97.
- [25] Pearson MA, Nadeau C, Blais N, Correlation of ELISA optical density with clinical diagnosis of heparin-induced thrombocytopenia: a retrospective study of 104 patients with positive anti-PF4/heparin ELISA, Clin. Appl. Thromb. Hemost. 20 (2014) 349–354.
- [26] Greinacher A, Juhl D, Strobel U, Wessel A, Lubenow N, Selleng K, Eichler P, Warkentin TE, Heparin-induced thrombocytopenia: a prospective study on the incidence, platelet-activating capacity and clinical significance of antiplatelet factor 4/heparin antibodies of the IgG, IgM, and IgA classes, J. Thromb. Haemost. 5 (2007) 1666–1673.
- [27] McFarland J, Lochowicz A, Aster R, Chappell B, Curtis B, Improving the specificity of the PF4 ELISA in diagnosing heparin-induced thrombocytopenia, Am. J. Hematol. 87 (2012) 776–781.
- [28] Ruf KM, Bensadoun ES, Davis GA, Flynn JD, Lewis DA, A clinical-laboratory algorithm incorporating optical density value to predict heparin-induced thrombocytopenia, Thromb. Haemost. 105 (2011) 553–559.
- [29] MATLAB and Statistics Toolbox Release 2017b, The MathWorks, Inc., Natick, Massachusetts, United States.
- [30] Altman DG, Royston P, The cost of dichotomizing continuous variables, BMJ 332 (2006) 1080.
- [31] Luo Y, Szolovits P, Dighe AS, Baron JM, Using machine learning to predict laboratory test results, Am. J. Clin. Pathol. 145 (6) (2016) 778–788.
- [32] Huang R, McEvoy DS, Baron JM, Dighe AS, Iron studies and transferrin, a source of ordering confusion highly amenable to clinical decision support, Clin. Chim. Acta 510 (2020) 337–343.
- [33] Chi C-L, Street WN, Katz DA, A decision support system for cost-effective diagnosis, Artif. Intell. Med. 50 (2010) 149–161.
- [34] Mathias PC, Turner EH, Scroggins SM, Salipante SJ, Hoffman NG, Pritchard CC, Shirts BH, Applying ancestry and sex computation as a quality control tool in targeted next-generation sequencing, Am. J. Clin. Pathol. 145 (3) (2016) 308–315.
- [35] Tomašev N, et al., A clinically actionable approach to continuous prediction of future acute kidney injury, Nature 572 (2019) 116–119.
- [36] Henry KE, Hager DN, Pronovost PJ, Saria S, A targeted real-time early warning score (TREWScore) for septic shock, Sci. Transl. Med. 7 (2015) 299ra122.