Author manuscript; available in PMC: 2017 Jun 1.
Published in final edited form as: Crit Care Med. 2016 Jun;44(6):e432–e442. doi: 10.1097/CCM.0000000000001623

A One-Nearest-Neighbor Approach to Identify the Original Time of Infection using Censored Baboon Sepsis Data

Li Ang Zhang 1, Robert S Parker 1,2,3,5, David Swigon 4, Ipsita Banerjee 1,3,5, Soheyl Bahrami 6, Heinz Redl 6, Gilles Clermont 1,2,3,5
PMCID: PMC5297595  NIHMSID: NIHMS753520  PMID: 26968022

Abstract

Objective

Sepsis therapies have proven to be elusive due to the difficulty of translating biologically sound and effective interventions in animal models to humans. A part of this problem originates from the fact that septic patients present at various times after the onset of sepsis while the exact time of infection is controlled in animal models. We sought to determine whether data mining longitudinal physiological data in a non-human primate model of E. coli-induced sepsis could help inform the time of onset of infection.

Design and Setting

A nearest-neighbors approach was used to backcast the time of onset of infection in animal models of sepsis. Animal data were censored to simulate prospective monitoring at any moment along the septic infection. The censored data were compared against an uncensored database to find the most similar animal and thereby estimate the infection onset time. Accuracy was assessed by leave-one-out cross-validation. Biomarker selection was performed based on criteria of estimation accuracy and/or ease-of-measurement.

Subjects

Retrospective data from 33 septic baboons (Papio ursinus) subjected to E. coli infusion. Validation was performed using 14 pigs that were subjected to surgically-induced peritonitis with feces and 22 pigs that were subjected to LPS infusion.

Measurements and Main Results

The presence of uniquely changing biomarkers during septic infection enabled the estimation of infection onset time in the datasets. Various combinations of temporal biomarkers such as white blood cell count, oxygen content, mean arterial pressure, and heart rate yielded estimation accuracies of up to 97.8%. The use of temporal vital signs and a single measurement of serum biomarkers yielded highly accurate estimates without the need for invasive measurements. Validation in the pig data revealed similar results despite the heterogeneity of multiple experimental cohorts. This suggests that the method may be effective if sufficiently similar subjects are present in the database.

Conclusions

One nearest neighbor analysis showed promise in accurately identifying the onset of infection given a database of known infection times and of sufficient breadth. We suggest that this approach is ready for evaluation within the clinical setting using human data.

Keywords: sepsis, bacterial infection, data mining, cluster analysis, biomarkers

Introduction

Sepsis is a severe acute physiological response that results from the systemic effects of inflammatory mediators in response to infection. Each year, sepsis afflicts millions worldwide with extensive morbidity and mortality (1). Treatment of sepsis has proven to be a challenge due to the fast-changing dynamics and multiple trajectories and outcomes of the syndrome (2, 3). Progress in sepsis therapeutics has generally been slow. Since 1982, clinical trials for pharmacological interventions in sepsis showed either no effect or a negative effect on mortality (4). There is currently no FDA-approved drug to treat septic shock. Frustratingly, therapies that have shown promise in animal models have often failed in human trials due to a lack of efficacy or due to safety concerns (5, 6). Part of this translational disconnect originates from the fact that septic patients are admitted into the hospital at various times after the onset of sepsis symptoms (7, 8). Data collection and treatment initiation are performed relative to the time of enrollment, not time of disease onset. Because of the dynamic nature of the physiologic response to infection, both the nature and the timings of potential interventions are likely to be determinant factors in influencing outcome (2, 9). For example, an abrogation of the early TNF response increased mortality in some animal models, while most pre-clinical treatment models showed benefit (10, 11). This may explain, at least in part, why many sepsis therapies that showed promise from animal models, where timing is known, failed in human studies.

Several mathematical and statistical models have been proposed to elucidate the fast-acting dynamics of sepsis and to offer predictions regarding the potential effects of interventions and their timing (2, 3, 12, 13). However, training and fitting parameters of these models for individual patients is challenging, in part due to the aforementioned timing issues with human data collection. For the purposes of such models, a population mean is typically computed from data pooled at time points relative to time of enrollment. Naïve pooling becomes a problem because these data points are located at various points along temporal trajectories of individual patients. Methodological obstacles in developing robust models, such as inter-individual variability in timing and response, combined with the research community's lack of familiarity with such computational tools, have delayed their adoption as a core part of the design of clinical trials of sepsis.

Identifying the time of onset of infection offers two advantages in human sepsis research. First, it potentially enables the revisiting of previously failed trials with the purpose of analyzing, a posteriori, possible relationships between the elapsed time from onset and effectiveness. Second, identifying onset time enables patient biomarker data to be shifted relative to the time of infection, thereby allowing more effective translation between animal results and human expectations. Further, mathematical models can be properly trained and provide more accurate predictions if mechanistically-based, biomarker-driven interventions are contemplated. We present an approach that, at least in three animal models, accurately estimates the time of onset of an inflammatory challenge from commonly obtained measurements.

Methods

Datasets

A baboon sepsis dataset was retrospectively analyzed for this work (14). The original experimental design was to investigate the therapeutic effects of a nitric oxide synthase (NOS) inhibitor on septic shock. Thirty-three baboons of the species Papio ursinus were sedated and 2 × 10⁹ colony forming units/kg of E. coli were infused intravenously into each subject over two hours. Fluid resuscitation and antibiotic therapy were provided to all subjects throughout the experiment. The proposed NOS inhibitor treatment began after hour 12 on sixteen subjects. Animals were treated in accordance with National Institutes of Health guidelines. The experimental protocol was reviewed and approved by the Institutional Animal Care and Use Committee of Biocon Research Laboratories, Pretoria, South Africa. Seventy-three biomarkers were obtained as time series for each baboon including vital signs, arterial blood gases and lactate, hemodynamic parameters, complete blood counts and differential, and biochemistry. Baseline measurements were taken 30 minutes prior to E. coli infusion. Additional measurements were collected at specified times throughout the experiment. Subjects that survived the experiment had final measurements taken prior to sacrifice. Biomarkers that were intermittently measured throughout the experiment were eliminated from the analysis. This reduced the number of biomarkers to 29, where measurements were available for all baboons at hours −0.5, 0, 0.5, 1, 2, 3, 4, 5, 6, 11, and 12, where hour 0 marked the beginning of the E. coli infusion. This time point was considered the true time of onset of infection. Time points past 12 hours post-infection were not utilized in this analysis. Baboons in the sham and treatment groups were combined for analysis. The biomarkers evaluated herein are listed in Table 1.

Table 1. Biomarker Acronym Dictionary.

Dictionary of biomarker acronyms. The third column indicates how the biomarker was measured within the baboon data. The right most column indicates the feasibility of measuring the biomarker in patients within a clinical setting. (N indicates noninvasive-to-measure, M indicates minimally invasive-to-measure, I indicates very invasive-to-measure)

Acronym | Meaning | Method of Measurement in Baboon Data | Feasibility within human patients
WBC | White blood cell count | Arterial and mixed venous blood sample | M
HR | Heart rate | Straightforward | N
HCO3A | Bicarbonate | Arterial blood gas analysis | M
SVR | Systemic vascular resistance | Arterial catheter | I
HB | Hemoglobin | Arterial and mixed venous blood sample | M
CO | Cardiac output | Thermal-dilution technique with Swan-Ganz catheter | I [1]
CI | Cardiac index | Calculated from CO and body surface area | N
CcO2 | Capillary oxygen content | Calculated from APO2 | I
ABEA | Arterial base excess | Calculated from HCO3A and pHA | M
MAP | Systemic arterial pressure | Arterial catheter | N [2]
PVR | Pulmonary vascular resistance | Arterial catheter | I
PaO2 | Arterial oxygen tension | Arterial blood gas analysis | M
RBC | Red blood cell count | Arterial and mixed venous blood sample | M
CaO2 | Arterial oxygen content | Calculated from arterial and mixed venous blood sample | I
PLT | Platelet count | Arterial and mixed venous blood sample | M
APO2 | Alveolar oxygen tension | Arterial catheter | M
HCT | Hematocrit | Arterial and mixed venous blood sample | M
PWP | Pulmonary wedge pressure | Pulmonary artery catheter | I
RAP | Central venous pressure | Arterial catheter | I
PaCO2 | Arterial carbon dioxide tension | Arterial blood gas analysis | M
TEMP | Central blood temperature | Swan-Ganz catheter | N [2]
O2DEL | Oxygen delivery | Calculated from respirometry and AaDO2 | I
SATAO2 | Arterial oxygen saturation | Calculated from arterial blood sample | N [2]
MPAP | Mean pulmonary artery pressure | Arterial catheter | I
RR | Respiratory rate | Straightforward | N
HOROW | Horowitz index | Calculated from PaO2 and fraction of inspired oxygen | M
QUOTIENT | Respiratory quotient | Respirometry | N
PHA | Arterial pH | Arterial blood gas analysis | M
AaDO2 | Alveolar-arterial oxygen difference | Calculated from PaO2 and APO2 | I

[1] CO can be estimated via an ultrasound technique, but this technique is not widely adopted.

[2] Widely accepted noninvasive methods exist to obtain or closely estimate this value.

For validation of the method, two pig datasets were used. The first dataset contained 14 pigs that were subjected to surgically induced peritonitis, of which 7 subjects received a superoxide dismutase treatment. Measurements were collected at 2, 4, 6, 8, 10, 12 hours after abdominal closure. Both groups were combined during analysis because trajectories between groups were not significantly different. The second dataset included 22 pigs subjected to one hour LPS infusions: 12 subjects received 1 µg/kg/h and 10 received 10 µg/kg/h. Half of the animals received biliverdin. Measurements were collected at 0, 1, 2, 3, 4, 5, 7 hours after the start of LPS infusion. Despite significant differences in trajectories between LPS doses, all groups were combined during analysis to test the method’s performance on a non-homogenous database. In both porcine studies, baboon-comparable biomarkers were utilized.

One Nearest Neighbor Analysis

Figure 1 provides an illustration of the process to estimate the time of onset of infection for a given subject. The temporal trajectories of biomarkers for this subject were left censored at all possible time points to simulate prospective monitoring at any given moment along the subject’s sepsis trajectory. Trajectories were right censored to 1, 2, 3, or 4 consecutive points to simulate a time period of monitoring. The one-nearest-neighbor method compared the subject’s subtrajectory against a database of equal-length subtrajectories from the remaining 32 subjects, identified the most similar subtrajectory, and assigned the time of onset as the known time of onset for the most similar subtrajectory.

Figure 1.

An entire white blood cell count trajectory (normalized to healthy baseline) is shown for a baboon and is censored prior to testing the nearest neighbor method. Left censoring (left shaded area) was performed to simulate the passage of time between the onset of sepsis and the first measurement taken at simulated hospital enrollment. All subsequent time points were renumbered to simulate clinical time points where data is relative to hospital enrollment time. Right censoring (right shaded area) was performed to emulate sparseness of human data. In this case, measurements ended at 2 hours post enrollment for a total of 3 subsequent measurements (right censor level 3).
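The censoring scheme described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `censored_windows` and the parameter `first_arrival` are hypothetical, and the sketch assumes that "consecutive points" means consecutive sampled time points on the reported grid, regardless of the gaps between them.

```python
# Sampling times (hours) reported for the baboon data; hour 0 is the start of infusion.
TIMES = [-0.5, 0, 0.5, 1, 2, 3, 4, 5, 6, 11, 12]


def censored_windows(values, times, n_points, first_arrival=0.5):
    """Return (true_elapsed_time, window) pairs for every simulated arrival.

    Left censoring: points before the simulated arrival are discarded.
    Right censoring: only `n_points` consecutive samples are kept.
    `first_arrival` (an assumption) is the earliest simulated enrollment time.
    """
    windows = []
    for start, t in enumerate(times):
        if t < first_arrival:
            continue  # arrival can only happen after infection onset
        end = start + n_points
        if end > len(values):
            break  # too few samples remain for this right censor level
        windows.append((t, values[start:end]))
    return windows
```

With the 11 reported sampling times and a right censor level of one point, this yields nine simulated arrival times (0.5 hr through 12 hr); longer windows leave fewer possible arrivals because late arrivals run out of data.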

Suppose a three hour WBC data segment was available for a study subject. A Euclidean distance was calculated between this segment and all possible three consecutive WBC points in the database. Figure 2 conceptualizes this methodology. This process was repeated for each additional biomarker and their distances were summed. The three-hour length subtrajectory in the database with the shortest distance to the study subject’s was identified as the nearest-neighbor. The first time point of this subtrajectory was used to estimate the study subject’s elapsed time since infection. Using more than one nearest neighbor was tested, but it did not improve the accuracy of results (data not shown). A glossary of terms and detailed procedures used are provided in the Supplemental Digital Content.

Figure 2.

The normalized and censored white blood cell count (WBC) trajectory from Figure 1 was compared against the WBC database of the remaining 32 baboons (normalized to respective healthy baseline). The 3-point WBC trajectory was compared against every group of three sequential points in the database by calculating a Euclidean distance. Each type of line visualized represents one such comparison. If additional biomarkers were included for analysis, the Euclidean distances from all biomarkers were summed.
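The search itself can be sketched as follows. This is an illustrative implementation under stated assumptions: the names `nearest_onset`, `query`, and `database` are hypothetical, trajectories are assumed to be already normalized, and every database subject is assumed to share the same sampling grid `times`.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_onset(query, database, times):
    """One-nearest-neighbor estimate of elapsed time since infection.

    query:    {biomarker: [values]} for the censored study subject
    database: {subject_id: {biomarker: [full trajectory]}}
    times:    sampling times shared by all database trajectories
    Returns (estimated_elapsed_time, subject_id, distance).
    """
    n = len(next(iter(query.values())))
    best = None
    for subject, traj in database.items():
        for start in range(len(times) - n + 1):
            # Per-biomarker Euclidean distances are summed across biomarkers.
            dist = sum(
                euclidean(window, traj[name][start:start + n])
                for name, window in query.items()
            )
            if best is None or dist < best[2]:
                best = (times[start], subject, dist)
    return best
```

The first time point of the closest database subtrajectory is returned as the estimate, mirroring the description in the text. In a leave-one-out setting, the study subject would simply be omitted from `database` before the call.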

To account for scaling differences across biomarkers and for inter-baboon variability, data were normalized on a per-baboon basis. Each subject’s biomarker trajectories were normalized to its respective baseline (the value at t = −0.5 hr, i.e., 30 minutes before infusion). However, the baseline values for the study subject were unknown due to left censoring. A linear regression model for estimating baseline was therefore created for each biomarker using all time points in the database: $\text{Biomarker}_{\text{baseline}} = \beta_0 + \beta_1 \cdot \text{Biomarker}_t$. This model was used to estimate the unknown baseline values and normalize each biomarker for the study subject. The intercept, $\beta_0$, represents a population average baseline and the slope parameter, $\beta_1$, represents a subject-specific shift to this baseline.
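A minimal sketch of this baseline estimation follows. Several details are assumptions rather than facts from the text: the regression is fit by ordinary least squares on pooled (observed value, known baseline) pairs, the first observed point of the censored window is used as the predictor, and normalization is taken to mean division by the estimated baseline. The function names are hypothetical.

```python
def fit_baseline_model(pairs):
    """Ordinary least squares fit of baseline = b0 + b1 * value.

    pairs: list of (observed_value, known_baseline) tuples pooled from the
    database for one biomarker (pooling strategy is an assumption).
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def normalize(window, b0, b1):
    """Divide a censored window by its estimated (unobserved) baseline."""
    estimated_baseline = b0 + b1 * window[0]
    return [v / estimated_baseline for v in window]
```

Dividing by the estimated baseline puts every biomarker on a comparable, unitless scale, which is what allows distances from different biomarkers to be summed.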

To test estimation accuracy, we used leave-one-out cross-validation. For each of the 33 subjects, every possible left censoring within the interval [0.5 hr, 12 hr] was tested to emulate a maximum of 9 possible “arrival” times. Estimations were generated for each of these cases, and accuracy was determined by dividing the number of correct estimations by the total number of estimations generated. An estimated infection time within a tolerance of ± one time point from the actual infection time was deemed correct.
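The scoring rule can be illustrated with a short sketch. It assumes that the tolerance of ± one time point is measured in steps on the sampling grid rather than in hours, which is one reading of the text; `is_correct` and `accuracy` are hypothetical names.

```python
# Sampling times (hours) reported for the baboon data.
TIMES = [-0.5, 0, 0.5, 1, 2, 3, 4, 5, 6, 11, 12]


def is_correct(estimated_time, true_time, times=TIMES, tolerance=1):
    """True if the estimate lies within `tolerance` grid steps of the truth."""
    return abs(times.index(estimated_time) - times.index(true_time)) <= tolerance


def accuracy(estimates):
    """Fraction of correct estimates.

    estimates: list of (estimated_time, true_time) pairs, one per
    left-censored case across all held-out subjects.
    """
    correct = sum(is_correct(est, true) for est, true in estimates)
    return correct / len(estimates)
```

Under this reading, an estimate of 6 hr for a true elapsed time of 11 hr counts as correct because the two are adjacent on the sampling grid, whereas an estimate of 3 hr for a true time of 5 hr does not.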

In Silico Experiments

A combinatorial search was performed in order to find the set of biomarkers that yielded the highest accuracy in estimating time of infection while minimizing the number of invasive clinical measurements required. The majority of the 29 biomarkers within the data set were the result of invasive measurements and some of these are difficult to collect from human patients. To improve the clinical feasibility of the method, the search was performed involving single point measurements of minimally invasive biomarkers, i.e., blood samples and vital signs.

The first accuracy experiment tested the individual estimation capacity for time of onset of each of the 29 biomarkers across various levels of right censoring. The best biomarkers were identified by calculating the mean accuracy across the four right censoring durations and selecting the top ten. Additionally, the null hypothesis was tested by making estimations based on randomly generated trajectories sampled from a zero-mean log-normal distribution. The second accuracy experiment exhaustively searched all possible combinations of the previously identified top ten biomarkers. A combinatorial search over all C(10, n) subsets, where n ∈ {2, 3, …, 10}, was performed to identify the best n-biomarker combinations for estimating infection time. This search was performed for each of the right censoring options.
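The exhaustive search is small enough to enumerate directly. The sketch below is illustrative: `best_subset` and the `score` callback are hypothetical, and in practice `score` would return the leave-one-out accuracy for a given biomarker subset. The list `TOP_TEN` is read off the first ten rows of Table 2.

```python
from itertools import combinations
from math import comb

# Top ten baboon biomarkers by mean accuracy, as ordered in Table 2.
TOP_TEN = ["WBC", "HR", "HCO3A", "SVR", "HB", "CO", "CI", "CCO2", "ABEA", "MAP"]


def best_subset(biomarkers, score, sizes=range(2, 11)):
    """Exhaustively score every subset of the given sizes.

    score: callable mapping a tuple of biomarker names to an accuracy
    (a placeholder for the leave-one-out evaluation).
    Returns (best_subset, best_score); ties keep the first subset found.
    """
    best = None
    for n in sizes:
        for subset in combinations(biomarkers, n):
            s = score(subset)
            if best is None or s > best[1]:
                best = (subset, s)
    return best


# Number of subsets evaluated per right censor level: sum of C(10, n), n = 2..10.
n_subsets = sum(comb(10, n) for n in range(2, 11))  # 1013
```

Because only 1013 subsets exist per right censor level, brute-force enumeration is feasible and no heuristic feature selection is needed.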

The first feasibility experiment tested the ability of a mixture of time series biomarkers and single point measure biomarkers to estimate infection time. Vitals heart rate (HR), mean arterial blood pressure (MAP), temperature (TEMP), oxygen saturation (SATAO2), and respiratory rate (RR) are noninvasive-to-measure and were made available in time series. Minimally invasive biomarkers white blood cell count (WBC), PaO2, and platelet count (PLT) were chosen due to their diagnostic abilities as listed in the Surviving Sepsis Guidelines (1). Additionally, the top three minimally invasive biomarkers from the first accuracy experiment were included as well. All combinations of these vitals and minimally invasive biomarkers were tested for their infection time estimation accuracy. This experiment was repeated for right censoring at 2, 3, and 4 hours. The second feasibility experiment further explored this combination of time series biomarkers and single-point biomarkers by comparing two diagnostic panels that can be realistically performed at the time of patient enrollment. Two minimally invasive diagnostic panels were chosen: arterial blood gas test (yielding: arterial base excess [ABEA], PaO2, arterial bicarbonate [HCO3A]) and blood analysis (yielding: WBC, PLT, HB). Similar to before, vitals were available in time series and this panel of biomarkers was available at the time of simulated enrollment for a given emulated patient. All combinations of the aforementioned vitals (except for RR) were tested with data from either or both diagnostic panels. RR was excluded because it was not among the top performers in the previous experiment. This experiment was repeated for right censoring at two, three, and four hours.
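One way to combine time-series vitals with single-point blood measurements is to add a point-wise term to the summed distance. The sketch below is an assumption about how such a mixed distance could be formed, not a description of the authors' implementation; `mixed_distance` and its arguments are hypothetical, and the point measurement is assumed to be taken at the first time point of the window (simulated enrollment).

```python
import math


def mixed_distance(query_series, query_points, candidate, start):
    """Distance between a study subject and one database subtrajectory.

    query_series: {vital: [values]} observed in time series
    query_points: {biomarker: value} measured once at simulated enrollment
    candidate:    {name: [full trajectory]} for one database subject
    start:        index of the candidate subtrajectory's first time point
    """
    n = len(next(iter(query_series.values())))
    dist = 0.0
    for name, window in query_series.items():
        ref = candidate[name][start:start + n]
        dist += math.sqrt(sum((q - r) ** 2 for q, r in zip(window, ref)))
    for name, value in query_points.items():
        # A single measurement reduces the Euclidean distance to an absolute difference.
        dist += abs(value - candidate[name][start])
    return dist
```

Minimizing this quantity over all database subjects and all values of `start` gives the nearest neighbor, exactly as in the all-time-series case.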

For validation, the first accuracy experiment (single biomarker search) and the first feasibility experiment (vitals+1 search) were repeated on each of the pig datasets. The biomarkers used in those experiments were selected to be comparable to those of the baboons.

Results

Accuracy Experiments

Table 2 shows the infection time estimation accuracy of individual biomarkers. The first right censor duration tested only a single time measurement and yielded low accuracies throughout the table. Accuracy increased as more of the study subject’s temporal points were included in the search. Biomarkers were sorted by decreasing mean accuracy across the four right censor durations. WBC was the top biomarker in all columns and yielded a maximum estimation accuracy of 93.1% when using four sequential hourly points. Jackknife resampling of the data revealed that the standard deviation of accuracies across biomarkers and right censor durations had a mean of 0.7%. No entry within Table 2 fell below the null accuracy for its right censor level. The mean null hypothesis accuracy increased with higher right censor levels due to the shrinking of estimation possibilities.

Table 2. Single biomarker prediction accuracy.

Time-of-infection estimation accuracy over varying right censor values (temporal durations). Results were generated with available biomarkers from the baboon data and then sorted by mean accuracy. The null hypothesis, tested with randomized biomarkers, performed worse than all of the tested biomarkers. This method was repeated on the two porcine data sets for validation. Only 3 right censor durations were tested due to the sampling rate and duration limitations of those experiments.

Biomarker | Baboon Accuracy (R 1, R 2, R 3, R 4, Mean) | Pig Peritonitis Accuracy (R 1, R 2, R 3, Mean) | Pig LPS Accuracy (R 1, R 2, R 3, Mean)
WBC 51.80% 67.00% 83.00% 93.10% 73.70% 60.70% 58.60% 76.80% 65.40% 46.50% 64.40% 73.40% 61.40%
HR 42.10% 58.20% 74.60% 84.80% 65.00% 64.60% 83.80% 87.00% 78.50% 47.20% 61.00% 69.10% 59.10%
HCO3A 48.50% 56.20% 67.80% 75.30% 62.00% - - - - - - - -
SVR 32.70% 59.30% 72.00% 79.70% 60.90% 50.00% 57.40% 63.00% 56.80% 50.70% 63.60% 66.00% 60.10%
HB 34.80% 58.20% 70.80% 79.20% 60.80% 53.60% 67.10% 83.90% 68.20% 52.80% 73.70% 84.00% 70.20%
CO 33.60% 54.50% 68.90% 84.80% 60.50% 46.30% 51.50% 64.80% 54.20% 45.80% 55.10% 63.80% 54.90%
CI 32.10% 51.90% 72.00% 84.40% 60.10% - - - - - - - -
CCO2 33.90% 55.90% 70.80% 79.70% 60.10% - - - - - - - -
ABEA 44.80% 51.50% 65.90% 74.50% 59.20% 59.80% 73.50% 77.80% 70.40% 50.40% 63.90% 69.60% 61.30%
MAP 33.00% 50.80% 65.90% 82.70% 58.10% 43.90% 50.00% 61.10% 51.70% 53.50% 65.30% 77.70% 65.50%
PVR 38.50% 52.50% 66.70% 74.50% 58.00% - - - - - - - -
PaO2 45.20% 49.50% 63.60% 70.60% 57.20% 53.70% 51.50% 64.80% 56.60% 47.20% 49.20% 61.70% 52.70%
RBC 29.70% 49.80% 67.40% 78.80% 56.40% 52.40% 74.30% 87.50% 71.40% 54.20% 73.70% 83.00% 70.30%
CAO2 30.90% 51.20% 64.00% 77.10% 55.80% 51.20% 50.00% 57.40% 52.90% 47.90% 56.80% 57.40% 54.00%
PLT 50.90% 49.20% 56.10% 60.60% 54.20% 61.90% 70.00% 82.10% 71.30% 41.50% 50.00% 61.70% 51.10%
aPO2 40.30% 49.50% 56.40% 62.80% 52.30% - - - - - - - -
HCT 28.80% 45.50% 62.90% 70.10% 51.80% 53.60% 72.90% 91.10% 72.50% 50.00% 65.30% 81.90% 65.70%
PWP 34.50% 53.20% 54.20% 63.60% 51.40% 45.10% 64.70% 81.50% 63.80% 38.00% 46.60% 66.00% 50.20%
RAP 28.80% 46.50% 58.70% 70.60% 51.10% 43.90% 63.20% 70.40% 59.20% 41.50% 51.70% 64.90% 52.70%
PaCO2 39.40% 52.20% 51.50% 61.00% 51.00% 51.20% 50.00% 57.40% 52.90% 47.90% 56.80% 57.40% 54.00%
TEMP 32.70% 49.80% 56.10% 62.80% 50.30% 54.90% 67.60% 83.30% 68.60% 59.90% 66.10% 79.80% 68.60%
O2DEL 31.80% 41.40% 54.50% 73.60% 50.30% - - - - - - - -
SATAO2 34.50% 45.80% 53.80% 62.80% 49.20% 62.20% 58.80% 70.40% 63.80% 55.60% 55.90% 73.40% 61.70%
MPAP 37.00% 43.80% 48.90% 58.00% 46.90% 59.80% 52.90% 61.10% 57.90% 47.20% 57.60% 85.10% 63.30%
RR 29.10% 35.40% 45.50% 59.30% 42.30% 50.00% 73.50% 77.80% 67.10% 36.60% 44.10% 53.20% 44.60%
HOROW 28.20% 38.00% 47.30% 54.10% 41.90% 52.44% 61.76% 72.22% 62.14% - - - -
QUOTIENT 28.20% 36.00% 43.60% 53.70% 40.40% - - - - - - - -
PHA 27.90% 38.70% 44.30% 48.10% 39.70% 42.70% 57.40% 75.90% 58.70% 55.60% 65.30% 76.60% 65.80%
AADO2 27.60% 32.30% 45.10% 51.50% 39.10% - - - - - - - -
Mean Null 26.60% 29.00% 32.30% 36.70% 31.20% 38.10% 44.70% 55.00% 45.90% 42.70% 49.10% 57.70% 49.80%

All possible combinations of the top ten biomarkers from Table 2 were tested to maximize the accuracy at each level of right censorship. The top result for each right censor level is shown in Table 3. Remarkably, single time point measurements of six biomarkers yielded roughly 70% estimation accuracy. WBC was selected in all cases. There was a strong preference given to arterial blood gas measurements, hemoglobin, and cardiovascular measurements.

Table 3. Multiple Biomarker Prediction Accuracy.

All possible n-biomarker combinations of the top 10 (baboon) biomarkers from Table 2 were tested for their estimation accuracy. The absolute best biomarker set is shown for each right censor duration under Baboon Accuracy. Validation of these biomarkers was performed on the porcine data sets. Biomarkers HCO3A, CI, and CCO2 were unavailable for the porcine data sets and were respectively substituted by pH, CO, and PaO2.

Right Censor Length | Biomarkers | Baboon Accuracy | Pig Peritonitis Accuracy | Pig LPS Accuracy
1 time point | WBC HCO3A HR SVR CI ABEA | 72.70% | 71.95% | 58.87%
2 time points | WBC HCO3A MAP HB CO CCO2 | 85.90% | 70.59% | 72.41%
3 time points | WBC HR HCO3A MAP SVR HB | 93.20% | 87.04% | 93.48%
4 time points | WBC HR MAP CO CCO2 HCO3A HB CI | 97.80% | - | -

Feasibility Experiments

The goal of the feasibility experiments was to identify a parsimonious set of biomarkers, in both time series and single-point measurements, which were minimally invasive to measure in humans and yielded a good accuracy in their ability to estimate the time of infection onset. HCO3A, hemoglobin (HB), and cardiac output (CO) were the top performing minimally invasive biomarkers in Table 2 and were included in the experiment. Table 4 shows the best results from each measurement collection (one minimally invasive biomarker plus a combination of vitals in time series), organized by right censor duration. For comparison, accuracies were calculated for each entry with vitals alone and with all biomarkers in time series. The inclusion of a point measurement had the greatest impact on accuracy for the lower right censor durations. A single value of WBC combined with two hours of HR data improved accuracy from 58.2% to 71.4%. HCO3A and WBC were consistently selected throughout the table. Point measures of biomarkers helped many entries achieve over 90% estimation accuracy.

Table 4. Prediction Accuracy of Longitudinal Vital Signs with a Single Blood Biomarker.

Results from the first feasibility experiment where vital signs in time series were combined with single point measurements (at time of simulated enrollment) of minimally invasive biomarkers to estimate infection time. The vitals TEMP, MAP, SATAO2, HR, and RR, individually or in any combination, were tested in conjunction with a single blood biomarker for their estimation accuracy. The top 3 biomarker sets for each combination and right censor duration are shown, with their accuracies listed in the “With Point” column. “No Point” shows the estimation accuracy for the vitals without the point biomarker. “Time Series Point” shows the estimation accuracy when all biomarkers (vitals and point) were available in time series.


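Right Censor | Time Series Biomarkers | Point Biomarker | No Point Accuracy | With Point Accuracy | Time Series Point Accuracy
2 time points | HR | WBC | 58.2% | 71.4% | 81.1%
2 time points | HR | HCO3A | 58.2% | 70.0% | 71.7%
2 time points | MAP | HCO3A | 50.8% | 69.4% | 70.0%
2 time points | MAP HR | HCO3A | 70.0% | 81.1% | 81.1%
2 time points | MAP HR | WBC | 70.0% | 77.8% | 81.1%
2 time points | SATAO2 HR | WBC | 60.3% | 76.1% | 81.8%
2 time points | MAP SATAO2 HR | HCO3A | 76.8% | 81.8% | 81.5%
2 time points | TEMP MAP HR | HCO3A | 73.4% | 81.1% | 81.1%
2 time points | TEMP MAP HR | PAO2 | 73.4% | 79.1% | 81.1%
2 time points | TEMP MAP SATAO2 HR | HCO3A | 75.4% | 81.8% | 81.8%
2 time points | TEMP MAP SATAO2 HR | WBC | 75.4% | 80.1% | 82.2%
2 time points | TEMP MAP SATAO2 HR | PAO2 | 75.4% | 80.1% | 81.8%
3 time points | HR | WBC | 74.6% | 76.9% | 86.4%
3 time points | MAP | HCO3A | 65.9% | 75.8% | 81.8%
3 time points | HR | HCO3A | 74.6% | 75.4% | 78.4%
3 time points | MAP HR | HCO3A | 87.5% | 89.8% | 92.0%
3 time points | MAP HR | HB | 87.5% | 89.0% | 90.5%
3 time points | MAP HR | PAO2 | 87.5% | 88.3% | 92.8%
3 time points | MAP SATAO2 HR | HCO3A | 88.3% | 92.0% | 92.8%
3 time points | MAP SATAO2 HR | PAO2 | 88.3% | 91.3% | 93.9%
3 time points | MAP SATAO2 HR | HB | 88.3% | 89.4% | 91.7%
3 time points | TEMP MAP SATAO2 HR | HCO3A | 87.5% | 92.0% | 92.8%
3 time points | TEMP MAP SATAO2 HR | PAO2 | 87.5% | 90.5% | 93.9%
3 time points | TEMP MAP SATAO2 HR | HB | 87.5% | 89.4% | 91.3%
4 time points | MAP | HCO3A | 82.7% | 88.3% | 89.2%
4 time points | MAP | WBC | 82.7% | 86.1% | 94.4%
4 time points | HR | HCO3A | 84.8% | 85.7% | 88.7%
4 time points | MAP HR | WBC | 94.4% | 94.8% | 94.4%
4 time points | MAP HR | HCO3A | 94.4% | 94.8% | 95.2%
4 time points | MAP HR | HB | 94.4% | 93.9% | 94.4%
4 time points | MAP SATAO2 HR | HCO3A | 95.2% | 96.1% | 96.5%
4 time points | TEMP MAP HR | WBC | 93.9% | 95.2% | 93.9%
4 time points | MAP SATAO2 HR | WBC | 95.2% | 95.2% | 94.8%
4 time points | TEMP MAP SATAO2 HR | WBC | 95.7% | 95.2% | 94.8%
4 time points | TEMP MAP SATAO2 HR | HCO3A | 95.7% | 95.2% | 96.5%
4 time points | TEMP MAP SATAO2 HR | PAO2 | 95.7% | 95.2% | 97.0%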

Time series of invasive measurements did not improve accuracy, thus not warranting the extra probing of subjects. For example, two measurements of HR and WBC yielded an estimation accuracy of 81.1%. The same accuracy was achieved by using two measurements of MAP and HR along with a single measurement of HCO3A. Furthermore, the use of MAP, SATAO2, and HR + 1x HCO3A outperformed the majority of entries in each right censor category.

The second feasibility experiment tested the accuracy of using multiple single point biomarkers with time series vitals. The top three results from the combinatorial search are shown in Table 5 with results sorted based on accuracies from the “Both Panels” column. The information gained by administering both diagnostic panels aided in almost all entries of the 2 and 3 time point right censor levels, achieving maximum accuracies of 84.2% and 90.2%, respectively. However, despite the added data from the diagnostic panels, many entries from Table 4 using a single point biomarker yielded equal or higher accuracies.

Table 5. Prediction Accuracy of Longitudinal Vitals with Different Diagnostic Blood Panels.

Results from the comparison of two types of diagnostic panels: arterial blood gas (HCO3A, PaO2, ABEA) and blood analysis (WBC, PLT, HB). The vitals TEMP, MAP, SATAO2, and HR, individually or in any combination, were tested in conjunction with either or both diagnostic panels for their estimation accuracy. The top performing combinations are shown.


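Right Censor | Time Series Biomarkers | Both Panels' Accuracy | Blood Cell Panel Accuracy | Arterial Blood Gas Panel Accuracy
2 time points | MAP | 81.8% | 74.7% | 69.0%
2 time points | HR | 79.5% | 71.0% | 70.4%
2 time points | TEMP | 77.4% | 65.0% | 58.3%
2 time points | MAP HR | 82.2% | 78.8% | 78.8%
2 time points | MAP SATAO2 | 77.8% | 76.1% | 70.0%
2 time points | TEMP MAP | 77.4% | 76.1% | 70.0%
2 time points | TEMP MAP HR | 82.8% | 79.8% | 78.5%
2 time points | MAP SATAO2 HR | 82.2% | 80.5% | 78.5%
2 time points | TEMP MAP SATAO2 | 77.8% | 77.1% | 69.7%
2 time points | TEMP MAP SATAO2 HR | 84.2% | 81.1% | 77.8%
3 time points | MAP | 86.7% | 77.7% | 75.8%
3 time points | HR | 85.6% | 70.5% | 75.4%
3 time points | TEMP | 81.1% | 62.2% | 54.9%
3 time points | MAP HR | 83.7% | 84.5% | 84.5%
3 time points | TEMP MAP | 79.9% | 78.4% | 75.8%
3 time points | MAP SATAO2 | 79.6% | 80.7% | 76.9%
3 time points | TEMP MAP HR | 83.7% | 85.2% | 84.8%
3 time points | MAP SATAO2 HR | 83.7% | 86.0% | 85.2%
3 time points | TEMP MAP SATAO2 | 80.7% | 80.3% | 77.3%
3 time points | TEMP MAP SATAO2 HR | 90.2% | 86.7% | 85.2%
4 time points | MAP | 90.5% | 84.8% | 81.8%
4 time points | HR | 89.2% | 80.5% | 78.8%
4 time points | SATAO2 | 87.5% | 68.4% | 64.5%
4 time points | MAP HR | 89.6% | 95.2% | 88.7%
4 time points | MAP SATAO2 | 87.0% | 87.0% | 83.5%
4 time points | TEMP MAP | 86.2% | 85.3% | 82.7%
4 time points | TEMP MAP HR | 90.0% | 94.8% | 89.6%
4 time points | MAP SATAO2 HR | 89.6% | 95.7% | 90.9%
4 time points | TEMP MAP SATAO2 | 87.5% | 87.0% | 83.5%
4 time points | TEMP MAP SATAO2 HR | 93.5% | 95.2% | 90.9%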

Method Validation on Porcine Data Sets

Table 2 shows single biomarker accuracies for both pig experiments. Accuracies were, in general, equivalent to or higher than those of the baboons. Time of onset estimation differed slightly among the models. For example, HB, TEMP and HCT performed better in both of the pig data sets. In contrast, PLT performed similarly between the baboon and the pig LPS data, but was more informative in the pig peritonitis data. This may be the result of the differences in sepsis induction protocols across the three models and may suggest the existence of sepsis endotypes characterized by distinctive biomarker trajectories.

The vitals+1 biomarker search on the porcine data revealed MAP, TEMP, and SATAO2 to provide highly accurate estimates (LPS: 70 to 90+%, peritonitis: 80 to 90+%) when used in conjunction with a single WBC or PHA measurement. Detailed porcine results from this search are available in Supplemental Tables 1 and 2.

Discussion

A one-nearest-neighbor approach was selected to tackle the problem of identifying infection time from left and right censored data. We found that serial measurements of non-invasive vital signs, combined with a single minimally invasive, yet routinely performed, blood test, yielded good accuracy in identifying the time of onset of infection in an experimental baboon model of sepsis. The method was further confirmed in two additional animal models. The one-nearest-neighbor method was developed based on the hypothesis that cohorts of septic subjects exist with similar characteristic biological responses. The baboon study chosen for analysis represented a homogenous cohort in which all subjects exhibited leukopenia following the E. coli infusion. Pig validation sets did not share this feature. Given a censored and normalized trajectory for a study baboon, one-nearest-neighbor identified the most similar trajectory within the database. The temporal information of this trajectory provided the time-of-infection estimate for the study baboon. To the best of our knowledge, there are currently no proposed methods to identify time of onset of an inflammatory challenge for sepsis or non-sepsis data. A related paper was found to use a variable k-nearest-neighbor approach for the purposes of early diagnosis in neonatal sepsis (15). Another recent paper utilized the nearest-neighbor approach to identify novel genes associated with sepsis severity (16). Nearest-neighbor is a popular nonparametric approach in data mining and was selected here to find similarities among biomarker trajectories.

The exhaustive biomarker search revealed that certain biomarkers provide highly accurate estimates of the time of onset of infection. WBC, HR, HB, MAP, and HCO3A were the top performers in Table 2 and appeared frequently in Table 3. In contrast to CO or SVR, both of which require the insertion of an arterial catheter, WBC, HR, HB, MAP, and HCO3A are relatively easy to measure. The modest accuracy increase from the inclusion of CO or SVR is of doubtful clinical significance and did not seem to justify the invasive measurement. This suggested that biomarkers requiring more invasive measurement might be unnecessary and that the time of infection onset can be estimated with easily measured, patient-friendly biomarkers. It is also interesting to note the consistent appearance of many biomarkers across animal models.
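A search of this kind can be sketched as a loop over biomarker combinations, each scored by leave-one-out cross-validation. The sketch below is illustrative only and rests on several assumptions: the function names, the window width, the rule that an estimate counts as correct when it falls within `tolerance` hours of the true offset, and the synthetic data (an informative "WBC" series and an uninformative constant series labeled "FLAT") are hypothetical and do not come from the study.

```python
from itertools import combinations


def window_distance(query, cand, markers, q_start, c_start, width):
    """Summed squared distance over all selected biomarkers for one alignment."""
    total = 0.0
    for m in markers:
        q = query[m][q_start:q_start + width]
        c = cand[m][c_start:c_start + width]
        total += sum((a - b) ** 2 for a, b in zip(q, c))
    return total


def loo_accuracy(database, markers, width=2, tolerance=0):
    """Leave each animal out, censor it at every offset, and back cast the offset."""
    hits = trials = 0
    for held_out, query in database.items():
        for true_offset in range(len(query[markers[0]]) - width + 1):
            best_offset, best_dist = None, float("inf")
            for other, cand in database.items():
                if other == held_out:
                    continue  # the held-out animal may not match itself
                for off in range(len(cand[markers[0]]) - width + 1):
                    d = window_distance(query, cand, markers, true_offset, off, width)
                    if d < best_dist:
                        best_offset, best_dist = off, d
            if abs(best_offset - true_offset) <= tolerance:
                hits += 1
            trials += 1
    return hits / trials


def exhaustive_search(database, all_markers, max_size=2):
    """Score every biomarker combination of up to max_size members."""
    return {
        combo: loo_accuracy(database, combo)
        for k in range(1, max_size + 1)
        for combo in combinations(all_markers, k)
    }


# Synthetic data: WBC falls steadily after infection, FLAT never changes.
database = {
    "A": {"WBC": [12.0, 10.0, 8.0, 6.0, 4.0, 2.0], "FLAT": [1.0] * 6},
    "B": {"WBC": [12.1, 10.1, 8.1, 6.1, 4.1, 2.1], "FLAT": [1.0] * 6},
    "C": {"WBC": [11.9, 9.9, 7.9, 5.9, 3.9, 1.9], "FLAT": [1.0] * 6},
}
scores = exhaustive_search(database, ["WBC", "FLAT"])
for combo, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(combo, round(acc, 3))
```

On these synthetic data the steadily changing series locates every window correctly, whereas the constant series cannot distinguish one hour from another. This mirrors the observation that uniquely changing biomarkers carry the timing information.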

Most clinicians do not perform consecutive hourly blood sample tests, and only noninvasive biomarkers are likely to be acquired in time series. Many biomarkers in the baboon data cannot be obtained hourly (or ever) in human patients for practical reasons (no commercially available assay, slow turnaround) or because of unjustifiable expense. Patients with suspected sepsis upon enrollment sometimes have a panel of diagnostic tests administered, including arterial blood gas tests and a WBC measurement. These measurements are typically taken only once, at the time of enrollment. In contrast, vital signs such as HR, MAP, TEMP, and oxygen saturation are continuously monitored and universally measured in time series. The feasibility experiments addressed these issues by using data that are clinically obtainable from humans and yielded interesting results. Specifically, they showed that: (i) the addition of a minimally invasive point measurement generally improved time-of-infection accuracy over the use of time series vitals alone, (ii) taking minimally invasive measurements in time series may be unnecessary for estimating time-of-infection, and (iii) there existed a data saturation limit beyond which one-nearest-neighbor did not benefit from additional data.
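Finding (i) can be illustrated with a small sketch of a distance that combines a vital-sign window with one point measurement. The function name `vitals_plus_one_offset`, the alignment of the point measurement with the first hour of the window (taken here to represent enrollment), the unweighted sum of squared differences, and all numerical values are assumptions for illustration; a single reference animal is used for brevity.

```python
def vitals_plus_one_offset(query_vitals, reference, point=None):
    """Estimate hours since infection at the start of the monitoring window.

    query_vitals -- {vital_name: list of hourly values} observed for the subject
    reference    -- {name: full trajectory from infection} for one database animal
    point        -- optional (name, value) measured once, at the window's first hour
    """
    width = len(next(iter(query_vitals.values())))
    length = len(next(iter(reference.values())))
    best_offset, best_dist = None, float("inf")
    for off in range(length - width + 1):
        d = 0.0
        for name, window in query_vitals.items():
            segment = reference[name][off:off + width]
            d += sum((a - b) ** 2 for a, b in zip(window, segment))
        if point is not None:
            name, value = point
            # Single blood measurement compared at the aligned hour only.
            d += (value - reference[name][off]) ** 2
        if d < best_dist:  # ties keep the earliest offset
            best_offset, best_dist = off, d
    return best_offset


# Hypothetical normalized reference animal: HR plateaus, WBC keeps changing.
reference = {
    "HR": [80.0, 100.0, 120.0, 120.0, 120.0, 120.0],
    "WBC": [10.0, 8.0, 6.0, 4.0, 2.0, 3.0],
}
query_vitals = {"HR": [120.0, 120.0]}

vitals_only = vitals_plus_one_offset(query_vitals, reference)
with_wbc = vitals_plus_one_offset(query_vitals, reference, point=("WBC", 2.1))
print(vitals_only, with_wbc)
```

Because the heart rate has plateaued, the vital-sign window alone matches several hours equally well and the earliest is returned. The single WBC value breaks the tie and moves the estimate to hour 4.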

The main limitation associated with this method was the small sample size of all three datasets. This method worked well for the heterogeneous pig datasets at least in part because of the uniformity of the insult within datasets, thus improving the chances of a relevant “closest looking” animal in the cohort. If the database does not contain sufficiently similar subjects to an incoming subject for which an estimation is to be made, the method might not be able to generate a meaningful estimate.

Three main challenges currently prevent this method from being directly translated to human patients. First, developing a human model requires access to a temporally rich collection of physiologic markers at baseline and following the time of infection. Second, human variability demands that the dataset be large enough to be representative of most endotypes. Third, human sepsis often occurs in conjunction with other inflammatory stressors, such as surgery or trauma. To address these challenges, a human dataset of sufficient breadth could potentially be assembled in a cohort of patients developing sepsis in association with an invasive procedure, thereby bounding the time of onset to within a few hours. The impact of the procedure itself on biomarker time series would constitute useful additional information.

While human patients exhibit high variability, we argue that there exists a finite number of sepsis endotypes and that these endotypes can be grouped into cohorts with similar clinical biomarker trajectories. Future work will employ data-mining techniques to classify these endotypes and to identify the associated biomarkers. By informing a nearest neighbor search on a sufficiently varied database with endotype-indicative biomarkers, the method should have enough discrimination to generate meaningful estimates of sepsis onset times.

Conclusion

We propose a novel application of the nearest neighbor approach to estimate the onset time of infection within artificially censored baboon and porcine sepsis data. High accuracies were achieved with varying sets of biomarkers, but some biomarkers are difficult to measure clinically in humans. A compromise was made between minimizing the invasiveness of measurement and maximizing the estimation accuracy. A feasible set of clinically-measurable biomarkers was identified; combinations of vital signs in time series and minimally invasive point measurements yielded similar but slightly lower accuracies.

There are no other approaches to estimating time-of-infection in sepsis, and the method developed here may have profound implications if successfully extended for use with human data. Septic patients could be grouped into early, middle, and late infection times and treated differently, which may play a clinically meaningful role in patient outcome. Clinical trials may be revisited or new therapeutic targets may emerge, because treatment timings can be more effectively controlled with respect to the temporal span of infection. Finally, computational models of sepsis and immunomodulatory interventions used in the design of such trials may immediately benefit from the knowledge of infection times.

Supplementary Material

Supplemental Data File (.doc, .tif, .pdf, etc.)

Acknowledgments

This research was supported by National Institutes of Health grant R01 GM105728 and by the Department of Education Graduate Assistance in Areas of National Need Fellowship P200A120195.

Li Ang Zhang’s institution received grant support from the Department of Education (DOE Graduate Assistance in Areas of National Need Fellowship P200A120195) and the National Institutes of Health (NIH R01GM105728). Drs. Clermont, Parker, and Swigon received support from the NIH (R01GM105728). Dr. Clermont has also received support from Edwards LifeSciences and Astute Medical Inc. through the University of Pittsburgh. Dr. Clermont has received royalties from UpToDate, Inc.

Dr. Clermont received support for article research from the National Institutes of Health (NIH), consulted for Edwards LifeSciences and Astute Medical Inc., and received royalties from UpToDate, Inc. His institution received grant support from the NIH and the National Science Foundation (NSF). Dr. Zhang received support for travel from the NIH (travel to ICCAI 2014) and received support for article research from the NIH and the Department of Education. His institution received grant support from the NIH (NIH R01GM105728) and the Department of Education (GAANN P200A120195). Dr. Parker received support for article research from the NIH and the Department of Education. His institution received grant support from the NIH, the Department of Education, and the NSF. Dr. Swigon received support for article research from the NIH and NSF. His institution received grant support from the NIH and NSF and received support for travel from the NIH. Dr. Bahrami disclosed employment.

Footnotes

Conflict of Interest Statement:

The remaining authors have disclosed that they do not have any potential conflicts of interest.

References

  • 1. Dellinger RP, Levy MM, Rhodes A, et al. Surviving sepsis campaign: international guidelines for management of severe sepsis and septic shock: 2012. Crit Care Med. 2013;41:580–637. doi: 10.1097/CCM.0b013e31827e83af.
  • 2. Kumar R, Clermont G, Vodovotz Y, et al. The dynamics of acute inflammation. J Theor Biol. 2004;230:145–155. doi: 10.1016/j.jtbi.2004.04.044.
  • 3. Reynolds A, Rubin J, Clermont G, et al. A reduced mathematical model of the acute inflammatory response: I. Derivation of model analysis of anti-inflammation. J Theor Biol. 2006;242:220–236. doi: 10.1016/j.jtbi.2006.02.016.
  • 4. Fink MP. Animal models of sepsis. Virulence. 2014;5:143–153. doi: 10.4161/viru.26083.
  • 5. Rittirsch D, Hoesel LM, Ward PA. The disconnect between animal models of sepsis and human sepsis. J Leukoc Biol. 2007;81:137–143. doi: 10.1189/jlb.0806542.
  • 6. Suntharalingam G, Perry MR, Ward S, et al. Cytokine storm in a phase 1 trial of the anti-CD28 monoclonal antibody TGN1412. N Engl J Med. 2006;355:1018–1028. doi: 10.1056/NEJMoa063842.
  • 7. Rivers EP, Jaehne AK, Nguyen HB, et al. Early biomarker activity in severe sepsis and septic shock and a contemporary review of immunotherapy trials: not a time to give up, but to give it earlier. Shock. 2013;39:127–137. doi: 10.1097/SHK.0b013e31827dafa7.
  • 8. Puskarich MA, Trzeciak S, Shapiro NI, et al. Association between timing of antibiotic administration and mortality from septic shock in patients treated with a quantitative resuscitation protocol. Crit Care Med. 2011;39:2066–2071. doi: 10.1097/CCM.0b013e31821e87ab.
  • 9. Cross AS, Opal SM. A new paradigm for the treatment of sepsis: is it time to consider combination therapy? Ann Intern Med. 2003;138:502–505. doi: 10.7326/0003-4819-138-6-200303180-00016.
  • 10. Moore TA, Perry ML, Getsoian AG, Monteleon CL, Cogen AL, Standiford TJ. Increased mortality and dysregulated cytokine production in tumor necrosis factor receptor 1-deficient mice following systemic Klebsiella pneumoniae infection. Infect Immun. 2003;71(9):4891–4900. doi: 10.1128/IAI.71.9.4891-4900.2003.
  • 11. Bodmer M, Fournel MA, Hinshaw LB. Preclinical review of anti-tumor necrosis factor monoclonal antibodies. Crit Care Med. 1993;21(10 Suppl):S441–S446. doi: 10.1097/00003246-199310001-00005.
  • 12. DiLeo MV, Kellum JA, Federspiel WJ. A simple mathematical model of cytokine capture using a hemoadsorption device. Ann Biomed Eng. 2009;37:222–229. doi: 10.1007/s10439-008-9587-8.
  • 13. Day J, Rubin J, Vodovotz Y, et al. A reduced mathematical model of the acute inflammatory response II. Capturing scenarios of repeated endotoxin administration. J Theor Biol. 2006;242:237–256. doi: 10.1016/j.jtbi.2006.02.015.
  • 14. Schlag G, Redl H, editors. Shock, Sepsis, and Organ Failure. Berlin, Heidelberg: Springer Berlin Heidelberg; 1999.
  • 15. Xiao Y, Griffin MP, Lake DE, et al. Nearest-neighbor and logistic regression analyses of clinical and heart rate characteristics in the early diagnosis of neonatal sepsis. Med Decis Making. 2010;30:258–266. doi: 10.1177/0272989X09337791.
  • 16. Khaenam P, Rinchai D, Altman MC, et al. A transcriptomic reporter assay employing neutrophils to measure immunogenic activity of septic patients’ plasma. J Transl Med. 2014;12:65. doi: 10.1186/1479-5876-12-65.
