Abstract
Sampling repeated clinical laboratory tests with appropriate timing is challenging because the latent physiologic function being sampled is in general nonstationary. When ordering repeated tests, clinicians adopt various simple strategies that may or may not be well suited to the behavior of the function. Previous research on this topic has been primarily focused on cost-driven assessments of oversampling. But for monitoring physiologic state or for retrospective analysis, undersampling can be much more problematic than oversampling. In this paper we analyze hundreds of observation sequences of four different clinical laboratory tests to provide principled, data-driven estimates of undersampling and oversampling, and to assess whether the sampling adapts to changing volatility of the latent function. To do this, we developed a new method for fitting a Gaussian process to samples of a nonstationary latent function. Our method includes an explicit estimate of the latent function’s volatility over time, which is deterministically related to its nonstationarity. We find on average that the degree of undersampling is up to an order of magnitude greater than oversampling, and that only a small minority of sequences are sampled with an adaptive strategy.
Introduction
Appropriately timing repeated observations of a clinical laboratory test is challenging because the physiologic function being sampled is in general nonstationary. The nonstationarity implies that optimally timing the next sample can depend on understanding the function’s recent past and guessing at its near future. Current clinical testing practice is often based on simple rules of thumb, with details that vary by clinician. Generally, if a doctor wants to monitor the value of a test over time, he will repeat the test at a standard interval felt appropriate for that particular test. If the previous observation was outside the reference range for healthy patients, or if the doctor has reason to believe that the physiologic parameter under test is likely to be unstable (because, for example, the patient has started a particular medication), then the doctor may schedule the test at a shorter interval.
Historically, research into repeat testing strategies has emphasized the cost-driven question of whether we are oversampling the physiologic function (van Walraven and Raymond 2003; Bates et al. 1998). But if the goal of testing is to monitor a patient’s physiologic state, then undersampling can cause us to miss significant changes in that state, with subsequent delays in appropriate treatment and adverse clinical outcomes.
Moreover, understanding implicit sampling strategies is important in retrospective analysis, where the goal is to investigate patterns residing in data collected over the course of routine clinical care. For example, if we want to retrospectively model a physiologic parameter as a continuous function over time, the accuracy of our model will depend on how we handle any potential dependence between the samples’ timing and the parameter that they are monitoring.
In this paper we investigate the sampling strategies implicit in sequences of laboratory test measurements, as well as the effect that those strategies have on the accuracy of our inferred models. To do this, we introduce a new method for inferring a nonstationary Gaussian process from a finite set of observations by explicitly modeling the volatility of the function over time as a step in the inference. The volatility function is used to create a warping that transforms the observations into a space in which the latent physiologic function is stationary and can be modeled with standard methods.
We define volatility more precisely below, but it can be thought of as the rate at which a measurement made at a particular time goes stale. A high volatility implies a high information rate provided by the latent physiologic function, and we will use volatility and information rate interchangeably.
We validate our methods against synthetic data and demonstrate the accuracy and limitations of recovering both the latent physiologic function and the volatility function under various sampling strategies. We then introduce the use of volatility functions to retrospectively evaluate the sampling strategies of hundreds of real sequences from four common laboratory tests against the actual behavior of the clinical variables being observed.
Nonstationary Gaussian Process Regression
Gaussian process regression is a Bayesian nonparametric method that models a set of time-dependent observations y = {y1, …, yn} made at times t = {t1, …, tn} by inferring a distribution P(f(t)|y, t) over all functions y = f(t) that could have produced the observations. The topic is a large one for which extensive tutorials are available (Rasmussen and Williams 2006).
To construct the model, we place a Gaussian process prior f(t) ~ GP(M(t), C(t, t′; θ)) over all possible functions f(t), where M(t) is the prior mean function (which for convenience we take to be zero) and C(t, t′; θ) is a covariance function with hyperparameters θ that defines the dependence between any two values f(t) and f(t′).
The predictive distribution of an unobserved value y* = f(t*) given observations {y, t} and hyperparameters θ is Gaussian, P(y* | y, t, θ) = N(μ*, σ*²), where

μ* = kᵀK⁻¹y, (1)

σ*² = κ − kᵀK⁻¹k. (2)

K is a matrix with elements Kij = C(ti, tj; θ), k is a vector with elements ki = C(ti, t*; θ), and κ = C(t*, t*; θ) is a scalar. Here μ* is the predictive mean, and σ*² the predictive variance.
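For concreteness, the predictive equations (1) and (2) can be computed in a few lines. The following sketch is our own illustration (the paper's code was written in MATLAB); the squared exponential kernel, the length scale, and the noise level are placeholder choices.

```python
import numpy as np

def sq_exp(t, tp, tau=1.0, sigma=1.0):
    """Squared exponential covariance between time vectors t and tp."""
    d = np.asarray(t, float)[:, None] - np.asarray(tp, float)[None, :]
    return sigma**2 * np.exp(-d**2 / (2 * tau**2))

def gp_predict(t, y, t_star, tau=1.0, sigma=1.0, noise=1e-6):
    """Predictive mean and variance (Equations 1 and 2) at a single time t_star."""
    K = sq_exp(t, t, tau, sigma) + noise * np.eye(len(t))  # K_ij = C(t_i, t_j)
    k = sq_exp(t, [t_star], tau, sigma)                    # k_i  = C(t_i, t*)
    kappa = sigma**2                                       # C(t*, t*)
    mu = (k.T @ np.linalg.solve(K, np.asarray(y, float))).item()   # k^T K^-1 y
    var = (kappa - k.T @ np.linalg.solve(K, k)).item()             # kappa - k^T K^-1 k
    return mu, var
```

Predicting at an observed time reproduces the observation with near-zero variance; predicting far from all observations reverts to the prior mean 0 with variance σ².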
Applications of Gaussian processes commonly use stationary covariance functions that depend only on the difference t − t′. A ubiquitous example is the squared exponential covariance function

C(t, t′; θ) = σo² exp(−(t − t′)² / (2τo²)). (3)

These kinds of covariance functions are appealing because they encode the notion that observations made close in time should be close in value. Most such functions have (at least) a length-scale hyperparameter τo that encodes the meaning of close in time, and an overall signal variance hyperparameter σo² that encodes the meaning of close in value.
However, using a constant value τo does not provide useful results for our data because the volatility of clinical laboratory values can vary over time by as much as an order of magnitude. We would like to account for this variable volatility by modeling our data using a time-varying length scale τ(t), or equivalently a time-varying volatility function v(t) = 1/τ(t).
Our method infers a posterior distribution over volatility functions v(t | y, t) simultaneously with the distribution over functions f(t | y, t). It operates by warping the observation times ti into a new space xi = w(ti) such that the yi are well modeled by a function g(x) whose covariance function C is stationary and therefore amenable to standard Gaussian process inference with a fixed τo. The overall regression function then becomes f(t) = g(w(t)).
The warp function is the integrated volatility function, w(t) = ∫₀ᵗ v(s) ds, and we model the log volatility as a Gaussian process, ln v(t) ~ GP(0, C(t, t′; θv)). The full generative model is as follows:

1. θv = {τv, σv, … } ~ P(θv)
2. θg = {τg = 1, σg = 1, … } ~ P(θg)
3. ln v(t) ~ GP(0, C(t, t′; θv))
4. w(t) = ∫₀ᵗ v(s) ds
5. g(x) ~ GP(0, C(x, x′; θg))
6. f(t) = g(w(t))

(We constrain σg = 1, τg = 1 to avoid an identifiability problem.)
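A small simulation shows how the generative steps above compose: draw ln v(t) from a GP on a dense grid, integrate it to obtain the monotone warp w(t), then draw the stationary g at the warped locations. The hyperparameter values below (τv = 2, σv = 0.8) are illustrative choices of ours, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def se_cov(x, tau, sigma):
    """Squared exponential covariance matrix with a small jitter for stability."""
    d = x[:, None] - x[None, :]
    return sigma**2 * np.exp(-d**2 / (2 * tau**2)) + 1e-8 * np.eye(len(x))

t = np.linspace(0, 10, 300)  # dense grid on which to realize the latent functions

# ln v(t) ~ GP(0, C(t, t'; theta_v)): volatility is positive by construction.
ln_v = rng.multivariate_normal(np.zeros(len(t)), se_cov(t, tau=2.0, sigma=0.8))
v = np.exp(ln_v)

# w(t) = integral of v(s) ds: strictly increasing, hence a valid time warping.
w = np.concatenate([[0.0], np.cumsum(0.5 * (v[1:] + v[:-1]) * np.diff(t))])

# g(x) ~ GP(0, C(x, x'; tau_g = sigma_g = 1)), and f(t) = g(w(t)).
f = rng.multivariate_normal(np.zeros(len(w)), se_cov(w, tau=1.0, sigma=1.0))
```

Where v(t) is large, the warped points spread out, so f varies quickly there; where v(t) is small, f varies slowly.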
Inference
Given observations y and t, we infer the marginal likelihood P(y|t), marginalized over all values of the warped locations x and hyperparameter settings θ = {θv, θg} (Algorithm 1). As part of this inference we get a posterior distribution over volatility functions v(t).
The marginal likelihood is

P(y | t) = ∫∫ P(y, x | t, θ) P(θ) dθ dx, (4)

where

P(y, x | t, θ) = P(y | x, θg) P(x | t, θv), (5)

because y is independent of t given x. Each of the two conditional probabilities in (5) is a standard marginal likelihood of a Gaussian process, which has the analytic form

ln P(y | x, θg) = −½ yᵀK⁻¹y − ½ ln |K| − (n/2) ln 2π.
We compute the double integral in (4) using MCMC draws for θ and ln v(t) (specifically, we use the method of slice sampling with surrogate data (Murray and Adams 2010)).
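Each Gaussian process marginal likelihood appearing in (5) has the standard closed form ln P(y) = −½ yᵀK⁻¹y − ½ ln |K| − (n/2) ln 2π. A numerically stable sketch (our own, using a Cholesky factorization rather than forming K⁻¹ explicitly):

```python
import numpy as np

def gp_log_marginal(y, K):
    """ln P(y) = -1/2 y^T K^-1 y - 1/2 ln|K| - n/2 ln(2 pi), via Cholesky."""
    n = len(y)
    L = np.linalg.cholesky(K)                             # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # alpha = K^-1 y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))                  # 1/2 ln|K| = sum_i ln L_ii
            - 0.5 * n * np.log(2 * np.pi))
```

This quantity is what each MCMC iteration evaluates for both factors of (5) at the current draw of x and θ.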
Lastly, we add a regularization term R[v(t)], whose weight b is tuned using human-guided search, to produce the final log likelihood L for our MCMC iterations:

L = ln P(y, x | t, θ) + R[v(t)]. (6)
Sampling Strategy Analysis
In addition to using Algorithm 1 to infer the latent functions f(t) and v(t), we also use a modified version of it to infer the sampling rate function λ(t), which models the decision to sample a clinical variable at time ti. The function λ(t) is directly analogous to v(t) but is inferred using only the sample times t. To estimate λ(t) we change Algorithm 1 Step 4 to fit the warped times with a stationary gamma process instead of a stationary Gaussian process, with appropriate modifications to the likelihood function (Lasko 2014). A gamma process models the observation times t without regard to the values y. For this inference task we used the MATLAB code accompanying (Lasko 2014); for brevity, we refer readers to that work for further details.
The sampling rate λi = λ(ti) estimates how often a variable is being measured around time ti, and the information rate vi = v(ti) estimates how quickly the variable is changing during the same period (Figure 1). We can visualize them together in rate space by plotting estimates of the sampling rate λi vs. the information rate vi for each ti (Figure 2). The rate space plot allows us to distinguish the various sampling strategies described below.
Figure 1.
Results of our inference algorithm applied to our highly nonstationary synthetic dataset sampled under two different strategies. A regular sampling strategy (194 samples) produces general oversampling with periods of undersampling (bottom left, intervals around t = 70 and t = 140). An adaptive sampling strategy using approximately the same number of samples (190 samples) distributes them so as to be approximately 2× oversampled everywhere (bottom right). Adaptive sampling allows for much more accurate inference of both the nonstationary latent function (top two panels) and its volatility curve (bottom panel). Thin lines: true curves, thick lines: inferred curves, shaded regions: 95% confidence intervals. Ticks at the bottom of the lowest panel are observation times.
Figure 2.
Sampling strategy appearance in rate space. Adaptive sampling at high and ideal rates is displayed in the left two panels, regular sampling at high and moderate rates in the right two. Points indicate rate values (vi, λi) for each actual sample, with ellipses indicating 95% confidence intervals. The diagonal line indicates ideal sampling. The rate angle φ is shown for one point in the third panel. The first and fourth panels are for the same data displayed in Figure 1.
We arbitrarily define λi = vi as the ideal sampling rate for a given volatility, indicated by a diagonal line on the rate space plot. Sampling below this rate (points below the line) increases the uncertainty between observations, and sampling above it (points above the line) decreases the uncertainty. The true ideal in practice would be determined by the cost of making observations vs. the cost of uncertainty between measurements.
For a given set of observations, the degree of oversampling can be reliably estimated, but the degree of undersampling is less precise, because the undersampling itself hampers our ability to estimate the true information rate. For example, aliasing effects can degrade the volatility estimate, with the inferred function appearing to vary slower than it actually does (Figure 1, top left panel), although the confidence intervals usually do still capture the truth.
Adaptive sampling is the strategy of making observations at a rate proportional to the volatility of the variable. It appears in rate space as points distributed along the line λi = mvi, for some value of m (Figure 2, left two panels). For example, using m = 1 is ideal adaptive sampling and m > 1 is adaptive oversampling.
Perfectly adaptive sampling chooses samples t such that their warped counterparts x = w(t) are equally spaced. If we approximate x and t as discrete and mapped one-to-one via w, then x is an optimal (maximum entropy) sampling of the stationary g(x) given n points (Shewry and Wynn 1987). And because the distributions P(f(t*) = y* |y, t, θ) = P(g(x*) = y*|y, x, θ) are identical at any point x* = w(t*), then the observations t are an optimal sampling of the physiologic function f(t).
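The equal-spacing property suggests a concrete recipe: given an estimate of v(t) on a grid, place points equally in warped coordinates and map them back through w⁻¹. The sketch below (our own, with a made-up volatility profile; the inverse warp is computed by interpolation) illustrates this.

```python
import numpy as np

def adaptive_sample_times(t_grid, v_grid, n):
    """Choose n sample times whose warped counterparts x = w(t) are equally spaced."""
    # w(t): cumulative trapezoidal integral of the volatility.
    w = np.concatenate([[0.0],
        np.cumsum(0.5 * (v_grid[1:] + v_grid[:-1]) * np.diff(t_grid))])
    x = np.linspace(0.0, w[-1], n)        # equal spacing in the stationary space
    return np.interp(x, w, t_grid)        # invert the monotone warp

# Illustrative profile: the variable is 10x more volatile in the first half.
t_grid = np.linspace(0, 10, 1001)
v_grid = np.where(t_grid < 5, 1.0, 0.1)
times = adaptive_sample_times(t_grid, v_grid, 23)
```

Because the first half of the interval carries about ten times the warped mass, roughly ten times as many of the chosen sample times land there.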
Regular sampling is the strategy of making observations at regular intervals, independent of the volatility of the variable. It appears in rate space as points distributed along a horizontal line (Figure 2, right two panels). Regular sampling can be both wasteful (by oversampling during periods of low volatility) and insufficient (by undersampling during periods of high volatility).
Random sampling is the strategy of making observations at times drawn independently from some probability density that does not depend on the variable being observed. In its pure form, it is not done deliberately in clinical practice, but it is a significant unintentional factor in most real-world sequences. It appears in rate space as jitter over what would otherwise be a smooth progression of points. While true random sampling may be impractical for clinical laboratory testing, it has intriguing possibilities for sampling efficiency (Candès and Wakin 2008), which we do not further explore here.
Summary Measures
In our experiments we used the following heuristic measures to quantify oversampling, undersampling, and adaptivity in aggregate.
We define the Sample Excess E and the analogous Sample Deficit D as

E = ∫ max(λ(t) − v(t), 0) dt, (7)

D = ∫ max(v(t) − λ(t), 0) dt. (8)

The Sample Excess E is the expected number of superfluous samples made when a variable with information rate v(t) is observed using a sampling rate λ(t). Using the total observation time Δt, we can express the normalized measures Excess Rate ER = E/Δt, with units of superfluous samples per year, and Excess Fraction EF = E/n, giving the fraction of actual samples expected to be superfluous. For deficits, the analogous Deficit Rate DR = D/Δt gives the number of missing samples per year, and the Deficit Fraction DF = D/n gives the number of missing samples per actual sample. A sequence can have a nonzero number of both excess samples made during times of oversampling and deficit samples missed during times of undersampling.
The Rate Angle

φ(t) = tan⁻¹(λ(t)/v(t)) (9)

gives the angle of the point at time t in rate space (Figure 2). The ideal sampling rate λ(t) = v(t) produces φ(t) = π/4; extreme oversampling produces φ(t) = π/2, and extreme undersampling produces φ(t) = 0.
The Adaptivity

A = (1/Δt) ∫ min(λ(t)/v(t), v(t)/λ(t)) dt (10)

is an aggregate measure of how closely the sampling rate adapts to the apparent information rate. A sequence in which all points are sampled near the ideal rate has an adaptivity A ≈ 1, extreme under- or oversampling produces A ≈ 0, and rates off by a factor of two in either direction produce A ≈ 0.5. Regular sampling produces varying values for A, but generally A < 0.5.
Lastly, the Undersampling Time (Fraction)

UF = (1/Δt) ∫ I(v(t) > λ(t)) dt, (11)

where I(·) is the indicator function, is the fraction of time for which the variable is undersampled.
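On a discrete grid, all of these summary measures can be computed directly from estimates of λ(t) and v(t). The sketch below is our own discretization; the min-ratio form of adaptivity follows the behavior described in the text (1 at the ideal rate, 0.5 when off by a factor of two, 0 at the extremes).

```python
import numpy as np

def summary_measures(t, lam, v):
    """Discretized sampling-strategy measures from rate estimates on a regular grid t."""
    dt = float(t[1] - t[0])                       # assumes equal grid spacing
    span = t[-1] - t[0]                           # total observation time
    n = lam.sum() * dt                            # expected number of samples
    E = np.maximum(lam - v, 0.0).sum() * dt       # Sample Excess
    D = np.maximum(v - lam, 0.0).sum() * dt       # Sample Deficit
    phi = np.arctan2(lam, v)                      # rate angle; pi/4 is ideal
    A = np.mean(np.minimum(lam / v, v / lam))     # adaptivity (min-ratio form)
    UF = np.mean(v > lam)                         # fraction of time undersampled
    return dict(E=E, D=D, ER=E / span, DR=D / span, EF=E / n, DF=D / n,
                A=A, UF=UF, phi_mean=phi.mean())

t = np.linspace(0, 10, 1001)
m = summary_measures(t, lam=np.full(t.shape, 2.0), v=np.ones(t.shape))
```

For a constant 2x oversampling, this yields E ≈ 10 excess samples over the interval, D = 0, EF = 0.5, A = 0.5, and UF = 0, as the definitions predict.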
Related Work
There is very little prior work investigating sampling strategies in clinical medicine. Van Walraven and Raymond (2003) found between 49 and 153 redundant samples per 100 persons per year across a handful of test types, using a fixed time interval for each type to determine the redundancy of a repeated sample. Using similar methods, Bates et al. (1998) found on average 8.6% of repeated samples of 10 different tests to be redundant.
Investigating the use of repeated samples in retrospective research, Albers (2014) studied the use of sample time, which is equivalent to a fixed warping such that the observations are equally spaced, and Lasko (2013) constructed a warping based on the inter-sample intervals in the original space, with a parameter that globally tunes the warping to a point between no warping and sample time. Our method is a more principled approach that produces a dynamic warping and a data-driven assessment of sampling strategies.
There is more prior work on methods to fit a Gaussian process to a nonstationary function (Rasmussen and Williams 2006). An early approach (Sampson and Guttorp 1992) warped two-dimensional space based on known covariances between repeated measurements of the points in that space, rather than learning the covariances from a single observation as we have done. Pfingsten, Kuss, and Rasmussen (2006) proposed a clever mixture model. Adams and Stegle (2008) used a product of Gaussian processes to model a nonstationary function, equivalent to inferring a time-varying σ(t). Several others have modeled τ(t) in conjunction with a specific covariance function that averages the length scales at the two points under consideration (Gibbs 1997; Paciorek and Schervish 2004). Unfortunately, that covariance function has counterintuitive properties that render the length scale model unsuitable for investigating sampling strategies (although still useful for modeling nonstationary functions for which the volatility function is irrelevant). Finally, the deep Gaussian processes method (Damianou and Lawrence 2013) constructs multiple successive warping layers, but it is not obvious how to constrain them to our requirement of monotonicity.
Experiments
We ran a number of experiments on sequences of both synthetic and real data. For each sequence {y, t}, we inferred distributions over f(t), g(x), and v(t) using Algorithm 1 under 1000 burn-in and 1000 sampling iterations. We separately inferred a distribution over λ(t) under the same settings. Convergence of both runs was usually within 500 iterations.
For all experiments, the specific covariance function we used in Algorithm 1 Step 4 was a composite of three terms: the constrained squared exponential over g (σg = τg = 1), a second term with learned amplitude a that models longer time-scale, less dominant variations in f, and a third term with learned variance σn² that models noise in the measurements.
Synthetic Data
We evaluated our algorithm using a synthetic dataset that mimics our clinical data. We created the latent function f(t) and volatility function v(t) using the generative model described above, and sampled f(t) using both adaptive (xi ~ Poisson(α)) and regular (ti ~ Poisson(α)) sampling strategies at various overall sampling rates. Setting α = 1 corresponds to ideal adaptive and moderate regular sampling, and increasing α increases the sampling rate.
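The two strategies differ only in which space the Poisson process lives in: event times drawn directly in t give rate-constant sampling, while event times drawn in warped space x and mapped back through w⁻¹ concentrate samples where v(t) is high. A sketch of both, with an illustrative volatility profile of our own rather than the paper's synthetic dataset:

```python
import numpy as np

rng = np.random.default_rng(1)

def poisson_times(rate, horizon, rng):
    """Event times of a homogeneous Poisson process with the given rate on [0, horizon]."""
    times, t = [], rng.exponential(1.0 / rate)
    while t <= horizon:
        times.append(t)
        t += rng.exponential(1.0 / rate)
    return np.array(times)

# Shared volatility profile and its warp on a dense grid (illustrative v(t)).
t_grid = np.linspace(0, 100, 2001)
v_grid = 0.2 + 1.8 * np.sin(t_grid / 8) ** 2
w_grid = np.concatenate([[0.0],
    np.cumsum(0.5 * (v_grid[1:] + v_grid[:-1]) * np.diff(t_grid))])

alpha = 1.0
# Regular-style strategy: draw event times directly in t at rate alpha.
t_regular = poisson_times(alpha, t_grid[-1], rng)
# Adaptive strategy: draw event times in warped space x, then map back through
# the inverse warp, so samples concentrate where v(t) is high.
x_events = poisson_times(alpha, w_grid[-1], rng)
t_adaptive = np.interp(x_events, w_grid, t_grid)
```

Both sequences have comparable expected sample counts when the mean volatility is near 1, so differences in downstream inference accuracy reflect the allocation of samples rather than their number.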
We computed the mean squared error (MSE) for our algorithm recovering f(t) and v(t) at 200 equally spaced points along the curves, and computed all summary measures (Equations 7 - 10) averaged over all MCMC draws.
We found that for a given number of samples, adaptive sampling allowed for about a 2× more accurate estimate of f(t) and up to an order of magnitude better estimate of v(t) (Table 1 and Figure 1). The summary measures behave as expected for these known strategies, and the rate space plots clearly differentiate between them (Figure 2).
Table 1.
Inference accuracy and summary measures under various sampling strategies. Accuracy is higher for adaptive strategies given a similar sample size. Summary measures behave as expected. Selected datasets are visualized in detail in Figures 1 and 2.
| Strategy | n | MSE v(t) | MSE f(t) | EF | DF | ER | DR | UF | A |
|---|---|---|---|---|---|---|---|---|---|
| Adaptive high | 190 | 0.012 | 0.040 | 0.61 | 0.00 | 213 | 0.7 | 0.09 | 0.49 |
| Adaptive ideal | 81 | 0.024 | 0.130 | 0.20 | 0.12 | 29.3 | 17.6 | 0.47 | 0.84 |
| Adaptive low | 36 | 0.094 | 0.287 | 0.09 | 1.65 | 5.7 | 108 | 0.79 | 0.58 |
| Regular high | 779 | 0.070 | 0.013 | 0.91 | 0.00 | 1294 | 0.0 | 0.00 | 0.08 |
| Regular moderate | 194 | 0.180 | 0.096 | 0.70 | 0.27 | 248 | 97.2 | 0.16 | 0.20 |
| Regular low | 56 | 0.266 | 0.854 | 0.62 | 0.89 | 63.3 | 90.8 | 0.29 | 0.24 |
Real Data
We next applied these methods to analyze sampling strategies for four common clinical laboratory tests that we expected to display a range of strategies: Uric Acid (UA), Thyroid Stimulating Hormone (TSH), Creatinine (Cr), and LDL Cholesterol (LDL). Sequences were typically several (up to 15) years in duration.
For this experiment we used patient records extracted (with IRB approval) from the deidentified mirror of our institution’s electronic medical record. For each test we selected 100 patient records uniformly at random from all patients with at least 10 existing samples for the test, and collected the complete sample sequence from each record. We preprocessed each sequence by removing a trendline fit by linear regression and then standardizing to unit standard deviation. After inference, we computed the summary measures for each test, averaged over all MCMC draws and all 100 sequences of that test.
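The preprocessing step described above can be sketched as follows (our own minimal version of the detrend-and-standardize procedure):

```python
import numpy as np

def preprocess(t, y):
    """Remove a trendline fit by linear regression, then scale to unit standard deviation."""
    slope, intercept = np.polyfit(t, y, 1)      # least-squares linear trend
    resid = y - (slope * t + intercept)         # detrended values
    return resid / resid.std()                  # unit standard deviation

t = np.arange(5, dtype=float)
y = 3.0 * t + 7.0 + np.array([0.5, -0.5, 0.0, 0.5, -0.5])  # trend plus fluctuation
z = preprocess(t, y)
```

After this step each sequence has zero linear trend and unit scale, so the inferred volatility reflects fluctuations around the trend rather than the trend itself.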
We found adaptive sampling to be an uncommon strategy for all four tests, and on average the relative fraction of missed samples DF is up to an order of magnitude greater than the fraction of redundant samples EF (Figure 3 and Table 2). There was an even greater imbalance for the rate of missed (DR) vs. extra (ER) samples per year of observation. TSH and LDL had fewer samples on average than Uric Acid and Creatinine, but they were still up to 50% more redundant. Uric Acid was the most adaptively sampled test, with about 50% of sequences having adaptivity above 0.5, and it had the lowest Excess Fraction (0.18).
Figure 3.
Sampling behavior experienced by the four tests. See text for metric semantics. Lines to the right are better for Adaptivity, lines to the left are better for other metrics. CDF: Cumulative distribution function, UA: Uric Acid, Cr: Creatinine, TSH: Thyroid Stimulating Hormone, LDL: Low Density Lipoprotein Cholesterol.
Table 2.
Overall sampling behavior for real data. Median values for the densities in Figure 3 are given. All tests are undersampled (DF, DR) to a much greater degree than they are oversampled (EF, ER), spending more than half their time in an undersampled state (UF). None experience predominantly adaptive sampling (A near 1.0), although Uric Acid does more so than the others. n̄: mean number of samples per sequence. See text for details of other measures.
| Test | n̄ | EF | DF | ER | DR | UF | A |
|---|---|---|---|---|---|---|---|
| UA | 21.2 | 0.18 | 1.88 | 1.17 | 17.0 | 0.68 | 0.49 |
| Cr | 26.4 | 0.24 | 2.37 | 1.11 | 18.2 | 0.56 | 0.34 |
| TSH | 15.2 | 0.36 | 1.57 | 0.56 | 2.7 | 0.51 | 0.39 |
| LDL | 15.6 | 0.30 | 1.88 | 0.45 | 2.9 | 0.56 | 0.39 |
Discussion
We have introduced a new method for fitting a Gaussian process to data produced by a nonstationary latent function f(t). Our method explicitly estimates the information rate or volatility v(t) of the function in order to represent its nonstationarity. We used this method in combination with our related existing method that infers nonstationary sampling rates λ(t), to compare the actual sampling of clinical laboratory tests against their theoretically optimal sampling as judged in retrospect by the volatility of the data.
To our knowledge this is the first study to compare information rates to sampling rates in this way, and to evaluate clinical laboratory sampling strategies using principled, data-driven methods. We found that while clinicians occasionally employ a strategy close to optimal adaptive sampling, in most cases they do not. This may be because actually executing adaptive sampling in practice is difficult, as well as because of other valid considerations such as scheduling convenience or greater clinical concern with abnormal values than with normal ones.
Under our mathematical criteria, we found undersampling to be up to an order of magnitude greater an effect than oversampling: clinicians undersampled on average by about 190 percent (DF), as judged by the variable’s information rate, but oversampled by only about 27 percent (EF). One limitation of our study is that some patients occasionally seek care outside our system; if their records are included in our study, then our results would be biased toward undersampling.
These numbers rely on our standard that the optimal sampling is to make the observations at exactly the rate at which they go stale, or λ(t) = v(t). This standard is defensible but arbitrary. In reality, the optimal balance may take the form λ(t) = mv(t), where m balances the immediate financial cost of oversampling against the longer-term costs to clinical decision making, health maintenance, and research of being less able to accurately estimate the patient’s physiologic state between measurements.
This work focused on retrospective analysis of existing data. It is an exciting direction of future work to be able to give prospective guidance on when to sample next, given past observations.
Acknowledgments
This work was funded by grants from the Edward Mallinckrodt, Jr. Foundation and the National Institutes of Health 1R21LM011664-01. Clinical data was provided by the Vanderbilt Synthetic Derivative, which is supported by institutional funding and by the Vanderbilt CTSA grant ULTR000445.
References
- Adams RP, Stegle O. Gaussian process product models for nonparametric nonstationarity; Proceedings of the 25th International Conference on Machine Learning, ICML ’08; New York, NY, USA: ACM. 2008.pp. 1–8. [Google Scholar]
- Albers D. Dynamical biomedical informatics; Conference on Meaningful Use of Complex Medical Data (MUCMD 2014); 2014. [Google Scholar]
- Bates DW, Boyle DL, Rittenberg E, Kuperman GJ, Ma’Luf N, Menkin V, Winkelman JW, Tanasijevic MJ. What proportion of common diagnostic tests appear redundant? Am J Med. 1998;104(4):361–368. doi: 10.1016/s0002-9343(98)00063-1. [DOI] [PubMed] [Google Scholar]
- Candès EJ, Wakin MB. An introduction to compressive sampling. Signal Processing Magazine, IEEE. 2008;25(2):21–30. [Google Scholar]
- Damianou AC, Lawrence ND. Deep gaussian processes; Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS); 2013. [Google Scholar]
- Gibbs MN. Bayesian Gaussian Processes for Regression and Classification. University of Cambridge; 1997. Ph.D. Dissertation. [Google Scholar]
- Lasko TA, Denny JC, Levy MA. Computational phenotype discovery using unsupervised feature learning over noisy, sparse, and irregular clinical data. PLoS One. 2013;8(6):e66341. doi: 10.1371/journal.pone.0066341. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lasko TA. Efficient inference of gaussian process modulated renewal processes with application to medical event data; Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence (UAI); 2014; [PMC free article] [PubMed] [Google Scholar]
- Murray I, Adams RP. Slice sampling covariance hyperparameters of latent Gaussian models. In: Lafferty J, Williams CKI, Zemel R, Shawe-Taylor J, Culotta A, editors. Advances in Neural Information Processing Systems. Vol. 23. 2010. pp. 1723–1731. [Google Scholar]
- Paciorek CJ, Schervish MJ. Nonstationary covariance functions for gaussian process regression. In: Thrun S, Saul L, Schölkopf B, editors. Advances in Neural Information Processing Systems 16. MIT Press; Cambridge, MA: 2004. [Google Scholar]
- Pfingsten T, Kuss M, Rasmussen K. Nonstationary Gaussian process regression using a latent extension of the input space; ISBA Eighth World Meeting on Bayesian Statistics; 2006; Extended Abstract. [Google Scholar]
- Rasmussen CE, Williams CKI. Gaussian processes for Machine Learning. MIT; 2006. [Google Scholar]
- Sampson PD, Guttorp P. Nonparametric estimation of nonstationary spatial covariance structure. J Am Stat Assoc. 1992;87(417):108–119. [Google Scholar]
- Shewry MC, Wynn HP. Maximum entropy sampling. Journal of Applied Statistics. 1987;14(2):165–170. [Google Scholar]
- van Walraven C, Raymond M. Population-based study of repeat laboratory testing. Clinical Chemistry. 2003;49(12):1997–2005. doi: 10.1373/clinchem.2003.021220. [DOI] [PubMed] [Google Scholar]