Statistics in Medicine. 2025 Sep 23;44(20-22):e70277. doi: 10.1002/sim.70277

Gaussian Process Regression for Value‐Censored Functional and Longitudinal Data

Adam Gorm Hoffmann 1, Claus Thorn Ekstrøm 1, Benjamin Zeymer Christoffersen 2,3, Andreas Kryger Jensen 1
PMCID: PMC12455908  PMID: 40985482

ABSTRACT

Gaussian process (GP) regression is widely used for flexible and non‐parametric Bayesian modeling of data arising from underlying smooth functions. This paper introduces a solution to GP regression when the observations are subject to value‐based censoring. We derive exact and closed‐form expressions for the conditional posterior distributions of the underlying functions in both the single‐curve fitting case and in the case of a hierarchical model where multiple functions are modeled simultaneously. Our method can accommodate left, right, and interval censoring, and is directly applicable as an empirical Bayes method or integrated into a Markov chain Monte Carlo sampler for full posterior inference. The method is validated through extensive simulations, where it substantially outperforms naive approaches that either exclude censored observations or treat them as fully observed values. We give an application to a real‐world dataset of longitudinal HIV‐1 RNA measurements, where the observations are subject to left censoring due to a detection limit.

Keywords: Bayesian data analysis, functional data analysis, longitudinal data, outcome truncation, value‐censored data

1. Introduction

Gaussian process (GP) regression is widely recognized for its flexibility and non‐parametric approach in Bayesian modeling of data derived from underlying smooth functions [1]. However, a frequent challenge encountered across various fields is the censoring of some observations from these latent functions. This issue often arises from detection or reporting limits, for example in privacy‐protected data in the social sciences, light saturation in image analysis [2], or the estimation of substance concentrations from bioassays [3]. A motivating example of censored functional data in medical research is data from continuous glucose monitoring (CGM) devices that are routinely used to monitor patients with diabetes. Data from such devices are often right‐censored due to a detection limit on high glucose values. Different summary measures, such as the daily average or time‐in‐range, are calculated from these CGM trajectories to characterize a patient's status, and such summary measures will be biased unless the censoring is specifically modeled. Robust statistical methods to handle censoring in GP regression are therefore essential for accurate data analysis and decision‐making.

Several authors have studied potentially left‐, right‐, or interval‐censored responses of underlying Gaussian distributions. Thiébaut and Jacqmin‐Gadda [4] reviewed and compared two likelihood‐based methods for handling left‐censored data in linear mixed models, using HIV‐1 RNA measurements as an example. Their models are fully parametric, including the random effects, and the different model specifications provide similar estimation results. However, one of their implementations allows more flexibility in the error term, since a random process can be assumed.

The earlier paper by Hughes [5] takes the same approach for linear mixed models but addresses the implementation problem by deriving an expectation‐maximization (EM) style solution for fast estimation in combination with a Gibbs sampler. The author notes that a fully Bayesian solution is a possible alternative, but that its convergence would be prohibitively slow. The EM algorithm is also the crux of the paper by Vaida and Liu [6], which extends the idea of Hughes [5] to non‐linear mixed effects models and includes restricted maximum likelihood estimation. In addition, Vaida and Liu [6] remove the need for Monte Carlo simulation as part of the estimation procedure, thereby obtaining a 40‐fold computational speedup.

Vock et al. [7] considered a mixed model framework with a smooth, semi‐nonparametric random effects density for censored longitudinal data. Their approach alleviates the problem that the maximum likelihood estimators will be inconsistent when the random‐effects density deviates substantially from normality. They find through simulations that the flexible random‐effect density reduces bias and improves efficiency compared to assuming Gaussian random effects.

Common to all of the above references is that they assume that all measurements are taken at the same (fixed) time points, and they require the random effect distribution to be specified. In addition, they are all within the frequentist framework and indirectly consider the observed and censored measurements as fixed‐time representations of an underlying process that is never modeled. We show how to remedy this and move the setting to a Bayesian framework.

Estimation of latent Gaussian processes in the presence of censoring has previously been studied by Groot and Lucas [8] and Gammelli et al. [9]. Their approaches approximate the posterior distribution of the (censored) latent process through expectation propagation, since both papers regard the posterior distribution as intractable.

Mattos et al. [10] analyze a dataset of HIV‐1 RNA levels in 46 patients where the viral load measurements are censored because of the lower detection limit of the laboratory assay. They propose a semiparametric mixed‐effects model with auto‐correlation and obtain estimates using the EM algorithm in a frequentist setting. Their approach is sensitive to the time registration and spacing, and their results do not provide estimates of the full latent process, let alone Bayesian interpretations. The HIV‐1 RNA data were also analyzed in a previous work by some of the same authors, who employed a flexible longitudinal linear mixed‐effects model for censored data to compute maximum likelihood estimates [11].

In contrast, we instead give an exact representation of the conditional posterior distribution of the latent function values given the observations, the censoring limits, and the parameters governing the Gaussian process. Based on this expression, we give an exact posterior simulation procedure that, when used as part of an empirical Bayes method, gives faster computation times than full Bayesian inference via MCMC.

We provide an implementation of the method in R [12] with certain components implemented in the probabilistic programming language Stan [13] and exposed to R using the R package cmdstanr [14]. The implementation and code for applying it on the HIV‐1 RNA data set can be found on the first author's GitHub page at https://github.com/adamgorm/censored‐gp.

This manuscript is structured as follows. In Section 2, we present the hierarchical Gaussian process regression model in both the univariate case of a single latent function and the multivariate case where a collection of latent functions is modeled together with their common mean function. We derive the likelihood functions for the observed data and the exact analytic form of the posterior distributions of the latent functions, and we present a constructive way of simulating from the posterior distributions using an equality in distribution. In Section 3, we perform a simulation study comparing the integrated squared and absolute errors of the posteriors from our method to an oracle model and to naive ways of handling the censoring. In Section 4, we investigate the frequentist coverage of the credible intervals. In Section 5, we give an application to longitudinal measurements of HIV‐1 RNA with left censoring, and we conclude with a discussion in Section 6.

Notation: Vectors and matrices are denoted by bold lower and upper case letters, respectively, and for vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ we write $\mathbf{x} > \mathbf{y}$ to mean that $x_i > y_i$ for all $i \in \{1, \ldots, n\}$. We write $\phi(\cdot; \boldsymbol{\mu}, \boldsymbol{\Sigma})$ and $\Phi(\cdot; \boldsymbol{\mu}, \boldsymbol{\Sigma})$ for the density and cumulative distribution function of the multivariate normal distribution with expectation $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$, respectively, and $\bar{\Phi}(\cdot; \boldsymbol{\mu}, \boldsymbol{\Sigma})$ for the corresponding upper‐orthant (survival) probability. We write $P \sim \mathrm{TN}(\boldsymbol{\mu}, \boldsymbol{\Sigma}; \mathbf{l}, \mathbf{u})$ when $P$ follows a multivariate truncated normal distribution with parameters $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, truncated element‐wise from below at the vector $\mathbf{l}$ and from above at the vector $\mathbf{u}$.

2. Methods

2.1. Univariate Gaussian Process Regression

We assume a classical univariate Gaussian process regression setting with a single latent function f from which we observe noisy realizations Y(tj). Specifically, we consider the following hierarchical Gaussian process model:

$$\begin{aligned}
\Theta &\sim G_\Psi \\
f \mid \theta &\sim \mathrm{GP}(0, K_\theta) \\
Y(t_j) \mid f(t_j), \Theta &\sim N(f(t_j), \sigma^2)
\end{aligned} \tag{1}$$

with covariance function $K_\theta$ and some prior distribution $G_\Psi$ on the parameters $\Theta := (\theta, \sigma)$.

We will assume that at $J^o$ time points $\mathbf{t}^o$ we obtain complete observations $\mathbf{Y}^o = \mathbf{y}^o \in \mathbb{R}^{J^o}$, while at $J^c$ time points $\mathbf{t}^c$ we only observe that $\mathbf{Y}^c > \mathbf{y}^c \in \mathbb{R}^{J^c}$, where $\mathbf{y}^c$ is a vector of censoring values. For simplicity, we only present the right‐censored case, but similar calculations work in the left‐censored case, where we observe $\mathbf{Y}^c < \mathbf{y}^c$, and in the interval‐censored case, where we observe $\mathbf{y}^\ell < \mathbf{Y}^c < \mathbf{y}^u$. We place no restriction on the censoring values $\mathbf{y}^c$; in particular, they are allowed to be time‐dependent. This corresponds to observing the response $(\tilde{Y}_i, \Delta_i)$ at each time $t_i$, where $\Delta_i = 1$ if $\tilde{Y}_i = Y_i$ (indicating that at time $t_i$ we observed the actual response of interest $Y_i$) and $\Delta_i = 0$ if $\tilde{Y}_i < Y_i$ (indicating that at time $t_i$ we only observed a censoring value $\tilde{Y}_i$ below the actual response of interest $Y_i$). With this notation, we define the vectors

$$\begin{aligned}
\mathbf{t}^o &= \{t_i : \Delta_i = 1\}, & \mathbf{Y}^o &= \{Y_i : \Delta_i = 1\}, & \mathbf{y}^o &= \{\tilde{Y}_i : \Delta_i = 1\}, \\
\mathbf{t}^c &= \{t_i : \Delta_i = 0\}, & \mathbf{Y}^c &= \{Y_i : \Delta_i = 0\}, & \mathbf{y}^c &= \{\tilde{Y}_i : \Delta_i = 0\}
\end{aligned}$$

ordered identically so that $Y^o_i$ corresponds to $t^o_i$ and so on. We denote by $\tilde{\mathbf{t}} \in \mathbb{R}^{J^p}$ the time points at which we want to evaluate the posterior of $f(\tilde{\mathbf{t}})$, and we further define the following covariance matrices for the complete and censored data and the latent Gaussian process, according to the model specification in Equation (1):

$$\begin{aligned}
\boldsymbol{\Sigma}^o &= K_\theta(\mathbf{t}^o, \mathbf{t}^o) + \sigma^2 \mathbf{I}_{J^o}, &
\boldsymbol{\Sigma}^c &= K_\theta(\mathbf{t}^c, \mathbf{t}^c) + \sigma^2 \mathbf{I}_{J^c}, &
\boldsymbol{\Sigma}^f &= K_\theta(\tilde{\mathbf{t}}, \tilde{\mathbf{t}}), \\
\boldsymbol{\Sigma}^{co} &= K_\theta(\mathbf{t}^c, \mathbf{t}^o), &
\boldsymbol{\Sigma}^{fc} &= K_\theta(\tilde{\mathbf{t}}, \mathbf{t}^c), &
\boldsymbol{\Sigma}^{fo} &= K_\theta(\tilde{\mathbf{t}}, \mathbf{t}^o)
\end{aligned} \tag{2}$$

where $\mathbf{I}_n$ denotes the $n \times n$ identity matrix.

The goal of inference is the posterior distribution $p(f(\tilde{\mathbf{t}}) \mid \mathbf{Y}^o, \mathbf{Y}^c > \mathbf{y}^c)$. In what follows, we distinguish between two ways of doing posterior inference. In full Bayesian inference, MCMC is used to sample from this posterior, for example using Hamiltonian Monte Carlo by implementing the posterior density $p(f(\mathbf{t}), \Theta \mid \mathbf{Y}^o, \mathbf{Y}^c > \mathbf{y}^c) \propto p(\mathbf{Y}^o, \mathbf{Y}^c > \mathbf{y}^c \mid f(\mathbf{t}), \Theta)\, p(f(\mathbf{t}) \mid \Theta)\, p(\Theta)$ (where $\mathbf{t} = (\mathbf{t}^o, \mathbf{t}^c, \tilde{\mathbf{t}})$) from Equation (1) in a probabilistic programming language such as Stan [13]. In empirical Bayes inference, instead of placing a prior $G_\Psi$ on $\Theta$ (see, e.g., the section on priors for Gaussian process parameters in [13]), a point estimate of $\Theta$ is first obtained by maximizing the log‐likelihood, $\hat{\Theta} = \arg\sup_\Theta \log p(\mathbf{Y}^o = \mathbf{y}^o, \mathbf{Y}^c > \mathbf{y}^c \mid \Theta)$, and one then samples from the conditional posterior $p(f(\tilde{\mathbf{t}}) \mid \mathbf{Y}^o, \mathbf{Y}^c > \mathbf{y}^c, \hat{\Theta})$. In full Bayes one would instead first sample $\Theta$ from its marginal posterior $p(\Theta \mid \mathbf{Y}^o, \mathbf{Y}^c > \mathbf{y}^c) \propto p(\mathbf{Y}^o, \mathbf{Y}^c > \mathbf{y}^c \mid \Theta)\, p(\Theta)$ and then sample $f(\tilde{\mathbf{t}})$ from the conditional posterior $p(f(\tilde{\mathbf{t}}) \mid \mathbf{Y}^o, \mathbf{Y}^c > \mathbf{y}^c, \Theta^{(i)})$ for each posterior sample $\Theta^{(i)}$ (although in practice $\Theta$ and $f(\mathbf{t})$ would often be sampled jointly). This illustrates how empirical Bayes neglects uncertainty about the GP parameters $\Theta$, since they are treated as known and equal to $\hat{\Theta}$, meaning that credible intervals will tend to be narrower than under the full Bayes posterior.

Under this model, the likelihood of the observed data is given in Proposition 1, a proof of which is given in the Supporting Information.

Proposition 1

In the censored Gaussian process regression setting under the model in Equation (1), the likelihood of the observed data in terms of the parameters Θ=(θ,σ) is

$$L(\Theta) = p(\mathbf{Y}^o = \mathbf{y}^o, \mathbf{Y}^c > \mathbf{y}^c \mid \Theta) = \phi(\mathbf{y}^o; \mathbf{0}_{J^o}, \boldsymbol{\Sigma}^o)\, \bar{\Phi}(\mathbf{y}^c; \boldsymbol{\xi}_{c|o}, \boldsymbol{\Sigma}_{c|o})$$

with $\boldsymbol{\xi}_{c|o} = \boldsymbol{\Sigma}^{co} (\boldsymbol{\Sigma}^o)^{-1} \mathbf{y}^o$ and $\boldsymbol{\Sigma}_{c|o} = \boldsymbol{\Sigma}^c - \boldsymbol{\Sigma}^{co} (\boldsymbol{\Sigma}^o)^{-1} (\boldsymbol{\Sigma}^{co})^T$, where the involved covariance matrices are given in Equation (2).
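The likelihood in Proposition 1 is straightforward to evaluate numerically. As an illustration only (the authors' implementation is in R and Stan), the following is a minimal NumPy/SciPy sketch for the right‐censored case with an exponentiated quadratic kernel; the function names and the kernel parametrization are our own choices, not the paper's API. The upper‐orthant probability $P(\mathbf{Y}^c > \mathbf{y}^c \mid \mathbf{Y}^o = \mathbf{y}^o)$ is computed as a multivariate normal CDF after negating the mean and the limits.

```python
import numpy as np
from scipy.stats import multivariate_normal

def sq_exp_kernel(s, t, magnitude=1.0, length_scale=1.0):
    """Exponentiated quadratic kernel evaluated on all pairs of points in s and t."""
    d = s[:, None] - t[None, :]
    return magnitude**2 * np.exp(-0.5 * (d / length_scale) ** 2)

def censored_gp_loglik(theta, t_o, y_o, t_c, y_c):
    """Log-likelihood of Proposition 1 for right censoring:
    log phi(y_o; 0, S_o) + log P(Y_c > y_c | Y_o = y_o)."""
    mag, ls, sigma = theta
    K = lambda a, b: sq_exp_kernel(a, b, mag, ls)
    S_o = K(t_o, t_o) + sigma**2 * np.eye(len(t_o))   # Cov(Y_o)
    S_c = K(t_c, t_c) + sigma**2 * np.eye(len(t_c))   # Cov(Y_c)
    S_co = K(t_c, t_o)                                # Cov(Y_c, Y_o)
    A = np.linalg.solve(S_o, S_co.T).T                # S_co S_o^{-1}
    xi = A @ y_o                                      # E[Y_c | Y_o = y_o]
    S_cond = S_c - A @ S_co.T                         # Cov(Y_c | Y_o = y_o)
    log_phi = multivariate_normal(cov=S_o).logpdf(y_o)
    # Upper-orthant probability: P(Y_c > y_c) = P(-Y_c <= -y_c) with -Y_c ~ N(-xi, S_cond).
    surv = multivariate_normal(mean=-xi, cov=S_cond).cdf(-y_c)
    return log_phi + np.log(surv)
```

A point estimate $\hat{\Theta}$ could then be obtained by passing the negative of this function to a numerical optimizer, optionally reparametrizing $\sigma$ on the log scale as mentioned below.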

The result from Proposition 1 enables us either to evaluate the likelihood in a fully Bayesian setting or to estimate the parameters $\Theta$ in an empirical Bayes setting via $\hat{\Theta} = \arg\sup_\Theta \log L(\Theta)$ (where one may consider reparametrizing from $\sigma$ to $\log(\sigma)$ depending on the optimization routine used). In either case, the next step is to obtain an expression for the conditional posterior distribution $p(f(\tilde{\mathbf{t}}) \mid \mathbf{Y}^o = \mathbf{y}^o, \mathbf{Y}^c > \mathbf{y}^c, \Theta)$ so that posterior evaluations of the latent Gaussian process can be obtained at any vector of time points $\tilde{\mathbf{t}}$. A direct calculation (given in the Supporting Information) shows that

$$p(f(\tilde{\mathbf{t}}) \mid \mathbf{Y}^o = \mathbf{y}^o, \mathbf{Y}^c > \mathbf{y}^c, \Theta) = \phi(f(\tilde{\mathbf{t}}); \boldsymbol{\xi}_{f|o}, \boldsymbol{\Sigma}_{f|o}) \times \frac{\bar{\Phi}(\mathbf{y}^c; \boldsymbol{\xi}_{c|o,f}, \boldsymbol{\Sigma}_{c|o,f})}{\bar{\Phi}(\mathbf{y}^c; \boldsymbol{\xi}_{c|o}, \boldsymbol{\Sigma}_{c|o})} \tag{3}$$

where the conditional means and covariances are given by

$$\begin{aligned}
\boldsymbol{\xi}_{f|o} &= \boldsymbol{\Sigma}^{fo}(\boldsymbol{\Sigma}^o)^{-1}\mathbf{y}^o, &
\boldsymbol{\Sigma}_{f|o} &= \boldsymbol{\Sigma}^f - \boldsymbol{\Sigma}^{fo}(\boldsymbol{\Sigma}^o)^{-1}(\boldsymbol{\Sigma}^{fo})^T \\
\boldsymbol{\xi}_{c|o} &= \boldsymbol{\Sigma}^{co}(\boldsymbol{\Sigma}^o)^{-1}\mathbf{y}^o, &
\boldsymbol{\Sigma}_{c|o} &= \boldsymbol{\Sigma}^c - \boldsymbol{\Sigma}^{co}(\boldsymbol{\Sigma}^o)^{-1}(\boldsymbol{\Sigma}^{co})^T \\
\boldsymbol{\xi}_{c|o,f} &= \begin{bmatrix}\boldsymbol{\Sigma}^{co} & \boldsymbol{\Sigma}^{cf}\end{bmatrix}\begin{bmatrix}\boldsymbol{\Sigma}^o & \boldsymbol{\Sigma}^{of} \\ \boldsymbol{\Sigma}^{fo} & \boldsymbol{\Sigma}^f\end{bmatrix}^{-1}\begin{bmatrix}\mathbf{y}^o \\ f(\tilde{\mathbf{t}})\end{bmatrix}, &
\boldsymbol{\Sigma}_{c|o,f} &= \boldsymbol{\Sigma}^c - \begin{bmatrix}\boldsymbol{\Sigma}^{co} & \boldsymbol{\Sigma}^{cf}\end{bmatrix}\begin{bmatrix}\boldsymbol{\Sigma}^o & \boldsymbol{\Sigma}^{of} \\ \boldsymbol{\Sigma}^{fo} & \boldsymbol{\Sigma}^f\end{bmatrix}^{-1}\begin{bmatrix}(\boldsymbol{\Sigma}^{co})^T \\ (\boldsymbol{\Sigma}^{cf})^T\end{bmatrix}
\end{aligned} \tag{4}$$

based on the different evaluations of the covariance kernel defined in Equation (2).

It is of interest to note that the conditional posterior density in Equation (3) consists of the term $\phi(f(\tilde{\mathbf{t}}); \boldsymbol{\xi}_{f|o}, \boldsymbol{\Sigma}_{f|o})$, which is the posterior distribution given only the non‐censored observations, scaled by the term $\bar{\Phi}(\mathbf{y}^c; \boldsymbol{\xi}_{c|o,f}, \boldsymbol{\Sigma}_{c|o,f})\, \bar{\Phi}(\mathbf{y}^c; \boldsymbol{\xi}_{c|o}, \boldsymbol{\Sigma}_{c|o})^{-1}$, which takes the censored observations into account. This scaling term gives higher weight to functions $f$ that take on large values at the censoring times.

As the form of the conditional posterior density is non‐standard, it is not immediately clear how to draw samples from it. Using a result from Arellano‐Valle et al. [15, eqn. (15)], Proposition 2 states a constructive way for simulating random variables from a distribution having a density function of the form given in Equation (3). A proof is given in the Supporting Information.

Proposition 2

The distribution of $f(\tilde{\mathbf{t}}) \mid \mathbf{Y}^o = \mathbf{y}^o, \mathbf{Y}^c > \mathbf{y}^c, \Theta$ is the same as the distribution of $\boldsymbol{\xi}_{f|o} + \boldsymbol{\Sigma}_{fc|o}\boldsymbol{\Sigma}_{c|o}^{-1}\mathbf{P} + \mathbf{Q}$, where

$$\mathbf{P} \sim \mathrm{TN}(\mathbf{0}_{J^c}, \boldsymbol{\Sigma}_{c|o}; \mathbf{y}^c - \boldsymbol{\xi}_{c|o}, \boldsymbol{\infty}), \qquad \mathbf{Q} \sim N\left(\mathbf{0}_{J^p}, \boldsymbol{\Sigma}_{f|o} - \boldsymbol{\Sigma}_{fc|o}\boldsymbol{\Sigma}_{c|o}^{-1}\boldsymbol{\Sigma}_{fc|o}^T\right)$$

with $\boldsymbol{\Sigma}_{fc|o} = \boldsymbol{\Sigma}^{fc} - \boldsymbol{\Sigma}^{fo}(\boldsymbol{\Sigma}^o)^{-1}\boldsymbol{\Sigma}^{oc}$.

From the result in Proposition 2, the conditional posterior distribution of the latent Gaussian process $f$ at any vector of evaluation points $\tilde{\mathbf{t}}$ can be simulated by drawing from the corresponding multivariate normal and multivariate truncated normal distributions and combining the draws according to the expression. This eliminates the need for potentially inefficient custom MCMC procedures to sample from the density in Equation (3), requiring only an efficient way to simulate from the multivariate truncated normal distribution, which is already implemented in, for example, the TruncatedNormal package by Botev and Belzile [16]. This can be done either in the empirical Bayes setting, where $\Theta$ is substituted by the estimate $\hat{\Theta}$ maximizing the likelihood in Proposition 1, or in the fully Bayesian setting with posterior draws of $\Theta$ obtained by a Markov chain Monte Carlo algorithm. A simulation study in the Supporting Information shows that an empirical Bayes implementation using Proposition 2 is faster than a naive full Bayes MCMC implementation, and that the relative speed difference increases with the number of observations and the number of prediction points; it is, for example, about 19 times faster when fitting to 500 time points and predicting at 500 time points.
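To make the construction in Proposition 2 concrete, here is a minimal Python sketch of the sampler for the right‐censored univariate case. It is an illustration under our own naming, not the authors' R implementation, and it uses a naive rejection sampler for the truncated multivariate normal; a dedicated sampler such as the minimax‐tilting method behind the TruncatedNormal package [16] should be preferred when the truncation region is unlikely or high‐dimensional.

```python
import numpy as np

def sq_exp_kernel(s, t, magnitude=1.0, length_scale=1.0):
    """Exponentiated quadratic kernel evaluated on all pairs of points in s and t."""
    d = s[:, None] - t[None, :]
    return magnitude**2 * np.exp(-0.5 * (d / length_scale) ** 2)

def draw_truncnorm_lower(mean, cov, lower, rng, max_tries=100000):
    """Naive rejection sampler for a multivariate normal truncated below at `lower`.
    Fine for illustration; dedicated samplers scale much better."""
    for _ in range(max_tries):
        x = rng.multivariate_normal(mean, cov)
        if np.all(x > lower):
            return x
    raise RuntimeError("rejection sampler failed; truncation region too unlikely")

def draw_posterior_f(t_pred, t_o, y_o, t_c, y_c, theta, rng):
    """One draw from f(t~) | Y_o = y_o, Y_c > y_c, Theta via Proposition 2:
    xi_{f|o} + S_{fc|o} S_{c|o}^{-1} P + Q."""
    mag, ls, sigma = theta
    K = lambda a, b: sq_exp_kernel(a, b, mag, ls)
    S_o = K(t_o, t_o) + sigma**2 * np.eye(len(t_o))
    S_c = K(t_c, t_c) + sigma**2 * np.eye(len(t_c))
    S_co, S_fo, S_fc = K(t_c, t_o), K(t_pred, t_o), K(t_pred, t_c)
    S_f = K(t_pred, t_pred)
    So_inv = np.linalg.inv(S_o)
    xi_f_o = S_fo @ So_inv @ y_o                      # mean given observed data only
    S_f_o = S_f - S_fo @ So_inv @ S_fo.T
    xi_c_o = S_co @ So_inv @ y_o
    S_c_o = S_c - S_co @ So_inv @ S_co.T
    S_fc_o = S_fc - S_fo @ So_inv @ S_co.T            # S_{fc|o}
    # P: truncated normal with lower bound y_c - xi_{c|o} (right censoring)
    P = draw_truncnorm_lower(np.zeros(len(t_c)), S_c_o, y_c - xi_c_o, rng)
    A = S_fc_o @ np.linalg.inv(S_c_o)
    Q = rng.multivariate_normal(np.zeros(len(t_pred)), S_f_o - A @ S_fc_o.T,
                                check_valid="ignore")  # ignore tiny negative eigenvalues
    return xi_f_o + A @ P + Q
```

Repeated calls give a Monte Carlo sample of the conditional posterior, from which pointwise means and credible intervals can be computed.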

In the case of left censoring or interval censoring, similar applications of Arellano‐Valle et al. [15, eqn. (15)] yield ways to simulate from the conditional posterior distribution as in Proposition 2. For left‐censored data (i.e., $\mathbf{Y}^c < \mathbf{y}^c$) one must draw $\mathbf{P} \sim \mathrm{TN}(\mathbf{0}_{J^c}, \boldsymbol{\Sigma}_{c|o}; -\boldsymbol{\infty}, \mathbf{y}^c - \boldsymbol{\xi}_{c|o})$, and for interval‐censored data (i.e., $\mathbf{y}^\ell < \mathbf{Y}^c < \mathbf{y}^u$) one must draw $\mathbf{P} \sim \mathrm{TN}(\mathbf{0}_{J^c}, \boldsymbol{\Sigma}_{c|o}; \mathbf{y}^\ell - \boldsymbol{\xi}_{c|o}, \mathbf{y}^u - \boldsymbol{\xi}_{c|o})$.

2.2. Multi‐Level Gaussian Process Regression

We now turn to the multi‐level Gaussian process regression setting studied in Kaufman and Sain [17] and Hoffmann et al. [18] and show how to include censoring with calculations very similar to those for a single observed function given above. This model extends Gaussian process regression to a setting with multiple observed functions $f_i = \mu + \eta_i$, given by a common mean function $\mu$ and subject‐specific deviations $\eta_i$, both of which are modeled with Gaussian process priors. To avoid the identifiability issue that arises from modeling the $n$ functions $f_1, \ldots, f_n$ with the $n + 1$ functions $\mu, \eta_1, \ldots, \eta_n$, the $\eta_i$ are required to sum to zero. This is achieved by specifying dependent Gaussian process priors with covariance kernel $K^\eta_{\theta_\eta}$ and cross‐covariances given by $\mathrm{Cov}(\eta_i(t_i), \eta_j(t_j) \mid \theta_\eta) = -\frac{1}{n-1} K^\eta_{\theta_\eta}(t_i, t_j)$ for $i \neq j$. Writing $\mathrm{GP}^{\text{sum-to-zero}}_n$ for the joint dependent Gaussian process prior just described and $\Theta := (\theta_\mu, \theta_\eta, \sigma)$, the model can be written

$$\begin{aligned}
\Theta &\sim G_\Psi \\
\mu \mid \theta_\mu &\sim \mathrm{GP}(0, K^\mu_{\theta_\mu}) \\
(\eta_1, \ldots, \eta_n) \mid \theta_\eta &\sim \mathrm{GP}^{\text{sum-to-zero}}_n(0, K^\eta_{\theta_\eta}) \\
f_i &= \mu + \eta_i \\
y_i(t_{ij}) \mid f_i, \Theta &\sim N(f_i(t_{ij}), \sigma^2)
\end{aligned} \tag{5}$$

where $G_\Psi$ is some prior distribution.

We now have $n$ latent functions $f_i = \mu + \eta_i$ from which we obtain noisy observations. We write $\mathbf{Y}^o = ((\mathbf{Y}^o_1)^T, \ldots, (\mathbf{Y}^o_n)^T)^T$ for the noisy non‐censored realizations at the $J^o = \sum_{i=1}^n J^o_i$ times $\mathbf{t}^o = ((\mathbf{t}^o_1)^T, \ldots, (\mathbf{t}^o_n)^T)^T$, and $\mathbf{Y}^c = ((\mathbf{Y}^c_1)^T, \ldots, (\mathbf{Y}^c_n)^T)^T$ for the noisy censored observations at the $J^c = \sum_{i=1}^n J^c_i$ times $\mathbf{t}^c = ((\mathbf{t}^c_1)^T, \ldots, (\mathbf{t}^c_n)^T)^T$. We fully observe $\mathbf{Y}^o = \mathbf{y}^o$, while we only see that $\mathbf{Y}^c > \mathbf{y}^c$. No restriction is imposed on the censoring mechanism, so in particular the censoring values can differ among the latent functions or be time‐dependent. The model definition in Equation (5) gives the relevant covariance matrices

$$\begin{aligned}
[\boldsymbol{\Omega}^o]_{i,j} &:= \mathrm{Cov}(\mathbf{Y}^o_i, \mathbf{Y}^o_j \mid \Theta) = \begin{cases} K^\mu_{\theta_\mu}(\mathbf{t}^o_i, \mathbf{t}^o_i) + K^\eta_{\theta_\eta}(\mathbf{t}^o_i, \mathbf{t}^o_i) + \sigma^2 \mathbf{I}_{J^o_i}, & i = j \\ K^\mu_{\theta_\mu}(\mathbf{t}^o_i, \mathbf{t}^o_j) - \tfrac{1}{n-1} K^\eta_{\theta_\eta}(\mathbf{t}^o_i, \mathbf{t}^o_j), & i \neq j \end{cases} \\
[\boldsymbol{\Omega}^c]_{i,j} &:= \mathrm{Cov}(\mathbf{Y}^c_i, \mathbf{Y}^c_j \mid \Theta) = \begin{cases} K^\mu_{\theta_\mu}(\mathbf{t}^c_i, \mathbf{t}^c_i) + K^\eta_{\theta_\eta}(\mathbf{t}^c_i, \mathbf{t}^c_i) + \sigma^2 \mathbf{I}_{J^c_i}, & i = j \\ K^\mu_{\theta_\mu}(\mathbf{t}^c_i, \mathbf{t}^c_j) - \tfrac{1}{n-1} K^\eta_{\theta_\eta}(\mathbf{t}^c_i, \mathbf{t}^c_j), & i \neq j \end{cases} \\
[\boldsymbol{\Omega}^{co}]_{i,j} &:= \mathrm{Cov}(\mathbf{Y}^c_i, \mathbf{Y}^o_j \mid \Theta) = \begin{cases} K^\mu_{\theta_\mu}(\mathbf{t}^c_i, \mathbf{t}^o_i) + K^\eta_{\theta_\eta}(\mathbf{t}^c_i, \mathbf{t}^o_i), & i = j \\ K^\mu_{\theta_\mu}(\mathbf{t}^c_i, \mathbf{t}^o_j) - \tfrac{1}{n-1} K^\eta_{\theta_\eta}(\mathbf{t}^c_i, \mathbf{t}^o_j), & i \neq j \end{cases} \\
\boldsymbol{\Omega}^\mu &:= \mathrm{Cov}(\mu(\tilde{\mathbf{t}}) \mid \Theta) = K^\mu_{\theta_\mu}(\tilde{\mathbf{t}}, \tilde{\mathbf{t}}), \quad
\boldsymbol{\Omega}^{\mu c} := \mathrm{Cov}(\mu(\tilde{\mathbf{t}}), \mathbf{Y}^c \mid \Theta) = K^\mu_{\theta_\mu}(\tilde{\mathbf{t}}, \mathbf{t}^c), \quad
\boldsymbol{\Omega}^{\mu o} := \mathrm{Cov}(\mu(\tilde{\mathbf{t}}), \mathbf{Y}^o \mid \Theta) = K^\mu_{\theta_\mu}(\tilde{\mathbf{t}}, \mathbf{t}^o)
\end{aligned} \tag{6}$$

Using these covariance matrices, Proposition 3 gives the likelihood of the observed data. A proof is given in the Supporting Information.

Proposition 3

In the censored Gaussian process regression setting under the model in Equation (5), the likelihood of the observed data in terms of the parameters Θ=(θμ,θη,σ) is

$$L(\Theta) = p(\mathbf{Y}^o = \mathbf{y}^o, \mathbf{Y}^c > \mathbf{y}^c \mid \Theta) = \phi(\mathbf{y}^o; \mathbf{0}_{J^o}, \boldsymbol{\Omega}^o)\, \bar{\Phi}(\mathbf{y}^c; \boldsymbol{\nu}_{c|o}, \boldsymbol{\Omega}_{c|o})$$

with $\boldsymbol{\nu}_{c|o} = \boldsymbol{\Omega}^{co} (\boldsymbol{\Omega}^o)^{-1} \mathbf{y}^o$ and $\boldsymbol{\Omega}_{c|o} = \boldsymbol{\Omega}^c - \boldsymbol{\Omega}^{co} (\boldsymbol{\Omega}^o)^{-1} (\boldsymbol{\Omega}^{co})^T$.

Similar to the univariate Gaussian process regression setting above, the next step is to find an expression for the conditional posterior distribution of $\mu(\tilde{\mathbf{t}}), \boldsymbol{\eta}(\tilde{\mathbf{t}}) \mid \mathbf{Y}^o = \mathbf{y}^o, \mathbf{Y}^c > \mathbf{y}^c, \Theta$, which also gives us the conditional posterior distribution of each $f_i(\tilde{\mathbf{t}}) = \mu(\tilde{\mathbf{t}}) + \eta_i(\tilde{\mathbf{t}})$. To derive the density, we consider $\boldsymbol{\eta}(\tilde{\mathbf{t}}) := (\eta_1(\tilde{\mathbf{t}})^T, \ldots, \eta_{n-1}(\tilde{\mathbf{t}})^T)^T$, excluding $\eta_n$ because it is completely determined by $\eta_1, \ldots, \eta_{n-1}$ through the sum‐to‐zero constraint and would thus lead to a degenerate normal distribution without a density. A calculation similar to the univariate setting, given in the Supporting Information, yields the conditional joint posterior density

$$p(\mu(\tilde{\mathbf{t}}), \boldsymbol{\eta}(\tilde{\mathbf{t}}) \mid \mathbf{Y}^o = \mathbf{y}^o, \mathbf{Y}^c > \mathbf{y}^c, \Theta) = \phi(\mu(\tilde{\mathbf{t}}), \boldsymbol{\eta}(\tilde{\mathbf{t}}); \boldsymbol{\nu}_{\mu,\eta|o}, \boldsymbol{\Omega}_{\mu,\eta|o}) \times \frac{\bar{\Phi}(\mathbf{y}^c; \boldsymbol{\nu}_{c|\mu,\eta,o}, \boldsymbol{\Omega}_{c|\mu,\eta,o})}{\bar{\Phi}(\mathbf{y}^c; \boldsymbol{\nu}_{c|o}, \boldsymbol{\Omega}_{c|o})} \tag{7}$$

where

$$\begin{aligned}
\boldsymbol{\nu}_{\mu,\eta|o} &= \begin{bmatrix}\boldsymbol{\Omega}^{\mu o} \\ \boldsymbol{\Omega}^{\eta o}\end{bmatrix} (\boldsymbol{\Omega}^o)^{-1} \mathbf{y}^o, \qquad
\boldsymbol{\Omega}_{\mu,\eta|o} = \begin{bmatrix}\boldsymbol{\Omega}^\mu & \mathbf{0}_{J^p,(n-1)J^p} \\ \mathbf{0}_{(n-1)J^p,J^p} & \boldsymbol{\Omega}^\eta\end{bmatrix} - \begin{bmatrix}\boldsymbol{\Omega}^{\mu o} \\ \boldsymbol{\Omega}^{\eta o}\end{bmatrix} (\boldsymbol{\Omega}^o)^{-1} \begin{bmatrix}\boldsymbol{\Omega}^{o\mu} & \boldsymbol{\Omega}^{o\eta}\end{bmatrix} \\
\boldsymbol{\nu}_{c|\mu,\eta,o} &= \begin{bmatrix}\boldsymbol{\Omega}^{c\mu} & \boldsymbol{\Omega}^{c\eta} & \boldsymbol{\Omega}^{co}\end{bmatrix} \begin{bmatrix}\boldsymbol{\Omega}^\mu & \mathbf{0}_{J^p,(n-1)J^p} & \boldsymbol{\Omega}^{\mu o} \\ \mathbf{0}_{(n-1)J^p,J^p} & \boldsymbol{\Omega}^\eta & \boldsymbol{\Omega}^{\eta o} \\ \boldsymbol{\Omega}^{o\mu} & \boldsymbol{\Omega}^{o\eta} & \boldsymbol{\Omega}^o\end{bmatrix}^{-1} \begin{bmatrix}\mu(\tilde{\mathbf{t}}) \\ \boldsymbol{\eta}(\tilde{\mathbf{t}}) \\ \mathbf{y}^o\end{bmatrix} \\
\boldsymbol{\Omega}_{c|\mu,\eta,o} &= \boldsymbol{\Omega}^c - \begin{bmatrix}\boldsymbol{\Omega}^{c\mu} & \boldsymbol{\Omega}^{c\eta} & \boldsymbol{\Omega}^{co}\end{bmatrix} \begin{bmatrix}\boldsymbol{\Omega}^\mu & \mathbf{0}_{J^p,(n-1)J^p} & \boldsymbol{\Omega}^{\mu o} \\ \mathbf{0}_{(n-1)J^p,J^p} & \boldsymbol{\Omega}^\eta & \boldsymbol{\Omega}^{\eta o} \\ \boldsymbol{\Omega}^{o\mu} & \boldsymbol{\Omega}^{o\eta} & \boldsymbol{\Omega}^o\end{bmatrix}^{-1} \begin{bmatrix}\boldsymbol{\Omega}^{\mu c} \\ \boldsymbol{\Omega}^{\eta c} \\ \boldsymbol{\Omega}^{oc}\end{bmatrix}
\end{aligned} \tag{8}$$

Analogous to the univariate setting, the density of the conditional joint posterior distribution consists of a term that only involves the observed values, scaled by a term involving the censored values. Again using the result from Arellano‐Valle et al. [15, eqn. (15)], Proposition 4 gives a way of sampling from this non‐standard density, relying on the covariance matrices

$$\begin{aligned}
\boldsymbol{\Omega}_{(\mu,\eta)c|o} &= \begin{bmatrix}\boldsymbol{\Omega}^{\mu c} \\ \boldsymbol{\Omega}^{\eta c}\end{bmatrix} - \begin{bmatrix}\boldsymbol{\Omega}^{\mu o} (\boldsymbol{\Omega}^o)^{-1} \boldsymbol{\Omega}^{oc} \\ \boldsymbol{\Omega}^{\eta o} (\boldsymbol{\Omega}^o)^{-1} \boldsymbol{\Omega}^{oc}\end{bmatrix}, \qquad
\boldsymbol{\Omega}^{\mu c} = K^\mu_{\theta_\mu}(\tilde{\mathbf{t}}, \mathbf{t}^c), \qquad
\boldsymbol{\Omega}^{\mu o} = K^\mu_{\theta_\mu}(\tilde{\mathbf{t}}, \mathbf{t}^o), \\
[\boldsymbol{\Omega}^{\eta c}]_{i,j} &= \begin{cases} K^\eta_{\theta_\eta}(\tilde{\mathbf{t}}, \mathbf{t}^c_i), & i = j \\ -\tfrac{1}{n-1} K^\eta_{\theta_\eta}(\tilde{\mathbf{t}}, \mathbf{t}^c_j), & i \neq j \end{cases} \qquad
[\boldsymbol{\Omega}^{\eta o}]_{i,j} = \begin{cases} K^\eta_{\theta_\eta}(\tilde{\mathbf{t}}, \mathbf{t}^o_i), & i = j \\ -\tfrac{1}{n-1} K^\eta_{\theta_\eta}(\tilde{\mathbf{t}}, \mathbf{t}^o_j), & i \neq j \end{cases}
\end{aligned} \tag{9}$$

that follow from the model specification in Equation (5). The proof can be found in the Supporting Information.

Proposition 4

The conditional posterior distribution of $(\mu(\tilde{\mathbf{t}})^T, \boldsymbol{\eta}(\tilde{\mathbf{t}})^T)^T \mid \mathbf{Y}^o = \mathbf{y}^o, \mathbf{Y}^c > \mathbf{y}^c, \Theta$ is the same as the distribution of

$$\boldsymbol{\nu}_{\mu,\eta|o} + \boldsymbol{\Omega}_{(\mu,\eta)c|o} \boldsymbol{\Omega}_{c|o}^{-1} \mathbf{P}_{\mu,\eta} + \mathbf{Q}_{\mu,\eta}$$

where $\mathbf{P}_{\mu,\eta} \sim \mathrm{TN}(\mathbf{0}_{J^c}, \boldsymbol{\Omega}_{c|o}; \mathbf{y}^c - \boldsymbol{\nu}_{c|o}, \boldsymbol{\infty})$ is truncated from below at $\mathbf{y}^c - \boldsymbol{\nu}_{c|o}$, and $\mathbf{Q}_{\mu,\eta} \sim N(\mathbf{0}_{nJ^p}, \boldsymbol{\Omega}_{\mu,\eta|o} - \boldsymbol{\Omega}_{(\mu,\eta)c|o} \boldsymbol{\Omega}_{c|o}^{-1} (\boldsymbol{\Omega}_{(\mu,\eta)c|o})^T)$.

Proposition 4 thus shows how to transform draws of multivariate normal and multivariate truncated normal variables to draws from the conditional posterior distribution in Equation (7).

3. Simulation Study

In Figure 1, we show the conditional posterior of $f$ obtained by applying the model above to simulated data. We compare it to the oracle method, where we have access to all censored observations, and to two naive approaches: the include method, where we include the censoring values as if they were observations, and the exclude method, where we completely exclude all censored observations. We have drawn $f$ from a Gaussian process with the exponentiated quadratic covariance kernel with magnitude 1 and length scale 1. At 50 equidistant points between $-10$ and $10$, we take an observation of $f$ and add i.i.d. $N(0, 0.2^2)$ noise. We censor all observations above 0.75. To fit the posterior, we use empirical Bayes: we first maximize the log‐likelihood to get a point estimate of the kernel parameters and then use Proposition 2 to draw 1000 samples of $f(\tilde{\mathbf{t}})$ from the conditional posterior given the point estimate, where $\tilde{\mathbf{t}}$ is a fine equidistant grid of 200 time points. In this example, including or excluding the censoring values leads to bias, since neither approach correctly describes the censored parts of the latent function. Our censored‐GP model, however, gives a conditional posterior distribution that also follows the censored parts of the latent function, albeit with wider credible bands than if no observations had been censored.

FIGURE 1.

FIGURE 1

Comparison of the conditional posteriors for different Gaussian process regression methods fitted on simulated censored data. We have drawn a latent function $f$ (black line) from a GP and take 50 equidistant observations between $-10$ and $10$ with i.i.d. Gaussian noise. All observations above 0.75 (dotted line) are censored. Observations are shown as points and censored observations as crosses. We show conditional posterior means (red line) and pointwise 90% credible intervals (red ribbon) for the latent function $f$ using four different methods: the oracle method that gets to see the censored observations, our proposed censored‐GP model that correctly accounts for censoring, the include method that includes the censoring values as observations, and the exclude method that excludes all censored observations. Both include and exclude lead to bias.

Having illustrated how our method performs compared to naive methods and the oracle in a single example, we now conduct a larger simulation study to compare their performance in the multi‐level setting with 10 latent functions. We do this by comparing the integrated squared error (ISE) of the conditional posterior means of $\mu$ and $f_1, \ldots, f_{10}$ from the different methods on 1000 simulated data sets. The Supporting Information also provides results for the integrated absolute error (IAE) and the sup‐norm difference. Means and 90% intervals for the ISE of the four methods across the 1000 data sets are shown in Figure 2. The smaller these are, the better the conditional posterior means match the true underlying $\mu$ and $f_1, \ldots, f_{10}$ on the simulated data sets. Our method, which correctly accounts for censoring, outperforms both the exclude and include methods and gets much closer to the performance of the oracle method. We refer to the Supporting Information for other simulation settings and error measures.

FIGURE 2.

FIGURE 2

The performance of our proposed method for value‐censored Gaussian process regression compared against that of an oracle method and two baseline methods on 1000 simulated data sets observed at $n_{\text{obs}} = 26$ equidistant time points. We compare performance when fitting both the latent curves $f_1, \ldots, f_{10}$ and their common mean process $\mu$, and quantify performance by computing the integrated squared error (ISE) of the conditional posterior mean compared to the truth: smaller is better. This is done for two censoring schemes: the individual censoring scheme, where a different censoring cut‐off value is chosen for each data set so that each attains the desired censoring percentage, and the population censoring scheme, where a single censoring cut‐off value is chosen to attain the desired total censoring percentage across all 1000 data sets. The plot shows the mean (point) and 90% intervals (bar) across the 1000 simulated data sets.

The details of the simulation setup are as follows. We simulate 1000 data sets from the setting with $n = 10$ curves, where $\mu$ comes from a GP with exponentiated quadratic kernel with magnitude 1 and length scale 1, and $\eta$ comes from a sum‐to‐zero GP with exponentiated quadratic kernel with magnitude 0.5 and length scale 0.5. This gives us the 10 latent curves $f_i = \mu + \eta_i$, $i \in \{1, \ldots, 10\}$. We observe each curve at $n_{\text{obs}} = 26$ equidistant time points between $-10$ and $10$ and add i.i.d. $N(0, 0.2^2)$ noise; see the Supporting Information for simulations with $n_{\text{obs}} = 51$. A censoring cut‐off, above which all observations are censored, is chosen to achieve a desired censoring percentage $p_{\text{cens}}$ in two ways: a different cut‐off for each of the 1000 data sets, so that each has $p_{\text{cens}}$ of its observations censored, or the same cut‐off for all 1000 data sets, so that $p_{\text{cens}}$ of the observations are censored in total across all 1000 data sets. To fit the posterior for each data set, we use empirical Bayes, where we first maximize the log‐likelihood to get a point estimate of the kernel parameters. We then use Proposition 4 to draw 100 samples of $f(\tilde{\mathbf{t}})$ from the conditional posterior given the point estimate, where $\tilde{\mathbf{t}}$ is a fine equidistant grid of 201 time points. For each data set $i \in \{1, \ldots, 1000\}$ we calculate posterior means $\hat{\mu}_i$ and $\hat{f}_{ji}$ and compute the ISE, IAE, and sup‐norm distance (SUP) for both $\mu$ and $f$ as follows. For each posterior mean $\hat{\mu}_i(\tilde{\mathbf{t}})$ we use trapezoidal integration to approximate $\mathrm{ISE}_{\mu,i} = \int_{-10}^{10} (\mu(t) - \hat{\mu}_i(t))^2 \, dt$ and $\mathrm{IAE}_{\mu,i} = \int_{-10}^{10} |\mu(t) - \hat{\mu}_i(t)| \, dt$, and we calculate $\mathrm{SUP}_{\mu,i} = \max_{t \in \tilde{\mathbf{t}}} |\mu(t) - \hat{\mu}_i(t)|$. For each posterior mean $\hat{f}_i(\tilde{\mathbf{t}})$ we use trapezoidal integration to approximate $\mathrm{ISE}_{f,i} = \sum_{j=1}^{10} \int_{-10}^{10} (f_j(t) - \hat{f}_{ji}(t))^2 \, dt$ and $\mathrm{IAE}_{f,i} = \sum_{j=1}^{10} \int_{-10}^{10} |f_j(t) - \hat{f}_{ji}(t)| \, dt$, and we calculate $\mathrm{SUP}_{f,i} = \max_{j \in \{1, \ldots, 10\},\, t \in \tilde{\mathbf{t}}} |f_j(t) - \hat{f}_{ji}(t)|$. We calculate means and 90% quantile‐based intervals for the ISEs, IAEs, and sup‐norm distances across the 1000 simulated data sets.
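The error measures above are simple to compute once the true curve and a posterior mean are available on a grid. The following sketch (our own helper names, not from the paper's code) approximates the ISE and IAE for a single curve by the trapezoidal rule and takes the sup‐norm as a maximum over the evaluation grid:

```python
import numpy as np

def trapezoid(y, t):
    """Trapezoidal rule: sum of interval widths times average endpoint values."""
    h = np.diff(t)
    return float(np.sum(h * (y[:-1] + y[1:]) / 2.0))

def error_summaries(t_grid, f_true, f_hat):
    """ISE, IAE, and sup-norm distance between a true curve and an estimate,
    both evaluated on the same grid t_grid."""
    diff = f_true - f_hat
    ise = trapezoid(diff**2, t_grid)
    iae = trapezoid(np.abs(diff), t_grid)
    sup = float(np.max(np.abs(diff)))
    return ise, iae, sup
```

For the multi‐level measures, these per‐curve quantities would simply be summed over the 10 curves (and the sup‐norm maximized over curves), as in the definitions above.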

4. Coverage of Credible Intervals

We assess the coverage of the 0.95 credible intervals on simulated data. Table 1 shows the coverage at censored points (which reaches 0.93 on large data sets) and at non‐censored points (which reaches 0.94 on large data sets). The details of the simulation setup are as follows. We draw 1000 underlying GPs with mean 0 and an exponentiated quadratic covariance function with magnitude 1 and length scale 1. From each of these, we draw a data set by adding $N(0, 0.2^2)$ noise at an equidistant grid of $N_{\text{obs}}$ observation points on the interval $[-10, 10]$. For each data set, we censor observations above the 0.75 quantile, so that each data set has a censoring percentage of 25%. We fit the model to each data set and obtain 0.95 credible intervals at each point. The coverage is then computed as the proportion of credible intervals containing the true function value at that time point. It is computed separately for censored and non‐censored points to show that close‐to‐nominal coverage is obtained at both.
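The coverage computation can be sketched as follows for a single simulated data set, assuming posterior draws of the latent function are available on the evaluation grid; the function name and array layout are illustrative choices of ours. Equal‐tailed 0.95 intervals are formed from pointwise sample quantiles, and coverage is the fraction of grid points whose interval contains the truth. In the study, this proportion is aggregated over the 1000 replications and split by censoring status.

```python
import numpy as np

def pointwise_coverage(samples, f_true, level=0.95):
    """Proportion of grid points where the equal-tailed credible interval,
    formed from posterior samples, contains the true function value.
    samples: shape (n_draws, n_points); f_true: shape (n_points,)."""
    alpha = 1.0 - level
    lo = np.quantile(samples, alpha / 2, axis=0)       # pointwise lower bound
    hi = np.quantile(samples, 1 - alpha / 2, axis=0)   # pointwise upper bound
    return float(np.mean((lo <= f_true) & (f_true <= hi)))
```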

TABLE 1.

Coverage of 0.95 credible intervals at non‐censored and censored points.

Coverage
No. of observations ($N_{\text{obs}} = J^o + J^c$) Non‐censored points Censored points
50 0.92 0.92
100 0.94 0.93
200 0.94 0.93
300 0.94 0.93

5. Application: HIV‐1 RNA Viral Load

We now apply our method for value‐censored multi‐level Gaussian process regression to the A5055 data set introduced in Acosta et al. [19]. This data set has previously been analyzed in, for example, Mattos et al. [10], Lachos et al. [11], Wang et al. [20], and Lin and Wang [21]. The data consist of a total of 365 observations on 44 patients infected with HIV‐1. For each patient, plasma HIV‐1 RNA viral load was measured in copies/mL from blood samples. The blood samples were taken on irregularly spaced days, varying from study day 0 to day 205. The median number of observations per subject was nine, the minimum was three, and the maximum was eleven. Thirty‐five (80%) of the subjects had eight or more observations during the study period. Due to a lower detection limit in the measurement process, values below 50 copies/mL were left‐censored. In total, 110 (30%) of the observations were censored, with a median number of censorings per subject of two, a minimum of zero, and a maximum of eight. Thirty‐two (73%) of the subjects had at least one censored observation during their follow‐up time.

Figure 3 shows the A5055 data on the $\log_{10}$ scale. The observed data points are marked with circles, and the censored observations are marked with plus signs. The horizontal line shows the lower limit of detection at $\log_{10}(50)$. We compare posterior means and 90% credible intervals for $\mu$ from the include method, which includes all censored observations as if they were regular observed data points, to the posterior distribution from our method censored‐GP, which correctly accounts for censoring. Figure 4 likewise shows posterior means and 90% credible intervals for three of the subject‐specific curves, $f_1$, $f_9$, and $f_{41}$, again comparing our censored‐GP method to the include method. Naively including the censoring values as observations leads to a large bias in the posterior distributions of $\mu$ and $f$.

FIGURE 3.

FIGURE 3

HIV‐1 RNA data set and posterior means and 90% credible intervals for the mean μ when using the method proposed in this paper that correctly accounts for censoring (censored‐GP) or when naively including the censoring values as observations (include). The horizontal line shows the censoring threshold.

FIGURE 4.

FIGURE 4

HIV‐1 RNA data (points connected by thick line) for subjects 1 (left panel), 9 (middle panel) and 41 (right panel) and corresponding posterior means and 90% intervals for the latent function fi when using the method proposed in this paper that correctly accounts for censoring (censored‐GP) or when naively including the censoring values as observations (include). The horizontal line shows the censoring threshold.

6. Discussion

In this paper, we have given a procedure for Gaussian process regression in the setting where some of the observations are value‐censored from above, from below, or known only to lie within an interval. The procedure consists of an exact analytic expression for the likelihood of the observed data and a way to simulate directly from the conditional posterior of the latent function. This was done both for the classic single‐curve Gaussian process regression setting and for a multi‐level setting with multiple curves around a common mean curve. Simulation studies showed large improvements over naively including or excluding the censored observations, as well as large runtime improvements.

While our expressions have exact and analytic forms, they require numerical evaluations of the multivariate normal CDF and the multivariate truncated normal distribution. Fast numeric algorithms exist [22, 23] when the dimension is below 1000, but future work is required to investigate how these numerical algorithms scale with the proportion of censored observations.

While our proposed method can, in principle, be used for full Bayesian inference using Hamiltonian Monte Carlo, it requires an implementation of the multivariate normal CDF, which is not currently built into the probabilistic programming language Stan. Future work could consider how to do a suitable approximation in Stan of the multivariate normal CDF and its gradient using, for example, importance sampling as in Christoffersen et al. [24]. Additional future work would be to compare the performance of empirical Bayes and full Bayes GP regression for different choices of kernels and priors.

Our simulation studies and examples show the behavior of the posterior mean. Future work might consider the behavior of other summary choices, such as the conditional maximum a posteriori estimate of the latent function $f$, which maximizes the conditional posterior in Equation (3). This will be possible under the empirical Bayes approach introduced in this paper thanks to the closed‐form expression for the conditional posterior in Equation (3), but it will most likely be difficult to find the maximum a posteriori estimate under the full Bayes approach, since a closed‐form expression for the posterior $p(f(\tilde{\mathbf{t}}) \mid \mathbf{Y}^o = \mathbf{y}^o, \mathbf{Y}^c > \mathbf{y}^c)$ might be hard to find.

Conflicts of Interest

The authors declare no conflicts of interest.

Supporting information

Data S1. Supporting Information.

SIM-44-0-s001.pdf (224.2KB, pdf)

Gorm Hoffmann A., Thorn Ekstrøm C., Zeymer Christoffersen B., and Kryger Jensen A., “Gaussian Process Regression for Value‐Censored Functional and Longitudinal Data,” Statistics in Medicine 44, no. 20‐22 (2025): e70277, 10.1002/sim.70277.

Funding: The authors received no specific funding for this work.

Data Availability Statement

The data that support the findings of this study are openly available in censored‐GP at https://github.com/adamgorm/censored‐gp.

References

  • 1. Williams C. K. and Rasmussen C. E., Gaussian Processes for Machine Learning (MIT Press, 2006). [Google Scholar]
  • 2. Ekstrøm C. T., Bak S., Kristensen C., and Rudemo M., “Spot Shape Modelling and Data Transformations for Microarrays,” Bioinformatics 20, no. 14 (2004): 2270–2278. [DOI] [PubMed] [Google Scholar]
  • 3. He H., Tang W., Kelly T., Li S., and He J., “Statistical Tests for Latent Class in Censored Data due to Detection Limit,” Statistical Methods in Medical Research 29, no. 8 (2020): 2179–2197. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Thiébaut R. and Jacqmin‐Gadda H., “Mixed Models for Longitudinal Left‐Censored Repeated Measures,” Computer Methods and Programs in Biomedicine 74, no. 3 (2004): 255–260. [DOI] [PubMed] [Google Scholar]
  • 5. Hughes J. P., “Mixed Effects Models With Censored Data With Application to HIV RNA Levels,” Biometrics 55, no. 2 (1999): 625–629. [DOI] [PubMed] [Google Scholar]
  • 6. Vaida F. and Liu L., “Fast Implementation for Normal Mixed Effects Models With Censored Response,” Journal of Computational and Graphical Statistics 18, no. 4 (2009): 797–817. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Vock D. M., Davidian M., Tsiatis A. A., and Muir A. J., “Mixed Model Analysis of Censored Longitudinal Data With Flexible Random‐Effects Density,” Biostatistics 13, no. 1 (2012): 61–73. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Groot P. and Lucas P. J., “Gaussian Process Regression With Censored Data Using Expectation Propagation,” in Proceedings of the 6th European Workshop on Probabilistic Graphical Models (PGM 2012) (DECSAI, University of Granada, 2012). [Google Scholar]
  • 9. Gammelli D., Peled I., Rodrigues F., Pacino D., Kurtaran H. A., and Pereira F. C., “Estimating Latent Demand of Shared Mobility Through Censored Gaussian Processes,” Transportation Research Part C: Emerging Technologies 120 (2020): 102775. [Google Scholar]
  • 10. Mattos T. B., Avila Matos L., and Lachos V. H., “A Semiparametric Mixed‐Effects Model for Censored Longitudinal Data,” Statistical Methods in Medical Research 30, no. 12 (2021): 2582–2603. [DOI] [PubMed] [Google Scholar]
  • 11. Lachos V. H., Matos L. A., Castro L. M., and Chen M.‐H., “Flexible Longitudinal Linear Mixed Models for Multiple Censored Responses Data,” Statistics in Medicine 38, no. 6 (2019): 1074–1102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. R Core Team , R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2022). [Google Scholar]
  • 13. Stan Development Team , “Stan Modeling Language Users Guide and Reference Manual, Version 2.33,” (2023).
  • 14. Gabry J., Češnovar R., and Johnson A., “cmdstanr: R Interface to ‘CmdStan’,” (2023), https://mc‐stan.org/cmdstanr/, https://discourse.mc‐stan.org.
  • 15. Arellano‐Valle R. B., Branco M. D., and Genton M. G., “A Unified View on Skewed Distributions Arising From Selections,” Canadian Journal of Statistics 34, no. 4 (2006): 581–601. [Google Scholar]
  • 16. Botev Z. and Belzile L., “TruncatedNormal: Truncated Multivariate Normal and Student Distributions,” R package version 2.2.2 (2021). [Google Scholar]
  • 17. Kaufman C. G. and Sain S. R., “Bayesian Functional ANOVA Modeling Using Gaussian Process Prior Distributions,” Bayesian Analysis 5, no. 1 (2010): 123–149. [Google Scholar]
  • 18. Hoffmann A. G., Ekstrøm C. T., and Jensen A. K., “Computationally Efficient Multi‐Level Gaussian Process Regression for Functional Data Observed Under Completely or Partially Regular Sampling Designs,” Manuscript Submitted for Publication (2024), arXiv Preprint arXiv:2406.13691 [Stat.ME].
  • 19. Acosta E. P., Wu H., Hammer S. M., et al., “Comparison of Two Indinavir/Ritonavir Regimens in the Treatment of HIV‐Infected Individuals,” JAIDS Journal of Acquired Immune Deficiency Syndromes 37, no. 3 (2004): 1358–1366. [DOI] [PubMed] [Google Scholar]
  • 20. Wang W.‐L., Lin T.‐I., and Lachos V. H., “Extending Multivariate‐t Linear Mixed Models for Multiple Longitudinal Data With Censored Responses and Heavy Tails,” Statistical Methods in Medical Research 27, no. 1 (2018): 48–64. [DOI] [PubMed] [Google Scholar]
  • 21. Lin T.‐I. and Wang W.‐L., “Multivariate‐Nonlinear Mixed Models With Application to Censored Multi‐Outcome AIDS Studies,” Biostatistics 18, no. 4 (2017): 666–681. [DOI] [PubMed] [Google Scholar]
  • 22. Botev Z. I., “The Normal Law Under Linear Restrictions: Simulation and Estimation via Minimax Tilting,” Journal of the Royal Statistical Society, Series B: Statistical Methodology 79, no. 1 (2016): 125–148. [Google Scholar]
  • 23. Genz A. and Bretz F., Computation of Multivariate Normal and t Probabilities. Lecture Notes in Statistics, vol. 195 (Springer Verlag, 2009). [Google Scholar]
  • 24. Christoffersen B., Mahjani B., Clements M., Kjellström H., and Humphreys K., “Quasi‐Monte Carlo Methods for Binary Event Models With Complex Family Data,” Journal of Computational and Graphical Statistics 32, no. 4 (2023): 1393–1401. [Google Scholar]



