Abstract
We revisit the classic situation in functional data analysis in which curves are observed at discrete, possibly sparse and irregular, arguments with observation noise. We focus on the reconstruction of individual curves by prediction intervals and bands. The standard approach consists of two steps: first, one estimates the mean and covariance function of the curves and the observation noise variance function by, e.g. penalized splines, and second, under Gaussian assumptions, one derives the conditional distribution of a curve given the observed data and constructs prediction sets with the required properties, usually by sampling from the predictive distribution. This approach is well established, commonly used and theoretically valid, yet in practice it surprisingly fails in its key property: prediction sets constructed this way often do not achieve the required coverage, the actual coverage being lower than the nominal one. We investigate the cause of this issue and propose a computationally feasible remedy that leads to prediction regions with much better coverage. Our method accounts for the uncertainty of the predictive model by sampling from the approximate distribution of its spline estimators, whose covariance is estimated by a novel sandwich estimator. Our approach also applies to the important case of covariate-adjusted models.
Keywords: Coverage, curve reconstruction, functional data analysis, noisy discrete observation, prediction set, spline smoothing
Mathematics Subject Classification: 62M99, 62G99
1. Introduction
Functional data analysis [6,12] deals with collections of data units that are mathematical functions, that is, complex objects such as curves, surfaces or images. Depending on the application setting, these functions (regarded as realizations of random processes) may be observed in different forms. Sometimes observations of functions are available to the statistician over a continuum of argument values and without noise, or on a dense set of argument values with or without noise, in which case they can be converted to the underlying smooth function by interpolation or smoothing techniques applied to each function separately. However, in many areas of application, such as biomedical and econometric longitudinal studies, each curve is observed only at a small number of argument values and with noise, making it impossible or unreliable to reconstruct individual functions from their own observations alone; instead, values measured on other subjects need to be used as well.
Our work is motivated by data consisting of heart rate measurements obtained as part of the Swiss Kidney Project on Genes in Hypertension [1]. Several hundred participants of this cohort study underwent 24-hour ambulatory blood pressure measurement using an automatic device that recorded blood pressure and heart rate every 15 minutes during the day and every 30 minutes during the night, starting at around 8 AM. Figure 1 shows measurements (in beats per minute) for a subset of 20 participants. Measurements recorded between 8 PM and 2 AM (that is, in the interval [20, 26] on the hour scale) are plotted, as this is a period of particular interest. Measurements are relatively sparse (with 10–20 observations on most participants) and irregularly located. Moreover, observation noise is present. Each participant's data can thus be seen as noisy discrete observations of an underlying continuous-time heart rate curve. Methods studied in this work address the task of reconstructing individual curves in continuous time and, in particular, the challenge of correctly dealing with the uncertainty of such a reconstruction. A functional data analysis of these data by other means was previously performed in [7] and [8].
Figure 1.
Heart rate measurements for a subset of participants.
The scenario of noisy discrete observation of functional data (also called sparse functional data) was introduced by Staniswalis and Lee [13] and Yao et al. [16]. It can be formalized as follows. Let the underlying true functions $X_1, \dots, X_n$ be independent identically distributed realizations of a smooth random function on the domain $\mathcal{T}$, which for simplicity will be an interval (hence the functions are curves). Instead of observing $X_i(t)$ for all $t \in \mathcal{T}$, one observes values for the $i$th subject only at arguments $t_{i1}, \dots, t_{im_i}$, and measurements are subject to observation noise. Hence the available data are
$$Y_{ij} = X_i(t_{ij}) + \varepsilon_{ij}, \quad j = 1, \dots, m_i, \quad i = 1, \dots, n,$$
where the $\varepsilon_{ij}$ are mutually independent mean zero random variables with variance $\gamma(t_{ij})$ that are independent of $X_1, \dots, X_n$.
To estimate the mean function $\mu(t) = \mathbb{E}\,X_i(t)$, the covariance function $\rho(s,t) = \operatorname{cov}\{X_i(s), X_i(t)\}$ and the noise variance function $\gamma(t)$, one usually employs nonparametric smoothing techniques such as kernel or penalized spline smoothing of pooled data from all subjects; see, e.g. [5] or [6]. We describe one such approach in Section 2.1. Yao et al. [16] and Xiao [15] provided theoretical results for kernel and spline estimators, respectively. Hall et al. [4], Zhang and Wang [18] and Liebl [10] studied theoretically the impact of sparsity on nonparametric estimation of the model parameters. These studies can help decide whether pooled data smoothing or individual curve smoothing is more appropriate for the estimation of µ and ρ. When there are enough points per curve, it may be possible to use individual smoothing, but these studies have shown that even a relatively slowly increasing number of points per curve (of larger order than $n^{1/4}$) leads to the optimal parametric convergence rate of the estimators of µ and ρ, i.e. the same rate as for estimators from fully observed curves. It thus appears that individual curve smoothing is mainly appropriate in very dense observation regimes.
Once the model components are estimated, we can turn to the reconstruction of the individual curves $X_i$. The goal is to recover the entire curve based on the discrete (often sparse) noisy measurements along the curve. Point reconstruction (i.e. prediction) is obtained by exploiting the covariance structure between the data observed on the $i$th subject and the underlying curve to be reconstructed, leading to the best linear unbiased prediction. Yao et al. [16] developed such a method using functional principal components. The reconstruction task was further studied by many authors, e.g. [2] or [3]. In addition to point reconstruction, one is interested in quantifying the uncertainty about the underlying curve, which is the main focus of our paper. We review a version of the standard approach to the construction of prediction sets (pointwise intervals and simultaneous bands) in Section 2.2.
It turns out that standard prediction sets do not perform well in practically important situations. We show by simulations that their actual coverage probability is often considerably lower than the desired coverage. Despite its gravity, this phenomenon has received little attention in the literature; [3] is perhaps the only publication that addresses it. The issue occurs because the traditional sets fail to account for important sources of uncertainty. We propose a new method that takes into account not only the predictive uncertainty but also the uncertainty about the predictive model parameters due to the estimation of the model components. Our method involves sampling from the approximate distribution of the spline estimators $\hat\mu$, $\hat\rho$ and $\hat\gamma$, whose covariance is estimated by a sandwich estimator taking into account their correlation and the group correlation structure of the data. The proposed method has better coverage properties in all situations under study. Moreover, it is computationally favourable, in particular in comparison with other possible approaches.
It is an important asset of our approach that it extends to the case in which the model components (mean function, covariance function, noise variance function) are adjusted for the effect of a vector of scalar covariates in an arbitrarily flexible way permitted by spline additive models. There are no results currently available in the literature that would deal with prediction sets for curve reconstruction in a covariate-adjusted setting.
The paper is organized as follows. Section 2 introduces the existing approach to curve reconstruction. In Section 3, we present our main contribution, the improved method of predictive sampling. Section 4 provides the results of a simulation study. In Section 5 we illustrate our method on a data set of automatically recorded heart rate measurements in humans.
2. Estimation and usual prediction sets
2.1. Spline smoothing estimation of model components
The model components µ, ρ and γ are estimated nonparametrically. Two main types of nonparametric smoothing, kernel smoothing and spline smoothing, are commonly used. The spline approach is more convenient for our purpose due to its linear regression structure.
Since $\mathbb{E}\,Y_{ij} = \mu(t_{ij})$, the mean function is estimated by spline smoothing with the noisy functional observations $Y_{ij}$, $j = 1, \dots, m_i$, $i = 1, \dots, n$, as response and the corresponding arguments $t_{ij}$ as covariate. The mean is expressed as a linear combination of spline basis functions, say $B_1, \dots, B_{d_\mu}$, e.g. cubic B-splines, with coefficients $c_\mu = (c_{\mu 1}, \dots, c_{\mu d_\mu})^\top$. The coefficients are estimated by minimizing the penalized sum of squared residuals,
$$\sum_{i=1}^{n} \sum_{j=1}^{m_i} \Big\{ Y_{ij} - \sum_{k=1}^{d_\mu} c_{\mu k} B_k(t_{ij}) \Big\}^2 + c_\mu^\top \mathbf{P}_\mu c_\mu.$$
The roughness penalty term is included to avoid overfitting. The quadratic form given by the matrix $\mathbf{P}_\mu$ measures the complexity of the shape of the corresponding function, e.g. by the squared $L^2$-norm of the second derivative or by the squared Euclidean norm of differences of the coefficients. The smoothing parameter, which we absorb into the matrix $\mathbf{P}_\mu$ to simplify the notation, is estimated by restricted maximum likelihood (REML) or selected by generalized cross-validation (GCV). The coefficients can be obtained in closed form as
$$\hat{c}_\mu = (\mathbf{B}^\top \mathbf{B} + \mathbf{P}_\mu)^{-1} \mathbf{B}^\top \mathbf{Y}, \qquad (1)$$
where $\mathbf{Y}$ is the vector of length $N = \sum_{i=1}^{n} m_i$ containing all response values and $\mathbf{B}$ is the $(N \times d_\mu)$-matrix of evaluated basis functions $B_k(t_{ij})$.
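For concreteness, the following minimal R sketch illustrates this step with the mgcv package used in our implementation; the toy data and all object names are illustrative only, not part of the method itself.

```r
# Minimal sketch: penalized spline estimation of the mean function from
# pooled noisy observations (toy data; names are illustrative).
library(mgcv)

set.seed(1)
n  <- 50                                    # number of subjects
m  <- sample(3:8, n, replace = TRUE)        # observations per subject
id <- rep(seq_len(n), m)                    # subject labels
t  <- runif(sum(m))                         # pooled observation arguments
y  <- sin(2 * pi * t) + rnorm(sum(m), sd = 0.3)  # pooled noisy responses

# Penalized least squares with a spline basis; the smoothing parameter is
# estimated by REML, as in the text.
fit_mu <- gam(y ~ s(t, bs = "ps", k = 20), method = "REML")

B    <- predict(fit_mu, type = "lpmatrix")  # evaluated basis matrix (N x d)
c_mu <- coef(fit_mu)                        # estimated spline coefficients
mu_hat <- function(t0) as.numeric(predict(fit_mu, data.frame(t = t0)))
```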
Given the estimated mean function $\hat\mu$, the covariance function can be estimated by bivariate spline smoothing of the 'raw covariances' $Z_{ijk} = \{Y_{ij} - \hat\mu(t_{ij})\}\{Y_{ik} - \hat\mu(t_{ik})\}$, $j \neq k$, $j, k = 1, \dots, m_i$, $i = 1, \dots, n$, as response and the argument pairs $(t_{ij}, t_{ik})$ as covariates. This is based on the fact that $\operatorname{cov}(Y_{ij}, Y_{ik}) = \rho(t_{ij}, t_{ik})$ for $j \neq k$. The covariance function is expressed as a linear combination of bivariate spline basis functions obtained by the tensor product of a marginal spline basis with itself. Denote by $c_\rho$ the vectorization of the matrix of tensor-product coefficients. The coefficients are estimated by penalized least squares,
$$\min_{c_\rho} \sum_{i=1}^{n} \sum_{j \neq k} \{ Z_{ijk} - \mathbf{b}_\rho(t_{ij}, t_{ik})^\top c_\rho \}^2 + c_\rho^\top \mathbf{P}_\rho c_\rho,$$
where $\mathbf{b}_\rho(s,t)$ is the vector of evaluated tensor-product basis functions.
The roughness penalty is composed of wiggliness measures in each marginal direction, each with its own smoothing parameter; for details see [14]. It follows that
$$\hat{c}_\rho = (\mathbf{B}_\rho^\top \mathbf{B}_\rho + \mathbf{P}_\rho)^{-1} \mathbf{B}_\rho^\top \mathbf{Z}, \qquad (2)$$
with the response vector $\mathbf{Z}$ and the basis matrix $\mathbf{B}_\rho$ defined in an obvious way.
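A corresponding sketch of the covariance step, continuing the previous one (it reuses t, y, id and fit_mu), forms the raw covariances from off-diagonal residual products and smooths them with a tensor-product spline:

```r
# Raw covariances Z_ijk = r_ij * r_ik, j != k, smoothed over (t_ij, t_ik).
res <- y - as.numeric(predict(fit_mu))      # centred responses
raw <- do.call(rbind, lapply(split(seq_along(t), id), function(ix) {
  pr <- expand.grid(j = ix, k = ix)
  pr <- pr[pr$j != pr$k, , drop = FALSE]    # off-diagonal pairs only
  data.frame(t1 = t[pr$j], t2 = t[pr$k], z = res[pr$j] * res[pr$k])
}))

# Bivariate smoothing by a tensor product of marginal spline bases.
fit_rho <- gam(z ~ te(t1, t2, bs = "ps", k = 8), data = raw, method = "REML")
rho_hat <- function(s, u)
  as.numeric(predict(fit_rho, data.frame(t1 = s, t2 = u)))
```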
Next, since $\operatorname{var}(Y_{ij}) = \rho(t_{ij}, t_{ij}) + \gamma(t_{ij})$, the noise variance function is estimated by spline smoothing of the 'raw noise variances' $W_{ij} = \{Y_{ij} - \hat\mu(t_{ij})\}^2 - \hat\rho(t_{ij}, t_{ij})$, $j = 1, \dots, m_i$, $i = 1, \dots, n$, as response and the arguments $t_{ij}$ as covariate. The coefficient estimates are obtained as
$$\hat{c}_\gamma = (\mathbf{B}_\gamma^\top \mathbf{B}_\gamma + \mathbf{P}_\gamma)^{-1} \mathbf{B}_\gamma^\top \mathbf{W},$$
where the spline basis functions, the basis matrix $\mathbf{B}_\gamma$, the response vector $\mathbf{W}$ and the penalty matrix $\mathbf{P}_\gamma$ are analogous to the previous cases. Here, too, the estimator is linear in the response vector.
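Continuing the sketch (reusing t, res and rho_hat from above), the noise variance step smooths squared residuals minus the estimated covariance diagonal; the truncation at a small positive value is a simplification of ours, the log-link alternative being mentioned below:

```r
# Raw noise variances W_ij = r_ij^2 - rho_hat(t_ij, t_ij), smoothed over t.
w <- res^2 - rho_hat(t, t)
fit_gamma <- gam(w ~ s(t, bs = "ps", k = 10), method = "REML")
gamma_hat <- function(t0)                   # truncated to keep it positive
  pmax(as.numeric(predict(fit_gamma, data.frame(t = t0))), 1e-8)
```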
We can also adjust the model components for the effect of covariates. In general, the mean, covariance and noise variance functions may take the form of smooth functions $\mu(t, v)$, $\rho(s, t, v)$ and $\gamma(t, v)$, respectively, where $v$ is a vector of scalar covariates (subject features constant on the domain $\mathcal{T}$). We may consider this general form or a simplification in the form of an additive model in which covariates may interact nonparametrically or semiparametrically with the argument(s). All these variants can be modelled by tensor products of splines and estimated by penalized least squares similarly to the basic case without covariates described before. Other types of splines can be used as well, for example, thin plate regression splines instead of cubic splines, and bivariate thin plate regression splines instead of tensor products. One can also use a logarithmic link function when estimating the noise variance function to ensure that the estimate is positive. The methods we develop are applicable with all these approaches and their combinations. They can also be used with functional data that are more complex than curves, such as smooth surfaces or images.
We refer to [14] for general principles and details of penalized spline smoothing in the form needed in the above estimation procedures. There, the reader can find further information on the different approaches to the construction of splines of one and more variables, on additive models and interactions between variables, on roughness penalties, on smoothing parameter estimation or selection methods, on their advantages and disadvantages, on connections with mixed effects models, on computational details, and on the related mgcv package in R, which we use in our implementation.
2.2. Standard approach to curve reconstruction and prediction sets
Given the observations $\mathbf{Y}_i = (Y_{i1}, \dots, Y_{im_i})^\top$ at the arguments $t_{i1}, \dots, t_{im_i}$, the goal now is to reconstruct the curve $X_i(t)$, $t \in \mathcal{T}$. Assuming that the curve and the observation noise terms are Gaussian and independent, we see immediately that the conditional distribution of $X_i$ given $\mathbf{Y}_i$ is a Gaussian process with mean function
$$\tilde\mu_i(t) = \mu(t) + \boldsymbol{\rho}_i(t)^\top (\mathbf{R}_i + \boldsymbol{\Gamma}_i)^{-1} (\mathbf{Y}_i - \boldsymbol{\mu}_i) \qquad (3)$$
and covariance function
$$\tilde\rho_i(s, t) = \rho(s, t) - \boldsymbol{\rho}_i(s)^\top (\mathbf{R}_i + \boldsymbol{\Gamma}_i)^{-1} \boldsymbol{\rho}_i(t), \qquad (4)$$
where $\boldsymbol{\rho}_i(t) = \{\rho(t_{i1}, t), \dots, \rho(t_{im_i}, t)\}^\top$, $\mathbf{R}_i$ is the $(m_i \times m_i)$-matrix with entries $\rho(t_{ij}, t_{ik})$, $\boldsymbol{\Gamma}_i$ is the diagonal matrix with $\gamma(t_{i1}), \dots, \gamma(t_{im_i})$ on the diagonal and $\boldsymbol{\mu}_i = \{\mu(t_{i1}), \dots, \mu(t_{im_i})\}^\top$.
The curve is then predicted by the estimated conditional mean, i.e. by (3) with µ, ρ and γ replaced by their estimates $\hat\mu$, $\hat\rho$ and $\hat\gamma$.
Since the predictive distribution is Gaussian, pointwise prediction intervals can be obtained easily.
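The conditional Gaussian calculus of (3) and (4) translates directly into code. The sketch below assumes estimated components mu_hat, rho_hat and gamma_hat as produced, for instance, by the sketches in Section 2.1; the function name and interface are illustrative.

```r
# Predictive (conditional) mean and covariance of X_i on a grid, given the
# subject's arguments ti and responses yi; implements (3) and (4).
predictive <- function(ti, yi, grid, mu_hat, rho_hat, gamma_hat) {
  Ri <- outer(ti, ti, rho_hat)              # R_i: covariances at arguments
  Gi <- diag(gamma_hat(ti), length(ti))     # Gamma_i: noise variances
  A  <- solve(Ri + Gi)
  Ci <- outer(grid, ti, rho_hat)            # cross-covariances rho_i(t)
  mn <- mu_hat(grid) + as.numeric(Ci %*% A %*% (yi - mu_hat(ti)))   # (3)
  V  <- outer(grid, grid, rho_hat) - Ci %*% A %*% t(Ci)             # (4)
  list(mean = mn, cov = (V + t(V)) / 2)     # symmetrized against rounding
}

# Core of Algorithm 1: M curves from the estimated predictive distribution.
# pd <- predictive(ti, yi, grid, mu_hat, rho_hat, gamma_hat)
# S  <- MASS::mvrnorm(1000, pd$mean, pd$cov)   # one curve per row
```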
Simultaneous prediction bands can be constructed in a number of ways. For example, one can use a band composed of pointwise prediction intervals and inflate it to achieve the required simultaneous coverage $1 - \alpha$. Denote by $q_{i,\tau}(t)$ the τ-quantile of $X_i(t)$ given $\mathbf{Y}_i$. The band has boundaries
$$\tilde\mu_i(t) - \xi \{\tilde\mu_i(t) - q_{i,\alpha_0/2}(t)\} \quad \text{and} \quad \tilde\mu_i(t) + \xi \{q_{i,1-\alpha_0/2}(t) - \tilde\mu_i(t)\}, \quad t \in \mathcal{T}.$$
Here $\alpha_0$ (e.g. $\alpha_0 = \alpha$) is the level of the pointwise quantiles used for the construction of the boundaries, and ξ is a factor by which these boundaries are inflated. The simultaneous coverage probability of the band is
$$P\bigg[ \sup_{t \in \mathcal{T}} \max\bigg\{ \frac{\tilde\mu_i(t) - X_i(t)}{\tilde\mu_i(t) - q_{i,\alpha_0/2}(t)}, \frac{X_i(t) - \tilde\mu_i(t)}{q_{i,1-\alpha_0/2}(t) - \tilde\mu_i(t)} \bigg\} \leq \xi \,\bigg|\, \mathbf{Y}_i \bigg].$$
The value of ξ is thus determined as the $(1-\alpha)$-quantile of the above supremum. This value and the pointwise quantiles are computed by simulation from the predictive distribution with the parameters µ, ρ, γ replaced by their estimates $\hat\mu$, $\hat\rho$, $\hat\gamma$. The procedure for generating a suitable number, say M, of curves from the predictive distribution is summarized in Algorithm 1.
Since the predictive distribution is Gaussian, the width of the band is proportional to the pointwise predictive standard deviation. The form of the boundaries thus simplifies to $\tilde\mu_i(t) \pm \kappa\, \tilde\rho_i(t, t)^{1/2}$, $t \in \mathcal{T}$, where κ is the $(1-\alpha)$-quantile of $\sup_{t \in \mathcal{T}} |X_i(t) - \tilde\mu_i(t)| / \tilde\rho_i(t, t)^{1/2}$ given $\mathbf{Y}_i$ (again computable by simulation). We nevertheless write the band in the above form involving predictive quantiles instead of predictive standard deviations because it is more appropriate for non-symmetric (and non-Gaussian) predictive distributions, which arise in the method we propose in the next section.
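The inflation step is easy to carry out on a predictive sample. The following sketch takes an M x G matrix S of sampled curves (one curve per row, evaluated on a grid of G points) and returns the band; it is centred at the pointwise median, which coincides with the predictive mean in the Gaussian case and extends to the non-symmetric samples arising in Section 3.

```r
# Simultaneous band from pointwise quantile boundaries inflated by the
# factor xi, calibrated as the (1 - alpha)-quantile of the supremum of the
# standardized deviations of the sampled curves.
simultaneous_band <- function(S, alpha = 0.05, alpha0 = 0.05) {
  ctr <- apply(S, 2, median)                         # centre of the band
  lo  <- apply(S, 2, quantile, probs = alpha0 / 2)   # pointwise boundaries
  hi  <- apply(S, 2, quantile, probs = 1 - alpha0 / 2)
  sup <- apply(S, 1, function(x)                     # supremum per curve
    max(pmax((ctr - x) / (ctr - lo), (x - ctr) / (hi - ctr))))
  xi  <- unname(quantile(sup, 1 - alpha))            # inflation factor
  list(lower = ctr - xi * (ctr - lo), upper = ctr + xi * (hi - ctr))
}
```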
The form of the predictive distribution used here was previously used, e.g. by Degras [2]. Yao et al. [16] used principal components in their approximation of the predictive distribution. The construction of simultaneous prediction bands can also be based on other principles than inflation of pointwise intervals. For example, data depth can be used (e.g. [11]). For the sake of clarity and ease of presentation, we restrict our attention to the simple approach described above.
3. Proposed method
3.1. Causes of undercoverage
In a simulation study of the performance of prediction sets, which we report in detail in Section 4, we noticed that their empirical coverage probability is often dramatically lower than the nominal coverage. For example, in a typical situation with 400 subjects in the training sample and an average of 3 noisy observations per subject, the average (across the domain) coverage of pointwise prediction intervals fell clearly short of the nominal level, and the simultaneous coverage of the simultaneous prediction band was far lower still (see Table 1). The situation was even worse with fewer subjects in the training sample.
We investigated the possible causes of this phenomenon by simulation experiments. One idea was that there might be a problem with the spline smoothing. We tried to reduce the possible boundary effects by extending the domain at the smoothing step but it did not lead to an improvement. We also tried to increase the number of spline basis functions or use thin plate regression splines instead of cubic splines or employ GCV instead of REML, also without success. Another idea was that the computation of prediction sets by sampling from the predictive distribution might be inaccurate due to a low number of curves sampled, but increasing the number did not help either.
Finally, we noticed that when we use the true model parameters µ, ρ and γ instead of their estimates, the empirical coverage probability is very close to the prescribed nominal coverage. The true parameters are, of course, unknown in real settings, and, therefore, the inaccuracy of estimates must be taken into account in some way.
3.2. Accounting for estimation uncertainty
In the traditional approach, the predictive model parameters are estimated by $\hat\mu$, $\hat\rho$ and $\hat\gamma$ but treated as if the estimates were equal to the true functions µ, ρ, γ. This approach is asymptotically justified because the estimators are consistent and hence their uncertainty is asymptotically negligible in comparison with the non-vanishing prediction uncertainty. The same approach that neglects the estimation uncertainty is commonly used in many predictive tasks such as regression prediction or time series forecasting. However, in the present setting this approach often performs poorly for practically relevant sample sizes because it does not properly account for all important sources of uncertainty.
To incorporate both estimation and prediction uncertainty, the bootstrap naturally comes to mind. Goldsmith et al. [3], who reported the poor coverage of standard prediction sets, proposed a bootstrap approach in connection with their principal-component-based reconstruction method. We now outline how the bootstrap can be used in our setting. The key idea is to account for uncertainty by mimicking the entire process that gives rise to predictions. Therefore, to generate a sample from the predictive distribution one does not repeatedly use the same predictive model estimated from the observed data. Instead, each predictive draw is made from the model estimated from a bootstrap sample. The bootstrap sample is obtained by sampling with replacement from the observed data; to preserve the within-subject dependence structure, the resampling is applied to the collection of subjects. The method is described in Algorithm 2.
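In outline (a sketch, not the verbatim algorithm), the bootstrap loop could look as follows; fit_components() is a hypothetical helper standing for the whole smoothing pipeline of Section 2.1, and predictive() is the conditional-distribution sketch of Section 2.2.

```r
# Algorithm 2 in outline: each predictive draw comes from a model refitted
# to a resample of subjects, so the costly smoothing runs M times.
bootstrap_predictive_sample <- function(subjects, ti, yi, grid, M) {
  # subjects: list with one data.frame (t, y) per subject
  t(sapply(seq_len(M), function(b) {
    boot <- subjects[sample(length(subjects), replace = TRUE)]
    fc   <- fit_components(boot)            # hypothetical: mu, rho, gamma
    pd   <- predictive(ti, yi, grid, fc$mu, fc$rho, fc$gamma)
    MASS::mvrnorm(1, pd$mean, pd$cov)       # one curve per refitted model
  }))
}
```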
The main drawback of the bootstrap is its computational cost due to the repeated spline smoothing on each bootstrap sample. The smoothing step is costly mainly due to the bivariate smoothing of the covariance surface. This prompts us to look for other methods.
We propose to account for the prediction model uncertainty by considering the approximate distribution of the estimators of the model parameters. Instead of sampling all curves from the predictive distribution with parameters fixed at their estimates, each curve is drawn from a predictive distribution whose parameters are themselves drawn from the approximate distribution of their estimators. The sampling of the model parameters is done by sampling their spline coefficients. The distribution of the spline coefficient estimators is approximated by a multivariate Gaussian distribution centred at the estimates, with the covariance matrix derived in the next subsection. The proposed method, which may be regarded as proper predictive sampling, is described in Algorithm 3. Unlike the traditional method of improper predictive sampling (Algorithm 1), our approach includes model uncertainty, but unlike the bootstrap method (Algorithm 2), it does not repeat the costly smoothing.
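In outline, the sampling step of the proposed method might look as follows; c_hat stacks the estimated coefficients, V_hat is the sandwich estimate (5) of the next subsection, and components_from() is a hypothetical helper rebuilding µ, ρ and γ from a coefficient vector via the spline bases of Section 2.1.

```r
# Algorithm 3 in outline: coefficients are drawn from their approximate
# Gaussian distribution; the smoothing itself is never repeated.
proper_predictive_sample <- function(c_hat, V_hat, ti, yi, grid, M) {
  C <- mgcv::rmvn(M, c_hat, V_hat)          # M coefficient draws
  t(sapply(seq_len(M), function(b) {
    fc <- components_from(C[b, ])           # hypothetical: mu, rho, gamma
    pd <- predictive(ti, yi, grid, fc$mu, fc$rho, fc$gamma)
    MASS::mvrnorm(1, pd$mean, pd$cov)       # one curve per parameter draw
  }))
}
```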
Once a sample of sufficiently many curves is obtained by any of Algorithms 1, 2 or 3, we can use it in the computation of predictions. The point reconstruction can be obtained by the pointwise median of the predictive sample. Pointwise prediction intervals can be based on pointwise quantiles of the predictive sample. A simultaneous prediction band can be computed by suitably inflating pointwise prediction intervals as described in Section 2.2.
The proposed approach has some limitations. Unlike the bootstrap, which fully accounts for all uncertainties in the predictive model, it neglects the estimation or selection uncertainty of the smoothing parameters and the uncertainty due to the construction of the spline bases (location of knots, etc.). The bootstrap accounts for these at a high computational cost. Despite these limitations, the proposed approach performs substantially better than the standard approach in all situations under study.
3.3. Sandwich estimator of the covariance of the model estimators
It remains to find a suitable estimator of the approximate joint covariance matrix of the spline coefficient estimators $\hat{c}_\mu$, $\hat{c}_\rho$, $\hat{c}_\gamma$. When doing this, one needs to pay attention to two complications: first, the data have a group structure with within-subject correlation and heteroskedasticity of observations, and, second, the estimators $\hat{c}_\mu$, $\hat{c}_\rho$ and $\hat{c}_\gamma$, despite being computed separately, are based on the same data and hence are generally correlated.
First, let us inspect the issue of group correlation and heteroskedasticity. All three estimators $\hat{c}_\mu$, $\hat{c}_\rho$, $\hat{c}_\gamma$ are obtained by penalized least squares and thus their covariance estimators are readily available for fixed smoothing parameters. For example, for $\hat{c}_\mu$, standard methodology and implementations would, in light of (1), use
$$\hat\sigma^2 (\mathbf{B}^\top \mathbf{B} + \mathbf{P}_\mu)^{-1} \mathbf{B}^\top \mathbf{B} (\mathbf{B}^\top \mathbf{B} + \mathbf{P}_\mu)^{-1},$$
where $\hat\sigma^2$ would be a residual variance estimator. This is called the frequentist covariance estimator in the literature on generalized additive models [14] and related software (the mgcv package in R). It would be a correct estimator if the response values were independent and homoskedastic. However, independence and homoskedasticity are working assumptions for the construction of the spline smoother rather than valid facts. In general, observations taken on the same subject are correlated and their variances are not equal. Indeed, $\operatorname{cov}(Y_{ij}, Y_{ik}) = \rho(t_{ij}, t_{ik})$ for $j \neq k$ and $\operatorname{var}(Y_{ij}) = \rho(t_{ij}, t_{ij}) + \gamma(t_{ij})$. The true covariance matrix of $\hat{c}_\mu$ is
$$\operatorname{var}(\hat{c}_\mu) = (\mathbf{B}^\top \mathbf{B} + \mathbf{P}_\mu)^{-1} \bigg\{ \sum_{i=1}^{n} \mathbf{B}_i^\top \operatorname{var}(\mathbf{Y}_i) \mathbf{B}_i \bigg\} (\mathbf{B}^\top \mathbf{B} + \mathbf{P}_\mu)^{-1},$$
where $\mathbf{B}_i$ is the submatrix of $\mathbf{B}$ with the rows corresponding to observations pertaining to the $i$th curve. It can be estimated by
$$\hat{\mathbf{V}}_{\mu\mu} = (\mathbf{B}^\top \mathbf{B} + \mathbf{P}_\mu)^{-1} \bigg\{ \sum_{i=1}^{n} \mathbf{B}_i^\top (\mathbf{Y}_i - \mathbf{B}_i \hat{c}_\mu)(\mathbf{Y}_i - \mathbf{B}_i \hat{c}_\mu)^\top \mathbf{B}_i \bigg\} (\mathbf{B}^\top \mathbf{B} + \mathbf{P}_\mu)^{-1},$$
which is a sandwich covariance matrix estimator. Similar estimators are used with various unpenalized regression models in biostatistics and econometrics applied to group correlated data (e.g. [9,17]). Similarly, we can estimate the covariance matrix of $\hat{c}_\rho$ by
$$\hat{\mathbf{V}}_{\rho\rho} = (\mathbf{B}_\rho^\top \mathbf{B}_\rho + \mathbf{P}_\rho)^{-1} \bigg\{ \sum_{i=1}^{n} \mathbf{B}_{\rho i}^\top (\mathbf{Z}_i - \mathbf{B}_{\rho i} \hat{c}_\rho)(\mathbf{Z}_i - \mathbf{B}_{\rho i} \hat{c}_\rho)^\top \mathbf{B}_{\rho i} \bigg\} (\mathbf{B}_\rho^\top \mathbf{B}_\rho + \mathbf{P}_\rho)^{-1}$$
and that of $\hat{c}_\gamma$ by
$$\hat{\mathbf{V}}_{\gamma\gamma} = (\mathbf{B}_\gamma^\top \mathbf{B}_\gamma + \mathbf{P}_\gamma)^{-1} \bigg\{ \sum_{i=1}^{n} \mathbf{B}_{\gamma i}^\top (\mathbf{W}_i - \mathbf{B}_{\gamma i} \hat{c}_\gamma)(\mathbf{W}_i - \mathbf{B}_{\gamma i} \hat{c}_\gamma)^\top \mathbf{B}_{\gamma i} \bigg\} (\mathbf{B}_\gamma^\top \mathbf{B}_\gamma + \mathbf{P}_\gamma)^{-1},$$
where the subscript $i$ at $\mathbf{B}_\rho$, $\mathbf{B}_\gamma$, $\mathbf{Z}$ and $\mathbf{W}$ refers to the rows corresponding to the $i$th subject.
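For the mean block $\hat{\mathbf{V}}_{\mu\mu}$, the computation is a few lines of linear algebra. The sketch below assumes the objects B, res and id from the Section 2.1 sketches and a penalty matrix P_mu with the smoothing parameter absorbed (with mgcv it can be assembled from the fitted object, a detail we omit).

```r
# Sandwich estimate of var(c_mu): bread (B'B + P)^{-1}, meat summing
# B_i' r_i r_i' B_i over subjects, leaving within-subject correlation and
# heteroskedasticity unrestricted.
sandwich_mu <- function(B, res, id, P_mu) {
  bread <- solve(crossprod(B) + P_mu)
  meat  <- Reduce(`+`, lapply(split(seq_along(res), id), function(ix) {
    u <- crossprod(B[ix, , drop = FALSE], res[ix])  # B_i' r_i
    tcrossprod(u)                                   # B_i' r_i r_i' B_i
  }))
  bread %*% meat %*% bread
}
```

The ρ- and γ-blocks are computed analogously from their own basis matrices and residuals.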
Next, we inspect the correlation between $\hat{c}_\mu$, $\hat{c}_\rho$ and $\hat{c}_\gamma$. We see from (1) and (2) that $\hat{c}_\mu$ and $\hat{c}_\rho$ are linear in the response vectors $\mathbf{Y}$ and $\mathbf{Z}$, respectively. Therefore, their covariance matrix is approximately
$$\operatorname{cov}(\hat{c}_\mu, \hat{c}_\rho) \approx (\mathbf{B}^\top \mathbf{B} + \mathbf{P}_\mu)^{-1} \bigg\{ \sum_{i=1}^{n} \mathbf{B}_i^\top \operatorname{cov}(\mathbf{Y}_i, \mathbf{Z}_i) \mathbf{B}_{\rho i} \bigg\} (\mathbf{B}_\rho^\top \mathbf{B}_\rho + \mathbf{P}_\rho)^{-1}.$$
We note that this is an approximation because we neglect the part of the covariance between $\mathbf{Y}$ and $\mathbf{Z}$ that is due to the centring by $\hat\mu$ in the raw covariances $\mathbf{Z}$. It can be estimated by the sandwich estimator
$$\hat{\mathbf{V}}_{\mu\rho} = (\mathbf{B}^\top \mathbf{B} + \mathbf{P}_\mu)^{-1} \bigg\{ \sum_{i=1}^{n} \mathbf{B}_i^\top (\mathbf{Y}_i - \mathbf{B}_i \hat{c}_\mu)(\mathbf{Z}_i - \mathbf{B}_{\rho i} \hat{c}_\rho)^\top \mathbf{B}_{\rho i} \bigg\} (\mathbf{B}_\rho^\top \mathbf{B}_\rho + \mathbf{P}_\rho)^{-1}.$$
The remaining covariances are estimated similarly by
$$\hat{\mathbf{V}}_{\mu\gamma} = (\mathbf{B}^\top \mathbf{B} + \mathbf{P}_\mu)^{-1} \bigg\{ \sum_{i=1}^{n} \mathbf{B}_i^\top (\mathbf{Y}_i - \mathbf{B}_i \hat{c}_\mu)(\mathbf{W}_i - \mathbf{B}_{\gamma i} \hat{c}_\gamma)^\top \mathbf{B}_{\gamma i} \bigg\} (\mathbf{B}_\gamma^\top \mathbf{B}_\gamma + \mathbf{P}_\gamma)^{-1}$$
and
$$\hat{\mathbf{V}}_{\rho\gamma} = (\mathbf{B}_\rho^\top \mathbf{B}_\rho + \mathbf{P}_\rho)^{-1} \bigg\{ \sum_{i=1}^{n} \mathbf{B}_{\rho i}^\top (\mathbf{Z}_i - \mathbf{B}_{\rho i} \hat{c}_\rho)(\mathbf{W}_i - \mathbf{B}_{\gamma i} \hat{c}_\gamma)^\top \mathbf{B}_{\gamma i} \bigg\} (\mathbf{B}_\gamma^\top \mathbf{B}_\gamma + \mathbf{P}_\gamma)^{-1}.$$
Finally, we combine the covariance blocks and obtain the estimator
$$\hat{\mathbf{V}} = \begin{pmatrix} \hat{\mathbf{V}}_{\mu\mu} & \hat{\mathbf{V}}_{\mu\rho} & \hat{\mathbf{V}}_{\mu\gamma} \\ \hat{\mathbf{V}}_{\mu\rho}^\top & \hat{\mathbf{V}}_{\rho\rho} & \hat{\mathbf{V}}_{\rho\gamma} \\ \hat{\mathbf{V}}_{\mu\gamma}^\top & \hat{\mathbf{V}}_{\rho\gamma}^\top & \hat{\mathbf{V}}_{\gamma\gamma} \end{pmatrix}. \qquad (5)$$
When the logarithmic link function is used in the estimation of the noise variance function, the sandwich estimator needs to be modified in a way that we explain in detail in Appendix B.
4. Simulations
We investigate the performance of point reconstructions, pointwise prediction intervals and simultaneous prediction bands by simulation. Point reconstructions are pointwise median curves of predictive samples. Pointwise prediction intervals are based on pointwise quantiles of predictive samples. Prediction bands are computed from predictive samples by inflating pointwise intervals as described in Section 2.2. The simulation study compares three approaches to generating predictive samples: (i) the standard approach (Algorithm 1) using the predictive distribution with parameters held fixed at their estimates (labelled 'Fixed' in figures and tables), (ii) the proposed approach (Algorithm 3) using the predictive distribution with parameters sampled from their approximate distribution (labelled 'Sampled'), and (iii) the ideal but practically infeasible approach using the true predictive distribution without parameter estimation (labelled 'True'). We do not include the bootstrap approach (Algorithm 2) in the simulation study due to its enormous computational cost. We illustrate its use on a real dataset in the next section, where the computational time needed for the reconstruction of one curve is on the order of hours, while the proposed approach takes seconds. This makes the bootstrap approach unattractive for practical purposes and hardly feasible in simulation studies.
The data generating process is as follows. We generate Gaussian random curves with a mean function constructed from normal density functions (with given means a and standard deviations b) and a smooth covariance function. The curves are evaluated on a grid of 100 equidistant points. For each curve, we independently generate the number of observation points according to two scenarios: uniform on a small set of integers (labelled 'Sparse') and uniform on a larger set (labelled 'Less sparse'). The locations of the observation points are generated independently from the uniform distribution on the evaluation grid. Function values at observation points are further subjected to additive noise, which is independent normally distributed with mean 0 and constant variance (we obtained similar results for a non-constant error variance function). We consider sample sizes n = 100, 400, 1000. For each combination of the sample size and sparsity regime we generate 1000 training samples. For each training sample we estimate the model, randomly select a curve to be reconstructed, generate 1000 curves sampled from the predictive distribution of the selected curve, and compute the point reconstruction and prediction sets. All computations were done in R 4.2.3 with mgcv 1.8-42.
First, we explore the behaviour of point reconstructions. We measure their accuracy by the integrated median absolute prediction error (IMAPE) which is estimated by the integral of the empirical median (over simulation repetitions) of the absolute difference between the point reconstruction and the true curve. Figure 2 shows the IMAPE values. It is seen that as the training sample size increases, the prediction error decreases and approaches the value that would be attained if the true parameters were known. More importantly, we see that under both sparsity scenarios there is essentially no difference between the two approaches to generating the predictive sample. Thus by using the proposed method that simulates the parameters instead of the standard method that fixes them at their estimates we do not increase the error of point reconstructions. This is a valuable observation because one could be afraid of a possibly increased error due to the additional variability induced by the parameter simulation.
Figure 2.
Integrated median absolute prediction error estimated by simulation.
Next, we focus on coverage properties of pointwise prediction intervals and simultaneous prediction bands. Figure 3(a) shows simulation estimates of the pointwise coverage probability of pointwise prediction intervals. We see that the standard approach using estimated parameters in the predictive distribution leads to serious undercoverage, especially in situations with smaller training data sets due to sparsity or a low number of subjects. The proposed method using simulated parameters in the predictive distribution results in coverage probabilities closer to the nominal level. Of course, the proposed method is not perfect either: it also sometimes suffers from insufficient coverage, mainly when training data are scarce, but in all situations it undercovers less than the standard approach. Figure 3(b) presents simulation estimates of the simultaneous coverage probability of simultaneous prediction bands. Here we see even more clearly that the standard approach is not reliable in terms of coverage, which falls far below the prescribed level in practically relevant situations (see Table 1). The proposed method produces much more reliable prediction bands and, therefore, emerges as the preferred method. Table 1 provides detailed numerical values of average pointwise coverage probabilities and simultaneous coverage probabilities from Figure 3, and also analogous values for other relevant nominal coverages. The standard error of the simulation estimates of these coverage characteristics is at most 0.016.
Figure 3.
Coverage probability of prediction sets estimated by simulation. (a) Pointwise coverage probability of pointwise prediction intervals. (b) Simultaneous coverage probability of simultaneous prediction bands.
Table 1.
Coverage probability and width of prediction sets estimated by simulation.
| Nominal coverage | n | Less sparse: Fixed | Less sparse: Sampled | Less sparse: True | Sparse: Fixed | Sparse: Sampled | Sparse: True |
|---|---|---|---|---|---|---|---|
| Pointwise prediction intervals | |||||||
| Average pointwise coverage | |||||||
| 0.80 | 100 | 0.758 | 0.801 | 0.801 | 0.647 | 0.698 | 0.797 |
| 0.80 | 400 | 0.793 | 0.819 | 0.801 | 0.749 | 0.772 | 0.797 |
| 0.80 | 1000 | 0.801 | 0.820 | 0.801 | 0.776 | 0.795 | 0.797 |
| 0.90 | 100 | 0.861 | 0.903 | 0.901 | 0.753 | 0.820 | 0.899 |
| 0.90 | 400 | 0.893 | 0.918 | 0.901 | 0.855 | 0.881 | 0.899 |
| 0.90 | 1000 | 0.902 | 0.919 | 0.901 | 0.877 | 0.897 | 0.899 |
| 0.95 | 100 | 0.918 | 0.955 | 0.951 | 0.821 | 0.891 | 0.949 |
| 0.95 | 400 | 0.944 | 0.964 | 0.951 | 0.913 | 0.939 | 0.949 |
| 0.95 | 1000 | 0.951 | 0.963 | 0.951 | 0.930 | 0.948 | 0.949 |
| Average median width | |||||||
| 0.80 | 100 | 0.495 | 0.560 | 0.492 | 0.685 | 0.779 | 0.882 |
| 0.80 | 400 | 0.504 | 0.538 | 0.492 | 0.799 | 0.842 | 0.882 |
| 0.80 | 1000 | 0.503 | 0.526 | 0.492 | 0.834 | 0.867 | 0.882 |
| 0.90 | 100 | 0.635 | 0.731 | 0.631 | 0.878 | 1.029 | 1.130 |
| 0.90 | 400 | 0.646 | 0.699 | 0.631 | 1.024 | 1.101 | 1.130 |
| 0.90 | 1000 | 0.644 | 0.681 | 0.631 | 1.070 | 1.128 | 1.130 |
| 0.95 | 100 | 0.756 | 0.891 | 0.750 | 1.044 | 1.267 | 1.345 |
| 0.95 | 400 | 0.768 | 0.848 | 0.750 | 1.218 | 1.343 | 1.345 |
| 0.95 | 1000 | 0.766 | 0.821 | 0.750 | 1.271 | 1.368 | 1.345 |
| Simultaneous prediction band | |||||||
| Simultaneous coverage | |||||||
| 0.80 | 100 | 0.588 | 0.828 | 0.803 | 0.289 | 0.554 | 0.807 |
| 0.80 | 400 | 0.750 | 0.878 | 0.803 | 0.590 | 0.745 | 0.807 |
| 0.80 | 1000 | 0.784 | 0.867 | 0.803 | 0.673 | 0.794 | 0.807 |
| 0.90 | 100 | 0.721 | 0.902 | 0.896 | 0.412 | 0.696 | 0.891 |
| 0.90 | 400 | 0.866 | 0.946 | 0.896 | 0.739 | 0.882 | 0.891 |
| 0.90 | 1000 | 0.893 | 0.940 | 0.896 | 0.817 | 0.904 | 0.891 |
| 0.95 | 100 | 0.817 | 0.963 | 0.944 | 0.511 | 0.819 | 0.942 |
| 0.95 | 400 | 0.915 | 0.983 | 0.944 | 0.833 | 0.954 | 0.942 |
| 0.95 | 1000 | 0.937 | 0.978 | 0.944 | 0.884 | 0.957 | 0.942 |
| Average median width | |||||||
| 0.80 | 100 | 0.900 | 1.092 | 0.904 | 1.114 | 1.402 | 1.569 |
| 0.80 | 400 | 0.925 | 1.047 | 0.904 | 1.370 | 1.550 | 1.569 |
| 0.80 | 1000 | 0.928 | 1.013 | 0.904 | 1.454 | 1.601 | 1.569 |
| 0.90 | 100 | 1.020 | 1.278 | 1.020 | 1.290 | 1.688 | 1.783 |
| 0.90 | 400 | 1.045 | 1.218 | 1.020 | 1.569 | 1.839 | 1.783 |
| 0.90 | 1000 | 1.046 | 1.168 | 1.020 | 1.657 | 1.879 | 1.783 |
| 0.95 | 100 | 1.124 | 1.486 | 1.121 | 1.444 | 2.006 | 1.969 |
| 0.95 | 400 | 1.149 | 1.408 | 1.121 | 1.740 | 2.159 | 1.969 |
| 0.95 | 1000 | 1.148 | 1.330 | 1.121 | 1.837 | 2.175 | 1.969 |
Figure 4 displays the average (over the arguments) of the estimated median width of pointwise prediction intervals and simultaneous prediction bands. As can be expected, the better coverage properties of the proposed method come at the price of wider prediction sets. Table 1 provides numerical values of average median widths.
Figure 4.
Average median width of prediction sets estimated by simulation. (a) Pointwise prediction intervals. (b) Simultaneous prediction bands.
Finally, Figure 5 reports average compute times (in seconds) of the standard and proposed method and their breakdown into four main computational tasks: nonparametric estimation of the model by spline smoothing, sandwich covariance estimation, simulation of 1000 curves from the predictive distribution and computation of the prediction band. The last part is negligible in comparison with the first three. The predictive sampling is also negligible when the parameters are held fixed but represents a substantial part of the total computational burden when the parameters are simulated. The computation of the sandwich covariance matrix of the estimated spline coefficients is computationally demanding mainly when the observation regime is less sparse. Nevertheless, the total compute times of the proposed method are reasonable for practice. On the other hand, if we were to generate 1000 predictive curves by the bootstrap, we would have to repeat the estimation step 1000 times, leading to dramatically higher total computation times.
Figure 5.
Breakdown of average compute times into main computational tasks.
Next, we consider a setting in which the model parameters depend on a covariate: the mean, covariance and noise variance functions vary smoothly with a scalar covariate u, and when u = 0.5 they agree with those in the previous setting of identical distributions. The covariates are generated independently from a uniform distribution. The parameters are estimated as general bivariate and trivariate smooth functions using tensor product splines. We consider samples of size n = 200, 400, 1000.
Table 2 contains estimated coverage probabilities and average median widths computed from all reconstructed curves analogously to Table 1. Figure 6 displays the coverage probabilities as a function of the covariate computed by kernel smoothing of the pointwise or simultaneous coverage indicators from the simulation against the corresponding covariate values. It shows that the proposed method performs much better in terms of coverage than the standard approach also in the covariate-adjusted setting, and this finding is stable across the range of covariate values. Plots of additional performance measures (integrated median absolute prediction error, average median width) as a function of the covariate can be found in the supplementary document.
Table 2.
Coverage probability and width of prediction sets estimated by simulation in the covariate-adjusted setting.
| Nominal coverage | n | Less sparse: Fixed | Less sparse: Sampled | Less sparse: True | Sparse: Fixed | Sparse: Sampled | Sparse: True |
|---|---|---|---|---|---|---|---|
| Pointwise prediction intervals | |||||||
| Average pointwise coverage | |||||||
| 0.80 | 200 | 0.696 | 0.764 | 0.795 | 0.642 | 0.704 | 0.795 |
| 0.80 | 400 | 0.731 | 0.766 | 0.795 | 0.690 | 0.728 | 0.795 |
| 0.80 | 1000 | 0.753 | 0.768 | 0.795 | 0.734 | 0.756 | 0.795 |
| 0.90 | 200 | 0.804 | 0.871 | 0.894 | 0.748 | 0.823 | 0.897 |
| 0.90 | 400 | 0.841 | 0.873 | 0.894 | 0.798 | 0.843 | 0.897 |
| 0.90 | 1000 | 0.860 | 0.873 | 0.894 | 0.840 | 0.862 | 0.897 |
| 0.95 | 200 | 0.869 | 0.930 | 0.946 | 0.814 | 0.889 | 0.946 |
| 0.95 | 400 | 0.901 | 0.930 | 0.946 | 0.864 | 0.910 | 0.946 |
| 0.95 | 1000 | 0.917 | 0.931 | 0.946 | 0.897 | 0.922 | 0.946 |
| Average median width | |||||||
| 0.80 | 200 | 0.472 | 0.576 | 0.491 | 0.687 | 0.802 | 0.882 |
| 0.80 | 400 | 0.482 | 0.534 | 0.491 | 0.735 | 0.807 | 0.882 |
| 0.80 | 1000 | 0.489 | 0.509 | 0.491 | 0.782 | 0.824 | 0.882 |
| 0.90 | 200 | 0.605 | 0.756 | 0.630 | 0.881 | 1.057 | 1.132 |
| 0.90 | 400 | 0.618 | 0.693 | 0.630 | 0.943 | 1.057 | 1.132 |
| 0.90 | 1000 | 0.628 | 0.657 | 0.630 | 1.001 | 1.072 | 1.132 |
| 0.95 | 200 | 0.719 | 0.925 | 0.750 | 1.049 | 1.298 | 1.346 |
| 0.95 | 400 | 0.735 | 0.837 | 0.750 | 1.122 | 1.291 | 1.346 |
| 0.95 | 1000 | 0.747 | 0.787 | 0.750 | 1.191 | 1.300 | 1.346 |
| Simultaneous prediction band | |||||||
| Simultaneous coverage | |||||||
| 0.80 | 200 | 0.388 | 0.672 | 0.779 | 0.257 | 0.523 | 0.786 |
| 0.80 | 400 | 0.478 | 0.676 | 0.779 | 0.383 | 0.589 | 0.786 |
| 0.80 | 1000 | 0.563 | 0.634 | 0.779 | 0.510 | 0.643 | 0.786 |
| 0.90 | 200 | 0.534 | 0.841 | 0.893 | 0.384 | 0.690 | 0.885 |
| 0.90 | 400 | 0.647 | 0.828 | 0.893 | 0.526 | 0.758 | 0.885 |
| 0.90 | 1000 | 0.723 | 0.817 | 0.893 | 0.659 | 0.811 | 0.885 |
| 0.95 | 200 | 0.616 | 0.929 | 0.948 | 0.476 | 0.817 | 0.948 |
| 0.95 | 400 | 0.747 | 0.912 | 0.948 | 0.614 | 0.870 | 0.948 |
| 0.95 | 1000 | 0.826 | 0.898 | 0.948 | 0.762 | 0.903 | 0.948 |
| Average median width | |||||||
| 0.80 | 200 | 0.833 | 1.084 | 0.899 | 1.120 | 1.427 | 1.562 |
| 0.80 | 400 | 0.860 | 0.998 | 0.899 | 1.226 | 1.449 | 1.562 |
| 0.80 | 1000 | 0.880 | 0.940 | 0.899 | 1.329 | 1.479 | 1.562 |
| 0.90 | 200 | 0.950 | 1.273 | 1.015 | 1.298 | 1.713 | 1.775 |
| 0.90 | 400 | 0.977 | 1.157 | 1.015 | 1.412 | 1.725 | 1.775 |
| 0.90 | 1000 | 0.998 | 1.075 | 1.015 | 1.524 | 1.741 | 1.775 |
| 0.95 | 200 | 1.051 | 1.479 | 1.117 | 1.450 | 2.017 | 1.959 |
| 0.95 | 400 | 1.079 | 1.317 | 1.117 | 1.573 | 2.020 | 1.959 |
| 0.95 | 1000 | 1.101 | 1.202 | 1.117 | 1.693 | 2.014 | 1.959 |
Figure 6.
Coverage probability of prediction sets estimated by simulation in the covariate-adjusted setting, as a function of the covariate. (a) Pointwise coverage probability of pointwise prediction intervals. (b) Simultaneous coverage probability of simultaneous prediction bands.
The supplementary document also reports the results of simulations with non-Gaussian data. The main conclusion regarding the comparison of the existing and proposed methods remains valid under these scenarios, that is, the proposed method usually leads to a coverage that is closer to that under the ideal case with known mean, covariance and noise variance. An interesting finding is that the normality based methodology appears relatively robust with respect to the Gaussian assumption under the particular realistic violations considered there.
5. Application to heart rate temporal profiles
We use heart rate data measured in the interval [20, 26] (8 PM to 2 AM on the hour scale), which is the transition period between the day and night cardiac regimes. We adjusted each person's time scale so that time 23 corresponds to the time when they went to bed, which was available from a questionnaire. In total we use data measured on 702 persons. The median number of measurements per person is 16; the total number of measurements is 10,739.
First, we estimate the mean, covariance and noise variance functions by penalized spline smoothing. These estimates are plotted in the supplement. Next, we illustrate the proposed methods on a randomly selected participant. Reconstruction of the heart rate temporal profile is performed on a grid of 100 equidistant points by the standard method (Algorithm 1) in which the predictive model's parameters are held fixed at their estimates, the bootstrap method (Algorithm 2) in which the predictive model is repeatedly fitted to bootstrap samples, and the proposed method (Algorithm 3) in which the parameters are sampled from their approximate distribution.
The results are displayed in Figure 7. Black dots in all panels correspond to 17 heart rate measurements available for this person. Point predictions and prediction sets are plotted in Figure 7(a). It is seen that the prediction sets from the proposed method are somewhat larger than those from the standard method, especially the simultaneous bands. The average width (over 100 equidistant grid points) of the simultaneous band is 19.2 for the standard method (Algorithm 1), 21.12 for the bootstrap (Algorithm 2) and 20.72 for the proposed method (Algorithm 3). For pointwise intervals the average width is 12.28, 13.26 and 12.75, respectively. Based on the results of the simulation study, the sets produced by the proposed method are considered more reliable. Figure 7(b) displays fifty curves simulated from the predictive distribution by each method to illustrate the uncertainty of reconstruction.
Figure 7.
Reconstruction of the heart rate temporal profile of one person. Dots in all panels correspond to noisy discrete observations. (a) Point reconstruction (solid), pointwise prediction intervals (dotted) and prediction band (dashed). (b) Several curves simulated from the predictive distribution.
Generating a predictive sample of 1000 curves, from which the prediction sets are computed, took about 0.1 second by the standard method, 9.5 seconds by the proposed method and more than 2 hours by the bootstrap. This further illustrates the usefulness of the proposed algorithm in comparison with the bootstrap whose computational cost is extreme.
Further details of the fitted model can be found in the supplementary document. It also contains results of estimation and reconstruction obtained with an extended model in which the mean, covariance and noise variance are adjusted for the effect of a covariate in a flexible way.
Acknowledgments
The author is grateful to an Associate Editor and two referees for their helpful comments and suggestions.
Data availability statement
R code implementing the proposed methods, simulation and data analysis scripts and the data set used in this article are available at https://gitlab.ics.muni.cz/238224/prediction-sets-noisy-discrete-fd.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Supplemental Data
Supplemental data for this article can be accessed online at http://dx.doi.org/10.1080/02664763.2024.2420223.
References
- 1. Alwan H., Pruijm M., Ponte B., Ackermann D., Guessous I., Ehret G., Staessen J.A., Asayama K., Vuistiner P., Younes S.E., Paccaud F., Wuerzner G., Pechere-Bertschi A., Mohaupt M., Vogt B., Martin P.-Y., Burnier M., and Bochud M., Epidemiology of masked and white-coat hypertension: The family-based SKIPOGH study, PLoS ONE 9 (2014), p. e92522.
- 2. Degras D.A., Simultaneous confidence bands for nonparametric regression with functional data, Stat. Sin. 21 (2011), pp. 1735–1765.
- 3. Goldsmith J., Greven S., and Crainiceanu C., Corrected confidence bands for functional data using principal components, Biometrics 69 (2013), pp. 41–51.
- 4. Hall P., Müller H.-G., and Wang J.-L., Properties of principal component methods for functional and longitudinal data analysis, Ann. Stat. 34 (2006), pp. 1493–1517.
- 5. Hsing T. and Eubank R., Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators, Wiley, 2015.
- 6. Kokoszka P. and Reimherr M., Introduction to Functional Data Analysis, Chapman and Hall/CRC, New York, 2017.
- 7. Kraus D., Components and completion of partially observed functional data, J. R. Stat. Soc. Ser. B: Stat. Methodol. 77 (2015), pp. 777–801.
- 8. Kraus D., Inferential procedures for partially observed functional data, J. Multivar. Anal. 173 (2019), pp. 583–603.
- 9. Liang K.-Y. and Zeger S.L., Longitudinal data analysis using generalized linear models, Biometrika 73 (1986), pp. 13–22.
- 10. Liebl D., Inference for sparse and dense functional data with covariate adjustments, J. Multivar. Anal. 170 (2019), pp. 315–335.
- 11. Narisetty N.N. and Nair V.N., Extremal depth for functional data and applications, J. Am. Stat. Assoc. 111 (2016), pp. 1705–1714.
- 12. Ramsay J.O. and Silverman B.W., Functional Data Analysis, 2nd ed., Springer Series in Statistics, Springer, New York, 2005.
- 13. Staniswalis J.G. and Lee J.J., Nonparametric regression analysis of longitudinal data, J. Am. Stat. Assoc. 93 (1998), pp. 1403–1418.
- 14. Wood S.N., Generalized Additive Models: An Introduction with R, 2nd ed., Taylor & Francis Group, New York, 2017.
- 15. Xiao L., Asymptotic properties of penalized splines for functional data, Bernoulli 26 (2020), pp. 2847–2875.
- 16. Yao F., Müller H.-G., and Wang J.-L., Functional data analysis for sparse longitudinal data, J. Am. Stat. Assoc. 100 (2005), pp. 577–590.
- 17. Zeileis A., Köll S., and Graham N., Various versatile variances: An object-oriented implementation of clustered covariances in R, J. Stat. Softw. 95 (2020), pp. 1–36.
- 18. Zhang X. and Wang J.-L., From sparse to dense functional data and beyond, Ann. Stat. 44 (2016), pp. 2281–2321.