Time-varying coefficient model estimation through radial basis functions

Juan Sosa; Lina Buitrago

doi:10.1080/02664763.2021.1910938

. 2021 Apr 5;49(10):2510–2534. doi: 10.1080/02664763.2021.1910938

Time-varying coefficient model estimation through radial basis functions

Juan Sosa ^1,^CONTACT, Lina Buitrago ¹

PMCID: PMC9225525 PMID: 35757039

ABSTRACT

In this paper, we estimate the dynamic parameters of a time-varying coefficient model through radial kernel functions in the context of a longitudinal study. Our proposal is based on a linear combination of weighted kernel functions involving a bandwidth, centered around a given set of time points. In addition, we study different alternatives of estimation and inference including a Frequentist approach using weighted least squares along with bootstrap methods, and a Bayesian approach through both Markov chain Monte Carlo and variational methods. We compare the estimation strategies mention above with each other, and our radial kernel functions proposal with an expansion based on regression spline, by means of an extensive simulation study considering multiples scenarios in terms of sample size, number of repeated measurements, and subject-specific correlation. Our experiments show that the capabilities of our proposal based on radial kernel functions are indeed comparable with or even better than those obtained from regression splines. We illustrate our methodology by analyzing data from two AIDS clinical studies.

Keywords: Bayesian inference, bootstrap, radial kernel functions, longitudinal data analysis, time-varying coefficient model, variational inference

1. Introduction

Statistical models for longitudinal data are powerful instruments of analysis when experimental units (subjects) are measured repeatedly over time in relation to a response variable along with static or time-dependent covariates. A very important feature of this kind of data that must be taken into account when fitting a statistical model, is the likely presence of serial correlation within repeated measurements on a given subject (observations between subjects are assumed to be independent). Typically, the main purpose of the analysis is to identify and characterize the evolution (mean tendency) of the response variable over time and quantify its association with the covariates. Parametric techniques for longitudinal data analysis have been exhaustively studied in the literature [37,38,42, see for example and references within]. Though useful in many cases, questions about the adequacy of the assumptions of parametric models and the potential impact of model misspecification on the analysis often arise. For instance, one of the basic assumptions associated with parametric techniques, yet not always satisfied, establishes that the mean response must be a known function of both fixed and random effects, indexed by a set of unknown parameters. Thus, for many practical situations, parametric models may be too restrictive or even unavailable.

In order to overcome such difficulties, building on the contributions of [8,24,70], the work of [27] considered a nonparametric model that lets the parameters vary over time. Nonparametric models of this nature allow more flexible functional dependence between the response variable and the covariates, since they are based on time-dependent coefficients (smooth functions of time) instead of fixed unknown parameters. Due to their interpretability and flexibility, these models have been the subject of active research over the last twenty years.

Early popular developments are given in [5,6,14,15,29,35,36,47,62,63,68,71]. For applications and surveys, see [16,54,65,72].

Specifically, consider the longitudinal dataset

D = \{(y_{i, j}, x_{i, j}, t_{i, j},) : j = 1, \dots, n_{i}, i = 1, \dots, n\},

where $y_{i, j} \equiv y_{i} (t_{i, j})$ and $x_{i, j} \equiv x_{i} (t_{i, j})$ with $x_{i} (t) = (x_{0, i} (t), x_{1, i} (t), \dots, x_{d, i} (t))$ are the real-valued response variable and the $(d + 1)$ column covariate vector, corresponding to measurement j of subject i, observed at time $t_{i, j}$ , n is the number of subjects, $n_{i}$ is the number of observations associated with subject i, and the total number of observations in the sample is $N = \sum_{i = 1}^{n} n_{i}$ . Measurement times are often distinct and irregularly spaced in a fixed interval of finite length. In order to evaluate the mean joint effect of time t and the covariates $x (t)$ on the outcome $y (t)$ , we use the structured nonparametric model

y (t) = x (t)^{T} β (t) + ϵ (t)

(1)

where $β (t) = (β_{0} (t), β_{1} (t), \dots, β_{d} (t))$ is a $(d + 1)$ column vector of real-valued nonparametric functions of time t, called dynamic coefficients or dynamic parameters, and $ϵ (t)$ is a zero-mean stochastic process, independent of $x (t)$ , with covariance function $γ (s, t) = C o v [ϵ (s), ϵ (t)]$ . The model in equation (1) is referred to as a (fixed-effects) time-varying coefficient model (TVCM). This model provides a parsimonious approach for characterizing time varying association patterns of a set of dynamic predictors on the expected value of a functional response. Notice that this model has a natural interpretation since for a fixed time point t, the TVCM (1) reduces to a multiple linear model with response variable $y (t)$ and covariate vector $x (t)$ , for which a standard interpretation of the time-varying coefficients $β_{r} (t)$ , $r = 0, 1, \dots, d$ , holds. In our experiments, we take $x_{0} (t) \equiv 1$ for all t, which means that $β_{0} (t)$ is the intercept coefficient describing the baseline time-trend. Finally, note that a discretized version of the TVCM can be obtain by simply substituting t by $t_{i, j}$ in Equation (1), considering all data points given in $D$ , in order to highlight the dependence on data when necessary.

In order to illustrate the full potential that a model such as a TVCM has, we provide below a fair revision on extensions of the model that have taken place over recent years. Motivated by applications rather than simply a desire to modify a statistical model, several extensions of the TVCM (1) have been proposed over the years in all sorts of directions, along with more complex structures and weaker assumptions. Some of these extensions typically share characteristics with each other. Here, we list some relevant instances in no particular order. A popular extension emerged naturally in order to efficiently capture both population and individual relationships. In that way, mixed-effects time-varying coefficient models extended TVCMs by dividing the error term into two parts, one part representing the subject-specific deviation from the population mean function, and the other representing the measurement error [7,31,34,40,53,67]; we consider a model of this sort in our first simulation study (see Section 5.1 for details). Another widespread extension took place when non-Gaussian responses where modeled directly, ranging from dichotomous and categorical outcomes to variables with skewed distributions, which provided a unified framework to do so. Thus, generalized time-varying coefficient models extended TVCMs by introducing a known link function to relate the dynamic linear predictor and the response process to each other [3,5,30,39,40,49,50]. Also, more complex types of dynamic functional dependencies have been developed, such as the relationships provided in time-varying additive and nonlinear models [13,45,51,59,64]. In addition, more adaptations were developed to deal with the same issues that non-varying models deal with. For instance, quantile regression [1,57], variable selection and shrinkage estimation [12,33,56,58,61], and even spatial modeling [2,20,31,43,52,55,66].

On the other hand, several smoothers can be used to estimate the dynamic coefficients of TVCM (1). The key idea behind the estimation process relies on rewriting each $β_{r} (t)$ through a linear expansion of parametric functions in order to make possible parametric-like inference as in standard models. In general, each smoother is indexed by a smoothing parameter vector that controls the trade-off between goodness-of-fit and model complexity. Thus, smoothing parameter selection criteria are in order. Some of the most popular smoothers include local polynomial smoothers, regression spline, smoothing spline, and P-splines [65,69]. Different smoothers have different strengths in one aspect or another. For example, smoothing spline may be good for handling sparse data, while local polynomial smoothers may be computationally advantageous for handling dense designs.

The purpose of this paper is twofold. First, in order to estimate the time-varying coefficients $β_{r} (t)$ , we propose a linear smoother based on kernel functions, by treating them as if they were radial basis functions [4, see for a complete catacterization of radial basis funcions]. This approach has been used in both the semiparametric regression [48] and statistical learning literature [23,25] but, to the best of our knowledge, it has not been fully exploited yet in the context of longitudinal data analysis, apart from the work of [52] on space-time varying coefficients models. Our proposal applies to both time-invariant and time-dependent covariates as well as regular and irregular placed design times, and also allows for different amounts of smoothing for different coefficients. Second, since a radial kernel expansion resembles very closely an approximation using spline basis functions, we compare these smoothing alternatives with each other in terms of goodness-of-fit and prediction using both Frequentist and Bayesian inference frameworks. To that end, from a Frequentist perspective, we consider weighted least squares and bootstrap methods. From a Bayesian perspective, we consider Markov chain Monte Carlo (MCMC) along with variational methods. The Bayesian approach has become more popular in recent years [3,19,28,31,39,41,55], but variational algorithms have not been explored to ease the computational burden under this framework.

The remainder of the paper is structured as follows: Section 2 introduces the estimation of time-varying coefficients through radial kernel functions and regression spline. Section 3 discusses different approaches to statistical inference. Section 4 discusses the choice of knots and smoothing parameter selection. Section 5 compares the estimation alternatives through an extensive simulation study. Section 6 illustrates our proposal by analyzing AIDS data coming from two clinical studies. Finally, Section 7 presents some concluding remarks and directions for future work.

2. Estimation using radial kernel functions

The main idea behind estimation through radial kernel functions consists in expressing each dynamic coefficient in model (1) as a linear combination of kernel functions by treating them as radial basis functions. A radial smoother can be constructed using the following set of radial basis:

1, t, \dots, t^{g}, ξ (| t - κ_{1} |), \dots, ξ (| t - κ_{k} |),

(2)

where $ξ (\cdot)$ is a kernel function, $| \cdot |$ is the Euclidean norm in $R$ , and $κ_{1} < \dots < κ_{k}$ are k knots covering the time domain. The smoother performance strongly depends on the proper selection of both location and number of knots (see Section 4 for details). The basis degree g is usually less crucial and it is typically taken as 1, 2, or 3, for computational convenience. On the other hand, note that the first g + 1 basis functions of (2) are polynomials of degree up to g, and the others are all kernel functions, which satisfy the property $ξ (t) = ξ (| t |)$ . Such functions are known as radial functions [4]. Different kinds of kernel functions are commonly used in practice, such as Gaussian or Epanechnikov kernels, among many others [60, see for a review]. This choice is less significant in terms of smoothing.

Using the radial basis (2), we can express each time-varying coefficient $β_{r} (t)$ , $r = 0, 1, \dots, d$ , as

β_{r} (t) = \sum_{ℓ = 0}^{g} α_{r, ℓ} t^{ℓ} + \sum_{ℓ = 1}^{k_{r}} α_{r, g + ℓ} ξ (| t - κ_{ℓ} |) = Ξ_{r} (t)^{T} α_{r},

(3)

where $Ξ_{r} (t) = (1, t, \dots, t^{g}, ξ (| t - κ_{1} |), \dots, ξ (| t - κ_{k_{r}} |))$ and $α_{r} = (α_{r, 0}, \dots, α_{r, k_{r} + g + 1})$ are $p_{r} \times 1$ column vectors with $p_{r} = k_{r} + g + 1$ , composed of basis functions evaluated at time t and unknown parameters, respectively. Such a representation is able to accommodate a variety of shapes and smoothness for the dynamic coefficients without overfitting the data, since separate number of knots are allowed. This means that, for a fixed degree basis g, the number of knots $k_{0}, \dots, k_{d}$ play the role of smoothing parameters.

In this way, the dynamic vector $β (t)$ in model (1) becomes

β (t) = Ξ (t)^{T} α,

(4)

where $α = (α_{0}^{T}, \dots, α_{d}^{T})$ and $Ξ (t) = d i a g [Ξ_{0} (t), Ξ_{1} (t), \dots, Ξ_{d} (t)]$ . Note that $α$ is a column vector of size $p \times 1$ , $p = \sum_{r = 0}^{d} p_{r}$ , whereas $Ξ (t)$ is a rectangular matrix of size $p \times (d + 1)$ . Now, substituting $β (t)$ in model (1) for its equivalent expression given in (4), it follows that the TVCM (1) can be approximately written as

y_{i, j} = z_{i, j}^{T} α + ϵ_{i, j}, j = 1, \dots, n_{i}, i = 1, \dots, n,

(5)

where $y_{i, j} \equiv y_{i} (t_{i, j})$ , $ϵ_{i, j} \equiv ϵ_{i} (t_{i, j})$ , and $z_{i, j} = (z_{0, i, j}^{T}, \dots, z_{d, i, j}^{T})$ , with

z_{r, i, j} = x_{r, i} (t_{i, j}) Ξ_{r} (t_{i, j}) r = 0, 1, \dots, d .

(6)

Similar to $α$ , each $z_{i, j}$ is a $p \times 1$ column vector of covariate values times radial basis functions. For the i-th subject, $i = 1, \dots, n$ , we denote the response vector, the random error vector, and the design matrix as

y_{i} = (y_{i, 1}, \dots, y_{i, n_{i}}), ϵ_{i} = (ϵ_{i, 1}, \dots, ϵ_{i, n_{i}}), Z_{i} = [z_{i, 1}, \dots, z_{i, n_{i}}]^{T} .

Consistently, we denote the response vector, the random error vector, and the design matrix for the whole dataset as

y = (y_{1}^{T}, \dots, y_{n}^{T}), ϵ = (ϵ_{1}^{T}, \dots, ϵ_{n}^{T}), Z = [Z_{1}^{T}, \dots, Z_{n}^{T}]^{T},

which allow us to express model (5) in matrix form as a standard linear model:

y = Z α + ϵ .

(7)

Once an estimate for $α$ , $\hat{α} = ({\hat{α}}_{0}^{T}, \dots, {\hat{α}}_{d}^{T})$ , is obtained; it is straightforward to get an estimate for $β (t)$ , $\hat{β} (t) = ({\hat{β}}_{0} (t), \dots, {\hat{β}}_{d} (t))$ , by simply letting $\hat{β} (t) = Ξ (t)^{T} \hat{α}$ . Therefore, our task reduces to estimate $α$ in model (7) using an appropriate method.

Similarly, the key concept working with spline functions is to represent each dynamic coefficient through a regression spline basis, such as the truncated power basis, B-spline basis, or wavelet basis, among others [46, see for a review]. The B-spline basis is a powerful choice due to its simplicity and capability to capture local features of dynamic relationships. In this way, emulating the methodology described above, in order to estimate the time-varying coefficient $β_{r} (t)$ , we consider the following expansion based on truncated power functions:

β_{r} (t) = \sum_{ℓ = 0}^{g} α_{r, ℓ} t^{ℓ} + \sum_{ℓ = 1}^{k_{r}} α_{r, g + ℓ} (t - κ_{ℓ})_{+}^{g} = Φ_{r} (t)^{T} α_{r},

(8)

where $x_{+}^{g}$ denotes the g-th power of the positive part of x, $x_{+} = max (0, x)$ , and $κ_{1}, \dots, κ_{k_{r}}$ are $k_{r}$ knots (in an increasing order) scattered in the range of interest. As before, note that $Φ_{r} (t) = (1, t, \dots, t^{g}, (t - κ_{1})_{+}^{g}, \dots, (t - κ_{k_{r}})_{+}^{g})$ and $α_{r} = (α_{r, 0}, \dots, α_{r, k_{r} + g + 1})$ are $p_{r} \times 1$ column vectors, $p_{r} = k_{r} + g + 1$ , composed of basis functions evaluated at time t and unknown parameters, respectively. Then, it is simple to obtain an estimate for $β (t)$ as $\hat{β} (t) = Φ (t)^{T} \hat{α}$ , where $Φ (t) = d i a g [Φ_{0} (t), Φ_{1} (t), \dots, Φ_{d} (t)]$ and $\hat{α}$ is an estimate of $α$ in model (7), whose design matrix $Z$ is constructed from (6) using $Φ_{r} (t_{i, j})$ instead of $Ξ_{r} (t_{i, j})$ .

Finally, the reader should note that our proposal given in equation (3) is a direct reformulation of the expansion based on truncated power functions (which have been extensively investigated in the literature as in Wu and Zhang [69], for example; and therefore, we use them as a baseline), obtained by using other kind of basis functions. We argue that this is a sensible thing to do since radial functions (kernels in particular) have very desirable properties (e.g. the semiparametric regression expansion considered here based on radial functions are kernel machines within the reproducing kernel Hilbert space framework; see Harezlak et al., [23], for example) to representing (smoothing) all sorts of functional behaviors. That is why we believe that our approach constitutes a reasonable choice to represent dynamic coefficients in any TVCM. As a final comment, we note that, within a given inference paradigm, the ‘computational complexity’ of either expansion is equivalent, because each basis is composed of 1 + g + K real-valued functions.

3. Inference methods

According to the previous section, TVCM (1) is locally equivalent to standard linear model (7) in which it is required to estimate the parameter vector $α$ . In what follows, we consider both Frequentist and Bayesian approaches to carry out statistical inference on $α$ , and as a consequence, on $β (t)$ . First, we motivate a very popular sampling distribution as well as widely known classical bootstrap methods for performing statistical inference. Then, we consider information external to the dataset by means of a conjugate prior distribution, along with our proposal for quantifying uncertainty based on simulation and variational methods. We discuss in detail implications, challenges, and algorithms for each protocol, but focus our attention on our estimation approach embedded in the Bayesian paradigm.

3.1. Frequentist inference

From a likelihood point of view, in its simplest form, we can consider the sampling distribution

y ∣ Z, W, α, σ^{2} \sim N (Z α, σ^{2} W^{- 1}),

(9)

where $W = d i a g [W_{1}, \dots, W_{n}]$ and $W_{i} = w_{i} I_{n_{i}}$ is the weight matrix for the i-th experimental unit, $i = 1, \dots, n$ , which is equivalent to assuming $ϵ ∣ W, σ^{2} \sim N (0, σ^{2} W^{- 1})$ in model (7), in a way that $ϵ (t) \sim G P (μ, γ)$ is a Gaussian process with $μ (t) = 0$ and $γ (s, t) = σ^{2} 1_{{s = t}}$ in model (1). The weights $w_{1}, \dots, w_{n}$ are known positive constants such that $\sum_{i = 1}^{n} n_{i} w_{i} = 1$ , which quantify the relative importance of experimental units. In our experiments, we follow [65] and consider the ‘subject uniform weight’, $w_{i} = 1 / (n n_{i})$ , where each subject is inversely weighted by its number of repeated measurements $n_{i}$ , so that the subjects with fewer repeated measurements receive more weight than the subjects with more repeated measurements. The above independence assumption is convenient mathematically and works well when longitudinal data tend to be sparse. However, in our experience, and also, considering empirical evidence from both Wu and Zhang [69] and Wu and Tian [65], such an assumption can be robust to some deviations from data sparsity. Therefore, the sampling distribution (9) is an appealing choice in practice.

Under this setting, the resulting maximum likelihood estimator of $α$ is given by

\hat{α} = (Z^{T} W Z)^{- 1} Z^{T} W y,

(10)

which is equivalent to the estimator obtained as a result of minimizing the weighted least squares (WLS) criterion

W L S (α) = \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} w_{i} (y_{i, j} - z_{i, j}^{T} α)^{2} .

(11)

Note that according to the Gauss-Markov theorem, the estimator provided in (10) is the best linear unbiased estimator (BLUE) of $α$ . Furthermore, it can be shown that an unbiased estimator of $σ^{2}$ is

{\hat{σ}}^{2} = \frac{1}{N - p} (y - Z \hat{α})^{T} W (y - Z \hat{α}),

(12)

where $N = \sum_{i = 1}^{n} n_{i}$ is the total number of observations, $p = \sum_{r = 0}^{d} p_{r}$ is the expansion dimension, and $\hat{α}$ is the estimator of $α$ given in (10).

Under the Frequentist paradigm, confidence intervals can be computed based on either asymptotic distributions or bootstrap methods [10]. However, given the compound structure of longitudinal data, inferences that are based on asymptotic distributions are typically difficult to justify in practice, since they heavily rely on assumptions that are difficult to meet. Thus, we consider bootstrap methods, which can always be implemented based on the available data regardless of sample sizes and sampling distributions. See Appendix A.1 for a detailed description of the bootstrap algorithm. The main advantage of using a bootstrap procedure is that it does not rely on asymptotic distributions and can be used to construct confidence intervals. For instance, at a given time t, the $100 (1 - α) %$ percentile-based confidence interval for $β_{r} (t)$ , $r = 0, 1, \dots, d$ , is given by

(β_{r, α / 2} (t), β_{r, 1 - α / 2} (t)),

(13)

where $β_{r, α / 2} (t)$ and $β_{r, 1 - α / 2} (t)$ are the $α / 2$ and $1 - α / 2$ quantiles of the bootstrap samples $β_{r} (t)^{(1)}, \dots, β_{r} (t)^{(B)}$ , which are computed based on $α^{(1)}, \dots, α^{(B)}$ and a given set of basis functions. Other types of confidence intervals are available [e.g. normal-based confidence intervals; see 10 for a review]. We highlight that the confidence intervals given above correspond to pointwise confidence sets that only work for $β_{r} (t)$ at a given time t. In most practical situations, such pointwise inferences are sufficient. However, in some studies, we might require a confidence band that simultaneously includes the true dynamic coefficient $β_{r} (t)$ for a range (typically large) of time values. In such situations, we need to construct a simultaneous confidence band for $β_{r} (t)$ for t within a given time interval. We refer the reader to Wu and Tian [65] for details about this matter.

Even though theoretical properties of bootstrap procedures in this setting have not been systematically investigated, we are quite confident about the coverage rates in this case, given previous simulations studies about this matter as in Hoover et al. [27] and Wu and Chiang [62] [65, see also and refereces therein for a comprehensive review and also more empirical evidence in this regard].

3.2. Bayesian inference

Under a Bayesian framework, in order to obtain an estimate for $α$ under the sampling distribution (9), it suffices to consider the standard normal regression model

y ∣ Z, α, σ^{2} \sim N (Z α, σ^{2} I_{N}),

(14)

since it can be easily obtained from (9) by means of a linear transformation on $y$ based on a Choleski factorization of $W^{- 1}$ [17, see for details]. In what follows, we consider the sampling distribution (14), having in mind that a preprocessing step is required before fitting the model. In order to complete the model we choose the so-called independent Zellner's $g$ -prior as a simple semiconjugate prior distribution on $α$ and $σ^{2}$ to be used when there is little prior information available. Under this invariant g-prior, we let

α ∣ σ^{2} \sim N (0, N σ^{2} I_{N}) a n d σ^{2} \sim I G a m (a_{σ}, b_{σ}),

where $a_{σ}$ and $b_{σ}$ are known hyperparameters.

Regarding the hyperparameter elicitation, we need a prior distribution to be as minimally informative as possible in the absence of real external information. We recommend setting $g = N$ , $a_{σ} = 2$ , and $b_{σ} = {\hat{σ}}^{2}$ as in (12). This choice of $g$ makes $\frac{g}{g + 1}$ very close to 1, and therefore, we are practically centering $α$ around $\hat{α}$ a priori. Similarly, the prior distribution of $σ^{2}$ is also weakly centered around ${\hat{σ}}^{2}$ , since $a_{σ} = 2$ implies an infinite variance on $σ^{2}$ a priori. Such a distribution cannot be strictly considered as a real prior distribution, as it requires knowledge of $y$ to be constructed. However, it only uses a small amount of the information in $y$ , and can be loosely thought of as the prior distribution of a researcher with unbiased but weak prior information [26, see for a discussion]. Refer to Appendix A.2 for details about the Gibbs sampler.

Even though the MCMC algorithm is straightforward in this case, inference may become impractical as the number of experimental units and the number of covariates grow. For this reason, we also implement a variational Bayes alternative that can potentially alleviate the computational burden in big data scenarios. See Appendix A.3 for details regarding the variational algorithm.

4. Location and number of knots

The smoothers' quality strongly depends on both knot locations and the number of knots. The degree of the expansion g is usually less crucial and it is often taken as 1, 2, or 3. In terms of knot location, we distinguish two widely used alternatives. The first method locates equally spaced points in the range of interest, independently of the design time points. It is usually employed when the design time points are uniformly scattered in the range of interest. The second method locates equally spaced quantiles of the design time points as knots. It locates more knots where more design time points are scattered. These methods are equivalent when the design time points are uniformly scattered. However, in our experience, the equally spaced method to locate knots is very convenient due to both its simplicity and proclivity to work well even in all sort of situations.

Another essential feature that we need to handle in practice is how to choose the smoothing parameter vector $p = (p_{0}, \dots, p_{d})$ . A popular method to do so is the so called leave-one-point-out cross-validation [11, PCV,]. This approach aims to select a good smoothing parameter vector via trading-off the goodness-of-fit and the model complexity. The idea behind this criteria consists in choosing the smoothing parameter vector $p$ that minimizes the expression

P C V = \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} w_{i} {[y_{i, j} - x_{i} (t_{i, j})^{T} {\hat{β}}^{(- i, j)} (t_{i, j})]}^{2},

(15)

where ${\hat{β}}^{(- i, j)} (t_{i, j})$ is an estimate of $β (t_{i, j})$ using the entire dataset except the j-th measurement of the i-th experimental unit. It can be shown that expression (15) is equivalent to

P C V = \frac{(y - A y)^{T} W (y - A y)}{{(1 - t r (A) / N)}^{2}},

(16)

where $t r (A)$ is the trace of the smoothing matrix $A$ , which is a square matrix such that $\hat{y} = A y$ . Even though the PCV criteria does not account for the within-subject correlation effectively, it is a suitable method since the computational performance is substantially better than that provided by other alternatives [e.g. leave-one-subject-out cross-validation; see 65, for details]. As discussed in Section 7, other alternatives relying on model-based knot introduction or deletion are available.

5. Simulation study

In this section, we present two benchmark simulation scenarios to evaluate the performance of our proposed methodology and compare the different inferential methods. The first simulation scenario is inspired on an experiment originally proposed by [67,69]. The second experiment follows very closely a simulation study performed by Wu and Chiang [62], Huang et al. [29], and Wu and Tian [65].

5.1. Simulation scenario 1

In order to test our methodology with challenging real-life like datasets, and also, evaluating the robustness of the model to typical deviations from the true data generating process, we consider in this experiment a mixed-effects time-varying coefficient model with no covariate information. Such a model takes a (fixed-effects) TVCM with d = 0 and $x_{0} (t) \equiv 1$ for all t, and decomposes the error term into two random parts: the first one, which is subject-specific, describes the characteristics of each individual that deviate from the mean population behavior; whereas the second one, which handles directly pure random error, encompasses all those factors out of reach by the modeler (such as error measurement). Thus, we generate synthetic datasets as follows:

y_{i} (t) = β_{0} (t) + υ_{i} (t) + ϵ_{i} (t), i = 1, \dots, n,

(17)

where $β_{0} (t)$ is a known time-varying coefficient, $υ_{i} (t) = a_{i, 0} + a_{i, 1} \cos (2 π t) + a_{i, 2} \sin (2 π t)$ is a subject-specific random effect, with $(a_{i, 0}, a_{i, 1}, a_{i, 2}) N (0, d i a g [σ_{0}^{2}, σ_{1}^{2}, σ_{2}^{2}])$ , and $ϵ_{i} (t)$ is a zero-mean Gaussian process such that $ϵ_{i} (t) N (0, σ_{ϵ}^{2} [1 - e x p (- 0.5 t - i / n)]^{2})$ . We assume that $σ_{1}^{2} = σ_{2}^{2} = σ_{ϵ}^{2} = σ^{2}$ , and therefore, the correlation between repeated measurements ρ within each experimental unit is bounded by $(σ_{0}^{2} + 2 σ^{2})^{- 1} (σ_{0}^{2} - σ^{2})$ and $(σ_{0}^{2} + 2 σ^{2})^{- 1} (σ_{0}^{2} + σ^{2})$ . We consider three cases in order to simulate different correlation levels, namely, weak within-subject correlation, $σ^{2} = 0.01$ and $σ_{0}^{2} = 0.01$ , which corresponds to $0.00 \leq ρ \leq 0.67$ ; medium within-subject correlation, $σ^{2} = 0.01$ and $σ_{0}^{2} = 0.04$ , which corresponds to $0.50 \leq ρ \leq 0.83$ ; and strong within-subject correlation, $σ^{2} = 0.01$ and $σ_{0}^{2} = 0.09$ , which corresponds to $0.73 \leq ρ \leq 0.91$ .

Additionally, design times are simulated as $t_{i, j} = j / (m + 1)$ , $i = 1, \dots, n$ , $j = 1, \dots, m$ , where m is a positive integer. In order to simulate unbalanced datasets for each subject, repeated measures are randomly removed with a rate r = 0.5; thus, we expect $m (1 - r)$ repeated measurements per experimental unit and $n m (1 - r)$ measurements in total. In addition, number an location of knots are chosen according to the PCV criteria and the equally spaced method described in Section 4, respectively. We generated 250 datasets with two dynamic parameters, $β_{0} (t) = 2 e^{t}$ and $β_{0} (t) = 1 + \cos (2 π t) + \sin (2 π t)$ , and also, three sample sizes, n = 25, n = 50, and n = 100. Each time, once the number and location of knots are fixed, we fitted model (1) using both radial kernel and regression spline functions setting g = 2 as a degree, with Frequentist, Bayesian, and variational methods. Setting the prior distribution as discussed in Section 3.2, Bayesian and variational estimates are based on 2, 000 samples from the posterior distribution. For Bayesian inference, we use a burn-in period of 500 samples; whereas for variational inference, we use a negligible increased in ELBO of 1e-06. Such a setting showed no evidence of lack of convergence in any case.

In this case, the performance of an estimate is measured by means of the average mean square error (AMSE), which is defined as

A M S E (β_{r}) = \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} \frac{1}{n n_{i}} {(β_{r} (t_{i, j}) - {\hat{β}}_{r} (t_{i, j}))}^{2}, r = 0, \dots, d .

Figures 1 and 2 show the AMSE distribution of $β_{0} (t) = 2 e^{t}$ and $β_{0} (t) = 1 + \cos (2 π t) + \sin (2 π t)$ , respectively, corresponding to 250 synthetic datasets generated according to TVCM (17), in each of nine scenarios delimited by correlation level (weak, medium, and high within-subject correlation) and sample size (n = 25, n = 50, and n = 100). In general, the AMSE distribution is quite consistent across inference paradigms, which is particularly evident in the first case. Such a behavior was somewhat predictable because the number of measurements, even for the smallest datasets, is big enough (about 375 observations) to allow the likelihood to overcome the prior distribution. Even though AMSEs are also very similar across basis functions, error rates are slightly smaller in the second case when Bayesian methods along with radial functions are employed.

Figure 1. — AMSE distribution of $β_{0} = 2 e^{t}$ corresponding to 250 synthetic datasets generated according to TVCM (17). Scenarios are delimited by correlation level in rows (weak, medium, and high within-subject correlation) and sample size in columns (n = 25, n = 50, and n = 100). The model is fitted each time using both radial kernel (K) and regression spline (S) functions, with Frequentist (blue), Bayesian (black), and variational (green) methods.

Figure 2. — AMSE distribution of $β_{0} (t) = 1 + \cos (2 π t) + \sin (2 π t)$ corresponding to 250 synthetic datasets generated according to TVCM (17). Scenarios are delimited by correlation level in rows (weak, medium, and high within-subject correlation) and sample size in columns (n = 25, n = 50, and n = 100). The model is fitted each time using both radial kernel (K) and regression spline (S) functions, with Frequentist (blue), Bayesian (black), and variational (green) methods.

Furthermore, Frequentist inferences are equivalent regardless of the smoothing approach. Also, we observe that the variational approximation to the posterior distribution under the Bayesian paradigm is very precise. This is the case because the mean field assumption is breaking negligible correlations in the posterior distribution. Moreover, as expected, estimates are more consistent as the sample size increases since the variability of the AMSE distribution decreases and its center remains stable. On the contrary, such variability increases as the within-subject correlation becomes higher, which strongly suggests that despite our approach not taking into account within-subject correlation directly, estimates are robust enough to produce accurate results.

5.2. Simulation scenario 2

In this experiment, we generate synthetic datasets as follows:

y_{i} (t) = β_{0} (t) + β_{1} (t) x_{1, i} + β_{1} (t) x_{1, i} + ϵ_{i} (t), i = 1, \dots, n,

(18)

where

\begin{aligned} β_{0} (t) & = 3.5 + 6.5 \sin (t π / 60), \\ β_{1} (t) & = - 0.2 - 1.6 \cos ((t - 30) π / 60), a n d \\ β_{3} (t) & = 0.25 - 0.0074 ((30 - t) / 10)^{3} \end{aligned}

are three nonlinear time-varying coefficients, associated with time-invariant independent covariates $x_{1, i}$ and $x_{2, i}$ such that $x_{1, i} B e r (0.5)$ and $x_{2, i} N (0, 4^{2})$ , and finally, $ϵ_{i} (t)$ is Gaussian process with zero mean and covariance function $γ (s, t) = 0.0625 \exp (- | s - t |)$ . Also, we assume that subjects are independent of each other and scheduled to be observed at m = 31 equally spaced time design points $0, 1, \dots, m$ . However, at each given time point, a subject has 50% probability to be randomly missing. As before, we generated 250 datasets with three sample sizes, n = 25, n = 50, and n = 100, and once again, we fitted the model (1) using exactly the same settings as in Section 6.1.

In order to measure the performance of an estimate fairly, we define the mean absolute deviation of errors (MADE) as

M A D E = \sum_{r = 0}^{d} \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} \frac{1}{n n_{i}} \frac{| β_{r} (t_{i, j}) - {\hat{β}}_{r} (t_{i, j}) |}{r a n g e (β_{r})}, r = 0, \dots, d .

Figure 3 shows the MADE distribution along with the dynamic parameter Bayesian estimates of the coefficients, corresponding to 250 synthetic datasets generated according to TVCM (17), in each of three scenarios delimited by sample size (n = 25, n = 50, and n = 100). Again, it is quite obvious the effect of the sample size on the error rates. This behavior is evident from both the decreasing variability of the MADE distribution and the consistent estimates of the coefficients around their true value. Clearly, these simulation results demonstrate that both estimation approaches independently of the inference paradigm, provide reasonably good estimators, at least for interior time points.

5.3. Execution time

Following a suggestion given by one of the referees, here, we provide a comparison in terms of execution time between our two competing approaches to carry out Bayesian inference, namely, MCMC methods and variational methods, i.e. simulation-based methods and optimization-based methods.

In this spirit, Table 1 contains mean running times (in milliseconds) using a single core of an AMD A12-9730P processor, when generating 2,000 samples of the posterior distribution for the model based on radial kernel functions (our proposal) using both MCMC and variational methods, for each synthetic dataset under all simulation settings considered in Section 5. We see that the variational approach clearly greatly outperforms its simulation-based counterpart in terms of execution time. Such an effect is particularly clearer for bigger samples sizes, where variational methods can even be 45 faster than MCMC methods. Lastly, note that MCMC execution times increase notoriously as the sample size grow, whereas variational execution times remain roughly constant.

Table 1.

Mean running times (in milliseconds) using a single core of an AMD A12-9730P processor, when generating 2,000 samples of the posterior distribution for the model based on radial functions (our proposal) using both MCMC (MC) and variational (V) methods, for each synthetic dataset under all simulation settings considered in Section 5.

	n = 25		n = 50		n = 100
Setting	MC	V	MC	V	MC	V
Scenario 1a	128.60	7.93	160.17	6.07	347.17	7.13
Scenario 1b	151.30	10.33	204.10	7.10	458.77	9.87
Scenario 2	280.43	12.03	381.67	12.37	581.30	12.77

Open in a new tab

6. Illustrations

6.1. Case study 1

Our first illustration is based on an AIDS clinical trial developed by the AIDS Clinical Trials Group¹ (ACTG). In this group, [18] evaluated two different 4-drug regimens containing indinavir with either efavirenz or nelfinavir for the treatment of 517 patients with advanced HIV disease (i.e.patients with high HIV-1 RNA levels and low CD4 cell counts). This study was a randomized, open-label study and initially planned to last 72 weeks but later increased to 120 weeks beyond the enrollment of the last subject. The randomization was carried out by using a permuted block design and stratified according to CD4 cell count and HIV-1 RNA level at screening, as well as previous antiretroviral experience. In addition, clinical assessments, HIV-1 RNA measurements, CD4 cell counts, and routine laboratory tests were performed before study entry, at the time of study entry, at weeks 4 and 8, and every 8 weeks thereafter. More details about design, subjects, treatments and outcome measurements of this study are given in [18].

Here, we model the CD4 cell count, which is an essential marker for assessing immunologic response of an antiviral regimen, in one of the two treatment arms. This group includes 166 patients treated with highly active antiretroviral therapy for 120 weeks, during which CD4 cell counts were monitored along with other important markers. Patients might not exactly follow the designed schedule, and missing clinical visits for CD4 cell measurements frequently occurred, which makes this dataset¹ (named ACTG 388) a typical longitudinal dataset. The main goal in this study is to model the mean CD4 cell count trajectories over the treatment period for the entire treatment arm.

In this specific group of patients, the number of CD4 cell count measurements per patient varies from 1 to 18 observations, and the CD4 cell count ranges from 0 to 1,364. Figure 4 shows CD4 cell counts (in logarithmic scale) for each one of the n = 166 patients during the 120 weeks of treatment. Even though individual cell counts are quite noisy and there is evidence of some atypical trajectories associated with low counts, this plot suggests that cell counts tend to stabilize around the middle of the treatment. Thus, it is not possible to ensure that the antiviral treatment was quite effective since there are no apparent reasons to believe that CD4 cell counts profiles are either increasing continuously or at least remaining stable.

Figure 4. — ACTG 388 data: CD4 cell counts (logarithmic scale).

In order to estimate the mean trajectory of CD4 cell counts over the treatment period, we fit the TVCM

y_{i, j} = β_{0} (t_{i, j}) + ϵ_{i, j}, j = 1, \dots, n_{i}, i = 1, \dots, n,

under a Bayesian approach with the prior distribution given in Section 3.2, employing both Gaussian kernel and regression spline functions with g = 2, where $y_{i, j}$ is the CD4 cell count (in logarithmic scale) of the j-th measurement of the i-th patient, and $β_{0} (t)$ is a unknown time-varying parameter describing the mean dynamic trend of CD4 cell counts over time. Again, the number and location of knots are chosen according to the PCV criteria and the equally spaced method described in Section 4, respectively. According to this criteria, the optimal number of knots are $k_{0}^{K} = 4$ and $k_{0}^{S} = 8$ , respectively. Once the number and location of knots are fixed, Bayesian estimates are based on 2, 000 samples from the posterior distribution after a burn-in period of 500 iterations. Convergence was monitored by tracking the variability of the joint distribution of data and parameters using the multi-chain procedure discussed in Gelman and Rubin [22].

Estimates of $β_{0} (t)$ (in logarithmic and natural scale) along with their corresponding 95% credible intervals are shown in Figure 5. Both estimates are very similar, except for the small jump at the beginning of the treatment exhibited by the regression spline-based estimate. Such trajectories, which are very precise since the credible intervals are quite narrow, reveal that the mean CD4 cell counts increase quite sharply during the first 40 weeks of treatment, and continue to increase at a slower rate until about week 100, and then dropped towards the end of the study. This makes evident that under this antiviral regimen, the overall CD4 counts increased dramatically during the first 40 weeks, but the effect of the drug therapy fades over time and completely disappeared after about week 100, when cell counts begin to drop. Almost identical results were obtained by Wu and Zhang [69]; the only difference is that they concluded that the inflection point after which the CD4 cell count started to drop was on week 110. A residual analysis (not shown here) indicates that the model fits the data adequately because there are no signs of particular shapes, patterns or significant deviations.

6.2. Case study 2

We consider another AIDS clinical study carried out by the ACTG. In this case, [32] evaluated a highly active antiretroviral therapy containing zidovudine, lamivudine, and ritonavir, for the treatment of patients with moderately advanced HIV-1 infection. This study was designed to ascertain if administration of highly active antiretroviral therapy to patients with moderately advanced HIV-1 infection was associated with evidence of immunologic restoration. More details about design, subjects, treatments and outcome measurements of this study are given in Lederman et al. [32].

The viral load (plasma HIV RNA level) and immunologic response (CD4 cell counts) are negatively correlated and their relationship is approximately linear during antiviral treatments. However, their relationship may not be a constant during the whole period of treatment [34]. Thus, the main goal in this study is to model the dynamic relationship between the viral load and the immunologic response over the treatment period, which plays an essential role in evaluating the antiviral therapy. Fifty-three patients were enrolled in the trial, out of which n = 46 received the treatment for at least 9 of the first 12 weeks and were therefore eligible for analysis. Intolerance of the treatment regimen was responsible for almost all treatment discontinuations. Patients might not exactly follow the designed schedule, and missing clinical visits occurred frequently, which makes this dataset³ (named ACTG 315) quite unbalanced. Additional analyses of this and other trajectories, as well as more scientific findings of the study, can be found in [9,32,34,67].

After starting treatment, the plasma HIV RNA level and the CD4 cell count were measured simultaneously (both of them were reported in logarithmic scale) at days 0, 2, 7, 10, 14, 28, 56, and 86. Figure 6 shows the corresponding cell counts. The number of repeated measurements per subject varies from 4 to 8, and the total number of observations is 328. Simple linear regressions of cell counts against plasma HIV RNA level at each visit (not shown here) evidence that the slope associated with the viral load changes over time because some days the slope is significantly different from zero. This simple observation motivates fitting a TVCM in order to characterize and quantify such relationship.

Figure 6. — ACTG 315 data: CD4 cell counts (logarithmic scale).

Once again, under a Bayesian setting, employing both Gaussian kernel and regression spline functions with g = 2, we fit the TVCM given by

y_{i, j} = β_{0} (t_{i, j}) + β_{1} (t_{i, j}) x_{1, i} (t_{i, j}) + ϵ_{i, j}, j = 1, \dots, n_{i}, i = 1, \dots, n,

where $y_{i, j}$ and $x_{1, i} (t_{i j})$ are the viral load (in logarithmic scale) and the CD4 cell count (also in logarithmic scale) associated with the j-th measurement of the i-th patient, respectively. The time-varying coefficient $β_{1} (t)$ characterizes the dynamic relationship between the viral load and the immunologic response over the treatment period. Interestingly, according to PCV criteria, the optimal number of knots in both cases are $k_{0}^{K} = k_{0}^{S} = k_{1}^{K} = k_{1}^{S} = 1$ . Following exactly the same setting as in the case study #1, the results we report are based on 2,000 samples obtained after a burn-in period of 500 iterations.

Figure 7 shows estimates of the dynamic slope. Both estimated trajectories are quite smooth. Here, we see a significant negative correlation between viral load and immunologic response at the beginning of the treatment. Then, the relationship consistently attenuates until reaching zero about the fourth week. At that point, the negative correlation gradually strengthens again, and continuously to do so towards the end of the treatment period considered in this analysis. Almost identical results were obtained by [67].

6.3. Goodness-of-fit and predictive performance

The modeling literature has largely focused on both Akaike Information Criteria (AIC) and Bayesian Information Criteria (BIC) as a tool for model selection (e.g. see [69]). However, under a Bayesian setting, the BIC is inappropriate for hierarchical models since it underestimates the complexity of the model. An alternative to BIC that addresses this issue is the Deviance Information Criterion (DIC), $D I C = - 2 \log p (y ∣ \hat{Υ}) + 2 p_{D I C}$ , where $\hat{Υ}$ is the posterior mean of model parameters and $p_{D I C} = 2 \log p (y ∣ \hat{Υ}) - 2 E [\log p (y ∣ Υ)]$ is the model complexity [see 21, for a discussion]. Table 2 presents the DIC for the TVCMs fitted above. It is clear the the TVCM based on radial kernel functions is preferred, specially for the ACTG 388 dataset.

Table 2.

DIC and AMSE to assess goodness-of-fit and out-sample predictive performance, respectively, of both radial kernel and regression spline functions.

	Kernel		Spline
Dataset	DIC	AMSE	DIC	AMSE
ACTG 388	64,257.1	0.761	65,306.7	0.759
ACTG 315	2,960.9	0.758	2,955.8	0.811

Open in a new tab

On the other hand, in order to compare the ability of each alternative to predict missing observations, we evaluate their out-of-sample predictive performance by means of a cross-validation (CV) experiment. Thus, for each combination of dataset and model, we performed an L-fold CV in which L randomly selected subsets of roughly equal size in the dataset are treated as missing and then predicted using the rest of the data. We summarize our findings in Table 2, where we report the average mean square error (AMSE) corresponding to the prediction of missing measurements in the datasets. In this context, the AMSE is a measure of how well a given model is capable of predicting missing observations. We can see from this table that both alternatives have comparable predictive capabilities.

7. Discussion

In this paper, we review two simple but powerful alternatives based on linear expansions to estimate time-varying coefficients: a new approach using radial kernel functions and a more standard method using regression spline functions. We framed the estimation procedure under both Frequentist and Bayesian inference paradigms using bootstrap techniques, Gibbs sampling and a variational Bayes method. From an empirical perspective, we provide two simulations studies. These experiments strongly suggest that either combination of basis representation and inference approach are comparable and mostly equivalent. From a practical perspective, the first case study shows that the overall CD4 counts increased dramatically during the first 40 weeks under a specific anti-viral treatment, but the effect of the drug therapy faded over time and completely disappeared after about week 100. On the other hand, the second case study evidences a strong negative correlation between the viral load and the immunologic response at the beginning of an anti-viral treatment, and then shows evidence of a weak correction about the fifth week, when gradually strengthened again and reached the largest value at the end of the treatment period.

As part of the revision process, one of the referees suggested that, in order to avoid inconsistencies in terms of exposition, the model needed to be presented in a general fashion as in a mixed-effects time-varying coefficient model (see Section 5.1). Even though we agree on the convenience of working with a more general model, we consider that our approach should be treated in terms of a ‘standard’ TVCM as in Equation (1) since it makes exposition straightforward, given that our main contribution rely on an estimation protocol based on radial functions, along with inference strategies according to the Bayesian paradigm. Nonetheless, we exhort the reader to pursue such an extension employing the ideas discussed in this manuscript.

The estimation protocol presented here is susceptible of many extensions. For instance, to avoid the curse of dimensionality, the model can be extended to account for longitudinal inhomogeneity of varying coefficients via Bayesian basis selection or adaptive knot selection which is an integral part of the data generating mechanism. Another interesting extension involves the incorporation of specific working correlation matrices in the probabilistic structure of the random error using more convoluted covariance functions. Extensions to more complex situations, including multivariate or spatial data are also possible, and are the subject of future work.

Appendices.

Appendix A. Algortihms for Inference

A.1. Bootstrap algorithm

Since all subjects are assumed to be independent, a natural sampling scheme consists in resampling the entire repeated measurements of each subject with replacement from the original dataset. The bootstrap samples $Υ^{(1)}, \dots, Υ^{(B)}$ , where $Υ^{(b)} = (α^{(b)}, σ^{(b)})$ , $b = 1, \dots, B$ , can be generated using the following bootstrap algorithm:

Randomly select n bootstrap subjects with replacement from the original dataset, and put together the bootstrap dataset as $D^{(b)} = {(y_{i, j}^{*}, x_{i, j}^{*}, t_{i, j}^{*},) : j = 1, \dots, n_{i}, i = 1, \dots, n} .$ Note that the entire set of repeated measurements for some subjects may appear multiple times in the bootstrap dataset.
Compute $N^{(b)}$ , $y^{(b)}$ , $Z^{(b)}$ , and $W^{(b)}$ based on $D^{(b)}$ .
Compute $α^{(b)}$ as in (10) based on $y^{(b)}$ , $Z^{(b)}$ , $W^{(b)}$ .
Compute $σ^{(b)}$ as in (12) based on $y^{(b)}$ , $Z^{(b)}$ , $W^{(b)}$ , $α^{(b)}$ .
Repeat the previous steps B times.

A.2. MCMC algorithm

The posterior distribution of the parameters can be explored using MCMC algorithms in which the posterior distribution is approximated using dependent but approximately identically distributed samples $Υ^{(1)}, \dots, Υ^{(B)}$ , where $Υ^{(b)} = (α^{(b)}, σ^{(b)})$ , $b = 1, \dots, B$ . In this case, the joint posterior distribution is given by:

\begin{aligned} p (α, σ^{2} ∣ Z, y) & \propto (σ^{2})^{- N / 2} e x p \{- \frac{1}{2 σ^{2}} (y - Z α)^{T} (y - Z α)\} \\ \times (σ^{2})^{- p / 2} e x p \{- \frac{1}{2 N σ^{2}} α^{T} α\} \times (σ^{2})^{- (a_{σ} + 1)} e x p \{- \frac{b_{σ}}{σ^{2}}\}, \end{aligned}

(A1)

where $p = \sum_{r = 0}^{d} p_{r}$ is the expansion dimension. The MCMC algorithm iterates over the full conditional distributions of the model parameters $α$ and $σ^{2}$ by generating a new state $Υ^{(b + 1)}$ from a current state $Υ^{(b)}$ as follows:

Choose an initial value for $α$ , say $α^{(0)}$ .
Update $(σ^{2})^{(b + 1)}$ and $α^{(b + 1)}$ until convergence:
- i.
  Sample $(σ^{2})^{(b + 1)}$ from
  $I G a m (a_{σ} + \frac{N}{2} + \frac{p}{2}, b_{σ} + \frac{1}{2} ∥ y - Z α^{(b)} ∥^{2} + \frac{1}{2 N} ∥ α^{(b)} ∥^{2}) .$
- ii.
  Sample $α^{(b + 1)}$ from
  $N ({(Z^{T} Z + \frac{1}{N} I_{p})}^{- 1} Z^{T} y, (σ^{2})^{(b + 1)} {(Z^{T} Z + \frac{1}{N} I_{p})}^{- 1}) .$

Posterior summaries along with point and interval estimates can be approximated based on the Monte Carlo samples. As before, for a given t, the $100 (1 - α) %$ percentile-based credible interval for $β_{r} (t)$ , $r = 0, 1, \dots, d$ , can be computed as in (13).

A.3. Variational Bayes algorithm

The Markov chain defined in the previous section is guaranteed to converge eventually to the posterior distribution $p (α, σ^{2} ∣ Z, y)$ given in (A1). Here, we consider the problem of finding a function $q (\cdot)$ in a family of functions closest to the posterior distribution $p (\cdot)$ , according to a given dissimilarity measure. This idea is known as variational Bayes [see 44, for a review].

Briefly, the main idea can be summarized as follows. Let $θ = (θ_{1}, \dots, θ_{k})$ be a set of parameters and $p (θ ∣ X)$ its posterior distribution after data $X$ have been observed. The purpose consists in finding a function $q (θ)$ that minimizes the Kullback–Leibler divergence respect to $p (θ ∣ X)$ :

K L (p | | q) = \int q (θ) \log \frac{q (θ)}{p (θ ∣ X)} d θ = \log p (X) - \int q (θ) \log \frac{p (θ, X)}{q (θ)} d θ .

Thus, minimizing $K L (p | | q)$ is equivalent to maximizing

E_{q (θ)} [\log \frac{p (θ, X)}{q (θ)}] = \int q (θ) \log \frac{p (θ, X)}{q (θ)} d θ,

which is known as evidence lower bound (ELBO). Furthermore, if $q (θ)$ is assumed to satisfy the mean field assumption

q (θ) = \prod_{i = 1}^{k} q_{i} (θ_{i}),

where the $q_{i} (θ_{i})$ are marginal variational densities, the solution $q^{*} (θ)$ satisfies

\log q_{i}^{*} (θ_{i}) \propto E_{q (θ_{- i})} [\log p (θ_{i} ∣ X, θ_{- i})],

with $θ_{- i} = (θ_{1}, \dots, θ_{i - 1}, θ_{i + 1}, \dots, θ_{k})$ , which leads to a coordinate optimization algorithm.

In this case, the product density approximation to $p (α, σ^{2} ∣ Z, y)$ is $q (α, σ^{2}) = q (α) q (σ^{2}) .$ The optimal densities take the form

\log q (α) \propto E_{q (σ^{2})} [\log p (α ∣ Z, y, σ^{2})] a n d \log q (σ^{2}) \propto E_{q (α)} [\log p (σ^{2} ∣ Z, y, α)] .

Then,

q (α) \propto e x p \{- \frac{1}{2} [α^{T} E_{q (σ^{2})} [σ^{- 2}] (Z^{T} Z + \frac{1}{N} I_{p}) α - 2 α^{T} E_{q (σ^{2})} [σ^{- 2}] Z^{T} y]\}

which we can recognize as a member of the normal family $N (m^{*}, V^{*})$ where

V^{*} = {(E_{q (σ^{2})} σ^{- 2})}^{- 1} {(Z^{T} Z + \frac{1}{N} I_{p})}^{- 1} a n d m^{*} = V^{*} (E_{q (σ^{2})} σ^{- 2}) Z^{T} y .

Similar arguments lead to

q (σ^{2}) \propto (σ^{2})^{- (a_{σ} + N / 2 + p / 2 + 1)} e x p \{- \frac{1}{σ^{2}} (b_{σ} + \frac{1}{2} E_{q (α)} ∥ y - Z α ∥^{2} + \frac{1}{2 N} E_{q (α)} ∥ α ∥^{2})\},

which we can recognize as a member of the inverse-gamma family $I G a m (a^{*}, b^{*})$ where

\begin{aligned} a^{*} & = a_{σ} + N / 2 + p / 2 a n d \\ b^{*} & = b_{σ} + \frac{1}{2} (∥ y ∥^{2} - 2 y^{T} Z E_{q (α)} α + E_{q (α)} α^{T} (Z^{T} Z + \frac{1}{N} I_{p}) E_{α} α \\ + t r [(Z^{T} Z + \frac{1}{N} I_{p}) {V a r}_{q (α)} α]) . \end{aligned}

Hence, the algorithm for obtaining the parameters in $q (α)$ and $q (σ^{2})$ is the following:

Initialize $b^{*} > 0$ .
Repeat until the increase in ELBO is negligible:
- i.
  $V^{*} \leftarrow \frac{b^{*}}{a^{*}} (Z^{T} Z + \frac{1}{N} I_{p})^{- 1}$
- ii.
  $m^{*} \leftarrow \frac{a^{*}}{b^{*}} V^{*} Z^{T} y$
- iii.
  $b^{*} \leftarrow b_{σ} + \frac{1}{2} (∥ y ∥^{2} - 2 y^{T} Z m^{*} + m^{* T} (Z^{T} Z + \frac{1}{N} I_{p}) m^{*} + t r [(Z^{T} Z + \frac{1}{N} I_{p}) V^{*}])$

The lower bound is given by

E L B O (q) = E_{q} \log p (y ∣ Z, α, σ^{2}) + E_{q} \log p (α, σ^{2}) - E_{q} \log q (α) - E_{q} \log q (σ^{2}) .

Using standard results, the evaluation of each term is straightforward and results in:

\begin{aligned} E L B O (q) & = - \frac{1}{2} (N \log 2 π + p \log N - p) + a_{σ} \log b_{σ} - \log Γ (a_{σ}) \\ + a^{*} (1 + \log b^{*} - 2 ψ (a^{*})) + \log Γ (a^{*}) + 2 (\log b^{*} - ψ (a^{*})) + \frac{1}{2} \log | V^{*} | \\ - \frac{a^{*}}{b^{*}} [b_{σ} + \frac{1}{2} (∥ y ∥^{2} - 2 y^{T} Z m^{*} + m^{* T} (Z^{T} Z + \frac{1}{N} I_{p}) m^{*} \\ + t r [(Z^{T} Z + \frac{1}{N} I_{p}) V^{*}])], \end{aligned}

where $ψ (\cdot)$ is the digamma function.

Posterior summaries can be obtained via Monte Carlo simulation using standard random number generation routines.

Appendix B. Notation

Matrices and vectors with entries consisting of subscripted variables are denoted by a boldfaced version of the letter for that variable. For example, $x = (x_{1}, \dots, x_{n})$ denotes an $n \times 1$ column vector with entries $x_{1}, \dots, x_{n}$ . We use $0$ and $1$ to denote the column vector with all entries equal to 0 and 1, respectively, and $I$ to denote the identity matrix. A subindex in this context refers to the corresponding dimension; for instance, $I_{n}$ denotes the $n \times n$ identity matrix. The transpose of a vector $x$ is denoted by $x^{⊤}$ ; analogously for matrices. Moreover, if $X$ is a square matrix, we use $t r (X)$ to denote its trace and $X^{- 1}$ to denote its inverse. The norm of $x$ , given by $\sqrt{x^{⊤} x}$ , is denoted by $∥ x ∥$ .

Notes

Visit https://actgnetwork.org/ for more information about the group.

http://www.urmc.rochester.edu/biostat/people/faculty/wusite/datasets/data/ACTG388Data1Arm.cfm.

http://www.urmc.rochester.edu/biostat/people/faculty/wusite/datasets/data/ACTG315LongitudinalDataViralLoadData.cfm.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

1.Andriyana Y., Gijbels I., and Verhasselt A., Quantile regression in varying-coefficient models: non-crossing quantile curves and heteroscedasticity, Stat. Papers 59 (2018), pp. 1589–1621. [Google Scholar]
2.Assuncao R.M., Space varying coefficient models for small area data, Environ. Off. J. Int. Environ. Soc. 14 (2003), pp. 453–473. [Google Scholar]
3.Biller C. and Fahrmeir L., Bayesian varying-coefficient models using adaptive regression splines, Stat. Modelling. 1 (2001), pp. 195–211. [Google Scholar]
4.Buhmann M.D., Radial Basis Functions, Cambridge University Press, 2004. [Google Scholar]
5.Cai Z., Fan J., and Li R., Efficient estimation and inferences for varying-coefficient models, J. Am. Stat. Assoc. 95 (2000), pp. 888–902. [Google Scholar]
6.Chiang C.T., Rice J.A., and Wu C.O., Smoothing spline estimation for varying coefficient models with repeatedly measured dependent variables, J. Am. Stat. Assoc. 96 (2001), pp. 605–619. [Google Scholar]
7.Chiou J.M., Ma Y., and Tsai C.L., Functional random effect time-varying coefficient model for longitudinal data, Stat 1 (2012), pp. 75–89. [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Cleveland W.S., Grosse E., and Shyu W.M., Local regression models. statistical models in s (Chambers, J.M. and Hastie, T.J., Eds), 309–376. Wadsworth & Brooks, Pacific Grove, 1991,
9.Connick E., Lederman M.M., Kotzin B.L., Spritzler J., Kuritzkes D.R., Sevin A.D., Fox L., Chiozzi M.H., Leonard J.M., Rousseau F., Roe J.D., Martinez A., Kessler H., and Landay A., Immune reconstitution in the first year of potent antiretroviral therapy and its relationship to virologic response, J. Infect. Dis. 181 (2000), pp. 358–363. [DOI] [PubMed] [Google Scholar]
10.Efron B. and Hastie T, Computer Age Statistical Inference, Vol. 5, Cambridge University Press, 2016. [Google Scholar]
11.Eubank R.L., Huang C., Maldonado Y.M., Wang N., Wang S., and Buchanan R.J., Smoothing spline estimation in varying-Coefficient models, J. R. Stat. Soc. Seri. B Stat. Method.) 66 (2004), pp. 653–667. [Google Scholar]
12.Fan J. and Huang T., Profile likelihood inferences on semiparametric varying-coefficient partially linear models, Bernoulli 11 (2005), pp. 1031–1057. [Google Scholar]
13.Fan J., Yao Q., and Cai Z., Adaptive varying-coefficient linear models, J. R. Stat. Soc.: Series B (Stat. Method.) 65 (2003), pp. 57–80. [Google Scholar]
14.Fan J. and Zhang J.T., Two-step estimation of functional linear models with applications to longitudinal data, J. R. Stat. Soc.: Seri. B (Stat. Method.) 62 (2000), pp. 303–322. [Google Scholar]
15.Fan J. and Zhang W., Statistical estimation in varying coefficient models, Annal. Stat. 27 (1999), pp. 1491–1518. [Google Scholar]
16.Fan J. and Zhang W., Statistical methods with varying coefficient models, Stat. Interface. 1 (2008), pp. 179. [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Faraway J.J., Linear Models with R, CRC press, 2014. [Google Scholar]
18.Fischl M.A., Ribaudo H.J., Collier A.C., Erice A., Giuliano M., Dehlinger M., Eron Jr., Hammer S.M., Vella S., Morse G.D., and Feinberg J.E., A randomized trial of 2 different 4-drug antiretroviral regimens versus a 3-drug regimen, in advanced human immunodeficiency virus disease, J. Infect. Dis. 188 (2003), pp. 625–634. [DOI] [PubMed] [Google Scholar]
19.Franco-Villoria M., Ventrucci M., and Rue H., A unified view on bayesian varying coefficient models, Electron. J. Stat. 13 (2019), pp. 5334–5359. [Google Scholar]
20.Gelfand A.E., Kim H.J., Sirmans C.F., and Banerjee S., Spatial modeling with spatially varying coefficient processes, J. Am. Stat. Assoc. 98 (2003), pp. 387–396. [Google Scholar]
21.Gelman A., Carlin J.B., Stern H.S., Dunson D.B., Vehtari A., and Rubin D.B., Bayesian Data Analysis, CRC press, 2013. [Google Scholar]
22.Gelman A. and Rubin D., Inferences from iterative simulation using multiple sequences, Stat. Sci. 7 (1992), pp. 457–472. [Google Scholar]
23.Harezlak J., Ruppert D., and Wand M.P., Semiparametric Regression with R, Springer, 2018. [Google Scholar]
24.Hastie T. and Tibshirani R., Varying-coefficient models, J. R. Stat. Soc.: Seri. B (Method.) 55 (1993), pp. 757–779. [Google Scholar]
25.Hastie T., Tibshirani R., and Friedman J., The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Science & Business Media, 2009. [Google Scholar]
26.Hoff P.D., A First Course in Bayesian Statistical Methods, Vol. 580, Springer, 2009. [Google Scholar]
27.Hoover D.R., Rice J.A., Wu C.O., and Yang L.P., Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data, Biometrika 85 (1998), pp. 809–822. [Google Scholar]
28.Hua Z., Bayesian Analysis of Varying Coefficient Models and Applications. PhD thesis, University of North Carolina at Chapel Hill, 2011.
29.Huang J.Z., Wu C.O., and Zhou L., Varying-coefficient models and basis function approximations for the analysis of repeated measurements, Biometrika 89 (2002), pp. 111–128. [Google Scholar]
30.Jeong S., Park M., and Park T., Analysis of binary longitudinal data with time-varying effects, Comput. Stat. Data. Anal. 112 (2017), pp. 145–153. [Google Scholar]
31.Jeong S., Park, T.,et al. , Bayesian semiparametric inference on functional relationships in linear mixed models, Bayesian Anal. 11 (2016), pp. 1137–1163. [Google Scholar]
32.Lederman M.M., Connick E., Landay A., Kuritzkes D.R., Spritzler J., Kotzin B.L., Fox L., Chiozzi M.H., Leonard J.M., Rousseau F., Wade M., Roe J.D., Martinez A., and Kessler H., Immunologic responses associated with 12 weeks of combination antiretroviral therapy consisting of zidovudine, lamivudine, and ritonavir: results of aids clinical trials group protocol 315, J. Infect. Dis. 178 (1998), pp. 70–79. [DOI] [PubMed] [Google Scholar]
33.Li R. and Liang H., Variable selection in semiparametric regression modeling, Ann. Stat. 36 (2008), pp. 261. [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Liang H., Wu H., and Carroll R.J., The relationship between virologic and immunologic responses in aids clinical research using mixed-effects varying-coefficient models with measurement error, Biostatistics 4 (2003), pp. 297–312. [DOI] [PubMed] [Google Scholar]
35.Lin D.Y. and Ying Z., Semiparametric and nonparametric regression analysis of longitudinal data, J. Am. Stat. Assoc. 96 (2001), pp. 103–126. [Google Scholar]
36.Lin X. and Carroll R.J., Nonparametric function estimation for clustered data when the predictor is measured without/with error, J. Am. Stat. Assoc. 95 (2000), pp. 520–534. [Google Scholar]
37.Little T.D., Deboeck P., and Wu W., Longitudinal data analysis. Emerging Trends in the Social and Behavioral Sciences: An Interdisciplinary, Searchable, and Linkable Resource, (2015). pp 1–17. Wiley Online Library. 10.1002/9781118900772.etrds0208 [DOI]
38.Liu X., Methods and Applications of Longitudinal Data Analysis, Elsevier, 2015. [Google Scholar]
39.Lu T. and Huang Y., Bayesian inference on mixed-effects varying-coefficient joint models with skew-t distribution for longitudinal data with multiple features, Stat. Methods. Med. Res. 26 (2017), pp. 1146–1164. [DOI] [PubMed] [Google Scholar]
40.Lu Y. and Zhang R., Smoothing spline estimation of generalised varying-coefficient mixed model, J. Nonparametr. Stat. 21 (2009), pp. 815–825. [Google Scholar]
41.Memmedli M. and Nizamitdinov A., An application of various nonparametric techniques by nonparametric regression splines, Int. J. Math. Models Methods Appl. Sci. 6 (2012), pp. 106–113. [Google Scholar]
42.Molenberghs G., Fitzmaurice G., Kenward M.G., Tsiatis A., and Verbeke G., Handbook of Missing Data Methodology, CRC Press, 2014. [Google Scholar]
43.Nobles M., Serban N., and Swann J., Spatial accessibility of pediatric primary healthcare: measurement and inference, Ann. Appl. Stat. 8 (2014), pp. 1922–1946. [Google Scholar]
44.Ormerod J.T. and Wand M.P., Explaining variational approximations, Am. Stat. 64 (2010), pp. 140–153. [Google Scholar]
45.Qu A. and Li R., Quadratic inference functions for varying-coefficient models with longitudinal data, Biometrics 62 (2006), pp. 379–391. [DOI] [PMC free article] [PubMed] [Google Scholar]
46.Ramsay J.O., Hooker G., and Graves S., Functional data analysis with r and matlab, (2009).
47.Rice J.A. and Wu C.O., Nonparametric mixed effects models for unequally sampled noisy curves, Biometrics 57 (2001), pp. 253–259. [DOI] [PubMed] [Google Scholar]
48.Ruppert D., Wand M.P., and Carroll R.J., Semiparametric Regression, Vol. 12, Cambridge university press, 2003. [Google Scholar]
49.Senturk D., Dalrymple L.S., Mohammed S.M., Kaysen G.A., and Nguyen D.V., Modeling time-varying effects with generalized and unsynchronized longitudinal data, Medicine¡/DIFdel¿Stat. Med. 32 (2013), pp. 2971–2987. [DOI] [PMC free article] [PubMed] [Google Scholar]
50.Senturk D. and Muller H.G., Generalized varying coefficient models for longitudinal data, Biometrika 95 (2008), pp. 653–666. [Google Scholar]
51.Senturk D. and Muller H.G., Functional varying coefficient models for longitudinal data, J. Am. Stat. Assoc. 105 (2010), pp. 1256–1264. [Google Scholar]
52.Serban N., A space–time varying coefficient model: the equity of service accessibility, Ann. Appl. Stat. 5 (2011), pp. 2024–2051. [Google Scholar]
53.Sosa J. and Diaz L.G., Random time-varying coefficient model estimation through radial basis functions, Rev. Colom. De Estadistica 35 (2012), pp. 167–184. [DOI] [PMC free article] [PubMed] [Google Scholar]
54.Tan X., Shiyko M.P., Li R., Li Y., and Dierker L., A time-varying effect model for intensive longitudinal data, Psychol. Methods. 17 (2012), pp. 61. [DOI] [PMC free article] [PubMed] [Google Scholar]
55.Waller L.A., Zhu L., Gotway C.A., Gorman D.M., and Gruenewald P.J., Quantifying geographic variations in associations between alcohol distribution and violence: a comparison of geographically weighted regression and spatially varying coefficient models, Stoch. Environ. Res. Risk. Assess. 21 (2007), pp. 573–588. [Google Scholar]
56.Wang H. and Xia Y., Shrinkage estimation of the varying coefficient model, J. Am. Stat. Assoc. 104 (2009), pp. 747–757. [Google Scholar]
57.Wang H.J., Zhu Z., and Zhou J., Quantile regression in partially linear varying coefficient models, Ann. Stat. 37 (2009), pp. 3841–3866. [Google Scholar]
58.Wang L., Li H., and Huang J.Z., Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements, J. Am. Stat. Assoc. 103 (2008), pp. 1556–1569. [DOI] [PMC free article] [PubMed] [Google Scholar]
59.Wang Y., Varying-coefficient models: New models, inference procedures, and applications, (2007).
60.Wasserman L., Nonparametric Statistics, Springer-Verlag, New York, 2006. [Google Scholar]
61.Wei F., Huang J., and Li H., Variable selection and estimation in high-dimensional varying-coefficient models, Stat. Sin. 21 (2011), pp. 1515. [DOI] [PMC free article] [PubMed] [Google Scholar]
62.Wu C.O. and Chiang C.T., Kernel smoothing on varying coefficient models with longitudinal dependent variable, Stat. Sin. 10 (2000), pp. 433–456. [Google Scholar]
63.Wu C.O., Chiang C.T., and Hoover D.R., Asymptotic confidence regions for kernel smoothing of a varying-coefficient model with longitudinal data, J. Am. Stat. Assoc. 93 (1998), pp. 1388–1402. [Google Scholar]
64.Wu C.O. and Tian X., Nonparametric estimation of conditional distributions and rank-tracking probabilities with time-varying transformation models in longitudinal studies, J. Am. Stat. Assoc. 108 (2013), pp. 971–982. [Google Scholar]
65.Wu C.O. and Tian X., Nonparametric Models for Longitudinal Data: With Implementation in R, CRC Press, 2018. [DOI] [PubMed] [Google Scholar]
66.Wu C.O., Tian X., and Yu J., Nonparametric estimation for time-varying transformation models with longitudinal data, J. Nonparametr. Stat. 22 (2010), pp. 133–147. [Google Scholar]
67.Wu H. and Liang H., Backfitting random varying-coefficient models with time-dependent smoothing covariates, Scan. J. Stat. 31 (2004), pp. 3–19. [Google Scholar]
68.Wu H. and Zhang J.T., Local polynomial mixed-effects models for longitudinal data, J. Am. Stat. Assoc. 97 (2002), pp. 883–897. [Google Scholar]
69.Wu H. and Zhang J.T., Nonparametric Regression Methods for Longitudinal Data Analysis, John Wiley and Sons, 2006. [Google Scholar]
70.Zeger S.L. and Diggle P.J., Semiparametric models for longitudinal data with application to CD4 cell numbers in HIV seroconverters, Biometrics 30 (1994), pp. 689–699. [PubMed] [Google Scholar]
71.Zhang D., Lin X., Raz J., and Sowers M., Semiparametric stochastic mixed models for longitudinal data, J. Am. Stat. Assoc. 93 (1998), pp. 710–719. [Google Scholar]
72.Zhang J.T., Analysis of Variance for Functional Data, CRC Press, 2013. [Google Scholar]

[CIT0001] 1.Andriyana Y., Gijbels I., and Verhasselt A., Quantile regression in varying-coefficient models: non-crossing quantile curves and heteroscedasticity, Stat. Papers 59 (2018), pp. 1589–1621. [Google Scholar]

[CIT0002] 2.Assuncao R.M., Space varying coefficient models for small area data, Environ. Off. J. Int. Environ. Soc. 14 (2003), pp. 453–473. [Google Scholar]

[CIT0003] 3.Biller C. and Fahrmeir L., Bayesian varying-coefficient models using adaptive regression splines, Stat. Modelling. 1 (2001), pp. 195–211. [Google Scholar]

[CIT0004] 4.Buhmann M.D., Radial Basis Functions, Cambridge University Press, 2004. [Google Scholar]

[CIT0005] 5.Cai Z., Fan J., and Li R., Efficient estimation and inferences for varying-coefficient models, J. Am. Stat. Assoc. 95 (2000), pp. 888–902. [Google Scholar]

[CIT0006] 6.Chiang C.T., Rice J.A., and Wu C.O., Smoothing spline estimation for varying coefficient models with repeatedly measured dependent variables, J. Am. Stat. Assoc. 96 (2001), pp. 605–619. [Google Scholar]

[CIT0007] 7.Chiou J.M., Ma Y., and Tsai C.L., Functional random effect time-varying coefficient model for longitudinal data, Stat 1 (2012), pp. 75–89. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0008] 8.Cleveland W.S., Grosse E., and Shyu W.M., Local regression models. statistical models in s (Chambers, J.M. and Hastie, T.J., Eds), 309–376. Wadsworth & Brooks, Pacific Grove, 1991,

[CIT0009] 9.Connick E., Lederman M.M., Kotzin B.L., Spritzler J., Kuritzkes D.R., Sevin A.D., Fox L., Chiozzi M.H., Leonard J.M., Rousseau F., Roe J.D., Martinez A., Kessler H., and Landay A., Immune reconstitution in the first year of potent antiretroviral therapy and its relationship to virologic response, J. Infect. Dis. 181 (2000), pp. 358–363. [DOI] [PubMed] [Google Scholar]

[CIT0010] 10.Efron B. and Hastie T, Computer Age Statistical Inference, Vol. 5, Cambridge University Press, 2016. [Google Scholar]

[CIT0011] 11.Eubank R.L., Huang C., Maldonado Y.M., Wang N., Wang S., and Buchanan R.J., Smoothing spline estimation in varying-Coefficient models, J. R. Stat. Soc. Seri. B Stat. Method.) 66 (2004), pp. 653–667. [Google Scholar]

[CIT0012] 12.Fan J. and Huang T., Profile likelihood inferences on semiparametric varying-coefficient partially linear models, Bernoulli 11 (2005), pp. 1031–1057. [Google Scholar]

[CIT0013] 13.Fan J., Yao Q., and Cai Z., Adaptive varying-coefficient linear models, J. R. Stat. Soc.: Series B (Stat. Method.) 65 (2003), pp. 57–80. [Google Scholar]

[CIT0014] 14.Fan J. and Zhang J.T., Two-step estimation of functional linear models with applications to longitudinal data, J. R. Stat. Soc.: Seri. B (Stat. Method.) 62 (2000), pp. 303–322. [Google Scholar]

[CIT0015] 15.Fan J. and Zhang W., Statistical estimation in varying coefficient models, Annal. Stat. 27 (1999), pp. 1491–1518. [Google Scholar]

[CIT0016] 16.Fan J. and Zhang W., Statistical methods with varying coefficient models, Stat. Interface. 1 (2008), pp. 179. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0017] 17.Faraway J.J., Linear Models with R, CRC press, 2014. [Google Scholar]

[CIT0018] 18.Fischl M.A., Ribaudo H.J., Collier A.C., Erice A., Giuliano M., Dehlinger M., Eron Jr., Hammer S.M., Vella S., Morse G.D., and Feinberg J.E., A randomized trial of 2 different 4-drug antiretroviral regimens versus a 3-drug regimen, in advanced human immunodeficiency virus disease, J. Infect. Dis. 188 (2003), pp. 625–634. [DOI] [PubMed] [Google Scholar]

[CIT0019] 19.Franco-Villoria M., Ventrucci M., and Rue H., A unified view on bayesian varying coefficient models, Electron. J. Stat. 13 (2019), pp. 5334–5359. [Google Scholar]

[CIT0020] 20.Gelfand A.E., Kim H.J., Sirmans C.F., and Banerjee S., Spatial modeling with spatially varying coefficient processes, J. Am. Stat. Assoc. 98 (2003), pp. 387–396. [Google Scholar]

[CIT0021] 21.Gelman A., Carlin J.B., Stern H.S., Dunson D.B., Vehtari A., and Rubin D.B., Bayesian Data Analysis, CRC press, 2013. [Google Scholar]

[CIT0022] 22.Gelman A. and Rubin D., Inferences from iterative simulation using multiple sequences, Stat. Sci. 7 (1992), pp. 457–472. [Google Scholar]

[CIT0023] 23.Harezlak J., Ruppert D., and Wand M.P., Semiparametric Regression with R, Springer, 2018. [Google Scholar]

[CIT0024] 24.Hastie T. and Tibshirani R., Varying-coefficient models, J. R. Stat. Soc.: Seri. B (Method.) 55 (1993), pp. 757–779. [Google Scholar]

[CIT0025] 25.Hastie T., Tibshirani R., and Friedman J., The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Science & Business Media, 2009. [Google Scholar]

[CIT0026] 26.Hoff P.D., A First Course in Bayesian Statistical Methods, Vol. 580, Springer, 2009. [Google Scholar]

[CIT0027] 27.Hoover D.R., Rice J.A., Wu C.O., and Yang L.P., Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data, Biometrika 85 (1998), pp. 809–822. [Google Scholar]

[CIT0028] 28.Hua Z., Bayesian Analysis of Varying Coefficient Models and Applications. PhD thesis, University of North Carolina at Chapel Hill, 2011.

[CIT0029] 29.Huang J.Z., Wu C.O., and Zhou L., Varying-coefficient models and basis function approximations for the analysis of repeated measurements, Biometrika 89 (2002), pp. 111–128. [Google Scholar]

[CIT0030] 30.Jeong S., Park M., and Park T., Analysis of binary longitudinal data with time-varying effects, Comput. Stat. Data. Anal. 112 (2017), pp. 145–153. [Google Scholar]

[CIT0031] 31.Jeong S., Park, T.,et al. , Bayesian semiparametric inference on functional relationships in linear mixed models, Bayesian Anal. 11 (2016), pp. 1137–1163. [Google Scholar]

[CIT0032] 32.Lederman M.M., Connick E., Landay A., Kuritzkes D.R., Spritzler J., Kotzin B.L., Fox L., Chiozzi M.H., Leonard J.M., Rousseau F., Wade M., Roe J.D., Martinez A., and Kessler H., Immunologic responses associated with 12 weeks of combination antiretroviral therapy consisting of zidovudine, lamivudine, and ritonavir: results of aids clinical trials group protocol 315, J. Infect. Dis. 178 (1998), pp. 70–79. [DOI] [PubMed] [Google Scholar]

[CIT0033] 33.Li R. and Liang H., Variable selection in semiparametric regression modeling, Ann. Stat. 36 (2008), pp. 261. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0034] 34.Liang H., Wu H., and Carroll R.J., The relationship between virologic and immunologic responses in aids clinical research using mixed-effects varying-coefficient models with measurement error, Biostatistics 4 (2003), pp. 297–312. [DOI] [PubMed] [Google Scholar]

[CIT0035] 35.Lin D.Y. and Ying Z., Semiparametric and nonparametric regression analysis of longitudinal data, J. Am. Stat. Assoc. 96 (2001), pp. 103–126. [Google Scholar]

[CIT0036] 36.Lin X. and Carroll R.J., Nonparametric function estimation for clustered data when the predictor is measured without/with error, J. Am. Stat. Assoc. 95 (2000), pp. 520–534. [Google Scholar]

[CIT0037] 37.Little T.D., Deboeck P., and Wu W., Longitudinal data analysis. Emerging Trends in the Social and Behavioral Sciences: An Interdisciplinary, Searchable, and Linkable Resource, (2015). pp 1–17. Wiley Online Library. 10.1002/9781118900772.etrds0208 [DOI]

[CIT0038] 38.Liu X., Methods and Applications of Longitudinal Data Analysis, Elsevier, 2015. [Google Scholar]

[CIT0039] 39.Lu T. and Huang Y., Bayesian inference on mixed-effects varying-coefficient joint models with skew-t distribution for longitudinal data with multiple features, Stat. Methods. Med. Res. 26 (2017), pp. 1146–1164. [DOI] [PubMed] [Google Scholar]

[CIT0040] 40.Lu Y. and Zhang R., Smoothing spline estimation of generalised varying-coefficient mixed model, J. Nonparametr. Stat. 21 (2009), pp. 815–825. [Google Scholar]

[CIT0041] 41.Memmedli M. and Nizamitdinov A., An application of various nonparametric techniques by nonparametric regression splines, Int. J. Math. Models Methods Appl. Sci. 6 (2012), pp. 106–113. [Google Scholar]

[CIT0042] 42.Molenberghs G., Fitzmaurice G., Kenward M.G., Tsiatis A., and Verbeke G., Handbook of Missing Data Methodology, CRC Press, 2014. [Google Scholar]

[CIT0043] 43.Nobles M., Serban N., and Swann J., Spatial accessibility of pediatric primary healthcare: measurement and inference, Ann. Appl. Stat. 8 (2014), pp. 1922–1946. [Google Scholar]

[CIT0044] 44.Ormerod J.T. and Wand M.P., Explaining variational approximations, Am. Stat. 64 (2010), pp. 140–153. [Google Scholar]

[CIT0045] 45.Qu A. and Li R., Quadratic inference functions for varying-coefficient models with longitudinal data, Biometrics 62 (2006), pp. 379–391. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0046] 46.Ramsay J.O., Hooker G., and Graves S., Functional data analysis with r and matlab, (2009).

[CIT0047] 47.Rice J.A. and Wu C.O., Nonparametric mixed effects models for unequally sampled noisy curves, Biometrics 57 (2001), pp. 253–259. [DOI] [PubMed] [Google Scholar]

[CIT0048] 48.Ruppert D., Wand M.P., and Carroll R.J., Semiparametric Regression, Vol. 12, Cambridge university press, 2003. [Google Scholar]

[CIT0049] 49.Senturk D., Dalrymple L.S., Mohammed S.M., Kaysen G.A., and Nguyen D.V., Modeling time-varying effects with generalized and unsynchronized longitudinal data, Medicine¡/DIFdel¿Stat. Med. 32 (2013), pp. 2971–2987. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0050] 50.Senturk D. and Muller H.G., Generalized varying coefficient models for longitudinal data, Biometrika 95 (2008), pp. 653–666. [Google Scholar]

[CIT0051] 51.Senturk D. and Muller H.G., Functional varying coefficient models for longitudinal data, J. Am. Stat. Assoc. 105 (2010), pp. 1256–1264. [Google Scholar]

[CIT0052] 52.Serban N., A space–time varying coefficient model: the equity of service accessibility, Ann. Appl. Stat. 5 (2011), pp. 2024–2051. [Google Scholar]

[CIT0053] 53.Sosa J. and Diaz L.G., Random time-varying coefficient model estimation through radial basis functions, Rev. Colom. De Estadistica 35 (2012), pp. 167–184. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0054] 54.Tan X., Shiyko M.P., Li R., Li Y., and Dierker L., A time-varying effect model for intensive longitudinal data, Psychol. Methods. 17 (2012), pp. 61. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0055] 55.Waller L.A., Zhu L., Gotway C.A., Gorman D.M., and Gruenewald P.J., Quantifying geographic variations in associations between alcohol distribution and violence: a comparison of geographically weighted regression and spatially varying coefficient models, Stoch. Environ. Res. Risk. Assess. 21 (2007), pp. 573–588. [Google Scholar]

[CIT0056] 56.Wang H. and Xia Y., Shrinkage estimation of the varying coefficient model, J. Am. Stat. Assoc. 104 (2009), pp. 747–757. [Google Scholar]

[CIT0057] 57.Wang H.J., Zhu Z., and Zhou J., Quantile regression in partially linear varying coefficient models, Ann. Stat. 37 (2009), pp. 3841–3866. [Google Scholar]

[CIT0058] 58.Wang L., Li H., and Huang J.Z., Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements, J. Am. Stat. Assoc. 103 (2008), pp. 1556–1569. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0059] 59.Wang Y., Varying-coefficient models: New models, inference procedures, and applications, (2007).

[CIT0060] 60.Wasserman L., Nonparametric Statistics, Springer-Verlag, New York, 2006. [Google Scholar]

[CIT0061] 61.Wei F., Huang J., and Li H., Variable selection and estimation in high-dimensional varying-coefficient models, Stat. Sin. 21 (2011), pp. 1515. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0062] 62.Wu C.O. and Chiang C.T., Kernel smoothing on varying coefficient models with longitudinal dependent variable, Stat. Sin. 10 (2000), pp. 433–456. [Google Scholar]

[CIT0063] 63.Wu C.O., Chiang C.T., and Hoover D.R., Asymptotic confidence regions for kernel smoothing of a varying-coefficient model with longitudinal data, J. Am. Stat. Assoc. 93 (1998), pp. 1388–1402. [Google Scholar]

[CIT0064] 64.Wu C.O. and Tian X., Nonparametric estimation of conditional distributions and rank-tracking probabilities with time-varying transformation models in longitudinal studies, J. Am. Stat. Assoc. 108 (2013), pp. 971–982. [Google Scholar]

[CIT0065] 65.Wu C.O. and Tian X., Nonparametric Models for Longitudinal Data: With Implementation in R, CRC Press, 2018. [DOI] [PubMed] [Google Scholar]

[CIT0066] 66.Wu C.O., Tian X., and Yu J., Nonparametric estimation for time-varying transformation models with longitudinal data, J. Nonparametr. Stat. 22 (2010), pp. 133–147. [Google Scholar]

[CIT0067] 67.Wu H. and Liang H., Backfitting random varying-coefficient models with time-dependent smoothing covariates, Scan. J. Stat. 31 (2004), pp. 3–19. [Google Scholar]

[CIT0068] 68.Wu H. and Zhang J.T., Local polynomial mixed-effects models for longitudinal data, J. Am. Stat. Assoc. 97 (2002), pp. 883–897. [Google Scholar]

[CIT0069] 69.Wu H. and Zhang J.T., Nonparametric Regression Methods for Longitudinal Data Analysis, John Wiley and Sons, 2006. [Google Scholar]

[CIT0070] 70.Zeger S.L. and Diggle P.J., Semiparametric models for longitudinal data with application to CD4 cell numbers in HIV seroconverters, Biometrics 30 (1994), pp. 689–699. [PubMed] [Google Scholar]

[CIT0071] 71.Zhang D., Lin X., Raz J., and Sowers M., Semiparametric stochastic mixed models for longitudinal data, J. Am. Stat. Assoc. 93 (1998), pp. 710–719. [Google Scholar]

[CIT0072] 72.Zhang J.T., Analysis of Variance for Functional Data, CRC Press, 2013. [Google Scholar]

PERMALINK

Time-varying coefficient model estimation through radial basis functions

Juan Sosa

Lina Buitrago

ABSTRACT

1. Introduction

2. Estimation using radial kernel functions

3. Inference methods

3.1. Frequentist inference

3.2. Bayesian inference

4. Location and number of knots

5. Simulation study

5.1. Simulation scenario 1

Figure 1.

Figure 2.

5.2. Simulation scenario 2

Figure 3.

5.3. Execution time

Table 1.

6. Illustrations

6.1. Case study 1

Figure 4.

Figure 5.

6.2. Case study 2

Figure 6.

Figure 7.

6.3. Goodness-of-fit and predictive performance

Table 2.

7. Discussion

Appendices.

Appendix A. Algortihms for Inference

A.1. Bootstrap algorithm

A.2. MCMC algorithm

A.3. Variational Bayes algorithm

Appendix B. Notation

Notes

Disclosure statement

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases