Stochastic Functional Data Analysis: A Diffusion Model-based Approach

Bin Zhu; Peter X-K Song; Jeremy MG Taylor

doi:10.1111/j.1541-0420.2011.01591.x

. Author manuscript; available in PMC: 2012 Dec 1.

Published in final edited form as: Biometrics. 2011 Mar 18;67(4):1295–1304. doi: 10.1111/j.1541-0420.2011.01591.x

Stochastic Functional Data Analysis: A Diffusion Model-based Approach

Bin Zhu ^1,^2,^*, Peter X-K Song ^3,^**, Jeremy MG Taylor ^3,^***

PMCID: PMC3128649 NIHMSID: NIHMS276586 PMID: 21418053

Summary

This paper presents a new modeling strategy in functional data analysis. We consider the problem of estimating an unknown smooth function given functional data with noise. The unknown function is treated as the realization of a stochastic process, which is incorporated into a diffusion model. The method of smoothing spline estimation is connected to a special case of this approach. The resulting models offer great flexibility to capture the dynamic features of functional data, and allow straightforward and meaningful interpretation. The likelihood of the models is derived with Euler approximation and data augmentation. A unified Bayesian inference method is carried out via a Markov Chain Monte Carlo algorithm including a simulation smoother. The proposed models and methods are illustrated on some prostate specific antigen data, where we also show how the models can be used for forecasting.

Keywords: Diffusion model, Euler approximation, Nonparametric regression, Simulation smoother, Stochastic differential equation, Stochastic velocity model

1. Introduction

With the advent of many high-throughput technologies, functional data are routinely collected. To analyze those data, we usually assume that the observations are generated from a unknown mean function with additive errors. This paper aims to use diffusion model to estimate the mean function and its derivatives under this assumption.

There is a rich literature on penalized methods to regulate the mean function and to incorporate smoothness assumptions by using the penalized functions, with the focus on estimation of the unknown mean function(Wahba, 1990; Green and Silverman, 1994; Ramsay and Silverman, 2005). In many practical settings, not only the mean function but also its derivatives (in general referred to as dynamics) offer useful insights regarding the underlying mechanism of a physical or biological process. For example, in the study of prostate specific antigen (PSA), an important biomarker of prostate cancer, we are not only interested in the PSA level but also the dynamics of PSA. Figure 1 displays raw data of one patient's PSA level (panel (a)) and the scaled difference (panel (b)) over time (Proust-Lima et al., 2008), where Y (t) = log(PSA(t)+0.1) and scaled difference is $\frac{Δ Y}{Δ t}$ . It is easy to observe that the PSA level is largely driven by the behavior of the scaled difference that itself provides meaningful clinical interpretation. Modeling the process of the scaled difference will facilitate the modeling of the PSA level. However, the connection between the PSA level and the scale difference cannot be established simply by association, but instead by hierarchical models of dynamics, as the scaled difference may be regarded as the derivative of the PSA level.

PSA plots: (a) the raw data; (b) the scaled difference

This paper presents a new modeling strategy in functional data analysis, where the priori smoothness assumption is specified by stochastic diffusion processes, using a set of ordinary and stochastic differential equations connected in a hierarchical fashion, in the hope that it not only models the mean function but also captures its various dynamic features. Note that this approach treats the unknown mean function and it dynamics as a sample path of stochastic processes. This treatment is different from kernel smoothing and spline smoothing, where the mean function is regarded as a deterministic unknown function. Our treatment of the mean function is similar to that considered in the Gaussian process models for nonparametric Bayesian data analysis, where the mean function is governed by a prior Gaussian process with a mean function M(t; ϕ) and a covariance function C(t, t′; ϕ) with hyperparameters ϕ (Muller and Quintana, 2004; Rasmussen and Williams, 2006). However, the hierarchical structure of the proposed model enables us to make inference of the mean function and its dynamics simultaneously, especially the method provides the estimation and inference for parameters of the stochastic differential equation from noisy data. We note that this differs from the approaches to parameter estimation for models based on ordinary differential equations as recently developed by, for example, Ramsay et al. (2007) and Liang and Wu (2008).

The rest of the paper is organized as follows. Section 2 introduces the proposed model and considers two special cases. For each cases, we give model interpretation and discuss several interesting relationships. Section 3 develops Bayesian inference for stochastic functional data analysis models, where the likelihood is derived using Euler approximation and data augmentation. Section 4 presents a simulation study. In Section 5, the proposed models and methods are applied to estimate the PSA profile from prostate cancer data. Concluding remarks are given in Section 6. Technical details are included in the Web Appendix.

2. Stochastic Velocity Model

2.1 Model Specification

Consider a regression model for functional data of the form:

Y (t) = U (t, ω) + ɛ (t), ω \in Ω, t \in T_{o}

(1)

where Ω is the sample space, Inline graphic _o is the index set of observation times, defined as _o ≔ {t_j : t₁ < t₂ < … < t_J}, and U(·, ω) is an unknown function of interest to be estimated and $ɛ (t) \sim N (0, σ_{ɛ}^{2})$ at each time t. The goal is to estimate the function U(·, ω) and its derivatives given time series observations, Y_o = [Y (t₁), Y(t₂), …, Y(t_J)]^T. In this paper, we develop methods based on diffusion type models for estimation of U(·, ω) and its derivatives U⁽^p⁾(·, ω), p = 1, …, m − 1. Here, U(·, ω) is regarded as a realization of an underlying stochastic process U ≔ U(·, ·) and thus the observed data is the a sample path of the process plus measurement error.

Model (1) is useful to model the PSA level of prostate cancer nonparametrically, where U(t, ω) describes the population mean PSA process. To understand the dynamics of the process, we incorporate models of rate and/or higher order derivatives into model (1). To proceed, we begin by treating U(t, ω) in model (1) as a realization of U(t) ≔ U(t, ·), which enables us to express U(t) in the form of a stochastic diffusion model. That is, the stochastic process U satisfies the following ordinary differential equation(ODE),

\frac{d^{m - 1} U (t)}{d t^{m - 1}} = V (t),

(2)

and its (m−1)th order derivative V(t) is governed by a stochastic differential equation(SDE), given as follows:

d V (t) = a {V (t), ϕ_{s}} d t + b {V (t), ϕ_{s}} d W (t), t \in T_{s},

(3)

where W(t) is the standard Wiener process, ϕ_s is the parameter vector and Inline graphic _s ≔ {t : t₀ ≤ t ≤ t_J} is a continuous index set. In addition, the initial condition at time t₀ is assumed to be $θ (t_{0}) : = {[U (t_{0}), U^{(1)} (t_{0}), \dots, U^{(m - 2)} (t_{0}), V (t_{0})]}^{T} \sim N_{m} (0, σ_{0}^{2} I_{m})$ . In this paper, we use continuous time stochastic processes U and V to model the underlying dynamics. Let V ≔ {V(t, ω) : t ∈ Inline graphic _s, ω ∈ Ω}, defined on a probability space (Ω, ℱ ). We limit V to a one-dimensional continuous state space and a continuous index set _s. Similar definition and limitation hold for U. The SDE in (3) defines a stochastic diffusion process V, which is a Markov process with almost surely continuous sample paths. The existence and uniqueness of the process can be shown rigorously; see Grimmett and Stirzaker (2001, Chap. 13) and Feller (1970, Chap. 10).

The state equations (2) and (3), along with the observation equation (1), make up a continuous-discrete state space model (CDSSM) (Jazwinski, 1970, Chap. 6). Although inference methods will be demonstrated for the stochastic velocity model(SVM), namely the CDSSM with m = 2, they are applicable to any higher order of m. For example, m = 3 corresponding to a stochastic acceleration model(SAM). For SVM, the latent process U(t) represents position, and its first derivative V(t) is the velocity of U(t). Similarly, in the SAM, the processes θ(t) ≔ [U(t), U⁽¹⁾ (t), V(t)]^T represent the position, velocity and acceleration respectively. Coefficients a{V(t), ϕ_s} and b{V(t), ϕ_s} in (3) are typically specified according to the objectives of a given study. The drift term a{V(t), ϕ_s} can be interpreted as the instantaneous mean of velocity; it represents the expected conditional acceleration when V(t) denotes velocity. Likewise, b²{V(t), ϕ_s} measures the instantaneous variance or volatility of velocity. The diffusion model and the consideration of higher derivative in (2) allow considerable flexibility and the incorporation of various dynamic features into the two coefficients a{V(t), ϕ_s} ∈ ℝ and b{V(t), ϕ_s} ∈ ℝ⁺. By this model-based approach, various stochastic processes can be specified for V(t), the model fitting can be evaluated by likelihood-based model assessment, and forecasting can also be easily carried out.

Two special cases are considered in this paper. They are, (i) SVM with Wiener process V(t), denoted SVM-W, where a{V(t), ϕ_s} = 0, b{V(t), ϕ_s} = σ_ξ and $ϕ_{s} = σ_{ξ}^{2}$ ; (ii) SVM with Ornstein-Uhlenbeck(OU) process V(t), denoted SVM-OU, where a{V(t), ϕ_s} = −ρ{V(t) − ν̄}, b{V(t), ϕ_s} = σ_ξ and $ϕ_{s} = {[ρ, \bar{ν}, σ_{ξ}^{2}]}^{T}$ .

2.2 Wiener process for velocity

In SVM-W, V(t) follows a Wiener process, the instantaneous variance $σ_{ξ}^{2}$ measures the disturbance of velocity and influences the smoothness of U(t). With the smaller the $σ_{ξ}^{2}$ , V(t) will appear less wiggly and hence U(t) will be smoother. If σ_ξ = 0, the velocity V(t) is constant over time, so U(t) becomes a straight line.

Integrating (2) and (3) for m = 2, a{V(t), ϕ_s} = 0 and b{V(t), ϕ_s} = σ_ξ, we have

U (t) = U (t_{0}) + \int_{t_{0}}^{t} V (s) d s = U (t_{0}) + V (t_{0}) (t - t_{0}) + σ_{ξ} \int_{t_{0}}^{t} W (s) d s,

(4)

V (t) = V (t_{0}) + σ_{ξ} W (t) .

(5)

The velocity V(t) follows the Wiener process starting at V(t₀). The position U(t) follows a linear trend with deviation governed by the integrated Wiener process, $σ_{ξ} \int_{0}^{t} W (s) d s$ . As shown in the literature, there exists an interesting “equivalence” between smoothing splines and Bayesian estimation of SVM-W(Kimeldorf and Wahba, 1970; Wahba, 1978; Weinert and Sidhu, 1980). By equivalence, we mean that the two methods give the same estimate of U(t), see Web Appendix A for the detail.

2.3 Ornstein-Uhlenbeck process for velocity

The OU process originated as a model for the velocity of a particle suspended in fluid (Uhlenbeck and Ornstein, 1930). The velocity V(t) takes the form:

d V (t) = - ρ {V (t) - \bar{ν}} d t + σ_{ξ} d W (t), t \in T_{s},

(6)

where ρ ∈ ℝ⁺, ν̄ ∈ ℝ, and σ_ξ ∈ ℝ⁺. In contrast to the Wiener process, the OU process is a stationary Gaussian process with stationary mean ν̄ and variance $σ_{ξ}^{2} / 2 ρ$ . $σ_{ξ}^{2}$ has the same interpretation as that of the Wiener process. The instantaneous mean or the expected conditional acceleration − ρ{V(t) − ν̄} describes how fast the process moves. The larger the ρ, the more rapidly the process evolves toward ν̄. The farther V(t) departs from ν̄, the faster the process moves back towards ν̄.

If the data are equally spaced, namely δ_j ≔ t_j − t_j₋₁ = δ, V(t) coincides with the first order autogreession(AR(1)) process with autocorrelation exp(−ρδ). The converse also holds; AR(1) converges weakly to the OU process as δ_j→ 0 (Cumberland and Sykes, 1982).

For the PSA data example in figure 1 it is obvious that the scaled difference is varying around a certain level after about 3 years, which is more consistent with the behavior of an OU process than a Wiener process, suggesting that the SVM-OU may fit better.

3. Estimation and Inference

Statistical inference for CDSSM is challenging because we consider a vector of stochastic processes θ(t) ≔ [U(t), U⁽¹⁾(t), …, U⁽^m⁻²⁾(t), V(t)]^T simultaneously. This leads to a complex likelihood function, which may not even exist in closed form. Since an analytical solution of the SDE is rarely available, the conditional distribution of θ(t) given θ(t′), for t′ < t, which we call the exact transition density, does not have a simple closed form expression. Thus exact inference for the latent processes and its parameters is not generally possible. Hence, a numerical approximation will usually be needed. We will use the Euler approximation of the SDE to approximate the transition density, which enables us to obtain a simple closed form of the likelihood. To alleviate the errors associated with this approximation, it may be helpful to augment the observed data by adding virtual data at extra time points (Tanner and Wong, 1987), so that the interval between adjacent time points is shorter and a preciser approximation is achieved. Even when the exact transition density exists, using the approximated one will significantly simplify the estimation of parameters ϕ_s. A case in the point is the SVM-OU.

The resulting likelihood with this approximate method involves high-dimensional integrals, and we adopt a Bayesian approach using Markov Chain Monte Carlo (MCMC, Geman and Geman 1987, Gelfand and Smith 1990, Gilks et al. 1996) to estimate U(t), V(t) and the parameters $(σ_{ɛ}^{2}, ϕ_{s})$ , with the assistance of the simulation smoother(Durbin and Koopman, 2002).

Different approaches for inference for discretely observed diffusions are reviewed by Beskos et al. (2006). These includes numerical approximations to obtain likelihood functions (Aït-Sahalia, 2002) and methods based on iterated filtering (Ionides et al., 2006). The idea of Euler approximation has been applied to the stochastic volatility model in the finance literature. Pedersen (1995) applied the approximation and data augmentation to facilitate Monte Carlo integration and it was further developed by Durham and Gallant (2002). Bayesian analysis of the diffusion model, especially the stochastic volatility model, has been developed by many authors, including Elerian et al. (2001), Eraker (2001), and Roberts and Stramer (2001). Sorensen (2004) gave a survey on inference methods for stochastic diffusion models in finance. Distinctions between the models considered in financial statistics and the models considered in this paper are that we specify an observation equation to allow for the measurement errors. Most methods of inference for diffusion process do not extend easily when there is measurement error (Beskos et al., 2006). However, MCMC methods can be extended. A further distinction is that we consider the case m > 1 for the ODE, and that we apply the ODE and SDE to model various biomedical phenomena via U(t) and V(t). Thus, the SVM is focused on estimating the unknown sample paths of the latent stochastic process U(t) and V(t), whereas the diffusion models commonly used in the finance literature do not include an observation equation for measurement errors and typically focus on estimating the volatility or variance of the process of interest, for example, derivative securities.

3.1 Likelihood and Euler approximation

To develop Bayesian inference with an MCMC algorithm, we begin with the likelihood of the SVM:

[y_{o} ∣ ϕ_{o}, ϕ_{s}, θ_{0}] = \iint [y_{o} ∣ U_{o}, V_{o}, ϕ_{o}] [U_{o}, V_{o} ∣ θ_{0}, ϕ_{s}] d U_{o} d V_{o},

where U_o ≔ [U(t₁), U(t₂), …, U(t_J)]^T, V_o≔ [V(t₁), V(t₂), …, V(t_J)]^T, and y_o ≔ [y(t₁), y(t₂), …, y(t_J)]^T are vectors of the latent states and observations at t ∈ Inline graphic _o. θ₀ = [U(t₀), V(t₀)]^T is the unknown initial vector of the latent states, and [· ∣ ·] denote conditional density. The conditional density of the observations is given by

[y_{o} ∣ U_{o}, V_{o}, ϕ_{o}] = \prod_{j = 1}^{J} ϕ [y (t_{j}) ∣ U (t_{j}), σ_{ɛ}^{2}],

since the observations are mutually independent given the latent states and follow a normal distribution according to model (1), where ϕ(· ∣ U_G, $σ_{G}^{2}$ ) is the normal density with mean U_G and variance $σ_{G}^{2}$ . In principle, the density of latent states U_o and V_o can be written as:

[U_{o}, V_{o} ∣ θ_{0}, ϕ_{s}] = \prod_{j = 1}^{J} [U (t_{j}), V (t_{j}) ∣ U (t_{j - 1}), V (t_{j - 1}), ϕ_{s}],

due to the Markov property. The exact transition density [U(t_j), V(t_j) ∣ U(t_j₋₁), V(t_j₋₁), ϕ_s) exists in a closed form only for few models with simple SDEs. Even in those cases, the exact transition density may have a complex form. For SVM-OU,

[U (t_{j}), V (t_{j}) ∣ U (t_{j - 1}), V (t_{j - 1}), ϕ_{s}] = N_{2} (m_{O U}, V_{O U})

(7)

with

\begin{array}{l} m_{O U} = {[U (t_{j - 1}) + \bar{ν} δ_{j} + {V (t_{j - 1}) - \bar{ν}} {\frac{1 - exp (- ρ δ_{j})}{ρ}}, \bar{ν} + {V (t_{j - 1}) - \bar{ν}} exp (- ρ δ_{j})]}^{T}, \\ V_{O U} = σ_{ξ}^{2} [\begin{matrix} \frac{δ_{j}}{ρ^{2}} + \frac{1}{2 ρ^{3}} {- 3 + 4 exp (- ρ δ_{j}) - exp (- 2 ρ δ_{j})} & \frac{1}{2 ρ^{2}} {1 - 2 exp (- ρ δ_{j}) + exp (- 2 ρ δ_{j})} \\ \frac{1}{2 ρ^{2}} {1 - 2 exp (- ρ δ_{j}) + exp (- 2 ρ δ_{j})} & \frac{1}{2 ρ} {1 - exp (- 2 ρ δ_{j})} \end{matrix}], \end{array}

the proof of which is given by Zhu (2010). When using data augmentation, we may take the component wise first-order Taylor approximation of m_OU, V_OU with respected to δ_j and get

{\tilde{m}}_{O U} = {[U (t_{j}) + V (t_{j}) δ_{j}, V (t_{j}) - ρ {V (t_{j}) - \bar{ν}} δ_{j}]}^{T},

(8)

{\tilde{V}}_{O U} = σ_{ξ}^{2} [\begin{matrix} 0 & 0 \\ 0 & δ_{j} \end{matrix}] .

(9)

We note that these are the same expressions as those obtained by applying Euler approximation to SVM-OU. Thus although m̃_OU and Ṽ_OU as given in (8) and (9) are not strictly necessary for calculating [U_o, V_o ∣ θ₀, ϕ_s] when m_OU and V_OU are available, they however lead to a simpler form for parameter ρ, which is much easier to update and converges much faster in the following MCMC algorithm.

For a general SDE, e.g. (3), the forms for U(t) and V(t) are:

\begin{array}{l} U (t) = U (t_{0}) + \int_{t_{0}}^{t} V (s) d s, \\ V (t) = V (t_{0}) + \int_{t_{0}}^{t} a {V (s), ϕ_{s}} d s + \int_{t_{0}}^{t} b {V (s), ϕ_{s}} d W (s), t \in T, s \end{array}

where [U(t_j), V(t_j) ∣ U(t_j₋₁), V(t_j₋₁), ϕ_s] is implicitly defined but in general is not available analytically. To deal with this difficulty we use the Euler approximation to obtain a numerical approximation of the transition density in the general SDE case.

The Euler approximation is a discretization method for the SDE through the first-order strong Taylor approximation (Kloeden and Platen, 1992). The resulting discretized versions of the ODE and the SDE in (2) and (3) are given by, respectively,

U^{(J)} (t_{j}) = U^{(J)} (t_{j - 1}) + V^{(J)} (t_{j - 1}) δ_{j},

(10)

V^{(J)} (t_{j}) = V^{(J)} (t_{j - 1}) + a {V^{(J)} (t_{j - 1}), ϕ_{s}} δ_{j} + b {V^{(J)} (t_{j - 1}), ϕ_{s}} η_{j}, t_{j} \in T_{o}

(11)

where δ_j ≔ t_j − t_j₋₁ and η_j ≔ W(t_j) − W(t_j₋₁) ∼ Inline graphic (0, δ_j). For t ∈ [t_j₋₁, t_j], a linear interpolation takes the form

{\tilde{V}}^{(J)} (t) = V^{(J)} (t_{j - 1}) + \frac{t - t_{j - 1}}{t_{j} - t_{j - 1}} (V^{(J)} (t_{j}) - V^{(J)} (t_{j - 1})), t \in T_{s} .

A similar linear interpolation is applied to Ũ⁽^J⁾(t). Bouleau and Lepingle (1992) showed that under some regularity conditions, with constant C, the L^p-norm of the discretization error is bounded and given by:

{‖ sup_{t \in T_{s}} | V (t) - {\tilde{V}}^{(J)} (t) | ‖}_{p} \leq C {(\frac{1 + log J}{J})}^{1 / 2} .

This indicates that if J is sufficiently large, which can be achieved when the maximum of δ_j is sufficiently small for fixed interval [t₁, t_J], then Ṽ⁽^J⁾(t) will be close to its continuous counterpart V(t) with arbitrary precision.

In the rest of this paper, we assume the δ_j is sufficiently small and the approximation is well achieved. To simplify notation, we replace Ṽ⁽^J⁾(t) with V(t) and Ũ⁽^J⁾(t) with U(t), for t ∈ Inline graphic _s. Under these assumptions, the exact transition density, if it exists, is well approximated by the approximate transition density, as shown in the SVM-OU. Note that equations (10) and (11) imply the approximate transition densities of U(t) and V(t) are Gaussian for t ∈ _o, because they are linear combinations of η_j, U(t₀) and V(t₀), which are all Gaussian random variables. Under the Euler approximation, [U_o, V_o ∣ θ₀, ϕ_s] degenerates to ≺ V_o ∣ V(t₀), ϕ_s ≻ because of equation (10), where ≺ · ∣ · ≻ denotes the approximate conditional density. Consequently, the likelihood based on the approximated processes U(t) and V(t) for t ∈ Inline graphic _o is given by

≺ y_{o} ∣ ϕ_{o}, ϕ_{s}, V (t_{0}) ≻ = \int [y_{o} ∣ V_{o}, ϕ_{o}] ≺ V_{o} ∣ V (t_{0}), ϕ_{s} ≻ d V_{o},

where

\begin{matrix} [y_{o} ∣ V_{o}, ϕ_{o}] & = \prod_{j = 1}^{J} ϕ (y (t_{j}) ∣ U (t_{j}), σ_{ɛ}^{2}), \\ ≺ V_{o} ∣ V (t_{0}), ϕ_{s} ≻ & = \prod_{j = 1}^{J} ≺ V (t_{j}) ∣ V (t_{j - 1}), ϕ_{s} ≻, \end{matrix}

and

V (t_{j}) ∣ V (t_{j - 1}), ϕ_{s} \sim N (V (t_{j - 1}) + a {V (t_{j - 1}), ϕ_{s}} δ_{j}, b^{2} {V (t_{j - 1}), ϕ_{s}} δ_{j}) .

3.2 Data augmentation

If observational time intervals are not short enough, the Euler approximation will not work well, because linear interpolation of V(t) and U(t) for t ∈ Inline graphic _o is not accurate enough. A solution to reduce the approximation error is simply to add sufficiently dense virtual data in each time interval and consider the latent states at these times in addition to those at t ∈ _o. The corresponding values of Y(·) at added times can be regarded as missing data. They will be sampled as part of the MCMC scheme in the Bayesian analysis.

To carry out data augmentation, we add M_j equally spaced data at times t_j_−1,1, …, t_j_−1,_{M_j} over a time interval (t_j₋₁, t_j]. Denote $δ_{M_{j}} : = \frac{δ_{j}}{M_{j} + 1}$ . The resulting augmented index set is Inline graphic _ao = {t_j,m : j = 0, 1, …, J, m = 0, 1, 2, …, M_j, M_J = 0}. Note that _ao = _o, if M_j = 0 for all j. The observed data and the augmented data are denoted by y_o ≔ [y(t_1,0), y(t_2,0), …, y(t_J_,0)]^T and $y_{a} : = {[y_{a, 0}^{T}, y_{a, 1}^{T}, \dots, y_{a, J - 1}^{T}]}^{T}$ , respectively, where y_a,j ≔ [y(t_j_,1), y(t_j_,2), …, y(t_{j,M_j})]^T. We also denote $V : = {[V_{1}^{T}, V_{2}^{T}, \dots, V_{J}^{T}]}^{T}$ where V_j ≔ [V(t_j_,0), V(t_j_,1), …, V(t_{j,M_j})]^T. Similar notation is applied to U and U_j. For ease of exposition, we let y_j,m ≔ y(t_j,m), and similarly for other variables.

If the exact transition densities exist, the augmented likelihood is

[y_{o} ∣ ϕ_{o}, ϕ_{s}, θ_{0}] = ∭ [y_{o}, y_{a} ∣ U, V, ϕ_{o}] [U, V ∣ θ_{0}, ϕ_{s}] d y_{a} d U d V,

where

\begin{matrix} [y_{o}, y_{a} ∣ U, V, ϕ_{o}] & = \prod_{j = 0}^{J} \prod_{m = 0}^{M_{j}} ϕ (y_{j, m} ∣ U_{j, m}, σ_{ɛ}^{2}), \\ [U, V {∣ θ}_{0}, ϕ_{s}] & = \prod_{j = 1}^{J} \prod_{m = 1}^{M_{j} + 1} [U_{j - 1, m}, V_{j - 1, m} ∣ U_{j - 1, m - 1}, V_{j - 1, m - 1}, ϕ_{s}] . \end{matrix}

If the exact transition densities do not exist, the discretized versions of the ODE and the SDE are modified from t ∈ Inline graphic _o to t ∈ _ao and given as follows:

\begin{matrix} U_{j - 1, m} & = U_{j - 1, m - 1} + V_{j - 1, m - 1} δ_{M_{j}}, \\ V_{j - 1, m} & = V_{j - 1, m - 1} + a {V_{j - 1, m - 1}, ϕ_{s}} δ_{M_{j}} + b {V_{j - 1, m - 1}, ϕ_{s}} η_{j - 1, m}, \end{matrix}

where t_0,0 ≔ t₀, t_j_−1,_{M_j}₊₁ ≔ t_j_,0, and η_j_−1,_m ≔ W(t_j_−1,_m) − W(t_j_−1,_m₋₁) ∼ Inline graphic (0, δ_{M_j}). The approximate transition density and the corresponding likelihood are given in Section 3.3.

3.3 Bayesian inference

MCMC enables us to draw samples from the joint posterior [θ₀, V, ϕ_o, ϕ_s ∣ y_o] or [θ₀, V, ϕ_o, ϕ_s, y_a ∣ y_o]. For the later case, we will augment M_j equally spaced data points between time interval (t_j₋₁, t_j]. To assess whether M_j is sufficiently large we suggest a sensitivity analysis in which M_j is increased until the parameter estimates are stable. An illustration of this is given in Section 4. We may also compare the trace plots of parameter estimates and DIC values for different M_j values to evaluate the numerical performance of MCMC and goodness-of-fit respectively.

MCMC draws samples from [θ₀, V, ϕ_o, ϕ_s, y_a ∣ y_o] by iteratively simulating from each full conditional density of θ₀, V, ϕ_o, ϕ_s, and y_a. The joint posterior density is proportional to the product of the likelihood and prior densities:

[θ_{0}, V, ϕ_{o}, ϕ_{s}, y_{a} ∣ y_{o}] \propto [y_{o} ∣ θ_{0}, V, ϕ_{o}, ϕ_{s}] [y_{a} ∣ θ_{0}, V, ϕ_{o}, ϕ_{s}] ≺ V ∣ θ_{0}, ϕ_{s} ≻ \times [θ_{0}] [ϕ_{s}] [ϕ_{o}],

where

\begin{matrix} [y_{o} ∣ θ_{0}, V, ϕ_{o}, ϕ_{s}] = & \prod_{j = 1}^{J} ϕ (y_{j, 0} ∣ U_{j, 0} (θ_{0}, V), ϕ_{o}), \\ [y_{a} ∣ θ_{0}, V, ϕ_{o}, ϕ_{s}] = & \prod_{j = 0}^{J - 1} \prod_{m = 1}^{M_{j}} ϕ (y_{j, m} ∣ U_{j, m} (θ_{0}, V), ϕ_{o}), \\ ≺ V ∣ θ_{0}, ϕ_{s} ≻ = & \prod_{j = 1}^{J} \prod_{m = 1}^{M_{j} + 1} ≺ V_{j - 1, m} ∣ V_{j - 1, m - 1}, ϕ_{s} ≻, \end{matrix}

and V_j_−1,_{M_j}₊₁ = V_j_,0. The approximate transition density ≺ V_j_−1,_m ∣ V_j_−1,_m₋₁, ϕ_s ≻ with augmented data is given by,

\begin{matrix} ≺ V_{j - 1, m} ∣ V_{j - 1, m - 1}, ϕ_{s} ≻ : = \\ ϕ (V_{j - 1, m} ∣ V_{j - 1, m - 1} + a {V_{j - 1, m - 1}, ϕ_{s}} δ_{M_{j}}, b^{2} {V_{j - 1, m - 1}, ϕ_{s}} δ_{M_{j}}), \end{matrix}

and [θ₀], [ϕ_s], [ϕ_o] are non-informative prior densities. See Web Appendix B for specification of the prior distr ibutions and details of MCMC algorithm. We use the simulation smoother(Durbin and Koopman, 2002) to achieve an efficient MCMC algorithm. In the simulation smoother, the latent states are recursively backward sampled in blocks instead of one state at a time. This leads to low autocorrelation between successive draws, and hence faster convergence.

3.4 Posterior forecasting with SVM

A desirable property of this approach is the ease of deriving forecasts of states at future times. To forecast the k-step future latent state $θ_{J + k}^{f}$ given the observations y_o, we simulate $θ_{J + k}^{f}$ from the following posterior forecasting distribution,

[θ_{J + k}^{f} ∣ y_{o}] = ∭ [θ_{J + k}^{f} ∣ y_{a}, y_{o}, ϕ_{s}, ϕ_{o}] [y_{a}, ϕ_{s}, ϕ_{o} ∣ y_{o}] d y_{a} d ϕ_{s} d ϕ_{o},

where y_a, ϕ_s and ϕ_o are drawn from [y_a, ϕ_s, ϕ_o ∣ y_o] by the MCMC algorithm. Given y_a, ϕ_s and ϕ_o, we first discretize the SVM. For SVM-OU, this will lead to equations (B.3) and (B.4) in Web Appendix B. Let θ_J denote the latent state of the last observation. Then, Inline graphic (θ_J) = a_J and ar(θ_J) = R_J are obtained via the Kalman filter. Moreover, it follows from (B.4) that the mean and variance of $θ_{J + k}^{f}$ can be recursively obtained as follows:

\begin{matrix} a_{J + k} & = G_{J + k - 1} a_{J + k - 1} \\ R_{J + k} & = G_{J + k - 1} R_{J + k - 1} G_{J + k - 1}^{T} + \sum_{ω_{J + k - 1}}, k = 1, 2, \dots, \end{matrix}

where G_J₊_k₋₁ and Σ_{ω_J+k−1} are specified in Web Appendix B for the SVM-OU and SVM-W, respectively. Finally, we draw $θ_{J + k}^{f}$ from $θ_{J + k}^{f} ∣ y_{a}, y_{o}, ϕ_{s}, ϕ_{o} \sim N (a_{J + k}, R_{J + k})$ . By this way, the forecasts at future times take the variation of parameter draws into consideration.

4. Simulations

Using Euler approximation enables us to make inference for the SVMs with the analytically intractable exact transition densities. Although such flexibility seems to induce approximation errors, in practice these errors would be alleviated by applying the data augmentation. We are interested in assessing the performance of the estimation of the parameters, U(t) and V(t) as the number of data augmentation changes, so we simulate 100 replicate datasets from the SVM-OU with exact transition density (7). The parameters are chosen to be close to the ones estimated from a real dataset analyzed in the following section. Each dataset includes 40 observations, equally spaced with interval length 0.5. We fit each dataset by the SVM-OU, under the following three data augmentation schemes: (1) No augmentation; (2) one data point added in the middle of every two adjacent observations; and (3) three evenly spaced data points are inserted between every two adjacent observations. In all cases, MCMC is run for 45,000 iterations, in which the first 35,000 runs are discarded as the burn-in and every 10th draws are saved. Table 1 presents the Bias E(ϕ̃_0.5 − ϕ), and mean squared error(MSE) E(ϕ̃_0.5 − ϕ)² of the posterior median ϕ̃_0.5 for each parameter ϕ. These results indicate that the strategy of data augmentation, even for the single data point augmentation, would reduce estimation bias rate for variance parameter $σ_{ξ}^{2}$ and drift parameter ρ. The estimations of other parameters $σ_{ɛ}^{2}$ and ν̄ are little affected by the data augmentation. As shown in Web Table 2, the data augmentation improves the estimation of U(t) and V(t) as well, indicated by the smaller average absolute Bias $\frac{1}{40} \sum_{j = 1}^{40} E (| \hat{θ} (t_{j}) - θ (t_{j}) |)$ and average MSE $\frac{1}{40} \sum_{j = 1}^{40} E {(\hat{θ} (t_{j}) - θ (t_{j}))}^{2}$ of posterior means. We observed that the computation time of the proposed MCMC algorithm is proportional to the size of observed and augmented data, and thus the data augmentation would significantly increase the computation burden, which discourages us to add a massive number of augmented data. Those simulation results suggest that in practice we may add few augmented data and achieve accurate estimation.

Table 1. Simulation results for the estimation of SVM-OU parameters and stable rates. For data augmentation, 0, 1 and 3 data points are added between every two adjacent observations.

No. Data points Augmented

Parameter

Truth

Bias

MSE

σ_{ɛ}^{2}

0.01

-6.723E-04

1.590E-05

σ_{ξ}^{2}

0.2

-9.837E-02

1.322E-02

-1.863E-01

5.239E-02

V̄

0.3

-7.612E-03

1.297E-02

σ_{ɛ}^{2}

0.01

1.120E-04

1.521E-05

σ_{ξ}^{2}

0.2

-6.089E-02

1.130E-02

-6.426E-02

3.793E-02

V̄

0.3

-9.181E-03

1.304E-02

σ_{ɛ}^{2}

0.01

4.489E-04

1.611E-05

σ_{ξ}^{2}

0.2

-3.905E-02

1.219E-02

8.856E-03

4.687E-02

V̄

0.3

-9.955E-03

1.284E-02

Open in a new tab

To compare, we analyze these simulated datasets by the SVM-W, and an ODE model, respectively. Note that the posterior mean in the SVM-W is equivalent to the smooth natural cubic spline. The ODE model used in the analysis satisfies the observation equation (1) for functional response, and the mean function U(·) is governed by the following two ODEs:

\begin{matrix} d U (t) & = V (t) d t, \\ d V (t) & = - β_{3} (V (t) - β_{1}) d t, \end{matrix}

(12)

with boundary conditions U(t₀) = β₀ + β₂ and V(t₀) = β₁ − β₂β₃ at t₀ = 0. This formulation leads to a deterministic mean function U(t) = β₀ + β₁t + β₂ exp(−β₃t), which implies that ODE model is essentially a parametric nonlinear regression model. As shown in Table 2, the proposed SVM-OU model is superior to the ones from the other two models in the estimation of U(t) and V(t), when the SVM-OU model generates the data. We are also interested in exploring how the proposed methods perform when data are generated form other model different from the SVM-OU. For this, we further simulate another 100 replicate datasets from above the ODE model with β₀ ∼ Inline graphic (−2.5, 0.01), β₁ ∼ (0.3, 0.01), β₂ ∼ (9, 0.04), β₃ ∼ (1, 0.01) and σ² = 0.04. The processes U(t) and V(t) are then estimated by the SVM-OU, SVM-W and ODE model, respectively. As shown Table 2 when data is generated from the ODE model, estimated U(t) and V(t) by the SVM-OU are close to those by the ODE model, and better than the SVM-W in terms of smaller average absolute bias and average MSE.

Table 2. Simulation results for the estimation of U(t) and V(t), when data simulated from SVM-OU and ODE model. For data augmentation, 0, 1 and 3 data points are added between every two adjacent observations.

Simulation Model	Fitting Model	No. data points augmented	States	Bias	MSE
SVM-OU	SVM-OU	0	U(t)	0.008	0.005
	SVM-OU	0	V(t)	0.079	0.085
	SVM-OU	1	U(t)	0.008	0.005
	SVM-OU	1	V(t)	0.046	0.048
	SVM-OU	3	U(t)	0.008	0.005
	SVM-OU	3	V(t)	0.028	0.038
	SVM-WN	0	U(t)	0.014	0.006
	SVM-WN	0	V(t)	0.052	0.079
	ODE	0	U(t)	0.051	0.099
	ODE	0	V(t)	0.054	0.133

ODE	SVM-OU	3	U(t)	0.043	0.026
	SVM-OU	3	V(t)	0.045	0.090
	SVM-WN	0	U(t)	0.066	0.067
	SVM-WN	0	V(t)	0.135	0.303
	ODE	0	U(t)	0.030	0.018
	ODE	0	V(t)	0.020	0.018

Open in a new tab

In addtion to the above equally spaced data, we are also consider some other scenarios of datasets simulated from the SVM-OU, including: 1), the unequally spaced data resulted from uniform deletion of 15 data points from the original equally spaced data; 2), unbalanced and unequally spaced data resulted from uniform random deletion of 10 data points in the first half period (0,10] and of 5 data points in the second period (10,20] from the original equally spaced data simulated by the SVM-OU; 3), The data similar to the scenario 2) except with the reserved allocation of the unbalanced data points; 4) and 5), The 5-fold larger variance $σ_{ɛ}^{2} = 0.05$ and 10 fold larger variance $σ_{ɛ}^{2} = 0.1$ . Web Table 1 lists the summarized results of these five scenarios, regarding the estimation of U(t) and V(t) by the SVM-OU. The results from the first three scenarios indicate that the unbalance data distribution between the first and second half period times has different impact on the estimation of U(t) and V(t) in the average absolute bias and MSE. The second case with less data available in the first half time period is the worst. In the last two scenarios with increased measurement errors, the simulation results suggest that the estimation for U(t) and V(t) becomes a harder task.

5. Application

We now demonstrate an application where the diffusion models are used to investigate dynamic features of the PSA profile for a prostate cancer patient. We fit the SVM and SAM with the Wiener process and the OU process V(t), respectively. We also forecast the future profile of PSA for both models. The models are evaluated by the DIC model selection criterion(Speigelhalter et al., 2003). DIC = D̄ + P_D, where D̄ is posterior mean of the deviance and P_D is the effective number of parameters. DIC has been shown asymptotically to be a generalization of Akaikes information criterion(AIC). Similar to AIC, DIC trades off the model fitting by the model complex and can be easily computed from the MCMC output. The smaller the DIC value indicates better model-fitting. For each application, the posterior draws are from a 400, 000 iteration chain with 200, 000 burn-in, and every 100th draw is selected. Convergence was assessed by examination of trace plots and autocorrelation plots.

5.1 Prostate specific antigen

PSA is a biomarker used to monitor recurrence of prostate cancer after treatment with radiation therapy. When PSA remains low and its rate varying around zero with low volatility, the tumor is stable and the patient may be cured. If PSA increases dramatically with high rate, it is a strong sign of the tumor re-growing and that the treatment did not cure the patient. Therefore, PSA has strong prognostic significant and is important for making clinical decisions. We want to estimate dynamics of the PSA marker, including PSA level, rate and the volatility of rate. We analyzed the PSA profile of one patient using the SVM and SAM model to estimate PSA(t) nonparametrically. For these data illustrated in the introduction, the average time interval between two observations was 0.4 years with minimum 0.016 and maximum 0.731 years. We added 32 virtual data points to reduce the time span between any pairs of consecutive time points to less than 0.25 year.

Table 3 and Web Table 2 shows the means and quantiles of the SVM and SAM parameters from the Wiener and the OU process V(t), respectively. Figure 2 and Web Figure 1 shows the posterior means and the corresponding 95% credible intervals of the latent states for SVM and SAMs. Here, the four models demonstrate similar trends of the PSA level. However, the rates in the SVMs fluctuates with higher volatility, compare to the SAMs. In addition, there are the non-zero instantaneous mean terms in the SVM-OU and SAM-OUs, whose rates evolve more stably than those in the models with Wiener process. The SVM-OU gives the smallest DIC, which indicates the best model fitting. In this model, the posterior mean of ν̄ is 0.385 with 95% credible interval [0.143, 0.626]. This stable and clearly positive rate after year 2.2 is a strong indicator of prostate cancer recurrence.

Table 3. PSA data: Posterior mean and quantiles for the SVMs.

Wiener Process

OU Process

D̄ = −45.1757, P_D = 12.303, DIC = −32.873

D̄ = −45.935, P_D = 10.658, DIC = −35.277

Mean

2.5%

50%

97.5%

Mean

2.5%

50%

97.5%

σ_{ɛ}^{2}

0.014

0.009

0.003

0.012

0.036

0.012

0.005

0.012

0.024

σ_{ξ}^{2}

0.961

0.589

0.297

0.809

2.548

0.177

0.181

0.037

0.122

0.682

ν̄

0.385

0.124

0.143

0.382

0.626

1.150

0.271

0.756

1.106

1.798

Open in a new tab

PSA posterior summary and forecasting: Plots of posterior means(—) and 95% credible intervals(gray shades) for the SVM with the Wiener process and OU process, respectively, till year 11.2. The future rates and levels are forecasted for the next 3 years, illustrated by the forecasting means (—) and 95% forecasting credible intervals(gray shades). The 5 randomly picked realizations for each plot are also illustrated.

Figure 2 illustrates the forecasting of the PSA latent states for the next 3 years, starting from year 11.2, by SVM-W and SVM-OU. The future states are sampled every 0.25 years and then linearly interpolated, from the posterior forecasting distribution given in Section 3.4. The SVM-OU gives a forecast with narrower credible intervals than the SVM-W. This result seems clinically more sensible, because several studies, including ours presented in Section 4.2, have found that the rate of PSA follows a stationary process. In contrast, a Wiener process corresponds to a nonstationary process for the rate of change of PSA, resulting in an unbounded variance of the forecast over time. This lacks relevant clinical interpretation. The comparison in the forecasts indicates that specification of the latent process is crucial for adequate forecasting, even though their estimates of the mean function are quite similar. A similar phenomenon has been reported by Taylor and Law (1998) in the linear mixed model of CD4 counts, where the covariance structure matters for individual level predictions, although it affects little the estimation of fixed effects.

6. Discussion

Diffusion type models are widely applied in areas such as finance, physics and ecology. However, other than through the connection with the smoothing spline, they have not played a major role in functional data analysis or nonparametric regression. In this paper we develop a framework that sheds light on more general diffusion models to be used in functional data analysis. Unlike in some applications where the form of the diffusion model is determined by the context, we specify a general form based on an ODE, a SDE and measurement error. The key advantage of the proposed diffusion model is that is addresses not only the mean function nonparametrically but also its dynamics, which are also of great interest in many application. Based on this model we adapt and develop existing ideas for estimation and inference for diffusion models. An additional attractive feature of this stochastic model approach to functional data analysis is that forecasting can be easily implemented.

As noted above, the SVM with the Wiener process corresponds to the smoothing spline with m = 2. If no augmented data are involved, the model can be rewritten as a linear mixed model for both situations of exact and approximate transition densities. As shown in the Web Appendix D, when data are equally spaced, the latter case is identical to a linear spline model with truncated line function basis (Ruppert et al., 2003). In this sense, the linear spline model can be regarded as a numerical approximation of the smoothing spline. In addition, with no data augmentation, one can easily fit the SVM with the Wiener process using existing software for the linear mixed model.

A number of extensions of the SVM and SAM are possible. Generalizing SVM and SAM to analyze discrete-valued outcomes is of interest. For the SVM, we have an explicit expression for the observation equation given by:

\begin{matrix} Y (t_{j}) & = U (t_{j}) + ɛ (t_{j}) \\ = \begin{matrix} {[\begin{matrix} 1 \\ 0 \end{matrix}]}^{T} & [\begin{matrix} U (t_{j}) \\ V (t_{j}) \end{matrix}] + ɛ (t_{j}) \end{matrix} \\ = F^{T} θ (t_{j}) + ɛ (t_{j}), j = 1, 2, \dots, J . \end{matrix}

The observation equation can be expressed as,

d Φ {Y (t) ∣ F^{T} θ (t), σ_{ɛ}}, t \in T_{o},

(13)

where Φ(· ∣ U_G, σ_G) is the normal CDF with mean U_G and standard deviation σ_G. Then, (13) can be extended to,

d F {Y (t) ∣ θ (t), ϕ_{o}},

where one specifies the corresponding observation distribution Inline graphic in the exponential family, with state equations (2) and (3) unchanged.

Supplementary Material

Supplementary Data

NIHMS276586-supplement-Supplementary_Data.pdf^{(109.2KB, pdf)}

Acknowledgments

This research was partially supported by National Cancer Institute grants CA110518 and CA69568. The authors are thankful to Dr. Naisysin Wang and Dr. Brisa Sanchez for the helpful discussions, and the editor, associate editor and two anonymous reviewers for valuable comments and suggestions.

Footnotes

Supplementary Materials: Web Appendices, Tables, and Figures referenced in Sections 2, 3, 4, 5 and 6 are available under the Paper Information link at the Biometrics website http://www.biometrics.tibs.org.

References

Aït-Sahalia Y. Maximum likelihood estimation of discretely sampled diffusions: a closed-form approximation approach. Econometrica. 2002;70:223–262. [Google Scholar]
Beskos A, Papaspiliopoulos O, Roberts G, Fearnhead P. Exact and computationally efficient likelihood-based estimation for discretely observed diffusion processes (with discussion) Journal of the Royal Statistical Society B. 2006;68:333–382. [Google Scholar]
Bouleau N, Lepingle D. Numerical Methods for Stochastic Process. New York: Wiley; 1992. [Google Scholar]
Cumberland W, Sykes Z. Weak convergence of an autoregressive process used in modeling population growth. Journal of Applied Probability. 1982;19:450–455. [Google Scholar]
Durbin J, Koopman SJ. A simple and efficient simulation smoother for state space time series analysis. Biometrika. 2002;89:603–616. [Google Scholar]
Durham G, Gallant A. Numerical techniques for maximum likelihood estimation of continuous-time diffusion processes. Journal of Business & Economic Statistics. 2002;20:297–338. [Google Scholar]
Elerian O, Chib S, Shephard N. Likelihood inference for discretely observed non-linear diffusions. Econometrica. 2001;69:959–993. [Google Scholar]
Eraker B. MCMC Analysis of Diffusion Models With Application to Finance. Journal of Business & Economic Statistics. 2001;19:177–191. [Google Scholar]
Feller W. An Introduction to Probability Theory and Its Application. New York: Springer; 1970. [Google Scholar]
Gelfand AE, Smith AFM. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association. 1990;85:398–409. [Google Scholar]
Geman S, Geman D. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. Readings in Computer Vision: Issues, Problems, Principles, and Paradigms pages. 1987:564–584. doi: 10.1109/tpami.1984.4767596. [DOI] [PubMed] [Google Scholar]
Gilks W, Richardson S, Spiegelhalter D. Markov Chain Monte Carlo in Practice. New York: Chapman & Hall/CRC; 1996. [Google Scholar]
Green P, Silverman B. Nonparametric Regression and Generalized Linear Models. New York: Chapman & Hall; 1994. [Google Scholar]
Grimmett G, Stirzaker D. Probability and Random Processes. Oxford: Oxford University press; 2001. [Google Scholar]
Ionides E, Breto C, King A. Inference for nonlinear dynamical systems. Proceedings of the National Academy of Sciences USA. 2006;103:18438. doi: 10.1073/pnas.0603181103. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jazwinski A. Stochastic Processes and Filtering Theory. New York: Academic Press; 1970. [Google Scholar]
Kimeldorf GS, Wahba G. A correspondence between bayesian estimation on stochastic processes and smoothing by splines. Annals of Mathematical Statistics. 1970;41:495–502. [Google Scholar]
Kloeden P, Platen E. Numerical Solution of Stochastic Differential Equations. New York: Springer; 1992. [Google Scholar]
Liang H, Wu H. Parameter estimation for differential equation models using a framework of measurement error in regression models. Journal of the American Statistical Association. 2008;103:1570–1583. doi: 10.1198/016214508000000797. [DOI] [PMC free article] [PubMed] [Google Scholar]
Muller P, Quintana F. Nonparametric Bayesian Data Analysis. Statistical Science. 2004;19:95–110. [Google Scholar]
Pedersen AR. A new approach to maximum likelihood estimation for stochastic differential equations based on discrete observations. Scandinavian Journal of Statistics. 1995;22:55–71. [Google Scholar]
Proust-Lima C, Taylor JMG, Williams S, Ankerst D, Liu N, Kestin L, Bae K, Sandler H. Determinants of change of prostate-specific antigen over time and its association with recurrence following external beam radiation therapy of prostate cancer in 5 large cohorts. International journal of radiation oncology, biology, physics. 2008;72:782. doi: 10.1016/j.ijrobp.2008.01.056. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ramsay J, Hooker G, Campbell D, Cao J. Parameter estimation for differential equations: a generalized smoothing approach. Journal of the Royal Statistical Society B. 2007;69:741–796. [Google Scholar]
Ramsay J, Silverman B. Functional Data Analysis. New York: Springer; 2005. [Google Scholar]
Rasmussen C, Williams C. Gaussian Processes for Machine Learning. New York: Springer; 2006. [Google Scholar]
Roberts GO, Stramer O. On inference for partially observed nonlinear diffusion models using the Metropolis-Hastings algorithm. Biometrika. 2001;88:603–621. [Google Scholar]
Ruppert D, Wand M, Carroll R. Semiparametric Regression. Cambridge: Cambridge University Press; 2003. [Google Scholar]
Sorensen H. Parametric inference for diffusion processes observed at discrete points in time: a survey. International Statistical Review. 2004;72:337–354. [Google Scholar]
Speigelhalter D, Best N, Carlin B, van der Linde A. Bayesian measures of model complexity and fit (with discussion) Journal of the Royal Statistical Society B. 2003;64:583–616. [Google Scholar]
Tanner M, Wong W. The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association. 1987;82:528–540. [Google Scholar]
Taylor JMG, Law N. Does the covariance structure matter in longitudinal modelling for the prediction of future CD4 counts? Statistics in Medicine. 1998;17:2381–2394. doi: 10.1002/(sici)1097-0258(19981030)17:20<2381::aid-sim926>3.0.co;2-s. [DOI] [PubMed] [Google Scholar]
Uhlenbeck G, Ornstein L. On the Theory of the Brownian Motion. Physical Review. 1930;36:823–841. [Google Scholar]
Wahba G. Improper priors, spline smoothing and the problem of guarding against model errors in regression. Journal of the Royal Statistical Society B. 1978;40:364–372. [Google Scholar]
Wahba G. Spline Models for Observational Data. Society for Industrial Mathematics; 1990. [Google Scholar]
Weinert H, Byrd H, Sidhu G. A stochastic framework for recursive computation of spline functions: Part ii, smoothing splines. J Optimization Theory and Applications. 1980;01:255–268. [Google Scholar]
Zhu B. PhD thesis. University of Michigan; Ann Arbor, MI: 2010. Stochastic Dynamic Models for Functional Data. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Data

NIHMS276586-supplement-Supplementary_Data.pdf^{(109.2KB, pdf)}

[R1] Aït-Sahalia Y. Maximum likelihood estimation of discretely sampled diffusions: a closed-form approximation approach. Econometrica. 2002;70:223–262. [Google Scholar]

[R2] Beskos A, Papaspiliopoulos O, Roberts G, Fearnhead P. Exact and computationally efficient likelihood-based estimation for discretely observed diffusion processes (with discussion) Journal of the Royal Statistical Society B. 2006;68:333–382. [Google Scholar]

[R3] Bouleau N, Lepingle D. Numerical Methods for Stochastic Process. New York: Wiley; 1992. [Google Scholar]

[R4] Cumberland W, Sykes Z. Weak convergence of an autoregressive process used in modeling population growth. Journal of Applied Probability. 1982;19:450–455. [Google Scholar]

[R5] Durbin J, Koopman SJ. A simple and efficient simulation smoother for state space time series analysis. Biometrika. 2002;89:603–616. [Google Scholar]

[R6] Durham G, Gallant A. Numerical techniques for maximum likelihood estimation of continuous-time diffusion processes. Journal of Business & Economic Statistics. 2002;20:297–338. [Google Scholar]

[R7] Elerian O, Chib S, Shephard N. Likelihood inference for discretely observed non-linear diffusions. Econometrica. 2001;69:959–993. [Google Scholar]

[R8] Eraker B. MCMC Analysis of Diffusion Models With Application to Finance. Journal of Business & Economic Statistics. 2001;19:177–191. [Google Scholar]

[R9] Feller W. An Introduction to Probability Theory and Its Application. New York: Springer; 1970. [Google Scholar]

[R10] Gelfand AE, Smith AFM. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association. 1990;85:398–409. [Google Scholar]

[R11] Geman S, Geman D. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. Readings in Computer Vision: Issues, Problems, Principles, and Paradigms pages. 1987:564–584. doi: 10.1109/tpami.1984.4767596. [DOI] [PubMed] [Google Scholar]

[R12] Gilks W, Richardson S, Spiegelhalter D. Markov Chain Monte Carlo in Practice. New York: Chapman & Hall/CRC; 1996. [Google Scholar]

[R13] Green P, Silverman B. Nonparametric Regression and Generalized Linear Models. New York: Chapman & Hall; 1994. [Google Scholar]

[R14] Grimmett G, Stirzaker D. Probability and Random Processes. Oxford: Oxford University press; 2001. [Google Scholar]

[R15] Ionides E, Breto C, King A. Inference for nonlinear dynamical systems. Proceedings of the National Academy of Sciences USA. 2006;103:18438. doi: 10.1073/pnas.0603181103. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R16] Jazwinski A. Stochastic Processes and Filtering Theory. New York: Academic Press; 1970. [Google Scholar]

[R17] Kimeldorf GS, Wahba G. A correspondence between bayesian estimation on stochastic processes and smoothing by splines. Annals of Mathematical Statistics. 1970;41:495–502. [Google Scholar]

[R18] Kloeden P, Platen E. Numerical Solution of Stochastic Differential Equations. New York: Springer; 1992. [Google Scholar]

[R19] Liang H, Wu H. Parameter estimation for differential equation models using a framework of measurement error in regression models. Journal of the American Statistical Association. 2008;103:1570–1583. doi: 10.1198/016214508000000797. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R20] Muller P, Quintana F. Nonparametric Bayesian Data Analysis. Statistical Science. 2004;19:95–110. [Google Scholar]

[R21] Pedersen AR. A new approach to maximum likelihood estimation for stochastic differential equations based on discrete observations. Scandinavian Journal of Statistics. 1995;22:55–71. [Google Scholar]

[R22] Proust-Lima C, Taylor JMG, Williams S, Ankerst D, Liu N, Kestin L, Bae K, Sandler H. Determinants of change of prostate-specific antigen over time and its association with recurrence following external beam radiation therapy of prostate cancer in 5 large cohorts. International journal of radiation oncology, biology, physics. 2008;72:782. doi: 10.1016/j.ijrobp.2008.01.056. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R23] Ramsay J, Hooker G, Campbell D, Cao J. Parameter estimation for differential equations: a generalized smoothing approach. Journal of the Royal Statistical Society B. 2007;69:741–796. [Google Scholar]

[R24] Ramsay J, Silverman B. Functional Data Analysis. New York: Springer; 2005. [Google Scholar]

[R25] Rasmussen C, Williams C. Gaussian Processes for Machine Learning. New York: Springer; 2006. [Google Scholar]

[R26] Roberts GO, Stramer O. On inference for partially observed nonlinear diffusion models using the Metropolis-Hastings algorithm. Biometrika. 2001;88:603–621. [Google Scholar]

[R27] Ruppert D, Wand M, Carroll R. Semiparametric Regression. Cambridge: Cambridge University Press; 2003. [Google Scholar]

[R28] Sorensen H. Parametric inference for diffusion processes observed at discrete points in time: a survey. International Statistical Review. 2004;72:337–354. [Google Scholar]

[R29] Speigelhalter D, Best N, Carlin B, van der Linde A. Bayesian measures of model complexity and fit (with discussion) Journal of the Royal Statistical Society B. 2003;64:583–616. [Google Scholar]

[R30] Tanner M, Wong W. The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association. 1987;82:528–540. [Google Scholar]

[R31] Taylor JMG, Law N. Does the covariance structure matter in longitudinal modelling for the prediction of future CD4 counts? Statistics in Medicine. 1998;17:2381–2394. doi: 10.1002/(sici)1097-0258(19981030)17:20<2381::aid-sim926>3.0.co;2-s. [DOI] [PubMed] [Google Scholar]

[R32] Uhlenbeck G, Ornstein L. On the Theory of the Brownian Motion. Physical Review. 1930;36:823–841. [Google Scholar]

[R33] Wahba G. Improper priors, spline smoothing and the problem of guarding against model errors in regression. Journal of the Royal Statistical Society B. 1978;40:364–372. [Google Scholar]

[R34] Wahba G. Spline Models for Observational Data. Society for Industrial Mathematics; 1990. [Google Scholar]

[R35] Weinert H, Byrd H, Sidhu G. A stochastic framework for recursive computation of spline functions: Part ii, smoothing splines. J Optimization Theory and Applications. 1980;01:255–268. [Google Scholar]

[R36] Zhu B. PhD thesis. University of Michigan; Ann Arbor, MI: 2010. Stochastic Dynamic Models for Functional Data. [Google Scholar]

PERMALINK

Stochastic Functional Data Analysis: A Diffusion Model-based Approach

Bin Zhu

Peter X-K Song

Jeremy MG Taylor

Summary

1. Introduction

Figure 1.

2. Stochastic Velocity Model

2.1 Model Specification

2.2 Wiener process for velocity

2.3 Ornstein-Uhlenbeck process for velocity

3. Estimation and Inference

3.1 Likelihood and Euler approximation

3.2 Data augmentation

3.3 Bayesian inference

3.4 Posterior forecasting with SVM

4. Simulations

Table 1. Simulation results for the estimation of SVM-OU parameters and stable rates. For data augmentation, 0, 1 and 3 data points are added between every two adjacent observations.

Table 2. Simulation results for the estimation of U(t) and V(t), when data simulated from SVM-OU and ODE model. For data augmentation, 0, 1 and 3 data points are added between every two adjacent observations.

5. Application

5.1 Prostate specific antigen

Table 3. PSA data: Posterior mean and quantiles for the SVMs.

Figure 2.

6. Discussion

Supplementary Material

Acknowledgments

Footnotes

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

Stochastic Functional Data Analysis: A Diffusion Model-based Approach

Bin Zhu

Peter X-K Song

Jeremy MG Taylor

Summary

1. Introduction

Figure 1.

2. Stochastic Velocity Model

2.1 Model Specification

2.2 Wiener process for velocity

2.3 Ornstein-Uhlenbeck process for velocity

3. Estimation and Inference

3.1 Likelihood and Euler approximation

3.2 Data augmentation

3.3 Bayesian inference

3.4 Posterior forecasting with SVM

4. Simulations

Table 1. Simulation results for the estimation of SVM-OU parameters and stable rates. For data augmentation, 0, 1 and 3 data points are added between every two adjacent observations.

Table 2. Simulation results for the estimation of U(t) and V(t), when data simulated from SVM-OU and ODE model. For data augmentation, 0, 1 and 3 data points are added between every two adjacent observations.

5. Application

5.1 Prostate specific antigen

Table 3. PSA data: Posterior mean and quantiles for the SVMs.

Figure 2.

6. Discussion

Supplementary Material

Acknowledgments

Footnotes

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases