Published in final edited form as: J Comput Graph Stat. 2023 Dec 15;33(2):525–537. doi: 10.1080/10618600.2023.2276114

Recurrent event analysis in the presence of real-time high frequency data via random subsampling

Walter Dempsey 1,*

Abstract

Digital monitoring studies collect real-time, high-frequency data via mobile sensors in subjects' natural environments. These data can be used to model the impact of changes in physiology on recurrent event outcomes such as smoking, drug use, alcohol use, or self-identified moments of suicidal ideation. Likelihood calculations for the recurrent event analysis, however, become computationally prohibitive in this setting. Motivated by this, a random subsampling framework is proposed for computationally efficient, approximate likelihood-based estimation. A subsampling-unbiased estimator for the derivative of the cumulative hazard enters into an approximation of the log-likelihood. The estimator has two sources of variation: the first due to the recurrent event model and the second due to subsampling. The latter can be reduced by increasing the sampling rate; however, this leads to increased computational costs. The approximate score equations are equivalent to logistic regression score equations, allowing standard, "off-the-shelf" software to be used in fitting these models. Simulations demonstrate the method and the efficiency-computation trade-off. We end by illustrating our approach using data from a digital monitoring study of suicidal ideation.

Keywords: recurrent events, probabilistic subsampling, estimating equations, high frequency time series, logistic regression

1. Introduction

Advancement in mobile technology has led to the rapid integration of mobile and wearable sensors into behavioral health (Free et al., 2013). Take HeartSteps, for example, a mobile health (mHealth) study designed to increase physical activity in sedentary adults (Klasnja et al., 2019). Here, a Jawbone sensor is used to monitor step count every minute of the participant's study day. Of interest in many mHealth studies is the relation of such real-time high frequency sensor data to an adverse, recurrent event process. In a smoking cessation mHealth study (Spring, 2019), for example, the relation between a time-varying sensor-based measure of physiological stress and smoking lapse is of scientific interest. In a suicidal ideation mHealth study (Kleiman et al., 2018), the relation of electrodermal activity (EDA) and accelerometer data to self-identified moments of suicidal ideation is of scientific interest.

The goal of this paper is to construct a simple, easy-to-implement method for parameter estimation and inference. To do so, we introduce a random subsampling procedure that has several benefits. First, the resulting inference is unbiased; however, there is a computation-efficiency trade-off: a higher sampling rate decreases estimator variance at the cost of increased computation. We show via simulations that the benefit of very high sampling rates is often negligible, as the contribution to the variation is small relative to the variation in the underlying stochastic processes. Second, the derived estimating equations are optimal, implying that any loss of statistical efficiency is due only to the subsampling procedure and not to the derived methodology. Finally, implementation can leverage existing, standard software for functional data analysis and logistic regression, enabling fast adoption by domain scientists.

1.1. Related work

The use of wearable devices to passively monitor patients has led to a rapid increase in the number of studies with high-frequency sensors, which can be conceptualized as studies whose measurements are functions. The development of such applications has been accompanied by intense methodological development in regression models with functional covariates (James, 2002; James and Silverman, 2005; Muller, 2005; Ramsay and Silverman, 2005b; Kokoszka and Reimherr, 2017; Crainiceanu et al., 2009; Reiss and Ogden, 2007).

Recently, modeling time-to-event data with functional covariates has received considerable attention. The functional linear Cox regression model (FLCRM) considers the association between a time-to-event outcome and a set of functional and scalar predictors (Kong et al., 2018). In this setting, the functional and scalar predictors are observed at baseline, and the hazard function satisfies the proportional hazards assumption, with a non-parametric baseline hazard and an exponential adjustment that includes both a functional linear model and a linear model for the baseline scalar predictors. Since the linearity assumption may be too restrictive, Cui et al. (2021) consider a more flexible additive functional Cox model for multivariate functional predictors in which the hazard depends on an unspecified, twice-differentiable bivariate function.

In our setting, the high frequency sensor process is an outcome and must therefore be jointly modeled with the recurrent event process. Recently, Li et al. (2022) proposed a functional approach to joint modeling of multivariate longitudinal biomarkers and a time-to-event outcome (Tsiatis and Davidian, 2004; Rizopoulos, 2010). Here, as in traditional joint models, the longitudinal biomarker is observed at only a few observation times (typically on the order of tens of observations) and is modeled via functional principal components analysis. The eigenscores are the shared parameters across the longitudinal and survival submodels. A similar joint model was proposed by Dong et al. (2021). Inference proceeds by Monte Carlo EM, which can require very large Monte Carlo sample sizes to ensure good performance, and naïve implementation may not preserve EM's ascent property (Caffo et al., 2005).

In this paper, we also consider joint modeling of recurrent events and a high frequency sensor process. In contrast to traditional longitudinal biomarkers, sensor processes are observed multiple times per second, leading to thousands or even millions of observations per individual. To address this increase in scale, a design-based perspective is taken in which we subsample non-event times to avoid some of the computationally demanding aspects of joint models. Subsampling in the context of functional data analysis for massive data has recently been investigated (Liu et al., 2021), where a functional L-optimality criterion is derived. Subsampling is different in our context, as our goal is to subsample from the individual sensor processes to produce a design-unbiased estimate of the cumulative hazard function. Finally, traditional joint models often let the hazard function depend on the current or cumulative value of the health process (Li et al., 2022; Rizopoulos, 2010). For high frequency sensor processes, we expect the hazard function to depend only on the recent sensor history and use recent-history functional linear models (Kim et al., 2011) to account for this local dependence. We combine these models with advances in longitudinal functional data analysis (Staicu et al., 2010) and penalized functional regression (Goldsmith et al., 2015) to perform scalable inference using off-the-shelf software.

2. Main contributions and outline

The main contributions of this paper are as follows. We define a joint model using historical functional linear models and demonstrate how techniques from design-based inference can be used to circumvent the computational challenges of joint modeling in Section 3. We then discuss longitudinal functional principal components analysis and demonstrate in Lemma 4.1 that the resulting approximate score equations are equivalent to the score functions for a logistic regression with binary response and an offset related to the subsampling rate, meaning our proposed approach can be fit using off-the-shelf software. An asymptotic analysis is presented in Section 4.4, with an accompanying discussion of the novel computational versus statistical efficiency trade-off in Section 4.5. A simulation study in Section 5 demonstrates the impact of subsampling rates, the window length parameter, and a computational comparison to existing methods. A multivariate extension and a method to handle missing data are presented in Section 6. We end by illustrating our approach using data from a digital monitoring study of suicidal ideation in Section 7.

3. Recurrent event process and associated high frequency data

Suppose $n$ subjects are independently sampled with observed event times $T_i = \{T_{i,1}, \ldots, T_{i,k_i}\}$ over an observation window $[0, \tau_i]$ for each subject $i = 1, \ldots, n$. Assume the event times are ordered, i.e., $T_{i,j} < T_{i,j'}$ for $j < j'$. The observation window length, $\tau_i$, is the censoring time and is assumed independent of the event process. Let $N_i(t)$ denote the counting process associated with $T_i$; that is, $N_i(t) = \sum_{j=1}^{k_i} 1[T_{i,j} < t]$. In this section, we assume a single-dimensional health process $x_i = \{x_i(s)\}_{0 \le s < \tau_i}$ for each participant is measured at a dense grid of time points. Accelerometer data, for example, are measured at a rate of 32Hz (i.e., 32 times per second). Electrodermal activity (EDA), on the other hand, is measured at a rate of 4Hz (i.e., 4 times per second). Given the high frequency nature of sensor data, this paper assumes the process is measured continuously. See Appendix A for a notation glossary.

Let $\mathcal{H}_{i,t}^{NX} = \mathcal{H}_{i,t}^{N} \vee \mathcal{H}_{i,t}^{X}$ be the $\sigma$-field generated by all past values $\{N_i(s), x_i(s)\}_{0 \le s \le t}$. In this paper, the instantaneous risk of an event at time $t$ is assumed to depend on the health process, time-in-study, and the event history through a fully parametric conditional hazard function:

$$h_i(t \mid \mathcal{H}_{i,t}^{NX}; \theta) = \lim_{\delta \downarrow 0} \delta^{-1}\, \mathrm{pr}\left(N_i(t+\delta) - N_i(t) > 0 \mid \mathcal{H}_{i,t}^{NX}\right), \quad (1)$$

where θ is the parameter vector. For high frequency physiological data, we assume that current risk is log-additive and depends on a linear functional of the health process over some recent window of time and pre-specified features of the counting process; that is,

$$h_i(t \mid \mathcal{H}_{i,t}^{NX}; \theta) = h_0(t; \gamma)\exp\left(g_t(\mathcal{H}_{i,t}^{N})^\top \alpha + \int_{t-\Delta}^{t} x_i(s)\,\beta(s)\,ds\right) \quad (2)$$

where $h_0(t;\gamma)$ is a parametrized baseline hazard function, $\Delta$ is an unknown window length, and $g_t(\mathcal{H}_{i,t}^{N}) \in \mathbb{R}^p$ is a $p$-length feature vector summarizing the event-history and time-in-study information. The final term $\int_{t-\Delta}^{t} x_i(s)\beta(s)\,ds$ reflects the unknown linear functional form of the impact of the time-varying covariate on current risk.

An alternative to (2) would be to construct features $f_t(\mathcal{H}_{i,t}^{X}) \in \mathbb{R}^q$ from the sensor data history and incorporate these features in place of the final term. Our approach instead builds linear features of $\mathcal{H}_{i,t}^{X}$ directly from the integrated history, avoiding the feature construction problem – a highly nontrivial issue for high frequency time-series data. The main caveat is the additional parameter $\Delta$; however, as long as the estimate $\hat\Delta$ exceeds the true $\Delta$, the resulting estimation is unbiased, albeit at a loss of efficiency. Moreover, sensitivity analysis can be performed to determine how the choice of $\hat\Delta$ affects inference. One limitation of the approach presented here is that only fully parametric hazard models may be fit to the data. However, a spline model for the log baseline hazard affords sufficient model flexibility.

3.1. Likelihood calculation

For notational simplicity, we leave the dependence of the conditional hazard function on $\mathcal{H}_{i,t}^{NX}$ implicit and write $h_i(t;\theta)$. In our setting, we assume the health process $\{x_i(s)\}_{0 \le s < \tau_i}$ is directly observed. Therefore, we can consider the event process $T_i$ conditional on the health process $X_i$, so that the log-likelihood for the event process is given by

$$L_n(\theta) = \sum_{i=1}^{n}\left(\sum_{j=1}^{k_i} \log h_i(T_{i,j};\theta) - H_i(\tau_i;\theta)\right)$$

where $H_i(\tau_i;\theta) = \int_0^{\tau_i} h_i(t;\theta)\,dt$ is the cumulative hazard function. See Appendix B for additional arguments in favor of our proposed approach. Solving the associated score equations $U_n(\theta) = 0$ yields the maximum likelihood estimator $\hat\theta$, where

$$U_n(\theta) = \sum_{i=1}^{n}\left(\sum_{j=1}^{k_i} \frac{h_i^{(1)}(T_{i,j};\theta)}{h_i(T_{i,j};\theta)} - H_i^{(1)}(\tau_i;\theta)\right),$$

where $h_i^{(1)}(T_{i,j};\theta)$ and $H_i^{(1)}(\tau_i;\theta)$ denote derivatives with respect to $\theta$.

In classical joint models (Henderson et al., 2000; Tsiatis and Davidian, 2004), time-varying covariates $x_i(t)$ are observed only intermittently at appointment times. In our setting, maximizing the likelihood is computationally prohibitive: for any $\theta$ we must compute the cumulative hazard functions $H_i(\tau_i;\theta)$, which requires integration of $h_i(t;\theta)$ given by (2), which itself depends on the integral $\int_{t-\Delta}^{t} x_i(s)\beta(s)\,ds$, a function of the unknown functional parameter $\beta(\cdot)$. That is, the risk model depends on an integrated past history of the time-varying covariate, which leads to a severe increase in computational complexity.

To circumvent these computational difficulties, we derive approximate score equations based on design-based inference for point processes. Design-based inference is common for spatial point processes (Waagepetersen, 2008), where a spatially varying covariate is observed at a random sample of locations. It is also common in mobile health, where ecological momentary assessments (Rathbun, 2012; Rathbun and Shiffman, 2016) randomly sample individuals at various time-points to assess their emotional state. In the current setting, we leverage these ideas to form a subsampling protocol that can substantially reduce computational complexity; the purpose is therefore quite different. Moreover, the dependence of the intensity function on the recent history of sensor values leads to additional complications that must be addressed.

3.2. Probabilistic subsampling framework

To solve the computational challenge, we employ a point-process subsampling design to obtain unbiased estimates of the derivative of the cumulative hazard for each subject. The subsampling procedure treats the collected sensor data as a set of potential observations. Suppose covariate information is sampled at times drawn from an independent inhomogeneous Poisson point process with known intensity $\pi_i(t)$. At a subsampled time $t$, the windowed covariate history $\{x_i(t-s)\}_{0 \le s \le \Delta}$ and the counting process features $g_t(\mathcal{H}_{i,t}^{N})$ are observed. Optimal choice of $\pi_i(t)$ is beyond the scope of this paper; however, simulation studies have suggested setting the subsampling rate proportional to the hazard function $h_i(t;\theta)$.
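To make the sampling step concrete, the following sketch draws subsample times from an inhomogeneous Poisson process via Lewis–Shedler thinning; the intensity `pi_t`, its upper bound `pi_max`, and the window `tau` are illustrative choices rather than quantities fixed by the paper:

```r
# Sketch: draw the subsample-time set D_i on [0, tau] from an inhomogeneous
# Poisson process with intensity pi_t via Lewis-Shedler thinning.
sample_subsample_times <- function(pi_t, tau, pi_max) {
  # Step 1: candidates from a homogeneous Poisson process with rate pi_max
  n_cand <- rpois(1, pi_max * tau)
  cand <- sort(runif(n_cand, 0, tau))
  # Step 2: retain each candidate t with probability pi_t(t) / pi_max
  cand[runif(n_cand) < pi_t(cand) / pi_max]
}

# Example: roughly 4 subsamples per hour, modulated by a diurnal pattern
pi_t <- function(t) 4 * (1 + 0.5 * sin(2 * pi * t / 24))
D_i  <- sample_subsample_times(pi_t, tau = 24, pi_max = 6)
```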

An estimator is design-unbiased if its expectation equals the target parameter under the probability distribution induced by the sampling design (Cassel et al., 1977). Let $D_i \subset [0,\tau_i]$ denote the random set of subsampled points. Note, by construction, this random set is distinct from the set of event times with probability one, i.e., $\mathrm{pr}(T_i \cap D_i = \emptyset) = 1$. Under subsampling via $\pi_i(t)$, one may compute a Horvitz-Thompson estimator of the derivative of the cumulative hazard, $\hat H_i^{(1)}(\tau_i;\theta) = \sum_{u \in D_i} h_i^{(1)}(u;\theta)/\pi_i(u)$. An alternative design-unbiased estimator of the derivative of the cumulative hazard is given by

$$\hat H_i^{(1)}(\tau_i;\theta) = \sum_{u \in (T_i \cup D_i)} \frac{h_i^{(1)}(u;\theta)}{\pi_i(u) + h_i(u;\theta)} \quad (3)$$

Equation (3) is the estimator suggested by Waagepetersen (2008). This estimator depends on the superposition of the event and subsampling processes. Lemma 4.7 shows the estimator for θ associated with using (3) is the most efficient within a suitable class of estimators for the derivative of the cumulative hazard function (including the Horvitz-Thompson estimator). Therefore, we restrict our attention to (3) for the remainder of this paper. Letting

$$w_i(t;\theta) = \frac{\pi_i(t)}{\pi_i(t) + h_i(t;\theta)}, \quad (4)$$

the resulting approximate estimating equations can be re-written as

$$\hat U_n(\theta) = \sum_{i=1}^{n}\left[\sum_{u \in T_i} w_i(u;\theta)\frac{h_i^{(1)}(u;\theta)}{h_i(u;\theta)} - \sum_{u \in D_i} w_i(u;\theta)\frac{h_i^{(1)}(u;\theta)}{\pi_i(u)}\right]. \quad (5)$$

Equation (5) represents the approximate score functions built via plug-in of the design-unbiased estimator of the derivative of the cumulative hazard given in (3).

4. Longitudinal functional principal components within event-history analysis

Probabilistic subsampling converts the single sensor stream $x_i$ into a sequence of functions observed repeatedly at subsampled times $D_i$ and event times $T_i$ over windows of length $\Delta$. Such a data structure is commonly referred to as longitudinal functional data (Xiao et al., 2013; Goldsmith et al., 2015). Given the large increase in longitudinal functional data in recent years, the corresponding analysis has received much recent attention (Morris et al., 2003; Morris and Carroll, 2006; Baladandayuthapani et al., 2008; Di et al., 2009; Greven et al., 2010; Staicu et al., 2010; Chen and Müller, 2012; Li and Guan, 2014). Here, we combine work by Park and Staicu (2015) and Goldsmith et al. (2011) to construct a computationally efficient penalized functional method for solving the estimating equations $\hat U_n(\theta)$.

4.1. Estimation of the windowed covariate history

We start by defining $X(t,s) = x(t-s)$ to be the sensor measurement $0 \le s \le \Delta$ time units prior to time $t \in T_i \cup D_i$. The sandwich smoother (Xiao et al., 2013) is used to estimate the mean $\mu_y(t,s) = \mathbb{E}_y[X(t,s)]$, where the expectation is indexed by whether $t$ is an event ($y=1$) or subsampled ($y=0$) time respectively. Alternative bivariate smoothers exist, such as the kernel-based local linear smoother (Hastie et al., 2009), bivariate tensor product splines (Wood, 2006), and the bivariate penalized spline smoother (Marx and Eilers, 2006). The sandwich smoother was chosen for its computational efficiency and estimation accuracy. We then define $\tilde X(t,s) = X(t,s) - \hat\mu_y(t,s)$ to be the mean-zero process at each time $t \in T_i \cup D_i$.
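As a minimal sketch of this step, assuming the sandwich smoother implementation `fbps` in the R package refund (with fitted values returned in `$Yhat`), the mean surface can be estimated from the matrix of windowed histories; `X_mat` is a hypothetical input whose rows index times $t$ in $T_i \cup D_i$ for group $y$ and whose columns index lags $s$ on a grid over $[0,\Delta]$:

```r
# Sketch: estimate mu_y(t, s) by bivariate smoothing of the windowed histories.
# Assumes refund::fbps implements the Xiao et al. (2013) sandwich smoother.
library(refund)

fit     <- fbps(X_mat)     # X_mat[t, r] = X(t, s_r) for group y
mu_hat  <- fit$Yhat        # smoothed estimate of mu_y on the (t, s) grid
X_tilde <- X_mat - mu_hat  # mean-zero windowed histories X~(t, s)
```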

As in Park and Staicu (2015), define the marginal covariance by

$$\Sigma_y(s,s') = \int_0^{\tau} c_y\{(T,s),(T,s')\}\, f_y(T)\,dT$$

for $0 \le s, s' \le \Delta$, where $c_y\{(t,s),(t,s')\}$ is the covariance function of the windowed covariate history $X(t,\cdot)$, $T$ denotes a generic observation time of the process, and $f_y(T)$ is the intensity function for event ($y=1$) and subsampled ($y=0$) times respectively. Estimation of $\Sigma_y$ occurs in two steps. For simplicity, we present the steps for subsampled times (i.e., $y=0$), but the steps are the same for event times. First, the pooled sample covariance is calculated at a set of grid points:

$$\tilde\Sigma_0(s_r, s_{r'}) = \left(\sum_{i=1}^{n} |D_i|\right)^{-1}\left(\sum_{i=1}^{n}\sum_{t \in D_i} \tilde X(t, s_r)\,\tilde X(t, s_{r'})\right).$$

As we assume the health process $x_i$ is directly observed, the diagonal elements of $\tilde\Sigma_0$ are not inflated. Second, the pooled sample covariance is smoothed using the sandwich smoother (Xiao et al., 2013). Note Park and Staicu (2015) smooth only the off-diagonal elements, while here we smooth the entire pooled sample covariance matrix. All negative eigenvalues are set to zero to ensure positive semi-definiteness. The result is the estimator $\hat\Sigma_0$ of the pooled covariance $\Sigma_0$.

Next, we take the spectral decomposition of the estimated covariance function; let $\{\hat\psi_k^{(0)}(s), \hat\lambda_k^{(0)}\}_{k \ge 1}$ be the resulting sequence of eigenfunctions and eigenvalues. The key benefit of the marginal covariance approach is that it allows us to compute a single, time-invariant basis expansion; this reduces the computational burden by avoiding the three-dimensional covariance function (i.e., a covariance that depends on $t$) and the associated spectral decomposition in methods considered by Chen and Müller (2012). Using the Karhunen-Loève decomposition, we can represent $X(t,s)$ for $t \in T_i \cup D_i$ by

$$X(t,s) = \hat\mu_y(t,s) + \sum_{k=1}^{\infty} \hat c_{i,k}^{(y)}(t)\,\hat\psi_k^{(y)}(s) \approx \hat\mu_y(t,s) + c_i^{(y)}(t)^\top \hat\psi^{(y)}(s)$$

where

$$\hat c_{i,k}^{(y)}(t) = \int_{t-\Delta}^{t} \tilde X_i(t,s)\,\hat\psi_k^{(y)}(s)\,ds, \quad c_i^{(y)}(t) = \left(c_{i,1}^{(y)}(t), \ldots, c_{i,K_x}^{(y)}(t)\right), \quad \hat\psi^{(y)}(s) = \left(\hat\psi_1^{(y)}(s), \ldots, \hat\psi_{K_x}^{(y)}(s)\right),$$

and $K_x < \infty$ is the truncation level of the infinite expansion. Following Goldsmith et al. (2011), we set $K_x$ to satisfy identifiability constraints (see Section 4.2 for details). In subsequent sections, we leave the dependence on $y$ (i.e., whether $t \in T_i$ or $D_i$) implicit for notational simplicity unless required.
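The covariance and score computations above reduce to a few matrix operations on the lag grid. The sketch below, with illustrative objects (`X_tilde` pooled over $i$ and $t \in D_i$ from the previous sketch, and lag grid `s_grid`), computes the pooled covariance, its truncated eigendecomposition, and the scores via Riemann-sum quadrature; the additional sandwich-smoothing of the covariance is noted but omitted:

```r
# Sketch: pooled covariance, eigendecomposition, and KL scores on the lag grid.
R  <- length(s_grid)
ds <- s_grid[2] - s_grid[1]                   # grid spacing for quadrature

Sigma0 <- crossprod(X_tilde) / nrow(X_tilde)  # (R x R) pooled sample covariance
# (In the paper this matrix is additionally smoothed with the sandwich smoother.)

eig <- eigen(Sigma0, symmetric = TRUE)
eig$values <- pmax(eig$values, 0)             # zero out negative eigenvalues

Kx <- 35                                      # truncation level
psi_hat <- eig$vectors[, 1:Kx] / sqrt(ds)     # eigenfunctions, scaled so that
                                              # sum(psi_k^2) * ds = 1
c_hat <- X_tilde %*% psi_hat * ds             # (n x Kx) scores c_hat_{i,k}(t)
```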

4.2. Estimation of β(s)

The next step of our method is modeling β(s). Here, we leverage ideas from the penalized spline literature (Ruppert et al., 2003; Wood, 2003).

Let $\phi(s) = (\phi_1(s), \ldots, \phi_{K_b}(s))$ be a spline basis and assume that $\beta(s) = \sum_{j=1}^{K_b} b_j \phi_j(s) = \phi(s)^\top b$ where $b = (b_1, \ldots, b_{K_b})$. Thus, the integral in (2) can be restated as

$$\int_{t-\Delta}^{t} X(t,s)\beta(s)\,ds \approx \int_{t-\Delta}^{t}\left[\hat\mu(t,s) + c(t)^\top \hat\psi(s)\right]\left[\phi(s)^\top b\right]ds = \left[M_{i,t} + c(t)^\top J_{\hat\psi,\phi}\right]b$$

where $M_t = (M_{1,t}, \ldots, M_{K_b,t})$, $M_{j,t} = \int_{t-\Delta}^{t} \hat\mu(t,s)\phi_j(s)\,ds$, and $J_{\hat\psi,\phi}$ is a $K_x \times K_b$ dimensional matrix whose $(k,l)$th entry equals $\int_0^{\Delta} \hat\psi_k(s)\phi_l(s)\,ds$ (Ramsay and Silverman, 2005a).
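Both $M_{j,t}$ and the entries of $J_{\hat\psi,\phi}$ are one-dimensional integrals over the lag grid and can be formed by simple quadrature. A minimal sketch, using `splines::bs` as one (hypothetical) choice of basis, with `psi_hat`, `s_grid`, and `ds` from the previous sketch and `mu_hat_t` the length-$R$ vector $\hat\mu(t,\cdot)$ on the grid:

```r
# Sketch: build the spline basis and form J_{psi,phi} and M_t by quadrature.
library(splines)
Kb <- 35
phi_basis <- bs(s_grid, df = Kb, intercept = TRUE)  # (R x Kb) basis at s_grid

J   <- t(psi_hat) %*% phi_basis * ds                # (Kx x Kb): int psi_k phi_l ds
M_t <- as.vector(t(mu_hat_t) %*% phi_basis) * ds    # length Kb: int mu(t,s) phi_j ds
# Linearized functional term for score vector c_t: (M_t + c_t %*% J) %*% b
```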

Given the basis for $\beta(t)$, the model depends on the choice of both $K_b$ and $K_x$. We follow Ruppert (2002) by choosing $K_b$ large enough to prevent under-smoothing and $K_x \ge K_b$ to satisfy identifiability constraints. While our theoretical analysis considers truncation levels that depend on $n$, in practice we follow the simple rule of thumb and set $K_b = K_x = 35$. As long as the choices of $K_x$ and $K_b$ are large enough, their impact on estimation is typically negligible. Below, we exploit a connection between (5) and the score equations for a logistic regression model. Before moving on, we introduce some additional notation. Define

$$h_i(t \mid \mathcal{H}_{i,t}^{NX};\theta) \approx \exp\left(Z_t^\top\gamma + g_t(\mathcal{H}_{i,t}^{N})^\top\alpha + M_{i,t}b + C_{i,t}J_{\hat\psi,\phi}b\right) = \exp\left(W_{i,t}^\top\theta\right), \quad (6)$$

where $\theta = (\gamma, \alpha, b)$ and $\exp(Z_t^\top\gamma) =: h_0(t)$ is the parameterized baseline intensity function. We write $\tilde U_n(\theta)$ to denote the approximate score function when substituting (6) for (2).

4.3. Connection to logistic score functions

We next establish a connection between the approximate score equations $\tilde U_n(\theta)$ and the score equations for a logistic regression model. We can then exploit this connection to fit the model robustly using standard mixed effects software (Ruppert, 2002; McCulloch and Searle, 2001).

Lemma 4.1.

Under weights (4) and the log-linear intensity function (6), the approximate score function $\tilde U_n(\theta)$ is equivalent to

$$\sum_{i=1}^{n}\sum_{t \in T_i \cup D_i}\left[1[t \notin D_i] - \frac{1}{1+\exp\left\{-\left(\tilde W_{i,t}^\top\theta - \log\pi_i(t)\right)\right\}}\right]\tilde W_{i,t}$$

where $\tilde W_{i,t} = W_{i,t}$. This is the score function for a logistic regression with binary response $Y_i(t)$ for $t \in T_i \cup D_i$ and $i \in [n]$, where $Y_i(t) = 1[t \in T_i]$, offset $-\log\pi_i(t)$, and covariates $\tilde W_{i,t}$.

The connection established by Lemma 4.1 between our proposed methodology and logistic regression allows us to leverage "off-the-shelf" software. The main complication is pre-processing of the functional data; however, these additional steps can also be handled by existing software. Therefore, the entire data-analytic pipeline is easy to implement and requires minimal additional effort from the end-user. To see this, we briefly review the proposed inference procedure.

Remark 4.2 (Inference procedure review).

Given observed recurrent event and high frequency data $\{T_i, x_i\}_{i=1}^{n}$:

  1. For each $i \in [n]$, sample non-event times from a time-inhomogeneous Poisson point process with intensity $\pi_i(t)$.

  2. Estimate the mean $\mu_y(t,s)$ for $0 \le s \le \Delta$ at all event times $t \in \cup_{i=1}^{n} T_i$ and sampled non-event times $t \in \cup_{i=1}^{n} D_i$.

  3. Compute the marginal covariance across event times, $\Sigma_1$, and non-event times, $\Sigma_0$.

  4. Compute the eigendecomposition $\{\hat\psi_k^{(y)}, \hat\lambda_k^{(y)}\}$ of the marginal covariance $\Sigma_y$.

  5. Use the eigendecomposition to construct $W_{i,t}$ for all $i \in [n]$ and $t \in D_i \cup T_i$.

  6. Perform logistic regression with binary outcome $Y_i(t)$ and offset $-\log\pi_i(t)$, as in the sketch below.
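A minimal sketch of step 6 in R: assuming a stacked design matrix `W` whose rows index times in $\cup_i (T_i \cup D_i)$ and whose columns are laid out as in (6), a binary response `y` equal to 1 at event times, and the subsampling intensity `pi_vec` evaluated at those times (all illustrative names), the fit is a single call to `glm`:

```r
# Sketch: Lemma 4.1 reduces the fit to logistic regression with an offset.
fit <- glm(y ~ W - 1, family = binomial(), offset = -log(pi_vec))

theta_hat <- coef(fit)   # estimates of (gamma, alpha, b)
vcov_hat  <- vcov(fit)   # inverse Fisher information; compare equation (7)
```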

Before demonstrating the methodology via simulation in Section 5 and a worked example in Section 7, we provide a theoretical analysis of our current proposal.

4.4. Theoretical analysis

Our theoretical analysis requires assumptions about the subsampling procedure, the event process, and the functional data. We state these assumptions and then our main results. We start by assuming there exists $\tau < \infty$ such that all individuals are no longer at risk after $\tau$ (i.e., $\tau_i < \tau$ for all $i$). Moreover, define $R_i(t)$ to be the at-risk indicator for participant $i$, i.e., $R_i(t) = 1[t \in [0,\tau_i]]$. The asymptotic theory in Lemma 4.6 is proven under regularity conditions A-E of Andersen et al. (1993, pp. 420–421) along with the following additional assumptions:

Assumption 4.3 (Event process assumptions).

We assume the following holds:

  • (E.1) The subsampling rate is both lower and upper bounded at all at-risk times; that is, $0 < L < \pi_i(t) < U < \infty$ for all $i = 1, 2, \ldots$ and $t \in [0,\tau]$ such that $R_i(t) = 1$.

  • (E.2) There exists a nonnegative definite matrix $\Xi(\theta)$ such that
    $$n^{-1}\Xi_n(\theta) = n^{-1}\sum_{i=1}^{n}\int_0^{\tau} w_i(t;\theta)\left[\frac{h_i^{(1)}(t;\theta)\,h_i^{(1)}(t;\theta)^\top}{h_i(t;\theta)}\right]R_i(t)\,dt \overset{P}{\to} \Xi(\theta).$$
  • (E.3) There exists $M$ such that $|W_{i,j,t}| < M$ for all $(i,j,t)$.

  • (E.4) For all $j$, $k$,
    $$n^{-1}\sum_{i=1}^{n}\int_0^{\tau}\left|\frac{d^2}{d\theta_j\,d\theta_k}h_i(t;\theta_0)\right|^2 R_i(t)\,dt \overset{P}{\to} C < \infty$$
    as $n \to \infty$.

We also require several assumptions due to the truncation of the Karhunen-Loève decomposition that represents X(t,s).

Assumption 4.4 (Functional assumptions (Park and Staicu, 2015)).

The following assumptions are standard in prior work on longitudinal functional data analysis (Park and Staicu, 2015; Yao et al., 2005; Chen and Müller, 2012):

  • (A.1) $X = \{X(t,s) : (t,s) \in \mathcal{T}\times\mathcal{S}\}$ is a square integrable element of $L^2(\mathcal{T}\times\mathcal{S})$.

  • (A.2) The subsampling and conditional intensity rate functions $f_y(T)$ are continuous and $\sup_T f_y(T) < \infty$.

  • (A.3) $\mathbb{E}[X(t,s)X(t,s')X(t',s)X(t',s')] < \infty$ for each $s,s' \in [0,\Delta]$ and $0 < t, t' < \tau$.

  • (A.4) $\mathbb{E}[\|X(t,\cdot)\|^4] < \infty$ for each $0 < t < \tau$.

Finally, for simplicity, we assume there exists $b$ such that $\beta(t) = \phi(t)^\top b$; that is, the true function $\beta(t)$ lies in the span of the spline basis expansion.

Remark 4.5 (Practical consequences of Assumptions 4.3 and 4.4).

Assumptions 4.3 and 4.4 contain as a special case the scenario where individuals are independent and identically distributed, the functional process is bounded (i.e., $|X(t,s)| < M$ for some $M < \infty$), and the subsampling rate is both lower and upper bounded at all at-risk times. As such bounds are likely to hold in most practical settings, this demonstrates the reasonableness of our assumptions for applied settings.

Lemma 4.6.

Under Assumptions 4.3 and 4.4 and with $\Delta$ known, the estimator $\hat\theta_n$ is consistent for large $n$; moreover,

$$\sqrt{n}\left(\hat\theta - \theta\right) \overset{D}{\to} N\left(0, \Xi(\theta)^{-1}\right),$$

where $\overset{D}{\to}$ denotes convergence in distribution and

$$\Xi(\theta) = \mathbb{E}\left[\int_0^{\tau} w(s;\theta)\,\frac{h^{(1)}(s;\theta)\,h^{(1)}(s;\theta)^\top}{h(s;\theta)}\,ds\right],$$

where the expectation is with respect to the random censoring time $\tau$ of the event process.

Proof of Lemma 4.6 is presented in Appendix C.2. A design-unbiased estimator for Ξ(θ) is

$$\hat\Xi(\theta) = n^{-1}\sum_{i=1}^{n}\sum_{t \in T_i \cup D_i} w_i(t;\theta)\left(1 - w_i(t;\theta)\right)\left[\frac{h_i^{(1)}(t;\theta)}{h_i(t;\theta)}\right]\left[\frac{h_i^{(1)}(t;\theta)}{h_i(t;\theta)}\right]^\top \quad (7)$$

For the log-linear intensity model, the sampling-unbiased estimator $\hat\Xi(\theta)$ is equivalent to the Fisher information of the previously described logistic regression model. This implies that, when subsampling from an inhomogeneous Poisson process, standard logistic regression software can be used to fit the recurrent event model by specifying an offset equal to $-\log\pi_i(t)$. Based on this, we can leverage existing inferential machinery to obtain variance-covariance estimates of model parameters.

That is, if $\hat\Sigma_{bb}$ is the $K_b \times K_b$ dimensional matrix obtained by plugging the estimates of the variance components into the formula for the variance of $\hat b$, then the standard error of the estimate at time $t_0$ – i.e., $\hat\beta(t_0) = \phi(t_0)^\top \hat b$ – is given by $\sqrt{\phi(t_0)^\top \hat\Sigma_{bb}\,\phi(t_0)}$. The approximate 95% confidence interval can then be constructed as $\hat\beta(t_0) \pm 1.96\sqrt{\phi(t_0)^\top\hat\Sigma_{bb}\,\phi(t_0)}$. We acknowledge an important limitation of confidence intervals obtained via this approach. Specifically, we ignore the variability inherent in the longitudinal functional principal component analysis; that is, our estimates ignore the variability in the estimation of the eigenfunctions $\hat\psi$ as well as the coefficients $\hat c_{i,k}(t)$. Joint modeling could be considered as in Crainiceanu and Goldsmith (2010); however, this is beyond the scope of this article.
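As a short sketch of this interval construction, assume `b_hat` and `Sigma_bb` have been extracted from the logistic fit and `phi_basis` is the spline basis object from the earlier sketch (so that `predict` evaluates the basis at a new point $t_0$):

```r
# Sketch: pointwise 95% confidence interval for beta(t0).
phi_t0  <- as.numeric(predict(phi_basis, t0))  # length-Kb basis evaluated at t0
beta_t0 <- sum(phi_t0 * b_hat)                 # point estimate phi(t0)' b_hat
se_t0   <- as.numeric(sqrt(t(phi_t0) %*% Sigma_bb %*% phi_t0))
ci_t0   <- beta_t0 + c(-1.96, 1.96) * se_t0
```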

Lemma 4.7 shows that (4) is optimal within a class of weighted estimating equations. The result ensures the only loss of statistical efficiency is due to subsampling and not to using a suboptimal estimation procedure given subsampling. Here, weights $w$ are considered optimal if the difference between the asymptotic variance under any other choice of weights $W$, $V(\theta_0; W)$, and the asymptotic variance $V(\theta_0; w)$ is positive semi-definite; i.e., any linear contrast has smaller asymptotic variance under weight $w$ than under weight $W$.

Lemma 4.7.

If the event process is an inhomogeneous Poisson point process with intensity $h(t;\theta)$ and subsampling occurs via an independent, inhomogeneous Poisson point process with intensity $\pi(t)$, then $\hat U_n(\theta)$ are optimal estimating functions (i.e., most efficient) in the class of weighted estimating functions given by (5) with (4) replaced by an arbitrary weight function $w_i(t;\theta)$. This class includes the Horvitz-Thompson estimator under $w(s;\theta) = 1$.

Proof of Lemma 4.7 is presented in Appendix D.

4.5. Computation versus statistical efficiency tradeoff

We next consider the statistical efficiency of our proposed estimator compared to complete-data maximum likelihood estimation. While subsampling introduces additional variation, it may significantly reduce the overall computational burden. It is this trade-off that we next make precise. In particular, we consider the subsampling rate $\pi(t) = c \times h(t;\theta)$ for $c > 0$; that is, the subsampling rate is proportional to the intensity function with time-independent constant $c > 0$. Under this subsampling rate, the weight function (4) equals $c/(c+1)$. Under Lemma 4.6,

$$\Xi(\theta) = \frac{c}{c+1}\,\mathbb{E}\left[\int_0^{\tau}\frac{h^{(1)}(t;\theta)\,h^{(1)}(t;\theta)^\top}{h(t;\theta)}\,dt\right] = \frac{c}{c+1}\,\Sigma(\theta)$$

where $\Sigma(\theta)$ is the Fisher information of the complete-data maximum likelihood estimator. Therefore the relative efficiency is $c/(1+c)$. For an upper bound $H = \max_{t \in (0,\tau)} h(t;\theta)$, if we set $\pi(t) = c \times H$, then the relative efficiency is lower bounded by $c/(c+1)$.

Sensor measurements occur multiple times per second. Suppose the intensity rate is bounded above by 1 and the unit time scale is hours. If we subsample the data at a rate of 10 times per hour, then we have a lower bound on the efficiency of 0.909. For a 4Hz sensor, this reduces the number of samples from 4 × 60 × 60 = 14,400 per hour to an average of 10 per hour. Since the computational complexity of logistic regression is linear in the number of samples, we get a 1440-fold reduction in data size at the cost of a 0.909 statistical efficiency. If we sample 100 times per hour, the efficiency is 0.990, with a 144-fold reduction in data size. Table 1 provides additional examples for 4Hz and 32Hz sensor rates, respectively. The data reduction depends on the sensor rate; however, the lower bound on statistical efficiency does not, because the subsampling rate only depends on the upper bound of the intensity function. In particular, if events are rare, the subsampling rate can be greatly reduced with no impact on statistical efficiency.

Table 1.

Data reduction (total # of measurements divided by expected number of subsampled measurements) given sensor rate, subsampling constant and an upper bound on the intensity rate.

                 Subsampling     Upper bound on intensity rate (per hour)    Statistical
Sensor rate      constant (c)      0.5       1       3       5      10       efficiency
4Hz (EDA)             5           5760    2880     960     576     288         0.833
                     10           2880    1440     480     288     144         0.909
                    100            288     144      48      29      14         0.990
32Hz (AI)             5          46080   23040    7680    4608    2304         0.833
                     10          23040   11520    3840    2304    1152         0.909
                    100           2304    1152     384     230     115         0.990
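The entries of Table 1 follow from two one-line formulas, reproduced here as a cross-check:

```r
# Sketch: Table 1 arithmetic. With pi(t) = c * H, the expected number of
# subsamples is c * H per hour and the efficiency lower bound is c / (c + 1).
efficiency <- function(c) c / (c + 1)
reduction  <- function(hz, c, H) hz * 3600 / (c * H)  # measurements per subsample

efficiency(10)          # 0.909
reduction(4, 10, 1)     # 4Hz (EDA), c = 10, H = 1 per hour  -> 1440
reduction(32, 100, 5)   # 32Hz (AI), c = 100, H = 5 per hour -> 230.4
```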

4.6. Penalized functional regression models

Recall that the theoretical results were proven under the assumption that there exists $b$ such that $\beta(t) = \phi(t)^\top b$. To make this assumption plausible, we set $K_b$ large enough (but no greater than $K_x$, to ensure identifiability) so that the spline basis expansion is sufficiently expressive. In practice, however, such a choice of $K_b$ may overfit the data. Following Goldsmith et al. (2011), we choose the linear spline model $\beta(t) = b_0 + b_1 t + \sum_{j=2}^{K_b-1} b_j (t - \kappa_j)_+$, where $\{\kappa_j\}_{j=2}^{K_b-1}$ are the chosen knots, and assume $\{b_j\}_{j=2}^{K_b-1} \sim N(0, \sigma^2 I)$ to induce smoothness on the spline model. Combining the penalized spline formulation with Lemma 4.1 establishes a connection between our approximate score equations and solving a generalized mixed effects logistic regression with offset. Given the connection with generalized mixed effects models, we can leverage the associated inferential machinery to obtain variance-covariance estimates. As we leverage the standard R package 'glmnet', the smoothness parameter is chosen via cross-validation. In this context, we acknowledge another limitation: penalization may lead to confidence intervals that perform poorly in regions where $\hat\beta(t)$ is over-smoothed.
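A minimal sketch of this penalized fit, assuming the `glmnet` interface with `alpha = 0` for the ridge ($L_2$) penalty, the Lemma 4.1 offset, and `penalty.factor` to leave $(\gamma, \alpha)$ unpenalized; the design matrix `W`, response `y`, and column layout are illustrative:

```r
# Sketch: ridge-penalized logistic fit with offset, penalizing only the spline
# coefficients b; the smoothness (penalty) parameter is chosen by CV.
library(glmnet)

p_unpen <- ncol(W) - Kb               # columns holding (gamma, alpha)
pf <- c(rep(0, p_unpen), rep(1, Kb))  # 0 = unpenalized, 1 = penalized

cv_fit <- cv.glmnet(x = W, y = y, family = "binomial", alpha = 0,
                    offset = -log(pi_vec), penalty.factor = pf)
theta_hat <- as.numeric(coef(cv_fit, s = "lambda.min"))
```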

5. Simulation study

We next assess the proposed methodology via a simulation study. Here, each individual is observed over five days, where each day is defined on the unit interval [0,1] with 1000 equally spaced observation times per day. We define $X(t)$ at the grid of observation times as a mean-zero Gaussian process with Matérn covariance

$$\Sigma(t,t') = \sigma^2\,\frac{2^{1-v}}{\Gamma(v)}\left(\sqrt{2v}\,\frac{|t-t'|}{\rho}\right)^{v} K_v\!\left(\sqrt{2v}\,\frac{|t-t'|}{\rho}\right)$$

where $K_v$ is the modified Bessel function of the second kind. We set $v = 1/2$, $\sigma^2 = 1$, and $\rho = 0.3$, and set $K_b = K_x = 35$. For simplicity, we assume $\Sigma$ is known when computing the eigendecompositions. Given $\{X(t)\}_{0 \le t \le 1}$ for a given user-day, we generate event times according to a chosen hazard function $h(t;\theta)$. To mimic our real data, we set

$$h(t;\theta) = \exp\left(\theta_0 + \int_0^{\Delta} X(t-s)\,\beta(s)\,ds\right).$$

We set $\Delta$ to mimic a 30-minute window for a 12-hour day. We set $\theta_0 = \log(5/1000)$ to give a baseline risk of approximately 5 events per day. We consider two choices of $\beta(s)$: (1) $\beta_0\exp(-\beta_1 s)$, which decays to 0 as $s$ approaches $\Delta$ from below, and (2) $\beta_1\sin(2\pi s/\Delta - \pi/2)$, which is significantly different from 0 as $s$ approaches $\Delta$ from below.
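The following sketch illustrates this data-generating mechanism for one user-day under choice (2), using the fact that the $v = 1/2$ Matérn covariance reduces to the exponential covariance $\sigma^2\exp(-|t-t'|/\rho)$ and treating $h(t;\theta)$ as the per-grid-cell event probability (the calibration behind $\theta_0 = \log(5/1000)$); $\beta_1 = 1$ and all object names are illustrative:

```r
# Sketch: simulate X(t) on the daily grid and generate event times by thinning.
set.seed(1)
n_grid <- 1000
t_grid <- seq(0, 1, length.out = n_grid)
sigma2 <- 1; rho <- 0.3

Sigma <- sigma2 * exp(-abs(outer(t_grid, t_grid, "-")) / rho)  # Matern, v = 1/2
X <- as.vector(t(chol(Sigma + 1e-10 * diag(n_grid))) %*% rnorm(n_grid))

D_idx  <- round(n_grid * 0.5 / 12)   # 30-minute window on a 12-hour day
beta_s <- sin(2 * pi * seq(0, 1, length.out = D_idx) - pi / 2)  # beta_1 = 1
theta0 <- log(5 / 1000)              # ~5 events per day at baseline

# Hazard per grid cell (Riemann sum for the functional term), then thinning
h_t <- sapply(D_idx:n_grid, function(j)
  exp(theta0 + sum(X[(j - D_idx + 1):j] * rev(beta_s)) / n_grid))
events <- t_grid[D_idx:n_grid][runif(length(h_t)) < h_t]
```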

We generate 1000 datasets, each consisting of 500 user-days. For a given simulated user-day, we randomly sample non-event times using a Poisson process with a rate of once every five minutes. We use the proposed methodology to construct the estimate $\hat\beta_{i,12}(t)$ for the $i$th simulated dataset; we then thin the sampled non-event times with thinning probabilities 1/3, 1/6, 1/12, and 1/24. This yields randomly sampled non-event times from Poisson processes with rates of once every fifteen minutes, half-hour, hour, and two hours, and the corresponding estimates $\hat\beta_{i,4}(t)$, $\hat\beta_{i,2}(t)$, $\hat\beta_{i,1}(t)$, and $\hat\beta_{i,0.5}(t)$ respectively. This nested construction allows us to compare the variance due to subsampling with the variance due to sampling fewer non-event times.

Since we are primarily interested in accuracy, we report the mean integrated squared error (MISE), defined as $\frac{1}{1000}\sum_{i=1}^{1000}\int_0^{\infty}(\hat\beta_{i,j}(s) - \beta(s))^2\,ds$ for each $j$, where $\beta(s) \equiv 0$ for $s > \Delta$. The MISE is defined in this manner to account for settings where $\Delta$ is unknown. Next, let $\bar\beta_j(t) = \frac{1}{1000}\sum_{i=1}^{1000}\hat\beta_{i,j}(t)$ denote the average estimate for $j = 0.5, 1, 2, 4$. Then the squared bias is given by $\int_0^{\infty}(\bar\beta_j(s) - \beta(s))^2\,ds$ and the variance by $\frac{1}{1000}\sum_{i=1}^{1000}\int_0^{\infty}(\hat\beta_{i,j}(s) - \bar\beta_j(s))^2\,ds$. The subsampling variance is defined as $\frac{1}{1000}\sum_{i=1}^{1000}\int_0^{\infty}(\hat\beta_{i,j}(s) - \hat\beta_{i,12}(s))^2\,ds$. Table 2 shows the MISE decomposed into the variance and squared bias, as well as the subsampling variance. To allow fair comparisons across the two choices of $\beta(s)$, all reported numbers are scaled by the integrated square of the true function, $\int_0^{\infty}\beta(s)^2\,ds$. The relative MISE (RMISE) with respect to the highest sampling rate (12 per hour) and the average runtime (in seconds) are also reported.

Table 2.

Mean integrated squared error, variance, squared bias, and subsampling (SubS) variance for $\beta(s)$ given by choices (1) (top) and (2) (bottom), respectively.

                               Sampling rate (per hour)
                        12          4.0         2.0         1.0         0.5
β(s) choice (1)
  SubS. variance         -      1.2 × 10−5  2.0 × 10−5  3.5 × 10−5  8.6 × 10−5
  Variance          1.3 × 10−2  1.5 × 10−2  2.1 × 10−2  2.7 × 10−2  3.8 × 10−2
  Squared bias      6.3 × 10−2  6.0 × 10−2  6.1 × 10−2  6.1 × 10−2  6.7 × 10−2
  MISE              7.6 × 10−2  7.5 × 10−2  8.2 × 10−2  8.8 × 10−2  1.1 × 10−1
  RMISE                  -         0.99        1.08        1.16        1.39
  Avg. runtime (s)      356         108          51          24          17
β(s) choice (2)
  SubS. variance         -      2.7 × 10−5  5.1 × 10−5  5.7 × 10−5  7.5 × 10−5
  Variance          3.7 × 10−2  2.7 × 10−2  2.8 × 10−2  3.3 × 10−2  3.8 × 10−2
  Squared bias      3.5 × 10−2  3.6 × 10−2  3.4 × 10−2  3.3 × 10−2  3.1 × 10−2
  MISE              7.2 × 10−2  6.3 × 10−2  6.2 × 10−2  6.6 × 10−2  6.9 × 10−2
  RMISE                  -         0.87        0.86        0.92        0.96
  Avg. runtime (s)     1084         208          93          47          30

Table 2 demonstrates that the variance may increase as the sampling rate decreases. However, the rate of increase in the MISE is relatively low. In the first setting, for example, the RMISE when sampling every hour versus every five minutes is 1.16, while the run time is 14.8 times faster. In the second setting, the RMISE remains at or below 1 while the run time is 23.1 times faster when sampling every hour rather than every five minutes. This highlights the efficiency-computation trade-off.

Remark 5.1 (Time-complexity and run-time of maximum likelihood estimation).

The complexity of logistic regression with $n$ observations and covariates of dimension $d$ is $O(nd)$. Maximum likelihood estimation can be well approximated by subsampling at a very high rate; therefore, under a 4Hz sensor the time-complexity of the maximum likelihood estimate is $O(4 \times 60 \times \tau_i\, n\, d)$ (with $\tau_i$ in minutes) compared to an expected time-complexity under subsampling at rate $c$ per minute of $O(c \times \tau_i\, n\, d)$. This means using the sensor at its observation frequency of four times per second will take approximately 213 times as long as subsampling at a rate of once every half-hour. In our first example, using the average run times at each sampling rate to project the run time at four times per second yields approximately a 2.0 hour run time. In our second example, we project a run time at four times per second of 6.2 hours. In both instances, the relative efficiency gain would be negligible, suggesting a huge computational cost for minimal relative information gain.

5.1. Impact of Δ

A concern with the proposed approach is the selection of the window length $\Delta$. Here we investigate the impact of misspecification of the window length for $\beta(s) = \beta_1\sin(2\pi s/\Delta - \pi/2)$ with the true window length $\Delta$ set to 32 minutes. See Section E.1 of the supplementary materials for a similar discussion for $\beta(s) = \beta_0\exp(-\beta_1 s)$. As in the previous simulation, we generate 1000 datasets per condition, each consisting of 500 user-days. For each simulation, we analyze the data using window lengths $\hat\Delta \in \{26, 29, 32, 35, 37\}$.

When the window length is too large, i.e., $\hat\Delta > \Delta$, the estimation is asymptotically unbiased since $\beta(t) \equiv 0$ for $t > \Delta$; however, we incur a penalty in finite samples, especially in settings where the function is far from zero near $t = \Delta$. We find the MISE increases as the absolute error $|\hat\Delta - \Delta|$ increases. While the MISE increased for $\hat\Delta < \Delta$, the pointwise estimation error remains low for $t < \hat\Delta$. This does not hold for $\hat\Delta > \Delta$, where instead we see parameter attenuation, i.e., a bias towards zero in the estimates at each $0 < t < \Delta$. To capture this, we define a partial MISE as $\frac{\Delta}{\tilde\Delta}\int_0^{\tilde\Delta}(\hat\beta_{i,j}(s) - \beta(s))^2\,ds$ where $\tilde\Delta = \min(\hat\Delta, \Delta)$; this is the MISE on the subset $0 < t < \min(\hat\Delta, \Delta)$, scaled for comparative purposes.

5.2. Selection of bandwidth

While above we explored the bias-variance trade-off under bandwidth misspecification, here we explore a data-adaptive method for bandwidth selection. This is a critical consideration for recent-history functional linear models (Kim et al., 2011). Given that the model complexity does not change as a function of $\Delta$ in our simulations, our proposal is to compare AIC across a range of bandwidths and choose the one that optimizes the criterion (a sketch follows Table 4). Table 4 presents AIC-based selection across the 500 simulated datasets analyzed in the prior section. We see markedly distinct behavior of the selection criterion in the two settings. AIC-based selection works well for the sinusoidal effect across subsampling rates, but performs poorly in the exponential setting. Recall that the sinusoidal effect is significantly different from 0 at $\Delta$ and then 0 for all $s > \Delta$, while the exponential effect decays slowly towards 0 as $s \to \Delta$. Therefore, the selection problem under the sinusoidal effect is much easier than under the exponential effect. Moreover, bias, variance, and MISE do not vary substantially under bandwidth misspecification in the exponential setting. Thus, we conclude that AIC-based selection works well in settings where inference depends heavily on estimating the bandwidth accurately. Note that when $\Delta$ is inaccurately selected, as in the exponential case, the relative MISE under the AIC-selected $\hat\Delta$ is a more useful indicator of the practical cost of AIC-based selection.

Table 4.

Rate of selection of $\hat\Delta$ across 500 simulations when the true $\Delta = 32$ minutes. Bold: most likely bandwidth for a given subsampling rate; asterisk: true window length.

                         Exponential                     Sinusoidal
$\hat\Delta$ \ Rate   4.0   2.0   1.0   0.5         4.0   2.0   1.0   0.5
26                   0.38  0.35  0.30  0.26        0.00  0.00  0.00  0.00
29                   0.22  0.20  0.17  0.12        0.09  0.14  0.15  0.18
32*                  0.11  0.12  0.13  0.18        0.85  0.79  0.76  0.70
35                   0.13  0.13  0.14  0.15        0.05  0.06  0.06  0.09
37                   0.14  0.20  0.26  0.28        0.01  0.01  0.02  0.03
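A minimal sketch of the selection rule, where `fit_model` is a hypothetical wrapper around the Remark 4.2 pipeline for a given window length and AIC is compared in the usual minimize-AIC convention:

```r
# Sketch: AIC-based selection of the window length Delta.
Delta_grid <- c(26, 29, 32, 35, 37)   # candidate window lengths (minutes)
aic_vals   <- sapply(Delta_grid, function(d) AIC(fit_model(Delta = d)))
Delta_hat  <- Delta_grid[which.min(aic_vals)]
```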

6. Extensions

In this section, we demonstrate the flexibility of the proposed approach by exploring extensions in several important directions to ensure these methods are robust for practical use with high frequency data. This section will continue to leverage the connection to generalized functional linear models provided by Lemma 4.1.

6.1. Multivariate extensions

In this section, we extend our model to the case of multiple functional regressors. That is, suppose $L$ health processes, $x_i = \{x_i(t) = (x_{i,1}(t), \ldots, x_{i,L}(t))\}_{0 < t < \tau_i}$, are measured for each participant at a dense grid of time points. In the suicidal ideation case study, for example, accelerometer data are measured at 32Hz while electrodermal activity (EDA) is measured at 4Hz. A multivariate extension of our model (2) is given by

$$h_i(t \mid \mathcal{H}_{i,t}^{NX};\theta) = h_0(t;\gamma)\exp\left(g_t(\mathcal{H}_{i,t}^{N})^\top\alpha + \int_{t-\Delta_1}^{t} x_{i,1}(s)\beta_1(s)\,ds + \cdots + \int_{t-\Delta_L}^{t} x_{i,L}(s)\beta_L(s)\,ds\right) \quad (8)$$

The approach given in Section 4.3 extends naturally to the multivariate functional setting. For each functional regressor, we estimate the pooled sample covariance $\Sigma_{y,l}$ for $y \in \{0,1\}$ and $l = 1, \ldots, L$ as in Section 4.1. Let $\sum_{k=1}^{\infty}\hat\lambda_{k,l}^{(y)}\,\hat\psi_{k,l}^{(y)}(s)\,\hat\psi_{k,l}^{(y)}(t)$ be the spectral decomposition of $\hat\Sigma_{y,l}$. Then the $l$th functional term is approximated using a truncated Karhunen-Loève expansion: $\int_{t-\Delta_l}^{t} X_l(t,s)\beta_l(s)\,ds \approx \left[M_{l,t} + c_l(t)^\top J_{\hat\psi_l^{(y)},\phi_l}\right]b_l$.

6.2. Missing data

Sensor data can often be missing for intervals of time due to issues with sensor wear. In the suicidal ideation case study, for example, there are 2139 self-identified moments of distress across all 91 participants. Of these, 1289 event times had complete data for the prior thirty minutes, 1998 had a fraction of missing data on a fine grid below 30%, and 1984 had a fraction below 10%.

Missing data is a critical issue because $c_{i,k}(t)$ cannot be estimated if $X(t,s)$ is not observed for all $s \in [0,\Delta]$. Moreover, standard errors should reflect the uncertainty in these coefficients when missing data is prevalent. Goldsmith et al. (2011) suggest using best linear unbiased predictors (BLUPs) or posterior modes in the mixed effects model to estimate $c_{i,k}(t)$; however, this is ineffective when there is substantial variability in these estimates. To deal with this, Crainiceanu and Goldsmith (2010) take a fully Bayesian approach. Yao et al. (2005) introduced PACE as an alternative frequentist method. Petrovich et al. (2018) show that for sparse, irregular longitudinal data, the imputation model should not ignore the outcome variable $Y_i(t)$.

Here we present an extension of Petrovich et al. (2018) to our setting by leveraging Lemma 4.1 and the marginal covariance estimation procedure to construct a multiple imputation procedure. Let $x_i(t)$ denote the incomplete sensor data at time $t$ (i.e., observed at times $\{s_{i,r}\}_{r=1}^{k_{it}}$ in $[0,\Delta]$). Then

$$\mathbb{E}\left[X_i(s,t)\mid Y_i(t)=y, x_i(t)\right] = \mu_y(s,t) + a_{i,t}(s)^\top B_{i,t}\left(x_i(t) - \mu_i(t)\right) \quad (9)$$
$$\mathrm{Cov}\left[X_i(s,t), X_i(s',t)\mid Y_i(t)=y, x_i(t)\right] = \Sigma_t(s,s') - a_{i,t}(s)^\top B_{i,t}\,a_{i,t}(s') \quad (10)$$

where we have

$$a_{i,t}(s) = \begin{pmatrix}\Sigma_t(s_{i,1}, s)\\ \vdots \\ \Sigma_t(s_{i,k_{it}}, s)\end{pmatrix}; \qquad B_{i,t}^{-1} = \begin{pmatrix}\Sigma_t(s_{i,1}, s_{i,1}) & \cdots & \Sigma_t(s_{i,1}, s_{i,k_{it}})\\ \vdots & \ddots & \vdots \\ \Sigma_t(s_{i,k_{it}}, s_{i,1}) & \cdots & \Sigma_t(s_{i,k_{it}}, s_{i,k_{it}})\end{pmatrix},$$

$\mu_i(t) = \mathbb{E}[x_i(t)\mid Y_i(t)=y] = \{\mu_y(s_{i,j}, t)\}_{j=1}^{k_{it}}$, where $\mu_y(s,t)$ is the mean of $X(t,s)$ for group $y$, and $\Sigma_t(s,s')$ is the covariance between $X(s,t)$ and $X(s',t)$ for $s,s' \in [0,\Delta]$ and $t \in \mathbb{R}_+$. Note that, following Petrovich et al. (2018), the mean and covariance functions are modeled separately for $y = 0$ and $y = 1$.
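The imputation step amounts to drawing from the conditional Gaussian distribution defined by (9) and (10). A self-contained sketch with illustrative inputs (window mean `mu` and covariance `Sigma_t` on the lag grid, observed lag indices `obs_idx`, and observed values `x_obs`):

```r
# Sketch: one multiple-imputation draw of a partially observed window,
# using the conditional mean (9) and conditional covariance (10).
impute_window <- function(mu, Sigma_t, obs_idx, x_obs) {
  mis_idx <- setdiff(seq_along(mu), obs_idx)
  B_inv <- Sigma_t[obs_idx, obs_idx, drop = FALSE]           # B_{i,t}^{-1}
  A     <- Sigma_t[mis_idx, obs_idx, drop = FALSE] %*% solve(B_inv)
  cond_mu <- mu[mis_idx] + A %*% (x_obs - mu[obs_idx])
  cond_S  <- Sigma_t[mis_idx, mis_idx, drop = FALSE] -
    A %*% Sigma_t[obs_idx, mis_idx, drop = FALSE]
  x_full <- mu
  x_full[obs_idx] <- x_obs
  x_full[mis_idx] <- MASS::mvrnorm(1, as.vector(cond_mu), cond_S)
  x_full
}
```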

6.2.1. Multiple imputation under uncongeniality

Multiple imputation yields valid frequentist inference when the imputation and analysis procedures are congenial (Meng, 1994); the above procedure is derived for function-on-scalar multiple imputation with binary outcomes, which ignores the joint nature of recurrent event analysis in the presence of high frequency sensor data. The main advantage of the above imputation framework is its simplicity and approximate congeniality when events are rare and the sampling rate is low. The main disadvantage is that the framework is uncongenial under many events and/or high sampling rates. Meng (1994) defines congeniality between an imputation procedure and an analyst's complete (and incomplete) analysis procedure as the existence of a unifying Bayesian model which embeds the imputer's imputation model and the analyst's complete data procedure. A recent discussion of congeniality can be found in Bartlett and Hughes (2020). Congeniality ensures good frequentist coverage properties. In some uncongenial settings, standard multiple imputation variance estimates can be biased downwards, leading to under-coverage of confidence intervals. A key question is whether we can use the above imputation methods within a general procedure that handles uncongeniality.

To address this, we follow the recommendation of Bartlett and Hughes (2020) and consider a method that first bootstraps the dataset and then applies multiple imputation to each bootstrap sample. This general approach was originally proposed by Shao and Sitter (1996) and Little and Rubin (2002). Suppose $B$ bootstraps and $M$ imputations per bootstrap; let $\hat\theta_{b,m}$ denote the estimator for the $m$th imputation of the $b$th bootstrap. The point estimator is given by $\hat\theta = B^{-1}\sum_{b=1}^{B}\hat\theta_b$ where $\hat\theta_b = M^{-1}\sum_{m=1}^{M}\hat\theta_{b,m}$. To construct the confidence interval, we require the mean sums of squares within and between bootstraps, i.e., $\mathrm{MSW} = \frac{1}{B(M-1)}\sum_{b=1}^{B}\sum_{m=1}^{M}(\hat\theta_{b,m} - \hat\theta_b)(\hat\theta_{b,m} - \hat\theta_b)^\top$ and $\mathrm{MSB} = \frac{1}{B-1}\sum_{b=1}^{B}(\hat\theta_b - \hat\theta)(\hat\theta_b - \hat\theta)^\top$ respectively. Then the estimator of the variance-covariance matrix of $\hat\theta$ is given by $\hat\Sigma_{B,M} = \frac{B+1}{BM}\mathrm{MSB} - \mathrm{MSW}/M$. We obtain the variance for $\beta(t)$ as $\phi(t)^\top\hat\Sigma_{B,M}\,\phi(t)$. We follow Bartlett and Hughes (2020) and construct confidence intervals based on Satterthwaite's degrees of freedom, which here is given by

$$\hat\nu = \frac{\left[\left(\frac{B+1}{BM}\right)\mathrm{MSB} - \frac{\mathrm{MSW}}{M}\right]^2}{(B-1)^{-1}\left(\frac{B+1}{BM}\right)^2\mathrm{MSB}^2 + \frac{\mathrm{MSW}^2}{BM^2(M-1)}}$$

The bootstrap-then-multiple-imputation procedure has been studied extensively by Bartlett and Hughes (2020) and is robust to uncongeniality. Its main disadvantage is its considerable computational intensity. Recall that likelihood calculations were computationally prohibitive by themselves, so combining them with the bootstrap and MI would further increase this large-scale computation. The random subsampling framework thus simplifies the handling of missing data via connections to function-on-scalar multiple imputation (Petrovich et al., 2018) as well as to the bootstrap for handling uncongeniality (Bartlett and Hughes, 2020). Ignoring the computational time of bootstrap sampling, the computational time for the first choice of $\beta(s)$ in the simulation study with $B = 200$ bootstraps and $M = 2$ imputations per bootstrap is 7 hours for a sampling rate of 0.5 compared to 35 hours for a sampling rate of 4, which highlights the benefits of the proposed framework.
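For a scalar functional of $\theta$ (e.g., $\hat\beta(t_0)$), the bootstrap-then-MI variance and Satterthwaite degrees of freedom reduce to a few lines; `est` below is an illustrative $B \times M$ matrix of estimates $\hat\theta_{b,m}$:

```r
# Sketch: point estimate, variance, Satterthwaite df, and 95% CI for
# bootstrap followed by multiple imputation (scalar parameter).
bootmi_inference <- function(est) {
  B <- nrow(est); M <- ncol(est)
  theta_b   <- rowMeans(est)                      # per-bootstrap means
  theta_hat <- mean(theta_b)                      # overall point estimate
  MSW <- sum((est - theta_b)^2) / (B * (M - 1))   # within-bootstrap mean square
  MSB <- sum((theta_b - theta_hat)^2) / (B - 1)   # between-bootstrap mean square
  V   <- (B + 1) / (B * M) * MSB - MSW / M
  df  <- V^2 / (((B + 1) / (B * M))^2 * MSB^2 / (B - 1) +
                  MSW^2 / (B * M^2 * (M - 1)))
  ci  <- theta_hat + c(-1, 1) * qt(0.975, df) * sqrt(V)
  list(estimate = theta_hat, variance = V, df = df, ci = ci)
}
```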

6.3. Multilevel models

The approach can be extended to multilevel models with functional regressors, which are critical in mobile health where a high degree of individual variation is often observed. Let $b_i \sim N(0, \sigma_b^2 I)$; then the multilevel extension of (6) is

$$h_i(t\mid\mathcal{H}_{i,t}^{NX};\theta,b_i) \approx \exp\left(Z_t^\top\gamma + g_t(\mathcal{H}_{i,t}^{N})^\top\alpha + \left(M_{i,t} + C_{i,t}J_{\hat\psi,\phi}\right)(\beta + b_i)\right) = \exp\left(W_{i,t}^\top\theta + Z_{i,t}b_i\right), \quad (11)$$

where $Z_{i,t} = M_{i,t} + C_{i,t}J_{\hat\psi,\phi}$. Lemma 4.1 implies that the random subsampling framework applied to (11) leads to a penalized logistic mixed-effects model. As far as the authors are aware, the combination of mixed effects and $L_2$ penalization on a subset of parameters is not available in existing software. Given the paper's focus on "off-the-shelf" software implementations, multilevel models are considered important future work.

7. A worked example: Adolescent psychiatric inpatient mHealth study

During an eight-month period in 2018, 91 psychiatric inpatients admitted for suicidal risk to Franciscan Children's Hospital were enrolled in an observational study. Study data were graciously provided by Evan Kleiman and his study team (https://kleimanlab.org). Each study participant wore an Empatica E4 (Garbarino et al., 2014), a medical-grade wearable device that provides real-time physiological data. On each user-day, participants were asked to self-identify moments of suicidal distress by pressing a button on the Empatica E4 device; the timestamp of each button press was recorded. One of the primary study goals was to assess the association between sensor-based physiological measurements and self-identified moments of suicidal distress. In particular, the scientific question is whether monitoring physiological correlates can reveal early indicators of escalating distress.

A key concern is whether all moments of suicidal distress are recorded. To address this, clinical staff interviewed participants each evening, and participants were asked to review their button press activity; any events identified as incorrect button presses were removed. Over the 30-day study period, the average number of button presses per day was 2.42 with a standard deviation of 2.62. Investigation of the button press data shows low button-press counts prior to 7AM and a sharp drop-off by 11PM. This points to an additional concern: events can only occur when the individual is at risk, i.e., (A) the individual is wearing the Empatica E4 and (B) is currently awake. To deal with (A) and (B), we define each study day to begin at 9AM and end at 8PM.

Figure 1 visualizes button presses versus time since study entry for each user. Day 30 is assumed to censor the observation process; a black mark signals dropout before day 30. Figure 1 shows the potential heterogeneity in button press rates between users and across study days. To assess whether there is between-user or between-study-day variation, Table 5 presents a two-way ANOVA decomposition of the button press counts as a function of participant and day in study. The ANOVA decomposition demonstrates high variation across days in study and across users.

Fig. 1. User button-presses (red) versus time since study entry (in hours). The black mark indicates the final sensor measurement time.

Table 5.

ANOVA decomposition of daily button press counts

                 DF    Sum Sq   Mean Sq   F value   Pr(>F)
Participant      88    2675.4      30.4       8.9   < 2 × 10−16
Day in Study     29     443.3      15.3       4.5   3.9 × 10−13
Residuals       672    2304.2       3.4

Here, we focus on two physiological processes: (1) electrodermal activity (EDA), a measure of skin conductance sampled at 4Hz, and (2) the activity index (AI) (Bai et al., 2016), a coordinate-free feature built from accelerometer data collected at 32Hz that measures physical movement. Electrodermal activity can be significantly impacted by external factors (e.g., room temperature). To account for the high between user-day variation, we analyze EDA standardized per study-day and device. The individual EDA and AI trajectories are highly variable, which tends to obscure patterns and trends. In Figure 2, the mean trajectories of EDA and AI are plotted in reverse time from button press timestamps, showing sharp changes in EDA and AI in the 30 minutes prior to button presses. In Figure 2 of Appendix E.3, the mean trajectories of EDA and AI are plotted in reverse time for the sampled non-event times in the 30 minutes prior to the non-event time. The distinct mean behavior motivates modeling these two processes separately, as discussed in Remark 4.2.

Fig. 2. Average scaled EDA and AI in the 30 minutes prior to button presses.

7.1. Complete-case analysis

Inspection of Figures 2(a) and 2(b) suggests setting $\Delta$ between 5 and 30 minutes. Here, we investigate three potential window lengths: $\Delta = 5$, 15, and 30 minutes. To ensure minimal loss of efficiency, the subsampling rate was set to once every fifteen minutes. Given the daily button press rate, this yields an average of 44 non-events to 2.5 events per day. Based on Table 1, this ensures we achieve a substantial data reduction at a minimal loss of efficiency. After sampling non-event times, complete-case analyses are performed; i.e., sampled times where any sensor included in the model has missing data are ignored. Table 6 presents both AIC and BIC criteria on the complete-case data, normalized by the number of observations to ensure fair comparison. We find that $\Delta = 15$ is adequate for capturing the proximal impact of EDA and AI on the risk of a button press.

Table 6.

Normalized AIC and BIC for different choices of $\Delta$ using complete-case data. For $\Delta = 5$, $K_b = K_x = 31$ due to the amount of data in the shorter window length, while $K_b = K_x = 35$ for $\Delta = 15$ and 30.

        Δ = 5   Δ = 15   Δ = 30
AIC      0.96     0.98     0.95
BIC      0.99     1.01     0.99

In Section E.2 of the supplementary materials, Figures 1a and 1b present the per-patient average fraction of missing EDA and AI in 30-minute windows, respectively. For EDA, missingness ranges widely – from 0% to over 40% across individuals, with most between 5% and 30% average fraction missing. For AI, missingness is less pronounced – from 0% to 15% across individuals, with most between 0% and 10%.

We analyzed the activity index (AI) and electrodermal activity jointly in a multivariate model as in equation (8). To account for times when participants may not be wearing the device, and thus not be at risk for a button press, we limited the analysis to data collected between 9AM and 8PM. In the analysis, we assumed a constant baseline hazard and included a binary indicator of whether the participant had an event in the past 12 hours to account for the patient heterogeneity in the number of events seen in Figure 1. Figures 3(a) and 3(b) present the estimates from the joint analysis and their associated 95% confidence intervals using the bootstrap and multiple imputation strategy. We highlight in gray the statistically significant regions. Here, we see that standardized EDA is not associated with increased risk of a button press, while the activity index shows a positive association in the final few minutes prior to a button press.

Fig. 3. $\beta(t)$ for AI and EDA with 95% CI; the solid line is the point estimate for the complete-case analysis, while the highlighted region gives the model-based pointwise 95% confidence intervals.

7.2. Sensitivity analysis for window length

We next investigate whether the results are sensitive to the window length. Specifically, we re-analyzed the activity index (AI) and electrodermal activity jointly in a multivariate model as in equation (8) with $\Delta = 5$ and 30 minutes. Figures 4(a) and 4(b) present the estimates from the joint analysis for the activity index (AI) and their associated 95% confidence intervals using the bootstrap and multiple imputation strategy. We highlight in gray the statistically significant regions, which remain similar across all three choices of window length. We do not present results for standardized EDA, as it remains unassociated with increased risk of a button press.

Fig. 4. $\beta(t)$ for AI and EDA with 95% CI; the solid line is the point estimate for the missing-data analysis, while the highlighted region gives the bootstrap and MI based pointwise 95% confidence intervals.

8. Discussion

In this paper, we have presented a methodology for translating a difficult functional data analysis with recurrent events into a traditional logistic regression. The translation leveraged subsampling and weighting techniques – specifically the weights suggested by Waagepetersen (2008) – along with the flexible functional data analysis methods of Goldsmith et al. (2011) and the marginal covariance methods for longitudinal functional data of Park and Staicu (2015). The proposed methodology abides by the idea that we should make data as small as possible as fast as possible. Subsampling and weighting convert the problem to well-known territory, allowing us to leverage existing software. We show limited loss of efficiency when the subsampling is properly tuned to the event rates. Important extensions to an online sampling algorithm, optimal weighting when the Poisson point process assumption does not hold, and non-linear functional data methods are all considered important future work.

Supplementary Material

Supp 1

Table 3.

Mean integrated squared error (MISE), variance, and partial MISE as a function of $\hat\Delta$ and sampling rate when the true $\Delta = 32$ minutes.

                           Sampling rate (per hour)
$\hat\Delta$            4.0         2.0         1.0         0.5
26   MISE              0.390       0.395       0.397       0.413
     Variance       5.0 × 10−2  5.3 × 10−2  5.5 × 10−2  6.8 × 10−2
     P-MISE            0.210       0.216       0.217       0.235
29   MISE              0.220       0.225       0.224       0.238
     Variance       3.7 × 10−2  4.4 × 10−2  4.3 × 10−2  5.5 × 10−2
     P-MISE            0.084       0.090       0.088       0.103
32   MISE              0.061       0.062       0.061       0.072
     Variance       2.7 × 10−2  2.9 × 10−2  3.0 × 10−2  4.2 × 10−2
     P-MISE            0.061       0.062       0.061       0.072
35   MISE              0.246       0.247       0.249       0.255
     Variance       1.9 × 10−2  2.1 × 10−2  2.3 × 10−2  2.9 × 10−2
     P-MISE            0.154       0.154       0.154       0.160
37   MISE              0.360       0.359       0.353       0.361
     Variance       2.6 × 10−2  3.3 × 10−2  3.7 × 10−2  4.4 × 10−2
     P-MISE            0.256       0.255       0.252       0.259

References

  1. Andersen PK, Borgan O, Gill RD, and Keiding N. Statistical Models Based on Counting Processes. Springer, New York, 1993.
  2. Bai J, Di C, Xiao L, Evenson KR, LaCroix AZ, Crainiceanu CM, and Buchner DM. An activity index for raw accelerometry data and its comparison with other activity metrics. PLOS ONE, 11(8):1–14, 2016. doi: 10.1371/journal.pone.0160644.
  3. Baladandayuthapani V, Mallick BK, Young Hong M, Lupton JR, Turner ND, and Carroll RJ. Bayesian hierarchical spatially correlated functional data analysis with application to colon carcinogenesis. Biometrics, 64(1):64–73, 2008.
  4. Bartlett JW and Hughes RA. Bootstrap inference for multiple imputation under uncongeniality and misspecification. Statistical Methods in Medical Research, 2020.
  5. Caffo BS, Jank W, and Jones GL. Ascent-based Monte Carlo expectation–maximization. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):235–251, 2005. doi: 10.1111/j.1467-9868.2005.00499.x.
  6. Cassel C-M, Särndal C-E, and Wretman JH. Foundations of Inference in Survey Sampling. Wiley, New York, 1977.
  7. Chen K and Müller H-G. Modeling repeated functional observations. Journal of the American Statistical Association, 107(500):1599–1609, 2012.
  8. Crainiceanu C and Goldsmith J. Bayesian functional data analysis using WinBUGS. Journal of Statistical Software, 32:1–33, 2010.
  9. Crainiceanu CM, Staicu A-M, and Di C-Z. Generalized multilevel functional regression. Journal of the American Statistical Association, 104(488):1550–1561, 2009. doi: 10.1198/jasa.2009.tm08564.
  10. Cui E, Crainiceanu CM, and Leroux A. Additive functional Cox model. Journal of Computational and Graphical Statistics, 30(3):780–793, 2021. doi: 10.1080/10618600.2020.1853550.
  11. Di C-Z, Crainiceanu CM, Caffo BS, and Punjabi NM. Multilevel functional principal component analysis. Annals of Applied Statistics, 3(1):458–488, 2009.
  12. Dong J, Cao J, Gill J, Miles C, and Plumb T. Functional joint models for chronic kidney disease in kidney transplant recipients. Statistical Methods in Medical Research, 30(8):1932–1943, 2021. doi: 10.1177/09622802211009265.
  13. Free C, Phillips G, Galli L, Watson L, Felix L, Edwards P, Patel V, and Haines A. The effectiveness of mobile-health technology-based health behaviour change or disease management interventions for health care consumers: A systematic review. PLOS Medicine, 10(1):1–45, 2013.
  14. Garbarino M, Lai M, Bender D, Picard RW, and Tognetti S. Empatica E3 — a wearable wireless multi-sensor device for real-time computerized biofeedback and data acquisition. In 2014 4th International Conference on Wireless Mobile Communication and Healthcare (MOBIHEALTH), pages 39–42, 2014.
  15. Goldsmith J, Bobb J, Crainiceanu C, Caffo B, and Reich D. Penalized functional regression. Journal of Computational and Graphical Statistics, 20(4):830–851, 2011.
  16. Goldsmith J, Zipunnikov V, and Schrack J. Generalized multilevel function-on-scalar regression and principal component analysis. Biometrics, 71(2):344–353, 2015.
  17. Greven S, Crainiceanu C, Caffo B, and Reich D. Longitudinal functional principal component analysis. Electronic Journal of Statistics, 4:1022–1054, 2010.
  18. Hastie T, Tibshirani R, and Friedman J. The Elements of Statistical Learning. Springer Series in Statistics. Springer, 2009.
  19. Henderson R, Diggle P, and Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics, 1:465–480, 2000.
  20. James G. Generalized linear models with functional predictors. Journal of the Royal Statistical Society, Series B, 64:411–432, 2002.
  21. James G and Silverman B. Functional adaptive model estimation. Journal of the American Statistical Association, 100:565–576, 2005.
  22. Kim K, Şentürk D, and Li R. Recent history functional linear models for sparse longitudinal data. Journal of Statistical Planning and Inference, 141(4):1554–1566, 2011. doi: 10.1016/j.jspi.2010.11.003.
  23. Klasnja P, Smith S, Seewald N, Lee A, Hall K, Luers B, Hekler E, and Murphy SA. Efficacy of contextually tailored suggestions for physical activity: A micro-randomized optimization trial of HeartSteps. Annals of Behavioral Medicine, 53:573–582, 2019.
  24. Kleiman EM, Turner BJ, Fedor S, Beale EE, Picard RW, Huffman JC, and Nock MK. Digital phenotyping of suicidal thoughts. Depression and Anxiety, 35(7):601–608, 2018.
  25. Kokoszka P and Reimherr M. Introduction to Functional Data Analysis. Chapman and Hall/CRC, 1st edition, 2017. doi: 10.1201/9781315117416.
  26. Kong D, Ibrahim JG, Lee E, and Zhu H. FLCRM: Functional linear Cox regression model. Biometrics, 74(1):109–117, 2018. doi: 10.1111/biom.12748.
  27. Li C, Xiao L, and Luo S. Joint model for survival and multivariate sparse functional data with application to a study of Alzheimer's disease. Biometrics, 78(2):435–447, 2022. doi: 10.1111/biom.13427.
  28. Li Y and Guan Y. Functional principal component analysis of spatio-temporal point processes with applications in disease surveillance. Journal of the American Statistical Association, 109(507):1205–1215, 2014.
  29. Little RJA and Rubin DB. Statistical Analysis with Missing Data. Wiley, 2nd edition, 2002.
  30. Liu H, You J, and Cao J. Functional L-optimality subsampling for massive data, 2021. URL https://arxiv.org/abs/2104.03446.
  31. Marx BD and Eilers PH. Low-rank scale-invariant tensor product smooths for generalized additive mixed models. Biometrics, 62(4):1025–1036, 2006.
  32. McCulloch CE and Searle SR. Generalized, Linear, and Mixed Models. Wiley, New York, 2001.
  33. Meng XL. Multiple-imputation inferences with uncongenial sources of input (with discussion). Statistical Science, 10:538–573, 1994.
  34. Morris JS and Carroll RJ. Wavelet-based functional mixed models. Journal of the Royal Statistical Society, Series B, 68(2):179–199, 2006.
  35. Morris JS, Vannucci M, Brown PJ, and Carroll RJ. Wavelet-based nonparametric modeling of hierarchical functions in colon carcinogenesis. Journal of the American Statistical Association, 98(463):573–583, 2003.
  36. Müller H-G. Functional modelling and classification of longitudinal data. Scandinavian Journal of Statistics, 32:223–240, 2005.
  37. Park SY and Staicu A-M. Longitudinal functional data analysis. Stat, 4(1):212–226, 2015.
  38. Petrovich J, Reimherr M, and Daymont C. Functional regression models with highly irregular designs. 2018.
  39. Ramsay J and Silverman B. Functional Data Analysis. Springer, New York, 2005a.
  40. Ramsay JO and Silverman BW. Functional Data Analysis. Springer, New York, 2005b.
  41. Rathbun S. Optimal estimation of Poisson intensity with partially observed covariates. Biometrika, 100:277–281, 2012.
  42. Rathbun S and Shiffman S. Mixed effects models for recurrent events data with partially observed time-varying covariates: Ecological momentary assessment of smoking. Biometrics, 72:46–55, 2016.
  43. Reiss PT and Ogden RT. Functional principal component regression and functional partial least squares. Journal of the American Statistical Association, 102(479):984–996, 2007. doi: 10.1198/016214507000000527.
  44. Rizopoulos D. JM: An R package for the joint modeling of longitudinal and time-to-event data. Journal of Statistical Software, 35:1–33, 2010.
  45. Ruppert D. Selecting the number of knots for penalized splines. Journal of Computational and Graphical Statistics, 11(4):735–757, 2002.
  46. Ruppert D, Wand M, and Carroll R. Semiparametric Regression. Cambridge University Press, Cambridge, 2003.
  47. Shao J and Sitter RR. Bootstrap for imputed survey data. Journal of the American Statistical Association, 91(435):1278–1288, 1996.
  48. Spring B. Sense2Stop: Mobile sensor data to knowledge. https://clinicaltrials.gov/ct2/show/NCT03184389, 2019.
  49. Staicu A-M, Crainiceanu CM, and Carroll RJ. Fast methods for spatially correlated multilevel functional data. Biostatistics, 11(2):177–194, 2010.
  50. Tsiatis AA and Davidian M. Joint modeling of longitudinal and time-to-event data: an overview. Statistica Sinica, 14:809–834, 2004.
  51. Waagepetersen R. Estimating functions for inhomogeneous spatial point processes with incomplete covariate data. Biometrika, 95(2):351–363, 2008.
  52. Wood S. Generalized Additive Models: An Introduction with R. Chapman & Hall, London, 2003.
  53. Wood SN. Low-rank scale-invariant tensor product smooths for generalized additive mixed models. Biometrics, 62(4):1025–1036, 2006.
  54. Xiao L, Li Y, and Ruppert D. Fast bivariate P-splines: the sandwich smoother. Journal of the Royal Statistical Society: Series B, 75(3):577–599, 2013.
  55. Yao F, Müller H-G, and Wang J-L. Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association, 100(470):577–590, 2005.
