Published in final edited form as: J Comput Graph Stat. 2023 Dec 15;33(2):525–537. doi: 10.1080/10618600.2023.2276114

Recurrent event analysis in the presence of real-time high frequency data via random subsampling

Walter Dempsey 1,*

Abstract

Digital monitoring studies collect real-time, high-frequency data via mobile sensors in subjects' natural environments. These data can be used to model the impact of changes in physiology on recurrent event outcomes such as smoking, drug use, alcohol use, or self-identified moments of suicidal ideation. Likelihood calculations for the recurrent event analysis, however, become computationally prohibitive in this setting. Motivated by this, a random subsampling framework is proposed for computationally efficient, approximate likelihood-based estimation. A subsampling-unbiased estimator for the derivative of the cumulative hazard enters into an approximation of the log-likelihood. The estimator has two sources of variation: the first due to the recurrent event model and the second due to subsampling. The latter can be reduced by increasing the sampling rate; however, this leads to increased computational costs. The approximate score equations are equivalent to logistic regression score equations, allowing standard, "off-the-shelf" software to be used in fitting these models. Simulations demonstrate the method and the efficiency-computation trade-off. We end by illustrating our approach using data from a digital monitoring study of suicidal ideation.

Keywords: recurrent events, probabilistic subsampling, estimating equations, high frequency time series, logistic regression

1. Introduction

Advancement in mobile technology has led to the rapid integration of mobile and wearable sensors into behavioral health (Free et al., 2013). Take HeartSteps, for example, a mobile health (mHealth) study designed to increase physical activity in sedentary adults (Klasnja et al., 2019). Here, a Jawbone sensor is used to monitor step count every minute of the participant's study day. Of interest in many mHealth studies is the relation of such real-time high frequency sensor data to an adverse, recurrent event process. In a smoking cessation mHealth study (Spring, 2019), for example, the relation between a time-varying sensor-based measure of physiological stress and smoking lapse is of scientific interest. In a suicidal ideation mHealth study (Kleiman et al., 2018), the relation of electrodermal activity (EDA) and accelerometer data to self-identified moments of suicidal ideation is of scientific interest.

The goal of this paper is to construct a simple, easy-to-implement method for parameter estimation and inference. To do so, we introduce a random subsampling procedure that has several benefits. First, the resulting inference is unbiased; however, there is a computation-efficiency trade-off: a higher sampling rate decreases estimator variance at the cost of increased computation. We show via simulations that the benefit of very high sampling rates is often negligible, as the contribution to the variation is small relative to the variation in the underlying stochastic processes. Second, the derived estimating equations are optimal, implying that any loss of statistical efficiency is due only to the subsampling procedure and not to the derived methodology. Finally, implementation can leverage existing, standard software for functional data analysis and logistic regression, enabling fast adoption by domain scientists.

1.1. Related work

The use of wearable devices to passively monitor patients has led to a rapid increase in the number of studies with high-frequency sensors, which can be conceptualized as studies whose measurements are functions. The development of such applications has been accompanied by intense methodological development in regression models with functional covariates (James, 2002; James and Silverman, 2005; Muller, 2005; Ramsay and Silverman, 2005b; Kokoszka and Reimherr, 2017; Crainiceanu et al., 2009; Reiss and Ogden, 2007).

Recently, modeling time-to-event data with functional covariates has received considerable attention. The functional linear Cox regression model (FLCRM) considers the association between a time-to-event outcome and a set of functional and scalar predictors (Kong et al., 2018). In this setting, the functional and scalar predictors are observed at baseline, and the hazard function satisfies the proportional hazards assumption, with a non-parametric baseline hazard and an exponential adjustment that includes both a functional linear model and a linear model for the baseline scalar predictors. Since the linearity assumption may be too restrictive, Cui et al. (2021) consider a more flexible additive functional Cox model for multivariate functional predictors in which the hazard depends on an unspecified, twice-differentiable bivariate function.

In our setting, the high frequency sensor process is an outcome and must therefore be jointly modeled with the recurrent event process. Recently, Li et al. (2022) proposed a functional approach to joint modeling of multivariate longitudinal biomarkers and a time-to-event outcome (Tsiatis and Davidian, 2004; Rizopoulos, 2010). Here, as in traditional joint models, the longitudinal biomarker is observed at only a few observation times (typically on the order of tens of observations) and is modeled via functional principal components analysis. The eigenscores are the shared parameters across the longitudinal and survival submodels. A similar joint model was proposed by Dong et al. (2021). Inference proceeds by Monte Carlo EM, which can require very large Monte Carlo sample sizes to ensure good performance, and naïve implementation may not preserve EM's ascent property (Caffo et al., 2005).

In this paper, we also consider joint modeling of recurrent events and a high frequency sensor process. In contrast to traditional longitudinal biomarkers, sensor processes are observed multiple times per second, leading to thousands or even millions of observations per individual. To address this increase in scale, a design-based perspective is taken in which we subsample non-event times to avoid some of the computationally demanding aspects of joint models. Subsampling in the context of functional data analysis for massive data has recently been investigated (Liu et al., 2021), where a functional L-optimality criterion is derived. Subsampling is different in our context, as our goal is to subsample from the individual sensor processes to produce a design-unbiased estimate of the cumulative hazard function. Finally, traditional joint models often let the hazard function depend on the current or cumulative value of the health process (Li et al., 2022; Rizopoulos, 2010). For high frequency sensor processes, we expect the hazard function to depend only on the recent sensor history and use recent-history functional linear models (Kim et al., 2011) to account for this local dependence. We combine these models with advances in longitudinal functional data analysis (Staicu et al., 2010) and penalized functional regression (Goldsmith et al., 2015) to perform scalable inference using off-the-shelf software.

2. Main contributions and outline

The main contributions of this paper are as follows. We define a joint model using historical functional linear models and demonstrate how techniques from design-based inference can be used to circumvent the computational challenges of joint modeling in Section 3. We then discuss longitudinal functional principal components analysis and demonstrate in Lemma 4.1 that the resulting approximate score equations are equivalent to the score functions for a logistic regression with binary response and an offset related to the subsampling rate, meaning our proposed approach can be fit using off-the-shelf software. An asymptotic analysis is presented in Section 4.4, with an accompanying discussion of the novel computational versus statistical efficiency trade-off in Section 4.5. A simulation study in Section 5 demonstrates the impact of subsampling rates, the window length parameter, and a computational comparison to existing methods. A multivariate extension and a method to handle missing data are presented in Section 6. We end by illustrating our approach using data from a digital monitoring study of suicidal ideation in Section 7.

3. Recurrent event process and associated high frequency data

Suppose $n$ subjects are independently sampled with observed event times $T_i = \{T_{i,1}, \ldots, T_{i,k_i}\}$ over an observation window $[0, \tau_i]$ for each subject $i = 1, \ldots, n$. Assume the event times are ordered, i.e., $T_{i,j} < T_{i,j'}$ for $j < j'$. The observation window length, $\tau_i$, is the censoring time and is assumed independent of the event process. Let $N_i(t)$ denote the counting process associated with $T_i$; that is, $N_i(t) = \sum_{j=1}^{k_i} 1[T_{i,j} < t]$. In this section, we assume a single-dimensional health process $x_i = \{x_i(s)\}_{0 \le s < \tau_i}$ for each participant is measured at a dense grid of time points. Accelerometer data, for example, are measured at a rate of 32Hz (i.e., 32 times per second). Electrodermal activity (EDA), on the other hand, is measured at a rate of 4Hz (i.e., 4 times per second). Given the high frequency nature of sensor data, this paper assumes the process is measured continuously. See Appendix A for a notation glossary.

Let $\mathcal{H}_{i,t}^{NX} = \mathcal{H}_{i,t}^{N} \vee \mathcal{H}_{i,t}^{X}$ be the $\sigma$-field generated by all past values $\{N_i(s), x_i(s)\}_{0 \le s \le t}$. In this paper, the instantaneous risk of an event at time $t$ is assumed to depend on the health process, time-in-study, and the event history through a fully parametric conditional hazard function:

$$h_i(t \mid \mathcal{H}_{i,t}^{NX}; \theta) = \lim_{\delta \downarrow 0} \delta^{-1}\, \mathrm{pr}\left(N_i(t+\delta) - N_i(t) > 0 \mid \mathcal{H}_{i,t}^{NX}\right), \quad (1)$$

where θ is the parameter vector. For high frequency physiological data, we assume that current risk is log-additive and depends on a linear functional of the health process over some recent window of time and pre-specified features of the counting process; that is,

$$h_i(t \mid \mathcal{H}_{i,t}^{NX}; \theta) = h_0(t; \gamma)\exp\left(g_t(\mathcal{H}_{i,t}^{N})^\top \alpha + \int_{t-\Delta}^{t} x_i(s)\,\beta(s)\,ds\right) \quad (2)$$

where $h_0(t;\gamma)$ is a parametrized baseline hazard function, $\Delta$ is an unknown window length, and $g_t(\mathcal{H}_{i,t}^{N}) \in \mathbb{R}^p$ is a $p$-length feature vector summarizing the event-history and time-in-study information. The final term $\int_{t-\Delta}^{t} x_i(s)\beta(s)\,ds$ reflects the unknown linear functional form of the impact of the time-varying covariate on current risk.

An alternative to (2) would be to construct features $f_t(\mathcal{H}_{i,t}^{X}) \in \mathbb{R}^q$ from the sensor data history and incorporate these features in place of the final term. Our approach instead builds linear features of $\mathcal{H}_{i,t}^{X}$ directly from the integrated history, avoiding the feature construction problem – a highly nontrivial issue for high frequency time-series data. The main caveat is the additional parameter $\Delta$; however, as long as the estimate $\hat\Delta$ exceeds the true $\Delta$, the resulting estimation is unbiased, albeit at a loss of efficiency. Moreover, sensitivity analysis can be performed to determine how the choice of $\hat\Delta$ affects inference. One limitation of the approach presented here is that only fully parametric hazard models may be fit to the data. However, a spline model for the log baseline hazard affords sufficient model flexibility.

3.1. Likelihood calculation

For notational simplicity, we leave the dependence of the conditional hazard function on $\mathcal{H}_{i,t}^{NX}$ implicit and write $h_i(t;\theta)$. In our setting, we assume the health process $\{x_i(s)\}_{0 \le s < \tau_i}$ is directly observed. Therefore, we can consider the event process $T_i$ conditional on the health process $X_i$, so that the log-likelihood for the event process is given by

$$L_n(\theta) = \sum_{i=1}^{n}\left(\sum_{j=1}^{k_i} \log h_i(T_{i,j};\theta) - H_i(\tau_i;\theta)\right)$$

where $H_i(\tau_i;\theta) = \int_0^{\tau_i} h_i(t;\theta)\,dt$ is the cumulative hazard function. See Appendix B for additional arguments in favor of our proposed approach. Solving the associated score equations $U_n(\theta) = 0$ yields the maximum likelihood estimator $\hat\theta$, where

$$U_n(\theta) = \sum_{i=1}^{n}\left(\sum_{j=1}^{k_i} \frac{h_i^{(1)}(T_{i,j};\theta)}{h_i(T_{i,j};\theta)} - H_i^{(1)}(\tau_i;\theta)\right),$$

where $h_i^{(1)}(T_{i,j};\theta)$ and $H_i^{(1)}(\tau_i;\theta)$ denote derivatives with respect to $\theta$.

In classical joint models (Henderson et al., 2000; Tsiatis and Davidian, 2004), time-varying covariates $x_i(t)$ are observed only intermittently at appointment times. In our setting, maximizing the likelihood is computationally prohibitive: for any $\theta$ we must compute the cumulative hazard functions $H_i(\tau_i;\theta)$, which requires integration of $h_i(t;\theta)$ given by (2), which itself depends on the integral $\int_{t-\Delta}^{t} x_i(s)\beta(s)\,ds$, a function of the unknown functional parameter $\beta(\cdot)$. That is, the risk model depends on an integrated past history of the time-varying covariate, which leads to a severe increase in computational complexity.

To circumvent these computational difficulties, we derive approximate score equations based on design-based inference for point processes. Design-based inference is common for spatial point processes (Waagepetersen, 2008), where a spatially varying covariate is observed at a random sample of locations. It is also common in mobile health, where ecological momentary assessments (Rathbun, 2012; Rathbun and Shiffman, 2016) randomly sample individuals at various time-points to assess their emotional state. In the current setting, we leverage these ideas to form a subsampling protocol that can substantially reduce computational complexity; the purpose is therefore quite different. Moreover, the dependence of the intensity function on the recent history of sensor values leads to additional complications that must be addressed.

3.2. Probabilistic subsampling framework

To solve the computational challenge, we employ a point-process subsampling design to obtain unbiased estimates of the derivative of the cumulative hazard for each subject. The subsampling procedure treats the collected sensor data as a set of potential observations. Suppose covariate information is sampled at times drawn from an independent inhomogeneous Poisson point process with known intensity $\pi_i(t)$. At a subsampled time $t$, the windowed covariate history $\{x_i(t-s)\}_{0 \le s \le \Delta}$ and the counting process features $g_t(\mathcal{H}_{i,t}^{N})$ are observed. Optimal choice of $\pi_i(t)$ is beyond the scope of this paper; however, simulation studies have suggested setting the subsampling rate proportional to the hazard function $h_i(t;\theta)$.
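To make the sampling step concrete, the following sketch draws subsample times from an inhomogeneous Poisson process via Lewis–Shedler thinning; the intensity `pi_t`, its upper bound `pi_max`, and the window `tau` are illustrative choices rather than quantities fixed by the paper:

```r
# Sketch: draw the subsample-time set D_i on [0, tau] from an inhomogeneous
# Poisson process with intensity pi_t via Lewis-Shedler thinning.
sample_subsample_times <- function(pi_t, tau, pi_max) {
  # Step 1: candidates from a homogeneous Poisson process with rate pi_max
  n_cand <- rpois(1, pi_max * tau)
  cand <- sort(runif(n_cand, 0, tau))
  # Step 2: retain each candidate t with probability pi_t(t) / pi_max
  cand[runif(n_cand) < pi_t(cand) / pi_max]
}

# Example: roughly 4 subsamples per hour, modulated by a diurnal pattern
pi_t <- function(t) 4 * (1 + 0.5 * sin(2 * pi * t / 24))
D_i  <- sample_subsample_times(pi_t, tau = 24, pi_max = 6)
```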

An estimator is design-unbiased if its expectation equals the target parameter under the probability distribution induced by the sampling design (Cassel et al., 1977). Let $D_i \subset [0,\tau_i]$ denote the random set of subsampled points. Note, by construction, this random set is distinct from the set of event times with probability one, i.e., $\mathrm{pr}(T_i \cap D_i = \emptyset) = 1$. Under subsampling via $\pi_i(t)$, one may compute a Horvitz-Thompson estimator of the derivative of the cumulative hazard, $\hat H_i^{(1)}(\tau_i;\theta) = \sum_{u \in D_i} h_i^{(1)}(u;\theta)/\pi_i(u)$. An alternative design-unbiased estimator of the derivative of the cumulative hazard is given by

$$\hat H_i^{(1)}(\tau_i;\theta) = \sum_{u \in (T_i \cup D_i)} \frac{h_i^{(1)}(u;\theta)}{\pi_i(u) + h_i(u;\theta)} \quad (3)$$

Equation (3) is the estimator suggested by Waagepetersen (2008). This estimator depends on the superposition of the event and subsampling processes. Lemma 4.7 shows the estimator for θ associated with using (3) is the most efficient within a suitable class of estimators for the derivative of the cumulative hazard function (including the Horvitz-Thompson estimator). Therefore, we restrict our attention to (3) for the remainder of this paper. Letting

$$w_i(t;\theta) = \frac{\pi_i(t)}{\pi_i(t) + h_i(t;\theta)}, \quad (4)$$

the resulting approximate estimating equations can be re-written as

$$\hat U_n(\theta) = \sum_{i=1}^{n}\left[\sum_{u \in T_i} w_i(u;\theta)\frac{h_i^{(1)}(u;\theta)}{h_i(u;\theta)} - \sum_{u \in D_i} w_i(u;\theta)\frac{h_i^{(1)}(u;\theta)}{\pi_i(u)}\right]. \quad (5)$$

Equation (5) represents the approximate score functions built via plug-in of the design-unbiased estimator of the derivative of the cumulative hazard given in (3).

4. Longitudinal functional principal components within event-history analysis

Probabilistic subsampling converts the single sensor stream $x_i$ into a sequence of functions observed repeatedly at subsampled times $D_i$ and event times $T_i$ over windows of length $\Delta$. Such a data structure is commonly referred to as longitudinal functional data (Xiao et al., 2013; Goldsmith et al., 2015). Given the large increase in longitudinal functional data in recent years, the corresponding analysis has received much recent attention (Morris et al., 2003; Morris and Carroll, 2006; Baladandayuthapani et al., 2008; Di et al., 2009; Greven et al., 2010; Staicu et al., 2010; Chen and Müller, 2012; Li and Guan, 2014). Here, we combine work by Park and Staicu (2015) and Goldsmith et al. (2011) to construct a computationally efficient penalized functional method for solving the estimating equations $\hat U_n(\theta)$.

4.1. Estimation of the windowed covariate history

We start by defining $X(t,s) = x(t-s)$ to be the sensor measurement $0 \le s \le \Delta$ time units prior to time $t \in T_i \cup D_i$. The sandwich smoother (Xiao et al., 2013) is used to estimate the mean $\mu_y(t,s) = \mathbb{E}_y[X(t,s)]$, where the expectation is indexed by whether $t$ is an event ($y=1$) or subsampled ($y=0$) time respectively. Alternative bivariate smoothers exist, such as the kernel-based local linear smoother (Hastie et al., 2009), bivariate tensor product splines (Wood, 2006), and the bivariate penalized spline smoother (Marx and Eilers, 2006). The sandwich smoother was chosen for its computational efficiency and estimation accuracy. We then define $\tilde X(t,s) = X(t,s) - \hat\mu_y(t,s)$ to be the mean-zero process at each time $t \in T_i \cup D_i$.
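As a minimal sketch of this step, assuming the sandwich smoother implementation `fbps` in the R package refund (with fitted values returned in `$Yhat`), the mean surface can be estimated from the matrix of windowed histories; `X_mat` is a hypothetical input whose rows index times $t$ in $T_i \cup D_i$ for group $y$ and whose columns index lags $s$ on a grid over $[0,\Delta]$:

```r
# Sketch: estimate mu_y(t, s) by bivariate smoothing of the windowed histories.
# Assumes refund::fbps implements the Xiao et al. (2013) sandwich smoother.
library(refund)

fit     <- fbps(X_mat)     # X_mat[t, r] = X(t, s_r) for group y
mu_hat  <- fit$Yhat        # smoothed estimate of mu_y on the (t, s) grid
X_tilde <- X_mat - mu_hat  # mean-zero windowed histories X~(t, s)
```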

As in Park and Staicu (2015), define the marginal covariance by

$$\Sigma_y(s,s') = \int_0^{\tau} c_y\{(T,s),(T,s')\}\, f_y(T)\,dT$$

for $0 \le s, s' \le \Delta$, where $c_y\{(t,s),(t,s')\}$ is the covariance function of the windowed covariate history $X(t,\cdot)$, $T$ denotes a generic observation time of the process, and $f_y(T)$ is the intensity function for event ($y=1$) and subsampled ($y=0$) times respectively. Estimation of $\Sigma_y$ occurs in two steps. For simplicity, we present the steps for subsampled times (i.e., $y=0$), but the steps are the same for event times. First, the pooled sample covariance is calculated at a set of grid points:

$$\tilde\Sigma_0(s_r, s_{r'}) = \left(\sum_{i=1}^{n} |D_i|\right)^{-1}\left(\sum_{i=1}^{n}\sum_{t \in D_i} \tilde X(t, s_r)\,\tilde X(t, s_{r'})\right).$$

As we assume the health process $x_i$ is directly observed, the diagonal elements of $\tilde\Sigma_0$ are not inflated. Second, the pooled sample covariance is smoothed using the sandwich smoother (Xiao et al., 2013). Note Park and Staicu (2015) smooth only the off-diagonal elements, while here we smooth the entire pooled sample covariance matrix. All negative eigenvalues are set to zero to ensure positive semi-definiteness. The result is the estimator $\hat\Sigma_0$ of the pooled covariance $\Sigma_0$.

Next, we take the spectral decomposition of the estimated covariance function; let $\{\hat\psi_k^{(0)}(s), \hat\lambda_k^{(0)}\}_{k \ge 1}$ be the resulting sequence of eigenfunctions and eigenvalues. The key benefit of the marginal covariance approach is that it allows us to compute a single, time-invariant basis expansion; this reduces the computational burden by avoiding the three-dimensional covariance function (i.e., a covariance that depends on $t$) and the associated spectral decomposition in methods considered by Chen and Müller (2012). Using the Karhunen-Loève decomposition, we can represent $X(t,s)$ for $t \in T_i \cup D_i$ by

$$X(t,s) = \hat\mu_y(t,s) + \sum_{k=1}^{\infty} \hat c_{i,k}^{(y)}(t)\,\hat\psi_k^{(y)}(s) \approx \hat\mu_y(t,s) + c_i^{(y)}(t)^\top \hat\psi^{(y)}(s)$$

where

$$\hat c_{i,k}^{(y)}(t) = \int_{t-\Delta}^{t} \tilde X_i(t,s)\,\hat\psi_k^{(y)}(s)\,ds, \quad c_i^{(y)}(t) = \left(c_{i,1}^{(y)}(t), \ldots, c_{i,K_x}^{(y)}(t)\right), \quad \hat\psi^{(y)}(s) = \left(\hat\psi_1^{(y)}(s), \ldots, \hat\psi_{K_x}^{(y)}(s)\right),$$

and $K_x < \infty$ is the truncation level of the infinite expansion. Following Goldsmith et al. (2011), we set $K_x$ to satisfy identifiability constraints (see Section 4.2 for details). In subsequent sections, we leave the dependence on $y$ (i.e., whether $t \in T_i$ or $D_i$) implicit for notational simplicity unless required.
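The covariance and score computations above reduce to a few matrix operations on the lag grid. The sketch below, with illustrative objects (`X_tilde` pooled over $i$ and $t \in D_i$ from the previous sketch, and lag grid `s_grid`), computes the pooled covariance, its truncated eigendecomposition, and the scores via Riemann-sum quadrature; the additional sandwich-smoothing of the covariance is noted but omitted:

```r
# Sketch: pooled covariance, eigendecomposition, and KL scores on the lag grid.
R  <- length(s_grid)
ds <- s_grid[2] - s_grid[1]                   # grid spacing for quadrature

Sigma0 <- crossprod(X_tilde) / nrow(X_tilde)  # (R x R) pooled sample covariance
# (In the paper this matrix is additionally smoothed with the sandwich smoother.)

eig <- eigen(Sigma0, symmetric = TRUE)
eig$values <- pmax(eig$values, 0)             # zero out negative eigenvalues

Kx <- 35                                      # truncation level
psi_hat <- eig$vectors[, 1:Kx] / sqrt(ds)     # eigenfunctions, scaled so that
                                              # sum(psi_k^2) * ds = 1
c_hat <- X_tilde %*% psi_hat * ds             # (n x Kx) scores c_hat_{i,k}(t)
```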

4.2. Estimation of β(s)

The next step of our method is modeling β(s). Here, we leverage ideas from the penalized spline literature (Ruppert et al., 2003; Wood, 2003).

Let $\phi(s) = (\phi_1(s), \ldots, \phi_{K_b}(s))$ be a spline basis and assume that $\beta(s) = \sum_{j=1}^{K_b} b_j \phi_j(s) = \phi(s)^\top b$ where $b = (b_1, \ldots, b_{K_b})$. Thus, the integral in (2) can be restated as

$$\int_{t-\Delta}^{t} X(t,s)\beta(s)\,ds \approx \int_{t-\Delta}^{t}\left[\hat\mu(t,s) + c(t)^\top \hat\psi(s)\right]\left[\phi(s)^\top b\right]ds = \left[M_{i,t} + c(t)^\top J_{\hat\psi,\phi}\right]b$$

where $M_t = (M_{1,t}, \ldots, M_{K_b,t})$, $M_{j,t} = \int_{t-\Delta}^{t} \hat\mu(t,s)\phi_j(s)\,ds$, and $J_{\hat\psi,\phi}$ is a $K_x \times K_b$ dimensional matrix whose $(k,l)$th entry equals $\int_0^{\Delta} \hat\psi_k(s)\phi_l(s)\,ds$ (Ramsay and Silverman, 2005a).
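Both $M_{j,t}$ and the entries of $J_{\hat\psi,\phi}$ are one-dimensional integrals over the lag grid and can be formed by simple quadrature. A minimal sketch, using `splines::bs` as one (hypothetical) choice of basis, with `psi_hat`, `s_grid`, and `ds` from the previous sketch and `mu_hat_t` the length-$R$ vector $\hat\mu(t,\cdot)$ on the grid:

```r
# Sketch: build the spline basis and form J_{psi,phi} and M_t by quadrature.
library(splines)
Kb <- 35
phi_basis <- bs(s_grid, df = Kb, intercept = TRUE)  # (R x Kb) basis at s_grid

J   <- t(psi_hat) %*% phi_basis * ds                # (Kx x Kb): int psi_k phi_l ds
M_t <- as.vector(t(mu_hat_t) %*% phi_basis) * ds    # length Kb: int mu(t,s) phi_j ds
# Linearized functional term for score vector c_t: (M_t + c_t %*% J) %*% b
```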

Given the basis for $\beta(t)$, the model depends on the choice of both $K_b$ and $K_x$. We follow Ruppert (2002) by choosing $K_b$ large enough to prevent under-smoothing and $K_x \ge K_b$ to satisfy identifiability constraints. While our theoretical analysis considers truncation levels that depend on $n$, in practice we follow the simple rule of thumb and set $K_b = K_x = 35$. As long as the choices of $K_x$ and $K_b$ are large enough, their impact on estimation is typically negligible. Below, we exploit a connection between (5) and the score equations for a logistic regression model. Before moving on, we introduce some additional notation. Define

$$h_i(t \mid \mathcal{H}_{i,t}^{NX};\theta) \approx \exp\left(Z_t^\top\gamma + g_t(\mathcal{H}_{i,t}^{N})^\top\alpha + M_{i,t}b + C_{i,t}J_{\hat\psi,\phi}b\right) = \exp\left(W_{i,t}^\top\theta\right), \quad (6)$$

where $\theta = (\gamma, \alpha, b)$ and $\exp(Z_t^\top\gamma) =: h_0(t)$ is the parameterized baseline intensity function. We write $\tilde U_n(\theta)$ to denote the approximate score function when substituting (6) for (2).

4.3. Connection to logistic score functions

We next establish a connection between the approximate score equations $\tilde U_n(\theta)$ and the score equations for a logistic regression model. We can then exploit this connection to fit the model robustly using standard mixed effects software (Ruppert, 2002; McCulloch and Searle, 2001).

Lemma 4.1.

Under weights (4) and the log-linear intensity function (6), the approximate score function $\tilde U_n(\theta)$ is equivalent to

$$\sum_{i=1}^{n}\sum_{t \in T_i \cup D_i}\left[1[t \notin D_i] - \frac{1}{1+\exp\left\{-\left(\tilde W_{i,t}^\top\theta - \log\pi_i(t)\right)\right\}}\right]\tilde W_{i,t}$$

where $\tilde W_{i,t} = W_{i,t}$. This is the score function for a logistic regression with binary response $Y_i(t)$ for $t \in T_i \cup D_i$ and $i \in [n]$, where $Y_i(t) = 1[t \in T_i]$, offset $-\log\pi_i(t)$, and covariates $\tilde W_{i,t}$.

The connection established by Lemma 4.1 between our proposed methodology and logistic regression allows us to leverage "off-the-shelf" software. The main complication is pre-processing of the functional data; however, these additional steps can also be handled by existing software. Therefore, the entire data-analytic pipeline is easy to implement and requires minimal additional effort from the end-user. To see this, we briefly review the proposed inference procedure.

Remark 4.2 (Inference procedure review).

Given observed recurrent event and high frequency data $\{T_i, x_i\}_{i=1}^{n}$:

  1. For each $i \in [n]$, sample non-event times from a time-inhomogeneous Poisson point process with intensity $\pi_i(t)$.

  2. Estimate the mean $\mu_y(t,s)$ for $0 \le s \le \Delta$ at all event times $t \in \cup_{i=1}^{n} T_i$ and sampled non-event times $t \in \cup_{i=1}^{n} D_i$.

  3. Compute the marginal covariance across event times, $\Sigma_1$, and non-event times, $\Sigma_0$.

  4. Compute the eigendecomposition $\{\hat\psi_k^{(y)}, \hat\lambda_k^{(y)}\}$ of the marginal covariance $\Sigma_y$.

  5. Use the eigendecomposition to construct $W_{i,t}$ for all $i \in [n]$ and $t \in D_i \cup T_i$.

  6. Perform logistic regression with binary outcome $Y_i(t)$ and offset $-\log\pi_i(t)$, as in the sketch below.
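A minimal sketch of step 6 in R: assuming a stacked design matrix `W` whose rows index times in $\cup_i (T_i \cup D_i)$ and whose columns are laid out as in (6), a binary response `y` equal to 1 at event times, and the subsampling intensity `pi_vec` evaluated at those times (all illustrative names), the fit is a single call to `glm`:

```r
# Sketch: Lemma 4.1 reduces the fit to logistic regression with an offset.
fit <- glm(y ~ W - 1, family = binomial(), offset = -log(pi_vec))

theta_hat <- coef(fit)   # estimates of (gamma, alpha, b)
vcov_hat  <- vcov(fit)   # inverse Fisher information; compare equation (7)
```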

Before demonstrating the methodology via simulation in Section 5 and a worked example in Section 7, we provide a theoretical analysis of our current proposal.

4.4. Theoretical analysis

Our theoretical analysis requires assumptions about the subsampling procedure, the event process, and the functional data. We state these assumptions and then our main results. We start by assuming there exists $\tau < \infty$ such that all individuals are no longer at risk after $\tau$ (i.e., $\tau_i < \tau$ for all $i$). Moreover, define $R_i(t)$ to be the at-risk indicator for participant $i$, i.e., $R_i(t) = 1[t \in [0,\tau_i]]$. The asymptotic theory in Lemma 4.6 is proven under regularity conditions A-E of Andersen et al. (1993, pp. 420–421) along with the following additional assumptions:

Assumption 4.3 (Event process assumptions).

We assume the following holds:

  • (E.1) The subsampling rate is both lower and upper bounded at all at-risk times; that is, $0 < L < \pi_i(t) < U < \infty$ for all $i = 1, 2, \ldots$ and $t \in [0,\tau]$ such that $R_i(t) = 1$.

  • (E.2) There exists a nonnegative definite matrix $\Xi(\theta)$ such that
    $$n^{-1}\Xi_n(\theta) = n^{-1}\sum_{i=1}^{n}\int_0^{\tau} w_i(t;\theta)\left[\frac{h_i^{(1)}(t;\theta)\,h_i^{(1)}(t;\theta)^\top}{h_i(t;\theta)}\right]R_i(t)\,dt \overset{P}{\to} \Xi(\theta).$$
  • (E.3) There exists $M$ such that $|W_{i,j,t}| < M$ for all $(i,j,t)$.

  • (E.4) For all $j$, $k$,
    $$n^{-1}\sum_{i=1}^{n}\int_0^{\tau}\left|\frac{d^2}{d\theta_j\,d\theta_k}h_i(t;\theta_0)\right|^2 R_i(t)\,dt \overset{P}{\to} C < \infty$$
    as $n \to \infty$.

We also require several assumptions due to the truncation of the Karhunen-Loève decomposition that represents X(t,s).

Assumption 4.4 (Functional assumptions (Park and Staicu, 2015)).

The following assumptions are standard in prior work on longitudinal functional data analysis (Park and Staicu, 2015; Yao et al., 2005; Chen and Müller, 2012):

  • (A.1) $X = \{X(t,s) : (t,s) \in \mathcal{T}\times\mathcal{S}\}$ is a square integrable element of $L^2(\mathcal{T}\times\mathcal{S})$.

  • (A.2) The subsampling and conditional intensity rate functions $f_y(T)$ are continuous and $\sup_T f_y(T) < \infty$.

  • (A.3) $\mathbb{E}[X(t,s)X(t,s')X(t',s)X(t',s')] < \infty$ for each $s,s' \in [0,\Delta]$ and $0 < t, t' < \tau$.

  • (A.4) $\mathbb{E}[\|X(t,\cdot)\|^4] < \infty$ for each $0 < t < \tau$.

Finally, for simplicity, we assume there exists $b$ such that $\beta(t) = \phi(t)^\top b$; that is, the true function $\beta(t)$ lies in the span of the spline basis expansion.

Remark 4.5 (Practical consequences of Assumptions 4.3 and 4.4).

Assumptions 4.3 and 4.4 contain as a special case the scenario where individuals are independent and identically distributed, the functional process is bounded (i.e., $|X(t,s)| < M$ for some $M < \infty$), and the subsampling rate is both lower and upper bounded at all at-risk times. As such bounds are likely to hold in most practical settings, this demonstrates the reasonableness of our assumptions for applied settings.

Lemma 4.6.

Under Assumptions 4.3 and 4.4 and with $\Delta$ known, the estimator $\hat\theta_n$ is consistent for large $n$; moreover,

$$\sqrt{n}\left(\hat\theta - \theta\right) \overset{D}{\to} N\left(0, \Xi(\theta)^{-1}\right),$$

where $\overset{D}{\to}$ denotes convergence in distribution and

$$\Xi(\theta) = \mathbb{E}\left[\int_0^{\tau} w(s;\theta)\,\frac{h^{(1)}(s;\theta)\,h^{(1)}(s;\theta)^\top}{h(s;\theta)}\,ds\right],$$

where the expectation is with respect to the random censoring time $\tau$ of the event process.

Proof of Lemma 4.6 is presented in Appendix C.2. A design-unbiased estimator for Ξ(θ) is

$$\hat\Xi(\theta) = n^{-1}\sum_{i=1}^{n}\sum_{t \in T_i \cup D_i} w_i(t;\theta)\left(1 - w_i(t;\theta)\right)\left[\frac{h_i^{(1)}(t;\theta)}{h_i(t;\theta)}\right]\left[\frac{h_i^{(1)}(t;\theta)}{h_i(t;\theta)}\right]^\top \quad (7)$$

For the log-linear intensity model, the sampling-unbiased estimator $\hat\Xi(\theta)$ is equivalent to the Fisher information of the previously described logistic regression model. This implies that, when subsampling from an inhomogeneous Poisson process, standard logistic regression software can be used to fit the recurrent event model by specifying an offset equal to $-\log\pi_i(t)$. Based on this, we can leverage existing inferential machinery to obtain variance-covariance estimates of model parameters.

That is, if $\hat\Sigma_{bb}$ is the $K_b \times K_b$ dimensional matrix obtained by plugging the estimates of the variance components into the formula for the variance of $\hat b$, then the standard error of the estimate at time $t_0$ – i.e., $\hat\beta(t_0) = \phi(t_0)^\top \hat b$ – is given by $\sqrt{\phi(t_0)^\top \hat\Sigma_{bb}\,\phi(t_0)}$. The approximate 95% confidence interval can then be constructed as $\hat\beta(t_0) \pm 1.96\sqrt{\phi(t_0)^\top\hat\Sigma_{bb}\,\phi(t_0)}$. We acknowledge an important limitation of confidence intervals obtained via this approach. Specifically, we ignore the variability inherent in the longitudinal functional principal component analysis; that is, our estimates ignore the variability in the estimation of the eigenfunctions $\hat\psi$ as well as the coefficients $\hat c_{i,k}(t)$. Joint modeling could be considered as in Crainiceanu and Goldsmith (2010); however, this is beyond the scope of this article.
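As a short sketch of this interval construction, assume `b_hat` and `Sigma_bb` have been extracted from the logistic fit and `phi_basis` is the spline basis object from the earlier sketch (so that `predict` evaluates the basis at a new point $t_0$):

```r
# Sketch: pointwise 95% confidence interval for beta(t0).
phi_t0  <- as.numeric(predict(phi_basis, t0))  # length-Kb basis evaluated at t0
beta_t0 <- sum(phi_t0 * b_hat)                 # point estimate phi(t0)' b_hat
se_t0   <- as.numeric(sqrt(t(phi_t0) %*% Sigma_bb %*% phi_t0))
ci_t0   <- beta_t0 + c(-1.96, 1.96) * se_t0
```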

Lemma 4.7 shows that (4) is optimal within a class of weighted estimating equations. The result ensures the only loss of statistical efficiency is due to subsampling and not to using a suboptimal estimation procedure given subsampling. Here, weights $w$ are considered optimal if the difference between the asymptotic variance under any other choice of weights $W$, $V(\theta_0; W)$, and the asymptotic variance $V(\theta_0; w)$ is positive semi-definite; i.e., any linear contrast has smaller asymptotic variance under weight $w$ than under weight $W$.

Lemma 4.7.

If the event process is an inhomogeneous Poisson point process with intensity $h(t;\theta)$ and subsampling occurs via an independent, inhomogeneous Poisson point process with intensity $\pi(t)$, then $\hat U_n(\theta)$ are optimal estimating functions (i.e., most efficient) in the class of weighted estimating functions given by (5) with (4) replaced by an arbitrary weight function $w_i(t;\theta)$. This class includes the Horvitz-Thompson estimator under $w(s;\theta) = 1$.

Proof of Lemma 4.7 is presented in Appendix D.

4.5. Computation versus statistical efficiency tradeoff

We next consider the statistical efficiency of our proposed estimator compared to complete-data maximum likelihood estimation. While subsampling introduces additional variation, it may significantly reduce the overall computational burden. It is this trade-off that we next make precise. In particular, we consider the subsampling rate $\pi(t) = c \times h(t;\theta)$ for $c > 0$; that is, the subsampling rate is proportional to the intensity function with time-independent constant $c > 0$. Under this subsampling rate, the weight function (4) equals $c/(c+1)$. Under Lemma 4.6,

$$\Xi(\theta) = \frac{c}{c+1}\,\mathbb{E}\left[\int_0^{\tau}\frac{h^{(1)}(t;\theta)\,h^{(1)}(t;\theta)^\top}{h(t;\theta)}\,dt\right] = \frac{c}{c+1}\,\Sigma(\theta)$$

where $\Sigma(\theta)$ is the Fisher information of the complete-data maximum likelihood estimator. Therefore the relative efficiency is $c/(1+c)$. For an upper bound $H = \max_{t \in (0,\tau)} h(t;\theta)$, if we set $\pi(t) = c \times H$, then the relative efficiency is lower bounded by $c/(c+1)$.

Sensor measurements occur multiple times per second. Suppose the intensity rate is bounded above by 1 and the unit time scale is hours. If we subsample the data at a rate of 10 times per hour, then we have a lower bound on the efficiency of 0.909. For a 4Hz sensor, this reduces the number of samples from 4 × 60 × 60 = 14,400 per hour to an average of 10 per hour. Since the computational complexity of logistic regression is linear in the number of samples, we get a 1440-fold reduction in data size at the cost of a 0.909 statistical efficiency. If we sample 100 times per hour, the efficiency is 0.990, with a 144-fold reduction in data size. Table 1 provides additional examples for 4Hz and 32Hz sensor rates, respectively. The data reduction depends on the sensor rate; however, the lower bound on statistical efficiency does not, because the subsampling rate only depends on the upper bound of the intensity function. In particular, if events are rare, the subsampling rate can be greatly reduced with no impact on statistical efficiency.

Table 1.

Data reduction (total # of measurements divided by expected number of subsampled measurements) given sensor rate, subsampling constant and an upper bound on the intensity rate.

                 Subsampling     Upper bound on intensity rate (per hour)    Statistical
Sensor rate      constant (c)      0.5       1       3       5      10       efficiency
4Hz (EDA)             5           5760    2880     960     576     288         0.833
                     10           2880    1440     480     288     144         0.909
                    100            288     144      48      29      14         0.990
32Hz (AI)             5          46080   23040    7680    4608    2304         0.833
                     10          23040   11520    3840    2304    1152         0.909
                    100           2304    1152     384     230     115         0.990
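The entries of Table 1 follow from two one-line formulas, reproduced here as a cross-check:

```r
# Sketch: Table 1 arithmetic. With pi(t) = c * H, the expected number of
# subsamples is c * H per hour and the efficiency lower bound is c / (c + 1).
efficiency <- function(c) c / (c + 1)
reduction  <- function(hz, c, H) hz * 3600 / (c * H)  # measurements per subsample

efficiency(10)          # 0.909
reduction(4, 10, 1)     # 4Hz (EDA), c = 10, H = 1 per hour  -> 1440
reduction(32, 100, 5)   # 32Hz (AI), c = 100, H = 5 per hour -> 230.4
```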

4.6. Penalized functional regression models

Recall that the theoretical results were proven under the assumption that there exists $b$ such that $\beta(t) = \phi(t)^\top b$. To make this assumption plausible, we set $K_b$ large enough (but no greater than $K_x$, to ensure identifiability) so that the spline basis expansion is sufficiently expressive. In practice, however, such a choice of $K_b$ may overfit the data. Following Goldsmith et al. (2011), we choose the linear spline model $\beta(t) = b_0 + b_1 t + \sum_{j=2}^{K_b-1} b_j (t - \kappa_j)_+$, where $\{\kappa_j\}_{j=2}^{K_b-1}$ are the chosen knots, and assume $\{b_j\}_{j=2}^{K_b-1} \sim N(0, \sigma^2 I)$ to induce smoothness on the spline model. Combining the penalized spline formulation with Lemma 4.1 establishes a connection between our approximate score equations and solving a generalized mixed effects logistic regression with offset. Given the connection with generalized mixed effects models, we can leverage the associated inferential machinery to obtain variance-covariance estimates. As we leverage the standard R package 'glmnet', the smoothness parameter is chosen via cross-validation. In this context, we acknowledge another limitation: penalization may lead to confidence intervals that perform poorly in regions where $\hat\beta(t)$ is over-smoothed.
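A minimal sketch of this penalized fit, assuming the `glmnet` interface with `alpha = 0` for the ridge ($L_2$) penalty, the Lemma 4.1 offset, and `penalty.factor` to leave $(\gamma, \alpha)$ unpenalized; the design matrix `W`, response `y`, and column layout are illustrative:

```r
# Sketch: ridge-penalized logistic fit with offset, penalizing only the spline
# coefficients b; the smoothness (penalty) parameter is chosen by CV.
library(glmnet)

p_unpen <- ncol(W) - Kb               # columns holding (gamma, alpha)
pf <- c(rep(0, p_unpen), rep(1, Kb))  # 0 = unpenalized, 1 = penalized

cv_fit <- cv.glmnet(x = W, y = y, family = "binomial", alpha = 0,
                    offset = -log(pi_vec), penalty.factor = pf)
theta_hat <- as.numeric(coef(cv_fit, s = "lambda.min"))
```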

5. Simulation study

We next assess the proposed methodology via a simulation study. Here, each individual is observed over five days, where each day is defined on the unit interval [0,1] with 1000 equally spaced observation times per day. We define $X(t)$ at the grid of observation times as a mean-zero Gaussian process with Matérn covariance

$$\Sigma(t,t') = \sigma^2\,\frac{2^{1-v}}{\Gamma(v)}\left(\sqrt{2v}\,\frac{|t-t'|}{\rho}\right)^{v} K_v\!\left(\sqrt{2v}\,\frac{|t-t'|}{\rho}\right)$$

where $K_v$ is the modified Bessel function of the second kind. We set $v = 1/2$, $\sigma^2 = 1$, and $\rho = 0.3$, and set $K_b = K_x = 35$. For simplicity, we assume $\Sigma$ is known when computing the eigendecompositions. Given $\{X(t)\}_{0 \le t \le 1}$ for a given user-day, we generate event times according to a chosen hazard function $h(t;\theta)$. To mimic our real data, we set

$$h(t;\theta) = \exp\left(\theta_0 + \int_0^{\Delta} X(t-s)\,\beta(s)\,ds\right).$$

We set $\Delta$ to mimic a 30-minute window for a 12-hour day. We set $\theta_0 = \log(5/1000)$ to give a baseline risk of approximately 5 events per day. We consider two choices of $\beta(s)$: (1) $\beta_0\exp(-\beta_1 s)$, which decays to 0 as $s$ approaches $\Delta$ from below, and (2) $\beta_1\sin(2\pi s/\Delta - \pi/2)$, which is significantly different from 0 as $s$ approaches $\Delta$ from below.
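The following sketch illustrates this data-generating mechanism for one user-day under choice (2), using the fact that the $v = 1/2$ Matérn covariance reduces to the exponential covariance $\sigma^2\exp(-|t-t'|/\rho)$ and treating $h(t;\theta)$ as the per-grid-cell event probability (the calibration behind $\theta_0 = \log(5/1000)$); $\beta_1 = 1$ and all object names are illustrative:

```r
# Sketch: simulate X(t) on the daily grid and generate event times by thinning.
set.seed(1)
n_grid <- 1000
t_grid <- seq(0, 1, length.out = n_grid)
sigma2 <- 1; rho <- 0.3

Sigma <- sigma2 * exp(-abs(outer(t_grid, t_grid, "-")) / rho)  # Matern, v = 1/2
X <- as.vector(t(chol(Sigma + 1e-10 * diag(n_grid))) %*% rnorm(n_grid))

D_idx  <- round(n_grid * 0.5 / 12)   # 30-minute window on a 12-hour day
beta_s <- sin(2 * pi * seq(0, 1, length.out = D_idx) - pi / 2)  # beta_1 = 1
theta0 <- log(5 / 1000)              # ~5 events per day at baseline

# Hazard per grid cell (Riemann sum for the functional term), then thinning
h_t <- sapply(D_idx:n_grid, function(j)
  exp(theta0 + sum(X[(j - D_idx + 1):j] * rev(beta_s)) / n_grid))
events <- t_grid[D_idx:n_grid][runif(length(h_t)) < h_t]
```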

We generate 1000 datasets, each consisting of 500 user-days. For a given simulated user-day, we randomly sample non-event times using a Poisson process with a rate of once every five minutes. We use the proposed methodology to construct the estimate $\hat\beta_{i,12}(t)$ for the $i$th simulated dataset; we then thin the sampled non-event times with thinning probabilities 1/3, 1/6, 1/12, and 1/24. This yields randomly sampled non-event times from Poisson processes with rates of once every fifteen minutes, half-hour, hour, and two hours, and the corresponding estimates $\hat\beta_{i,4}(t)$, $\hat\beta_{i,2}(t)$, $\hat\beta_{i,1}(t)$, and $\hat\beta_{i,0.5}(t)$ respectively. This nested construction allows us to compare the variance due to subsampling with the variance due to sampling fewer non-event times.

Since we are primarily interested in accuracy, we report the mean integrated squared error (MISE), defined as $\frac{1}{1000}\sum_{i=1}^{1000}\int_0^{\infty}(\hat\beta_{i,j}(s) - \beta(s))^2\,ds$ for each $j$, where $\beta(s) \equiv 0$ for $s > \Delta$. The MISE is defined in this manner to account for settings where $\Delta$ is unknown. Next, let $\bar\beta_j(t) = \frac{1}{1000}\sum_{i=1}^{1000}\hat\beta_{i,j}(t)$ denote the average estimate for $j = 0.5, 1, 2, 4$. Then the squared bias is given by $\int_0^{\infty}(\bar\beta_j(s) - \beta(s))^2\,ds$ and the variance by $\frac{1}{1000}\sum_{i=1}^{1000}\int_0^{\infty}(\hat\beta_{i,j}(s) - \bar\beta_j(s))^2\,ds$. The subsampling variance is defined as $\frac{1}{1000}\sum_{i=1}^{1000}\int_0^{\infty}(\hat\beta_{i,j}(s) - \hat\beta_{i,12}(s))^2\,ds$. Table 2 shows the MISE decomposed into the variance and squared bias, as well as the subsampling variance. To allow fair comparisons across the two choices of $\beta(s)$, all reported numbers are scaled by the integrated square of the true function, $\int_0^{\infty}\beta(s)^2\,ds$. The relative MISE (RMISE) with respect to the highest sampling rate (12 per hour) and the average runtime (in seconds) are also reported.

Table 2.

Mean integrated squared error, variance, squared bias, and subsampling (SubS) variance for $\beta(s)$ given by choices (1) (top) and (2) (bottom), respectively.

                               Sampling rate (per hour)
                        12          4.0         2.0         1.0         0.5
β(s) choice (1)
  SubS. variance         -      1.2 × 10−5  2.0 × 10−5  3.5 × 10−5  8.6 × 10−5
  Variance          1.3 × 10−2  1.5 × 10−2  2.1 × 10−2  2.7 × 10−2  3.8 × 10−2
  Squared bias      6.3 × 10−2  6.0 × 10−2  6.1 × 10−2  6.1 × 10−2  6.7 × 10−2
  MISE              7.6 × 10−2  7.5 × 10−2  8.2 × 10−2  8.8 × 10−2  1.1 × 10−1
  RMISE                  -         0.99        1.08        1.16        1.39
  Avg. runtime (s)      356         108          51          24          17
β(s) choice (2)
  SubS. variance         -      2.7 × 10−5  5.1 × 10−5  5.7 × 10−5  7.5 × 10−5
  Variance          3.7 × 10−2  2.7 × 10−2  2.8 × 10−2  3.3 × 10−2  3.8 × 10−2
  Squared bias      3.5 × 10−2  3.6 × 10−2  3.4 × 10−2  3.3 × 10−2  3.1 × 10−2
  MISE              7.2 × 10−2  6.3 × 10−2  6.2 × 10−2  6.6 × 10−2  6.9 × 10−2
  RMISE                  -         0.87        0.86        0.92        0.96
  Avg. runtime (s)     1084         208          93          47          30

Table 2 demonstrates that the variance may increase as the sampling rate decreases. However, the rate of increase in the MISE is relatively low. In the first setting, for example, the RMISE when sampling every hour versus every five minutes is 1.16, while the run time is 14.8 times faster. In the second setting, the RMISE remains at or below 1 while the run time is 23.1 times faster when sampling every hour rather than every five minutes. This highlights the efficiency-computation trade-off.

Remark 5.1 (Time-complexity and run-time of maximum likelihood estimation).

The complexity of logistic regression with $n$ observations and covariates of dimension $d$ is $O(nd)$. Maximum likelihood estimation can be well approximated by subsampling at a very high rate; therefore, under a 4Hz sensor the time-complexity of the maximum likelihood estimate is $O(4 \times 60 \times \tau_i\, n\, d)$ (with $\tau_i$ in minutes) compared to an expected time-complexity under subsampling at rate $c$ per minute of $O(c \times \tau_i\, n\, d)$. This means using the sensor at its observation frequency of four times per second will take approximately 213 times as long as subsampling at a rate of once every half-hour. In our first example, using the average run times at each sampling rate to project the run time at four times per second yields approximately a 2.0 hour run time. In our second example, we project a run time at four times per second of 6.2 hours. In both instances, the relative efficiency gain would be negligible, suggesting a huge computational cost for minimal relative information gain.

5.1. Impact of Δ

A concern with the proposed approach is the selection of the window length $\Delta$. Here we investigate the impact of misspecification of the window length for $\beta(s) = \beta_1\sin(2\pi s/\Delta - \pi/2)$ with the true window length $\Delta$ set to 32 minutes. See Section E.1 of the supplementary materials for a similar discussion for $\beta(s) = \beta_0\exp(-\beta_1 s)$. As in the previous simulation, we generate 1000 datasets per condition, each consisting of 500 user-days. For each simulation, we analyze the data using window lengths $\hat\Delta \in \{26, 29, 32, 35, 37\}$.

When the window length is too large, i.e., $\hat\Delta > \Delta$, the estimation is asymptotically unbiased since $\beta(t) \equiv 0$ for $t > \Delta$; however, we incur a penalty in finite samples, especially in settings where the function is far from zero near $t = \Delta$. We find the MISE increases as the absolute error $|\hat\Delta - \Delta|$ increases. While the MISE increased for $\hat\Delta < \Delta$, the pointwise estimation error remains low for $t < \hat\Delta$. This does not hold for $\hat\Delta > \Delta$, where instead we see parameter attenuation, i.e., a bias towards zero in the estimates at each $0 < t < \Delta$. To capture this, we define a partial MISE as $\frac{\Delta}{\tilde\Delta}\int_0^{\tilde\Delta}(\hat\beta_{i,j}(s) - \beta(s))^2\,ds$ where $\tilde\Delta = \min(\hat\Delta, \Delta)$; this is the MISE on the subset $0 < t < \min(\hat\Delta, \Delta)$, scaled for comparative purposes.

5.2. Selection of bandwidth

While above we explored the bias-variance trade-off under bandwidth misspecification, here we explore a data-adaptive method for bandwidth selection. This is a critical consideration for recent-history functional linear models (Kim et al., 2011). Given that the model complexity does not change as a function of $\Delta$ in our simulations, our proposal is to compare AIC across a range of bandwidths and choose the one that optimizes the criterion (a sketch follows Table 4). Table 4 presents AIC-based selection across the 500 simulated datasets analyzed in the prior section. We see markedly distinct behavior of the selection criterion in the two settings. AIC-based selection works well for the sinusoidal effect across subsampling rates, but performs poorly in the exponential setting. Recall that the sinusoidal effect is significantly different from 0 at $\Delta$ and then 0 for all $s > \Delta$, while the exponential effect decays slowly towards 0 as $s \to \Delta$. Therefore, the selection problem under the sinusoidal effect is much easier than under the exponential effect. Moreover, bias, variance, and MISE do not vary substantially under bandwidth misspecification in the exponential setting. Thus, we conclude that AIC-based selection works well in settings where inference depends heavily on estimating the bandwidth accurately. Note that when $\Delta$ is inaccurately selected, as in the exponential case, the relative MISE under the AIC-selected $\hat\Delta$ is a more useful indicator of the practical cost of AIC-based selection.

Table 4.

Rate of selection of $\hat\Delta$ across 500 simulations when the true $\Delta = 32$ minutes. Bold: most likely bandwidth for a given subsampling rate; asterisk: true window length.

                         Exponential                     Sinusoidal
$\hat\Delta$ \ Rate   4.0   2.0   1.0   0.5         4.0   2.0   1.0   0.5
26                   0.38  0.35  0.30  0.26        0.00  0.00  0.00  0.00
29                   0.22  0.20  0.17  0.12        0.09  0.14  0.15  0.18
32*                  0.11  0.12  0.13  0.18        0.85  0.79  0.76  0.70
35                   0.13  0.13  0.14  0.15        0.05  0.06  0.06  0.09
37                   0.14  0.20  0.26  0.28        0.01  0.01  0.02  0.03
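A minimal sketch of the selection rule, where `fit_model` is a hypothetical wrapper around the Remark 4.2 pipeline for a given window length and AIC is compared in the usual minimize-AIC convention:

```r
# Sketch: AIC-based selection of the window length Delta.
Delta_grid <- c(26, 29, 32, 35, 37)   # candidate window lengths (minutes)
aic_vals   <- sapply(Delta_grid, function(d) AIC(fit_model(Delta = d)))
Delta_hat  <- Delta_grid[which.min(aic_vals)]
```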

6. Extensions

In this section, we demonstrate the flexibility of the proposed approach by exploring extensions in several important directions to ensure these methods are robust for practical use with high frequency data. This section will continue to leverage the connection to generalized functional linear models provided by Lemma 4.1.

6.1. Multivariate extensions

In this section, we extend our model to the case of multiple functional regressors. That is, suppose $L$ health processes, $x_i = \{x_i(t) = (x_{i,1}(t), \ldots, x_{i,L}(t))\}_{0 < t < \tau_i}$, are measured for each participant at a dense grid of time points. In the suicidal ideation case study, for example, accelerometer data are measured at 32Hz while electrodermal activity (EDA) is measured at 4Hz. A multivariate extension of our model (2) is given by

$$h_i(t \mid \mathcal{H}_{i,t}^{NX};\theta) = h_0(t;\gamma)\exp\left(g_t(\mathcal{H}_{i,t}^{N})^\top\alpha + \int_{t-\Delta_1}^{t} x_{i,1}(s)\beta_1(s)\,ds + \cdots + \int_{t-\Delta_L}^{t} x_{i,L}(s)\beta_L(s)\,ds\right) \quad (8)$$

The approach given in Section 4.3 extends naturally to the multivariate functional setting. For each functional regressor, we estimate the pooled sample covariance $\Sigma_{y,l}$ for $y \in \{0,1\}$ and $l = 1, \ldots, L$ as in Section 4.1. Let $\sum_{k=1}^{\infty}\hat\lambda_{k,l}^{(y)}\,\hat\psi_{k,l}^{(y)}(s)\,\hat\psi_{k,l}^{(y)}(t)$ be the spectral decomposition of $\hat\Sigma_{y,l}$. Then the $l$th functional term is approximated using a truncated Karhunen-Loève expansion: $\int_{t-\Delta_l}^{t} X_l(t,s)\beta_l(s)\,ds \approx \left[M_{l,t} + c_l(t)^\top J_{\hat\psi_l^{(y)},\phi_l}\right]b_l$.

6.2. Missing data

Sensor data can often be missing for intervals of time due to issues with sensor wear. In the suicidal ideation case study, for example, there are 2139 self-identified moments of distress across all 91 participants. Of these, 1289 event times had complete data for the prior thirty minutes, 1998 had a fraction of missing data on a fine grid below 30%, and 1984 had a fraction below 10%.

Missing data is a critical issue because $c_{i,k}(t)$ cannot be estimated if $X(t,s)$ is not observed for all $s \in [0,\Delta]$. Moreover, standard errors should reflect the uncertainty in these coefficients when missing data is prevalent. Goldsmith et al. (2011) suggest using best linear unbiased predictors (BLUPs) or posterior modes in the mixed effects model to estimate $c_{i,k}(t)$; however, this is ineffective when there is substantial variability in these estimates. To deal with this, Crainiceanu and Goldsmith (2010) take a fully Bayesian approach. Yao et al. (2005) introduced PACE as an alternative frequentist method. Petrovich et al. (2018) show that for sparse, irregular longitudinal data, the imputation model should not ignore the outcome variable $Y_i(t)$.

Here we present an extension of Petrovich et al. (2018) to our setting by leveraging Lemma 4.1 and the marginal covariance estimation procedure to construct a multiple imputation procedure. Let $x_i(t)$ denote the incomplete sensor data at time $t$ (i.e., observed at times $\{s_{i,r}\}_{r=1}^{k_{it}}$ in $[0,\Delta]$). Then

$$\mathbb{E}\left[X_i(s,t)\mid Y_i(t)=y, x_i(t)\right] = \mu_y(s,t) + a_{i,t}(s)^\top B_{i,t}\left(x_i(t) - \mu_i(t)\right) \quad (9)$$
$$\mathrm{Cov}\left[X_i(s,t), X_i(s',t)\mid Y_i(t)=y, x_i(t)\right] = \Sigma_t(s,s') - a_{i,t}(s)^\top B_{i,t}\,a_{i,t}(s') \quad (10)$$

where we have

$$a_{i,t}(s) = \begin{pmatrix}\Sigma_t(s_{i,1}, s)\\ \vdots \\ \Sigma_t(s_{i,k_{it}}, s)\end{pmatrix}; \qquad B_{i,t}^{-1} = \begin{pmatrix}\Sigma_t(s_{i,1}, s_{i,1}) & \cdots & \Sigma_t(s_{i,1}, s_{i,k_{it}})\\ \vdots & \ddots & \vdots \\ \Sigma_t(s_{i,k_{it}}, s_{i,1}) & \cdots & \Sigma_t(s_{i,k_{it}}, s_{i,k_{it}})\end{pmatrix},$$

$\mu_i(t) = \mathbb{E}[x_i(t)\mid Y_i(t)=y] = \{\mu_y(s_{i,j}, t)\}_{j=1}^{k_{it}}$, where $\mu_y(s,t)$ is the mean of $X(t,s)$ for group $y$, and $\Sigma_t(s,s')$ is the covariance between $X(s,t)$ and $X(s',t)$ for $s,s' \in [0,\Delta]$ and $t \in \mathbb{R}_+$. Note that, following Petrovich et al. (2018), the mean and covariance functions are modeled separately for $y = 0$ and $y = 1$.
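The imputation step amounts to drawing from the conditional Gaussian distribution defined by (9) and (10). A self-contained sketch with illustrative inputs (window mean `mu` and covariance `Sigma_t` on the lag grid, observed lag indices `obs_idx`, and observed values `x_obs`):

```r
# Sketch: one multiple-imputation draw of a partially observed window,
# using the conditional mean (9) and conditional covariance (10).
impute_window <- function(mu, Sigma_t, obs_idx, x_obs) {
  mis_idx <- setdiff(seq_along(mu), obs_idx)
  B_inv <- Sigma_t[obs_idx, obs_idx, drop = FALSE]           # B_{i,t}^{-1}
  A     <- Sigma_t[mis_idx, obs_idx, drop = FALSE] %*% solve(B_inv)
  cond_mu <- mu[mis_idx] + A %*% (x_obs - mu[obs_idx])
  cond_S  <- Sigma_t[mis_idx, mis_idx, drop = FALSE] -
    A %*% Sigma_t[obs_idx, mis_idx, drop = FALSE]
  x_full <- mu
  x_full[obs_idx] <- x_obs
  x_full[mis_idx] <- MASS::mvrnorm(1, as.vector(cond_mu), cond_S)
  x_full
}
```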

6.2.1. Multiple imputation under uncongeniality

Multiple imputation yields valid frequentist inference when the imputation and analysis procedures are congenial (Meng, 1994); the above procedure is derived for function-on-scalar multiple imputation with binary outcomes, which ignores the joint nature of recurrent event analysis in the presence of high frequency sensor data. The main advantage of the above imputation framework is its simplicity and approximate congeniality when events are rare and the sampling rate is low. The main disadvantage is that the framework is uncongenial under many events and/or high sampling rates. Meng (1994) defines congeniality between an imputation procedure and an analyst's complete (and incomplete) analysis procedure as the existence of a unifying Bayesian model which embeds the imputer's imputation model and the analyst's complete data procedure. A recent discussion of congeniality can be found in Bartlett and Hughes (2020). Congeniality ensures good frequentist coverage properties. In some uncongenial settings, standard multiple imputation variance estimates can be biased downwards, leading to under-coverage of confidence intervals. A key question is whether we can use the above imputation methods within a general procedure that handles uncongeniality.

To address this, we follow the recommendation of Bartlett and Hughes (2020) and consider a method that first bootstraps the dataset and then applies multiple imputation to each bootstrap sample. This general approach was originally proposed by Shao and Sitter (1996) and Little and Rubin (2002). Suppose $B$ bootstraps and $M$ imputations per bootstrap; let $\hat\theta_{b,m}$ denote the estimator for the $m$th imputation of the $b$th bootstrap. The point estimator is given by $\hat\theta = B^{-1}\sum_{b=1}^{B}\hat\theta_b$ where $\hat\theta_b = M^{-1}\sum_{m=1}^{M}\hat\theta_{b,m}$. To construct the confidence interval, we require the mean sums of squares within and between bootstraps, i.e., $\mathrm{MSW} = \frac{1}{B(M-1)}\sum_{b=1}^{B}\sum_{m=1}^{M}(\hat\theta_{b,m} - \hat\theta_b)(\hat\theta_{b,m} - \hat\theta_b)^\top$ and $\mathrm{MSB} = \frac{1}{B-1}\sum_{b=1}^{B}(\hat\theta_b - \hat\theta)(\hat\theta_b - \hat\theta)^\top$ respectively. Then the estimator of the variance-covariance matrix of $\hat\theta$ is given by $\hat\Sigma_{B,M} = \frac{B+1}{BM}\mathrm{MSB} - \mathrm{MSW}/M$. We obtain the variance for $\beta(t)$ as $\phi(t)^\top\hat\Sigma_{B,M}\,\phi(t)$. We follow Bartlett and Hughes (2020) and construct confidence intervals based on Satterthwaite's degrees of freedom, which here is given by

$$\hat\nu = \frac{\left[\left(\frac{B+1}{BM}\right)\mathrm{MSB} - \frac{\mathrm{MSW}}{M}\right]^2}{(B-1)^{-1}\left(\frac{B+1}{BM}\right)^2\mathrm{MSB}^2 + \frac{\mathrm{MSW}^2}{BM^2(M-1)}}$$

The bootstrap-then-multiple-imputation procedure has been studied extensively by Bartlett and Hughes (2020) and is robust to uncongeniality. Its main disadvantage is its considerable computational intensity. Recall that likelihood calculations were computationally prohibitive by themselves, so combining them with the bootstrap and MI would further increase this large-scale computation. The random subsampling framework thus simplifies the handling of missing data via connections to function-on-scalar multiple imputation (Petrovich et al., 2018) as well as to the bootstrap for handling uncongeniality (Bartlett and Hughes, 2020). Ignoring the computational time of bootstrap sampling, the computational time for the first choice of $\beta(s)$ in the simulation study with $B = 200$ bootstraps and $M = 2$ imputations per bootstrap is 7 hours for a sampling rate of 0.5 compared to 35 hours for a sampling rate of 4, which highlights the benefits of the proposed framework.
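For a scalar functional of $\theta$ (e.g., $\hat\beta(t_0)$), the bootstrap-then-MI variance and Satterthwaite degrees of freedom reduce to a few lines; `est` below is an illustrative $B \times M$ matrix of estimates $\hat\theta_{b,m}$:

```r
# Sketch: point estimate, variance, Satterthwaite df, and 95% CI for
# bootstrap followed by multiple imputation (scalar parameter).
bootmi_inference <- function(est) {
  B <- nrow(est); M <- ncol(est)
  theta_b   <- rowMeans(est)                      # per-bootstrap means
  theta_hat <- mean(theta_b)                      # overall point estimate
  MSW <- sum((est - theta_b)^2) / (B * (M - 1))   # within-bootstrap mean square
  MSB <- sum((theta_b - theta_hat)^2) / (B - 1)   # between-bootstrap mean square
  V   <- (B + 1) / (B * M) * MSB - MSW / M
  df  <- V^2 / (((B + 1) / (B * M))^2 * MSB^2 / (B - 1) +
                  MSW^2 / (B * M^2 * (M - 1)))
  ci  <- theta_hat + c(-1, 1) * qt(0.975, df) * sqrt(V)
  list(estimate = theta_hat, variance = V, df = df, ci = ci)
}
```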

6.3. Multilevel models

The approach can be extended to multilevel models with functional regressors, which are critical in mobile health where a high degree of individual variation is often observed. Let $b_i \sim N(0, \sigma_b^2 I)$; then the multilevel extension of (6) is

$$h_i(t\mid\mathcal{H}_{i,t}^{NX};\theta,b_i) \approx \exp\left(Z_t^\top\gamma + g_t(\mathcal{H}_{i,t}^{N})^\top\alpha + \left(M_{i,t} + C_{i,t}J_{\hat\psi,\phi}\right)(\beta + b_i)\right) = \exp\left(W_{i,t}^\top\theta + Z_{i,t}b_i\right), \quad (11)$$

where $Z_{i,t} = M_{i,t} + C_{i,t}J_{\hat\psi,\phi}$. Lemma 4.1 implies that the random subsampling framework applied to (11) leads to a penalized logistic mixed-effects model. As far as the authors are aware, the combination of mixed effects and $L_2$ penalization on a subset of parameters is not available in existing software. Given the paper's focus on "off-the-shelf" software implementations, multilevel models are considered important future work.

7. A worked example: Adolescent psychiatric inpatient mHealth study

During an eight-month period in 2018, 91 psychiatric inpatients admitted for suicidal risk to Franciscan Children's Hospital were enrolled in an observational study. Study data were graciously provided by Evan Kleiman and his study team (https://kleimanlab.org). Each study participant wore an Empatica E4 (Garbarino et al., 2014), a medical-grade wearable device that provides real-time physiological data. On each user-day, participants were asked to self-identify moments of suicidal distress by pressing a button on the Empatica E4 device; the timestamp of each button press was recorded. One of the primary study goals was to assess the association between sensor-based physiological measurements and self-identified moments of suicidal distress. In particular, the scientific question is whether monitoring physiological correlates can reveal early indicators of escalating distress.

A key concern is whether all moments of suicidal distress are recorded. To address this, clinical staff interviewed participants each evening, and participants were asked to review their button press activity; any events identified as incorrect button presses were removed. Over the 30-day study period, the average number of button presses per day was 2.42 with a standard deviation of 2.62. Investigation of the button press data shows low button-press counts prior to 7AM and a sharp drop-off by 11PM. This points to an additional concern: events can only occur when the individual is at risk, i.e., (A) the individual is wearing the Empatica E4 and (B) is currently awake. To deal with (A) and (B), we define each study day to begin at 9AM and end at 8PM.

Figure 1 visualizes button presses versus time since study entry for each user. Day 30 is assumed to censor the observation process; a black mark signals dropout before day 30. Figure 1 shows the potential heterogeneity in button press rates between users and across study days. To assess whether there is between-user or between-study-day variation, Table 5 presents a two-way ANOVA decomposition of the button press counts as a function of participant and day in study. The ANOVA decomposition demonstrates high variation across days in study and across users.

Fig. 1. User button-presses (red) versus time since study entry (in hours). The black mark indicates the final sensor measurement time.

Table 5.

ANOVA decomposition of daily button press counts

                 DF    Sum Sq   Mean Sq   F value   Pr(>F)
Participant      88    2675.4      30.4       8.9   < 2 × 10−16
Day in Study     29     443.3      15.3       4.5   3.9 × 10−13
Residuals       672    2304.2       3.4

Here, we focus on two physiological processes: (1) electrodermal activity (EDA), a measure of skin conductance sampled at 4Hz, and (2) the activity index (AI) (Bai et al., 2016), a coordinate-free feature built from accelerometer data collected at 32Hz that measures physical movement. Electrodermal activity can be significantly impacted by external factors (e.g., room temperature). To account for the high between user-day variation, we analyze EDA standardized per study-day and device. The individual EDA and AI trajectories are highly variable, which tends to obscure patterns and trends. In Figure 2, the mean trajectories of EDA and AI are plotted in reverse time from button press timestamps, showing sharp changes in EDA and AI in the 30 minutes prior to button presses. In Figure 2 of Appendix E.3, the mean trajectories of EDA and AI are plotted in reverse time for the sampled non-event times in the 30 minutes prior to the non-event time. The distinct mean behavior motivates modeling these two processes separately, as discussed in Remark 4.2.

Fig. 2. Average scaled EDA and AI in the 30 minutes prior to button presses.

7.1. Complete-case analysis

Inspection of Figures 2(a) and 2(b) suggests setting $\Delta$ between 5 and 30 minutes. Here, we investigate three potential window lengths: $\Delta = 5$, 15, and 30 minutes. To ensure minimal loss of efficiency, the subsampling rate was set to once every fifteen minutes. Given the daily button press rate, this yields an average of 44 non-events to 2.5 events per day. Based on Table 1, this ensures we achieve a substantial data reduction at a minimal loss of efficiency. After sampling non-event times, complete-case analyses are performed; i.e., sampled times where any sensor included in the model has missing data are ignored. Table 6 presents both AIC and BIC criteria on the complete-case data, normalized by the number of observations to ensure fair comparison. We find that $\Delta = 15$ is adequate for capturing the proximal impact of EDA and AI on the risk of a button press.

Table 6.

Normalized AIC and BIC for different choices of $\Delta$ using complete-case data. For $\Delta = 5$, $K_b = K_x = 31$ due to the amount of data in the shorter window length, while $K_b = K_x = 35$ for $\Delta = 15$ and 30.

        Δ = 5   Δ = 15   Δ = 30
AIC      0.96     0.98     0.95
BIC      0.99     1.01     0.99

In Section E.2 of the supplementary materials, Figures 1a and 1b present the per-patient average fraction of missing EDA and AI in 30-minute windows, respectively. For EDA, missingness ranges widely – from 0% to over 40% across individuals, with most between 5% and 30% average fraction missing. For AI, missingness is less pronounced – from 0% to 15% across individuals, with most between 0% and 10%.

We analyzed the activity index (AI) and electrodermal activity jointly in a multivariate model as in equation (8). To account for times when participants may not be wearing the device, and thus not be at risk for a button press, we limited the analysis to data collected between 9AM and 8PM. In the analysis, we assumed a constant baseline hazard and included a binary indicator of whether the participant had an event in the past 12 hours to account for the patient heterogeneity in the number of events seen in Figure 1. Figures 3(a) and 3(b) present the estimates from the joint analysis and their associated 95% confidence intervals using the bootstrap and multiple imputation strategy. We highlight in gray the statistically significant regions. Here, we see that standardized EDA is not associated with increased risk of a button press, while the activity index shows a positive association in the final few minutes prior to a button press.

Fig. 3. $\beta(t)$ for AI and EDA with 95% CI; the solid line is the point estimate for the complete-case analysis, while the highlighted region gives the model-based pointwise 95% confidence intervals.

7.2. Sensitivity analysis for window length

We next investigate whether the results are sensitive to the window length. Specifically, we re-analyzed the activity index (AI) and electrodermal activity jointly in a multivariate model as in equation (8) with $\Delta = 5$ and 30 minutes. Figures 4(a) and 4(b) present the estimates from the joint analysis for the activity index (AI) and their associated 95% confidence intervals using the bootstrap and multiple imputation strategy. We highlight in gray the statistically significant regions, which remain similar across all three choices of window length. We do not present results for standardized EDA, as it remains unassociated with increased risk of a button press.

Fig. 4. $\beta(t)$ for AI and EDA with 95% CI; the solid line is the point estimate for the missing-data analysis, while the highlighted region gives the bootstrap and MI based pointwise 95% confidence intervals.

8. Discussion

In this paper, we have presented a methodology for translating a difficult functional data analysis with recurrent events into a traditional logistic regression. The translation leveraged subsampling and weighting techniques – specifically the weights suggested by Waagepetersen (2008) – along with the flexible functional data analysis methods of Goldsmith et al. (2011) and the marginal covariance methods for longitudinal functional data of Park and Staicu (2015). The proposed methodology abides by the idea that we should make data as small as possible as fast as possible. Subsampling and weighting convert the problem to well-known territory, allowing us to leverage existing software. We show limited loss of efficiency when the subsampling is properly tuned to the event rates. Important extensions to an online sampling algorithm, optimal weighting when the Poisson point process assumption does not hold, and non-linear functional data methods are all considered important future work.

Supplementary Material

Supp 1

Table 3.

Mean integrated squared error (MISE), variance, and partial MISE as a function of $\hat\Delta$ and sampling rate when the true $\Delta = 32$ minutes.

                           Sampling rate (per hour)
$\hat\Delta$            4.0         2.0         1.0         0.5
26   MISE              0.390       0.395       0.397       0.413
     Variance       5.0 × 10−2  5.3 × 10−2  5.5 × 10−2  6.8 × 10−2
     P-MISE            0.210       0.216       0.217       0.235
29   MISE              0.220       0.225       0.224       0.238
     Variance       3.7 × 10−2  4.4 × 10−2  4.3 × 10−2  5.5 × 10−2
     P-MISE            0.084       0.090       0.088       0.103
32   MISE              0.061       0.062       0.061       0.072
     Variance       2.7 × 10−2  2.9 × 10−2  3.0 × 10−2  4.2 × 10−2
     P-MISE            0.061       0.062       0.061       0.072
35   MISE              0.246       0.247       0.249       0.255
     Variance       1.9 × 10−2  2.1 × 10−2  2.3 × 10−2  2.9 × 10−2
     P-MISE            0.154       0.154       0.154       0.160
37   MISE              0.360       0.359       0.353       0.361
     Variance       2.6 × 10−2  3.3 × 10−2  3.7 × 10−2  4.4 × 10−2
     P-MISE            0.256       0.255       0.252       0.259

References

  1. Andersen PK, Borgan O, Gill RD, and Keiding N. Statistical Models Based on Counting Processes. Springer, New York, 1993.
  2. Bai J, Di C, Xiao L, Evenson KR, LaCroix AZ, Crainiceanu CM, and Buchner DM. An activity index for raw accelerometry data and its comparison with other activity metrics. PLOS ONE, 11(8):1–14, 2016. doi: 10.1371/journal.pone.0160644.
  3. Baladandayuthapani V, Mallick BK, Young Hong M, Lupton JR, Turner ND, and Carroll RJ. Bayesian hierarchical spatially correlated functional data analysis with application to colon carcinogenesis. Biometrics, 64(1):64–73, 2008.
  4. Bartlett JW and Hughes RA. Bootstrap inference for multiple imputation under uncongeniality and misspecification. Statistical Methods in Medical Research, 2020.
  5. Caffo BS, Jank W, and Jones GL. Ascent-based Monte Carlo expectation–maximization. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):235–251, 2005. doi: 10.1111/j.1467-9868.2005.00499.x.
  6. Cassel C-M, Särndal C-E, and Wretman JH. Foundations of Inference in Survey Sampling. Wiley, New York, 1977.
  7. Chen K and Müller H-G. Modeling repeated functional observations. Journal of the American Statistical Association, 107(500):1599–1609, 2012.
  8. Crainiceanu C and Goldsmith J. Bayesian functional data analysis using WinBUGS. Journal of Statistical Software, 32:1–33, 2010.
  9. Crainiceanu CM, Staicu A-M, and Di C-Z. Generalized multilevel functional regression. Journal of the American Statistical Association, 104(488):1550–1561, 2009. doi: 10.1198/jasa.2009.tm08564.
  10. Cui E, Crainiceanu CM, and Leroux A. Additive functional Cox model. Journal of Computational and Graphical Statistics, 30(3):780–793, 2021. doi: 10.1080/10618600.2020.1853550.
  11. Di C-Z, Crainiceanu CM, Caffo BS, and Punjabi NM. Multilevel functional principal component analysis. Annals of Applied Statistics, 3(1):458–488, 2009.
  12. Dong J, Cao J, Gill J, Miles C, and Plumb T. Functional joint models for chronic kidney disease in kidney transplant recipients. Statistical Methods in Medical Research, 30(8):1932–1943, 2021. doi: 10.1177/09622802211009265.
  13. Free C, Phillips G, Galli L, Watson L, Felix L, Edwards P, Patel V, and Haines A. The effectiveness of mobile-health technology-based health behaviour change or disease management interventions for health care consumers: A systematic review. PLOS Medicine, 10(1):1–45, 2013.
  14. Garbarino M, Lai M, Bender D, Picard RW, and Tognetti S. Empatica E3 — a wearable wireless multi-sensor device for real-time computerized biofeedback and data acquisition. In 2014 4th International Conference on Wireless Mobile Communication and Healthcare (MOBIHEALTH), pages 39–42, 2014.
  15. Goldsmith J, Bobb J, Crainiceanu C, Caffo B, and Reich D. Penalized functional regression. Journal of Computational and Graphical Statistics, 20(4):830–851, 2011.
  16. Goldsmith J, Zipunnikov V, and Schrack J. Generalized multilevel function-on-scalar regression and principal component analysis. Biometrics, 71(2):344–353, 2015.
  17. Greven S, Crainiceanu C, Caffo B, and Reich D. Longitudinal functional principal component analysis. Electronic Journal of Statistics, 4:1022–1054, 2010.
  18. Hastie T, Tibshirani R, and Friedman J. The Elements of Statistical Learning. Springer Series in Statistics. Springer, 2009.
  19. Henderson R, Diggle P, and Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics, 1:465–480, 2000.
  20. James G. Generalized linear models with functional predictors. Journal of the Royal Statistical Society, Series B, 64:411–432, 2002.
  21. James G and Silverman B. Functional adaptive model estimation. Journal of the American Statistical Association, 100:565–576, 2005.
  22. Kim K, Şentürk D, and Li R. Recent history functional linear models for sparse longitudinal data. Journal of Statistical Planning and Inference, 141(4):1554–1566, 2011. doi: 10.1016/j.jspi.2010.11.003.
  23. Klasnja P, Smith S, Seewald N, Lee A, Hall K, Luers B, Hekler E, and Murphy SA. Efficacy of contextually tailored suggestions for physical activity: A micro-randomized optimization trial of HeartSteps. Annals of Behavioral Medicine, 53:573–582, 2019.
  24. Kleiman EM, Turner BJ, Fedor S, Beale EE, Picard RW, Huffman JC, and Nock MK. Digital phenotyping of suicidal thoughts. Depression and Anxiety, 35(7):601–608, 2018.
  25. Kokoszka P and Reimherr M. Introduction to Functional Data Analysis. Chapman and Hall/CRC, 1st edition, 2017. doi: 10.1201/9781315117416.
  26. Kong D, Ibrahim JG, Lee E, and Zhu H. FLCRM: Functional linear Cox regression model. Biometrics, 74(1):109–117, 2018. doi: 10.1111/biom.12748.
  27. Li C, Xiao L, and Luo S. Joint model for survival and multivariate sparse functional data with application to a study of Alzheimer's disease. Biometrics, 78(2):435–447, 2022. doi: 10.1111/biom.13427.
  28. Li Y and Guan Y. Functional principal component analysis of spatio-temporal point processes with applications in disease surveillance. Journal of the American Statistical Association, 109(507):1205–1215, 2014.
  29. Little RJA and Rubin DB. Statistical Analysis with Missing Data. Wiley, 2nd edition, 2002.
  30. Liu H, You J, and Cao J. Functional L-optimality subsampling for massive data, 2021. URL https://arxiv.org/abs/2104.03446.
  31. Marx BD and Eilers PH. Low-rank scale-invariant tensor product smooths for generalized additive mixed models. Biometrics, 62(4):1025–1036, 2006.
  32. McCulloch CE and Searle SR. Generalized, Linear, and Mixed Models. Wiley, New York, 2001.
  33. Meng XL. Multiple-imputation inferences with uncongenial sources of input (with discussion). Statistical Science, 10:538–573, 1994.
  34. Morris JS and Carroll RJ. Wavelet-based functional mixed models. Journal of the Royal Statistical Society, Series B, 68(2):179–199, 2006.
  35. Morris JS, Vannucci M, Brown PJ, and Carroll RJ. Wavelet-based nonparametric modeling of hierarchical functions in colon carcinogenesis. Journal of the American Statistical Association, 98(463):573–583, 2003.
  36. Müller H-G. Functional modelling and classification of longitudinal data. Scandinavian Journal of Statistics, 32:223–240, 2005.
  37. Park SY and Staicu A-M. Longitudinal functional data analysis. Stat, 4(1):212–226, 2015.
  38. Petrovich J, Reimherr M, and Daymont C. Functional regression models with highly irregular designs. 2018.
  39. Ramsay J and Silverman B. Functional Data Analysis. Springer, New York, 2005a.
  40. Ramsay JO and Silverman BW. Functional Data Analysis. Springer, New York, 2005b.
  41. Rathbun S. Optimal estimation of Poisson intensity with partially observed covariates. Biometrika, 100:277–281, 2012.
  42. Rathbun S and Shiffman S. Mixed effects models for recurrent events data with partially observed time-varying covariates: Ecological momentary assessment of smoking. Biometrics, 72:46–55, 2016.
  43. Reiss PT and Ogden RT. Functional principal component regression and functional partial least squares. Journal of the American Statistical Association, 102(479):984–996, 2007. doi: 10.1198/016214507000000527.
  44. Rizopoulos D. JM: An R package for the joint modeling of longitudinal and time-to-event data. Journal of Statistical Software, 35:1–33, 2010.
  45. Ruppert D. Selecting the number of knots for penalized splines. Journal of Computational and Graphical Statistics, 11(4):735–757, 2002.
  46. Ruppert D, Wand M, and Carroll R. Semiparametric Regression. Cambridge University Press, Cambridge, 2003.
  47. Shao J and Sitter RR. Bootstrap for imputed survey data. Journal of the American Statistical Association, 91(435):1278–1288, 1996.
  48. Spring B. Sense2Stop: Mobile sensor data to knowledge. https://clinicaltrials.gov/ct2/show/NCT03184389, 2019.
  49. Staicu A-M, Crainiceanu CM, and Carroll RJ. Fast methods for spatially correlated multilevel functional data. Biostatistics, 11(2):177–194, 2010.
  50. Tsiatis AA and Davidian M. Joint modeling of longitudinal and time-to-event data: an overview. Statistica Sinica, 14:809–834, 2004.
  51. Waagepetersen R. Estimating functions for inhomogeneous spatial point processes with incomplete covariate data. Biometrika, 95(2):351–363, 2008.
  52. Wood S. Generalized Additive Models: An Introduction with R. Chapman & Hall, London, 2003.
  53. Wood SN. Low-rank scale-invariant tensor product smooths for generalized additive mixed models. Biometrics, 62(4):1025–1036, 2006.
  54. Xiao L, Li Y, and Ruppert D. Fast bivariate P-splines: the sandwich smoother. Journal of the Royal Statistical Society: Series B, 75(3):577–599, 2013.
  55. Yao F, Müller H-G, and Wang J-L. Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association, 100(470):577–590, 2005.
