Abstract
We introduce a new class of semiparametric latent variable models for long memory discretized event data. The proposed methodology is motivated by a study of bird vocalizations in the Amazon rain forest; the timings of vocalizations exhibit self-similarity and long range dependence. This rules out Poisson process based models where the rate function itself is not long range dependent. The proposed class of FRActional Probit (FRAP) models is based on thresholding a latent process. This latent process is modeled as the sum of a smooth Gaussian process and a fractional Brownian motion. We develop a Bayesian approach to inference using Markov chain Monte Carlo and show good performance in simulation studies. Applying the methods to the Amazon bird vocalization data, we find substantial evidence for self-similarity and non-Markovian, non-Poisson dynamics. To accommodate the bird vocalization data, in which many different species of birds exhibit their own vocalization dynamics, a hierarchical expansion of FRAP is provided in the Supplementary Material.
Key words and phrases. Fractional Brownian motion, fractal, latent Gaussian process models, long range dependence, nonparametric Bayes, probit, time series
1. Introduction.
Event data are often obtained in a discretized form in environmental and ecological applications. Instead of recording exact times of event occurrence, one records whether or not at least one event occurred within each interval. Such data can potentially be treated as a discrete time series (Tiao, Phadke and Box (1976), Stern and Coe (1984)), ignoring the underlying continuous time process that generated the events. While this simplification may be more amenable to standard time series analysis, it is often desirable to provide a self-explanatory stochastic model that is capable of capturing the temporal dynamics of the underlying event generating process (Davison and Ramesh (2020)).
In Davison and Ramesh (2020) and Ramesh, Thayakaran and Onof (2013), the authors use a Markov modulated Poisson process (MMPP) (Fischer and Meier-Hellstern (1993)) for the discretized events. Event intensities of an MMPP are directed by the states of an independently evolving continuous time Markov process whose different states correspond to different rates of events. Davison and Ramesh (2020) derived expressions for the likelihood of the observed binary series for an MMPP using Chapman–Kolmogorov equations of a continuous time Markov chain. They proposed a maximum likelihood approach for inference on the model parameters which include the instantaneous transition rate matrix of the continuous time Markov chain and the Poisson rates corresponding to each state of the chain. They also show that the autocorrelation function of the binary time series generated by an MMPP exhibits a geometric decay. Fearnhead and Sherlock (2006) proposed a Gibbs sampling algorithm for Bayesian inference.
The geometric rate of decay in autocorrelations of an MMPP makes it inapplicable to model time series with slower decay in autocorrelations. This is true for time series where the dependence structure is non-Markovian; a special class of time series that has non-Markovian dependence and is a focus in this article is known as long range dependent series. Roughly speaking, a time series is long-range dependent if its autocovariance function decays like a power function. Long-range dependence has been encountered in time series data from a large variety of fields, including hydrology (Hurst (1951)), finance (Lo (1989)), network traffic (Willinger et al. (2003)), and climatology (Franzke et al. (2020)) among others. A natural extension of the MMPP to accommodate long-range dependence is the fractional Poisson process (Laskin (2003)). However, likelihood computation of discretized data, obtained from a fractional Poisson process, is not straightforward.
In seminal work, Mandelbrot and Van Ness (1968) introduced fractional Brownian motion, a generalization of standard Brownian motion, and showed that the increments of this process are stationary and exhibit long range dependence. The general definition of fractional Brownian motion is a stochastic integral with respect to a standard Brownian motion where the order of integration is defined by a parameter H ∈ (0, 1). Mandelbrot and Van Ness (1968) referred to H as the Hurst parameter after the hydrologist Harold Hurst, who discovered long-range dependence in time series while studying storage capacities of dams on the Nile river. Mandelbrot and Van Ness (1968) also established that the fractional Brownian motion is a self-similar stochastic process with no characteristic time scale (Graves et al. (2017)). Intuitively, self-similar processes retain statistical properties over different time scales. When the increments of a self-similar process are stationary, these increments exhibit long-range dependence.
For discretized events the intensity of the latent counting process determines the correlation structure of the binary time series. If the binary series is long-range dependent, then an inhomogeneous Poisson process with fixed intensity λ(t) is insufficient to explain the observed data, as it implies that increments in disjoint time intervals are independent. Furthermore, Beran et al. (2013), Chapter 2, showed that a doubly stochastic Poisson process with random intensity λ(t) is long-range dependent if and only if λ(t) is long-range dependent; refer to Samorodnitsky (2006), Pipiras and Taqqu (2017) for reviews on long-range dependence and self-similarity.
In this article we propose a latent semiparametric framework to model long-range dependent discretized event data via a FRActional Probit (FRAP) model. The FRAP model assumes a latent stochastic process responsible for generating the events of interest. Positive values of the process within a time interval imply one or more event occurrences within that interval. By setting the latent process as the fractional Brownian motion parameterized by the Hurst coefficient, we show the FRAP model is able to capture long-range dependence of the discretized events. By varying the Hurst coefficient within (0, 1), the spectrum of the model encompasses antipersistence when H ∈ (0, 1/2), independence for H = 1/2, and long-range dependence when H ∈ (1/2, 1). Moreover, we also include a nonparametric trend component in our model to account for nonstationarity of event occurrences. The proposed framework accommodates testing of long-range dependence in the data by comparing H0: H = 0.5 vs. H1: H > 0.5. We define a Bayesian approach to inference using a Gaussian process prior for the nonparametric trend. A Markov chain Monte Carlo (MCMC) sampling algorithm is proposed relying on sampling the latent process.
The rest of the article is organized as follows. In Section 2 we introduce the motivating Amazon bird vocalization data, including exploratory analyses revealing possible long-range dependence. Section 3 is dedicated to the development and analysis of the FRAP model. Section 4 contains simulation experiments evaluating the proposed approach, and Section 5 analyzes the Amazon data. In the Supplementary Material (Chakraborty, Ovaskainen and Dunson (2022a)), we extend the FRAP model to allow multiple types of events through a grade-of-membership model and provide details on prior specification and posterior computation.
2. Amazon bird vocalization data.
Bird songs play a major role in mate selection and thus have a pronounced impact on bird population dynamics (Slabbekoorn and Smith (2002)). Identifying birds based on their vocalizations is a widely used method for estimating bird population sizes and following population trends over time, and automated acoustic monitoring is increasingly used in both ecological studies and conservation (Laiolo (2010)). Bird song is well known to follow a circadian pattern: birds sing most intensely early in the morning and late in the day (Krebs and Kacelnik (1983)).
We are motivated by an Amazon bird vocalization data set containing observations from the years 2010 to 2014. Audio monitoring devices were placed at different locations throughout the Amazon rain forest. Using the methods of Ovaskainen, de Camargo and Somervuo (2018), these recordings were converted to discretized binary time series (de Camargo, Roslin and Ovaskainen (2019)) containing 0–1 indicators of which species vocalized at least once in one minute time intervals for a 180-minute period starting at sunrise. A visual depiction of the binary sequence of vocalizations for the bird species Automolus ochrolaemus is provided in Figure 1. Based on the audio recordings, it is not possible to reliably distinguish different individual birds of the same species or to infer the number of birds vocalizing. We focus on three locations which are similar in habitat and close in latitude and longitude. Our data consist of recordings for 15 relatively common bird species. For each species we have about five to 10 days of recordings during the months of June to September with recordings starting typically around 5:15 AM. On average, a given species vocalized in 25–30 out of the 180 intervals.
Fig. 1.

Binary sequence of all vocalizations of birds from the Automolus ochrolaemus species, during nine days (not necessarily consecutive) of recording. White and black grids represent absence or presence of vocalizations, respectively.
Our analysis focuses on three characteristics of the bird vocalization dynamics. First, we are interested in the distribution of duration of bird song activity and inactivity; in particular, our results indicate that the duration cannot be adequately modeled by the exponential distribution. In the context of event data, exponential inter-event times are routinely assumed for mathematical and computational simplicity. However, many naturally occurring events, such as earthquakes (Ogata and Abe (1991)), landscape evolution (Weymer et al. (2018)), and human brain activity (Tagliazucchi et al. (2013)) have been shown not to follow such patterns. We are also interested in identifying time periods when birds are more likely to sing and recovering groups of bird species that have similar singing patterns.
Define the marginal probability of vocalization for a given time interval of length Δt to be the probability of observing at least one vocalization when a time interval of this length is selected at random. In the left panel of Figure 2, we show the marginal probabilities of a vocalization during intervals of length Δt ∈ {1, 2, 4, 9, 15, 30, 60, 90} minutes for 15 different bird species. In the right panel of Figure 2, we show the probabilities of vocalization conditioned on the event that the bird vocalized in the previous interval of the same length. Quite naturally, the marginal probabilities increase with the length of the intervals. In comparison, the conditional probabilities show substantially less variation with Δt; for most species the conditional probabilities lie between 0.4 and 0.75. Such scaling of summary statistics is commonly encountered in self-similar stochastic processes (Pipiras and Taqqu (2017)). Additionally, the distance autocorrelations (Zhou (2012)) and the periodogram of the binary series for one day of recording for the species Corythopis torquata are displayed in Figure 3. The distance autocorrelation is a popular alternative to the standard autocorrelation function for investigating nonlinear dependence structures and is thus more suitable for the binary time series data presented here. The slow decay in the distance autocorrelation and the spikes in the periodogram at small frequencies indicate potential long-range dependence in the data.
Fig. 2.

Marginal (left panel) and conditional (right panel) probabilities of bird vocalizations for 15 different species at different time scales Δt ∈ {1, 2, 4, 9, 15, 30, 60, 90}. Due to space constraints, the names of the species from Table 1 have been abbreviated using the first letter of their genus name (first word) and the first letter of their specific epithet (second word).
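As a rough illustration of how such summaries might be computed, the sketch below assumes one day of recording stored as a list of 0/1 one-minute indicators and uses non-overlapping blocks of length Δt. The function names and the toy series are hypothetical, and the exact computation behind Figure 2 (for example, how intervals are selected or how days are pooled) may differ.

```python
def aggregate(z, dt):
    """Collapse a 0/1 series into non-overlapping blocks of length dt.

    A block is 1 if at least one event occurred anywhere inside it.
    """
    n_blocks = len(z) // dt
    return [int(any(z[b * dt:(b + 1) * dt])) for b in range(n_blocks)]


def marginal_and_conditional(z, dt):
    """Return (P[block active], P[block active | previous block active])."""
    blocks = aggregate(z, dt)
    marginal = sum(blocks) / len(blocks)
    # Keep the current block whenever the previous block was active.
    after_active = [cur for prev, cur in zip(blocks, blocks[1:]) if prev == 1]
    if after_active:
        conditional = sum(after_active) / len(after_active)
    else:
        conditional = float("nan")
    return marginal, conditional


# Hypothetical one-minute indicators (not real data).
z = [0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
for dt in (1, 2, 4):
    print(dt, marginal_and_conditional(z, dt))
```

With real recordings one would presumably average these quantities over days before plotting them against Δt.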
Fig. 3.

Distance autocorrelation (left panel) at different lags for the binary indicators obtained from one day of recording of vocalizations for the species Corythopis torquata. On the right panel the periodogram for the same time series is shown.
We will use the notation X(t) for the stochastic process $\{X(t)\}_{t \in \mathbb{R}}$. A stochastic process X(t) is said to be self-similar if, for any c > 0, we have $\{X(ct)\}_{t \in \mathbb{R}} \overset{d}{=} \{c^{H} X(t)\}_{t \in \mathbb{R}}$, so that the random variables X(t) and X(ct) are equivalent in distribution up to scaling factors governed by the parameter H. This parameter H ∈ (0, 1) is commonly known as the Hurst exponent. When H > 1/2, the increments of a self-similar process with stationary increments have nonsummable autocovariances (Pipiras and Taqqu (2017)) and form what is known as a long-range dependent (LRD) time series. In such series the degree of long-range dependence is controlled by H. For continuous-valued time series data, many methods have been proposed to estimate H, including rescaled range (RS) analysis (Hurst (1951), Mandelbrot and Wallis (1969)), detrended fluctuation analysis (Peng et al. (1994)), log periodogram regression (Geweke and Porter-Hudak (1983)), and the local Whittle approximation (Robinson (1995)). Although these methods typically apply to continuous-valued data, we use these estimators in our exploratory analyses, in particular the estimators due to Geweke and Porter-Hudak (1983) and Robinson (1995).
To estimate H, according to Geweke and Porter-Hudak (1983) and Robinson (1995), we use the LongMemoryTS package in R. Table 1 shows the estimates of the Hurst exponent obtained with the log periodogram estimator of Geweke and Porter-Hudak (1983) and the local Whittle estimator of Robinson (1995) for the 15 bird species from Figure 2. Both estimators have a tuning parameter m, the number of Fourier frequencies used. In Table 1 we report the estimated Hurst coefficients for m = n^{1/2}, n^{2/3}, and n^{4/5}. The estimates of H from both estimators in Table 1 suggest long memory behaviour, although they often do not satisfy the constraint 0 < H < 1.
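To make the role of the tuning parameter m concrete, the following is a minimal sketch of a log periodogram regression in the spirit of Geweke and Porter-Hudak (1983). It is not the LongMemoryTS implementation: the function names, the default bandwidth exponent, and the conversion H = d + 1/2 for the reported value are assumptions made for illustration.

```python
import cmath
import math
import random


def periodogram(x, j):
    """Periodogram ordinate of x at the Fourier frequency 2*pi*j/n."""
    n = len(x)
    lam = 2.0 * math.pi * j / n
    dft = sum(xt * cmath.exp(-1j * lam * t) for t, xt in enumerate(x))
    return abs(dft) ** 2 / (2.0 * math.pi * n)


def gph_hurst(x, power=0.5):
    """Log periodogram estimate of H using m = floor(n**power) frequencies."""
    n = len(x)
    m = int(n ** power)
    # Regress log I(lam_j) on -log(4 sin^2(lam_j / 2)); the slope estimates d.
    reg = [-math.log(4.0 * math.sin(math.pi * j / n) ** 2) for j in range(1, m + 1)]
    logs = [math.log(periodogram(x, j)) for j in range(1, m + 1)]
    rbar, lbar = sum(reg) / m, sum(logs) / m
    slope = (sum((r - rbar) * (l - lbar) for r, l in zip(reg, logs))
             / sum((r - rbar) ** 2 for r in reg))
    return slope + 0.5  # assumed conversion H = d + 1/2


# Sanity check on simulated white noise, for which H should be near 0.5.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(512)]
print(round(gph_hurst(noise, power=0.8), 2))
```

Nothing in this regression constrains the slope, which is one way to see why such estimates can fall outside (0, 1) for short binary series.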
Table 1.
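Estimated Hurst exponents for the 15 bird species using the estimators of Geweke and Porter-Hudak (1983) (GPH), Robinson (1995) (local Whittle, LW), and Livsey et al. (2018), and the FRAP model. For the FRAP model we include the posterior mean along with the lower and upper limits of the 95% credible interval
| Species name | GPH, m = n^{1/2} | GPH, m = n^{2/3} | GPH, m = n^{4/5} | LW, m = n^{1/2} | LW, m = n^{2/3} | LW, m = n^{4/5} | Livsey et al. (2018) | FRAP, lower 95% limit | FRAP, posterior mean | FRAP, upper 95% limit |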
|---|---|---|---|---|---|---|---|---|---|---|
| Automolus ochrolaemus | 0.72 | 0.82 | 0.89 | 0.65 | 0.86 | 0.86 | 0.67 | 0.85 | 0.89 | 0.95 |
| Cercomacra cinerascens | 1.04 | 1.06 | 1.17 | 0.99 | 1.13 | 1.04 | 0.83 | 0.90 | 0.92 | 0.94 |
| Corythopis torquata | 0.91 | 0.94 | 0.89 | 0.78 | 0.98 | 0.85 | 0.72 | 0.80 | 0.84 | 0.88 |
| Frederickena viridis | 1.25 | 1.03 | 1.07 | 1.20 | 1.10 | 1.02 | 0.63 | 0.79 | 0.86 | 0.93 |
| Grallaria varia | 1.02 | 0.91 | 0.91 | 0.95 | 0.98 | 0.91 | 0.66 | 0.84 | 0.89 | 0.93 |
| Hylexetastes perrotii | 0.79 | 0.94 | 0.92 | 0.71 | 0.98 | 0.89 | 0.85 | 0.89 | 0.93 | 0.95 |
| Hylophilus muscicapinus | 0.80 | 0.99 | 0.96 | 0.68 | 0.99 | 0.92 | 0.69 | 0.80 | 0.87 | 0.93 |
| Ibycter americanus | 0.96 | 1.08 | 1.10 | 0.88 | 1.21 | 0.99 | 0.81 | 0.90 | 0.94 | 0.96 |
| Micrastur gilvicollis | 0.84 | 0.90 | 0.98 | 0.73 | 0.95 | 0.91 | 0.64 | 0.80 | 0.84 | 0.89 |
| Micrastur mirandollei | 0.83 | 0.96 | 1.05 | 0.76 | 1.02 | 1.02 | 0.61 | 0.83 | 0.88 | 0.93 |
| Myrmeciza ferruginea | 0.87 | 1.10 | 0.99 | 0.78 | 0.92 | 0.88 | 0.61 | 0.81 | 0.85 | 0.88 |
| Percnostola rufifrons | 0.81 | 0.96 | 0.89 | 0.78 | 0.99 | 0.90 | 0.63 | 0.87 | 0.92 | 0.96 |
| Pipra erythrocephala | 0.63 | 0.86 | 0.96 | 0.60 | 0.95 | 0.93 | 0.60 | 0.79 | 0.84 | 0.88 |
| Pithys albifrons | 0.77 | 0.86 | 0.90 | 0.71 | 0.89 | 0.86 | 0.63 | 0.83 | 0.87 | 0.90 |
| Ramphastos vitellinus | 0.84 | 0.99 | 1.01 | 0.76 | 1.02 | 0.96 | 0.59 | 0.77 | 0.83 | 0.89 |
Time series models for discrete valued data with LRD structure are relatively sparse. Classical approaches for count/discrete valued time series, such as the integer autoregressive moving-average (McKenzie (1985, 1986, 1988)) and discrete autoregressive moving-average (Jacobs and Lewis (1978a, 1978b)), cannot account for LRD (Davis et al. (2016), Chapter 21). Cui and Lund (2009) developed a model for stationary Bernoulli sequences with LRD based on renewal sequences. Livsey et al. (2018) provide a recipe for multivariate count time series with Poisson marginals and a flexible autocovariance structure that can adequately handle LRD but fit a misspecified likelihood for inference on relevant parameters. Additionally, it is not entirely straightforward to accommodate covariates in their method. More recently, Jia et al. (2021) developed a method to construct count time series with prescribed marginals through suitable transformations of a latent Gaussian series. However, the joint distribution of counts thus obtained is not easily determined. Estimates of the Hurst exponent, obtained from the quasi-maximum likelihood method of Livsey et al. (2018), are also included in Table 1 in the column for Livsey et al. (2018). The remaining columns in Table 1 refer to model estimates which are discussed later in Section 5.
Our goal is not simply to estimate the Hurst coefficient; we would like to define a realistic generative probability model for these data that takes into account the data collection process and can serve as a useful baseline for future ecological analyses that include spatial dependence, environmental covariates, and other complications. The estimated Hurst coefficients for our proposed fractional probit model are provided in Table 1; see Section 3.1 below. Interestingly, the Hurst coefficients are significantly above 0.5 for all 15 bird species. This suggests long-range dependence, a new finding of ecological interest, which should be considered in future analyses of animal occurrence time series. One can theoretically use a (doubly stochastic) Poisson process to model these data; however, one should allow flexible rate functions to accommodate long memory behaviour (Beran et al. (2013), Chapter 2).
3. Discretized event data.
We begin this section by defining some notation. Suppose event recordings are discretized at time points {t0, t1, …, tn } where the time points belong to some index set 𝒯. In this article we assume that ti+1 − ti = Δ for all i = 0, 1, …, n − 1. Corresponding to each time interval, we have the following binary event indicators:
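$$
Z(t_i, t_{i+1}) = \begin{cases} 1, & \text{if at least one event occurs in } (t_i, t_{i+1}], \\ 0, & \text{otherwise}, \end{cases} \qquad i = 0, 1, \ldots, n - 1. \tag{3.1}
$$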
We consider R replications of this binary time series Z = {Z(1), Z(2), …, Z(R) }. In our particular setting the replications correspond to different days of recording at a fixed location and for a fixed bird species.
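The discretization step can be sketched as follows, assuming exact event times were available and intervals are half-open on the left, (ti, ti+1]. The function name and the example times are hypothetical; in the bird data only the indicators themselves are observed.

```python
import math


def discretize(event_times, t0, delta, n):
    """Return Z with Z[i] = 1 if an event falls in (t0 + i*delta, t0 + (i+1)*delta]."""
    z = [0] * n
    for t in event_times:
        # Index of the left-open, right-closed interval that contains t.
        i = math.ceil((t - t0) / delta) - 1
        if 0 <= i < n:
            z[i] = 1
    return z


# Hypothetical event times in minutes after the start of recording.
print(discretize([0.5, 0.7, 2.0, 3.1], t0=0.0, delta=1.0, n=4))
```

Two events in the same interval produce a single 1, which is exactly the information lost by discretization.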
3.1. Fractional probit model.
Consider, for now, a single replication of the binary series Z. We assume a latent continuous time process y(t), t ∈ 𝒯, is responsible for instigating events of interest. Let ρ0(y(s), y(t)) denote the covariance function of y(·) for s, t ∈ 𝒯. We want to derive a discrete time series from y(t) so that it reflects the autocovariance structure of the observed binary data. Of particular interest are time series that exhibit long-range dependence motivated by the bird vocalization data. A time series $\{X_n\}$ is said to have long-range dependence if its autocovariance function ρX(k) at lag k decays polynomially as k → ∞,
$$
\rho_X(k) = L(k)\, k^{2d - 1}, \qquad d \in (0, 1/2), \tag{3.2}
$$
where L(·) is a slowly varying function at infinity, meaning it is positive on [c, ∞) with c ≥ 0 and, for any a > 0, $\lim_{u \to \infty} L(au)/L(u) = 1$. The parameter d is called the long-range dependence parameter, and the series is said to have long memory. A popular alternative characterization of long-range dependent series relies on properties in the frequency domain. If sX(λ) is the spectral density of the time series $\{X_n\}$, then the series is long-range dependent if
$$
s_X(\lambda) = L^{*}(\lambda)\, \lambda^{-2d}, \qquad 0 < \lambda \le \pi, \quad d \in (0, 1/2), \tag{3.3}
$$
for some slowly varying function L*(·) at zero. This definition implies that spectral densities of long-range dependent series have an infinite spike in a neighborhood around zero.
The concept of long memory is intricately related to self-similarity of processes. Broadly speaking, self-similar processes are obtained as normalized limits of partial-sum processes of a long memory series (Pipiras and Taqqu (2017)). While there are several well-studied self-similar processes, one of the most fundamental and perhaps the most popular is the fractional Brownian motion (fBM). A Brownian motion B(t) is a Gaussian process with stationary increments and covariance function KB(s, t) = τ² min(s, t), τ > 0. The fBM generalizes this covariance structure to the form
$$
K_H(s, t) = \frac{\tau^2}{2}\left(|s|^{2H} + |t|^{2H} - |s - t|^{2H}\right). \tag{3.4}
$$
The parameter H is known as the Hurst exponent of the fBM. Henceforth, we shall write BH(t) to denote an fBM with Hurst exponent H. In (3.4), H ∈ (0, 1). For H = 0.5 the Brownian motion is recovered. The self-similarity of the process stems from the fact that $K_H(cs, ct) = c^{2H} K_H(s, t)$ for any c > 0. Setting $\epsilon_n = B_H(n + 1) - B_H(n)$, n = 0, 1, …, we obtain a stationary discrete time series, known as fractional Gaussian noise (fGN), elements of which marginally follow N(0, τ²). The autocovariance function ρϵ(k), k = 0, 1, 2, …, of $\{\epsilon_n\}$ is
$$
\rho_\epsilon(k) = \frac{\tau^2}{2}\left(|k + 1|^{2H} - 2|k|^{2H} + |k - 1|^{2H}\right) \sim \tau^2 H(2H - 1)\, k^{2H - 2} \quad \text{as } k \to \infty, \tag{3.5}
$$
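where, for two sequences an and bn, an ~ bn means that an/bn → 1 as n → ∞. Hence, for H ∈ (1/2, 1) the series $\{\epsilon_n\}$ is LRD in the sense of equation (3.2) with LRD parameter d = H − 1/2. Our proposed model relies heavily on the simple observation that if we define a series $Z^{*}_n = \mathbb{1}(X_n > 0)$, where $\{X_n\}$ is a zero-mean stationary Gaussian series with autocovariance function $\rho_X(k)$, then the autocovariance function of this binary series is
$$
\rho_{Z^{*}}(k) = \frac{1}{2\pi} \arcsin\left\{\frac{\rho_X(k)}{\rho_X(0)}\right\}; \tag{3.6}
$$
see Livsey et al. (2018), Lemma 4.1, for a proof of this property. In particular, if $X_n = \epsilon_n$, then the binary series inherits the LRD property. To see this, suppose H ∈ (1/2, 1); then for large lags k, $\rho_{Z^{*}}(k) \approx \rho_\epsilon(k)/\{2\pi \rho_\epsilon(0)\}$ since arcsin x ≈ x for small x, that is, the series $\{Z^{*}_n\}$ is also long-range dependent with Hurst coefficient H. In the context of discretized event data as described in (3.1), we then have the following latent formulation:
$$
Z(t_i, t_{i+1}) = \begin{cases} 1, & \text{if } B_H(t_{i+1}) - B_H(t_i) > 0, \\ 0, & \text{otherwise}, \end{cases} \tag{3.7}
$$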
for i = 0, 1, …; see also Livsey et al. (2018), equation (4.4), for an equivalent formulation for any latent Gaussian series. The above formulation accounts for long memory in the observed binary series, with the autocorrelation decay mimicking that of an fGN. Moreover, as a consequence of the scaling property of an fBM, a scale-free property of conditional probabilities consistent with Figure 2 is established in the following lemma.
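The latent formulation can be sketched in code as follows. The block assumes the standard fGN autocovariance and the arcsine law for thresholded zero-mean Gaussian pairs, and it simulates the increments by a plain Cholesky factorization; the function names are hypothetical and this is not the sampler used in the article.

```python
import math
import random


def fgn_autocov(k, hurst, tau=1.0):
    """Autocovariance of fractional Gaussian noise at lag k."""
    k = abs(k)
    return 0.5 * tau ** 2 * ((k + 1) ** (2 * hurst) - 2 * k ** (2 * hurst)
                             + abs(k - 1) ** (2 * hurst))


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def simulate_binary(n, hurst, tau=1.0, rng=random):
    """Draw n fGN increments and threshold them at zero."""
    cov = [[fgn_autocov(i - j, hurst, tau) for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    white = [rng.gauss(0.0, 1.0) for _ in range(n)]
    eps = [sum(low[i][j] * white[j] for j in range(i + 1)) for i in range(n)]
    return [1 if e > 0 else 0 for e in eps]


def binary_autocov(k, hurst):
    """Arcsine-law autocovariance of the thresholded series at lag k."""
    corr = fgn_autocov(k, hurst) / fgn_autocov(0, hurst)
    return math.asin(corr) / (2.0 * math.pi)


z = simulate_binary(90, hurst=0.8, rng=random.Random(1))
print(sum(z), binary_autocov(1, 0.8), binary_autocov(20, 0.8))
```

The Cholesky route costs O(n³) but is exact, which is adequate for series of length 90 to 180.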
Lemma 3.1. Let BH(t) be an fBM with Hurst coefficient H with τ2 = 1. Suppose we observe BH(t) at and let Xi ≡ BH(i), i ≥ 1, X0 ≡ BH(0). Define the binary series of indicators at time scale m, , i ≥ 1 so that, for m = 0, the series , , … is as in (3.7). Then, for any m = 0, 1, …, the conditional probability is independent of the time scale m. In particular,
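$$
P(Z_{i+1} = 1 \mid Z_i = 1) = \frac{1}{2} + \frac{1}{\pi} \arcsin\left(2^{2H - 1} - 1\right). \tag{3.8}
$$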
Proof. See Appendix A.1. □
Two remarks are in order. First, for the special case H = 0.5, the conditional probability in equation (3.8) becomes 1/2 so that, when the series of indicators are generated from an underlying white noise series, the conditional probability of Zi+1 = 1|Zi = 1 and the marginal probability of Zi = 1 are equal. Second, since the function arcsin(·) is increasing, the conditional probability of Zi+1 = 1|Zi = 1 increases with H, covering the cases of antipersistence H < 0.5, independence H = 0.5, and LRD for H > 0.5. Figure 4 depicts the relationship between the Hurst coefficient H and the conditional probabilities.
Fig. 4.

Relation between the Hurst coefficient H and the conditional probabilities obtained from equation (3.8).
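The relationship in Figure 4 can be tabulated with a few lines of code. The sketch below assumes the arcsine law for two adjacent fGN increments, whose correlation is 2^{2H−1} − 1; the function name is hypothetical.

```python
import math


def conditional_probability(hurst):
    """P(Z_{i+1} = 1 | Z_i = 1) for thresholded fGN with the given Hurst coefficient."""
    # Correlation between two adjacent increments of an fBM.
    lag_one_corr = 2.0 ** (2.0 * hurst - 1.0) - 1.0
    return 0.5 + math.asin(lag_one_corr) / math.pi


for h in (0.25, 0.5, 0.75, 0.9):
    print(f"H = {h:.2f}: {conditional_probability(h):.3f}")
```

Because the lag-one correlation does not depend on the spacing of the increments, the same function applies at every time scale.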
Additionally, the spectral density of the series Zn can be shown to have a pole at zero frequency when H > 1/2, a distinctive feature of LRD series. Let sZ(λ) and sϵ(λ) denote the spectral density of the series Zn and ϵn, respectively, for −π ≤ λ ≤ π. Then we have, for H > 1/2,
where we have used the Jordan inequality arcsin x − x ≥ 0 for 0 < x < 1 (Mitrinović and Vasic (1970)). Combining this with the fact that $s_\epsilon(\lambda) \sim (\tau^2/C)\lambda^{1 - 2H}$, C = C(H) > 0, in a neighborhood of 0, we see that sZ(λ) also has a pole at λ = 0 for H > 1/2 and hence the series Zn is LRD, according to definition (3.3).
When considering the Amazon bird vocalization data and other real data applications, a clear limitation of model (3.7) is the restriction of the marginal probabilities being fixed at 0.5. To be realistic, we need to allow the marginal probabilities to be arbitrary and varying smoothly according to the time of the day. Moreover, Mikosch and Stărică (2004) and Chen, Härdle and Pigorsch (2010), among many others, noted that long memory behavior can often be an artifact of nonstationarities.
With this motivation we introduce a nonstationary component in the FRAP model by assuming that the latent process driving the events, say y(t), admits an additive decomposition of the form y(t) = f(t) + BH(t) while letting
$$
Z(t_{i-1}, t_i) = \begin{cases} 1, & \text{if } y(t_i) - y(t_{i-1}) = f(t_i) - f(t_{i-1}) + \epsilon_i > 0, \\ 0, & \text{otherwise}, \end{cases} \qquad \epsilon_i = B_H(t_i) - B_H(t_{i-1}), \tag{3.9}
$$
where we assume f(·) is continuously differentiable (see Section 4 for examples). The marginal probability of observing an event in the interval (ti−1, ti] is then P[Z(ti−1, ti) = 1] = Φ[{f(ti) − f(ti−1)}/τ], where Φ(·) is the cumulative distribution function of a standard Gaussian random variable. Hence, the variation in f(·) during (ti−1, ti] determines the probability of observing an event during this time; a positive change increases the marginal probability, whereas a negative change decreases it. If f(ti) − f(ti−1) = 0, then the marginal probability is 1/2. To simplify notation, we write Zi = Z(ti−1, ti). The vector ϵH = (ϵ1, …, ϵn)′ of fGN increments follows an n-dimensional Gaussian distribution with mean 0 and covariance matrix ΣH whose (i, j)th element is ΣH(i, j) = ρϵ(|i − j|), defined in equation (3.5). In Figure 5 we show the variations in marginal probabilities when the nonstationary component f(t) in model (3.9) is set to f(t) = sin(4πt/90) with τ = 1.
Fig. 5.

Variation in marginal probabilities of observing a vocalization or an event when f(t) = sin(4πt/90) for time intervals (0, 1], (1, 2], …, (89, 90]. Here, τ = 1, and the marginal probabilities are calculated as Φ{f(i + 1) − f(i)} for i = 0, …, 89.
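The marginal probabilities in Figure 5 can be reproduced approximately with the sketch below. It reads the trend as f(t) = sin(4πt/90), which is an assumption about the intended formula (a trend of the form sin(4πt) would vanish at every integer time point), and the function names are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Cumulative distribution function of a standard Gaussian variable."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def trend(t):
    """Assumed trend: two full sine cycles over the 90 time units."""
    return math.sin(4.0 * math.pi * t / 90.0)


def marginal_probabilities(f, n, tau=1.0):
    """Phi{(f(i + 1) - f(i)) / tau} for the intervals (i, i + 1], i = 0, ..., n - 1."""
    return [std_normal_cdf((f(i + 1) - f(i)) / tau) for i in range(n)]


probs = marginal_probabilities(trend, 90)
print(round(min(probs), 3), round(max(probs), 3))
```

Intervals over which the trend rises have probability above 1/2, and intervals over which it falls have probability below 1/2.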
Akin to probit models for longitudinal binary data with covariate information (Chib and Greenberg (1998)), we are interested in modeling the likelihood of the observed events Z = (Z1, …, Zn) ∈ {0, 1}^n. However, in our context we have time series data with smooth trend f(t) and temporal dependence captured through ϵH. Letting f = {f(t0), …, f(tn)} and putting the pieces together, we get the following probit-type model:
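$$
P(Z = z \mid f, H, \tau) = P(W \in E_W), \qquad W = Af + \epsilon_H \sim N_n(Af, \Sigma_H), \tag{3.10}
$$
where $E_W = \bigcap_{i=1}^{n} \{w \in \mathbb{R}^n : w_i > 0 \text{ if } z_i = 1,\ w_i \le 0 \text{ if } z_i = 0\}$ is the intersection of half-planes determined by the observed indicators, and the matrix A ∈ ℛ^{n×n} is such that Aii = 1, Ai,i−1 = −1 and Aij = 0 for j ≠ i, i − 1. For identifiability we impose the restriction that f(0) = 0. Then, under model (3.10), f(·)/τ is identifiable. To accommodate this restriction, we let A11 = 1, A1,j = 0, j = 2, …, n; the other rows of A remain unchanged.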
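The role of the matrix A can be illustrated with a short sketch. It assumes f is stored as the values (f(t1), …, f(tn)) with f(t0) = 0, so that the first row of A simply returns f(t1); the function names are hypothetical.

```python
def difference_matrix(n):
    """n x n matrix with A[i][i] = 1 and A[i][i-1] = -1; the first row is (1, 0, ..., 0)."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0
        if i > 0:
            a[i][i - 1] = -1.0
    return a


def matvec(a, v):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def event_indicators(f_values, eps):
    """Z_i = 1 if W_i > 0, where W = A f + eps."""
    increments = matvec(difference_matrix(len(f_values)), f_values)
    return [1 if m + e > 0 else 0 for m, e in zip(increments, eps)]


# Hypothetical trend values and noise-free latent increments.
print(matvec(difference_matrix(3), [1.0, 3.0, 2.0]))
print(event_indicators([1.0, 3.0, 2.0], [0.0, 0.0, 0.0]))
```

Each observed vector of indicators fixes the sign of every coordinate of W, which is the region over which the Gaussian density is integrated.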
Model (3.10) is quite flexible in incorporating a smooth trend f(t) and autocorrelated errors. In the special case in which H = 0.5, the error term becomes uncorrelated so that f(t) is assumed to characterize the pattern over time in the data. In contrast, when H > 0.5, we obtain long range dependence. The model provides a useful basis for testing of long-range dependence via comparing H0: H = 0.5 to H1: H > 0.5 in the presence of potential nonstationarity.
3.2. Priors and posterior computation.
Without loss of generality, we assume that the time points {t0, …, tn} ⊂ 𝒯 = [0, T]. Let Θ = {(f, β, τ): f ∈ ℱ, β ∈ ℛ, τ ∈ ℛ+} be the parameter space in model (3.10), where we let ℱ be the space of continuously differentiable functions on 𝒯 and β = log{H/(1 − H)}. Let Πβ denote the prior on β and Πτ denote the prior on τ. We choose Πβ ≡ N(0, 1) and Πτ ≡ Inverse-Gamma(aτ, bτ) for positive constants aτ, bτ. For the nonparametric component we let f ~ Πf, where Πf is an appropriate prior for an unknown smooth function. In particular, we choose a zero mean Gaussian process (GP) with a squared exponential covariance kernel (Rasmussen and Williams (2006)) scaled by the variance parameter τ² of the latent process, defined as
| (3.11) |
for s, t ∈ 𝒯. For numerical stability we follow the standard practice of adding a small positive quantity ν to the diagonal elements of the GP covariance matrix, so that C(ti, ti) is replaced by C(ti, ti) + ν. Consequently, the induced prior on g = Af is again a multivariate Gaussian distribution with covariance matrix Cg = τ²ACA′, where C is an n × n matrix with Cij = C(ti, tj). To learn the hyperparameters (σ, ϕ) from the data, we transform them to the logarithmic scale and augment the parameter space Θ to Θ* = Θ × η, where η = {(log σ, log ϕ): σ, ϕ > 0}. We place independent standard Gaussian priors on each component of η. Thus, Πη ≡ N(0, 1) × N(0, 1). The prior specification is completed by setting Π = Πf × Πβ × Πτ × Πη.
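The induced covariance of g = Af can be sketched as below. The kernel parametrization σ² exp{−(s − t)²/(2ϕ²)} is an assumption, since several equivalent squared exponential forms are in use and the article's may differ; the function names are hypothetical.

```python
import math


def sq_exp_kernel(s, t, sigma, phi):
    """Assumed squared exponential kernel; the article's parametrization may differ."""
    return sigma ** 2 * math.exp(-((s - t) ** 2) / (2.0 * phi ** 2))


def increment_prior_cov(times, sigma, phi, tau, nu=1e-3):
    """Covariance tau^2 * A C A' of g = A f, with the jitter nu added to the diagonal of C."""
    n = len(times)
    c = [[sq_exp_kernel(s, t, sigma, phi) + (nu if i == j else 0.0)
          for j, t in enumerate(times)] for i, s in enumerate(times)]

    def entry(i, j):
        # Entries with index -1 correspond to f(t_0) = 0 and contribute nothing.
        return c[i][j] if i >= 0 and j >= 0 else 0.0

    # Row i of A is e_i - e_{i-1}, so A C A' is a double difference of C.
    return [[tau ** 2 * (entry(i, j) - entry(i - 1, j)
                         - entry(i, j - 1) + entry(i - 1, j - 1))
             for j in range(n)] for i in range(n)]


cov_g = increment_prior_cov([1.0, 2.0, 3.0], sigma=1.0, phi=1.0, tau=2.0)
print(cov_g[0][0], cov_g[1][1])
```

Working with the double difference directly avoids forming A explicitly, which is convenient because A is bidiagonal.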
For the Amazon bird vocalization data, we have replications {Z(1), …, Z(R)} of Z over different days which have minimal empirical correlations. We assume these replicates are conditionally independent involving the same f(t) but with different realizations of the latent residual term leading to different realizations W(r), for r = 1, …, R, of W in equation (3.10). Including also the priors, this leads to the following hierarchy:
| (3.12) |
for any Er ⊂ {0, 1}n and , as defined after equation (3.10).
Posterior computation under the hierarchical FRAP model (3.12) is potentially challenging. We initially considered the integrated nested Laplace approximation (INLA), which was developed for approximate Bayesian inference in latent Gaussian models by Rue, Martino and Chopin (2009). However, the non-Markovian structure of the FRAP model renders the INLA paradigm inapplicable (Rue and Held (2005)). In a recent article, Sørbye and Rue (2018) applied the INLA framework to an fGN model by approximating the fGN with a mixture of first-order autoregressive processes. This approximation technique works quite well when the observed time series is quite long (n ~ 500) and the number of replications available is also very high (R ~ 1000). For the Amazon bird vocalization data, both the length and the number of replications are quite small compared to these numbers.

We instead focus on Markov chain Monte Carlo (MCMC), developing a practical algorithm that exploits the structure of the model, as detailed in Algorithm 1. We use θ | − to denote the full conditional distribution of a parameter θ, given other parameters and the data in Algorithm 1. The Metropolis random walk steps to update the Hurst exponent and the Gaussian process kernel hyperparameters are implemented following the adaptive Metropolis algorithm (Roberts and Rosenthal (2001)). Adaptive Metropolis modifies the classical version of the algorithm by varying the covariance of the random walk noise so as to target the optimal acceptance rate (Roberts and Rosenthal (2001)). Suppose s1 and s2 are the noise variances of the random walk updates of β and η, respectively. We start with s1 = 0.1 and s2 = 0.2 and, whenever the MCMC iteration l is divisible by 50, increase or decrease them by a factor of exp(l^{−0.5}). Adaptation targets an acceptance probability of approximately 0.3. Values of f(·) at a set of test points can also be evaluated by accommodating a further step in Algorithm 1 following Rasmussen and Williams (2006), equations 2.22–2.24.
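The adaptation rule can be sketched for a single scalar parameter as follows. This is a toy version under stated assumptions: the target is a stand-in log density rather than the FRAP full conditional, the tuned quantity is treated as the proposal standard deviation, and the function name and arguments are hypothetical.

```python
import math
import random


def adaptive_rw_metropolis(log_target, x0, n_iter, scale=0.1,
                           target_rate=0.3, batch=50, rng=random):
    """Random-walk Metropolis whose proposal scale is adjusted every `batch` iterations."""
    x, lp = x0, log_target(x0)
    draws, accepted = [], 0
    for it in range(1, n_iter + 1):
        prop = x + rng.gauss(0.0, scale)
        lp_prop = log_target(prop)
        # Standard Metropolis acceptance step for a symmetric proposal.
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = prop, lp_prop
            accepted += 1
        draws.append(x)
        if it % batch == 0:
            # Enlarge the scale after a batch that accepted too often, shrink it otherwise.
            factor = math.exp(it ** -0.5)
            scale = scale * factor if accepted / batch > target_rate else scale / factor
            accepted = 0
    return draws, scale


# Toy target: the N(0, 1) prior on beta alone, ignoring the likelihood.
draws, final_scale = adaptive_rw_metropolis(lambda b: -0.5 * b * b, 0.0, 2000,
                                            rng=random.Random(1))
print(len(draws), final_scale > 0.1)
```

Because the adjustment factor shrinks toward 1 as the iteration count grows, the adaptation diminishes over time.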
The main computational bottleneck of Algorithm 1 involves simulating the truncated Gaussian random variables for updating the latent variables Wr. This is done using R package tmvtnorm. Unfortunately, we found the popular circulant embedding algorithm (Pipiras and Taqqu (2017), Chapter 2.11) to simulate Gaussian long-range dependent sequences to be quite slow when these constraints are imposed. To accelerate computation, the R copies of the latent variables are generated in parallel. The R code to implement the FRAP model, given R copies of discretized events, is available at https://github.com/antik015/Fractional-Probit-Model; the code is also available in Chakraborty, Ovaskainen and Dunson (2022b).
3.3. Asymptotics.
Here, we consider infill asymptotics, so we assume we can make measurements at finer time points {t0, …, tn} as n → ∞ within the interval [0, T]. We assume the noise variance τ² = 1. Also, we set the number of replications R = 1 since the proof does not depend on a specific value of R. Let the true trend function be f0 ∈ ℱ and the true Hurst coefficient be H0, satisfying 0 < a < H0 < b < 1 for some a, b ∈ (0, 1). Define θ0 = (f0, H0) and P0 to be the true data generating probability measure, and consider any weak neighborhood U of θ0. By showing that the joint prior Π ≡ Πβ × Πf has positive Kullback–Leibler support, we obtain the following consistency result.
Theorem 3.2. Suppose f0 ∈ ℱ and 0 < a < H0 < b < 1 for some a, b ∈ (0, 1). Write θ0 = (f0, H0), and consider any weak neighborhood U of θ0. Then, the posterior probability of the set Uc given the series of indicators Π(Uc | Z1, …, Zn) → P0-probability as n → ∞.
A proof of Theorem 3.2 is provided in the Appendix.
4. Simulation experiments.
We report the results of a detailed simulation study for different choices of the latent trend function f(·) in equation (3.9) while varying the number of replications R. We assume discretized observations are available for a period of n = 90 time units, and the number of replications R considered is {10, 25, 50}. The following choices of the trend function are considered:
f2(t) = 5[1 + exp{−2.5(t − 45)/15}]^{−1};
f3(t) = −2{(t − 45)/45}² + 2;
f5(t) = 0.1 f1(t) log{f2(t)}.
We note here that f2(·) slightly violates the assumption that the nonstationary component in model (3.9) at t = 0 is 0. We define the squared empirical ℓ2 norm of a function g(·), evaluated on the points {t1, …, tn}, as . Given an estimator of in model (3.12), we evaluate the performance of Algorithm 1 by computing the relative mean square error (ReMSE), defined as , where . The latent trends f(·) are chosen from the aforementioned list, and is set to be the pointwise posterior mean of f(·)/τ at {t1, …, t90}, obtained under the hierarchy (3.12). We considered three choices for the Hurst exponent, namely, {0.5, 0.75, 0.9}, ranging from independent increments for H = 0.5 to highly correlated increments for H = 0.9. We generated the binary data by first evaluating y(t) = f(t) + BH(t) at {t0, t1, …, tn}; to simulate the noise vector, we sampled ϵH ~ N(0, τ²ΣH) with τ² ∈ {0.05², 0.1², 0.15²}. Representing each positive increment of y(·) by 1, the discretized series Z is obtained, and the sampling is repeated R times to complete the data generation process. For each combination of f(·), H, τ, and R, we performed 30 independent evaluations of the proposed framework, and in Table 2 we report the average ReMSE and the average estimated Hurst exponent for τ = 0.1 with the value of ν fixed at 0.001; results for τ = 0.05 and τ = 0.15 are provided in the Supplementary Material (Chakraborty, Ovaskainen and Dunson (2022a)).
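The data-generating steps above can be sketched as follows. This is a minimal pure-Python illustration, not the authors' R code. It assumes a unit-spaced grid t_i = i for i = 0, …, 90, takes ΣH to be the autocovariance matrix of unit-variance fractional Gaussian noise (the increments of fBM), and draws the correlated noise through a Cholesky factor. The helper names `fgn_cov`, `cholesky`, and `simulate_binary_series` are hypothetical.

```python
import math
import random


def fgn_cov(n, hurst):
    """Autocovariance matrix of unit-variance fractional Gaussian noise."""
    def gamma(k):
        k = abs(k)
        h2 = 2.0 * hurst
        return 0.5 * ((k + 1) ** h2 - 2.0 * k ** h2 + abs(k - 1) ** h2)
    return [[gamma(i - j) for j in range(n)] for i in range(n)]


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def f2(t):
    """Trend f2 from the list above."""
    return 5.0 / (1.0 + math.exp(-2.5 * (t - 45.0) / 15.0))


def simulate_binary_series(trend, low, tau, rng):
    """One replicate: Z_i = 1 if the i-th increment of y = f + B_H is positive."""
    n = len(low)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    series = []
    for i in range(n):
        noise = tau * sum(low[i][k] * z[k] for k in range(i + 1))
        increment = trend(i + 1) - trend(i) + noise
        series.append(1 if increment > 0 else 0)
    return series


rng = random.Random(0)
low = cholesky(fgn_cov(90, hurst=0.75))  # factor once, reuse across replicates
replicates = [simulate_binary_series(f2, low, tau=0.1, rng=rng) for _ in range(10)]
```

For H = 0.5 the matrix returned by `fgn_cov` reduces to the identity, which corresponds to the independent-increment case mentioned above.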
Table 2.
Relative mean square error (ReMSE) for different choices of the latent trend function f(t) for the model (3.10) under hierarchy (3.12). For each f(t), three values of the Hurst exponent are considered: {0.5, 0.75, 0.9} together with {10, 25, 50} replications. The results reported are averages of 30 independent simulation experiments for each combination
| Hurst exponent (H) | Replications (R) | f1(t): ReMSE | f1(t): est. H | f2(t): ReMSE | f2(t): est. H | f3(t): ReMSE | f3(t): est. H | f4(t): ReMSE | f4(t): est. H | f5(t): ReMSE | f5(t): est. H |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 10 | 1.26 | 0.55 | 1.02 | 0.49 | 0.12 | 0.52 | 0.09 | 0.52 | 1.38 | 0.50 |
| 0.5 | 25 | 0.58 | 0.48 | 0.40 | 0.48 | 0.01 | 0.50 | 0.01 | 0.51 | 0.96 | 0.51 |
| 0.5 | 50 | 0.40 | 0.54 | 0.17 | 0.50 | 0.007 | 0.48 | 0.005 | 0.50 | 0.08 | 0.50 |
| 0.75 | 10 | 2.13 | 0.76 | 1.88 | 0.76 | 0.14 | 0.74 | 0.28 | 0.76 | 4.81 | 0.74 |
| 0.75 | 25 | 1.37 | 0.75 | 1.20 | 0.77 | 0.06 | 0.76 | 0.03 | 0.75 | 1.46 | 0.75 |
| 0.75 | 50 | 0.84 | 0.74 | 0.24 | 0.74 | 0.04 | 0.75 | 0.02 | 0.75 | 0.55 | 0.74 |
| 0.9 | 10 | 4.18 | 0.88 | 6.52 | 0.87 | 0.70 | 0.88 | 0.20 | 0.90 | 14.96 | 0.87 |
| 0.9 | 25 | 3.08 | 0.89 | 2.61 | 0.89 | 0.29 | 0.89 | 0.18 | 0.89 | 5.87 | 0.93 |
| 0.9 | 50 | 1.11 | 0.89 | 0.99 | 0.88 | 0.08 | 0.87 | 0.07 | 0.89 | 3.34 | 0.88 |
Estimates of the Hurst exponent are quite accurate across all the combinations of R, H, and f(·). This is important in the context of the Amazon bird vocalization data for which we have, on average, 10 days of data. Naturally, the ReMSE in Table 2 decreases as the number of replications R increases, in many cases by roughly a factor of two or more when the number of replications is doubled. Interestingly, the degree of LRD also controls the ReMSE. For all the choices of f(·), the average ReMSE increases with H. Large H implies strong dependence in the data, which makes the problem of recovering f(·) harder. This was investigated formally in Hall and Hart (1990), who observed that the rates of recovering f(·) decrease with H. Set p(t1, t2) = Φ[{f(t2) − f(t1)}/τ] as the true marginal probability under model (3.9) with trend function f(·) and let denote samples from the posterior distribution of f and τ obtained fitting Algorithm 1. The black line in Figure 6 is the posterior mean of the marginal probabilities , and the red line plots p(t1, t2) for the case H = 0.75 and R = 50 and the five choices of f(·) under consideration here. We also show the pointwise 95% credible bands of . The best result is obtained for f1(t). The credible bands mostly provide accurate uncertainty quantification for all the cases. However, when the number of replications R is smaller, the problem of accurately estimating the marginal probabilities becomes much harder, especially for high values of H. Posterior samples of the Hurst exponent for one case are also included in the figure.
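As a small worked illustration of the marginal probability p(t1, t2) = Φ[{f(t2) − f(t1)}/τ] and of pointwise posterior summaries, the following Python sketch evaluates p for the trend f3 and summarizes a set of draws by their mean and 2.5% and 97.5% quantiles. The "posterior draws" here are synthetic perturbations of the true trend, used only so that the example runs. They are not output of Algorithm 1, and the function names are hypothetical.

```python
import math
import random
import statistics


def std_normal_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def marginal_probability(f_t1, f_t2, tau):
    """p(t1, t2) = Phi[{f(t2) - f(t1)} / tau]."""
    return std_normal_cdf((f_t2 - f_t1) / tau)


def summarize(prob_draws):
    """Mean and empirical 2.5% / 97.5% quantiles of a list of draws."""
    cuts = statistics.quantiles(prob_draws, n=40)  # cut points at 2.5%, 5%, ...
    return statistics.fmean(prob_draws), cuts[0], cuts[-1]


def f3(t):
    """Trend f3 from Section 4."""
    return -2.0 * ((t - 45.0) / 45.0) ** 2 + 2.0


true_p = marginal_probability(f3(44), f3(45), tau=0.1)

# Synthetic stand-in for posterior draws of (f(t1), f(t2)); illustration only.
rng = random.Random(0)
draws = [
    marginal_probability(f3(44) + rng.gauss(0.0, 0.02), f3(45) + rng.gauss(0.0, 0.02), tau=0.1)
    for _ in range(2000)
]
post_mean, lower, upper = summarize(draws)
```

Repeating the summary over all consecutive pairs of grid points would give pointwise bands of the kind plotted in Figure 6.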
Fig. 6.

Figure (a) shows the posterior mean and 95% credible bands for marginal probabilities in one minute intervals when . The values of the Hurst coefficient and the number of replications were H = 0.75 and R = 50, respectively. Red dashed and black solid lines correspond to the true values and the posterior mean, respectively. Gray shaded regions are credible bands. Corresponding posterior samples of H are shown in (b). A red line is added at the true value H = 0.75.
To further investigate the behavior of the posterior distribution of the Hurst exponent, we carried out an independent simulation experiment focusing on the coverage probability of the credible intervals. We fix the number of replicates at R = 5 and vary the Hurst exponent together with the latent trends as above. For each such combination, we generated 100 data sets and applied model (3.12). Our findings for 95% credible intervals are summarized in Table 3. The coverage probabilities (CP) for all the cases considered are close to the nominal level. The average lengths (l) of the intervals, along with their standard deviations, vary substantially across the choices of H. For example, the average length of the intervals is largest for H = 0.5, with very little variation, whereas for H = 0.9 the intervals become shorter on average although their variability increases by almost a factor of 3.
Table 3.
Coverage probability (CP) of 95% credible intervals for the Hurst exponent under the hierarchy (3.12). Also included are the average lengths (l) of the credible intervals with the corresponding standard deviations in parentheses. The number of replicates in each case is R = 5
| H | f1(·): CP | f1(·): l | f2(·): CP | f2(·): l | f3(·): CP | f3(·): l | f4(·): CP | f4(·): l | f5(·): CP | f5(·): l |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 0.97 | 0.16 (0.05) | 0.94 | 0.19 (0.04) | 0.92 | 0.20 (0.02) | 0.92 | 0.21 (0.03) | 0.98 | 0.20 (0.02) |
| 0.75 | 0.91 | 0.15 (0.03) | 0.92 | 0.14 (0.14) | 0.92 | 0.14 (0.02) | 0.90 | 0.14 (0.02) | 0.93 | 0.13 (0.01) |
| 0.9 | 0.90 | 0.13 (0.10) | 0.89 | 0.14 (0.11) | 0.91 | 0.12 (0.10) | 0.90 | 0.12 (0.09) | 0.94 | 0.14 (0.10) |
5. Application to Amazon bird vocalization data.
5.1. Analysis and results.
We applied the FRAP model to the 15 bird species mentioned in Section 2. For each of these species, we have 180 minutes of recordings available for multiple days. The estimated Hurst exponents for these 15 species are reported in Table 1 with the posterior mean, lower, and upper end of the 95% credible intervals under columns , , and , respectively. All the species show high long-range dependence in their temporal vocalization patterns. The posterior mean estimates of the Hurst exponent range from a minimum of 0.83 to a maximum of 0.94. The variation in the Hurst exponent across species is very small, with an overall mean of 0.88 and standard deviation of 0.04. The high values of the Hurst exponent are consistent with the data in the sense that birds either vocalize or remain silent over long periods of time. We note that this is a combination of two factors: occurrence and vocalization activity. First, due to its movement activity, an individual bird may be in the vicinity of the recorder for some time and then move to another location. Second, conditional on the bird being present, it may sustain its vocalization activity for some time and remain silent at other times.
Figure 7 shows posterior means and 95% pointwise intervals for the species-specific marginal probabilities of vocalizations occurring in each of the 180 time intervals between 5.15–8.15 a.m. for all 15 species listed in Table 1. Due to data sparsity and the high Hurst exponent, the raw posterior samples exhibited spiky patterns over time, and, hence, we (mildly) smoothed the samples prior to calculating the posterior summaries in Figure 7. While these trends should not be overinterpreted, we do see some general patterns appearing. For example, for Cercomarca cinerascens, Frederickena viridis, Grallaria varia, Micrastur mirandollei, Myrmeciza ferruginea, Percnostola ruffifrons, Pipra erythrocephala, Pithys albifrons, and Ramphastos vitellinus we see an increase in vocalization activity after 7 a.m., whereas Automolus ochrolaemus, Corythopis torquata, Hylexetastes perrotii, and Ibycter americanus more or less maintain a uniform activity level during this time. Micrastur gilvicollis and Hylophilus muscicapinus show more activity during the early hours of the day. Since groups of birds show similar vocalization patterns, in Chakraborty, Ovaskainen and Dunson (2022a), we extend the FRAP framework to a hierarchical setting that shares information across different species.
Fig. 7.

Smoothed marginal probabilities of vocalization obtained by fitting model (3.9) for the 15 species listed in Table 1 for 180 test intervals of duration one minute from 5.15–8.15 a.m. Shaded regions are 95% credible intervals and black lines are posterior means.
5.2. FRAP vs. MMPP.
We compare the fit of the proposed FRAP model with the MMPP model (Davison and Ramesh (2020)) for discretized event data via summary statistics derived from the posterior distribution and maximum likelihood estimates, respectively. The particular summary statistics in which we are interested are the conditional probabilities in Figure 2. In the context of the FRAP model, the distribution of the binary indicators Z is completely characterized by the latent variables W. The posterior predictive distribution of WR+1, given the observed binary indicators Z1, …, ZR, is p(WR+1 | Z1, …, ZR) = ∫ p(WR+1 | θ*)p(θ* | Z1, …, ZR) dθ*, where θ* = (f, β, τ, σ, ϕ)T and p(WR+1 | θ*) ~ N(Af, τ²ΣH), H = log{β/(1 − β)}. To sample the latent variable WR+1, we use the MCMC samples of θ* obtained from Algorithm 1, that is, given , the lth MCMC sample from p(θ* | Z1, …, ZR), we draw . Then, equation (3.9) is used to obtain the corresponding binary series .
The MMPP assumes event occurrence is governed by specific states of an unobserved continuous time Markov chain, hereafter referred to as CTMC, X(t) with finite state space {1, 2, …, K} and instantaneous transition rate matrix G ∈ ℝ^{K×K}. Given the chain is in state k ∈ {1, …, K} at time t, events occur following a Poisson process with rate λk. The event generating process is then parameterized by G and λ = {λ1, …, λK}. The likelihood of a discretized series of events under the MMPP model has been derived in Davison and Ramesh (2020). Let and denote the maximum likelihood estimates of G and λ, respectively, using R replicates of binary event indicators Z1, …, ZR. For the Amazon bird vocalization data we generate a series of binary event indicators ZR+1 using the plug-in estimates and with K = 2.
Having generated event indicators ZR+1 from the two models for each of the 15 species in Table 1, we compute the conditional probability of occurrence of a vocalization, given a vocalization in the previous interval for time scales Δt = {1, 2, 4, 9, 15, 30, 60, 90}; for the FRAP model we compute the conditional probabilities for each MCMC sample and consider the average. In the left panel of Figure 8, we plot these probabilities using the estimates obtained from the MMPP model, and in the right panel we plot the average conditional probability for different time scales across MCMC samples. The proposed FRAP model captures the scaling of the conditional probabilities seen in the observed data (Figure 2) while the MMPP does not. We also fitted the MMPP with K = 3 states, but the results were very similar.
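The comparison statistic can be sketched as follows. This is a minimal Python illustration under an assumption about how the time scales are formed: a one-minute binary series is aggregated into non-overlapping blocks of length Δt, a block is marked 1 if it contains at least one vocalization, and the conditional probability is the fraction of blocks equal to 1 among those whose preceding block equals 1, pooled over replicates. The function names and the example series are hypothetical.

```python
def aggregate(series, dt):
    """Indicator of at least one event in each non-overlapping block of length dt."""
    n_blocks = len(series) // dt
    return [1 if any(series[b * dt:(b + 1) * dt]) else 0 for b in range(n_blocks)]


def conditional_probability(replicates, dt):
    """Estimate P(block_j = 1 | block_{j-1} = 1), pooled across replicates."""
    pairs = 0
    both = 0
    for series in replicates:
        blocks = aggregate(series, dt)
        for prev, cur in zip(blocks, blocks[1:]):
            if prev == 1:
                pairs += 1
                both += cur
    # With no conditioning events the probability is undefined.
    return both / pairs if pairs else float("nan")


# Hypothetical short series, for illustration only.
example = [[1, 1, 0, 0, 1, 0, 1, 1]]
probs = {dt: conditional_probability(example, dt) for dt in (1, 2, 4)}
```

For the FRAP model, the same function would be applied to each posterior predictive series and the results averaged over MCMC samples, as described above.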
Fig. 8.

Conditional probabilities of vocalizations for the 15 different species at different time scales Δt = {1, 2, 4, 9, 15, 30, 60, 90} obtained from fitted model for the MMPP (left) and samples from posterior predictive for the FRAP model (right).
5.3. Model diagnostics.
We also carried out typical model diagnostics for count time series data, discussed in Czado, Gneiting and Held (2009) and Kolassa (2016). Specifically, we use marginal calibration plots to assess model fit. We first draw samples from the predictive distribution of Z180 |Z1, …, Z179 for a particular species of bird. We then compute P(Z180 = 1 | Z1, …, Z179) using the Monte Carlo average. This predictive probability is then matched with the observed probability P(Z180 = 1) which is computed as . In Figure 9 we plot the differences in the predicted and observed probabilities for the 15 different species. For some birds the difference is very small, whereas for other birds this difference goes up to 0.25, especially when the number of replicates available is small. Overall, the model performs adequately; prediction of vocalizations can potentially be improved by including covariates, such as weather and habitat conditions at the sampling site.
Fig. 9.

Difference between one-step ahead prediction probabilities for Z180 = 1 and observed probabilities for the 15 species of bird analyzed here.
6. Discussion.
In this article we proposed a novel class of models for characterizing long-range dependence in discretized event data, along with a Bayesian approach to inference under these models. We are particularly motivated by bird vocalization studies and, indeed, are involved in ongoing collaborations collecting many such datasets across the globe in order to obtain new insights into biodiversity, interactions among species, and the role of biotic and abiotic factors. The proposed class of FRAP models provides an important starting point for building realistic models for these emerging datasets as well as related datasets from precipitation and storm event modeling. Immediate next directions are to add complexity to the models in order to more realistically characterize structure in the data, ranging from spatial dependence to covariate effects. Such extensions are conceptually quite straightforward.
There are several other important directions that are potentially less trivial. The first is to broaden the class of models from a latent fractional Brownian motion to a broader class of stochastic processes with long-range dependence. This may include long-range modifications to usual Gaussian process covariance kernels (e.g., Matérn) as well as non-Gaussian cases, for example, Lévy processes, alpha-stable processes, etc. The second critical direction is developing much faster computational algorithms. There is an immense literature on algorithms for accelerating computation in Gaussian process models but, to our knowledge, very little consideration of the case in which there is long-range dependence. In our motivating applications we are faced with immense datasets containing automated recordings over time at many different locations around the world. To scale up to such datasets, we plan to consider divide-and-conquer algorithms and variational approximations, among other directions.
Acknowledgments.
The authors acknowledge support from the United States Office of Naval Research (ONR) and the European Research Council (ERC). We also thank the Editor and two anonymous referees for their constructive suggestions.
APPENDIX SECTION
A.1. Proof of Lemma 3.1.
Since the fBM is a Gaussian process, from Corollary 2.6.3 of Pipiras and Taqqu (2017), we get and for any . Hence, BH(i) ~ N(0, i^{2H}). By stationarity of the incremental process of fBM, it is enough to show (3.8) holds for i = 1. Define and . Then, . Also, . From (3.4) we get . Thus, we have . Finally, , from which, after applying (3.4) again, we obtain Cov(Y1, Y2) = 2^{2Hm}(2^{2H−1} − 1). Setting λ² = 2^{2Hm},
A.2. Mixing of MCMC chain in Algorithm 1.
We briefly comment on the mixing of the MCMC chain obtained via Algorithm 1. With L MCMC samples we calculate the effective sample sizes (ESS) for the parameters f(·)/τ and H as
ESS = L / {1 + 2 Σ_{j=1}^{J} ρ(j)}, (A.1)
where ρ(j) is the autocorrelation at lag j. We set J = 30 as the maximum lag and L = 10,000. For the 180 parameters f(t)/τ, where t = 1, …, 180, the average effective sample size over the 15 species was 2012.21, and that for the Hurst coefficient H, averaged over all the species, was 1941.44.
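An effective sample size of this kind can be computed with a few lines of Python. The sketch below assumes the common truncated-autocorrelation form ESS = L / {1 + 2 Σ_{j=1}^{J} ρ(j)} with sample autocorrelations, and uses a synthetic AR(1) chain in place of MCMC output. The function names are hypothetical and this is not the authors' code.

```python
import random


def autocorrelation(chain, lag):
    """Sample autocorrelation of a chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain)
    cov = sum((chain[i] - mean) * (chain[i + lag] - mean) for i in range(n - lag))
    return cov / var


def effective_sample_size(chain, max_lag=30):
    """ESS = L / (1 + 2 * sum of autocorrelations up to max_lag)."""
    rho_sum = sum(autocorrelation(chain, j) for j in range(1, max_lag + 1))
    return len(chain) / (1.0 + 2.0 * rho_sum)


# Synthetic AR(1) chain standing in for MCMC draws of a single parameter.
rng = random.Random(0)
chain = [0.0]
for _ in range(4999):
    chain.append(0.5 * chain[-1] + rng.gauss(0.0, 1.0))

ess = effective_sample_size(chain, max_lag=30)
```

Positively autocorrelated chains such as this one give an ESS below the nominal number of draws, which is the pattern reported above for f(t)/τ and H.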
A.3. Proof of Theorem 3.2.
For any set V ⊂ Θ, the posterior probability Π(V | Z1, …, Zn) = ∫ Π(V | W, Z1, …, Zn) Π(W | Z1, …, Zn) dW. Now, fix any weak neighborhood U of θ0. Weak consistency conditional on the latent variables is proved in Section S6 of Chakraborty, Ovaskainen and Dunson (2022a). Thus, the random variable Π(Uc | W, Z1, …, Zn) converges to 0 in P0-probability. We now extend the proof to the marginal probability Π(Uc | Z1, …, Zn). Fix any δ > 0. Then we have
where we use the fact that Π(Uc | W, Z1, …, Zn) ≤ 1.
SUPPLEMENTARY MATERIAL
Supplementary materials (DOI: 10.1214/21-AOAS1546SUPPA; .pdf). The supplementary document Chakraborty, Ovaskainen and Dunson (2022a) contains an extension of the FRAP framework to a grade-of-membership model, related priors and computational details for joint inference on multiple species, technical results for proving Theorem 3.2, and additional simulation results for Section 4.
Code (DOI: 10.1214/21-AOAS1546SUPPB; .zip). This zip folder contains two Rscript files to implement the proposed method. One could download and run the git_demo_run.R file as it is.
REFERENCES
- Beran J, Feng Y, Ghosh S and Kulik R (2013). Long-Memory Processes: Probabilistic Properties and Statistical Methods. Springer, Heidelberg. doi:10.1007/978-3-642-35512-7
- Chakraborty A, Ovaskainen O and Dunson DB (2022a). Supplement to “Bayesian semiparametric long memory models for discretized event data.” doi:10.1214/21-AOAS1546SUPPA
- Chakraborty A, Ovaskainen O and Dunson DB (2022b). Code to implement methods in “Bayesian semiparametric long memory models for discretized event data.” doi:10.1214/21-AOAS1546SUPPB
- Chen Y, Härdle WK and Pigorsch U (2010). Localized realized volatility modeling. J. Amer. Statist. Assoc 105 1376–1393. doi:10.1198/jasa.2010.ap09039
- Chib S and Greenberg E (1998). Analysis of multivariate probit models. Biometrika 85 347–361.
- Cui Y and Lund R (2009). A new look at time series of counts. Biometrika 96 781–792. doi:10.1093/biomet/asp057
- Czado C, Gneiting T and Held L (2009). Predictive model assessment for count data. Biometrics 65 1254–1261. doi:10.1111/j.1541-0420.2009.01191.x
- Davis RA, Holan SH, Lund R and Ravishanker N (2016). Handbook of Discrete-Valued Time Series. CRC Press.
- Davison A and Ramesh N (1996). Some models for discretized series of events. J. Amer. Statist. Assoc 91 601–609.
- De Camargo U, Roslin T and Ovaskainen O (2019). Spatio-temporal scaling of biodiversity in acoustic tropical bird communities. Ecography 42 1936–1947.
- Fearnhead P and Sherlock C (2006). An exact Gibbs sampler for the Markov-modulated Poisson process. J. R. Stat. Soc. Ser. B. Stat. Methodol 68 767–784. doi:10.1111/j.1467-9868.2006.00566.x
- Fischer W and Meier-Hellstern K (1993). The Markov-modulated Poisson process (MMPP) cookbook. Perform. Eval 18 149–171. doi:10.1016/0166-5316(93)90035-S
- Franzke CL, Barbosa S, Blender R, Fredriksen H-B, Laepple T, Lambert F, Nilsen T, Rypdal K, Rypdal M et al. (2020). The structure of climate variability across scales. Reviews of Geophysics 58 e2019RG000657.
- Geweke J and Porter-Hudak S (1983). The estimation and application of long memory time series models. J. Time Series Anal 4 221–238. doi:10.1111/j.1467-9892.1983.tb00371.x
- Graves T, Gramacy R, Watkins N and Franzke C (2017). A brief history of long memory: Hurst, Mandelbrot and the road to ARFIMA, 1951–1980. Entropy 19 437.
- Hall P and Hart JD (1990). Nonparametric regression with long-range dependence. Stochastic Process. Appl 36 339–351. doi:10.1016/0304-4149(90)90100-7
- Hurst HE (1951). Long-term storage capacity of reservoirs. Trans. Amer. Soc. Civ. Eng 116 770–799.
- Jacobs PA and Lewis PAW (1978a). Discrete time series generated by mixtures. I. Correlational and runs properties. J. Roy. Statist. Soc. Ser. B 40 94–105.
- Jacobs PA and Lewis PAW (1978b). Discrete time series generated by mixtures. II. Asymptotic properties. J. Roy. Statist. Soc. Ser. B 40 222–228.
- Jia Y, Kechagias S, Livsey J, Lund R and Pipiras V (2021). Latent Gaussian count time series. J. Amer. Statist. Assoc 1–28.
- Kolassa S (2016). Evaluating predictive count data distributions in retail sales forecasting. Int. J. Forecast 32 788–803.
- Krebs JR and Kacelnik A (1983). The dawn chorus in the great tit (Parus major): Proximate and ultimate causes. Behaviour 83 287–308.
- Laiolo P (2010). The emerging significance of bioacoustics in animal species conservation. Biol. Conserv 143 1635–1645.
- Laskin N (2003). Fractional Poisson process. Commun. Nonlinear Sci. Numer. Simul 8 201–213. doi:10.1016/S1007-5704(03)00037-6
- Livsey J, Lund R, Kechagias S and Pipiras V (2018). Multivariate integer-valued time series with flexible autocovariances and their application to major hurricane counts. Ann. Appl. Stat 12 408–431. doi:10.1214/17-AOAS1098
- Lo AW (1989). Long-term memory in stock market prices. Technical Report, National Bureau of Economic Research.
- Mandelbrot BB and Van Ness JW (1968). Fractional Brownian motions, fractional noises and applications. SIAM Rev 10 422–437. doi:10.1137/1010093
- Mandelbrot BB and Wallis JR (1969). Some long-run properties of geophysical records. Water Resour. Res 5 321–340.
- Mckenzie E (1985). Some simple models for discrete variate time series 1. J. Am. Water Resour. Assoc 21 645–650.
- Mckenzie E (1986). Autoregressive moving-average processes with negative-binomial and geometric marginal distributions. Adv. in Appl. Probab 18 679–705. doi:10.2307/1427183
- Mckenzie E (1988). Some ARMA models for dependent sequences of Poisson counts. Adv. in Appl. Probab 20 822–835. doi:10.2307/1427362
- Mikosch T and Stărică C (2004). Nonstationarities in financial time series, the long-range dependence, and the IGARCH effects. Rev. Econ. Stat 86 378–390.
- Mitrinovic DS and Vasic PM (1970). Analytic Inequalities 1. Springer.
- Ogata Y and Abe K (1991). Some statistical features of the long-term variation of the global and regional seismic activity. International Statistical Review/Revue Internationale de Statistique 139–161.
- Ovaskainen O, De Camargo UM and Somervuo P (2018). Animal sound identifier (ASI): Software for automated identification of vocal animals. Ecol. Lett 21 1244–1254. doi:10.1111/ele.13092
- Peng C-K, Buldyrev SV, Havlin S, Simons M, Stanley HE and Goldberger AL (1994). Mosaic organization of DNA nucleotides. Phys. Rev. E 49 1685.
- Pipiras V and Taqqu MS (2017). Long-Range Dependence and Self-Similarity. Cambridge Series in Statistical and Probabilistic Mathematics 45. Cambridge Univ. Press, Cambridge.
- Ramesh N, Thayakaran R and Onof C (2013). Multi-site doubly stochastic Poisson process models for fine-scale rainfall. Stoch. Environ. Res. Risk Assess 27 1383–1396.
- Rasmussen CE and Williams CKI (2006). Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA.
- Roberts GO and Rosenthal JS (2001). Optimal scaling for various Metropolis–Hastings algorithms. Statist. Sci 16 351–367. doi:10.1214/ss/1015346320
- Robinson PM (1995). Gaussian semiparametric estimation of long range dependence. Ann. Statist 23 1630–1661. doi:10.1214/aos/1176324317
- Rue H and Held L (2005). Gaussian Markov Random Fields: Theory and Applications. Monographs on Statistics and Applied Probability 104. Chapman & Hall/CRC, Boca Raton, FL. doi:10.1201/9780203492024
- Rue H, Martino S and Chopin N (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J. R. Stat. Soc. Ser. B. Stat. Methodol 71 319–392. doi:10.1111/j.1467-9868.2008.00700.x
- Samorodnitsky G (2006). Long range dependence. Found. Trends Stoch. Syst 1 163–257. doi:10.1561/0900000004
- Slabbekoorn H and Smith TB (2002). Bird song, ecology and speciation. Philos. Trans. R. Soc. Lond. B, Biol. Sci 357 493–503. doi:10.1098/rstb.2001.1056
- Sørbye SH and Rue H (2018). Fractional Gaussian noise: Prior specification and model comparison. Environmetrics 29 e2457. doi:10.1002/env.2457
- Stern R and Coe R (1984). A model fitting analysis of daily rainfall data. J. R. Stat. Soc., A 147 1–18.
- Tagliazucchi E, Von Wegner F, Morzelewski A, Brodbeck V, Jahnke K and Laufs H (2013). Breakdown of long-range temporal dependence in default mode and attention networks during deep sleep. Proc. Natl. Acad. Sci. USA 110 15419–15424.
- Tiao GC, Phadke M and Box GE (1976). Some empirical models for the Los Angeles photochemical smog data. J. Air Pollut. Control Assoc 26 485–490.
- Weymer BA, Wernette P, Everett ME and Houser C (2018). Statistical modeling of the long-range dependent structure of barrier island framework geology and surface geomorphology. Earth Surf. Dyn 6 431–450.
- Willinger W, Paxson V, Riedi RH and Taqqu MS (2003). Long-range dependence and data network traffic. In Theory and Applications of Long-Range Dependence 373–407. Birkhäuser, Boston, MA.
- Zhou Z (2012). Measuring nonlinear dependence in time-series, a distance correlation approach. J. Time Series Anal 33 438–457. doi:10.1111/j.1467-9892.2011.00780.x
