Abstract
In this paper, we present a method to fit a stochastic logistic differential equation (SLDE) to a set of highly sparse real data. We assume that the SLDE has two unknown parameters to be estimated. We calculate the Maximum Likelihood Estimator (MLE) to estimate the intrinsic growth rate and prove that it is strongly consistent and asymptotically normal. To estimate the diffusion parameter, we use the quadratic variation of the data. We validate our method with several types of simulated data. For more realistic cases in which we observe discretizations of the solution, we use diffusion bridges and the stochastic expectation-maximization algorithm to estimate the parameters. Furthermore, when we observe only one point for each path for a given number of trajectories, we are still able to estimate the parameters of the SLDE. As far as we know, this is the first attempt to fit stochastic differential equations (SDEs) to these types of data. Finally, we apply our method to real data from a fishery. The proposed fitting method can be applied to other examples of SDEs and is applicable in several areas of science, especially in situations with sparse data.
Keywords: Stochastic logistic differential equation, diffusion bridges, EM algorithm, biological growth
1. Introduction
1.1. Motivation
Our main motivation for this work comes from biology, where it is of interest to fit stochastic differential equations (SDEs) to real data. However, the problem of fitting SDEs is of interest in many other fields. The problem of fitting parameters in differential equations has been studied for a long time. It is a classic example of the so-called inverse problem (see for instance [26] or [29]). In contrast, fitting SDEs to actual data is a very difficult task. Influenced by the deterministic case, in many situations we can assume that the model to be fitted depends only on some parameters that are constant but unknown, and the goal is to estimate them from the data. In this study, we consider this particular case, so we only need to estimate the parameters using the available data to fit the SDE. There exists some theory for this estimation. Most of it assumes one of the following two situations: continuous observation of the solutions or, at least, discretized observations (see, for instance, [28,41] for further reading). Nevertheless, continuous observation is, in many cases, almost impossible, and only in very rare situations is it possible to observe an acceptable discretization of the solution. In biology, and in particular in marine biology, the situation is the same: for almost no phenomenon is it possible to get continuous observations over time. Typically, only one measurement is observed for each individual, with the advantage that the data are a collection of several individuals from the same population. We will exploit this fact to provide a method for estimating the parameters of a particular SDE.
An SDE is essentially an ordinary differential equation in which one or more of the terms is a stochastic process, resulting in a solution which is also a stochastic process. The driving noise most commonly used is the Wiener process or some related process, which can be of additive or multiplicative type. SDEs have been applied in a wide range of disciplines, such as biology, medicine, population dynamics and engineering (see [32]). In this paper, we consider an SDE with a simple multiplicative noise. Parameter estimation methods for SDEs (e.g. the Ornstein-Uhlenbeck process, geometric Brownian motion, etc.) have been developed over the last few decades, such as the MLE method (continuous observation), and the Expectation Maximization (EM) algorithm, the Ozaki method, and Bayesian methods (discrete observation); see for instance [28,41] and references therein.
Regarding parametric estimation for SDEs, we refer to the seminal paper [35], in which the author considers the parametric estimation problem for continuous-time stochastic processes. He derives a particular functional partial differential equation, which characterizes the exact likelihood function of a discretely sampled Itô process. In contrast to [35], we first derive the likelihood function theoretically as an application of the Girsanov theorem for the full continuous process, that is, without any discretization. Afterward, we obtain the MLE for the parameter of interest, assuming continuous observations. At this point, we propose a natural discretized version of the MLE and we illustrate that this discretized MLE converges to the true value of the parameter.
Growth modeling in ecology is usually studied assuming a deterministic approach, which implies that the theoretical growth model can be represented from an average model. This represents a simplistic approach (e.g. constant parameters, homoscedasticity) with little statistical support for highly variable data, such as the individual variability in growth of the individuals in a population [25]. Another approach where the individual variability is explicitly included in the growth modeling is based on stochastic growth models, allowing estimating probability regions associated with the length- or age-structure of the population, indicating that the variability in growth for each individual can be statistically defined from an assumed probabilistic density function (e.g. normal, gamma) [21,36]. The individual variability in growth has been also analyzed from longitudinal data [19,27].
Individual growth is a key demographic feature; it is defined as the increment in length or weight with time. Individual growth is important for predicting the total population and therefore provides information for making good decisions related to the species; for instance, to estimate the biomass in wild populations, to establish policies related to population control, etc. Two approaches have usually been taken to analyze individual growth. The first is a deterministic approach, where one assumes that the model is a differential equation with some parameters that are fixed constants and have to be estimated from data, which are assumed to be random observations from some unknown statistical population; in this way, an objective function needs to be defined (e.g. SSQ, likelihood function) to estimate parameters and confidence intervals by numerical optimization ([15,22,52]). A small variation of this approach is the Bayesian approach, which assumes that the parameters are random; therefore the analysis provides a statistical distribution of the values of the parameters ([44]).
The second approach is the use of stochastic models for modeling growth, which have the advantage of characterizing the central tendencies of a population (similar to deterministic models) and can also address some sources of individual variability [33]. This approach is an alternative for analyzing the skewed data commonly obtained from random sampling. Stochastic models attempt to estimate parameters that can accommodate all observations; thus the heterogeneity in the data and the incorporation of random variability in the parameters are the most important features of stochastic growth models ([48,49]).
In fisheries, this approach could be applied to batoid species vulnerable to overfishing, which are characterized by having many offspring of small size and a high natural mortality rate (e.g. by predation) and which are subjected to intense fishing regimes. This work is therefore relevant because it makes it possible to estimate population growth parameters that are essential for demographic studies, which support management and conservation schemes.
1.2. Our contribution
In this study, we propose a method for estimating the parameters of a particular stochastic differential equation, the SLDE, in the three situations described above; the estimation combines the quadratic variation of the observations and the MLE via the stochastic expectation maximization (SEM) algorithm.
It is important to remark that the ideas of the proposed method could be applied to other SDEs satisfying suitable classical assumptions. We validate the proposed method in three scenarios: when we have continuous observations of the solution; when we have discretized observations of the solution; and, the most difficult, when we observe only a data set with one record for each individual at different points in time. Then, we apply this method to a data set for the biological growth of the giant electric ray (or Cortez electric ray) Narcine entemedor.
The model we focus on in this paper is an SDE with a random initial condition. The use of a random initial condition allows us to include in the model the variability of the birth size. We consider the intrinsic growth rate to be constant but unknown; thus, we are interested in estimating this parameter. The model considers the individual variability of the organisms in the population, as has been studied by several authors (see for instance [41]). As we mentioned before, we are interested in estimating the parameters by using the quadratic variation and the MLE method. It is known that, under suitable assumptions, the likelihood ratio is obtained by using the Girsanov theorem. By maximizing the log-likelihood, we obtain the estimator and then prove that it is strongly consistent and asymptotically normal.
The estimation method is based on the MLE jointly with the EM algorithm and we now discuss the idea behind the proposed method. The main challenge in likelihood-based inference for SDEs is that the transition density, and hence the likelihood function, for a discretely sampled solution of the SDE is in almost all cases not explicitly available and must therefore be approximated. Since the data can be viewed as incomplete observations, where the full data set is a continuous time record of the solution, it is natural that we propose to find maximum likelihood estimates by applying the EM algorithm, see [17,38]. Therefore, we need to calculate the conditional expectation of the likelihood function for the full model given the observations. We achieve this by simulating the sample paths of the SDEs given the data (which corresponds to the simulation by a diffusion bridge) using ideas from [3]. It is important to recall that in classical statistical inference with discretely observed diffusion, a lack of control on the frequency of the data could generate large biases (see for instance [13]). In contrast, with the use of the EM algorithm in our method, we reconstruct the information and overcome the mentioned biases.
Since the 1970s, the theory of SDEs has been widely used in the study of population ecology and population dynamics to establish the bases for estimating parameters (e.g. the intrinsic growth rate r) in density-dependent iteroparous populations (see May [37], Braun [5], Golec and Sathananthan [23] and Xiping et al. [51]). Further, the MLE method is essential in parameter estimation theory for the SLDE applied to population growth (inputs for demographic studies) and somatic growth (biological/fisheries parameters to know the resilience of commercially important species) of bony fishes (Román-Román et al. [45], Shah [47] and Jurado-Molina et al. [31]) and elasmobranchs (Tovar-Ávila et al. [48], Guzmán-Castellanos et al. [24] and Cortés [12]).
The MLE for modeling growth curves with the use of SDEs has been studied by several authors; here we only mention two papers. In [16] the authors considered a model given by an SDE with a mixed model and studied the MLE for a parameter that has to be estimated from the continuous observation of the solution process. In [18], the authors used a Bayesian approach to study and fit an SDE to real data from several chickens that were measured several times over time. The model they considered is a nonlinear mixed model with an SDE. In this paper, we do not consider a mixed model; instead, we mix the real data to overcome the limitation that we have only one measurement for each individual in the data. We remark that this is the first application to electric rays, and we highlight that Narcine entemedor has a reproductive strategy of the type 'species that have many offspring of small size' (Camhi et al. [8]), so this type of model fitting could be applicable to other species with similar characteristics (e.g. guitarfishes and round rays).
Regarding stochastic models in biology, we cite [14]: ‘A major problem in the application of modeling and theory to field research and experimentation in ecology is that mathematical modeling in ecology requires simplifying assumptions, most of which are incompatible with the reality of ecological systems. One of the most important of these assumptions is that individual members of populations can be aggregated into a single state variable representing population size. Many classical models in ecology, such as the logistic equation and the Lotka-Volterra equations assume that all individuals in a population are identical and can be lumped together.’ It is well known that a natural manner to overcome the problem described above is to consider SDEs instead of deterministic differential equations: in this way, we can include in the model randomness such as environment or external perturbations (see Kloeden and Platen [32], Protter [43], Iacus [28]).
This paper is organized as follows. In Section 2, we introduce the stochastic logistic differential equation. In Section 3, we first provide a framework to estimate the diffusion parameter of the SLDE; then we calculate the MLE of r, the drift coefficient in the SDE, and study some of its asymptotic properties. We also test the MLE for the parameter r assuming continuous observation of the stochastic process. In Section 4, we review some of the theory of the stochastic EM algorithm and diffusion bridges and their use in statistical inference for the case of discretized observations. A method for the case with only one observation for several trajectories of the solution is discussed in that section. The application of the proposed method to real data is presented in Section 5, where we also describe the data used in this paper. Our conclusions about the application of the model and its numerical applications are given in Section 6. In Appendix 1, we prove the consistency and asymptotic normality of the MLE of r. In Appendix 2, we provide a statistical study, which shows that the estimators obtained using the proposed method are unique under the corresponding time rescaling.
2. A stochastic logistic differential equation
It is well-known in the classical literature (see, for instance, [7]) that the classical logistic model formulated via the initial value problem
| (1) |
has a unique solution given by
| (2) |
In the logistic model, the solution and its initial value could denote the proportion of individuals at the current and initial time instants, respectively. Moreover, r>0 is the intrinsic growth rate, which is usually assumed to be constant but unknown. This model has been applied successfully to several fields of knowledge, for instance, growth of tumors, reaction models in chemistry, Fermi distribution in physics, etc. (see for instance [7] or [6] and the references therein).
In this paper, we study the SLDE
| (3) |
where B is a standard Brownian motion and the initial condition is a bounded, absolutely continuous random variable. We further assume that both B and the initial condition are defined on a common probability space as functions of ω. The model (3) allows the noise to depend on the size of the corresponding population; it is one example of a multiplicative type of noise. Problems like Equation (3) are called initial value problems (IVP). This equation has been studied by several authors; here we only mention [41].
The SLDE (3) has a mathematical formal interpretation given by the following stochastic integral equation
| (4) |
where we use the Itô stochastic integral in the last term of (4).
There is a strong (in the probabilistic sense) and closed solution to (3) given by
| (5) |
where
A proof of this fact can be deduced from [30, Th. 2.2]. We observe that the solution remains positive at all times.
It is possible to prove that
and so, since the denominator of (in (5)) is a type of renormalization of plus a constant, we deduce that
From this point on and following the usual notation in probability, we omit the ω-dependence of , , etc.
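To make the model concrete, the following minimal Python sketch simulates one path of the SLDE. It assumes that (3) has the usual stochastic logistic form dX_t = r X_t (1 - X_t) dt + σ X_t dB_t; the function name `simulate_slde`, the Beta(2, 5) initial law and all numerical values are illustrative choices and are not taken from the paper. The scheme is Euler-Maruyama applied to log X, a convenience that keeps the simulated path positive, rather than the closed-form solution (5).

```python
import math
import random


def simulate_slde(r, sigma, x0, T, n, rng):
    """Simulate one path of dX = r X (1 - X) dt + sigma X dB on [0, T].

    Assumed model form (hypothetical reading of Equation (3)).
    Euler-Maruyama is applied to Z = log X, whose dynamics follow from
    Ito's formula: dZ = (r (1 - e^Z) - sigma^2 / 2) dt + sigma dB.
    Working with Z keeps every simulated value of X strictly positive.
    """
    dt = T / n
    z = math.log(x0)
    path = [x0]
    for _ in range(n):
        drift = r * (1.0 - math.exp(z)) - 0.5 * sigma ** 2
        z += drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(math.exp(z))
    return path


rng = random.Random(2023)
x0 = rng.betavariate(2.0, 5.0)  # illustrative random initial condition in (0, 1)
path = simulate_slde(r=0.4, sigma=0.25, x0=x0, T=10.0, n=1000, rng=rng)
print(len(path), min(path) > 0.0)
```

The random initial condition is drawn once per path, which mirrors the role of the random initial value as a description of variability in birth size.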
3. Parametric estimation in the continuous case
In this section, we consider the estimation of the parameters of Equation (3) when the solution process is sampled in continuous time over the observation interval [0,T], T>0. We call this the continuous case if, for fixed T>0, we can divide the interval into n sub-intervals of equal length (the time between two observations of the process path) and n can be taken large enough that this length goes to zero. We assume that the parameter θ is unknown and that the initial condition is a random variable with some density that is the same for all possible values of θ. We provide an estimator for the parameter in the case of continuous observation of a path of the solution to Equation (3). In particular, we calculate the MLE for r and prove the asymptotic properties of this estimator.
With the previous assumptions, the diffusion parameter can be estimated from the quadratic variation of the process as follows. Define the estimator by
| (6) |
thus, since we have the asymptotic normality of (see [50]), i.e.
then we can conclude that is an unbiased estimator of . On the other hand, assume that we know σ or that we estimate it by using (6). Denote by the true value of r and by the probability measure on the space of continuous functions generated by . It is known that and are equivalent for different values of r with (see [28] or [34]). Then, the likelihood is given by the Girsanov theorem:
| (7) |
Then, by maximizing the log likelihood (7) with respect to r we have
from which we obtain the MLE for r
| (8) |
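As a worked check, and under the assumption that (3) reads dX_t = r X_t (1 - X_t) dt + σ X_t dB_t, the Girsanov log-likelihood and its maximizer can be written out as follows. The reference measure used here (the driftless case r = 0) is chosen for convenience and does not affect the maximizer; the symbols \ell_T and \hat r_T, and r_0 for the true value, are illustrative notation.

```latex
\begin{align*}
\ell_T(r) &= \frac{r}{\sigma^2}\int_0^T \frac{1-X_t}{X_t}\,dX_t
            - \frac{r^2}{2\sigma^2}\int_0^T (1-X_t)^2\,dt, \\
\frac{\partial \ell_T}{\partial r}(r) = 0
  \;&\Longrightarrow\;
  \hat r_T = \frac{\int_0^T \frac{1-X_t}{X_t}\,dX_t}{\int_0^T (1-X_t)^2\,dt}, \\
\hat r_T &= r_0 + \sigma\,\frac{\int_0^T (1-X_t)\,dB_t}{\int_0^T (1-X_t)^2\,dt}.
\end{align*}
```

Under this assumed form the maximizer does not involve σ. The last line, obtained by substituting the dynamics under the true value r_0, writes the estimation error as σ times a stochastic integral divided by its own quadratic variation, which is the structure typically exploited to prove consistency and asymptotic normality.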
We now study the properties of the estimator .
Theorem 3.1
The estimator is strongly consistent, i.e.
| (9) |
and asymptotically normal, i.e.
| (10) |
where
The proof of this theorem is presented in Appendix 1.
3.1. Validation of the proposed method assuming continuous observation
The objective of this section is to study the behavior of the estimator when we have continuous observation of the solution and σ is known. To achieve this, we use synthetic data and proceed as follows. We fix the parameter r and, using the true solution (5), we simulate 10,000 trajectories by the Monte Carlo method. Then, we discretize the time interval uniformly and use these paths of the solution to calculate the estimator with 10,000 simulations.
Denote by the i-th simulation of the solution using formula (5). We also denote by the values of the i-th simulation at the discretized points .
Then, we define the discretized MLE for r
| (11) |
where
| (12) |
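The discretized estimators can be illustrated with a short Python sketch on synthetic paths. It assumes that (3) has the form dX_t = r X_t (1 - X_t) dt + σ X_t dB_t, so that the discretized MLE is a ratio of an Itô sum to a Riemann sum and σ is recovered from the realized quadratic variation. Pooling the sums over all paths is one possible way of combining trajectories and may differ from (11) and (12); the function names, the initial value 0.1 and the sample sizes are hypothetical and smaller than those used in the text.

```python
import math
import random


def simulate_slde(r, sigma, x0, T, n, rng):
    """Euler-Maruyama on log X for the assumed SLDE (keeps X positive)."""
    dt = T / n
    z = math.log(x0)
    path = [x0]
    for _ in range(n):
        drift = r * (1.0 - math.exp(z)) - 0.5 * sigma ** 2
        z += drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(math.exp(z))
    return path


def estimate(paths, dt):
    """Pooled discretized estimators of (r, sigma) from a list of paths."""
    qv = x2 = num = den = 0.0
    for path in paths:
        for a, b in zip(path, path[1:]):
            dx = b - a
            qv += dx * dx                # realized quadratic variation of X
            x2 += a * a * dt             # Riemann sum of X^2
            num += (1.0 - a) / a * dx    # Ito sum of (1 - X) / X dX
            den += (1.0 - a) ** 2 * dt   # Riemann sum of (1 - X)^2
    return num / den, math.sqrt(qv / x2)


rng = random.Random(0)
T, n, N = 10.0, 1000, 100  # illustrative sizes
paths = [simulate_slde(0.4, 0.25, 0.1, T, n, rng) for _ in range(N)]
r_est, sigma_est = estimate(paths, T / n)
print(round(r_est, 3), round(sigma_est, 3))
```

Both estimators use only left-endpoint values, which is what makes the sums approximations of Itô integrals rather than Stratonovich ones.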
We validate the method with two examples for the true parameter: and . We fix and the initial value . We have simulated over the time window , i.e. T = 10. Figure 1 illustrates the consistency property described in Theorem 3.1, (see (9)) and the results of these simulations are plotted in Figure 2.
Figure 1.
Comparison between r and when . (a) r = 0.34. (b) r = 0.78.
Figure 2.
Comparison between r and with an increasing number of simulations N. (a) r = 0.4, (b) r = 0.9.
From these numerical experiments, we conclude that the estimator approaches the true parameter within the time window and with the number of simulations considered.
On the robustness of the MLE: a simulation study
In this subsection, we present the results of a simulation study designed to stress-test the estimation method. We focus only on the parameter r. We assume that 10% of the simulated paths used to estimate r are contaminated with errors, which we assume to be Gaussian. That is, we have
with a Brownian motion independent of .
We run the method to estimate r assuming continuous observation and obtain the results given in Table 1. We present the results for three different values of r.
Table 1.
Average and quantiles ( ) of the parameter estimate obtained from 1000 simulated datasets .
| True value | Without perturbation | With perturbation |
|---|---|---|
| 0.400 | 0.435509 | 0.435645 |
| 0.900 | 0.928649 | 0.929011 |
| 0.341 | 0.377311 | 0.377437 |
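The conclusion from this simulation test is that the method is robust to this perturbation: for each of the three values of r, the estimates with and without perturbation differ by less than 0.0004, although both overestimate the true value by roughly 0.03 to 0.04. In subsection 4.1.1 we provide a simulation study where we perturb a percentage of the actual data.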
4. Estimation for the case of incomplete data
In this section, we briefly review the simulation of diffusion bridges and its application to inference. More precisely, we use it to estimate the parameters of the SLDE (4) in two settings: when the corresponding diffusion process is discretely observed, and when the data set consists of one record for each individual at different points in time. To simulate a diffusion bridge, we apply the method for approximate diffusion bridge simulation proposed in [3]. The main motivation for using diffusion bridges is to artificially simulate the continuous observation of the underlying stochastic process conditional on the data, so that we can use the estimators given in (6) and (8). In practice, we do not observe the full path of the process; instead, we only observe discrete points of the trajectory. We can then regard the gaps between the observations as missing information in a model with a tractable likelihood function, and we use the simulation of diffusion bridges and a stochastic EM algorithm (see [10] and [41]) to obtain estimators of the parameters.
4.1. Discrete time observations
Suppose that the only data available from the i-th realization of the process are the observations at times for ), denoted by , with , where we have N paths of the stochastic logistic process in the time interval . We can consider the data set as an incomplete observation of a complete data set given by the sample paths . Then, we use diffusion bridges for complete information, expression (6) to estimate σ, and a stochastic EM algorithm to find the MLE of r for the full log-likelihood function
| (13) |
To do so, we should calculate the conditional expectation of (13) for the full model given the observations. We do this by simulating the sample paths of the diffusion process given the data, which corresponds to the simulation of a diffusion bridge. Let a and b be two points in the state space of . Then a solution of (3) in the interval such that and will be called -bridge. The SEM algorithm works as follows. Let be initial values of the parameters.
Steps 2,3 and 4 of Algorithm 1 are repeated K times with a suitable burn-in of and then the estimators are given by
| (14) |
To calculate the conditional expectation in E-step of Algorithm 1 we use the current and we generate a diffusion bridge
where for , , and .
In the M-step is given by
To update σ in Step 4, we use the continuous paths generated by the diffusion bridges for the E-step.
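The E-step relies on simulating a bridge between consecutive observations. A minimal Python sketch of one approximate construction follows: one path is run forward from a, an independent path is started at b and reversed in time, and the two are spliced at their first crossing on the grid. Whether this matches the method of [3] in every detail is an assumption, as is the form dX_t = r X_t (1 - X_t) dt + σ X_t dB_t used for (3); the function names, the grid size and the numerical values are hypothetical.

```python
import math
import random


def euler_log_path(r, sigma, x_start, delta, n, rng):
    """Euler-Maruyama path of the assumed SLDE, simulated on the log scale."""
    dt = delta / n
    z = math.log(x_start)
    path = [x_start]
    for _ in range(n):
        drift = r * (1.0 - math.exp(z)) - 0.5 * sigma ** 2
        z += drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(math.exp(z))
    return path


def approximate_bridge(r, sigma, a, b, delta, n, rng, max_attempts=10_000):
    """Approximate (a, b)-bridge on [0, delta] obtained by splicing two paths.

    One path runs forward from a. An independent path started at b is
    reversed in time, so that it ends at b. The two are joined at the first
    grid interval where they cross; attempts without a crossing are discarded.
    """
    for _ in range(max_attempts):
        forward = euler_log_path(r, sigma, a, delta, n, rng)
        backward = euler_log_path(r, sigma, b, delta, n, rng)[::-1]
        for i in range(n):
            d0 = forward[i] - backward[i]
            d1 = forward[i + 1] - backward[i + 1]
            if d0 * d1 <= 0.0:  # sign change: the paths cross in [t_i, t_{i+1}]
                return forward[: i + 1] + backward[i + 1:]
    raise RuntimeError("no crossing found; increase max_attempts or n")


rng = random.Random(7)
bridge = approximate_bridge(r=0.4, sigma=0.25, a=0.3, b=0.45,
                            delta=1.0, n=100, rng=rng)
print(len(bridge), bridge[0], bridge[-1])
```

In a stochastic EM iteration, one such bridge would be drawn between every pair of consecutive observations using the current parameter values, and the concatenated paths would then be passed to the estimators (6) and (8).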
A simulation study. Here, we present the results of a simulation study in which 1000 data sets were simulated, i.e. N = 1000. Each data set was obtained by simulating a sample path of length 1500 in the time interval with initial distribution . We suppose that we have only 15 observations at times (n = 14). Then for each path. The parameter values were r = 0.4 and σ = 0.25. Figures 3 and 4 present plots of the estimators over 450 iterations of Algorithm 1.
Figure 3.
r estimation with EM.
Figure 4.
σ estimation with EM.
Algorithm 1 was run with K = 450 and L = 1000. First, we choose arbitrarily initial values and (see Figures 3 and 4). Later, we ran Algorithm 1 using the incomplete observation to choose the initial values of parameters, i.e.
and
for . We can observe that the algorithm is more efficient with the second set of initial values, and so we use these. Based on the evolution of the estimators in Figures 3 and 4, both parameters are well approximated after iteration 250, and so we choose a burn-in of 250 iterations. The averages of the last 200 iterations and the quantiles ( ) of the estimates obtained are given in Table 2.
Table 2.
Average and quantiles ( ) of parameter estimates obtained from 1000 simulated datasets and 1500 length of each path in the interval time .
| Parameter | True value | Estimator | Quantile 95% |
|---|---|---|---|
| r | 0.40 | 0.399310 | (0.397181,0.402966) |
| σ | 0.25 | 0.251208 | (0.248455,0.264933) |
4.1.1. Robust MLE
Suppose that we can only observe samples , which include errors. This can be expressed by
where is the unobservable true data (error-free) and is the error of the ij-th sample, drawn from a random variable with parameter τ. Here, we can think of the data set as an incomplete observation of the full data set given by the sample paths and the records of , or equivalently and . Then, the log-likelihood function for r based on the full data set and is given by
| (15) |
where f is the conditional density function of given .
The SEM algorithm for this case is similar to Algorithm 1. Let and if we update τ in the M-step of Algorithm using
we obtained the corresponding Algorithm.
A simulation study. In this subsection, we present the results of a simulation study in which 1000 data sets were simulated under the same conditions as the simulation example of the previous section, but supposing that we only observed and .
Figure 5 shows that for all three parameters there is a very good approximation after iteration 350, and so we choose a burn-in of 350 iterations.
Figure 5.
estimation with EM.
The averages of the last 100 iterations and the quantiles ( ) of the estimates obtained are given in Table 3.
Table 3.
Average and quantiles ( ) of parameter estimates obtained from 1000 simulated datasets and 1500 length of each path in the interval time .
| Parameter | True value | Estimator | Quantile 95% |
|---|---|---|---|
| r | 0.40 | 0.413360 | (0.413081,0.413790) |
| σ | 0.25 | 0.251202 | (0.252240,0.254720) |
| τ | 0.10 | 0.103046 | (0.088745,0.117238) |
4.2. One record for each path
In this section, we propose a method for estimating the parameters when we have only one measurement from each of a suitable number of paths of the solution. The motivation is to be applicable to the real data, which have this characteristic. We will assume that each trajectory comes from the same stochastic process. Then, when we apply this to the real data it will mean that every measurement comes from an individual that belongs to the same underlying population.
To generate a data set with these characteristics from the sampled paths of a process that is a solution of Equation (3), we will use Algorithm 2. Given θ and the values α and β to generate a path of the data set, the algorithm works as follows.
If we use Algorithm 2 to generate a data set of N paths, we can use expressions (6) and (8) to estimate θ. When we have a data set of initial observations, we can treat it as a sample from a Beta distribution and obtain the corresponding MLE for α and β (see Section 5.2).
A simulation study. Here, we present the results of a small simulation study in which we simulated a data set of 1000 paths, i.e. N = 1000. Each path was obtained via Algorithm 2 with n = 1000 over the time interval with initial distribution . The parameter values were r = 5 and σ = 0.4. The results of the estimation are presented in Table 4.
Table 4.
Average and quantiles ( ) of parameter estimates obtained from 1000 simulated datasets and 1000 length of each path in the interval time .
| Parameter | True value | Estimator | Quantile 95% |
|---|---|---|---|
| r | 5.0 | 4.963997 | (4.792340,5.142983) |
| σ | 0.4 | 0.402344 | (0.399762,0.429351) |
5. Application to real data
5.1. Data description
We selected the giant electric ray because it is one of the species most frequently captured in the artisanal elasmobranch fisheries in the Gulf of California (Bizzarro et al. [2]). In addition, in the official fishing reports of batoids, there are no records by species, which complicates the understanding of the impact that fisheries have on stocks. Therefore, such studies are essential to allow the estimation of population parameters (e.g. r and k for the von Bertalanffy model) that supply biological productivity analyses of cartilaginous fishes, which are widely used to identify species with high biological fragility and that have been most affected over the historical series of catches (Musick [39], Cheung et al. [11] and Salomón-Aguilar et al. [46]).
Giant electric ray specimens were collected between October 2013 and December 2015 in the south of Bahía de La Paz, located in the southern portion of the Gulf of California (24° 25' N, 110° 18' W). The organisms were captured by artisanal fishers using monofilament gill nets (200–300 m long, 1.5 m high, 20–25 cm stretch mesh), traditionally called chinchorros, which are set in the afternoon at depths between 10 and 30 m over sandy bottoms and recovered the next morning. The total length of each individual was measured (TL, cm) and its sex was determined by the presence of copulatory organs in males. Vertebrae were collected from the abdominal region of each specimen. The radius of each vertebra was measured on the corpus calcareum along a straight line through the focus of each vertebra with the SigmaScan Pro 5.0.0 Software (SPSS Inc). The vertebral radius (VR) was plotted against TL and tested for a linear relation to determine if these vertebrae were a suitable structure for age determination and for back-calculated estimation of length at previous ages. Two readers then did a simultaneous and independent band count without knowing the sex or size of the specimens. The number of samples is N = 244.
We observe that the data come from fishing, so every data point is unique, meaning that each individual in the sample is measured once, at only one point in time. In order to use the method proposed in this paper, we proceed as follows. Since the sample is assumed to come from the same species in a well-delimited geographic area, we construct sample paths by joining data from the sample. However, we join them in such a way that we obtain increasing paths.
5.2. One observation and complete data
We present an estimation procedure applied to the real database. The actual data was used to sample the trajectories of the process following Algorithm 2. Then, based on these paths, we estimate the corresponding parameters.
The statistical estimation is presented in two scenarios. The first scenario assumes that the paths represent complete data, i.e. the observations represent continuous paths. In this case, for the estimation of the parameters, we followed the method of Section 3.
The second scenario is where we assumed that the obtained paths represent discrete time observations of a continuous time process. For the estimation in this scenario, we apply Algorithm 1.
For the estimation, in both cases, we consider a database of 1000 paths, i.e. M = 1000. For each path, n = 12 at the moments ( ) and we assume two discretizations in time. The estimators obtained are unique up to the appropriate rescaling, (see Appendix 2).
Table 5 presents the results for each scenario. The EM algorithm was run with 500 iterations; the estimator is the mean of the last 200 iterations. We used the estimator of the continuous case for the initial parameters.
Table 5.
Average of parameter estimates obtained from 1000 simulated datasets and 1000 observations in each path using different scales.
| Parameter | EM | Cont. | EM | Cont. |
|---|---|---|---|---|
| r | 0.270067 | 0.359245 | 2.700672 | 3.592446 |
| σ | 0.154958 | 0.179908 | 0.490019 | 0.568920 |
Using the obtained estimators, we sample 1000 reverse time paths to complete the information about the missing years (ages zero and one) and from this, we have a sample to fit the initial distribution of the process. The initial distributions are and for the continuous and the discrete case, respectively. Now using the initial distribution and MLE obtained, we simulate 1000 trajectories for each case. The mean, confidence intervals and 95% quantiles are plotted in Figure 6 for the continuous case and the discrete case.
Figure 6.
Confidence intervals for the two scenarios considered for the estimation.
When we assume that the information is complete (continuous case), we can see in Figure 6(a) that the fit is poor. In particular, with this method, the trajectories are overestimated.
On the other hand, when we treat the diffusion process as observed only at discrete times, interpret its continuous time path as missing information and combine Algorithms 1 and 2 to obtain the parameter estimators, a very good fit is obtained; see Figure 6(b).
5.3. Validation of the model
This subsection is devoted to the validation of the stochastic model for the real data. We show that the residuals are Gaussian, which supports the assumption that the noise B in (3) is a Brownian motion.
First, by applying the Lamperti transformation to Equation (3), we obtain
| (16) |
From this, we can define the increment as
| (17) |
We put the estimated values of the parameters into the increment (17) and obtain the residuals. If the residuals are Gaussian with zero mean and finite variance, we conclude that the model fits the actual data properly. Figure 7 shows a QQ-plot of the residuals of 1000 paths of the real data versus a sample of the same size of Gaussian random variables with zero mean and variance.
Figure 7.
Quantile-Quantile Residuals vs Gaussian.
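The residual computation can be sketched as follows. The sketch assumes that Equation (3) has the form dX = r X (1 - X) dt + σ X dB, with the carrying capacity normalized to one; this form is an assumption of the example. Under it, the Lamperti transform Y = log(X)/σ has unit diffusion and drift (r/σ)(1 - X) - σ/2, so the Euler residuals of Y should be approximately N(0, Δ). The function name, the parameter values and the synthetic paths are illustrative only.

```python
import numpy as np

def lamperti_residuals(x, r, sigma, dt):
    """Euler residuals of Y = log(X)/sigma for the assumed SDE
    dX = r X (1 - X) dt + sigma X dB.

    x: array (n_paths, n_times) of positive values.
    If the model holds, the residuals are approximately N(0, dt).
    """
    x = np.asarray(x, dtype=float)
    y = np.log(x) / sigma
    drift = (r / sigma) * (1.0 - x[:, :-1]) - sigma / 2.0
    return np.diff(y, axis=1) - drift * dt

# Synthetic paths (illustration only): Euler scheme on log X with known noise z
rng = np.random.default_rng(2)
r, sigma, dt, n_paths, n_steps = 0.4, 0.25, 0.01, 200, 500
z = rng.standard_normal((n_paths, n_steps))
log_x = np.empty((n_paths, n_steps + 1))
log_x[:, 0] = np.log(0.1)
for i in range(n_steps):
    x_i = np.exp(log_x[:, i])
    log_x[:, i + 1] = (log_x[:, i] + (r * (1 - x_i) - sigma**2 / 2) * dt
                       + sigma * np.sqrt(dt) * z[:, i])

res = lamperti_residuals(np.exp(log_x), r, sigma, dt)
standardized = res.ravel() / np.sqrt(dt)  # to be compared with N(0, 1)
```

The array `standardized` is what one would compare with standard Gaussian quantiles in a QQ-plot; with real data the estimated parameters replace `r` and `sigma`.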
We now offer an explanation for the outliers observed in Figure 7.
First, we observe that in the frequency distribution of ages (lengths) in the data, the upper extreme (individuals with large sizes and ages) is poorly represented, while the lower extreme (individuals with small sizes and ages) is not represented at all.
The absence of the lower extreme is explained, first, by the fact that small individuals are not vulnerable to the fishing gear used by fishermen. Indeed, the size of a hatchling is less than 10 cm, which is the opening of the mesh of the nets that fishermen use, so these organisms can escape. A second reason for the null representation of young ages is that these sizes are generally distributed in other areas (breeding areas) where fishermen do not operate.
In the case of older individuals, the poor representation is related to their lower abundance: the probability of surviving to older ages is low, since these individuals have been exposed for longer to the possibility of death, either natural (they reached their maximum age) or by being caught earlier.
There are only two records for age 12 and only one record each for ages 13 and 14, and the size at age zero is generated with a beta random variable with the estimated parameters. This explains why in Figure 7 we can observe:
- biases in some quantiles of the residuals;
- outliers when the initial sample value is particularly small.
However, the records for ages from 2 to 11 contain enough information to generate a good dataset, which we used to obtain the estimators and to fit the model. From Figure 7, we conclude that the residuals are Gaussian, which implies that the stochastic model (3) fits the data well.
6. Concluding remarks
In this paper, we have provided a procedure to estimate the parameters of a particular stochastic differential equation (SDE), the logistic SDE driven by a simple multiplicative noise. The method shows consistency for the three scenarios of available data considered in this work. Indeed, we showed that the method estimates the diffusion and drift parameters well for simulated paths of the solution process.
We believe the estimation method proposed in this paper can be extended, under some additional hypotheses, to cover other types of SDEs. This will be the subject of a future study by the authors. Furthermore, the proposed method could be applied to other types of data.
In ecology, the variability in growth modeling is normally linked to the data. For this reason, the notion of observation uncertainty is recognized, which means that the main source of the variability is the age or length measured for each individual [42]. This occurs because the age reading, the age validation (to define the temporality of the formation of growth marks in vertebrae or statoliths in fishes), and the length structure of the biological samples are measured with error, which influences the parameter values and the fit of the theoretical growth curve to the observed data [9]. A new method was presented in this paper, where the variability in growth modeling was estimated directly on the growth parameter r, while the variability in length structure was statistically controlled from a Beta distribution. This approach contains a biological concept that tries to take into account that each individual of a population could have a different growth rate, as was exhibited for the SDE used to fit the real data of the electric ray Narcine entemedor.
The estimation procedure applied to a real database evidences the same types of growth, from moderate to fast (according to the classification of [4]), in Narcine entemedor as have been obtained with other models, such as that of von Bertalanffy. For this reason, the information derived from the proposed algorithm is consistent with the fishery-biological parameters (growth rate), size structure and age groups previously described for the Cortez electric ray.
Nowadays, regarding fisheries regulation, the multispecies approach based on ecosystems is being adopted for the elaboration of management plans and programs [1]. It is therefore increasingly necessary to complete the biological/fisheries information (e.g. r, K in von Bertalanffy model, r, for the logistic model, etc.) of commercially important species and incidental species, so that analytical models can be applied to obtain maximum sustainable yields and to know the health status and resilience of the resources. Despite the lack of data for elasmobranchs, this purpose can be achieved with models such as the one we propose here, because it can be adjusted to other species of batoids with life history characteristics similar to those of N. entemedor. Further, it will support the application of the Code of Conduct for Responsible Fisheries [20]: with the best available information, more reliable fishing management guidelines can be generated in the near future, by providing the basic tools for demographic studies based on different scenarios, which are currently so scarce for cartilaginous fish.
Acknowledgments
We would like to thank two anonymous referees for their remarks and, in particular, one of them for helpful comments and suggestions that improved our paper.
Appendices.
Appendix 1. Proof of consistency and normality of r
Proof of Theorem 3.1.
Observe that using the definition of the SDE (3) into (8), we have
(A1) then,
(A2) Then, to prove the result, we only need to study the right side of (A2) without the constant σ.
Focus on the right term in the equality (A2). We can rewrite it as
We now study the term . By taking first and second moments, we have that
therefore, has zero mean and variance equal to 1 for all T>0. From this, we deduce that
which implies that
We now turn to . We consider the random variable and calculate its first moment,
for all T>0. From this, we have the limit
and from this expression, we deduce that
which implies that
Finally, by Slutsky's theorem, we conclude that
which proves (9).
To show normality, we note that
for all T>0, and we deduce from this that
in , which implies
in probability. Therefore, from the central limit theorem for martingales, we have that
in distribution. Moreover, since Thus, by Slutsky's theorem, we conclude that
which proves (10). This completes the proof.
Appendix 2. Scale change and precision of estimation
We present a simulation study to illustrate that the estimators are unique up to scale change. Moreover, we use these results to show the precision in the estimation of the parameters according to data available.
We consider the simulation of a path of the SLDE given by (2) with parameters r = 0.4 and over the time interval with .
To measure precision, using (6) and (8), we calculate the estimators based on different sample sizes but with an appropriate scale, i.e. with Δ. On the other hand, we assume that the data are observed with different scales; we calculate the estimators ( and , ith scale) using those scales and show that they are the same estimators, rescaled in the following way
and
where the distance between the observations under the ith scale and is the number of data (uniformly distributed in ) for each scale.
The results for three different scales , and (i = 1, 2, 3, 4) and seven different data sizes ( ) are reported in Table A1.
Table A1.
Study of rescaling the parameters.
| j | Data size | r̂ | σ̂ | r̂ | σ̂ | r̂ | σ̂ | r̂ | σ̂ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 10 | 0.5379 | 0.3093 | 53.7876 | 3.0935 | 5.3788 | 0.9782 | 26.8938 | 2.1874 |
| 2 | 20 | 0.4704 | 0.2821 | 23.5190 | 1.9949 | 2.3519 | 0.6308 | 11.7595 | 1.4106 |
| 3 | 50 | 0.4348 | 0.2645 | 8.6958 | 1.1831 | 0.8696 | 0.3741 | 4.3479 | 0.8366 |
| 4 | 100 | 0.4239 | 0.2571 | 4.2394 | 0.8129 | 0.4239 | 0.2571 | 2.1197 | 0.5748 |
| 5 | 200 | 0.4188 | 0.2542 | 2.0941 | 0.5684 | 0.2094 | 0.1798 | 1.0471 | 0.4020 |
| 6 | 500 | 0.4154 | 0.2511 | 0.8308 | 0.3551 | 0.0831 | 0.1123 | 0.4154 | 0.2511 |
| 7 | 1000 | 0.4143 | 0.2503 | 0.4143 | 0.2503 | 0.0414 | 0.0791 | 0.2072 | 0.1770 |
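The rescaling behaviour reported in Table A1 can be reproduced with a short simulation. The sketch assumes the SLDE has the form dX = r X (1 - X) dt + σ X dB and uses discretized estimators of a standard type (quadratic variation of log X for σ, and the discretized continuous-time MLE for r); these are stand-ins and are not claimed to coincide with (6) and (8). The function names, the value σ = 0.25, the initial value and the step size are illustrative.

```python
import numpy as np

def estimate_r_sigma(x, dt):
    """Discretized estimators for the assumed SDE dX = r X (1 - X) dt + sigma X dB.

    sigma: quadratic variation of log X; r: discretized continuous-time MLE.
    """
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    x0 = x[:-1]
    sigma_hat = np.sqrt(np.sum(np.diff(np.log(x)) ** 2) / (dt * dx.size))
    r_hat = np.sum((1.0 - x0) / x0 * dx) / (dt * np.sum((1.0 - x0) ** 2))
    return r_hat, sigma_hat

def simulate_path(r, sigma, x_init, dt, n_steps, rng):
    """Euler-Maruyama on log X, which keeps the path positive."""
    log_x = np.empty(n_steps + 1)
    log_x[0] = np.log(x_init)
    for i in range(n_steps):
        x_i = np.exp(log_x[i])
        log_x[i + 1] = (log_x[i] + (r * (1 - x_i) - sigma**2 / 2) * dt
                        + sigma * np.sqrt(dt) * rng.standard_normal())
    return np.exp(log_x)

rng = np.random.default_rng(3)
dt = 0.01
x = simulate_path(r=0.4, sigma=0.25, x_init=0.1, dt=dt, n_steps=1000, rng=rng)
r_hat, sigma_hat = estimate_r_sigma(x, dt)

c = 10.0  # the same data, with the time step declared as dt / c
r_c, sigma_c = estimate_r_sigma(x, dt / c)
print(r_c / r_hat, sigma_c / sigma_hat)  # ratios equal c and sqrt(c) up to rounding
```

Because the same observations enter both calls, the ratios are an algebraic consequence of the estimator formulas: only the declared time step changes, which is the sense in which the estimators are unique up to scale.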
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1. Arreguín-Sánchez F. and Arcos-Huitrón E., La pesca en México: estado de la explotación y uso de los ecosistemas, Hidrobiológica 21 (2011), pp. 431–462.
- 2. Bizzarro J., Smith W., Hueter R., Tyminski J., Márquez-Farías J.F., Castillo-Géniz J.L., Cailliet G.M., and Villavicencio-Garayzar C.J., The status of shark and ray fishery resources in the Gulf of California: Applied research to improve management and conservation. Report to the David and Lucille Packard Foundation (2007).
- 3. Bladt M. and Sørensen M., Simple simulation of diffusion bridges with application to likelihood inference for diffusions, Bernoulli 20 (2014), pp. 645–675.
- 4. Branstetter S., Early life-history implications of selected carcharhinoid and lamnoid sharks of the northwest Atlantic, in Elasmobranchs as Living Resources: Advances in Biology, Ecology, Systematics and the Status of the Fisheries, Pratt, H.L., Gruber, S.H., Taniuchi, T. (eds.), NOAA Tech. Rep. 90, National Marine Fisheries Service, Silver Spring, MD (1990).
- 5. Braumann C., Population growth in a random environment, Bull. Math. Biol. 45 (1983), pp. 635–641.
- 6. Braun M., Coleman C.S., Drew D.A., and Lucas W.F., Differential Equation Models, Vol. 1, Springer-Verlag, 1983.
- 7. Braun M. and Golubitsky M., Differential Equations and Their Applications, Vol. 1, 4th ed., New York, Springer-Verlag, 2014.
- 8. Camhi M., Valenti S., Fordham S., Fowler S., and Gibson T., The conservation status of pelagic sharks and rays. Report of the IUCN Shark Specialist Group Pelagic Shark Red List Workshop. IUCN Species Survival Commission Shark Specialist Group. Newbury, UK (2009).
- 9. Campana S.E., Accuracy, precision and quality control in age determination, including a review of the use and abuse of age validation methods, J. Fish. Biol. 59 (2001), pp. 197–242.
- 10. Celeux G. and Diebolt J., The SEM algorithm: A probabilistic teacher algorithm derived from the EM algorithm for mixture problem, Comput. Statist. Quart 2 (1986), pp. 599–613.
- 11. Cheung W., Pitcher T., and Pauly D., A fuzzy logic expert system to estimate intrinsic extinction vulnerabilities of marine fishes to fishing, Biol. Conserv. 124 (2005), pp. 97–111.
- 12. Cortés E., Perspectives on the intrinsic rate of population growth, Methods Ecol. Evol. 7 (2016), pp. 1136–1145.
- 13. Dacunha-Castelle D. and Florens-Zmirou D., Estimation of the coefficients of a diffusion from discrete observations, Stochastics 19 (1986), pp. 263–284.
- 14. DeAngelis D.L. and Gross J.L., Eds., Individual-based Models and Approaches in Ecology: Populations, Communities and Ecosystems, CRC Press, 2018.
- 15. Delgado-Vences F.J., Ornelas A., Cruz V., Morales E., Hernandez C., and Marin E., Bayesian inference for a random logistic differential equation and its application to biological growth of Narcine Entemedor. Manuscript in revision (2020).
- 16. Delattre M., Genon–Catalot V., and Samson A., Maximum likelihood estimation for stochastic differential equations with random effects, Scand. J. Stat. 40 (2013), pp. 322–343.
- 17. Dempster A.P., Laird N.M., and Rubin D.B., Maximum likelihood from incomplete data via the EM algorithm (with discussion), J. R. Stat. Soc., Ser. B Stat. Methodol. 39 (1977), pp. 1–38.
- 18. Donnet S., Foulley J.L., and Samson A., Bayesian analysis of growth curves using mixed models defined by stochastic differential equations, Biometrics 66 (2010), pp. 733–741.
- 19. Escati-Peñaloza G., Parma A.M., and Orensanz J.M.L., Analysis of longitudinal growth increment data using mixed-effects models: individual and spatial variability in a clam, Fish. Res. 105 (2010), pp. 91–101.
- 20. FAO, Code of Conduct for Responsible Fisheries, Rome, FAO, 1995, p. 41.
- 21. Fisch N.C., Bence J.R., Myers J.T., Berglund E.K., and Yule D.L., A comparison of age- and size-structured assessment models applied to a stock of cisco in Thunder Bay, Ontario, Fish. Res. 209 (2019), pp. 86–100. doi: 10.1016/j.fishres.2018.09.014
- 22. Fournier D.A., Sibert J., Majkowski J., and Hampton J., Multifan: A likelihood based method for estimating growth parameters and age composition from multiple length frequency data sets illustrated using data for southern bluefin tuna (Thunnus maccoyii), Can J. Fish Aquat. Sci. 57 (1990), pp. 301–317.
- 23. Golec J. and Sathananthan S., Stability analysis of a stochastic logistic model, Math. Comput. Model. 38 (2003), pp. 585–593.
- 24. Guzmán-Castellanos A.B., Morales-Bojórquez E., and Balart E.F., Individual growth estimation in elasmobranchs: The multi-model inference approach, Hidrobiológica 24 (2014), pp. 137–150.
- 25. Haddon M., Modelling and Quantitative Methods in Fisheries, 433, Chapman and Hall, Boca Raton, FL, 2011.
- 26. Hasanoglu A.H. and Romanov V.G., Introduction to Inverse Problems for Differential Equations, Springer International Publishing, 2017.
- 27. Hidalgo-de-la-Toba J.A., Vadopalas B., Lluch-Cota D.B., Morales-Bojórquez E., Bautista-Romero J.J., and González-Peláez S.S., Individual growth profiling improves growth modelling in the geoduck clam Panopea generosa, ICES J. Marine Sci. 78 (2021), pp. 112–1124. doi: 10.1093/icesjms/fsaa197
- 28. Iacus S.M., Simulation and Inference for Stochastic Differential Equations: with R Examples, Springer Science & Business Media, 2009.
- 29. Isakov V., Inverse Problems for Partial Differential Equations, Vol. 127, Springer, New York, 2006.
- 30. Jiang D. and Shi N., A note on nonautonomous logistic equation with random perturbation, J. Math. Anal. Appl. 303 (2005), pp. 164–172.
- 31. Jurado-Molina J., Gutiérrez-Benítez O., and Roldan-Heredia A., Model uncertainty and Bayesian estimation of growth parameters of yellowtail snapper (Ocyurus chrysurus) from Veracruz, México, Hidrobiológica 28 (2018), pp. 191–199.
- 32. Kloeden P.E. and Platen E., Numerical Solution of Stochastic Differential Equations, Vol. 23, Springer Science & Business Media, 2013.
- 33. Lande R., Steinar E., and Saether B.E., Stochastic Population Dynamics in Ecology and Conservation, Oxford University Press, Oxford, 2003.
- 34. Liptser R.S. and Shiryaev A.N., Statistics of Random Processes: I. General Theory, Vol. 1, Springer Science & Business Media, 2001.
- 35. Lo A.W., Maximum likelihood estimation of generalized Ito processes with discretely sampled data, Econom. Theory 4 (1988), pp. 231–247.
- 36. Luquin-Covarrubias M.A., Morales-Bojórquez E., García-Borbón J.A., Amezcua-Castro S., Pérez-Valencia S.A., and Larios-Castro E., Evidence of overfishing of geoduck clam Panopea globosa from a length-based stock assessment approach, PeerJ 8 (2020), e9069. doi: 10.7717/peerj.9069
- 37. May R., Stability in randomly fluctuating versus deterministic environments, Am. Nat. 107 (1973), pp. 621–650.
- 38. McLachlan G.J. and Krishnan T., The EM Algorithm and Extensions, Wiley, New York, 1997.
- 39. Musick J.A., Criteria to define extinction risk in marine fishes, Fisheries 24 (1999), pp. 6–14.
- 40. Nielsen S.F., The stochastic EM algorithm: Estimation and asymptotic results, Bernoulli 6 (2000), pp. 457–489.
- 41. Panik M.J., Stochastic Differential Equations: An Introduction with Applications in Population Dynamics Modeling, John Wiley & Sons, 2017.
- 42. Pardo S.A., Cooper A.B., and Dulvy N.K., Avoiding shy growth curves, Methods Ecology Evolut. 4 (2013), pp. 353–360.
- 43. Protter P., Stochastic Integration and Differential Equations, Springer-Verlag, Berlin, 2004.
- 44. Quinn T.J. and Deriso R., Quantitative Fish Dynamics, 1st ed., Oxford University Press, 1999.
- 45. Román-Román P., Romero D., and Torres-Ruiz F., A diffusion process to model generalized von Bertalanffy growth patterns: fitting to real data, J. Theor. Biol. 263 (2010), pp. 59–69.
- 46. Salomón-Aguilar C.A., Villavicencio-Garayzar C.J., and Reyes-Bonilla H., Shark breeding grounds and seasons in the Gulf of California: Fishery management and conservation strategy, Ciencias Marinas 35 (2009), pp. 369–388.
- 47. Shah M.A., Stochastic logistic model for fish growth, Open. J. Stat. 4 (2014), pp. 11–18.
- 48. Tovar-Ávila J., Troynikov V.S., Walker T.I., and Day R.W., Use of stochastic models to estimate the growth of the Port Jackson shark, Heterodontus portusjacksoni, off eastern Victoria, Australia, Fish. Res. 95 (2009), pp. 230–235.
- 49. Troynikov V.S., Estimation of seasonal growth parameters using a stochastic Gompertz model for tagging data, J. Shellfish Res. 17 (1998), pp. 833–838.
- 50. Wei-Cheng M., Estimation of diffusion parameters in diffusion processes and their asymptotic normality, Int. J. Contemp. Math. Sci. 1 (2006), pp. 763–776.
- 51. Xiping S. and Yongji W., Stability analysis of a stochastic logistic model with nonlinear diffusion term, Appl. Math. Model. 32 (2008), pp. 2067–2075.
- 52. Zepeda-Benitez V.Y., Morales-Bojórquez E., López-Martínez J., and Hernández-Herrera A., Growth model selection for the jumbo squid Dosidicus gigas from the Gulf of California, México, Aquat. Biology 21 (2014), pp. 231–247.







