Journal of Applied Statistics. 2019 Nov 19;47(10):1776–1793. doi: 10.1080/02664763.2019.1693522

Location-scale mixed models and goodness-of-fit assessment applied to insect ecology

R A Moral, J Hinde, E M M Ortega, C G B Demétrio, W A C Godoy
PMCID: PMC9041897  PMID: 35707134

Abstract

Survival models have been extensively used to analyse time-until-event data. There is a range of extended models that incorporate different aspects, such as overdispersion/frailty, mixtures, and flexible response functions through semi-parametric models. In this work, we show how a useful tool to assess goodness-of-fit, the half-normal plot of residuals with a simulated envelope, implemented in the hnp package in R, can be used in a location-scale modelling context. We fitted a range of survival models to time-until-event data, where the event was an insect predator attacking a larva in a biological control experiment. We started with the Weibull model and then fitted the exponentiated-Weibull location-scale model with regressors for both the location and scale parameters. We performed variable selection for each model and, by producing half-normal plots with simulated envelopes for the deviance residuals of the model fits, we found that the exponentiated-Weibull model fitted the data better. We then included a random effect in the exponentiated-Weibull model to accommodate correlated observations. Finally, we discuss possible implications of the results found in the case study.

Keywords: Biological control, exponentiated models, half-normal plots with simulation envelopes, location-scale modelling, mixed survival models

1. Introduction

In insect studies, outcomes of interest vary widely and hence a broad range of data types are obtained, such as discrete or continuous data, that can be either univariate or multivariate [6]. When studying the ecology of insect species, the interest lies in describing several ecological processes, such as predation, competition, prey-preference, amongst others.

These ecological data can be used for decision making, especially regarding pest management. Integrated pest management (IPM) consists of a series of activities with the purpose of reducing damage from insect pests in agroecosystems [7]. IPM involves (i) the choice of an economic damage threshold; (ii) pest identification and monitoring; (iii) prevention through several techniques; and (iv) control, which is done when steps (i) and (ii) indicate that it should be applied and preventive techniques are no longer available. One of the types of control is biological control, which involves the promotion of pest suppression using other species of insects and fungi, referred to as natural enemies. Hence, no chemical pesticides are used, assuring better crop quality and disease risk prevention for humans [7].

One common type of ecological data is the time until the occurrence of an event, e.g. death of a predator, time taken for a parasitoid to parasitise its prey, time until a competitor attacked another competitor, time until a predator attacked its prey. These data may yield relevant information for biological control strategies. Because this type of variable is continuous and strictly positive, it may be analysed through survival analysis models or non-parametric tests [1]. Many parametric models have been proposed in the literature and a common objective is to use a well-fitting but not overly complex model to make inferences from. It is very important to assess goodness-of-fit of survival models so that any inference is not misleading. However, this may become a difficult task. A good alternative is to use appropriate diagnostic plots, such as half-normal plots with simulation envelopes [2,10].

The main goals of this work are to build and assess goodness-of-fit for different location-scale models, fitted to data from an experiment on prey preference. Predators were given a choice between parasitised and non-parasitised prey and the time until a predator attacked a prey was observed. We propose three location-scale models, starting with the Weibull model, turning to the exponentiated-Weibull model and, finally fitting the exponentiated-Weibull model including random effects. Usually, only the location parameter is modelled with covariates. However, here we model the location and scale parameters with covariates, simultaneously. Moreover, we develop functions for the statistical software R [12] to produce half-normal plots with simulation envelopes for these models to assess goodness-of-fit, using the framework from the hnp package [10]. In the following sections, we describe the experiment, present the models and estimation methods, as well as the goodness-of-fit assessment techniques, present a simulation study, and finally discuss the results.

2. Case study

A major pest of maize worldwide is the larva of Spodoptera frugiperda. Three important natural enemies of this pest are the stinkbug Podisus nigrispinus and the earwig Euborellia annulipes, both predators, and the parasitoid wasp Campoletis flavicincta [9]. The latter lays eggs inside S. frugiperda larvae that hatch and develop inside them, killing the larva upon reaching the adult stage. This process causes substantial metabolic changes in the larvae which may be perceived by potential predators.

In a biological control context, it is important that predators act synergistically to promote pest suppression [5]. In this sense, a desirable outcome when controlling S. frugiperda with multiple natural enemies, such as the three species mentioned above, would be weak competition between predators and preference for non-parasitised larvae so that the parasitoid's population is also maintained in the system [see 9]. Information on the time taken by different predator species to attack prey and on prey preference are particularly relevant in this context.

To study the predatory behaviour of the stinkbug and the earwig when given the choice between parasitised and non-parasitised prey, [9] conducted an experiment in a completely randomised block design with 33 blocks and 4 treatments in a 2×2 (species and gender) factorial design, i.e. male and female stinkbugs and earwigs, which were fasted for 24 hours prior to the experiment set up. Each block consisted of four Petri dishes with one predator and two S. frugiperda larvae, one of which had previously been parasitised by C. flavicincta and another which had not. Each experimental unit was observed for 1 hour and the time (in seconds) taken until the predator first attacked a larva was recorded, as well as which larva the predator chose to effectively consume first (predator preference), see Supporting Information for the complete data set. The experiment took 3 days to complete and on each day 11 blocks were installed and observed, so there is reason to believe that the day effect is random, reflecting different batches of insects used. However, the block within day effect should be treated as fixed because only up to three blocks could be observed at the same time. In that sense, predators which were used in the first blocks were fasted for 24 hours prior to experiment installation but predators used in the last blocks had a longer fasting time which could systematically influence their behaviour. The data consist of time until an event and in some replicates there was not an attack within 1 hour of observation, hence some observations are censored.

3. Modelling

Over the following sections, we will describe how to build location-scale models to analyse these data, starting with a simple model and then adding flexibility with increasing complexity. We then show how random effects may be included to incorporate correlation among observations taken on the same day of experiment.

3.1. Exponentiated family of distributions

A highly flexible family of distributions that can be used to model time-until-event data is the exponentiated family [11]. It allows for the modelling of different hazard function behaviours: constant, increasing, decreasing, bathtub, and unimodal.

Let $T_1,\ldots,T_n$ be $n$ independent random variables, where each $T_i$ follows a distribution from the family of exponentiated distributions. Their probability density function (pdf) may be written as

$$f_{T_i}(t_i;\boldsymbol{\theta}_i,a)=a\,[G(t_i;\boldsymbol{\theta}_i)]^{a-1}\,g(t_i;\boldsymbol{\theta}_i),\quad t_i>0,$$

with $a>0$ a shape parameter, $g(\cdot)$ a pdf and $G(\cdot)$ its respective distribution function, and $\boldsymbol{\theta}_i$ the vector of parameters of the distribution. Taking the base distribution as Weibull with

$$g(t_i;\alpha_i,\gamma_i)=\frac{\gamma_i t_i^{\gamma_i-1}}{\alpha_i^{\gamma_i}}\exp\left[-\left(\frac{t_i}{\alpha_i}\right)^{\gamma_i}\right],\quad t_i,\alpha_i,\gamma_i>0,$$

and

$$G(t_i;\alpha_i,\gamma_i)=1-\exp\left[-\left(\frac{t_i}{\alpha_i}\right)^{\gamma_i}\right],$$

then

$$f_{T_i}(t_i;\alpha_i,\gamma_i,a)=\frac{a\gamma_i t_i^{\gamma_i-1}}{\alpha_i^{\gamma_i}}\exp\left[-\left(\frac{t_i}{\alpha_i}\right)^{\gamma_i}\right]\left\{1-\exp\left[-\left(\frac{t_i}{\alpha_i}\right)^{\gamma_i}\right]\right\}^{a-1},\quad t_i>0,\tag{1}$$

is the pdf of the exponentiated-Weibull distribution [11]. When $\gamma_i=1$, this is the pdf of the exponentiated-exponential distribution and when $a=1$, it is the pdf of the Weibull distribution. Consequently, when $a=1$ and $\gamma_i=1$, this is the pdf of the exponential distribution. Constructing a location-scale model from (1) is straightforward and can be done by finding the pdf of $Y_i=\log(T_i)$ and reparameterising $f_{Y_i}(y_i;\alpha_i,\gamma_i,a)$ in terms of $\mu_i=\log(\alpha_i)$, a location parameter, and $\sigma_i=1/\gamma_i$, a scale parameter, which yields the following result:

$$f_{Y_i}(y_i;\mu_i,\sigma_i,a)=\frac{a}{\sigma_i}\left\{1-\exp\left[-\exp\left(\frac{y_i-\mu_i}{\sigma_i}\right)\right]\right\}^{a-1}\exp\left[\frac{y_i-\mu_i}{\sigma_i}-\exp\left(\frac{y_i-\mu_i}{\sigma_i}\right)\right],\tag{2}$$

where $-\infty<\mu_i<\infty$, $\sigma_i>0$ and $-\infty<y_i<\infty$. The survival function for model (2) is

$$S_{Y_i}(y_i;\mu_i,\sigma_i,a)=1-\left\{1-\exp\left[-\exp\left(\frac{y_i-\mu_i}{\sigma_i}\right)\right]\right\}^{a}.$$

Based on the density (2), we obtain a linear location-scale regression model linking the response variable $y_i$ to the location and scale parameters as

$$y_i=\mu_i+\sigma_i z_i,\quad i=1,\ldots,n,$$

where the pdf of the random error $Z_i$ is given by

$$f_{Z_i}(z_i;a)=a\left\{1-\exp[-\exp(z_i)]\right\}^{a-1}\exp[z_i-\exp(z_i)].$$

The parameter $a$ makes the left tail of the distribution heavier as it approaches zero, and the distribution becomes more symmetric as $a$ increases. Together with the parameter $\sigma_i$, it provides additional flexibility to the distribution. Under its location-scale formulation, the exponentiated-Weibull model can be useful to analyse time-until-event data, given the natural interpretation of the location and scale parameters, in addition to the extra flexibility introduced by the parameter $a$. This improves goodness-of-fit for highly skewed distributions, and accommodates different shapes of the hazard rate function.
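As an illustration of the density (2) and its survival function, the following is a minimal Python sketch (the paper's own code is in R and is not reproduced here). The function names `log1mexp`, `log_ew_logpdf` and `log_ew_survival` are hypothetical helpers introduced for this example only.

```python
import math

def log1mexp(x):
    """Return log(1 - exp(-x)) for x > 0, avoiding cancellation."""
    if x < math.log(2.0):
        return math.log(-math.expm1(-x))
    return math.log1p(-math.exp(-x))

def log_ew_logpdf(y, mu, sigma, a):
    """Log of the density (2) of Y = log(T), evaluated at y."""
    z = (y - mu) / sigma
    ez = math.exp(z)
    # log(a / sigma) + (a - 1) log{1 - exp[-exp(z)]} + z - exp(z)
    return math.log(a / sigma) + (a - 1.0) * log1mexp(ez) + z - ez

def log_ew_survival(y, mu, sigma, a):
    """Survival function S_Y(y) = 1 - {1 - exp[-exp(z)]}^a."""
    z = (y - mu) / sigma
    return -math.expm1(a * log1mexp(math.exp(z)))

# With a = 1 the model reduces to the log-Weibull (extreme value) case.
print(math.exp(log_ew_logpdf(0.5, 0.0, 2.0, 1.0)), log_ew_survival(0.5, 0.0, 2.0, 1.0))
```

Working on the log scale is a design choice made here for numerical stability; for extreme values of $z$ the exponentials can still overflow or underflow, which this sketch does not guard against.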

Random samples for the exponentiated-Weibull distribution may be generated using the following expression:

$$y_i=\mu_i+\sigma_i\log\left\{-\log\left(1-u_i^{1/a}\right)\right\},$$

where $u_i$ is sampled from the Uniform(0,1) distribution.
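The inverse-cdf expression above can be sketched in Python as follows. This is an illustrative sketch rather than the authors' R implementation; `log_ew_quantile` and `sample_log_ew` are hypothetical names, and the parameter values in the usage line are taken from the fitted model reported later in the paper purely as an example.

```python
import math
import random

def log_ew_quantile(u, mu, sigma, a):
    """Inverse cdf of Y = log(T): mu + sigma * log{-log(1 - u^(1/a))}, 0 < u < 1."""
    log_v = math.log(u) / a                    # log of u^(1/a)
    if log_v < -math.log(2.0):                 # u^(1/a) below 1/2
        w = -math.log1p(-math.exp(log_v))      # -log(1 - u^(1/a))
    else:                                      # u^(1/a) close to 1
        w = -math.log(-math.expm1(log_v))
    return mu + sigma * math.log(w)

def sample_log_ew(n, mu, sigma, a, rng):
    """Draw n values of Y = log(T) by inverting the cdf at uniform draws."""
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u > 0.0:                            # random() may return exactly 0
            draws.append(log_ew_quantile(u, mu, sigma, a))
    return draws

rng = random.Random(2019)
y = sample_log_ew(5, mu=2.78, sigma=3.0, a=8.76, rng=rng)
```

The two branches compute the same quantity, $-\log(1-u^{1/a})$, and differ only in which form is numerically safer.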

3.2. Maximum likelihood estimation

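Here, there is a type I censoring scheme taking place, i.e. the Petri dishes were observed for 1 hour and for those in which the predators did not attack any larva, the time until attack was not observed – it is only known that it is larger than 60 minutes. Let $T_i$ represent the survival times and $C_i$ represent censoring times. We observe a realisation of the response times $Y_i=\min(\log(T_i),\log(C_i))$, together with a censoring indicator $\delta_i=1$ if $T_i\le C_i$ and $\delta_i=0$ if $T_i>C_i$. To incorporate explanatory variables, let $\mu_i=\mathbf{x}_{1i}\boldsymbol{\beta}_1$ and $\log\sigma_i=\mathbf{x}_{2i}\boldsymbol{\beta}_2$, with $\mathbf{x}_{1i}$ and $\mathbf{x}_{2i}$ $1\times p_1$ and $1\times p_2$ covariate vectors and $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$ $p_1\times 1$ and $p_2\times 1$ parameter vectors, respectively. Then, the log-likelihood function under non-informative censoring and considering the vector of parameters $\boldsymbol{\theta}=(\boldsymbol{\beta}_1^T,\boldsymbol{\beta}_2^T,a)^T$ can be written as

$$\begin{aligned} l(\boldsymbol{\theta})={}&\sum_{i=1}^{n}\delta_i\log\left(\frac{a}{\sigma_i}\right)+(a-1)\sum_{i=1}^{n}\delta_i\log\left\{1-\exp\left[-\exp\left(\frac{y_i-\mu_i}{\sigma_i}\right)\right]\right\}\\ &+\sum_{i=1}^{n}\delta_i\left[\frac{y_i-\mu_i}{\sigma_i}-\exp\left(\frac{y_i-\mu_i}{\sigma_i}\right)\right]+\sum_{i=1}^{n}(1-\delta_i)\log\left\{1-\left\{1-\exp\left[-\exp\left(\frac{y_i-\mu_i}{\sigma_i}\right)\right]\right\}^{a}\right\}. \end{aligned}\tag{3}$$

The estimation of the $p_1+p_2+1$ parameters can be done by maximising $l(\boldsymbol{\theta})$. Here, we use the BFGS [3] method implemented in the optim function in software R [12] to maximise the log-likelihood (3) and fit the models.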
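To make the structure of the censored log-likelihood (3) concrete, here is a minimal Python sketch that evaluates it for given values of $\mu_i$, $\sigma_i$ and $a$. It is not the authors' R code: `log1mexp` and `ew_loglik` are hypothetical helper names, the data in the usage line are made up, and the optimisation step (BFGS via `optim` in the paper) is not shown.

```python
import math

def log1mexp(x):
    """Return log(1 - exp(-x)) for x > 0, avoiding cancellation."""
    if x < math.log(2.0):
        return math.log(-math.expm1(-x))
    return math.log1p(-math.exp(-x))

def ew_loglik(y, delta, mu, sigma, a):
    """Log-likelihood (3): density terms if delta_i = 1, survival terms if delta_i = 0."""
    total = 0.0
    for y_i, d_i, mu_i, s_i in zip(y, delta, mu, sigma):
        z = (y_i - mu_i) / s_i
        ez = math.exp(z)
        log_g = log1mexp(ez)                   # log{1 - exp[-exp(z)]}
        if d_i == 1:
            total += math.log(a / s_i) + (a - 1.0) * log_g + z - ez
        else:
            total += log1mexp(-a * log_g)      # log{1 - (1 - exp[-exp(z)])^a}
    return total

# Made-up inputs for illustration only: three observations, the second censored.
print(ew_loglik([0.2, 1.5, -0.3], [1, 0, 1], [0.0, 0.5, 0.1], [1.0, 2.0, 1.5], a=2.5))
```

In a full fit, `mu` and `sigma` would be computed from the covariates as $\mathbf{x}_{1i}\boldsymbol{\beta}_1$ and $\exp(\mathbf{x}_{2i}\boldsymbol{\beta}_2)$, and the negative of this function would be passed to a numerical optimiser.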

3.3. Estimation including random effects

Because the experiment took 3 days with 11 blocks installed and observed on each day, the day effect may be treated as random, as there were three different batches of insects, coming from the same population. When including random effects in the linear predictor, the likelihood function must be integrated over the random effects so that only the parameters to be estimated are left in the function. In a sample divided in $m$ groups, let $T_{ij}$ be the time until attack of the $i$th predator in the $j$th group, with $i=1,\ldots,n_j$ and $j=1,\ldots,m$ (here, $m=3$ days). Let $C_{ij}$ represent the censoring times. We again observe realisations of the response times $Y_{ij}=\min(\log(T_{ij}),\log(C_{ij}))$, together with a censoring indicator $\delta_{ij}=1$ if $T_{ij}\le C_{ij}$ and $\delta_{ij}=0$ if $T_{ij}>C_{ij}$. To incorporate explanatory variables, let $\mu_{ij}=\mathbf{x}_{1ij}\boldsymbol{\beta}_1$ and $\log\sigma_{ij}=\mathbf{x}_{2ij}\boldsymbol{\beta}_2$, with $\mathbf{x}_{1ij}$ and $\mathbf{x}_{2ij}$ $1\times p_1$ and $1\times p_2$ covariate vectors, and $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$ $p_1\times 1$ and $p_2\times 1$ parameter vectors, respectively.

Assuming that all the individuals from the same group have a common random effect, denoted by $b_j$, and further supposing that the random effects are unobserved random variables, the regression model for correlated data is expressed in the following form:

$$y_{ij}=\mu_{ij}+b_j+\sigma_{ij}z_{ij},\quad i=1,\ldots,n_j,\quad j=1,\ldots,m,$$

with $B_j$ a normal unobserved random variable, that is, $B_j\sim N(0,\sigma_d^2)$, with pdf $f_{B_j}(b_j;\sigma_d^2)$.

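Under non-informative censoring, the conditional likelihood function for the $n_j$ individuals in group $j$, given $b_j$, is now

$$L_j(\boldsymbol{\beta}_1,\boldsymbol{\beta}_2,a\mid b_j)=\prod_{i=1}^{n_j}\left\{\frac{a}{\sigma_{ij}}\left\{1-\exp[-\exp(z_{ij})]\right\}^{a-1}\exp[z_{ij}-\exp(z_{ij})]\right\}^{\delta_{ij}}\times\left\{1-\left\{1-\exp[-\exp(z_{ij})]\right\}^{a}\right\}^{1-\delta_{ij}},$$

where, from the regression model above, $z_{ij}=(y_{ij}-\mu_{ij}-b_j)/\sigma_{ij}$.

Assuming independence among groups, to obtain the marginal likelihood function, the product between the conditional likelihood function and the pdf of the random effect must be integrated over the random effect, which yields

$$L(\boldsymbol{\beta}_1,\boldsymbol{\beta}_2,a,\sigma_d^2)=\prod_{j=1}^{m}\left\{\frac{1}{\sqrt{2\pi\sigma_d^2}}\int_{-\infty}^{\infty}L_j(\boldsymbol{\beta}_1,\boldsymbol{\beta}_2,a\mid b_j)\exp\left(-\frac{b_j^2}{2\sigma_d^2}\right)\mathrm{d}b_j\right\}.$$

Therefore, the log-likelihood function to be maximised considering the vector of parameters $\boldsymbol{\theta}=(\boldsymbol{\beta}_1^T,\boldsymbol{\beta}_2^T,a,\sigma_d^2)^T$ may be written as

$$l(\boldsymbol{\theta})=\sum_{j=1}^{m}\log\left\{\frac{1}{\sqrt{2\pi\sigma_d^2}}\int_{-\infty}^{\infty}L_j(\boldsymbol{\beta}_1,\boldsymbol{\beta}_2,a\mid b_j)\exp\left(-\frac{b_j^2}{2\sigma_d^2}\right)\mathrm{d}b_j\right\}.\tag{4}$$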

There is no analytical solution for the integral in (4), so numerical methods must be used to compute an approximation. Here we use the adaptive Gauss–Hermite quadrature method [13], described in detail in the Appendix.
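To illustrate how the integral in (4) can be approximated numerically, the sketch below uses a plain (non-adaptive) 5-point Gauss–Hermite rule in Python. This is a deliberate simplification: the paper uses the adaptive version in R, which recentres and rescales the nodes for each group. All function names are hypothetical, and the random intercept is assumed to enter the location as $\mu_{ij}+b_j$, as implied by the regression model above.

```python
import math

def log1mexp(x):
    """Return log(1 - exp(-x)) for x > 0, avoiding cancellation."""
    if x < math.log(2.0):
        return math.log(-math.expm1(-x))
    return math.log1p(-math.exp(-x))

def cond_loglik(b, y, delta, mu, sigma, a):
    """log L_j(. | b): group log-likelihood with location mu_ij + b."""
    total = 0.0
    for y_i, d_i, mu_i, s_i in zip(y, delta, mu, sigma):
        z = (y_i - mu_i - b) / s_i
        ez = math.exp(z)
        log_g = log1mexp(ez)
        if d_i == 1:
            total += math.log(a / s_i) + (a - 1.0) * log_g + z - ez
        else:
            total += log1mexp(-a * log_g)
    return total

def gauss_hermite_5():
    """Nodes and weights of the 5-point Gauss-Hermite rule (weight exp(-x^2))."""
    r10 = math.sqrt(10.0)
    p1, p2 = math.sqrt((5.0 - r10) / 2.0), math.sqrt((5.0 + r10) / 2.0)
    nodes = [-p2, -p1, 0.0, p1, p2]            # roots of the Hermite polynomial H_5
    def h4(x):                                 # Hermite polynomial H_4
        return 16.0 * x ** 4 - 48.0 * x ** 2 + 12.0
    weights = [1920.0 * math.sqrt(math.pi) / (25.0 * h4(x) ** 2) for x in nodes]
    return nodes, weights

def normal_expectation(f, sd):
    """Approximate E[f(B)], B ~ N(0, sd^2), using the substitution b = sqrt(2) * sd * x."""
    nodes, weights = gauss_hermite_5()
    total = sum(w * f(math.sqrt(2.0) * sd * x) for x, w in zip(nodes, weights))
    return total / math.sqrt(math.pi)

def marginal_loglik(groups, a, sd_d):
    """Approximate (4); groups is a list of (y, delta, mu, sigma) tuples, one per group."""
    return sum(
        math.log(normal_expectation(lambda b: math.exp(cond_loglik(b, *g, a)), sd_d))
        for g in groups
    )
```

With only five fixed nodes the approximation can be poor when the integrand is sharply peaked away from zero, which is the situation the adaptive rule is designed for. For large groups, `math.exp(cond_loglik(...))` may also underflow, so a practical implementation would work with a log-sum-exp formulation.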

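Under conditions that are fulfilled for the parameter vector $\boldsymbol{\theta}$ in the interior of the parameter space but not on the boundary, the asymptotic distribution of $\sqrt{n}(\hat{\boldsymbol{\theta}}-\boldsymbol{\theta})$ is multivariate normal $N_{p_1+p_2+2}(\mathbf{0},K(\boldsymbol{\theta})^{-1})$, where $K(\boldsymbol{\theta})$ is the information matrix. The asymptotic covariance matrix $K(\boldsymbol{\theta})^{-1}$ of $\hat{\boldsymbol{\theta}}$ can be approximated by the inverse of the $(p_1+p_2+2)\times(p_1+p_2+2)$ observed information matrix $-\ddot{L}(\boldsymbol{\theta})$. The elements of this matrix can be determined by double differentiation of $l(\boldsymbol{\theta})$ with respect to the model parameters and then evaluated numerically. Then, an asymptotic confidence interval with significance level $\nu$ for each parameter $\theta_r$ is given by

$$IC_r=\left(\hat{\theta}_r-z_{\nu/2}\sqrt{\hat{L}^{r,r}},\ \hat{\theta}_r+z_{\nu/2}\sqrt{\hat{L}^{r,r}}\right),$$

where $\hat{L}^{r,r}$ is the $r$th diagonal element of $[-\ddot{L}(\boldsymbol{\theta})]^{-1}$ evaluated at $\hat{\boldsymbol{\theta}}$, for $r=1,\ldots,p_1+p_2+2$, and $z_{\nu/2}$ is the $1-\nu/2$ quantile of the standard normal distribution.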
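As a small worked example of the interval above, the Python sketch below computes a 95% Wald interval from an estimate and its standard error (the square root of the corresponding diagonal element of the inverse observed information). The function name `wald_interval` is hypothetical, and the inputs 2.17 and 0.35 are the estimate and standard error reported for $\log(a)$ in Table 3.

```python
from statistics import NormalDist

def wald_interval(estimate, std_error, nu=0.05):
    """Asymptotic interval: estimate -/+ z_{nu/2} * standard error."""
    z = NormalDist().inv_cdf(1.0 - nu / 2.0)   # 1 - nu/2 standard normal quantile
    return estimate - z * std_error, estimate + z * std_error

lower, upper = wald_interval(2.17, 0.35)
print(round(lower, 2), round(upper, 2))
```

The interval is on the $\log(a)$ scale; exponentiating both limits would give an interval for $a$ itself.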

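Predictions for the random effects may be computed through

$$\hat{b}_j=E(B_j\mid Y_{ij}=y_{ij})=\int_{-\infty}^{\infty}b_j\,f_{B_j}(b_j\mid y_{ij})\,\mathrm{d}b_j,$$

where

$$f_{B_j}(b_j\mid y_{ij})=\frac{L_j(\boldsymbol{\beta}_1,\boldsymbol{\beta}_2,a\mid b_j)\,f_{B_j}(b_j;\sigma_d^2)}{\int_{-\infty}^{\infty}L_j(\boldsymbol{\beta}_1,\boldsymbol{\beta}_2,a\mid b_j)\,f_{B_j}(b_j;\sigma_d^2)\,\mathrm{d}b_j}$$

is the pdf of the posterior distribution $B_j\mid Y_{ij}$. Numerical integration techniques, such as the adaptive Gauss–Hermite quadrature described above, may be used to compute $\hat{b}_j$.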

3.4. Model selection

In order to study the effects of the covariates species, gender and type of consumed larva, the following linear predictors were used initially:

$$\mu_{ijkl}=\beta_0+\delta_l+s_i+g_j+(sg)_{ij}+c_k,\tag{5a}$$
$$\log(\sigma_{ijkl})=\beta_0^{*}+s_i^{*}+g_j^{*}+(sg)_{ij}^{*}+c_k^{*},\tag{5b}$$

where, for the location parameter $\mu_{ijkl}$, $\beta_0$ is the intercept, $\delta_l$ is the effect of the $l$th block, $l=1,\ldots,33$, $s_i$ is the effect of the $i$th species ($i=1$ for the earwig and $i=2$ for the stinkbug), $g_j$ is the effect of the $j$th gender ($j=1$ for females and $j=2$ for males), $(sg)_{ij}$ is the effect of the interaction between the $i$th species and the $j$th gender, and $c_k$ is the effect of prey preference ($k=1$ when no prey was effectively consumed, $k=2$ when the non-parasitised larva was preferred and $k=3$ when the parasitised larva was preferred). It is worth noting that attack and consumption are measured differently in the case study: there may be an attack, but no effective consumption (when the predator eats the prey in its entirety). For a multiplicity of reasons, predators may opt to attack prey only once and not effectively consume it. Therefore, the consumption variable may be included in the linear predictor, as it is not perfectly collinear with the censoring indicator $\delta_i$.

Analogously, for the scale parameter $\sigma_{ijkl}$, the effects included initially are the same, with an asterisk ($^{*}$) included in the notation to differentiate them from those for the location parameter. We then performed backward selection using likelihood ratio (LR) tests. We also studied the combination of $c_k$ levels to reduce the number of parameters. After model selection, we then added the day effects as random in the linear predictor for the location parameter of the exponentiated-Weibull model. These were assumed to be independent and to follow a normal distribution with mean zero and variance $\sigma_d^2$.

3.5. Goodness-of-fit assessment

It is important to carry out diagnostic analyses to verify goodness-of-fit because when the model does not fit the data well, statistical inference may lead to incorrect conclusions about the process under study. Several graphical techniques may be used such as plotting different types of residuals or influence measures (e.g. leverage and Cook's distance). A useful tool to assess model goodness-of-fit is the half-normal plot of deviance residuals with a model-based simulation envelope [2,10]. The envelope is such that under the correct model most of the deviance residuals for the fitted model should lie within it. This conceptually simple technique consists in plotting the ordered absolute values of a model diagnostic versus the expected order statistics of a half-normal distribution

$$\Phi^{-1}\left(\frac{i+n-\frac{1}{8}}{2n+\frac{1}{2}}\right),$$

where $i$ indexes the $i$th order statistic, $1\le i\le n$, and $n$ is the sample size.

Then, a simulated envelope can be obtained by (1) fitting a model; (2) extracting model diagnostics and calculating ordered absolute values; (3) simulating 99 (or more) response variables using the same model matrix, error distribution and fitted parameters; (4) fitting the same model to each simulated response variable and obtaining the same model diagnostics, again as ordered absolute values; and (5) computing the desired percentiles (e.g. 2.5 and 97.5) at each value of the expected order statistic to form the envelope.
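Steps (1) to (5) can be sketched generically in Python as below. This is not the hnp package (which is written in R); it only mirrors the idea of supplying three helper functions to fit, extract diagnostics and simulate. The names `half_normal_scores` and `simulated_envelope` are hypothetical, the percentile rule is a simple empirical one, and the normal toy model at the end is an assumption used purely to exercise the code.

```python
import random
import statistics
from statistics import NormalDist

def half_normal_scores(n):
    """Expected half-normal order statistics Phi^{-1}((i + n - 1/8) / (2n + 1/2))."""
    nd = NormalDist()
    return [nd.inv_cdf((i + n - 0.125) / (2.0 * n + 0.5)) for i in range(1, n + 1)]

def simulated_envelope(y, fit, diagnostics, simulate, n_sim=99,
                       probs=(0.025, 0.975), rng=None):
    rng = rng or random.Random()
    model = fit(y)                                              # step (1)
    observed = sorted(abs(d) for d in diagnostics(model, y))    # step (2)
    sims = []
    for _ in range(n_sim):
        y_sim = simulate(model, len(y), rng)                    # step (3)
        m_sim = fit(y_sim)                                      # step (4)
        sims.append(sorted(abs(d) for d in diagnostics(m_sim, y_sim)))
    lower, upper = [], []
    for i in range(len(y)):                                     # step (5)
        column = sorted(s[i] for s in sims)
        lower.append(column[round(probs[0] * (n_sim - 1))])
        upper.append(column[round(probs[1] * (n_sim - 1))])
    return half_normal_scores(len(y)), observed, lower, upper

# Toy model (an assumption for illustration): i.i.d. normal data, standardised residuals.
def fit(y):
    return statistics.fmean(y), statistics.stdev(y)

def diagnostics(model, y):
    return [(v - model[0]) / model[1] for v in y]

def simulate(model, n, rng):
    return [rng.gauss(model[0], model[1]) for _ in range(n)]

rng = random.Random(1)
data = [rng.gauss(10.0, 2.0) for _ in range(30)]
scores, observed, lower, upper = simulated_envelope(data, fit, diagnostics, simulate, rng=rng)
```

Plotting `observed`, `lower` and `upper` against `scores` gives the half-normal plot with its envelope; for the models in this paper the three helpers would fit the location-scale model, return deviance residuals and simulate (censored) responses.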

Here, we used the deviance residuals as the diagnostic measures, defined as

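$$\hat{r}_{D_i}=\operatorname{sign}(\hat{r}_{m_i})\left\{-2\left[\hat{r}_{m_i}+\delta_i\log(\delta_i-\hat{r}_{m_i})\right]\right\}^{1/2},$$

where $\hat{r}_{m_i}=\delta_i+\log\hat{S}(y_i)$ are the Martingale residuals [14].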
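The deviance residual defined above can be computed from the censoring indicator and the fitted survival probability alone, as in the following Python sketch. The function name `deviance_residual` is hypothetical; the fitted survival value is assumed to lie strictly between 0 and 1, and the term $\delta_i\log(\delta_i-\hat{r}_{m_i})$ is taken as zero for censored observations.

```python
import math

def deviance_residual(delta, surv):
    """Deviance residual from delta (1 = event, 0 = censored) and fitted S(y) in (0, 1)."""
    r_m = delta + math.log(surv)                       # martingale residual
    # delta * log(delta - r_m), with the convention 0 * log(.) = 0 when delta = 0
    log_term = math.log(delta - r_m) if delta == 1 else 0.0
    inner = r_m + delta * log_term
    sign = (r_m > 0) - (r_m < 0)
    # max(...) guards against tiny negative values caused by rounding
    return sign * math.sqrt(max(0.0, -2.0 * inner))

# An event at a point where the fitted survival is still high gives a positive residual;
# a censored observation always gives a non-positive one.
print(deviance_residual(1, 0.9), deviance_residual(0, 0.4))
```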

Half-normal plots with simulated envelopes are implemented in the hnp package for many standard models [10] in R [12]. It is also possible to use the hnp function to produce these plots for models that are not already coded within the function by providing three helper functions, one to extract the model diagnostics, another to simulate response variables, and finally a function to refit the model to the simulated samples. All R code used to obtain the results in this paper is uploaded as Supporting Information.

4. Simulation study

We carried out a simulation study to help understand the behaviour of the proposed exponentiated-Weibull mixed model under different circumstances. Three values for the variance of the random effect $\sigma_d^2$ were used (0, 1, and 3). Moreover, the location parameter was modelled without covariates and with only a random intercept per day of experiment, and the scale parameter was modelled with two dummy covariates, species and whether the predator consumed prey in one hour of observation, that is,

$$\mu_{ijk}=\beta_0+b_j,\qquad\log(\sigma_{ijk})=\beta_0^{*}+s_i^{*}+c_k^{*},$$

where $\beta_0$ and $\beta_0^{*}$ are intercepts, $b_j$ is the random effect associated with the $j$th group of correlated observations, $B_j\sim N(0,\sigma_d^2)$, $s_i^{*}$ is the effect of the $i$th species and $c_k^{*}$ is the combined effect of consumption ($k=1$ when no prey was effectively consumed and $k=2$ when there was consumption).

We used the same sample size (132) as for the original data set and, for simplicity, no censoring. True parameter values were fixed as $\log(a)=2.17$, $\beta_0=2.78$, $\beta_0^{*}=1.10$, $s_2^{*}=0.47$ ($\sigma$ – Species: Stinkbug) and $c_2^{*}=-0.51$ ($\sigma$ – Consumption: Yes) in all simulated scenarios. For each scenario in each study, 1000 simulations were performed and the averages of the estimates were computed, as well as the mean squared errors.
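A single data set from this simulation design could be generated along the following lines. This Python sketch is not the authors' R code: the function names are hypothetical, and the way the covariates are assigned (species alternating within day, consumption drawn as a fair coin) is an assumption, since the text does not state it. The consumption coefficient is taken as $-0.51$, the sign implied by the negative means in Table 1 and by the scale estimates in Section 5.3.

```python
import math
import random

def log_ew_quantile(u, mu, sigma, a):
    """Inverse cdf of Y = log(T): mu + sigma * log{-log(1 - u^(1/a))}, 0 < u < 1."""
    log_v = math.log(u) / a
    if log_v < -math.log(2.0):
        w = -math.log1p(-math.exp(log_v))
    else:
        w = -math.log(-math.expm1(log_v))
    return mu + sigma * math.log(w)

def simulate_mixed(sd_d, n_days=3, n_per_day=44, rng=None):
    """Return rows (day, species, consumed, y) from the uncensored mixed model."""
    rng = rng or random.Random()
    log_a, beta0, beta0_star, s2_star, c2_star = 2.17, 2.78, 1.10, 0.47, -0.51
    a = math.exp(log_a)
    rows = []
    for day in range(n_days):
        b = rng.gauss(0.0, sd_d) if sd_d > 0 else 0.0   # random intercept for the day
        for k in range(n_per_day):
            species = k % 2                  # assumed: 0 = earwig, 1 = stinkbug, balanced
            consumed = rng.randrange(2)      # assumed: consumption is a fair coin flip
            mu = beta0 + b
            sigma = math.exp(beta0_star + s2_star * species + c2_star * consumed)
            u = rng.random()
            while u == 0.0:                  # random() may return exactly 0
                u = rng.random()
            rows.append((day, species, consumed, log_ew_quantile(u, mu, sigma, a)))
    return rows

rows = simulate_mixed(sd_d=1.0, rng=random.Random(7))
```

Repeating this 1000 times per value of $\sigma_d^2$ and refitting the mixed model each time would reproduce the structure of the study, though not necessarily its exact numbers.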

The model performs reasonably well and estimates $\sigma_d^2$ with good precision (see Table 1). As expected, when the variance of the random effects is larger, the precision is lower. See the Appendix for full results of additional simulation studies carried out under other scenarios, including different levels of censoring. In summary, model performance is poorer as the censoring proportion increases and for smaller sample sizes, which is expected. Moreover, when there are more levels of the random effect, $\sigma_d^2$ is estimated with better precision.

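Table 1. Mean of parameter estimates and associated mean squared errors (MSE) for the exponentiated-Weibull mixed model fitted to 1000 simulated data sets. True parameter values: $\log(a)=2.17$, $\beta_0=2.78$, $\beta_0^{*}=1.10$, $s_2^{*}=0.47$ ($\sigma$ – Species: Stinkbug) and $c_2^{*}=-0.51$ ($\sigma$ – Consumption: Yes), with varying $\sigma_d^2$ as indicated.

| Parameter | Mean ($\sigma_d^2=0$) | MSE ($\sigma_d^2=0$) | Mean ($\sigma_d^2=1$) | MSE ($\sigma_d^2=1$) | Mean ($\sigma_d^2=3$) | MSE ($\sigma_d^2=3$) |
|---|---|---|---|---|---|---|
| $\log(a)$ | 2.46 | 0.27 | 2.48 | 0.32 | 2.44 | 0.72 |
| $\beta_0$ ($\mu$ – Intercept) | 2.36 | 0.77 | 2.36 | 0.83 | 2.45 | 1.81 |
| $\beta_0^{*}$ ($\sigma$ – Intercept) | 1.09 | 0.02 | 1.08 | 0.02 | 0.99 | 0.07 |
| $s_2^{*}$ ($\sigma$ – Species: Stinkbug) | 0.41 | 0.01 | 0.40 | 0.01 | 0.39 | 0.02 |
| $c_2^{*}$ ($\sigma$ – Consumption: Yes) | −0.40 | 0.02 | −0.40 | 0.02 | −0.39 | 0.04 |
| $\sigma_d^2$ | 0.07 | 0.59 | 0.86 | 0.71 | 2.33 | 0.73 |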

5. Application

5.1. Exploratory analysis

Turning to the analysis of the data set described in Section 2, the Kaplan–Meier estimator for the logarithm of the time until attack data suggests that there is a clear difference between species, with earwigs taking less time to attack prey than stinkbugs. It also suggests that the interaction between gender and species may not be significant, see Figure 1(a).

Figure 1.

Fitted survival curves for the Weibull model using linear predictors (6a) and (6b) for (a) each treatment combination and (b) combination between species and consumption levels; and fitted survival curves for the exponentiated-Weibull model using linear predictors (7a) and (7b) for (c) each treatment combination and (d) combination between species and consumption levels.

5.2. Weibull model fit

We now fit the Weibull model, modelling both the location and the scale parameters with regressors, see linear predictors (5a) and (5b). The blocks within day effects are not significant (LR = 37.61, d.f. = 32, p = 0.2276). Removing the interaction term from both linear predictors does not significantly worsen the fit according to the likelihood-ratio test (LR = 3.25, d.f. = 2, p = 0.1969). Removing the gender main effect from both linear predictors was also not significant (LR = 0.31, d.f. = 2, p = 0.8552). Combining levels 2 (consumption of the non-parasitised larva) and 3 (consumption of the parasitised larva) of the consumption effect in both linear predictors also yielded a non-significant likelihood-ratio test statistic (LR = 2.25, d.f. = 2, p = 0.3238). The inclusion of a species × consumption interaction effect was not significant (LR = 2.39, d.f. = 2, p = 0.3025). Any further reductions yielded significant likelihood-ratio test statistics. Therefore, the selected model was

$$\mu_{ik}=\beta_0+s_i+c_k,\tag{6a}$$
$$\log(\sigma_{ik})=\beta_0^{*}+s_i^{*}+c_k^{*},\tag{6b}$$

where $\beta_0$ and $\beta_0^{*}$ are intercepts, $s_i$ and $s_i^{*}$ are effects of the $i$th species and $c_k$ and $c_k^{*}$ are the combined effects of consumption ($k=1$ when no prey was effectively consumed and $k=2$ when there was consumption), see Table 2 for parameter estimates. A global test between the maximal model (5a and 5b) and the selected model (6a and 6b) was also not significant (LR = 43.43, d.f. = 38, p = 0.2510).

Table 2. Parameter estimates (standard errors), t values and p-values for the Weibull model fitted to the time until attack data using linear predictors (6a) and (6b).

| Parameter | Estimate (s.e.) | t value | p-value |
|---|---|---|---|
| $\beta_0$ ($\mu$ – Intercept) | 6.16 (0.34) | 17.93 | <0.0001 |
| $s_2$ ($\mu$ – Species: Stinkbug) | 1.33 (0.18) | 7.40 | <0.0001 |
| $c_2$ ($\mu$ – Consumption: Yes) | −1.32 (0.36) | −3.70 | 0.0002 |
| $\beta_0^{*}$ ($\sigma$ – Intercept) | 0.20 (0.22) | 0.93 | 0.3525 |
| $s_2^{*}$ ($\sigma$ – Species: Stinkbug) | 0.26 (0.13) | 1.99 | 0.0462 |
| $c_2^{*}$ ($\sigma$ – Consumption: Yes) | −0.43 (0.23) | −1.88 | 0.0604 |

s.e. = standard error.

The fitted survival curves fail to capture well the curvature of the Kaplan–Meier estimates, see Figure 1(a, b), and the lack-of-fit is confirmed by the half-normal plot with simulation envelope, with half of the points lying outside of the simulated envelope, see Figure 2(a). This suggests that the Weibull model is not flexible enough to capture the distribution of the data.

Figure 2.

Half-normal plot with simulation envelope for the deviance residuals of the (a) Weibull model and (b) exponentiated-Weibull model, both fitted using the maximal linear predictors (5a) and (5b) without the block effects.

The estimated parameters for the location part of the Weibull model suggest that there is no significant interaction between species and gender at the 5% level, as well as no significant gender main effect at the 5% level, but there is a significant negative consumption effect (see Table 2). In addition, the estimated parameters for the scale part of the model suggest that the failure rate function accelerates for earwigs and for individuals which have effectively consumed prey. This parameter also models variance heterogeneity.

Here, the estimated survival curves for earwigs that did not effectively consume prey in 1 hour and for stinkbugs that did are similar (see Figure 1(b)). Fitting a model that merged these two curves yielded the same number of parameters as the previous model (linear predictors (6a) and (6b), see Table 2), but the fitted curves do not seem to capture well the behaviour of the Kaplan–Meier estimates (see Figure 1(a, b)). Again, because the Weibull model does not fit the data well, a more adequate model should be used to draw conclusions from this study.

5.3. Exponentiated-Weibull model fit

We now fit the exponentiated-Weibull model, modelling both the location and scale parameters with regressors, see linear predictors (5a) and (5b). This model has one additional parameter, $a$, which is a shape parameter. The blocks within day effects are not significant (LR = 42.21, d.f. = 32, p = 0.1071). Removing the interaction term from both linear predictors does not significantly worsen the fit according to the likelihood-ratio test (LR = 2.62, d.f. = 2, p = 0.2697). Removing the gender main effect from both linear predictors was also not significant (LR = 0.24, d.f. = 2, p = 0.8850). Removing all effects from the location parameter linear predictor, leaving only the intercept, was also not significant (LR = 3.52, d.f. = 3, p = 0.3182). Combining levels 2 (consumption of the non-parasitised larva) and 3 (consumption of the parasitised larva) of the consumption effect in the linear predictor for the scale parameter yielded a non-significant likelihood-ratio test statistic (LR = 1.32, d.f. = 1, p = 0.2501). The inclusion of a species × consumption interaction effect in the linear predictor for the scale parameter was not significant (LR = 0.43, d.f. = 1, p = 0.5109). Any further reductions yielded significant likelihood-ratio test statistics. Therefore, the selected model was

$$\mu_{ik}=\beta_0,\tag{7a}$$
$$\log(\sigma_{ik})=\beta_0^{*}+s_i^{*}+c_k^{*},\tag{7b}$$

where $\beta_0$ and $\beta_0^{*}$ are intercepts, $s_i^{*}$ is the effect of the $i$th species and $c_k^{*}$ is the combined effect of consumption ($k=1$ when no prey was effectively consumed and $k=2$ when there was consumption), see Table 3 for parameter estimates. A global test between the maximal model (5a and 5b) and the selected model (7a and 7b) was also not significant (LR = 49.91, d.f. = 40, p = 0.1354).
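The p-values of the likelihood-ratio tests used during model selection come from $\chi^2$ reference distributions. A minimal Python sketch of this calculation, using only the standard library, is given below; `chi2_sf` is a hypothetical helper that evaluates the upper tail probability for integer degrees of freedom through the usual recurrence in the degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom > x), for integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, nu = math.erfc(math.sqrt(half)), 1      # closed form for 1 d.f.
    else:
        q, nu = math.exp(-half), 2                 # closed form for 2 d.f.
    while nu < df:
        # Q(x; nu + 2) = Q(x; nu) + (x/2)^(nu/2) exp(-x/2) / Gamma(nu/2 + 1)
        q += math.exp((nu / 2.0) * math.log(half) - half - math.lgamma(nu / 2.0 + 1.0))
        nu += 2
    return q

# Some of the statistics reported in Section 5, as (LR, d.f.) pairs.
for lr, df in [(3.25, 2), (3.52, 3), (1.32, 1), (43.43, 38)]:
    print(lr, df, round(chi2_sf(lr, df), 4))
```

Such a check is also a quick way to confirm which degrees of freedom a reported p-value corresponds to: for instance, LR = 0.43 gives a p-value of about 0.51 on 1 d.f. but about 0.81 on 2 d.f.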

Table 3. Parameter estimates (standard errors), t values and p-values for the exponentiated-Weibull model fitted to the time until attack data using linear predictors (7a) and (7b).

| Parameter | Estimate (s.e.) | t value | p-value |
|---|---|---|---|
| $\log(a)$ | 2.17 (0.35) | 6.15 | <0.0001 |
| $\beta_0$ ($\mu$ – Intercept) | 2.78 (0.57) | 4.87 | <0.0001 |
| $\beta_0^{*}$ ($\sigma$ – Intercept) | 1.10 (0.13) | 8.65 | <0.0001 |
| $s_2^{*}$ ($\sigma$ – Species: Stinkbug) | 0.47 (0.10) | 4.64 | <0.0001 |
| $c_2^{*}$ ($\sigma$ – Consumption: Yes) | −0.51 (0.14) | −3.58 | 0.0003 |

s.e. = standard error.

The fitted survival curves now seem to follow the Kaplan–Meier estimates quite well, see Figure 1(c, d), and the half-normal plot with simulation envelope confirms that the model fits the data reasonably well, see Figure 2(b). Also, there is no significant effect whatsoever for the location part of the model and the only significant effects (which are the same ones as for the Weibull model) are now for the scale part of the model. While both models account for heterogeneity of variances, the additional shape parameter of the exponentiated-Weibull model brings extra flexibility to the survival curves. It is also related to the acceleration of the curves and especially governs the shape of the top of the survival curve, determining when the decay starts. For this case study, the value of $a=\exp(2.17)=8.76$ means that the decay of the survival curve is delayed compared to a Weibull model ($a=1$), and this is why location parameters are needed in the simpler Weibull fit.

Again, the estimated survival curves for earwigs that did not effectively consume prey in 1 hour and for stinkbugs that did are fairly similar (see Figure 1(b)). The survival curve for earwigs which effectively consumed prey in 1 hour is the most accelerated, the survival curves for earwigs that did not and stinkbugs that did are less accelerated, and the survival curve for stinkbugs that did not consume prey effectively in 1 hour is the least accelerated one (see Figure 1(c, d)). This is reflected by the estimated scale parameter (see Table 3), which is equal to $\exp(1.10-0.51)=\exp(0.59)=1.80$ for earwigs that consume prey effectively within 1 hour of observation, and $\exp(1.10)=3.00$ for those that do not; while it is equal to $\exp(1.10+0.47-0.51)=\exp(1.06)=2.89$ for stinkbugs that consume and $\exp(1.10+0.47)=\exp(1.57)=4.81$ for those that do not. Hence, since stinkbugs that do not eat present a higher degree of censoring, this is reflected by a larger scale parameter, associated with a heavier tail in the distribution, as well as more variability.

5.4. Including day effects as random in the exponentiated-Weibull model

Now we turn to including the day effects in the location linear predictor for the exponentiated-Weibull model. These should be included as random, because three different batches of insects from the same population were used and there is random variation in the conditions of the 3 days when the experiment was conducted.

The maximum likelihood estimates for the fixed effects are the same as those shown in Table 3. The estimate $\hat{\sigma}_d^2=0.0005$ suggests that there is no significant day effect in the experiment. The hypothesis test of $H_0:\sigma_d^2=0$ versus $H_1:\sigma_d^2>0$ is on the boundary of the parameter space and hence the asymptotic reference distribution of the likelihood-ratio test statistic here is a mixture of $\chi^2$ distributions: one with 0 degrees of freedom and another with 1 degree of freedom, with a mixing proportion of $\frac{1}{2}$. For this case LR = 1.54, with p = 0.1073, and this was not unexpected since the blocks within day effects were also not significant. Moreover, 3 days is a small number of levels of the grouping factor and hence model estimates may be biased, since it has been shown that in general a high reliability is attained for cases with more than ten levels of the grouping factor [see 8].
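The p-value under the 50:50 mixture of $\chi^2_0$ and $\chi^2_1$ distributions is half of the usual $\chi^2_1$ tail probability whenever the statistic is positive. The short Python sketch below reproduces this arithmetic for the reported LR = 1.54; the function name `boundary_lr_pvalue` is hypothetical.

```python
import math

def boundary_lr_pvalue(lr):
    """p-value for H0: variance = 0 under a 0.5 * chi2_0 + 0.5 * chi2_1 reference mixture."""
    if lr <= 0.0:
        return 1.0                                   # the chi2_0 component is a point mass at 0
    chi2_1_tail = math.erfc(math.sqrt(lr / 2.0))     # P(chi2 with 1 d.f. > lr)
    return 0.5 * chi2_1_tail

print(round(boundary_lr_pvalue(1.54), 4))
```

Ignoring the boundary issue and using a plain $\chi^2_1$ reference would double the p-value, making the test conservative.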

6. Discussion

In this work, we began by fitting the Weibull model, which is a reasonable starting point when analysing time-until-event data. We used the location-scale formulation and included covariates in both the location and the scale parameters, which provided more flexibility to the survival and hazard functions. However, for this case study the inclusion of an additional shape parameter was essential to model the data appropriately, whilst accounting for variance heterogeneity by also including regressors for the scale parameter. This differs from the approach of [4] in that the parameters simultaneously modelled here have, for both models, a practical interpretation in terms of mean and variance for the location and scale parameters, respectively.

The scale parameter translates directly into the acceleration of the survival curves. This brings a natural and straightforward interpretation for the parameter estimates and provides a much more complete summary of the data, with direct implications regarding the research questions of the case study. Furthermore, even though the inclusion of a random intercept did not significantly improve the model fit in this case study, this approach can be useful when analysing other types of correlated data, given the multiplicity of settings from which correlation may arise.

Assessing goodness-of-fit for these models is not an easy task and the use of half-normal plots with simulation envelopes was a good alternative, especially through the hnp package in R, which allows for relatively easy implementation, even for more complex models. In practice, the hnp package allows for the specification of any model for which it is possible to write code to simulate new samples and obtain model diagnostics. It was clear that combining half-normal plots with simulation envelopes and the superposition of the fitted survival curves on their respective Kaplan–Meier estimates was crucial in deciding which model to use. Hence, we advocate the use of half-normal plots with simulation envelopes to aid in goodness-of-fit assessment.
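The logic of a simulated envelope can be sketched in a few lines. The example below is an illustration in Python, not the hnp implementation: a toy normal model stands in for the survival models, standardised residuals stand in for deviance residuals, and all function names are hypothetical. The steps are: fit the model, simulate new responses from the fit, refit to each simulated sample, and take pointwise quantiles of the ordered absolute residuals.

```python
import random
import statistics

def fit_normal(sample):
    """Toy 'model fit': sample mean and standard deviation."""
    return statistics.fmean(sample), statistics.stdev(sample)

def sorted_abs_residuals(sample):
    """Refit the toy model and return ordered absolute standardised residuals."""
    mu, sigma = fit_normal(sample)
    return sorted(abs((v - mu) / sigma) for v in sample)

def half_normal_quantiles(n):
    """Expected half-normal order statistics for the horizontal axis."""
    z = statistics.NormalDist()
    return [z.inv_cdf((i + n - 0.125) / (2 * n + 0.5)) for i in range(1, n + 1)]

def simulated_envelope(y, n_sim=99, conf=0.95, seed=1):
    """Observed ordered residuals plus pointwise lower and upper envelope limits."""
    rng = random.Random(seed)
    mu_hat, sigma_hat = fit_normal(y)
    observed = sorted_abs_residuals(y)
    sims = []
    for _ in range(n_sim):
        # Simulate a new response vector from the fitted model, then refit.
        y_star = [rng.gauss(mu_hat, sigma_hat) for _ in y]
        sims.append(sorted_abs_residuals(y_star))
    lo_idx = round((1 - conf) / 2 * (n_sim - 1))
    hi_idx = round((1 + conf) / 2 * (n_sim - 1))
    lower, upper = [], []
    for i in range(len(y)):
        # Quantiles are taken across simulations for each order statistic.
        column = sorted(sim[i] for sim in sims)
        lower.append(column[lo_idx])
        upper.append(column[hi_idx])
    return observed, lower, upper

# Illustrative data generated from the same family as the fitted model.
data_rng = random.Random(42)
y = [data_rng.gauss(10.0, 2.0) for _ in range(30)]
observed, lower, upper = simulated_envelope(y)
outside = sum(not (lo <= r <= up) for r, lo, up in zip(observed, lower, upper))
print(f"{outside} of {len(y)} residuals fall outside the envelope")
```

Plotting `observed`, `lower` and `upper` against `half_normal_quantiles(len(y))` gives the half-normal plot; a large share of points outside the envelope indicates lack of fit. For a survival model, only the fitting, simulation and residual functions would need to be replaced.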

Regarding the case study, it was clear that the earwig usually attacks earlier than the stinkbug and this is important information from a biological control viewpoint. Previous experiments have shown that this earwig species presents more aggressive behaviour than the stinkbug [9]. Also, the fact that prey consumption affected the survival curves in the same way for both the parasitised and non-parasitised prey shows that there is a clear distinction of two groups: specimens that effectively consumed prey in 1 hour and specimens that did not. This separation between ‘fast’ and ‘slow’ specimens suggests that there is individual variability that must be taken into account when planning biological control strategies. Therefore, further studies on mechanisms of recognition of parasitised prey are particularly relevant in this context.

If the interest lies in modelling prey preference, this can be achieved by using, for example, competing risks models. Here, however, we are looking at whether the predators attack any prey. Using the combined levels of the consumption effect ($c_k$) is equivalent to including an additional predator-related variable, since the original levels do not have any impact on the time until first attack (as made evident by the likelihood-ratio test). Moreover, we have shown in previous work [9] that these predators may identify parasitised prey after they have attacked them, opting to switch which type of prey they consume. This is not the case before they attack a larva, however, and hence that is a separate question, which has been addressed before. The methodology used here was sufficient to answer the research question, which aimed to assess whether individuals of different species and genders attacked prey faster or slower in comparison.

Finally, experiments assessing the competition behaviour of these two predators would also enrich the understanding of the ecological relations among these three species. The modelling strategy adopted here makes it possible to determine the variables related to a faster response time. Moreover, by including specific random effects in the linear predictor, it is possible to incorporate the individual variability that may arise for different reasons, such as different insect populations.

Supplementary Material

helper_functions.R
Moral_et_al_simulation_study2.R
Moral_et_al_simulation_study1.R
Moral_et_al_figures.R
supplementarydata.txt
Moral_et_al_R_code.R

Appendix 1. Gauss–Hermite quadrature.

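The Gauss–Hermite quadrature of order $q$ for integrals of the form $\int_{-\infty}^{+\infty} \exp(-x^2) f(x)\,dx$ is written as

$$\int_{-\infty}^{+\infty} \exp(-x^2) f(x)\,dx \approx \sum_{k=1}^{q} v_k f(s_k),$$

with $s_k$, $k = 1, 2, \ldots, q$, the $q$ roots of the Hermite polynomial of degree $q$, written as

$$H_q(x) = (-1)^q \exp(x^2) \frac{d^q}{dx^q} \exp(-x^2)$$

and weights

$$v_k = \frac{2^{q+1}\, q!\, \sqrt{\pi}}{[H_q'(s_k)]^2}.$$

In a general form, as

$$g(x) = \exp(-x^2) f(x) \;\Rightarrow\; f(x) = g(x) \exp(x^2),$$

the approximation of $\int_{-\infty}^{+\infty} g(x)\,dx = \int_{-\infty}^{+\infty} \exp(-x^2) f(x)\,dx$ may be obtained by $\sum_{k=1}^{q} v_k f(s_k) = \sum_{k=1}^{q} v_k g(s_k) \exp(s_k^2)$. Therefore, the Gauss–Hermite quadrature approximation of the integral of any function $g(x)$ may be written as

$$\int_{-\infty}^{+\infty} g(x)\,dx \approx \sum_{k=1}^{q} v_k g(s_k) \exp(s_k^2).$$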

The approximation improves as the number of integration points $q$ increases. This method is implemented in many R packages, such as npmlreg [3] and glmmML [1].
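As an illustration, the rule can be built from scratch in a few lines. The sketch below is in Python with the standard library only and is not how npmlreg or glmmML compute their rules. It finds the roots of $H_q$ by bisection, using the fact that the roots of consecutive Hermite polynomials interlace, and computes the weights in the equivalent form $v_k = 2^{q-1} q! \sqrt{\pi} / \{q^2 [H_{q-1}(s_k)]^2\}$, which follows from $H_q'(x) = 2q H_{q-1}(x)$. The function names and the test integrand are assumptions made for the example.

```python
import math

def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via the three-term recurrence."""
    h_prev, h = 0.0, 1.0
    for k in range(n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h

def bisect_root(f, lo, hi, iters=100):
    """Bisection for a root of f bracketed by a sign change on [lo, hi]."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0.0:
            return mid
        if (f_lo < 0.0) == (f_mid < 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gauss_hermite(q):
    """Nodes s_k and weights v_k of the q-point Gauss-Hermite rule."""
    roots = []
    for n in range(1, q + 1):
        # Roots of H_n lie between consecutive roots of H_{n-1},
        # and all of them lie inside (-sqrt(2n + 1), sqrt(2n + 1)).
        bound = math.sqrt(2.0 * n + 1.0)
        edges = [-bound] + roots + [bound]
        roots = [bisect_root(lambda x: hermite(n, x), a, b)
                 for a, b in zip(edges[:-1], edges[1:])]
    const = 2.0 ** (q - 1) * math.factorial(q) * math.sqrt(math.pi) / q ** 2
    weights = [const / hermite(q - 1, s) ** 2 for s in roots]
    return roots, weights

# Example: g(x) = exp(-x^2) cos(x), whose integral is sqrt(pi) * exp(-1/4).
nodes, weights = gauss_hermite(10)
g = lambda x: math.exp(-x * x) * math.cos(x)
approx = sum(v * g(s) * math.exp(s * s) for s, v in zip(nodes, weights))
exact = math.sqrt(math.pi) * math.exp(-0.25)
print(approx, exact)
```

The sum in `approx` is the general form $\sum_k v_k g(s_k) \exp(s_k^2)$, so any integrand $g$ can be passed without first factoring out $\exp(-x^2)$.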

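One variation of this method is the adaptive Gauss–Hermite quadrature, which centres the integration points around the maximum $\hat{s} = \arg\max_x \{\exp(-x^2) f(x)\}$ of the objective function. In this case we have

$$s_k^+ = \hat{s} + s_k \left[ -\frac{d^2}{dx^2} \log\{\exp(-x^2) f(x)\} \Big|_{x=\hat{s}} \right]^{-\frac{1}{2}}$$

as scaled roots of the Hermite polynomial of degree $q$ and scaled weights

$$v_k^+ = v_k \exp\{-(s_k^+)^2 + s_k^2\} \left[ -\frac{d^2}{dx^2} \log\{\exp(-x^2) f(x)\} \Big|_{x=\hat{s}} \right]^{-\frac{1}{2}}.$$

Hence, the adaptive Gauss–Hermite quadrature of order $q$ approximation may be written as

$$\int_{-\infty}^{+\infty} \exp(-x^2) f(x)\,dx \approx \sum_{k=1}^{q} v_k^+ f(s_k^+).$$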

This method is more precise than the Gauss–Hermite quadrature for functions whose mode is far from zero, but at the computational cost of maximising $\exp(-x^2) f(x)$. It is implemented in R packages and in SAS's procedure NLMIXED; a variation is readily available from the R base installation through the function integrate. For more details see [2, 4, 5].
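The benefit of recentring can be seen in a small numerical experiment. The sketch below, in Python with the standard library only, integrates $g(x) = \exp\{-(x-5)^2/2\}$, whose mode is at 5 and whose integral is $\sqrt{2\pi}$. It uses the standard tabulated 5-point Gauss–Hermite rule and takes the scale as $[-(\log g)''(\hat{s})]^{-1/2}$; the mode search by ternary search, the finite-difference curvature and all function names are assumptions made for this illustration.

```python
import math

# Standard 5-point Gauss-Hermite nodes and weights for the weight exp(-x^2).
NODES = [-2.0201828704560856, -0.9585724646138185, 0.0,
         0.9585724646138185, 2.0201828704560856]
WEIGHTS = [0.01995324205904591, 0.3936193231522412, 0.9453087204829419,
           0.3936193231522412, 0.01995324205904591]

def plain_gh(g):
    """Non-adaptive rule: sum of v_k * g(s_k) * exp(s_k^2)."""
    return sum(v * math.exp(s * s) * g(s) for s, v in zip(NODES, WEIGHTS))

def find_mode(log_g, lo, hi, iters=200):
    """Ternary search for the maximiser of a unimodal log_g on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_g(m1) < log_g(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

def adaptive_gh(log_g, lo, hi, h=1e-4):
    """Adaptive rule: nodes recentred at the mode and rescaled by curvature."""
    s_hat = find_mode(log_g, lo, hi)
    # Negative second derivative of log g at the mode, by central differences.
    curvature = -(log_g(s_hat + h) - 2.0 * log_g(s_hat) + log_g(s_hat - h)) / h ** 2
    scale = curvature ** -0.5
    total = 0.0
    for s, v in zip(NODES, WEIGHTS):
        s_plus = s_hat + scale * s  # shifted and scaled node
        total += v * math.exp(s * s) * math.exp(log_g(s_plus))
    return scale * total

log_g = lambda x: -0.5 * (x - 5.0) ** 2
exact = math.sqrt(2.0 * math.pi)
adaptive = adaptive_gh(log_g, 0.0, 10.0)
plain = plain_gh(lambda x: math.exp(log_g(x)))
print(f"exact {exact:.4f}, adaptive {adaptive:.4f}, plain {plain:.4f}")
```

With only five points the plain rule misses almost all of the mass, because its nodes are concentrated around zero, whereas the recentred rule recovers the integral closely. Any positive scale gives a valid change of variables; multiplying the scale used here by $\sqrt{2}$ would make the rule exact for a Gaussian integrand.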

References.

  1. G. Broström, glmmML: Generalised linear models with clustering, R package version 1.0.3, 2018; software available at https://CRAN.R-project.org/package=glmmML

  2. P.J. Davis and P. Rabinowitz, Methods of Numerical Integration, Academic Press, New York, 2007.

  3. J. Einbeck, R. Darnell, and J. Hinde, npmlreg: Nonparametric maximum likelihood estimation for random effect models, R package version 0.46-3, 2018; software available at https://CRAN.R-project.org/package=npmlreg

  4. Q. Liu and D.A. Pearce, A note on Gauss-Hermite quadrature, Biometrika 81 (1994), pp. 624–629.

  5. S. Rabe-Hesketh, A. Skrondal, and A. Pickles, Reliable estimation of generalised linear mixed models using adaptive quadrature, The Stata Journal 2 (2002), pp. 1–21.

Appendix 2. Further simulation study.

Here, we present a second simulation study, which used three different sample sizes (36, 132, and 480), three proportions of censoring (0%, 15%, and 30% of censored observations), and two numbers of groups of correlated observations (simulating 4 or 12 days of experiment), with the same number of replicates for each combination of the levels of the covariates, i.e. the designs were balanced. True parameter values were fixed as in the study presented previously, i.e. $\log(a) = 2.17$, $\beta_0 = 2.78$ ($\mu$ – Intercept), $\beta_0 = 1.10$ ($\sigma$ – Intercept), $s_2 = 0.47$ ($\sigma$ – Species: Stinkbug) and $c_2 = 0.51$ ($\sigma$ – Consumption: Yes), in all simulated scenarios. We performed 1000 simulations and computed the averages of the estimates, as well as the mean squared errors.

The results indicate that the model performed less well as the proportion of censored observations increased and for smaller sample sizes, as expected. As the variance of the random effects increased, the precision of the estimates of the location ($\mu$) and shape ($a$) parameters was very low for sample size 36. When there were more random effects (or groups), the model seemed to estimate $\sigma_d^2$ with better precision; see Tables A1, A2, and A3.

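Table A1. Mean squared errors (MSE) for each parameter of the exponentiated-Weibull mixed model fitted to 1000 simulated data sets, with $\sigma_d^2 = 0$. Columns are grouped by the number of random effects (4 or 12) and the sample size $n$.

| Censoring | Parameter | 4 random effects, n = 36 | 4 random effects, n = 132 | 4 random effects, n = 480 | 12 random effects, n = 36 | 12 random effects, n = 132 | 12 random effects, n = 480 |
|---|---|---|---|---|---|---|---|
| 0% | $\log(a)$ | 2.47 | 0.30 | 0.12 | 3.36 | 0.34 | 0.11 |
| | $\beta_0$ ($\mu$ – Intercept) | 10.96 | 0.89 | 0.26 | 12.30 | 0.96 | 0.28 |
| | $\beta_0$ ($\sigma$ – Intercept) | 0.09 | 0.01 | 0.00 | 0.11 | 0.01 | 0.00 |
| | $s_2$ ($\sigma$ – Sp.: Stinkbug) | 0.04 | 0.02 | 0.01 | 0.05 | 0.02 | 0.01 |
| | $c_2$ ($\sigma$ – Cons.: Yes) | 0.04 | 0.02 | 0.02 | 0.05 | 0.02 | 0.02 |
| | $\sigma_d^2$ | 0.03 | 0.01 | 0.06 | 0.13 | 0.03 | 0.01 |
| 15% | $\log(a)$ | 3.48 | 0.40 | 0.17 | 3.73 | 0.43 | 0.17 |
| | $\beta_0$ ($\mu$ – Intercept) | 19.07 | 1.33 | 0.40 | 16.13 | 1.34 | 0.44 |
| | $\beta_0$ ($\sigma$ – Intercept) | 0.12 | 0.03 | 0.01 | 0.12 | 0.02 | 0.01 |
| | $s_2$ ($\sigma$ – Sp.: Stinkbug) | 0.04 | 0.02 | 0.02 | 0.05 | 0.02 | 0.02 |
| | $c_2$ ($\sigma$ – Cons.: Yes) | 0.05 | 0.03 | 0.02 | 0.05 | 0.03 | 0.02 |
| | $\sigma_d^2$ | 0.04 | 0.02 | 0.05 | 0.16 | 0.04 | 0.01 |
| 30% | $\log(a)$ | 4.73 | 0.55 | 0.22 | 4.98 | 0.53 | 0.22 |
| | $\beta_0$ ($\mu$ – Intercept) | 28.47 | 2.17 | 0.59 | 23.78 | 1.94 | 0.66 |
| | $\beta_0$ ($\sigma$ – Intercept) | 0.18 | 0.05 | 0.03 | 0.17 | 0.05 | 0.03 |
| | $s_2$ ($\sigma$ – Sp.: Stinkbug) | 0.06 | 0.03 | 0.02 | 0.06 | 0.03 | 0.02 |
| | $c_2$ ($\sigma$ – Cons.: Yes) | 0.06 | 0.03 | 0.02 | 0.08 | 0.03 | 0.02 |
| | $\sigma_d^2$ | 0.05 | 0.02 | 0.03 | 0.22 | 0.05 | 0.02 |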

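Table A2. Mean squared errors (MSE) for each parameter of the exponentiated-Weibull mixed model fitted to 1000 simulated data sets, with $\sigma_d^2 = 1$. Columns are grouped by the number of random effects (4 or 12) and the sample size $n$.

| Censoring | Parameter | 4 random effects, n = 36 | 4 random effects, n = 132 | 4 random effects, n = 480 | 12 random effects, n = 36 | 12 random effects, n = 132 | 12 random effects, n = 480 |
|---|---|---|---|---|---|---|---|
| 0% | $\log(a)$ | 4.78 | 0.31 | 0.11 | 5.12 | 0.35 | 0.11 |
| | $\beta_0$ ($\mu$ – Intercept) | 23.57 | 0.89 | 0.25 | 21.18 | 1.08 | 0.27 |
| | $\beta_0$ ($\sigma$ – Intercept) | 0.13 | 0.02 | 0.00 | 0.15 | 0.02 | 0.00 |
| | $s_2$ ($\sigma$ – Sp.: Stinkbug) | 0.05 | 0.02 | 0.02 | 0.06 | 0.02 | 0.02 |
| | $c_2$ ($\sigma$ – Cons.: Yes) | 0.05 | 0.02 | 0.02 | 0.07 | 0.03 | 0.02 |
| | $\sigma_d^2$ | 0.23 | 0.18 | 0.12 | 0.10 | 0.06 | 0.05 |
| 15% | $\log(a)$ | 4.57 | 0.47 | 0.15 | 8.60 | 0.42 | 0.15 |
| | $\beta_0$ ($\mu$ – Intercept) | 27.06 | 1.66 | 0.47 | 45.39 | 1.53 | 0.37 |
| | $\beta_0$ ($\sigma$ – Intercept) | 0.16 | 0.03 | 0.01 | 0.24 | 0.03 | 0.01 |
| | $s_2$ ($\sigma$ – Sp.: Stinkbug) | 0.05 | 0.03 | 0.02 | 0.07 | 0.03 | 0.02 |
| | $c_2$ ($\sigma$ – Cons.: Yes) | 0.06 | 0.03 | 0.02 | 0.08 | 0.03 | 0.02 |
| | $\sigma_d^2$ | 0.23 | 0.19 | 0.14 | 0.12 | 0.06 | 0.05 |
| 30% | $\log(a)$ | 7.35 | 0.63 | 0.20 | 10.66 | 0.59 | 0.20 |
| | $\beta_0$ ($\mu$ – Intercept) | 53.48 | 2.52 | 0.74 | 72.75 | 2.58 | 0.57 |
| | $\beta_0$ ($\sigma$ – Intercept) | 0.25 | 0.05 | 0.02 | 0.36 | 0.05 | 0.02 |
| | $s_2$ ($\sigma$ – Sp.: Stinkbug) | 0.07 | 0.03 | 0.02 | 0.09 | 0.03 | 0.02 |
| | $c_2$ ($\sigma$ – Cons.: Yes) | 0.07 | 0.03 | 0.02 | 0.10 | 0.04 | 0.02 |
| | $\sigma_d^2$ | 0.26 | 0.19 | 0.15 | 0.17 | 0.07 | 0.06 |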

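Table A3. Mean squared errors (MSE) for each parameter of the exponentiated-Weibull mixed model fitted to 1000 simulated data sets, with $\sigma_d^2 = 3$. Columns are grouped by the number of random effects (4 or 12) and the sample size $n$.

| Censoring | Parameter | 4 random effects, n = 36 | 4 random effects, n = 132 | 4 random effects, n = 480 | 12 random effects, n = 36 | 12 random effects, n = 132 | 12 random effects, n = 480 |
|---|---|---|---|---|---|---|---|
| 0% | $\log(a)$ | 6.88 | 0.66 | 0.06 | 8.40 | 0.50 | 0.06 |
| | $\beta_0$ ($\mu$ – Intercept) | 37.04 | 2.38 | 0.23 | 43.63 | 1.94 | 0.27 |
| | $\beta_0$ ($\sigma$ – Intercept) | 0.20 | 0.05 | 0.02 | 0.24 | 0.03 | 0.02 |
| | $s_2$ ($\sigma$ – Sp.: Stinkbug) | 0.06 | 0.02 | 0.01 | 0.08 | 0.03 | 0.02 |
| | $c_2$ ($\sigma$ – Cons.: Yes) | 0.06 | 0.03 | 0.02 | 0.09 | 0.03 | 0.02 |
| | $\sigma_d^2$ | 1.76 | 1.23 | 1.24 | 0.76 | 0.70 | 0.56 |
| 15% | $\log(a)$ | 8.53 | 0.84 | 0.09 | 13.26 | 0.59 | 0.08 |
| | $\beta_0$ ($\mu$ – Intercept) | 53.41 | 3.96 | 0.34 | 80.24 | 2.85 | 0.28 |
| | $\beta_0$ ($\sigma$ – Intercept) | 0.23 | 0.05 | 0.02 | 0.36 | 0.04 | 0.01 |
| | $s_2$ ($\sigma$ – Sp.: Stinkbug) | 0.07 | 0.03 | 0.02 | 0.11 | 0.03 | 0.02 |
| | $c_2$ ($\sigma$ – Cons.: Yes) | 0.07 | 0.03 | 0.02 | 0.11 | 0.04 | 0.02 |
| | $\sigma_d^2$ | 1.78 | 1.23 | 1.12 | 0.84 | 0.69 | 0.56 |
| 30% | $\log(a)$ | 10.99 | 1.49 | 0.26 | 13.17 | 0.76 | 0.12 |
| | $\beta_0$ ($\mu$ – Intercept) | 83.99 | 8.21 | 1.21 | 106.37 | 5.00 | 0.52 |
| | $\beta_0$ ($\sigma$ – Intercept) | 0.31 | 0.06 | 0.03 | 0.49 | 0.05 | 0.01 |
| | $s_2$ ($\sigma$ – Sp.: Stinkbug) | 0.08 | 0.03 | 0.03 | 0.13 | 0.04 | 0.02 |
| | $c_2$ ($\sigma$ – Cons.: Yes) | 0.09 | 0.04 | 0.02 | 0.15 | 0.04 | 0.02 |
| | $\sigma_d^2$ | 1.75 | 1.23 | 1.08 | 0.89 | 0.71 | 0.52 |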

Funding Statement

RAM was funded by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) (Grant No. 2014/12903-8). Exchange visits between CGBD and JH were partially supported by CNPq, FAPESP and Science Foundation Ireland (SFI). WACG was funded by CNPq.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

  1. Aitkin M., Francis B., Hinde J. and Darnell R., Statistical Modelling in R, Oxford University Press, Oxford, 2009.
  2. Atkinson A.C., Plots, Transformations, and Regression: An Introduction to Graphical Methods of Diagnostic Regression Analysis, Clarendon, Oxford, 1985.
  3. Broyden C.G., Dennis J.E. and Moré J.J., On the local and superlinear convergence of quasi-Newton methods, J. Appl. Math. 12 (1973), pp. 223–245.
  4. Burke K. and MacKenzie G., Multi-parameter regression survival modelling: An alternative to proportional hazards, Biometrics 73 (2017), pp. 678–686. doi: 10.1111/biom.12625
  5. Cardinale B.J., Harvey C.T., Gross K. and Ives A.R., Biodiversity and biocontrol: Emergent impacts of a multi-enemy assemblage on pest suppression and crop yield in an agroecosystem, Ecol. Lett. 6 (2003), pp. 857–865. doi: 10.1046/j.1461-0248.2003.00508.x
  6. Demétrio C.G.B., Hinde J. and Moral R.A., Models for overdispersed data in entomology, in Ecological Modelling Applied to Entomology, C.P. Ferreira and W.A.C. Godoy, eds., Springer, Berlin, 2014, pp. 219–259.
  7. Kogan M., Integrated pest management historical perspectives and contemporary developments, Annu. Rev. Entomol. 43 (1998), pp. 243–270. doi: 10.1146/annurev.ento.43.1.243
  8. McNeish D. and Stapleton L.M., Modeling clustered data with very few clusters, Multivar. Behav. Res. 51 (2016), pp. 495–518. doi: 10.1080/00273171.2016.1167008
  9. Moral R.A., Demétrio C.G.B., Hinde J., Godoy W.A.C. and Fernandes F.S., Parasitism-mediated prey selectivity in laboratory conditions and implications for biological control, Basic Appl. Ecol. 19 (2017), pp. 67–75. doi: 10.1016/j.baae.2016.11.002
  10. Moral R.A., Hinde J. and Demétrio C.G.B., Half-normal plots and overdispersed models in R: The hnp package, J. Stat. Softw. 81 (2017), pp. 1–23. doi: 10.18637/jss.v081.i10
  11. Nadarajah S., Cordeiro G.M. and Ortega E.M.M., The exponentiated Weibull distribution: A survey, Statist. Papers 54 (2013), pp. 839–877. doi: 10.1007/s00362-012-0466-x
  12. R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria; software available at: http://www.R-project.org.
  13. Stroud A.H. and Secrest D., Gaussian Quadrature Formulas, Prentice-Hall Series in Automatic Computation, Prentice-Hall, 1966.
  14. Therneau T.M., Grambsch P.M. and Fleming T.R., Martingale-based residuals for survival models, Biometrika 77 (1990), pp. 147–160. doi: 10.1093/biomet/77.1.147


Articles from Journal of Applied Statistics are provided here courtesy of Taylor & Francis
