2022 Aug 29;24(11):3395–3421. doi: 10.1007/s10530-022-02858-8

Box 1.

Inferring hierarchical parameters

As an example, consider a project in which volunteer recorders report detections and non-detections of an alien species at a large number of locations. Rather than inferring whether a location is occupied (i.e., the alien species is present) for each location individually, such data may be modelled using hierarchical parameters that govern the distribution of occupied locations. For instance, one might introduce the hierarchical parameter ψ that reflects the fraction of locations that are occupied.

To illustrate this, consider a project in which volunteer recorders visit $L$ locations $m$ times each. Let $d_l$ denote the number of visits at location $l = 1, \ldots, L$ that resulted in a detection, so that the remaining $m - d_l$ visits resulted in a non-detection. Let us further denote by $z_l$ whether location $l$ is occupied ($z_l = 1$) or not ($z_l = 0$), and by $\epsilon_{10}$ and $\epsilon_{01}$ the false negative and false positive detection rates, respectively. Under this model,

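$$
P(d_l \mid z_l, \epsilon_{01}, \epsilon_{10}) =
\begin{cases}
\binom{m}{d_l}\, \epsilon_{01}^{d_l} \, (1-\epsilon_{01})^{m-d_l} & \text{if } z_l = 0, \\
\binom{m}{d_l}\, (1-\epsilon_{10})^{d_l} \, \epsilon_{10}^{m-d_l} & \text{if } z_l = 1.
\end{cases}
$$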

As an example, consider the case with $m = 5$ visits per location, $\epsilon_{01} = 0.1$ and $\epsilon_{10} = 0.7$. As shown in Fig. 4A, accurately identifying occupied locations is difficult under these parameters: the most likely outcome at an occupied location is $d_l = 1$, which is almost as likely to be observed at a non-occupied location.
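The overlap between the two cases can be checked numerically. The following is a minimal sketch, assuming (as in the definitions above) that $\epsilon_{10}$ is the false negative rate and $\epsilon_{01}$ the false positive rate; the function name `detection_pmf` and the variable names `eps01` and `eps10` are illustrative and not taken from the paper.

```python
from math import comb


def detection_pmf(d, m, z, eps01, eps10):
    """P(d_l = d | z_l = z) for m visits: a binomial in the number of detections.

    eps01: false positive rate (detection probability at a non-occupied location)
    eps10: false negative rate (non-detection probability at an occupied location)
    """
    p_detect = 1.0 - eps10 if z == 1 else eps01
    return comb(m, d) * p_detect**d * (1.0 - p_detect) ** (m - d)


# Parameter values from the example in the text.
m, eps01, eps10 = 5, 0.1, 0.7
occupied = [detection_pmf(d, m, 1, eps01, eps10) for d in range(m + 1)]
unoccupied = [detection_pmf(d, m, 0, eps01, eps10) for d in range(m + 1)]

for d in range(m + 1):
    print(f"d={d}: occupied {occupied[d]:.3f}, non-occupied {unoccupied[d]:.3f}")
```

Under these assumptions, a single detection ($d_l = 1$) has probability of about 0.36 at an occupied location and about 0.33 at a non-occupied one, which is why individual locations are hard to classify.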

To infer the hierarchical parameters $\psi$, $\epsilon_{01}$ and $\epsilon_{10}$, we integrate out $z_l$, using $P(z_l = 1 \mid \psi) = \psi$, to obtain the relevant likelihood

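$$
P(\boldsymbol{d} \mid \psi, \epsilon_{01}, \epsilon_{10}) = \prod_{l=1}^{L} \left[ P(d_l \mid z_l = 0, \epsilon_{01}, \epsilon_{10})\,(1-\psi) + P(d_l \mid z_l = 1, \epsilon_{01}, \epsilon_{10})\,\psi \right].
$$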
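This marginal likelihood is a product of two-component binomial mixtures, one per location, and is usually evaluated on the log scale. The following is a minimal sketch, again assuming $\epsilon_{10}$ is the false negative rate and $\epsilon_{01}$ the false positive rate; `binom_pmf`, `log_likelihood` and the example counts are hypothetical and not part of the paper.

```python
from math import comb, exp, log


def binom_pmf(d, m, p):
    """Binomial probability of d detections in m visits with detection probability p."""
    return comb(m, d) * p**d * (1.0 - p) ** (m - d)


def log_likelihood(detections, m, psi, eps01, eps10):
    """Log of the marginal likelihood with occupancy z_l summed out.

    Assumes 0 < psi < 1 and error rates strictly between 0 and 1,
    so that every mixture term is positive.
    """
    total = 0.0
    for d in detections:
        p_unoccupied = binom_pmf(d, m, eps01)       # z_l = 0: detections are false positives
        p_occupied = binom_pmf(d, m, 1.0 - eps10)   # z_l = 1: detections are true positives
        total += log((1.0 - psi) * p_unoccupied + psi * p_occupied)
    return total


# Hypothetical detection counts for eight locations with m = 5 visits each.
detections = [0, 1, 0, 2, 1, 0, 0, 3]
print(log_likelihood(detections, m=5, psi=0.5, eps01=0.1, eps10=0.7))
```

Because the occupancy states never appear in this function, the same code works for any number of locations, and its cost grows only linearly in $L$.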

In Fig. 4B,C, we show Bayesian estimates of the parameters $\psi$, $\epsilon_{01}$ and $\epsilon_{10}$ from data simulated at $L = 100$, $L = 1{,}000$ or $L = 10{,}000$ locations, confirming that these hierarchical parameters can be inferred rather accurately if sufficiently many locations are surveyed. Importantly, however, error rates can only be learned accurately if there are enough sites with multiple detections, and hence sufficiently many visits per location. For a fixed total number of visits, estimation errors are therefore minimized at an intermediate number of visits per location, which for the error rates chosen here is about $m = 20$ (Fig. 4D).
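The effect of the number of locations can be illustrated with a simplified simulation. The sketch below is not the analysis behind Fig. 4: it assumes the error rates are known and held fixed, places a uniform prior on $\psi$, and approximates the posterior on a grid. The true value `psi=0.5`, the random seed, and all function names are illustrative assumptions.

```python
import random
from collections import Counter
from math import comb, exp, log, sqrt


def binom_pmf(d, m, p):
    return comb(m, d) * p**d * (1.0 - p) ** (m - d)


def simulate(L, m, psi, eps01, eps10, rng):
    """Draw z_l ~ Bernoulli(psi), then the number of detections in m visits."""
    data = []
    for _ in range(L):
        p_detect = 1.0 - eps10 if rng.random() < psi else eps01
        data.append(sum(rng.random() < p_detect for _ in range(m)))
    return data


def psi_posterior(data, m, eps01, eps10, n_grid=199):
    """Posterior mean and sd of psi on a grid (uniform prior, fixed error rates)."""
    counts = Counter(data)  # only m + 1 distinct outcomes, so group identical ones
    grid = [(i + 1) / (n_grid + 1) for i in range(n_grid)]
    log_post = []
    for psi in grid:
        lp = 0.0
        for d, n in counts.items():
            mix = (1.0 - psi) * binom_pmf(d, m, eps01) + psi * binom_pmf(d, m, 1.0 - eps10)
            lp += n * log(mix)
        log_post.append(lp)
    top = max(log_post)  # subtract the maximum before exponentiating for stability
    weights = [exp(lp - top) for lp in log_post]
    norm = sum(weights)
    probs = [w / norm for w in weights]
    mean = sum(g * p for g, p in zip(grid, probs))
    sd = sqrt(sum((g - mean) ** 2 * p for g, p in zip(grid, probs)))
    return mean, sd


rng = random.Random(1)  # arbitrary seed, for reproducibility only
for L in (100, 1000, 10000):
    data = simulate(L, m=5, psi=0.5, eps01=0.1, eps10=0.7, rng=rng)
    mean, sd = psi_posterior(data, m=5, eps01=0.1, eps10=0.7)
    print(f"L={L}: posterior mean {mean:.3f}, posterior sd {sd:.3f}")
```

In this simplified setting the posterior standard deviation of $\psi$ should shrink as $L$ grows. Estimating the error rates jointly with $\psi$, as done in the text, is a harder problem and would require a sampler or a multidimensional grid.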