2022 Aug 29;24(11):3395–3421. doi: 10.1007/s10530-022-02858-8

Box 2.

Estimating relative abundances

Consider a survey designed to quantify the abundances $N_{l}$ at locations $l = 1, \dots, L$ from counts reported by observers $j = 1, \dots, J$ over a total of $V$ visits. Let $d_{v}$ denote the abundance reported during visit $v = 1, \dots, V$, conducted at location $l_{v}$ by observer $o_{v}$. The reported abundance $d_{v}$ depends on both the true abundance $N_{l_{v}}$ at location $l_{v}$ and the detection probability $p_{o_{v}}$ of observer $o_{v}$, such that the probability

$P(d_{v} \mid N_{l_{v}}, p_{o_{v}}) = \binom{N_{l_{v}}}{d_{v}} \, p_{o_{v}}^{d_{v}} \, (1 - p_{o_{v}})^{N_{l_{v}} - d_{v}}$

is given by binomial sampling. Because the expected count $N_{l_{v}} p_{o_{v}}$ depends only on the product of abundance and detection probability, $N_{l_{v}}$ and $p_{o_{v}}$ are confounded, and estimating them individually is difficult (DasGupta & Rubin 2005). To illustrate this, consider two locations with $N_{1} = 100$ and $N_{2} = 200$, each surveyed $m = 5$ times by a single observer with detection probability $p = 0.2$. As shown in Fig. 5A, the uncertainty of the abundances estimated from these data under vague priors $N_{1}, N_{2} \sim \mathrm{Exp}(0.001)$ spans about two orders of magnitude. This is because almost any abundance explains the data well when paired with a correspondingly adjusted detection probability, so more informative priors would be required to constrain the range of plausible values. Nevertheless, the data provide strong evidence that $N_{2}$ is about twice $N_{1}$ (Fig. 5B), illustrating that relative abundances can be learned accurately from such surveys.
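The confounding can be made concrete with a minimal Python sketch, assuming hypothetical counts rather than the data behind Fig. 5. It evaluates the binomial log-likelihood above at several $(N, p)$ pairs sharing the product $Np = 20$, which give very similar values, and then shows that the ratio of mean counts at two locations estimates $N_{2}/N_{1}$ because a shared $p$ cancels from $E[d] = Np$. The helper names `binom_loglik`, `total_loglik` and `simulate_counts` are illustrative, and the ratio of means is a simple moment estimator, not the Bayesian inference used for Fig. 5.

```python
import math
import random


def binom_loglik(d, N, p):
    """Log of the binomial probability P(d | N, p) for one reported count."""
    if d < 0 or d > N:
        return float("-inf")  # a count cannot exceed the true abundance
    log_choose = math.lgamma(N + 1) - math.lgamma(d + 1) - math.lgamma(N - d + 1)
    return log_choose + d * math.log(p) + (N - d) * math.log1p(-p)


def total_loglik(counts, N, p):
    """Sum of log-likelihoods of repeated counts at a single location."""
    return sum(binom_loglik(d, N, p) for d in counts)


def simulate_counts(N, p, m, rng):
    """Draw m binomial counts, each as the number of successes in N Bernoulli(p) trials."""
    return [sum(rng.random() < p for _ in range(N)) for _ in range(m)]


# Hypothetical data: five visits with 20 detections each, so N * p = 20 fits on average.
counts = [20, 20, 20, 20, 20]
for N, p in [(100, 0.2), (200, 0.1), (400, 0.05)]:
    # Pairs with the same product N * p yield nearly identical log-likelihoods.
    print(N, p, round(total_loglik(counts, N, p), 2))

# Two locations counted by the same observer: E[d] = N * p, so p cancels in the ratio.
rng = random.Random(1)
d1 = simulate_counts(100, 0.2, 5, rng)
d2 = simulate_counts(200, 0.2, 5, rng)
ratio = (sum(d2) / len(d2)) / (sum(d1) / len(d1))
print(round(ratio, 2))
```

The likelihood alone barely separates $N = 100$ from $N = 400$, whereas the ratio of mean counts lands near two without any knowledge of $p$.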

To exploit this in a realistic setting, we generalize the inference of relative abundances to many locations. We assume that the abundances $N_{l} = N_{0} e^{\rho_{l}}$ are scaled by location-specific factors $e^{\rho_{l}}$, where the log-scale effects $\rho_{l} \sim \mathcal{N}(0, \sigma_{\rho}^{2})$ are normally distributed with mean zero and variance $\sigma_{\rho}^{2}$. Similarly, we assume that the detection probabilities $p_{j} = \operatorname{logistic}(\pi_{0} + \pi_{j})$ are shifted on the logit scale by observer-specific effects $\pi_{j} \sim \mathcal{N}(0, \sigma_{\pi}^{2})$ that are also normally distributed with mean zero and variance $\sigma_{\pi}^{2}$. The logistic transformation ensures $0 < p_{j} < 1$. We further enforce the conditions $\frac{1}{L} \sum_{l} \rho_{l} = 0$ and $\frac{1}{J} \sum_{j} \pi_{j} = 0$ by rescaling $N_{0}$ and shifting $\pi_{0}$ (and hence $p_{0}$) accordingly. If observers do not visit multiple locations, observer and location effects cannot be separated from the counts alone, and the $\pi_{j}$ must instead be modelled using informative covariates.
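The sum-to-zero constraints can be imposed after sampling without changing any abundance or detection probability: a common shift of the $\rho_{l}$ is absorbed into $N_{0}$, and a common shift of the $\pi_{j}$ into $\pi_{0}$. The Python sketch below illustrates this under stated assumptions; the helper `center_effects` and the random seed are hypothetical, and the parameter values are those reported for the simulations in this box.

```python
import math
import random


def logistic(x):
    """Map a real number on the logit scale to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def center_effects(N0, rho, pi0, pi):
    """Impose mean-zero rho and pi while leaving every N_l and p_j unchanged.

    N0 * exp(rho_l) = (N0 * exp(mean(rho))) * exp(rho_l - mean(rho))
    pi0 + pi_j      = (pi0 + mean(pi)) + (pi_j - mean(pi))
    """
    rho_bar = sum(rho) / len(rho)
    pi_bar = sum(pi) / len(pi)
    N0_new = N0 * math.exp(rho_bar)  # rescale the baseline abundance
    pi0_new = pi0 + pi_bar           # shift the baseline logit detection
    rho_new = [r - rho_bar for r in rho]
    pi_new = [q - pi_bar for q in pi]
    return N0_new, rho_new, pi0_new, pi_new


rng = random.Random(0)
L, J = 20, 20
N0, pi0 = 100.0, -1.0
# random.gauss takes a standard deviation, so take square roots of the variances.
rho = [rng.gauss(0.0, math.sqrt(0.2)) for _ in range(L)]
pi = [rng.gauss(0.0, math.sqrt(0.5)) for _ in range(J)]

N0_c, rho_c, pi0_c, pi_c = center_effects(N0, rho, pi0, pi)
N = [N0 * math.exp(r) for r in rho]
N_c = [N0_c * math.exp(r) for r in rho_c]
p = [logistic(pi0 + q) for q in pi]
p_c = [logistic(pi0_c + q) for q in pi_c]
# Relative abundances N_l / N0 = exp(rho_l) under the centred parameterization.
rel = [math.exp(r) for r in rho_c]
```

After centring, $N_{0}$ equals the geometric mean of the $N_{l}$, so the relative abundances $e^{\rho_{l}}$ have a geometric mean of one.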

We conducted simulations with $N_{0} = 100$, $\sigma_{\rho}^{2} = 0.2$, $\pi_{0} = -1$ and $\sigma_{\pi}^{2} = 0.5$, corresponding to a baseline detection probability $p_{0} = \operatorname{logistic}(\pi_{0}) \approx 0.27$. As shown in Figs. 5C and 5D, neither $N_{0}$ nor $p_{0}$ can be inferred accurately, regardless of whether $L = 20$ or $L = 100$ locations were surveyed by $J = 20$ or $J = 100$ observers, each visiting $m = 5$ different locations, corresponding to $V = 100$ and $V = 500$ visits, respectively. In contrast, the relative abundances are estimated accurately and clearly distinguish locations with high abundances from those with low abundances (Figs. 5E and 5F).
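The simulation design can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the function `simulate_survey` is hypothetical, abundances are rounded to integers before binomial sampling, and each observer is assumed to choose $m$ distinct locations uniformly at random, details the text does not specify. Binomial draws are built from Bernoulli trials so that only the standard library is needed.

```python
import math
import random


def logistic(x):
    """Map a real number on the logit scale to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_survey(L, J, m, N0=100.0, var_rho=0.2, pi0=-1.0, var_pi=0.5, seed=0):
    """Simulate visits (l_v, o_v, d_v) for J observers each visiting m distinct locations."""
    rng = random.Random(seed)
    rho = [rng.gauss(0.0, math.sqrt(var_rho)) for _ in range(L)]
    pi = [rng.gauss(0.0, math.sqrt(var_pi)) for _ in range(J)]
    # Assumption: abundances N_l = N0 * exp(rho_l) are rounded to integers.
    N = [round(N0 * math.exp(r)) for r in rho]
    p = [logistic(pi0 + q) for q in pi]
    visits = []
    for j in range(J):
        # Assumption: observer j picks m distinct locations uniformly at random.
        for l in rng.sample(range(L), m):
            # Binomial(N_l, p_j) count as the number of successful Bernoulli trials.
            d = sum(rng.random() < p[j] for _ in range(N[l]))
            visits.append((l, j, d))
    return N, p, visits


p0 = logistic(-1.0)  # baseline detection probability, about 0.269
for L, J in [(20, 20), (100, 100)]:
    N, p, visits = simulate_survey(L, J, m=5)
    print(L, J, len(visits))  # V = J * m visits
```

With $L = J = 20$ this yields $V = 100$ visits and with $L = J = 100$ it yields $V = 500$; random assignment may occasionally leave a location unvisited.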