Skip to main content
PLOS Computational Biology logoLink to PLOS Computational Biology
. 2020 May 18;16(5):e1007878. doi: 10.1371/journal.pcbi.1007878

Estimating a novel stochastic model for within-field disease dynamics of banana bunchy top virus via approximate Bayesian computation

Abhishek Varghese 1,2,*, Christopher Drovandi 1,2, Antonietta Mira 3,4, Kerrie Mengersen 1,2
Editor: Konstantin B Blyuss5
PMCID: PMC7259802  PMID: 32421712

Abstract

The Banana Bunchy Top Virus (BBTV) is one of the most economically important vector-borne banana diseases throughout the Asia-Pacific Basin and presents a significant challenge to the agricultural sector. Current models of BBTV are largely deterministic, limited by an incomplete understanding of interactions in complex natural systems, and the appropriate identification of parameters. A stochastic network-based Susceptible-Infected-Susceptible model has been created which simulates the spread of BBTV across the subsections of a banana plantation, parameterising nodal recovery, neighbouring and distant infectivity across summer and winter. Findings from posterior results achieved through Markov Chain Monte Carlo approach to approximate Bayesian computation suggest seasonality in all parameters, which are influenced by correlated changes in inspection accuracy, temperatures and aphid activity. This paper demonstrates how the model may be used for monitoring and forecasting of various disease management strategies to support policy-level decision making.

Author summary

The Banana Bunchy Top Virus (BBTV) poses one of the greatest threats to the food security of developing nations and the banana industry throughout the Asia-Pacific Basin. Decision-makers face significant challenges in mitigating BBTV spread in banana plantations due to the vector-borne spread of this disease, which is significantly influenced by a vast array of external environmental factors that are unique to each plantation. We propose a flexible network-based model that describes the spread of BBTV in a real banana plantation through a random process while accounting for individual plantation characteristics and utilise a principled methodology for estimating model parameters. Our models can be used to quantify the effects of seasonal changes and plantation configuration on BBTV spread and can be used to predict high-risk areas in this plantation. We believe that our model might be used by decision-makers to evaluate the effectiveness of current disease management strategies and explore opportunities for improvements.

Introduction

The Banana Bunchy Top Virus (BBTV) is one of the most economically important vector-borne banana diseases throughout the Asia-Pacific Basin. The disease was first introduced to Australia in 1913 via infected suckers from Fiji, and spread locally through the banana aphid, Pentalonia nigronervosa [1]. With limited knowledge on epidemiological characteristics of the disease or disease management approaches, incidence rates across Australian banana plantations rose rapidly, eradicating over 90% of national crop production in the 1930’s [2]. Cook et al. [3] estimate that the economic benefits of BBTV exclusion from commercial plantations range from $15.9 to $20.7 million each year, approximately 5% of annual crop production value. Aggressive disease management strategies implemented by the Australian Government from the 1930s-90s have largely restricted the disease to the South-East Queensland and Northern New South Wales regions of Australia [4]. Eradication, however, has not been achieved, requiring continuous monitoring by the National BBTV Program.

While monitoring the infection counts across the region does provide an indication of disease management success, a vast array of external environmental factors influence BBTV growth, making this an unreliable metric. Furthermore, there are currently limited opportunities to explore various management strategies for BBTV within banana plantations, which could reduce infection rates and identify cost-saving measures for the monitoring program. In such scenarios, mathematical models offer the opportunity to simulate various disease management strategies with a low-cost and quick turnaround [5].

Unfortunately, there have been few contributions to modelling the disease dynamics of BBTV in plantations in the last few decades–despite the significant advancements in computational resources and our understanding of vector-borne diseases. Allen [6] generated a stochastic spatiotemporal polycyclic model for BBTV to describe disease progress within a banana plantation, specifically focusing on identifying the mean inoculation distance of BBTV. However, the model was designed for a hypothetical homogenous circular plantation, which does not account for the various plantation configurations and unique plantation characteristics present around the world; a key factor which greatly affects the effectiveness of disease management strategies. Another model developed by Smith et al. [1] aimed to describe the influence of external inoculum on BBTV spread within a banana plantation in the Philippines. However, the model developed by Smith et al. [1] is deterministic, which results in poor representations of complex natural processes that are inherently probabilistic. Furthermore, these articles do not provide a principled parameter estimation method with appropriate uncertainty quantification based on available field data.

In our paper, we propose a new stochastic model that describes BBTV spread across a banana plantation, parameterising for neighbouring and long-distance infectivity rates, and recovery rates. Further, we develop a principled Bayesian parameter estimation method for calibrating this model to real field data. Given the intractability of the likelihood function for this model, we employ approximate Bayesian computation (ABC) [7] for estimating model parameters and their uncertainty. Our methodology is inspired by Dutta et al. [8], who demonstrate that ABC may be effectively used to estimate the spreading parameters of a disease by applying a simple Susceptible-Infected-Susceptible (SIS) model over a known network structure. This paper adapts and extends this approach to further understand the spreading characteristics of BBTV, evaluates various disease management strategies at the plantation level, and predicts the spread of future outbreaks. Even though our work is motivated by the vector-borne transmission of BBTV, we believe that our modelling framework is easily adaptable to describe the within-field disease dynamics of other vector-borne diseases.

Methods

This study focuses on a banana plantation in Newrybar, near the North-Eastern border of New South Wales in Australia (28°42'14.8"S, 153°32'20.4"E), with an area of approximately 12 hectares. A routine site inspection in 2013 identified a banana plant infected with BBTV in the North-Western region of the farm. A following inspection in 2014 identified 26 infections clustered across the South-Eastern region of the farm. Since 2014, the site has undergone monthly inspections, collecting location data and plant characteristics, while implementing a rogue-and-remove disease management strategy. The location of every infected plant has been recorded using the Global Positioning System (GPS) functionality on a smartphone. The dataset consists of 38 snapshots of infection data at monthly intervals with the coordinates of each infected plant identified and rogued during a site visit to the farm.

Fig 1 provides a birds-eye view of the plantation. The plantation is separated by dirt paths approximately 3–5 m wide, creating smaller subsections of banana plants across the plantation.

Fig 1. Satellite image of Newrybar banana plantation.

Fig 1

Solid white lines indicate the approximate border of the property.

Fig 2(A) provides the BBTV infection counts of the Newrybar site at monthly intervals from December 2014 to January 2018. Since December 2014, over 3000 banana plants have been removed from the Newrybar plantation.

Fig 2. BBTV infection counts since surveying began in Dec-2014.

Fig 2

(a) BBTV Infection counts over the survey time period (38 months). Surveys occurred at approximately monthly intervals (between 25–30 days), according to the National Banana Bunchy Top Virus Project disease management practices. Therefore, the exact date of plantation survey has been disregarded. (b) Logarithmic transformation of BBTV infection counts, to highlight seasonal variance in infectivity.

Fig 2(B) highlights the seasonality in BBTV infectivity. Over the observation period of approximately 38 months, BBTV infection counts tend to peak during warmer periods of the year (November to February), while receding during traditionally colder months (May to August).

BBTV forward-simulation model

We propose to model the spread of BBTV in a banana plantation by modifying the ‘simple contagion’ model developed by Dutta et al. [8]. The ‘simple contagion’ model simulates a standard SIS process on a fixed network structure. At every time step, each infected node chooses one of its neighbours with equal probability regardless of their status (susceptible or infected), and if the chosen node is susceptible, it is infected with probability θ. Our network-based forward-simulating modelling approach enables easy adaptation to describe the disease dynamics over a range of plantations, and various vector-borne diseases. A network representation of the banana plantation and its degree distribution are provided in Fig 3(A) and 3(B), respectively.

Fig 3. Representative network structure of banana plantation.

Fig 3

(a) Labels indicate node index. Each plantation subsection is regarded as a node. Subsection coordinates are mapped using open source GRASS GIS software. (b) Degree-distribution of network. Highest-degree node: node 43 with 10 edges; Lowest-degree node: node 54 with 1 edge.

Dutta et al. [8] denote the ‘simple contagion’ model by Ms and parameterise it in terms of the spreading rate θ and the seed node nsn. For given values of these two parameters, they forward simulate the evolving epidemic over time using model Ms.

We adapt and extend model Ms in several ways for our application. Dutta et al. [8] determine a node to be an individual person, with edges representing each person’s contacts. The Newrybar banana plantation has a plant density of approximately 4000 banana plants per hectare, with infections distributed across the farm area (Fig 4(A)). Therefore, modelling the individual status of every plant as a node would be impractical and computationally expensive, due to the varying contact points between plants, and the complex plant-pathogen-vector relationships as described in the previous section. Instead, the data may be aggregated to monitor the infection likelihood over larger areas of the farm (as depicted in Fig 4(B)). The wide dirt paths throughout a farm act as a soft barrier to BBTV spread, since most aphids are apterous (wingless) and are likely to move from leaf-to-leaf. Therefore, the subsections in a plantation may be described as nodes. Nodes that correspond to neighbouring subsections are linked, giving rise to an undirected network, as seen in Fig 3(A). Any subsection containing at least one infected plant may be considered ‘infected’ (Fig 4(C)).

Fig 4. Distribution of BBTV Infections.

Fig 4

(a) Distribution of BBTV infections over total observation period. Black points represent observed infections. (b) Discrete-space distribution of BBTV infections over total observation period. Infection coordinates are binned to plantation subsections. Subsections are coloured according to infection count. (c) Discrete spatio-temporal distribution of infection presence in plantation subsections across observation period. This is the final data used for parameter estimation. The first column of the data is used as the initial configuration for model simulation. The node # of the corresponding subsection detailed in Fig 3(A).

Furthermore, Dutta et al. [8] describe Ms to propagate infection spread in the network through a single seed node. This is an unrealistic assumption for modelling BBTV spread in plantations, since it is possible for a plantation to have multiple latent infections upon first exposure to the virus. Therefore, Ms must be adapted to accept multiple seed nodes. Additionally, since the scope of this paper is limited to the analysis of current trends and predictions to evaluate disease management strategies, we are not interested in inferring the initial seed node(s). Rather, the BBTV model considers the infected nodes observed in the first month of field data surveying to be the initial configuration of seed nodes at t = 0 (see Fig 4(C)).

Finally, the SI model of Dutta et al. [8] is extended by explicitly considering the recovery of an infected node. A node is considered recovered if, upon inspection at time t, it contained at least one infected plant, while at time (t+1) no infected plants were found. Recall that if an infected plant is found within a certain subsection of a plantation, it is immediately rogued and removed. This may be considered as a recovery. Removing the infected plant(s) from its subsection reverts the corresponding node to a susceptible state, as the remaining plants within the subsection/node remain vulnerable to being infected, giving rise to a Susceptible-Infected-Susceptible (SIS) model.

Parameters

The unique epidemiological characteristics of BBTV must be parameterised to fully capture the complex dynamics of BBTV transmission in plantations, and to extend on the current literature on BBTV.

There are three parameters that are estimated in this model:

  1. Probability of recovery, θ0: As infected plants are rogued and removed at monthly inspections of infected plantation visits, nodes may subsequently recover from an infected state to a susceptible state. Currently, there is no existing literature on BBTV recovery rates in field scenarios using current disease management strategies. Accurate estimates of the probability of node recovery in plantations could inform decision makers on the effectiveness of current roguing methods, inspection frequency and inspection accuracy.

  2. Neighbouring probability of infection, θ1: The model Ms created by Dutta et. al [8] proposes an infection probability for each of the neighbours of an infected node. This parameter is also relevant for our case study, as the aphid vector is likely to travel between neighbouring nodes. Allen [6] identifies that the probability of a BBTV infection is inversely proportional to the distance from a previously infected plant, as most aphid flights cover small distances.

  3. Non-neighbouring probability of infection, θ2: While short distance flights are more likely to occur in banana plantations, long distance aphid vector transmission remains a possibility. This parameter operates on all infected nodes, whereby each node has a probability of infecting every non-neighbour. Aphids are also known to be restless and sensitive to small changes in the environment and have been shown to relocate to other plants due to overpopulation, harvesting activities and sudden changes in atmospheric weather conditions [9].

The operations of these parameters are summarised in Fig 5 below:

Fig 5. State-flow diagram describing each parameter’s influence on the state of a node.

Fig 5

A single Banana plant has been chosen to represent an entire subsection of bananas.

Seasonality

As highlighted by the infection counts presented in Fig 2(B), BBTV infectivity is influenced by seasonal changes in temperature, therefore it may be useful to identify changes in the posterior distribution of the parameters for different seasons. Allen [6] identifies that detection efficiency, eradication efficiency and aphid activity are seasonally varying factors which greatly affect the spread of BBTV. Furthermore, Allen [10] confirms a seasonally varying leaf emergence rate in bananas, which informs detection and eradication efficiency. Anhalt and Almeida [11] observe temperature to be highly correlated with acquisition and inoculation efficiency, with peak transmission efficiency occurring at 25–30 degrees.

To accommodate for seasonal variation, we allow each parameter to be month dependent. While it may be theoretically possible to generate unique posterior estimates for each month, their accuracy may be greatly diminished by the lower amount of field data available for each monthly parameter. To maintain an effective sample size of observed data to inform each parameter while ensuring a clear differentiation between parameter counterparts, each parameter has been replicated by dividing and grouping months traditionally above or below the long-term average temperature. Months with an average temperature traditionally higher than the long-term annual average temperature, may be referred to as ‘summer’ months, and vice versa for ‘winter’ months. Summer months are given by September, October, November, December, January and February. Likewise, winter months are given by March, April, May, June, July and August.

Therefore, the final set of parameters are described by θij, with i ϵ [0, 1, 2] indicating the parameter type (recovery, near, and long distance infectivity respectively) and j ϵ [0, 1] indicating the season (summer and winter respectively).

The mechanisms of the forward-simulation model are summarised in Fig 6.

Fig 6. Flow-chart describing BBTV model behaviour for each time step t.

Fig 6

Approximate Bayesian Computation (ABC)

The parameter ϕ = {θ00, θ10, θ20, θ01, θ11, θ21} may be inferred by its posterior density p(ϕ | y) given the observed dataset y. The posterior density can be written by Bayes’ theorem as,

p(ϕ|y)=π(ϕ)p(y|ϕ)m(y) (1)

where π(ϕ), p(ϕ | y) and m(x) = π(ϕ)p(y|ϕ)dϕ are, correspondingly, the prior density on the parameter ϕ, the likelihood function, and the marginal likelihood. The prior density π(ϕ) enables a way to leverage the learning of parameters from prior knowledge, such as the epidemiological characteristics and expert knowledge on BBTV [8].

Alternative modelling methods for describing the disease dynamics of vector-borne diseases within plantations have been previously covered in the epidemiological literature. In particular, the use of state-space epidemiological models with continuous-time processes have been demonstrated in a variety of scenarios, such as describing the disease spread of the Citrus tristeza virus (CTV) in a citrus orchard [12] and an aphid infestation in a sugar cane plantation [13]. Employing this class of models, with some simplifying assumptions on plantation shape and uniformity enables a relatively straightforward calculation of the likelihood function, p(y | ϕ), enabling the use of likelihood-based inference techniques such as Markov Chain Monte Carlo (MCMC) or Sequential Monte Carlo (SMC) to estimate the posterior distributions of parameters with high accuracy.

The use of a static, spatial network model to describe the spatiotemporal spread of a disease in a plantation presents a novel approach to this modelling problem, and joins a growing body of research in the application of epidemiological network models [14]. However, the use of network models to describe disease dynamics presents significant challenges for employing likelihood-based inference approaches due to the complex nature of network structures, resulting in an intractable likelihood function [8]. While MCMC has been successfully employed for some network modelling applications in contact networks, all applications required the assumption of a ‘tree-like’ contact network structure to resolve the likelihood function [8]. While these assumptions do not hold for the spatial network employed in our application, fortunately, simulating our network model is computationally cheap. This enables us to employ ABC, which provides the opportunity to sample from the approximate posterior density of the parameters [7].

ABC bypasses the evaluation of the likelihood function by instead simulating data from the model to generate an approximate posterior distribution. Due to the high dimensionality of the observed data, y, the data set is often reduced to a set of summary statistics, S(y). Thus, ABC targets the posterior conditional on the summary statistics:

p(ϕ|S(y))p(S(y)|ϕ)π(ϕ) (2)

However, this too requires the evaluation of a typically intractable likelihood, p(S(y) | ϕ). Therefore, ABC approximates this intractable likelihood through the following integral:

pϵ(S(y)|ϕ)=yKϵ(ρ(S(x),S(y)))p(x|ϕ)dx (3)

where ρ(S(x), S(y)) is a discrepancy function that compares the simulated and observed summary statistics, and Kϵ(·) is a kernel weighting function with bandwidth ϵ that weights simulated summaries in accordance with their closeness to the observed summary statistic. The role of the discrepancy measure will become clear in the next section. While the integral in (3) is analytically intractable, it may be estimated by taking n iid simulations from the model {xi}i=1n~p(x|ϕ), evaluating their corresponding summary statistics {Si}i=1n where Si = S(xi), and calculating the following ABC likelihood:

pϵ(S(y)|ϕ)1ni=1nKϵ(ρ(Si,S(y))) (4)

The unbiased likelihood estimator described in (4) is generally sufficient to obtain a Bayesian algorithm that targets the posterior distribution pϵ(ϕ|S(y))pϵ(S(y)|ϕ)p(ϕ). The summary statistics, S(·), discrepancy measure, ρ(·,·), and tolerance value, ϵ, utilised in the ABC method introduce approximation errors to the target posterior distribution. In order to minimise these errors, these factors must be chosen and tuned carefully to maximise accuracy while ensuring a computationally feasible operation [15].

ABC algorithms

The most basic implementation of ABC is known as rejection sampling [16]. In this algorithm, the parameter is estimated by generating model realisations x corresponding to different parameter values ϕ promoted from the prior. The summaries S(x) are computed and compared to S(y) through the discrepancy measure ρ(·,·). If the discrepancy between the simulated and observed summaries is lower than the tolerance, ϵ, then the corresponding ϕ is accepted as part of the approximate posterior distribution.

The pseudo-code for an ABC rejection sampling scheme is provided below [17]:

for i ϵ 1: n do

Draw ϕ ~ π(ϕ)

Draw x ~ p(· | ϕ)

Accept ϕ if ρ(S(x), S(y)) ≤ ϵ

end for

where n is the number of iid samples to be taken from the prior π(ϕ).

In this paper, ABC is implemented through an MCMC algorithm to effectively estimate the posterior distributions of the recovery and spreading parameters of BBTV in the Newrybar banana plantation. ABC-MCMC [18] aims to improve the efficiency in comparison to ABC rejection sampling, by proposing parameter values locally around promising regions of the parameter space.

Summary statistics

The summary statistics play an important role in the ability for ABC methods to effectively estimate the posterior distributions of parameters [15]. Summary statistics summarise observed or simulated data which can often be large, complex and high dimensional. Effective summary statistics characterise the influence of specific parameters on the model, so that varying parameter values result in observable changes in the reported summary statistics.

We find that the following summary statistics are informative about the model parameters:

  1. S1–A vector with entry S1(t) being the proportion of infected nodes at each time step t (where t = 1, …, 38).

  2. S10–A scalar computed as the total number of infected nodes summed over all time-steps t, that recovered by t + 1.

  3. S010–A scalar computed as the total number of susceptible nodes summed over all time-steps t, which became infected by t + 1. These nodes must not have an infected neighbour at time t.

  4. S011–A scalar computed as the total number of susceptible nodes summed over all time-steps t, which became infected by t + 1. These nodes must have at least one infected neighbour at time t.

The development of summary statistics to capture important features of simulated data is largely intuitive and requires some tuning and validation through simulation studies with dummy data. In developing our summary statistics, we consider the informativeness of each summary statistic to a specific parameter, while minimising dimensionality.

The summary statistics are informative as follows:

  • S1 describes the temporal characteristic of the simulated infection spread and is influenced by all three parameters.

  • S10 provides an indication of the recovery rate, thus corresponding to θ0.

  • S010 describes the number of infections occurring via a vector from a long distance, corresponding with θ2.

  • S011 describes the number of infections occurring via a vector from a neighbouring node, informing θ1.

Although there are several approaches available to weight summary statistics in ABC [19], this is not necessary for our application since the total variance and value of each summary statistic is sufficiently similar, such that each statistic has an equal influence on the discrepancy measure. This has been validated through a simulation study where dummy ‘observed’ data has been simulated with known parameter values, and an ABC-MCMC algorithm with the above summary statistics is used to estimate these parameters.

Since the parameters are applied during different seasons, the summary statistics S10, S010 and S011 have been replicated for summer (20 out of 38 months) and winter. Thus, we have a total of 7 summary statistics, one vector of length 38, and 6 scalars (3 for summer and 3 for winter).

It may be noted that while S1 is a vector describing the infection rate for each time step t, all other summaries (S10, S011, S010) are scalars–having been summed over all time steps t. While the use of a combined metric does reduce some of the information captured, we believe this is a reasonable approach for our application, especially given the great reduction in dimensionality to the final summary vector. Furthermore, we suggest that little information is lost in the aggregation. Consider the statistic S10, for example. Under the assumed model, the number of nodes that recover from time t to t + 1 is binomially distributed with the number of trials given by the number of infected nodes at time t and “success” probability θ0j. Aggregating this statistic over the summer or winter months results in a sum of binomially distributed random variables each with a different number of trials and the same success probability. Using the properties of the binomial distribution, the sum is also binomially distributed. Thus, under the assumed model, the aggregated statistic carries the same information as the vector of statistics over the summer or winter months. A similar argument can be made for S011 and S010. This significantly reduces the length of the summary vector, enabling lower ABC tolerances and higher acceptance rates, and resulting in better approximations of the posterior distributions of the parameters [16].

Discrepancy measure (ρ(·,·))

The Mean Squared Error (MSE) between the simulated and observed summary vector is utilised as a discrepancy measure for this case study. This may be described as follows:

ρ(S(x),S(y))=1kj=1k(Sj(x)Sj(y))2 (5)

where S(x) is the summary vector of the simulated data, S(y) is the summary vector of the observed data, and j denotes an element of the summary vector, while k represents the number of elements in each summary vector.

Here we incorporate all our summary statistics into a single discrepancy measure to compare observed and simulated data, as is standard in ABC analyses. An alternative approach may be to define a distance metric for each summary with its own tolerance (e.g. Ratmann et al. [20]).

ABC-MCMC iterations

10 million (1e7) MCMC iterations were simulated to achieve the approximate posteriors for the model parameters in order to maintain a reasonable effective sample size. We opted for a burn-in of (1e5) MCMC iterations and thinned by a factor of 200 to reduce correlation between proposals along the MCMC chain. This results in approximately 40,000 unique samples after thinning and burn-in.

ABC-MCMC tolerance (ε)

Reducing the MCMC tolerance (ε) increases the accuracy of the posterior distribution, at the expense of computation time. An extensive simulation study was conducted which revealed that an MCMC tolerance of 23 was appropriate to provide accurate posteriors while ensuring a reasonable computation time.

Priors

Given the novel nature and the lack of specific expert knowledge regarding these parameters, uniform priors with bounds of [0, 1] have been chosen for all parameters.

Results and discussion

Posterior distributions

As shown in Table 1, mean posterior recovery probability (θ0j) is influenced by the seasonal changes in temperature. While the mean posterior recovery probability for an infected node is 25.8% in summer (θ01), this increases to 30.67% for winter (θ00).

Table 1. Mean posteriors of parameters.

Summer (j = 0) Winter (j = 1) Δ
Recovery (θ0j) 25.8% 30.67% +4.87%
Neighbouring infectivity (θ1j) 6.18% 4.06% -2.12%
Distant infectivity (θ2j) 0.72% 0.62% -0.1%

Mean posterior neighbouring infectivity (θ1j) is 2.12% higher in summer months compared to winter months (6.18% and 4.06% respectively), indicating minor seasonal dependence. Distant infectivity exhibits a similar dependence (θ2j), as the mean posterior probability during summer of 0.72% only decreases by 0.1% during winter months (Fig 7 and Table 1).

Fig 7. Approximate posterior distributions of parameters.

Fig 7

Posterior densities are coloured to correspond with their respective seasonal counterpart, with darker colours representing winter seasons.

These results may be explained through a range of environmental factors. Higher posterior probabilities for node recovery in winter are likely due to lower inspection accuracy arising from lower leaf growth rates and decreased farming activity during this season. Research from the Department of Employment Economic Development and Innovation identify that bananas growing in the sub-tropical climates, such as the South-East Queensland area, are heavily influenced by temperatures: the rate of production is often significantly reduced in winter, sometimes to a rate of one leaf in 20 days [21]. In contrast, summer leaf emergence can be completed in around four days in tropical conditions. Since BBTV in banana plants is identified through observing visual symptoms of infections, identifying newly infected plants is much more likely during summer compared to winter, where a plant may be latently infected for months before displaying signs of infection. Therefore, the lower reported BBTV infection rates in winter would artificially increase the posterior probability of recovery in this season.

Higher neighbouring and distant infectivity in summer is likely due to more weather events in this season, and inoculum acquisition sensitivity to tropical temperatures. According to the historical monthly averages of climate data provided by the Bureau of Meteorology, South-East Queensland experiences significantly higher wind speeds during summer at an average maximum gust of 131.7 km/h compared to 93.7 km/h during winter [22]. Similarly, average rainfall during summer is 113.7 mm for 11 days compared to 84.2 mm for nine days during winter [22]. A higher frequency and intensity of such weather events are likely to perturb the aphid vector, as observed by Claflin et al. [9], increasing the risk of neighbouring and long-distance transmission. Additionally, Anhalt & Almeida [11] identify that P. nigronervosa provides peak inoculum acquisition and transmission efficiency between 25–30 degrees Celsius, which is historically experienced during the summer months in South-East Queensland. Higher seasonal rates of inoculum transmission, in conjunction with increased vector activity during this period could account for greater counts of neighbouring and long-distance infections during the summer.

The estimated pairwise posterior distributions of model parameters highlight key operative characteristics of the model. As seen in Fig 8, the estimated univariate posterior distribution for the posterior probability of recovery is largely symmetric, while the posterior probability of neighbouring and distant infectivity (θ1j) remains positively skewed.

Fig 8. Estimated univariate and pairwise posterior distributions using ABC-MCMC.

Fig 8

Negative correlation is evident between neighbouring infectivity (θ1j) and distant infectivity in the corresponding months (θ2j), which indicates that different combinations of these parameters can generate similar summary statistics. This is an intuitive correlation since the total number of infected nodes in each month is a sum of the number of nodes infected by a neighbour or over long-distance. Therefore, if a high probability of neighbouring infectivity is proposed in a model simulation, a low probability of distant infectivity will be more likely to result in the observed summary statistics.

Posterior forecasting

Our modelling framework allows us to provide a posterior infectivity forecast for all nodes over a 6-month period (Fig 9). The forecast is generated by running the BBTV forward simulation model with parameters obtained from a random sample from the posterior distribution.

Fig 9. 6-month posterior forecast of node infectivity.

Fig 9

(a) Simulated using both summer and winter posterior counterparts. (b) Simulated using only summer posterior counterparts. (c) Simulated using only winter posterior counterparts. Green lines indicate subsections infected at the last known timestep t = 38, while brown lines indicate subsections in a susceptible state at t = 38.

Each node begins with a 100% or 0% infection probability in month 38, since this is the last known time-step provided to the model as the initial configuration for each of the three simulation scenarios. The subsequent forecasted probabilities of infection for each node converge to a steady state probability of 45%. The forecasted infection probabilities for nodes infected in the last known time-step are characterised by a depreciation in infection probability, and an increasing variance in individual infection probability at each subsequent month. In contrast, forecasted probabilities for initially uninfected nodes begin with a high variance in the group, progressively decreasing in subsequent months.

Fig 9(B) provides a posterior forecast of node infectivity which applies the posterior counterparts for summer months to all months (θi1). Compared to Fig 9(A), this results in a higher steady state probability of 57%. Furthermore, while previously uninfected nodes in both figures exhibit a steep increase in forecasted node infectivity by the first month, the subsequent infection probabilities flatten to the steady state infection probability in Fig 9(A) and continue to increase in Fig 9(B). This is likely due to the higher neighbour infectivity and distant infectivity probabilities during summer.

This may be confirmed through Fig 9(C), which provides a 6-month posterior forecast of node infectivity, utilising the posterior counterparts of winter months to all months (θi0). Unlike Fig 9(B) and 9(C) displays a gradual increase in forecasted probability of node infectivity over subsequent months, tending to a lower steady-state node infectivity probability of 40%.

Discrete-space posterior probability forecast

Fig 10(A) and 10(B) visualise the 1-month posterior probability forecasts for infection in each node. Fig 10(A) depicts the posterior infection probabilities for infected nodes in the last observed time-step, while Fig 10(B) depicts the posterior infection probabilities for previously uninfected nodes.

Fig 10. 1-month discrete-space posterior forecast of node infectivity.

Fig 10

(a) Posterior probability of node infection for previously infected nodes. (b) Posterior probability of node infection for previously uninfected nodes. Subsections highlighted in green indicate nodes not planted with bananas.

Posterior forecasts from the BBTV model indicate that previously infected nodes are likely to remain infected, consistently reporting an infection probability of approximately 74% for all nodes. Since the forecasted month (month 39) occurs during the summer period, the posterior distributions of the summer counterparts (θi1) are utilised for this simulation.

Fig 10(A) and 10(B) indicate that the highest posterior predicted infection probability is associated to nodes with a high number of neighbours. This is likely due to the greater application of neighbouring infectivity (θ1j) on this node, as a high number of neighbours increases the probability of the aphid vector to infect this node.

Alternate plantation organisation

The model may be utilised to explore the reconfiguration of a banana plantation, to observe the effect of clearing a subsection to mitigate the local spread of BBTV. The clearing of a subsection may be considered equivalent to freezing its corresponding node in a susceptible state for all time steps t. Through this method, any long- or short-range vector transmission to a cleared node will remain unsuccessful.

Field surveyors with the National Banana Bunchy Top Project expressed interest in exploring the impact of clearing the central section of the farm (nodes 20 to 34). Fig 11(A) and 11(B) describe the 1-month posterior infection probability forecasts for month 39 if nodes 20 to 34 (in grey) were cleared.

Fig 11. 1-month discrete-space posterior forecast of node infectivity.

Fig 11

(a) Posterior probability of node infection for previously infected nodes. (b) Posterior probability of node infection for previously uninfected nodes. Subsections highlighted green indicate nodes not planted with bananas. Subsections highlighted grey indicate nodes that have been ‘cleared’ according to the simulation.

When compared to the forecasts from Fig 10(A) and 10(B), removing nodes 24 to 32 results in some nodes have a higher posterior infection probability by up to 0.8%, while others decrease by a maximum of 1%. When averaged across all remaining nodes in the network, there is an insignificant reduction in the forecasted posterior infection probabilities. Further configurations of cleared areas may be explored upon recommendations by stakeholders.

Posterior predictive checking

In addition to the 38 months of field data provided by the BBTV Prevention Program from December 2014 to January 2018 which was used to estimate the model parameters (training dataset), an additional 7 months of field data till August 2018 was provided, which may be utilised as validation dataset. In order to conduct posterior predictive checking, the model was set to simulate the posterior infection probabilities of each node in month (t + 1), given an initial configuration of the infected nodes at month t. The validation data for month (t + 1) may then be compared to the corresponding posterior infection probabilities to identify model accuracy.

Binomial deviance loss

The accuracy of the posterior predicted infection probabilities for a subsection may be identified through its deviance from the validation data. This may be calculated through the binomial deviance loss function, which provides a discrepancy between a prediction and its corresponding binomial validation data (true node infection state). The loss function is defined as follows:

yPlog(1+eP) (6)

where y is the true node infection state in month (t + 1), and P is the log odds of the corresponding posterior prediction. A deviance loss of 0 indicates a completely correct prediction of the true infection state of a node at time (t + 1), while a loss of -2 indicates that the predicted state of a node at time (t + 1) is totally deviant from the true infection state recorded in the validation dataset.

Fig 12 displays the binomial deviance loss for each prediction for each subsection, coloured according to the validation dataset used. Prediction accuracy is largely unaffected by the validation set used, with the mean deviance loss across all subsections being -0.6; a 40% improvement compared to a random prediction. Posterior predictions for certain subsections are consistently excellent (see nodes 27–32, 43–50), due to consistent incidence levels in these areas of the farm.

Fig 12. Binomial deviance loss aggregated for all predicted months in a 6-month period.

Fig 12

Each dot represents the binomial deviance for a prediction corresponding to a subsection, coloured according to the predicted validation data set. The red line represents the deviance loss of a posterior prediction of 50% (random).

Receiver-Operating Characteristic (ROC) curve

A ROC Curve illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. Fig 13 displays the ROC curve of the model, which demonstrates an Area-Under-the-Curve (AUC) of 0.65, generally considered to be a “fair” model. This metric is a general indicator of the posterior prediction confidence, as a larger area under the curve would indicate higher true positive rates at lower threshold levels. The ROC curve is well above the diagonal line, indicating that it performs markedly better than random predictions, particularly at higher thresholds.

Fig 13. Receiver-Operating Characteristic (ROC) curve for the BBTV forward-simulating model.

Fig 13

Predictions across all validation data sets have been aggregated to construct this curve.

Implications and future research

Monitoring the nodal recovery (θ0j), near infectivity (θ1j) and distant infectivity (θ2j) rates over time would serve to metricise the effectiveness of disease management strategies. For example, a long-term reduction in θ0j could be a signal of increased aphid resistance to chemicals used in current pesticide routines, or of an increase in latent infections across the plantation. Correspondingly, a long-term increase in θ1j and θ2j could point to higher aphid activity across the plantation, potentially indicating a reduction in inspection accuracy.

Providing site inspectors with specific high-risk areas through posterior predictive modelling, could improve inspection accuracy. Furthermore, the identification of 3 or 4 high-risk areas in a plantation opens the opportunity of increasing the site inspection frequency from monthly to fortnightly inspections, while limiting the inspection coverage to these high-risk areas.

The model has several limitations. Firstly, it does not account for the geographical characteristics of the plantation. The steepness and height of an area of the plantation would affect the ability for the aphid vector to travel to neighbouring subsections/nodes. Furthermore, the area represented by each node, and the distance to neighbouring nodes is quite varied and would be likely to influence the posterior probability of neighbouring and distant infectivity due to the greater distance for the aphid vector to travel, and the number of potential host plants in a subsection. Secondly, environmental factors such as the wind speed and direction, and extreme weather events, are not considered in this model. Higher wind speeds and extreme weather events would be more likely to perturb and relocate the aphid vector, resulting in greater infectivity rates across the plantation. Thirdly, the methodology places limits on the achievable resolution of posteriors, since the model relies on the presence of clear divided areas in a banana plantation to identify nodes and establish a network. Furthermore, since the collection of field data relies on visual observation, the accuracy of model parameters largely relies on the inspection accuracy which is influenced by seasonality and environmental factors, in addition to inspector experience and fatigue. Lastly, the model only considers a subsection to be infected if at least one infection has been reported in that subsection, whereas the field data collected at Newrybar would enable the calculation of infection counts in each subsection/node, as well as the total number of infected leaves present in a plant, and thus in a subsection/node. These factors would significantly improve model accuracy and informativeness. Extending the model to address these limitations should be considered for future research.

Conclusion

This paper has adapted and extended upon current network-based disease models implemented in an ABC framework. A forward-simulating network-based SIS model has been created which simulates the spread of BBTV across the subsections of a banana plantation, by parametrising nodal recovery, neighbouring infectivity and distant infectivity across summer and winter. Findings from posterior results achieved through ABC-MCMC indicate seasonality in all parameters, which are influenced by correlated changes in inspection accuracy, temperatures and aphid activity. This model enables the simulation, monitoring and forecasting of various disease management strategies, which may support policy-level decision making and inspector experience. Introducing higher dimensional field and weather data will improve model accuracy and utility; an area to be explored for future research.

Supporting information

S1 Zip. Source code.

MATLAB code for network model, summary statistics, and ABC-MCMC parameter estimation algorithm.

(ZIP)

S2 Zip. Source data.

Raw.CSV data of infections recorded from banana plantation from Dec-2014 to Jan-2018.

(ZIP)

S1 Document. Supporting information MCMC convergence analysis.

Document providing convergence analysis of MCMC algorithm, algorithm choice and sensitivity analysis of ABC-MCMC tolerance parameter.

(DOCX)

S1 Fig. Post burn-in trace plots of ABC-MCMC chain for all parameters.

Referred to in S1 Document.

(TIFF)

S2 Fig. Post burn-in (1e6) and thinned (factor of 200) lag-k autocorrelation plots of ABC-MCMC for all parameters.

Referred to in S1 Document.

(TIFF)

S3 Fig. Approximate posterior means across multiple tolerance thresholds.

Referred to in S1 Document.

(TIFF)

Acknowledgments

The authors would like to acknowledge the support of Hort Innovation in providing the data required for this project, and Barry Sullivan, Dr. John Thomas and Dr. Kathy Crew for their expert advice on BBTV throughout this project.

Data Availability

All relevant data and code are found within the manuscript and its Supporting Information files. They are also made available at the following Github repository: https://github.com/Sheksta/bbtv-abc

Funding Statement

A.V received funding by the Australian Research Council (ARC) Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS) and the Queensland University of Technology under grant number CE140100049. K.M was supported by an ARC Laureate Fellowship under grant number FL150100150. C.D was supported by the ARC Discovery Project and the Queensland University of Technology under grant number DP200102101. A.M’s role in this paper was self-funded. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Smith MC, Holt J, Kenyon L, Foot C. Quantitative epidemiology of Banana Bunchy Top Virus Disease and its control. Plant Pathology. 1998;47(2):177–87. [Google Scholar]
  • 2.Fish S. The History of Plant Pathology in Australia. Annual Review of Phytopathology. 1970;8(1):13–36. [Google Scholar]
  • 3.Cook DC, Liu S, Edwards J, Villalta ON, Aurambout J-P, Kriticos DJ, et al. Predicting the Benefits of Banana Bunchy Top Virus Exclusion from Commercial Plantations in Australia (Banana Bunchy Top Virus Control in Australia). 2012;7(8):e42391. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Dale JL. Banana bunchy top: an economically important tropical plant virus disease. Advances in virus research. 1987;33:301 10.1016/s0065-3527(08)60321-8 [DOI] [PubMed] [Google Scholar]
  • 5.Brooks-Pollock E, Roberts GO, Keeling MJ. A dynamic model of bovine tuberculosis spread and control in Great Britain. Nature. 2014;511:228 10.1038/nature13529 [DOI] [PubMed] [Google Scholar]
  • 6.Allen RN, New South Wales Department of Agriculture WNCAI. Further studies on epidemiological factors influencing control of banana bunch top disease, and evaluation of control measures by computer simulation. Australian Journal of Agricultural Research; ISSN. 1987;38(2):373–82. [Google Scholar]
  • 7.Sisson SA. Handbook of Approximate Bayesian Computation. Fan Y, Beaumont M, editors. Milton: CRC Press LLC; 2018. [Google Scholar]
  • 8.Dutta R, Mira A, Onnela JP. Bayesian inference of spreading processes on networks. Proceedings Of The Royal Society A-Mathematical Physical And Engineering Sc. 2018;474(2215). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Claflin SB, Power AG, Thaler JS. Aphid density and community composition differentially affect apterous aphid movement and plant virus transmission. Ecological Entomology. 2017;42(3):245–54. [Google Scholar]
  • 10.Allen RN, New South Wales Department of Agriculture WARC. Epidemiological factors influencing the success of roguing for the control of Bunchy Top disease of bananas in New South Wales. Australian Journal of Agricultural Research; ISSN. 1978;29(3):535–44. [Google Scholar]
  • 11.Anhalt MD, Almeida RPP. Effect of Temperature, Vector Life Stage, and Plant Access Period on Transmission of Banana bunchy top virus to Banana. Phytopathology. 2008;98(6):743–8. 10.1094/PHYTO-98-6-0743 [DOI] [PubMed] [Google Scholar]
  • 12.Gibson GJ. Markov Chain Monte Carlo Methods for Fitting Spatiotemporal Stochastic Models in Plant Epidemiology. Journal of the Royal Statistical Society: Series C (Applied Statistics). 1997;46(2):215–33. [Google Scholar]
  • 13.Brown PE, Chimard F, Remorov A, Rosenthal JS, Wang X. Statistical inference and computational efficiency for spatial infectious disease models with plantation data. Journal of the Royal Statistical Society: Series C (Applied Statistics). 2014;63(3):467–82. [Google Scholar]
  • 14.Danon L, Ford AP, House T, Jewell CP, Keeling MJ, Roberts GO, et al. Networks and the epidemiology of infectious disease. Interdiscip Perspect Infect Dis. 2011;2011:284909 10.1155/2011/284909 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Dennis P. Summary Statistics Handbook of Approximate Bayesian Computation: CRC Press; 2018. [Google Scholar]
  • 16.Beaumont MA, Zhang W, Balding DJ. Approximate Bayesian Computation in Population Genetics. Genetics. 2002;162(4):2025 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Marin J-M, Pudlo P, Robert CP, Ryder RJ. Approximate Bayesian computational methods. Statistics and Computing. 2012;22(6):1167–80. [Google Scholar]
  • 18.Marjoram P, Molitor J, Plagnol V, Tavaré S. Markov chain Monte Carlo without likelihoods. 2003;100(26):15324–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Fearnhead P, Prangle D. Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation [with Discussion]. Journal of the Royal Statistical Society Series B (Statistical Methodology). 2012;74(3):419–74. [Google Scholar]
  • 20.Ratmann O, Andrieu C, Wiuf C, Richardson S. Model criticism based on likelihood-free inference, with an application to protein network evolution. Proceedings of the National Academy of Sciences. 2009;106(26):10576. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Deuter P, White N, Putland D, Mackenzie R, Muller J. Critical (temperature) thresholds and climate change impacts/adaptation in horticulture 2011. [Google Scholar]
  • 22.Ballina Weather Station: Bureau of Meteorology In: Meteorology Bo, editor. [online]: Ballina Airport Weather Station; 2018. [Google Scholar]
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007878.r001

Decision Letter 0

Virginia E Pitzer, Konstantin B Blyuss

14 Oct 2019

Dear Dr Varghese,

Thank you very much for submitting your manuscript 'Estimating a novel stochastic model for within-field disease dynamics of banana bunchy top virus via approximate Bayesian computation' for review by PLOS Computational Biology. Your manuscript has been fully evaluated by the PLOS Computational Biology editorial team and in this case also by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the manuscript as it currently stands, mostly regarding the justification and details of the particular methodology used to address the problem at hand. While your manuscript cannot be accepted in its present form, we are willing to consider a revised version in which the issues raised by the reviewers have been adequately addressed. We cannot, of course, promise publication at that time.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

Your revisions should address the specific points made by each reviewer. Please return the revised version within the next 60 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org. Revised manuscripts received beyond 60 days may require evaluation and peer review similar to that applied to newly submitted manuscripts.

In addition, when you are ready to resubmit, please be prepared to provide the following:

(1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.

(2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible to show clearly where changes have been made to their manuscript e.g. by highlighting text.

(3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution.

Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are:

- Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition).

- Supporting Information uploaded as separate files, titled Dataset, Figure, Table, Text, Protocol, Audio, or Video.

- Funding information in the 'Financial Disclosure' box in the online system.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see here

We are sorry that we cannot be more positive about your manuscript at this stage, but if you have any concerns or questions, please do not hesitate to contact us.

Sincerely,

Konstantin B. Blyuss

Guest Editor

PLOS Computational Biology

Virginia Pitzer

Deputy Editor

PLOS Computational Biology

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Referee report on: PCOMPBIOL-D-19-01505

It is not clear what the network is. How many nodes, how many links, what is the degree distribution? Is this a fully spatial network, apart from the long-range transmission? Do patches separated by dirt tracks define nodes? Are all adjacent patches considered as neighbours? Would be useful to see some summary statistics of this spatial network.

Data on which the parameter estimation has been done is never presented, plotted or analysed. I would expect that at least some of the paper presents and analyses this data.

Why are there no comparisons between the true data and model output? Would like to see how output from the models with the best parameters compares to the true data.

The simulation seem to be a discrete time simulation with an arbitrary update windows, so how do you compare abstract time to real time that it is present in the real data? Is the abstract time-step set to some real time like day or week or so on? It is never mentioned over what time window has the plantation been observed and some plots of how the number of infected patches evolved in time.

See my comment about the lack of analysis of the data on which the inference is based on.

At some point the authors talk about SI model but their model is SIS.

What is the initial condition for the simulation?

Nodes are sometimes referred to as patches then subsections and other versions, can there be some consistency?

Supplementary material in my opinion is more important than some of the figures in the paper. These essentially try to make a case that the model is good/valid. I would move this to the main part of the paper and would make a better case that the model is valid by comparing model output to existing data over the whole period of collection. Or by devising a test whereby one divides the existing data in “used for inference” and “used for validation” and tests performed over different such divisions to validate the model properly.

So the role of the vector is simply the long-range infection?

I am not sure if the choice of statistics is the best for comparing data to simulation.

First, I would like to see the data. Is it simply, time, patch, status? Is there seasonality in the collected data?

For reproducibility purposes is the code available on Github? What about the data?

Line 114: over a range of plantation

Line 132-133: what do the authors mean by “the new BBTV

model does not estimate for an initial seed node”? Do they mean that their model is not able to infer the initial seed?

Line 191: I think it should be /pi(\\phi) not \\phi(x) in the integral

Line 315: In the sentence “Fig 6 provides a posterior infectivity forecast for all subsections over a 6-month period” what does “subsections” refer to? Is this each node/patch?

Line 363-364: “if subsections 20 to 34 (in grey) were cleared” – is this equivalent o removing nodes?

Line 408-410: again this is not an SI model, in epidemiology this is an SIS model.

Line 468-472: what ranges of values can formula 2 take and what value means good fit versus bad fit?

Reviewer #2: Review of “Estimating a novel stochastic model for within-field disease dynamics of banana

bunchy top virus via approximate Bayesian computation” by Varghese et al. (PCOMPBIOL-D-19-01505)

This paper describes the development and fitting of a stochastic SIS model of banana bunchy top virus (BBTV) to some novel data from a plantation in Australia. The model is fitted to the available data using ABC-MCMC. The underlying disease dynamics involve an intermediate aphid vector, however data on aphid populations are unknown, and instead the authors simplify the underlying model to look at aggregate probabilities of localised transmission and long-range transmission between “nodes” in a network, where each node corresponds to a group of plants in spatially contiguous areas. This approach has the advantage of not having to model the aphid vector dynamics, at the cost of modelling at quite a course spatial level. They extend their model to use different transmission and recovery probabilities in different seasons (summer/winter) in order to try to assess the impact of weather / temperature etc. on the transmission dynamics.

The results suggest some marginal impact of season on localised transmission potential, and negligible impact of season on long-range transmission and recovery.

Overall I think the paper looks fine. It’s a relatively straightforward application of ABC methodology to a novel data set, and would be of interest to the readers of the journal call. The authors state a “significant” impact of season on localised transmission potential, but I think that the approximate posteriors overlap a lot here and so I might argue that the effect is not that “significant”. In my opinion there also needs to be more discussion of the ABC tolerance used in the ABC-MCMC, and what impact that might have made on the results as well. Perhaps subtle differences are being masked by the approximations introduced in the ABC, and this warrants further discussion. Another discussion point could be whether the use of a combined metric might be averaging out some of the information in the data,.

Minor remarks:

Abstract (and elsewhere): I think that technically the authors are using an SIS model, not an SI model per se, because there are transitions back to the S class (and hence why they observe endemic equilibriums in their forecasts).

L57, L94: Sometimes there are capital letters in e.g. “North-Eastern” (L94) but other times lower-case letters e.g. “south-east” (L57). These should be consistent throughout the manuscript.

L128: Sentence beginning, “Secondly, ….” does not read well. Suggest a re-write to clarify what the authors mean.

L144: “rogueing” should be “roguing”.

L160: The figure citations in the text are all out by one (and throughout the rest of the manuscript). This line should read “Fig. 1” and not “Fig. 2”, and this carries through the paper.

Equation 4: Should this have a 1/n outside the sum?

L242: Some more exposition about how the authors figured out which summary measures were informative would be interesting. What criteria did they use?

L245: It might be worth clarifying in the notation that S1 is specified for each time point (all the other summary measures are explicitly stated to be at time point t, except the first).

P13: How many replicates simulations were used to generate the estimate in equation 4? This should be stated somewhere.

Fig 4: It looks like the plot in the bottom-right has had its axes flipped compared to the other plots. Is this correct?

P21: Why did the authors choose to remove the grey squares? Why these and not any other squares? Possibly Fig 8 might be more informative if the differences were mapped, rather than the absolute values.

L384: “...their priori...” doesn’t make sense.

L411: Here the authors use “MCMC-ABC” but elsewhere they use “ABC-MCMC”.

Reviewer #3: See attachment

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: No: no data seen or even plotted

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Attachment

Submitted filename: Review_PCOMPBIOL-D-19-01505.pdf

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007878.r003

Decision Letter 1

Virginia E Pitzer, Konstantin B Blyuss

10 Mar 2020

Dear Mr. Varghese,

Thank you very much for submitting your manuscript "Estimating a novel stochastic model for within-field disease dynamics of banana bunchy top virus via approximate Bayesian computation" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. Reviewers 1 and 3 have suggested accepting your manuscript for publication as is. Reviewer 2 has mentioned a few additional very minor issues that they would like to ask you to sort out, and once this is done, they would not need to review the paper again, and the paper can then be accepted for publication.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. 

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Konstantin B. Blyuss

Guest Editor

PLOS Computational Biology

Virginia Pitzer

Deputy Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I thank the authors for addressing all my comments.

I am happy to recommend the paper for publication.

Reviewer #2: Review of “Estimating a novel stochastic model for within-field disease dynamics of banana

bunchy top virus via approximate Bayesian computation” by Varghese et al. (PCOMPBIOL-D-19-01505)

This is a review of a revised version of the above manuscript. I thank the authors for taking the time to consider and act on the comments from the first review. In my opinion, the manuscript is now much improved and clearer as to the intent, novelty and limitations of the study. As such I think it is suitable for publication, subject to a few minor amends as highlighted below.

One point is that there are now a lot of figures. I appreciate that the authors have moved some figures from the Supplementary Information to the main paper in response to some comments from the other reviewers. However, I think the balance has shifted a bit too far. For example, Fig. 7 is effectively the diagonals of Fig. 8, and as such I would remove Fig. 7 and just reference Fig. 8. I would also put all the MCMC convergence plots into the Supp Mat (e.g. Figs 9 and 10). Also, Fig 11 is a sensitivity analysis type plot and I think it should be in Supp Mat also.

There is also no Fig. 12 as far as I can see, so again, please take care with figure numbering.

Minor remarks:

Abstract, L33: I don’t think the “predict for” is correct here. Would suggest something like:

“Our models can be used to quantify the effects of …. BBTV spread, and can be used to predict high-risk areas in this plantation.”

L131: Should probably reference the exact GIS software used.

L281: Should probably be \\rho(S(x), S(y)) to be consistent with previous paragraph applying the distance metric to summary measures.

L297-300: Please make it clear that these are summed across all time points. This is not highlighted in the text until the following page, and I think it would help clarity to make this explicit here.

Table 1: Should the summer and winter be j = 0 and j = 1 respectively (not j = 1 and j = 2)?

L431: This is interesting, but I disagree slightly with the author’s interpretation. To my mind, what they have done is conduct a sensitivity analysis around what would happen to the posterior marginal Cis if they were to change the tolerance. They suggest that since the CIs don’t change much, then the “chosen tolerance value is reasonable for our application”. I might suggest that what they’ve done is assess the relative impacts of small changes, rather than the absolute impacts. I would suggest rewording this to something along the lines of, “small changes to the tolerance do not seem to impact on the approximate posterior uncertainties here, and so we decide to keep with our initial choice of tolerance.”

This also relates to my earlier question (apologies for not making this clear in the first review), that a choice of an aggregated discrepancy measure might average out some information in the data (an alternative is to use multiple distance metrics around each summary measure, each with a different tolerances e.g. Ratmann et al. 2009).

L592: “Significant” seasonality?

Reviewer #3: Whilst I am not fully convinced that the network model considered is more complex than the spatial models I referenced in my original review, I do believe that this work makes a nice contribution to applying ABC to epidemic models. I have no further comments to raise.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: None

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see http://journals.plos.org/ploscompbiol/s/submission-guidelines#loc-materials-and-methods

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007878.r005

Decision Letter 2

Virginia E Pitzer, Konstantin B Blyuss

15 Apr 2020

Dear Mr. Varghese,

We are pleased to inform you that your manuscript 'Estimating a novel stochastic model for within-field disease dynamics of banana bunchy top virus via approximate Bayesian computation' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Konstantin B. Blyuss

Guest Editor

PLOS Computational Biology

Virginia Pitzer

Deputy Editor

PLOS Computational Biology

***********************************************************

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007878.r006

Acceptance letter

Virginia E Pitzer, Konstantin B Blyuss

12 May 2020

PCOMPBIOL-D-19-01505R2

Estimating a novel stochastic model for within-field disease dynamics of banana bunchy top virus via approximate Bayesian computation

Dear Dr Varghese,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Laura Mallard

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Zip. Source code.

    MATLAB code for network model, summary statistics, and ABC-MCMC parameter estimation algorithm.

    (ZIP)

    S2 Zip. Source data.

    Raw.CSV data of infections recorded from banana plantation from Dec-2014 to Jan-2018.

    (ZIP)

    S1 Document. Supporting information MCMC convergence analysis.

    Document providing convergence analysis of MCMC algorithm, algorithm choice and sensitivity analysis of ABC-MCMC tolerance parameter.

    (DOCX)

    S1 Fig. Post burn-in trace plots of ABC-MCMC chain for all parameters.

    Referred to in S1 Document.

    (TIFF)

    S2 Fig. Post burn-in (1e6) and thinned (factor of 200) lag-k autocorrelation plots of ABC-MCMC for all parameters.

    Referred to in S1 Document.

    (TIFF)

    S3 Fig. Approximate posterior means across multiple tolerance thresholds.

    Referred to in S1 Document.

    (TIFF)

    Attachment

    Submitted filename: Review_PCOMPBIOL-D-19-01505.pdf

    Attachment

    Submitted filename: Response to Reviewers_Submission.docx

    Attachment

    Submitted filename: Response to Reviewers_v2.docx

    Data Availability Statement

    All relevant data and code are found within the manuscript and its Supporting Information files. They are also made available at the following Github repository: https://github.com/Sheksta/bbtv-abc


    Articles from PLoS Computational Biology are provided here courtesy of PLOS

    RESOURCES