PLOS One. 2020 Mar 12;15(3):e0230254. doi: 10.1371/journal.pone.0230254

Forecasting severe grape downy mildew attacks using machine learning

Mathilde Chen 1,*, François Brun 2, Marc Raynal 3, David Makowski 4,5
Editor: Andrea Luvisi 6
PMCID: PMC7067461  PMID: 32163490

Abstract

Grape downy mildew (GDM) is a major disease of grapevine that affects both the yield of the vines and the quality of the harvested fruit. The disease is currently controlled by repeated fungicide treatments throughout the season, especially in the Bordeaux vineyards, where the average number of fungicide treatments against GDM was 10.1 in 2013. Reducing the number of treatments is a major issue from both an environmental and a public health point of view. One solution would be to identify vineyards that are likely to be heavily attacked in spring and then apply fungicide treatments only in these situations. To this end, we use a dataset including 9 years of GDM observations to develop and compare several generalized linear models and machine learning algorithms predicting the probability of high incidence and severity in the Bordeaux region. The algorithms tested use the date of disease onset and/or average monthly temperatures and precipitation as input variables. The accuracy of the tested models and algorithms is assessed by year-by-year cross-validation. LASSO, random forest and gradient boosting algorithms show better performance than generalized linear models. The date of onset of the disease has a greater influence on the accuracy of forecasts than weather inputs and, among weather inputs, precipitation has a greater influence than temperature. The best performing algorithm was selected to evaluate the impact of contrasting climate scenarios on GDM risk levels. Results show that the risk of GDM at bunch closure decreases with reduced rainfall and increased temperatures in April-May. Our results also show that the use of fungicide treatment decision rules that take into account local characteristics would reduce the number of treatments against GDM in the Bordeaux vineyards by at least 50% compared to current practices.

Introduction

Downy mildew is one of the most severe diseases of grapevines (Vitis vinifera). Plasmopara viticola, the pathogen responsible for this disease, is a heterothallic oomycete [1]. In autumn, winter eggs, called oospores, are produced. They overwinter in infected leaves that have fallen to the vineyard ground [2]. In spring, they germinate to form a macrosporangium, which releases zoospores [2,3]. Zoospores are disseminated by rain splashes to young vine organs (leaves, flowers or young bunches), where they germinate and penetrate through stomata, causing primary infection after 7 to 10 days of incubation [3]. Sporangia, borne by sporangiophores, then emerge from affected host tissues. They are spread by wind and rain splashes to green parts of the vines, where they release new zoospores produced by asexual reproduction, which can then infect healthy tissues (secondary infection). P. viticola damage to flowers and bunches leads to yield losses [2]. Leaf damage also reduces the sugar content, which lowers grape quality [4].

For economic and crop health reasons, applying fungicide treatments remains a very common practice to control grape downy mildew (GDM) [5]. Several resistant varieties have been developed [6], but they are still not used for the production of most of the more profitable wines, owing to the specifications of appellation regimes. Many microorganisms and botanicals have been tested as alternatives to synthetic chemical fungicides [7–10], but most of them have not yet been developed for commercial purposes [5], mainly because of their low and inconsistent efficacy in the vineyard.

Currently, many growers start spraying fungicides early in spring, and fungicide applications are then frequently repeated, about every two weeks in the Bordeaux region, a major vine-growing area [11]. A large number of fungicide treatments are therefore applied over the course of the growing season, with implications for the health of people living near vineyards [12] and of grape growers [13–15], for air [16], soil [17] and water [18] contamination, and for production costs, which are high [19]. In 2013, an average of 18.5 pesticide sprays were applied in Bordeaux vineyards, 52% of which were used to control GDM [11].

Predictions of disease outbreaks can assist farmers in decision-making for crop protection. Such predictions can be integrated into Decision Support Systems [20] or warning systems [21]. They could potentially be used in spring to estimate the incidence and severity of GDM at the end of the season. Based on model forecasts, growers could trigger fungicide applications only when the risk of GDM is high, avoiding unnecessary sprays. Models based on weather inputs can also be used to address longer-term issues, for example to forecast GDM outbreaks under different climate conditions and assess the potential severity of the disease in the future.

In the past, several approaches were used to predict GDM epidemics. Historically, statistical models were first developed in Germany [22], France [23–25], Switzerland [26], Italy [27] and Australia [28]. Statistical models are simple to implement and are able to predict the behavior of complex systems without making all functional mechanisms explicit [29]. Mechanistic models differ from traditional statistical models in that every stage of the development cycle of an organism must be translated into functions; their structure makes explicit hypotheses about the biological mechanisms that drive infection dynamics [30]. This type of model relies on the estimation of many parameters and requires a good knowledge of the biological mechanisms and of the impact of different environmental variables on these mechanisms. Such models were also developed to dynamically predict primary infections of P. viticola [31]. Machine learning algorithms are also increasingly used in agriculture [32,33]. Machine learning models provide predictions of outcomes of complex mechanisms by relating outputs to inputs using very flexible algorithms [34]. In grapevine, Vercesi et al. [35] used such algorithms to estimate the ability of P. viticola oospores to germinate. However, no study has been conducted to compare the performance of statistical and machine learning methods for predicting the occurrence of high disease levels in vineyards, particularly for GDM.

Model performance depends on the model equations, on the accuracy of the parameter estimates, and on the inputs used in the model. Since GDM development is influenced by several weather conditions such as rainfall and temperature [36–39], climatic variables are frequently used in predictive models [22,31,39,40]. Other input variables can be included in predictive models, such as crop cultivar or soil type. Field observations can also be used as input data for predictive models, e.g. in Savary et al. [41] and Delière et al. [42]. However, data collection can be time-consuming and costly.

In this study, we assess the ability of statistical models and machine learning algorithms to predict the occurrence of high GDM levels at the end of the season, which has never been done before. More specifically, we develop different statistical and machine learning models to predict the risk of high GDM incidence or severity on leaves and bunches at the end of the season, in untreated Bordeaux vineyards. The models tested are generalized linear models [43], regularized regression models (LASSO) [44], and two machine learning algorithms, i.e. gradient boosting [45] and random forests [46]. These models are implemented with three sets of inputs, namely field scouting observations, climate inputs, and both types of inputs. Model performance is assessed by cross-validation using a large dataset of observations collected over 9 years in 153 vineyards in the Bordeaux region. The most accurate models are then used to evaluate the potential reduction in the number of fungicide treatments achieved when model outputs are used to trigger fungicide applications. We also use our models to determine the impact of temperature increases and changes in precipitation on future GDM outbreaks.

Material and methods

Data

Grape downy mildew (GDM) incidence and severity data were collected in several vineyards located in the Bordeaux region by the French vine and wine extension service (Institut Français de la Vigne et du Vin, IFV). Data were collected with the agreement of the winegrowers. Different vineyards were included in the dataset each year. A site-year is a unique combination of vineyard site and year. The total number of site-years included in the dataset was 153. Each monitored site-year consisted of an untreated row of vines including from 6 to 165 plants, hereafter referred to as a “plot”. Each untreated row was surrounded by two other untreated rows, to ensure that it was not unintentionally sprayed with fungicide. In the monitored central row, weekly visual inspections were performed on vine stocks, leaves and bunches in order to assess disease incidence and severity. The proportions of vine stocks, leaves and bunches displaying symptoms (incidence) and the average percentages of necrotic area on leaves and bunches (severity) were recorded. In total, 1 to 19 visual inspections were conducted in each vineyard. Observations were conducted from budburst (i.e. week 10, early March) until at least bunch closing (i.e. week 29, mid-late July) or stopped when the proportions of infected vine stocks and bunches were close to 100%. For each plot, the levels of GDM (i) leaf incidence, (ii) bunch incidence, (iii) leaf severity and (iv) bunch severity at the end of the season were derived from the last epidemiological observations. Across plots, the median GDM incidence was 16.7% on leaves and 21.4% on bunches. The median GDM severity was 3.0% on leaves and 4.0% on bunches. Contrasting outbreaks were observed between the 2010 and 2018 seasons. For example, the disease incidence on leaves ranged from 0 to 54.8% in 2011 and from 2.2 to 100% in 2018 (Fig 1A).

Fig 1. Grape downy mildew (GDM) outbreaks temporal variability in monitored untreated plots.


(A) Last value of GDM incidence and severity on leaves and bunches after bunch closing in monitored plots. The color of each point represents the health status of the plot at the end of the season (green = last observation < median; red = last observation ≥ median). (B) Imputed disease onset dates in monitored plots. Median contamination levels and median disease onset date are represented by vertical dashed lines in panel (A) and panel (B), respectively. In both panels, the lower and upper hinges of the boxes correspond to the first and third quartiles (the 25th and 75th percentiles) and horizontal segments represent the range between min and max values.

Dates of GDM onset were estimated by analyzing incidence data on vine plants, i.e. the proportion of infected plants in a plot. GDM onset was defined as the first week in which the proportion of infected vine stocks exceeded 1%. The number of weeks between the first week of the year and this date was estimated for each plot by survival analysis in order to deal with censored data [47]. Censored GDM onset dates were found in 95 plots (right censored: 38.9%, left censored: 38.9% and interval censored: 22.1%). Censored data were imputed by a semi-parametric survival model [48] including the average rainfall between March and June as a covariate [47]. In plots where few observations were collected, the imputed dates of disease onset were close to the median onset date of the dataset, which corresponded to week 23, i.e. early-mid June. Between 2010 and 2018, the GDM onset date varied across years. For example, less than 25% of the monitored plots displayed symptoms by week 23 (i.e. early-mid June) in 2010, 2011 and 2016, whereas 100% of the plots were infected at this date in 2014, 2015 and 2018 (Fig 1B).
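The onset definition and the three kinds of censoring can be illustrated with a minimal Python sketch (the study itself was carried out in R). The function name `onset_interval`, the data layout and the exact rules used to label a plot as left, right or interval censored are assumptions made for illustration; the text above only states that onset is the first week in which more than 1% of vine stocks are infected. The sketch brackets the onset week and does not perform the survival-model imputation.

```python
import math

def onset_interval(visits, threshold=0.01):
    """Bracket the onset week from (week, infected_proportion) visits.

    Returns (lower, upper, kind), meaning onset lies in (lower, upper].
    Assumed rules (hypothetical, for illustration only):
      - "left": the first visit already exceeds the threshold;
      - "right": no visit exceeds the threshold;
      - "interval": more than one week separates the last visit at or
        below the threshold from the first visit above it;
      - "exact": those two visits are one week apart.
    """
    previous_week = None
    for week, proportion in sorted(visits):
        if proportion > threshold:
            if previous_week is None:
                return (-math.inf, week, "left")
            kind = "exact" if week - previous_week == 1 else "interval"
            return (previous_week, week, kind)
        previous_week = week
    return (previous_week, math.inf, "right")
```

For a plot visited in weeks 20 and 23 only, with symptoms above 1% seen at the second visit, the sketch returns the bracket (20, 23] labelled as interval censored; a model-based imputation would then pick a value inside that bracket.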

Climatic variables were computed from the SAFRAN database, produced by the French national meteorological service (Météo-France). SAFRAN data cover France in the form of an 8 by 8 km grid [49]. For each plot and each year, the mean amount of rainfall (in mm day⁻¹) and the mean temperature (in °C) were calculated for March, April, May and June from the weather data of the grid cell including the considered plot (Fig 2). Contrasting climatic conditions were observed between years. For example, the average precipitation measured in March 2012 and in March 2018 was 0.7 mm day⁻¹ and 3.8 mm day⁻¹, respectively (Fig 2A). For the same months, the average temperature was 11.0°C and 8.5°C, respectively (Fig 2B). Data used in this study are summarized in Supporting information (see S1 Data).

Fig 2. Climatic variability in March, April, May and June during the 2010–2018 period in the 153 untreated monitored plots.


(A) Mean monthly precipitation amount (in mm day⁻¹) and (B) mean monthly temperature (in °C). Median precipitation amount and median temperature are represented by vertical dashed lines in panel (A) and (B), respectively. In both panels, the lower and upper hinges of the boxes correspond to the first and third quartiles (the 25th and 75th percentiles) and horizontal segments represent the range between min and max values.

Models predicting occurrence of high levels of disease incidence and severity

Different models were considered to calculate the probability of reaching a high level of contamination at the bunch closing stage, i.e. a level higher than the median value reported in the dataset. Four types of output were considered in turn, i.e., incidence on leaves, incidence on bunches, severity on leaves and severity on bunches. For each output, four types of models were developed in version 3.5.1 of the statistical software R [50] and compared for predicting the occurrence of a high level of GDM (i.e., higher than the median): generalized linear models (binomial-logit models), further denoted as GLM [43], binomial LASSO regression [44], random forest [46] and gradient boosting [45]. Depending on the considered output, the predictions returned by these models correspond to predicted probabilities of high levels of incidence or severity of GDM on leaves or on bunches.
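The binary response used by all models can be sketched as follows in Python (the study used R). The function name `label_high` is hypothetical, and the strict inequality is an assumption based on the "higher than the median" wording; the legend of Fig 1 uses "≥ median" instead, so plots sitting exactly on the median may be classified differently in the original analysis.

```python
from statistics import median

def label_high(final_values):
    """Return 1 where the final observation exceeds the dataset median, else 0.

    `final_values` holds one end-of-season value per plot (for example
    leaf severity in %). Strict inequality is an assumption.
    """
    threshold = median(final_values)
    return [1 if value > threshold else 0 for value in final_values]
```

The same function would be applied separately to each of the four outputs, giving four binary response vectors.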

Three binomial-logit models were developed: one model including a single input, i.e. disease onset date; one model taking into account monthly average precipitation and temperature in March, April, May and June, i.e. eight weather inputs; and one model including both disease onset date and weather variables as inputs, i.e. nine input variables. The models were fitted to data using the glm R function. The most relevant inputs of the last two models were selected using a stepwise procedure based on the Akaike information criterion (AIC) implemented with the R function stepAIC from the MASS R package, version 7.3 [51].

LASSO regression, random forest and gradient boosting were first fitted using weather inputs only, and then using both weather inputs and the GDM onset date. LASSO regression (implemented here with a logit link) is a special type of regression model fitted using a penalty term that shrinks regression coefficients towards zero. Here, this model was fitted with version 2.0 of the R package glmnet [52] and the most relevant model inputs were selected by cross-validation using the R function cv.glmnet. Random forest and gradient boosting are ensemble learning algorithms; they are based on the combination of multiple simple learning algorithms to improve prediction performance. Random forest is a bagging method developed by Breiman [46]. The algorithm builds an ensemble of independent deep decision trees (500 in our study) from bootstrapped samples. Deep trees have low bias but high variance and, when combined, produce an output with lower variance. In the case of gradient boosting, an ensemble of successive shallow trees is built (100 in our study), such that each new tree predicts the residuals of the previous one. Random forest and gradient boosting were fitted using the R packages ranger, version 0.11.2 [53] and gbm, version 2.1.5 [54] to predict the occurrence of high levels of GDM.
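The idea that each new shallow tree is fitted to the residuals left by the previous ones can be sketched in plain Python. This is not the gbm implementation used in the study: it uses one-split trees (stumps), a squared-error loss instead of a classification loss, and a learning rate of 0.1, all of which are simplifying assumptions. The function names are hypothetical.

```python
def fit_stump(rows, residuals):
    """Return the one-split tree (feature, cut, left_mean, right_mean) that
    minimises squared error on the residuals, or None if no split exists."""
    best = None
    for j in range(len(rows[0])):
        for cut in sorted({row[j] for row in rows}):
            left = [r for row, r in zip(rows, residuals) if row[j] <= cut]
            right = [r for row, r in zip(rows, residuals) if row[j] > cut]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, cut, left_mean, right_mean)
    return None if best is None else best[1:]

def stump_predict(stump, row):
    j, cut, left_mean, right_mean = stump
    return left_mean if row[j] <= cut else right_mean

def boost(rows, y, n_trees=100, learning_rate=0.1):
    """Fit successive stumps, each one to the current residuals."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(rows, residuals)
        if stump is None:
            break
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, row)
                       for p, row in zip(predictions, rows)]
    return base, learning_rate, stumps

def boost_predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)

# Toy data (invented): onset week as single input, 1 = high disease level.
toy_rows = [[20], [21], [22], [25], [26], [27]]
toy_y = [1, 1, 1, 0, 0, 0]
toy_model = boost(toy_rows, toy_y)
```

On the toy data, early onset weeks end up with predictions close to 1 and late onset weeks close to 0. A random forest differs in that its trees are deep and grown independently on bootstrapped samples, then averaged.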

Model assessment and sensitivity analysis

The ability of the fitted models to predict the occurrence of a high level of GDM was assessed by year-by-year cross-validation using the area under the ROC curve as a measure of classification performance [55,56]. Here, we used a year-by-year cross-validation to account for the strong “year” effect on the disease intensity. As data collected during the same year in different plots are not independent, it was safer to remove all the data collected in a given year at each cross-validation step. This is equivalent to a group-wise cross-validation based on 9 groups corresponding to the 9 years of data included in the dataset.
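A year-by-year split amounts to leave-one-group-out cross-validation with the year as the group. A minimal Python sketch follows; the function name `year_by_year_folds` and the index-based interface are assumptions for illustration.

```python
def year_by_year_folds(years):
    """Yield (held_out_year, train_indices, test_indices), one fold per year.

    `years` gives the observation year of each plot, in dataset order.
    All plots of the held-out year go to the test set together.
    """
    for held_out in sorted(set(years)):
        train = [i for i, year in enumerate(years) if year != held_out]
        test = [i for i, year in enumerate(years) if year == held_out]
        yield held_out, train, test
```

With 9 distinct years this yields 9 folds; a model is refitted on each training set, and its predictions for the held-out year are pooled before the ROC analysis.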

A separate ROC analysis was conducted for each output and each model. The 153 plots were divided into two subgroups depending on whether the final disease observation (Y) was above the regional median value (Yt), computed for the 2010–2018 period, or less than or equal to this threshold. The probability of a high level of GDM, i.e. Y > Yt, was then estimated by each model for each plot in each subgroup. Let I denote the prediction of a given model in a given plot, i.e., the predicted probability of a high level of GDM in that plot. Each value of I was compared with a decision threshold It. The results were used to determine the true positive proportion (TPP) (number of plots with I > It in the subgroup defined by Y > Yt, divided by the total number of plots in this subgroup) and the true negative proportion (TNP) (number of plots with I ≤ It in the subgroup defined by Y ≤ Yt, divided by the total number of plots in this subgroup). TPP and TNP are estimates of P(I > It | Y > Yt) and P(I ≤ It | Y ≤ Yt) and are referred to as “sensitivity” and “specificity”, respectively. The ROC curve of a model is a graphical plot of sensitivity against 1-specificity. The values of TPP and TNP are calculated by allowing the decision threshold (It) to vary over the range of its possible values. A ROC curve that passes close to the point (0, 1) shows that the model gives satisfactory results in terms of sensitivity and specificity; such a model can then be used for decision-making with an appropriate choice of the decision threshold. A ROC curve that passes close to the straight line joining the points (0, 0) and (1, 1) shows that the model is non-informative (i.e. no better than a random decision). This approach is common in phytosanitary studies when the objective is to discriminate between low and high levels of infection [57–59].
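The definitions of TPP and TNP translate directly into code. The Python sketch below (function names are hypothetical; the study used the pROC R package) computes both proportions for one decision threshold and then sweeps the threshold to obtain the points of the empirical ROC curve.

```python
import math

def tpp_tnp(predicted, observed_high, decision_threshold):
    """Sensitivity (TPP) and specificity (TNP) at one decision threshold.

    predicted: model probabilities I, one per plot.
    observed_high: True where Y > Yt, False where Y <= Yt.
    """
    positives = [p for p, high in zip(predicted, observed_high) if high]
    negatives = [p for p, high in zip(predicted, observed_high) if not high]
    tpp = sum(p > decision_threshold for p in positives) / len(positives)
    tnp = sum(p <= decision_threshold for p in negatives) / len(negatives)
    return tpp, tnp

def roc_points(predicted, observed_high):
    """Sorted (1 - specificity, sensitivity) pairs over all thresholds."""
    thresholds = [-math.inf] + sorted(set(predicted))
    points = []
    for threshold in thresholds:
        tpp, tnp = tpp_tnp(predicted, observed_high, threshold)
        points.append((1.0 - tnp, tpp))
    return sorted(points)
```

The threshold of minus infinity classifies every plot as high risk, giving the point (1, 1); the largest predicted value classifies none as high risk, giving (0, 0).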

A useful summary of the overall accuracy of a model is the area under the ROC curve (AUC). AUC is expected to be equal to 0.5 for a non-informative model, and to 1 for a perfect model. TPP and TNP values were used to compute the sensitivity, specificity, and AUC for each output variable and each model. The computations were performed using the pROC package [60] in version 3.5.1 of the R statistical software.
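The AUC can also be read as the probability that a randomly chosen plot with a high disease level receives a higher predicted probability than a randomly chosen plot with a low level, counting ties as one half. This rank-based form equals the trapezoidal area under the empirical ROC curve. The Python sketch below uses it; the function name `auc` is hypothetical and this is not the pROC code used in the study.

```python
def auc(predicted, observed_high):
    """Area under the ROC curve via pairwise comparison of predictions.

    Each (high plot, low plot) pair scores 1 if the high plot has the
    larger prediction, 0.5 for a tie, 0 otherwise; the AUC is the mean.
    """
    positives = [p for p, high in zip(predicted, observed_high) if high]
    negatives = [p for p, high in zip(predicted, observed_high) if not high]
    score = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                score += 1.0
            elif p == n:
                score += 0.5
    return score / (len(positives) * len(negatives))
```

A model that gives every plot the same probability scores 0.5 on every pair, which matches the value expected for a non-informative model.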

For machine learning algorithms, inputs were ranked according to their importance. For random forests, the importance corresponds to the increase in the misclassification frequency after random permutations of the values of each input. The variables leading to the largest average increase of misclassification frequency are considered most important [46]. For gradient boosting algorithms, the average improvement of the loss function (generally the MSE for regression or the deviance for classification) made by each variable is computed. The variables with the largest average improvement, i.e. the relative influence, are considered most important [45].
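Permutation importance, as described for random forests, can be sketched generically in Python. The sketch assumes a fitted classifier exposed as a function `predict(row)` returning a class label, rows stored as lists, and an evaluation set supplied by the caller; Breiman's original procedure works on out-of-bag samples, which is not reproduced here. All names are hypothetical.

```python
import random

def misclassification_rate(predict, rows, labels):
    wrong = sum(predict(row) != label for row, label in zip(rows, labels))
    return wrong / len(rows)

def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean increase in misclassification rate after shuffling each input."""
    rng = random.Random(seed)
    baseline = misclassification_rate(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between input j and the labels
            shuffled = [row[:j] + [value] + row[j + 1:]
                        for row, value in zip(rows, column)]
            rate = misclassification_rate(predict, shuffled, labels)
            increases.append(rate - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

An input that the classifier never uses gets an importance of exactly zero, because shuffling it cannot change any prediction.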

A sensitivity analysis was conducted to explore the potential consequences of an increase of temperature and/or of a change in rainfall. The best model among the models including weather input variables (i.e., the model with the highest AUC) was used to compute the probability of high levels of GDM according to different climatic scenarios. The original temperature data were increased by +1°C, 2°C, 3°C or 4°C, successively. Rainfall was increased by +5%, 10% or 15%, and then decreased by the same levels. All combinations of temperature and rainfall changes were considered. The selected model was run using each climate scenario.
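The scenario grid can be enumerated as below. This Python sketch assumes that the unchanged baseline (+0°C, 0% rainfall change) is part of the grid, as Fig 6 suggests, and that weather inputs are stored in a dictionary whose keys start with `temp_` or `rain_`; both the key names and the function names are hypothetical.

```python
from itertools import product

TEMPERATURE_SHIFTS = [0, 1, 2, 3, 4]  # degrees C added; 0 is the baseline
RAINFALL_CHANGES = [-0.15, -0.10, -0.05, 0.0, 0.05, 0.10, 0.15]  # relative change

def climate_scenarios():
    """All combinations of temperature shift and relative rainfall change."""
    return list(product(TEMPERATURE_SHIFTS, RAINFALL_CHANGES))

def apply_scenario(weather, temperature_shift, rainfall_change):
    """Return a modified copy of one plot's monthly weather inputs."""
    changed = {}
    for name, value in weather.items():
        if name.startswith("temp_"):
            changed[name] = value + temperature_shift
        elif name.startswith("rain_"):
            changed[name] = value * (1.0 + rainfall_change)
        else:
            changed[name] = value  # e.g. onset date stays untouched
    return changed
```

The selected model would then be run on the modified inputs of every plot for each scenario, and the resulting probabilities summarised per scenario.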

A graphical summary of the modeling framework is presented in Fig 3.

Fig 3. Illustration of the modeling framework implemented in this study.


This modeling framework included 6 steps. (I) Extraction of epidemiological and climatic data from two different databases (model inputs and outputs are written in bold and in italic, respectively). (II) GDM onset date imputation by a semi-parametric survival model. (III) Model fitting. (IV) Model assessment based on a ROC analysis; models with higher area under the ROC curve (AUC) were selected. (V) Sensitivity analysis of the model outputs to weather inputs. (VI) Estimated reduction in GDM fungicide applications obtained by delaying the date of the first fungicide application compared to current practices in the Bordeaux vineyards; calculations were based on GLM model predictions using dates of appearance of GDM as inputs.

Number of fungicide treatments

The probability of severe attack was estimated for each plot as a function of the GDM onset date using the GLM model. We calculated the number of fungicide treatments against GDM resulting from the use of model output (i.e., predicted probability of high GDM) to trigger treatments. To do so, we assumed that fungicide treatments were applied only in plots where the probability of severe attack was higher than a predefined threshold and that vine growers apply fungicide every two weeks after the first treatment until late August (week 35) [61]. Several probability thresholds in the range 0–1 were considered successively, and the resulting number of treatments was calculated for each threshold and each plot. The average number of treatments over plots was finally calculated for each probability threshold. This number was compared to the mean number of GDM fungicide treatments applied in 2010 and 2013 according to the results of a survey conducted by the French Ministry of Agriculture’s Statistics and Prospective Service (SSP) [11].
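One possible reading of this counting rule is sketched below in Python. It assumes that the first spray happens in the onset week and that sprays are then counted every two weeks up to and including week 35; the text does not spell out these details, so the sketch is illustrative and is not claimed to reproduce the averages reported in the Results. Function names are hypothetical.

```python
def treatments_for_plot(onset_week, probability, threshold,
                        last_week=35, interval=2):
    """Number of sprays in one plot under the assumed decision rule.

    No spray if the predicted probability does not exceed the threshold;
    otherwise sprays in weeks onset, onset + 2, ... up to last_week.
    """
    if probability <= threshold or onset_week > last_week:
        return 0
    return int((last_week - onset_week) // interval) + 1

def mean_treatments(plots, threshold):
    """Average number of sprays over (onset_week, probability) pairs."""
    counts = [treatments_for_plot(week, p, threshold) for week, p in plots]
    return sum(counts) / len(counts)
```

Raising the threshold can only remove plots from the treated set, so the average number of sprays is non-increasing in the threshold under this rule.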

Results

Ability of the models to distinguish between high and low levels of disease

The best models according to AUC are those including the full set of inputs, i.e. both climate variables and the date of disease onset (Fig 4). The highest AUC (0.86) is obtained with gradient boosting for incidence on leaves. For the three other outputs, i.e. severity on leaves, incidence on bunches and severity on bunches, the AUC values of the best models are between 0.78 and 0.85.

Fig 4. Area under the ROC curve (AUC) of several models used to predict occurrence of high level of GDM incidence and severity on leaves and bunches.


Higher AUC represents higher model performance.

Model performance decreases when climate variables are omitted for predicting incidence and severity, but the AUC always remains very close to or slightly higher than 0.75 for all outputs. The omission of the date of disease onset from the set of inputs has a strong impact on the performance of the models. However, when this variable is omitted, the AUC of the best models remains higher than 0.70 for incidence on bunches and is even higher than 0.75 for incidence on leaves (AUC = 0.77). The decrease in AUC resulting from the omission of the date of disease onset is stronger for severity on leaves and for severity on bunches, for which the best AUC values are lower than 0.70 (0.67 and 0.65, respectively) (Fig 4).

Considering the models including climate inputs only (second row of Fig 4), gradient boosting shows the best performance for three of the four outputs (incidence on leaves, incidence on bunches, severity on bunches). The AUC values of random forest are, however, very close. The other types of models (LASSO, GLM with and without input selection) show contrasting results depending on the output. For example, LASSO has a high AUC for incidence on leaves but a very low AUC for severity on leaves. Highly variable AUC values are also obtained for GLM.

Considering the models including climate inputs plus date of disease onset (third row of Fig 4), gradient boosting has the highest AUC for two outputs, namely incidence and severity on leaves. Random forest, LASSO and GLM without selection also show good performance. Stepwise input selection substantially decreases AUC values, revealing that this selection procedure was unable to select relevant inputs. Better results are thus obtained with GLM without input selection.

Importance of the model inputs

The inputs are ranked according to their importance in Fig 5 (see also S1, S2 and S3 Figs).

Fig 5. Importance of the inputs used in random forest and gradient boosting models predicting the risk of high GDM severity on leaves at bunch closing stage.


Models presented in A and B include all inputs (date and climate) and models presented in C and D include climate inputs only. The importance metric reflects the gain in the model performance resulting from the use of each input.

The date of GDM onset is ranked first for all outputs (Figs 5A and 5B and S1, S2 and S3). This input has thus a stronger impact on model classifications than all the considered climate inputs. The difference of importance between the date of disease onset and the most important climate input is stronger for random forest than for gradient boosting but the date of disease onset is ranked first with both approaches.

The most important climate inputs are those related to precipitation in late spring, i.e. mean precipitation in May (for random forest and gradient boosting) and in June (for gradient boosting). The least important variables are the average temperature in March for random forest and mean temperature in May-June for gradient boosting. Mean temperature in March is also among the least influential variables for gradient boosting.

Influence of temperature and precipitation on the probability of high disease severity

The gradient boosting algorithm is selected here because, in most cases, this type of model was the most accurate among the tested models including climate inputs (Fig 4). This model is thus used here to analyze the sensitivity of the probability of high disease severity on leaves to monthly temperature and precipitation changes.

The results are presented in Fig 6. Each graphic in Fig 6 shows the effect of precipitation changes (from -15% to +15%) and temperature changes (from +1°C to +4°C) between the months of March and June. The effect of a fixed level of temperature increase during a given month, while keeping all other temperature variables unchanged, is represented month-by-month in S4 Fig. The probability of high severity on leaves shows an increasing trend as a function of precipitation increase (Fig 6). Considering the precipitation change alone, the median probability of high severity increases from 0.57 to 0.79 for +15% of precipitation and +0°C of temperature (Fig 6). The first and third quartiles computed over the set of vineyard plots follow a similar increasing trend. Symmetrically, the probability of high severity decreases when the level of precipitation is reduced. Thus, a 15% reduction in precipitation between March and June decreases the probability of high severity from 0.57 to 0.31. The strong sensitivity of the probability of high disease severity on leaves to precipitation is consistent with the increasing trend shown by the partial dependence plots displayed in Fig 7A and 7B for precipitation in May and June.

Fig 6. Probability of high severity on leaves according to different precipitation variations and to different levels of temperature increase between March and June.


Each graphic shows the effect of precipitation change (from -15% to +15%) for a fixed level of temperature increase (from +0°C to +4°C) on the predicted probability that GDM severity on leaves will be higher than the regional median at the end of the season. Probabilities are forecasted by a gradient boosting algorithm that includes all climatic features. Each boxplot represents the distribution of the probability values over the vineyard plots of our dataset; the shaded boxplot corresponds to initial precipitation and temperatures (precipitation and temperature kept unchanged compared to actual conditions) and the median probability obtained with this scenario is indicated by a red dotted line. The lower and upper hinges of the boxes correspond to the first and third quartiles (the 25th and 75th percentiles) and vertical segments represent the range between min and max values.

Fig 7. Partial dependence plots of the relationships between probability of high GDM severity on leaves and the four most important climate variables of gradient boosting (according to Fig 5D).


Each graph represents the marginal effect of one variable on the probability computed by the gradient boosting model. (A) Partial dependence plot for the average amount of rainfall in May (in mm/day). (B) Partial dependence plot for the average amount of rainfall in June (in mm/day). (C) Partial dependence plot for the mean temperature in April (in °C). (D) Partial dependence plot for the mean temperature in June (in °C).

Overall, the temperature effect is smaller. Increasing temperature in April and May tends to have a negative effect on the probability of high severity (S4 Fig). Thus, in May, the probability of high severity decreases from 0.57 (if the precipitation is kept unchanged) to 0.42 (at +3°C). Even in the case of a +15% increase in precipitation, the probability of high severity in May does not exceed its original value at +3°C. The effect of temperature in June is positive but small (the probability increased from 0.57 to 0.76 at +4°C). The sensitivity of the probability of high disease severity to temperature is consistent with the partial dependence plots obtained for temperature (Fig 7C and 7D); these plots reveal a decreasing trend in April and an increasing trend in June but with a plateau covering a large range of temperatures.

Potential reduction of GDM treatment

A late occurrence of GDM on vines resulted in a decrease in the probability of high incidence and severity. For example, the probability of high GDM severity on leaves computed by the GLM was higher than 0.75 in the case of early disease onset (before week 22, i.e. late May, early June) but lower than 0.5 when disease onset occurred after week 24 (95% CI = [23.2; 25]), i.e. mid-June (Fig 8). The probability of reaching high GDM incidence on leaves estimated by the GLM decreases from 0.92 (95% CI = [0.86; 0.99]) for a disease onset at week 19 (late May) to 0.5 (95% CI = [0.39; 0.61]) for a disease onset at week 24 (late June) (Fig 8). A similar decreasing trend was obtained with the other models, in particular with gradient boosting (Figs 8 and S5).
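A binomial-logit GLM with the onset week as its only input has the form p = 1 / (1 + exp(-(a + b × week))). The Python sketch below back-calculates an illustrative intercept a and slope b from the two point estimates quoted above for incidence on leaves (0.92 at week 19 and 0.5 at week 24). These are not the fitted coefficients of the study, which are not given in this section; the function names are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def line_through(week_1, p_1, week_2, p_2):
    """Intercept and slope of the logit-scale line through two points."""
    slope = (logit(p_2) - logit(p_1)) / (week_2 - week_1)
    intercept = logit(p_1) - slope * week_1
    return intercept, slope

def probability_high(week, intercept, slope):
    """Logistic response for a given onset week."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * week)))

# Illustrative values only, derived from the two reported point estimates.
intercept, slope = line_through(19, 0.92, 24, 0.5)
```

The slope is negative, so each additional week of delay in disease onset lowers the log-odds of a high disease level by the same amount.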

Fig 8. Response of probability of high severity on leaves to date of disease onset estimated with the GLM and its 95% confidence interval (in green), and partial dependence plot obtained with the gradient boosting algorithm including climate inputs and date of disease onset (in red).


Median, minimum, 1st and 3rd quartiles, and maximum of observed onset dates are represented by a dot and four crosses, respectively.

The probability of high disease severity and incidence computed as a function of disease onset date can be used to trigger fungicide treatments. We consider here a decision rule in which the first treatment is applied only (i) when disease symptoms are observed and (ii) when the probability of high disease incidence/severity estimated as a function of the date of disease onset exceeds a certain threshold. With this decision rule, when the threshold is zero, the first treatment is applied in a plot as soon as some disease symptoms are observed in that plot. If the threshold is set to a value higher than zero, only the plots exceeding the corresponding probability value will receive a treatment. The resulting average numbers of treatments are reported in Fig 9A. In this figure, the results are obtained with the probability of high disease severity on leaves, but very similar results are obtained with other response variables, i.e. probability of high disease incidence on leaves and on bunches, or the probability of high disease severity on bunches (see S6 Fig).

Fig 9. Impact of the model-based decision rule on GDM pesticide use in Bordeaux vineyards as a function of a predefined triggering probability threshold (probability of high GDM severity on leaves).

(A) The black curve indicates the average number of fungicide treatments in the vineyard plots of our dataset, computed assuming that the first treatment is triggered only when the GLM probability of high severity exceeds the value given on the x-axis. The blue line represents the number of treatments for threshold = 0, i.e. when the first treatment is applied in all plots as soon as GDM symptoms are detected. The red and orange lines correspond to the average numbers of treatments recorded by the SSP in 2013 and 2010, respectively. (B) Potential reduction of GDM treatments compared to other treatment scenarios, represented by the color of each curve, and computed according to the predefined triggering probability threshold. The blue curve represents the reduction induced by applying the decision rule compared to the strategy in which the first treatment is triggered at disease onset. The orange and red curves represent the treatment reduction compared to the results of the SSP surveys in 2010 and 2013, respectively.

According to this rule, triggering the first spray as soon as the model output exceeds 0 leads to 5.1 treatments against the disease, on average. Fig 9A shows that the number of treatments decreases substantially as the probability threshold increases. For example, triggering the first fungicide treatment when the probability of high GDM severity on leaves exceeds 0.5 leads to 3.7 treatments on average, a reduction of 1.4 applications on average compared to a systematic application at disease onset. Setting the probability threshold at 0.75 further reduces the average number of applications to 1.5 treatments.

The corresponding percentage reductions in treatments are shown in Fig 9B. With probability thresholds of 0.5 and 0.75, the number of fungicide applications against GDM was lower by 27.2% and 70.2%, respectively, compared to a systematic fungicide application at disease onset (Fig 9B).

This potential reduction of GDM treatments is even larger when compared to the current practices observed in Bordeaux vineyards. According to a survey conducted by the SSP, 7.9 and 10.1 fungicide treatments against GDM were applied on average in Bordeaux vineyards in 2010 and 2013, respectively [61]. Relative to the 2010 average, our model-based decision rule would reduce the number of treatments by 53.3% and 80.9% with probability thresholds of 0.5 and 0.75, respectively. These reductions reach 63.5% and 85.1% (with probability thresholds of 0.5 and 0.75, respectively) relative to the average number of treatments against GDM applied in the Bordeaux region in 2013 (Fig 9B).
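The percentage reductions quoted above follow from a simple ratio of average treatment counts. The sketch below recomputes them from the rounded averages given in the text (5.1, 3.7 and 1.5 treatments for the decision rule; 7.9 and 10.1 for the SSP surveys). The function and variable names are illustrative, and because the inputs are rounded, the output is expected to match the reported percentages only approximately.

```python
def percent_reduction(n_rule, n_reference):
    """Percentage reduction of n_rule relative to n_reference."""
    return 100.0 * (1.0 - n_rule / n_reference)


# Average numbers of treatments under the decision rule, by probability threshold.
rule = {0.5: 3.7, 0.75: 1.5}

# Reference strategies and their average numbers of treatments.
reference = {
    "treatment at onset (threshold 0)": 5.1,
    "SSP survey 2010": 7.9,
    "SSP survey 2013": 10.1,
}

for label, n_ref in reference.items():
    for threshold, n_rule in rule.items():
        reduction = percent_reduction(n_rule, n_ref)
        print(f"{label}, threshold {threshold}: {reduction:.1f}%")
```

The recomputed values agree with the reported ones to within about half a percentage point; the remaining gap presumably comes from the rounding of the averages used as inputs here.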

Discussion

In our analysis, we were able to relate the risk of high GDM incidence and severity on leaves and on bunches to the disease onset date and to climatic conditions in spring. Our study shows that the date of appearance of GDM has a greater influence on GDM infection levels than climate variables. An early onset date, i.e. before the end of May or early June, leads to a higher probability of a strong attack, while later infections, i.e. after the end of June, are associated with a lower risk. This result is consistent with those of Kennelly et al. [62], who showed that late infection reduced the severity of GDM on bunches due to the development of ontogenic berry resistance. In addition, GDM is a polycyclic disease, which means that early first infection increases the number of asexual cycles and the infection rates.

The reason for the strong influence of the date of disease onset probably lies in the fact that this variable already integrates many factors, in particular climatic factors. Indeed, Chen et al. [47] showed that the date of disease onset depends on spring precipitation. This is consistent with the fact that, among the climatic factors, we found that spring precipitation was the most influential. More generally, the climate conditions at the end of spring, and more particularly in May, were found to be decisive for the development of GDM in the Bordeaux vineyards.

Our analysis showed that a decrease in spring precipitation leads to a reduction in the risk of GDM. The development of oospores, the main source of inoculum for primary infection, is inhibited by dry periods in spring [38]. Precipitation is also necessary for the dispersion and survival of the GDM zoospores that cause infection in grape leaves, bunches and shoots [3]. The effect of temperature on GDM is more complex and depends on the period. Our results indicate that a temperature increase in late spring, i.e. June, tends to favor a high incidence or severity of GDM. On the contrary, an increase in temperature in April and May tends to reduce the risk of a serious attack of GDM. The positive effect of an increased temperature in June on disease incidence and severity is consistent with the results of Salinari et al. [63] and Rossi et al. [64], who found that high temperature accelerates oospore germination.

The results of our sensitivity analysis suggest that a climate change scenario characterized by a decrease in precipitation and an increase in temperature in the spring reduces the risk of a serious attack of GDM. These results are consistent with those of Launay et al. [65], who showed that a lower risk of infection with GDM can be expected in regions with oceanic climatic conditions, such as Bordeaux vineyards, in case of temperature increase and reduced leaf moisture duration. On the other hand, according to Salinari et al. [63], increased air temperature and reduced precipitation in Bordeaux vineyards would advance the first symptoms of GDM, which could lead to a more serious infection due to the polycyclic nature of the pathogen. It should be noted that Salinari et al. [63] considered a climate scenario in which the decrease in precipitation was insufficient to reduce the risk of GDM. Potentially, our models could thus be used to adapt grape disease management in different contexts of climatic change [66]. However, several factors are not taken into account by our statistical and machine learning tools. The high adaptive potential of plant pathogens under new climatic conditions is not considered in our study [67]. Climatic inputs were limited to aggregated precipitation and temperature conditions during spring only. Other climatic variables such as solar radiation, moisture and hydro-thermal time, which are used in other epidemiological models [31,40], are omitted from our models. Nevertheless, our approach is simple to implement as soon as disease observations are available. It could be applied to other pathogens for which such data are collected [68], such as septoria leaf blotch of wheat [69]. Our models can be easily updated with new observational data, providing an additional level of confidence to end users in terms of model accuracy [70].

Our models could also be used for another type of application: they could be integrated into decision support systems to reduce the number of fungicide treatments in Bordeaux vineyards, where more than 10 treatments are commonly sprayed to control GDM [11]. Forecasts of our models could be used to trigger treatment when the predicted risk of GDM at bunch closure is higher than a certain threshold, in order to avoid unnecessary fungicide treatments. Based on our model assessment, the most accurate model is the one including all features, i.e., both date of disease onset and climate inputs. However, this model poses some practical problems. As some of its climate inputs are not available before June, this model could be used in late spring only, i.e., after the start of the GDM epidemic. Since omitting the climate variables from the set of inputs only slightly reduces the performance of the models, we recommend using the version of the GLM that includes the date of disease onset as its only input variable. This model can be used as soon as the presence of GDM is detected and does not require any climate variables. A drawback of this approach is that it involves regular and frequent field scouting in order to determine the GDM onset date. In the future, sensors on drones may become available for automatic disease detection [71,72].

In Bordeaux vineyards, we show that more than 50% of the treatments against GDM could be avoided compared to current practices if GLM forecasts were used to trigger the first fungicide application. This result is based on the decision rule established in our study from GLM predictions. Following this rule, the first treatment is triggered if the predicted probability of a severe attack is higher than a given threshold. On average, the application of this decision rule (with a probability threshold equal to 0.5) resulted in a 53% and 63% reduction of the number of treatments against the disease, compared to the average number of treatments reported in the surveys conducted in the Bordeaux region in 2010 and 2013, respectively. Our results are consistent with several previous studies conducted in other major vine producing countries. Several warning systems were indeed developed to identify periods when conditions are favorable for GDM development (infection or sporulation), and to schedule necessary fungicide applications [21,73,74]. It was shown that the implementation of these tools could lead to a reduction in the number of fungicide applications compared to current practices. For example, the warning system developed by Caffi et al. [21] led to a median reduction of 54% of the number of fungicide applications, compared to standard schedules in Italian vineyards. Similar results were obtained by Pellegrini et al. [74] and Menesatti et al. [40].

The practicality of our approach should be assessed in close collaboration with farmers and agricultural extension services. The proposed approach requires field scouting, which is time- and labor-consuming for winegrowers, but in the future, observation costs could be strongly decreased by using automatic disease detection methods [75], such as image analysis [71] or airborne inoculum detection [76]. Furthermore, although delaying fungicide treatments can be perceived as risky by some grape growers, very few experiments have been conducted to support this statement. The study by Menesatti et al. [40] challenged this perception and showed that a strategy based on triggering fungicide application at GDM onset helped to control the disease effectively and to reduce the number of fungicide applications by almost half, compared to current control practices in Italian organic vineyards. However, as the number of available experimental studies is limited, new experiments covering a variety of agricultural and environmental conditions would be useful to assess more precisely the potential economic benefits and risks of this strategy. Crop insurance could also be offered to producers as a means of covering the GDM risk associated with the use of decision support tools delaying the first fungicide treatments [77]. Our approach could also serve as an insurance index to offer a market-based method of reducing the overuse or inefficient use of fungicides [77]. Although the systematic use of fungicide treatments currently appears to be an effective solution for controlling GDM, regulations on pesticide use may become more restrictive in the future, forcing grape growers to reduce their use of fungicides.

Supporting information

S1 Data. Model inputs.

(XLSX)

S1 Fig

Grape downy mildew (GDM) incidence data on leaves after bunch closing (A) and imputed disease onset dates (B) in 151 untreated plots. Median contamination levels and median disease onset date are represented by vertical dotted lines.

(PNG)

S2 Fig

Grape downy mildew (GDM) incidence data on bunches after bunch closing (A) and imputed disease onset dates (B) in 156 untreated plots. Median contamination levels and median disease onset date are represented by vertical dotted lines.

(PNG)

S3 Fig

Grape downy mildew (GDM) severity data on bunches after bunch closing (A) and imputed disease onset dates (B) in 152 untreated plots. Median contamination levels and median disease onset date are represented by vertical dotted lines.

(PNG)

S4 Fig. Probability of high severity on leaves according to different precipitation variations (rows) and to different levels of temperature increase between March and June (columns).

Each panel shows the effect of a precipitation change during a given period (from -15% to +15% between March and June, in April, in May, or in June) for a fixed level of temperature increase (from +0°C to +4°C between March and June) on the predicted probability that GDM severity on leaves will be higher than the regional median at the end of the season. Probabilities are forecasted by a gradient boosting algorithm that includes all climatic features. Each boxplot represents the distribution of the probability values over the vineyard plots of our dataset; the shaded boxplot corresponds to initial precipitation and temperatures (precipitation and temperature kept unchanged compared to actual conditions), and the median probability obtained with this scenario is indicated by a red dotted line. The lower and upper hinges of the boxes correspond to the first and third quartiles (the 25th and 75th percentiles), and the vertical segments represent the range between the minimum and maximum values.

(PNG)

S5 Fig. Response of probability of high incidence or severity on leaves or on bunches to date of disease onset estimated with the GLM and its 95% confidence interval (in green), and partial dependence plot obtained with the gradient boosting algorithm including climate inputs and date of disease onset (in red).

Median, minimum, 1st and 3rd quartiles, and maximum of observed onset dates are represented by a dot and four crosses, respectively.

(PNG)

S6 Fig. Number of fungicide treatments applied to control GDM in Bordeaux vineyards as a function of a predefined triggering probability threshold (probability of high GDM incidence or severity on leaves or on bunches).

The black curve indicates the average number of fungicide treatments in the vineyard plots of our dataset, computed assuming that the first treatment is triggered only when the GLM probability of high severity exceeds the value given on the x-axis. The blue line represents the number of treatments for threshold = 0, i.e. when the first treatment is applied in all plots as soon as GDM symptoms are detected. The red and orange lines correspond to the average numbers of treatments recorded by the SSP in 2013 and 2010, respectively.

(PNG)

Acknowledgments

We thank the French Vine and Wine Institute (Institut Français de la Vigne et du Vin) and its technical partners for collecting and providing us with access to their data and to the EPIcure web platform (https://www.vignevin-epicure.com). We also thank M. Vergnes for data collection coordination and C. Debord for database administration.

Data Availability

Data used in this study are summarized in Supporting information (see S1 Data).

Funding Statement

This work received funding from the French Ministry of Agriculture (CAS DAR, SMART-PIC project), the Institut Carnot Plant2Pro (project L-i-cite, see https://www.instituts-carnot.eu/en/carnot-institute/plant2pro) and from the Bordeaux Vine Council (CIVB). This work is part of the #DigitAg project (ANR-16-CONV-0004, see https://www.hdigitag.fr/en/who-are-we/). The work of D.M. was partly funded by the CLAND Institute of Convergence (16-CONV-0003, see https://cland.lsce.ipsl.fr/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Wong FP, Burr HN, Wilcox WF. Heterothallism in Plasmopara viticola. Plant Pathol. 2001 Aug;50(4):427–32.
  • 2. Dubos B. Maladies cryptogamiques de la vigne - Les champignons parasites des organes herbacés et du bois de la vigne. Féret. 2002. 208 pages. (Collection des Usuels Féret de la Vigne et du Vin).
  • 3. Gessler C, Pertot I, Perazzolli M. Plasmopara viticola: a review of knowledge on downy mildew of grapevine and effective disease management. Phytopathol Mediterr. 2011;50(1):3–44.
  • 4. Jermini M, Blaise P, Gessler C. Quantitative effect of leaf damage caused by downy mildew (Plasmopara viticola) on growth and yield quality of grapevine “Merlot” (Vitis vinifera). Vitis. 2010;(49):77–85.
  • 5. Pertot I, Caffi T, Rossi V, Mugnai L, Hoffmann C, Grando MS, et al. A critical review of plant protection tools for reducing pesticide use on grapevine and new perspectives for the implementation of IPM in viticulture. Crop Prot. 2017 Jul;97:70–84.
  • 6. Lacombe T, Audeguin L, Boselli M, Bucchetti B, Cabello F, Chatelet P, et al. Grapevine European catalogue: Towards a comprehensive list. Vitis - J Grapevine Res. 2011;50(2):65–8.
  • 7. Dagostin S, Schärer H-J, Pertot I, Tamm L. Are there alternatives to copper for controlling grapevine downy mildew in organic viticulture? Crop Prot. 2011 Jul;30(7):776–88.
  • 8. Liang C, Zang C, McDermott MI, Zhao K, Yu S, Huang Y. Two imide substances from a soil-isolated Streptomyces atratus strain provide effective biocontrol activity against grapevine downy mildew. Biocontrol Sci Technol. 2016 Oct 2;26(10):1337–51.
  • 9. Zhang X, Zhou Y, Li Y, Fu X, Wang Q. Screening and characterization of endophytic Bacillus for biocontrol of grapevine downy mildew. Crop Prot. 2017 Jun 1;96:173–9.
  • 10. Ghule MR, Sawant IS, Sawant SD, Sharma R, Shouche YS. Identification of Fusarium species as putative mycoparasites of Plasmopara viticola causing downy mildew in grapevines. Australas Plant Dis Notes. 2018 May 14;13(1):16.
  • 11. Service de la Statistique et de la Prospection. Enquête Pratiques culturales en viticulture 2013. Nombre de traitements phytosanitaires. 2015 Aug;(28).
  • 12. Kab S, Spinosi J, Chaperon L, Dugravot A, Singh-Manoux A, Moisan F, et al. Agricultural activities and the incidence of Parkinson’s disease in the general French population. Eur J Epidemiol. 2017 Mar;32(3):203–16. doi: 10.1007/s10654-017-0229-z
  • 13. Viel JF, Challier B. Bladder cancer among French farmers: does exposure to pesticides in vineyards play a part? Occup Environ Med. 1995 Sep 1;52(9):587–92. doi: 10.1136/oem.52.9.587
  • 14. Baldi I, Filleul L, Mohammed-Brahim B, Fabrigoule C, Dartigues J-F, Schwall S, et al. Neuropsychologic effects of long-term exposure to pesticides: results from the French Phytoner study. Environ Health Perspect. 2001;109(8):6.
  • 15. Baldi I, Gruber A, Rondeau V, Basler P, Brochard P, Fabrigoule C. Neurobehavioral effects of long-term exposure to pesticides: results from the 4-year follow-up of the PHYTONER Study. Occup Environ Med. 2011 Feb 1;68(2):108–15. doi: 10.1136/oem.2009.047811
  • 16. Shunthirasingham C, Oyiliagu CE, Cao X, Gouin T, Wania F, Lee S-C, et al. Spatial and temporal pattern of pesticides in the global atmosphere. J Environ Monit JEM. 2010;12(9):1650–7. doi: 10.1039/c0em00134a
  • 17. Imfeld G, Vuilleumier S. Measuring the effects of pesticides on bacterial communities in soil: A critical review. Eur J Soil Biol. 2012 Mar 1;49:22–30.
  • 18. Morrissey CA, Mineau P, Devries JH, Sanchez-Bayo F, Liess M, Cavallaro MC, et al. Neonicotinoid contamination of global surface waters and associated risk to aquatic invertebrates: A review. Environ Int. 2015 Jan 1;74:291–303. doi: 10.1016/j.envint.2014.10.024
  • 19. Butault J-P, Dedryver C-A, Gary C, Guichard L, Jacquet F, Meynard J-M, et al. Ecophyto R&D - Quelles voies pour réduire l’usage des pesticides. 2010;92.
  • 20. Rossi V, Salinari F, Poni S, Caffi T, Bettati T. Addressing the implementation problem in agricultural decision support systems: the example of vite.net®. Comput Electron Agric. 2014 Jan;100:88–99.
  • 21. Caffi T, Rossi V, Bugiani R. Evaluation of a warning system for controlling primary infections of grapevine downy mildew. Plant Dis. 2010;94(6):709–716. doi: 10.1094/PDIS-94-6-0709
  • 22. Hill GK. Simulation of P. viticola oospore-maturation with the model SIMPO. Simul P Vitic Oospore-Matur Model SIMPO. 2000;23(4):7–8.
  • 23. Stryzik S. Modèle d’état potentiel d’infection: application a Plasmopara viticola. Association de Coordination Technique Agricole; 1983.
  • 24. Tran Manh Sung C, Strizyk S, Clerjeau M. Simulation of the Date of Maturity of Plasmopara viticola Oospores to Predict the Severity of Primary Infections in Grapevine. Plant Dis. 1990;74(2):120–4.
  • 25. Magnien C, Jacquin D, Muckensturm N, Guillemard P. MILVIT: un modèle descriptif et quantitatif de la phase asexuée du mildiou de la vigne. Présentation et premiers résultats de validation. EPPO Bull. 1991 Sep 1;21(3):451–9.
  • 26. Blaise P, Gessler C. Vinemild: toward a management tool for grape downy mildew. Acta Hortic [Internet]. 1992 [cited 2019 Aug 8]. Available from: http://agris.fao.org/agris-search/search.do?recordID=US201301776106
  • 27. Orlandini S, Gozzini B, Rosa M, Egger E, Storchi P, Maracchi G, et al. PLASMO: a simulation model for control of Plasmopara viticola on grapevine. EPPO Bull. 1993;23(4):619–26.
  • 28. Magarey PA, Wachtel MF, Weir PC, Seem RC. A computer-based simulator for rational management of grapevine downy mildew (Plasmopara viticola). Plant Prot Q. 1991;6(1):29–33.
  • 29. Thakur AK. Model: Mechanistic vs Empirical. In: Rescigno A, Thakur AK, editors. New Trends in Pharmacokinetics [Internet]. Boston, MA: Springer US; 1991 [cited 2019 Aug 8]. p. 41–51. (NATO ASI Series). doi: 10.1007/978-1-4684-8053-5_3
  • 30. Lessler J, Cummings DAT. Mechanistic Models of Infectious Disease and Their Impact on Public Health. Am J Epidemiol. 2016 Mar 1;183(5):415–22. doi: 10.1093/aje/kww021
  • 31. Rossi V, Caffi T, Giosuè S, Bugiani R. A mechanistic model simulating primary infections of downy mildew in grapevine. Ecol Model. 2008 Apr;212(3–4):480–91.
  • 32. Liakos KG, Busato P, Moshou D, Pearson S, Bochtis D. Machine Learning in Agriculture: A Review. Sensors. 2018 Aug;18(8):2674.
  • 33. Jha K, Doshi A, Patel P, Shah M. A comprehensive review on automation in agriculture using artificial intelligence. Artif Intell Agric. 2019 Jun 1;2:1–12.
  • 34. Baker RE, Peña J-M, Jayamohan J, Jérusalem A. Mechanistic models versus machine learning, a fight worth fighting for the biological community? Biol Lett. 2018 May 31;14(5):20170660. doi: 10.1098/rsbl.2017.0660
  • 35. Vercesi A, Sirtori C, Vavassori A, Setti E, Liberati D. Estimating germinability of Plasmopara viticola oospores by means of neural networks. Med Biol Eng Comput. 2000 Jan;38(1):109–12. doi: 10.1007/bf02344698
  • 36. Rossi V, Giosuè S, Girometta B, Bugiani R. Influenza delle condizioni meteorologiche sulle infezioni primarie di Plasmopara viticola in Emilia-Romagna. Atti Giornate Fitopatol. 2002;263–70.
  • 37. Rouzet J, Jacquin D. Development of overwintering oospores of Plasmopara viticola and severity of primary foci in relation to climate. EPPO Bull. 2003 Dec;33(3):437–42.
  • 38. Rossi V, Caffi T. Effect of water on germination of Plasmopara viticola oospores. Plant Pathol. 2007 Dec 1;56(6):957–66.
  • 39. Caffi T, Rossi V, Bugiani R, Spanna F, Flamini L, Cossu A, et al. A model predicting primary infections of Plasmopara viticola in different grapevine-growing areas of Italy. J Plant Pathol. 2009;14.
  • 40. Menesatti P, Antonucci F, Costa C, Mandalà C, Battaglia V, la Torre A. Multivariate forecasting model to optimize management of grape downy mildew control. VITIS - J Grapevine Res. 2015 Mar 30;52(3):141–8.
  • 41. Savary S, Delbac L, Rochas A, Taisant G, Willocquet L. Analysis of Nonlinear Relationships in Dual Epidemics, and Its Application to the Management of Grapevine Downy and Powdery Mildews. Phytopathology. 2009 Aug;99(8):930–42. doi: 10.1094/PHYTO-99-8-0930
  • 42. Delière L, Cartolaro P, Léger B, Naud O. Field evaluation of an expertise-based formal decision system for fungicide management of grapevine downy and powdery mildews: Decision system for management of grapevine mildews. Pest Manag Sci. 2015 Sep;71(9):1247–57. doi: 10.1002/ps.3917
  • 43. Nelder JA, Wedderburn RWM. Generalized Linear Models. J R Stat Soc Ser Gen. 1972;135(3):370–84.
  • 44. Tibshirani R. Regression Shrinkage and Selection via the Lasso. J R Stat Soc Ser B Methodol. 1996;58(1):267–88.
  • 45. Friedman JH. Stochastic gradient boosting. Comput Stat Data Anal. 2002 Feb 28;38(4):367–78.
  • 46. Breiman L. Random Forests. Mach Learn. 2001 Oct 1;45(1):5–32.
  • 47. Chen M, Brun F, Raynal M, Makowski D. Timing of grape downy mildew onset in Bordeaux vineyards. Phytopathology. 2018 Oct 30;109(5):787–95.
  • 48. Anderson-Bergman C. icenReg: Regression Models for Interval Censored Data in R. J Stat Softw [Internet]. 2017 [cited 2017 Dec 21];81(12). Available from: http://www.jstatsoft.org/v81/i12/
  • 49. Le Moigne P. Description de l’analyse des champs de surface sur la France par le système SAFRAN [Internet]. 2002. Available from: https://www.researchgate.net/publication/235793825_Description_de_l'analyse_des_champs_de_surface_sur_la_France_par_le_systeme_SAFRAN
  • 50. R Core Team. R: A Language and Environment for Statistical Computing [Internet]. 2018. Available from: https://www.R-project.org/
  • 51. Venables WN, Ripley BD. Modern Applied Statistics with S. :504.
  • 52. Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw. 2010;33(1):1–22.
  • 53. Wright MN, Ziegler A. ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. J Stat Softw. 2017 Mar 31;77(1):1–17.
  • 54. Ridgeway G. Generalized Boosted Models: A guide to the gbm package. :15.
  • 55. Barbottin A, Makowski D, Le Bail M, Jeuffroy M-H, Bouchard C, Barrier C. Comparison of models and indicators for categorizing soft wheat fields according to their grain protein contents. Eur J Agron. 2008 Nov;29(4):175–83.
  • 56. Makowski D, Tichit M, Guichard L, Van Keulen H, Beaudoin N. Measuring the accuracy of agro-environmental indicators. J Environ Manage. 2009 May;90:S139–46. doi: 10.1016/j.jenvman.2008.11.023
  • 57. Makowski D, Taverne M, Bolomier J, Ducarne M. Comparison of risk indicators for sclerotinia control in oilseed rape. Crop Prot. 2005 Jun 1;24(6):527–31.
  • 58. Yuen J, Twengström E, Sigvald R. Calibration and verification of risk algorithms using logistic regression. Eur J Plant Pathol. 1996 Nov 1;102(9):847–54.
  • 59. Hughes, McRoberts, Burnett. Decision-making and diagnosis in disease management. Plant Pathol. 1999;48(2):147–53.
  • 60. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011 Mar 17;12(1):77.
  • 61. Service de la Statistique et de la Prospection. Pratiques culturales en viticulture 2013 - Réduire la dose, une pratique répandue pour les traitements fongicides [Internet]. 2016 Dec p. 8. (Agreste Primeur). Report No.: 343. Available from: http://agreste.agriculture.gouv.fr/IMG/pdf/primeur343.pdf
  • 62. Kennelly MM, Gadoury DM, Wilcox WF, Magarey PA, Seem RC. Seasonal development of ontogenic resistance to downy mildew in grape berries and rachises. Phytopathology. 2005;95(12):1445–1452. doi: 10.1094/PHYTO-95-1445
  • 63. Salinari F, Giosuè S, Tubiello FN, Rettori A, Rossi V, Federico S, et al. Downy mildew (Plasmopara viticola) epidemics on grapevine under climate change. Glob Change Biol. 2006 Jul;12(7):1299–307.
  • 64. Rossi V, Caffi T, Melandri M, Pradolesi G. Aggiornamenti sulla peronospora della vite. 2005;(2):38–56.
  • 65. Launay M, Caubel J, Bourgeois G, Huard F, Garcia de Cortazar-Atauri I, Bancal M-O, et al. Climatic indicators for crop infection risk: Application to climate change impacts on five major foliar fungal diseases in Northern France. Agric Ecosyst Environ. 2014 Dec;197:147–58.
  • 66. Newbery F, Qi A, Fitt BD. Modelling impacts of climate change on arable crop diseases: progress, challenges and applications. Curr Opin Plant Biol. 2016 Aug 1;32:101–9. doi: 10.1016/j.pbi.2016.07.002
  • 67. Garrett KA, Dendy SP, Frank EE, Rouse MN, Travers SE. Climate change effects on plant disease: genomes to ecosystems. Annu Rev Phytopathol. 2006;44:489–509. doi: 10.1146/annurev.phyto.44.070505.143420
  • 68. Sine M, Morin E, Simonneau D, Cosnac GD, Escriou H. VIGICULTURES–An early warning system for crop pest management. 2010;9.
  • 69. Michel L, Brun F, Makowski D. A framework based on generalised linear mixed models for analysing pest and disease surveys. Crop Prot. 2017 Apr;94:1–12.
  • 70. Olatinwo R, Hoogenboom G. Chapter 4 - Weather-based Pest Forecasting for Efficient Crop Protection. In: Abrol DP, editor. Integrated Pest Management [Internet]. San Diego: Academic Press; 2014 [cited 2019 Aug 8]. p. 59–78. Available from: http://www.sciencedirect.com/science/article/pii/B9780123985293000051
  • 71. Rieder R, Pavan W, Carré Maciel JM, Cunha Fernandes JM, Sarroglia Pinho M. A virtual reality system to monitor and control diseases in strawberry with drones: a project. In San Diego: Ames Daniel P., Quinn Nigel W.T. and Rizzoli Andrea E.; 2014. Available from: http://www.iemss.org/society/index.php/iemss-2014-proceedings
  • 72. Mahlein A-K. Plant Disease Detection by Imaging Sensors–Parallels and Specific Demands for Precision Agriculture and Plant Phenotyping. Plant Dis. 2015 Sep 1;100(2):241–51.
  • 73. Madden LV, Ellis MA, Lalancette N, Hughes G, Wilson LL. Evaluation of a Disease Warning System for Downy Mildew of Grapes. Plant Dis. 2000 May;84(5):549–54. doi: 10.1094/PDIS.2000.84.5.549
  • 74. Pellegrini A, Prodorutti D, Frizzi A, Gessler C, Pertot I. Development and evaluation of a warning model for the optimal use of copper in organic viticulture. J Plant Pathol. 2010;14.
  • 75. Martinelli F, Scalenghe R, Davino S, Panno S, Scuderi G, Ruisi P, et al. Advanced methods of plant disease detection. A review. Agron Sustain Dev. 2015 Jan 1;35(1):1–25.
  • 76. Thiessen LD, Keune JA, Neill TM, Turechek WW, Grove GG, Mahaffee WF. Development of a grower-conducted inoculum detection assay for management of grape powdery mildew. Plant Pathol. 2016 Feb;65(2):238–49.
  • 77. Norton M, Sprundel G-J van, Turvey CG, Meuwissen MPM. Applying weather index insurance to agricultural pest and disease risks. Int J Pest Manag. 2016 Jul 2;62(3):195–204.

Decision Letter 0

Andrea Luvisi

12 Dec 2019

PONE-D-19-22557

Forecasting severe grape downy mildew attacks using machine learning

PLOS ONE

Dear Dr. Mathilde Chen,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Both Reviewers indicate the need to improve the paper significantly.

We would appreciate receiving your revised manuscript by Jan 26 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Andrea Luvisi

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Additional Editor Comments (if provided):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This manuscript explores a machine-learning model for predicting Grape Downy Mildew incidence and severity with the hope that growers can use this information to reduce the number of fungicide applications. After testing several models and algorithms, the authors find that date of disease onset is the best predictor of GDM incidence and severity, along with some climatic factors. The authors present an in-depth look at constructing a model, but the practicality of the model they have produced is lost. They acknowledge that growers will most likely not adopt practices involving increased scouting or delayed spraying.

Spelling and grammar need to be corrected throughout.

113 and 114: Define site-year

115: 3 plants is very few. Is this data reliable? That is a very large range in plot size.

121: 1 to 57 is a very large range in number of site visits. Is this data reliable?

128: What do you mean by vine stocks?

147: Mislabeled as Fig. 1

226: Mislabeled as Fig. 2

272: Use of the term random forest twice

272: Mislabeled as Fig. 3

286: Mislabeled as Fig. 4

377: Use of attack is unusual – perhaps infection?

441: How this would reduce treatments by more than 50% is still unclear. This is a bold claim and I feel that it requires more detail so that readers can understand how the authors reached this number. It would be helpful to see an additional paragraph on how this number was derived.

Fig 1. Caption does not mention incidence and does not offer enough detail on analysis. Medium severity looks very close to 0 – figures should be formatted to better show the severity instead of using the same scale as incidence. There is no mention of severity in the text, only incidence (Line 126) – describe the results in the text.

Fig 2. Put labels in English. Lines are not distinguishable. What does this figure contribute to the manuscript?

Fig. 6. Provide details of statistical analysis. More detail required.

Fig. 7. This caption requires significantly more detail.

Fig. 9. This caption and figure are unclear to me yet are the basis for the claim that fungicide applications can be reduced by 50%. More detail and explanation is required so that we understand this claim fully, because it has big implications.

Reviewer #2: The authors study the ability of machine learning algorithms to predict the onset of GDM to reduce the number of fungicide treatments. The analysis is performed on an extensive data set spanning many years from the Bordeaux region. The problem is well motivated. The English is very good with only a few very minor mistakes. But, I have some open questions about the technical/ML approach. I believe the paper is at least a major revision.

Major:

Novelty is an important part of the publication process in general. Can the authors specifically comment on the novelty of this work? This is especially important because there are some cited works (36-39) that seem similar. It should not be up to the readers to infer the novelty of the work.

How is the GDM onset date encoded as an input/feature? It would be a date, which is a character string, it’s not obvious how a study would numerical-ize this for a machine learning problem, and the authors should state this.

I’m confused by the choice of ROC-AUC as a metric for measuring performance. The problem is a regression problem, and AUC is typically used for classification problems. Traditionally variations of squared error are used with regression problems. Is there a literature basis for using AUC for a regression problem? Currently, the set up for the experiment is this:

1) Regression analysis

2) Cast the problem as a classification problem by thresholding based on the median

3) Get AUC/ROC

Why do regression in the first place if this was the end goal?

I’m not sure I agree on thresholding the output based on greater/less than the median. This would only detect if a system is over/underpredicting GDM in a magnitude-less way. Isn’t the goal to get an accurate probability? You cannot measure error well if you do this.

Can the authors elaborate on Line 186 what they mean by ‘median’ here? Is this the median value of the plot as a whole, median of the year, etc.?

The outputs (incidence and severity) seem subjective because an expert would have to eyeball the spread of disease. I’m surprised the authors continued with some plots that had only 1 data inspection.

The abstract states the authors use a year-by-year cross validation, this isn't a conventional cross-validation method and needs to be explained. Yet, it is not in the methods section.

Very minor:

Methods section: It’s conventional to state the number of input values for your data set, the readers should not infer this from a later figure.

Line 53: I think this should be “Leaf” and not “Leaves”

Line 83: Is mechanistic models the right term for this? Probabilistic models still need to determine some function of a model.

Fig. 1: Box plots are helpful, but the authors should state the boundaries of the box. Is this the percentile? If so please state what they are.

Figure 2: Axes are in French.

Line 175: Technically, 100 trees is a parameter of the algorithm, this statement is not true in general.

Line 232: Sever attack -> severe attack

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Alberto C. Cruz, Ph.D.

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Mar 12;15(3):e0230254. doi: 10.1371/journal.pone.0230254.r002

Author response to Decision Letter 0


1 Feb 2020

Reviewer #1:

This manuscript explores a machine-learning model for predicting Grape Downy Mildew incidence and severity with the hope that growers can use this information to reduce the number of fungicide applications. After testing several models and algorithms, the authors find that date of disease onset is the best predictor of GDM incidence and severity, along with some climatic factors. The authors present an in-depth look at constructing a model, but the practicality of the model they have produced is lost. They acknowledge that growers will most likely not adopt practices involving increased scouting or delayed spraying.

Spelling and grammar need to be corrected throughout.

Answer: The proposed approach requires field scouting, which is time- and labor-consuming for winegrowers, but in the future, observation costs could be substantially reduced by using automatic disease detection methods (Martinelli et al., 2015), such as image analysis (Rieder et al., 2014) or airborne inoculum detection (Thiessen et al., 2016). These recent techniques are likely to reduce the cost of symptom detection. Furthermore, although delaying fungicide treatments can be perceived as risky by some grape growers, very few experiments have been conducted to support this statement. The study by Menesatti et al. (2015) challenged this perception and showed that a strategy based on triggering fungicide application at GDM onset helped to control the disease effectively and to reduce the number of fungicide applications by almost half, compared to current control practices in Italian organic vineyards. However, as the number of available experimental studies is limited, new experiments covering a variety of agricultural and environmental conditions would be useful to assess more precisely the potential economic benefits and risks of this strategy. Crop insurance could also be offered to producers as a means of covering the GDM risk associated with the use of decision support tools delaying the first fungicide treatments (Norton et al., 2016). Our approach could also serve as an insurance index, offering a market-based method of reducing the overuse or inefficient use of fungicides (Norton et al., 2016).

This is now specified in the discussion section of the revised manuscript. See lines 540 to 557.

113 and 114: Define site-year

Answer: Different vineyards were included in the dataset each year. A site-year is a unique combination of vineyard site and year. This is now specified in the revised manuscript. See lines 124 to 125.

115: 3 plants is very few. Is this data reliable? That is a very large range in plot size.

Answer: Plots including fewer than 6 plants (only two plots) were excluded from the dataset. This is now mentioned in line 127. The new dataset was used in the revised paper and the results were updated. The new results are almost identical to the original ones.

121: 1 to 57 is a very large range in number of site visits. Is this data reliable?

Answer: These values were updated. In total, 1 to 19 visual inspections were conducted in each vineyard. See line 133.

128: What do you mean by vine stocks?

Answer: In this study, dates of GDM onset were estimated by analyzing incidence data on vine plants, i.e. the proportion of infected plants in a plot. This is now clarified in the revised manuscript. See lines 144 to 145.

147: Mislabeled as Fig. 1

Answer: The Fig 1 caption was rewritten in accordance with the PLOS ONE author guidelines for figure captions. See lines 157 to 163.

226: Mislabeled as Fig. 2

Answer: The Fig 2 caption was rewritten in accordance with the PLOS ONE author guidelines for figure captions. See lines 175 to 180.

272: Use of the term random forest twice

Answer: The phrase was corrected. See line 335.

272: Mislabeled as Fig. 3

Answer: The Fig 3 caption was rewritten in accordance with the PLOS ONE author guidelines for figure captions. See lines 269 to 276.

286: Mislabeled as Fig. 4

Answer: The Fig 4 caption was rewritten in accordance with the PLOS ONE author guidelines for figure captions. See lines 323 to 324.

377: Use of attack is unusual – perhaps infection?

Answer: The phrase was rewritten. See line 458.

441: How this would reduce treatments by more than 50% is still unclear. This is a bold claim and I feel that it requires more detail so that readers can understand how the authors reached this number. It would be helpful to see an additional paragraph on how this number was derived.

Answer: In Bordeaux vineyards, we show that more than 50% of the treatments against GDM could be avoided compared to current practices if GLM forecasts were used to trigger the first fungicide application. This result is based on the decision rule established in our study from GLM predictions. Following this rule, the first treatment is triggered if the predicted probability of a severe attack is higher than a given threshold. On average, the application of this decision rule (with a probability threshold equal to 0.5) resulted in a 53% and 63% reduction in the number of treatments against the disease, compared to the average number of treatments reported in the surveys conducted in the Bordeaux region in 2010 and 2013, respectively. Our results are consistent with several previous studies conducted in other major vine-producing countries. This is now detailed in the discussion section of the revised manuscript. See lines 521 to 530.

Besides, more details on these results were added to the Result section of the revised paper. See lines 432 to 441 and the new Fig 9.

Fig 1. Caption does not mention incidence and does not offer enough detail on analysis. Medium severity looks very close to 0 – figures should be formatted to better show the severity instead of using the same scale as incidence. There is no mention of severity in the text, only incidence (Line 126) – describe the results in the text.

Answer: Fig 1 was improved. See the new Fig 1 and its caption (lines 157 to 163). Information on median GDM severity was added. See lines 140 to 141.

Fig 2. Put labels in English. Lines are not distinguishable. What does this figure contribute to the manuscript?

Answer: Axis labels were translated into English. Fig 2 was revised to give more information on annual variability in the climatic dataset. See new Fig 2.

Fig. 6. Provide details of statistical analysis. More detail required.

Answer: Results presented in the original Fig 6 were split between the new Fig 6 and the new S5 Fig. Each panel in the new Fig 6 shows the effect of changes in precipitation (from -15% to +15%) and temperature (from +1°C to +4°C) between the months of March and June. The effect of a fixed temperature increase during a given month, with all other temperature variables kept unchanged, is shown month by month in S5 Fig. Details were added to the caption of the new Fig 6. See lines 363 to 372.
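
As a rough illustration of the scenario grid described in this answer, the sketch below perturbs monthly weather inputs for March to June. The variable names, the baseline values and the step sizes (5 percentage points for precipitation, 1°C for temperature) are assumptions made for the example; they are not taken from the manuscript or from the authors' code.

```python
from itertools import product

MONTHS = ["mar", "apr", "may", "jun"]


def apply_scenario(weather, precip_change_pct, temp_delta_c):
    """Return a copy of the monthly inputs under one climate scenario.

    `weather` maps hypothetical names such as 'precip_mar' (mm) and
    'temp_mar' (deg C) to values for a single plot.
    """
    scenario = dict(weather)
    for month in MONTHS:
        # Precipitation is scaled by a relative change, temperature is shifted.
        scenario[f"precip_{month}"] = weather[f"precip_{month}"] * (1 + precip_change_pct / 100)
        scenario[f"temp_{month}"] = weather[f"temp_{month}"] + temp_delta_c
    return scenario


# Hypothetical baseline for one plot (illustrative numbers only).
baseline = {
    "precip_mar": 60.0, "precip_apr": 70.0, "precip_may": 80.0, "precip_jun": 50.0,
    "temp_mar": 10.0, "temp_apr": 12.5, "temp_may": 16.0, "temp_jun": 19.5,
}

precip_changes = range(-15, 16, 5)  # -15% to +15%, assumed 5-point steps
temp_deltas = range(1, 5)           # +1 deg C to +4 deg C, assumed 1-degree steps

scenarios = {
    (p, t): apply_scenario(baseline, p, t)
    for p, t in product(precip_changes, temp_deltas)
}
```

Each perturbed set of inputs would then be passed to the fitted model to obtain a predicted probability of high GDM level under that scenario.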

Fig. 7. This caption requires significantly more detail.

Answer: Details were added to the caption of Figure 7. See lines 384 to 389.

Fig. 9. This caption and figure are unclear to me yet are the basis for the claim that fungicide applications can be reduced by 50%. More detail and explanation is required so that we understand this claim fully, because it has big implications.

Answer: We consider here a decision rule in which the first treatment is applied only (i) when disease symptoms are observed and (ii) when the probability of high disease incidence/severity, estimated as a function of the date of disease onset, exceeds a certain threshold. The number of treatments depends on the selected probability threshold.
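
The two-condition rule stated above can be sketched in a few lines. The function name, the example plots and their predicted probabilities are hypothetical; only the two conditions and the thresholds 0.5 and 0.75 come from the response.

```python
def trigger_first_treatment(symptoms_observed, predicted_probability, threshold=0.5):
    """Apply the first treatment only if (i) symptoms are observed and
    (ii) the predicted probability of high incidence/severity exceeds the threshold."""
    return bool(symptoms_observed) and predicted_probability > threshold


def count_treated(plots, threshold):
    """Number of plots in which the first treatment would be triggered."""
    return sum(trigger_first_treatment(s, p, threshold) for s, p in plots)


# Hypothetical plots: (symptoms observed?, predicted probability of high severity).
plots = [(True, 0.20), (True, 0.55), (True, 0.80), (False, 0.90), (True, 0.60)]

treated_at_050 = count_treated(plots, 0.5)
treated_at_075 = count_treated(plots, 0.75)
```

Raising the threshold can only leave the number of treated plots unchanged or lower it, which is the qualitative behaviour the response describes for Fig 9A.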

New Fig 9A shows that the number of treatments decreases substantially as the probability threshold increases. For example, triggering the first fungicide treatment when the probability of high GDM severity on leaves exceeds 0.5 leads to 3.7 treatments on average, which corresponds to a reduction of 1.4 applications on average compared to a systematic application at disease onset. Setting the probability threshold at 0.75 further reduces the average number of applications, to 1.5 treatments.

The corresponding percentage reductions in treatments are shown in Fig 9B. With probability thresholds of 0.5 and 0.75, the number of fungicide applications against GDM was lower by 27.2% and 70.2%, respectively, compared to a systematic fungicide application at disease onset (new Fig 9B).

This potential reduction in GDM treatments is even larger when compared to the current practices observed in Bordeaux vineyards. According to the results of a survey conducted by the SSP, 7.9 and 10.1 fungicide treatments against GDM were applied on average in Bordeaux vineyards in 2010 and 2013, respectively (Service de la Statistique et de la Prospection, 2016). Compared to the average number of treatments reported for the Bordeaux region in 2010, our model-based decision rule would reduce the number of treatments by 53.3% and 80.9% with probability thresholds of 0.5 and 0.75, respectively. These reductions reach 63.5% and 85.1% (with probability thresholds of 0.5 and 0.75, respectively) when considering the average number of treatments against GDM sprayed in 2013 in the Bordeaux region (new Fig 9B).
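
The percentages in this paragraph can be approximately recomputed from the rounded means quoted in the response (3.7 and 1.5 treatments under the rule; 7.9 and 10.1 treatments in the 2010 and 2013 surveys). The sketch below assumes the reduction is simply one minus the ratio of the two means. Because the inputs are rounded, the results land within about 0.2 percentage points of the reported values rather than matching them exactly.

```python
def reduction_pct(n_model, n_reference):
    """Percentage reduction in treatments relative to a reference practice."""
    return 100.0 * (1.0 - n_model / n_reference)


# Mean number of treatments under the decision rule, by probability threshold
# (rounded values quoted in the response).
model_treatments = {0.5: 3.7, 0.75: 1.5}
# Mean number of treatments reported in the Bordeaux surveys (quoted in the response).
survey_treatments = {2010: 7.9, 2013: 10.1}

reductions = {
    (threshold, year): reduction_pct(n_model, n_survey)
    for threshold, n_model in model_treatments.items()
    for year, n_survey in survey_treatments.items()
}

for (threshold, year), pct in sorted(reductions.items()):
    print(f"threshold {threshold}, survey {year}: {pct:.1f}% fewer treatments")
```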

This is now detailed in the results section of the revised manuscript. See lines 420 to 441.

Fig 9 and its caption were improved in order to give more details on how we obtained this value. See new Fig 9 and lines 442 to 452.

________________________________________

Reviewer #2:

The authors study the ability of machine learning algorithms to predict the onset of GDM to reduce the number of fungicide treatments. The analysis is performed on an extensive data set spanning many years from the Bordeaux region. The problem is well motivated. The English is very good with only a few very minor mistakes. But, I have some open questions about the technical/ML approach. I believe the paper is at least a major revision.

Major:

Novelty is an important part of the publication process in general. Can the authors specifically comment on the novelty of this work? This is especially important because there are some cited works (36-39) that seem similar. It should not be up to the readers to infer the novelty of the work.

Answer: In this study, we assess the ability of statistical models and machine learning algorithms to predict the occurrence of high GDM levels at the end of the season, which has never been done before. More specifically, we develop different statistical and machine learning models to predict the risk of high GDM incidence or severity on leaves and bunches at the end of the season, in untreated Bordeaux vineyards. The models tested are generalized linear models (Nelder et al., 1972), regularized regression models (LASSO) (Tibshirani, 1996), and two machine learning algorithms, i.e. gradient boosting (Friedman, 2002) or random forests (Breiman, 2001). This is now clearly mentioned in the revised version. See lines 103 to 109.

How is the GDM onset date encoded as an input/feature? It would be a date, which is a character string, it’s not obvious how a study would numericalize this for a machine learning problem, and the authors should state this.

Answer: In this study, dates of GDM onset were estimated by analyzing incidence data on vine plants, i.e. the proportion of infected plants in a plot. GDM onset was defined as the first week in which the proportion of infected vine stocks exceeded 1%. The onset date is encoded numerically as the number of weeks between the first week of the year and this date, which was estimated for each plot by survival analysis in order to deal with censored data (Chen et al., 2018). This is now specified in the revised manuscript. See lines 144 to 148.
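
A minimal sketch of this encoding, for the simple case where weekly observations are available: the onset is the week number of the first inspection at which the infected proportion exceeds 1%. The use of ISO week numbering and the example observations are assumptions of this sketch, and the survival-analysis step that the authors use for censored plots is deliberately left out.

```python
from datetime import date


def week_of_year(day):
    """ISO week number of a date (the week-numbering convention is an assumption)."""
    return day.isocalendar()[1]


def onset_week(observations, threshold=0.01):
    """Week number of the first inspection whose infected proportion exceeds
    `threshold`, or None if the threshold is never exceeded.

    `observations` is a list of (date, proportion_of_infected_plants) pairs.
    """
    for day, proportion in sorted(observations):
        if proportion > threshold:
            return week_of_year(day)
    return None


# Hypothetical weekly inspections of one plot.
observations = [
    (date(2015, 5, 20), 0.000),
    (date(2015, 5, 27), 0.005),
    (date(2015, 6, 3), 0.040),
]
onset = onset_week(observations)
```

The resulting integer can be used directly as a numeric input to a binomial regression or a tree-based classifier.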

I’m confused by the choice of ROC-AUC as a metric for measuring performance. The problem is a regression problem, and AUC is typically used for classification problems. Traditionally variations of squared error are used with regression problems. Is there a literature basis for using AUC for a regression problem? Currently, the set up for the experiment is this:

1) Regression analysis (we fit a binomial regression suited to binary classification, so this is not a quantitative regression)

2) Cast the problem as a classification problem by thresholding based on the median

3) Get AUC/ROC

Why do regression in the first place if this was the end goal?

Answer: In this study, we do not perform any quantitative regression. We use binomial regression models (binomial-logistic GLM and binomial-LASSO) and classification methods (gradient boosting, random forest) to calculate the probability of reaching a high level of contamination at the bunch closure stage, i.e. a level higher than the median value reported in the dataset. The target variable is binary (1 for high disease level, 0 for low disease level) and indicates whether a high disease level was reached in each plot.

The ability of the fitted models to predict the occurrence of high levels of GDM was assessed by the area under the ROC curve (AUC). This criterion is commonly used to compare the performance of binomial regression models and machine learning classification methods. See, for example:

- Fawcett, T., 2006. An introduction to ROC analysis. Pattern Recognition Letters, 27(8):861–874.

- Bradley, A. P., 1997. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30, 1145–1159
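
To make the criterion concrete, the sketch below computes the AUC from binary labels and predicted probabilities using its rank interpretation: the fraction of (high, low) plot pairs in which the high-disease plot receives the higher predicted probability, with ties counted as one half. The labels and scores are made up for illustration, and this pairwise loop is only one of several equivalent ways of computing the AUC.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical plots: 1 = high disease level, 0 = low disease level.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.2, 0.3, 0.7]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to a model that ranks plots no better than chance, and 1.0 to a model that ranks every high-disease plot above every low-disease plot.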

I’m not sure I agree on thresholding the output based on greater/less than the median. This would only detect if a system is over/underpredicting GDM in a magnitude-less way. Isn’t the goal to get an accurate probability? You cannot measure error well if you do this.

Answer: In this study (as in many plant health studies), we do not attempt to quantitatively predict GDM incidence or severity values. Our objective is to estimate whether GDM attack will exceed a given threshold of incidence or severity at the end of the season. In this study, the considered threshold for each output is the regional median, computed from the 153 plots included in the dataset. However, our method is generic and other thresholds could be considered. This approach is common in phytosanitary studies when the objective is to discriminate between low and high levels of infection. For examples, see:

- Yuen J, Twengstrom E, Sigvald R, 1996. Calibration and verification of risk algorithms using logistic regression. European Journal of Plant Pathology 102, 847–54.

- Hughes, G., McRoberts, N., and Burnett, F.J., 1999. Decision-making and diagnosis in diseases management. Plant Pathology 48:147-153.

- Makowski, D., Taverne M., Bolomier J., Ducarne M., 2005. Comparison of risk indicators for sclerotinia control in oilseed rape. Crop Protection 24:527-531.

These references were included in the revised paper. See line 247.

Can the authors elaborate on Line 186 what they mean by ‘median’ here? Is this the median value of the plot as a whole, median of the year, etc.?

Answer: In this study, the term "median" refers to the regional median over all years included in the dataset. These medians are computed from the 153 monitored plots. This is now clarified in the revised manuscript. See lines 227 to 230.
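
A short sketch of how such a binary target can be built: a single median is computed over all plots and all years, and each plot is labelled 1 if its end-of-season value is strictly higher than that median. The severity values below are invented, and the strict inequality is an assumption based on the wording "higher than the median value".

```python
from statistics import median


def binarize_by_median(values):
    """Return the pooled median and a 0/1 label per value (1 = above the median)."""
    threshold = median(values)
    labels = [1 if v > threshold else 0 for v in values]
    return threshold, labels


# Hypothetical end-of-season leaf severity (%) pooled over all plots and years.
severity = [0.0, 2.0, 5.0, 10.0, 40.0]
threshold, high_severity = binarize_by_median(severity)
```

The same function could be applied separately to each of the four outputs (leaf and bunch incidence, leaf and bunch severity), giving one threshold per output.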

The outputs (incidence and severity) seem subjective because an expert would have to eyeball the spread of disease. I’m surprised the authors continued with some plots that had only 1 data inspection.

Answer: In this study, we considered the impact of GDM at the end of the season, i.e. after bunch closing. For each plot, the level of GDM (i) leaf incidence, (ii) bunch incidence, (iii) leaf severity and (iv) bunch severity at the end of the season were derived from the last epidemiologic observations.

In plots where a single observation was recorded, the inspection was carried out after the bunch closure stage, so this observation was relevant for estimating the GDM level at the end of the season. In such plots, the GDM onset date is, of course, censored. The GDM onset date is also censored in plots with several observations when these observations were not collected every week. The proportion of censored data in our dataset is thus relatively large (61.7%). However, this issue was already discussed in a previous study, which showed that censored dates of GDM onset could be imputed by survival analysis (Chen et al., 2018).

Here, censored dates of disease onset were thus imputed using a semi-parametric survival model (Anderson-Bergman, 2017) including the average rainfall between March and June as a covariate (Chen et al., 2018). In plots where few observations were collected, the imputed dates of disease onset were close to the median onset date of the dataset, which corresponds to week 23, i.e. early to mid-June. See lines 148 to 151 in the material and methods section of the revised manuscript.
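
The sketch below illustrates why irregular inspections lead to censored onset dates. It does not reproduce the semi-parametric survival model. It only derives, for one plot, the interval of weeks in which the onset must lie: after the last inspection at or below the 1% threshold and no later than the first inspection above it. The function name and the example visits are hypothetical.

```python
def onset_interval(visits, threshold=0.01):
    """Bounds on the onset week implied by a plot's inspections.

    `visits` is a list of (week, proportion_infected) pairs. Returns
    (lower, upper): onset lies after `lower` and no later than `upper`.
    `lower` is None when no inspection preceded the onset (left-censored);
    `upper` is None when the threshold was never exceeded (right-censored).
    """
    lower = None
    for week, proportion in sorted(visits):
        if proportion > threshold:
            return lower, week
        lower = week
    return lower, None


# Inspections four weeks apart: onset somewhere in weeks 21 to 24.
sparse_plot = onset_interval([(20, 0.0), (24, 0.10)])
# A single inspection after bunch closure: onset at or before week 30.
single_visit_plot = onset_interval([(30, 0.60)])
# Weekly inspections: onset observed exactly in week 23.
weekly_plot = onset_interval([(21, 0.0), (22, 0.005), (23, 0.03)])
```

Intervals of this kind are the usual input to interval-censored survival models, which then estimate an onset week for each plot.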

The abstract states the authors use a year-by-year cross validation, this isn't a conventional cross-validation method and needs to be explained. Yet, it is not in the methods section.

Answer: The ability of the fitted models to predict the occurrence of high levels of GDM was assessed by year-by-year cross-validation, using the area under the ROC curve as a measure of classification performance (Barbottin et al., 2008; Makowski et al., 2009). We used year-by-year cross-validation to account for the strong "year" effect on disease intensity. Because data collected in the same year in different plots are not independent, it was safer to remove all the data collected in a given year at each cross-validation step. This is equivalent to a group-wise cross-validation based on 9 groups corresponding to the 9 years of data included in our dataset. This is now specified in the material and methods section of the revised manuscript. See lines 219 to 226.
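
The splitting scheme described here can be sketched as a leave-one-year-out generator: at each step, every plot observed in one year is held out for testing and the model is fitted on the remaining years. The function name and the toy list of years are illustrative assumptions; the authors' own implementation is not shown in the response.

```python
def leave_one_year_out(years):
    """Yield (held_out_year, train_indices, test_indices), one fold per distinct year."""
    for held_out in sorted(set(years)):
        train = [i for i, y in enumerate(years) if y != held_out]
        test = [i for i, y in enumerate(years) if y == held_out]
        yield held_out, train, test


# Hypothetical year of observation for six site-years.
years = [2010, 2010, 2011, 2012, 2012, 2012]
folds = list(leave_one_year_out(years))

for held_out, train, test in folds:
    print(f"held-out year {held_out}: train on {train}, test on {test}")
```

With 9 years of data this gives 9 folds. Predictions for the held-out years can then be scored with the AUC, so that no plot is ever evaluated by a model that saw other plots from the same year.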

Very minor:

Methods section: It’s conventional to state the number of input values for your data set, the readers should not infer this from a later figure.

Answer: OK. See lines 194 to 198.

Line 53: I think this should be “Leaf” and not “Leaves”

Answer: OK. See line 53.

Line 83: Is mechanistic models the right term for this? Probabilistic models still need to determine some function of a model.

Answer: Mechanistic models differ from traditional statistical models by the need to translate every stage of the development cycle of an organism as functions; their structure makes explicit hypotheses about the biological mechanisms that drive infection dynamics (Lessler et al., 2016). This type of model relies on the estimation of many parameters and requires a good knowledge of the biological mechanisms and of the impact of different environmental variables on these mechanisms. See lines 83 to 88.

Fig. 1: Box plots are helpful, but the authors should state the boundaries of the box. Is this the percentile? If so please state what they are.

Answer: The lower and upper hinges of the boxes correspond to the first and third quartiles (the 25th and 75th percentiles), and the horizontal segments represent the range between the minimum and maximum values. See the new caption of Fig 1, lines 157 to 163.

Figure 2: Axes are in French.

Answer: The text was translated into English. See new Fig 2.

Line 175: Technically, 100 trees is a parameter of the algorithm, this statement is not true in general.

Answer: This paragraph was modified. See line 211.

Line 232: Sever attack -> severe attack

Answer: Ok. See line 284.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Andrea Luvisi

26 Feb 2020

Forecasting severe grape downy mildew attacks using machine learning

PONE-D-19-22557R1

Dear Dr. Chen,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Andrea Luvisi

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Line 130-131 and in Figure 3: The revised manuscript still uses the term ‘grape stock’, which is not a commonly used term in viticulture. Are the authors referring to trunks? Canes? Shoots?

Reviewer #2: The manuscript now clearly states the novelty, and it is an impactful study on the ability to advise growers on fungicide treatments. The greatest strength of the manuscript is its discussion: rarely do statistical/machine learning works go into depth about “why” the model works. I have a few minor points, but I believe the manuscript should be accepted (with no more than a minor revision).

It’s clear now that the problem is a binary classification task. The AUC is indeed appropriate for this, but a statistical/machine learning problem *must* provide more than just the AUC. With R, it should be trivial to provide the additional metrics: true positive rate, false positive rate, positive predictive value, F1 score, confusion matrix, etc. (average over year-folds?). In particular, positive predictive value will indicate the promise of this work as a diagnostic tool, whereas AUC is more of a measurement of classification performance. Is it possible to also provide the ROC graphs for each method as well?
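The metrics named above can be illustrated with a minimal sketch. It is written in Python rather than the R used for the study, and it is not the authors' code: the function name `binary_metrics`, its arguments, and the 0.5 cut-off are assumptions made for illustration only. It derives the confusion matrix from observed binary outcomes (1 = high incidence or severity) and predicted probabilities, then computes the true positive rate, false positive rate, positive predictive value and F1 score.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int],
                   y_prob: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Confusion-matrix metrics for a binary classifier.

    y_true    -- observed classes (1 = positive, 0 = negative)
    y_prob    -- predicted probabilities of the positive class
    threshold -- hypothetical cut-off used to turn probabilities into classes
    """
    tp = fp = tn = fn = 0
    for observed, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and observed == 1:
            tp += 1
        elif predicted == 1 and observed == 0:
            fp += 1
        elif predicted == 0 and observed == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num: float, den: float) -> float:
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "tpr": ratio(tp, tp + fn),            # sensitivity / recall
        "fpr": ratio(fp, fp + tn),            # 1 - specificity
        "ppv": ratio(tp, tp + fp),            # precision
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "prevalence": ratio(tp + fn, tp + fp + tn + fn),  # a-priori rate
    }


# Toy data for one held-out year (made up, not taken from the study).
metrics = binary_metrics([1, 1, 0, 0, 1, 0], [0.9, 0.4, 0.6, 0.2, 0.8, 0.1])
```

Under year-by-year cross validation, one option would be to call such a function once per held-out year and then average the results over the folds.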

Can the authors explicitly provide the a-priori rates for the classification task (preferably year-by-year)?

Line 149: How did the authors choose a 1% infection rate? Is this based on some prior work, or is it a parameter?

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Alberto C Cruz

Acceptance letter

Andrea Luvisi

2 Mar 2020

PONE-D-19-22557R1

Forecasting severe grape downy mildew attacks using machine learning

Dear Dr. Chen:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Andrea Luvisi

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Data. Models inputs.

    (XLSX)

    S1 Fig

    Grape downy mildew (GDM) incidence data on leaves after bunch closing (A) and imputed disease onset dates (B) in 151 untreated plots. Median contamination levels and median disease onset date are represented by vertical dotted lines.

    (PNG)

    S2 Fig

    Grape downy mildew (GDM) incidence data on bunches after bunch closing (A) and imputed disease onset dates (B) in 156 untreated plots. Median contamination levels and median disease onset date are represented by vertical dotted lines.

    (PNG)

    S3 Fig

    Grape downy mildew (GDM) severity data on bunches after bunch closing (A) and imputed disease onset dates (B) in 152 untreated plots. Median contamination levels and median disease onset date are represented by vertical dotted lines.

    (PNG)

    S4 Fig. Probability of high severity on leaves according to different precipitation variations (lines) and to different levels of temperature increase between March and June (columns).

    Each graphic shows the effect of precipitation change during a given period (from -15% to +15% between March and June, in April, in May, or in June) for a fixed level of temperature increase (from +0°C to +4°C between March and June) on predicted probability that GDM severity on leaves will be higher than regional median at the end of the season. Probabilities are forecasted by a gradient boosting algorithm that includes all climatic features. Each boxplot represents the distribution of the probability values over the vineyard plots of our dataset; the shaded boxplot corresponds to initial precipitation and temperatures (precipitation and temperature kept unchanged compared to actual conditions) and the median probability obtained with this scenario is indicated by a red dotted line. The lower and upper hinges of the boxes correspond to the first and third quartiles (the 25th and 75th percentiles) and vertical segments represent the range between min and max values.

    (PNG)

    S5 Fig. Response of probability of high incidence or severity on leaves or on bunches to date of disease onset estimated with the GLM and its 95% confidence interval (in green), and partial dependence plot obtained with the gradient boosting algorithm including climate inputs and date of disease onset (in red).

    Median, minimum, 1st and 3rd quartiles, and maximum of observed onset dates are represented by a dot and four crosses, respectively.

    (PNG)

    S6 Fig. Number of fungicide treatments applied to control GDM in Bordeaux vineyards as a function of a predefined triggering probability threshold (probability of high GDM incidence or severity on leaves or on bunches).

    The black curve indicates the average number of fungicide treatments in the vineyard plots of our dataset, computed while assuming that the first treatment is triggered only when the GLM probability of high severity exceeds the value given on the x-axis. The blue line represents the number of treatments for threshold = 0, i.e. when the first treatment is applied in all plots as soon as GDM symptoms are detected. Red and orange lines correspond to the average numbers of treatments recorded by the SSP in 2013 and 2010, respectively.

    (PNG)

    Data Availability Statement

    Data used in this study are summarized in Supporting information (see S1 Data).


    Articles from PLoS ONE are provided here courtesy of PLOS