PLOS Neglected Tropical Diseases. 2020 Apr 8;14(4):e0008185. doi: 10.1371/journal.pntd.0008185

One mean to rule them all? The arithmetic mean based egg reduction rate can be misleading when estimating anthelminthic drug efficacy in clinical trials

Wendelin Moser 1,2, Jennifer Keiser 1,2, Benjamin Speich 3,4, Somphou Sayasone 2,5,6, Stefanie Knopp 2,5, Jan Hattendorf 2,5,*
Editor: Matthew C Freeman 7
PMCID: PMC7170292  PMID: 32267856

Abstract

Animal and human helminth infections are highly prevalent around the world, with only few anthelminthic drugs available. The anthelminthic drug performance is expressed by the cure rate and the egg reduction rate. However, which kind of mean should be used to calculate the egg reduction rate remains a controversial issue. We visualized the distributions of egg counts of different helminth species in 7 randomized controlled trials and asked a panel of experts about their opinion on the egg burden and drug efficacy of two different treatments. Simultaneously, we calculated infection intensities and egg reduction rates using different types of means: arithmetic, geometric, trimmed, winsorized and Hölder means. Finally, we calculated the agreement between expert opinion and the different means. We generated 23 different trial arm pairs, which were judged by 49 experts. Among all investigated means, the arithmetic mean showed poorest performance with only 64% agreement with expert opinion (bootstrap confidence interval [CI]: 60−68). Highest agreement of 94% (CI: 86−96) was reached by the Hölder mean M0.2, followed by the geometric mean (91%, CI: 85−94). Winsorized and trimmed means showed a rather poor performance (e.g. winsorization with 0.1 cut-off showed 85% agreement, CI: 78−87), but they performed reasonably well after excluding treatment arms with a small number of patients. In clinical trials with moderate sample size, the currently recommended arithmetic mean does not necessarily rank anthelminthic efficacies in the same order as might be obtained from expert evaluation of the same data. Estimates based on the arithmetic mean should always be reported together with an estimate, which is more robust to outliers, e.g. the geometric mean.

Author summary

Besides cure rates, egg reduction rates represent an important indicator of anthelminthic drug efficacy in clinical trials. However, there is an ongoing controversy whether the arithmetic or the geometric mean should be used for its calculation. The arithmetic mean is problematic in skewed distributions mainly because the mean is sensitive to outliers, whereas the geometric mean does not correspond to our intuitive interpretation of average reduction. Several studies tried to compare the performance of different means but they relied on assumptions, which favored one approach over another. Despite the ongoing debate, the World Health Organization (WHO) recommends the arithmetic mean to calculate egg reduction rates. To overcome limitations from previous studies, we visualized data from several clinical trials and asked a panel of experts to compare drug efficacy of two different treatments. Afterwards, we estimated efficacy by using different means. Finally, we calculated the raw agreement between expert opinion and the different means. From all investigated methods to calculate efficacy, the arithmetic mean showed the poorest performance in terms of agreement with expert opinion. In anthelminthic human drug trials, which are characterized by small sample size and non-adherence, estimates more robust to outliers should be reported to assess drug efficacy performance.

Introduction

Helminths, including cestodes, nematodes and trematodes, infect a large number of humans and animals. Among humans, helminth infections are highly prevalent: for example, 1.5 billion people are infected with soil-transmitted helminths (STHs; Ascaris lumbricoides, hookworm and Trichuris trichiura) [1], 240 million with schistosomes [2] and 120 million with lymphatic filaria [3]. In livestock production, helminth infections are responsible for decreased productivity, which leads to economic losses for farmers [4]. To control human helminth infections, the World Health Organization’s (WHO) goal is to reduce the burden caused by moderate and heavy infections by increasing the coverage of anthelminthic drugs within so-called preventive chemotherapy programs, i.e. annual or biannual mass treatment of high-risk populations [5]. Anthelminthic resistance has been observed widely in veterinary medicine [6–8]; therefore, emergence of resistance in humans is likely [9,10]. Hence, it is crucial to closely observe the anthelminthic drug efficacy in order to detect resistance development [11,12].

From a clinical medicine point of view, cure rates (CRs) are usually of primary interest; however, from a public health perspective and for monitoring drug resistance, egg reduction rates (ERRs) are often more appropriate than CRs [13] and are therefore commonly used in human and exclusively used in veterinary medicine [12,14]. The ERR is defined as the relative reduction in the group mean egg output after treatment compared to pre-treatment levels. For estimating the ERR, two types of means (or, more precisely, measures of central tendency) are exclusively used: the arithmetic mean and the geometric mean. Both means have strengths and weaknesses, triggering an ongoing debate among researchers and disease control specialists about which measure to prefer. One main disadvantage of the arithmetic mean is the influence of outliers. An example was reported by Speich and colleagues [15]: one extreme outlier resulted in a decrease in ERR from 93% to 73%. In addition, the arithmetic mean is not in close proximity to most of the observations in skewed distributions. To reduce the influence of outliers in skewed parasite data, the geometric mean is commonly used. Its disadvantages include the assumption of homogeneity of the variance between the compared groups [16,17] and the arbitrary choice of the constant for taking the logarithm of zero egg counts at follow-up [18]. The current WHO guidelines recommend the use of the arithmetic mean for calculating ERRs [13].
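The ERR definition and the outlier problem described above can be illustrated with a minimal Python sketch. The egg counts are hypothetical and chosen only for round numbers; they are not taken from any of the trials, and the function name `egg_reduction_rate` is an assumption made for this illustration.

```python
from statistics import mean


def egg_reduction_rate(baseline, follow_up, average=mean):
    """ERR in percent: relative reduction of the group mean after treatment."""
    return 100.0 * (1.0 - average(follow_up) / average(baseline))


# Hypothetical egg counts (eggs per gram of stool) for ten participants.
baseline = [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 1000]
follow_up = [0, 0, 0, 0, 0, 50, 50, 100, 100, 200]

# Same arm, but the last participant barely responds to treatment.
follow_up_outlier = follow_up[:-1] + [2200]

err_clean = egg_reduction_rate(baseline, follow_up)            # about 95%
err_outlier = egg_reduction_rate(baseline, follow_up_outlier)  # about 75%
print(f"ERR without outlier: {err_clean:.1f}%")
print(f"ERR with one outlier: {err_outlier:.1f}%")
```

In this toy arm a single participant shifts the arithmetic mean based ERR by 20 percentage points, which is the kind of sensitivity the paragraph refers to.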

Several researchers have continued to identify the most appropriate method for calculating ERRs using empirical data or computer simulations. However, the methods used were based on assumptions about true efficacy or egg distribution, which favored one specific mean over another [17–20]. For instance, if we define the performance of a mean as the unbiased estimation of the relative egg reduction in the population, the arithmetic mean will outperform the geometric mean in any distribution without extreme values. Conversely, if we define performance as sensitivity to outliers, range of the confidence interval or proximity to the median, the geometric mean will always show a better performance.

This study applied a new approach to assess the performance of different means to calculate ERRs. To overcome previous shortcomings, we visualized the distributions of egg counts of different helminth species in selected randomized controlled trials and asked a panel of experts about their opinion on the egg burden and drug efficacy of two different treatments. Afterwards, we calculated means and ERRs based on different types of means and assessed their agreement with the expert opinion. Of note, we used for this study exclusively data from human drug trials with small to moderate sample size (range 13 to 140 participants per arm). The results should not be extrapolated to other scenarios.

Methods

The methods can be divided into four main steps: i) gathering and preparing data from previously conducted randomized multi-arm anthelminthic drug trials and dividing the trial arms into pairs, ii) visualizing the egg count distributions and asking experts for their opinion, which one of the two trial arms has a higher egg burden (before and after treatment) and better drug efficacy, iii) calculating mean egg counts at baseline and follow-up and ERRs of each trial arm using different types of means, and iv) assessing the performance of each type of mean according to their proportional agreement with the experts. The steps are summarized in Fig 1.

Fig 1. Illustration of the study design.


Data preparation

Seven clinical drug trials against helminths with a total of 33 study arms, for which individual patient level data was available in-house, were used for generating the questionnaires for experts [15, 21–25]. If efficacy was reported for more than one helminth species, all species were included, resulting in a total of 46 arms. Stratified by study and species, the trial arms were ordered according to arithmetic mean infection intensity and grouped into consecutive pairs.
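The pairing step can be sketched as follows. This is an illustration under assumptions, not the authors' code: the record layout (keys `study`, `species`, `arm`, `mean_epg`) and the function name `pair_arms` are hypothetical, and an unpaired arm in a stratum with an odd number of arms is simply left out here, which the text does not specify.

```python
from itertools import groupby


def pair_arms(arms):
    """Pair trial arms within each (study, species) stratum.

    Arms are sorted by arithmetic mean infection intensity and grouped
    into consecutive pairs: (1st, 2nd), (3rd, 4th), ...
    """
    def stratum(arm):
        return (arm["study"], arm["species"])

    pairs = []
    for _, group in groupby(sorted(arms, key=stratum), key=stratum):
        ordered = sorted(group, key=lambda arm: arm["mean_epg"])
        # Step through the sorted arms two at a time; a trailing
        # unpaired arm (odd stratum size) is skipped in this sketch.
        for i in range(0, len(ordered) - 1, 2):
            pairs.append((ordered[i]["arm"], ordered[i + 1]["arm"]))
    return pairs


# Hypothetical arms from a single study and species.
example_arms = [
    {"study": "S1", "species": "T. trichiura", "arm": "drug W", "mean_epg": 300},
    {"study": "S1", "species": "T. trichiura", "arm": "drug X", "mean_epg": 100},
    {"study": "S1", "species": "T. trichiura", "arm": "drug Y", "mean_epg": 200},
    {"study": "S1", "species": "T. trichiura", "arm": "drug Z", "mean_epg": 400},
]
example_pairs = pair_arms(example_arms)
print(example_pairs)
```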

Expert opinion on egg counts and drug efficacy

Questionnaire format

For each of the 23 trial arm pairs we generated several figures visualizing the egg count distributions with box-plots and kernel density plots (the latter can be interpreted as a histogram but is displayed as a continuous line instead of bars). The plots were separately generated for baseline and follow-up and were represented on linear and log scale using R’s stats package with default settings (except for the smoothing bandwidth of the density plots, which was set as the maximum egg counts of both trial arms divided by 20). A constant of 1 was added to the egg counts before logarithmic transformation. The experts were asked to judge if the egg burden is considerably higher in arm A, slightly higher in A, similar, slightly higher in B, or considerably higher in B separately for baseline and follow up. Similarly, we asked for their opinion about treatment efficacy whereby the following options were provided: Treatment A is better, A slightly better, Similar, B slightly better, and B better. We generated several questionnaires and, in each questionnaire, the order of questions and the allocation of trial arms to A and B were randomly shuffled. One questionnaire example is presented in S1 File.

Questionnaire distribution

Experts including biostatisticians, human parasitologists and epidemiologists, and veterinary parasitologists and epidemiologists with long-term experience in helminthic diseases selected from personal contacts of the authors were asked to fill in this questionnaire between February and November 2016. All participants were asked for additional contacts of potential specialists to increase the number of participants. The questionnaires were distributed either via a hard copy or sent by email. After the distribution, each participant received up to five reminders. Participants were asked for their personal interpretation of the data and they were informed that there is no right or wrong answer.

Calculation of mean egg counts and ERRs with different means

The geometric mean is defined as:

$$\mathrm{GM}(x_1, \ldots, x_n) = e^{\frac{1}{n}\sum_{i=1}^{n}\log(x_i)} \tag{1}$$

The geometric mean requires x1, …, xn > 0. Therefore, a small amount (usually 1) has to be added to account for zero egg counts. The amount is usually subtracted from the final results:

$$\mathrm{GM}(x_1, \ldots, x_n) = e^{\frac{1}{n}\sum_{i=1}^{n}\log(x_i + 1)} - 1 \tag{2}$$
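As a minimal sketch of Eq (2), the following Python function adds a constant before taking logarithms and subtracts it again at the end. The function name and the `constant` argument are illustrative assumptions; the default of 1 follows the text.

```python
import math


def geometric_mean_plus_one(counts, constant=1.0):
    """Geometric mean of egg counts following Eq (2).

    The constant is added so that zero counts have a defined logarithm
    and is subtracted again from the back-transformed result.
    """
    logs = [math.log(x + constant) for x in counts]
    return math.exp(sum(logs) / len(logs)) - constant


# Counts chosen so that the shifted values are 1, 10 and 100,
# whose geometric mean is 10; subtracting the constant gives 9.
gm_example = geometric_mean_plus_one([0, 9, 99])
print(round(gm_example, 6))
```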

The Hölder mean (syn. power mean) is defined as:

$$H_p(x_1, \ldots, x_n) = \left(\frac{1}{n}\sum_{i=1}^{n} x_i^{p}\right)^{1/p} \tag{3}$$

with parameter p ≠ 0 and x1, …, xn ≥ 0. The arithmetic and geometric means are special cases of the Hölder mean with p = 1 and in the limit p → 0, respectively. Another common mean is the Lehmer mean defined as:

$$L_p(x_1, \ldots, x_n) = \frac{\sum_{i=1}^{n}(x_i + 1)^{p}}{\sum_{i=1}^{n}(x_i + 1)^{p-1}} - 1 \tag{4}$$

Just like the geometric mean, the Lehmer mean requires values > 0. Therefore, also in this case 1 is added to account for zero counts.
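Eqs (3) and (4) translate directly into code. The sketch below is an illustration with assumed function names; it uses the constant 1 for the Lehmer mean as described in the text and no constant for the Hölder mean.

```python
def holder_mean(counts, p):
    """Hölder (power) mean as in Eq (3); requires p != 0 and counts >= 0."""
    if p == 0:
        raise ValueError("p must be non-zero; p -> 0 gives the geometric mean")
    return (sum(x ** p for x in counts) / len(counts)) ** (1.0 / p)


def lehmer_mean(counts, p):
    """Lehmer mean as in Eq (4), with 1 added to every count and
    subtracted again from the result."""
    shifted = [x + 1.0 for x in counts]
    numerator = sum(s ** p for s in shifted)
    denominator = sum(s ** (p - 1.0) for s in shifted)
    return numerator / denominator - 1.0


# With p = 1 both definitions reduce to the arithmetic mean.
counts = [0, 0, 12, 48, 240]
print(holder_mean(counts, 1.0), lehmer_mean(counts, 1.0))
# Smaller p pulls the Hölder mean towards the bulk of the data.
print(holder_mean(counts, 0.2))
```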

The truncated and winsorized means are less sensitive to extreme values. For the truncated mean, a certain percentage of the values at the ends of the distribution is discarded, whereas for the winsorized mean these values are replaced by the most extreme remaining values. Several algorithms exist to determine quantiles; we used the inverse of the empirical distribution function with averaging at discontinuities (type 2 in R, type 5 in SAS). This quantile algorithm, in contrast to several others, satisfies M(e) = M({e,e}) for each n-tuple e of n elements. Truncation and winsorization are normally applied at both ends but, because we were only concerned about extremely high egg counts, we discarded or replaced only the highest values.
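One-sided truncation and winsorization of the upper tail can be sketched as below. Note the simplification: this sketch affects the k = floor(proportion × n) highest values directly, which is an assumption for illustration and not necessarily identical to the quantile-based rule (type 2 in R) used in the study. The function names are hypothetical.

```python
import math


def _n_upper(n, proportion):
    """Number of highest values affected (count-based simplification)."""
    if not 0 <= proportion < 1:
        raise ValueError("proportion must be in [0, 1)")
    # The small epsilon guards against floating point results such as 2.9999999.
    return math.floor(proportion * n + 1e-9)


def upper_truncated_mean(counts, proportion):
    """Arithmetic mean after discarding the highest values."""
    ordered = sorted(counts)
    kept = ordered[:len(ordered) - _n_upper(len(ordered), proportion)]
    return sum(kept) / len(kept)


def upper_winsorized_mean(counts, proportion):
    """Arithmetic mean after replacing the highest values by the
    largest remaining value; the sample size is unchanged."""
    ordered = sorted(counts)
    k = _n_upper(len(ordered), proportion)
    kept = ordered[:len(ordered) - k]
    capped = kept + [kept[-1]] * k
    return sum(capped) / len(capped)


counts = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(upper_truncated_mean(counts, 0.10))   # mean of 1..9
print(upper_winsorized_mean(counts, 0.10))  # 100 replaced by 9
```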

In total we calculated mean egg counts and ERRs for 30 different means: arithmetic and geometric means, Hölder and Lehmer means with parameter p set to 0.1, 0.2, …, 0.9, and winsorized and truncated means in which 2, 4, 6, 8 or 10% of the highest values were replaced or discarded, respectively.
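The count of 30 follows from 2 + 9 + 9 + 5 + 5. A short sketch that enumerates the specifications makes this explicit; the labels and the function name are illustrative assumptions.

```python
def mean_specifications():
    """List the (type, parameter) combinations described in the text."""
    specs = [("arithmetic", None), ("geometric", None)]
    powers = [k / 10 for k in range(1, 10)]       # 0.1, 0.2, ..., 0.9
    cut_offs = [0.02, 0.04, 0.06, 0.08, 0.10]     # share of highest values
    specs += [("holder", p) for p in powers]
    specs += [("lehmer", p) for p in powers]
    specs += [("winsorized", c) for c in cut_offs]
    specs += [("truncated", c) for c in cut_offs]
    return specs


specs = mean_specifications()
print(len(specs))
```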

Assessing the performance of each mean as agreement with experts

To assess agreement between experts and calculated means we dichotomized both variables. For the calculated means we simply used the difference between both arms to decide whether arm A or arm B showed higher egg counts or egg reductions, ignoring the magnitude of the difference. Expert opinion was dichotomized into the same categories based on two different definitions. i) ‘all studies’ (simple majority criterion): an arm was chosen if more experts judged the egg burden/drug efficacy to be higher in that arm (e.g. the number of persons answering either 'A is better' or 'A is slightly better') than the number of experts favoring the other arm, ignoring the undecided. We used the score (arithmetic mean of the answers of the Likert scale transformed into numerical scores) to break ties. If the score was 3, which occurred once in the baseline and once in the follow-up judgments, the questions were excluded from the analysis. ii) ‘consensus studies’ (absolute majority criterion): more than 50% of the experts shared the view that the egg burden/drug efficacy is higher in a certain arm, i.e. they outnumbered the undecided and those with an opposite view together. We refer to this as ‘consensus studies’ because no or only very few experts had an opposite opinion (median = 0, range 0 to 3).
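The two dichotomization rules can be sketched as follows. The numerical coding is an assumption for this illustration: answers are coded 1 to 5, with 1 and 2 favoring arm A, 3 meaning "similar" and 4 and 5 favoring arm B, and "don't know" responses are assumed to have been removed beforehand.

```python
def _votes(answers):
    """Count experts favoring arm A (codes 1, 2) and arm B (codes 4, 5)."""
    favor_a = sum(1 for r in answers if r < 3)
    favor_b = sum(1 for r in answers if r > 3)
    return favor_a, favor_b


def simple_majority(answers):
    """'All studies' rule: the arm with more supporters wins, ignoring
    the undecided; ties are broken by the mean score, and a mean score
    of exactly 3 leads to exclusion (None)."""
    favor_a, favor_b = _votes(answers)
    if favor_a != favor_b:
        return "A" if favor_a > favor_b else "B"
    score = sum(answers) / len(answers)
    if score == 3:
        return None
    return "A" if score < 3 else "B"


def absolute_majority(answers):
    """'Consensus studies' rule: more than 50% of all valid answers
    must favor the same arm, otherwise there is no consensus (None)."""
    favor_a, favor_b = _votes(answers)
    if 2 * favor_a > len(answers):
        return "A"
    if 2 * favor_b > len(answers):
        return "B"
    return None


answers = [1, 2, 3, 3, 3, 4]  # two experts favor A, one favors B
print(simple_majority(answers), absolute_majority(answers))
```

With these hypothetical answers the simple majority rule selects arm A, while the absolute majority rule finds no consensus because only two of six experts favor A.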

Additionally, we visually inspected the relationship between the calculated differences among the trial arm pairs and the raters' scores. Further, we explored the relationship between the calculated differences in ERRs and rater scores among the trial arm pairs and the difference of the observed CRs. Further information on how agreement and performance were assessed is given in the guidance in S2 File.

Data analysis

All data were analysed with R version 3.4.3. Performance was calculated as the raw percentage agreement between experts and each mean. Of note, raw agreement is an appropriate measure in our study because, by design, chance agreement is always 50%. Confidence intervals for agreement were constructed by bootstrap resampling of raters. A sensitivity analysis was conducted, which only included trial arms with a minimum number of observations of 30 per arm.
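Raw agreement and a bootstrap confidence interval obtained by resampling raters can be sketched as below. This is not the authors' R code: the data layout (one list of Likert answers per rater, coded 1 to 5 with 3 as "similar"), the simplified majority rule without tie-breaking, the percentile interval and all names are assumptions made for this illustration.

```python
import random


def majority_arm(answers):
    """Arm favored by more experts; None if both sides are equal."""
    favor_a = sum(1 for r in answers if r < 3)
    favor_b = sum(1 for r in answers if r > 3)
    if favor_a == favor_b:
        return None
    return "A" if favor_a > favor_b else "B"


def raw_agreement(ratings, mean_choice):
    """Percentage of comparisons in which the calculated mean picks the
    same arm as the experts. ratings[r][j] is rater r's answer for
    comparison j; mean_choice[j] is 'A' or 'B'."""
    hits = total = 0
    for j, choice in enumerate(mean_choice):
        expert = majority_arm([rater[j] for rater in ratings])
        if expert is None:
            continue  # undecided comparisons are excluded
        total += 1
        hits += expert == choice
    return 100.0 * hits / total if total else float("nan")


def bootstrap_ci(ratings, mean_choice, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap interval, resampling raters with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(ratings) for _ in ratings]
        value = raw_agreement(sample, mean_choice)
        if value == value:  # drop NaN (no decided comparison in resample)
            stats.append(value)
    stats.sort()
    lower = stats[int((1 - level) / 2 * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 + level) / 2 * len(stats)))]
    return lower, upper


# Hypothetical answers of three raters for four comparisons.
ratings = [[1, 2, 5, 4], [2, 2, 4, 5], [1, 3, 5, 4]]
mean_choice = ["A", "A", "B", "A"]
print(raw_agreement(ratings, mean_choice))
print(bootstrap_ci(ratings, mean_choice))
```

Because both the experts and the means are forced into one of two categories, a mean that picked arms at random would be expected to agree in about half of the comparisons, which is why the text treats 50% as the chance level.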

Inter-rater reliability among expert opinion was assessed by Krippendorff's α for ordinal metrics. We also estimated the intra cluster correlation (single unit, random raters) and the average pairwise kappa coefficient with weighted squared distances. Because all estimates were very close, we present only Krippendorff's α. Loess smoothing lines were estimated with a smoothing parameter α of 0.85.

Results

Characteristics of included studies

Data from six publications [15,21–25] including seven clinical trials with 32 trial arms were used for generating the questionnaires. Among the clinical trials, four included drug efficacy data for treating T. trichiura [15,21–24], three for hookworm [21,22,24] and two for O. viverrini [25]. Different drugs, doses or drug combinations were used in the trials, i.e. albendazole [15,22–24], mebendazole [15,22,24], oxantel pamoate [15,21,22], ivermectin [15,24], nitazoxanide [23] and tribendimidine [25]. The median number of participants per arm was 48 (interquartile range: 39–112, range: 13–140). The median CR was 34% (range: 0–91%). Further trial arm characteristics including egg counts and cure rates are presented in S1 Table in S3 File.

Response rate and field specifications

From a total of 76 invited experts, we received 49 (64.5%) filled-out questionnaires. Participants included human parasitologists/epidemiologists (n = 26, 53.1%), followed by veterinary parasitologists/epidemiologists (n = 12, 24.5%), biostatisticians (n = 9, 18.4%) and two engineers with experience in human parasitology (n = 2, 4.1%). The distribution of academic qualifications was as follows: 27 (55.1%) had a PhD-degree, 16 (32.7%) were professors, four had a MSc-degree (8.2%) and two participants were medical doctors (4.1%).

Inter rater reliability of experts' judgments

The responses obtained for each question are visualized in Fig 2. As expected, the answer "Egg burden is similar" was quite common at baseline whereas a clear preference was found for the follow up and efficacy ratings. Krippendorff's α was estimated at 0.44, 0.62 and 0.65 for baseline, follow-up and efficacy, respectively. In 3.5% (40/1127) of the answers, the raters stated that they are not able to provide a reliable judgment. From the 69 comparisons, 37 (54%) fulfilled the absolute majority criterion, i.e. more than 50% of experts favor one arm and 67 (97%) fulfilled the simple majority criterion.

Fig 2. Judgment of experts with respect to the egg burden and treatment efficacy of 2 clinical trial arms.


The labels below the bars denote the page and question (1: question at the top of the page, 2: middle, 3: bottom) in the example questionnaire presented in the S1 File. Numbers above bars represent the number of experts with a valid response (i.e. excluding “don't know” responses). Abbreviations: Q: Question; p: page. Top panel: baseline, middle panel: follow-up, bottom panel: efficacy. a) consensus agreement (absolute majority criterion—more than 50% of experts favor one arm) b) arm pair excluded, experts did not favor any arm c) excluded from the sensitivity analysis (number of trial participants in 1 arm below 30).

Performance of different means

The agreements between the different means and the expert opinion are presented in Fig 3. The arithmetic mean showed the poorest performance among all means. Especially for comparisons at follow-up, the agreement was close to chance agreement. Truncated and winsorized means improved the agreement, in particular if the proportion truncated was high. We observed the highest performance using the Hölder mean (with parameter 0.2), followed by the geometric mean and the Lehmer mean (with parameter 0.5). If only those comparisons with expert consensus are considered (Fig 3, right panel), the agreement was generally slightly higher but the overall pattern did not change.

Fig 3. Percentage agreement between experts and different means.


Raw percentage agreement between expert opinion and the calculated means for egg burden at baseline and follow-up and drug efficacy (superiority of a certain trial arm). Both, expert opinion and calculated means were dichotomized into 'A > B' and 'B > A'. Number of trial arm comparisons N: left panel: bl = 22, fu = 22, ef = 23; right panel: bl = 7, fu = 14, ef = 16. AM: arithmetic mean, GM: geometric mean, Hö: Hölder mean, Le: Lehmer mean, Wi: winsorized mean, tr: truncated mean. Numbers behind Hö/Le indicate parameter p, numbers behind Wi/tr denote proportion discarded/replaced. The rank denotes the rounded row mean rank. All: simple majority definition, consensus: absolute majority criterion, more than 50% of experts favor one arm, i.e. only those comparisons marked with footnote a) in Fig 2 are considered. S2 File explains how Fig 2 and Fig 3 are related.

In-depth investigation of selected means

The performance of the geometric, arithmetic, winsorized (10% cut-off) and Hölder (parameter 0.2) means was explored in more detail. The arithmetic and geometric means were selected because they are currently most commonly used, and the winsorized and Hölder means because they showed a good performance. The relationship between experts' rating scores and the difference among the means between trial arms is presented in Fig 4.

Fig 4. Relationship between the calculated difference among 2 trial arms estimated by different means and experts' rating scores.


The symbols in the first 3 panels show the association between the rater scores and the differences in egg counts or egg reductions between trial arm pairs calculated by 4 different means (different means are represented by different colors). The lines represent the corresponding loess smoothing lines. The bar plots at the top show the experts’ rating scores in the same way as in Fig 2. Some bar plots were placed at the bottom to avoid overplotting. Note that rater scores (and bar plots) which favored arm B have been converted to favor arm A, e.g. a rating score of 4 would be converted to a score of 2 (a score of 3 indicates no difference between the trial arms). In 3 comparisons at follow-up (numbered 1 to 3 in the top right panel) the estimates diverged especially strongly. The corresponding raw data are presented as strip plot and histogram in the bottom right panel. S2 File explains how Fig 2 and Fig 4 are related.

For baseline, all means showed a correlation with the rating scores. However, at rating scores close to 3 (indicating no difference among trial arms) all means showed considerable variability. With respect to the follow-up judgments, the arithmetic mean showed the poorest performance. In three of the five comparisons with rating scores below 1.75, the arithmetic mean found the opposite trial arm to be associated with a higher egg burden. In all three cases a single outlier was responsible for this result (Fig 4, lower right panel). A similar picture was observed for the drug efficacy judgments. In three of the five comparisons with rating scores below 1.75, the arithmetic mean favored the other drug. In one case the arithmetic mean estimated the difference in ERRs as 160% in opposition to the rating scores. However, this was a small trial with only 17 and 19 participants in the trial arms. Consequently, the arithmetic mean performed better in the sensitivity analysis, where small trials were excluded (S3 Fig in S3 File), but still showed the poorest performance among the four investigated means.

Sensitivity analysis

After excluding the three trial arm pairs with fewer than 30 participants per arm, no noteworthy influence on the results was observed. Agreement was generally somewhat higher. One exception was the winsorized mean, which performed better in this scenario (S3 Fig in S3 File). In addition, we explored the association of expert opinion and ERRs with differences in CRs. Expert opinion correlated strongly with differences in CRs, whereas the correlation between winsorized and arithmetic mean ERRs and difference in CRs was again weak (Fig 5). Further sensitivity analyses using a weighted lowess smoother (with weights proportional to the number of subjects in the trial arms) and with scaled differences in the ERRs (i.e. the most extreme value was considered as the minimum or maximum; S5 Fig in S3 File) supported the findings from the main analysis.

Fig 5. Relationship between rater scores, means and cure rates.


Differences between ERRs and CRs in percentage points. Lines and shaded areas represent the loess smoothing line and the corresponding 95% confidence band. Grey crosses and the dotted line represent the experts' score and its corresponding loess smoothing line.

Discussion

We calculated the egg burden and drug efficacy from several clinical drug trials against helminth infections using different types of means. The performance of the different types of means was assessed by calculating their agreement with expert opinion. From all investigated means the arithmetic mean showed the worst performance, which was sometimes not much higher than chance agreement.

The poor performance of the arithmetic mean in our study was in all scenarios related to the presence of a single outlier. Outliers might be more common in human drug trials than in population epidemiological surveys or the veterinary sector for two reasons: some participants might refuse to swallow all tablets, vomit after the treatment or not adhere to treatment for other reasons, and randomized trials, especially dose-ranging trials, usually have relatively few participants in each arm. In addition, individual responses to treatment show remarkable variability, which might result in imbalance if the sample size is limited. Therefore, our results should not be extrapolated to studies with a different purpose, like large-scale program evaluations, resistance surveillance, environmental sanitation or the veterinary sector.

Olliaro et al. [26] pointed out that the best suited approach to assess drug efficacy depends on the purpose; for large-scale program monitoring, trends in responses and emergence of drug resistance are of primary interest, which can be more precisely assessed with individual-level estimates. In this context, several modeling approaches have been proposed, which have several advantages including estimating the full distribution of individual responses [27]. However, it might be challenging to specify rather sophisticated models a priori in a statistical analysis plan as required in clinical trials.

Several simulation studies assessed the performance of different means with contrasting results [17–19]. Other studies relied on certain assumptions which, by design, favored one of the estimates, e.g. that the arithmetic mean based ERR represents the true efficacy [18] or that the egg counts follow a certain distribution [19,20]. To overcome the shortcomings of previous studies we used, for the first time, an approach which does not rely on any assumptions and does not favor any particular estimate. The judgments of visualized paired comparisons might be hypothetical, because the helminth species is not specified, but they provide a natural picture in terms of burden and drug efficacy. One could argue that the visualization causes bias because of optical illusions, but the consistency of our findings using complementary approaches, like associations with CRs, indicates that the results are sufficiently robust. We can only speculate about the reasons for the discrepancy between the expert opinion and the arithmetic mean. Some experts might consider extreme values as non-representative and ignore them; other experts might have the health burden in mind and prefer a large proportion of light infections even if a few heavy infections remain.

The geometric mean showed an overall robust performance in our study. The main advantage of the geometric mean is that it is simple to compute and that the mean is commonly applied for skewed data. However, there are also several disadvantages associated with this type of mean. The sample mean is biased and underestimates the population mean by a factor of $e^{\mathrm{var}/(2n)} - 1$ multiplied by the geometric mean. Another issue is that the geometric mean is not defined for samples that include zeros. Usually, a constant of 1 is added to each count but this constant has been criticized as being not more rational than adding any other positive number [18].

The Hölder mean slightly outperformed the geometric mean but the difference was marginal. It remains debatable if a slightly improved performance justifies the increased complexity associated with its calculation. A positive feature of the Hölder mean is that all values lie between the arithmetic and geometric mean and no modification in the presence of 0 values is required. However, in case of high CRs the estimates according to the Hölder mean could even be below the geometric mean. This is caused by the fact that, in contrast to the geometric mean, no constant is added to the zero egg counts. Considering the above stated example with 9 times 0 egg counts and one time 1000 eggs, the geometric mean would estimate a mean egg count of 1, whereas the Hölder mean (with parameter 0.2) would estimate 0.01; therefore, a higher parameter of 0.4 might be more appropriate. Likewise, the Lehmer mean requires a constant in the presence of zero values and, although it performed similarly to the Hölder mean, we would not recommend its use.
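The numbers in this example can be checked with a few lines of Python. The sketch assumes the geometric mean with a constant of 1 as in Eq (2) and the Hölder mean without a constant as in Eq (3); the function names are illustrative.

```python
import math


def geometric_mean_plus_one(counts):
    """Geometric mean with 1 added before the log and subtracted after."""
    logs = [math.log(x + 1.0) for x in counts]
    return math.exp(sum(logs) / len(logs)) - 1.0


def holder_mean(counts, p):
    """Hölder (power) mean with parameter p != 0, no constant added."""
    return (sum(x ** p for x in counts) / len(counts)) ** (1.0 / p)


# Nine cured participants and one with 1000 eggs.
counts = [0] * 9 + [1000]

am = sum(counts) / len(counts)          # 100
gm = geometric_mean_plus_one(counts)    # about 1.0
h02 = holder_mean(counts, 0.2)          # about 0.01
h04 = holder_mean(counts, 0.4)          # about 3.16

print(f"arithmetic: {am:.2f}, geometric: {gm:.2f}")
print(f"Hölder p=0.2: {h02:.3f}, Hölder p=0.4: {h04:.2f}")
```

With p = 0.2 the Hölder mean falls about two orders of magnitude below the geometric mean, whereas with p = 0.4 it lies between the geometric and the arithmetic mean.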

In contrast to the truncated mean, the winsorized mean does not compromise the sample size and it is therefore preferable over the truncated mean. The winsorized mean with a cut-off level of 10% performed reasonably well in our study, provided that the small study arms were excluded, as highlighted in the sensitivity analysis. For obvious reasons, the estimate is not suitable for high CRs, since all CRs above 90% would result in an estimate of 100%. An additional problem might arise in case of cluster randomized trials: one needs to define whether the replacement of values should be done for the entire trial arm or for each cluster separately. In this study, we applied a one-sided truncation of the upper tail. It should be noted that there might be other settings where egg counts are generally quite high and zero or low egg counts represent the extreme values.
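The limitation at high CRs can be made concrete with a small sketch. It uses hypothetical counts and a count-based upper-tail winsorization (the floor(0.10 × n) highest values are replaced), which is a simplifying assumption and not the quantile rule used in the study.

```python
import math


def upper_winsorized_mean(counts, proportion):
    """Replace the highest values by the largest remaining value
    (count-based simplification) and return the arithmetic mean."""
    ordered = sorted(counts)
    k = math.floor(proportion * len(ordered) + 1e-9)
    kept = ordered[:len(ordered) - k]
    capped = kept + [kept[-1]] * k
    return sum(capped) / len(capped)


def err(baseline_mean, follow_up_mean):
    """Egg reduction rate in percent."""
    return 100.0 * (1.0 - follow_up_mean / baseline_mean)


# Hypothetical arm: 20 participants, 19 cured (CR = 95%), one still heavy.
baseline = [100] * 20
follow_up = [0] * 19 + [500]

err_arithmetic = err(sum(baseline) / 20, sum(follow_up) / 20)
err_winsorized = err(upper_winsorized_mean(baseline, 0.10),
                     upper_winsorized_mean(follow_up, 0.10))

print(f"arithmetic ERR: {err_arithmetic:.0f}%")   # remaining infection visible
print(f"winsorized ERR: {err_winsorized:.0f}%")   # remaining infection removed
```

Because the only positive count lies within the replaced 10%, the winsorized follow-up mean is zero and the ERR is 100% regardless of how heavy the remaining infection is.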

Constructing interval estimates might be challenging for several types of means in the presence of a complex study design. Confidence intervals for 2 arm superiority trials can be easily computed via bootstrapping but methods to incorporate the Hölder mean into random effect models or generalized estimating equations are currently not available. Likewise, the arithmetic mean features many statistical properties and many statistical methods rely on these properties. Further, meta-analyses on egg counts and egg reductions might become difficult to interpret if other means than the arithmetic mean are used.

There might be also biological reasons why we prefer one mean over another. The arithmetic mean of a sample is always the best estimator of the population arithmetic mean, and similarly the geometric mean of a sample is the best estimator of the population geometric mean. In environmental sanitation, we might be mainly interested in the total number of eggs shed into the environment. In this case, the arithmetic mean will be most appropriate because 'super-shedders' are of particular importance and should not be considered as outliers.

Conclusion

In anthelminthic drug trials of moderate sample size, the ERR based on arithmetic mean—as recommended by current WHO guidelines—showed a poor agreement with expert opinion on drug efficacy. It should not be used as the primary outcome in human drug efficacy trials and should be always reported together with an estimate that is more robust to outliers. Of course, all estimates should be complemented by their corresponding confidence intervals. We recommend extending the WHO guidelines to include aspects of clinical trials besides recommendations for programme monitoring.

Supporting information

S1 File. Example questionnaire.

(PDF)

S2 File. Example explaining agreements, scores and relationship between figures.

(PDF)

S3 File. Trial characteristics, sensitivity analysis and agreement among different means.

(PDF)

Data Availability

The data cannot be shared without restrictions because the authors do not own the data. The data underlying the results presented in the study are available from the authors of the original studies [ref. 15, 21-25].

Funding Statement

WM and JK were partly funded by Swiss National Science Foundation (No 320030_14930). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Pullan RL, Smith JL, Jasrasaria R, Brooker SJ. Global numbers of infection and disease burden of soil transmitted helminth infections in 2010. Parasit Vectors. 2014;7: 37. doi: 10.1186/1756-3305-7-37
  • 2. Vos T, Flaxman AD, Naghavi M, Lozano R, Michaud C, Ezzati M, et al. Years lived with disability (YLDs) for 1160 sequelae of 289 diseases and injuries 1990–2010: a systematic analysis for the Global Burden of Disease Study 2010. The Lancet. 2012;380: 2163–2196. doi: 10.1016/S0140-6736(12)61729-2
  • 3. WHO. Global programme to eliminate lymphatic filariasis: progress report, 2011. Weekly epidemiological record. 2012;87: 346–356.
  • 4. Charlier J, van der Voort M, Kenyon F, Skuce P, Vercruysse J. Chasing helminths and their economic impact on farmed ruminants. Trends Parasitol. 2014;30: 361–367. doi: 10.1016/j.pt.2014.04.009
  • 5. WHO. Soil-transmitted helminthiasis: eliminating soil-transmitted helminthiasis as a public health problem in children. Progress report 2001–2010 and strategic plan 2011–2020. Geneva: World Health Organization; 2012.
  • 6. Kaplan RM, Vidyashankar AN. An inconvenient truth: global worming and anthelmintic resistance. Vet Parasitol. 2012;186: 70–78. doi: 10.1016/j.vetpar.2011.11.048
  • 7. de Lourdes Mottier M, Prichard RK. Genetic analysis of a relationship between macrocyclic lactone and benzimidazole anthelmintic selection on Haemonchus contortus. Pharmacogenet Genomics. 2008;18: 129–140. doi: 10.1097/FPC.0b013e3282f4711d
  • 8. Abongwa M, Martin J, Robertson A. A brief review on the mode of action of antinematodal drugs. Acta Vet (Beogr). 2017.
  • 9. Diawara A, Drake LJ, Suswillo RR, Kihara J, Bundy DAP, Scott ME, et al. Assays to detect beta-tubulin codon 200 polymorphism in Trichuris trichiura and Ascaris lumbricoides. PLoS Negl Trop Dis. 2009;3: e397. doi: 10.1371/journal.pntd.0000397
  • 10. Diawara A, Halpenny CM, Churcher TS, Mwandawiro C, Kihara J, Kaplan RM, et al. Association between response to albendazole treatment and β-tubulin genotype frequencies in soil-transmitted helminths. PLoS Negl Trop Dis. 2013;7: e2247. doi: 10.1371/journal.pntd.0002247
  • 11. Albonico M, Engels D, Savioli L. Monitoring drug efficacy and early detection of drug resistance in human soil-transmitted nematodes: a pressing public health agenda for helminth control. Int J Parasitol. 2004;34: 1205–1210. doi: 10.1016/j.ijpara.2004.08.001
  • 12. WHO. Assessing the efficacy of anthelminthic drugs against schistosomiasis and soil-transmitted helminthiases. Geneva: World Health Organization; 2013.
  • 13. Montresor A. Cure rate is not a valid indicator for assessing drug efficacy and impact of preventive chemotherapy interventions against schistosomiasis and soil-transmitted helminthiasis. Trans R Soc Trop Med Hyg. 2011;105: 361–363. doi: 10.1016/j.trstmh.2011.04.003
  • 14. Coles GC, Jackson F, Pomroy WE, Prichard RK, von Samson-Himmelstjerna G, Silvestre A, et al. The detection of anthelmintic resistance in nematodes of veterinary importance. Vet Parasitol. 2006;136: 167–185. doi: 10.1016/j.vetpar.2005.11.019
  • 15. Speich B, Ali SM, Ame SM, Bogoch II, Alles R, Huwyler J, et al. Efficacy and safety of albendazole plus ivermectin, albendazole plus mebendazole, albendazole plus oxantel pamoate, and mebendazole alone against Trichuris trichiura and concomitant soil-transmitted helminth infections: a four-arm, randomised controlled trial. Lancet Infect Dis. 2015;15: 277–284. doi: 10.1016/S1473-3099(14)71050-3
  • 16. Cochran W, Cox G. Experimental Designs. John Wiley & Sons, New York; 1992.
  • 17. Montresor A. Arithmetic or geometric means of eggs per gram are not appropriate indicators to estimate the impact of control measures in helminth infections. Trans R Soc Trop Med Hyg. 2007;101: 773–776. doi: 10.1016/j.trstmh.2007.04.008
  • 18. Dobson RJ, Sangster NC, Besier RB, Woodgate RG. Geometric means provide a biased efficacy result when conducting a faecal egg count reduction test (FECRT). Vet Parasitol. 2009;161: 162–167. doi: 10.1016/j.vetpar.2008.12.007
  • 19. Smothers CD, Sun F, Dayton AD. Comparison of arithmetic and geometric means as measures of a central tendency in cattle nematode populations. Vet Parasitol. 1999;81: 211–224. doi: 10.1016/s0304-4017(98)00206-4
  • 20. Torgerson PR, Schnyder M, Hertzberg H. Detection of anthelmintic resistance: a comparison of mathematical techniques. Vet Parasitol. 2005;128: 291–298. doi: 10.1016/j.vetpar.2004.12.009
  • 21. Moser W, Ali SM, Ame SM, Speich B, Puchkov M, Huwyler J, et al. Efficacy and safety of oxantel pamoate in school-aged children infected with Trichuris trichiura on Pemba Island, Tanzania: a parallel, randomised, controlled, dose-ranging study. Lancet Infect Dis. 2016;16: 53–60. doi: 10.1016/S1473-3099(15)00271-6
  • 22. Speich B, Ame SM, Ali SM, Alles R, Huwyler J, Hattendorf J, et al. Oxantel Pamoate–Albendazole for Trichuris trichiura Infection. N Engl J Med. 2014;370: 610–620. doi: 10.1056/NEJMoa1301956
  • 23. Speich B, Ame SM, Ali SM, Alles R, Hattendorf J, Utzinger J, et al. Efficacy and safety of nitazoxanide, albendazole, and nitazoxanide-albendazole against Trichuris trichiura infection: a randomized controlled trial. PLoS Negl Trop Dis. 2012;6: e1685. doi: 10.1371/journal.pntd.0001685
  • 24. Knopp S, Mohammed KA, Speich B, Hattendorf J, Khamis IS, Khamis AN, et al. Albendazole and mebendazole administered alone or in combination with ivermectin against Trichuris trichiura: a randomized controlled trial. Clin Infect Dis. 2010;51: 1420–1428. doi: 10.1086/657310
  • 25. Sayasone S, Odermatt P, Vonghachack Y, Xayavong S, Senggnam K, Duthaler U, et al. Efficacy and safety of tribendimidine against Opisthorchis viverrini: two randomised, parallel-group, single-blind, dose-ranging, phase 2 trials. Lancet Infect Dis. 2016;16: 1145–1153. doi: 10.1016/S1473-3099(16)30198-0
  • 26. Olliaro PL, Vaillant M, Diawara A, Coulibaly JT, Garba A, Keiser J, et al. Toward measuring Schistosoma response to praziquantel treatment with appropriate descriptors of egg excretion. PLoS Negl Trop Dis. 2015;9: e0003821. doi: 10.1371/journal.pntd.0003821
  • 27. Walker M, Mabud TS, Olliaro PL, Coulibaly JT, King CH, Raso G, et al. New approaches to measuring anthelminthic drug efficacy: parasitological responses of childhood schistosome infections to treatment with praziquantel. Parasit Vectors. 2016;9: 41. doi: 10.1186/s13071-016-1312-0
PLoS Negl Trop Dis. doi: 10.1371/journal.pntd.0008185.r001

Decision Letter 0

Sara Lustigman, Matthew C Freeman

29 Oct 2019

Dear Dr. Hattendorf:

Thank you very much for submitting your manuscript "One mean to rule them all? The arithmetic mean based egg reduction rate can be misleading when estimating anthelminthic drug efficacy in clinical trials" (#PNTD-D-19-01393) for review by PLOS Neglected Tropical Diseases. Your manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the manuscript as it currently stands. These issues must be addressed before we would be willing to consider a revised version of your study. We cannot, of course, promise publication at that time.

We therefore ask you to modify the manuscript according to the review recommendations before we can consider your manuscript for acceptance. Your revisions should address the specific points made by each reviewer.

When you are ready to resubmit, please be prepared to upload the following:

(1) A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

(2) Two versions of the manuscript: one with either highlights or tracked changes denoting where the text has been changed (uploaded as a "Revised Article with Changes Highlighted" file); the other a clean version (uploaded as the article file).

(3) If available, a striking still image (a new image if one is available or an existing one from within your manuscript). If your manuscript is accepted for publication, this image may be featured on our website. Images should ideally be high resolution, eye-catching, single panel images; where one is available, please use 'add file' at the time of resubmission and select 'striking image' as the file type.

Please provide a short caption, including credits, uploaded as a separate "Other" file. If your image is from someone other than yourself, please ensure that the artist has read and agreed to the terms and conditions of the Creative Commons Attribution License at http://journals.plos.org/plosntds/s/content-license (NOTE: we cannot publish copyrighted images).

(4) If applicable, we encourage you to add a list of accession numbers/ID numbers for genes and proteins mentioned in the text (these should be listed as a paragraph at the end of the manuscript). You can supply accession numbers for any database, so long as the database is publicly accessible and stable. Examples include LocusLink and SwissProt.

(5) To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see http://journals.plos.org/plosntds/s/submission-guidelines#loc-methods

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

We hope to receive your revised manuscript by Dec 28 2019 11:59PM. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by replying to this email.

To submit a revision, go to https://www.editorialmanager.com/pntd/ and log in as an Author. You will see a menu item called Submission Needing Revision. You will find your submission record there.

Sincerely,

Matthew C Freeman, MPH, Ph.D.

Associate Editor

PLOS Neglected Tropical Diseases

Sara Lustigman

Deputy Editor

PLOS Neglected Tropical Diseases

***********************

Reviewer's Responses to Questions

Key Review Criteria Required for Acceptance?

As you describe the new analyses required for acceptance, please consider the following:

Methods

-Are the objectives of the study clearly articulated with a clear testable hypothesis stated?

-Is the study design appropriate to address the stated objectives?

-Is the population clearly described and appropriate for the hypothesis being tested?

-Is the sample size sufficient to ensure adequate power to address the hypothesis being tested?

-Were correct statistical analysis used to support conclusions?

-Are there concerns about ethical or regulatory requirements being met?

Reviewer #1: (No Response)

Reviewer #2: see general comments

Reviewer #3: Please see attached reviewer comments

--------------------

Results

-Does the analysis presented match the analysis plan?

-Are the results clearly and completely presented?

-Are the figures (Tables, Images) of sufficient quality for clarity?

Reviewer #1: (No Response)

Reviewer #2: see general comments

Reviewer #3: Please see attached reviewer comments

--------------------

Conclusions

-Are the conclusions supported by the data presented?

-Are the limitations of analysis clearly described?

-Do the authors discuss how these data can be helpful to advance our understanding of the topic under study?

-Is public health relevance addressed?

Reviewer #1: (No Response)

Reviewer #2: see general comments

Reviewer #3: Please see attached reviewer comments

--------------------

Editorial and Data Presentation Modifications?

Use this section for editorial suggestions as well as relatively minor modifications of existing data that would enhance clarity. If the only modifications needed are minor and/or editorial, you may wish to recommend “Minor Revision” or “Accept”.

Reviewer #1: (No Response)

Reviewer #2: (No Response)

Reviewer #3: Please see attached reviewer comments

--------------------

Summary and General Comments

Use this section to provide overall comments, discuss strengths/weaknesses of the study, novelty, significance, general execution and scholarship. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. If requesting major revision, please articulate the new experiments that are needed.

Reviewer #1: The authors present an interesting and (as far as I can see) novel approach towards selection between summary measures of central tendency in calculating egg reduction rates. The rationale is clear, the methods relating to expert elicitation appear to be sound, and the paper is mostly well written. However, I do have some concerns relating to the way in which the "agreement" between summary statistics and expert assessments is discussed and presented that I would like the authors to address before publication:

The paper is currently lacking a clear discussion of the fundamental difference between different types of mean at the population level, and the difference between the population mean and sample mean. When reading this paper I think the reader could get the impression that there is in reality "one true mean (to rule them all!)" but of course this is not the case; the population arithmetic mean and the population geometric mean are fundamentally different for any skewed distribution, and it is to be expected that the arithmetic mean of a sample is the best estimator of the population arithmetic mean, and similarly that the geometric mean of a sample is the best estimator of the population geometric mean. I suspect that the authors understand this well enough (from e.g. their discussion of the Dobson et al article) but I think they could do more to spell it out for the less statistically-experienced reader.

What is interesting and useful about this paper is that they make no a-priori assumption regarding which population mean is most relevant: they simply present the distributions to experts and ask which shows the best "average" (in a general sense) egg reduction. I therefore read the results more as an elicitation exercise where the experts are tasked with selecting which of the population means best reflects their interpretation of the data; and we find that the geometric mean best reflects their qualitative feeling about the distributions. This is most likely a result of the experts tending to down-weight outliers in a similar way to the geometric mean, which is interesting. I wonder how much of this is related to the potential issue regarding refusal of treatment on lines 302-304: do the authors think that the experts were potentially dismissing extreme values as non-representative due to the potential for this type of effect? I think there is scope for a lot more discussion in this area - although it will naturally stray into social science issues with which I am not very familiar, so it could well be that some of these issues are already well described elsewhere.

However, I do think the authors are somewhat over-playing their conclusions - it is not possible based on their data to conclude that the geometric mean is in any sense a "better" measure than the arithmetic mean, rather that the experts they asked tended to down-weight outliers in a similar way to the geometric mean. This is an important point in itself - face validity is a desirable property of such a statistic - but it is not sufficient to recommend that the geometric mean be favoured in all situations. For example, the arithmetic mean remains the best indicator of the total number of eggs being excreted, and therefore the epidemiological infection pressure within the environment. I also think it is meaningless to use sample geometric means when comparing egg reduction rates to another population where arithmetic means were used: this suggests that future studies should present both arithmetic mean reduction for comparison with previous studies as well as geometric mean reduction (or some other metric) to give an estimate that may correspond more closely to an expert evaluation of the distributions. As the paper stands I think these issues could easily be misinterpreted, so I do feel that this needs to be addressed.

I also have a number of more minor comments:

- I am not sure I accept your argument that raw agreement is appropriate - what is to be lost by calculating a Kappa statistic? This is a very standard method that naturally incorporates the chance agreement probability, and also allows you another way of calculating 95% confidence intervals. Please either do this or give a better justification for not doing so.

- Line 49: I know what you mean by "problematic in skewed distributions" but I would prefer you to describe the issue (i.e. sensitive to extreme values) directly, as it is not necessarily a "problem" in a statistical sense if it yields the best unbiased estimator of the population arithmetic mean (which it typically does).

- Line 74: I am not sure that "While" is appropriate to start this sentence - it seems like you mean something more like resistance is widespread in animal parasites and therefore is also likely in humans? Perhaps rephrase.

- Line 197: Should be "Data from six publications"

- Line 321-322: It seems like this could use a reference

- I found appendix S1 very useful - thanks for including it. If I were to be critical, I would suggest that density plots are not a particularly useful way of representing these data as they 'smooth out' outliers - histograms (with an appropriate bin) or even empirical cumulative distribution plots may have been better. But the B+W plots are fine so I don't think it's a big problem.

Reviewer #2: Moser and colleagues assessed the agreement in interpreting drug efficacy data applying a variety of measures for central tendency. Based on the differences in agreement across the experts they conclude that the current recommended arithmetic mean should be reported together with an estimate which is more robust to outliers, e.g. geometric mean. Overall, I have three major concerns on the paper for which I would like to see an in depth response.

First, one should not interpret the efficacy of drugs based on a measure of central tendency only. Instead, it should be interpreted along with the uncertainty intervals around the point estimate. I therefore strongly believe that the recommendation should not focus on including different measures of central tendency (which often give totally different point estimates); rather, we need to recommend reporting the appropriate 95% confidence intervals. This is a well established practice in veterinary parasitology, but not in human parasitology (see also the WHO guidelines, where conclusions on drug efficacy are based solely on point estimates).

Second, I strongly doubt whether the applied methodology is appropriate. To me this exercise can only be assessed through a simulation study where a substantial number of datasets with varying mean and skewness of egg counts, number of outliers, sample size and true underlying drug efficacy are generated. Subsequently, the different measures of central tendency are applied and their deviation from the true underlying efficacy is then assessed. This would indeed allow for more evidence-based recommendations.

Third, the authors often make references to the field of veterinary parasitology. Although there are indeed clear similarities between the field of veterinary and human parasitology, there are some important differences which, in my opinion, are not always accurately reflected in the current draft. For example, cure rate is not used at all in veterinary parasitology to assess the efficacy of anthelmintic drugs. In contrast to WHO, which only recently recommended ERR based on the arithmetic mean, its veterinary counterpart (World Association for the Advancement of Veterinary Parasitology; WAAVP) has been recommending reporting both arithmetic mean and 95% CI since 1992. I therefore would like to propose that the authors either explicitly mention these differences in the manuscript, or redirect the focus of the manuscript to the human field only. I would prefer the latter given the focus of the journal, the dataset used and the link to WHO guidelines.

Minor comments

Line 67: Abbreviate soil-transmitted helminths as STHs instead of STH

Line 77: I tend to disagree that the recommended primary outcome for anthelmintic drugs would differ between clinical medicine and public health; this holds in particular when drugs are not 100% efficacious and CR is affected by the analytic sensitivity of the diagnostic tool. In addition, in veterinary parasitology, CR is not recommended at all, and hence this sentence may need some revision (see also general comment). Note that reference 4 is not the WAAVP guidelines; Coles et al., 1992, Vet Parasitol is.

Line 85: I would strongly recommend not drawing conclusions from the measure of central tendency only; taking the uncertainty intervals on board is essential for drawing conclusions. It is therefore my belief that this statement deviates from the fundamental approach to interpreting point estimates.

Line 88: it will be important to highlight that each of these different measures of central tendency results in quite different point estimates, which will only create further confusion. Again, one should not only focus on the point estimate.

Line 95: it is not clear to me why assumptions on true drug efficacy or egg distributions would favour one specific mean over the other one. In fact, I strongly believe that this is the only way forward (see also general comment). However, I tend to agree that the simulation studies did not include outliers, which is why it would be good to include them in a newly designed simulation study.

Line 123: I found the presentation of the egg counts (providing boxplots separately for baseline and follow-up) rather misleading. The experts now only have some idea of the central tendency, but not of the individual variation in response across treatment arms (are high excreters at baseline still excreting the most or the fewest eggs at follow-up?). As a consequence, they are not provided with all the information needed to make a proper judgement on the difference in drug efficacy across treatment arms. This might have been done on purpose, but it once more highlights the need for interpreting the point estimate with uncertainty intervals.

Line 303: I found this sentence rather redundant; it is not really relevant here, or I misunderstood the message. Do you mean that the arithmetic mean can be used to assess infection intensity but not drug efficacy, or is a distinction made between trials designed to assess or compare drugs and trials designed to assess the efficacy of drugs in large-scale deworming programs?

Line 305: in general, ERR would not really be recommended at all to assess differences in drug efficacy in dose-response trials or in any other trial designed to assess differences. This is because the variation in point estimates might be too large, which is why CR is probably better (given that subjects are randomized to treatment arms, stratifying for baseline egg counts).

Line 310: I am not sure whether this statement holds true; reporting ERR based on group means does not exclude variation due to individual response, as long as the uncertainty intervals around the point estimate are provided. As a consequence of this, the following argumentation on the use of simple measures of central tendency over complex models does not hold true anymore (see also first major comment).

Line 348 – 353: overall I largely disagree with the conclusions drawn. The authors are providing the wrong message. Instead of providing 2 different measures of central tendency, which only creates confusion, one should emphasize the need for reporting uncertainty intervals, which may already reflect the impact of outliers.

Reviewer #3: Please see attached reviewer comments

--------------------

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Matthew Denwood

Reviewer #2: No

Reviewer #3: Yes: Luc E. Coffeng

Attachment

Submitted filename: PNTD-D-19-01393 LC.docx

PLoS Negl Trop Dis. doi: 10.1371/journal.pntd.0008185.r003

Decision Letter 1

Sara Lustigman, Matthew C Freeman

23 Feb 2020

Dear Dr. Hattendorf,

Thank you very much for submitting your manuscript "One mean to rule them all? The arithmetic mean based egg reduction rate can be misleading when estimating anthelminthic drug efficacy in clinical trials" for consideration at PLOS Neglected Tropical Diseases. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.  

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. 

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Matthew C Freeman, MPH, Ph.D.

Associate Editor

PLOS Neglected Tropical Diseases

Sara Lustigman

Deputy Editor

PLOS Neglected Tropical Diseases

***********************

Reviewer's Responses to Questions

Key Review Criteria Required for Acceptance?

As you describe the new analyses required for acceptance, please consider the following:

Methods

-Are the objectives of the study clearly articulated with a clear testable hypothesis stated?

-Is the study design appropriate to address the stated objectives?

-Is the population clearly described and appropriate for the hypothesis being tested?

-Is the sample size sufficient to ensure adequate power to address the hypothesis being tested?

-Were correct statistical analysis used to support conclusions?

-Are there concerns about ethical or regulatory requirements being met?

Reviewer #3: See reviewer attachment

--------------------

Results

-Does the analysis presented match the analysis plan?

-Are the results clearly and completely presented?

-Are the figures (Tables, Images) of sufficient quality for clarity?

Reviewer #3: See reviewer attachment

--------------------

Conclusions

-Are the conclusions supported by the data presented?

-Are the limitations of analysis clearly described?

-Do the authors discuss how these data can be helpful to advance our understanding of the topic under study?

-Is public health relevance addressed?

Reviewer #3: See reviewer attachment

--------------------

Editorial and Data Presentation Modifications?

Use this section for editorial suggestions as well as relatively minor modifications of existing data that would enhance clarity. If the only modifications needed are minor and/or editorial, you may wish to recommend “Minor Revision” or “Accept”.

Reviewer #3: See reviewer attachment

--------------------

Summary and General Comments

Use this section to provide overall comments, discuss strengths/weaknesses of the study, novelty, significance, general execution and scholarship. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. If requesting major revision, please articulate the new experiments that are needed.

Reviewer #1: Thanks for considering my suggestions - I think your additions have greatly improved the manuscript, and it is now much clearer from reading the main paper what exactly you mean by "performance".

I have only a couple of further comments in relation to the abstract (which I think is currently identical to the previous version):

Lines 34-36 currently states: "Among all investigated means, the arithmetic mean showed poorest performance and agreed with the expert opinion in only 64% (bootstrap confidence interval: 60−68)."

I think this is still misleading as it suggests that it BOTH showed poorest performance AND agreed with expert opinion less than the others, whereas in fact you judge performance based on expert opinion. How about something like:

"Among all investigated means, the arithmetic mean showed the poorest performance with only 64% agreement with expert opinion (bootstrap confidence interval: 60−68)."

Line 39: I think the comma should be between 'CI: 78-87)' and 'but' rather than between 'well' and 'after'

Line 42: The wording of "necessarily provide reliable estimates in anthelminthic efficacy" is also a bit vague and potentially misleading. How about something like:

"necessarily rank anthelminthic efficacies in the same order as might be obtained from expert evaluation of the same data"

Line 60: Could you make it more explicit exactly what you mean by performance here, eg: "...showed the poorest performance in terms of agreement with expert opinion"

I think that the authors can be trusted to address these minor issues without need for a further review.

Reviewer #3: See reviewer attachment

--------------------

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Matthew Denwood

Reviewer #3: Yes: Luc E. Coffeng

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example, see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see http://journals.plos.org/plosntds/s/submission-guidelines#loc-materials-and-methods

Attachment

Submitted filename: PNTD-D-19-01393_R1 LC.docx

PLoS Negl Trop Dis. doi: 10.1371/journal.pntd.0008185.r005

Decision Letter 2

Sara Lustigman, Matthew C Freeman

1 Mar 2020

Dear Dr. Hattendorf,

We are pleased to inform you that your manuscript 'One mean to rule them all? The arithmetic mean based egg reduction rate can be misleading when estimating anthelminthic drug efficacy in clinical trials' has been provisionally accepted for publication in PLOS Neglected Tropical Diseases.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch within two working days with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Neglected Tropical Diseases.

Best regards,

Matthew C Freeman, MPH, Ph.D.

Associate Editor

PLOS Neglected Tropical Diseases

Sara Lustigman

Deputy Editor

PLOS Neglected Tropical Diseases

***********************************************************

PLoS Negl Trop Dis. doi: 10.1371/journal.pntd.0008185.r006

Acceptance letter

Sara Lustigman, Matthew C Freeman

19 Mar 2020

Dear Dr. Hattendorf,

We are delighted to inform you that your manuscript, "One mean to rule them all? The arithmetic mean based egg reduction rate can be misleading when estimating anthelminthic drug efficacy in clinical trials," has been formally accepted for publication in PLOS Neglected Tropical Diseases.

We have now passed your article on to the PLOS Production Department, who will complete the rest of the publication process. All authors will receive a confirmation email upon publication.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any scientific or type-setting errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Note: Proofs for Front Matter articles (Editorial, Viewpoint, Symposium, Review, etc...) are generated on a different schedule and may not be made available as quickly.

Soon after your final files are uploaded, the early version of your manuscript will be published online unless you opted out of this process. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Neglected Tropical Diseases.

Best regards,

Serap Aksoy

Editor-in-Chief

PLOS Neglected Tropical Diseases

Shaden Kamhawi

Editor-in-Chief

PLOS Neglected Tropical Diseases

Associated Data

    Supplementary Materials

    S1 File. Example questionnaire.

    (PDF)

    S2 File. Example explaining agreements, scores and relationship between figures.

    (PDF)

    S3 File. Trial characteristics, sensitivity analysis and agreement among different means.

    (PDF)

    Attachment

    Submitted filename: PNTD-D-19-01393 LC.docx

    Attachment

    Submitted filename: PNTD-D-19-01393-Point_by_point_response_v1.docx

    Attachment

    Submitted filename: PNTD-D-19-01393_R1 LC.docx

    Attachment

    Submitted filename: PNTD-D-19-01393-Point_by_point_response_R2.docx

    Data Availability Statement

    The data cannot be shared without restrictions because the authors do not own the data. The data underlying the results presented in the study are available from the authors of the original studies [ref. 15, 21-25].


    Articles from PLoS Neglected Tropical Diseases are provided here courtesy of PLOS