PLOS One. 2021 Jan 14;16(1):e0244752. doi: 10.1371/journal.pone.0244752

Expert opinion as priors for random effects in Bayesian prediction models: Subclinical ketosis in dairy cows as an example

Haifang Ni 1,2, Irene Klugkist 2,*, Saskia van der Drift 3, Ruurd Jorritsma 1, Gerrit Hooijer 1, Mirjam Nielen 1
Editor: Angel Abuelo
PMCID: PMC7808599  PMID: 33444385

Abstract

Random effects regression models are routinely used for clustered data in etiological and intervention research. However, in prediction models, the random effects are either neglected or conventionally substituted with zero for new clusters after model development. In this study, we applied a Bayesian prediction modelling method to the subclinical ketosis data previously collected by Van der Drift et al. (2012). Using a dataset of 118 randomly selected Dutch dairy farms participating in a regular milk recording system, the authors proposed a prediction model with milk measures as well as available test-day information as predictors for the diagnosis of subclinical ketosis in dairy cows. While their original model included random effects to correct for the clustering, the random effect term was removed for their final prediction model. With the Bayesian prediction modelling approach, we first used non-informative priors for the random effects for model development as well as for prediction. This approach was evaluated by comparing it to the original frequentist model. In addition, herd level expert opinion was elicited from a bovine health specialist using three different scales of precision and incorporated in the prediction as informative priors for the random effects, resulting in three more Bayesian prediction models. Results showed that the Bayesian approach could naturally take the clustering structure of the data into account by keeping the random effects in the prediction model. Expert opinion could be explicitly combined with individual level data for prediction. However, in this dataset, incorporating the elicited expert opinion yielded little improvement at either the individual or the herd level.
When the prediction models were applied to the 118 herds, at the individual cow level, the original frequentist approach yielded a sensitivity of 82.4% and a specificity of 83.8% at the optimal cutoff, while the three Bayesian models with elicited expert opinion yielded sensitivities ranging from 78.7% to 84.6% and specificities ranging from 75.0% to 83.6%. At the herd level, 30 out of 118 within herd prevalences were correctly predicted by the original frequentist approach, and 31 to 44 herds were correctly predicted by the three Bayesian models with elicited expert opinion. The expert opinion and the distributional assumption for the random effects were investigated further and are discussed.

Introduction

Random effects regression models are routinely used in etiological and intervention research. By including the random effect coefficient in the model, variance between clusters can be taken into account. However, this approach is rarely carried over to prediction with clustered data. In traditional prediction models, the random effects are either neglected or conventionally substituted with zero for all new clusters after model development [1].

In a recent simulation study from Ni et al. [2], the authors discussed this neglect in traditional prediction models and developed a Bayesian approach where the random effects could remain in the model for development as well as for prediction. Within the Bayesian framework, a prior distribution for the model parameters reflects the knowledge or uncertainty about the parameters before observing the data. By updating the prior distributions with data, posterior distributions are obtained. One of the advantages of Bayesian modelling, therefore, is that it can naturally combine multiple sources of evidence [3]. In the study from Ni et al. [2] for instance, (simulated) cluster level expert opinion was used as informative priors for the random effects of new clusters. The simulations showed that the Bayesian models incorporating cluster level expert opinion outperformed the traditional frequentist model, under the assumption that the expert was able to correctly predict in which part of the random effects distribution each cluster was located. A more detailed explanation of this approach is provided in the methods section.
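The principle of combining a prior with data can be shown in a minimal, self-contained sketch (written in Python for illustration; the study's own code is in R and Stan, see S2 Appendix). The conjugate beta-binomial model and all numbers below are hypothetical and serve only to show how an expert prior and observed data jointly determine a posterior:

```python
# Illustrative sketch (not the authors' model): Bayesian updating of an
# expert prior with data, using a conjugate beta-binomial example for a
# single herd's SCK prevalence.  All numbers are hypothetical.
prior_a, prior_b = 2.0, 18.0   # expert prior: mean 0.10, fairly vague
diseased, healthy = 3, 11      # hypothetical reference-test results in one herd

post_a = prior_a + diseased    # conjugate beta-binomial update
post_b = prior_b + healthy
post_mean = post_a / (post_a + post_b)
print(round(post_mean, 3))     # posterior mean prevalence, between prior and data
```

The posterior mean lies between the prior mean (0.10) and the observed proportion (3/14), illustrating how the two sources of evidence are weighed against each other.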

Prediction modelling is an explicit and empirical approach to estimating disease risk in medicine and veterinary medicine. It follows the evidence-based (veterinary) medicine discipline and aims to use the current best evidence in diagnosis and in making decisions for the care of individual subjects [4]. In dairy science for instance, attempts have been made to develop diagnostic methods to detect subclinical ketosis (SCK) in dairy cows based on routine milk recording data [e.g., 5,6]. SCK is considered one of the main metabolic disorders in early lactation dairy cows and is defined as an increased concentration of ketone bodies in body fluids in the absence of clinical signs [7]. Analysis of the concentration of acetone and β-hydroxybutyrate (BHBA) in blood is considered the reference test (e.g., [8]). A recent example was published by Van der Drift et al. [9], who proposed a prediction model consisting of routine milk measures as well as available test-day information as predictors. The between herd variance was accounted for by random herd effects when selecting the predictors. The current SCK monitoring system in the Netherlands is based on this prediction model. While their original model included random herd effects to correct for the clustering of cows, the random effect term was removed for their final prediction model.

In this study, we applied a Bayesian approach to the SCK data collected by Van der Drift et al. [9]. Four Bayesian prediction models were investigated. First, a Bayesian prediction model with non-informative priors for the random effects was evaluated by comparing it to the original model. Three more Bayesian models were explored, with herd level expert opinion elicited from a bovine health specialist incorporated in the prediction through informative priors for the random herd effects. The main aim of this study is to explore whether the proposed Bayesian prediction modelling approach is feasible for empirical data, and whether in this dataset, it would outperform the original prediction model without random effects and improve the diagnostic accuracy for SCK.

Materials and methods

Data

Van der Drift et al. collected both blood and milk samples at the individual cow level for the development of the prediction model. Throughout the paper, this model will be labeled as the 2012SCK model. In short, a total of 123 Dutch farms were randomly selected from the milk recording organization the Dutch-Flemish Cattle Improvement Cooperative (CRV), which includes 83.8% of all Dutch dairy farmers. The 123 farms were visited on a planned milk recording test day between November 2009 and November 2010 for data collection. On each farm, all cows between 5 and 60 days in milk (DIM), the risk group for SCK, were blood sampled. As a consequence, few eligible cows were present on smaller farms. Five farms were excluded due to incompleteness of cow level measures. The final dataset consisted of 1,678 cows from 118 farms (see S1 Appendix). On average, 14 cows per farm were sampled, varying from 3 to 47 with a median of 13. The overall animal prevalence of SCK for the 1,678 cows was 11.2% based on the reference test results in the blood samples. Within herd animal prevalences ranged from 0% to 80%; their distribution was not symmetric, with a peak at zero (39 herds).

Additional herd information, including feeding management, was collected during the farm visit on the test day. Information on milk production for each herd was provided by CRV. The feeding management of each herd was characterized by means of a standard questionnaire completed by the farmer.

The 2012SCK model

The cow level measures milk acetone, milk BHBA, milk fat-to-protein ratio and parity, as well as the herd level measure season, were selected as predictors in the 2012SCK logistic regression random effects model. Milk acetone (97.41±116.76 μmol/L) and milk BHBA (74.95±77.82 μmol/L) measures at the individual cow level were obtained from routine milk analysis by Fourier transform infrared (FTIR) spectroscopy. Milk fat-to-protein ratio (1.33±0.23), parity of each animal and season during the farm visit on the test day were included as well. Observed binary outcomes were obtained by applying the plasma BHBA threshold of 1,200 μmol/L, above which an animal was considered SCK positive. The random herd effects were assumed to be normally distributed with mean 0 and variance σu².

Parameters of the diagnostic model were estimated with maximum likelihood. Prediction of the presence of SCK in cows from new herds was based on point estimates for the regression parameters of the model from which the random effect term was removed. All 118 herds were used for model development as well as for model prediction. For proper comparisons, the Bayesian prediction models took the same approach.

Bayesian approach

To obtain Bayesian estimates, we used non-informative priors for all the regression parameters of the predictors (normal distributions with mean 0 and variance 10,000) and for the variance of the random herd effects (inverse gamma distribution with both hyperparameters equal to 0.001). Three Markov chain Monte Carlo (MCMC) posterior chains were sampled. Within each chain, the first 5,000 iterations were discarded as the burn-in phase. Convergence was visually inspected using trace plots. Proper convergence was observed for all chains and the subsequent 20,000 iterations were used for parameter estimation. To reduce computational effort in the prediction phase, the 20,000 iterations were thinned by 100, resulting in 200 draws per chain and 600 in total. Each of the 600 retained iterations contained sampled values for the regression coefficients and for the variance of the random effects.
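The retention and thinning scheme described above can be sketched as follows (an illustrative Python sketch; the actual sampling was done in Stan, and the draws here are placeholder iteration indices rather than real posterior samples):

```python
# Sketch of the post-processing described above: 3 chains, the first 5,000
# iterations discarded as burn-in, the next 20,000 kept and thinned by 100,
# giving 200 draws per chain and 600 in total.
n_chains, burn_in, kept, thin = 3, 5000, 20000, 100

draws = []
for chain in range(n_chains):
    chain_iters = list(range(burn_in, burn_in + kept))  # post burn-in iterations
    draws.extend(chain_iters[::thin])                   # keep every 100th iteration

print(len(draws))  # 600
```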

For the prediction without incorporation of expert knowledge, in each iteration and for each cluster a value was drawn from the random effects normal distribution with mean zero and the sampled variance. For each cow within each iteration, a predicted risk of SCK was computed by the prediction model. As a result, a distribution of predicted risk was available based on all iterations for each individual. In this study, in order to compare the results between the Bayesian models and the 2012SCK model, the median of the predicted risk distribution was used as the summarized predicted risk, resulting in a single estimate per individual cow. The R code for the Bayesian prediction model without the incorporation of expert prior knowledge can be found in S2 Appendix.
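A hedged sketch of this prediction step, assuming hypothetical regression coefficients, a hypothetical predictor value, and placeholder posterior draws of the random effects variance (the study's own implementation is the R code in S2 Appendix):

```python
import math
import random
import statistics

random.seed(1)

# Illustrative sketch of the prediction step for one cow in a new herd.
# Coefficients, the predictor value, and the 600 variance draws are
# hypothetical stand-ins for the actual posterior samples.
beta0, beta1 = -3.0, 0.02                                      # hypothetical coefficients
var_draws = [1.6 + 0.2 * random.random() for _ in range(600)]  # sigma_u^2 draws
x_cow = 100.0                                                  # hypothetical milk measure

risks = []
for var_u in var_draws:
    u = random.gauss(0.0, math.sqrt(var_u))     # random effect drawn per iteration
    eta = beta0 + beta1 * x_cow + u             # linear predictor
    risks.append(1.0 / (1.0 + math.exp(-eta)))  # logistic link -> predicted risk

summary_risk = statistics.median(risks)         # single estimate per cow
print(0.0 < summary_risk < 1.0)
```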

Herd level prior information could be incorporated by sampling the random effects from a specific part of the random effects normal distribution. For instance, when the prior information indicates a herd to have a below average risk for SCK, the random effect for this herd is sampled from the lower half of the distribution, as displayed in Fig 1A. By incorporating herd level prior information, we could thus restrict the parameter space for each random herd effect, resulting in more precise estimates of the random effects.
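Sampling from a specific part of a normal distribution can be done with inverse-CDF sampling, as in this illustrative sketch (the variance value is hypothetical, and `sample_part` is a name introduced here, not taken from the study's code):

```python
import random
from statistics import NormalDist

random.seed(2)

# Sketch of restricting a random effect to one part of its normal
# distribution (here: the lower half, as in Fig 1A).  A uniform draw on the
# chosen cumulative-probability interval is mapped through the inverse CDF.
sigma_u = 1.3                       # hypothetical random-effects standard deviation
dist = NormalDist(mu=0.0, sigma=sigma_u)

def sample_part(dist, lower_q, upper_q):
    """Draw from the slice of `dist` between two cumulative probabilities."""
    p = random.uniform(lower_q, upper_q)
    p = min(max(p, 1e-9), 1.0 - 1e-9)  # guard against the degenerate endpoints
    return dist.inv_cdf(p)

u = sample_part(dist, 0.0, 0.5)     # below-average-risk herd: lower half
print(u < 0)                        # the draw lies in the lower half
```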

Fig 1. The random effects distribution divided into multiple parts of equal proportions in three different scales.

Each part contains either half, one third, or one fifth of the distribution.

Expert opinion

An experienced ruminant health specialist (GH, dipl. ECBHM, i.e., European College of Bovine Health Management) was asked to give his personal opinion for each herd on the SCK risk in early lactation cows in relation to the total Dutch dairy population.

As previous research indicated that the risk for SCK is related to routine feeding practices (e.g., [10]) and herd level milk yield (e.g., [11]), herd level feeding management and herd level milk production documents that were collected during the farm visit [9] were provided to the expert as a proxy for a farm visit. This herd level information on feeding and milk production was hence incorporated in the prediction model through elicited expert opinion.

We adopted the three scales specified for the expert elicitation from the simulation study [2]. The 2-level scale divided the Dutch dairy population into two groups: the lower 50% risk group and the upper 50% risk group. The 3-level scale divided the population into three equal probability groups, and the 5-level scale divided the population into five groups (see Fig 1). As all herd level information was available for the 5 herds that were not included in the analysis, these herds were used as test herds to pilot and evaluate the instruction and the scoring form for the expert elicitation. The original instruction provided for the expert can be found in S3 Appendix.
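Because each scale divides the population into equal-probability groups, an expert score maps directly to a quantile interval of the random effects distribution. A small illustrative sketch of this mapping (not taken from the study's code):

```python
# Each n-level scale splits the population into n equal-probability groups,
# so risk group k of an n-level scale corresponds to the quantile interval
# (k/n, (k+1)/n) of the random effects distribution.
def scale_bounds(n_levels):
    """Quantile interval (lower, upper) for each risk group of an n-level scale."""
    return [(k / n_levels, (k + 1) / n_levels) for k in range(n_levels)]

print(scale_bounds(2))     # [(0.0, 0.5), (0.5, 1.0)]
print(scale_bounds(5)[0])  # lowest-risk fifth: (0.0, 0.2)
```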

Optimal expert opinion and distributional assumption

Optimal expert opinion

In order to provide a benchmark for the best possible results using the proposed approach for this particular dataset, we also explored the predictive performance in case the elicited expert opinion would always be correct. This, from here on called 'optimal expert opinion', was defined as placing all clusters in the correct part of the random effects distribution. The correct part was determined from the percentile/ranking of each herd among all 118 observed within herd animal prevalences, which were computed using the true disease status of the cows determined by plasma BHBA values. Values for the random intercepts were still randomly drawn from the assigned part of the random effects distribution. Predictions were subsequently made in the same way as for the real expert model.

Distributional assumption

The assumed distribution for random cluster effects is almost always the normal distribution, usually chosen for computational convenience [12]. Some researchers argued that misspecification of the random effects distribution has little impact on parameter estimates [13,14], while others pointed out that regression parameter estimates can be very sensitive to the random effects distribution and suggested more flexible distributional assumptions [15,16]. Given the asymmetric nature of the SCK within herd prevalences, we also investigated the skew-normal distribution with optimal expert opinion for all Bayesian models [17]. The random effects under the skew-normal assumption were sampled from the asymmetric distribution with mean zero, variance σu², and skewness parameter αu. For parameter estimation, the same non-informative priors were specified as for the normal random effects models. A normal prior distribution with mean zero and standard deviation one was specified for the skewness parameter αu.
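One common constructive definition of the skew-normal can be standardised so that the sampled random effects have mean zero and variance σu², as described above. The following is an illustrative sketch with hypothetical parameter values, not the Stan implementation used in the study:

```python
import math
import random

random.seed(3)

# Sketch: sample skew-normal random effects, then shift and scale so the
# result has mean 0 and standard deviation sigma.  Uses the standard
# construction z = delta*|z0| + sqrt(1-delta^2)*z1 ~ SN(0, 1, alpha).
def skew_normal_centered(alpha, sigma, rng=random):
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    z = delta * abs(z0) + math.sqrt(1.0 - delta * delta) * z1
    mean = delta * math.sqrt(2.0 / math.pi)              # E[z] of SN(0, 1, alpha)
    sd = math.sqrt(1.0 - 2.0 * delta * delta / math.pi)  # sd of SN(0, 1, alpha)
    return sigma * (z - mean) / sd                       # mean 0, sd sigma

# Hypothetical skewness and scale; check the empirical mean is near zero.
draws = [skew_normal_centered(alpha=2.0, sigma=1.3) for _ in range(50_000)]
mean = sum(draws) / len(draws)
print(abs(mean) < 0.05)
```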

Model notations and computation

We denote our reproduced 2012SCK model as FREQ (i.e., the frequentist model). Further, four Bayesian models were specified. We denote the Bayesian prediction model without herd specific information as Bayes0. The Bayesian models with herd level prior information are denoted as Bayes2 (2-level scale), Bayes3 (3-level scale) and Bayes5 (5-level scale) respectively. The model assessment measures used were the same as in the simulation study [2], namely the area under the curve (AUC), the Brier score (i.e., mean squared error), the calibration slope, sensitivity and specificity at the optimal cutoff, and sensitivity at the 95% and 90% specificity cutoffs. All analyses were carried out in R [18]. The frequentist parameter estimates were obtained using the package ‘lme4’ [19] and the Bayesian results were obtained by calling Stan from R using the package ‘rstan’ [20].
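The animal level assessment measures can be illustrated on a toy example (in Python; the outcomes and predicted risks below are hypothetical, and in the study these measures were computed in R):

```python
# Toy example of three of the assessment measures named above.
# y: hypothetical true outcomes; p: hypothetical predicted risks.
y = [1, 0, 1, 0, 0, 1, 0, 0]
p = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.3, 0.8]

def brier(y, p):
    """Mean squared error between predicted risks and observed outcomes."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def auc(y, p):
    """Probability that a random diseased subject receives a higher predicted
    risk than a random healthy one (ties count half)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def se_sp(y, p, cutoff):
    """Sensitivity and specificity at a given risk cutoff."""
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= cutoff)
    tn = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi < cutoff)
    return tp / sum(y), tn / (len(y) - sum(y))

print(round(auc(y, p), 3), round(brier(y, p), 3), se_sp(y, p, 0.5))
```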

Results

Estimation

The 2012SCK model was reproduced using the generalized linear mixed model function in 'lme4'. After setting the optimizer to ‘bobyqa’, parameter estimates from our analysis were identical to the originally reported results in Van der Drift et al. [9] up to the third decimal, but the standard errors differed slightly. The estimated intraclass correlation coefficient (ICC) was 0.35, and the Nagelkerke’s R² was 41.2%.

Parameters were further estimated in a Bayesian approach using non-informative priors. The point estimates for the regression coefficients from the posterior were similar to the results from the reproduced 2012SCK model (see S4 Appendix).

Prediction and model comparisons

Animal level

The reproduced 2012SCK model (FREQ) showed diagnostic accuracy identical to the originally reported results. Table 1 presents the diagnostic performance of the models at the individual cow level. The effects of including the elicited expert opinion, as well as the simulated optimal expert opinion, were assessed within the Bayesian approach. As can be seen, the Bayesian model without herd level information (Bayes0) performed approximately the same as the frequentist model. The Bayesian models with elicited expert opinion showed no improvement in comparison to the frequentist model. However, the Bayesian models with optimal expert opinion slightly outperformed the frequentist model, with Bayes5 showing the best prediction.

Table 1. Animal level measures (n = 1,678): area under the curve (AUC), Brier score, calibration slope, sensitivity (Se), specificity (Sp) using the optimal cutoff, and sensitivity using the 95% and 90% specificity cutoffs for the predicted outcomes.

                                                    Elicited expert opinion    Optimal expert opinion
Measure (optimal value)        FREQ     Bayes0      Bayes2   Bayes3   Bayes5   Bayes2   Bayes3   Bayes5
AUC (%) (100)                  88.5     88.3        88.2     87.5     88.5     91.1     92.4     92.5
Brier score (0)                0.069    0.069       0.071    0.077    0.084    0.062    0.059    0.058
Calibration slope (1)          0.809    0.787       0.674    0.605    0.784    0.796    0.832    0.821
Se, optimal cutoff (%) (100)   82.4     81.4        84.6     78.7     80.9     81.4     81.4     88.3
Sp, optimal cutoff (%) (100)   83.8     83.5        75.0     82.8     83.6     85.8     86.7     80.3
Se, 95% Sp cutoff (%) (100)    51.1     51.6        49.5     44.7     52.1     56.9     63.3     64.4
Se, 90% Sp cutoff (%) (100)    69.7     69.1        66.0     61.7     70.2     74.5     76.1     75.0

Bayes2, Bayes3 and Bayes5 use the 2-, 3- and 5-level scales respectively.

Herd level

Predicted within herd animal prevalences were compared to the observed prevalences based on the reference test results in blood samples. The predicted prevalence of each herd was calculated from the predicted binary outcomes of the cows within the herd, with a cow classified as positive when its predicted disease probability exceeded the optimal cutoff (defined as the maximum sum of sensitivity and specificity). As expected, the frequentist model and the Bayesian model without herd specific information resulted in similar predictive accuracy. The Bayesian models with elicited expert opinion from the 2-level and 3-level scales had more herds correctly estimated (44 and 43 respectively) than the frequentist model (30), and fewer false positives (19 and 20 respectively) than the frequentist model (33). Further, Bayesian models with optimal expert opinion outperformed the Bayesian models with elicited expert opinion regarding the number of false positives (see S4 Appendix).
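The herd level summary described above amounts to dichotomising each cow's predicted risk and taking the positive fraction within the herd. An illustrative sketch, with hypothetical risks and a generic cutoff argument:

```python
# Sketch of the herd level summary: each cow's predicted risk is
# dichotomised at a cutoff, and the predicted within herd prevalence is the
# fraction of positive cows.  The risks below are hypothetical.
def predicted_prevalence(risks, cutoff):
    """Fraction of cows whose predicted risk exceeds the cutoff."""
    positives = sum(1 for r in risks if r > cutoff)
    return positives / len(risks)

herd_risks = [0.05, 0.62, 0.30, 0.71, 0.10]        # five hypothetical cows
print(predicted_prevalence(herd_risks, 0.50))      # 0.4
```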

Expert opinion

Table 2 presents a summary of the number of herds assigned to each risk group within each scale by the expert (see S5 Appendix). More herds were assigned to the lower risk group(s) than to the higher risk group(s) in all three scales. When the number of risk groups increased (i.e., from 2 to 5 levels), the frequency of disagreement between the elicited expert opinion and the observed within herd animal prevalences increased accordingly. This can also be seen in the three plots in S4 Appendix. It should be noted that in the 5-level scale, 39 out of 118 herds (more than 20% of herds) had zero diseased cows and could not be ranked among each other; the lowest 40% of herds were therefore combined into one risk group.

Table 2. Summary of the elicited expert opinion on 118 herds.

Each column presents the number of herds assigned to each risk group within each scale (from the lowest risk to the highest risk).

Risk group         2-level scale   3-level scale   5-level scale
1 (lowest risk)          72              51              33
2                        46              48              33
3                                        19              26
4                                                        18
5 (highest risk)                                          8
Total                   118             118             118

Results also reveal that the degree of agreement between the elicited expert opinion and the observed within herd animal prevalence is affected by herd sample size. Table 3 shows that for herds with at least 12 sampled cows, there is more agreement between expert opinion and observed within herd prevalence than for herds with fewer than 12 sampled cows.

Table 3. The number of herds for which the elicited expert opinion and the observed within herd animal prevalence agreed on the relative position of the herd among the 118 herds.

                  Agreement, n (%)
                  Herd sample size <12 (n = 48)   Herd sample size ≥12 (n = 70)
2-level scale               30 (62.5)                       45 (64.3)
3-level scale               18 (37.5)                       38 (54.3)
5-level scale               19 (39.6)                       31 (44.3)

Distributional assumption

At the individual cow level, the four Bayesian models with a skew-normal distribution for the random effects performed similarly to the Bayesian models with the normal distribution. At the herd level, the skew-normal models had more herds correctly estimated and fewer false positives at the alarm level of 10% (see S4 Appendix).

Splitting data into a training and a test set

In order to compare objectively between the results from the frequentist model in the original study (Van der Drift et al., 2012) and the Bayesian prediction models in this study, we used the same dataset for model development as well as for model prediction, as was done in the original study. However, we additionally performed model comparisons based on an 80/20 training/test split as follows. About 80% of the 118 herds were randomly selected for model development, resulting in 94 herds with 1,331 cows. The number of cows per herd was approximately 14 on average and 32 out of the 94 herds had zero diseased animals. The parameter estimates from the frequentist as well as the Bayesian approach on the training set are shown in Table 5S of S4 Appendix, while the prediction results on the test set are found in Table 6S of S4 Appendix.

Discussion

This study demonstrates an application of a Bayesian prediction modelling approach [2] that incorporates the clustering structure by keeping the random effects in the prediction model. In addition, this approach provides a natural framework to combine evidence from various sources, such as expert opinion in our SCK example. Herd level expert opinion provided by the bovine health specialist enabled us to combine herd specific information with individual level milk measures and available test-day data in the prediction. However, in this dataset, little improvement was seen in prediction from the Bayesian models incorporating elicited expert opinion in comparison to the prediction from the original frequentist model. The predictive performance of the three Bayesian models with different levels of precision in expert opinion remained poor at the individual cow level. At the herd level, the Bayesian prediction models showed slightly higher diagnostic accuracy, with more within herd prevalences being correctly estimated and fewer false positives at the alarm level of 10%. We therefore conclude, in agreement with Van der Drift et al. [9], that these prediction models, both with and without the addition of herd level information, are not suited for cow level SCK diagnosis.

A reason to include the simulated optimal expert was to examine whether predictive performance failed to improve simply because the elicited expert opinion was of suboptimal quality. In the current study, we observed that the expert tended to underestimate the herd risk for SCK, as more herds were assigned to the lower risk group(s). In practice, one could try to improve the quality of the expert knowledge by eliciting opinions from multiple experts and applying established methods to reach agreement among them [21]. In this study, we decided to simulate expert information under the assumption that the expert was always correct. Note that this was a methodological exercise to further investigate the potential of the proposed approach for this particular dataset, and not an approach that should be applied in practice to reach better predictive performance. Although the simulation study from Ni et al. [2] demonstrated that Bayesian models including correct cluster level expert opinion were able to provide better predictions, the improvement in predictive performance in this study was limited. Therefore, we investigated several other possible explanations for the lack of (substantial) improvement as well.

Another explanation for the little improvement in prediction, both at the individual cow level and at the herd level, might be the small herd sample sizes in this study. In herds with at least 12 sampled cows, the agreement between the observed within herd prevalences (based on the reference test in blood samples) and the elicited expert opinion was higher than in herds with fewer than 12 sampled cows, in all three scales. As Oetzel [8] concluded in his study, the minimum sample size for herd-based tests that gave moderate confidence (75% or more) was 12. In our dataset, the observed within herd prevalences from the herds with small sample sizes (n<12) may therefore not represent the true prevalence for these herds. However, a sensitivity analysis including only the 70 herds that had at least 12 cows (n≥12) showed similar results compared to the full 118 herds (results not shown).

Also, the relatively low ICC in this dataset may limit the benefit of incorporating cluster level prior information. Using the data from all 118 herds, the ICC was 0.35. However, 39 out of 118 herds had zero SCK animals which influenced the ICC and the subsequent random effects estimation. The estimated variance of the random effects was 1.792 when all 118 herds were included, but reduced to 0.539 when 39 herds with zero diseased cows were removed. This removal also reduced the ICC substantially, to the value 0.14. A lower ICC leaves less potential influence of the clustering effect, hence less benefit from adding herd level prior information to a prediction model.
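The reported ICC values follow from the latent-variable formula for a logistic random-intercept model, ICC = σu²/(σu² + π²/3), where π²/3 is the variance of the standard logistic distribution. Reproducing them from the stated random effects variances:

```python
import math

# Latent-variable ICC for a logistic random-intercept model, applied to the
# two random-effects variances reported above.
def icc_logistic(var_u):
    return var_u / (var_u + math.pi ** 2 / 3)

print(round(icc_logistic(1.792), 2))  # 0.35 (all 118 herds)
print(round(icc_logistic(0.539), 2))  # 0.14 (zero-prevalence herds removed)
```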

Finally, the distributional assumption for the random effects may have influenced the estimation as well. The normal distribution was examined by comparing it with the skew-normal distribution within the Bayesian models using the optimal expert opinion. The model with random herd effects under the skew-normal distributional assumption did not show better prediction at the individual cow level than the model with random effects under the normality assumption. However, the skew-normal random effects Bayesian models provided better predictions at the herd level than the respective Bayesian models with normal random effects, which indicates that the skew-normal random effects distribution may be better suited for zero-inflated data.

The regression models based on the training set showed very similar parameter estimates to the full dataset, albeit with larger standard errors for the regression coefficients and lower variance for the random effects. The model assessments on the test set showed less favorable prediction results for all methods, as was to be expected, but did not alter our conclusions about the comparisons between the methods.

Conclusions

This study illustrates how the Bayesian prediction modelling approach can take the clustering effect into account and how cluster level expert opinion can be combined with individual level data. However, in this dataset, incorporation of elicited expert opinion did not improve prediction at either the individual or the herd level. Therefore, further investigation of the potential gain of using this approach requires applications in studies where the between cluster variance is relatively large and where all clusters harbor individuals with the outcome under study.

Supporting information

S1 Appendix. Appendix A: Model data.

(XLSX)

S2 Appendix. Appendix B: R code for the Bayesian model without incorporating prior expert knowledge.

(DOCX)

S3 Appendix. Appendix C: Instruction for expert elicitation.

(DOCX)

S4 Appendix. Appendix D: Additional tables and figures.

(DOCX)

S5 Appendix. Appendix E: Expert elicitation results.

(XLSX)

Acknowledgments

The authors thank Hiemke M. Knijn from the Dutch-Flemish Cattle Improvement Cooperative (CRV) for providing the milk production data. Dr. Tine van Werven and Joost de Veer are gratefully acknowledged for their help in pre-testing the expert elicitation form and giving valuable feedback for improvement in the instruction.

Abbreviations

SCK

subclinical ketosis

BHBA

β-hydroxybutyrate

2012SCK model

the original prediction model from Van der Drift et al. (2012)

DIM

days in milk

CRV

the Dutch-Flemish Cattle Improvement Cooperative

MCMC

Markov chain Monte Carlo

FREQ

frequentist model without random effects

ECBHM

European College of Bovine Health Management

Bayes0

Bayesian prediction model without herd specific information

Bayes2

Bayesian prediction model with herd specific information using 2-level scale

Bayes3

Bayesian prediction model with herd specific information using 3-level scale

Bayes5

Bayesian prediction model with herd specific information using 5-level scale

Data Availability

Most data are contained within the manuscript and/or Supporting Information files, except the milk recording data collected by the Dutch-Flemish Cattle Improvement Cooperative (CRV). This part of the data cannot be shared publicly because, according to article 7 of the data transfer agreement between Utrecht University and CRV, we are not allowed to transfer the data to a third party as it contains business-sensitive information. CRV informed us that they will consider disclosure of the data when asked to do so. In that case, they will determine whether the requested data contains business-sensitive information. Interested researchers who meet the criteria for access to confidential data can contact CRV senior researcher Hiemke Knijn (hiemke.knijn@crv4all.com) to request the dataset under the name "Milk recording data for Van der Drift et al. 2012". Future researchers will have the same access to these data as the authors.

Funding Statement

Department of Population Health Sciences, division Farm Animal Health provided support in the form of salary for HN, RJ, GH and MN. Department of Methodology and Statistics provided support in the form of salary for HN, IK. Royal GD Animal Health provided support in the form of salary for SD. Dutch-Flemish Cattle Improvement Cooperative (CRV) organization provided support in funding for the initial data collection by SD. The funder of the initial data collection (CRV) did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Bouwmeester W, Zuithoff NPA, Mallett S, Geerlings MI, Vergouwe Y, Steyerberg EW, et al. Reporting and methods in clinical prediction research: a systematic review. PLoS Med. 2012;9:1–12. 10.1371/journal.pmed.1001221
  • 2. Ni H, Groenwold RHH, Nielen M, Klugkist I. Prediction models for clustered data with informative priors for the random effects: a simulation study. BMC Med Res Methodol. 2018;18:83. 10.1186/s12874-018-0543-5
  • 3. Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian approaches to clinical trials and health-care evaluation. Chichester: John Wiley & Sons; 2004.
  • 4. Steyerberg EW. Clinical prediction models: a practical approach to development, validation and updating. New York: Springer; 2009.
  • 5. Jorritsma R, Baldée SJC, Schukken YH, Wensing T, Wentink GH. Evaluation of a milk test for detection of subclinical ketosis. Vet Q. 1998;20:108–110. 10.1080/01652176.1998.9694851
  • 6. Krogh MA, Toft N, Enevoldsen C. Latent class evaluation of a milk test, a urine test, and the fat-to-protein percentage ratio in milk to diagnose ketosis in dairy cows. J Dairy Sci. 2011;94:2360–2367. 10.3168/jds.2010-3816
  • 7. Tremblay M, Kammer M, Lange H, Plattner S, Baumgartner C, Stegeman JA, et al. Identifying poor metabolic adaptation during early lactation in dairy cows using cluster analysis. J Dairy Sci. 2018;101:7311–7321. 10.3168/jds.2017-13582
  • 8. Oetzel GR. Monitoring and testing dairy herds for metabolic disease. Vet Clin North Am Food Anim Pract. 2004;20:651–674. 10.1016/j.cvfa.2004.06.006
  • 9. Van der Drift SGA, Jorritsma R, Schonewille JT, Knijn HM, Stegeman JA. Routine detection of hyperketonemia in dairy cows using Fourier transform infrared spectroscopy analysis of β-hydroxybutyrate and acetone in milk in combination with test-day information. J Dairy Sci. 2012;95:4886–4898. 10.3168/jds.2011-4417
  • 10. Goldhawk C, Chapinal N, Veira DM, Weary DM, Von Keyserlingk MA. Prepartum feeding behavior is an early indicator of subclinical ketosis. J Dairy Sci. 2009;92:4971–4977. 10.3168/jds.2009-2242
  • 11. Raboisson D, Mounié M, Maigné E. Diseases, reproductive performance, and changes in milk production associated with subclinical ketosis in dairy cows: a meta-analysis and review. J Dairy Sci. 2014;97:7547–7563. 10.3168/jds.2014-8237
  • 12. Agresti A, Caffo B, Ohman-Strickland P. Examples in which misspecification of a random effects distribution reduces efficiency, and possible remedies. Comput Stat Data Anal. 2004;47:639–653.
  • 13. Butler SM, Louis TA. Random effects models with non-parametric priors. Stat Med. 1992;11:1981–2000. 10.1002/sim.4780111416
  • 14. Heagerty PJ, Kurland BF. Misspecified maximum likelihood estimates and generalised linear mixed models. Biometrika. 2001;88:973–985.
  • 15. Litière S, Alonso A, Molenberghs G. Type I and Type II error under random-effects misspecification in generalized linear mixed models. Biometrics. 2007;63:1038–1044. 10.1111/j.1541-0420.2007.00782.x
  • 16. McCulloch CE, Neuhaus JM. Misspecifying the shape of a random effects distribution: why getting it wrong may not matter. Stat Sci. 2011;26:388–402.
  • 17. Azzalini A, Capitanio A. The skew-normal and related families. New York: Cambridge University Press; 2014.
  • 18. R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2013. http://www.R-project.org
  • 19. Bates D, Maechler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. J Stat Softw. 2015;67:1–48.
  • 20. Stan Development Team. RStan: the R interface to Stan. R package version 2.17.3; 2018. http://mc-stan.org/
  • 21. O'Hagan A, Buck CE, Daneshkhah A, Eiser JR, Garthwaite PH, Jenkinson DJ, et al. Uncertain judgements: eliciting experts' probabilities. Chichester: John Wiley & Sons; 2006.

Decision Letter 0

Angel Abuelo

27 Aug 2020

PONE-D-20-17985

Including cluster level expert opinion as priors for random effects in a prediction model: an empirical example with subclinical ketosis data in dairy cows

PLOS ONE

Dear Dr. Ni,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Thank you for this interesting manuscript. Both reviewers report that this is a well-written article and are supportive of its publication. I concur with their view. However, the reviewers have requested some clarifications in different aspects that would strengthen the manuscript. Please address these comments.

Please submit your revised manuscript by Oct 11 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Angel Abuelo, DVM, MRes, MSc, PhD, DABVP (Dairy), DECBHM

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Manuscript ID: PONE-D-20-17985

Manuscript Title: Including cluster level expert opinion as priors for random effects in a prediction model: an empirical example with subclinical ketosis data in dairy cows

The manuscript covered thoroughly the different subject aspects. However, I have raised some concerns and comments, which could improve the quality of the current version of the manuscript.

Title

I think it should be rewritten in a shorter format and consider adding something about Bayesian analysis.

I suggest this one “Expert opinion as priors for random effects in a Bayesian prediction model: subclinical ketosis in dairy cows as an example”

Abstract

It is well written and informative, but I think the authors should add some information about the study population. Similarly, information on the type of the used model is missing. The estimates from the four models should be listed in the abstract.

Introduction

The introduction is well written and addresses the topic satisfactorily; however, the authors should give some background on Bayesian modeling and how it works, or at least mention the reasons that make it preferable over conventional modeling.

Line 59. Change “…included in the priors of the random effects …” to “…used as priors of the random effects ...”

Line 67-69. You need to mention a couple of those attempts (e.g., Krogh et al., 2011, DOI: 10.3168/jds.2010-3816) that worked on the diagnostics of subclinical ketosis.

Line 76-77. Please, explain why you used weakly-informative priors.

Line 77. Using prior information in Bayesian modelling is very common. Priors could be extracted from previous relevant research studies or from expert opinion in the field. I think it may be worth running four models: a model with non-informative priors, a model with expert opinion as priors, a model with priors extracted from the literature, and a last one without priors (zero). With this approach, you can show the variation among the four different model scenarios, compare them, and choose the best one based on DIC (lowest value).

M&M

Line 86-88. I think this is more relevant to the introduction. In fact, I encourage the authors to add a short paragraph on bovine ketosis to the introduction section.

Line 90. What do you mean by cow level? This is understandable for blood, but for milk we have 4 quarters. Please clarify.

Line 92. ".. were randomly selected and visited to collect data.." Please add further details on the method of randomization and on the inclusion and exclusion criteria for the study. Additionally, what types of data were collected?

Line 94. I think the order of the Appendix files should start with A, then B, and so on. Here, you are starting with C. Do you have a specific reason?

Line 95-96. “… prevalence of SCK for the 1,678 cows was 11.2%. …” based on blood samples or milk samples?

Line 104. I think it is important to add some descriptive summary statistics about the cow level measures (milk acetone, milk BHBA, milk fat-to-protein ratio, parity) as well as the herd level measures.

Line 107. Clarify the positivity and negativity for the readers, i.e., is a cow with a BHBA of ≥1,200 μmol/L considered positive?

Line 115. For the Bayesian approach, how did you assess the convergence of the MCMC chains? Did you visually inspect the time-series plots? Did you check the Gelman-Rubin diagnostic plots? How did you assess the significance level in the Bayesian analysis?

Please, support your Bayesian analysis with 1-2 references

Line 144. I just wonder why the authors decided to consider the personal opinion of only one expert. Why not consider two or three experts and compare between them? Is this expert one of the co-authors of this manuscript? I think the expert should be different from the co-authors to allow an independent judgment and avoid any possible bias.

As I mentioned above, I suggest that the authors run four models and compare them using the Deviance Information Criterion (DIC) according to Spiegelhalter et al. (Spiegelhalter, D.J., Best, N.G., Carlin, B.P., van der Linde, A., 2002. Bayesian measures of model complexity and fit (with discussion). J. Roy. Stat. Soc. B 64, 583–640).

Line 179-180. Which four Bayesian models?

I think I am lost here. I miss the justification for using the weakly-informative priors.

Results

It would be much better if you started the results by briefly describing the study population and the cow and herd level measures.

Line 216. As I mentioned above, it is important to compare the models using the DIC value to choose the best model. You can also use Youden's index (Y) to compare the overall performance based on the Se and Sp, with the highest value generally being the most preferable at the tested cutoff in Table 1.

Discussion

Line 280-282. That is true. I have raised this issue in my comments. Using more than one expert is more reliable and provides more precise estimates. In addition, it minimizes misclassification. Since you carried it out based on the opinion of one expert, you should flag this as a study limitation. On top of that, I strongly suggest using prior information from the previous literature (of which there is plenty, at both herd and cow level), which could give more robust and precise results.

Line 291. Another explanation of what exactly? Clarify, please

Line 302. "many herds that had zero diseased animals" This is not clear in the manuscript, because no descriptive statistics were presented or discussed.

- I think the discussion still needs further improvement, and the authors should focus more on discussing the main findings as well as interpreting the differences between the different types of Bayesian models and the conventional one. For almost half of the discussion, the authors discuss the limitations of the study (one expert and small herd sizes) rather than the main study findings.

The conclusion is very generic and does not precisely reflect the study. The authors developed a Bayesian model and compared it with a conventional model for subclinical ketosis in cattle. This is not clear in your conclusion.

I like the way you wrote the conclusion in the abstract. It clearly reflects and summarizes the main point of this research study.

- Where is the conflict of interest statement?

### END ###

Reviewer #2: This is quite a nice piece of work that I enjoyed reading. Well done to the authors.

I just have a few general comments and a small number of specific ones.

There is a significant problem of overfitting when using training data to evaluate the predictive ability of a model. Conventionally, datasets are split into training and testing datasets to avoid this. The authors should consider this approach as it may change the results of their study.

In addition to this, it is not clear when reading the abstract which data are being predicted. It is important to be transparent about whether these predictions are applied to new (test) data, or the same data that the model was built on.

Line 76-79 - reads like a summary of the materials and methods - don't think this is required here

Line 86-89 reads more like introductory material

Line 94-97 - What were the criteria for sampling per farm? (Why were only 3 sampled on one farm?)

What stage of lactation were animals sampled in?

Line 148-150 - Was this information also used in the predictive model, or was it only available to the expert? They are herd level data so probably just the expert but it should be made clear.

Line 225 - Can you expand on the herd level predictions? I think you probably need some additional information on this in the materials and methods as well. In lines 225-233, it appears that you are predicting the probability for each cow, and then converting this to a predicted App Prev for each herd. But you will need a cut-off point above which an animal is considered predicted positive? The numbers in lines 228-233 read more like the individual level rather than the herd level. Can you clarify this please?

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Jan 14;16(1):e0244752. doi: 10.1371/journal.pone.0244752.r002

Author response to Decision Letter 0


16 Oct 2020

Dear Madam/Sir,

We hereby submit our revised manuscript “Expert opinion as priors for random effects in Bayesian prediction models: subclinical ketosis in dairy cows as an example” (PONE-D-20-17985) for publication in PLOS ONE.

We would like to thank the editor and reviewers for providing valuable feedback on our manuscript. The response to reviewers is now included as a separate file, and all the changes we made are listed. A marked-up copy of the manuscript that highlights the changes is uploaded as well under the file name "Revised Manuscript with Track Changes".

This work is original and has not been submitted to another journal. All authors have read the manuscript before submission and declared no conflict of interest.

On behalf of all authors,

Haifang Ni

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Angel Abuelo

2 Nov 2020

PONE-D-20-17985R1

Expert opinion as priors for random effects in Bayesian prediction models: subclinical ketosis in dairy cows as an example

PLOS ONE

Dear Dr. Ni,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Thank you for your revised manuscript and for proactively addressing the reviewers' suggestions. Both reviewers are supportive of the manuscript being accepted, provided their additional suggestions are included. I concur with their view, particularly regarding the additional discussion points raised by reviewer #2, which are essential.

Please submit your revised manuscript by Dec 17 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Angel Abuelo, DVM, MRes, MSc, PhD, DABVP (Dairy), DECBHM

Academic Editor

PLOS ONE

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Manuscript Number: PONE-D-20-17985R1

Manuscript Title: Expert opinion as priors for random effects in Bayesian prediction models: subclinical ketosis in dairy cows as an example

Thank you for addressing the raised comments appropriately and implementing the necessary modifications in the manuscript. Your responses are satisfactory and very thorough. The quality of the new version of the manuscript is much better and more consistent. I have no further comments except one minor point: I suggest adding the R script of the Bayesian model, preferably in text format, as a supplementary file/appendix.

### END ###

Reviewer #2: The authors have done a good job of addressing the comments of both reviewers. I just have two follow up general comments:

I think the authors still need to discuss the impact of using the same data to build the model and to evaluate its predictive ability. Surely the reason such analyses are undertaken is that they could be of some practical use in the field? In that regard, there is no point in developing a model that predicts very well within the dataset it is constructed on but falls over when applied to new data. Ultimately it is up to the authors how they address this, but at the very least it should be clearly and transparently discussed in the discussion section. I do not think that "the previous authors did it this way" is a good explanation for the approach taken.

Regarding sampling criteria

The authors recruited cows between 5 and 60 DIM. However, within this range, two different subtypes of SCK occur: Type I in the latter half of the window and Type II in the earlier part of the window. Given that these subtypes have different physiological mechanisms, it is reasonable to assume that the risk factors for each subtype might differ. How have the authors dealt with this?

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Yasser Mahmmod

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Jan 14;16(1):e0244752. doi: 10.1371/journal.pone.0244752.r004

Author response to Decision Letter 1


14 Dec 2020

Dear Madam/Sir,

We hereby submit our revised manuscript “Expert opinion as priors for random effects in Bayesian prediction models: subclinical ketosis in dairy cows as an example” (PONE-D-20-17985) for publication in PLOS ONE.

We would like to thank again the editor and reviewers for providing valuable feedback on our manuscript. Response to reviewers is now included as a separate file and all the changes we made are listed. A marked-up copy of the manuscript that highlights changes is uploaded as well under the file name “Revised Manuscript with Track Changes”.

This work is original and has not been submitted to another journal. All authors have read the manuscript before submission and declared no conflict of interest.

On behalf of all authors,

Haifang Ni

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 2

Angel Abuelo

16 Dec 2020

Expert opinion as priors for random effects in Bayesian prediction models: subclinical ketosis in dairy cows as an example

PONE-D-20-17985R2

Dear Dr. Ni,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Angel Abuelo, DVM, MRes, MSc, PhD, DABVP (Dairy), DECBHM

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Angel Abuelo

22 Dec 2020

PONE-D-20-17985R2

Expert opinion as priors for random effects in Bayesian prediction models: subclinical ketosis in dairy cows as an example

Dear Dr. Ni:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Angel Abuelo

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Appendix. Appendix A: Model data.

    (XLSX)

    S2 Appendix. Appendix B: R code for the Bayesian model without incorporating prior expert knowledge.

    (DOCX)

    S3 Appendix. Appendix C: Instruction for expert elicitation.

    (DOCX)

    S4 Appendix. Appendix D: Additional tables and figures.

    (DOCX)

    S5 Appendix. Appendix E: Expert elicitation results.

    (XLSX)
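The abstract describes eliciting herd-level expert opinion at three scales of precision and incorporating it as informative priors for the random effects. As a minimal, hypothetical sketch (not the authors' S2 Appendix code; the function name and the 95%-coverage assumption are mine), one way such an elicitation could be turned into a normal prior on the logit scale for a herd random intercept is:

```python
# Illustrative only: map an expert's elicited herd-level prevalence
# (most likely value plus a ~95% plausible range) to a normal prior
# on the logit scale, as could be used for a herd random intercept.
from math import log

def logit(p):
    """Log-odds transform."""
    return log(p / (1 - p))

def elicited_logit_prior(mode, low, high, z=1.96):
    """Return (mean, sd) of a normal prior on the logit scale.

    `mode` is the expert's most likely prevalence; (low, high) is the
    interval the expert believes covers ~95% of plausible values
    (z = 1.96 for 95% coverage).
    """
    mu = logit(mode)
    sd = (logit(high) - logit(low)) / (2 * z)
    return mu, sd

# Example: the expert believes herd prevalence is most likely 30%,
# plausibly between 10% and 50%.
mu, sd = elicited_logit_prior(0.30, 0.10, 0.50)
```

A wider elicited interval yields a larger prior standard deviation, i.e. a less informative prior; the three precision scales mentioned in the abstract would differ in exactly this way.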


    Data Availability Statement

    Most data are contained within the manuscript and/or Supporting Information files, except the milk recording data collected by the Dutch-Flemish Cattle Improvement Cooperative (CRV). This part of the data cannot be shared publicly because, under article 7 of the data transfer agreement between Utrecht University and CRV, we are not allowed to transfer the data to a third party, as it contains business-sensitive information. CRV informed us that they will consider disclosure of the data when asked to do so; in that case, they will determine whether the requested data contains business-sensitive information. Interested researchers who meet the criteria for access to confidential data can contact CRV senior researcher Hiemke Knijn (hiemke.knijn@crv4all.com) to request the dataset under the name "Milk recording data for Van der Drift et al. 2012". Future researchers will have the same access to these data as the authors.

