Abstract
Background
The purpose of this article is to discuss the statistical methods for agreement analysis used in the article by Richelle et al. (BMC Med Educ 22:335, 2022). The authors investigated the attitudes of final-year medical students regarding substance use during pregnancy and identified the factors that influence these attitudes.
Methods
We found that Cohen’s kappa value for measuring the agreement between these medical students’ attitudes towards drugs/alcohol use during pregnancy was questionable. In addition, we recommend using weighted kappa instead of Cohen’s kappa for agreement analysis in the presence of three categories.
Results
The agreement improved from “good” (Cohen’s kappa) to “very good” (weighted kappa) for medical students’ attitudes towards drugs/alcohol use during pregnancy.
Conclusion
To conclude, we recognize that this does not significantly alter the conclusions of the Richelle et al. paper, but it is necessary to ensure that the appropriate statistical tools are used.
Keywords: Substance use, Pregnancy, Medical students, Attitudes, Agreement, Cohen’s kappa, Weighted kappa
Background
We read with interest the article entitled “Factors influencing medical students’ attitudes towards substance use during pregnancy”, which was published in BMC Medical Education on 2 May 2022 [1]. The authors investigated the attitudes of final-year medical students regarding substance use during pregnancy and identified the factors that influence these attitudes. They focused on two items, drugs and alcohol, regarding the punishment of substance use during pregnancy. Nonetheless, we found that Cohen’s kappa value for measuring the agreement between these medical students’ attitudes towards drugs/alcohol use during pregnancy was questionable. We recommend using weighted kappa instead of Cohen’s kappa for agreement analysis in the presence of three categories. The agreement improved from “good” (Cohen’s kappa) to “very good” (weighted kappa) for medical students’ attitudes towards drugs/alcohol use during pregnancy. To conclude, we recognize that this does not significantly alter the conclusions of the Richelle et al. paper, but it is necessary to ensure that the appropriate statistical tools are used.
Main text
Cohen’s kappa statistic is generally suitable for evaluating agreement between two raters [2]. When the rating scale comprises more than two ordered categories, however, the weighted kappa statistic should be used to estimate inter-rater reliability [3]. In contrast to Cohen’s kappa, weighted kappa relies on predefined cell weights that reflect the degree of agreement or disagreement between the two ratings.
Cohen’s kappa is calculated as follows:

$$\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad p_o = \sum_{j} u_{jj(ii')}, \qquad p_e = \sum_{j} p_{ij}\, p_{i'j} \tag{1}$$
Weighted kappa is calculated as follows:

$$\kappa_w = 1 - \frac{\sum_{j}\sum_{j'} w_{jj'}\, u_{jj'(ii')}}{\sum_{j}\sum_{j'} w_{jj'}\, p_{ij}\, p_{i'j'}} \tag{2}$$

where $u_{jj'(ii')}$ is the proportion of objects placed in category $j$ by rater $i$ and in category $j'$ by rater $i'$, and $w_{jj'} \ge 0$ is a predefined disagreement weight with $w_{jj} = 0$; linear weights take $w_{jj'} = |j - j'|$ and quadratic weights take $w_{jj'} = (j - j')^2$.
The value of $u_{jj(ii')}$ is the proportion of objects put in the same category $j$ by both raters $i$ and $i'$. The value of $p_{ij}$ is the proportion of objects that rater $i$ assigned to category $j$, and $k$ is the number of raters (here, $k = 2$). Cohen [4] suggested the kappa value should be interpreted as follows: < 0.20 as poor agreement, 0.20–0.40 as fair agreement, 0.41–0.60 as moderate agreement, 0.61–0.80 as good agreement, and > 0.80 as very good agreement.
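For readers who wish to verify such calculations, the following minimal Python sketch implements Eqs. (1) and (2) for two raters directly from a J × J contingency table. The function name `kappa_from_table` and the disagreement-weight formulation are our own illustration, not code from the original article.

```python
import numpy as np

def kappa_from_table(table, weights=None):
    """Cohen's kappa (Eq. 1) or weighted kappa (Eq. 2) for two raters,
    computed from a J x J contingency table of their classifications.

    weights: None for the unweighted statistic, "linear" for
    w = |j - j'|, or "quadratic" for w = (j - j')**2.
    """
    table = np.asarray(table, dtype=float)
    p = table / table.sum()                            # observed cell proportions
    expected = np.outer(p.sum(axis=1), p.sum(axis=0))  # chance-expected proportions

    j = np.arange(table.shape[0])
    if weights is None:                # every off-diagonal cell penalized equally
        w = (j[:, None] != j[None, :]).astype(float)
    elif weights == "linear":
        w = np.abs(j[:, None] - j[None, :]).astype(float)
    elif weights == "quadratic":
        w = (j[:, None] - j[None, :]).astype(float) ** 2
    else:
        raise ValueError("weights must be None, 'linear' or 'quadratic'")

    # kappa = 1 - (weighted observed disagreement / weighted expected disagreement);
    # with 0/1 weights this reduces exactly to (p_o - p_e) / (1 - p_e).
    return 1.0 - (w * p).sum() / (w * expected).sum()
```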
In the authors’ Table 1, according to their calculation, inter-rater reliability was good for medical students’ attitudes towards drugs/alcohol use during pregnancy (Cohen’s kappa = 0.775, 95% confidence interval (CI) = 0.714–0.837). In our opinion, however, weighted kappa is more applicable than Cohen’s kappa in the presence of three ordered categories. Consequently, we calculated weighted kappa statistics (linear and quadratic) to evaluate the agreement using the authors’ data. The linear weighted kappa value was 0.804 (95% CI = 0.746–0.863), indicating very good agreement. The quadratic weighted kappa value was 0.831 (95% CI = 0.770–0.892), also indicating very good agreement. The greater the distance between the two ratings of the same object, the more heavily the disagreement should count. For example, the penalty for classifying “disagree” as “agree” should be substantially greater than the penalty for classifying “disagree” as “undecided”. With Cohen’s kappa, there is no difference between the former and the latter; with linear weights, the penalty for the former is twice that for the latter; with quadratic weights, it is four times as large, as shown by the weight matrices below. We therefore recommend quadratic weighted kappa for evaluating agreement, because it magnifies the penalty for disagreements that span a larger distance between categories.
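To make the penalty structure explicit, with the three ordered categories coded as disagree = 1, undecided = 2 and agree = 3, the linear and quadratic disagreement-weight matrices are:

$$
W_{\mathrm{linear}} = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 0 \end{pmatrix},
\qquad
W_{\mathrm{quadratic}} = \begin{pmatrix} 0 & 1 & 4 \\ 1 & 0 & 1 \\ 4 & 1 & 0 \end{pmatrix}.
$$

The corner entries show directly why a “disagree”/“agree” disagreement is penalized twice (linear) or four times (quadratic) as heavily as a “disagree”/“undecided” disagreement, whereas Cohen’s kappa treats all off-diagonal cells alike.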
Table 1. Punishment for pregnant women using drugs or alcohol (rows: attitude on drugs; columns: attitude on alcohol)

| Drugs \ Alcohol | Disagree | Undecided | Agree |
|---|---|---|---|
| Disagree | 200 | 10 | 6 |
| Undecided | 12 | 57 | 11 |
| Agree | 13 | 1 | 50 |

kc: 0.775 (p < 0.001, 95% CI = 0.714–0.837)
klw: 0.804 (p < 0.001, 95% CI = 0.746–0.863)
kqw: 0.831 (p < 0.001, 95% CI = 0.770–0.892)
The data have been cited from the article published by Richelle et al. [1] and have undergone modification. kc: Cohen’s kappa; klw: linear weighted kappa; kqw: quadratic weighted kappa
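As a usage illustration (our own sketch, not part of the original analyses), the counts in Table 1 can be expanded into paired rating vectors and passed to scikit-learn’s `cohen_kappa_score`. Because the tabulated counts have been modified from the original article, the values obtained this way will only approximate the published kappas.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Modified counts from Table 1 (rows: attitude on drugs;
# columns: attitude on alcohol; 0 = disagree, 1 = undecided, 2 = agree).
table = np.array([[200, 10,  6],
                  [ 12, 57, 11],
                  [ 13,  1, 50]])

# Expand the contingency table into two paired label vectors,
# one entry per student.
drugs, alcohol = [], []
for i in range(3):
    for j in range(3):
        drugs += [i] * int(table[i, j])
        alcohol += [j] * int(table[i, j])

print(cohen_kappa_score(drugs, alcohol))                       # Cohen's kappa
print(cohen_kappa_score(drugs, alcohol, weights="linear"))     # linear weighted
print(cohen_kappa_score(drugs, alcohol, weights="quadratic"))  # quadratic weighted
```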
In conclusion, the authors underestimated the agreement between medical students’ attitudes towards drugs/alcohol use during pregnancy; the appropriate characterization of that agreement is “very good”. Nevertheless, we recognize that this does not significantly alter the conclusions of the Richelle et al. paper, but it is necessary to ensure that the appropriate statistical tools are used. We highlight that rigor and the use of the correct statistical approach are crucial for any scientific publication; applying appropriate statistical methods can enhance the scientific accuracy of research results.
Acknowledgements
Not applicable.
Abbreviation
- CI
Confidence interval
Authors’ contributions
TY wrote the original draft of the manuscript. LY, XJ and SS were involved in the analysis and interpretation of the data. WS was a major contributor in revising the manuscript. ML contributed to the conception and design of the study. All authors read and approved the final manuscript.
Funding
This work was supported by the Heilongjiang Province Higher Education Teaching Reform Project (SJGY20200799), Fundamental Research Funds in Heilongjiang Provincial Universities (135509160) and Qiqihar University Degree and Postgraduate Education and Teaching Reform Research Project (JGXM_QUG_Z2019003, JGXM_QUG_Z2019002).
The funding bodies played no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Availability of data and materials
Not applicable.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Richelle L, Dramaix-Wilmet M, Roland M, Kacenelenbogen N. Factors influencing medical students’ attitudes towards substance use during pregnancy. BMC Med Educ. 2022;22(1):335. doi:10.1186/s12909-022-03394-8.
- 2. McHugh ML. Interrater reliability: the kappa statistic. Biochem Med (Zagreb). 2012;22(3):276–282. doi:10.11613/BM.2012.031.
- 3. Marasini D, Quatto P, Ripamonti E. Assessing the inter-rater agreement for ordinal data through weighted indexes. Stat Methods Med Res. 2016;25(6):2611–2633. doi:10.1177/0962280214529560.
- 4. Cohen J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull. 1968;70(4):213–220. doi:10.1037/h0026256.