BMC Med Educ. 2023 Feb 17;23:118. doi: 10.1186/s12909-023-04071-0

Matters arising: methodological issues on evaluating agreement between medical students’ attitudes towards drugs/alcohol use during pregnancy by Cohen’s kappa analysis

Tianfei Yu 1, Lei Yang 2, Xinjie Jiang 1, Shuli Shao 1, Wei Sha 1, Ming Li 3
PMCID: PMC9938547  PMID: 36803351

Abstract

Background

The purpose of this article is to discuss the statistical methods for agreement analysis used in Richelle’s article (BMC Med Educ 22:335, 2022). The authors investigated the attitudes of final-year medical students regarding substance use during pregnancy and identified the factors that influence these attitudes.

Methods

We found that the Cohen’s kappa value used for measuring the agreement between these medical students’ attitudes towards drugs/alcohol use during pregnancy was questionable. In addition, we recommend using weighted kappa instead of Cohen’s kappa for agreement analysis in the presence of three categories.

Results

The agreement improved from “good” (Cohen’s kappa) to “very good” (weighted kappa) for medical students’ attitudes towards drugs/alcohol use during pregnancy.

Conclusion

To conclude, we recognize that this does not significantly alter the conclusions of the Richelle et al. paper, but it is necessary to ensure that the appropriate statistical tools are used.

Keywords: Substance use, Pregnancy, Medical students, Attitudes, Agreement, Cohen’s kappa, Weighted kappa

Background

We read with interest the article entitled “Factors influencing medical students’ attitudes towards substance use during pregnancy”, published in BMC Medical Education on 2 May 2022 [1]. The authors investigated the attitudes of final-year medical students regarding substance use during pregnancy and identified the factors that influence these attitudes. They focused on two items, drugs and alcohol, regarding the punishment of substance use during pregnancy. Nonetheless, we found that the Cohen’s kappa value used for measuring the agreement between these medical students’ attitudes towards drugs/alcohol use during pregnancy was questionable. We recommend using weighted kappa instead of Cohen’s kappa for agreement analysis in the presence of three categories. With this change, the agreement improves from “good” (Cohen’s kappa) to “very good” (weighted kappa). To conclude, we recognize that this does not significantly alter the conclusions of the Richelle et al. paper, but it is necessary to ensure that the appropriate statistical tools are used.

Main text

Cohen’s kappa statistic is generally suitable for evaluating agreement between two raters [2]. When the rating scale has more than two ordered categories, however, the weighted kappa statistic should be used to estimate inter-rater reliability [3]. In contrast to Cohen’s kappa, the weighted kappa statistic relies on predefined cell weights that reflect the degree of agreement or disagreement.

Cohen’s kappa is calculated as follows:

$$\kappa_C = \frac{\sum_{j=1}^{n} u_{jj(ii')} - \sum_{j=1}^{n} p_{ij}\, p_{i'j}}{1 - \sum_{j=1}^{n} p_{ij}\, p_{i'j}} \tag{1}$$

Weighted kappa is calculated as follows:

$$\kappa_w = 1 - \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\, p_{ij}}{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\, p_i\, q_j} \tag{2}$$

In Eq. (1), u_{jj(ii′)} is the proportion of objects placed in the same category j by both raters i and i′, p_{ij} is the proportion of objects that rater i assigned to category j, and n is the number of categories. In Eq. (2), the indices i and j run over the two raters’ category assignments: w_{ij} is the predefined disagreement weight for cell (i, j), p_{ij} is the observed proportion in that cell, and p_i and q_j are the two raters’ marginal proportions. Cohen [4] suggested the κ value should be interpreted as follows: < 0.20 as poor agreement, 0.20–0.40 as fair agreement, 0.41–0.60 as moderate agreement, 0.61–0.80 as good agreement, and > 0.80 as very good agreement.
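The letter does not state the weight schemes explicitly, so, as a clarifying assumption, we note the standard linear and quadratic definitions for categories indexed 1, …, n:

$$w_{ij}^{\mathrm{lin}} = \frac{|i-j|}{n-1}, \qquad w_{ij}^{\mathrm{quad}} = \left(\frac{i-j}{n-1}\right)^{2}.$$

With n = 3 categories (disagree, undecided, agree), a disagree/agree disagreement carries twice the linear penalty and four times the quadratic penalty of a disagree/undecided disagreement, which is the ratio invoked below. Setting w_{ij} = 0 for i = j and w_{ij} = 1 otherwise reduces Eq. (2) to the unweighted kappa of Eq. (1).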

In the authors’ Table 1, according to the authors’ calculation, inter-rater reliability was good for medical students’ attitudes towards drugs/alcohol use during pregnancy (Cohen’s kappa = 0.775, 95% confidence interval (CI) = 0.714–0.837). In our opinion, however, weighted kappa is more applicable than Cohen’s kappa in the presence of three ordered categories. Consequently, we performed linear and quadratic weighted kappa analyses of the agreement using the authors’ data. The linear weighted kappa value was 0.804 (95% CI = 0.746–0.863), indicating very good agreement, and the quadratic weighted kappa value was 0.831 (95% CI = 0.770–0.892), also very good agreement. The greater the distance between two ratings of the same object, the stronger the disagreement they indicate. For example, the penalty for rating “disagree” against “agree” should be substantially greater than the penalty for rating “disagree” against “undecided”. Under Cohen’s kappa there is no difference between the former and the latter; under linear weights, the penalty for the former is 2 times that of the latter; under quadratic weights, it is 4 times as large. Therefore, we recommend quadratic weighted kappa for evaluating agreement, because it magnifies the degree of inconsistency between judgments that are far apart on the rating scale.

Table 1. Punishment for pregnant women using drugs or alcohol

                Alcohol
Drugs           Disagree   Undecided   Agree
  Disagree      200        10          6
  Undecided     12         57          11
  Agree         13         1           50

κc: 0.775 (p < 0.001, 95% CI = 0.714–0.837)
κlw: 0.804 (p < 0.001, 95% CI = 0.746–0.863)
κqw: 0.831 (p < 0.001, 95% CI = 0.770–0.892)

The data are cited from the article published by Richelle et al. [1] and have been modified. κc: Cohen’s kappa; κlw: linear weighted kappa; κqw: quadratic weighted kappa
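To make the calculations in Eqs. (1) and (2) concrete, the following minimal sketch (our illustration, not part of the original analysis) computes unweighted, linear weighted, and quadratic weighted kappa from a contingency table with NumPy, using the counts in Table 1 as input. Because the tabulated counts have been modified from the original data (see the table note above), the values this sketch produces need not match the published estimates exactly.

```python
import numpy as np

# 3x3 contingency table from Table 1 (rows: attitude on drugs;
# columns: attitude on alcohol), ordered disagree, undecided, agree.
counts = np.array([
    [200, 10,  6],
    [ 12, 57, 11],
    [ 13,  1, 50],
], dtype=float)

def kappa(counts: np.ndarray, weights: str = "none") -> float:
    """Cohen's kappa ('none') or weighted kappa ('linear' / 'quadratic').

    Implements Eq. (2): kappa_w = 1 - sum(w * p_obs) / sum(w * p_exp),
    where p_exp is the outer product of the marginal proportions.
    """
    n = counts.shape[0]
    p_obs = counts / counts.sum()        # observed cell proportions p_ij
    p_exp = np.outer(p_obs.sum(axis=1),  # row marginals p_i ...
                     p_obs.sum(axis=0))  # ... times column marginals q_j

    i, j = np.indices((n, n))
    if weights == "none":                # every disagreement penalised equally
        w = (i != j).astype(float)
    elif weights == "linear":            # penalty grows with category distance
        w = np.abs(i - j).astype(float)
    elif weights == "quadratic":         # penalty grows with squared distance
        w = ((i - j) ** 2).astype(float)
    else:
        raise ValueError(f"unknown weighting scheme: {weights!r}")

    return 1.0 - (w * p_obs).sum() / (w * p_exp).sum()

for scheme in ("none", "linear", "quadratic"):
    print(f"{scheme:>9} weights: kappa = {kappa(counts, scheme):.3f}")
```

Normalising the weights by n − 1 is unnecessary here, since any constant factor cancels in the ratio of Eq. (2). For individual-level rating vectors, sklearn.metrics.cohen_kappa_score offers the same weights='linear' and weights='quadratic' options as a cross-check.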

In conclusion, the authors underestimated the agreement between medical students’ attitudes towards drugs/alcohol use during pregnancy; the agreement is more reasonably rated as “very good”. Nevertheless, we recognize that this does not significantly alter the conclusions of the Richelle et al. paper, but it is necessary to ensure that the appropriate statistical tools are used. We highlight that rigor and the use of the correct statistical approach are crucial for any scientific publication. Applying appropriate statistical methods can enhance the scientific accuracy of research results.

Acknowledgements

Not applicable.

Abbreviation

CI

Confidence interval

Authors’ contributions

TY wrote the original draft of the manuscript. LY, XJ and SS were involved in the analysis and interpretation of the data. WS was a major contributor in revising the manuscript. ML contributed to the conception and design of the study. All authors read and approved the final manuscript.

Funding

This work was supported by the Heilongjiang Province Higher Education Teaching Reform Project (SJGY20200799), Fundamental Research Funds in Heilongjiang Provincial Universities (135509160) and Qiqihar University Degree and Postgraduate Education and Teaching Reform Research Project (JGXM_QUG_Z2019003, JGXM_QUG_Z2019002).

The funding bodies played no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.

Availability of data and materials

Not applicable.

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Richelle L, Dramaix-Wilmet M, Roland M, Kacenelenbogen N. Factors influencing medical students’ attitudes towards substance use during pregnancy. BMC Med Educ. 2022;22(1):335. doi: 10.1186/s12909-022-03394-8.
2. McHugh ML. Interrater reliability: the kappa statistic. Biochem Med (Zagreb). 2012;22(3):276–282. doi: 10.11613/BM.2012.031.
3. Marasini D, Quatto P, Ripamonti E. Assessing the inter-rater agreement for ordinal data through weighted indexes. Stat Methods Med Res. 2016;25(6):2611–2633. doi: 10.1177/0962280214529560.
4. Cohen J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull. 1968;70(4):213–220. doi: 10.1037/h0026256.


