Arthroplasty Today
letter
2025 Aug 12;35:101794. doi: 10.1016/j.artd.2025.101794

Comment on “Comparative Efficacy of ChatGPT and DeepSeek in Addressing Patient Queries on Gonarthrosis and Total Knee Arthroplasty”

Hinpetch Daungsupawong, Viroj Wiwanitkit
PMCID: PMC12359166  PMID: 40831906

Dear Editor,

The publication “Comparative Efficacy of ChatGPT and DeepSeek in Addressing Patient Queries on Gonarthrosis and Total Knee Arthroplasty” [1] is interesting. The study used a prospective, comparative design to assess how well 2 AI systems, ChatGPT and DeepSeek, provide information to patients with osteoarthritis who are seeking knee replacement surgery. Although the use of blinded orthopedic experts reduced bias in judging the accuracy of responses, several limitations should be considered. For example, the power analysis method behind the 100-person sample size was not specified. In addition, scores on a 10-level Likert scale are ordinal data, so computing and comparing averages, rather than using a statistical approach appropriate for ordinal data, may skew the results. Furthermore, restricting patients to only 5 questions may not cover the queries or problems that most patients have, which reduces the generalizability of the data.
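The concern about averaging ordinal scores can be illustrated with a minimal sketch. It compares two sets of 1 to 10 ratings with a rank-based Mann-Whitney U test instead of a comparison of means. The ratings, the variable names, and the function `mann_whitney_u` are hypothetical and are not taken from the study; the p-value uses a tie-corrected normal approximation, which is only a rough guide for small samples.

```python
import math
from statistics import mean, median


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (tie-corrected normal approximation).

    Returns (U statistic for sample x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u_x, 1.0  # all values identical: no evidence of a difference
    z = max((abs(u_x - mu) - 0.5) / sigma, 0.0)  # continuity correction
    p_value = math.erfc(z / math.sqrt(2))  # two-sided normal tail area
    return u_x, p_value


# Hypothetical 1-10 ratings for two systems (illustration only, not study data).
system_a = [9, 9, 8, 10, 9, 8, 9, 10, 9, 8]
system_b = [8, 9, 7, 9, 8, 8, 7, 9, 8, 8]

print("means:  ", mean(system_a), mean(system_b))
print("medians:", median(system_a), median(system_b))
u, p = mann_whitney_u(system_a, system_b)
print("U =", u, " approximate p =", round(p, 4))
```

If each patient had rated both systems, a paired test such as the Wilcoxon signed-rank test would be the more appropriate choice; the unpaired form is used here only because the pairing structure is an assumption either way.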

Another significant drawback is that, while the experts were blinded to which AI system produced each response, the patients knew which platform their answers came from, so their personal expectations may have influenced their satisfaction. Furthermore, the experts' assessment of answer quality was subjective, with no defined standards or references for accuracy, which makes the findings difficult to replicate. The study also does not address inter-rater reliability, which is critical to the dependability of the accuracy ratings. Language, culture, and patient knowledge levels may affect comprehension and satisfaction and should be taken into account in the analysis.
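Inter-rater reliability for ordinal ratings can be quantified in several ways. One common option is a quadratic-weighted Cohen's kappa for two raters, sketched below. The function name `weighted_kappa` and the example ratings are hypothetical, and the study may have involved more than two experts, in which case a different statistic (for example an intraclass correlation) would be needed.

```python
import math


def weighted_kappa(rater1, rater2, categories):
    """Quadratic-weighted Cohen's kappa for two raters on an ordinal scale.

    `categories` lists every possible score in increasing order.
    Returns NaN when chance-expected disagreement is zero (kappa undefined).
    """
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("both raters must score the same non-empty item set")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater1)

    # Observed contingency table of rating pairs.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater1, rater2):
        observed[index[a]][index[b]] += 1
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # larger gaps penalized more
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row_totals[i] * col_totals[j] / n
    if expected_disagreement == 0:
        return math.nan
    return 1 - observed_disagreement / expected_disagreement


# Hypothetical accuracy ratings (1-10) from two experts for the same 8 answers.
expert_1 = [9, 8, 9, 7, 10, 8, 9, 6]
expert_2 = [9, 7, 9, 8, 10, 8, 8, 6]
print(round(weighted_kappa(expert_1, expert_2, list(range(1, 11))), 3))
```

Values near 1 indicate close agreement, values near 0 indicate agreement no better than chance, and negative values indicate systematic disagreement.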

The study raises several interesting questions, such as how the complexity and format of a patient’s question affect the performance of each AI system. Should responses be tailored to the patient’s health literacy level to increase comprehension and satisfaction? Long-term follow-up to assess the impact on anxiety or adherence to postoperative advice would be beneficial. Other important considerations include the reliability of the information when AI responses conflict, the resulting impact on patient confidence, and the cost and ease of use in a real clinical context.

The results indicate that ChatGPT achieved considerably higher accuracy and patient satisfaction scores than DeepSeek, yet DeepSeek performed better at summarizing brief recovery timetables, which highlights the different strengths of each system in presenting information. Matching each type of question to the system whose answer format suits it best may increase overall performance. Furthermore, the very low variance of the scores implies that both systems are consistent in their responses but may lack the ability to adapt to more complicated or specific inquiries.

More in-depth statistical analysis, such as mixed-effects models or nonparametric tests, is required. Such analysis may help reveal interactions among question type, gender, age, and technological experience, which would be useful in building AI that gives more suitable and personalized health information in the future.
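One possible form of such a model, given here only as an illustrative sketch and not as the analysis the original authors should necessarily adopt, is a cumulative link (ordinal logistic) mixed model with a random intercept per patient. All symbols are assumptions introduced for illustration: Y_ij is the ordinal score from patient i on response j, k indexes the cut points of the 10-level scale, theta_k are threshold parameters, and u_i is the patient-level random effect.

```latex
\[
\operatorname{logit} P(Y_{ij} \le k)
  = \theta_k - \bigl(\beta_1\,\mathrm{system}_{ij}
  + \beta_2\,\mathrm{question}_{ij}
  + \beta_3\,(\mathrm{system}\times\mathrm{question})_{ij}
  + u_i\bigr),
\qquad k = 1,\dots,9,
\qquad u_i \sim \mathcal{N}(0,\sigma_u^2)
\]
```

The interaction coefficient beta_3 would capture whether the difference between systems depends on question type; covariates such as age, gender, or technological experience could be added as further fixed effects in the same way.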

Declaration of Generative AI and AI-Assisted Technologies in the Writing Process

During the preparation of this work, the authors used a computational tool for language checking and editing. After using this tool/service, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Conflicts of interest

The authors declare there are no conflicts of interest.

For full disclosure statements refer to https://doi.org/10.1016/j.artd.2025.101794.

CRediT authorship contribution statement

Hinpetch Daungsupawong: Conceptualization, Validation, Visualization, Writing – original draft, Writing – review & editing. Viroj Wiwanitkit: Conceptualization, Validation, Visualization.

Appendix A. Supplementary data

Conflict of Interest Statement for Daungsupawong
mmc1.pdf (100KB, pdf)
Conflict of Interest Statement for Wiwanitkit
mmc2.pdf (89.8KB, pdf)

Reference

1. Gurbuz S., Bahar H., Yavuz U., Keskin A., Karslioglu B., Solak Y. Comparative efficacy of ChatGPT and DeepSeek in addressing patient queries on gonarthrosis and total knee arthroplasty. Arthroplast Today. 2025;33. doi: 10.1016/j.artd.2025.101730.

Articles from Arthroplasty Today are provided here courtesy of Elsevier
