JAMA Netw Open. 2023 Oct 9;6(10):e2336997. doi: 10.1001/jamanetworkopen.2023.36997

Table 4. Accuracy and Completeness Scores for Risks, Benefits, and Alternatives to Surgery, Generated by Surgeons vs a Large Language Model-Based Chatbot.

Accuracy and completeness score by area, mean (SD)

| Procedure | Source | Risks | Benefits | Alternatives | Overall impression | Composite |
|---|---|---|---|---|---|---|
| Laparoscopic cholecystectomy | Surgeon | 1.4 (0.2) | 1.5 (0.7) | 1.4 (0.7) | 1.9 (0.3) | 1.6 (0.3) |
| Laparoscopic cholecystectomy | Chatbot | 1.5 (0.3) | 1.3 (0.4) | 2.4 (0.9) | 2.3 (0.5) | 1.9 (0.4) |
| Inguinal hernia | Surgeon | 1.7 (0.6) | 1.7 (0.8) | 1.5 (0.9) | 1.9 (0.4) | 1.7 (0.5) |
| Inguinal hernia | Chatbot | 1.8 (0.3) | 2.9 (0.3) | 2.8 (0.5) | 2.7 (0.6) | 2.5 (0.3) |
| Colectomy | Surgeon | 1.8 (0.5) | 1.4 (0.8) | 1.6 (0.9) | 2.0 (0.6) | 1.6 (0.6) |
| Colectomy | Chatbot | 1.7 (0.7) | 2.2 (0.6) | 2.8 (0.4) | 2.4 (0.5) | 2.3 (0.5) |
| Coronary artery bypass graft | Surgeon | 1.3 (0.2) | 1.3 (0.5) | 1.4 (0.7) | 1.6 (0.6) | 1.4 (0.4) |
| Coronary artery bypass graft | Chatbot | 1.6 (0.3) | 2.2 (0.4) | 2.6 (0.5) | 2.3 (0.7) | 2.2 (0.3) |
| Knee arthroplasty | Surgeon | 2.1 (0.6) | 1.5 (0.5) | 1.3 (0.6) | 2.2 (0.4) | 1.8 (0.5) |
| Knee arthroplasty | Chatbot | 1.7 (0.3) | 2.8 (0.3) | 3.0 (0) | 2.0 (0) | 2.4 (0.1) |
| Spine fusion | Surgeon | 1.8 (0.5) | 1.5 (0.7) | 1.2 (0.4) | 2.1 (0.5) | 1.6 (0.4) |
| Spine fusion | Chatbot | 1.6 (0.5) | 2.6 (0.5) | 2.8 (0.4) | 2.3 (0.5) | 2.3 (0.4) |