Table 4. Accuracy and Completeness Scores for Risks, Benefits, and Alternatives to Surgery, Generated by Surgeons vs a Large Language Model-Based Chatbot.
Values are accuracy and completeness scores, mean (SD).
Area | Laparoscopic cholecystectomy, surgeon | Laparoscopic cholecystectomy, chatbot | Inguinal hernia, surgeon | Inguinal hernia, chatbot | Colectomy, surgeon | Colectomy, chatbot | Coronary artery bypass graft, surgeon | Coronary artery bypass graft, chatbot | Knee arthroplasty, surgeon | Knee arthroplasty, chatbot | Spine fusion, surgeon | Spine fusion, chatbot |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Risks | 1.4 (0.2) | 1.5 (0.3) | 1.7 (0.6) | 1.8 (0.3) | 1.8 (0.5) | 1.7 (0.7) | 1.3 (0.2) | 1.6 (0.3) | 2.1 (0.6) | 1.7 (0.3) | 1.8 (0.5) | 1.6 (0.5) |
Benefits | 1.5 (0.7) | 1.3 (0.4) | 1.7 (0.8) | 2.9 (0.3) | 1.4 (0.8) | 2.2 (0.6) | 1.3 (0.5) | 2.2 (0.4) | 1.5 (0.5) | 2.8 (0.3) | 1.5 (0.7) | 2.6 (0.5) |
Alternatives | 1.4 (0.7) | 2.4 (0.9) | 1.5 (0.9) | 2.8 (0.5) | 1.6 (0.9) | 2.8 (0.4) | 1.4 (0.7) | 2.6 (0.5) | 1.3 (0.6) | 3.0 (0) | 1.2 (0.4) | 2.8 (0.4) |
Overall impression | 1.9 (0.3) | 2.3 (0.5) | 1.9 (0.4) | 2.7 (0.6) | 2.0 (0.6) | 2.4 (0.5) | 1.6 (0.6) | 2.3 (0.7) | 2.2 (0.4) | 2.0 (0) | 2.1 (0.5) | 2.3 (0.5) |
Composite | 1.6 (0.3) | 1.9 (0.4) | 1.7 (0.5) | 2.5 (0.3) | 1.6 (0.6) | 2.3 (0.5) | 1.4 (0.4) | 2.2 (0.3) | 1.8 (0.5) | 2.4 (0.1) | 1.6 (0.4) | 2.3 (0.4) |