Appropriateness of Cardiovascular Disease Prevention Recommendations Obtained From a Popular Online Chat-Based Artificial Intelligence Model

Ashish Sarraju; Dennis Bruemmer; Erik Van Iterson; Leslie Cho; Fatima Rodriguez; Luke Laffin

JAMA. 2023 Feb 3;329(10):842–844. doi: 10.1001/jama.2023.1044

PMCID: PMC10015303 PMID: 36735264

Abstract

This study examines the appropriateness of artificial intelligence model responses to fundamental cardiovascular disease prevention questions.

To obtain cardiovascular disease (CVD) prevention advice, individuals may explore informational sources, including those on the internet, or communicate with clinicians. A research version of a dialogue-based artificial intelligence (AI) language model (ChatGPT) was released in November 2022 and has captured wide attention, with media reports suggesting more than 1 million users within days.¹ Using a chat-based interface, this AI model responds to complex queries interactively.² This study qualitatively evaluated the appropriateness of AI model responses to simple, fundamental CVD prevention questions.

Methods

This study was performed in December 2022. We created 25 questions addressing fundamental preventive concepts, including risk factor counseling, test results, and medication information, based on guideline-based prevention topics and our clinical experience in tertiary care preventive cardiology clinics (Table).³ Each question was posed to the online AI interface² 3 times and the responses were recorded. Each set of 3 responses was graded by an experienced preventive cardiology clinician. There was 1 reviewer assigned to each set of responses. A total of 3 reviewers participated in this study. Reviewers graded each set of responses as either “appropriate” or “inappropriate” based on their clinical judgment and the content of the response or as “unreliable” if the 3 responses were inconsistent. The set of responses was graded as inappropriate if any of the 3 responses contained inappropriate information. Reviewers graded responses in 2 hypothetical contexts: as responses on a patient-facing information platform (akin to hospital-based informational websites) and as AI-generated draft responses to electronic message questions sent by patients for clinician review.
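The set-level grading rule described above can be sketched in code. This is a minimal illustration only, not the authors' procedure: the per-response judgments came from clinician review, and the names `Grade` and `grade_set` are hypothetical. The order in which the checks are applied (inappropriate before unreliable) is an assumption; the text states only that a set was inappropriate if any of its 3 responses contained inappropriate information and unreliable if the 3 responses were inconsistent.

```python
from enum import Enum


class Grade(Enum):
    APPROPRIATE = "appropriate"
    INAPPROPRIATE = "inappropriate"
    UNRELIABLE = "unreliable"


def grade_set(response_ok, consistent):
    """Combine a reviewer's judgments on one set of 3 responses.

    response_ok: 3 booleans, True if the reviewer found that response
                 free of inappropriate information (hypothetical input).
    consistent:  True if the reviewer found the 3 responses consistent
                 with each other (hypothetical input).
    """
    if len(response_ok) != 3:
        raise ValueError("each question was posed exactly 3 times")
    # Stated rule: any inappropriate response makes the set inappropriate.
    if not all(response_ok):
        return Grade.INAPPROPRIATE
    # Assumed precedence: inconsistency is checked second.
    if not consistent:
        return Grade.UNRELIABLE
    return Grade.APPROPRIATE
```

For example, `grade_set([True, True, False], True)` returns `Grade.INAPPROPRIATE`, which mirrors the rule that a single inappropriate response determines the grade for the whole set.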

Table. Evaluation of Fundamental Cardiovascular Disease Prevention Recommendations From an Online Chat-Based Artificial Intelligence Model Based on Assessment by Preventive Cardiology Clinicians.

Question	Reviewer grade for response^a
How can I prevent heart disease?^b	Appropriate
What is the best diet for the heart?	Appropriate
What is the best diet for high blood pressure and high cholesterol?	Appropriate
How much should I exercise to stay healthy?	Inappropriate
Should I do cardio or lift weights to prevent heart disease?	Inappropriate
How can I lose weight?	Appropriate
How can I decrease LDL?	Appropriate
How can I decrease triglycerides?	Appropriate
What is lipoprotein(a)?	Appropriate
How can I quit smoking?	Appropriate
What are the side effects of statins?	Appropriate
I have muscle pain with a statin. What should I do?	Appropriate
My cholesterol is still high and I’m already on a statin. What should I do?	Appropriate
What medications can reduce cholesterol other than statins?	Appropriate
What is ezetimibe?	Appropriate
What are Repatha and Praluent?	Appropriate
What is inclisiran?	Inappropriate
What are the side effects of Repatha and Praluent?	Appropriate
Should I take aspirin to prevent heart disease?	Appropriate
My cholesterol panel shows triglycerides 400 mg/dL. How should I interpret this?	Appropriate
My LDL is 200 mg/dL. How should I interpret this?	Inappropriate
What does a coronary calcium score of 0 mean?	Appropriate
What does a coronary calcium score of 100 mean?	Appropriate
What does a coronary calcium score of 400 mean?	Appropriate
What genetic mutations can cause high cholesterol?	Appropriate

^a Each question was posed on the interface 3 times. Each set of 3 responses was graded in 2 contexts, first as a hypothetical response on a general patient-facing informational platform and second as a hypothetical draft response to an electronic health message sent by patients directly to their clinicians. Each set of responses was graded by a single reviewer as appropriate, inappropriate, or unreliable. Appropriate indicates that all 3 responses were internally consistent and generally similar to what the reviewer might recommend; inappropriate, all 3 responses were internally consistent but factually inaccurate and/or different from what the reviewer might recommend; and unreliable, the 3 responses were inconsistent with each other. Reviewer grades were the same for both contexts across all responses.

^b Example response: “There are several ways you can prevent heart disease, including: 1. Eating a healthy diet that is low in salt, saturated fat, and added sugars; 2. Getting regular physical activity; 3. Maintaining a healthy weight; 4. Avoiding tobacco smoke; 5. Managing stress; 6. Getting regular medical check-ups and monitoring your blood pressure and cholesterol levels; 7. If you have diabetes, managing your blood glucose levels. By taking these steps, you can significantly reduce your risk of developing heart disease. It's also important to talk to your doctor about your risk factors and whether you may benefit from medications or other treatments to prevent heart disease.”

Results

AI model responses to 21 of 25 questions (84%) were graded as appropriate in both contexts (Table). Four responses (16%) were graded as inappropriate in both contexts. For 3 of the 4 sets of responses, all 3 responses had inappropriate information; for 1 set, 1 of 3 responses was inappropriate. For example, the AI model responded to questions about exercise by firmly recommending both cardiovascular activity and lifting weights, which may be incorrect and potentially harmful for certain patients. Responses about interpreting a low-density lipoprotein cholesterol level of 200 mg/dL lacked relevant details, including familial hypercholesterolemia and genetic considerations. Responses about inclisiran suggested that it is commercially unavailable. No responses were graded as unreliable.
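The headline proportions can be re-derived from the Table. The sketch below transcribes the reviewer grades in row order and tallies them; the names `GRADES` and `summarize` are illustrative and not part of the study.

```python
from collections import Counter

# Reviewer grades transcribed from the Table, in row order (25 questions).
GRADES = (
    ["Appropriate"] * 3      # prevention and diet questions
    + ["Inappropriate"] * 2  # the 2 exercise questions
    + ["Appropriate"] * 11   # weight loss through Repatha and Praluent
    + ["Inappropriate"]      # inclisiran
    + ["Appropriate"] * 3    # PCSK9 side effects, aspirin, triglycerides
    + ["Inappropriate"]      # LDL of 200 mg/dL
    + ["Appropriate"] * 4    # calcium scores and genetic mutations
)


def summarize(grades):
    """Return {grade: (count, percentage rounded to a whole number)}."""
    total = len(grades)
    counts = Counter(grades)
    return {grade: (n, round(100 * n / total)) for grade, n in counts.items()}


if __name__ == "__main__":
    for grade, (n, pct) in summarize(GRADES).items():
        print(f"{grade}: {n} of {len(GRADES)} ({pct}%)")
```

The tally gives 21 of 25 (84%) appropriate and 4 of 25 (16%) inappropriate, with no set graded unreliable, matching the figures reported here.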

Discussion

This exploratory study found that a popular online AI model provided largely appropriate responses to simple CVD prevention questions as evaluated by preventive cardiology clinicians. Findings suggest the potential of interactive AI to assist clinical workflows by augmenting patient education and patient-clinician communication around common CVD prevention queries. For example, such an application may provide conversational responses to simple queries on informational platforms or create automated draft responses to patient electronic messages for clinicians. Whether these approaches can improve readability should be explored, because prior work has indicated low readability of certain online patient educational materials for CVD prevention.⁴

There are several limitations to this study. This AI model is a research version of a “chatbot” that is not meant for medical use. CVD prevention is a wide field that is not fully covered by the preliminary list of simple questions in this study. AI accuracy and reliability are susceptible to training data limitations and biases. For instance, an inappropriate answer regarding inclisiran was likely due to a training timeline that missed newer developments. This study used the version of ChatGPT available at the time of this analysis¹ and did not assess other AI language models. Future research should compare various models to understand differential limitations. Use of appropriateness ratings was subjective and not validated; the work should be repeated using a more formal system for grading the responses and assessing specific aspects of the responses (eg, accuracy, readability). Only a single reviewer evaluated the responses for each question; having multiple reviewers would have allowed assessment of consistency among the reviewers' ratings. Heterogeneity among the 3 AI responses in each set was not assessed in detail. Finally, the AI tool’s responses did not include references to evidence to support any statements.

Section Editors: Jody W. Zylke, MD, Deputy Editor; Kristin Walter, MD, Senior Editor.

References

1. ChatGPT: optimizing language models for dialogue. Accessed December 11, 2022. https://openai.com/blog/chatgpt
2. Stokel-Walker C. AI bot ChatGPT writes smart essays: should professors worry? Nature. Published online December 9, 2022. doi: 10.1038/d41586-022-04397-7
3. Arnett DK, Blumenthal RS, Albert MA, et al. 2019 ACC/AHA guideline on the primary prevention of cardiovascular disease: a report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines. J Am Coll Cardiol. 2019;74(10):e177-e232. doi: 10.1016/j.jacc.2019.03.010
4. Rodriguez F, Ngo S, Baird G, Balla S, Miles R, Garg M. Readability of online patient educational materials for coronary artery calcium scans and implications for health disparities. J Am Heart Assoc. 2020;9(18):e017372. doi: 10.1161/JAHA.120.017372
