Abstract
This study examines the appropriateness of artificial intelligence model responses to fundamental cardiovascular disease prevention questions.
To obtain cardiovascular disease (CVD) prevention advice, individuals may explore informational sources, including those on the internet, or communicate with clinicians. A research version of a dialogue-based artificial intelligence (AI) language model (ChatGPT) was released in November 2022 and has captured wide attention, with media reports suggesting more than 1 million users within days.1 Using a chat-based interface, this AI model responds to complex queries interactively.2 This study qualitatively evaluated the appropriateness of AI model responses to simple, fundamental CVD prevention questions.
Methods
This study was performed in December 2022. We created 25 questions addressing fundamental preventive concepts, including risk factor counseling, test results, and medication information, drawn from guideline-based prevention topics and our clinical experience in tertiary care preventive cardiology clinics (Table).3 Each question was posed 3 times to the online AI interface,2 and the responses were recorded. Three experienced preventive cardiology clinicians served as reviewers, and each set of 3 responses was graded by 1 of them. Reviewers graded each set of responses as either “appropriate” or “inappropriate” based on their clinical judgment and the content of the response or as “unreliable” if the 3 responses were inconsistent. The set of responses was graded as inappropriate if any of the 3 responses contained inappropriate information. Reviewers graded responses in 2 hypothetical contexts: as responses on a patient-facing information platform (akin to hospital-based informational websites) and as AI-generated draft responses to electronic message questions sent by patients for clinician review.
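The set-level grading rule described above can be sketched in a few lines of Python. This is an illustration only, not the procedure the reviewers used: the clinical judgment about each response is represented by a hypothetical boolean input, and the names `Grade` and `grade_response_set` are assumptions. The order of the checks (inappropriate content takes precedence over inconsistency) is also an assumption, chosen to match the statement that a set was graded inappropriate if any of its 3 responses contained inappropriate information.

```python
from enum import Enum
from typing import Sequence


class Grade(Enum):
    APPROPRIATE = "appropriate"
    INAPPROPRIATE = "inappropriate"
    UNRELIABLE = "unreliable"


def grade_response_set(response_ok: Sequence[bool], consistent: bool) -> Grade:
    """Combine per-response judgments into one grade for a set of 3 responses.

    response_ok: hypothetical reviewer judgment for each response
                 (True if the response contains no inappropriate information).
    consistent:  hypothetical reviewer judgment of whether the 3 responses
                 agree with one another.
    """
    if len(response_ok) != 3:
        raise ValueError("each question was posed 3 times; expected 3 judgments")
    # Any inappropriate response makes the whole set inappropriate.
    # Assumption: this check takes precedence over the consistency check.
    if not all(response_ok):
        return Grade.INAPPROPRIATE
    # Otherwise, mutually inconsistent responses make the set unreliable.
    if not consistent:
        return Grade.UNRELIABLE
    return Grade.APPROPRIATE
```

Under these assumptions, a set in which only 1 of 3 responses is inappropriate is still graded inappropriate rather than unreliable.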
Table. Evaluation of Fundamental Cardiovascular Disease Prevention Recommendations From an Online Chat-Based Artificial Intelligence Model Based on Assessment by Preventive Cardiology Clinicians.
Question | Reviewer grade for responseᵃ |
---|---|
How can I prevent heart disease?ᵇ | Appropriate |
What is the best diet for the heart? | Appropriate |
What is the best diet for high blood pressure and high cholesterol? | Appropriate |
How much should I exercise to stay healthy? | Inappropriate |
Should I do cardio or lift weights to prevent heart disease? | Inappropriate |
How can I lose weight? | Appropriate |
How can I decrease LDL? | Appropriate |
How can I decrease triglycerides? | Appropriate |
What is lipoprotein(a)? | Appropriate |
How can I quit smoking? | Appropriate |
What are the side effects of statins? | Appropriate |
I have muscle pain with a statin. What should I do? | Appropriate |
My cholesterol is still high and I’m already on a statin. What should I do? | Appropriate |
What medications can reduce cholesterol other than statins? | Appropriate |
What is ezetimibe? | Appropriate |
What are Repatha and Praluent? | Appropriate |
What is inclisiran? | Inappropriate |
What are the side effects of Repatha and Praluent? | Appropriate |
Should I take aspirin to prevent heart disease? | Appropriate |
My cholesterol panel shows triglycerides 400 mg/dL. How should I interpret this? | Appropriate |
My LDL is 200 mg/dL. How should I interpret this? | Inappropriate |
What does a coronary calcium score of 0 mean? | Appropriate |
What does a coronary calcium score of 100 mean? | Appropriate |
What does a coronary calcium score of 400 mean? | Appropriate |
What genetic mutations can cause high cholesterol? | Appropriate |
ᵃ Each question was posed on the interface 3 times. Each set of 3 responses was graded in 2 contexts, first as a hypothetical response on a general patient-facing informational platform and second as a hypothetical draft response to an electronic health message sent by patients directly to their clinicians. Each set of responses was graded by a single reviewer as appropriate, inappropriate, or unreliable. Appropriate indicates that all 3 responses were internally consistent and generally similar to what the reviewer might recommend; inappropriate, all 3 responses were internally consistent but factually inaccurate and/or different from what the reviewer might recommend; and unreliable, the 3 responses were inconsistent with each other. Reviewer grades were the same for both contexts across all responses.
ᵇ Example response: “There are several ways you can prevent heart disease, including: 1. Eating a healthy diet that is low in salt, saturated fat, and added sugars; 2. Getting regular physical activity; 3. Maintaining a healthy weight; 4. Avoiding tobacco smoke; 5. Managing stress; 6. Getting regular medical check-ups and monitoring your blood pressure and cholesterol levels; 7. If you have diabetes, managing your blood glucose levels. By taking these steps, you can significantly reduce your risk of developing heart disease. It's also important to talk to your doctor about your risk factors and whether you may benefit from medications or other treatments to prevent heart disease.”
Results
AI model responses to 21 of 25 questions (84%) were graded as appropriate in both contexts (Table). Four sets of responses (16%) were graded as inappropriate in both contexts. For 3 of the 4 sets, all 3 responses had inappropriate information; for 1 set, 1 of 3 responses was inappropriate. For example, the AI model responded to questions about exercise by firmly recommending both cardiovascular activity and lifting weights, which may be incorrect and potentially harmful for certain patients. Responses about interpreting a low-density lipoprotein cholesterol level of 200 mg/dL lacked relevant details, including familial hypercholesterolemia and genetic considerations. Responses about inclisiran suggested that it is commercially unavailable. No responses were graded as unreliable.
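The proportions reported above follow directly from the grades listed in the Table. A minimal sketch of the tally is given below; the question text is copied from the Table, while the variable names and the `tally_grades` helper are hypothetical and are not part of the study.

```python
TOTAL_QUESTIONS = 25

# Questions whose response sets were graded inappropriate (from the Table).
INAPPROPRIATE_QUESTIONS = [
    "How much should I exercise to stay healthy?",
    "Should I do cardio or lift weights to prevent heart disease?",
    "What is inclisiran?",
    "My LDL is 200 mg/dL. How should I interpret this?",
]
# No response sets were graded unreliable.
UNRELIABLE_QUESTIONS = []


def tally_grades(total, n_inappropriate, n_unreliable):
    """Return {grade: (count, rounded percentage)} for a set of graded questions."""
    n_appropriate = total - n_inappropriate - n_unreliable
    counts = {
        "appropriate": n_appropriate,
        "inappropriate": n_inappropriate,
        "unreliable": n_unreliable,
    }
    return {grade: (n, round(100 * n / total)) for grade, n in counts.items()}


summary = tally_grades(
    TOTAL_QUESTIONS, len(INAPPROPRIATE_QUESTIONS), len(UNRELIABLE_QUESTIONS)
)
for grade, (n, pct) in summary.items():
    print(f"{grade}: {n} of {TOTAL_QUESTIONS} ({pct}%)")
```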
Discussion
This exploratory study found that a popular online AI model provided largely appropriate responses to simple CVD prevention questions as evaluated by preventive cardiology clinicians. Findings suggest the potential of interactive AI to assist clinical workflows by augmenting patient education and patient-clinician communication around common CVD prevention queries. For example, such an application may provide conversational responses to simple queries on informational platforms or create automated draft responses to patient electronic messages for clinicians. Whether these approaches can improve readability should be explored, because prior work has indicated low readability of certain online patient educational materials for CVD prevention.4
There are several limitations to this study. This AI model is a research version of a “chatbot” that is not meant for medical use. CVD prevention is a wide field that is not fully covered by the preliminary list of simple questions in this study. AI accuracy and reliability are susceptible to training data limitations and biases. For instance, an inappropriate answer regarding inclisiran was likely due to a training timeline that missed newer developments. This study used the version of ChatGPT available at the time of this analysis1 and did not assess other AI language models. Future research should compare various models to understand differential limitations. Use of appropriateness ratings was subjective and not validated; the work should be repeated using a more formal system for grading the responses and assessing specific aspects of the responses (eg, accuracy, readability). Only a single reviewer evaluated the responses for each question; having multiple reviewers would have allowed assessment of consistency among the reviewers' ratings. Heterogeneity among the 3 AI responses within each set was not assessed in detail. Finally, the AI tool’s responses did not include references to evidence to support any statements.
Section Editors: Jody W. Zylke, MD, Deputy Editor; Kristin Walter, MD, Senior Editor.
References
- 1. ChatGPT: optimizing language models for dialogue. Accessed December 11, 2022. https://openai.com/blog/chatgpt
- 2. Stokel-Walker C. AI bot ChatGPT writes smart essays: should professors worry? Nature. Published online December 9, 2022. doi: 10.1038/d41586-022-04397-7
- 3. Arnett DK, Blumenthal RS, Albert MA, et al. 2019 ACC/AHA guideline on the primary prevention of cardiovascular disease: a report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines. J Am Coll Cardiol. 2019;74(10):e177-e232. doi: 10.1016/j.jacc.2019.03.010
- 4. Rodriguez F, Ngo S, Baird G, Balla S, Miles R, Garg M. Readability of online patient educational materials for coronary artery calcium scans and implications for health disparities. J Am Heart Assoc. 2020;9(18):e017372. doi: 10.1161/JAHA.120.017372