Abstract
Chat Generative Pre-trained Transformer (ChatGPT) is a web-based artificial intelligence assistant that can provide information, answer questions, and make recommendations on a wide range of topics. Rare cardiovascular diseases (rCVD) are health problems that require specialized knowledge and attention, and web databases offer relatively limited information about them. In this study, we investigated the accuracy and reliability of ChatGPT's answers to questions that patients might ask about rCVD. ChatGPT was asked forty questions about each of seven rCVDs, and academicians who are experts in their fields evaluated the answers against current guidelines and information. ChatGPT's performance, which has repeatedly been shown to be high for common (classical) diseases, was lower for rCVD. The responses to different questions were often very similar, and some answers contained redundant information. In addition, ChatGPT did not provide the expected answers to some questions. Nevertheless, although some answers were longer than necessary, they contained very little incorrect information. Although ChatGPT is a competent tool for obtaining information about rCVD, physicians should verify and clarify its answers for patients. ChatGPT should therefore be used as an auxiliary information source rather than as a primary resource for patients with rCVD.
Keywords: Accuracy, Artificial intelligence, ChatGPT, Rare cardiovascular disease, Reliability
Subject terms: Cardiology, Biomedical engineering
Introduction
Artificial intelligence (AI) has increasingly been incorporated into healthcare, offering tools that assist with diagnosis, treatment planning, and patient education. Chat Generative Pre-trained Transformer (ChatGPT) is a web-based AI language model developed by OpenAI, notable for its free accessibility and strong capability in providing information1. It has been tested and has performed well across various medical fields and conditions2,3. While its application to common conditions has been widely explored, its reliability in addressing rare diseases remains unclear.
Rare diseases, affecting fewer than 1 in 2000 individuals, are often genetic and associated with chronic symptoms or premature mortality4–6. Limited research and small sample sizes result in restricted literature, with expertise concentrated among specialized physicians5,7.
Consequently, patients frequently turn to online resources for medical information, with AI-based tools emerging as a convenient alternative8. These platforms provide rapid access to health-related content, offering users instant responses to their queries. However, the absence of direct physician oversight raises concerns about the accuracy and completeness of AI-generated information, particularly for complex or nuanced medical conditions. Misinterpretations can lead to incorrect self-diagnosis, unnecessary anxiety, or even delays in seeking appropriate medical care. Therefore, evaluating the reliability of AI-generated responses is essential, especially in the context of rare diseases where misinformation can have significant clinical and psychological consequences for patients.
Despite the increasing number of studies on the role of ChatGPT in healthcare, its effectiveness in addressing rare cardiovascular diseases (rCVD) remains largely unexplored. This study aims to assess the accuracy and reliability of ChatGPT responses to questions regarding rCVD management and treatment and to address a critical gap in AI-enabled medical education.
Materials and methods
Seven cardiovascular diseases that fit the definition of rare diseases (arrhythmogenic right ventricular dysplasia, Brugada syndrome, dilated cardiomyopathy, long QT syndrome, primary pulmonary hypertension, restrictive cardiomyopathy, short QT syndrome) were identified by the academicians participating in the study. For each disease, the expert team identified, following a comprehensive review of the literature, the questions that patients might plausibly ask (Table 1). The questions were grouped under six main headings: general information about the disease, symptoms and signs that may be seen in the disease, tests required for diagnosis, treatment, the effect of the disease on lifestyle, and living with the disease. For standardization, the same questions were asked in English for all of the identified diseases. The questions were answered separately using the free version of ChatGPT (ChatGPT-4o) and the paid version (ChatGPT Plus). The responses were evaluated using a Likert scale (Table 2) to assess the accuracy of each answer: (0) wrong, misleading, potentially harmful; (1) inconclusive, incomprehensible, incomplete; (2) incomplete/outdated information and unsatisfactory; (3) correct but not sufficient; (4) adequate and correct; and (5) comprehensive, detailed, and correct. Responses rated as "4. Adequate and correct" were factually accurate but lacked the elaboration, contextual detail, or supporting information needed to be fully informative. In contrast, responses rated as "5. Comprehensive, detailed, and correct" not only provided accurate information but also included thorough explanations, relevant nuances, and additional clarifications. The scoring scale was based on other studies in the literature. The responses were scored, against the latest guidelines and current information, by expert academic doctors with at least 15 years of experience in the relevant field. The academic physicians were blinded to which version of ChatGPT each answer came from. In addition, every question was entered into ChatGPT using the "New Chat" function. This study did not involve human participants or the use of identifiable patient data; therefore, ethical approval from an institutional review board and informed consent were not required.
Table 1.
Questions about rare cardiovascular diseases.
| About | Tests |
|---|---|
| 1. What is X* disease? | 1. How is X* disease diagnosed? |
| 2. How common is X* disease? | 2. Which tests do I need to have for X* disease diagnosis and why? |
| 3. Is X* disease genetically transmitted? | 3. How often should I get my ECG checked with X* disease? |
| 4. Can X* disease be cured? | 4. How often should I get an echocardiogram with X* disease? |
| 5. How long am I expected to live with X* disease? | Treatment |
| 6. Is X* disease serious? | 1. What kind of drugs do I have to take for X* disease? |
| 7. Does X* disease cause heart failure? | 2. I have X* disease. What kind of drugs should I avoid? |
| 8. Do I need to find a specialized cardiologist for X* disease? | 3. I have been diagnosed with X* disease. Do I have to take medicines for the rest of my life? |
| 9. Does X* disease coexist with other diseases? | 4. I use medication for X* disease. When and how should I take my medicines? |
| Signs and symptoms | 5. Do herbal remedies help with X* disease? |
| 1. What are the symptoms of X* disease? | 6. Is there a surgical treatment for X* disease? If so, does surgery cure the disease? |
| 2. I recently had a fainting spell. Should I be checked for X* disease? | 7. I have X* disease. Do I or will I need a cardiac device implantation? |
| 3. When should I seek emergency medical services with X* disease? | 8. Will a pacemaker implantation reduce my symptoms or cure my X* disease? |
| 4. I have X* disease and I have been experiencing shortness of breath lately. What should I do? | 9. Will an ICD implantation reduce my symptoms or cure my X* disease? |
| Lifestyle | Living with X* disease |
| 1. I have X* disease. What should my diet include? | 1. I have X* disease. Can I drive or operate machinery? |
| 2. I have X* disease. What should I avoid eating? | 2. I have been diagnosed with X* disease and carry an ICD. Can I drive or operate machinery? |
| 3. Can I take alcohol if I have X* disease? | 3. I have been diagnosed with X* disease. Is it safe for me to fly? |
| 4. Can I smoke cigarettes if I have X* disease? | 4. I have been diagnosed with X* disease and carry an ICD/Pacemaker. Is it safe for me to fly? |
| 5. How much can I exercise with X* disease? | 5. Having been diagnosed with X* disease, can I still have sex? |
| 6. What kind of physical exercise is safe for X* disease? | 6. Are there any vaccines I should avoid in X* disease? |
| 7. Do I have to limit my fluid consumption if I have X* disease? | 7. I have X* disease. Is there anything my family/caregivers should be careful about? |
| X*: Any of the specified diseases (arrhythmogenic right ventricular dysplasia, Brugada syndrome, dilated cardiomyopathy, long QT syndrome, primary pulmonary hypertension, restrictive cardiomyopathy, short QT syndrome), ECG: Electrocardiogram, ICD: Implantable Cardioverter Defibrillator | |
Table 2.
Likert scale for accuracy evaluation.
| Likert Score | Description |
|---|---|
| 0 | Wrong, misleading, potentially harmful |
| 1 | Inconclusive, incomprehensible, incomplete |
| 2 | Incomplete/outdated information and unsatisfactory |
| 3 | Correct but not sufficient |
| 4 | Adequate and correct |
| 5 | Comprehensive, detailed, and correct |
Analysis of data
Each academician carefully reviewed the answers, from both the free and paid versions of ChatGPT, to the 40 questions related to the rare cardiovascular disease within their area of expertise. The answers were rated on the Likert scale and entered into an Excel spreadsheet. For each version, the mean score was obtained by dividing the sum of the scores by the number of questions, and this mean was then expressed as a percentage. The resulting percentage scores were visualized as bar charts in Excel for all of the rare diseases.
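To make the scoring arithmetic concrete, the minimal sketch below computes a mean Likert score and its percentage equivalent, assuming the percentage is taken relative to the 5-point maximum (an interpretation consistent with the reported scores of roughly 91-95%). The example ratings and variable names are illustrative placeholders, not the study data.

```python
# Minimal sketch of the per-version scoring described above.
# Assumption: the percentage is the mean Likert score relative to the
# 5-point maximum. Example ratings are placeholders, not the study data.
likert_scores = [5, 4, 5, 3, 5, 4, 5, 5]  # one 0-5 rating per question

mean_score = sum(likert_scores) / len(likert_scores)
percentage = 100 * mean_score / 5  # 5 = maximum Likert score

print(f"mean = {mean_score:.2f}/5 -> {percentage:.1f}%")
```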
Results
ChatGPT demonstrated a high level of reliability, comprehensibility, and accuracy in generating responses to the 40 questions, spanning 6 categories, for the 7 rare cardiovascular diseases, in both the paid and free-to-use versions. Overall, the free-to-use version of ChatGPT scored 91.21 ± 1.38%, whereas the paid version scored 95.14 ± 0.74% across the 280-question survey (Fig. 1). The highest-scoring rCVD was pulmonary arterial hypertension (PAH), with an average score of 94%, and the lowest-scoring was long QT syndrome (LQTS), with an average score of 92.25%. Moreover, the difference between the final results of the free and paid versions was statistically significant (p < 0.0001) (Fig. 2). All answers were graded 3 (correct but not sufficient) or higher on the 0-5 Likert scale by all experts. Of the 280 questions, the answers to 24 were graded 3 out of 5, and 13 of these 24 belonged to the "Treatment" category.
Figure 1.
Distribution of the adequacy of responses generated by ChatGPT (n = 280 for free and paid versions).
Figure 2.
Grades of responses by ChatGPT in the free and paid versions, with overall scores, expressed as percentages.
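The manuscript does not name the statistical test behind the p < 0.0001 comparison shown in Fig. 2. As one plausible reconstruction, the sketch below applies a Wilcoxon signed-rank test to paired per-question ratings for the two versions; both the choice of test and the randomly generated placeholder scores are assumptions made for illustration only.

```python
# Hypothetical paired comparison of free vs. paid Likert ratings.
# Assumptions: paired per-question scores and a Wilcoxon signed-rank
# test; the study reports only the resulting p-value (p < 0.0001).
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
# Placeholder data: 280 paired ratings (0-5), one pair per question.
scores_free = rng.integers(3, 6, size=280)
scores_paid = np.clip(scores_free + rng.integers(0, 2, size=280), 0, 5)

stat, p = wilcoxon(scores_paid, scores_free)  # paired, non-parametric
print(f"Wilcoxon W = {stat:.1f}, p = {p:.1e}")
```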
Among the 6 question categories, the "Tests" category scored the highest (95.3%) and the "Treatment" category the lowest (90.7%). The greatest difference between the paid and free-to-use versions was observed in the "About" category (97.1% vs. 90.7%, respectively) (Fig. 3).
Figure 3.
Grades of responses by ChatGPT in the free and paid versions for each question category, expressed as percentages.
ChatGPT was less successful in answering questions about rCVD than about more commonly encountered diseases, such as heart failure or coronary artery disease, which have been evaluated repeatedly in previous studies. The responses to different questions were very similar, and some answers contained unnecessary information. Moreover, ChatGPT could not give the preferred, correct responses to several questions. Although some responses were longer than necessary, wrong or misleading information was negligible.
Discussion
This study examines the accuracy and reliability of ChatGPT’s responses to the most likely questions posed by patients with rCVD, as evaluated by physician-academics. The academics rated ChatGPT’s performance in providing basic information about rCVD, lifestyle changes, and medical treatment options as adequate. However, compared to studies evaluating common cardiac diseases, the accuracy and reliability of ChatGPT were found to be lower.
Rare cardiovascular diseases require special attention and are difficult to diagnose and manage because of the scarcity of literature data9. Low awareness further delays diagnosis, and these diseases may not manifest with specific symptoms. Because rare diseases are difficult to diagnose, misdiagnosis is a risk common to all of them10, and the literature has shown that rare diseases can be confused with a wide range of other conditions10. Limited resources and restricted access to advanced cardiac diagnostic tools are additional obstacles to reaching a diagnosis, and these diagnostic difficulties are mirrored in the challenges faced during treatment. For this reason, patients often turn to web-based AI tools to find answers to their questions, especially when they struggle to get responses from physicians other than specialists11. However, obtaining clear and accurate answers from AI programs about these diseases, which involve a complex network of information even in clinical practice, remains a significant challenge for patients12.
Searching for health information online has become common practice. Patients turn to search engines for answers to all kinds of questions, but the results can often be complex and misleading13. Recently, ChatGPT has emerged as an alternative to search engines, able to analyze vast amounts of data efficiently and provide understandable answers. The first version of ChatGPT, based on GPT-3.5, was launched by OpenAI on November 30, 2022, and quickly gained widespread attention14. It has since been upgraded, the current version being GPT-4o, and over time ChatGPT has become a significant milestone in the field of AI15. ChatGPT is continuously trained with user comments and corrections through Reinforcement Learning from Human Feedback (RLHF), also described as Reinforcement Learning from Human Preference (RLHP), and retains this information to provide more accurate responses in the future15. It collects and analyzes vast amounts of web-based data and can answer questions in a conversational style16. Furthermore, ChatGPT has security measures in place to prevent abuse17, and its free accessibility is a great advantage. In the literature, many studies have evaluated the use of ChatGPT for medical information purposes, testing the adequacy, usability, and reliability of its responses for different diseases2,18,19.
ChatGPT has disadvantages as well as advantages. First, it can collect personal data during conversations, and privacy issues may arise in the future. Second, its most recent training data cut-off is May 2024, making it difficult to access the most up-to-date information instantly; however, the cut-off is continuously moved forward as the software is updated. Third, this web-based software may overlook critical information presented in figures or tables when retrieving data. Fourth, being entirely dependent on web-based content, it lacks the ability to distinguish reliable from false information. Fifth, it tends to give longer answers than necessary. Sixth, it may misinterpret or fail to understand certain abbreviations. Seventh, it may struggle to differentiate outdated from current information. Finally, it does not provide references for its answers. As ChatGPT continues to evolve, the inclusion of newer scientific data in its training will naturally improve its ability to provide accurate responses, and this ongoing expansion of knowledge will likely enhance the model's capacity to generate better answers. In addition, DeepSeek, recently developed by Chinese scientists, has gained global recognition; this AI model may offer alternative solutions and provide more accurate and comprehensive answers to rCVD-related queries in the future.
One limitation of our study is that the questions were asked only in English. The rapid pace at which medical information changes is another limitation. A key challenge in real-world use is that patients may not always phrase their questions clearly, which can lead to unclear or incomplete responses from AI systems: unlike carefully structured, expert-formulated questions, patient queries can be vague, causing the AI to generate less precise answers. Future research should explore how AI models handle patient-phrased inquiries and whether refining the way questions are asked improves the relevance and accuracy of responses.
ChatGPT is a popular, continuously updated, web-based AI tool that is easily accessible in the healthcare field. For rCVD, about which there is very little information on the web, this AI model is less successful at answering questions than it is for classical diseases. Despite this, the reliability and accuracy of its responses are generally sufficient. Nevertheless, these responses should be reviewed and verified by a qualified medical doctor; in other words, ChatGPT should be used as an auxiliary tool rather than a primary resource.
Author contributions
SS and MC wrote the main manuscript text. EK prepared Figs. 1, 2 and 3. BS, OBŞ, ST, YD, BAY, MRY, and AŞ contributed to the study design, reviewed the manuscript, and provided critical revisions. All authors reviewed and approved the final version of the manuscript.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Papastratis, I., Konstantinidis, D., Daras, P. & Dimitropoulos, K. AI nutrition recommendation using a deep generative model and ChatGPT. Sci. Rep. 14 (1), 14620 (2024).
- 2. Choi, J. et al. Availability of ChatGPT to provide medical information for patients with kidney cancer. Sci. Rep. 14 (1), 1542 (2024).
- 3. Cavnar Helvaci, B. et al. Assessing the accuracy and reliability of ChatGPT's medical responses about thyroid cancer. Int. J. Med. Inf. 191, 105593 (2024).
- 4. Nguengang Wakap, S. et al. Estimating cumulative point prevalence of rare diseases: analysis of the Orphanet database. Eur. J. Hum. Genet. 28 (2), 165–173 (2020).
- 5. Podolec, P. Rare cardiovascular diseases. Eur. Heart J. 38 (43), 3190–3192 (2017).
- 6. Hu, C. S. A comprehensive strategy for managing arrhythmogenic right ventricular cardiomyopathy. Turk. Kardiyol. Dern. Ars. 48 (2), 88–95 (2020).
- 7. Limongelli, G., Monda, E., Lioncino, M. & Bossone, E. Rare cardiovascular diseases: from genetics to personalized medicine. Heart Fail. Clin. 18 (1), xix–xxi (2022).
- 8. Wei, K., Fritz, C. & Rajasekaran, K. Answering head and neck cancer questions: an assessment of ChatGPT responses. Am. J. Otolaryngol. 45 (1), 104085 (2024).
- 9. Fine, N. M. & Shahi, K. Rare cardiovascular disease care: centers of excellence or excellence of centers? JACC Case Rep. 19, 101891 (2023).
- 10. Kaski, J. P. & Arbelo, E. The 2023 ESC guidelines for the management of cardiomyopathies: the 10 commandments. Eur. Heart J. 45 (13), 1101 (2024).
- 11. Saeidnia, H. R., Kozak, M., Lund, B. D. & Hassanzadeh, M. Evaluation of ChatGPT's responses to information needs and information seeking of dementia patients. Sci. Rep. 14 (1), 10273 (2024).
- 12. Lu, L. et al. Healthcare professionals and the public sentiment analysis of ChatGPT in clinical practice. Sci. Rep. 15 (1), 1223 (2025).
- 13. Roumeliotis, K. I. & Tselikas, N. D. ChatGPT and Open-AI models: a preliminary review. Future Internet 15 (6), 192 (2023).
- 14.What’s the. Next word in large Language models? Nat. Mach. Intell.5 (4), 331–332 (2023). [Google Scholar]
- 15. Briganti, G. How ChatGPT works: a mini review. Eur. Arch. Otorhinolaryngol. 281 (3), 1565–1569 (2024).
- 16. Turchi, T. et al. Pathways to democratized healthcare: envisioning human-centered AI-as-a-service for customized diagnosis and rehabilitation. Artif. Intell. Med. 151, 102850 (2024).
- 17. Chen, H. et al. Multi role ChatGPT framework for transforming medical data analysis. Sci. Rep. 14 (1), 13930 (2024).
- 18. Gencer, A. Readability analysis of ChatGPT's responses on lung cancer. Sci. Rep. 14 (1), 17234 (2024).
- 19. Madaudo, C. et al. Artificial intelligence in cardiology: a peek at the future and the role of ChatGPT in cardiology practice. J. Cardiovasc. Med. (Hagerstown) 25 (11), 766–771 (2024).