Cureus. 2023 Oct 26;15(10):e47754. doi: 10.7759/cureus.47754

ChatGPT's Epoch in Rheumatological Diagnostics: A Critical Assessment in the Context of Sjögren's Syndrome

Bilal Irfan 1, Aneela Yaqoob 2
Editors: Alexander Muacevic, John R Adler
PMCID: PMC10676288  PMID: 38022092

Abstract

Introduction: The rise of artificial intelligence in medical practice is reshaping clinical care. Large language models (LLMs) like ChatGPT have the potential to assist in rheumatology by personalizing scientific information retrieval, particularly in the context of Sjögren's Syndrome. This study aimed to evaluate the efficacy of ChatGPT in providing insights into Sjögren's Syndrome, differentiating it from other rheumatological conditions.

Materials and methods: A database of peer-reviewed articles and clinical guidelines focused on Sjögren's Syndrome was compiled. Clinically relevant questions were presented to ChatGPT, with responses assessed for accuracy, relevance, and comprehensiveness. Techniques such as blinding, randomized control queries, and temporal analysis were used to reduce bias in the evaluation. ChatGPT's responses were also assessed using the 15-question DISCERN tool.

Results: ChatGPT effectively highlighted key immunopathological and histopathological characteristics of Sjögren's Syndrome, though it omitted some crucial data and its citations contained inconsistencies. For a given clinical vignette, ChatGPT correctly identified potential etiological considerations, with Sjögren's Syndrome being prominent.

Discussion: LLMs like ChatGPT offer rapid access to vast amounts of data, beneficial for both patients and providers. While they democratize information, limitations such as potential oversimplification and reference inaccuracies were observed. The balance between LLM insights and clinical judgment, as well as continuous model refinement, is crucial.

Conclusion: LLMs like ChatGPT offer significant potential in rheumatology, providing swift and broad medical insights. However, a cautious approach is vital, ensuring rigorous training and ethical application for optimal patient care and clinical practice.

Keywords: artificial intelligence (ai), large language models (llms), chatgpt-4, chatgpt, sjögren’s syndrome

Introduction

Artificial intelligence (AI) continues to play an increasingly important role in medical practice and technology [1]. Its advancement as a field is radically transforming the landscape of clinical care, with significant considerations and outcomes for physicians, patients, and policymakers alike [2]. With such an emerging field, it is vital to map out the different ways that it can impact informed decision-making processes. In the context of rheumatology, large language models (LLMs) such as Chat Generative Pre-Trained Transformer (ChatGPT) can personalize the retrieval of scientific information by understanding the context of a user's query and tailoring the responses to meet specific needs, such as summarizing complex scientific articles into easy-to-understand formats for informed decision-making [3]. These models can also analyze vast datasets of scientific literature to identify key insights, trends, and relationships, thereby assisting in evidence-based decision-making by providing synthesized and relevant information [4].

In the context of Sjögren's Syndrome, ChatGPT and other large language models can offer targeted insights that are particularly helpful in differentiating this disorder from other rheumatological conditions. For example, if a healthcare provider is unsure whether a patient's symptoms of dry eyes and mouth are indicative of Sjögren's Syndrome or perhaps another autoimmune condition, the model could retrieve and summarize the most current diagnostic criteria, such as the American College of Rheumatology (ACR)/European League Against Rheumatism (EULAR) classification criteria for Sjögren's Syndrome. It could highlight key serological markers like anti-SSA/Ro and anti-SSB/La antibodies, which are more commonly associated with Sjögren's than with other disorders.

The LLM could also assist in treatment planning by retrieving the latest guidelines and summarizing pharmacological options, such as the use of hydroxychloroquine for arthralgia in Sjögren's, and how that differs from its use in, say, lupus or rheumatoid arthritis. It can provide summaries of the latest studies on emerging therapies, such as B-cell targeted treatments, that may have differing efficacies in Sjögren's compared to other rheumatological diseases. For monitoring risks of non-Hodgkin lymphoma in Sjögren's Syndrome, the LLM could outline evidence-based surveillance protocols, including which imaging or laboratory tests are most indicative of disease progression or malignancy risk, as opposed to what would be relevant in other rheumatological conditions. Given these considerations, we presented four questions to ChatGPT related to Sjögren's Syndrome where the responses were evaluated for informativeness, accuracy, and overall applicability and user-friendliness by two independent reviewers. 

Materials and methods

To explore the potential of ChatGPT in assisting healthcare professionals with insights into Sjögren's Syndrome and differentiating it from other rheumatological conditions, we developed a brief study utilizing a multi-pronged methodology. Initially, we compiled resources that consisted of peer-reviewed articles, clinical guidelines, and case studies specifically focused on Sjögren's Syndrome, as well as other rheumatological diseases. These texts were gleaned from reputable academic databases such as PubMed and ClinicalTrials.gov and were utilized to aid in analyzing the responses drawn from ChatGPT. 

The second phase involved the formulation, by the authors, of clinically relevant questions that healthcare providers commonly encounter in the diagnosis or management of Sjögren's Syndrome. Queries like "What are the immunological features of Sjögren's Syndrome?", "What are the histopathological features of Sjögren's Syndrome that make it more high risk for getting non-Hodgkin lymphoma?", "What is the appropriate follow-up management for a patient presenting Sjögren's Syndrome?", and "I have joint pain, dry eyes, dry mouth, and a persistent dry cough. What is the differential diagnosis?" were developed. These questions, four in total, were then used as prompts to engage ChatGPT, and the responses were assessed for accuracy, relevance, and comprehensiveness against the current scientific literature. We presented each question in a “new chat” so that ChatGPT could not draw on past conversation history to tailor its response [5]. For example, if symptoms such as joint pain and dry eyes were stated and a differential diagnosis requested in a chat that had already discussed the immunology of Sjögren's Syndrome, the earlier conversation could influence the response [6].

Ensuring an unbiased assessment of ChatGPT's utility was paramount in our study design. To achieve this, we employed multiple techniques. First, we implemented a blinding technique, removing any tags or markers that might identify the source as an LLM [7]. This allowed for a blind evaluation by three reviewers, including clinicians specializing in internal medicine, following past studies that used physician reviewers [8]. We also ensured that all evaluators declared any potential conflicts of interest prior to their assessments. The responses generated by ChatGPT were cross-validated by repeating the prompts in a new conversation thread to verify overall thematic consistency and accuracy. Reviewers also independently scored each response from 1 (low) to 5 (high) using the 15-question DISCERN tool for assessing written health material, and these scores were then averaged [9]. To further reduce bias in our assessment, we introduced several additional methods. Randomized control prompts about other rheumatological conditions like arthritis and osteoarthritis were interspersed with the main prompts to gauge whether the model displayed any preference or bias towards Sjögren's Syndrome [10]. Temporal analysis was performed by conducting the study at various time points to examine whether updates to the model influenced its responses to our control queries. After finding no prominent difference in the responses to our control prompts between June and August 2023, we used ChatGPT Version 4 for our analysis. Because this model continues to receive regular updates and technical fixes, the final results for the Sjögren's Syndrome-related questions are based on the model as available on August 31, 2023.

By rigorously adhering to these materials and methods, our study aims to offer a robust and unbiased evaluation of the capabilities of LLMs like ChatGPT in aiding the diagnosis and management of Sjögren's Syndrome in comparison to other rheumatological conditions.
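
As an illustration of how control prompts might be interspersed with the main prompts, the following is a minimal Python sketch rather than the procedure actually used in this study. The four main prompts are quoted from the text above; the control prompts, the function name build_prompt_queue, and the seeded shuffle are assumptions made for the example, since the exact control prompts and randomization method are not reported.

```python
import random

# The four study prompts, quoted from the Materials and methods section.
MAIN_PROMPTS = [
    "What are the immunological features of Sjögren's Syndrome?",
    "What are the histopathological features of Sjögren's Syndrome that "
    "make it more high risk for getting non-Hodgkin lymphoma?",
    "What is the appropriate follow-up management for a patient presenting "
    "Sjögren's Syndrome?",
    "I have joint pain, dry eyes, dry mouth, and a persistent dry cough. "
    "What is the differential diagnosis?",
]

# Hypothetical control prompts: the paper does not list the ones it used.
CONTROL_PROMPTS = [
    "What are the immunological features of osteoarthritis?",
    "What is the appropriate follow-up management for a patient presenting "
    "arthritis?",
]


def build_prompt_queue(main_prompts, control_prompts, seed=0):
    """Shuffle main and control prompts into one queue.

    Returns (queue, key): queue holds only an opaque ID and the prompt text,
    while key maps each ID to "main" or "control" and is kept separately.
    """
    labelled = [(p, "main") for p in main_prompts]
    labelled += [(p, "control") for p in control_prompts]
    rng = random.Random(seed)  # fixed seed so the order is reproducible
    rng.shuffle(labelled)
    queue, key = [], {}
    for i, (prompt, kind) in enumerate(labelled, start=1):
        prompt_id = f"P{i:02d}"
        queue.append({"id": prompt_id, "prompt": prompt})
        key[prompt_id] = kind
    return queue, key


if __name__ == "__main__":
    queue, key = build_prompt_queue(MAIN_PROMPTS, CONTROL_PROMPTS, seed=2023)
    for item in queue:
        print(item["id"], key[item["id"]], item["prompt"][:50])
```

Under this sketch, each queued prompt would be submitted in a fresh chat, and the key would be consulted only after the responses had been scored.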

Results

When asked about the immunopathological characteristics of Sjögren's Syndrome, ChatGPT described five key aspects: autoantibody generation, lymphocytic infiltration, imbalances in cytokine production, B-cell hyperactivity, and increased expression of Type I interferon (Figure 1). The model offered these five points without being given any constraint on the number of features, which raises the question of whether it implicitly ranks these pathophysiological features by significance or prevalence. The references it generated for this query were predominantly from credible sources; however, the fourth reference gave an incorrect year of publication (Figure 2). Notably, ChatGPT included a caveat emphasizing that medical knowledge evolves and flagged its own knowledge cutoff of September 2021 [11].

Figure 1. What are the immunological features of Sjögren's Syndrome?

Figure 2. References for "What are the immunological features of Sjögren's Syndrome?".

In describing the histopathological features specific to the syndrome, ChatGPT listed five main features but omitted crucial data, notably the periductal infiltration of the salivary and lacrimal glands, predominantly by CD4+ helper T (Th) lymphocytes among other cell types (Figure 3). The model also did not describe the genetic predispositions contributing to the syndrome's pathogenesis. Discrepancies were also observed in the references generated for therapeutic management: the first reference contained errors in authorship, year of publication, title, and source journal (Figures 4-8) [12].

Figure 3. What are the histopathological features of Sjögren's Syndrome that make it more high risk for getting non-Hodgkin lymphoma?

Figure 4. References for "What are the histopathological features of Sjögren's Syndrome that make it more high risk for getting non-Hodgkin lymphoma?" (Part 1).

Figure 5. References for "What are the histopathological features of Sjögren's Syndrome that make it more high risk for getting non-Hodgkin lymphoma?" (Part 2).

Figure 6. What is the appropriate follow-up management for a patient presenting Sjögren's Syndrome? (Part 1).

Figure 7. What is the appropriate follow-up management for a patient presenting Sjögren's Syndrome? (Part 2).

Figure 8. What is the appropriate follow-up management for a patient presenting Sjögren's Syndrome? (Part 3).

When prompted with a clinical vignette describing xerophthalmia, arthralgia, xerostomia, and persistent dry cough, ChatGPT identified a range of plausible etiologies, with Sjögren's Syndrome most prominent (Figure 9). The model also stressed the need for a comprehensive clinical evaluation and suggested potential specialist referrals to rheumatology, ophthalmology, or gastroenterology (Figure 10) [13]. DISCERN scores were relatively high for most of ChatGPT's responses (Table 1).

Table 1. DISCERN Scores for ChatGPT-generated responses to questions.

Question | DISCERN Score
What are the immunological features of Sjögren's Syndrome? | 4.22
What are the histopathological features of Sjögren's Syndrome that make it more high risk for getting non-Hodgkin lymphoma? | 3.67
What is the appropriate follow-up management for a patient presenting Sjögren's Syndrome? | 3.97
I have joint pain, dry eyes, dry mouth, and a persistent dry cough. What is the differential diagnosis for this? | 4.82

Figure 9. I have joint pain, dry eyes, dry mouth, and a persistent dry cough. What is the differential diagnosis for this?

Figure 10. Recommendation to See Specialists.

Discussion

The utilization of artificial intelligence and LLMs like ChatGPT in the realm of rheumatology offers a spectrum of both promising opportunities and notable challenges. From our study, the immediacy with which ChatGPT can access and summarize vast amounts of data presents a significant advantage for both patients and healthcare providers [14]. Such access can facilitate evidence-based decision-making, particularly in complex areas like rheumatology where differential diagnosis can be intricate [15].

For patients, the use of ChatGPT can democratize information, offering insights into their symptoms and potential conditions even before a clinical consultation [16]. This can empower them with information, enabling more informed conversations with their healthcare providers [17]. For laypeople, it provides a platform to understand complex medical conditions in simplified terms, bridging the knowledge gap [18]. For physicians and other clinical providers, it can act as an efficient aid in clinical decision-making by offering quick references, diagnostic criteria, and updated treatment guidelines [19].

The limitations observed in our results, such as the tendency of ChatGPT to default to a brief explanation by listing only a select number of features, raise concerns about potential oversimplification [20]. The inherent risk is that essential clinical features may be omitted, leading to an incomplete understanding [21]. Moreover, because the model depends on training data with a fixed cutoff (in this case, September 2021), it might not always provide the most up-to-date information. As seen from our results, ChatGPT also produced inaccurate reference citations, further emphasizing the importance of cross-referencing [22].

While LLMs can simplify complex scientific information, there is an inherent responsibility to ensure that this does not inadvertently lead to misinformation [23]. As we observed, ChatGPT does acknowledge its last update, which is vital for transparency. However, there is a need for robust mechanisms to continuously update and train these models with the latest medical research. Another ethical consideration is the potential for LLMs to unduly influence clinical decision-making if relied upon too heavily. Physicians must balance the insights from LLMs with their clinical judgment. Additionally, it is imperative to address potential bias in LLMs. As our control queries about other rheumatological conditions were designed to check, it is crucial to ensure that the AI does not display any undue preference for a particular condition [24]. Overreliance on AI responses without critical evaluation can skew clinical perceptions. This is all becoming increasingly relevant as social media and telehealth technology continue to expand, refining areas of care and serving as sources of medical information [25,26].

Conclusions

The adoption of LLMs like ChatGPT in the domain of rheumatology holds vast potential, offering swift access to a broad spectrum of medical knowledge that can enhance evidence-based clinical decision-making. Our study underscores the efficiency and utility of ChatGPT in demystifying complex medical concepts, bridging the knowledge gap for both patients and healthcare professionals. However, this potential is counterbalanced by some limitations. The oversimplification observed in some responses and occasional inaccuracies in reference citations caution against relying solely on LLMs without cross-referencing or incorporating clinical judgment. As artificial intelligence continues to make strides in healthcare, the continuous refinement, rigorous training, and ethical application of these tools will be paramount in ensuring their optimal use in patient care and clinical practice.

The authors have declared that no competing interests exist.

Author Contributions

Concept and design:  Bilal Irfan

Acquisition, analysis, or interpretation of data:  Bilal Irfan, Aneela Yaqoob

Drafting of the manuscript:  Bilal Irfan

Critical review of the manuscript for important intellectual content:  Bilal Irfan, Aneela Yaqoob

Supervision:  Aneela Yaqoob

Human Ethics

Consent was obtained or waived by all participants in this study.

Animal Ethics

Animal subjects: All authors have confirmed that this study did not involve animal subjects or tissue.

References

  • 1. Artificial intelligence: how is it changing medical sciences and its future? Basu K, Sinha R, Ong A, Basu T. Indian J Dermatol. 2020;65:365–370. doi: 10.4103/ijd.IJD_421_20.
  • 2. The potential for artificial intelligence in healthcare. Davenport T, Kalakota R. Future Healthc J. 2019;6:94–98. doi: 10.7861/futurehosp.6-2-94.
  • 3. AI am a rheumatologist: a practical primer to large language models for rheumatologists. Venerito V, Bilgin E, Iannone F, Kiraz S. Rheumatology (Oxford). 2023;62:3256–3260. doi: 10.1093/rheumatology/kead291.
  • 4. Science in the age of large language models. Birhane A, Kasirzadeh A, Leslie D, et al. Nat Rev Phys. 2023;5:277–280.
  • 5. Visual ChatGPT: talking, drawing and editing with visual foundation models [preprint]. Wu C, Yin S, Qi W, et al. arXiv. 2023.
  • 6. Conversational AI models for ophthalmic diagnosis: comparison of ChatGPT and the Isabel Pro Differential Diagnosis Generator. Balas M, Ing EB. JFO Open Ophthalmol. 2023;1:100005.
  • 7. Large language models encode clinical knowledge. Singhal K, Azizi S, Tu T, et al. Nature. 2023;620:172–180. doi: 10.1038/s41586-023-06291-2.
  • 8. The role of artificial intelligence language models in dermatology: opportunities, limitations and ethical considerations. Sathe A, Seth I, Bulloch G, et al. Australas J Dermatol. 2023. doi: 10.1111/ajd.14133.
  • 9. Evaluating the reliability of DISCERN: a tool for assessing the quality of written patient information on treatment choices. Rees CE, Ford JE, Sheard CE, et al. Patient Educ Couns. 2002;47:273–275. doi: 10.1016/s0738-3991(01)00225-7.
  • 10. Evaluating ChatGPT’s ability to solve higher-order questions on the competency-based medical education curriculum in medical biochemistry. Ghosh A, Bir A. Cureus. 2023;15:e37023. doi: 10.7759/cureus.37023.
  • 11. Impact of ChatGPT on medical chatbots as a disruptive technology. Chow JC, Sanders L, Li K. Front Artif Intell. 2023;6:1166014. doi: 10.3389/frai.2023.1166014.
  • 12. High rates of fabricated and inaccurate references in ChatGPT-generated medical content. Bhattacharyya M, Miller VM, Bhattacharyya D, et al. Cureus. 2023;15:e39238. doi: 10.7759/cureus.39238.
  • 13. Evaluation of ChatGPT’s capabilities in medical report generation. Zhou Z. Cureus. 2023;15:e37589. doi: 10.7759/cureus.37589.
  • 14. ChatGPT for future medical and dental research. Fatani B. Cureus. 2023;15:e37285. doi: 10.7759/cureus.37285.
  • 15. Evaluating the feasibility of ChatGPT in healthcare: an analysis of multiple clinical and research scenarios. Cascella M, Montomoli J, Bellini V, Bignami E. J Med Syst. 2023;47:33. doi: 10.1007/s10916-023-01925-4.
  • 16. ChatGPT in medicine: an overview of its applications, advantages, limitations, future prospects, and ethical considerations. Dave T, Athaluri SA, Singh S. Front Artif Intell. 2023;6:1169595. doi: 10.3389/frai.2023.1169595.
  • 17. Telehealth, social media, patient empowerment, and physician burnout: seeking middle ground. Mano MS, Morgan G. Am Soc Clin Oncol Educ Book. 2022;42:1–10. doi: 10.1200/EDBK_100030.
  • 18. ChatGPT, GPT-4, and other large language models - the next revolution for clinical microbiology? Egli A. Clin Infect Dis. 2023. doi: 10.1093/cid/ciad407.
  • 19. The wide range of opportunities for large language models such as ChatGPT in rheumatology. Hügle T. RMD Open. 2023;9:e003105. doi: 10.1136/rmdopen-2023-003105.
  • 20. Understanding the role and adoption of artificial intelligence techniques in rheumatology research: an in-depth review of the literature. Madrid-García A, Merino-Barbancho B, Rodríguez-González A, Fernández-Gutiérrez B, Rodríguez-Rodríguez L, Menasalvas-Ruiz E. Semin Arthritis Rheum. 2023;61:152213. doi: 10.1016/j.semarthrit.2023.152213.
  • 21. Study tests large language models’ ability to answer clinical questions. Harris E. JAMA. 2023;330:496. doi: 10.1001/jama.2023.12553.
  • 22. Learning to fake it: limited responses and fabricated references provided by ChatGPT for medical questions. Gravel J, D’Amours-Gravel M, Osmanlliu E. Mayo Clin Proc Digit Health. 2023;1:226–234.
  • 23. ChatGPT: when artificial intelligence replaces the rheumatologist in medical writing. Verhoeven F, Wendling D, Prati C. Ann Rheum Dis. 2023;82:1015–1017. doi: 10.1136/ard-2023-223936.
  • 24. Artificial intelligence and deep learning for rheumatologists. McMaster C, Bird A, Liew DF, Buchanan RR, Owen CE, Chapman WW, Pires DE. Arthritis Rheumatol. 2022;74:1893–1905. doi: 10.1002/art.42296.
  • 25. The spread of sleep health information on TikTok: an analysis of user-generated content. Irfan B, Yasin I. Sleep Med. 2023;110:154. doi: 10.1016/j.sleep.2023.08.009.
  • 26. Tele-ID politesse: recognizing cross-culturally sensitive care with hijab & niqab. Irfan B, Yasin I, Yaqoob A. Clin Infect Dis. 2023. doi: 10.1093/cid/ciad426.

Articles from Cureus are provided here courtesy of Cureus Inc.
