PLOS One. 2025 Jun 18;20(6):e0326351. doi: 10.1371/journal.pone.0326351

Evaluating the readability, quality, and reliability of responses generated by ChatGPT, Gemini, and Perplexity on the most commonly asked questions about Ankylosing spondylitis

Mete Kara 1, Erkan Ozduran 2,*, Müge Mercan Kara 3, İlhan Celil Özbek 4, Volkan Hancı 5
Editor: Mark Hwang
PMCID: PMC12176213  PMID: 40531978

Abstract

Ankylosing spondylitis (AS), which usually occurs in the second and third decades of life, is associated with chronic pain, limited mobility, and severe reductions in quality of life. This study aimed to comparatively evaluate the readability, information accuracy, and quality of the answers that artificial intelligence (AI)-based chatbots such as ChatGPT, Perplexity, and Gemini, which have become popular as access to medical information has widened, give to user questions about AS, a chronic inflammatory joint disease. The 25 keywords most frequently queried in relation to AS, identified through Google Trends, were submitted to each of the three AI-based chatbots. The readability of the resulting responses was evaluated using indices such as the Gunning Fog Index (GFOG), the Flesch Reading Ease Score (FRES), and the Simple Measure of Gobbledygook (SMOG). The quality of the responses was measured with the Ensuring Quality Information for Patients (EQIP) tool and the Global Quality Score (GQS), and reliability was measured using the modified DISCERN and Journal of the American Medical Association (JAMA) scales. According to Google Trends data, the most frequently searched keywords related to AS were "Ankylosing spondylitis pain", "Ankylosing spondylitis symptoms", and "Ankylosing spondylitis disease", respectively. The readability levels of the answers produced by the AI-based chatbots were above the 6th grade level, with statistically significant differences (p < 0.001). In the EQIP, JAMA, mDISCERN, and GQS evaluations, Perplexity stood out in terms of information quality and reliability, receiving higher scores than the other chatbots (p < 0.05). Overall, the answers given by AI chatbots to AS-related questions exceeded the recommended readability level, and some low reliability and quality scores raise concerns.
With an audit mechanism in place, future AI chatbots could achieve sufficient quality, reliability, and appropriate readability levels.

Introduction

Ankylosing spondylitis (AS) is a chronic and progressive inflammatory disease primarily involving the axial skeleton [1]. AS, the most common form of axial spondyloarthropathy (SpA), is characterized by inflammation of the spine and sacroiliac joints. Over time, this inflammation can lead to decreased spinal mobility and, in severe cases, spinal fusion [2]. AS usually presents in the second and third decades of life and is associated with chronic pain, limited mobility, and severe reductions in quality of life. Genetic predisposition, especially HLA-B27 positivity, is known to play an important role in the development of AS [3]. The diagnostic process is based on the patient's symptoms, supported by critical diagnostic tools such as magnetic resonance imaging (MRI) and HLA-B27 testing [4]. The global prevalence of AS ranges from approximately 0.1% to 1.4% [4].

According to the axial spondyloarthritis management recommendations prepared by the Assessment of SpondyloArthritis international Society (ASAS)-EULAR in 2022, it is important to inform patients correctly about their disease and to encourage them to pursue education, exercise, and smoking cessation [5]. It is strongly emphasized that all patients should receive education about the disease as a starting point for self-management, to engage and empower them as active partners in their care [6]. Despite these recommendations, the healthcare provider's knowledge and communication skills, as well as the ability to assess patients' educational needs, can affect patient education [7]. A lack of information on the part of healthcare providers is reported to cause delays of 7–10 years in the diagnostic process [8]. In addition, the patient's social and cultural background and physiological factors may create further barriers. Finally, health literacy, which helps patients access, understand, evaluate, and apply healthcare information, also plays a major role [7].

Accessing health information online is not a new phenomenon. According to one study, 9 out of 10 Americans accessed the internet in 2018, and 75% of them sought health information [9]. With the release of version 3.5 of the Chat Generative Pre-trained Transformer (ChatGPT) in 2022, a large language model (LLM) powered by AI, a new alternative to traditional internet search engines emerged. Perplexity AI (2022) and Google Gemini (2024; formerly Bard, 2023) were other applications that entered the AI market as "chatbots" that use natural language processing (NLP) and machine learning to answer user questions [10]. As of November 2024, ChatGPT is estimated to have 300 million weekly active users and 3.8 billion monthly visits [11]. Many AI applications, especially those listed above, have been the subject of studies on different medical topics in which their answers to users' questions are evaluated [12,13].

Rheumatological diseases are a popular topic in studies of AI applications [14]. The literature describes many potential benefits of AI for patients with rheumatoid arthritis, including facilitating screening, diagnosis, monitoring, risk assessment, prognostication, optimization of treatment outcomes, and new drug discovery [15]. One study reported that 58.9% of patients diagnosed with rheumatoid arthritis, AS, or fibromyalgia access information about their diseases via the internet [16]. It is well known that patients who are informed about the causes, pathophysiology, treatment, and prevention of a disease can participate in and adapt to disease prevention or treatment procedures more effectively [9]. The contribution of AI-generated information to the care of rheumatological diseases such as AS can be examined further in future studies.

Studies indicate that information and communication technology-based patient education has the potential to improve self-management and behavioral change in autoimmune rheumatological diseases [17]. The use of AI in rheumatic diseases has improved diagnostic accuracy, made it possible to predict patient outcomes, expanded treatment options, and facilitated personalized medical solutions [18]. AI-powered tools support personalized patient education and provide remote diagnostic and treatment assistance [19]. As in other rheumatic diseases, AI-supported applications hold promise for early diagnosis and appropriate treatment planning in chronic inflammatory, multisystemic diseases such as AS. However, it remains critically important that patients searching online for information about chronic diseases such as AS have access to reliable and understandable sources: misinformation can cause unnecessary anxiety and treatment non-compliance [14].

Readability is defined as a criterion that determines the comprehensibility of written materials by the target audience. Readability level can be determined by formulae such as Flesch-Kincaid, Gunning Fog and SMOG. The readability level for patient education materials (PEM) is recommended by the American Medical Association (AMA), the National Institutes of Health (NIH), and the US Department of Health and Human Services to be grade 6 or lower. This standard aims to ensure that patients have access to accurate and understandable information about their disease and treatment options [20,21].

Texts that are difficult to read negatively affect health literacy and prevent patients from managing chronic diseases. Patients have difficulty even following the medication instructions they should use [22]. On the contrary, high health literacy resulting from more easily readable texts is associated with appropriate medication adherence [23].

This study aims to contribute to the literature by comprehensively evaluating the reliability, quality, and readability of health information about AS produced by three AI-supported chatbots: ChatGPT, Perplexity, and Gemini. The lack of studies analyzing texts produced by AI-assisted chatbots about AS increases the importance of this research. Patients' access to more reliable and understandable information will contribute to better outcomes in disease management.

Materials and methods

Ethics committee permission

The planning, execution, and data collection processes of this cross-sectional study were carried out in accordance with the approval of the relevant ethics committee (Cumhuriyet University Ethics Committee, No: 2024-10/10, Date: 17.10.2024). Written or verbal informed consent was not required because the study did not involve human participants or human tissue.

In this study, as in previous studies in the literature, the answers given by AI chatbots to the most frequently asked questions about AS were evaluated cross-sectionally within a static time period [13]. Results may therefore differ when the queries are repeated at other times.

Research procedure

At the beginning of the study, personal browser data was completely deleted and Google Incognito mode was activated to eliminate possible biases. However, although Incognito mode mitigates algorithmic bias, it also removes the personalization that arises from a real user's lifestyle and search history, so real-world results may differ; this is a limitation [24].

On the Google Trends (https://trends.google.com/) platform, the frequency and geographical distribution of worldwide searches for the keyword "Ankylosing Spondylitis" from 2004 to the present were examined on October 18, 2024, using the "most relevant" results filter, and the 25 most frequently searched keywords and the geographical areas of greatest interest were identified and recorded [9]. Measurements from January 2024 showed that Google held 81.95% of the search engine market; given this overwhelming dominance, the Google search engine was used in our study to access the largest and most reliable database [25]. In our study, the 25 most frequently asked questions on Google Trends were evaluated [26]. Similar methodologies exist in the literature, as do studies that instead query the 100 most frequently asked questions on a medical topic; in those studies, the 100 questions were posed to AI chatbots one by one, yielding a larger sample [20,21]. In addition, there are AI studies that examine the most frequently asked questions on the official websites of professional societies and organizations [27]. Although these methodological differences change the comprehensiveness of the studies by altering sample size, they also lend each study its own character.

The study aimed to examine the responses of freely available AI models, including ChatGPT, Gemini, and Perplexity, to the specified keywords. Accordingly, the keywords were submitted to the AI models in English [20,28]. The chatbots have different working principles. The Generative Pre-trained Transformer (GPT) is a natural language generation model developed by OpenAI; it learns to predict the next word in a given text and produces meaningful text resembling the input [29]. Perplexity is a chatbot built on OpenAI GPT technology that provides answers containing references for queries and follow-up directions. Gemini AI is a "natively multimodal" model that can process and learn from various types of data, including text, video, and audio, and is capable of analyzing complex data such as images and graphs [12].

It is known that the underlying machine learning methods are not well positioned to distinguish factually correct from incorrect information during training, which is why AI applications regularly make factual errors and provide imprecise information [30]. Cybercrimes may occur when personal information entered into chatbots is recorded and used to create fake profiles or manipulate images; this raises concerns about loss of privacy over our own data [31]. In light of this information, separate user logins were used and each keyword was posed to the chatbots individually, to prevent biases that could arise from processing keywords sequentially. The answers produced by the AI models were recorded in a database to be evaluated with readability, reliability, and quality metrics. The keywords and responses from each AI chatbot can be accessed through the web archive found at: https://figshare.com/articles/dataset/AI-AS_Responds_docx/28395005?file=52285787

In this study, the free and publicly available GPT-4o model of ChatGPT, the AI technology with the broadest social accessibility, was used. The aim was to evaluate the content presented to individuals at different socioeconomic levels when they use these technologies [12,20].

Readability evaluation

Readability formulas.

The chatbots' responses to the keywords were examined with various formulas using a website that measures text readability (http://readabilityformulas.com/). Since there is no gold-standard readability index, nor evidence that any one index is more accurate or reliable than the others, the most popular indices from previous studies were used, in line with the literature.

In this study, commonly used readability metrics were applied: the Coleman-Liau Readability Index (CLI), Linsear Write (LW), Automated Readability Index (ARI), Simple Measure of Gobbledygook (SMOG), Gunning Fog Readability Index (GFOG), Flesch Reading Ease Score (FRES), and Flesch-Kincaid Grade Level (FKGL). These metrics were used to determine how close the texts produced by AI are to everyday spoken language and how comprehensible they are [20,21,32].

Readability score evaluation.

Detailed information on how different readability formulas work is presented in Table 1. The obtained readability scores are expressed as median (minimum-maximum) values to determine the general readability level of the texts. In this study, the results obtained were compared with the sixth grade readability level recommended by the American Medical Association and the National Institutes of Health (NIH). Accordingly, the average acceptable score for the Flesch Reading Ease Score (FRES) was determined as 80.0, and for the other six formulas as 6. Final scores below 80.0 for the FRES formula and above 6 for other formulas represent text above the average recommended readability level [20,21].
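This decision rule can be captured in a small helper (illustrative only; the metric abbreviations are those used in this article):

```python
def exceeds_recommended_level(metric: str, score: float) -> bool:
    """True if a score indicates text harder than the recommended 6th-grade level.

    FRES runs in the opposite direction (higher = easier), so values below
    80.0 exceed the recommended difficulty; for the six grade-level formulas,
    values above 6 exceed it.
    """
    if metric == "FRES":
        return score < 80.0
    return score > 6.0
```

For example, ChatGPT's median FRES of 34.64 and median SMOG of 11.70 (Table 3) are both flagged as harder than recommended.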

Table 1. Readability tools, formulas and descriptions.
Readability Index Description Formula
Gunning Fog (GFOG) Estimates the number of years of education required to understand a given text. G = 0.4 × ((W/S) + 100 × (C*/W))
Flesch Reading Ease Score (FRES) Created to assess the readability of newspapers; particularly effective for evaluating school textbooks and technical manuals. Scores range from 0 to 100, with higher scores indicating greater ease of reading. I = 206.835 − (84.6 × (B/W)) − (1.015 × (W/S))
Flesch-Kincaid Grade Level (FKGL) Delineates the academic level required to grasp the written material. G = (11.8 × (B/W)) + (0.39 × (W/S)) − 15.59
Simple Measure of Gobbledygook (SMOG) Measures the number of years of education the average person needs to understand a text. G = 1.0430 × √C + 3.1291
Coleman-Liau (CL) score Evaluates the educational level required to understand a text and gives a corresponding grade level in the US education system. G = (−27.4004 × (E/100)) + 23.06395
Linsear Write (LW) Offers an approximate assessment of the academic level needed to comprehend the text. LW = (R + 3C)/S; if the result is >20, divide by 2; if ≤20, subtract 2 and then divide by 2.
Automated Readability Index (ARI) Assesses the US school grade needed to comprehend written material; the more characters, the more complex the term. ARI = 4.71 × L + 0.5 × ASL − 21.43

G = Grade level; B = Number of syllables; W = Number of words; S = Number of sentences; I = Flesch Index Score; C = Complex words (≥3 syllables); E = predicted Cloze percentage = 141.8401 − (0.214590 × number of characters) + (1.079812 × S); C* = Complex words excluding proper nouns, words made 3-syllable by addition of "ed" or "es", and compound words made of simpler words; L = average number of characters per word; ASL = average sentence length (words per sentence); R = the number of words with ≤2 syllables.
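As an illustration, several of the grade-level formulas in Table 1 can be implemented directly. The word, sentence, and syllable counters below are crude heuristics, so scores will differ somewhat from those of dedicated calculators such as the site used in this study:

```python
import re

def counts(text):
    """Crude word/sentence/syllable counts; dedicated calculators use finer rules."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)

    def syllables(word):
        # heuristic: count groups of vowels (y included), minimum one
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    total_syllables = sum(syllables(w) for w in words)
    # complex words: >=3 syllables (Table 1's C; the GFOG exceptions for C* are ignored here)
    complex_words = sum(1 for w in words if syllables(w) >= 3)
    return len(words), sentences, total_syllables, complex_words

def fres(text):
    W, S, B, _ = counts(text)
    return 206.835 - 84.6 * (B / W) - 1.015 * (W / S)

def fkgl(text):
    W, S, B, _ = counts(text)
    return 11.8 * (B / W) + 0.39 * (W / S) - 15.59

def gfog(text):
    W, S, _, C = counts(text)
    return 0.4 * ((W / S) + 100 * (C / W))

def smog(text):
    # Table 1's simplified form; the full published SMOG scales C to a 30-sentence sample
    _, _, _, C = counts(text)
    return 1.0430 * C ** 0.5 + 3.1291
```

A short, simple sentence scores a higher FRES (easier) and lower grade levels than dense clinical prose, which is exactly the contrast these indices are designed to capture.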

Reliability evaluation

In our study, the Modified DISCERN scale was used to determine the reliability level of information sources. This scale evaluates five different criteria and gives sources a score between 0 and 5. A higher score means that the source provides information with higher reliability [33].

The questions in the scale evaluate five basic dimensions, including the timeliness of the sources, whether additional sources of information are listed, whether areas of uncertainty or controversy are addressed, and the clarity and impartiality of the language [34]. The validity and reliability of the JAMA and DISCERN scales have been tested in previous studies [35,36].

In the analysis based on "The Journal of the American Medical Association (JAMA) Benchmark", the other reliability scale used in our study, the scientific reliability of the texts was evaluated against four core publication-ethics criteria: authorship, currency, disclosure, and attribution.

In the JAMA evaluation, each criterion is scored 0 or 1 depending on whether it is met. The resulting total, between 0 and 4, indicates the overall reliability of the material; higher scores indicate that more criteria are met and therefore greater reliability [14,37].
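The scoring just described reduces to a count of criteria met; a minimal sketch (the function name is illustrative, not part of the cited benchmark):

```python
def jama_score(authorship: bool, attribution: bool,
               disclosure: bool, currency: bool) -> int:
    """JAMA benchmark: one point per criterion met, for a total of 0-4."""
    return int(authorship) + int(attribution) + int(disclosure) + int(currency)
```

For example, a response that cites its sources and carries a creation date, but names no author and includes no disclosure statement, would score 2.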

Quality evaluation

The Global Quality Score (GQS) evaluates the quality of online health information on a five-point scale, where 1 represents the lowest and 5 the highest quality. A source scoring 1 provides no benefit to patients, while a source scoring 5 provides extremely reliable and comprehensive information. Likewise, 2 points denote low quality of limited use, 3 points moderate quality of limited usefulness, and 4 points good quality and usefulness [38–40]. The reliability and validity of the GQS have been reported in the literature [41].

EQIP (Ensuring Quality Information for Patients) is a tool used to evaluate the quality of medical texts. Based on the "yes", "partly", or "no" answers given to the tool's 20 questions, the quality of the text is scored between 0 and 100: each "yes" earns 1 point, each "partly" 0.5 points, and each "no" 0 points. The point total is divided by the number of applicable items (20 minus the number of "does not apply" items) and multiplied by 100: score (%) = [((yes × 1) + (partly × 0.5) + (no × 0)) / (20 − does not apply)] × 100 [42]. The results fall into four categories: "severe problems with quality" (0−25%), "serious problems with quality" (26−50%), "good quality with minor problems" (51−75%), and "well written" (76−100%) [43].
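Written out as code, the calculation looks like this (a sketch of the scoring rule, not the official EQIP tool; the band labels distinguish the two lowest ranges as "severe" and "serious"):

```python
def eqip_score(n_yes: int, n_partly: int, n_no: int,
               n_not_applicable: int = 0) -> float:
    """EQIP percentage: points earned over the applicable items (of 20 total)."""
    applicable = 20 - n_not_applicable
    points = 1.0 * n_yes + 0.5 * n_partly + 0.0 * n_no
    return points / applicable * 100.0

def eqip_band(score: float) -> str:
    """Quality band for an EQIP percentage score."""
    if score <= 25:
        return "severe problems with quality"
    if score <= 50:
        return "serious problems with quality"
    if score <= 75:
        return "good quality with minor problems"
    return "well written"
```

For example, 8 "yes", 10 "partly", and 2 "no" answers give (8 + 5)/20 × 100 = 65%, i.e., good quality with minor problems.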

In our study, the instruments used in previous AI studies were carefully reviewed, and more than one reliability and quality instrument was included. The approach of judging accuracy on a medical topic through the joint opinion of authors experienced in that topic was not used, as it is not considered objective [44].

Statistical analysis

We used SPSS for Windows version 24.0 (SPSS Inc., USA) to analyze the data. The Kolmogorov-Smirnov and Shapiro-Wilk tests were used to assess normality; the data were not normally distributed, so nonparametric tests were used. Categorical variables were summarized with frequencies and percentages, and continuous variables with medians and ranges. Categorical variables were compared with Fisher's exact test and the chi-square test; continuous variables were compared with the Mann-Whitney U and Wilcoxon tests. Statistical significance was set at p < 0.05.
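As an illustration of this analysis (the scores below are invented for the example, not the study's data), the same nonparametric tests are available in SciPy:

```python
from scipy import stats

# Hypothetical per-response grade-level scores for two chatbots;
# the study's real values are reported in Table 3.
chatgpt = [11.6, 11.8, 11.9, 12.0, 12.4, 13.1]
perplexity = [12.8, 13.2, 13.5, 13.6, 13.9, 14.1]

# Pairwise chatbot comparison (independent samples): Mann-Whitney U
u_stat, p_pairwise = stats.mannwhitneyu(chatgpt, perplexity,
                                        alternative="two-sided")

# Comparison against the recommended 6th-grade level:
# Wilcoxon signed-rank test on the differences from 6
w_stat, p_vs_grade6 = stats.wilcoxon([x - 6.0 for x in chatgpt])
```

With these illustrative values, both tests reject at p < 0.05, mirroring the pattern of the study's readability comparisons.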

Results

The most frequently used keywords on Google by users looking for information about ankylosing spondylitis were determined via Google Trends as “Ankylosing spondylitis pain”, “Ankylosing spondylitis symptoms” and “Ankylosing spondylitis disease”. The keyword “Symptoms of Ankylosing spondylitis” was removed from the analysis because the keyword “Ankylosing spondylitis symptoms” was present. The keyword “Treatment for Ankylosing spondylitis” was removed from the analysis because the keyword “Ankylosing spondylitis treatments” existed. The keywords “Arthritis”, “Rheumatoid Arthritis”, “Spondylosis”, “Psoriatic arthritis”, “osteoarthritis”, “as” and “Ankylosing spondylitis Reddit” were removed from the analysis because they were irrelevant to the topic and were answered by AI chatbots by evoking different diseases.

Finally, we focused on the 16 identified keywords, all of which are listed in Table 2. The analysis showed that the highest search volumes for AS were in Australia, New Zealand, and Ireland, respectively. Data on AS prevalence in Australia are not clear; it is estimated at 0.2% in Western Australia [45]. Promising developments in the treatment of rheumatological diseases in Australia and New Zealand may have increased public awareness in this field and, with it, online searches [46].

Table 2. Top 16 relevant keywords searched about Ankylosing spondylitis across countries: 2004-2023 (Based on Google trends data).

Rank Keyword
1 Ankylosing spondylitis pain
2 Ankylosing spondylitis symptoms
3 Ankylosing spondylitis disease
4 What is Ankylosing spondylitis
5 Ankylosing spondylitis treatment
6 As Ankylosing spondylitis
7 Ankylosing spondylitis meaning
8 Ankylosing spondylitis causes
9 Ankylosing spondylitis diagnosis
10 Ankylosing spondylitis icd10
11 Ankylosing spondylosis
12 Hla b27
13 Uveitis
14 Ankylosing spondylitis radiology
15 Spondyloarthritis
16 Ankylosing spondylitis blood test


AS-related keywords were entered into ChatGPT-4o, Perplexity and Google Gemini AI chatbots. The readability scores of the answers given by these AI chatbots were calculated. The readability levels of the texts were evaluated by comparing them with the ability of a 6th grade reader to understand the text (Table 3).

Table 3. Readability scores for Chatgpt-4o, Gemini, and Perplexity responses to the most frequently asked Ankylosing spondylitis -related questions, and a statistical comparison of the text content to a 6th-grade reading level [Median, 95% Confidence Interval (CI) (Lower limit of confidence interval- Upper limit of confidence interval)].

Calculator | ChatGPT median (95% CI) | Gemini median (95% CI) | Perplexity median (95% CI) | ChatGPT vs C6thGRL (p)* (Cohen's d) | Gemini vs C6thGRL (p)* (Cohen's d) | Perplexity vs C6thGRL (p)* (Cohen's d) | ChatGPT vs Gemini (p)†† | ChatGPT vs Perplexity (p)†† | Perplexity vs Gemini (p)†† (Cohen's d)
FRES | 34.64 (25.05-37.11) | 35.04 (30.88-40.70) | 21.76 (18.42-28.93) | <0.001 (5.2, large) | <0.001 (5.9, large) | <0.001 (6.7, large) | 0.386 | 0.050 | <0.001 (1.27, large)
GFOG | 15.35 (14.50-17.09) | 13.95 (13.36-15.34) | 16.79 (15.51-17.34) | <0.001 (4.02, large) | <0.001 (4.51, large) | <0.001 (6.9, large) | 0.113 | 0.207 | 0.007 (1.16, large)
FKGL | 11.87 (11.49-13.85) | 11.65 (10.93-12.61) | 13.48 (12.75-14.39) | <0.001 (3.01, large) | <0.001 (3.7, large) | <0.001 (4.9, large) | 0.498 | 0.070 | 0.003 (1.16, large)
CLI | 14.54 (13.90-16.00) | 14.21 (13.31-15.01) | 16.40 (14.97-16.94) | <0.001 (4.5, large) | <0.001 (5.1, large) | <0.001 (5.4, large) | 0.336 | 0.097 | 0.024 (1.04, large)
SMOG | 11.70 (11.55-13.25) | 11.47 (11.05-12.31) | 12.96 (12.12-13.71) | <0.001 (4.01, large) | <0.001 (4.8, large) | <0.001 (4.6, large) | 0.228 | 0.309 | 0.003 (0.9, large)
ARI | 11.94 (11.58-14.07) | 11.47 (10.86-12.67) | 13.05 (12.52-14.32) | <0.001 (2.9, large) | <0.001 (3.4, large) | <0.001 (4.4, large) | 0.228 | 0.152 | 0.024 (0.98, large)

Abbreviations: Flesch reading ease score (FRES), Gunning FOG (GFOG), Flesch-Kincaid Grade Level (FKGL), Simple Measure of Gobbledygook (SMOG), Coleman-Liau Index (CLI), Automated Readability Index (ARI) and Linsear Write (LW)

*: C6thGRL (p): comparison of the responses against the 6th grade reading level (Wilcoxon test)

††: Chi-square test for categorical variables and Mann-Whitney U test for continuous variables

p values in bold are statistically significant

Assessment of readability across the three groups, utilizing average scores from the readability calculator

When the readability of the responses across the three chatbots was evaluated, significant differences were observed between specific pairs. In pairwise comparisons, no significant difference was found between ChatGPT-4o and Perplexity or between ChatGPT-4o and Gemini. Perplexity and Gemini differed significantly on the FRES, GFOG, FKGL, SMOG, CLI, and ARI formulas (p < 0.001, Cohen's d = 1.27, large effect; p = 0.007, d = 1.16, large; p = 0.003, d = 1.16, large; p = 0.003, d = 0.9, large; p = 0.024, d = 1.04, large; p = 0.024, d = 0.98, large, respectively). According to the readability evaluations, the metrics rank the systems from easiest to hardest as Google Gemini, ChatGPT-4o, and Perplexity (Table 3, Fig 1). Although Gemini was the easiest to read among the chatbots, its responses were still difficult to read relative to the recommended sixth-grade level.

Fig 1. Binary readability evaluation between chatbots.


Evaluating the responses from ChatGPT, Gemini, and Perplexity in relation to the recommended sixth-grade reading level

When comparing the median readability scores of all responses to the sixth-grade reading level, a statistically significant difference was found across all metrics (p < 0.001). Notably, the readability of the responses surpassed the sixth-grade standard in every metric (Table 3). People who gain disease-specific knowledge through texts at the recommended readability level can understand the cause and pathomechanics of the disease, have accurate information about prevention and treatment options, and can participate in prevention or rehabilitation more effectively and actively [47].

Evaluation of reliability and quality

The JAMA, mDISCERN, GQS, and EQIP results (median, 95% Confidence Interval (CI), lower–upper limit) for ChatGPT's answers were 0 (0–0), 2 (1.6–2.03), 3 (2.5–3.63), and 48.50 (45–50.37), respectively. For Google Gemini they were 0 (0.16–0.71), 2 (1.36–1.89), 3 (2.63–3.24), and 48.65 (47.58–51.43); for Perplexity, 2 (2.16–2.71), 3 (2.97–3.4), 4 (3.8–4.07), and 64.95 (63.13–68.09). Perplexity's responses obtained the highest scores in the EQIP, JAMA, modified DISCERN, and GQS assessments (p < 0.001) (Table 4, Fig 2). The most important factor placing Perplexity ahead of the other chatbots may have been the availability of information such as references, author names, and the creation date of the text.

Table 4. Comparison of JAMA, EQIP, modified DISCERN and Global Quality Scale (GQS) ratings for the responses from ChatGPT-4o, Gemini, and Perplexity.

ChatGPT vs Perplexity ChatGPT vs Gemini Perplexity vs Gemini
ChatGPT Perplexity P (Value of Cohen’s d-Effect size) ChatGPT Gemini P Perplexity Gemini P (Value of Cohen’s d-Effect size)
GQS, n (%) 0.008* (1.13, large) 0.075* <0.001* (2.26, large)
1-point 2(12.5) 0 (0) 2(12.5) 0(0) 0 (0) 0(0)
2-point 2(12.5) 0 (0) 2(12.5) 3(18.7) 0 (0) 3(18.7)
3-point 5(31.3) 1(6.3) 5(31.3) 11(68.8) 1(6.3) 11(68.8)
4-point 7(43.7) 15(93.7) 7(43.7) 2(12.5) 15(93.7) 2(12.5)
5-point 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)
JAMA, n (%) <0.001* (6.73, large) >0.999 <0.001* (3.9, large)
0-point 16(100) 0 (0) 16(100) 9(56.3) 0 (0) 9(56.3)
1-point 0 (0) 0 (0) 0 (0) 7(43.7) 0 (0) 7(43.7)
2-point 0 (0) 9(56.3) 0 (0) 0 (0) 9(56.3) 0 (0)
3-point 0 (0) 7(43.7) 0 (0) 0 (0) 7(43.7) 0 (0)
4-point 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)
mDISCERN, n (%) <0.001* (3.4, large) 0.433** <0.001* (3.-large)
1-point 3(18.8) 0 (0) 3(18.8) 6(37.5) 0 (0) 6(37.5)
2-point 13(81.3) 0 (0) 13(81.3) 10(62.5) 0 (0) 10(62.5)
3-point 0 (0) 13(81.3) 0 (0) 0 (0) 13(81.3) 0 (0)
4-point 0 (0) 3(18.7) 0 (0) 0 (0) 3(18.7) 0 (0)
5-point 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)
EQIP, n (%) 0.004* (3.7, large) >0.999 0.004* (3.87, large)
Serious problems with quality 8(50) 0 (0) 8(50) 8(50) 0 (0) 8(50)
Good quality with minor problems 8(50) 15 (93.8) 8(50) 8(50) 15 (93.8) 8(50)
Well written 0 (0) 1 (6.2) 0 (0) 0 (0) 1 (6.2) 0 (0)

EQIP: ensuring quality information for patients.

*: Chi-square test

p values in bold are statistically significant

Fig 2. Binary reliability and quality evaluation between chatbots.


According to the findings of this study, Gemini was the easiest to read, and Perplexity was the best in quality and reliability. All three AI models produced responses above the recommended readability level.

Discussion

In this study, it was determined that the answers given by Perplexity, ChatGPT and Gemini AI chatbots to frequently asked questions regarding AS had a reading level above the 6th-grade level recommended by the US Department of Health and Human Services and the National Institutes of Health. A comprehensive evaluation has been conducted on the accuracy, reliability, and comprehensibility of AS-related information produced by these AI tools. Our study makes a significant contribution to the literature as one of the first comprehensive evaluations of answers to frequently asked questions about AS generated by popular large language models.

Health literacy is defined as the degree to which an individual can access, process, and comprehend basic health information and services, and thus participate in health-related decisions [21]. Digital health literacy, the ability to evaluate health information from electronic sources and apply it to address or solve a health-related problem, is an important component of health literacy. However, the use of medical terminology and jargon and the presence of dense paragraphs make texts difficult to understand and pose an obstacle for those with limited health literacy [48]. People with poor eHealth literacy tend to be significantly older and to have more chronic health problems [49]. Providing internet and device access to individuals seeking health information online, and ensuring that the information is written at the recommended readability level, will help remove barriers to health literacy [48].

In the literature, studies investigating online information about AS have shown that texts are more difficult to read than recommended. In a study examining online information on publicly accessible websites about AS, the authors found that text readability levels were 4.1 ± 2.12 (mean ± SD) grade levels higher than the recommended 6th-grade level [50]. Another study evaluated the readability of written patient information and consent documents presented to patients in 24 rheumatological studies and found that they generally had a higher reading level than recommended for health literature [51]. In another study, in which 200 websites were examined, 46% of AS-related websites were rated high-quality; sites sourced from scientific journals and news outlets provided high-quality information, while commercial websites provided low-quality information. The authors also noted that websites with poor readability provided high-quality information [52]. AI is often mentioned as a potential solution to the problems facing healthcare today. It has been suggested that AI has the potential to “give the gift of time” by allowing doctors and patients to use their time in care more effectively [53].

For this reason, our study aimed to fill an important gap by evaluating the readability, reliability and quality of the answers given by AI-based chatbots, rather than online websites, to questions about a chronic disease such as AS.

To our knowledge, no studies have comparatively examined the readability, quality, and reliability of responses from AI chatbots such as Gemini, ChatGPT, and Perplexity on the topic of AS. In a study in which ChatGPT was asked 60 questions in Spanish about chronic diseases such as systemic lupus erythematosus and rheumatoid arthritis, the authors rated the readability of the answers obtained as “moderately difficult”. They reported that such applications tend to produce better results in the language in which they were trained, and that future studies comparing responses given in Spanish and English may therefore shed light on the subject [54].

In a study evaluating the answers to 30 rheumatological questions posed to the AI chatbot Microsoft Bing Chat, rheumatologists found the answers to be of low quality and poor readability. The authors stated that rheumatology patients should be careful when using these AI tools to answer their medical questions [55]. A study that examined the answers given by five AI chatbots, namely Gemini, Microsoft Copilot, PiAI, ChatGPT and ChatSpot, to 45 questions created from guideline recommendations on psoriasis, cardiovascular health and oncology determined the readability of the answers to be at an “advanced and academically based level”. The authors reported that these answers may be difficult to understand for individuals with less than a university-level education, and that the chatbots’ answers varied in sentence length, readability, consistency, and accuracy [56].

There are studies in the literature that evaluate answers given by AI not only on rheumatological diseases but also on other medical subjects, such as palliative care, subdural hematoma, and low back pain. These studies clearly state that AI chatbots can offer opportunities to improve health outcomes and patient satisfaction, but that their high reading difficulty stands in the way [13,20,21].

In parallel with other studies in the literature, our research determined that the readability level of AS-related content produced by AI models such as Gemini, Perplexity and ChatGPT was above the sixth-grade level recommended by the NIH and AMA. Gemini produced the most readable responses and Perplexity the most difficult. More readable AI-generated text would allow doctors and patients to collaborate better with technology while maintaining a human-centered approach to care. Moreover, the use of AI in rheumatoid arthritis has been reported to play a promising role in early diagnosis and treatment development. Incorporating appropriate machine and/or deep learning algorithms into real-world environments is reported to increase the utility of future next-generation AI applications [15]. The results of our study also emphasize that future AI applications providing appropriately readable, high-quality, and reliable information may positively affect public health.
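To make the grade-level comparison concrete, the sketch below computes two of the indices used in this study, the Flesch Reading Ease Score (FRES) and the SMOG index, from their published formulas. This is not the authors' exact tool: the syllable counter is a naive vowel-group heuristic assumed here for illustration, and published calculators handle many exceptions it misses.

```python
import math
import re

def count_syllables(word: str) -> int:
    """Naive heuristic: count vowel groups, discounting one silent trailing 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (FRES, SMOG) for a text using the standard formulas.

    FRES = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    SMOG = 1.043 * sqrt(polysyllables * 30 / sentences) + 3.1291
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    fres = 206.835 - 1.015 * (len(words) / len(sentences)) \
           - 84.6 * (syllables / len(words))
    smog = 1.043 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291
    return fres, smog
```

Running this on a short plain sentence versus a sentence dense with medical polysyllables (e.g., "ankylosing spondylitis") shows the pattern reported in the study: higher FRES (easier) and lower SMOG grade for the plain text, and the reverse for the jargon-heavy text.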

In this study, not only the readability but also the medical accuracy and reliability of the information provided by AI chatbots about AS were carefully examined. In this way, we attempted to reveal how reliable these tools are as a source of health information. In a study evaluating the answers given by Microsoft Copilot with ChatGPT 4.0 to 15 frequently asked questions about pediatric familial Mediterranean fever, important inaccuracies and omissions with poor reliability were identified. The authors stated that, despite the valuable functions and potential of AI models, validated methods to determine the reliability and accuracy of the information they provide are still lacking [57]. In another study, examining the responses provided by ChatGPT 3.5 about complementary and alternative medicine in rheumatology, the responses were reported to lack a scientific basis [58]. In contrast to the results of the current study, a study examining AI models for magnetic resonance imaging in rheumatology found that AI shows significant promise for disease diagnosis, classification, and management, often achieving performance comparable to, and sometimes exceeding, that of expert radiologists and rheumatologists in aspects of SpA management. However, the authors reported that challenges remain, such as the need for large, high-quality datasets and the integration of AI into clinical workflows [59]. There are studies in the literature showing that AI chatbots provide reliable answers to questions about rheumatological diseases [54,56]. In our research, consistent with other studies in the literature, the Perplexity language model achieved significantly higher scores than the other models on the quality and reliability scales [12,21].
The date, author, and source information appended to Perplexity's answers set it apart from the other models in terms of information reliability and quality. These findings show that the advantages offered by the Perplexity model should be taken into account in the development of AI-supported healthcare services.

However, when presenting AI-based health information, it should not be forgotten that these tools can never replace face-to-face medical consultation in patients' medical decision-making, given weaknesses in security, the lack of transparency in AI algorithms, and the possibility of fabricated information in AI-generated content. In addition, misinformation in chatbot responses, or patients' overconfidence in them, may cause life-threatening situations and affect public health.

Limitations

We can list the limitations of our study as follows. This study, based on the first 25 keywords suggested by Google for reaching the most up-to-date information about AS, could be made more comprehensive with a larger keyword pool. In addition, it should not be ignored that users may have searched for the closely related term axial spondyloarthritis instead of AS, which could change the research results. Future studies using the keyword axial spondyloarthritis will shed light on the quality and safety of responses given by AI chatbots across a broader terminology, including nonradiographic axial spondyloarthritis.

The scope of the study is limited to English-language keywords, which prevents more comprehensive inferences about the impact of language. Although AI applications can create and process text in multiple languages, their performance is often best in the language in which they were trained. Therefore, the accuracy of answers given to questions in English may differ from the accuracy of answers in other languages. Moreover, because our analysis used a European language written in the Roman alphabet, results may differ for languages with different structures and scripts [54]. This issue can be clarified in future studies conducted with different language groups or comparing English with another language. Comparisons with different and more diverse AI models than those used in our study will allow the limits and potential of this technology to be better understood. Our analysis is based on a static dataset from October 2024, which limits the ability to capture change over time and current trends. Considering that new models, technologies, and features are released continuously, it is quite possible that considerably more reliable and higher-quality information will be available in future periods. In future studies, the generalizability of the results can be increased by considering more languages and dynamic datasets.

Study strengths

Our study is the first to evaluate not only the readability of AI chatbot responses regarding AS, but also quality and reliability metrics such as information accuracy, consistency, and referencing. Unlike previous studies, it compares multiple popular AI chatbots, more clearly revealing the potential and limits of this technology in healthcare. These results may contribute to the development of more reliable and effective AI-based healthcare services by providing a roadmap for future studies.

Conclusion

AI-powered chatbots (ChatGPT, Perplexity, Gemini) are becoming increasingly capable of providing information on medical topics such as AS. However, questions remain about the readability and reliability of the content they produce. In our study, the AS-related information produced by these chatbots exceeded the sixth-grade reading level, the level considered understandable by a large part of the general population. According to the results of this study, although AI chatbots provide useful information for patients and healthcare professionals on medical issues in daily practice, users should remain alert to the possible danger of misinformation.

Future advances in AI may improve patient care through improved diagnostic accuracy and personalized treatment strategies, but healthcare providers, AI developers, and policymakers all have a role to play in ensuring this. Based on the results of this study, AI developers should establish the infrastructure needed for future models to provide accurate and readable information that does not endanger public health; healthcare providers should remain skeptical of the reliability of information provided by AI; and policymakers should consider sanctions for AI applications that threaten public health through misinformation. In future studies, addressing user comprehension issues and developing new AI models in languages other than English can help AI reach more people with more understandable and less erroneous information.

Data Availability

The data supporting the findings of this study are available at the Internet Archive repository (https://archive.org/download/lowbackpain-artificialintelligence).

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1. Mauro D, Thomas R, Guggino G, Lories R, Brown MA, Ciccia F. Ankylosing spondylitis: an autoimmune or autoinflammatory disease? Nat Rev Rheumatol. 2021;17(7):387–404. doi: 10.1038/s41584-021-00625-y
  • 2. Voruganti A, Bowness P. New developments in our understanding of ankylosing spondylitis pathogenesis. Immunology. 2020;161(2):94–102. doi: 10.1111/imm.13242
  • 3. Ebrahimiadib N, Berijani S, Ghahari M, Pahlaviani FG. Ankylosing spondylitis. J Ophthalmic Vis Res. 2021;16(3):462–9. doi: 10.18502/jovr.v16i3.9440
  • 4. Bagcier F, Yurdakul OV, Ozduran E. Top 100 cited articles on ankylosing spondylitis. Reumatismo. 2021;72(4):218–27. doi: 10.4081/reumatismo.2020.1325
  • 5. Ramiro S, Nikiphorou E, Sepriano A, Ortolan A, Webers C, Baraliakos X, et al. ASAS-EULAR recommendations for the management of axial spondyloarthritis: 2022 update. Ann Rheum Dis. 2023;82(1):19–34. doi: 10.1136/ard-2022-223296
  • 6. Nikiphorou E, Santos EJF, Marques A. EULAR recommendations for the implementation of self-management strategies in patients with inflammatory arthritis. Ann Rheum Dis. 2021;80:1278–85.
  • 7. van der Kraan YM, Paap D, Lennips N, Veenstra ECA, Wink FR, Kieskamp SC, et al. Patients’ needs concerning patient education in axial spondyloarthritis: a qualitative study. Rheumatol Ther. 2023;10(5):1349–68. doi: 10.1007/s40744-023-00585-7
  • 8. Lapane KL, Khan S, Shridharmurthy D, Beccia A, Dubé C, Yi E, et al. Primary care physician perspectives on barriers to diagnosing axial spondyloarthritis: a qualitative study. BMC Fam Pract. 2020;21(1):204. doi: 10.1186/s12875-020-01274-y
  • 9. Ozduran E, Hanci V, Erkin Y. Evaluating the readability, quality and reliability of online patient education materials on chronic low back pain. Natl Med J India. 2024;37(3):124–30. doi: 10.25259/NMJI_327_2022
  • 10. Iorliam A, Ingio JA. A comparative analysis of generative artificial intelligence tools for natural language processing. JCTA. 2024;1:311–25.
  • 11. Singh S. Number of ChatGPT users; 2025 [cited 2025 Feb 3]. Available from: https://www.demandsage.com/chatgpt-statistics/
  • 12. Ömür Arça D, Erdemir İ, Kara F, Shermatov N, Odacioğlu M, İbişoğlu E, et al. Assessing the readability, reliability, and quality of artificial intelligence chatbot responses to the 100 most searched queries about cardiopulmonary resuscitation: an observational study. Medicine (Baltimore). 2024;103(22):e38352. doi: 10.1097/MD.0000000000038352
  • 13. Ozduran E, Hancı V, Erkin Y, Özbek İC, Abdulkerimov V. Assessing the readability, quality and reliability of responses produced by ChatGPT, Gemini, and Perplexity regarding most frequently asked keywords about low back pain. PeerJ. 2025;13:e18847. doi: 10.7717/peerj.18847
  • 14. Kara M, Ozduran E, Mercan Kara M, Hanci V, Erkin Y. Assessing the quality and reliability of YouTube videos as a source of information on inflammatory back pain. PeerJ. 2024;12:e17215. doi: 10.7717/peerj.17215
  • 15. Momtazmanesh S, Nowroozi A, Rezaei N. Artificial intelligence in rheumatoid arthritis: current status and future perspectives: a state-of-the-art review. Rheumatol Ther. 2022;9(5):1249–304. doi: 10.1007/s40744-022-00475-4
  • 16. Gürcan Atçı A, Tolu S. What is the frequency of internet searches by patients with rheumatic diseases? To what degree are the websites they get information from reliable and what is the effect of these websites on their treatment? Istanbul Med J. 2020;21(4):275–80. doi: 10.4274/imj.galenos.2020.38243
  • 17. Yoon J, Lee S-B, Cho S-K, Sung Y-K. Information and communication technology-based patient education for autoimmune inflammatory rheumatic diseases: a scoping review. Semin Arthritis Rheum. 2024;69:152575. doi: 10.1016/j.semarthrit.2024.152575
  • 18. Zhao J, Li L, Li J, Zhang L. Application of artificial intelligence in rheumatic disease: a bibliometric analysis. Clin Exp Med. 2024;24(1):196. doi: 10.1007/s10238-024-01453-6
  • 19. Thorat V, Rao P, Joshi N, Talreja P, Shetty AR. Role of artificial intelligence (AI) in patient education and communication in dentistry. Cureus. 2024;16(5):e59799. doi: 10.7759/cureus.59799
  • 20. Gül Ş, Erdemir İ, Hanci V, Aydoğmuş E, Erkoç YS. How artificial intelligence can provide information about subdural hematoma: assessment of readability, reliability, and quality of ChatGPT, BARD, and Perplexity responses. Medicine (Baltimore). 2024;103(18):e38009. doi: 10.1097/md.0000000000038009
  • 21. Hancı V, Ergün B, Gül Ş, Uzun Ö, Erdemir İ, Hancı FB. Assessment of readability, reliability, and quality of ChatGPT®, BARD®, Gemini®, Copilot®, Perplexity® responses on palliative care. Medicine (Baltimore). 2024;103(33):e39305. doi: 10.1097/MD.0000000000039305
  • 22. Oliffe M, Thompson E, Johnston J, Freeman D, Bagga H, Wong PKK. Assessing the readability and patient comprehension of rheumatology medicine information sheets: a cross-sectional health literacy study. BMJ Open. 2019;9(2):e024582. doi: 10.1136/bmjopen-2018-024582
  • 23. Wong PKK. Medication adherence in patients with rheumatoid arthritis: why do patients not take what we prescribe? Rheumatol Int. 2016;36(11):1535–42. doi: 10.1007/s00296-016-3566-4
  • 24. Noble S. [cited 2025 Feb 5]. Available from: https://newsroom.ucla.edu/stories/how-ai-discriminates-and-what-that-means-for-your-google-habit
  • 25. Statista. [cited 2024 Sep 21]. Available from: https://www.statista.com/statistics/216573/worldwide-market-share-of-search-engines/
  • 26. Pan A, Musheyev D, Bockelman D, Loeb S, Kabarriti AE. Assessment of artificial intelligence chatbot responses to top searched queries about cancer. JAMA Oncol. 2023;9(10):1437–40. doi: 10.1001/jamaoncol.2023.2947
  • 27. McCarthy CJ, Berkowitz S, Ramalingam V, Ahmed M. Evaluation of an artificial intelligence chatbot for delivery of IR patient education material: a comparison with societal website content. J Vasc Interv Radiol. 2023;34(10):1760-1768.e32. doi: 10.1016/j.jvir.2023.05.037
  • 28. Strzalkowski P, Strzalkowska A, Chhablani J, Pfau K, Errera M-H, Roth M, et al. Evaluation of the accuracy and readability of ChatGPT-4 and Google Gemini in providing information on retinal detachment: a multicenter expert comparative study. Int J Retin Vitr. 2024;10(1):61. doi: 10.1186/s40942-024-00579-9
  • 29. King DR, Nanda G, Stoddard J, Dempsey A, Hergert S, Shore JH, et al. An introduction to generative artificial intelligence in mental health care: considerations and guidance. Curr Psychiatry Rep. 2023;25(12):839–46. doi: 10.1007/s11920-023-01477-x
  • 30. Meyrowitsch DW, Jensen AK, Sørensen JB, Varga TV. AI chatbots and (mis)information in public health: impact on vulnerable communities. Front Public Health. 2023;11:1226776. doi: 10.3389/fpubh.2023.1226776
  • 31. Solanki H. [cited 2025 Feb 5]. Available from: https://economictimes.indiatimes.com/news/how-to/ai-and-privacy-the-privacy-concerns-surrounding-ai-its-potential-impact-on-personal-data/articleshow/99738234.cms?from=mdr#google_vignette
  • 32. Özduran E, Hanci V. Evaluating the readability, quality and reliability of online information on Behçet’s disease. Reumatismo. 2022;74(2). doi: 10.4081/reumatismo.2022.1495
  • 33. Erkin Y, Hanci V, Ozduran E. Evaluation of the reliability and quality of YouTube videos as a source of information for transcutaneous electrical nerve stimulation. PeerJ. 2023;11:e15412. doi: 10.7717/peerj.15412
  • 34. Erkin Y, Hanci V, Ozduran E. Evaluating the readability, quality and reliability of online patient education materials on transcutaneous electrical nerve stimulation (TENS). Medicine (Baltimore). 2023;102(16):e33529. doi: 10.1097/MD.0000000000033529
  • 35. Silberg WM, Lundberg GD, Musacchio RA. Assessing, controlling, and assuring the quality of medical information on the Internet: caveant lector et viewor--let the reader and viewer beware. JAMA. 1997;277(15):1244–5.
  • 36. Charnock D, Shepperd S, Needham G, Gann R. DISCERN: an instrument for judging the quality of written consumer health information on treatment choices. J Epidemiol Community Health. 1999;53(2):105–11. doi: 10.1136/jech.53.2.105
  • 37. Özduran E, Hanci V. YouTube as a source of information about stroke rehabilitation during the COVID-19 pandemic. NeuroAsia. 2023;28(4):907–15. doi: 10.54029/2023kif
  • 38. Gunduz ME, Matis GK, Ozduran E, Hanci V. Evaluating the readability, quality, and reliability of online patient education materials on spinal cord stimulation. Turk Neurosurg. 2024;34(4):588–99. doi: 10.5137/1019-5149.JTN.42973-22.3
  • 39. Ozduran E, Hanci V. Evaluating the readability, quality, and reliability of online information on Sjogren’s syndrome. Indian J Rheumatol. 2023;18(1):16–25. doi: 10.4103/injr.injr_56_22
  • 40. Ozduran E, Büyükçoban S. Evaluating the readability, quality and reliability of online patient education materials on post-covid pain. PeerJ. 2022;10:e13686. doi: 10.7717/peerj.13686
  • 41. Bernard A, Langille M, Hughes S, Rose C, Leddin D, Veldhuyzen van Zanten S. A systematic review of patient inflammatory bowel disease information resources on the World Wide Web. Am J Gastroenterol. 2007;102(9):2070–7. doi: 10.1111/j.1572-0241.2007.01325.x
  • 42. Ladhar S, Koshman SL, Yang F, Turgeon R. Evaluation of online written medication educational resources for people living with heart failure. CJC Open. 2022;4(10):858–65. doi: 10.1016/j.cjco.2022.07.004
  • 43. Coskun B, Ocakoglu G, Yetemen M, Kaygisiz O. Can ChatGPT, an artificial intelligence language model, provide accurate and high-quality patient information on prostate cancer? Urology. 2023;180:35–58. doi: 10.1016/j.urology.2023.05.040
  • 44. Moult B, Franck LS, Brady H. Ensuring quality information for patients: development and preliminary validation of a new instrument to improve the quality of written health care information. Health Expect. 2004;7(2):165–75. doi: 10.1111/j.1369-7625.2004.00273.x
  • 45. Nossent J, Inderjeeth C, Keen H, Preen D, Li I, Kelty E. The association between TNF inhibitor therapy availability and hospital admission rates for patients with ankylosing spondylitis: a longitudinal population-based study. Rheumatol Ther. 2022;9(1):127–37. doi: 10.1007/s40744-021-00393-x
  • 46. McQueen FMF. Rheumatology around the world: perspectives from Australia and New Zealand. Arthritis Res Ther. 2017;19(1):24. doi: 10.1186/s13075-017-1246-8
  • 47. Járomi M, Szilágyi B, Velényi A, Leidecker E, Raposa BL, Hock M, et al. Assessment of health-related quality of life and patient’s knowledge in chronic non-specific low back pain. BMC Public Health. 2021;21(Suppl 1):1479. doi: 10.1186/s12889-020-09506-7
  • 48. Smith B, Magnani JW. New technologies, new disparities: the intersection of electronic health and digital health literacy. Int J Cardiol. 2019;292:280–2. doi: 10.1016/j.ijcard.2019.05.066
  • 49. Neter E, Brainin E. eHealth literacy: extending the digital divide to the realm of health information. J Med Internet Res. 2012;14(1):e19. doi: 10.2196/jmir.1619
  • 50. Omar A, Sari I, Inman R, Haroon N. THU0582 Readability of online ankylosing spondylitis patient education material. Ann Rheum Dis. 2015;74:411. doi: 10.1136/annrheumdis-2015-eular.6284
  • 51. Hamnes B, van Eijk-Hustings Y, Primdahl J. Readability of patient information and consent documents in rheumatological studies. BMC Med Ethics. 2016;17(1):42. doi: 10.1186/s12910-016-0126-0
  • 52. Kocyigit BF, Koca TT, Akaltun MS. Quality and readability of online information on ankylosing spondylitis. Clin Rheumatol. 2019;38(11):3269–74. doi: 10.1007/s10067-019-04706-y
  • 53. Sauerbrei A, Kerasidou A, Lucivero F, Hallowell N. The impact of artificial intelligence on the person-centred, doctor-patient relationship: some problems and solutions. BMC Med Inform Decis Mak. 2023;23(1):73. doi: 10.1186/s12911-023-02162-y
  • 54. Soto-Chávez MJ, Bustos MM, Fernández-Ávila DG, Muñoz OM. Evaluation of information provided to patients by ChatGPT about chronic diseases in Spanish language. Digit Health. 2024;10:20552076231224603. doi: 10.1177/20552076231224603
  • 55. Ye C, Zweck E, Ma Z, Smith J, Katz S. Doctor versus artificial intelligence: patient and physician evaluation of large language model responses to rheumatology patient questions in a cross-sectional study. Arthritis Rheumatol. 2024;76(3):479–84. doi: 10.1002/art.42737
  • 56. Olszewski R, Watros K, Mańczak M, Owoc J, Jeziorski K, Brzeziński J. Assessing the response quality and readability of chatbots in cardiovascular health, oncology, and psoriasis: a comparative study. Int J Med Inform. 2024;190:105562. doi: 10.1016/j.ijmedinf.2024.105562
  • 57. La Bella S, Attanasi M, Porreca A, Di Ludovico A, Maggio MC, Gallizzi R, et al. Reliability of a generative artificial intelligence tool for pediatric familial Mediterranean fever: insights from a multicentre expert survey. Pediatr Rheumatol. 2024;22(1). doi: 10.1186/s12969-024-01011-0
  • 58. Keyßer G, Pfeil A, Reuß-Borst M, Frohne I, Schultz O, Sander O. What is the potential of ChatGPT 3.5 for qualified patient information?: attempt of a structured analysis on the basis of a survey regarding complementary and alternative medicine (CAM) in rheumatology [in German]. Z Rheumatol. 2025;84(3):179–87. doi: 10.1007/s00393-024-01535-6
  • 59. Adams LC, Bressem KK, Ziegeler K, Vahldiek JL, Poddubnyy D. Artificial intelligence to analyze magnetic resonance imaging in rheumatology. Joint Bone Spine. 2024;91(3):105651. doi: 10.1016/j.jbspin.2023.105651



Articles from PLOS One are provided here courtesy of PLOS
