Abstract
Study Design
Comparative study.
Objectives
This study aims to compare Google and GPT-4 in terms of (1) question types, (2) response readability, (3) source quality, and (4) numerical response accuracy for the top 10 most frequently asked questions (FAQs) about anterior cervical discectomy and fusion (ACDF).
Methods
“Anterior cervical discectomy and fusion” was searched on Google and GPT-4 on December 18, 2023. The top 10 FAQs were classified according to the Rothwell system. Source quality was evaluated using JAMA benchmark criteria, and readability was assessed using the Flesch Reading Ease score and Flesch-Kincaid grade level. Differences in JAMA scores, Flesch-Kincaid grade level, Flesch Reading Ease, and word count between platforms were analyzed using Student’s t-tests. Statistical significance was set at the .05 level.
Results
Frequently asked questions from Google were varied, while GPT-4 focused on technical details and indications/management. GPT-4 showed a higher Flesch-Kincaid grade level (12.96 vs 9.28, P = .003), lower Flesch Reading Ease score (37.07 vs 54.85, P = .005), and higher JAMA scores for source quality (3.333 vs 1.800, P = .016). Numerically, 6 out of 10 responses varied between platforms, with GPT-4 providing broader recovery timelines for ACDF.
Conclusions
This study demonstrates GPT-4’s ability to elevate patient education by providing high-quality, diverse information tailored to those with advanced literacy levels. As AI technology evolves, refining these tools for accuracy and user-friendliness remains crucial, catering to patients’ varying literacy levels and information needs in spine surgery.
Keywords: GPT-4, Google, artificial intelligence, anterior cervical discectomy and fusion, patient education, health literacy, readability
Introduction
ChatGPT, an artificial intelligence (AI) chatbot developed by OpenAI, is used by over 100 million weekly users for its ability to provide insightful answers to various queries. 1 OpenAI released GPT-4 on March 13, 2023, enhancing ChatGPT’s capabilities; it is accessible through a ChatGPT Plus subscription. 2 While both GPT-3.5, the current open-access AI chatbot of ChatGPT, and GPT-4 rely on pattern recognition to generate responses, GPT-4 has distinguished itself as a far more accurate and up-to-date source of information for a variety of user queries.3-7 GPT-4 actively pulls information from the Internet search engine Bing and reports sources when formulating responses to users’ questions, in stark contrast to the restricted bank of information available to GPT-3.5.8,9 This technological advancement in AI chatbots, especially large language models (LLMs), brings new potential for their use in online patient education. Currently, 89% of U.S. citizens use Google for health-related information before consulting a physician. 10 Google dominates the U.S. search engine market with 88.06% of traffic, while Bing and Yahoo hold 6.94% and 2.70%, respectively. 11 As such, Google was used as the benchmark against which the quality of answers provided by GPT-4 was compared.
Various studies have assessed the quality of online information on different medical conditions. For instance, Wei et al compared Google’s search results with those from GPT-4 for head and neck cancer frequently asked questions (FAQs), finding responses by GPT-4 to be less readable. 12 Similarly, Kerbage et al examined GPT-4’s accuracy for gastrointestinal pathologies, noting 84% accuracy but subpar performance compared to a physician for broader medical queries. 13 The heterogeneity that exists between Google search results, AI-powered chatbots, and physicians’ expertise introduces a new discussion about the quality of information that patients may encounter online, especially as it pertains to common surgical procedures of the spine. Anterior cervical discectomy and fusion (ACDF) is one of the most common procedures of the cervical spine, accounting for 132 000 surgeries in the United States each year. 14 The indications for ACDF are vast, treating conditions such as cervical myelopathy, radiculopathy, and instability brought on by disc herniations and other degenerative changes to the cervical spine.15,16 Since ACDF has earned a reputation as a safe, effective treatment for various cervical spine conditions and is often recommended by spine surgeons, analyzing online resources like GPT-4 can offer insights into the quality of information available to those considering or indicated for ACDF.
Previous studies have aimed to assess the quality of answers provided by AI chatbots related to various pathologies and treatment modalities. However, to the best of our knowledge, no studies have analyzed how GPT-4 compares to Google in the quality of answers generated regarding a common spinal procedure such as ACDF. This study aims to compare Google and GPT-4 in terms of (1) question types, (2) response readability, (3) source quality, and (4) numerical response accuracy for the top 10 FAQs about ACDF.
Materials and Methods
On December 18, 2023, the term “ACDF” was searched on Google and the top 10 FAQs related to the search term were recorded. The following statement was then entered into GPT-4: “Perform a search with the search term ‘ACDF’ and record the 10 most FAQs related to the search term.” These questions and their corresponding answers were categorized using the Rothwell classification system, which divides questions into 3 categories: fact, policy, and value, further subclassified into 10 topics specific to ACDF (Table 1).17-23
Table 1.
Description of Classification Systems for Questions and Websites.
Rothwell’s Classification | Description
---|---
Fact | Asks whether something is true and to what extent. Example: How do you sleep after cervical neck surgery? |
Policy | Asks whether a specific course of action should be taken to solve a problem. Example: Is ACDF the right option for me? |
Value | Asks for an evaluation of an idea, object, or event. Example: How serious is an anterior cervical discectomy and fusion? |
Question classification by topic | |
Fact | |
Specific activities | Ability/inability to perform a specific activity or action. Example: Do you have to wear a neck brace after cervical fusion? |
Timeline of recovery | Length of time for recovery milestones. Example: How long is recovery from anterior cervical discectomy and fusion? |
Technical details | Technical aspects of ACDF surgery, including instrumentation, approach, and anesthesia. Example: How are the vertebrae fused together? |
Restrictions | Restrictions to activity or lifestyle in the recovery process. Example: What are the permanent restrictions after cervical fusion? |
Cost | Cost of surgery/treatment. Example: What is the cost of ACDF? |
Policy | |
Indications/Management | Specific indications for ACDF, as well as questions regarding the postoperative management of ACDF. Example: Which conditions are treated with ACDF? |
Risks/Complications | Risks/complications occurring prior to, during, or following surgery, including the postoperative recovery and rehabilitation periods. Example: What are the risks of ACDF? |
Value | |
Pain | Related to pain following ACDF. Example: How much pain is there after anterior cervical discectomy and fusion? |
Longevity | Long-term complications and effects following ACDF. Example: How long does ACDF last? |
Evaluation of surgery | Related to the successfulness, seriousness, or invasiveness of ACDF. Example: Is ACDF considered major surgery? |
Website categorization | |
Commercial | Commercial organization that positions itself as a source of health information, including medical device companies. |
Academic | Academic institutions with a clear commitment to research and education, including universities, academic societies, and journals. |
Medical practice | Clinical practice or hospital with no clear academic affiliation. |
Government | Websites operated under the purview of a federal government, including URLs with gov and nhs.uk. |
Social media | Websites maintained by nonmedical organizations with the intent of spreading information between users, including blogs, internet forums, and videos uploaded to public entertainment platforms. |
Abbreviation: ACDF, Anterior cervical discectomy and fusion.
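One way to apply the classification in Table 1 consistently during data entry is to encode it as a small data structure so that reviewer-assigned labels can be validated automatically. The minimal Python sketch below illustrates this idea; the structure and function names are our own illustrative choices and are not part of the Rothwell system or the study protocol.

```python
# Rothwell categories and the ACDF-specific topics nested under them (Table 1).
ROTHWELL_TAXONOMY = {
    "Fact": ["Specific activities", "Timeline of recovery", "Technical details",
             "Restrictions", "Cost"],
    "Policy": ["Indications/Management", "Risks/Complications"],
    "Value": ["Pain", "Longevity", "Evaluation of surgery"],
}

def label_question(question: str, category: str, topic: str) -> dict:
    """Attach a reviewer-assigned Rothwell category and topic to a question,
    checking the labels against the taxonomy above."""
    if category not in ROTHWELL_TAXONOMY:
        raise ValueError(f"Unknown Rothwell category: {category}")
    if topic not in ROTHWELL_TAXONOMY[category]:
        raise ValueError(f"Topic '{topic}' is not listed under category '{category}'")
    return {"question": question, "category": category, "topic": topic}

# Example using a question from Table 2; the labels are assigned by reviewers, not by the code.
print(label_question(
    "How long is recovery from anterior cervical discectomy and fusion?",
    "Fact", "Timeline of recovery"))
```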
Linked websites corresponding to each question were evaluated for quality using the Journal of the American Medical Association (JAMA) benchmark criteria. These criteria, based on authorship, attribution, disclosure, and currency, have been validated in previous studies for assessing website quality.24-28 Two independent reviewers categorized these questions, and any disagreements were resolved by a senior author. Interobserver reliability for this classification was measured using Cohen’s kappa coefficient. Additionally, the top 10 FAQs with numerical responses from the Google search were entered into GPT-4 to obtain comparable numerical outputs.
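The JAMA benchmark awards 1 point for each of its four criteria, giving each website a score from 0 to 4. The sketch below shows, under our own assumptions about field names, how a single reviewer’s ratings for one website could be tallied; it illustrates the scoring arithmetic only and is not the instrument itself.

```python
from dataclasses import dataclass

@dataclass
class JAMARating:
    """One reviewer's assessment of a single website against the 4 JAMA
    benchmark criteria (1 point each, total score 0-4)."""
    authorship: bool   # authors and their credentials/affiliations are listed
    attribution: bool  # references and sources for the content are provided
    disclosure: bool   # ownership, sponsorship, and conflicts of interest are stated
    currency: bool     # dates of posting and most recent update are given

    def score(self) -> int:
        return sum([self.authorship, self.attribution, self.disclosure, self.currency])

# Hypothetical example: a site with named authors and an update date,
# but no references or disclosures, scores 2 out of 4.
site = JAMARating(authorship=True, attribution=False, disclosure=False, currency=True)
print(site.score())  # 2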
Comparative analysis of the responses from both platforms included assessments of readability, using the Flesch Reading Ease score and the Flesch-Kincaid grade level, and word count. The Flesch Reading Ease score measures readability on a scale from 0 to 100, using a formula that incorporates average sentence length and average number of syllables per word; higher scores indicate greater readability. Flesch Reading Ease scores of 90 to 100 typically indicate material geared towards an elementary school student, scores of 60 to 70 towards middle school students, and scores of 0 to 30 towards university graduates. The Flesch Reading Ease score is thus inversely correlated with the Flesch-Kincaid grade level, with a higher score indicating a lower grade level.29-31 Responses were analyzed in plain-text format using an online readability calculator (https://readability-score.com).
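Both readability indices are functions of average sentence length and average syllables per word. The following Python sketch applies the standard Flesch formulas with a deliberately naive syllable counter; dedicated calculators such as the one used in this study tokenize text more carefully, so exact values may differ slightly.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid grade level) for a passage."""
    n_sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return ease, grade

# Example using the Google response shown in Table 3.
ease, grade = readability(
    "The usual period of recovery from cervical fusion surgery is 8 to 12 weeks "
    "and complete healing may take up to 6 months.")
print(f"Reading Ease: {ease:.1f}, Grade Level: {grade:.1f}")
```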
Cohen’s kappa coefficient was utilized to assess interobserver reliability using R statistical software (version 4.3.1; R Project for Statistical Computing, Vienna, Austria). Student’s t-tests were used to assess differences in the JAMA benchmark criteria scores, Flesch-Kincaid grade level, Flesch Reading Ease, and word count between GPT-4 and Google responses and website sources. Statistical significance was set at the .05 level.
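For illustration, the same two analyses can be reproduced in a few lines of Python (the study itself used R); the reviewer labels and readability scores below are placeholders rather than the study data, and the sketch assumes scipy and scikit-learn are installed.

```python
from scipy import stats
from sklearn.metrics import cohen_kappa_score

# Interobserver reliability: Cohen's kappa on two reviewers' labels (placeholder data).
reviewer_1 = ["Fact", "Fact", "Value", "Policy", "Fact", "Value"]
reviewer_2 = ["Fact", "Fact", "Value", "Policy", "Value", "Value"]
kappa = cohen_kappa_score(reviewer_1, reviewer_2)

# Student's t-test on per-response Flesch-Kincaid grade levels (placeholder data).
google_grade = [8.1, 9.4, 10.2, 7.9, 9.8, 8.6, 11.0, 9.1, 8.8, 10.5]
gpt4_grade = [12.3, 13.1, 11.9, 14.0, 12.8, 13.4, 12.0, 13.7, 12.5, 13.0]
t_stat, p_value = stats.ttest_ind(google_grade, gpt4_grade)  # equal_var=True by default

print(f"Cohen's kappa: {kappa:.3f}")
print(f"t = {t_stat:.2f}, P = {p_value:.4f}")
```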
Results
Frequently Asked Questions in Google Web Search and GPT-4 for ACDF
Of the top 10 FAQs generated from a Google web search of ACDF, the most common Rothwell category was fact, accounting for 6 of the FAQs provided, followed by value, accounting for 4 questions (Table 2, Figure 1). No questions were classified as policy. The distribution of Google’s top 10 FAQs by topic was equal for specific activities, timeline of recovery, restrictions, pain, and evaluation of surgery, each accounting for 2 questions (Table 2, Figure 2). The most common Rothwell category for GPT-4 was fact, consisting of 5 questions, followed by policy, accounting for 4 questions (Table 2, Figure 1). One question was classified as value. The most common topics among questions produced by GPT-4 were technical details and indications/management, each accounting for 3 questions (Table 2, Figure 2). Interobserver reliability for the Rothwell and topic categorizations was .918 and .942, respectively (P < .001 for both), indicating strong agreement between observers.
Table 2.
Top 10 Most Frequently Asked Questions for ACDF.
Google | GPT-4
---|---
1. How serious is an anterior cervical discectomy and fusion? (Value/evaluation of surgery) | 1. Is ACDF the right option for me? (Policy/indications or management) |
2. How long is recovery from anterior cervical discectomy and fusion? (Fact/timeline of recovery) | 2. What is a discectomy, and will I need this operation in addition to spinal fusion? (Fact/technical details) |
3. How much pain is there after anterior cervical discectomy and fusion? (Value/pain) | 3. How are the vertebrae fused together? (Fact/technical details) |
4. Is ACDF considered major surgery? (Value/evaluation of surgery) | 4. How long is the recovery from anterior cervical discectomy and fusion? (Fact/timeline of recovery) |
5. How long do you sleep in a recliner after neck surgery? (Fact/timeline of recovery) | 5. What are the risks of ACDF? (Policy/risks or complications) |
6. What are the permanent restrictions after cervical fusion? (Fact/restrictions) | 6. What is ACDF? (Fact/technical details) |
7. What activities should you avoid after cervical fusion? (Fact/restrictions) | 7. Which conditions are treated with ACDF? (Policy/indications or management) |
8. How painful is cervical neck surgery? (Value/pain) | 8. What can be expected on the day of surgery? (Value/evaluation of surgery) |
9. Do you have to wear a neck brace after cervical fusion? (Fact/specific activities) | 9. What is the recovery timeline following ACDF surgery? (Fact/timeline of recovery) |
10. How do you sleep after cervical neck surgery? (Fact/specific activities) | 10. Is ACDF right for me? (Policy/indications or management) |
Abbreviation: ACDF, Anterior cervical discectomy and fusion.
Figure 1.
Rothwell’s classification stratified by search engine.
Figure 2.
Topic classification stratified by search engine.
Google and GPT-4 shared one similar question among their top 10 FAQs, and the answers provided to this question varied (Table 3). Responses provided by Google had an average Flesch-Kincaid grade level of 9.28, an average Flesch Reading Ease score of 54.85, and a mean word count of 158.30 (Table 4). Responses generated by GPT-4 had an average Flesch-Kincaid grade level of 12.96, Flesch Reading Ease score of 37.07, and word count of 122.90. There was a statistically significant difference in Flesch-Kincaid grade level (P = .003) and Flesch Reading Ease score (P = .005) between the answers provided via Google web search and GPT-4, with answers from GPT-4 showing a higher grade level and being more difficult to read. Overall, responses generated by GPT-4 were less direct and described a considerably longer recovery timeline.
Table 3.
Answers to Similarly Asked Questions From Google and GPT-4 for ACDF.
Question | Google | GPT-4
---|---|---
Google asked: How long is recovery from anterior cervical discectomy and fusion? GPT-4 asked: How long is the recovery from anterior cervical discectomy and fusion? | The usual period of recovery from cervical fusion surgery is 8 to 12 weeks and complete healing may take up to 6 months. Some patients may return to work sooner than others depending on their job, response to surgery, and ability to perform other lighter tasks in the work place. Physician approval is required prior to returning to work. | The recovery time following anterior cervical discectomy and fusion (ACDF) surgery can vary but typically follows this general timeline: 1. Immediate post-Operative Period: Most patients spend 1 or 2 nights in the hospital. Some may even go home the same day. 2. Early recovery (First few weeks): Patients are advised to rest and limit their activities. A cervical collar or brace might be needed to support the neck. 3. Physical Therapy: Begins after a few weeks to restore strength, flexibility, and range of motion. This may continue for several weeks or months. 4. Return to Normal activities: Patients can often return to work and resume most everyday activities within 4 to 6 weeks. However, the time frame can vary based on the patient’s overall health and the nature of their work. 5. Complete fusion: For the fusion to fully mature and heal into 1 solid, strong bone, it may take 12 to 18 months. It’s important to follow your surgeon’s specific advice regarding activity restrictions and rehabilitation, as these will vary depending on individual health conditions and surgical outcomes.
Abbreviation: ACDF, Anterior cervical discectomy and fusion.
Table 4.
Comparison of Readability and Source Quality.
Measures of Readability | Google, x̄ (SD) b | GPT-4, x̄ (SD) b | P-Value a
---|---|---|---
Flesch-Kincaid grade level | 9.28 (2.44) | 12.96 (2.25) | P = .003 |
Flesch reading ease score | 54.85 (14.15) | 37.07 (10.25) | P = .005 |
Word count | 158.30 (134.90) | 122.90 (22.29) | P = .433 |
Abbreviations: x̄, mean; SD, standard deviation.
a Bolded P-value indicates statistical significance.
b Value expressed as mean (standard deviation).
Quality of Sources Used by Google Web Search and GPT-4 to Answer FAQs Regarding ACDF
Of the websites provided by Google web search to answer FAQs regarding ACDF, the most common source type was medical practice, accounting for 6 out of 10 unique websites, followed by social media and government, which comprised 3 out of 10 and 1 out of 10 unique websites, respectively. Of the unique websites utilized by GPT-4 to answer FAQs regarding ACDF, 2 out of 3 sources were categorized as social media, and one source was determined to be an academic website. Interobserver reliability for website categorization was .930 (P < .001), suggesting high agreement between observers.
Source quality was further scrutinized through the assignment of JAMA scores. Of the unique websites that Google directed users to for answering FAQs, 6 out of 10 received a JAMA score of 1 (Table 4). Two of the 3 unique sources utilized by GPT-4 received a JAMA score of 3, and the remaining source received a JAMA score of 4. The mean JAMA scores for sources utilized by Google web search and GPT-4 to answer FAQs regarding ACDF were 1.800 and 3.333, respectively. This difference in mean JAMA scores between Google and GPT-4 was statistically significant (P = .016). Interobserver reliability for JAMA score categorization was 1.000 (P < .001), indicating complete agreement between observers.
Numerical Responses in Google Web Search and GPT-4 for ACDF
The top 10 FAQs provided by a Google web search regarding ACDF with numerical responses were recorded and entered into GPT-4. Of the answers provided by Google and generated by GPT-4, 6 out of 10 questions varied in response (Table 5). Overall, GPT-4’s responses avoided specific numerical timeframes, favoring phrases such as “a few days,” “a few weeks,” and “several months” to convey durations.
Table 5.
Numerical Responses Compared Between Google and GPT-4.
Question a | Google | GPT-4
---|---|---
1. How long is recovery from anterior cervical discectomy and fusion? | 8-12 weeks | Few weeks, 1-6 months |
2. How long do you sleep in a recliner after neck surgery? | 1-2 days | Few days to several weeks |
3. How long do you need to wear collar after cervical neck surgery? | 6-8 weeks | 4-6 weeks |
4. How long do you need pain meds after cervical fusion? | 1-4 weeks | Few days to several weeks |
5. How long is hospital stay after cervical fusion? | 1-2 days | 1-2 days |
6. How long does it take for neck fusion to fuse? | 3 months | 3-6 months |
7. How long do you lose your voice after neck surgery? | Most people fully recover their voice within a few days of the surgery, but others may have trouble speaking for weeks or longer | Few days to a few weeks |
8. How long will my throat hurt after neck surgery? | 2-3 days | Few days to a week |
9. How long does it hurt to swallow after ACDF? | At least 12 weeks | Several weeks to a few months |
10. How long does it take to feel normal after ACDF surgery? | 3 months | 1 year or more |
Abbreviation: ACDF, Anterior cervical discectomy and fusion.
a Questions highlighted in bold varied in response.
Discussion
Large language models have been developed to serve a diverse array of functions, including assistance in education, recommendations for medical care, and individual case analysis in surgery.32,33 Despite having fewer parameters than other LLMs, ChatGPT is one of the most studied chatbots in the literature. 34 The purpose of this study was to investigate GPT-4’s ability to provide patients with readable answers and reliable sources to FAQs related to ACDF in comparison to a Google web search conducted on December 18, 2023. While Google had an equal distribution of question topics, technical details and indications/management comprised the majority of FAQs queried through GPT-4. Answers provided by GPT-4 to FAQs were drawn from higher-quality sources, as measured by JAMA benchmark scores, when compared to answers provided by a Google web search. Answers provided by GPT-4 were also associated with a higher Flesch-Kincaid grade level and a lower Flesch Reading Ease score, yet a similar word count, when compared to Google.
When asked to produce the top 10 FAQs for ACDF, GPT-4 replicated only 1 question that was also returned by the Google web search. Currently, there is no available literature comparing GPT-4 with Google in the ability to provide the top 10 FAQs for a given search term. Previous studies, like Dubin et al, found a 25% overlap in FAQs between GPT-3.5 and Google regarding hip and knee arthroplasty. 17 GPT-4’s unique web search capabilities via Bing might contribute to these differences.8,9 Its method of determining FAQs could involve mining data from websites with dedicated FAQ sections, as opposed to querying Bing’s “People Also Ask” feature. Each GPT-4-generated question was linked to a website with a related question, enhancing the reliability of the information. The distinct approach of GPT-4 in generating FAQs, compared to traditional search engines, could influence how patients access and interpret information about ACDF, potentially altering their expectations and understanding of the procedure. This highlights the importance of evaluating the sources and methods of AI-generated medical information to ensure its alignment with evidence-based practices.
GPT-4 cited more reliable web sources than Google, which tended to rely on social media and medical practice websites. This is consistent with prior literature describing the websites to which Google most commonly directs users for ACDF FAQs, whereas GPT-4 drew on academic and social media websites. 23 The JAMA score of GPT-4’s sources was significantly higher, a novel finding given that earlier ChatGPT models did not provide source links, resulting in a JAMA score of zero.35,36 Walker et al also confirmed GPT-4’s ability to match the quality of existing online medical information, as assessed by the Ensuring Quality Information for Patients (EQIP) tool. 37 Additionally, GPT-4’s reduced rate of citation fabrication marks a significant improvement over GPT-3.5, reflecting a critical step forward in the utility of AI for providing trustworthy online health information. 38 This shift towards more reliable and academically credible sources by GPT-4 suggests a notable enhancement in the quality of online health information provided by AI systems. It underscores the potential of AI tools like GPT-4 in supporting spine surgeons by directing patients to more evidence-based and peer-reviewed information, thereby improving patient education and potentially influencing treatment outcomes.
Answers provided by GPT-4 had a higher Flesch-Kincaid grade level and lower Flesch Reading Ease score than answers provided by Google, aligning with earlier observations of GPT-3.5’s collegiate to post-collegiate readability levels.36,39,40 This suggests that GPT-4, like its predecessor, may not be fully accessible to the average American adult, whose health literacy is at or below an eighth-grade level (Flesch-Kincaid grade level 8) as defined by the National Institutes of Health (NIH) and American Medical Association (AMA).41,42 However, ChatGPT can be prompted to simplify complex texts, reducing collegiate-level material by an average of 3.3 grade levels. 43 Despite GPT-4’s answers having a higher reading level, a previous study by Biro et al indicates that exposure to ChatGPT’s technical language responses can improve test scores, particularly among individuals with higher health literacy, and increase trust in the chatbot. 44 The higher reading level of GPT-4’s responses, while potentially limiting accessibility for some, could also be seen as an opportunity to elevate patient understanding and engagement in their healthcare. This underscores the importance of tailoring AI-generated content to the reader’s literacy level and the role of spine surgeons in interpreting and contextualizing information for patients to ensure comprehension and informed decision-making.
When comparing Google’s web search responses to those generated by GPT-4 for ACDF-related FAQs requiring numerical responses, 6 out of 10 questions had differing information between the 2 platforms. This finding aligns with a prior study on total joint arthroplasty, which reported that 55% of numerical question responses varied between the platforms. 17 Google’s answers were more concise, while GPT-4 tended to offer a wider range of recovery timelines. For example, in response to “How long is recovery from ACDF?”, Google suggested 8-12 weeks, whereas GPT-4 provided an estimate ranging from a few weeks to 6 months. These variations in response timelines are crucial in the medical context, as they can influence patient expectations and preparedness. The broader spectrum of timelines from GPT-4 might reflect its capability to incorporate a wider array of patient experiences and clinical outcomes, as opposed to the more standardized responses typically found in Google searches. However, more objective questions, like “How long is hospital stay after cervical fusion?”, received consistent responses of 1-2 days from both platforms, indicating a consensus on certain clinical aspects. Overall, these findings suggest that while GPT-4 offers more comprehensive and varied responses, potentially enhancing patient understanding and expectations, it is also important to recognize the value of Google’s concise answers for certain straightforward, objective medical questions. This balance between detailed and straightforward information is essential in providing patients with a well-rounded understanding of their expectations following ACDF.
This study has several limitations. Google’s web search results and GPT-4’s responses, which depend on Bing, may vary as online content is updated, affecting response accuracy. As a result, our findings might not be reproducible. Despite this limitation, this study provides a starting point for evaluating the capabilities of GPT-4 in answering FAQs relating to ACDF. Additionally, despite using a newly installed Google Chrome browser to avoid personalized results, Google might still have personalized the search outcomes. For GPT-4, user login requirements could introduce biases, as its responses might be influenced by the user’s interaction history. Because GPT-4 is a paid subscription service, unlike the open-access GPT-3.5, its responses may also be biased by its more limited user base. The search methodology used to query GPT-4 in this study might not fully reflect the natural language of patients. Despite these challenges, this study provides important insights into GPT-4’s capabilities in medical information provision, highlighting the evolving role of AI in healthcare. Future studies should explore methods to prompt GPT-4 to respond in ways that more closely represent patient viewpoints.
Conclusion
AI’s role in patient education is expanding, with GPT-4 showcasing enhanced capabilities and accuracy, though its higher reading level might limit usability for some users. This study demonstrates GPT-4’s potential to complement sources like Google, providing varied, in-depth responses. Yet, it emphasizes the need to balance complex AI content with clear, concise information. As AI integrates further into healthcare, refining these tools for accuracy and accessibility remains essential, meeting the diverse information needs and literacy levels of online medical information seekers.
Footnotes
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Mitchell K. Ng is a paid consultant at Ferghana Partners. For the remaining authors none were declared.
Funding: The author(s) received no financial support for the research, authorship, and/or publication of this article.
ORCID iDs
Paul G. Mastrokostas https://orcid.org/0000-0002-1196-517X
Leonidas E. Mastrokostas https://orcid.org/0009-0006-8926-3502
Ian J. Wellington https://orcid.org/0000-0002-6630-2939
Mitchell K. Ng https://orcid.org/0000-0002-5831-055X
References
- 1. Number of ChatGPT Users (Dec 2023). Accessed December 22, 2023. https://explodingtopics.com/blog/chatgpt-users
- 2. GPT-4 Released: What it Means for the Future of Your Business. Accessed December 22, 2023. https://www.forbes.com/sites/forbesbusinesscouncil/2023/03/28/gpt-4-released-what-it-means-for-the-future-of-your-business/?sh=2b86082a2dc6
- 3. Egli A. ChatGPT, GPT-4, and other large language models: the next revolution for clinical microbiology? Clin Infect Dis. 2023;77(9):1322-1328. doi: 10.1093/CID/CIAD407
- 4. Brin D, Sorin V, Vaid A, et al. Comparing ChatGPT and GPT-4 performance in USMLE soft skill assessments. Sci Rep. 2023;13(1):16492. doi: 10.1038/S41598-023-43436-9
- 5. Currie G, Robbie S, Tually P. ChatGPT and patient information in nuclear medicine: GPT-3.5 versus GPT-4. J Nucl Med Technol. 2023;51(4):307. doi: 10.2967/JNMT.123.266151
- 6. Moshirfar M, Altaf AW, Stoakes IM, Tuttle JJ, Hoopes PC. Artificial intelligence in ophthalmology: a comparative analysis of GPT-3.5, GPT-4, and human expertise in answering StatPearls questions. Cureus. 2023;15(6):e40822. doi: 10.7759/CUREUS.40822
- 7. Rosoł M, Gąsior JS, Łaba J, Korzeniewski K, Młyńczak M. Evaluation of the performance of GPT-3.5 and GPT-4 on the polish medical final examination. Sci Rep. 2023;13(1):1-13. doi: 10.1038/s41598-023-46995-z
- 8. ChatGPT-3.5: An Overview and Limitations | Blocshop. Accessed December 22, 2023. https://www.blocshop.io/blog/chatgpt3-5-limitations
- 9. ChatGPT Powers up: OpenAI Finally Lets the GPT-4 Chatbot to Browse the Internet for Answers. Accessed December 22, 2023. https://www.firstpost.com/world/openai-finally-lets-chatgpt-connect-to-the-internet-to-search-for-answers-12345442.html
- 10. Every State’s Most Googled Medical Symptoms. Accessed December 22, 2023. https://eligibility.com/medicare/states-most-googled-medical-symptom
- 11. Search Engine Market Share United States Of America | Statcounter Global Stats. Accessed December 22, 2023. https://gs.statcounter.com/search-engine-market-share/all/united-states-of-america#monthly-201501-202301
- 12. Wei K, Fritz C, Rajasekaran K. Answering head and neck cancer questions: an assessment of ChatGPT responses. Am J Otolaryngol. 2024;45(1):104085. doi: 10.1016/J.AMJOTO.2023.104085
- 13. Kerbage A, Kassab J, El Dahdah J, Burke CA, Achkar JP, Rouphael C. Accuracy of ChatGPT in common gastrointestinal diseases: impact for patients and providers. Clin Gastroenterol Hepatol. 2023. doi: 10.1016/j.cgh.2023.11.008
- 14. Babici D, Johansen PM, Miller TD, Snelling B. Five-level anterior cervical discectomy and fusion. Cureus. 2021;13(11):e19961. doi: 10.7759/CUREUS.19961
- 15. Rhee JM, Ju KL. Anterior cervical discectomy and fusion. JBJS Essent Surg Tech. 2016;6(4):1-8. doi: 10.2106/JBJS.ST.15.00056
- 16. Zheng HL, Li B, Song SK, Chen PB, Jiang LS, Jiang SD. Anterior cervical discectomy and fusion to treat cervical instability with vertigo and dizziness: a single center, retrospective, observational study. Front Surg. 2022;9:1047504. doi: 10.3389/FSURG.2022.1047504
- 17. Dubin JA, Bains SS, Chen Z, et al. Using a Google web search analysis to assess the utility of ChatGPT in total joint arthroplasty. J Arthroplasty. 2023;38(7):1195-1202. doi: 10.1016/J.ARTH.2023.04.007
- 18. Foster BK, Brule NR, Callahan C, Baylor J, Klena JC, Grandizio LC. Online information related to symptoms of carpal tunnel syndrome: a Google search analysis. Cureus. 2023;15(2):e35586. doi: 10.7759/CUREUS.35586
- 19. Hodakowski AJ, McCormick JR, Damodar D, et al. Rotator cuff repair: what questions are patients asking online and where are they getting their answers? Clin Shoulder Elb. 2023;26(1):25. doi: 10.5397/CISE.2022.01235
- 20. McCormick JR, Kruchten MC, Mehta N, et al. Internet search analytics for shoulder arthroplasty: what questions are patients asking? Clin Shoulder Elb. 2023;26(1):55. doi: 10.5397/CISE.2022.01347
- 21. Shen TS, Driscoll DA, Islam W, Bovonratwet P, Haas SB, Su EP. Modern internet search analytics and total joint arthroplasty: what are patients asking and reading online? J Arthroplasty. 2021;36(4):1224-1231. doi: 10.1016/j.arth.2020.10.024
- 22. Khalil LS, Castle JP, Akioyamen NO, et al. What are patients asking and reading online? An analysis of online patient searches for rotator cuff repair. J Shoulder Elbow Surg. 2023;32:2245. doi: 10.1016/j.jse.2023.04.021
- 23. Kasthuri V, Homer A, Alsoof D, et al. Modern internet search analytics and spine: what are patients asking and reading online? N Am Spine Soc J. 2023;14:100214. doi: 10.1016/J.XNSJ.2023.100214
- 24. Agar A, Sahin A. Kyphosis-related information on the internet: is the quality, content and readability sufficient for the patients? Global Spine J. 2022;12(3):476-482. doi: 10.1177/21925682211015955
- 25. Guo WJ, Wang WK, Xu D, Qiao Z, Shi YL, Luo P. Evaluating the quality, content, and readability of online resources for failed back spinal surgery. Spine (Phila Pa 1976). 2019;44(7):494-502. doi: 10.1097/BRS.0000000000002870
- 26. Nason GJ, Baker JF, Byrne DP, Noel J, Moore D, Kiely PJ. Scoliosis-specific information on the internet: has the “information highway” led to better information provision? Spine (Phila Pa 1976). 2012;37(21):E1364. doi: 10.1097/BRS.0B013E31826619B5
- 27. Elhassan Y, Sheridan G, Nassiri M, Osman M, Kiely P, Noel J. Discectomy-related information on the internet: does the quality follow the surge? Spine (Phila Pa 1976). 2015;40(2):121-125. doi: 10.1097/BRS.0000000000000689
- 28. Chong LSL, Zhu M, Baker JF. Quality assessment of degenerative cervical myelopathy information on the internet. Int J Spine Surg. 2024;18:54. doi: 10.14444/8566
- 29. Eltorai AEM, Cheatham M, Naqvi SS, et al. Is the readability of spine-related patient education material improving? An assessment of subspecialty websites. Spine (Phila Pa 1976). 2016;41(12):1041-1048. doi: 10.1097/BRS.0000000000001446
- 30. Luciani AM, Foster BK, Hayes D, DelSole EM. Readability of online spine patient education resources. World Neurosurg. 2022;162:e640-e644. doi: 10.1016/J.WNEU.2022.03.091
- 31. Vives M, Young L, Sabharwal S. Readability of spine-related patient education materials from subspecialty organization and spine practitioner websites. Spine (Phila Pa 1976). 2009;34(25):2826-2831. doi: 10.1097/BRS.0B013E3181B4BB0C
- 32. Chatterjee S, Bhattacharya M, Pal S, Lee SS, Chakraborty C. ChatGPT and large language models in orthopedics: from education and surgery to research. J Exp Orthop. 2023;10(1):128. doi: 10.1186/S40634-023-00700-1
- 33. Safi Z, Abd-Alrazaq A, Khalifa M, Househ M. Technical aspects of developing chatbots for medical applications: scoping review. J Med Internet Res. 2020;22(12):e19127. doi: 10.2196/19127
- 34. Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW. Large language models in medicine. Nat Med. 2023;29(8):1930-1940. doi: 10.1038/S41591-023-02448-8
- 35. Ulusoy I, Yılmaz M, Kıvrak A. How efficient is ChatGPT in accessing accurate and quality health-related information? Cureus. 2023;15(10):e46662. doi: 10.7759/CUREUS.46662
- 36. Hurley ET, Crook BS, Lorentz SG, et al. Evaluation high-quality of information from ChatGPT (artificial intelligence-large language model) artificial intelligence on shoulder stabilization surgery. Arthroscopy. Published online 2023. doi: 10.1016/J.ARTHRO.2023.07.048
- 37. Walker HL, Ghani S, Kuemmerli C, et al. Reliability of medical information provided by ChatGPT: assessment against clinical guidelines and patient information quality instrument. J Med Internet Res. 2023;25:e47479. doi: 10.2196/47479
- 38. Walters WH, Wilder EI. Fabrication and errors in the bibliographic citations generated by ChatGPT. Sci Rep. 2023;13(1):14045. doi: 10.1038/S41598-023-41032-5
- 39. Crook BS, Park CN, Hurley ET, Richard MJ, Pidgeon TS. Evaluation of online artificial intelligence-generated information on common hand procedures. J Hand Surg Am. 2023;48(11):1122-1127. doi: 10.1016/J.JHSA.2023.08.003
- 40. Mishra A, Begley SL, Chen A, et al. Exploring the intersection of artificial intelligence and neurosurgery: let us be cautious with ChatGPT. Neurosurgery. 2023;93(6):1366-1373. doi: 10.1227/NEU.0000000000002598
- 41. Weiss BD. Health literacy and patient safety: help patients understand. Manual for clinicians. American Medical Association Foundation. Published online 2007.
- 42. Cotugna N, Vickery CE, Carpenter-Haefele KM. Evaluation of literacy level of patient education pages in health-related journals. J Community Health. 2005;30(3):213-219. doi: 10.1007/S10900-004-1959-X
- 43. Ayre J, Mac O, McCaffery K, et al. New frontiers in health literacy: using ChatGPT to simplify health information for people in the community. J Gen Intern Med. Published online 2023. doi: 10.1007/S11606-023-08469-W
- 44. Biro J, Linder C, Neyens D. The effects of a health care chatbot’s complexity and persona on user trust, perceived usability, and effectiveness: mixed methods study. JMIR Hum Factors. 2023;10:e41017. doi: 10.2196/41017