Annals of Cardiac Anaesthesia. 2026 Jan 16;29(1):81–88. doi: 10.4103/aca.aca_145_25

Comparative Evaluation of Popular Gen-AI Chatbots in Generating Patient Education Material on Pulmonary Artery Catheter Insertion

Omshubham Gangadhar Asai 1, Nayana Sabu 1, Prakash Gondode 1, Soumya Das 2, Gajanan Chauhan 1
PMCID: PMC12935120  PMID: 41543410

Abstract

Introduction:

Patient education significantly improves outcomes, especially in high-risk procedures. However, traditional educational resources often fail to address patient literacy and emotional needs adequately. Large language models like ChatGPT (OpenAI) and Gemini (Google) offer promising alternatives, potentially enhancing both accessibility and comprehensibility of procedural information. This study evaluates and compares the effectiveness of ChatGPT and Gemini in generating accurate, readable, and clinically relevant patient education materials (PEMs) for pulmonary artery catheter insertion.

Methodology:

A comparative, single-blinded study was conducted using structured validation methods, with a common prompt supplied to both generative artificial intelligence (AI) chatbots. AI-generated PEMs were assessed by board-certified anesthesiologists and intensivists. Face validity was determined using a 5-point Likert scale evaluating appropriateness, clarity, relevance, and trustworthiness. Content validity was measured by calculating the content validity index (CVI). Accuracy and completeness were evaluated by a separate expert panel using a 10-point Likert scale. Readability and sentiment were assessed via automated online tools.

Results:

Both chatbots achieved robust face and content validity (S-CVI = 0.91). ChatGPT scored significantly higher on accuracy (median 9.00 vs. 8.00; P = 0.021) and trended higher on perceived trustworthiness (P = 0.064), while Gemini outperformed on readability (Flesch Reading Ease score: 65 vs. 54; Flesch-Kincaid Grade Level: 7.58 vs. 8.64). Both outputs maintained a neutral emotional tone.

Conclusion:

AI chatbots show promise as innovative tools for patient education. By leveraging the strengths of both AI-driven technologies and human expertise, healthcare providers can enhance patient education and empower individuals to make informed decisions about their health and medical care involving complex clinical procedures.

Keywords: Face validity, generative artificial intelligence, patient education, pulmonary arteries, readability, sentiment analysis

INTRODUCTION

The integration of artificial intelligence (AI) into healthcare has progressed beyond diagnostic and predictive tasks toward more nuanced roles, including patient education and communication. A major advancement in this space is the emergence of large language models (LLMs) such as ChatGPT (OpenAI) and Gemini (Google), which leverage natural language processing to generate human-like, contextually relevant content in real time. These generative AI models are being explored as tools for automated health information delivery and search, offering advantages over traditional platforms like Wikipedia [Figure 1a] by improving access to accurate, readable, and empathetic medical information.

Figure 1.

(a) Google Trends for the search terms ‘chatgpt’, ‘gemini’, and ‘wikipedia’ in the health category worldwide, June 2022–May 2025. (b) Schematic representation of the study design outlining each phase of evaluation

The need for patient education in high-risk interventions

Effective patient education is fundamental to clinical practice, improving adherence, reducing anxiety, and enhancing outcomes.[1] This is particularly important for high-risk, technically complex procedures such as pulmonary artery (PA) catheter insertion, routinely used in cardiac surgery and intensive care for advanced hemodynamic monitoring.[2] Despite its clinical value, PA catheterization carries risks—including arrhythmias, infection, thrombosis, and pulmonary artery rupture—that can increase patient apprehension if poorly explained.[3]

Conventional education methods—pamphlets, verbal briefings, and static documents—often fail to meet patients’ diverse literacy and emotional needs.[4] This has prompted growing interest in AI-powered chatbots as scalable, personalized alternatives to support health literacy.

Promise and pitfalls of AI in patient education

Recent studies show that generative AI can produce coherent, engaging health education materials. ChatGPT has demonstrated the ability to simplify radiology reports and address cancer-related queries with empathy.[5] Gemini offers similar conversational nuance and contextual adaptation. However, concerns remain regarding the accuracy and completeness of AI-generated content.[6] Incomplete or inaccurate information can mislead patients, affecting decision-making and informed consent.

Therefore, deploying AI for patient education requires rigorous validation—beyond linguistic coherence—to assess factual accuracy, completeness, readability, and tone, all critical for patient comprehension and trust.

PA catheter insertion: A high-stakes use case

PA catheter insertion, used for precise hemodynamic monitoring in high-risk patients, is often misunderstood due to its complexity.[3] This makes it ideal for assessing AI chatbots’ ability to simplify technical information for patients.

Gaps in the literature and study rationale

Despite growing interest in LLMs, few studies have compared ChatGPT and Gemini for procedural patient education. Existing research, largely in dermatology and other fields,[7,8,9] relies heavily on clinician assessments. Comparative studies using structured validation frameworks, such as face validity, content validity, and expert-rated accuracy, remain scarce, particularly for high-risk procedures like PA catheter insertion.

Study objectives

In response to this gap, the current study was designed as a structured, blinded comparison of ChatGPT and Gemini in generating patient education materials on PA catheter insertion. The content was evaluated by domain experts for accuracy, completeness, readability, and tone neutrality. In addition, structured validation metrics were applied to assess the pedagogical and clinical robustness of the generated outputs.

  • Primary Objective: To compare ChatGPT and Gemini in generating patient education content on PA catheter insertion, specifically assessing accuracy, completeness, readability, and tone through blinded expert evaluation.

  • Secondary Objective: To assess the face validity and content validity of the AI-generated materials using a structured validation framework, including expert-based content relevance scoring (CVI) and perceived appropriateness metrics.

METHODOLOGY

Study design and objective

This was a comparative, single-blinded validation and evaluation study conducted to assess the quality and clinical reliability of AI-generated patient education materials related to PA catheter insertion. Two LLMs—ChatGPT (OpenAI) and Gemini (Google)—were analyzed. The primary objective was to compare their outputs in terms of face validity, content validity, accuracy, completeness, readability, and tone neutrality using structured expert assessment tools. The schematic presentation of the study design, including AI content generation, validation, and evaluation, is summarized in the flowchart [Figure 1b]. A total of 15 domain experts participated: Ten assessed face and content validity, and five distinct experts conducted blinded evaluations of accuracy and completeness. Experts with prior involvement in AI development or with potential conflicts of interest were excluded to minimize bias.

Phase I: Generation and validation of AI outputs

Prompt standardization and AI output collection

A standardized patient-centered prompt was used to generate content from both chatbots:

“Generate a patient education material on PA catheter insertion for a patient undergoing cardiac surgery. Make it easy to read and understand, accurate to current practices and guidelines, and with a positive or enthusiastic tone.”

Both responses were collected on the same day to minimize variation from chatbot updates or contextual drift [Supplementary File 1].

Validation of generated content

  1. Face Validity Assessment:

    To establish preliminary acceptability and perceived utility, ten independent experts (board-certified anesthesiologists and intensivists with ≥5 years of clinical experience in cardiac anesthesia) were recruited to assess the AI-generated content for face validity.

    Each expert reviewed both versions of the patient education material (anonymized and randomized) through a Google Form (https://forms.gle/AVDyNT8qQ83mBqC86) circulated via e-mail and scored them on a 5-point Likert scale across the following domains:

    • Appropriateness for patient education

    • Clarity of information

    • Relevance to the clinical context

    • Perceived trustworthiness

  2. Content Validity Index (CVI) Analysis:

    A 10-item content checklist based on current practice guidelines and expert consensus was developed to assess the completeness and clinical adequacy of the generated content. Items included definitions, indications, procedural steps, risks, comfort measures, and tone appropriateness.

    Experts rated each item on a 4-point relevance scale:

    • 1 = Not relevant

    • 2 = Somewhat relevant

    • 3 = Quite relevant

    • 4 = Highly relevant.

Scores of 3 or 4 were considered “valid”. The Item-level Content Validity Index (I-CVI) was calculated for each item, and the Scale-level CVI (S-CVI) was computed as the mean of I-CVI scores. A minimum S-CVI of 0.90 was considered indicative of strong content validity.
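
To make this computation concrete, the following minimal Python sketch implements the I-CVI and S-CVI calculations described above. The expert ratings shown are illustrative placeholders, not the study's raw data.

```python
from statistics import mean

def i_cvi(ratings):
    """Item-level CVI: proportion of experts rating the item 3 or 4 ("valid")."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)

# Hypothetical ratings from 10 experts on the 4-point relevance scale.
item_ratings = {
    "Definition of PA catheter": [4, 4, 3, 4, 4, 3, 4, 4, 4, 4],
    "Indications for use":       [4, 3, 4, 4, 2, 4, 3, 4, 4, 4],
}

i_cvis = {item: i_cvi(r) for item, r in item_ratings.items()}
s_cvi = mean(i_cvis.values())  # Scale-level CVI = mean of all I-CVI values

for item, value in i_cvis.items():
    print(f"I-CVI ({item}): {value:.2f}")
print(f"S-CVI: {s_cvi:.2f}")  # >= 0.90 indicates strong content validity
```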

Phase II: Evaluation of AI outputs

Expert review process for accuracy and completeness

Five additional blinded expert reviewers (distinct from those involved in validation) independently assessed the deidentified outputs. Evaluation was done using a 10-point Likert scale for:

  • Accuracy: Factual correctness and alignment with current clinical standards

  • Completeness: Inclusion of key elements relevant to patient understanding of PA catheter insertion.

Readability analysis

Readability of each chatbot’s output was objectively assessed using two standardized metrics:

  • Flesch Reading Ease (FRE): Higher scores indicate easier readability.

  • Flesch-Kincaid Grade Level (FKGL): Reflects the U.S. grade level required to understand the text.

Both tools were implemented via online automated linguistic software available in the public domain (https://readabilityformulas.com/readability-scoring-system.php). Scores were compared to identify which output was more accessible to the average patient.
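
For readers who wish to reproduce these metrics offline, the sketch below applies the published FRE and FKGL formulas with a naive vowel-group syllable counter. The online tool uses more sophisticated syllable counting, so its scores may differ slightly from this approximation.

```python
import re

def text_counts(text):
    """Return (sentences, words, syllables) using simple heuristics."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)

    def syllables(word):
        # Count vowel groups; drop one for a trailing silent 'e'.
        groups = re.findall(r"[aeiouy]+", word.lower())
        count = len(groups)
        if word.lower().endswith("e") and count > 1:
            count -= 1
        return max(1, count)

    return sentences, len(words), sum(syllables(w) for w in words)

def flesch_reading_ease(text):
    s, w, syl = text_counts(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)

def flesch_kincaid_grade(text):
    s, w, syl = text_counts(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59

sample = ("A PA catheter is a thin, flexible tube. "
          "It helps your team watch your heart closely.")
print(f"FRE:  {flesch_reading_ease(sample):.1f}")   # higher = easier to read
print(f"FKGL: {flesch_kincaid_grade(sample):.2f}")  # U.S. school grade level
```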

Sentiment and tone analysis

An online sentiment analysis tool freely available in the public domain (https://www.danielsoper.com/sentimentanalysis/default.aspx) was used to evaluate the emotional tone of the educational material. Each output was analyzed for a sentiment score ranging from –100 (very negative) to +100 (very positive). A predefined range of –25 to +25 was used to define a “neutral tone”, as appropriate for medical education content.
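
The banding logic is simple enough to state directly; below is a minimal sketch mapping a sentiment score to the study's predefined tone bands.

```python
def tone_band(score):
    """Classify a sentiment score (-100 to +100) into the study's tone bands."""
    if score < -25:
        return "negative"
    if score > 25:
        return "positive"
    return "neutral"  # -25 to +25, as appropriate for medical education content

# Scores reported in Table 4: both outputs fall in the neutral band.
print(tone_band(21.6), tone_band(20.8))  # -> neutral neutral
```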

Assessor blinding

Both annexures were blinded for source and sent to the experts with standardized scoring rubrics to minimize assessor bias. Experts remained blinded to chatbot identity throughout the evaluation and validation processes.

Ethical considerations

The entire study was conducted using online tools available in the public domain. As it did not involve human subjects or patient data, formal ethics approval was waived. The research nonetheless adhered to ethical principles by maintaining reviewer confidentiality, ensuring voluntary participation, and avoiding harm to human subjects.

Statistical analysis

Descriptive statistics were used to summarize all outcome measures. Continuous variables were presented as mean ± standard deviation (SD) or median and interquartile range (IQR), as appropriate. Group comparisons between ChatGPT and Gemini were performed using independent t-tests or Mann–Whitney U tests for nonparametric data. A P value < 0.05 was considered statistically significant.
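
As an illustration of this workflow, the Python sketch below (using SciPy) checks normality with the Shapiro–Wilk test and then selects between the independent t-test and the Mann–Whitney U test. The rating vectors are hypothetical, chosen only to mirror the five-reviewer design; they are not the study's raw data.

```python
from scipy import stats

# Hypothetical 10-point ratings from five blinded reviewers per chatbot.
chatgpt = [9, 9, 9, 9, 8]
gemini = [8, 8, 8, 8, 7]

# Use the parametric test only if both groups pass the normality check.
normal = all(stats.shapiro(g).pvalue >= 0.05 for g in (chatgpt, gemini))

if normal:
    result = stats.ttest_ind(chatgpt, gemini)             # independent t-test
    test_name = "Student t-test"
else:
    result = stats.mannwhitneyu(chatgpt, gemini,
                                alternative="two-sided")  # nonparametric
    test_name = "Mann-Whitney U test"

print(f"{test_name}: P = {result.pvalue:.3f}")  # significant if P < 0.05
```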

RESULTS

Face validity assessment

Face validity was established by assessing four dimensions: appropriateness, clarity, relevance, and trustworthiness. Both ChatGPT and Gemini achieved high median scores across all domains. ChatGPT was slightly preferred for perceived factual reliability, while clarity ratings were identical for the two models [Table 1].

Table 1.

Face validity assessment scores for AI-generated patient education materials

| Domain | Group | n | Median | IQR | Shapiro–Wilk W | Shapiro–Wilk P | 25th percentile | 75th percentile | P* |
|---|---|---|---|---|---|---|---|---|---|
| Appropriateness | ChatGPT | 10 | 5.00 | 0.750 | 0.594 | <0.001 | 4.25 | 5.00 | 0.681 |
| | Gemini | 10 | 5.00 | 1.000 | 0.640 | <0.001 | 4.00 | 5.00 | |
| Clarity | ChatGPT | 10 | 4.50 | 1.000 | 0.655 | <0.001 | 4.00 | 5.00 | 1.000 |
| | Gemini | 10 | 4.50 | 1.000 | 0.655 | <0.001 | 4.00 | 5.00 | |
| Relevance | ChatGPT | 10 | 5.00 | 0.000 | 0.509 | <0.001 | 5.00 | 5.00 | 0.366 |
| | Gemini | 10 | 5.00 | 1.000 | 0.640 | <0.001 | 4.00 | 5.00 | |
| Perceived trustworthiness | ChatGPT | 10 | 5.00 | 0.000 | 0.366 | <0.001 | 5.00 | 5.00 | 0.064 |
| | Gemini | 10 | 4.50 | 1.000 | 0.655 | <0.001 | 4.00 | 5.00 | |

*Mann–Whitney U test

These results indicate that both models were generally viewed as appropriate and relevant for patient education, with ChatGPT holding a slight edge in perceived trustworthiness. No domain scored below a median of 4.5, reflecting robust face validity overall.

Content validity index

To evaluate content coverage, ten clinical experts rated ten essential educational elements using a 4-point relevance scale. The I-CVI was calculated for each element, and S-CVI was derived as their mean [Table 2].

Table 2.

Content validity index (CVI) for AI-generated patient education materials

| Content item evaluated | I-CVI (ChatGPT) (n=10) | I-CVI (Gemini) (n=10) |
|---|---|---|
| Definition of PA catheter | 1.00 | 1.00 |
| Indications for use | 1.00 | 0.90 |
| Description of insertion procedure | 0.90 | 0.90 |
| Monitoring parameters explained | 0.90 | 0.80 |
| Explanation of risks and complications | 1.00 | 0.90 |
| Pre- and postprocedure expectations | 0.90 | 0.90 |
| Patient reassurance and comfort-related language | 0.80 | 0.90 |
| Simplicity and tone suitability | 0.90 | 1.00 |
| Clarity in transitions and flow of explanation | 0.80 | 0.90 |
| Overall comprehensiveness | 0.90 | 0.90 |

  • S-CVI (ChatGPT) = 0.91

  • S-CVI (Gemini) = 0.91.

Interpretation: Both models achieved strong content validity (S-CVI ≥0.9). ChatGPT was rated slightly higher on items related to factual and procedural accuracy, while Gemini scored better in language clarity and patient-centered tone.

Expert evaluation: Accuracy and completeness

In a separate blinded evaluation, five expert anesthesiologists assessed each chatbot’s output for accuracy and completeness on a 10-point scale. ChatGPT was rated significantly more accurate (P = 0.021), likely reflecting more precise alignment with current clinical practice. Completeness scores were comparable between the two models [Table 3].

Table 3.

Expert ratings for accuracy and completeness of AI-generated content

| Domain | Group | n | Mean | Median | SD | IQR | Shapiro–Wilk W | Shapiro–Wilk P | 25th percentile | 75th percentile | P |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy | ChatGPT | 5 | 8.80 | 9 | 0.447 | 0.00 | 0.552 | <0.001 | 9.00 | 9.00 | 0.021* |
| | Gemini | 5 | 7.80 | 8 | 0.447 | 0.00 | 0.552 | <0.001 | 8.00 | 8.00 | |
| Completeness | ChatGPT | 5 | 7.60 | 8 | 0.548 | 1.00 | 0.684 | 0.006 | 7.00 | 8.00 | 0.347 |
| | Gemini | 5 | 8.00 | 8 | 0.707 | 0.00 | 0.883 | 0.325 | 8.00 | 8.00 | |

*Mann–Whitney U test (Student t-test where both groups met normality assumptions)

Readability analysis

The readability of the AI-generated texts was assessed using the FRE and FKGL metrics. These objective indices help determine how accessible the text is for the average patient reader.

Gemini produced content that was both easier to read (higher FRE) and appropriate for a lower education level (lower FKGL) [Table 4]. These results suggest that Gemini may be better suited for broader patient populations with diverse literacy levels.

Table 4.

Readability and sentiment analysis of AI-generated patient education material

| Metric | ChatGPT | Gemini | Interpretation |
|---|---|---|---|
| Flesch Reading Ease (FRE) | 54 | 65 | Scale 0–100; higher scores are easier to read. Gemini's content is easier to read |
| Flesch-Kincaid Grade Level (FKGL) | 8.64 | 7.58 | Score corresponds to the U.S. school grade level; Gemini requires a lower reading level |
| Sentiment score | 21.6 | 20.8 | –100 to –25 negative tone; –25 to +25 neutral tone; +25 to +100 positive tone. Both outputs fall within the neutral band |

Tone analysis

Both outputs were interpreted as neutral in tone, appropriate for the clinical context [Table 4]. No significant tonal bias was detected, and the scores were close, suggesting consistency in emotional framing.

DISCUSSION

This study systematically evaluated the effectiveness of two advanced LLMs, ChatGPT (OpenAI) and Gemini (Google), in creating patient educational materials about PA catheter insertion—a complex and critical medical intervention. Our findings highlight distinctive capabilities and limitations of each model, providing important insights into their potential application in clinical practice.

ChatGPT displayed superior accuracy and greater alignment with clinical standards, consistent with recent studies in other medical disciplines. Fattah et al.[6] reported a similar pattern in a broader medical inquiry context, where ChatGPT consistently outperformed Gemini in providing medically precise responses, suggesting its advantage when factual accuracy is paramount.

In a specialized setting focused on breast cancer, Gummadi et al.[10] observed that although ChatGPT’s responses were predominantly accurate, completeness varied significantly, highlighting an ongoing challenge of achieving comprehensive yet succinct patient education. Similarly, Stalp et al.[11] examined ChatGPT-generated therapy recommendations in breast cancer management, concluding that while generally effective in straightforward scenarios, the model struggled in more complex clinical contexts, particularly those involving sequential or nuanced therapeutic strategies. This aligns with our study, where ChatGPT was accurate in fundamental details but occasionally fell short in capturing the procedural nuances and comprehensive information—including potential complications, advantages, and alternatives—necessary to support unbiased patient education and informed consent for PA catheter insertion.

Gemini excelled notably in readability and patient-friendly presentation. These results reinforce findings from Phillips et al.,[12] who evaluated educational materials generated by both ChatGPT and Gemini related to neurological disorders, finding Gemini more readable and approachable for diverse patient populations. The ability to simplify complex medical language without significantly compromising informational value remains critical for improving health literacy, particularly given current recommendations from the U.S. Department of Health and Human Services and the American Medical Association that patient materials ideally be accessible at or below the sixth-grade reading level.[13] The importance of readability is underscored by recent systematic analyses evaluating medical responses generated by ChatGPT. Wei et al.,[14] in their systematic review and meta-analysis, emphasized that while ChatGPT-produced medical content generally scored high on readability and user-friendliness, concerns remained regarding the depth and accuracy of some medical information, particularly in complex clinical contexts. This highlights the ongoing challenge in achieving both accessibility and clinical precision, reinforcing the need for expert oversight when deploying AI-generated patient education materials.

Haver et al.[13] demonstrated that ChatGPT improved readability and reduced complexity in breast cancer educational content, highlighting the potential of LLMs to enhance accessibility. However, both their study and ours found that AI-generated materials often exceed the recommended Flesch-Kincaid Grade 6 level, underscoring the persistent challenge of simplifying complex medical information. Although Gemini produced lower reading grade levels than ChatGPT, neither met ideal readability standards. Additionally, while we used a generic prompt to reflect real-world AI use, more tailored prompts considering literacy, cultural context, and procedural alternatives could further improve content quality—an important area for future research.

Moreover, a study by Kotmungkun et al.[15] analyzing linguistic qualities and plagiarism rates found Gemini produced clearer, more cohesive, and readable narratives, though it occasionally compromised deeper content accuracy and originality compared to ChatGPT. This duality underscores our finding that Gemini’s strengths in readability may sometimes be offset by limitations in factual comprehensiveness.

From a technological acceptance perspective, Mheidly applied the Technology Acceptance Model (TAM) to assess ChatGPT’s viability for disseminating public health awareness on breast cancer, finding that the model’s user-friendly conversational interface significantly increased user acceptance but that concerns remained regarding content accountability and consistency due to ChatGPT’s evolving nature.[16] These concerns were echoed in our study, underscoring the necessity of professional oversight in clinical applications.

The application of AI tools in patient education also raises ethical and practical concerns. Masalkhi et al.[17] examined the comparative effectiveness of ChatGPT and Gemini in ophthalmology education, finding Gemini superior in multimodal integration capabilities, though ChatGPT provided more reliable and comprehensive clinical content. This reflects our broader interpretation that AI tools might best be used in complementary roles rather than as standalone resources.

Gondode et al.[18] reinforced the need for balancing readability, accuracy, and emotional tone, emphasizing that AI-generated patient information materials frequently require human oversight to ensure actionability and precision. Such findings substantiate our recommendation that a combined approach of AI-generated drafts reviewed by clinical experts likely offers the safest and most effective model for deploying these technologies in real-world healthcare settings.

While this study focused solely on the textual component of AI-generated patient education materials, it is important to acknowledge that generative AI models are increasingly capable of producing visual content, including diagrams, images, and videos. Although these capabilities were not assessed here, they represent an exciting avenue for future research in multimodal patient education.

The broader discourse on AI integration in healthcare reveals both enthusiasm and caution. Proponents highlight AI’s potential to improve accessibility and efficiency in health communication, while critics raise valid concerns regarding accuracy, accountability, and ethical considerations.[13,16] Our findings contribute to this ongoing debate by emphasizing the need for expert oversight in AI-generated content.

Finally, our findings and recent literature clearly demonstrate that despite rapid advancements, significant work remains to ensure reliability, safety, and clinical validity in AI-driven patient education. Given their strengths and limitations, neither ChatGPT nor Gemini alone is sufficient for comprehensive patient education. However, their potential as tools to enhance clinical workflows and patient engagement is evident.

Only free-access versions of ChatGPT (v3.5, OpenAI) and Gemini (1.5 Flash, Google) were utilized for this study. Paid, advanced versions may produce different outputs with potentially improved accuracy and readability, and future studies should explore the comparative performance of both free and paid AI tools.

The path forward should involve a synergistic approach combining the linguistic fluency and readability advantages of Gemini with the clinical precision of ChatGPT, under expert supervision, ensuring accurate and empathetic communication tailored to patient needs. Although expert validation provides a reliable preliminary assessment of content accuracy and relevance, true clinical utility, particularly regarding patient comprehension, anxiety reduction, and informed decision-making, can only be established through direct patient testing. Future studies should involve patient populations to evaluate the real-world effectiveness and acceptability of AI-generated educational materials.

Limitations

Despite the structured evaluation and expert validation processes employed in this study, several limitations merit consideration:

  1. Single-Prompt Design: Each chatbot was assessed using a single, standardized, simplified prompt reflecting common clinical queries. While this ensured consistency, it may have underutilized the adaptive potential of LLMs. More refined prompts, tailored to patient literacy, procedural alternatives, and context, could improve content relevance and completeness.

  2. Small Expert Sample Size: Although the study included experienced cardiac anesthesiologists and intensivists, the expert panel size was modest. Broader input from nurses, allied health educators, or layperson raters could have provided richer, more generalizable insights.

  3. Absence of Layperson Evaluation: This study did not assess clinical utility through real-world patient testing, including comprehension, emotional response, or impact on informed consent.

  4. Limited Scope of Analysis Tools: Readability and sentiment scores were generated using publicly available online tools:

    • Readability Score Tool

    • Daniel Soper’s Sentiment Analysis Tool

      These tools, while accessible and widely used in educational contexts, lack formal validation studies in medical text analysis. Thus, the accuracy and generalizability of the scores they produce should be interpreted cautiously. Future studies should consider using validated linguistic and psychometric instruments for greater robustness.

  5. Potential for Plagiarism and Hallucination: As highlighted in prior studies, AI-generated content can inadvertently reproduce nonoriginal phrasing or exhibit confident but incorrect statements.[15] Although expert review mitigated this risk in our evaluation, it remains a known limitation in using generative AI for unsupervised deployment. Only free-access versions of ChatGPT and Gemini were evaluated; paid versions may yield different results. These aspects warrant future investigation.

CONCLUSION

This study underscores the nuanced capabilities of ChatGPT and Gemini in procedural patient education. ChatGPT offers higher factual reliability and clinical alignment, whereas Gemini provides more accessible, patient-friendly narratives. Both models demonstrate strong content validity and neutral tone, supporting their adjunctive use in clinical communication. However, optimal deployment demands human oversight, rigorous validation, and possibly a hybrid model that synergizes the strengths of both systems. As generative AI continues to evolve, embedding it thoughtfully into patient education workflows may enhance understanding, reduce anxiety, and promote shared decision-making—particularly in complex interventions like PA catheterization.

Presentation at a meeting

This manuscript has not been presented at any academic or professional meeting.

Clinical trial registration

Not applicable. This study did not involve human subjects or patient data, and ethics committee approval was therefore waived. Informed consent was obtained from all participating expert reviewers.

Conflicts of interest

There are no conflicts of interest.

SUPPLEMENTARY FILE 1

Prompt used: [generate a patient education material on PA catheter insertion for a patient undergoing cardiac surgery. make it easy to read and understand, accurate to current practices and guidelines and with a positive or enthusiastic tone.]

Annexure A (Google Gemini)
Understanding your pulmonary artery (PA) catheter
Hi there! You’re about to have cardiac surgery, and we want to make sure you understand everything that’s happening. One of the tools we use to keep a close eye on your heart and lungs during and after surgery is called a Pulmonary Artery (PA) catheter, sometimes called a Swan-Ganz catheter. Think of it as a super-smart monitor that gives us real-time information.
What is a PA catheter?
A PA catheter is a thin, flexible tube that we gently place into a large vein in your neck, chest, or groin. It travels through your heart and into your pulmonary artery, which carries blood to your lungs. This allows us to measure:
  • Pressures in your heart and lungs: This helps us understand how well your heart is pumping and how your lungs are working.

  • Cardiac output: This tells us how much blood your heart is pumping per minute, which is vital information.

  • Oxygen levels: We can see how well your blood is being oxygenated.

  • Other important values: We can also check other critical values that help us manage your care.

Why do you need a PA catheter? For cardiac surgery, this information is invaluable! It helps us:

  • Ensure your heart is functioning optimally during and after the procedure.

  • Manage your fluids and medications precisely.

  • Detect any changes in your heart or lung function quickly.

  • Provide the best possible care tailored to your specific needs.

  • Help you recover as smoothly as possible.

What to expect during insertion

  • We’ll clean and numb the insertion site, so you won’t feel much discomfort.

  • You might feel a slight pressure as the catheter is inserted.

  • We’ll use ultrasound or fluoroscopy (a type of X-ray) to guide the catheter safely.

  • The whole process usually takes just a few minutes, and we’ll keep you comfortable.

What to expect after insertion

  • Once in place, the catheter will be secured, and we’ll monitor it closely.

  • You might feel a little restricted in your movement, but we’ll help you stay as comfortable as possible.

  • The Nursing staff will be monitoring the information from the catheter constantly.

  • The catheter will be removed when you are stable and it is no longer needed.

Your comfort and safety Your comfort and safety are our top priorities. We’ll:

  • Use sterile techniques to minimize the risk of infection.

  • Monitor you closely for any complications.

  • Keep you informed and answer any questions you have.

Working together for a successful recovery The PA catheter is a valuable tool that helps us provide you with the best possible care. By working together, we can ensure a smooth and successful recovery. We’re here to support you every step of the way! Questions? Please don’t hesitate to ask any questions you have. We’re here to help you understand and feel comfortable with your care. We are a team, and we will get through this together!

Annexure B (ChatGPT)

Patient education: Pulmonary Artery (PA) catheter insertion for cardiac surgery
What is a PA catheter? A Pulmonary Artery (PA) catheter is a special thin, flexible tube that helps your medical team closely monitor your heart and circulation during and after your cardiac surgery. It provides important information about how your heart is working, ensuring you receive the best possible care.
Why is a PA catheter needed? During cardiac surgery, your heart and lungs need close monitoring. The PA catheter helps by:
  • Measuring pressures in your heart and lungs

  • Checking how well your heart is pumping blood

  • Guiding fluid and medication management

  • Ensuring enough oxygen is reaching your organs

This real-time monitoring allows your doctors to make quick and precise adjustments to keep you safe and stable. How is the PA catheter inserted? The procedure is done before surgery, usually in the operating room or ICU, under local anesthesia and sedation so you are comfortable. Here’s what happens:

  1. A numbing medicine is applied to your neck, upper chest, or groin area.

  2. A thin tube (catheter) is inserted into a large vein and carefully advanced into your heart and lungs.

  3. The catheter is positioned in the pulmonary artery (a blood vessel leading to your lungs) using imaging guidance.

  4. Once in place, it stays secured during and after surgery for continuous monitoring.

The procedure is quick and safe, and you should feel minimal discomfort. Is the PA Catheter Safe? Yes! PA catheter insertion is a well-established and routine procedure in heart surgeries. Some potential risks include:

  • Minor bleeding or bruising at the insertion site

  • Temporary irregular heartbeats (which usually go away on their own)

  • Rarely, infection or damage to a blood vessel

Your medical team takes every precaution to prevent complications and ensure your safety. What happens after surgery?

  • The PA catheter remains in place for a short time after surgery to monitor your recovery.

  • You may feel a slight sensation at the insertion site, but it should not be painful.

  • Once your heart function stabilizes, the catheter is safely removed without discomfort.

What should I expect?

  • Before insertion: You may be given sedation to help you relax.

  • During placement: You will feel minimal discomfort as the numbing medicine works.

  • After surgery: The catheter will be monitored and removed when it is no longer needed.

Final Thoughts A PA catheter is an important tool that helps your doctors provide the best care for your heart during surgery. If you have any questions or concerns, your healthcare team is happy to explain everything in more detail. Your heart health is our priority, and we are here to support you every step of the way!

Acknowledgment

The authors thank the Department of Anesthesiology and Critical Care Medicine, AIIMS Nagpur, for support and facilities. We also acknowledge the contributions of domain experts who evaluated and validated the AI-generated educational content.

Funding Statement

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

REFERENCES

  1. Paterick TE, Patel N, Tajik AJ, Chandrasekaran K. Improving health outcomes through patient education and partnerships with patients. Proc (Bayl Univ Med Cent) 2017;30:112–3. doi: 10.1080/08998280.2017.11929552.
  2. Senoner T, Velik-Salchner C, Tauber H. The pulmonary artery catheter in the perioperative setting: Should it still be used? Diagnostics (Basel) 2022;12:177. doi: 10.3390/diagnostics12010177.
  3. Ziccardi RM, Khalid N. Pulmonary Artery Catheterization. Treasure Island (FL): StatPearls Publishing; 2023.
  4. Dong H, Lio J, Sherer R, Jiang I. Some learning theories for medical educators. Med Sci Educ. 2021;31:1157–72. doi: 10.1007/s40670-021-01270-6.
  5. Javaid M, Haleem A, Singh RP. ChatGPT for healthcare services: An emerging stage for an innovative perspective. TBench. 2023;3:100105.
  6. Fattah FH, Salih AM, Salih AM, Asaad SK, Ghafour AK, Bapir R, et al. Comparative analysis of ChatGPT and Gemini (Bard) in medical inquiry: A scoping review. Front Digit Health. 2025;7:1482712. doi: 10.3389/fdgth.2025.1482712.
  7. Cè M, Chiarpenello V, Bubba A, Felisaz PF, Oliva G, Irmici G, et al. Exploring the role of ChatGPT in oncology: Providing information and support for cancer patients. BioMedInformatics. 2024;4:877–88.
  8. Lorenzi A, Pugliese G, Maniaci A, Lechien JR, Allevi F, Boscolo-Rizzo P, et al. Reliability of large language models for advanced head and neck malignancies management: A comparison between ChatGPT 4 and Gemini Advanced. Eur Arch Otorhinolaryngol. 2024;281:5001–6. doi: 10.1007/s00405-024-08746-2.
  9. Pagano S, Strumolo L, Michalk K, Schiegl J, Pulido LC, Reinhard J, et al. Evaluating ChatGPT, Gemini and other Large Language Models (LLMs) in orthopaedic diagnostics: A prospective clinical study. Comput Struct Biotechnol J. 2024;28:9–15. doi: 10.1016/j.csbj.2024.12.013.
  10. Gummadi R, Dasari N, Kumar DS, Pindiprolu SK. Evaluating the accuracy of large language model (ChatGPT) in providing information on metastatic breast cancer. Adv Pharm Bull. 2024;14:499–503. doi: 10.34172/apb.2024.060.
  11. Stalp JL, Denecke A, Jentschke M, Hillemanns P, Klapdor R. Quality of ChatGPT-generated therapy recommendations for breast cancer treatment in gynecology. Curr Oncol. 2024;31:3845–54. doi: 10.3390/curroncol31070284.
  12. Phillips V, Kiryakoza F, Arefin S, Choudhary N, Garifullin R. ChatGPT versus Google Gemini: A comparison to evaluate patient education guide created on common neurological disorders. Discov Artif Intell. 2024;4:79.
  13. Haver HL, Gupta AK, Ambinder EB, Bahl M, Oluyemi ET, Jeudy J, et al. Evaluating the use of ChatGPT to accurately simplify patient-centered information about breast cancer prevention and screening. Radiol Imaging Cancer. 2024;6:e230086. doi: 10.1148/rycan.230086.
  14. Wei Q, Yao Z, Cui Y, Wei B, Jin Z, Xu X. Evaluation of ChatGPT-generated medical responses: A systematic review and meta-analysis. J Biomed Inform. 2024;151:104620. doi: 10.1016/j.jbi.2024.104620.
  15. Kotmungkun S, Chompurach W, Thaksanan P. OpenAI ChatGPT vs Google Gemini: A study of AI chatbots’ writing quality evaluation and plagiarism checking. Engl Lang Teach Educ J. 2024;7:90–108.
  16. Mheidly N. Unleashing the power of AI: Assessing the reliability of ChatGPT in disseminating breast cancer awareness. J Educ Health Promot. 2024;13:172. doi: 10.4103/jehp.jehp_1033_23.
  17. Masalkhi M, Ong J, Waisberg E, Lee AG. Google DeepMind’s Gemini AI versus ChatGPT: A comparative analysis in ophthalmology. Eye (Lond) 2024;38:1412–7. doi: 10.1038/s41433-024-02958-w.
  18. Gondode P, Duggal S, Garg N, Lohakare P, Jakhar J, Bharti S, et al. Comparative analysis of accuracy, readability, sentiment, and actionability: Artificial intelligence Chatbots (ChatGPT and Google Gemini) versus traditional patient information leaflets for local anesthesia in eye surgery. Br Ir Orthopt J. 2024;20:183–92. doi: 10.22599/bioj.377.
