Abstract
Background
Patient comprehension of spine MRI reports remains a significant challenge, potentially affecting healthcare engagement and outcomes. Artificial Intelligence (AI) may offer a solution by translating complex medical terminology into lay language.
Objective
To evaluate the effectiveness of AI-based interpretation of spine MRI reports in improving patient comprehension and satisfaction.
Methods
A prospective, single-center survey study was conducted at our institution's multidisciplinary pain and spine clinics from May 2024 to November 2024, enrolling 102 adult patients scheduled for spine MRI. Imaging reports were interpreted using a single AI-based large language model (LLM) operated securely within the hospital's network, with interpretations independently reviewed by healthcare providers and research coordinators. A board-certified neuroradiologist evaluated the accuracy of the AI interpretations on a standardized 5-point scale. We analyzed survey responses from participants who received both their original MRI reports and the AI-interpreted versions, comparing comprehension, clarity, engagement, and satisfaction.
Results
Participants reported higher comprehension with AI-interpreted MRI reports versus original radiology reports (8.50 ± 1.91 vs 6.56 ± 2.42; P < .001). AI interpretations received superior scores for clarity (8.57 ± 1.79 vs 6.96 ± 2.12; P < .001), understanding of medical conditions (7.75 ± 2.18 vs 6.27 ± 2.28; P < .001), and healthcare engagement (8.35 ± 2.00 vs 6.78 ± 2.48; P < .001). Accuracy assessment showed that 82.4 % of AI interpretations achieved high-quality ratings (≥4) [95 % CI: 69.7 %–90.4 %], while 92.2 % were rated acceptable (≥3). Most participants (54.0 %) assigned the highest possible recommendation score to AI interpretation. No significant differences were found across age groups or between genders.
Conclusions
AI-based interpretation of spine MRI reports significantly improved patient comprehension and satisfaction. Despite the promise of rapidly evolving AI-based technologies, a nontrivial proportion of AI interpretations contained inaccuracies or omissions, warranting further research.
Keywords: Magnetic resonance imaging, Artificial intelligence, Patient education, Natural language processing, Patient satisfaction, Healthcare communication, Radiology reports, Patient engagement, Spine imaging
1. Introduction
Magnetic Resonance Imaging (MRI) is the diagnostic gold standard for spinal pathology evaluation, providing high-resolution anatomical detail that guides clinical decision-making in routine medical practice [1]. While standard radiology reports (SRR) effectively communicate findings among healthcare providers, direct patient access to these reports through modern healthcare information systems exposes them to complex medical terminology that poses barriers to comprehension, potentially impeding effective patient-provider communication [2,3].
The growing accessibility of SRR has led to significant comprehension challenges faced by patients, resulting in increased anxiety and decreased satisfaction with care [[4], [5], [6]]. A systematic review of 13 studies identified that comprehension of medical terminology and technical language constitutes a significant barrier to patient understanding of SRR, frequently resulting in compensatory information-seeking behaviors and risk of misinterpretation [7].
To address these challenges, various strategies have been implemented to enhance patient comprehension and reduce anxiety associated with SRR interpretation, with translation into lay language emerging as a particularly effective approach [8]. In this context, Artificial Intelligence (AI) and Natural Language Processing (NLP) demonstrate promising potential for improving healthcare communication by enhancing medical information accessibility and understanding [9]. AI-based large language models (LLMs) can generate human-like text and can analyze, summarize, and interpret human language using artificial neural networks [2,9]. While these systems show promise across various radiological subspecialties, their application to spine MRI interpretation remains understudied.
This study aims to assess whether AI-assisted language simplification and interpretation of SRR of the spine can serve as a complementary tool to support patient-provider communication and improve the overall care experience.
2. Methods
2.1. Study design and population
This prospective, single-center survey study, approved by Mayo Clinic's Institutional Review Board (#23-009415), was conducted from May 2024 to November 2024. We enrolled adults (aged ≥18 years) scheduled for spinal MRI ordered by providers in our institution's Pain Medicine Department or Spine Center. The study excluded patients with reports indicating catastrophic diagnoses (e.g., metastasis or new cancer diagnoses).
2.2. AI translation protocol
AI-assisted interpretation of SRR was performed using Microsoft Copilot through the hospital's secure network, with all chat interactions deleted after content extraction to maintain data security. A standardized prompt was used for all SRR: "Please provide a translation that explains these findings in a straightforward and compassionate manner, suitable for patients without medical background." Before distribution, each AI-translated report underwent independent review by the patient's healthcare provider and a research coordinator to verify accuracy and appropriateness.
2.3. Data collection
Participants accessed their SRR through the institutional health portal, the standard clinical communication channel within the healthcare system. After a participant had reviewed the original report, researchers sent the AI-interpreted SRR through the electronic medical record message portal, accompanied by a survey link. Participants then evaluated their experience using a 25-item survey on a 10-point Likert scale assessing report comprehension and satisfaction metrics. The complete questionnaire is available as supplementary material.
2.4. Quality assurance
Two independent reviewers (the assigned healthcare provider and a research coordinator) evaluated each AI translation against its original report for accuracy; all AI interpretations met quality standards without requiring modification. Interpretation length varied with the complexity and detail of the original SRR. A board-certified neuroradiologist then independently rated the accuracy of the AI-interpreted MRI reports on a 5-point scale (1 = poor, 5 = excellent) adapted from a previous study [10]. The evaluation criteria focused on preservation of key diagnostic findings, accuracy of anatomical descriptions, and maintenance of clinically relevant details. The complete accuracy assessment scale and detailed rating methodology are provided in the supplementary material.
2.5. Statistical analysis
Descriptive statistics summarized participant demographics. The Shapiro-Wilk test assessed the distribution normality of outcome measures. Due to non-normal distributions (all P < .05), we employed Wilcoxon signed-rank tests to compare ratings between original and AI-translated reports across understanding, clarity, impact, and engagement domains. We applied the Bonferroni correction, adjusting the significance threshold to α = .05/8 = 0.0063 for our primary analyses of eight outcome measures.
For secondary analyses, we used Kruskal-Wallis tests to examine age group differences and Mann-Whitney U tests to assess gender differences in preferences for future medical reports and the likelihood of recommending AI interpretation. All outcome measures used 10-point scales (range, 0–10), with higher scores indicating a stronger preference for AI-translated reports.
Statistical tests were two-sided, with P < .05 indicating significance; secondary analyses were not adjusted for multiple comparisons. We performed all analyses using R version 4.4.2.
3. Results
Of the 102 patients recruited, 95 eligible participants received the questionnaire, with 51 completing the study (53.7 % response rate; Fig. 1). The cohort included 26 men (51.0 %) and 25 women (49.0 %), with 21 participants (41.2 %) aged 71–80 years. Most participants (56.9 %) held bachelor's degrees or higher (Table 1).
Fig. 1.
Participant flow diagram.
Table 1.
Baseline demographic characteristics of study participants.
| Age, years | n = 51 |
|---|---|
| 31–40 | 1 (2.0 %) |
| 41–50 | 2 (3.9 %) |
| 51–60 | 11 (21.6 %) |
| 61–70 | 14 (27.5 %) |
| 71–80 | 21 (41.2 %) |
| Over 80 | 2 (3.9 %) |
| Gender | |
| Female | 25 (49.0 %) |
| Male | 26 (51.0 %) |
| Education level | |
| No formal education | 1 (2.0 %) |
| High school diploma or equivalent | 8 (15.7 %) |
| Some college, no degree | 8 (15.7 %) |
| Associate's degree | 5 (9.8 %) |
| Bachelor's degree | 15 (29.4 %) |
| Master's degree | 11 (21.6 %) |
| Doctoral degree | 3 (5.9 %) |
AI-interpreted MRI reports demonstrated superior performance across all evaluation metrics compared with original reports (Table 2). Radiologist evaluation of AI interpretation accuracy revealed that 82.4 % of reports (42/51) achieved high-quality ratings (≥4) [95 % CI: 69.7 %–90.4 %], with 51.0 % receiving the highest rating of 5 (excellent) and 31.4 % rated as 4 (good). Only 9.8 % of reports received an average rating (3), while 7.9 % were rated below average or poor (≤2). Scores below 3 resulted from omissions of findings deemed important and potentially relevant to patient symptoms, rather than from inaccuracy of the findings that were reported.
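The paper does not state which interval method produced the reported confidence bounds; they are consistent with the Wilson score interval for 42 of 51 reports, as this minimal Python sketch shows:

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - margin, center + margin

# 42 of 51 AI interpretations rated >= 4 (high quality)
lo, hi = wilson_ci(42, 51)
print(f"{lo:.1%} - {hi:.1%}")  # 69.7% - 90.4%
```

The Wilson interval is generally preferred over the simpler Wald interval for proportions near 0 or 1 and for modest sample sizes such as n = 51.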
Table 2.
Comparative analysis of patient responses between SRR and AI-Interpreted MRI spine reports.
| Metric | Original Reports | AI Reports | Mean Difference | p-value∗ |
|---|---|---|---|---|
| Understanding of MRI report | 6.56 ± 2.42 (5.90–7.22) | 8.50 ± 1.91 (7.98–9.02) | 1.94 | <0.001 |
| Clarity of MRI report | 6.96 ± 2.12 (6.38–7.54) | 8.57 ± 1.79 (8.08–9.06) | 1.61 | <0.001 |
| Understanding of medical condition | 6.27 ± 2.28 (5.64–6.90) | 7.75 ± 2.18 (7.15–8.35) | 1.48 | <0.001 |
| Preferred method to receive reports | 6.84 ± 2.49 (6.16–7.52) | 8.41 ± 1.97 (7.87–8.95) | 1.57 | 0.008 |
| Engagement with healthcare journey | 6.78 ± 2.48 (6.10–7.46) | 8.35 ± 2.00 (7.80–8.90) | 1.57 | <0.001 |
| Clarity of medical terms | 6.47 ± 2.48 (5.79–7.15) | 8.71 ± 1.66 (8.25–9.17) | 2.24 | <0.001 |
| Completeness of information | 7.39 ± 2.09 (6.82–7.96) | 8.51 ± 2.07 (7.94–9.08) | 1.12 | 0.008 |
| Overall satisfaction | 7.24 ± 2.06 (6.67–7.81) | 8.63 ± 1.97 (8.09–9.17) | 1.39 | 0.008 |
Data are presented as mean ± SD (95 % CI). ∗p-values are Bonferroni-corrected for multiple comparisons (significance level adjusted to α = .05/8 = 0.0063).
Patients reported significantly higher comprehension scores for AI-interpreted reports versus SRR (8.50 ± 1.91 vs 6.56 ± 2.42; P < .001). AI interpretation enhanced report clarity (8.57 ± 1.79 vs 6.96 ± 2.12; P < .001), patient understanding of their medical condition (7.75 ± 2.18 vs 6.27 ± 2.28; P < .001), and healthcare engagement (8.35 ± 2.00 vs 6.78 ± 2.48; P < .001).
Analysis of patient perceptions regarding the use of AI revealed high patient confidence in discussing conditions and treatments (8.22 ± 1.97), with 39.2 % of participants assigning maximum scores of 10 (Fig. 2). Patients strongly preferred AI-interpreted reports for future use (8.35 ± 2.11; 45.1 % maximum scores) and demonstrated a high likelihood of recommending the service (8.70 ± 1.97; 54.0 % maximum scores). These preferences remained consistent across age groups (P = .32) and genders (P = .62). AI-translated reports also significantly outperformed original SRR in information completeness (8.51 ± 2.07 vs 7.39 ± 2.09; P < .001) and overall satisfaction (8.63 ± 1.97 vs 7.24 ± 2.06; P < .001) (Fig. 3).
Fig. 2.
Participant response analysis: satisfaction, communication confidence, and recommendation intent for AI-interpreted reports. No significant differences were found across gender or age groups (P > .05).
Fig. 3.
Accuracy of AI-Interpreted MRI reports.
4. Discussion
We found that MRI reports can be successfully interpreted via AI-based LLMs into patient-friendly formats. Participants' responses showed that AI-interpreted results can improve their understanding of medical reports. Specifically, the AI-generated translations enhanced patients' comprehension of complex medical information, providing a more accessible format for interpreting diagnostic findings. Our findings suggest that integrating AI-assisted interpretation of SRR may help reduce the barrier between technical medical language and patient comprehension. Consistent with our results, a multicenter analysis of 685 spine MRI reports showed that AI-assisted SRR interpretation can improve comprehension of medical findings: understanding scores among non-physician raters increased significantly from 2.71 ± 0.73 for original reports to 4.69 ± 0.48 for AI-generated versions (P < .001) [10]. Given previous studies demonstrating AI's ability to simplify medical language, our positive findings may have been expected. However, while our participants reported improved understanding of their medical conditions and increased healthcare engagement after reviewing the AI-interpreted reports, important questions remain about the depth of medical understanding. AI interpretations, while improving accessibility, may not fully capture nuanced clinical details such as pathophysiological mechanisms, disease progression, prognosis, and long-term implications. Moreover, the potential for patients to seek additional information using AI-generated terms raises concerns about misinformation, as the simplified language might lead to incomplete or misleading internet searches.
A notable observation from our analysis was that participants felt more confident about discussing their medical conditions using AI-interpreted reports. This could improve communication and strengthen shared decision-making and treatment compliance. Broader studies of AI in healthcare report similar benefits, highlighting its ability to enhance clinician-patient communication and support collaborative care [10,11]. Patient engagement is critical in spinal condition management, affecting compliance with conservative treatments and surgical decision-making. Our findings provide evidence that AI-assisted interpretation may help enhance patient comprehension, addressing existing communication barriers in healthcare. A study by Krysa et al. highlights the importance of understanding, patient engagement, and trust in improving outcomes for patients with spinal disorders: patients were more likely to gain motivation to participate in their spinal rehabilitation when they felt empowered and well-informed, emphasizing the importance of active patient engagement. In contrast, patients who reported not being involved in decision-making lost trust in their health providers, which hampered their rehabilitation progress [12].
Participants in our study demonstrated a notably high likelihood of recommending AI interpretation, with 54.0 % of respondents giving the maximum recommendation score. This highlights participants' readiness to adopt this innovative measure, possibly driven by the significantly higher perceived completeness of, and overall satisfaction with, the AI-generated reports.
The neuroradiologist's evaluation showed that 82.4 % of AI interpretations achieved high-quality ratings (≥4) and 92.2 % were acceptable (≥3), demonstrating strong clinical accuracy. Interestingly, while radiologist reports typically follow a standardized format and terminology, AI interpretations demonstrated notable variability in presentation style despite identical prompts and the same AI model. Some interpretations favored bullet points and clear hierarchical organization, while others used a more narrative, conversational approach. This heterogeneity in AI output formatting may affect how patients comprehend, engage with, and prefer medical information, suggesting the need for studies exploring the impact of AI-interpreted SRR formatting on patient preferences and comprehension of MRI findings.

Challenges certainly exist, such as ethical concerns surrounding the use of AI in disseminating health information and the need for further research to evaluate AI's broader applicability across diverse imaging modalities, patient cases, and age groups. Our findings suggest a promising role for AI in enhancing the patient experience by providing a feasible way to generate reports in "layman's terms" that patients may prefer. If the known ethical challenges concerning the use of AI in medicine are successfully navigated, such tools may improve patient understanding and promote active engagement.
5. Limitations
This study has several significant limitations. First, there is an inherent bias in comparing AI-translated reports to SRR, as the latter were written in technical language intended for healthcare providers rather than patients. This fundamental difference in intended audience may have influenced participants' satisfaction ratings. Second, our single-center design, with recruitment limited to our pain medicine and spine clinics, limits generalizability to other clinical environments. The study population was predominantly older and highly educated, and their experiences may not represent other patient populations. The 53.7 % response rate could have introduced selection bias, potentially favoring participants more comfortable with technology or more engaged in their healthcare. Third, methodological limitations include the lack of randomization in report presentation order (the original always preceded the AI translation), variable time intervals between participants reading the original and AI-translated reports, the absence of an objective comprehension assessment, reliance on subjective Likert scales for all outcomes, and no control for potential recall bias when comparing reports. Fourth, technological constraints merit consideration: using a single AI model (Copilot) within a secure hospital network may not reflect the performance of other AI translation tools, and while three independent reviewers assessed each translation for accuracy, only one was an experienced radiologist. Finally, we did not assess long-term outcomes, such as the impact on patient-provider communication or healthcare decision-making. The absence of follow-up data limits our understanding of the sustained utility of AI-translated reports in clinical practice.
6. Conclusion
Translating complex radiology reports into layman's terms using AI appears to enhance patient understanding of medical findings and improve satisfaction with medical communication. However, this study is limited in its ability to recommend the integration of AI-interpreted radiology reports into routine medical practice, as a proportion of reports fell short of radiologist accuracy standards. While AI shows promise, additional research and technological advancement will be necessary before healthcare providers can confidently rely on AI technologies for SRR interpretation.
Declaration of competing interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: CLH reports a relationship with Nevro that includes funding grants. All other authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Footnotes
Supplementary data to this article can be found online at https://doi.org/10.1016/j.inpm.2025.100550.
References
1. Florkow M.C., et al. Magnetic resonance imaging versus computed tomography for three-dimensional bone imaging of musculoskeletal pathologies: a review. J Magn Reson Imag. 2022;56(1):11–34. doi: 10.1002/jmri.28067.
2. Li H., et al. Decoding radiology reports: potential application of OpenAI ChatGPT to enhance patient understanding of diagnostic reports. Clin Imag. 2023;101:137–141. doi: 10.1016/j.clinimag.2023.06.008.
3. Jeblick K., et al. ChatGPT makes medicine easy to swallow: an exploratory case study on simplified radiology reports. Eur Radiol. 2024;34(5):2817–2825. doi: 10.1007/s00330-023-10213-1.
4. Alpert J.M., et al. Patient access to clinical notes in oncology: a mixed method analysis of oncologists' attitudes and linguistic characteristics towards notes. Patient Educ Counsel. 2019;102(10):1917–1924. doi: 10.1016/j.pec.2019.05.008.
5. Alarifi M., et al. Understanding patient needs and gaps in radiology reports through online discussion forum analysis. Insights Imaging. 2021;12(1):50. doi: 10.1186/s13244-020-00930-2.
6. Delić D., et al. Anxiety of patients at magnetic resonance imaging screening. Psychiatr Danub. 2021;33(Suppl 4):762–767.
7. Rogers C., et al. Patient experience of imaging reports: a systematic literature review. Ultrasound. 2023;31(3):164–175. doi: 10.1177/1742271X221140024.
8. van der Mee F.A.M., et al. The impact of different radiology report formats on patient information processing: a systematic review. Eur Radiol. 2024. doi: 10.1007/s00330-024-11165-w.
9. Sallam M. ChatGPT utility in healthcare education, research, and practice: systematic review on the promising perspectives and valid concerns. Healthcare (Basel). 2023;11(6). doi: 10.3390/healthcare11060887.
10. Park J., et al. Patient-centered radiology reports with generative artificial intelligence: adding value to radiology reporting. Sci Rep. 2024;14(1). doi: 10.1038/s41598-024-63824-z.
11. Lee S., et al. Artificial intelligence in spinal imaging and patient care: a review of recent advances. Neurospine. 2024;21(2):474–486. doi: 10.14245/ns.2448388.194.
12. Krysa J.A., et al. Empowerment, communication, and navigating care: the experience of persons with spinal cord injury from acute hospitalization to inpatient rehabilitation. Front Rehabil Sci. 2022;3. doi: 10.3389/fresc.2022.904716.