ABSTRACT
Background
Artificial intelligence (AI) chatbots perform well in answering English cancer questions. Their performance in Spanish is unknown and may differ between free and paywall versions.
Methods
We evaluated the quality (range: 1–5 points), actionability (range: 0–100%), and readability (range: 1–13 grades) of six popular AI chatbots in responding to the 15 most searched Spanish questions regarding breast, prostate, and colon cancer.
Results
Overall, the quality of AI chatbot responses was good (mean [95% CI]: 3.5 [3.4–3.6] points), while actionability was low (mean [95% CI]: 35.6% [30.8%–40.3%]). Readability was at the high‐school level (mean [95% CI]: 9.2 [8.8–9.6] grades), not concordant with the American Medical Association recommendation (≤ 6th grade). Quality, actionability, and readability did not differ between free and paywall versions (p > 0.05).
Conclusion
Our findings suggest that AI chatbots can generate good‐quality responses to Spanish cancer questions, regardless of free or paywall version. However, further improvement in actionability and readability is needed to benefit Spanish‐speaking patients.
Keywords: artificial intelligence, health literacy, Hispanic Americans, natural language processing, neoplasms, patient education
Artificial intelligence (AI) chatbots are trained predominantly on English materials, and their performance in answering Spanish cancer questions is unknown. After comparing the six most popular AI chatbots, we found their answers were generally accurate but hard to read, and that they need further improvement in supporting patients making cancer‐related decisions.

1. Introduction
After cancer diagnosis, 60% of patients in the US seek online information about cancer symptoms, diagnoses, and treatments [1], and artificial intelligence (AI) chatbots are increasingly recognized as a paradigm shift in how patients access such information [2]. Prior studies suggested that AI chatbots performed well in answering English cancer‐related questions [3, 4, 5, 6, 7], likely because they were predominantly trained on English materials [8, 9]. In addition to free versions, AI companies also offer subscription‐based paywall versions that claim enhanced capabilities. Thus, there have been increasing concerns that the linguistic dominance of English and the paywall barriers of AI chatbots may exacerbate existing disparities in cancer‐related health literacy for minority populations.
Spanish is the most spoken language after English in the US and is spoken by 94% of US Hispanics/Latinos, the largest US minority population [9]. Few studies have evaluated the performance of AI chatbot responses to Spanish cancer questions. Therefore, we assessed the quality, actionability, and readability of responses from six popular AI chatbots (free and paywall versions) to the most searched Spanish cancer questions in the US.
2. Methods
This cross‐sectional study focused on the top 3 most common cancers diagnosed in US Hispanics/Latinos: cancer de mama (breast cancer), cancer de prostata (prostate cancer), and cancer de colon (colon cancer) [10]. From Google Trends (search time range: January 1, 2020 to January 1, 2024), we identified the top 5 most searched Spanish questions in the US associated with each of these cancers (Supporting Information Methods and Table S1). On March 20, 2024, we entered these 15 questions into the free and paywall versions of ChatGPT (OpenAI Inc.), Claude (Anthropic PBC), and Gemini (Google LLC) available at that time: ChatGPT 3.5 (free), ChatGPT 4.0 (paywall), Claude 3 Sonnet (free), Claude 3 Opus (paywall), Gemini 1.0 (free), and Gemini 1.0 Advanced (paywall). ChatGPT, Claude, and Gemini were among the most commonly used AI chatbots when our study was conducted [11]. Of the 90 generated Spanish responses, we provide a sample for illustration (Table 1). This study obtained Institutional Review Board approval and followed the Standards for Reporting Qualitative Research (SRQR) reporting guideline.
TABLE 1.
Examples of free and paywall versions of AI chatbot responses to a most searched Spanish question regarding breast cancer (cancer de mama).
| AI chatbot type | Response to “Cancer de mama sintomas (breast cancer symptoms)” |
|---|---|
| ChatGPT 3.5 (free) |
Los síntomas del cáncer de mama pueden variar de una persona a otra, y algunas personas pueden no experimentar síntomas en absoluto. Sin embargo, algunos síntomas comunes del cáncer de mama incluyen:
Es importante recordar que estos síntomas no siempre indican cáncer de mama, y muchas veces pueden ser causados por afecciones benignas. Sin embargo, cualquier cambio o síntoma inusual en la mama debe ser evaluado por un médico para determinar su causa y recibir un diagnóstico adecuado. La detección temprana puede mejorar significativamente las posibilidades de un tratamiento exitoso. |
| ChatGPT 4.0 (paywall) |
Los síntomas del cáncer de mama pueden variar de una persona a otra, y en algunos casos, el cáncer de mama puede ser asintomático, especialmente en sus etapas iniciales. Sin embargo, hay varios síntomas y signos que pueden indicar la presencia de cáncer de mama. Estos incluyen:
Es importante recordar que estos síntomas no significan definitivamente que una persona tenga cáncer de mama, ya que muchos de ellos pueden ser causados por condiciones benignas. Sin embargo, si se detecta alguno de estos síntomas, es crucial consultar a un médico para realizar un diagnóstico y, si es necesario, comenzar el tratamiento lo antes posible. La detección temprana es clave para mejorar las posibilidades de tratamiento exitoso. |
| Claude 3 Sonnet (free) |
Los síntomas más comunes del cáncer de mama incluyen:
Es importante tener en cuenta que estos síntomas también pueden presentarse en condiciones benignas de la mama. Sin embargo, si notas algún cambio inusual, es recomendable consultar a un profesional médico para un examen y evaluación adecuados. La detección temprana a través del autoexamen mamario y las mamografías periódicas aumenta significativamente las probabilidades de un tratamiento exitoso. |
| Claude 3 Opus (paywall) |
Los síntomas del cáncer de mama pueden variar de una persona a otra, y algunos tipos de cáncer de mama pueden no presentar síntomas evidentes en las primeras etapas. Sin embargo, los siguientes son algunos de los síntomas más comunes:
Es importante recordar que estos síntomas también pueden ser causados por condiciones benignas. Sin embargo, si notas alguno de estos cambios, es crucial que consultes a un médico para un diagnóstico adecuado. La detección temprana es clave para un tratamiento exitoso del cáncer de mama. |
| Gemini 1.0 (free) |
Es importante estar familiarizada con los posibles síntomas del cáncer de mama para que pueda detectarlo lo antes posible. La detección temprana del cáncer de mama es fundamental para un tratamiento eficaz. Aquí están algunos de los síntomas más comunes del cáncer de mama:
(https://www.healthline.com/health/breast‐cancer/warning‐signs#early‐signs)
Es importante tener en cuenta que estos síntomas también pueden ser causados por otras afecciones benignas (no cancerosas). Pero si experimenta alguno de estos síntomas, es importante que consulte a su médico para que lo revise. La Sociedad Americana del Cáncer [American Cancer Society] recomienda que todas las mujeres con un riesgo promedio de cáncer de mama se realicen una mamografía de detección a partir de los 40 años. Una mamografía es una radiografía de la mama que puede detectar cambios en el tejido mamario que podrían ser cancerosos. |
| Gemini 1.0 Advanced (paywall) |
Aquí te detallo los síntomas más comunes del cáncer de mama. Es importante recalcar que estos pueden variar entre personas, y que algunas podrían no presentarlos en las etapas iniciales de la enfermedad.
Síntomas frecuentes del cáncer de mama:
¿Qué hacer si noto algún síntoma? Si detectas alguno de estos síntomas, es crucial que consultes con tu médico lo antes posible. Aunque no siempre indican cáncer de mama, es importante hacer una evaluación profesional para descartar cualquier problema grave. La detección temprana es vital para mejorar las probabilidades de un tratamiento exitoso. ¡Recuerda! La autoexploración mamaria es importante para familiarizarte con tus senos y notar cambios. Sin embargo, no sustituye las mamografías regulares, que son la mejor forma de detectar el cáncer de mama en estadios tempranos cuando aún es muy tratable. |
For each response, quality was assessed using the DISCERN instrument (16 questions; overall score ranging from 1 [low quality] to 5 [high quality]) [12], actionability using the Patient Education Materials Assessment Tool (7 questions; overall score ranging from 0 [no clear action suggestions] to 100% [clear action suggestions]) [13], and readability using the Fernández Huerta Reading Grade Level (calculated using a mathematical formula; overall score ranging from 1 [1st grade] to 13 [college]) [14]. Specifically, actionability represents the degree to which AI chatbot responses may help individuals with diverse backgrounds and varying levels of health literacy identify specific actions to take.
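The Fernández Huerta index is a Flesch‐style formula for Spanish: it rewards shorter words and shorter sentences, and the resulting score is then mapped to a grade band. As an illustration (the study's analyses were run in R; the counts below are hypothetical, and the score‐to‐grade conversion uses published band tables not reproduced here), the score can be computed directly from syllable, word, and sentence counts:

```python
def fernandez_huerta(total_syllables: int, total_words: int, total_sentences: int) -> float:
    """Fernández Huerta readability score for Spanish text.

    Flesch-style formula: higher scores indicate easier text.
    score = 206.84 - 60 * (syllables per word) - 1.02 * (words per sentence)
    """
    syllables_per_word = total_syllables / total_words
    words_per_sentence = total_words / total_sentences
    return 206.84 - 60 * syllables_per_word - 1.02 * words_per_sentence

# Hypothetical passage: 150 syllables across 100 words in 8 sentences.
score = fernandez_huerta(150, 100, 8)
# 206.84 - 60*1.5 - 1.02*12.5 = 104.09 (an easy, low-grade-level text)
```

Longer words and longer sentences lower the score, which corresponds to a higher reading grade level on the 1–13 scale used in this study.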
Two board‐certified oncologists (J.D.A. and C.B.‐R.), both native Spanish speakers, were blinded to the AI chatbots and independently assessed the responses. Raters completed a structured training process including (1) an overview of the research purpose and specific aims; (2) a didactic review of the assessment tools; (3) sample practice and calibration; (4) inter‐rater reliability assessment; (5) discussion of common challenges and conflict resolution; and (6) standard operating procedures for assessing all responses. During pilot testing, we used the most searched Spanish questions for liver cancer (cancer de higado) as an example and generated the related responses for raters' practice. Any discrepancies were referred to another investigator (E.C.) and discussed among the three investigators to reach consensus. Each oncologist rating responses had more than 10 years of clinical experience, and the intraclass correlation coefficient (ICC) was 0.91, indicating excellent agreement [15].
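The paper reports an ICC of 0.91 but does not state which ICC form was used. As an illustration only, a two‐way, single‐rater consistency ICC (ICC(3,1) in the Koo and Li taxonomy [15]) can be computed from a subjects‐by‐raters score matrix; the sketch below is in Python with hypothetical rater scores, not the study's data:

```python
import numpy as np

def icc_3_1(ratings: np.ndarray) -> float:
    """Two-way mixed-effects, single-rater consistency ICC(3,1).

    `ratings` is an (n_subjects, k_raters) matrix of scores.
    """
    n, k = ratings.shape
    grand_mean = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-subject means
    col_means = ratings.mean(axis=0)   # per-rater means

    ss_rows = k * ((row_means - grand_mean) ** 2).sum()   # between-subject
    ss_cols = n * ((col_means - grand_mean) ** 2).sum()   # between-rater
    ss_total = ((ratings - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)

# Hypothetical DISCERN scores from two raters over five responses.
scores = np.array([[3.0, 3.2], [4.1, 4.0], [2.8, 3.0], [4.5, 4.4], [3.6, 3.7]])
```

With close agreement between raters, as in the hypothetical matrix above, the ICC approaches 1; values above 0.90 are conventionally interpreted as excellent agreement [15].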
To compare free vs. paywall versions across AI chatbots, we used linear mixed‐effect models to assess differences in quality, actionability, and readability (Supporting Information Methods). To compare free vs. paywall versions within each chatbot type (ChatGPT, Claude, or Gemini), we used paired t‐tests. Estimates are presented as means with 95% confidence intervals (CI), and two‐sided p < 0.05 was considered statistically significant. All analyses were conducted using R (Version 4.3.1) from September 27, 2024, to October 16, 2024.
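The within‐chatbot comparison is paired by design: the same 15 questions are answered by both the free and paywall versions, so the test operates on within‐question differences rather than on two independent samples. A minimal sketch of that design (the study itself used R; the scores below are simulated, not the study's data):

```python
import numpy as np
from scipy import stats

# Simulated DISCERN quality scores for the same 15 questions answered by
# a chatbot's free and paywall versions (values illustrative only).
rng = np.random.default_rng(0)
free = rng.normal(loc=3.5, scale=0.3, size=15)
paywall = free + rng.normal(loc=0.05, scale=0.2, size=15)

# Paired t-test: each question contributes one matched pair, so the test
# is on the per-question (free - paywall) differences.
t_stat, p_value = stats.ttest_rel(free, paywall)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

The pooled comparison across all three chatbots additionally needs the mixed‐effect model described in the Supporting Information, since responses from the same question and the same chatbot family are correlated and require random effects rather than a simple t‐test.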
3. Results
Of the 90 AI chatbot responses (Figure 1), overall quality was good (mean [95% CI]: 3.5 [3.4–3.6] points), although actionability was low (mean [95% CI]: 35.6% [30.8%–40.3%]). Readability was at the high school level (mean [95% CI]: 9.2 [8.8–9.6] grades), and the average response length was 301 (95% CI: 283–319) words. Quality, actionability, and readability did not differ significantly between free and paywall versions (p > 0.05), although paywall versions provided lengthier responses (p < 0.001).
FIGURE 1.

Comparisons (means [95% confidence intervals]) of free versus paywall versions in responding to the most searched Spanish cancer questions. Dots refer to means, and error bars refer to 95% confidence intervals. Values (means [95% confidence intervals]) for overall, free, and paywall versions are provided in the plots. The symbol * indicates a statistically significant difference (p < 0.001).
Similar findings were observed within each AI type (Table S2). However, the paywall version of ChatGPT performed worse than its free version in actionability (mean [95% CI]: 22.7% [10.2%–35.1%] vs. 44.0% [33.6%–54.4%]; p = 0.006). In contrast, a moderate increase in actionability was observed for the paywall version of Claude (p = 0.02), although its actionability remained low (mean [95% CI]: 33.3% [21.7%–44.9%]). Despite these differences, all six AI chatbots provided generally accurate information for the most searched Spanish cancer‐related questions. The responses were not readily actionable and were written at the high‐school level, which is not concordant with the American Medical Association (AMA) recommendation (6th grade or lower). Performance did not improve markedly with paywall versions.
4. Discussion
We were initially concerned about the performance of AI chatbot responses to Spanish cancer questions: AI chatbots were predominantly trained on English materials, due to the abundance and accessibility of English corpora in biomedical science as well as technology, literature, and everyday communications [8, 9]. However, AI chatbots (ChatGPT 3.5 and 4) were reported to pass the Spanish Medical Residency Entrance Examination (MIR) and achieve 100% correctness in the Oncology Specialty section, although only four oncology questions were tested in the MIR [16]. Our study found that six popular AI chatbots generated good‐quality Spanish responses (mean [95% CI]: 3.5 [3.4–3.6] points), and such a score was comparable to prior studies of AI chatbots in responding to English cancer questions [3, 4, 5, 6, 7].
Despite good quality, we observed low actionability that may affect decision‐making and clinical actions among Hispanics/Latinos, such as seeking healthcare for cancer‐related symptoms and adhering to cancer treatment. Such low actionability was also observed in responses to English cancer questions [3], suggesting that AI chatbots may need general improvement in helping patients take appropriate actions for cancer management. Interestingly, we noticed the ChatGPT paywall version performed worse than its free version in actionability. Although this may be a chance finding, there is another possible explanation: a paywall version typically runs a larger and superior AI model requiring stronger medical‐safety guardrails, which may consequently lower actionability [17]. In addition, we found the responses were written at high‐school‐level readability, exceeding the AMA recommendation of sixth grade or lower; this is especially concerning considering that 7 in 10 foreign‐born Hispanics/Latinos finished high school or less [18]. This gap matters because patient care‐seeking behaviors and adherence to clinical recommendations can be notably affected by patient‐facing materials [19]. Improving readability would help reduce cancer health disparities and health literacy barriers among Hispanics/Latinos.
Initially, we were also concerned about the paywall as another barrier to accessing the most advanced AI technologies, especially for low‐income Hispanics/Latinos. However, we did not find that paywall versions markedly outperformed free versions, although they generated lengthier responses. Our findings align with a prior study using free and paywall ChatGPT versions to answer English cancer questions [5]. Text simplification should be considered in AI technology development while ensuring accurate and actionable answers to cancer questions.
In addition to answering cancer questions, there is increasing interest in applying AI chatbots to a variety of healthcare‐related domains (such as academic writing in biomedical research, mental health assessments, health promotion, and nutrition). However, the performance of AI chatbots is not always satisfactory and may vary across these domains, highlighting the need for supervision and improvement from both healthcare professionals and AI developers.
To our knowledge, this is the first study to evaluate the performance of AI chatbots in responding to the most searched Spanish cancer questions. One unique strength of our study is that board‐certified oncologists who are native Spanish speakers independently assessed all responses. However, our study has several limitations. First, we limited our study to the most searched Spanish questions in the US according to Google Trends, because the most searched queries are not made publicly available by AI chatbot providers. Second, we primarily focused on the top 3 most common cancers in US Hispanics/Latinos, and future studies could consider a full spectrum of cancer types. Third, AI chatbots are rapidly evolving and their performance may improve with appropriate prompts [5]; thus, future studies should consider testing more AI chatbots and a variety of prompts that may help Spanish‐speaking patients obtain better responses.
5. Conclusion
In conclusion, our findings suggest that AI chatbots have the potential to generate good‐quality responses to Spanish cancer questions, regardless of free or paywall version. However, their responses need improved actionability and readability to further benefit Spanish‐speaking patients with cancer. In the era of digital health, further research is needed to investigate the impact of AI chatbot implementation on health disparities across minority populations.
Author Contributions
En Cheng: conceptualization, data curation, funding acquisition, formal analysis, investigation, methodology, project administration, resources, software, supervision, validation, visualization, writing – original draft, writing – review and editing. Jesus D. Anampa: data curation, methodology, project administration, resources, writing – review and editing. Carolina Bernabe‐Ramirez: data curation, methodology, project administration, resources, writing – review and editing. Juan Lin: data curation, formal analysis, software, visualization, writing – review and editing. Xiaonan Xue: data curation, funding acquisition, formal analysis, software, visualization, writing – review and editing. Carmen R. Isasi: funding acquisition, writing – review and editing. Alyson B. Moadel‐Robblee: writing – review and editing. Edward Chu: funding acquisition, supervision, writing – review and editing.
Ethics Statement
This study obtained Institutional Review Board's approval from Albert Einstein College of Medicine (Ethics Approval Number: 2024‐15752).
Consent
The authors have nothing to report.
Conflicts of Interest
The authors declare no conflicts of interest.
Supporting information
Supporting Information Methods. Linear mixed‐effect models.
Table S1: Top 5 most searched Spanish cancer questions in the US extracted from Google Trends (from January 1, 2020 to January 1, 2024).
Table S2: The comparison of free vs. paywall versions within each AI chatbot type.
Acknowledgments
This study was supported by start‐up funding from Albert Einstein College of Medicine and Montefiore Einstein Comprehensive Cancer Center, the NCI P30 Cancer Center Support Grant (P30CA013330) to the Montefiore Einstein Comprehensive Cancer Center, and funding from the National Center for Advancing Translational Sciences (UM1TR004400) to the Harold and Muriel Block Institute for Clinical and Translational Research. Funders did not contribute to the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.
Cheng E., Anampa J. D., Bernabe‐Ramirez C., et al., “Artificial Intelligence Chatbots and Their Responses to Most Searched Spanish Cancer Questions,” Cancer Medicine 14, no. 21 (2025): e71364, 10.1002/cam4.71364.
Funding: This work was supported by Albert Einstein College of Medicine, Montefiore Einstein Comprehensive Cancer Center, National Cancer Institute (P30CA013330), and National Center for Advancing Translational Sciences (UM1TR004400).
En Cheng, Jesus D. Anampa and Carolina Bernabe‐Ramirez contributed equally as co‐first authors.
Data Availability Statement
De‐identified data may be requested from En Cheng (en.cheng@einsteinmed.edu) and Juan Lin (juan.lin@einsteinmed.edu). A review process includes verifying the availability of data, conducting a review of any existing agreements that may have implications for the project, and ensuring that any transfer is in compliance with the Institutional Review Board. The investigator will be required to sign a data release form prior to the transfer.
References
- 1. Fareed N., Swoboda C. M., Jonnalagadda P., and Huerta T. R., “Persistent Digital Divide in Health‐Related Internet Use Among Cancer Survivors: Findings From the Health Information National Trends Survey, 2003–2018,” Journal of Cancer Survivorship 15 (2021): 87–98.
- 2. Hopkins A. M., Logan J. M., Kichenadasse G., and Sorich M. J., “Artificial Intelligence Chatbots Will Revolutionize How Cancer Patients Access Information: ChatGPT Represents a Paradigm‐Shift,” JNCI Cancer Spectrum 7, no. 2 (2023): pkad010, 10.1093/jncics/pkad010.
- 3. Pan A., Musheyev D., Bockelman D., Loeb S., and Kabarriti A. E., “Assessment of Artificial Intelligence Chatbot Responses to Top Searched Queries About Cancer,” JAMA Oncology 9, no. 10 (2023): 1437–1440.
- 4. Chen S., Kann B. H., Foote M. B., et al., “Use of Artificial Intelligence Chatbots for Cancer Treatment Information,” JAMA Oncology 9, no. 10 (2023): 1459–1462, 10.1001/jamaoncol.2023.2954.
- 5. Musheyev D., Pan A., Gross P., et al., “Readability and Information Quality in Cancer Information From a Free vs Paid Chatbot,” JAMA Network Open 7, no. 7 (2024): e2422275, 10.1001/jamanetworkopen.2024.22275.
- 6. Roldan‐Vasquez E., Mitri S., Bhasin S., et al., “Reliability of Artificial Intelligence Chatbot Responses to Frequently Asked Questions in Breast Surgical Oncology,” Journal of Surgical Oncology 130, no. 2 (2024): 188–203, 10.1002/jso.27715.
- 7. Johnson S. B., King A. J., Warner E. L., Aneja S., Kann B. H., and Bylund C. L., “Using ChatGPT to Evaluate Cancer Myths and Misconceptions: Artificial Intelligence and Cancer Information,” JNCI Cancer Spectrum 7, no. 2 (2023): pkad015, 10.1093/jncics/pkad015.
- 8. Lai V. D., Ngo N. T., Veyseh A. P. B., et al., “ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning,” Findings of the Association for Computational Linguistics: EMNLP 2023 (2023): 13171–13189.
- 9. Nicholas G. and Bhatia A., Lost in Translation: Large Language Models in Non‐English Content Analysis (Center for Democracy & Technology, 2023), https://cdt.org/wp‐content/uploads/2023/05/non‐en‐content‐analysis‐primer‐051223‐1203.pdf.
- 10. Miller K. D., Ortiz A. P., Pinheiro P. S., et al., “Cancer Statistics for the US Hispanic/Latino Population, 2021,” CA: A Cancer Journal for Clinicians 71, no. 6 (2021): 466–487.
- 11. Forbes, “Top AI Chatbots in 2024: Choosing the Ideal Bot For Your Business,” (2024), https://www.forbes.com/sites/digital‐assets/2023/12/19/top‐ai‐chatbots‐in‐2024‐choosing‐the‐ideal‐bot‐for‐your‐business/.
- 12. Charnock D., Shepperd S., Needham G., and Gann R., “DISCERN: An Instrument for Judging the Quality of Written Consumer Health Information on Treatment Choices,” Journal of Epidemiology and Community Health 53, no. 2 (1999): 105–111, 10.1136/jech.53.2.105.
- 13. Shoemaker S. J., Wolf M. S., and Brach C., “Development of the Patient Education Materials Assessment Tool (PEMAT): A New Measure of Understandability and Actionability for Print and Audiovisual Patient Information,” Patient Education and Counseling 96, no. 3 (2014): 395–403, 10.1016/j.pec.2014.05.027.
- 14. Fernández Huerta J., “Medidas Sencillas de Lecturabilidad,” Consigna 214 (1959): 29–32.
- 15. Koo T. K. and Li M. Y., “A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research,” Journal of Chiropractic Medicine 15, no. 2 (2016): 155–163, 10.1016/j.jcm.2016.02.012.
- 16. Guillen‐Grima F., Guillen‐Aguinaga S., Guillen‐Aguinaga L., et al., “Evaluating the Efficacy of ChatGPT in Navigating the Spanish Medical Residency Entrance Examination (MIR): Promising Horizons for AI in Clinical Medicine,” Clinics and Practice 13, no. 6 (2023): 1460–1487, 10.3390/clinpract13060130.
- 17. Gilbert S. and Kather J. N., “Guardrails for the Use of Generalist AI in Cancer Care,” Nature Reviews Cancer 24, no. 6 (2024): 357–358, 10.1038/s41568-024-00685-8.
- 18. Pew Research Center, “Educational Attainment of Hispanic Population in the U.S.,” (2021), https://www.pewresearch.org/chart/us‐hispanics‐education/.
- 19. Nutbeam D. and Lloyd J. E., “Understanding and Responding to Health Literacy as a Social Determinant of Health,” Annual Review of Public Health 42 (2021): 159–173.
