Abstract
Background
Artificial Intelligence has increasingly been integrated into clinical practice, yet its adoption and perception among medical professionals remain poorly understood, particularly in the Italian healthcare system. This study aimed to investigate Italian physicians' knowledge, attitudes, and clinical concordance with AI-generated diagnostic recommendations, using a validated questionnaire and a clinical scenario processed by ChatGPT.
Methods
A national, cross-sectional, web-based survey of 587 Italian physicians was conducted using a validated online questionnaire. The first part of the questionnaire assessed self-reported knowledge, prior experience, attitudes, and willingness to adopt AI in medicine. The second part assessed clinical concordance between physicians' judgments and diagnostic proposals generated by ChatGPT for clinical cases.
Results
Most participants reported basic AI knowledge (n = 380, 64.8%) and minimal exposure to AI training (18.4%). Only 21.6% reported having used AI in clinical practice, and the most commonly used application was diagnostic imaging (35.4% of AI users; 7.7% of the total sample). Major perceived barriers included lack of training (76.7%) and resistance to change (50.9%). In the universal clinical scenario, physicians showed the highest agreement with ChatGPT's correct diagnosis (mean = 4.07) compared with the incorrect alternatives (means = 2.57 and 1.82, p < 0.001). For the correct diagnosis, the agreement rate was very high at 89% [95% CI: 86%–92%].
Conclusion
Italian physicians showed a strong interest in adopting AI tools, despite significant knowledge gaps and limited practical experience. The high concordance between physicians' evaluations and ChatGPT's diagnostic insights suggests potential for AI-based decision support in clinical workflows. Targeted training and institutional support are essential to bridge the gap between enthusiasm and readiness for AI integration.
Keywords: artificial intelligence, ChatGPT, clinical decision support, diagnostic concordance, digital health, Italy, medical education, physicians
1. Introduction
Artificial Intelligence (AI) has significantly transformed the healthcare sector, evolving from an experimental tool to an increasingly integrated technology in clinical practice. AI's capacity to process vast amounts of data, identify complex diagnostic patterns, and support decision-making drives a paradigm shift in medical approaches, with substantial implications for healthcare quality and organisation (1). The successful implementation of this technology relies not only on technical advancements but also on healthcare professionals' acceptance and integration of AI.
While countries at the forefront of technological innovation, such as the United States and China, are allocating substantial resources to AI development and implementation (2), the challenge for other nations lies in bridging the resulting gap to benefit from these technologies and contribute to their advancement.
Italy offers a distinct perspective on healthcare AI adoption in Europe: it operates one of the largest healthcare systems on the continent, yet its investment in healthcare AI technologies remains limited compared with other EU countries. This funding gap presents significant obstacles to the widespread adoption of AI in clinical settings nationwide (3, 4).
Several pioneering AI applications have emerged in Italian healthcare institutions despite limited resources. Indeed, Italy has experienced a significant increase in AI research, particularly in medical imaging (MRI, CT, radiography), with a focus on targeting neurological diseases and cancer diagnosis. Most studies focus on machine learning for classification and segmentation tasks, showcasing an active research community (5).
At the Fondazione IRCCS Istituto Nazionale dei Tumori in Milan, an AI diagnostic imaging system has been implemented that detects early signs of lung cancer with over 90% accuracy, reducing diagnostic time by approximately 30% (6). Similarly, the IRCCS Ospedale San Raffaele has deployed AI systems for retinal image analysis that achieve 93% sensitivity in detecting diabetic retinopathy (7). However, these implementations remain concentrated in a few academic medical centres and have not been widely disseminated throughout the national healthcare system (8).
Despite the growing interest in AI applications in healthcare, limited empirical research has explored medical professionals' knowledge, attitudes, and clinical agreement with AI-generated recommendations, particularly in the Italian context (9, 10).
Previous studies have highlighted a disparity in physicians' understanding and acceptance of AI due to cultural, educational, and regulatory factors. Research on AI perception among healthcare providers has shown that, while many professionals recognise AI's potential to enhance diagnostic and therapeutic efficiency, concerns about patient privacy, algorithmic transparency, and professional autonomy persist (11).
A systematic review of 60 studies involving over 750 physicians worldwide revealed that while more than 60% were optimistic about AI, only 15% considered it more accurate than human clinicians, with 68% believing AI should complement rather than replace physician judgment (12).
Despite this global research, a significant knowledge gap remains regarding Italian physicians' familiarity, experience, and confidence in AI applications. A recent survey by the Italian Society of Radiology (SIRM) found that while 78% of Italian radiologists expressed interest in AI technologies, only 23% reported having received formal training, and merely 17% felt confident in their ability to critically evaluate AI-based diagnostic tools (13). Research has identified several barriers to AI adoption in the Italian healthcare context, including insufficient training (cited by 72% of physicians), limited access to appropriate technologies (68%), concerns about data security (65%), and the absence of clear guidelines on AI implementation in clinical settings (58%). These barriers underscore the need for targeted interventions to facilitate the integration of AI into medical practice (14).
The study aims to address this gap by systematically analysing Italian physicians' knowledge, attitudes, and level of clinical agreement with AI-generated diagnostic recommendations. The research explores physicians' digital literacy, trust in AI technologies, and willingness to integrate them into clinical practice.
A key aspect of the research is the assessment of concordance between physicians and AI-generated diagnostic evaluations. The investigation, in fact, includes a generic clinical scenario and various speciality-specific cases, enabling an evaluation of how physicians perceive AI-suggested diagnoses and the factors influencing their level of agreement. Additionally, analysing physicians' prior experiences with AI, their training, and perceived obstacles provides a detailed understanding of the barriers limiting technology adoption.
The research represents a crucial step in understanding Italy's position in the global context of AI in medicine and outlining strategies to promote informed and safe adoption of these tools. The value of this study lies in its ability to provide concrete data on a highly relevant but under-explored topic in Italy, with the potential to inform both policy decisions and educational initiatives.
The research's findings could contribute to the development of tailored interventions to overcome identified barriers and facilitate the integration of appropriate AI into clinical practice. Furthermore, the study results will help bridge the gap between Italy's considerable medical expertise and the emerging field of healthcare AI, potentially accelerating the country's progress in this strategically important area (3).
2. Methods
2.1. Study design and setting
This was an observational, cross-sectional, web-based study designed to investigate Italian physicians' knowledge, attitudes, and perceptions of AI use in medicine, and to explore the level of clinical agreement between AI-generated diagnostic recommendations and those of physicians.
The study population consisted of physicians in Italy, including specialists, trainees, and general practitioners from various clinical fields and regions. The essential participation criterion was registration with the Italian Medical Council (15) at the time of the study. Exclusion criteria encompassed medical students without a degree or licence, non-physician healthcare professionals, and individuals who did not provide informed consent. The invitation to participate in the study was accompanied by a brief description of the survey, highlighting its objectives and scientific significance.
Before participating in the study, each prospective participant received a digital informed consent form. Only after providing explicit consent could participants proceed to the questionnaire. The survey was deployed via LimeSurvey (v6.6.1), and a comprehensive pre-launch testing phase ensured technical reliability and compatibility.
Data access was limited exclusively to authorised researchers. Participants had the right to withdraw from the study at any time.
2.2. Questionnaire structure
Data were collected from September 2024 to March 2025 using a structured and validated survey instrument: the I-KAPCAM-AI-Q (Knowledge, Attitudes, Practice, and Clinical Agreement between Medical Doctors and Artificial Intelligence Questionnaire) (16). Content validity was high (CVI = 0.98) (17). Internal consistency for the attitudinal Likert scale (8 items) was acceptable (Cronbach's α = 0.748), while dichotomous items showed strong reliability (KR-21 = 0.832) (18, 19).
The questionnaire comprised two sections:
General assessment, with six domains covering demographics, AI knowledge, experience, attitudes, and willingness to use/learn about AI (including open-ended feedback).
Clinical agreement, presenting real anonymised clinical scenarios submitted to ChatGPT, and evaluating physicians' agreement with AI-generated diagnoses using a 5-point Likert scale (1 = strongly disagree, 5 = strongly agree). This section included a universal clinical scenario that all participants could respond to, as well as seven other specific scenarios tailored to participants from the following medical specialities: Hygiene and Preventive Medicine, Infectious Diseases, Obstetrics and Gynaecology, Anesthesiology, Geriatrics, Occupational Health, and Endocrinology.
It is important to clarify that ChatGPT is a general-purpose conversational AI and is not classified or certified as a medical device. Within the framework of this study, it functioned strictly as a research instrument to generate diagnostic hypotheses. The objective was to simulate contemporary real-world interactions between clinicians and widely accessible, non-specialised AI tools, rather than to utilise the platform for direct clinical decision support. The study used ChatGPT GPT-3.5 (OpenAI), accessed via the web interface during 2024.
This paper reports findings from the universal clinical scenario—a general case describing a young adult with fatigue, fever, lymphadenopathy, and a genital lesion—which was completed by all participants regardless of specialty. Additional specialty-specific scenarios were included in the survey but are not analysed in this paper.
The prompt provided to ChatGPT was structured to simulate a clinical consultation request. The scenario was structured as follows: “A 32-year-old Caucasian patient has reported asthenia and evening fever for about 10 days. In medical history: cocaine addiction, history of stage IV Hodgkin lymphoma treated with chemotherapy in complete remission at the last follow-up in 2023, unprotected sexual intercourse, previous HPV infection. During the physical exam, it was determined that there were inguinal lymphadenopathy and a single genital ulcerative lesion in the balano-preputial area.”
To mitigate the risk of AI-generated hallucinations or inaccuracies, several quality control measures were implemented. First, all diagnostic recommendations produced by ChatGPT underwent independent blinded review by three experienced clinicians to ensure medical accuracy and plausibility. Any suggestions identified as inconsistent or medically unsound were flagged and subsequently excluded from the analysis. Furthermore, to evaluate the reliability and stability of the outputs, the clinical scenario was processed through ChatGPT in three separate iterations (n = 3), allowing for a formal assessment of response consistency.
The final section of the questionnaire collected qualitative observations and comments through open-ended feedback.
2.3. Participant recruitment
A non-probabilistic snowball sampling method was adopted. This method is based on a cascade recruitment mechanism, in which initial participants invite colleagues to participate, progressively expanding the sample. This strategy proved particularly effective in reaching a large number of physicians across the national territory, belonging to different age groups and specialisations, while ensuring sample heterogeneity (20). Various strategies were employed to achieve a broad and representative sample of the medical population, combining digital and traditional methods. First, the survey was disseminated through social media, with particular attention to professional groups and communities on platforms such as Facebook and LinkedIn. In parallel, recruitment took place during training events such as congresses, seminars, and refresher courses. Finally, word-of-mouth among colleagues leveraged participants' professional networks to expand the study's reach.
2.4. Sample size and statistical analysis
Based on a population of 439,957 Italian physicians (15), with a 5% error margin and 95% confidence interval, the estimated minimum sample size was 384 participants, calculated using the Raosoft sample size calculator (21). However, due to the challenges of web surveys, including potential data inconsistencies and incomplete responses, we expected a dropout rate of approximately 25%. This led us to target an optimal sample size of about 500 participants to ensure adequate statistical power and representativeness even after excluding invalid responses. Web surveys present methodological challenges, including the lack of direct control over respondent identity, the possibility of unreliable responses, and the risk of selecting participants with a greater predisposition to technology, which can generate potential biases. To mitigate these risks, specific methodological measures were implemented. Completion time was controlled, excluding responses with a completion time incompatible with carefully reading the questionnaire. Data were also analysed for inconsistencies between correlated responses to identify participants who had completed the questionnaire superficially or randomly. Initially, the survey obtained 736 responses. Following quality control procedures, records with completion times significantly shorter than the estimated minimums required for thoughtful engagement with the questionnaire content were excluded. The final analysis sample consisted of 587 participants who responded to the first part of the survey and 529 who responded to the proposed scenario (22).
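The minimum sample size reported above follows the standard formula for estimating a proportion, n0 = z²·p·(1−p)/e², with a finite-population correction. The sketch below reproduces the reported figures under the conventional assumptions behind such calculators (p = 0.5, z = 1.96 for 95% confidence); it illustrates the calculation only and was not part of the study software.

```python
import math

def min_sample_size(population: int, margin: float = 0.05,
                    z: float = 1.96, p: float = 0.5) -> int:
    """Minimum sample size for estimating a proportion,
    with finite-population correction."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # infinite-population estimate
    return math.ceil(n0 / (1 + (n0 - 1) / population))

n_min = min_sample_size(439_957)           # minimum sample: 384
n_target = math.ceil(n_min / (1 - 0.25))   # inflated for 25% expected dropout
```

Inflating the minimum of 384 for an anticipated 25% dropout gives 512 completed questionnaires to target, consistent with the stated goal of about 500 participants.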
All variables were analysed and reported with absolute frequencies and percentages or means and standard deviations (SD) depending on the variable's nature. Comparisons between categorical variables were made with the chi-square test, while those between continuous variables were conducted through Student's t-test for independent samples or its non-parametric equivalent.
To assess clinical agreement between physicians and AI-generated diagnostic suggestions, we analysed physicians' evaluations of the three diagnoses proposed by ChatGPT in response to the universal clinical scenario. Descriptive statistics were used to summarise the distribution of agreement scores for each diagnosis. Agreement rates were calculated as the proportion of physicians selecting “agree” or “strongly agree” for each diagnosis, with 95% confidence intervals. The Friedman test for related samples was used to compare agreement levels across the three diagnoses, and post-hoc pairwise comparisons were performed using the Durbin-Conover test. To measure diagnostic discrimination, we identified physicians who both strongly agreed with the correct diagnosis (score ≥ 4) and rejected the incorrect diagnoses. Statistical significance was assessed at an alpha level of 0.05. All analyses were conducted with Stata 18 software.
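For illustration, the agreement-rate confidence intervals and the Friedman test described above can be sketched as follows. The Likert scores here are synthetic stand-ins for the study data, and `friedmanchisquare` is SciPy's implementation of the test; the Durbin-Conover post-hoc comparisons are not shown, as SciPy does not provide them directly (they are available in third-party packages such as scikit-posthocs).

```python
import math
import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(0)
n = 529  # respondents to the universal scenario

# Synthetic 1-5 Likert scores standing in for the three proposed diagnoses
diag_a = rng.choice([3, 4, 5], size=n, p=[0.10, 0.60, 0.30])
diag_b = rng.choice([1, 2, 3], size=n, p=[0.40, 0.50, 0.10])
diag_c = rng.choice([1, 2, 3, 4], size=n, p=[0.20, 0.40, 0.25, 0.15])

# Agreement rate: proportion scoring 4 ("agree") or 5 ("strongly agree"),
# with a 95% normal-approximation confidence interval
agree = float(np.mean(diag_a >= 4))
se = math.sqrt(agree * (1 - agree) / n)
ci = (agree - 1.96 * se, agree + 1.96 * se)

# Friedman test for related samples across the three diagnoses
stat, p_value = friedmanchisquare(diag_a, diag_b, diag_c)
print(f"agreement {agree:.2f} [{ci[0]:.2f}, {ci[1]:.2f}], chi2={stat:.1f}")
```

With the study's real data, the same procedure yields the 89% [86%–92%] agreement rate and the Friedman χ² reported in the Results.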
3. Results
The demographic characteristics of participants revealed a relatively balanced gender distribution with a slight female predominance (57.2% female, 42.4% male) (Table 1). The mean age was 42.7 years (SD = 13.6), with participants reporting an average of 14.1 years of clinical practice experience (SD = 13.2). Geographical distribution showed a concentration in Central Italy (58.8%), followed by Southern Italy and the Islands (28.6%), and Northern Italy (12.6%). Regarding professional status, 60.5% were specialist physicians and 39.5% were residents. The most represented medical specialisations were Hygiene and Preventive Medicine (16.0%), Gynaecology and Obstetrics (11.4%), and Anesthesiology and Intensive Care (10.6%). Other notable specialities included Geriatrics (7.3%), Occupational Medicine (7.5%), Endocrinology and Metabolic Diseases (6.1%) and Infectious and Tropical Diseases (6.1%). The remaining 35.0% represented various other specialisations. Only 18.4% of participants reported receiving specific training in digital or information technologies during their education, highlighting a substantial educational deficit in this rapidly evolving field.
Table 1.
Demographic characteristics of study participants.
| Characteristic | N = 587 |
|---|---|
| Gender | N (%) |
| Male | 249 (42.4%) |
| Female | 336 (57.2%) |
| Other | 2 (0.3%) |
| Age (years) | 42.7 (13.6) |
| Geographic area | N (%) |
| North | 74 (12.6%) |
| Central | 345 (58.8%) |
| South and Islands | 168 (28.6%) |
| Medical specialisation | N (%) |
| Geriatrics | 43 (7.3%) |
| Endocrinology and Metabolic Diseases | 36 (6.1%) |
| Infectious and Tropical Diseases | 36 (6.1%) |
| Gynaecology and Obstetrics | 67 (11.4%) |
| Anesthesiology and Intensive Care | 62 (10.6%) |
| Hygiene and Preventive Medicine | 94 (16.0%) |
| Occupational Medicine | 44 (7.5%) |
| Other specialisations | 205 (35.0%) |
| Professional status | N (%) |
| Specialist physician | 355 (60.5%) |
| Resident | 232 (39.5%) |
| Years of practice | 14.1 (13.2) |
| Specific training in digital or information technologies | N (%) |
| No | 479 (81.6%) |
| Yes | 108 (18.4%) |
Data are presented as N (%) for categorical variables and mean (SD) for continuous variables.
Table 2 shows that, according to their self-assessment of AI knowledge, most physicians (64.8%) have only basic knowledge, while 21.6% have no knowledge at all. Concerning familiarity with specific AI applications, diagnostic imaging was the most recognised (47.2%), likely reflecting the earlier adoption and more widespread implementation of AI in radiology and dermatology. Other relatively familiar applications included report interpretation (33.2%), analysis of electronic health records (27.9%), and clinical decision support systems (25.4%).
Table 2.
Knowledge and familiarity with AI in medicine (N = 587).
| Variable | N (%) |
|---|---|
| Self-assessed knowledge of artificial intelligence | |
| No knowledge | 127 (21.6%) |
| Basic knowledge | 380 (64.8%) |
| Intermediate knowledge | 67 (11.4%) |
| Advanced knowledge | 13 (2.2%) |
| Familiarity with AI applications in medicine | |
| Diagnostic imaging (radiology, dermatology) | 277 (47.2%) |
| Clinical decision support systems | 149 (25.4%) |
| Care planning and management | 63 (10.7%) |
| Patient data monitoring and analysis | 147 (25.0%) |
| Report interpretation (ECG, EEG, etc.) | 195 (33.2%) |
| Machine learning in diagnosis | 95 (16.2%) |
| Machine learning and Big Data for new drug development | 56 (9.5%) |
| Virtual nursing assistants | 10 (1.7%) |
| Chatbots for diagnostic queries | 136 (23.2%) |
| Monitoring, home care and predictive analysis | 44 (7.5%) |
| Personalised medicine | 50 (8.5%) |
| Improving genomic editing | 16 (2.7%) |
| Electronic health records and data analysis | 164 (27.9%) |
| AI-based robotic surgery procedures | 65 (11.1%) |
| None of the above | 162 (27.6%) |
Lifetime use of artificial intelligence instruments in clinical practice was reported by 21.6% (n = 127) of interviewees. Analysis of AI adoption (instruments or applications) among medical professionals in the year preceding the survey reveals its limited integration into clinical practice. Diagnostic applications exhibit the highest usage rates, with diagnostic imaging (7.7%), diagnostic chatbots (7.2%), and clinical decision support systems (6.3%) leading the way. Moderate adoption exists in report interpretation (4.8%), electronic health record (EHR) data analysis (4.3%), and patient monitoring (3.9%). Most advanced AI applications demonstrate minimal penetration: machine learning for diagnosis (2.6%), personalised medicine (1.5%), home care monitoring (1.2%), robotic surgery (0.5%), and genomic/pharmaceutical applications (0.2%) (Table 3).
Table 3.
AI tool (instrument or application) usage among medical professionals in the year preceding the interview.
| AI Tool/Application | All (N = 587) | Users (N = 127) |
|---|---|---|
| Use of artificial intelligence instruments in clinical practice | 127/587 (21.6%) | |
| Diagnostic imaging (radiology, dermatology) | 45/587 (7.7%) | 45/127 (35.4%) |
| Clinical decision support systems | 37/587 (6.3%) | 37/127 (29.1%) |
| Care planning and management | 15/587 (2.6%) | 15/127 (11.8%) |
| Patient data monitoring and analysis | 23/587 (3.9%) | 23/127 (18.1%) |
| Report interpretation (ECG, EEG, etc.) | 28/587 (4.8%) | 28/127 (22.0%) |
| Machine learning in diagnosis | 15/587 (2.6%) | 15/127 (11.8%) |
| Drug discovery and development | 1/587 (0.2%) | 1/127 (0.8%) |
| Virtual nursing assistants | 1/587 (0.2%) | 1/127 (0.8%) |
| Diagnostic chatbots | 42/587 (7.2%) | 42/127 (33.1%) |
| Home care monitoring and predictive analysis | 7/587 (1.2%) | 7/127 (5.5%) |
| Personalised medicine | 9/587 (1.5%) | 9/127 (7.1%) |
| Genomic editing enhancing | 1/587 (0.2%) | 1/127 (0.8%) |
| Electronic health records (EHR) and data analysis | 25/587 (4.3%) | 25/127 (19.7%) |
| AI-based robotic surgery procedures | 3/587 (0.5%) | 3/127 (2.4%) |
The “Usage (Yes)” column pertains exclusively to participants who reported the use of artificial intelligence instruments in clinical practice.
As reported in Table 4, analysis of attitudes toward artificial intelligence revealed predominantly positive views among Italian physicians. The highest agreement concerned the need for specific AI training (92.3% agreed/strongly agreed, mean = 4.29 ± 0.72). Strong positive perceptions were also found for AI's potential to aid in differential diagnosis (77.7% agreed/strongly agreed, mean = 3.84 ± 0.66) and to improve the prescription of diagnostic tests (68.8% agreed/strongly agreed, mean = 3.68 ± 0.74). Physicians showed moderate optimism about AI enhancing workflow efficiency, with 65.4% believing AI could improve patient management (mean = 3.61 ± 0.85) and 61.8% acknowledging AI's potential to reduce workload (mean = 3.63 ± 0.91). Participants did not perceive AI as a threat to their professional role (56.4% disagreed/strongly disagreed, mean = 2.45 ± 0.95), while 31.5% expressed neutrality.
Table 4.
Italian physicians' attitudes toward artificial intelligence in medicine (N = 587).
| Statement | Strongly Disagree n (%) | Disagree n (%) | Neutral n (%) | Agree n (%) | Strongly Agree n (%) | Median [IQR] | Mean ± SD |
|---|---|---|---|---|---|---|---|
| AI can aid in differential diagnosis | 4 (0.68) | 16 (2.73) | 111 (18.91) | 394 (67.12) | 62 (10.56) | 4 [4–4] | 3.84 ± 0.66 |
| AI can improve therapeutic prescriptions | 7 (1.32) | 42 (7.94) | 132 (24.95) | 302 (57.09) | 46 (8.70) | 4 [3–4] | 3.64 ± 0.80 |
| AI can improve the prescription of diagnostic and laboratory tests | 8 (1.36) | 31 (5.28) | 144 (24.53) | 364 (62.01) | 40 (6.81) | 4 [3–4] | 3.68 ± 0.74 |
| AI can reduce physicians' workload | 9 (1.53) | 63 (10.73) | 152 (25.89) | 279 (47.53) | 84 (14.31) | 4 [3–4] | 3.63 ± 0.91 |
| AI can improve efficiency in patient management | 12 (2.04) | 53 (9.03) | 138 (23.51) | 331 (56.39) | 53 (9.03) | 4 [3–4] | 3.61 ± 0.85 |
| AI represents a threat to the role of physicians | 85 (14.48) | 246 (41.91) | 185 (31.52) | 52 (8.86) | 19 (3.24) | 2 [2–3] | 2.45 ± 0.95 |
| Using AI requires specific additional training for physicians | 6 (1.02) | 7 (1.19) | 32 (5.45) | 306 (52.13) | 236 (40.20) | 4 [4–5] | 4.29 ± 0.72 |
As reported in Table 5, the most striking finding is the prominent role of education and training: lack of training was identified as the primary barrier by 76.7% of physicians, while professional training and education was overwhelmingly selected as the top incentive (85.7%). Identified barriers aligned with corresponding incentives: lack of scientific evidence (27.6%) corresponds with scientific evidence on efficacy and safety (55.2%); the cost of AI technologies (42.9%) pairs with cost reduction (28.8%); and resistance to change (50.9%) may be addressed through continuous technical support (52.3%).
Table 5.
Incentives and barriers to AI adoption among Italian physicians.
| Incentives for AI Adoption | N | % |
|---|---|---|
| Professional training and education | 503 | 85.7 |
| Scientific evidence on efficacy and safety | 324 | 55.2 |
| Continuous technical support | 307 | 52.3 |
| Economic incentives and health policies | 168 | 31.8 |
| Cost reduction | 169 | 28.8 |
| **Barriers to AI Adoption** | **N** | **%** |
| Lack of training | 450 | 76.7 |
| Resistance to change | 299 | 50.9 |
| Cost of AI technologies | 252 | 42.9 |
| Privacy and data security concerns | 203 | 34.6 |
| Lack of scientific evidence on efficacy | 162 | 27.6 |
The analysis reveals a striking readiness for AI adoption, with 81.9% of physicians reporting willingness to integrate AI into their clinical practice. Consistently, most physicians (93.5%) expressed interest in receiving AI training, preferring online courses (64.7%) and practical hands-on sessions (61.2%), followed by in-person seminars (47.6%) and webinars (29.0%) (Table 6).
Table 6.
Willingness and training preferences (n = 587).
| Category | Subcategory | N | Percentage (%) |
|---|---|---|---|
| Willingness to Integrate AI | Very willing | 153 | 26.0 |
| | Willing | 328 | 55.9 |
| | Undecided | 78 | 13.3 |
| | Somewhat unwilling | 20 | 3.4 |
| | Not at all willing | 8 | 1.4 |
| Interest in AI Training | Yes | 549 | 93.5 |
| | No | 38 | 6.5 |
| Preferred Training Modalities | Online courses | 357 | 64.7 |
| | Practical hands-on training | 339 | 61.2 |
| | In-person seminars and workshops | 263 | 47.6 |
| | Webinars and conferences | 160 | 29.0 |
3.1. Clinical agreement between ChatGPT and physicians
To assess physician agreement with ChatGPT's diagnostic suggestions, the three proposed diagnoses were labelled Diagnosis_a, Diagnosis_b, and Diagnosis_c, with Diagnosis_a being the correct diagnosis and the first option proposed by ChatGPT. The analysis revealed a clear hierarchy in physicians' agreement. Agreement was significantly highest for Diagnosis_a (mean = 4.07, median = 4), followed by Diagnosis_c (mean = 2.57, median = 2), and lowest for Diagnosis_b (mean = 1.82, median = 2). The Friedman test demonstrated significant differences in agreement levels among the three diagnoses (χ2 = 735, df = 2, p < 0.001), and all pairwise comparisons using the Durbin-Conover test were statistically significant (all p < 0.001), confirming that physicians' agreement levels differed across all three diagnoses. Responses for Diagnosis_a (the correct diagnosis) displayed the lowest variability (SD = 0.74, IQR = 0), indicating strong consensus among medical professionals in favour of the correct diagnosis, while responses for Diagnosis_c showed the greatest variability (SD = 1.05, IQR = 1) (Table 7, Figure 1).
Table 7.
Clinical agreement analysis between physicians and AI diagnoses for the universal scenario only.
| Descriptive statistics | Diagnosis_a | Diagnosis_b | Diagnosis_c |
|---|---|---|---|
| Mean (SD) | 4.07 (0.74) | 1.82 (0.81) | 2.57 (1.05) |
| Median (IQR) | 4 (0) | 2 (1) | 2 (1) |
| Friedman test | χ2 = 735, df = 2, p < 0.001 | | |
| Pairwise comparisons (Durbin-Conover) | Statistic | p-value | |
| Diagnosis_a vs. Diagnosis_b | 48.6 | < 0.001 | |
| Diagnosis_a vs. Diagnosis_c | 29.5 | < 0.001 | |
| Diagnosis_b vs. Diagnosis_c | 19.1 | < 0.001 | |
Figure 1.
Clinical agreement distribution.
A deeper analysis of concordance patterns reveals that for Diagnosis_a, the agreement rate (agree or strongly agree) was 89% [95% CI: 86%–92%]. For Diagnosis_b, the agreement rate was very low at 4% [95% CI: 2%–6%], while for Diagnosis_c it was 23% [95% CI: 20%–27%]. Overall, 68% [95% CI: 64%–72%] of physicians demonstrated clear diagnostic discrimination, both strongly agreeing with the correct diagnosis (score ≥ 4) and rejecting the incorrect alternatives.
4. Discussion
This national cross-sectional web-based study investigated the knowledge, attitudes, and clinical concordance with AI-generated diagnostic recommendations among 587 Italian physicians, including both residents and trained medical doctors, conducted from September 2024 to March 2025. Our findings reveal significant knowledge gaps, strong interest in AI adoption, limited current usage patterns, and remarkably high clinical agreement between physicians and ChatGPT's diagnostic recommendations.
These findings provide important insights into the current state and future potential of AI integration in Italian healthcare.
4.1. Knowledge gap and educational needs
Significant gaps in AI knowledge and training among Italian physicians were found. Only 18.4% of participants reported receiving specific training in digital or information technologies during their education, highlighting a substantial educational deficit in this rapidly evolving field. This aligns with findings from Blease et al. (23) and Sapci (24), who identified similar patterns of limited AI education across European healthcare systems. A very recent synthesis of literature and case studies (25) highlights the dual nature of AI: promising benefits juxtaposed with significant challenges concerning privacy, security, autonomy, and freedom. The authors underscore the vital importance of public acceptance, normative frameworks, technological innovation, and international collaboration in effectively addressing these issues. Most physicians (64.8%) self-assessed as possessing only basic knowledge of AI, with only 2.2% claiming advanced knowledge. This knowledge deficit creates a barrier to adoption that needs to be addressed through targeted educational interventions. Similar knowledge gaps have been documented in other countries, suggesting a systemic challenge for medical education systems that have not yet fully incorporated digital health competencies into their curricula (24, 26–30).
4.2. Familiarity with AI applications
When examining familiarity with specific AI applications, diagnostic imaging emerged as the most recognised (47.2%), likely reflecting the earlier adoption and more widespread implementation of AI in radiology and dermatology (31). This finding is consistent with the literature, as imaging specialities have pioneered AI integration due to the structured nature of their data and clear pattern recognition tasks (1, 32). Other relatively familiar applications included report interpretation (33.2%), electronic health records analysis (27.9%), and clinical decision support systems (25.4%).
Despite the growing prominence of AI in healthcare, a substantial proportion (27.6%) of physicians reported no familiarity with any of the listed AI applications. Applications with the lowest recognition included virtual nursing assistants (1.7%), genomic editing (2.7%), and home care monitoring systems (7.5%), indicating areas where awareness campaigns might be particularly beneficial. This distribution of familiarity mirrors findings from Blease et al. (23), who documented similar patterns of awareness among primary care physicians across Europe and North America.
4.3. AI adoption patterns
The integration of AI into clinical practice was limited, with only 127 of 587 respondents (21.6%) reporting the use of AI tools. The analysis revealed notable patterns, particularly the significant uptake of AI chatbots. Diagnostic chatbots [42/127 (33.1%)] represented one of the most widely adopted AI applications, nearly matching diagnostic imaging tools [45/127 (35.4%)] and surpassing most other AI technologies. This growing trend in chatbot utilisation aligns with recent systematic reviews that have documented the increasing implementation of conversational AI in clinical settings (33).
Clinical decision support systems [37/127 (29.1%)] follow closely, while other applications show moderate to minimal adoption, including report interpretation, EHR data analysis, and patient monitoring. The substantial adoption of chatbots relative to other technologies suggests that text-based, conversational AI interfaces may offer advantages in clinical workflows, potentially due to their accessibility, lower implementation barriers, and alignment with traditional consultation processes (34). Laymouna et al. (35) similarly observed that chatbots often serve as “gateway technologies” for broader healthcare AI adoption due to their intuitive interfaces and integration with existing clinical communication patterns.
4.4. Attitudes toward AI
Italian physicians recognise AI's potential value in clinical decision support while acknowledging the importance of proper training. The highest agreement was observed for the need for specific AI training (92.3% agreed/strongly agreed, mean = 4.29 ± 0.71), indicating widespread recognition that effective AI implementation requires dedicated education. This finding aligns with multinational surveys conducted by Pinto dos Santos et al. (36), which documented similar educational priorities among European physicians. Most respondents did not perceive AI as a threat to their professional role (56.4% disagreed/strongly disagreed, mean = 2.45 ± 0.95), though 31.5% expressed neutrality on this issue, suggesting some uncertainty about long-term implications. Positive attitudes towards AI thus coexist with some concerns about its impact on the workforce, a pragmatic stance that can support the thoughtful integration of these technologies into healthcare systems (37, 38).
4.5. Barriers and incentives for AI adoption
The parallel structure between identified barriers and incentives suggests that physicians clearly understand both the challenges and potential solutions for AI integration. The high percentage of physicians citing “resistance to change” (50.9%) reflects the well-documented conservatism in medical practice, where practitioners are understandably cautious about adopting new technologies that might affect patient care. This finding is consistent with Jussupow et al. (39), who identified professional identity threats as a significant factor in resistance to AI adoption among medical professionals.
However, the equally high percentage seeking “continuous technical support” (52.3%) shows a willingness to overcome this resistance with proper assistance. Similar findings by Castagno et al. (40) emphasised the critical importance of ongoing technical support in successful healthcare AI implementations.
Privacy and data security concerns (34.6%) represent a moderate barrier, reflecting the growing awareness of the ethical and legal implications of AI in healthcare. This is particularly relevant in the European context, where strict GDPR regulations apply. Gille et al. (41) documented similar privacy concerns among healthcare providers in GDPR-regulated environments, highlighting the need for transparent AI governance frameworks.
While the cost of AI technologies is cited as a significant barrier (42.9%), economic incentives (31.8%) and cost reduction (28.8%) rank lowest among potential incentives. This suggests that Italian physicians are more motivated by professional improvement and evidence-based practice than by financial considerations, a pattern also observed by Liyanage et al. (42) in their analysis of physician technology adoption drivers. Professional training and education (85.7%) emerged as the strongest incentive, together with scientific evidence on AI efficacy and safety (55.2%), underscoring physicians' preference for knowledge-based and evidence-driven support over purely economic motivations.
4.6. Readiness for AI integration
The data reveal a striking readiness for AI adoption, with 81.9% of physicians reporting willingness to integrate AI into their clinical practice. This openness is particularly noteworthy given the identified barriers, primarily lack of training (76.7%) and resistance to change (50.9%).
The findings highlight a critical knowledge-action gap: while physicians recognise AI's potential, they clearly identify education as both the primary barrier and solution. An overwhelming 93.5% express interest in AI training programs, with a preference for accessible formats, such as online courses (64.7%) and hands-on training (61.2%).
These results suggest that Italian healthcare is poised for AI transformation, provided that appropriate educational infrastructure and evidence-based implementation strategies are prioritised. Similar patterns of readiness coupled with educational needs have been documented by Huisman et al. (43) across multiple European healthcare systems.
4.7. AI-Physician diagnostic concordance
The analysis shows a significant alignment between physicians' consensus and ChatGPT's diagnostic prioritisation. The clear hierarchy in agreement levels (4.06 for the correct diagnosis vs. 2.57 and 1.82 for incorrect alternatives) suggests that ChatGPT successfully captures expert clinicians' differential diagnostic process. The statistical significance of the differences between all three diagnoses (p < 0.001) further confirms that physicians' evaluations clearly distinguished between the diagnostic options in a way that aligns with ChatGPT's proposal [89%, 95% CI: 86%–91%].
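For context, the reported interval is consistent with a standard binomial confidence interval on the 89% agreement rate. The following is an illustrative sketch only, assuming the Wilson score method and that all 587 respondents rated the scenario (the article specifies neither); under those assumptions it reproduces bounds close to the reported 86%–91%:

```python
from math import sqrt

def wilson_ci(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    margin = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - margin, centre + margin

# Assumed inputs: 89% agreement rate, n = 587 respondents (hypothetical pairing)
lo, hi = wilson_ci(0.89, 587)
print(f"95% CI: {lo:.0%}-{hi:.0%}")  # prints 95% CI: 86%-91%
```

The Wilson interval behaves better than the simple normal approximation for proportions near 0 or 1, although at this sample size the two methods differ only marginally.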
These findings align with a recent work by Nori et al. (44), who documented similar levels of concordance between LLM diagnostic outputs and physician judgement.
This concordance between physician judgement and AI-generated diagnostic recommendations has substantial implications for clinical practice and AI implementation in healthcare. It suggests that Large Language Models like ChatGPT demonstrate the ability to emulate human diagnostic reasoning processes, despite not being trained explicitly for medical diagnoses. From a practical perspective, this concordance opens concrete possibilities for integrating AI as decision support in daily clinical practice. In settings characterised by high workloads, such as emergency departments or primary care, LLM-based tools could serve as an automated “second opinion,” validating the physician's diagnostic orientation or suggesting alternative diagnoses worthy of consideration.
Thirunavukarasu et al. (45) similarly concluded that general-purpose LLMs show promise as clinical decision support tools, particularly in high-volume clinical environments.
The ability of ChatGPT to propose diagnoses aligned with collective medical judgment could be particularly valuable for less experienced clinicians, including physicians in training or professionals working outside their area of specialisation. AI could thus represent both a tool to support continuing education and a mechanism to reduce variability in clinical decisions, as suggested by Magrabi et al. (46) in their analysis of AI as an educational adjunct in medical training. However, the implications of this concordance should also be considered in terms of potential risks.
Excessive confidence in the alignment between AI and medical judgment could lead to “collective blindness” in atypical or rare cases, where both AI and physicians might share the same biases or knowledge limitations. Paradoxically, cases where ChatGPT showed less concordance with physicians might represent the most interesting areas for mutual learning and improvement of diagnostic systems. Daneshjou et al. (47) highlighted similar concerns regarding potential reinforcement of existing biases when AI and human judgment are overly concordant.
From a regulatory and organisational perspective, this significant concordance provides empirical evidence supporting the implementation of AI systems as augmentation tools, rather than replacements for clinical judgment. The fact that physicians and AI converge on similar diagnoses suggests the possibility of developing integrated clinical workflows where AI plays a complementary role, validating and enriching the medical decision-making process.
4.8. Strengths and limitations
This study presents several strengths, including the combined assessment of knowledge, attitudes, and clinical concordance in a single investigation, which provides a holistic view of AI integration challenges. The innovative methodology of using AI-generated diagnoses and measuring physician concordance provides concrete data on the real-world utility of AI, an approach that addresses the gap between theoretical AI capabilities and practical clinical applications highlighted by Sendak et al. (48). However, several limitations should be considered when interpreting these results. First, the non-probabilistic convenience sampling method (snowball technique) means the sample cannot be considered statistically representative of the entire population of Italian physicians. Consequently, the results cannot be fully generalised to the national level, owing to potential imbalances across medical specialities, geographic regions, and levels of seniority. Additionally, selection bias may have arisen from a greater propensity to participate among subjects already interested in artificial intelligence.
A methodological consideration concerns the online survey format and the self-reported nature of the data. Because the survey was conducted remotely, participants could in theory have consulted various resources, including ChatGPT, when responding to the clinical scenario. To mitigate this risk, we implemented specific quality control measures, including careful monitoring of completion times. Responses with completion times incompatible with thoughtful engagement (either too short, suggesting random answers, or excessively long, suggesting external consultation) were excluded from the analysis. This filtering process substantially reduced the likelihood that participants consulted AI tools during completion, strengthening the validity of our concordance findings. Nevertheless, we cannot eliminate the possibility that some participants used external resources, a limitation inherent to remote survey methodologies. Another limitation relates to the use of a single, universal clinical scenario for assessing clinical concordance. While this approach ensured comparability across all participants, it may not capture the complexity and diversity of real-world clinical decision-making across different medical specialities and patient presentations. The findings are limited to the case presented and may not generalise to other clinical contexts or more uncertain diagnostic situations. Finally, although the survey included physicians from multiple clinical backgrounds, specialty representation was uneven. Radiologists, a specialty with early and extensive AI adoption, were not specifically targeted and may be under-represented, which may limit the generalisability of the findings to AI-mature clinical domains.
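The completion-time screening described above amounts to a two-sided outlier rule on response durations. The sketch below is purely illustrative: the function name, the 1.5 × IQR thresholds, and the example times are all hypothetical, as the study does not report the exact cut-offs used.

```python
import statistics

def plausible_completion_times(times_sec: list[float]) -> list[bool]:
    """Flag responses whose completion time falls within plausible bounds.

    Hypothetical rule: keep times within 1.5 * IQR of the quartiles,
    excluding both implausibly fast and implausibly slow responses.
    The study does not report its actual thresholds.
    """
    q1, _, q3 = statistics.quantiles(times_sec, n=4)  # exclusive quartiles
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [lower <= t <= upper for t in times_sec]

# Example durations in seconds: 45 s is implausibly fast, 3900 s implausibly slow
times = [410, 520, 480, 45, 600, 3900, 505]
keep = plausible_completion_times(times)
print(keep)  # [True, True, True, False, True, False, True]
```

In practice, survey platforms log per-respondent timestamps, so a rule of this kind can be applied before any concordance analysis; the specific bounds would need to be justified against pilot-test completion times.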
Despite these limitations, the sample size achieved exceeds that calculated as necessary to ensure acceptable precision in descriptive estimates, and the quality control procedures implemented enhance the reliability of the collected data. The exploratory analysis has highlighted useful trends and perceptions to guide more extensive future studies with rigorous designs.
5. Conclusions
Our findings provide an empirical foundation to support the evidence-based integration of AI tools in Italian clinical workflows, while highlighting the urgent need for structured education programs, tailored to medical professionals. Italian physicians show a strong interest in adopting AI tools, despite significant knowledge gaps and limited practical experience. The high concordance between physicians' evaluations and ChatGPT's diagnostic proposals suggests potential for AI-based decision support in clinical workflows. Targeted training and institutional support are essential to bridge the gap between enthusiasm and readiness for AI integration.
The study reveals that Italian physicians are cautiously optimistic about AI adoption, with a strong emphasis on proper training, scientific validation, and technical support as prerequisites for successful implementation in clinical practice. The remarkable alignment between physician judgment and AI-generated diagnostic recommendations opens new possibilities for integrating AI as decision support tools in clinical practice, provided that appropriate educational infrastructure and evidence-based implementation strategies are prioritised.
Acknowledgments
The authors thank all participants in this study.
Funding Statement
The author(s) declared that financial support was not received for this work and/or its publication.
Footnotes
Edited by: Cristian Vacacela Gomez, National Laboratory of Frascati (INFN), Italy
Reviewed by: Swati Goyal, Gandhi Medical College Bhopal, India
Rasha Ahmed, Ninevah University, Iraq
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Ethics statement
Ethical approval was not required for the studies involving humans because all procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.
Author contributions
VC: Formal analysis, Data curation, Validation, Conceptualization, Methodology, Supervision, Investigation, Writing – original draft, Writing – review & editing. MMu: Writing – review & editing, Investigation, Validation. LP: Investigation, Validation, Writing – original draft, Writing – review & editing. EB: Data curation, Conceptualization, Investigation, Software, Validation, Writing – review & editing, Writing – original draft. GDi: Validation, Writing – review & editing. MMa: Writing – original draft, Validation, Investigation, Writing – review & editing. EC: Investigation, Writing – review & editing, Validation. PP: Investigation, Writing – review & editing. EP: Validation, Writing – review & editing, Investigation. GP: Validation, Writing – review & editing, Investigation. LT: Writing – review & editing, Investigation, Validation. AB: Validation, Writing – review & editing, Investigation. GDe: Investigation, Writing – review & editing, Validation. LF: Investigation, Validation, Writing – review & editing. MG: Writing – review & editing, Investigation, Validation. FM: Validation, Writing – review & editing, Investigation. SN: Investigation, Validation, Methodology, Writing – review & editing.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
- 1.Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, Chou K, et al. A guide to deep learning in healthcare. Nat Med. (2019) 25(1):24–9. 10.1038/s41591-018-0316-z [DOI] [PubMed] [Google Scholar]
- 2.Maslej N, Fattorini L, Perrault R, Parli V, Reuel A, Brynjolfsson E, et al. The AI Index 2024 Annual Report. Stanford, CA: AI Index Steering Committee, Institute for Human-Centered AI, Stanford University; (2024). [Google Scholar]
- 3.Ministero della Salute. Piano Nazionale per L'innovazione del Sistema Sanitario Basata Sull'intelligenza artificiale 2023–2027. Roma: Ministero della Salute; (2023). [Google Scholar]
- 4.Cascini F, Beccia F, Causio FA, Melnyk A, Zaino A, Ricciardi W. Scoping review of the current landscape of AI-based applications in clinical trials. Front Public Health. (2022) 10:949377. 10.3389/fpubh.2022.949377 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Reale R, Biasin E, Scardovi A, Toro S. The design and implementation of a national AI platform for public healthcare in Italy: implications for semantics and interoperability. arXiv:2304.11893. (2023). 10.48550/arXiv.2304.11893 [DOI]
- 6.Cellina M, Cacioppa LM, Cè M, Chiarpenello V, Costa M, Vincenzo Z, et al. Artificial intelligence in lung cancer screening: the future is now. Cancers (Basel). (2023) 15(17):4344. 10.3390/cancers15174344 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Injante R, Julca M. Detection of diabetic retinopathy using artificial intelligence: an exploratory systematic review. LatIA. (2024) 2:112. 10.62486/latia2024112 [DOI] [Google Scholar]
- 8.Ahmadi A. Navigating the future: challenges and opportunities in hospital care in Italy—a review of AI and big data integration. Int J BioMed Insights. (2024) 1(1):23–34. 10.22034/ijbmi.2024.198709 [DOI] [Google Scholar]
- 9.Bini SA. Artificial intelligence, machine learning, deep learning, and cognitive computing: what do these terms mean and how will they impact health care? J Arthroplasty. (2021) 33(8):2358–61. 10.1016/j.arth.2018.02.067 [DOI] [PubMed] [Google Scholar]
- 10.Cingolani M, Scendoni R, Fedeli P, Cembrani F. Artificial intelligence and digital medicine for integrated home care services in Italy: opportunities and limits. Front Public Health. (2023) 10:1095001. 10.3389/fpubh.2022.1095001 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Commissione europea, D. G. Orientamenti etici per un'IA affidabile (2019). Available online at: https://data.europa.eu/doi/10.2759/640340 (Accessed January 10, 2026).
- 12.Chen M, Zhang B, Cai Z, Seery S, Gonzalez MJ, Ali NM, et al. Acceptance of clinical artificial intelligence among physicians and medical students: a systematic review with cross-sectional survey. Front Med (Lausanne). (2022) 9:990604. 10.3389/fmed.2022.990604 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Neri E, Coppola F, Miele V, Bibbolino C, Grassi R. Artificial intelligence: who is responsible for the diagnosis? Radiol Med. (2020) 125(6):517–21. 10.1007/s11547-020-01135-9 [DOI] [PubMed] [Google Scholar]
- 14.Associazione Italiana di Informatica Medica. Indagine Nazionale Sull'adozione Dell'intelligenza artificiale Nella Pratica Clinica in Italia. Milano: AIIM Press; (2023). [Google Scholar]
- 15.FNOMCeO. Medici e Odontoiatri: ecco i numeri della professione (2023). Available online at: https://portale.fnomceo.it/medici-e-odontoiatri-ecco-i-numeri-della-professione/ (Accessed October 24, 2025).
- 16.Cofini V, Piccardi L, Benvenuti E, Di Pangrazio G, Cimino E, Mancinelli M, et al. The I-KAPCAM-AI-Q: a novel instrument for evaluating health care providers’ AI awareness in Italy. Front Public Health. (2025) 13:1655659. 10.3389/fpubh.2025.1655659 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Dalawi I, Isa MR, Chen XW, Azhar ZI, Aimran N. Development of the Malay language of understanding, attitude, practice and health literacy questionnaire on COVID-19 (MUAPHQ C-19): content validity & face validity analysis. BMC Public Health. (2023) 23(1):1131. 10.1186/s12889-023-16044-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Bland JM, Altman DG. Statistics notes: Cronbach's alpha. Br Med J. (1997) 314(7080):572. 10.1136/bmj.314.7080.572 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Foster RC. KR20 And KR21 for some nondichotomous data (it's not just Cronbach's alpha). Educ Psychol Meas. (2021) 81(6):1172–85. 10.1177/0013164421992535 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Sadler GR, Lee HC, Lim RS, Fullerton J. Recruitment of hard-to-reach population subgroups via adaptations of the snowball sampling strategy. Nurs Health Sci. (2010) 12(3):369–74. 10.1111/j.1442-2018.2010.00541.x [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Raosoft Inc. Sample size calculator(2024). Available online at: http://www.raosoft.com/samplesize.html (Accessed October 24, 2025).
- 22.Cofini V, Benvenuti E, Mancinelli M, Di Pangrazio G, Piccardi L, Muselli M, et al. A framework to improve data quality and manage dropout in web-based medical surveys: insights from an ai awareness study among Italian physicians. EBPH. (2025) 20:35–6. 10.54103/2282-0930/29202 [DOI] [Google Scholar]
- 23.Blease C, Kharko A, Bernstein M, Bradley C, Houston M, Walsh I, et al. Machine learning in medical education: a survey of the experiences and opinions of medical students in Ireland. BMJ Health Care Inform. (2022) 29(1):e100480. 10.1136/bmjhci-2021-100480 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Sapci AH, Sapci HA. Artificial intelligence education and tools for medical and health informatics students: systematic review. JMIR Med Educ. (2020) 6(1):e19285. 10.2196/19285 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Ding X, Shang B, Xie C, Xin J, Yu F. Artificial intelligence in the COVID-19 pandemic: balancing benefits and ethical challenges in China's response. Humanit Soc Sci Commun. (2025) 12:245. 10.1057/s41599-025-04564-x [DOI] [Google Scholar]
- 26.Wartman SA, Combs CD. Medical education must move from the information age to the age of artificial intelligence. Acad Med. (2018) 93(8):1107–9. 10.1097/ACM.0000000000002044 [DOI] [PubMed] [Google Scholar]
- 27.Goyal S, Sakhi P, Kalidindi S, Nema D, Pakhare AP. Knowledge, attitudes, perceptions, and practices related to artificial intelligence in radiology among Indian radiologists and residents: a multicenter nationwide study. Cureus. (2024) 16(12):e76667. 10.7759/cureus.76667 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Qurashi AA, Alanazi RK, Alhazmi YM, Almohammadi AS, Alsharif WM, Alshamrani KM. Saudi Radiology personnel's perceptions of artificial intelligence implementation: a cross-sectional study. J Multidiscip Healthc. (2021) 14:3225–31. 10.2147/JMDH.S340786 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Alyami AS, Majrashi NA, Shubayr NA. Radiologists’ and Radiographers’ perspectives on artificial intelligence in medical imaging in Saudi Arabia. Curr Med Imaging. (2023) 20:e15734056250970. 10.2174/0115734056250970231117111810 [DOI] [PubMed] [Google Scholar]
- 30.Huisman M, Ranschaert E, Parker W, Mastrodicasa D, Koci M, de Santos D P, et al. An international survey on AI in radiology in 1041 radiologists and radiology residents part 2: expectations, hurdles to implementation, and education. Eur Radiol. (2021) 31(11):8797–806. 10.1007/s00330-021-07782-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Pesapane F, Codari M, Sardanelli F. Artificial intelligence in medical imaging: threat or opportunity? Radiologists again at the forefront of innovation in medicine. Eur Radiol Exp. (2018) 2(1):35. 10.1186/s41747-018-0061-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL. Artificial intelligence in radiology. Nat Rev Cancer. (2018) 18(8):500–10. 10.1038/s41568-018-0016-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Tudor Car L, Dhinagaran DA, Kyaw BM, Kowatsch T, Joty S, Theng YL, et al. Conversational agents in health care: scoping review and conceptual analysis. J Med Internet Res. (2020) 22(8):e17158. 10.2196/17158 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Ma Y, Achiche S, Pomey MP, Paquette J, Adjtoutah N, Vicente S, et al. Adapting and evaluating an AI-based chatbot through patient and stakeholder engagement to provide information for different health conditions: master protocol for an adaptive platform trial (the MARVIN chatbots study). JMIR Res Protoc. (2024) 13:e54668. 10.2196/54668 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Laymouna M, Ma Y, Lessard D, Schuster T, Engler K, Lebouché B. Roles, users, benefits, and limitations of chatbots in health care: rapid review. J Med Internet Res. (2024) 26:e56930. 10.2196/56930 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Pinto Dos Santos D, Giese D, Brodehl S, Chon SH, Staab W, Kleinert R, et al. Medical students’ attitude towards artificial intelligence: a multicentre survey. Eur Radiol. (2019) 29(4):1640–6. 10.1007/s00330-018-5601-1 [DOI] [PubMed] [Google Scholar]
- 37.Patel BN, Rosenberg L, Willcox G, Baltaxe D, Lyons M, Irvin J, et al. Human-machine partnership with artificial intelligence for chest radiograph diagnosis. NPJ Digit Med. (2019) 2:111. 10.1038/s41746-019-0189-7. Erratum in: NPJ Digit Med. 2019 December 10;2:129. doi: 10.1038/s41746-019-0198-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Verghese A, Shah NH, Harrington RA. What this computer needs is a physician: humanism and artificial intelligence. JAMA. (2018) 319(1):19–20. 10.1001/jama.2017.19198 [DOI] [PubMed] [Google Scholar]
- 39.Jussupow E, Spohrer K, Heinzl A. Identity threats as a reason for resistance to artificial intelligence: survey study with medical students and professionals. JMIR Form Res. (2022) 6(3):e28750. 10.2196/28750 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Castagno S, Khalifa M. Perceptions of artificial intelligence among healthcare staff: a qualitative survey study. Front Artif Intell. (2020) 3:578983. 10.3389/frai.2020.578983 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Gille F, Jobin A, Ienca M. What we talk about when we talk about trust: theory of trust for AI in healthcare. Intell Based Med. (2020) 1–2:100001. 10.1016/j.ibmed.2020.100001 [DOI] [Google Scholar]
- 42.Liyanage H, Liaw ST, Jonnagaddala J, Schreiber R, Kuziemsky C, Terry AL, et al. Artificial intelligence in primary health care: perceptions, issues, and challenges. Yearb Med Inform. (2019) 28(1):41–6. 10.1055/s-0039-1677901 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Huisman M, Ranschaert E, Parker W, Mastrodicasa D, Koci M, Pinto de Santos D, et al. An international survey on AI in radiology in 1,041 radiologists and radiology residents part 1: fear of replacement, knowledge, and attitude. Eur Radiol. (2021) 31(9):7058–66. 10.1007/s00330-021-07781-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Nori H, King N, McKinney SM, Carignan DM, Horvitz E. Capabilities of GPT-4 on medical challenge problems. arXiv:2303.13375. (2023). 10.48550/arXiv.2303.13375 [DOI]
- 45.Thirunavukarasu AJ. How can the clinical aptitude of AI assistants be assayed? J Med Internet Res. (2023) 25:e51603. 10.2196/51603 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Magrabi F, Ammenwerth E, McNair JB, De Keizer NF, Hyppönen H, Nykänen P, et al. Artificial intelligence in clinical decision support: challenges for evaluating AI and practical implications. Yearb Med Inform. (2019) 28(1):128–34. 10.1055/s-0039-1677903 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Daneshjou R, Smith MP, Sun MD, Rotemberg V, Zou J. Lack of transparency and potential bias in artificial intelligence data sets and algorithms: a scoping review. JAMA Dermatol. (2021) 157(11):1362–9. 10.1001/jamadermatol.2021.3129 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Sendak MP, Gao M, Brajer N, Balu S. Presenting machine learning model information to clinical end users with model facts labels. NPJ Digit Med. (2020) 3:41. 10.1038/s41746-020-0253-3 [DOI] [PMC free article] [PubMed] [Google Scholar]