Abstract
Objective
Artificial intelligence (AI) technology is profoundly transforming the healthcare domain, with generative artificial intelligence (GenAI) and AI chatbots demonstrating significant potential across clinical practice, medical research, and medical education through their robust data generation and personalized interaction capabilities. However, systematic research on how Chinese physicians apply these advanced technologies remains limited, highlighting a critical need to explore real-world application patterns and multidimensional challenges of GenAI technology in medicine from physicians’ perspectives. This study aims to provide empirical evidence for the rational application and responsible governance of GenAI in the Chinese healthcare system, thereby advancing global understanding of AI integration in medicine.
Materials and Methods
This study employed a cross-sectional survey design targeting licensed physicians in China, and data were collected through standardized anonymous electronic questionnaires. To ensure study comprehensiveness and scientific rigor, we systematically reviewed relevant literature to inform questionnaire development. The literature search encompassed studies published between January 2018 and February 2024 indexed in PubMed, Web of Science, and Google Scholar. The questionnaire assessed respondents’ demographic characteristics, current applications of AI chatbots in clinical practice, medical research, and medical education, as well as physicians’ attitudes toward AI chatbots and their perceived potential challenges.
Results
Results revealed that physicians who had used AI chatbots generally held positive attitudes, while non-users demonstrated significantly more cautious attitudes. Physicians primarily expressed concerns regarding information reliability in AI chatbot application, compliance with relevant academic ethical standards, and possible impacts on critical thinking development among medical professionals. These findings collectively depict an overall attitude toward GenAI in the Chinese medical context that is both enthusiastic and cautious.
Conclusions
This study represents one of the few empirical investigations on Chinese physicians’ use of GenAI, with both academic and practical significance. The research findings provide valuable insights into the practical application of GenAI in the Chinese healthcare system and offer evidence-based support for technology optimization, application strategy development, and establishment of systematic risk mitigation mechanisms.
Supplementary information
The online version contains supplementary material available at 10.1186/s12967-026-07912-w.
Keywords: AI-chatbots, Artificial intelligence, Chinese physicians, Electronic survey, Generative artificial intelligence
Introduction
The rapid development of artificial intelligence (AI) technology is profoundly reshaping service delivery models in the healthcare domain [1–6]. As a significant branch of AI, generative AI (GenAI) demonstrates broad application prospects in medicine based on its powerful data generation capabilities and novel outputs [1]. Recent breakthrough advancements in large language models (LLMs) and natural language processing (NLP) technologies [7] have facilitated the development and application of advanced AI-chatbots, such as OpenAI’s GPT-4, Anthropic’s Claude, and Google’s PaLM 2 [1, 8, 9]. A systematic review by Xu et al. of 78 medical chatbots further validated the broad applicability of this technology in the medical field [10]. Specifically, in clinical practice, these AI-chatbots can efficiently process electronic health record (EHR) information [11], provide reliable diagnostic decision support for clinicians [1], and assist in precise medical image interpretation [12–16], while also significantly reducing the documentation burden on healthcare professionals [1], for example by automatically generating standardized discharge summaries [17]. Additionally, in doctor-patient communication, AI-chatbots can facilitate accurate translation of medical terminology [8], provide evidence-based health consultations [18], and assist patients in understanding medication interaction information [19].
In the domains of medical research and education, GenAI is increasingly demonstrating its distinct and significant value. First, in medical research, LLMs provide researchers with powerful knowledge acquisition tools [8], capable of generating high-quality academic text references [20], and by significantly lowering technical barriers, enable clinicians to conduct data analysis more conveniently and efficiently [8]. In terms of data synthesis, researchers have successfully implemented intelligent synthesis of EHR data through the application of generative adversarial network technology, effectively addressing key issues in clinical research such as data privacy protection and acquisition limitations [1]. In drug development, GenAI can facilitate intelligent design of novel small molecules, nucleic acid sequences, and proteins [21], significantly accelerating the generation process of candidate drugs and improving the predictive accuracy of new drug safety profiles [1]. Second, in medical education, GenAI can effectively enhance medical students’ clinical thinking and practical abilities by creating realistic, high-quality virtual cases and personalized intelligent doctor-patient dialogue simulation systems [1]. Overall, GenAI technology is profoundly driving the synergistic development of clinical practice, medical research, and medical education at an unprecedented pace, providing powerful technical support and transformative momentum for systematic evolution of future medical models.
Despite the extensive research on GenAI and AI-chatbots, current studies predominantly focus on the development and optimization of the technology itself. In contrast, research from the user’s perspective, especially systematic investigations into how healthcare professionals interact with these advanced technologies and their acceptance and usage experiences, remains relatively scarce. Furthermore, although GenAI and AI-chatbots have demonstrated significant potential across multiple healthcare domains in China including clinical practice, research innovation, and medical education, systematic studies on how Chinese physicians utilize these technological tools remain notably insufficient, creating a significant knowledge gap. This research status highlights the pressing academic need to systematically explore the practical applications of GenAI and AI-chatbots in medicine from the perspective of Chinese physicians.
This study employs a questionnaire survey methodology to systematically analyze Chinese physicians’ perception and current application of GenAI. First, the study characterizes respondent physicians’ demographic profiles, usage behavior patterns of AI-chatbots, cognition of the technology, and usage preferences for different domestic and international AI-chatbots. Second, it examines specific applications of GenAI and intelligent dialogue systems in clinical practice, research innovation, and medical education, analyzing application frequency across medical scenarios and the influence of usage behavior characteristics on application effectiveness. Third, it assesses Chinese healthcare practitioners’ overall attitudes toward GenAI technology, including usage satisfaction, future expectations, and practical challenges, and identifies technology optimization directions through the collection and analysis of user feedback. By thoroughly analyzing Chinese physicians’ perception, current usage status, and attitude tendencies toward intelligent dialogue systems, this study addresses the academic gap in research on GenAI applications among Chinese healthcare practitioners and establishes an important foundation for subsequent research in this field. It also provides empirical support for the innovative development of new-generation AI technology in China’s healthcare domain, particularly regarding application value and challenges in clinical practice, research innovation, and medical education, offering a robust scientific basis for technology optimization, application strategy formulation, and policy improvement.
Methods
Research design
In accordance with the Checklist for Reporting Results of Internet E-Surveys (CHERRIES) [22], this study explicitly delineated the target population and sampling frame for the questionnaire design. This study employed a cross-sectional survey design targeting practicing physicians in China. The sampling frame comprised practicing physicians in China who could be reached through multiple channels to access the questionnaire. The research team implemented a multi-channel questionnaire distribution strategy that included: (1) disseminating questionnaire invitations to practicing physicians at medical institutions through social media platforms; (2) promoting the questionnaire at academic conferences, continuing education courses, and professional training sessions; (3) distributing the questionnaire link and QR code through DXY, a professional medical platform and one of China’s most influential online communities for healthcare professionals, to expand sample coverage and representativeness. According to the official DXY website, the platform currently has over 9 million registered professional users, accounting for approximately 80% of the total health technical personnel nationwide, including 4.05 million registered physicians, representing approximately 92% of all licensed physicians in China [23]. Leveraging this extensive and representative professional physician network, this study established a robust and nationally representative sampling foundation. Through the multi-channel questionnaire distribution strategy, the study sample encompassed physicians from diverse regions, specialty categories, and hospital levels, thereby ensuring the generalizability and reliability of the research findings. This study employed convenience sampling, whereby respondents voluntarily participated via publicly available questionnaire links or QR codes. 
This study was approved by the Scientific Research Ethics Committee of the First Affiliated Hospital of Jinan University (approval number: KY-2024–178). All participants provided written electronic informed consent prior to completing the questionnaire. On the first page of the online questionnaire, respondents were presented with an informed consent statement outlining the research purpose, estimated completion time, and data confidentiality measures. Respondents could proceed with the questionnaire only after selecting “I have already known. Continue the fillings.” This study strictly adhered to data privacy protection principles by utilizing anonymized electronic questionnaires for data collection and implementing encryption protocols for all stored information to ensure data security.
Questionnaire design
Prior to initiating this study, the research team provided target participants with comprehensive explanations regarding the research background and objectives. The questionnaire development process incorporated systematic literature reviews, internet-based research data, and findings from multiple rounds of unstructured in-depth interviews with clinical physicians and researchers who had experience using AI chatbots [24]. To ensure both the comprehensiveness and scientific rigor of the study, we systematically reviewed relevant literature to inform the questionnaire framework design. The literature search encompassed studies published between January 2018 and February 2024 indexed in PubMed, Web of Science, and Google Scholar databases. The questionnaire was structured around four primary dimensions: first, demographic characteristics [25–27], which encompassed participants’ basic information including age, gender, departmental affiliation, professional title, educational attainment, years in practice, and clinical, research, and teaching experience; second, AI chatbot usage and cognition assessment [28], designed to evaluate participants’ actual usage experiences and understanding of this technology; third, attitude assessment [29], which established four specific application scenarios for each of the three major domains—clinical practice, research, and medical education—utilizing a 5-point Likert scale to measure users’ satisfaction (from 1 = very dissatisfied to 5 = very satisfied) and non-users’ assessments of application prospects (from 1 = very unpromising to 5 = very promising) [30], with higher scores indicating more positive attitudes; fourth, concern assessment [26, 27], which addressed potential concerns regarding AI chatbots in the aforementioned three domains, with five potential risk items for each domain, similarly employing a 5-point Likert scale to assess the importance and urgency of these issues. 
To evaluate the internal consistency reliability of the questionnaire scales, we calculated Cronbach’s α coefficients for both the application attitude scales and potential concern scales regarding AI chatbots across the three major domains of clinical practice, research innovation, and medical education [31]. The results demonstrated that all subscales exhibited good to excellent internal consistency based on their Cronbach’s α coefficients. Specifically, the Cronbach’s α coefficients for the application attitude scales ranged from 0.874 to 0.909, encompassing three subscales: clinical practice application attitude (α = 0.909), research innovation application attitude (α = 0.874), and medical education application attitude (α = 0.905). Similarly, the Cronbach’s α coefficients for the potential concern scales ranged from 0.871 to 0.885, comprising three subscales: clinical practice concerns (α = 0.885), research innovation concerns (α = 0.871), and medical education concerns (α = 0.879). These findings confirm that the scales employed in this study possess high internal consistency and reliability, thereby enabling stable and reliable assessment of physicians’ attitudes toward the use of generative artificial intelligence and their risk perceptions across different healthcare scenarios. The complete questionnaire content is available in Supplementary Material 1.
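For readers who wish to see how this kind of internal consistency estimate is obtained, Cronbach’s α can be computed directly from item-level responses: it compares the sum of the individual item variances to the variance of respondents’ total scores. The sketch below is a minimal pure-Python illustration using made-up Likert responses, not the study’s actual data or analysis code.

```python
from statistics import variance  # sample variance (n - 1 denominator)

def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: one list per scale item, each holding the responses of the
    same respondents in the same order (e.g., Likert scores 1-5).
    """
    k = len(items)
    # Each respondent's total score across the k items.
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

# Hypothetical responses of 5 physicians to a 4-item attitude subscale.
demo_items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 3, 4, 1],
    [4, 5, 2, 4, 2],
]
print(round(cronbach_alpha(demo_items), 3))  # → 0.952 (high consistency)
```

Values above roughly 0.8, like those reported here, are conventionally read as good internal consistency; highly correlated items shrink the ratio of item variance to total-score variance and push α toward 1.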
Quality control of the questionnaire survey
Prior to formal implementation of the survey, the research team conducted a small-scale pilot test to evaluate the clarity and validity of the questionnaire content. The research questionnaire was developed using a professional online survey platform (Sojump), while the research team systematically assessed and optimized the questionnaire’s usability and technical feasibility. To ensure data quality and reliability, this study implemented the following quality control measures: (1) establishing IP address restrictions to prevent duplicate submissions; (2) setting reasonable response time parameters to identify and exclude invalid responses; (3) implementing consistency verification protocols for key items; (4) instituting mandatory response completion mechanisms. Questionnaire distribution and data collection were conducted between April 1, 2024, and June 30, 2024. All survey data were automatically collected and securely stored through the platform system. Inclusion criteria included the following: (1) Explicit consent to participate (indicated by clicking the consent button on the questionnaire homepage). (2) Confirmation of current status as practicing physicians actively employed in mainland China (indicated by selecting the physician option in the identity verification item). (3) Completion of all mandatory questionnaire items within a reasonable timeframe (minimum of 5 minutes). This threshold was based on pilot testing data and designed to exclude responses completed carelessly or without adequate consideration. Exclusion criteria included the following: (1) Identity incongruence (e.g., selection of medical student rather than practicing physician status in the identity verification item). (2) Abnormally short response time (less than 5 minutes), indicative of invalid or non-conscientious responses. (3) Incomplete responses (failure to complete all mandatory items). (4) Duplicate submissions identified through IP address filtering.
The study protocol design and report preparation strictly adhered to the Checklist for Reporting Results of Internet E-Surveys (CHERRIES) guidelines (see Supplementary Material 2 for details). According to CHERRIES guidelines, this study could not accurately calculate the view rate, participation rate, or completion rate. The questionnaire was distributed through multiple channels (including social media, academic conferences, and the DXY platform) rather than via a single website. Consequently, the number of unique site visitors could not be obtained, precluding calculation of the view rate. Moreover, because of the mandatory completion mechanism implemented in the questionnaire system, respondents could not submit partial responses (e.g., only the consent page). Consequently, the number of individuals who agreed to participate could not be determined, precluding calculation of both the participation rate and completion rate.
Statistical analysis
In this study, continuous variables are presented as mean ± standard deviation (Mean ± SD), and categorical variables as frequencies and percentages [n (%)]. Wilcoxon rank-sum tests were used to assess between-group differences in continuous and ordered categorical variables, and chi-square (χ²) tests to assess differences in unordered categorical variables. Spearman rank correlation analysis was conducted to examine relationships between variables. All statistical tests were two-sided, with p < 0.05 considered statistically significant. Statistical analyses were performed using Microsoft Excel (Version 16.661), IBM SPSS Statistics (Version 24.0), and R software (Version 4.2.3).
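As a concrete illustration of one of these tests, the Pearson chi-square statistic for a contingency table can be computed from first principles by comparing observed counts with the counts expected under independence. The sketch below is a hypothetical pure-Python illustration using the gender counts from Table 1, not the software actually used in the analysis (which was SPSS/R), and it omits the continuity correction and p-value lookup.

```python
def chi_square_stat(table):
    """Pearson chi-square statistic (no continuity correction) for an
    r x c contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Gender counts from Table 1: rows = (non-users, users), cols = (female, male).
gender = [[102, 135], [57, 79]]
stat = chi_square_stat(gender)
print(round(stat, 3))  # → 0.045, far below the df=1 critical value of 3.841
```

A statistic this small for a 2×2 table (1 degree of freedom) corresponds to a large p-value, consistent with the non-significant gender difference (p = 0.832) reported in Table 1.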
Results
Respondent demographics
This study included a total of 373 healthcare practitioners with a mean age of 33.1 years, comprising 136 AI-chatbot users and 237 non-users (Fig. 1). As shown in Table 1, AI-chatbot users and non-users demonstrated significant differences across multiple demographic characteristics. AI-chatbot users were significantly younger than non-users (32.42 vs. 33.47 years, p = 0.023, Table 1) and demonstrated notably higher representation among oncology department personnel (20.6% vs. 7.2%, p = 0.006). The user group contained a higher proportion of licensed physicians (44.9% vs. 34.6%, p = 0.020) and significantly more individuals with doctoral degrees (35.3% vs. 14.8%, p = 0.001). Users were more likely to have overseas study experience (19.0% vs. 8.0%, p = 0.002) and possessed fewer years of professional experience on average (6.76 vs. 7.96 years, p = 0.018). Additionally, a higher percentage of users were employed at university-affiliated hospitals (70.6% vs. 50.6%, p = 0.001). The study also revealed that users were more inclined to engage in research activities, with a significantly higher proportion serving as project members or non-principal investigators in research projects (65.4% vs. 42.2%, p = 0.001) (Table 1).
Fig. 1.
Flowchart. Survey flowchart illustrating the questionnaire preparation phase (design and pre-testing), implementation phase (distribution methods and quality control measures), and analysis phase (data organization and statistical analysis). This figure was created based on the tools provided by Biorender.com (accessed on 10/3/2025)
Table 1.
Basic characteristics of the respondents
| Characteristic | AI chatbot non-users | AI chatbot users | p value |
|---|---|---|---|
| N | 237 | 136 | |
| Age (mean (SD)) | 33.47 (5.39) | 32.42 (5.38) | 0.023 |
| Gender (%) | | | 0.832 |
| Female | 102 (43.0) | 57 (42.0) | |
| Male | 135 (57.0) | 79 (58.0) | |
| Department (%) | | | 0.006 |
| Surgery | 72 (30.4) | 41 (30.1) | |
| Internal Medicine | 53 (22.4) | 30 (22.1) | |
| Oncology | 17 (7.2) | 28 (20.6) | |
| Radiology | 17 (7.2) | 7 (5.1) | |
| Obstetrics and Gynecology | 16 (6.8) | 4 (2.9) | |
| Pediatrics | 11 (4.6) | 3 (2.2) | |
| Other Departments | 51 (21.4) | 23 (17.0) | |
| Title (%) | | | 0.020 |
| Medical Practitioner | 13 (5.5) | 9 (6.6) | |
| Physician | 82 (34.6) | 61 (44.9) | |
| Attending Physician | 119 (50.2) | 57 (41.9) | |
| Associate Chief Physician | 22 (9.3) | 5 (3.7) | |
| Chief Physician | 1 (0.4) | 4 (2.9) | |
| Education (%) | | | 0.001 |
| Junior college | 2 (0.8) | 2 (1.5) | |
| Undergraduate | 78 (32.9) | 24 (17.6) | |
| Master | 122 (51.5) | 62 (45.6) | |
| Doctor | 35 (14.8) | 48 (35.3) | |
| Overseas experience (%) | | | 0.002 |
| Have | 19 (8.0) | 26 (19.0) | |
| Do not have | 218 (92.0) | 110 (81.0) | |
| Practice duration, years (mean (SD)) | 7.96 (5.72) | 6.76 (5.49) | 0.018 |
| Hospital of employment (%) | | | 0.001 |
| Affiliated hospital of a university | 120 (50.6) | 96 (70.6) | |
| Non-affiliated hospital of a university | 69 (29.1) | 32 (23.5) | |
| Private hospital | 9 (3.8) | 3 (2.2) | |
| International hospital | 0 (0.0) | 0 (0.0) | |
| Other | 39 (16.5) | 5 (3.7) | |
| Clinical work conducted (%) | | | 0.121 |
| Not conducted | 20 (8.4) | 6 (4.4) | |
| Regular member of medical team | 203 (85.7) | 116 (85.3) | |
| Leader of medical team | 14 (5.9) | 14 (10.3) | |
| Research work conducted (%) | | | 0.001 |
| Not conducted | 110 (46.4) | 25 (18.4) | |
| Project participant, non-principal investigator (PI) | 100 (42.2) | 89 (65.4) | |
| Independent PI, research group leader | 27 (11.4) | 22 (16.2) | |
| Teaching work conducted (%) | | | 0.123 |
| No teaching | 118 (49.8) | 72 (52.9) | |
| Lecturer | 33 (13.9) | 20 (14.7) | |
| Internship supervisor | 82 (34.6) | 37 (27.2) | |
| Master’s supervisor | 4 (1.7) | 3 (2.2) | |
| PhD supervisor | 0 (0.0) | 2 (1.5) | |
| Post-doctoral supervisor | 0 (0.0) | 2 (1.5) | |
Usage of AI-chatbots
Preferences
Figure 2A indicates that among the 136 surveyed healthcare professionals, AI-chatbot usage was distributed as follows: ChatGPT (63%), ERNIE-Bot (14%), other chatbots (13%), ChatGPT mirror sites (7%), and Bing Chat (3%). Regarding chatbot types, international platforms accounted for the highest proportion (71%), followed by domestic platforms (22%) and mirror sites (7%). In terms of performance assessment, 52% of respondents preferred international AI-chatbots, while only 9% preferred domestic platforms; 29% were unable to determine performance differences between the two, and 10% considered their performance comparable (Fig. 2A). As shown in Fig. 2B, stratified analysis of usage preferences revealed that respondents consistently preferred international AI-chatbots, regardless of the specific product or category. Among ChatGPT and ERNIE-Bot users, 52% and 63%, respectively, believed that international chatbots outperformed domestic platforms (Fig. 2B).
Fig. 2.
Chinese physicians’ usage patterns of AI-chatbots. A: Pie charts depicting the proportion of AI-chatbot names and types most frequently used by Chinese physicians, as well as their preference distribution toward domestic versus international AI-chatbots. B: Stacked bar chart stratifying users by their most frequently used AI-chatbot names and types, showing the preference proportion for domestic versus international AI-chatbots among users
Usage habits
According to survey data in Supplementary Table 1, over half (54.4%) of healthcare professionals frequently utilized AI-chatbots, primarily accessing them through social media platforms (44.1%) and peer recommendations (42.6%), with self-directed learning being the predominant learning method (69.9%). Regarding usage duration, 44.1% of respondents had used AI-chatbots for less than three months and 25.0% for three to six months; regarding usage frequency, 39.0% used them infrequently, while 34.6% used them once every few days. Research applications constituted the primary purpose of use (43.4%). Survey results also indicated that 86.0% of users lacked understanding of AI algorithms; regarding attention to version updates, 56.6% of users occasionally monitored updates, while 25.1% never did (Supplementary Table 1).
Attitude towards the application of AI-chatbots
AI-chatbot use rate in clinical practice, research and education
Supplementary Table 2 presents the utilization of AI-chatbots across three major application domains: clinical practice (27.6%), research work (41.4%), and educational activities (30.7%). Regarding specific applications, the clinical practice domain primarily focused on clinical documentation (53.7%), while utilization rates for clinical diagnosis (19.1%) and clinical treatment (25.7%) were lower. The research domain mainly utilized chatbots for research methodology design (50.0%) and academic writing (48.5%), and the educational domain primarily employed them for medical knowledge acquisition (44.1%) and research knowledge acquisition (40.4%) (Supplementary Table 2).
Comparison between those who had used previously and those who had not
Figure 3 and Supplementary Table 3 indicate that attitudinal differences between AI-chatbot users and non-users across various application scenarios were statistically significant (p < 0.05, Fig. 3). In the clinical practice domain, the user group gave positive evaluations for clinical documentation, medical report generation, and clinical diagnostic assistance at rates of 72.6%, 62.5%, and 61.5%, respectively; in contrast, the non-user group expressed negative attitudes toward clinical treatment applications and medical documentation at rates of 27.7% and 27.0%, respectively. Regarding research applications, the user group demonstrated higher acceptance of research idea generation, academic writing, and research methodology applications, with positive evaluation rates of 66.7%, 63.5%, and 61.8%, respectively; conversely, the non-user group evaluated research idea generation and research data analysis negatively at rates of 31.8% and 29.2%. In the medical education domain, the user group showed the highest approval for exam preparation assistance (77.8% positive evaluations), followed by teaching assistance applications (64.0%) and medical knowledge acquisition (61.7%). In contrast, the non-user group expressed negative attitudes toward exam preparation and research knowledge acquisition functions at rates of 30.3% and 29.6%, respectively (Supplementary Table 3).
Fig. 3.
Chinese physicians’ attitudes toward AI-chatbot applications in (A) clinical practice, (B) research, and (C) education. Each domain encompasses four specific application scenarios, with attitude proportions including users’ satisfaction levels (from 1 “very dissatisfied” to 5 “very satisfied”) and non-users’ expectation values for application prospects (from 1 “very little potential” to 5 “very high potential”). Attitude proportions for each application are represented through color stratification, and Wilcoxon rank-sum tests were used to compare attitude differences between users and non-users regarding AI-chatbot applications. * p < 0.05, ** p < 0.01, *** p < 0.001
Correlation between AI-chatbot usage behavior and attitude towards its application in clinical practice, research and education
Figure 4 and Supplementary Table 4 demonstrate that AI-chatbot usage frequency was significantly positively correlated with healthcare practitioners’ attitudes toward research methodology applications (r = 0.462, p < 0.001), academic writing (r = 0.332, p = 0.006), research knowledge acquisition (r = 0.320, p = 0.017), and teaching assistance tools (r = 0.407, p = 0.043). Further analysis revealed that AI-chatbot usage duration was significantly positively correlated with attitudes toward academic writing applications (r = 0.271, p = 0.027), and users’ attention to AI updates showed a significant positive correlation with attitudes toward research methodology applications (r = 0.299, p = 0.013) (Supplementary Table 4).
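The Spearman coefficients reported above can be understood operationally: both variables are converted to ranks (with ties, common in Likert data, assigned their average rank) and the Pearson correlation of the ranks is taken. The sketch below is a minimal pure-Python illustration with hypothetical usage-frequency and attitude data, not the study’s dataset.

```python
def _average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman_r(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Hypothetical example: usage frequency (1-5) vs. attitude score (1-5) for 6 respondents.
freq = [1, 2, 2, 3, 4, 5]
attitude = [2, 2, 3, 4, 4, 5]
print(round(spearman_r(freq, attitude), 3))  # → 0.94
```

A coefficient near 1 indicates a strongly monotone association; the moderate values reported here (r ≈ 0.27–0.46) indicate weaker but still statistically significant positive trends.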
Fig. 4.
Correlations between five AI-chatbot usage habits and attitudes toward specific applications across three domains: clinical practice, research, and education (each domain containing four subdivided applications). Correlations and p-values were calculated using Spearman’s tests, with the heatmap depicting statistical relationships between usage habits and application attitudes. Correlation strength is represented by color intensity, and significance levels are denoted by asterisks (* p < 0.05, ** p < 0.01, *** p < 0.001)
Potential concerns on AI-chatbots
As shown in Figure 5 and Supplementary Table 5, regarding clinical practice applications, healthcare practitioners’ three primary concerns were information authenticity (54.4%), medical responsibility attribution (50.0%), and patient informed consent (50.0%). In the research application domain, respondents’ concerns about academic integrity were most prominent (61.8%), followed by apprehensions regarding authorship attribution (56.6%) and information authenticity (55.1%). In the medical education domain, survey participants primarily focused on concerns regarding information authenticity (55.1%), development of self-study abilities (50.7%), academic integrity assurance (49.3%), and critical thinking development (49.3%) (Supplementary Table 5).
Fig. 5.
Chinese physicians’ concerns regarding AI-chatbot use across clinical practice, research, and education domains, encompassing 15 specific issues. The chart presents varying degrees of concern through five levels (from “concerned” to “not concerned”), with the proportion of concern levels for each issue represented through color stratification
Discussion
Despite the increasing application of generative artificial intelligence (GenAI) and AI chatbots in the medical field, existing research on their user populations has primarily focused on medical researchers and medical students. International surveys have demonstrated that medical researchers exhibit considerable interest in the application of AI chatbots during the scientific research process [32]; whereas a nationwide study in China targeting medical students indicates that although they hold positive attitudes toward AI chatbots, their actual usage rates remain low [33]. Furthermore, national cross-sectional studies on Chinese physicians and nurses have predominantly focused on the overall perception, attitudes, and factors influencing adoption intentions regarding medical artificial intelligence [34], but have lacked in-depth analysis of the specific applications of AI chatbots. This study employed a questionnaire survey methodology to systematically analyze the demographic characteristics, cognitive awareness, and usage behavior patterns of AI chatbots among the surveyed physicians. The findings reveal the current applications of AI chatbots in clinical practice, scientific research, and medical education, while also clarifying Chinese physicians’ attitudes and potential concerns regarding this emerging technology. Building upon existing domestic and international research, this study conducted a systematic investigation among Chinese practicing physicians, providing empirical data on the current application status of AI chatbots in clinical practice, scientific research, and education, thereby offering scientific evidence for future technology optimization, application strategy formulation, and policy refinement. This study is expected to advance the further development and innovative applications of AI technology in the medical field.
Users of AI chatbots in the healthcare field generally place high value on their benefits, while non-users demonstrate a more cautious attitude, indicating considerable room for optimization in AI chatbot development. At the clinical level, healthcare professionals widely acknowledge the practical value of AI chatbots in medical documentation, clinical report generation, and diagnostic assistance, as these tools can streamline clinical workflows, enhance efficiency, and significantly alleviate healthcare worker burnout [35, 36]. With their powerful natural language processing capabilities, AI chatbots can efficiently summarize patient clinical information and generate systematic case assessments and standardized discharge summaries [17, 37], not only saving valuable clinical time but also allowing healthcare professionals to devote more energy to critical medical decision-making. AI systems based on LLMs can, through systematic analysis of EHR data, provide differential diagnostic suggestions with relatively high accuracy in complex cases [11, 38], thereby offering robust support for clinical decision-making. However, non-users remain cautious about AI chatbot applications in clinical diagnosis, with primary concerns including potential erroneous guidance [39] and the “black box” effect [40, 41]. In the medical research domain, users clearly recognize AI chatbots’ auxiliary functions in research method design, innovative idea generation, and academic writing, and their role in enhancing research efficiency is widely affirmed. BioGPT demonstrates excellent performance in biomedical text generation [42], while ChatGPT is widely used for academic translation, paper summarization, and manuscript drafting [20, 43–45], significantly improving research efficiency and quality.
Non-users, by contrast, maintain reservations about AI chatbots in research idea generation and data analysis, reflecting differences in technology acceptance. In medical education, AI chatbots have been highly rated by educators for professional knowledge acquisition, teaching assistance, and exam preparation, demonstrating their potential as educational auxiliary tools. ChatGPT achieved accuracy at or near the 60% passing threshold on the United States Medical Licensing Examination [46], demonstrating its potential in medical education. AI can rapidly integrate information, generate standardized content [47], provide demonstrations, translations, and contextualized learning materials for teaching assistance [8], and effectively summarize textbook key points [48]. Non-users’ concerns about AI chatbots primarily focus on content accuracy [47] and the lack of emotional interaction capabilities [49, 50]. Overall, AI chatbots demonstrate significant value in structured tasks across healthcare, research, and education, yet require further optimization for complex tasks involving professional judgment.
Despite the advantages AI chatbots demonstrate in the medical field, physicians retain numerous concerns, chiefly false information generation, risks of academic misconduct, authorship disputes, and negative impacts on critical thinking. First, the generation of false information raises widespread concern. Research indicates that over two-thirds of the medical references provided by ChatGPT are fabricated, and one-quarter of its responses contain significant factual errors [51]. This “artificial hallucination” phenomenon [48] may stem from errors in training data or a lack of specialized knowledge [52], potentially creating safety hazards in clinical settings [53] and undermining reliability in research [54]. Second, academic misconduct is receiving increasing attention. Some individuals may submit AI-generated text as original work, which constitutes plagiarism. AI-generated content may copy phrases or sentences from other documents [55], and the current lack of effective detection tools [56] further exacerbates academic integrity issues. Furthermore, authorship disputes have also garnered significant attention. The International Committee of Medical Journal Editors (ICMJE) stipulates that authors must meet criteria including conception, design, data analysis, and manuscript review [57]. Although some researchers have attempted to list AI chatbots as authors [58, 59], most institutions maintain that AI chatbots cannot assume authorship responsibility [60], primarily because AI-generated content involves intellectual property issues [53] and AI chatbots cannot independently bear academic responsibility. Finally, in medical education, AI chatbot use may undermine students’ critical thinking abilities [49]. If students become overly dependent on AI chatbots while neglecting in-depth understanding and critical assessment, their future clinical decision-making abilities may suffer [8], warranting serious attention [61].
Moreover, although the survey encompassed clinical decision-support scenarios such as “clinical diagnosis” and “clinical treatment,” their utilization rates were significantly lower than that of “clinical writing.” This discrepancy may reflect physicians’ cautious attitude toward AI tools in high-risk diagnostic and treatment decision-making. As our findings show, physicians’ primary concerns included information authenticity, attribution of medical liability, and patient informed consent. These concerns may lead physicians to preferentially use AI for documentation and auxiliary tasks in clinical practice rather than for high-risk components of medical decision-making.
This study proposes optimization recommendations, usage strategies, and regulatory measures for the multidimensional challenges AI chatbots face in medical practice, aiming to ensure their safe and compliant application in healthcare. The study finds that senior physicians with extensive clinical experience demonstrate relatively lower acceptance of AI technology, while potential users are constrained primarily by cognitive barriers and limited technical understanding. Therefore, AI system developers need to construct comprehensive training systems that systematically help healthcare practitioners understand the advantages and limitations of AI technology, thereby promoting its effective application in clinical practice, scientific research, and medical education. Regarding false information risks, developers should train AI models on authoritative medical databases, establish regular update mechanisms, and continuously enhance reasoning capabilities and system transparency, thereby strengthening the credibility of AI models [51, 62]. Regarding application positioning, AI should be used as an auxiliary reasoning tool rather than an independent knowledge repository. Considering AI “hallucination” issues [63], researchers can input specific literature or clinical guidelines for AI chatbot analysis to compensate for knowledge limitations and fully leverage its reasoning advantages [64]. In research practice and medical education, emphasis should be placed on avoiding excessive dependence on AI, strengthening the cultivation of critical thinking, and balancing AI auxiliary functions with human clinical insight [65]. In terms of regulation, there is an urgent need to clarify responsibility attribution for AI-related medical harm and to improve relevant laws and regulations, so as to protect patient rights and promote the standardized development of AI in healthcare [41, 66, 67].
Future studies should design clinical application scenarios involving accountability supervision, attribution mechanisms, and audit trails to explore regulatory and accountability frameworks for AI in real-world medical practice. Such research would support a more comprehensive understanding of physicians’ risk perceptions when using AI-based tools and reveal the underlying institutional requirements, thereby providing evidence-based insights for establishing a trustworthy AI governance system. Regarding academic norms, researchers should be required to clearly disclose AI use in the research process [43], academic integrity education should be strengthened [65], and corresponding AI text detection tools should be developed [56]. In conclusion, by optimizing application strategies, standardizing usage, and strengthening regulation, the continued safe and reliable development of AI chatbots in the medical field can be effectively promoted.
This study systematically analyzed Chinese physicians’ current use of, attitudes toward, and risk perceptions of AI chatbots, but several limitations remain. First, the sample comprised 373 individuals, of whom 136 physicians (36.5%) were AI chatbot users. The sample size was limited, and the recruitment strategy combining academic settings with the DXY community may have overrepresented younger physicians with academic backgrounds and higher technological interest. Furthermore, the high proportion of respondents from university-affiliated hospitals among AI chatbot users limits the generalizability of the findings to physicians practicing in primary care or private institutions. However, because detailed national baseline data on the distribution of physicians across key dimensions such as age, region, specialty, and hospital level are lacking, this study could not conduct post-hoc stratified weighting to correct for potential biases; caution is therefore warranted when extrapolating the findings. Second, this study employed a cross-sectional design with a three-month data collection period; this static measurement approach makes it difficult to capture the dynamic evolution of participants’ perceptions over time. Respondents were concentrated mainly in economically developed regions such as Fujian, Guangdong, Beijing, and Zhejiang, a geographical limitation that may affect the external validity of the findings. Finally, regarding potential risks of AI chatbots, this study addressed only 15 assessment dimensions and did not cover all possible issues, such as bias that AI systems might introduce [62]. The correlation analyses also did not fully control for all potential confounding factors, such as hospital type, academic role, educational attainment, and overseas experience.
Although we initially identified significant direct associations between AI chatbot usage frequency, duration, and level of attention on the one hand and attitudes toward specific applications on the other, these background factors may exert moderating or mediating effects on these relationships in more complex models. Future research should incorporate these variables into multivariate regression or stratified analyses to more comprehensively elucidate the mechanisms linking usage behavior and attitudes, thereby yielding more precise conclusions. Based on these limitations, future research should expand the sample size, integrate post-hoc weighting with national physician distribution data, employ longitudinal designs to track cognitive changes, broaden geographical coverage to enhance sample representativeness, and refine assessment dimensions to obtain more generalizable conclusions.
Conclusions
This study employed a questionnaire survey to systematically analyze Chinese physicians’ current use of AI chatbots in clinical practice, scientific research, and medical education, their acceptance of these tools, and the challenges they face. The findings indicate that physicians who have used AI chatbots hold positive attitudes toward their applications, while non-users take relatively cautious positions. The survey shows that physicians’ core concerns about AI chatbot use include verifying information reliability, academic ethical norms, and impacts on critical thinking. This study provides important empirical evidence for the practical application of AI chatbots in China’s healthcare system and offers significant reference value for advancing technology optimization, formulating application strategies, and establishing risk prevention and control mechanisms.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Acknowledgements
We would like to express our sincere gratitude to the GenAIMed Group (https://genaimed.org/), Sogen Biotechnology (Zhengzhou, Henan Province), and the SolvingLab team for their invaluable technical support and assistance throughout this research.
Abbreviations
- AI
Artificial intelligence
- GenAI
Generative artificial intelligence
- LLMs
Large language models
- NLP
Natural language processing
- EHR
Electronic health record
- CHERRIES
Checklist for Reporting Results of Internet E-Surveys
- ICMJE
International Committee of Medical Journal Editors
Author contributions
Writing-original draft, M.Y.Z., A.Q.L.; Conceptualization, A.Q.L., W.Y.G., A.M.J., P.L.; Investigation, A.Q.L., W.Y.G., A.M.J., Y.K.L., C.Q., L.X.Z., W.M.M., D.Q.Z., M.J.X., G.D.C., S.K.P., H.Z.H.W., L.Z., H.G.Z., X.P.D., J.Z., Q.C., B.F.T., P.L.; Writing-review and editing, M.Y.Z., A.Q.L.; Visualization, M.Y.Z., A.Q.L., P.L.; All authors have read and agreed to the published version of the manuscript.
Funding
Not applicable.
Data availability
The de-identified dataset will be made publicly available in a GitHub repository (https://github.com/dollarzzz/Chinese-Physicians-on-Generative-Artificial-Intelligence-Implementation-and-Challenges.git) upon manuscript acceptance for publication. Given that the data contain information about practicing physicians, access will be restricted to academic and non-commercial research purposes, requiring users to accept the data sharing agreement prior to download.
Declarations
Ethical approval and consent to participate
This study was approved by the Scientific Research Ethics Committee of the First Affiliated Hospital of Jinan University (approval number: KY-2024–178). The Ethics Committee operates in strict accordance with China Good Clinical Practice (China GCP), the International Council for Harmonisation Good Clinical Practice (ICH-GCP) guidelines, the Declaration of Helsinki, and relevant national regulations, and its review process is free from any external influence. All participants provided electronic informed consent through a confirmation procedure on the first page of the online questionnaire prior to survey completion, acknowledging their understanding of the study objectives and data confidentiality provisions.
Consent for publication
Not applicable.
Generative AI use statement
No generative artificial intelligence tools were used in the writing, editing, data analysis, or figure preparation of this manuscript. All content was independently completed and verified by the authors, who assume full responsibility for the accuracy and completeness of the manuscript.
Competing interests
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Anqi Lin, Meiyuan Zeng, Wenyi Gan and Aimin Jiang contributed equally to this work.
Contributor Information
Quan Cheng, Email: chengquan@csu.edu.cn.
Bufu Tang, Email: tangbufu@zju.edu.cn.
Peng Luo, Email: luopeng@smu.edu.cn.
References
- 1.Reddy S. Generative AI in healthcare: an implementation science informed translational path on application, integration and governance. Implement Sci. 2024;19:27. 10.1186/s13012-024-01357-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Wang F, Li S, Gao Y, Li S. Computed tomography-based artificial intelligence in lung disease-chronic obstructive pulmonary disease. MedComm Future Med. 2024;3:e73. 10.1002/mef2.73.
- 3.Lin A, Ye J, Qi C, Zhu L, Mou W, Gan W, et al. Bridging artificial intelligence and biological sciences: a comprehensive review of large language models in bioinformatics. Brief Bioinform. 2025;26:bbaf357. 10.1093/bib/bbaf357. [DOI] [PMC free article] [PubMed]
- 4.Yi M, Liu Y, Su Z. AlphaMissense, a groundbreaking advancement in artificial intelligence for predicting the effects of missense variants. MedComm Future Med. 2024;3:e70. 10.1002/mef2.70.
- 5.Liu Y, Zhang S, Liu K, Hu X, Gu X. Advances in drug discovery based on network pharmacology and omics technology. Curr Pharm Anal. 2024;21:33–43. 10.1016/j.cpan.2024.12.002. [Google Scholar]
- 6.Lin A, Wang Z, Jiang A, Chen L, Qi C, Zhu L, et al. Large language models in clinical trials: applications, technical advances, and future directions. BMC Med. 2025;23:563. 10.1186/s12916-025-04348-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Tangadulrat P, Sono S, Tangtrakulwanich B. Using ChatGPT for clinical practice and medical education: cross-sectional survey of medical students’ and physicians’ perceptions. JMIR Med Educ. 2023;9:e50658. 10.2196/50658. [DOI] [PMC free article] [PubMed]
- 8.Clusmann J, Kolbinger FR, Muti HS, Carrero ZI, Eckardt J-N, Laleh NG, et al. The future landscape of large language models in medicine. Commun Med (Lond). 2023;3:141. 10.1038/s43856-023-00370-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Qiu Z, Jiang A, Qi C, Gan W, Zhu L, Mou W, et al. Temporal evolution of large language models (LLMs) in oncology. J Transl Med. 2025;23:1219. 10.1186/s12967-025-07227-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Xu L, Sanders L, Li K, Chow JCL. Chatbot for health care and oncology applications using artificial intelligence and machine learning: systematic review. JMIR Cancer. 2021;7:e27850. 10.2196/27850. [DOI] [PMC free article] [PubMed]
- 11.Kanjee Z, Crowe B, Rodman A. Accuracy of a generative artificial intelligence model in a complex diagnostic challenge. JAMA. 2023;330:78–80. 10.1001/jama.2023.8288. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Bhayana R, Krishna S, Bleakney RR. Performance of ChatGPT on a radiology board-style examination: insights into current strengths and limitations. Radiology. 2023;307:e230582. 10.1148/radiol.230582. [DOI] [PubMed]
- 13.Luo P, Fan C, Li A, Jiang T, Jiang A, Qi C, et al. Performance analysis of large language models in multi-disease detection from chest computed tomography reports: a comparative study. Int J Surg. 2025;111:5071–87. 10.1097/JS9.0000000000002582. [DOI] [PubMed] [Google Scholar]
- 14.Shen J, Feng S, Zhang P, Qi C, Liu Z, Feng Y, et al. Evaluating generative AI models for explainable pathological feature extraction in lung adenocarcinoma: grading assessment and prognostic model construction. Int J Surg. 2025;111:4252–62. 10.1097/JS9.0000000000002507. [DOI] [PubMed] [Google Scholar]
- 15.Zhu L, Lai Y, Mou W, Zhang H, Lin A, Qi C, et al. ChatGPT’s ability to generate realistic experimental images poses a new challenge to academic integrity. J Hematol Oncol. 2024;17:27. 10.1186/s13045-024-01543-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Zhu L, Mou W, Wu K, Lai Y, Lin A, Yang T, et al. Multimodal ChatGPT-4V for electrocardiogram interpretation: promise and limitations. J Med Internet Res. 2024;26:e54607. 10.2196/54607. [DOI] [PMC free article] [PubMed]
- 17.Patel SB, Lam K. ChatGPT: the future of discharge summaries? Lancet Digit Health. 2023;5:e107–8. 10.1016/S2589-7500(23)00021-3. [DOI] [PubMed]
- 18.Zheng Y, Wang L, Feng B, Zhao A, Wu Y. Innovating healthcare: the role of ChatGPT in streamlining hospital workflow in the future. Ann Biomed Eng. 2024;52:750–53. 10.1007/s10439-023-03323-w. [DOI] [PubMed] [Google Scholar]
- 19.Juhi A, Pipil N, Santra S, Mondal S, Behera JK, Mondal H. The capability of ChatGPT in predicting and explaining common drug-drug interactions. Cureus. 2023;15:e36272. 10.7759/cureus.36272. [DOI] [PMC free article] [PubMed]
- 20.Biswas S. ChatGPT and the future of medical writing. Radiology. 2023;307:e223312. 10.1148/radiol.223312. [DOI] [PubMed]
- 21.Vert J-P. How will generative AI disrupt data science in drug discovery? Nat Biotechnol. 2023;41:750–51. 10.1038/s41587-023-01789-6. [DOI] [PubMed] [Google Scholar]
- 22.Eysenbach G. Improving the quality of web surveys: the Checklist for Reporting Results of Internet E-Surveys (CHERRIES). J Med Internet Res. 2004;6:e34. 10.2196/jmir.6.3.e34. [DOI] [PMC free article] [PubMed]
- 23.https://www.dxy.cn/pages/about.html.
- 24.Lintner T. A systematic review of AI literacy scales. NPJ Sci Learn. 2024;9:50. 10.1038/s41539-024-00264-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Doraiswamy PM, Blease C, Bodner K. Artificial intelligence and the future of psychiatry: insights from a global physician survey. Artif Intell Med. 2020;102:101753. 10.1016/j.artmed.2019.101753. [DOI] [PubMed] [Google Scholar]
- 26.Oh S, Kim JH, Choi S-W, Lee HJ, Hong J, Kwon SH. Physician confidence in artificial intelligence: an online mobile survey. J Med Internet Res. 2019;21:e12422. 10.2196/12422. [DOI] [PMC free article] [PubMed]
- 27.Cortes J, Paravar T, Oldenburg R. Physician opinions on artificial intelligence chatbots in dermatology: a national online cross-sectional survey of dermatologists. JDD. 2024;23:972–78. 10.36849/JDD.8239. [DOI] [PubMed] [Google Scholar]
- 28.Choudhury A, Shamszare H. Investigating the impact of user trust on the adoption and use of ChatGPT: survey analysis. J Med Internet Res. 2023;25:e47184. 10.2196/47184. [DOI] [PMC free article] [PubMed]
- 29.Estrada Alamo CE, Diatta F, Monsell SE, Lane-Fall MB. Artificial intelligence in anesthetic care: a survey of physician anesthesiologists. Anesth Analg. 2024;138:938–50. 10.1213/ANE.0000000000006752. [DOI] [PubMed] [Google Scholar]
- 30.Sullivan GM, Artino AR. Analyzing and interpreting data from likert-type scales. J Grad Med Educ. 2013;5:541–42. 10.4300/JGME-5-4-18. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Tavakol M, Dennick R. Making sense of Cronbach’s alpha. Int J Med Educ. 2011;2:53–55. 10.5116/ijme.4dfb.8dfd. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Ng JY, Maduranayagam SG, Suthakar N, Li A, Lokker C, Iorio A, et al. Attitudes and perceptions of medical researchers towards the use of artificial intelligence chatbots in the scientific process: an international cross-sectional survey. Lancet Digit Health. 2025;7:e94–102. 10.1016/S2589-7500(24)00202-4. [DOI] [PubMed]
- 33.Tao W, Yang J, Qu X. Utilization of, perceptions on, and intention to use AI chatbots among medical students in China: national cross-sectional study. JMIR Med Educ. 2024;10:e57132. 10.2196/57132. [DOI] [PMC free article] [PubMed]
- 34.Dai Q, Li M, Yang M, Shi S, Wang Z, Liao J, et al. Attitudes, perceptions, and factors influencing the adoption of AI in health care among medical staff: nationwide cross-sectional survey study. J Med Internet Res. 2025;27:e75343. 10.2196/75343. [DOI] [PMC free article] [PubMed]
- 35.Pang KH, Webb TE, Esperto F, Osman NI. Is urologist burnout different on the other side of the pond? A European perspective. Can Urol Assoc J. 2021;15:S25–30. 10.5489/cuaj.7227. [DOI] [PMC free article] [PubMed]
- 36.Jacobsen FM, Jensen CFS, Schmidt MLK, Qin Y, Akselberg NJ, Sønksen J, et al. Burnout among urologists from Denmark and Michigan. Urology. 2021;147:68–73. 10.1016/j.urology.2020.07.066. [DOI] [PubMed] [Google Scholar]
- 37.Zhou Z. Evaluation of ChatGPT’s capabilities in medical report generation. Cureus. 2023;15:e37589. 10.7759/cureus.37589. [DOI] [PMC free article] [PubMed]
- 38.Yang X, Chen A, PourNejatian N, Shin HC, Smith KE, Parisien C, et al. A large language model for electronic health records. NPJ Digit Med. 2022;5:194. 10.1038/s41746-022-00742-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.https://openai.com/policies/terms-of-use/.
- 40.Duffourc M, Gerke S. Generative AI in health care and liability risks for physicians and safety concerns for patients. JAMA. 2023;330:313. 10.1001/jama.2023.9630. [DOI] [PubMed] [Google Scholar]
- 41.Duffourc MN, Gerke S. The proposed EU directives for AI liability leave worrying gaps likely to impact medical AI. NPJ Digit Med. 2023;6:77. 10.1038/s41746-023-00823-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Luo R, Sun L, Xia Y, Qin T, Zhang S, Poon H, et al. BioGPT: generative pre-trained transformer for biomedical text generation and mining. Brief Bioinform. 2022;23:bbac409. 10.1093/bib/bbac409. [DOI] [PubMed]
- 43.Koo M. The importance of proper use of ChatGPT in medical writing. Radiology. 2023;307:e230312. 10.1148/radiol.230312. [DOI] [PubMed]
- 44.Cascella M, Montomoli J, Bellini V, Bignami E. Evaluating the feasibility of ChatGPT in healthcare: an analysis of multiple clinical and research scenarios. J Med Syst. 2023;47:33. 10.1007/s10916-023-01925-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Shen Y, Heacock L, Elias J, Hentel KD, Reig B, Shih G, et al. ChatGPT and Other large language models are double-edged swords. Radiology. 2023;307:e230163. 10.1148/radiol.230163. [DOI] [PubMed]
- 46.Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLoS Digit Health. 2023;2:e0000198. 10.1371/journal.pdig.0000198. [DOI] [PMC free article] [PubMed]
- 47.Khan RA, Jawaid M, Khan AR, Sajjad M. ChatGPT - Reshaping medical education and clinical management. Pak J Med Sci. 2023;39:605–07. 10.12669/pjms.39.2.7653. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Jin JQ, Dobry AS. ChatGPT for healthcare providers and patients: practical implications within dermatology. J Am Acad Dermatol. 2023;89:870–71. 10.1016/j.jaad.2023.05.081. [DOI] [PubMed] [Google Scholar]
- 49.Zhang W, Cai M, Lee HJ, Evans R, Zhu C, Ming C. AI in medical education: global situation, effects and challenges. Educ Inf Technol. 2024;29:4611–33. 10.1007/s10639-023-12009-8. [Google Scholar]
- 50.Mirchi N, Bissonnette V, Yilmaz R, Ledwos N, Winkler-Schwartz A, Del Maestro RF. The virtual operative assistant: an explainable artificial intelligence tool for simulation-based training in surgery and medicine. PLoS One. 2020;15:e0229596. 10.1371/journal.pone.0229596. [DOI] [PMC free article] [PubMed]
- 51.Gravel J, D’Amours-Gravel M, Osmanlliu E. Learning to fake it: limited responses and fabricated references provided by ChatGPT for medical questions. Mayo Clinic Proc Digit Health. 2023;1:226–34. 10.1016/j.mcpdig.2023.05.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Jamaluddin J, Gaffar NA, Din NSS. Hallucination: a key challenge to artificial intelligence-generated writing. Malays Fam Physician. 2023;18:68. 10.51866/lte.527. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Meskó B, Topol EJ. The imperative for regulatory oversight of large language models (or generative AI) in healthcare. NPJ Digit Med. 2023;6:120. 10.1038/s41746-023-00873-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Eysenbach G. The role of ChatGPT, generative language models, and artificial intelligence in medical education: a Conversation with ChatGPT and a call for papers. JMIR Med Educ. 2023;9:e46885. 10.2196/46885. [DOI] [PMC free article] [PubMed]
- 55.Kitamura FC. ChatGPT is shaping the future of medical writing but still requires human judgment. Radiology. 2023;307:e230171. 10.1148/radiol.230171. [DOI] [PubMed]
- 56.Eke DO. ChatGPT and the rise of generative AI: threat to academic integrity? J Responsib Technol. 2023;13:100060. 10.1016/j.jrt.2023.100060. [Google Scholar]
- 57.https://www.icmje.org/recommendations/browse/roles-and-responsibilities/defining-the-role-of-authors-and-contributors.html.
- 58.King MR, chatGPT. A conversation on artificial intelligence, chatbots, and plagiarism in higher education. Cell Mol Bioeng. 2023;16:1–2. 10.1007/s12195-022-00754-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Generative Pre-Trained Transformer C, Zhavoronkov A. Rapamycin in the context of Pascal’s Wager: generative pre-trained transformer perspective. Oncoscience. 2022;9:82–84. 10.18632/oncoscience.571. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Meyer JG, Urbanowicz RJ, Martin PCN, O’Connor K, Li R, Peng P-C, et al. ChatGPT and large language models in academia: opportunities and challenges. BioData Min. 2023;16:20. 10.1186/s13040-023-00339-9. [DOI] [PMC free article] [PubMed]
- 61.Sallam M. ChatGPT utility in healthcare education, research, and practice: systematic review on the promising perspectives and valid concerns. Healthcare (Basel). 2023;11:887. 10.3390/healthcare11060887. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Rajpurkar P, Chen E, Banerjee O, Topol EJ. AI in health and medicine. Nat Med. 2022;28:31–38. 10.1038/s41591-021-01614-0. [DOI] [PubMed] [Google Scholar]
- 63.Sanderson K. GPT-4 is here: what scientists think. Nature. 2023;615:773. 10.1038/d41586-023-00816-5. [DOI] [PubMed] [Google Scholar]
- 64.Truhn D, Reis-Filho JS, Kather JN. Large language models should be used as scientific reasoning engines, not knowledge databases. Nat Med. 2023;29:2983–84. 10.1038/s41591-023-02594-z. [DOI] [PubMed] [Google Scholar]
- 65.Khalifa M, Albadawy M. Using artificial intelligence in academic writing and research: an essential productivity tool. Comput Methods Programs Biomed Update. 2024;5:100145. 10.1016/j.cmpbup.2024.100145. [Google Scholar]
- 66.D’Amico RS, White TG, Shah HA, Langer DJ. I asked a ChatGPT to write an editorial about how we can incorporate chatbots into neurosurgical research and patient care …. Neurosurgery. 2023;92:663–64. 10.1227/neu.0000000000002414. [DOI] [PubMed] [Google Scholar]
- 67.Mennella C, Maniscalco U, De Pietro G, Esposito M. Ethical and regulatory challenges of AI technologies in healthcare: a narrative review. Heliyon. 2024;10:e26297. 10.1016/j.heliyon.2024.e26297. [DOI] [PMC free article] [PubMed]