Plastic and Reconstructive Surgery Global Open. 2025 Jun 12;13(6):e6871. doi: 10.1097/GOX.0000000000006871

Diagnostic Accuracy of Microsoft’s Copilot Artificial Intelligence in Chronic Wound Assessment: A Comparative Study

Kirollos Tadrousse *, Catherine A Cash *, Madhulika R Kastury *, Noelle Thompson *, Richard Simman †,‡
PMCID: PMC12160731  PMID: 40510430

Abstract

Background:

Chronic wounds affect approximately 2.5% of the US population and can cause severe complications if not identified and treated promptly. Artificial intelligence tools such as Microsoft’s Copilot have the potential to expedite diagnosis, but their clinical diagnostic accuracy remains underexplored.

Methods:

Ten chronic wound cases were selected from the publicly available database of the Silesian University of Technology. Images and demographic data were entered into Copilot, which generated the top 3 differential diagnoses for each case. Diagnostic accuracy was evaluated using a predefined scoring system. Statistical analysis included descriptive statistics, the Wilcoxon signed-rank test, bootstrapping, the Fisher–Pitman permutation test, Cohen kappa, and Fisher exact test.

Results:

Copilot correctly identified the primary diagnosis in 30% of cases and included the correct diagnosis within its top 3 differentials in 70% of cases. The mean diagnostic score was 1.7 (median: 2, SD: 1.25, variance: 1.57). The Wilcoxon test indicated no significant deviation from the median reference value of 1.5 (P = 0.6364), whereas bootstrapping yielded a 95% confidence interval of 1–4. The permutation test demonstrated a significant difference from the null hypothesis (P = 0.017), and the Cohen kappa revealed perfect agreement (kappa = 1, P = 0.00157). The Fisher exact test showed no significant association between primary and top 3 diagnostic accuracy (P = 0.20).

Conclusions:

Microsoft Copilot demonstrated limited diagnostic accuracy in chronic wound assessment, underscoring the need for cautious integration into clinical workflows. Broader datasets and more rigorous validation are crucial for enhancing artificial intelligence–supported diagnostics in wound care.


Takeaways

Question: What is the accuracy of Microsoft’s Copilot artificial intelligence in diagnosing chronic wounds?

Findings: In this comparative study, we used a grading system to assign scores based on the correctness of the artificial intelligence–generated diagnoses compared with clinician-provided diagnoses. We found that Copilot has limited accuracy in chronic wound assessment.

Meaning: Artificial intelligence systems have the potential to reduce the time to diagnose and treat chronic wounds; however, this study highlighted the importance of further validating and optimizing these systems before their integration into clinical practice.

INTRODUCTION

Chronic wounds represent a significant health burden, affecting nearly 2.5% of the US population. These wounds, including venous leg ulcers, diabetic foot ulcers, and pressure ulcers, can lead to severe complications if not promptly diagnosed and treated. Complications such as prolonged hospitalization, increased risk of infections, and amputations significantly impact patients’ quality of life and impose substantial economic costs on healthcare systems.1,2

Timely diagnosis and management of chronic wounds are critical to prevent adverse outcomes. However, several barriers hinder effective diagnosis and treatment. Social determinants of health, such as socioeconomic status, education level, and access to healthcare services, play a crucial role in wound care disparities. There is a significant shortage of clinicians and specialized wound care centers in many parts of the country, particularly in rural and underserved urban areas, leading to delays in diagnosis and treatment.3

Artificial intelligence (AI) has emerged as a promising tool to address these challenges. AI systems, particularly those based on machine learning and natural language processing, have shown potential in various diagnostic and treatment planning applications. These technologies can analyze large datasets, recognize patterns, and provide diagnostic suggestions, potentially reducing diagnosis time and improving treatment accuracy.2 The ability of AI to process and analyze medical images, patient records, and other health data could be particularly beneficial for chronic wound care. Despite these potential benefits, the application of AI in medical diagnosis, especially for chronic wounds, remains underexplored, and its accuracy is largely unknown. Concerns about the reliability, safety, and ethical implications of AI systems in healthcare necessitate rigorous evaluation before widespread adoption.

Recent advancements in generative AI models, such as Microsoft’s Copilot, powered by ChatGPT-4, show promise, but their clinical performance requires thorough assessment.4 Leveraging large-scale language models, Copilot has the potential to assist in medical diagnosis by providing differential diagnoses based on input data, such as medical images and patient demographics. Additionally, Copilot accepts images as input at no cost, which could reduce diagnosis time without adding expense. However, the accuracy and clinical applicability of AI systems in diagnosing chronic wounds have yet to be systematically evaluated.

This study aimed to assess the diagnostic accuracy of Copilot in identifying chronic wounds. By comparing the AI-generated differential diagnoses with those provided by experienced clinicians, this research seeks to determine the effectiveness of Copilot as a diagnostic tool for chronic wound care. This evaluation is a crucial step in understanding the potential and limitations of AI in medical diagnostics and guiding the future development and implementation of AI technologies in clinical practice.

METHODS

Study Design and Case Selection

This comparative study evaluated the diagnostic accuracy of Microsoft’s Copilot in identifying chronic wound etiologies. Chronic wound cases were included from the Chronic Wounds Image Database (WoundDB), developed by the Silesian University of Technology in Gliwice, Poland. WoundDB is a publicly accessible resource containing multimodal images of chronic wounds, including venous leg ulcers, diabetic foot ulcers, and pressure ulcers. In addition to the case images, the database includes limited patient information such as age and sex. The cases were randomly selected from each wound type to ensure a representative and diverse sample. AI-generated differential diagnoses from Copilot were compared with clinician-provided diagnoses in the WoundDB database.

Data Collection

For each selected case, high-resolution wound images, basic patient demographics (age and sex), and the clinician-provided diagnosis were collected from the WoundDB database. (See figure, Supplemental Digital Content 1, which displays wound images uploaded to Microsoft Copilot for analysis, https://links.lww.com/PRSGO/E114.) Copilot is powered by ChatGPT-4, an advanced large language model developed by OpenAI that uses natural language processing to interpret inputs and generate outputs. A standardized prompt was applied to each case to ensure consistency in eliciting responses: “Provide a ranked top three differential diagnoses of the wound etiology based on the information.” Along with the prompt, the case images and basic demographic information (age and sex) were entered into Copilot, which then returned its top 3 differential diagnoses for each case, ranked in order of likelihood.
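For illustration, the sketch below shows one way the standardized per-case input could be assembled programmatically, written in R, the language used for the study’s statistical analysis. The build_prompt helper and the demographic phrasing are hypothetical assumptions; only the quoted instruction sentence comes from the study, which describes entering the prompt, images, and demographics into Copilot directly.

```r
# Hypothetical sketch of assembling the standardized per-case prompt.
# build_prompt() and the demographic wording are illustrative assumptions;
# only the quoted instruction sentence is taken from the study.
build_prompt <- function(age, sex) {
  paste0(
    "Provide a ranked top three differential diagnoses of the wound etiology ",
    "based on the information. Patient: ", age, "-year-old ", sex, "."
  )
}

build_prompt(67, "female")  # returns the full prompt string for one case
```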

Analysis and Evaluation Criteria

For each of the 10 cases, the Copilot outputs were recorded for further analysis. The accuracy of Copilot in differential diagnoses was evaluated using a predefined scoring system:

  • Three points: Correct primary diagnosis (top-ranked).

  • Two points: Correct secondary diagnosis (second position).

  • One point: Correct tertiary diagnosis (third position).

  • Zero points: Diagnosis not included in the top 3 differentials.

This grading system quantified the performance of Copilot by evaluating its ability to identify the correct diagnosis and prioritize it appropriately.
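A minimal R sketch of this scoring rule follows. Exact string matching is a simplifying assumption; in the study, correctness was judged against the clinician-provided diagnosis, so clinically equivalent labels (eg, “venous stasis ulcer” for “venous ulcer”) could count as matches.

```r
# Minimal sketch of the predefined scoring rule. Exact (case-insensitive)
# string matching is a simplifying assumption; in the study, correctness was
# judged clinically against the reference diagnosis.
score_case <- function(reference, differentials) {
  pos <- match(tolower(reference), tolower(differentials))
  if (is.na(pos)) return(0)  # correct diagnosis absent from the top 3
  c(3, 2, 1)[pos]            # 3 points for rank 1, 2 for rank 2, 1 for rank 3
}

score_case("cellulitis", c("Cellulitis", "Abscess", "Venous ulcer"))  # returns 3
```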

Statistical Analysis

Statistical analysis was performed using R software to compare the accuracy of the Copilot differential diagnoses with the clinician-provided diagnoses. Descriptive measures of the AI scores, including the mean, median, SD, and variance, were calculated to comprehensively assess performance. The Wilcoxon signed-rank test determined whether the median score differed significantly from a hypothetical median of 1.5, chosen as the midpoint of the scoring range (0–3) and thus a balanced threshold for judging whether the AI performed better or worse than average. Bootstrapping was used to calculate a 95% confidence interval for the median score, offering a robust measure of its variability and reliability. An approximate 2-sample Fisher–Pitman permutation test compared the AI scores against a null hypothesis score, assessing whether the AI’s performance differed significantly from a random distribution. The Cohen kappa was calculated to measure agreement between the AI diagnoses and the reference diagnoses, offering a standardized measure of consistency between AI predictions and actual clinical outcomes. Finally, the Fisher exact test evaluated the association between the accuracy of the Copilot primary diagnosis and the accuracy within its top 3 differentials; this test was chosen because of the small sample size and its suitability for categorical data.
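A minimal R sketch of the descriptive statistics, the Wilcoxon test, and the bootstrap interval follows, using the 10 case scores reported in Table 1. The resampling count is an assumption; the authors’ bootstrap settings are not reported.

```r
# Sketch of the descriptive and inferential analyses described above.
# The 10 scores come from Table 1; R = 10000 resamples is an assumption.
library(boot)  # bootstrap resampling and confidence intervals

scores <- c(3, 0, 3, 2, 2, 0, 0, 2, 3, 2)

# Descriptive statistics (Table 3)
mean(scores); median(scores); sd(scores); var(scores)

# Wilcoxon signed-rank test against the scale midpoint of 1.5
wilcox.test(scores, mu = 1.5, correct = TRUE)

# Bootstrap 95% confidence interval for the median score (Table 5)
med_boot <- boot(scores, statistic = function(x, i) median(x[i]), R = 10000)
boot.ci(med_boot, type = "perc")
```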

RESULTS

This study analyzed 10 chronic wound cases to assess the diagnostic accuracy of Copilot. The cases included a diverse range of chronic wounds, such as venous leg ulcers, diabetic foot ulcers, and pressure ulcers. Each case consisted of high-resolution images of the wound and essential patient demographic data, including age and sex.

Diagnostic Accuracy

The accuracy of the Copilot differential diagnoses was evaluated using the predefined grading system, which assigned scores based on the correctness of the AI-generated diagnoses compared with the clinician-provided diagnoses. Of the 10 chronic wound cases analyzed (n = 10), the most probable Copilot diagnosis (score = 3) was correct in 30% of cases (3 of 10). The correct diagnosis appeared within the Copilot top 3 differentials (score > 0) in 70% of cases (7 of 10). For the 5 venous insufficiency cases analyzed (n = 5), the most probable Copilot diagnosis (score = 3) was correct in 20% of cases (1 of 5) (Table 1), and the correct diagnosis appeared within the top 3 differentials (score > 0) in 80% of cases (4 of 5) (Table 2).

Table 1.

Copilot Differential Diagnosis Rank List

Case Clinical Diagnosis Copilot Differentials (Ranked) Score
1 Venous ulcer Venous stasis ulcer; cellulitis; arterial ulcer 3
2 Venous ulcer Cellulitis; contact dermatitis; gout or inflammatory arthritis 0
3 Wound infection Cellulitis; abscess; venous ulcer 3
4 Amputation Infection; trauma; ulceration 2
5 Venous ulcer Cellulitis; deep vein thrombosis; contact dermatitis 2
6 Diabetic foot Cellulitis; deep vein thrombosis; gout 0
7 Lymphatic ulcer Cellulitis; abscess; venous insufficiency 0
8 Venous insufficiency Cellulitis; deep vein thrombosis; eczema or dermatitis 2
9 Wound infection Cellulitis; contact dermatitis; burns 3
10 Venous insufficiency Cellulitis; venous stasis dermatitis; contact dermatitis 2

This table lists the clinical diagnosis for each chronic wound case, the top 3 differential diagnoses from Copilot (ranked in order), and the score assigned under the predefined grading system.

Table 2.

Diagnostic Accuracy of Copilot for Chronic Wounds

Case Type No. Cases, n Most Probable Diagnosis Accuracy (Score = 3) Top 3 Differential Accuracy (Score > 0)
All chronic wounds 10 30% (3/10) 70% (7/10)
Venous insufficiency cases 5 20% (1/5) 80% (4/5)

This table summarizes the diagnostic accuracy of Copilot for chronic wounds. It shows the percentage of cases where the most probable diagnosis from Copilot was correct, and the percentage where the correct diagnosis was included within the top 3 differentials.

Statistical Analysis

Descriptive statistics summarized the performance of the AI, including mean, median, SD, and variance (Table 3). The mean score was 1.7, the median score was 2, the SD was 1.25, and the variance was 1.57. The frequency distribution showed that the AI scored 0 points in 3 cases, 2 points in 4 cases, and 3 points in 3 cases (Table 4).

Table 3.

Descriptive Statistics of Copilot Predicted Scores

Statistic Value
Mean score 1.7
Median score 2
SD 1.25
Variance 1.57

The mean, median, SD, and variance of the predicted scores by Copilot for the chronic wound cases are shown.

Table 4.

Frequency Distribution of Copilot Scores

Score Frequency
0 3
2 4
3 3

The frequency distribution of the scores (0, 2, and 3) given by Copilot is shown for the 10 chronic wound cases.

The Wilcoxon signed-rank test with continuity correction indicated no significant difference from the hypothetical median value of 1.5 (V = 32.5, P = 0.6364), suggesting that the median performance of the AI was not significantly different from an average score. Bootstrapping provided a 95% confidence interval for the median score, ranging from 1 to 4, indicating variability in the performance of the AI (Table 5). An approximate 2-sample Fisher–Pitman permutation test showed a statistically significant difference from the null hypothesis (Z = 2.1501, P = 0.017), suggesting the AI scores differed significantly from a random distribution (Table 6). The Cohen kappa indicated perfect agreement between the AI diagnoses and the reference diagnoses (kappa = 1, Z = 3.16, P = 0.00157), though this should be interpreted with caution given the small sample size (Table 7).

Table 5.

Bootstrapping Results for Median Score

Statistic Value
Original median 2
Bias −0.0715
Standard error 0.5912
95% CI (lower) 1
95% CI (upper) 4

This table presents the bootstrapping results for the median score, including the original sample median, bias, standard error, and the lower and upper bounds of the 95% confidence interval.

CI, confidence interval.

Table 6.

Fisher–Pitman Permutation Test Results

Test Statistic Value
Z 2.1501
P 0.017

The Fisher–Pitman permutation test, indicating whether the AI scores are significantly different from a random distribution centered around the hypothetical median, is shown. P values less than or equal to 0.05 were considered statistically significant.

Table 7.

The Cohen Kappa for Agreement Between AI and True Diagnoses

Statistic Value
Kappa 1
Z 3.16
P 0.00157

The Cohen kappa statistic, which measures the agreement between Copilot diagnoses and the true diagnoses, is shown. P values less than or equal to 0.05 were considered statistically significant.
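As context for the kappa result above, a minimal R sketch follows using the kappa2 function from the irr package. The binary coding (correct diagnosis anywhere in the top 3: yes or no) is an assumption, as the authors’ exact rating scheme is not reported; note that any coding in which the 2 ratings agree exactly yields kappa = 1, which is one reason the reported perfect agreement warrants cautious interpretation.

```r
# Sketch of the Cohen kappa computation. The binary top 3 coding below is an
# assumption; the authors' exact rating scheme is not reported.
library(irr)  # kappa2() computes Cohen's kappa for 2 raters

scores <- c(3, 0, 3, 2, 2, 0, 0, 2, 3, 2)
ai_rating  <- as.integer(scores > 0)  # Copilot: correct diagnosis within top 3
ref_rating <- as.integer(scores > 0)  # reference coding under this assumption
kappa2(cbind(ai_rating, ref_rating))  # identical codings always give kappa = 1
```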

A Fisher exact test was used to evaluate the association between the accuracy of the Copilot primary diagnosis and the accuracy within its top 3 differentials for all chronic wound cases (Table 8). The analysis revealed no significant association (P = 0.20). A similar analysis restricted to venous insufficiency cases also showed no significant association (P = 0.083).

Table 8.

The Fisher Exact Test Results for Association of Diagnostic Accuracy

Case Type P
All chronic wounds 0.20
Venous insufficiency cases 0.083

The Fisher exact test evaluating the association between the accuracy of the primary diagnosis from Copilot and the accuracy within its top 3 differentials is shown for all chronic wounds and for venous insufficiency cases. P values less than or equal to 0.05 were considered statistically significant.
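For illustration, a minimal R sketch of this test follows, using correctness indicators reconstructed from the Table 1 scores. This particular 2 × 2 construction is an assumption; the authors’ exact coding is not reported, so the resulting P value need not match the published values.

```r
# Sketch of a Fisher exact test cross-classifying primary-diagnosis correctness
# (score = 3) against top 3 correctness (score > 0), reconstructed from Table 1.
# This coding is an assumption and need not reproduce the published P values.
primary_correct <- c(TRUE, FALSE, TRUE, FALSE, FALSE,
                     FALSE, FALSE, FALSE, TRUE, FALSE)  # cases 1-10, score = 3
top3_correct    <- c(TRUE, FALSE, TRUE, TRUE, TRUE,
                     FALSE, FALSE, TRUE, TRUE, TRUE)    # cases 1-10, score > 0

fisher.test(table(primary_correct, top3_correct))
```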

DISCUSSION

The findings of this study contribute to the ongoing discussion regarding the utility and reliability of AI systems in medical diagnostics, particularly for chronic wound assessment. Despite the potential of AI to expedite diagnosis and improve patient outcomes, the results suggest that Copilot exhibited limited diagnostic accuracy in identifying chronic wound etiology.

Evaluation of Copilot Performance

Microsoft’s Copilot demonstrated moderate accuracy in diagnosing chronic wounds, correctly identifying the primary diagnosis in 30% of cases and including the correct diagnosis in its top 3 differentials 70% of the time. For venous insufficiency cases, the correct diagnosis appeared in the top 3 differential diagnoses 80% of the time, although it was identified as the primary diagnosis in only 20%. These findings suggest that although Copilot can recognize distinct visual features, such as hemosiderin staining and ulcer location, it struggles to differentiate conditions with overlapping clinical presentations.2,5

Although Copilot does not explicitly incorporate Bayesian inference, its ranked list of differentials reflects an inherent form of probabilistic reasoning, with confidence levels assigned based on learned patterns. This suggests that providing structured clinical inputs, such as patient history, could further refine these outputs and improve diagnostic accuracy, particularly for complex cases with ambiguous presentations. This underscores the need for enhanced algorithms and robust training datasets for generative AI systems.

Comparison With Previous Studies

These results align with previous research demonstrating the challenges of AI-based diagnostic tools. For instance, machine learning has shown promise in burn wound evaluation but requires further validation to ensure reliability in clinical practice.2 Similarly, research on AI-enabled applications and deep learning algorithms for pressure injury assessment demonstrates technological advancement while emphasizing the need for further verification.6,7 These studies collectively reinforce that although AI has potential, its performance in complex medical diagnostics, such as chronic wounds, requires substantial improvement.

Social Determinants of Health

Social determinants of health, including socioeconomic status, education, healthcare access, and social support, heavily influence chronic wound care disparities. Individuals in lower socioeconomic groups often face delays in diagnosis and treatment due to clinician shortages, especially in rural and underserved areas, increasing the risk of complications such as infections and amputations.8 Education level further impacts health literacy, affecting patients’ ability to manage chronic wounds effectively.9 Employment in physically demanding jobs may exacerbate wounds, whereas limited social support networks can hinder treatment adherence, particularly for isolated individuals who lack emotional and practical assistance.10 Chronic wounds not only impose a physical burden but also result in substantial emotional and economic costs, further exacerbating disparities in care.11

Microsoft’s Copilot was specifically chosen for this study because, at the time of data collection, it was one of the few generative AI systems that accepted images as input at no cost. This feature is particularly important because free access to Copilot can help alleviate disparities in chronic wound care. By lowering the cost of diagnostic support, such tools could reduce time to diagnosis and meaningfully improve the treatment of chronic wounds; this study assessed the accuracy of Copilot in that role. Since the completion of this study, other generative AI systems have introduced similar free image-input features and have demonstrated improved performance in diagnosing various other wounds.12 These advancements further highlight the growing potential of AI tools in improving diagnostic accuracy and accessibility.

Clinical and Ethical Implications for AI in Chronic Wound Care

The limited diagnostic accuracy of Copilot underscores the need for careful integration into clinical practice. Although Copilot offers accessible and computationally efficient diagnostic support, its performance falls short of the standards required for reliable decision-making. Similar to the caution advised in radiology,8 AI tools must be applied carefully to avoid overreliance and to ensure adequate human oversight. The transformative potential of AI lies in its ability to complement, rather than replace, clinician expertise, a concept essential to fostering trust and ensuring patient safety.13

AI-induced Biases and Human Oversight

Bias in AI systems such as Copilot and ChatGPT poses significant challenges in medical image analysis, particularly in applications such as wound assessment. Such biases can originate at various stages of AI development, including data collection, algorithm design, and the deployment environment, and can result in performance disparities across patient demographics. A major contributor is dataset bias, in which certain skin tones, wound types, and patient demographics are under- or overrepresented. For instance, underrepresentation of diverse populations in AI training data can exacerbate inequalities in medical decision-making and diagnostics.14 Moreover, it is difficult for AI developers to access high-quality datasets covering a variety of patient populations, particularly because of privacy laws.15 Systematic reviews reveal that AI systems can exhibit variability when applied to datasets lacking demographic diversity, as shown in the study by Wilhelm et al,16 which highlights the limited generalizability of AI systems tested on homogeneous, high-income populations.

Mitigating these AI-induced biases requires a multifaceted approach. Strategies such as curating diverse datasets ensure fair representation of all demographics, whereas posttraining interventions such as reinforcement learning from human feedback align AI outputs with human-centered goals. This is demonstrated by OpenAI’s fine-tuning of generative pretrained transformer models, which has been shown to improve output quality and reduce bias by aligning AI outputs with human preferences and fairness goals.17 However, integrating human validation and clinician oversight remains crucial given these biases, particularly in high-risk applications such as wound care. Tools such as the Quality Analysis of Medical AI framework can provide a structured methodology for evaluating AI outputs for accuracy, relevance, and fairness.18 Combined with continuous monitoring and retraining on updated and diverse datasets, this can help ensure that AI tools such as Copilot deliver equitable, reliable, and clinically useful outcomes.

Limitations

The findings of this study were constrained by several key factors that impede generalizability. A small sample size was used, as only 10 cases were selected for analysis. This was due to the limited number of images available in the database, with each case represented by a single image. The diversity of wound etiology in the database was also relatively small, as the selected images represented the most common types of cases. Nevertheless, this study was intended as an exploratory analysis, and although the sample size is constrained, it provides valuable insights into the potential of generative AI for diagnosing chronic wound etiology. Future studies with larger datasets will aim to validate and expand upon these findings.

Another limitation of our study was the database itself, which presents additional challenges. Image quality across the cases is relatively poor and not standardized, limiting visualization of wound depth, detail, and other critical features that aid accurate diagnosis. Additionally, the patient demographic and clinical information provided with each case is extremely limited. We hypothesize that with more comprehensive patient and clinical data, or the inclusion of a screening questionnaire covering pertinent medical history (eg, diabetes, history of paralysis, or recent trauma to the skin), the AI system could have achieved higher diagnostic accuracy. These factors are important for improving the robustness of AI-based diagnostic tools in wound care. Nevertheless, our study assessed the capability of Microsoft Copilot to diagnose wounds based on image recognition alone, serving as a preliminary step toward generative AI models that integrate both visual and clinical data. Although this approach allows an evaluation of generative AI models’ ability to classify wounds based on appearance alone, it does not fully replicate real-world clinical decision-making, in which contextual factors and history play a crucial role.

It is also important to acknowledge that these results pertain specifically to Microsoft Copilot and may not apply to other AI systems. Variability in AI performance across models can arise from differences in algorithm designs, training datasets, and intended clinical application. For instance, models explicitly trained on wound care data may outperform more generalized platforms such as Copilot. Additionally, demographic representation within training datasets remains a critical factor, as studies have shown that AI tools can exhibit reduced accuracy in underrepresented populations.11,19

To address these limitations, future research should prioritize the development of standardized protocols for image capture, validation across diverse patient populations, and comparisons with other AI systems tailored for wound diagnosis. Expanding and diversifying training datasets will be essential to improve AI adaptability, accuracy, and equity across clinical settings.

Future Directions

Although this study evaluated the diagnostic accuracy of Microsoft Copilot based solely on wound images and basic patient demographics, future studies should focus on integrating more extensive data sources to further enhance clinical applicability. Incorporating patient history such as comorbidities, smoking status, and history of vascular disease; wound progression data; and physician input such as bedside assessment will provide a more comprehensive framework for AI-assisted wound diagnosis. By incorporating both visual and clinical data, there is the potential to improve the diagnostic accuracy of generative AI models and improve clinical decision-making.

CONCLUSIONS

This study underscored both the promise and limitations of Microsoft Copilot as a diagnostic tool for chronic wound assessment. Although Copilot included the correct diagnosis in its top 3 differentials 70% of the time, its accuracy in identifying the primary diagnosis was limited to 30%. These findings reflect broader challenges observed in AI-based diagnostics, where tools may recognize key visual features but struggle with complex conditions requiring nuanced clinical interpretation. To enhance performance, AI systems must be trained on large, diverse datasets that reflect real-world clinical variability and demographic diversity.18,20 This study also emphasized the importance of expert assessment by trained healthcare professionals to ensure accurate diagnosis of chronic wounds.

Beyond technical performance, the ethical and practical integration of AI into clinical care is critical. Ensuring that AI systems perform equitably across diverse populations is essential to avoid exacerbating healthcare disparities.19 Additionally, data privacy, clinician oversight, and reliability must remain at the forefront to support AI adoption as a trusted and integrative tool in clinical decision-making.11 Addressing these challenges through interdisciplinary collaboration will enable AI tools to improve diagnostic precision, enhance clinical workflows, and ultimately deliver better patient outcomes in chronic wound care.

DISCLOSURE

The authors have no financial interest to declare in relation to the content of this article.

Supplementary Material

gox-13-e6871-s001.pdf (88.2KB, pdf)

Footnotes

Published online 12 June 2025.

Disclosure statements are at the end of this article, following the correspondence information.

Related Digital Media are available in the full-text version of the article on www.PRSGlobalOpen.com.

The data used in this literature review are derived from publicly available sources, primarily academic databases such as PubMed. All referenced studies and articles are appropriately cited in the references section.

REFERENCES

  • 1. Sen CK. Human wound and its burden: updated 2020 compendium of estimates. Adv Wound Care (New Rochelle). 2021;10:281–292.
  • 2. Huang S, Dang J, Sheckter CC, et al. A systematic review of machine learning and automation in burn wound evaluation: a promising but developing frontier. Burns. 2021;47:1691–1704.
  • 3. Valdés AM, Angderson C, Giner JJ. A multidisciplinary, therapy-based, team approach for efficient and effective wound healing: a retrospective study. Ostomy Wound Manage. 1999;45:30–36.
  • 4. Suthar PP, Kounsal A, Chhetri L, et al. Artificial intelligence (AI) in radiology: a deep dive into ChatGPT 4.0’s accuracy with the American Journal of Neuroradiology’s (AJNR) “Case of the Month”. Cureus. 2023;15:e43958.
  • 5. Pugliese G, Maccari A, Felisati E, et al. Are artificial intelligence large language models a reliable tool for difficult differential diagnosis? An a posteriori analysis of a peculiar case of necrotizing otitis externa. Clin Case Rep. 2023;11:e7933.
  • 6. Lau CH, Yu KH, Yip TF, et al. An artificial intelligence-enabled smartphone app for real-time pressure injury assessment. Front Med Technol. 2022;4:905074.
  • 7. Liu H, Hu J, Zhou J, et al. Application of deep learning to pressure injury staging. J Wound Care. 2024;33:368–378.
  • 8. Sutherland BL, Pecanac K, Bartels CM, et al. Expect delays: poor connections between rural and urban health systems challenge multidisciplinary care for rural Americans with diabetic foot ulcers. J Foot Ankle Res. 2020;13:32.
  • 9. Bouldin ED, Taylor LL, Littman AJ, et al. Chronic lower limb wound outcomes among rural and urban veterans. J Rural Health. 2015;31:410–420.
  • 10. Chrisman CA. Care of chronic wounds in palliative care and end-of-life patients. Int Wound J. 2010;7:214–235.
  • 11. Olsson M, Järbrink K, Divakar U, et al. The humanistic and economic burden of chronic wounds: a systematic review. Wound Repair Regen. 2019;27:114–125.
  • 12. Shiraishi M, Kanayama K, Kurita D, et al. Performance of artificial intelligence chatbots in interpreting clinical images of pressure injuries. Wound Repair Regen. 2024;32:652–654.
  • 13. Temsah M-H, Jamal A, Aljamaan F, et al. ChatGPT-4 and the global burden of disease study: advancing personalized healthcare through artificial intelligence in clinical and translational medicine. Cureus. 2023;15:e39384.
  • 14. Lastrucci A, Wandael Y, Barra A, et al. Revolutionizing radiology with natural language processing and chatbot technologies: a narrative umbrella review on current trends and future directions. J Clin Med. 2024;13:7337.
  • 15. Klug K, Beckh K, Antweiler D, et al. From admission to discharge: a systematic review of clinical natural language processing along the patient journey. BMC Med Inform Decis Mak. 2024;24:238.
  • 16. Wilhelm C, Steckelberg A, Rebitschek FG. Benefits and harms associated with the use of AI-related algorithmic decision-making systems by healthcare professionals: a systematic review. Lancet Reg Health Eur. 2025;48:101145.
  • 17. Ouyang L, Wu J, Jiang X, et al. Training language models to follow instructions with human feedback. 2022. Preprint. doi:10.48550/arXiv.2203.02155. Accessed March 2025.
  • 18. Vaira LA, Lechien JR, Abbate V, et al. Validation of the Quality Analysis of Medical Artificial Intelligence (QAMAI) tool: a new tool to assess the quality of health information provided by AI platforms. Eur Arch Otorhinolaryngol. 2024;281:6123–6131.
  • 19. Dankwa-Mullan I. Health equity and ethical considerations in using artificial intelligence in public health and medicine. Prev Chronic Dis. 2024;21:E64.
  • 20. Taib BG, Karwath A, Wensley K, et al. Artificial intelligence in the management and treatment of burns: a systematic review and meta-analyses. J Plast Reconstr Aesthet Surg. 2023;77:133–161.
