Scientific Reports. 2021 Mar 11;11:5792. doi: 10.1038/s41598-021-84973-5

A meta-analysis of Watson for Oncology in clinical application

Zhou Jie 1,2,#, Zeng Zhiying 3,#, Li Li 1,
PMCID: PMC7952578  PMID: 33707577

Abstract

This meta-analysis systematically evaluates the consistency of treatment schemes between Watson for Oncology (WFO) and multidisciplinary teams (MDT), and provides a reference for the practical application of artificial intelligence clinical decision-support systems in cancer treatment. We systematically searched the databases for articles on the clinical application of Watson for Oncology and conducted the meta-analysis using RevMan 5.3 software. A total of 9 studies were identified, including 2463 patients. When the MDT is consistent with WFO at the ‘Recommended’ or the ‘For consideration’ level, the overall concordance rate is 81.52%; it is highest for breast cancer and lowest for gastric cancer. The concordance rate in stage I–III cancer is higher than that in stage IV, but the result for lung cancer is the opposite (P < 0.05). Similar results were obtained when MDT was consistent with WFO only at the ‘Recommended’ level. Moreover, consistency is higher in estrogen and progesterone receptor-negative breast cancer patients, colorectal cancer patients under 70 years old or with ECOG 0, and small cell lung cancer patients than in estrogen and progesterone receptor-positive breast cancer patients, colorectal cancer patients over 70 years old or with ECOG 1–2, and non-small cell lung cancer patients, respectively, with statistical significance (P < 0.05). Treatment recommendations made by WFO and MDT were highly concordant for the cancer cases examined, but the system still needs further improvement. Because the included studies had relatively small sample sizes, more well-designed studies with large samples are still needed.

Subject terms: Medical research, Oncology

Introduction

Cancer-related knowledge is growing exponentially, which has created a knowledge gap for clinical physicians1. As the understanding of each patient deepens, more and more information must be absorbed from the literature to provide evidence-based cancer treatment. Research shows that clinical physicians can spend only 4.6 h a week acquiring the latest professional knowledge2, which delays the absorption of information and widens the gap between the results achieved by academic research centers and actual practice3. Compared with physicians in other clinical disciplines, clinical oncologists have a particularly urgent need to acquire evidence-based knowledge in time to support personalized treatment plans. Consequently, clinicians need new types of tools to bridge this knowledge gap and to support the adoption of new treatment methods in an evidence-based manner, so that more patients can benefit from social investment in research and development4,5. Artificial intelligence (AI), which first appeared in the early 1950s, refers to the creation of intelligent machines that function and react like human beings6. The goal of AI is to replicate the human mind, that is, to perform tasks such as identification, interpretation, reasoning and transformation. It is good at tasks that human beings are not good at, such as absorbing a large amount of qualitative information and recognizing patterns in it7,8. AI has now gradually entered medicine. Image recognition using AI has been successfully applied to image-based clinical diagnosis, such as melanoma recognition in dermoscopy images9 or detection of diabetic retinopathy in retinal fundus photographs10, and a growing body of AI research is being carried out in oncology11–14.
AI aims to enhance human capabilities, enable human beings to apply increasingly complex knowledge to clinical decision-making, and bring increasingly diverse and complex patient data into personalized management. Because cognitive computing technology has developed only recently, its application in clinical oncology still lacks large-scale data, and there are clinical differences between regions and ethnic groups. Watson for Oncology (WFO), an artificial intelligence decision-support system, was developed by IBM Corporation (USA) with the help of top oncologists from Memorial Sloan Kettering Cancer Center (MSK). It was trained for more than 4 years on the National Comprehensive Cancer Network (NCCN) cancer treatment guidelines and more than 100 years of clinical cancer treatment experience in the United States, and it can recommend appropriate chemotherapy regimens for specific cancer patients. For supported cases, the treatment recommendations provided by WFO are divided into 3 groups: Recommended (green "buckets"), a treatment supported by obvious evidence; For consideration (yellow "buckets"), a potentially suitable alternative; and Not recommended (red "buckets"), a treatment with contraindications or obvious evidence against its use. To compare the consistency between WFO and clinicians in different countries and regions on a large scale, many hospitals have formed a Multidisciplinary Team (MDT) composed of oncologists, surgeons, pathologists, radiologists and others, who discuss the advantages and disadvantages of each candidate treatment scheme and finally determine the treatment scheme. A case is defined as concordant when the MDT recommendation falls in the ‘Recommended’ category of WFO, or in the ‘Recommended’ or ‘For consideration’ categories, depending on the level of analysis; otherwise, it is discordant.
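The concordance definition can be sketched as a small function. This is only an illustration: the names `Bucket`, `is_concordant` and the `strict` flag are assumptions introduced here, not part of WFO or of the included studies. The two settings correspond to counting concordance at the ‘Recommended’ level only, or at the ‘Recommended’ or ‘For consideration’ level.

```python
from enum import Enum


class Bucket(Enum):
    """WFO rating of the treatment chosen by the MDT (hypothetical encoding)."""
    RECOMMENDED = "R"
    FOR_CONSIDERATION = "C"
    NOT_RECOMMENDED = "NR"
    NOT_AVAILABLE = "NA"  # treatment not offered by WFO at all


def is_concordant(bucket: Bucket, strict: bool = False) -> bool:
    """Return True if the MDT choice counts as concordant with WFO.

    strict=False: 'Recommended' or 'For consideration' counts as concordant.
    strict=True:  only 'Recommended' counts as concordant.
    """
    if strict:
        return bucket is Bucket.RECOMMENDED
    return bucket in (Bucket.RECOMMENDED, Bucket.FOR_CONSIDERATION)


if __name__ == "__main__":
    for b in Bucket:
        print(b.name, is_concordant(b), is_concordant(b, strict=True))
```

Under this encoding a ‘For consideration’ case is concordant in the broader analysis and discordant in the stricter one, which is why the two analyses can give different concordance rates for the same patients.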
Published results show obvious differences in concordance rates between regions and cancer types. So far, no meta-analysis comparing the consistency of WFO and MDT has been published. Therefore, this study aims to systematically review the literature and provide the latest evidence on WFO's clinical use, to analyze the consistency between WFO's treatment schemes for cancer patients and those of clinicians together with the advantages and disadvantages of WFO, and to summarize WFO's clinical practice, so as to provide a reference for its further clinical application.

Materials and methods

This meta-analysis is registered in the International Prospective Register of Systematic Reviews (PROSPERO) (CRD42020199418). Where applicable, the general guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Statement were followed. The study was performed and prepared according to the guidelines proposed by the Cochrane Collaboration (http://www.cochrane-handbook.org).

Literature search

Since WFO entered commercial use in 2015, the literature from 2015 onwards was searched. The Cochrane Library, PubMed, Excerpta Medica Database (EMbase), China National Knowledge Infrastructure (CNKI), CQVIP and Chinese Biomedicine (CBM) databases (updated until December 31, 2019) were searched using the following terms: artificial intelligence, clinical decision-support system, Watson for Oncology, neoplasm, treatment, Multidisciplinary Team, concordance and comparative study. Other potentially eligible articles were also screened manually.

Inclusion and exclusion criteria

The studies meeting the following criteria would be included:

(a) The study focuses on the clinical use of WFO, regardless of cancer type, (b) the study contains data for at least one subgroup analysis, (c) the study is an original research article published in Chinese or English, regardless of nationality, (d) the study compares the consistency of treatment schemes determined by WFO and MDT, and (e) both prospective and retrospective studies are eligible, whether or not blinding was used.

The following are the major exclusion criteria:

(a) The study only describes the use of WFO without reporting any data, or reports only data on the research and development process of WFO, (b) the article does not compare the treatment schemes of WFO and MDT, and (c) book chapters, comments, case reports, and other publication forms without detailed data.

Data extraction and quality assessment

Two investigators independently evaluated the quality of the literature and extracted the data. Any disagreements were discussed and referred to an additional independent arbitrator for resolution. Missing original data were requested from the original authors by e-mail. The data were extracted with a standardized table, including (a) general information, such as the title of the publication, first author’s surname, the original document number and source, year of publication and country; (b) research characteristics, such as the eligibility of the research, the characteristics of the research subjects, the design and quality of the literature, the specific contents and implementation methods of the research measures, relevant bias prevention measures, and the main test results; and (c) data needed for this meta-analysis, such as the total number of cases in each group and the number of events, collected as dichotomous data.

According to the Cochrane Reviewers’ Handbook 6.1 (http://www.cochrane-handbook.org), the quality of the literature was evaluated on 7 aspects: random sequence generation (selection bias), allocation concealment (selection bias), blinding of participants and personnel (performance bias), blinding of outcome assessment (detection bias), incomplete outcome data (attrition bias), selective reporting (reporting bias) and other bias. Each aspect was judged as "yes" (low bias), "no" (high bias) or "unclear" (lack of relevant information or uncertainty about bias). Review Manager statistical software (RevMan, version 5.3.5, Cochrane Collaboration Network) was used to assess the risk of bias and provide visual results.

Statistical analysis

RevMan 5.3.5 was also used to analyze the extracted data. The main purpose of this study was to compare the consistency of treatment schemes determined by WFO and MDT in different cancer types, so the data were dichotomous (concordant or discordant). Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated for clinicopathological features (TNM stage, histopathological category, etc.). The Q test and the I2 statistic were used to judge heterogeneity among the studies. When P < 0.05 or I2 > 50%, heterogeneity among the studies was considered significant; otherwise, it was considered not significant. When there was no statistical heterogeneity between studies, the fixed effect model was used to pool the results. If there was statistical heterogeneity, we analyzed its causes and adopted subgroup analysis or sensitivity analysis. For studies in which heterogeneity still could not be eliminated, the data could be combined from the perspective of clinical significance; the random effect model was then used for the pooled analysis, and the results were interpreted with caution. If the data provided could not be meta-analyzed, only descriptive analysis was done.
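The model-selection rule described here can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a reproduction of RevMan: it pools log odds ratios with generic inverse-variance weights and the DerSimonian-Laird estimate of between-study variance, it switches to the random effect model on the I2 > 50% criterion only (the Q-test P value is not computed), and the function names `log_odds_ratio` and `pool_odds_ratios` are hypothetical. RevMan's own weighting options may differ, so the numbers need not match the published forest plots.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Return (log OR, variance) for one study's 2x2 table.

    a, b: concordant / discordant cases in group 1 (e.g. stage I-III)
    c, d: concordant / discordant cases in group 2 (e.g. stage IV)
    A 0.5 continuity correction is added when any cell is zero.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pool_odds_ratios(tables, i2_threshold=50.0):
    """Pool per-study 2x2 tables; use random effects if I2 exceeds the threshold."""
    effects = [log_odds_ratio(*t) for t in tables]
    y = [e for e, _ in effects]            # per-study log ORs
    v = [var for _, var in effects]        # per-study variances
    w = [1 / vi for vi in v]               # inverse-variance weights

    # Fixed-effect estimate, Cochran's Q and I2 (in percent).
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(tables) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance (0 for a single study).
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if df > 0 else 0.0

    use_random = i2 > i2_threshold
    weights = [1 / (vi + tau2) for vi in v] if use_random else w
    pooled = sum(wi * yi for wi, yi in zip(weights, y)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return {
        "model": "random" if use_random else "fixed",
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)),
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only, not data from the included studies.
    demo = [(80, 20, 60, 40), (45, 5, 30, 20), (70, 30, 65, 35)]
    print(pool_odds_ratios(demo))
```

Each tuple is one study's table of concordant and discordant cases in the two groups being compared. An OR above 1 then means that the first group is more often concordant than the second.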

Results

Characteristics and quality evaluation of eligible studies

A total of 367 relevant publications from January 2015 to December 2019 were obtained from the preliminary search: 237 in English (PubMed: 102, Embase: 106, Cochrane Library: 29) and 130 in Chinese (CNKI: 43, CQVIP: 47, CBM: 40). After the titles, abstracts and full texts were read successively, 8 articles15–22 and 1 conference abstract23 were finally included, all of which were non-RCTs published between 2017 and 2019. Seven studies15–17,19,20,22,23 were published in English and 2 studies18,21 in Chinese. The process of publication selection and the main characteristics and quality evaluation of the included publications are shown in Fig. 1, Table 1, and Supplementary Figs. 1 and 2, respectively. Of the 9 studies, 7 studies15–17,19–22 clearly defined the method of selecting cases, and the other studies did not indicate the "randomization" of the included samples. In all studies, WFO and MDT treatment schemes were formulated successively for the same patient, so there was no allocation bias. Seven studies15,16,18–22 did not describe a specific blinding plan or did not use blinding, but this does not affect the judgment and measurement of results. Although two studies16,22 did not provide detailed four-category data, this did not substantially affect our meta-analysis. We therefore considered that all studies had no obvious selective reporting bias and that the basic integrity of the data was ensured, but other biases were still unclear. Because Begg’s funnel plot and Egger's test are of little value for detecting publication bias when the number of studies is small (< 10), no publication bias analysis was performed in this study. Because the quality of the included studies differed little, no further sensitivity analysis was made.

After subgroup analysis, most I2 values were less than 50%, indicating low heterogeneity among the studies included in this systematic review.

Figure 1. Flow diagram of the study selection process.

Table 1. Main characteristics of publications included in this meta-analysis.

| Study | Ref. | Language | Type | Country | Sample size | Cancer type | Details of WFO treatment schemes | Outcome measure |
|---|---|---|---|---|---|---|---|---|
| Choi 2019 | 15 | English | Article | Korea | 65 | Gastric cancer | R + C + NR + NA | |
| Kim 2019 | 16 | English | Article | Korea | 69 | Colorectal cancer | R + C + (NR + NA)a | ①③④ |
| Zhou 2019 | 17 | English | Article | China | 362 | Colon, rectal, lung, breast, stomach, cervical and ovarian cancer | R + C + NR + NA | ①② |
| Hu 2018 | 18 | Chinese | Article | China | 30 | Colon cancer | R + C + NR + NA | |
| Liu 2018 | 19 | English | Article | China | 149 | Lung cancer | R + C + NR + NA | ①② |
| Somashekhar 2018 | 20 | English | Article | India | 638 | Breast cancer | R + C + NR + NA | ①② |
| Xu 2018 | 21 | Chinese | Article | China | 132 | Breast cancer | R + C + NR + NA | ①② |
| Lee 2018 | 22 | English | Article | Korea | 656 | Colon cancer | R + (C + NR + NA)a | ①③④ |
| Somashekhar 2017 | 23 | English | Conference abstract | India | 362 | Colon, rectal and lung cancer | R + C + NR + NA | |

R Recommended, C For consideration, NR Not recommended, NA Not available.

① Clinical staging, ② Pathological subtypes, ③ ECOG performance status, ④ Age.

aFor the items in brackets, only the merged data were given, not each item separately.

Results of meta-analysis

Overall analysis of consistency between WFO and MDT

Of the 9 included studies, 7 studies15,17–21,23 provided complete four-category data (the three types of WFO treatment schemes plus unavailable cases) on the consistency of treatment schemes determined by WFO and MDT, involving seven types of cancer: breast, rectal, colon, gastric, lung, ovarian and cervical cancer. Of the 1738 cases included (shown in Supplementary Fig. 3), the MDT treatment scheme was a WFO ‘Recommended’ scheme (green scheme) in 959 cases (55.18%) and a ‘For consideration’ scheme (orange scheme) in 503 cases (28.94%), for a total of 1462 cases (84.12%). However, 166 cases (9.55%) fell under the ‘Not recommended’ scheme (pink scheme) and 110 cases (6.33%) were not supported by WFO (‘Not available’ scheme).
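The percentages in this breakdown follow directly from the four counts. The short sketch below recomputes them as a check; the helper name `breakdown` is an assumption introduced for illustration, and the counts are the ones quoted in the paragraph above.

```python
def breakdown(counts):
    """Return each category's share of the total, in percent (2 decimals)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


# Counts quoted in the text for the 7 studies with four-category data.
counts = {
    "Recommended": 959,
    "For consideration": 503,
    "Not recommended": 166,
    "Not available": 110,
}

shares = breakdown(counts)
total = sum(counts.values())
concordant = counts["Recommended"] + counts["For consideration"]
concordant_share = round(100 * concordant / total, 2)

if __name__ == "__main__":
    print(total, shares)
    print(concordant, concordant_share)
```

The sum of the first two categories is the concordance at the ‘Recommended’ or ‘For consideration’ level for these 7 studies.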

With concordance defined as the MDT recommendation falling in the ‘Recommended’ or ‘For consideration’ categories of WFO, we conducted a meta-analysis by clinical stage (stage I–III vs. stage IV). A total of 8 studies15–21,23 were included in the analysis. Of the 1807 cases included, 1473 (81.52%) WFO treatment schemes were consistent with the MDT. The concordance rate of stage I–III was 86.00% (1026/1193), which was higher than the 80.78% (496/614) of stage IV. Because there was significant statistical heterogeneity between stages (I2 = 83%), the meta-analysis was conducted using the random effect model (shown in Fig. 2A). The difference was not statistically significant, P = 0.20 [OR 1.68, 95% CI (0.76, 3.74)]. To further analyze the consistency between MDT and WFO, we repeated the analysis counting only WFO ‘Recommended’ as concordant and excluding ‘For consideration’. A total of 9 studies15–23 were included in this analysis. Of the 2463 cases included, 1299 (52.74%) WFO treatment schemes were consistent with MDT. The consistency of stage I–III was 56.46% (962/1704), which was greater than the 44.40% (337/759) of stage IV. There was again significant statistical heterogeneity between stages (I2 = 90%) (shown in Fig. 3A), so the random effect model was also used. The difference was again not statistically significant, P = 0.08 [OR 1.77, 95% CI (0.93, 3.40)]. Because both analyses showed significant statistical heterogeneity (I2 > 50%), subgroup analysis by tumor type was performed.

Figure 2. Forest plot of consistency between WFO (‘Recommended’ or ‘For consideration’) and MDT for patients with various cancers. Treatment was considered concordant if the delivered treatment was rated as either ‘Recommended’ or ‘For consideration’ by WFO and discordant if the delivered treatment was either ‘Not recommended’ by WFO or was physician’s choice (not included in WFO). Overall concordance of various cancers in stages I–III and IV (A). Concordance of various estrogen and progesterone receptors (ER+/PR+ vs. ER−, PR−) in breast cancer (B). Concordance of various pathological types (small cell vs. non-small cell) in lung cancer (C).

Figure 3. Forest plot of consistency between WFO (only ‘Recommended’) and MDT for patients with various cancers. Treatment was considered concordant if the delivered treatment was rated as ‘Recommended’ by WFO and discordant if the delivered treatment was rated as other options by WFO or was physician’s choice (not included in WFO). Overall concordance of various cancers in stages I–III and IV (A). Concordance of various estrogen and progesterone receptors (ER+/PR+ vs. ER−, PR−) in breast cancer (B). Concordance of various performance status (ECOG 0 vs. ECOG 1–2) in colorectal cancer (C). Concordance of various age (< 70-year-old vs. older) in colorectal cancer (D). Concordance of various pathological types (small cell vs. non-small cell) in lung cancer (E).

Subgroup analysis of consistency between WFO and MDT

Consistency between WFO (‘Recommended’ or ‘For consideration’) and MDT

With concordance defined as the MDT recommendation falling in the ‘Recommended’ or ‘For consideration’ categories of WFO, we conducted a meta-analysis by clinical stage (stage I–III vs. stage IV) for each cancer type. The consistency of stage I–III was greater than that of stage IV for every cancer type except lung cancer (shown in Table 2 and Fig. 4). For breast cancer, 3 studies17,20,21 (n = 890) were included, and the difference was statistically significant, P = 0.001 [OR 2.29, 95% CI (1.37, 3.82)]. For colorectal cancer, 4 studies16–18,23 (n = 398) were included, and the difference was statistically significant, P < 0.0001 [OR 3.44, 95% CI (1.91, 6.17)]. For colon cancer, 3 studies17,18,23 (n = 181) were included, and the difference was statistically significant, P = 0.04 [OR 2.31, 95% CI (1.06, 5.05)]. For rectal cancer, 2 studies17,23 (n = 148) were included, and the difference was not statistically significant, P = 0.17 [OR 3.31, 95% CI (0.60, 18.25)]. For gastric cancer, 2 studies15,17 (n = 107) were included, and the difference was not statistically significant, P = 0.07 [OR 9.81, 95% CI (0.86, 111.5)]. For lung cancer, 3 studies17,19,23 (n = 374) were included, and the difference was not statistically significant, P = 0.08 [OR 0.32, 95% CI (0.09, 1.13)].
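Each reported P value can be checked approximately against its odds ratio and confidence interval, since a 95% CI that includes 1 corresponds to P > 0.05. The sketch below assumes a normal approximation on the log odds ratio scale and back-calculates the standard error from the published CI. The helper name `p_from_or_ci` is hypothetical, and because the published figures are rounded the result is only approximate.

```python
import math


def p_from_or_ci(odds_ratio, lower, upper, z_crit=1.96):
    """Approximate two-sided P value from an OR and its 95% CI.

    Assumes the CI was built as exp(log OR +/- z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


if __name__ == "__main__":
    # Values quoted in the text for stage I-III vs. stage IV.
    for name, args in {
        "breast": (2.29, 1.37, 3.82),
        "gastric": (9.81, 0.86, 111.5),
        "lung": (0.32, 0.09, 1.13),
    }.items():
        z, p = p_from_or_ci(*args)
        print(f"{name}: z = {z:.2f}, P = {p:.3f}")
```

For the gastric cancer figures, whose CI spans 1, this back-calculation gives a P value above 0.05. For the breast cancer figures, whose CI lies entirely above 1, it gives a P value well below 0.05.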

Table 2. Meta-analysis results of consistency between WFO (‘Recommended’ or ‘For consideration’) and MDT for patients with various cancers in stages I–III and IV.

| Cancer type | Number of studies | Sample size | Stage I–III C | Stage I–III NC | Stage I–III consistency | Stage IV C | Stage IV NC | Stage IV consistency | I2 | Odds ratio (95% CI) | P value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Breast cancer | 3 | 890 | 657 | 68 | 90.62% | 135 | 30 | 81.82% | 2% | 2.29 (1.37, 3.82) | 0.001 |
| Colorectal cancer | 4 | 398 | 218 | 21 | 91.21% | 119 | 40 | 74.84% | 0% | 3.44 (1.91, 6.17) | < 0.0001 |
| Colon cancer | 3 | 181 | 75 | 12 | 86.21% | 69 | 25 | 73.40% | 0% | 2.31 (1.06, 5.05) | 0.04 |
| Rectal cancer | 2 | 148 | 100 | 6 | 94.34% | 33 | 9 | 78.57% | 53% | 3.31 (0.60, 18.25) | 0.17 |
| Gastric cancer | 2 | 107 | 44 | 20 | 68.75% | 18 | 25 | 41.86% | 43% | 9.81 (0.86, 111.5) | 0.07 |
| Lung cancer | 3 | 374 | 92 | 54 | 63.01% | 207 | 21 | 90.79% | 68% | 0.32 (0.09, 1.13) | 0.08 |
| Totala | 8 | 1807 | 1026 | 167 | 86.00% | 496 | 118 | 80.78% | 83% | 1.68 (0.76, 3.74) | 0.20 |

C Concordant cases, NC Nonconcordant cases.

aThe number of rectal cancer and colon cancer, which overlaps with colorectal cancer, has been excluded from the total. In addition, the total includes the number of ovarian cancer and cervical cancer.

Figure 4. Forest plot of consistency between WFO (‘Recommended’ or ‘For consideration’) and MDT for patients (subgroup).

In addition, 3 studies17,20,21 (n = 890) provided data on estrogen and progesterone receptors (ER+/PR+ vs. ER−, PR−) in breast cancer patients, so a further meta-analysis was carried out. The difference was not statistically significant (shown in Fig. 2B), P = 0.47 [OR 0.85, 95% CI (0.54, 1.34)]. A total of 2 studies17,19 (n = 262) provided data on pathological types (small cell vs. non-small cell) of lung cancer patients. The consistency of small cell lung cancer was higher than that of non-small cell lung cancer (shown in Fig. 2C), and the difference was statistically significant, P = 0.02 [OR 3, 95% CI (1.20, 7.48)].

Consistency between WFO (only ‘Recommended’) and MDT

With concordance defined as the MDT recommendation falling only in the ‘Recommended’ category of WFO, we again conducted a meta-analysis by clinical stage (stage I–III vs. stage IV) for each cancer type. Similarly, the consistency of stage I–III was greater than that of stage IV for every cancer type except lung cancer (shown in Table 3 and Fig. 5). For breast cancer, 3 studies17,20,21 (n = 890) were included, and the difference was not statistically significant, P = 0.37 [OR 1.33, 95% CI (0.72, 2.47)]. For colorectal cancer, 5 studies16–18,22,23 (n = 1054) were included, and the difference was statistically significant, P < 0.0001 [OR 3.70, 95% CI (1.93, 7.11)]. For colon cancer, 4 studies17,18,22,23 (n = 837) were included, and the difference was statistically significant, P = 0.0004 [OR 2.49, 95% CI (1.50, 4.14)]. For rectal cancer, 2 studies17,23 (n = 148) were included, and the difference was statistically significant, P = 0.0001 [OR 5.87, 95% CI (2.36, 14.58)]. For gastric cancer, 2 studies15,17 (n = 107) were included, and the difference was statistically significant, P = 0.01 [OR 3.48, 95% CI (1.28, 9.43)]. For lung cancer, 3 studies17,19,23 (n = 374) were included, and the difference was not statistically significant, P = 0.18 [OR 0.36, 95% CI (0.08, 1.57)].

Table 3. Meta-analysis results of consistency between WFO (only ‘Recommended’) and MDT for patients with various cancers in stages I–III and IV.

| Cancer type | Number of studies | Sample size | Stage I–III C | Stage I–III NC | Stage I–III consistency | Stage IV C | Stage IV NC | Stage IV consistency | I2 | Odds ratio (95% CI) | P value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Breast cancer | 3 | 890 | 458 | 267 | 63.17% | 88 | 77 | 53.33% | 38% | 1.33 (0.72, 2.47) | 0.37 |
| Colorectal cancer | 5 | 1054 | 448 | 302 | 59.73% | 129 | 175 | 42.43% | 66% | 3.70 (1.93, 7.11) | < 0.0001 |
| Colon cancer | 4 | 837 | 325 | 273 | 54.35% | 99 | 140 | 41.42% | 27% | 2.49 (1.50, 4.14) | 0.0004 |
| Rectal cancer | 2 | 148 | 96 | 10 | 90.57% | 25 | 17 | 59.52% | 0% | 5.87 (2.36, 14.58) | 0.0001 |
| Gastric cancer | 2 | 107 | 25 | 39 | 39.06% | 7 | 36 | 16.28% | 0% | 3.48 (1.28, 9.43) | 0.01 |
| Lung cancer | 3 | 374 | 39 | 107 | 26.71% | 97 | 131 | 42.54% | 86% | 0.36 (0.08, 1.57) | 0.18 |
| Totala | 9 | 2463 | 962 | 742 | 56.46% | 337 | 422 | 44.40% | 90% | 1.77 (0.93, 3.40) | 0.08 |

C Concordant cases, NC Nonconcordant cases.

aThe number of rectal cancer and colon cancer, which overlaps with colorectal cancer, has been excluded from the total. In addition, the total includes the number of ovarian cancer and cervical cancer.

Figure 5. Forest plot of consistency between WFO (only ‘Recommended’) and MDT for patients with various cancers in stages I–III and IV (subgroup).

In addition, 3 studies17,20,21 (n = 890) provided data on estrogen and progesterone receptors (ER+/PR+ vs. ER−, PR−) in breast cancer patients. The consistency of hormone receptor-positive patients (Luminal A and Luminal B) was lower than that of hormone receptor-negative patients (HER2 positive and triple negative), and the difference was statistically significant, P = 0.02 [OR 0.72, 95% CI (0.54, 0.95)] (shown in Fig. 3B). A total of 2 studies16,22 provided data on performance status (ECOG 0 vs. ECOG 1–2) and age (< 70 years old vs. older) of colorectal cancer patients. The consistency of ECOG 0 patients was higher than that of ECOG 1–2 patients, and the difference was statistically significant, P = 0.003 [OR 1.59, 95% CI (1.17, 2.17)] (shown in Fig. 3C). The consistency of patients under 70 years old was higher than that of older patients, and the difference was statistically significant, P = 0.03 [OR 4.06, 95% CI (1.18, 13.97)] (shown in Fig. 3D). A total of 2 studies17,19 (n = 262) provided data on pathological types (small cell vs. non-small cell) of lung cancer patients. The consistency of small cell lung cancer was again higher than that of non-small cell lung cancer, and the difference was statistically significant, P < 0.00001 [OR 11.05, 95% CI (4.93, 24.77)] (shown in Fig. 3E).

Discussion

Consistency analysis between WFO and MDT

On the whole, for all cancers except lung cancer, the consistency of stage I–III is better than that of stage IV, and most of the results are statistically significant (P < 0.05), whether concordance between WFO and MDT is defined at the ‘For consideration’ level (‘Recommended’ or ‘For consideration’) or at the ‘Recommended’ level (only ‘Recommended’). At the ‘For consideration’ level, the overall concordance rate of breast cancer is the highest (88.99%), while that of gastric cancer is the lowest (57.94%). Among patients with lung cancer, the consistency of small cell lung cancer is higher than that of non-small cell lung cancer, and the difference is statistically significant. At the ‘Recommended’ level, the overall concordance rate of rectal cancer is the highest (81.76%), while that of gastric cancer is still the lowest (29.90%). In breast cancer, the consistency of hormone receptor-positive patients (Luminal A and B) is lower than that of hormone receptor-negative patients (HER2 positive and triple negative). In colorectal cancer patients, the consistency of ECOG 0 is higher than that of ECOG 1–2, and that of patients under 70 years old is higher than that of older patients. In lung cancer patients, the consistency of small cell lung cancer is still higher than that of non-small cell lung cancer, and the difference is statistically significant.

Advantages of WFO

Besides showing high consistency with MDT in most cancers, WFO, as an artificial intelligence clinical decision-support system, also has the following advantages.

(a) WFO improves doctors' work efficiency and reduces workload. Hu’s study18 showed that using WFO can save an average of 8.2 min per case (the average time for obtaining reports is 7.3 ± 2.2 min, and the average time for MDT consultation is 15.5 ± 6.1 min). Not having to wait for an MDT discussion helps to reduce the time required to formulate a chemotherapy scheme24, thus shortening the hospitalization time of patients.

(b) WFO can prevent human calculation errors. Choosing chemotherapy schemes and drugs involves complicated and time-consuming processes in which selection errors may occur25,26; WFO can achieve accurate medication through computer programs and so prevent such errors20,27.

(c) WFO can improve the quality of doctor-patient communication and prevent doctor-patient disputes. For a variety of reasons, patients' distrust of doctors is increasing in China28,29. The more patients participate in decisions about their own therapeutic regimen and understand information such as the incidence of adverse events, the more confidence they have in the regimen and the more actively they cooperate with doctors30.

(d) WFO can reduce the burden on patients. It can spare patients the time spent on consultations at various large hospitals, help them obtain more accurate treatment as soon as possible, avoid the fatigue caused by travel, and reduce travel and accommodation costs.

(e) WFO can improve the professional level of young doctors. It can significantly shorten the time that junior doctors must spend consulting the relevant literature. At the same time, WFO gives the reasons for selection, supporting evidence and drug use instructions for each scheme, and the system is updated once every 1–2 months, which improves the ability of junior doctors to make accurate diagnosis and treatment recommendations in a short time and improves their self-confidence.

Disadvantages of WFO

Recent studies showed that the consistency between WFO and MDT for cancer patients is not completely consistent, especially in patients with advanced cancer, there is a significant decrease in consistency. It is confirmed that WFO still has certain limitations, which lead to differences in the consistency rate when the system is applied in other countries. The limitations are shown as follows: (a) Different treatment schemes: yellow and white people have significant differences in sensitivity and tolerance to certain specific chemotherapeutic drugs due to their different constitutions and key enzyme groups of drug metabolism, so that clinical guidelines between different countries and regions must also have certain differences. For example, the mutation rate of EGFR in lung cancer in European and American countries is about 15%, while that in China is more than 50%31,32. In China, primary research drugs Icotinib and Endostar3335 are used to instead of other first-generation epidermal growth factor receptor-tyrosine kinase inhibitor (EGFR-TKI) and bevacizumab, because studies have shown that they are as effective as EGFR-TKI and bevacizumab in lung cancer patients in China36,37. Liu et al.19 and others have proposed that if WFO system can provide these two alternative therapeutic regimens in ‘Recommended’ or ‘For consideration’, the overall consistency of lung cancer in China can be increased from 65.8 to 93.2%. Xu et al.21 also believe that the difference in first-line treatment of advanced breast cancer can also be attributed to the fact that CDK4/6 inhibitors cannot be used because they are not listed in China. Similarly, WFO recommended panizumab targeted therapy in colon cancer patients, but it is not listed in China and patients cannot choose it38. (b) Different drug choices: WFO recommended chemotherapy regimen complies with NCCN guidelines, but it also includes thousands of clinical practice cases from MSK16. 
For example, because surgical methods and guidelines for adjuvant treatment of gastric cancer differ greatly between China and the United States39,40, the study applying WFO to gastric cancer shows a poor concordance rate. By contrast, adjuvant therapy and drug selection for colon cancer are more consistent between eastern and western countries, so the concordance rate between WFO and MDT is clearly higher. Liu et al.19 also noted that WFO recommended concurrent chemoradiation for lung cancer, whereas sequential chemoradiation is performed in China (up to 67%). Chinese patients often cannot tolerate concurrent radiotherapy and chemotherapy because their physique is usually weaker than that of western patients, which lowers the concordance rate between WFO and MDT. (c) Complications: comprehensive treatment for cancer patients is continuous, and patients may suffer reversible and transient organ function damage. WFO may sometimes exclude available schemes when it selects candidate schemes solely on the basis of a patient's transient abnormal biochemical results41. In Hu's study18, a biochemical blood test of a colon cancer patient showed a creatinine clearance rate < 30. WFO did not recommend the CapeOX (oxaliplatin + capecitabine) scheme for this patient, but the MDT considered the result a transient biochemical abnormality; the creatinine clearance rate was rechecked one week later, the result was > 30, and CapeOX treatment was therefore still carried out. In Liu's study19, a patient with active pulmonary tuberculosis was also diagnosed with stage III squamous cell lung cancer. If the standard chemoradiotherapy recommended by WFO had been given, the tuberculosis might have spread rapidly, resulting in rapid death. Therefore, Liu et al. 
modified the treatment strategy to oral anti-tuberculosis drugs before radiotherapy and chemotherapy. It is therefore believed that if such individualized information could be incorporated into WFO, the concordance rate between WFO and MDT would be greatly improved. (d) Economic factors: for example, in the treatment of breast cancer, WFO recommends trastuzumab for HER2-positive patients, but patients in China are often forced to choose chemotherapy first because of the high price of this drug38. In the Republic of Korea, both WFO and MDT recommend regorafenib for patients with stage IV rectal cancer42, but some patients still received 5-fluorouracil (5-FU)-based chemotherapy, because regorafenib is not only expensive but also not covered by the national health insurance system16. Similarly, medical insurance reimbursement has to be considered in China, which also affects the concordance between WFO and MDT. If WFO can make targeted improvements to its treatment recommendations for patients with advanced cancer, non-small cell lung cancer, hormone receptor-positive breast cancer, and colorectal cancer with ECOG 1–2 or age over 70, it will be more suitable for clinical use in other countries.

Characteristics and limitations of this meta-analysis

Although WFO has gradually been adopted in many countries and regions, and the types of cancer it supports are also gradually increasing, evidence-based medicine research on this system is still lacking. To understand the concordance between WFO and MDT and the advantages and disadvantages of WFO in clinical use, and to address the practical problems encountered when using the system, we carried out a targeted meta-analysis. Most of the original studies examine concordance only at the ‘For consideration’ level (‘Recommended’ or ‘For consideration’) or only at the ‘Recommended’ level (only ‘Recommended’). This study carries out a separate meta-analysis at each of the two levels, which further supports some statistical results obtained in the original studies and provides new statistical evidence. It not only reminds clinicians using WFO to pay sufficient attention to patients with advanced cancer, non-small cell lung cancer, Luminal A and B breast cancer, and colorectal cancer with ECOG 1–2 or age over 70, but also provides clinical evidence for the improvement of WFO. This meta-analysis nevertheless has certain limitations: (a) Selection bias may exist in a few of the included studies; (b) The sample size of some studies is relatively small, and some study results are not fully reported, lacking complete data for the four classifications; (c) Most studies did not report data on WFO's advantages, such as shortened consultation time or the concordance between junior or senior doctors and WFO, so we could not analyze some of these advantages further; (d) All data come from published studies or conference abstracts, grey literature is lacking, and literature selection bias is possible. In addition, 182 cases were initially included in Liu's study on lung cancer19. 
Subsequently, 33 cases that were not supported by WFO were excluded, and the remaining 149 patients were included in the study. However, the clinical stages of these 33 cases are not listed in detail, so they cannot be included in further meta-analysis. Moreover, the distribution of patients in that study is unbalanced: there are fewer early-stage patients, which differs markedly from the other cancers, where early-stage patients outnumber late-stage patients. All of these factors may cause the conclusions for lung cancer to differ from those for other cancers. Finally, the sample size included in our systematic evaluation is small, so larger, multi-center, high-quality randomized controlled trials are still needed for further verification in order to reach more reliable conclusions.
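
To make the distinction between the two concordance levels concrete, the following minimal sketch computes both rates from a list of per-patient labels. It is an illustration only, not the procedure used in this meta-analysis (which pooled study-level results in RevMan 5.3). The labels ‘Not recommended’ and ‘Not available’, the function name `concordance_rates`, and all counts are assumptions introduced for the example.

```python
from collections import Counter

# WFO category into which the MDT's chosen treatment falls.
# Only the first two labels are named in the text; the last two are assumed.
RECOMMENDED = "Recommended"
FOR_CONSIDERATION = "For consideration"
NOT_RECOMMENDED = "Not recommended"  # assumed label
NOT_AVAILABLE = "Not available"      # assumed label


def concordance_rates(labels):
    """Return the concordance rate at both levels for one group of patients.

    'recommended_level' counts only 'Recommended' as concordant.
    'for_consideration_level' counts 'Recommended' or 'For consideration'.
    """
    total = len(labels)
    if total == 0:
        raise ValueError("at least one case is required")
    counts = Counter(labels)
    rec = counts[RECOMMENDED]
    cons = counts[FOR_CONSIDERATION]
    return {
        "recommended_level": rec / total,
        "for_consideration_level": (rec + cons) / total,
    }


# Invented counts for illustration; they are not data from any included study.
example = (
    [RECOMMENDED] * 60
    + [FOR_CONSIDERATION] * 20
    + [NOT_RECOMMENDED] * 15
    + [NOT_AVAILABLE] * 5
)
rates = concordance_rates(example)
for level, rate in rates.items():
    print(f"{level}: {rate:.2%}")
```

By construction the rate at the ‘For consideration’ level can never be lower than the rate at the ‘Recommended’ level, which is why the two levels have to be reported and pooled separately.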

To sum up, we should regard WFO as "a tool, not a crutch"43. Properly used, WFO is a valuable tool. Proper use means that WFO only complements the doctor's work and is not relied on completely. Oncologists can integrate it with traditional resources such as colleagues' experience and scientific journals to choose the most effective chemotherapy schemes for patients, helping patients obtain more accurate and effective treatment and achieve better treatment results sooner. WFO should also be continuously improved on the basis of clinical use in other countries. People often say that AI will change medicine. Through examples like WFO, we can look forward to AI enabling people all over the world to obtain the best quality medical services fairly, no matter where or who the patients are44.

Supplementary Information

Supplementary Figure 1. (66.1KB, docx)
Supplementary Figure 2. (39.4KB, docx)
Supplementary Figure 3. (21.8KB, docx)

Author contributions

Conceptualization, L.L. and Z.J.; software, Z.Z.; validation, L.L. and Z.J.; investigation, Z.J.; resources, Z.J.; data curation, Z.Z.; writing—original draft preparation, Z.J.; writing—review and editing, L.L.; visualization, L.L.; supervision, L.L.; project administration, L.L.; funding acquisition, L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Scientific Research and Technology Development Program of Guangxi (No. Guike 14124004) and the Natural Science Foundation of Guangxi (No. GXNSFAA118147).

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Zhou Jie and Zeng Zhiying.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-021-84973-5.

References

  • 1. Denu RA, et al. Influence of patient, physician, and hospital characteristics on the receipt of guideline-concordant care for inflammatory breast cancer. Cancer Epidemiol. 2016;40:7–14. doi: 10.1016/j.canep.2015.11.003.
  • 2. Woolhandler S, Himmelstein DU. Administrative work consumes one-sixth of U.S. physicians' working hours and lowers their career satisfaction. Int. J. Health Serv. 2014;44(4):635–642. doi: 10.2190/HS.44.4.a.
  • 3. American Society of Clinical Oncology. The state of cancer care in America, 2016: A report by the American Society of Clinical Oncology. J. Oncol. Pract. 2016;12(4):339–383. doi: 10.1200/JOP.2015.010462.
  • 4. Yu P, Artz D, Warner J. Electronic health records (EHRs): Supporting ASCO’s vision of cancer care. Am. Soc. Clin. Oncol. Educ. Book. 2014;2014:225–231. doi: 10.14694/EdBook_AM.2014.34.225.
  • 5. Castaneda C, et al. Clinical decision support systems for improving diagnostic accuracy and achieving precision medicine. J. Clin. Bioinform. 2015;5:4. doi: 10.1186/s13336-015-0019-3.
  • 6. Musib M, et al. Artificial intelligence in research. Science. 2017;357(6346):28–30. doi: 10.1126/science.357.6346.28.
  • 7. Spangler, S. et al. Automated Hypothesis Generation Based on Mining Scientific Literature: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA, 2014, 1877–1886. https://doi.org/10.1145/2623330.2623667 (2014).
  • 8. Dayarian A, et al. Predicting protein phosphorylation from gene expression: Top methods from the IMPROVER Species Translation Challenge. Bioinformatics. 2015;31(4):462–470. doi: 10.1093/bioinformatics/btu490.
  • 9. Codella N, et al. Deep learning, sparse coding, and SVM for melanoma recognition in dermoscopy images. Mach. Learn. Med. Imaging. 2015;2015:118–126. doi: 10.1007/978-3-319-24888-2_15.
  • 10. Gulshan V, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316(22):2402–2410. doi: 10.1001/jama.2016.17216.
  • 11. Malek M, et al. A machine learning approach for distinguishing uterine sarcoma from leiomyomas based on perfusion weighted MRI parameters. Eur. J. Radiol. 2019;110:203–211. doi: 10.1016/j.ejrad.2018.11.009.
  • 12. Kawakami E, et al. Application of artificial intelligence for preoperative diagnostic and prognostic prediction in epithelial ovarian cancer based on blood biomarkers. Clin. Cancer Res. 2019;25(10):3006–3015. doi: 10.1158/1078-0432.CCR-18-3378.
  • 13. Li S, et al. A DNA nanorobot functions as a cancer therapeutic in response to a molecular trigger in vivo. Nat. Biotechnol. 2018;36(3):258–264. doi: 10.1038/nbt.4071.
  • 14. Lu HN, et al. A mathematical-descriptor of tumor-mesoscopic-structure from computed-tomography images annotates prognostic- and molecular-phenotypes of epithelial ovarian cancer. Nat. Commun. 2019;10(1):764. doi: 10.1038/s41467-019-08718-9.
  • 15. Choi YI, et al. Concordance rate between clinicians and Watson for Oncology among patients with advanced gastric cancer: Early, real-world experience in Korea. Can. J. Gastroenterol. Hepatol. 2019;2019:8072928. doi: 10.1155/2019/8072928.
  • 16. Kim EJ, et al. Early experience with Watson for oncology in Korean patients with colorectal cancer. PLoS ONE. 2019;14(3):e0213640. doi: 10.1371/journal.pone.0213640.
  • 17. Zhou N, et al. Concordance study between IBM Watson for Oncology and clinical practice for patients with cancer in China. Oncologist. 2019;24(6):812–819. doi: 10.1634/theoncologist.2018-0255.
  • 18. Hu CL, et al. The application value of Watson for oncology in patients with colon cancer. Chin. J. Front. Med. Sci. (Electronic Version) 2018;10(10):116–120. doi: 10.12037/YXQY.2018.10-27.
  • 19. Liu C, et al. Using artificial intelligence (Watson for Oncology) for treatment recommendations amongst Chinese patients with lung cancer: Feasibility study. J. Med. Internet Res. 2018;20(9):e11087. doi: 10.2196/11087.
  • 20. Somashekhar SP, et al. Watson for Oncology and breast cancer treatment recommendations: Agreement with an expert multidisciplinary tumor board. Ann. Oncol. 2018;29(2):418–423. doi: 10.1093/annonc/mdx781.
  • 21. Xu JN, Jiang YJ, Duan YY, Hua SY, Sun T. Application of Watson for Oncology on therapy in patients with breast cancer. J. Chin. Res. Hosp. 2018;3:19–24. doi: 10.19450/j.cnki.jcrh.2018.03.005.
  • 22. Lee WS, et al. Assessing concordance with Watson for Oncology, a cognitive computing decision support system for colon cancer treatment in Korea. JCO Clin. Cancer Inform. 2018;2:1–8. doi: 10.1200/CCI.17.00109.
  • 23. Somashekhar, S. P. et al. Early experience with IBM Watson for Oncology (WFO) cognitive computing system for lung and colorectal cancer treatment. In Journal of Clinical Oncology, Conference: 2017 Annual Meeting of the American Society of Clinical Oncology, ASCO. United States 35(15 Supplement 1) (2017).
  • 24. Printz C. Artificial intelligence platform for oncology could assist in treatment decisions. Cancer. 2017;123(6):905. doi: 10.1002/cncr.30655.
  • 25. Murphy EV. Clinical decision support: Effectiveness in improving quality processes and clinical outcomes and factors that may influence success. Yale J. Biol. Med. 2014;87(2):187–197.
  • 26. Keiffer MR. Utilization of clinical practice guidelines: Barriers and facilitators. Nurs. Clin. N. Am. 2015;50(2):327–345. doi: 10.1016/j.cnur.2015.03.007.
  • 27. Svenstrup D, Jørgensen HL, Winther O. Rare disease diagnosis: A review of web search, social media and large-scale datamining approaches. Rare Dis. 2015;3(1):e1083145. doi: 10.1080/21675511.2015.1083145.
  • 28. Zhou M, Zhao L, Campy KS, Wang S. Changing of China’s health policy and doctor-patient relationship: 1949–2016. Health Policy Technol. 2017;6(3):358–367. doi: 10.1016/j.hlpt.2017.05.002.
  • 29. Chan CS. Mistrust of physicians in China: Society, institution, and interaction as root causes. Dev. World Bioeth. 2018;18(1):16–25. doi: 10.1111/dewb.12162.
  • 30. Fang JM, et al. The establishment of a new medical model for tumor treatment combined with Watson for Oncology, MDT and patient involvement. J. Clin. Oncol. 2018;36(15 suppl):e18504. doi: 10.1200/JCO.2018.36.15_suppl.e18504.
  • 31. Li T, Kung HJ, Mack PC, Gandara DR. Genotyping and genomic profiling of non-small-cell lung cancer: Implications for current and future therapies. J. Clin. Oncol. 2013;31(8):1039–1049. doi: 10.1200/JCO.2012.45.3753.
  • 32. Zhou C. Lung cancer molecular epidemiology in China: Recent trends. Transl. Lung Cancer Res. 2014;3(5):270–279. doi: 10.3978/j.issn.2218-6751.2014.09.01.
  • 33. Lu S, et al. A multicenter, open-label, randomized phase II controlled study of rh-endostatin (Endostar) in combination with chemotherapy in previously untreated extensive-stage small-cell lung cancer. J. Thorac. Oncol. 2015;10(1):206–211. doi: 10.1097/JTO.0000000000000343.
  • 34. Sun Y, et al. Endostar Phase III NSCLC Study Group. Long-term results of a randomized, double-blind, and placebo-controlled phase III trial: Endostar (rh-endostatin) versus placebo in combination with vinorelbine and cisplatin in advanced non-small cell lung cancer. Thorac. Cancer. 2013;4(4):440–448. doi: 10.1111/1759-7714.12050.
  • 35. Wang J, Gu LJ, Fu CX, Cao Z, Chen QY. Endostar combined with chemotherapy compared with chemotherapy alone in the treatment of nonsmall lung carcinoma: A meta-analysis based on Chinese patients. Indian J. Cancer. 2014;51(Suppl 3):e106–e109. doi: 10.4103/0019-509X.154099.
  • 36. Grigoriu B, Berghmans T, Meert AP. Management of EGFR mutated nonsmall cell lung carcinoma patients. Eur. Respir. J. 2015;45(4):1132–1141. doi: 10.1183/09031936.00156614.
  • 37. Shi Y, et al. Icotinib versus gefitinib in previously treated advanced non-small-cell lung cancer (ICOGEN): A randomized, double-blind phase 3 non-inferiority trial. Lancet Oncol. 2013;14(10):953–961. doi: 10.1016/S1470-2045(13)70355-3.
  • 38. Zhou N, Li AQ, Liu GW, Zhang GQ, Zhang XC. Clinical application of artificial intelligence-Watson for Oncology. China Digit. Med. 2018;13(10):23–25.
  • 39. Zhou J, Fan YZ. Different methods of alimentary tract reconstruction after gastrectomy. Surg. Res. New Tech. 2015;4(4):270–277.
  • 40. Strong VE, et al. Comparison of young patients with gastric cancer in the United States and China. Ann. Surg. Oncol. 2017;24(13):3964–3971. doi: 10.1245/s10434-017-6073-2.
  • 41. Wang CF. Discussion on the comprehensive treatment and prevention of cancer. World Latest Med. Inf. 2018;18(35):180–183. doi: 10.19613/j.cnki.1671-3141.2018.35.118.
  • 42. Grothey A, et al. Regorafenib monotherapy for previously treated metastatic colorectal cancer (CORRECT): An international, multicentre, randomised, placebo-controlled, phase 3 trial. Lancet. 2013;381(9863):303–312. doi: 10.1016/S0140-6736(12)61900-X.
  • 43. Hamilton JG, et al. “A Tool, Not a Crutch”: Patient perspectives about IBM Watson for Oncology trained by Memorial Sloan Kettering. J. Oncol. Pract. 2019;15(4):e277–e288. doi: 10.1200/JOP.18.00417.
  • 44. Krittanawong C, Zhang HJ, Wang Z, Aydar M, Kitai T. Artificial intelligence in precision cardiovascular medicine. J. Am. Coll. Cardiol. 2017;69(21):2657–2664. doi: 10.1016/j.jacc.2017.03.571.

Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
