Abstract
Background
The diagnosis and prognostic assessment of bone tumors represent a complex and clinically significant challenge. In recent years, artificial intelligence (AI), particularly deep learning (DL) and classical machine learning (ML), has emerged as a promising tool in this field. This study systematically reviews the applications of AI in bone tumor diagnosis, prognosis, segmentation, and treatment response, with a focus on model performance, emerging trends, and current limitations.
Methods
This systematic review followed the PRISMA guidelines. A comprehensive search of four major databases (PubMed, Web of Science, Scopus, and Cochrane Library) was conducted to identify studies published between January 2019 and May 2025 on the application of AI in bone tumors. Relevant original articles were identified based on predefined inclusion and exclusion criteria, and data on basic study information, algorithms, models, performance metrics, and clinical tasks were systematically extracted and analyzed. The performance of DL and classical ML methods in bone tumors was then compared.
Results
The review included 70 studies involving 53,149 cases, of which 45.83% were malignant bone tumors. DL was used in 77.63% of the studies and classical ML in 22.37%. Diagnostic tasks dominated the research focus (81.94%), followed by survival prediction (11.11%) and treatment response evaluation (6.94%). Sample-size-weighted averages were higher for DL models in accuracy (0.87), AUC (0.89), sensitivity (0.84), specificity (0.88), and F-score (0.84), whereas classical ML models achieved higher weighted precision (0.90 vs. 0.81 for DL). Although DL demonstrated a performance advantage in image-based tasks, classical ML maintained greater stability in structured datasets. No significant performance differences were observed between large-sample and small-sample studies, which may reflect the robustness of both model types but should be interpreted with caution. Additionally, a recent shift in research focus was observed, from diagnostic applications toward disease prediction.
Conclusion
Artificial intelligence has demonstrated strong performance and potential in bone tumor research. DL often demonstrates more balanced performance in image-based bone tumor tasks, while classical ML remains competitive and may hold advantages in structured, small-sample datasets and precision-prioritized settings. However, we did not observe statistically significant differences, so these findings should be interpreted as performance tendencies in specific contexts rather than universally validated superiority. Future research should focus on optimizing DL and classical ML models and on developing fusion algorithms for bone tumors, which may improve generalization performance, accuracy, and the ability to adapt to complex data scenarios. At the same time, fostering interdisciplinary and multicenter collaborations between computer scientists and clinicians, improving data-sharing frameworks, and addressing ethical and privacy concerns will be essential to fully harness the significant potential of AI in bone tumor research and clinical applications.
Keywords: Artificial intelligence, Bone tumor, Deep learning, Classical machine learning, Diagnostic performance, Systematic review
Introduction
The incidence of primary bone tumors is 2–3 cases per 100,000 population, accounting for approximately 6.2% of all tumors [1]. Although the overall incidence of bone and soft tissue malignancies is relatively low, accounting for only 0.2% of all human malignancies, they account for 10% of cancers in children under 15 years of age and are the third leading cause of death in cancer patients under 20 years of age [2–5]. Primary bone tumors pose a uniquely difficult diagnostic challenge compared with many other solid tumors. Their imaging and histopathological appearances frequently overlap with a wide spectrum of benign, intermediate, and malignant mesenchymal and non-mesenchymal lesions. This morphological heterogeneity leads to substantial inter-observer variability, especially among less experienced clinicians, and often necessitates multi-modal evaluation combining radiography, MRI, clinical presentation, and biopsy [6]. Clinical treatment options for different types of bone tumors (benign, malignant, and intermediate) vary widely [7]. Misdiagnosis can lead to inappropriate treatment strategies, affecting limb salvage and patient survival. Therefore, early determination of the tumor’s benign or malignant nature and its staging is crucial for developing treatment strategies and assessing prognosis [8, 9]. Consequently, there is an urgent clinical need for objective, consistent, and high-quality automated diagnostic tools to assist in this complex decision-making process.
Artificial Intelligence (AI) is a technology with human-like problem-solving capabilities that uses data and rules to make decisions and predictions through programs and algorithms [10]. Machine Learning (ML) and its subset deep learning (DL) represent different approaches within AI. Classical ML is the traditional approach, relying heavily on hand-designed features and statistical modelling to learn patterns from data, whereas DL constructs and trains deep (multi-layer) neural networks that automatically extract and learn complex features from data [5, 11]. Currently, the application of AI in medicine is expanding from simple data analysis to clinical practice [12]. Recent studies also report that AI outperforms traditional analysis methods such as statistical analysis and multivariate analysis in many scenarios [13]. Significant progress has been made in the use of ML and DL for medical image analysis, enabling rapid and high-precision diagnoses [14, 15]. Moreover, AI can integrate digital pathology, clinical patient data, and clinical experience to accurately predict patient prognosis and survival [16]. The increasing application of optimization algorithms in biomedical data analysis reflects the comprehensive advancement of intelligent model optimization strategies within the healthcare research field [17]. However, as AI algorithms continue to advance and proliferate, the associated challenges and future research directions have become increasingly critical. These include not only ethical concerns related to AI’s interaction with human medicine but also issues surrounding the accessibility and security of large annotated patient data sets [18]. For clinical scientists, it is crucial to understand how AI can be optimally utilized in the comprehensive management of bone tumor patients. It is therefore essential to be aware of the current research landscape and the key areas for future advancement.
In this study, we aim to analyze and evaluate the application of AI algorithms in bone tumor research since 2019 through a systematic review. The review summarizes the primary model types, application scenarios, outcomes, and limitations while providing a comparative analysis of the strengths and weaknesses of classical ML and DL. Additionally, we discuss several unresolved issues currently encountered in clinical practice. We chose 2019 as the starting point of our review because it coincides with the launch of the European Union’s AI4EU initiative, which signified a major policy and technological push toward human-centric AI development across sectors, including healthcare.
Materials and methods
This systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [19].
Search Strategy
In July 2025, a comprehensive search was conducted across four major databases—MEDLINE (PubMed), Web of Science, Scopus, and the Cochrane Library—targeting literature published from January 2019 to May 2025. The systematic search employed the following keywords and/or Medical Subject Headings terms, without any filters or restrictions: (“Artificial Intelligence” OR “deep learning” OR “machine learning”) AND (“bone” OR “skeleton” OR “musculoskeletal”) AND ((malignant AND (tumor OR lesion OR neoplasm)) OR sarcoma OR malignancy). The starting year 2019 was chosen to align with the strategic launch of the EU-funded AI4EU initiative.
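The Boolean structure of the search string above can be made explicit with a short sketch. It only assembles the three concept blocks (AI terms, anatomical terms, tumor terms) into one string; the helper name `or_block` and the variable names are illustrative, and database-specific field tags or MeSH syntax are not modeled.

```python
def or_block(terms):
    """Join alternative terms for one concept with OR, wrapped in parentheses."""
    return "(" + " OR ".join(terms) + ")"

# Concept 1: AI methods. Concept 2: anatomical site (terms taken from the search string above).
ai_terms = ['"Artificial Intelligence"', '"deep learning"', '"machine learning"']
site_terms = ['"bone"', '"skeleton"', '"musculoskeletal"']

# Concept 3: tumor terms, kept verbatim because of the nested AND/OR grouping.
tumor_block = "((malignant AND (tumor OR lesion OR neoplasm)) OR sarcoma OR malignancy)"

# A record must match all three concepts, so the blocks are combined with AND.
query = " AND ".join([or_block(ai_terms), or_block(site_terms), tumor_block])
print(query)
```

Keeping each concept as its own list makes it easier to see which synonyms were searched and to adapt the string to the syntax of each database.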
Eligibility criteria
Studies meeting the following criteria were included in this review: (1) Original articles; (2) Manuscripts written in English; (3) Clinical application of AI (DL/ML, etc.). Studies were excluded if they met any of the following criteria: (1) Focus on disease-specific secondary bone tumors; (2) Soft tissue tumors; (3) Not available in full text; (4) Animal and cadaveric studies; (5) Reviews, case reports, letters, editorials, notes, congress abstracts, conference papers and unpublished research, etc.; (6) Significant lack of relevant evaluation data; (7) No clear source of patient information.
Methodology of the review
Two authors (Y.Q. and F.H.) independently reviewed the collected references. Duplicate studies were removed, and the titles and abstracts were assessed based on the inclusion and exclusion criteria outlined above. For articles where eligibility could not be determined from the title and abstract alone, a full-text review was conducted to identify studies meeting the final criteria for subsequent analysis.
Data collection
For each included study, we extracted the title, first author and country of affiliation, year of publication, data type, sample size, tumor classification (benign, malignant, intermediate), specific numbers, algorithms, models, tasks, performance metrics, outcome labels, etc. To ensure the accuracy of the final data included in the analysis, all study results were independently assessed by two reviewers, and all inconsistencies were resolved by consensus of the study team.
Statistical analysis
For continuous parameters, the corresponding range, median, mean, standard deviation (SD), and interquartile range (IQR) were calculated. For discrete parameters, the occurrence counts and corresponding percentages of each metric were recorded. Due to significant heterogeneity in evaluation metrics, imaging data source, validation strategies, and other aspects of AI, formal meta-analysis is considered inappropriate. To present a representative overview of the field and minimize the impact of small-sample outliers, performance metrics were calculated using a sample-size-weighted average formula ($\text{Weighted} = \frac{\sum_i x_i N_i}{\sum_i N_i}$, where $x_i$ represents the applied metric in study $i$ and $N_i$ represents the number of cases included in that study). This ensures that studies with larger sample sizes contribute proportionally greater weight to the composite results. For performance metrics for which only interval-based data were provided in the study, we instead used the calculation of their mean values ($\text{Mean} = \frac{x_{\min} + x_{\max}}{2}$, where $x_{\min}$ and $x_{\max}$ are the bounds of the reported interval).
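The sample-size-weighted average described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the three studies are made-up numbers, and treating an interval-only report as the midpoint of its bounds is an assumption about how the interval mean was taken.

```python
def weighted_average(values, sizes):
    """Sample-size-weighted average: sum(x_i * N_i) / sum(N_i)."""
    if len(values) != len(sizes) or not values:
        raise ValueError("values and sizes must be non-empty and of equal length")
    return sum(x * n for x, n in zip(values, sizes)) / sum(sizes)

def interval_mean(low, high):
    """Mean of a metric reported only as an interval (assumed: midpoint of the bounds)."""
    return (low + high) / 2

# Hypothetical example: three studies reporting accuracy.
# The third study reports only a range, so its midpoint is used.
accuracies = [0.90, 0.80, interval_mean(0.70, 0.80)]
case_counts = [1000, 200, 50]

weighted = weighted_average(accuracies, case_counts)
unweighted = sum(accuracies) / len(accuracies)
print(f"weighted = {weighted:.3f}, unweighted = {unweighted:.3f}")
```

In this made-up example the large study pulls the weighted value toward 0.90, while the small studies dominate the unweighted mean, which is the behavior the weighting is meant to produce.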
Results
Results of relevant literature
Based on a search of four major databases, we identified 1409 potentially relevant records. After removing 421 duplicate records, 988 records were screened based on their titles and abstracts, resulting in the exclusion of 860 studies. Full-text reviews were conducted for the remaining 128 studies, and 58 were excluded according to our predefined criteria. Ultimately, 70 studies met the inclusion criteria and were included in the final analysis (Fig. 1) [20–89]. Table 1 presents detailed information on the included studies, including publication details, sample types, sample sizes, research objectives, model types, and performance metrics. We conducted a heatmap analysis of the included literature since 2019 based on the research objectives. This analysis highlights a growing focus on disease survival prediction as a research hotspot in recent years (Fig. 2A).
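The screening counts reported above can be checked with simple arithmetic. The sketch below recomputes each stage of the flow from the reported numbers; the function and key names are illustrative only.

```python
def prisma_flow(identified, duplicates, excluded_on_screening, excluded_on_full_text):
    """Recompute the number of records remaining after each screening stage."""
    screened = identified - duplicates
    full_text_assessed = screened - excluded_on_screening
    included = full_text_assessed - excluded_on_full_text
    return {
        "screened": screened,
        "full_text_assessed": full_text_assessed,
        "included": included,
    }

# Counts as reported in the text.
flow = prisma_flow(identified=1409, duplicates=421,
                   excluded_on_screening=860, excluded_on_full_text=58)
print(flow)  # {'screened': 988, 'full_text_assessed': 128, 'included': 70}
```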
Fig. 1.
Flowchart of literature screening based on PRISMA
Table 1.
Included articles with continuous and discrete parameters and basic information
| First Author | Year | Country | No. Cases | Healthy Cases | Benign Cases | Intermediate Cases | Malignant Cases | Metastases Cases | Data analyzed | Purpose | Task | Model Type | Number of labels |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alabdulkreem | 2023 | Saudi Arabia | 200 | 100 | 0 | 0 | 100 | 0 | Radiograph | Diagnose | Classification | Deep Learning | 2 |
| Anand | 2023 | India | 1144 | 536 | 0 | 0 | 608 | 0 | Histological | Diagnose | Classification | Deep Learning, Classical Machine Learning | 3 |
| Breden | 2023 | Germany | 409 | 220 | 45 | 0 | 131 | 0 | Radiograph | Diagnose | Classification | Deep Learning | 2 |
| Chen | 2024 | China | 500 | 0 | 335 | 0 | 165 | 0 | MRI | Diagnose | Classification | Deep Learning | 2 |
| Cheng | 2023 | China | 1707 | 0 | 0 | 0 | 1707 | 0 | SEER database | Prognosis | Predicting survival | Deep Learning | Multi-class |
| Chianca | 2021 | Italy | 146 | 0 | 49 | 0 | 40 | 57 | MRI | Diagnose | Classification | Deep Learning, Classical Machine Learning | 2, 3 |
| Consalvo | 2022 | Germany | 58 | 31 | 9 | 0 | 18 | 0 | Radiograph | Diagnose | Classification | Deep Learning | 3 |
| Do | 2021 | South Korea | 1576 | 381 | 1061 | 0 | 134 | 0 | Radiograph | Diagnose | Classification | Deep Learning | 3 |
| Eweje | 2021 | USA | 1060 | 0 | 582 | 0 | 478 | 0 | MRI | Diagnose | Classification | Deep Learning, Classical Machine Learning | 2 |
| Georgeanu | 2022 | Romania | 23 | 0 | 10 | 0 | 13 | 0 | MRI | Diagnose | Classification | Deep Learning | 2 |
| Gitto | 2024 | Italy | 150 | 0 | 0 | 102 | 48 | 0 | Radiograph | Diagnose | Classification | Classical Machine Learning | 2 |
| Gitto | 2022 | Italy | 30 | 0 | 0 | 0 | 30 | 0 | MRI | Therapy | Chemotherapy response | Classical Machine Learning | 2 |
| Gitto | 2020 | Italy | 58 | 0 | 0 | 0 | 58 | 0 | MRI | Diagnose | Classification | Classical Machine Learning | 2 |
| Gitto | 2022 | Italy | 158 | 0 | 0 | 119 | 39 | 0 | MRI | Diagnose | Classification | Classical Machine Learning | 2 |
| Guo | 2024 | China | 580 | 0 | 295 | 0 | 285 | 0 | Radiograph | Diagnose | Classification | Deep Learning | 2 |
| Hasei | 2024 | Japan | 1262 | 526 | 0 | 0 | 736 | 0 | Radiograph | Diagnose | Classification | Deep Learning | 2 |
| He | 2024 | China | 249 | 0 | 0 | 81 | 168 | 0 | CT | Diagnose | Classification | Deep Learning | 2 |
| He | 2020 | China | 1356 | 0 | 679 | 317 | 360 | 0 | Radiograph | Diagnose | Classification | Deep Learning | 3 |
| He | 2019 | China | 56 | 0 | 0 | 56 | 0 | 0 | MRI | Prognosis | Prediction recurrence | Deep Learning | 2 |
| Hinterwimmer | 2024 | Germany | 809 | 0 | 523 | 69 | 217 | 0 | Radiograph | Diagnose | Classification | Deep Learning | Multi-class |
| Ho | 2019 | South Korea | 963 | 500 | 329 | 0 | 134 | 0 | Radiograph | Diagnose | Classification | Deep Learning | 3 |
| Holm | 2022 | Denmark | 4264 | 0 | 0 | 0 | 4264 | 0 | Clinical data | Prognosis | Predicts survival | Classical Machine Learning | 2 |
| Ibrahim | 2023 | The Netherlands | 2365 | 1543 | 0 | 0 | 822 | 0 | Bone scintigraphy | Diagnose | Classification | Deep Learning | 2 |
| Jiang | 2021 | China | 835 | 0 | 0 | 0 | 835 | 0 | SEER database | Prognosis | Predicts survival | Classical Machine Learning | 2 |
| Li | 2023 | China | 1430 | 345 | 555 | 77 | 108 | 0 | Radiograph | Diagnose | Classification | Deep Learning | Multi-class |
| Li | 2022 | China | 1201 | 0 | 0 | 0 | 1201 | 0 | SEER database, Clinical data | Prognosis | Prediction metastasis and survival | Deep Learning | Multi-class |
| Liu | 2022 | China | 585 | 0 | 270 | 0 | 315 | 0 | MRI | Diagnose | Classification | Deep Learning | 2 |
| Liu | 2022 | China | 643 | 0 | 392 | 93 | 158 | 0 | Radiograph | Diagnose | Classification | Deep Learning, Classical Machine Learning | 3 |
| Liu | 2021 | China | 3352 | 0 | 1882 | 0 | 0 | 1470 | 99mTc-MDP | Diagnose | Detection metastasis | Deep Learning | 2 |
| Magdy | 2023 | Egypt | 581 | 219 | 0 | 0 | 362 | 0 | Gamma camera | Diagnose | Detection metastasis | Classical Machine Learning | 2 |
| Malibari | 2022 | Saudi Arabia | 1144 | 536 | 0 | 0 | 608 | 0 | Histological | Diagnose | Classification | Deep Learning | 3 |
| Motohashi | 2023 | Japan | 435 | 172 | 0 | 0 | 0 | 263 | CT | Diagnose | Detection metastasis | Deep Learning | 2 |
| Pan | 2023 | China | 538 | 324 | 0 | 0 | 214 | 0 | Radiograph | Diagnose | Classification | Deep Learning | 3 |
| Park | 2022 | South Korea | 269 | 60 | 120 | 0 | 89 | 0 | Radiograph | Diagnose | Classification | Deep Learning | 3 |
| Saleena | 2023 | India | 37 | 0 | 0 | 0 | 37 | 0 | Histological | Therapy | Segment | Deep Learning | 2 |
| Sampath | 2024 | India | 1141 | 511 | 0 | 0 | 530 | 0 | CT | Diagnose | Classification | Deep Learning | 2 |
| Shao | 2024 | China | 333 | 0 | 0 | 197 | 136 | 0 | Radiograph | Diagnose | Classification | Deep Learning | 2 |
| Shuai | 2023 | China | 23 | 0 | 0 | 0 | 23 | 0 | CT | Diagnose | Segment | Deep Learning | 2 |
| Song | 2024 | China | 1305 | 0 | 750 | 228 | 325 | 0 | Radiograph, CT, MRI | Diagnose | Classification | Deep Learning | 3 |
| Tao | 2021 | China | 458 | 0 | 206 | 96 | 156 | 0 | Histological | Diagnose | Classification | Deep Learning | 2, 3 |
| Vezakis | 2023 | Greece | 1144 | 536 | 0 | 0 | 608 | 0 | Histological | Diagnose | Classification | Deep Learning | 3 |
| Vijayaraj | 2024 | India | 220 | 0 | 110 | 0 | 110 | 0 | MRI | Diagnose | Classification | Deep Learning | 3 |
| Schacky | 2021 | Germany | 1045 | 0 | 743 | 0 | 302 | 0 | Radiograph | Diagnose | Segment, Classification | Deep Learning | 2 |
| Wang | 2023 | China | 81 | 0 | 0 | 0 | 36 | 45 | CE-MRI | Diagnose | Segment | Deep Learning | 2 |
| Wang | 2022 | China | 204 | 0 | 0 | 0 | 204 | 0 | MRI | Diagnose | Segment | Deep Learning | 2 |
| Wang | 2024 | China | 170 | 0 | 0 | 0 | 170 | 0 | MRI | Diagnose | Segment, Classification | Deep Learning, Classical Machine Learning | 2 |
| Wu | 2022 | China | 240 | 0 | 0 | 0 | 240 | 0 | MRI | Diagnose | Segment | Deep Learning | 2 |
| Wu | 2023 | China | 2164 | 0 | 0 | 0 | 2164 | 0 | Histological | Diagnose | Segment | Deep Learning | 2 |
| Xie | 2024 | China | 878 | 0 | 0 | 0 | 878 | 0 | Radiograph | Diagnose | Classification | Deep Learning | Multi-class |
| Xu | 2024 | China | 2675 | 1988 | 81 | 159 | 447 | 0 | Radiograph | Diagnose | Classification | Deep Learning | 2 |
| Xu | 2022 | China | 119 | 0 | 0 | 0 | 119 | 0 | Radiograph | Therapy | Detection necrosis | Deep Learning | 2 |
| Ye | 2024 | China | 749 | 0 | 207 | 66 | 125 | 0 | MRI | Diagnose | Segment, Classification | Deep Learning, Classical Machine Learning | 2, 3 |
| Potter | 2023 | USA | 84 | 0 | 24 | 0 | 60 | 0 | CT | Diagnose | Segment, Classification | Deep Learning | 2 |
| Zhan | 2023 | China | 89 | 0 | 0 | 0 | 89 | 0 | MRI | Diagnose | Segment | Deep Learning | 2 |
| Zheng | 2024 | China | 106 | 0 | 0 | 0 | 106 | 0 | MRI | Therapy | Chemotherapy response | Deep Learning | 2 |
| Zhong | 2022 | China | 144 | 0 | 0 | 0 | 144 | 0 | MRI | Therapy | Chemotherapy response | Classical Machine Learning | 2 |
| Şimşek | 2024 | Turkey | 1173 | 0 | 1173 | 0 | 0 | 0 | Radiograph | Diagnose | Classification | Deep Learning | 2 |
| Dalai | 2024 | India | 200 | 100 | 0 | 0 | 100 | 0 | Radiograph | Diagnose | Classification | Deep Learning | 2 |
| Deng | 2024 | China | 604 | 0 | 0 | 0 | 604 | 0 | Clinical data | Prognosis | Predicts 1-year risk of reoperation | Classical Machine Learning | 2 |
| Deng | 2024 | China | 150 | 75 | 25 | 0 | 50 | 0 | Radiograph | Diagnose | Classification | Deep Learning | 2 |
| Gassert | 2025 | USA | 344 | 0 | 124 | 0 | 220 | 0 | CT | Diagnose | Classification | Deep Learning | 2 |
| Hasei | 2024 | Japan | 846 | 378 | 0 | 0 | 468 | 0 | Radiograph | Diagnose | Classification | Deep Learning | 2 |
| Hinterwimmer | 2024 | Germany | 804 | 0 | 479 | 117 | 208 | 0 | Radiograph | Diagnose | Classification | Deep Learning | Multi-class |
| Hong | 2025 | China | 112 | 0 | 59 | 53 | 0 | 0 | CT | Diagnose | Classification | Classical Machine Learning | 2 |
| Long | 2024 | China | 76 | 0 | 0 | 0 | 76 | 0 | CT | Diagnose | Classification | Classical Machine Learning | 2 |
| Nie | 2024 | China | 211 | 0 | 0 | 123 | 88 | 0 | CT, Clinical data | Diagnose, Prognosis | Classification, Prediction survival | Deep Learning | 2 |
| Rao | 2024 | India | 1144 | 0 | 536 | 0 | 608 | 0 | Histological | Diagnose | Classification | Deep Learning | 2, Multi-class |
| Shouman | 2024 | Egypt | 178 | 0 | 46 | 0 | 132 | 0 | CT | Diagnose, Prognosis | Classification | Deep Learning | 2, 3 |
| Wang | 2025 | China | 16 | 8 | 0 | 0 | 8 | 0 | Histological | Diagnose | Classification | Deep Learning | 2 |
| Yao | 2025 | China | 3746 | 1879 | 1525 | 0 | 342 | 0 | Radiograph, Clinical data | Diagnose | Classification | Deep Learning | 3 |
SEER: Surveillance, Epidemiology, and End Results
Fig. 2.
Overview of AI applications in bone tumors from 2019 to 2025. (A) Heatmap of research tasks across the years, showing a recent rise in prediction-oriented studies. (B) Distribution of imaging and non-imaging modalities, with radiograph and MRI being the most frequently used data types. (C) Frequency of task categories, with classification being the main focus of AI applications. (D) Application metrics reported across studies
Basic characteristics of the included literature
Tables 2 and 3 show the specific information of the continuous and discrete parameters, respectively. Our systematic review includes 70 original studies published between 2019 and May 2025, covering a total of 53,149 cases, of which 24,358 (45.83%) involve malignant bone tumor samples. The studies originated from 14 countries: China, India, Italy, Germany, South Korea, the USA, Japan, Saudi Arabia, Egypt, Greece, Romania, Denmark, The Netherlands, and Turkey. All studies were retrospective. The most commonly analyzed data types were radiographs and MRI, which accounted for 35.14% and 25.68%, respectively, followed by CT (14.86%), histological data (10.81%), clinical data (9.46%), and specialized imaging (4.05%) (Fig. 2B). The purposes of the included studies fell into three groups: disease diagnosis (81.94%), survival prediction (11.11%), and treatment response evaluation (6.94%). Specific research tasks included disease classification (66.67%), image segmentation (14.67%), survival prediction (9.33%), detection (5.33%), and chemotherapy response analysis (4.00%) (Fig. 2C). DL was used in 77.63% of the literature, and classical ML was used in 22.37%. Accuracy, sensitivity, AUC, specificity, precision, and F-score were the six most frequently used metrics, accounting for 20.21%, 19.52%, 15.07%, 13.01%, 10.62%, and 10.62% of reported metrics, respectively (Fig. 2D).
Table 2.
Continuous parameters with interval, median, IQR, mean, and standard deviation
| Parameter | Interval | Median | IQR | Mean | SD |
|---|---|---|---|---|---|
| Year of publication | [2019; 2025] | 2023 | 2.00 | 2022.89 | 1.38 |
| Number of patients/cases | [16; 4264] | 458 | 992.50 | 753.11 | 875.25 |
| Healthy | [0; 1988] | 0 | 87.50 | 154.48 | 387.06 |
| Benign | [0; 1882] | 0 | 206.50 | 186.25 | 370.29 |
| Intermediate | [0; 317] | 0 | 0.00 | 27.51 | 61.14 |
| Malignant | [0; 4264] | 144 | 302.00 | 343.07 | 606.36 |
| Metastases | [0; 1470] | 0 | 0.00 | 30.38 | 180.22 |
IQR: interquartile range; SD: standard deviation
Table 3.
Discrete parameters with incidence and percentage share per entity
| Parameter | Entity | Σ | % |
|---|---|---|---|
| Country | | | |
| | China | 36 | 51.43% |
| | India | 6 | 8.57% |
| | Italy | 5 | 7.14% |
| | Germany | 5 | 7.14% |
| | South Korea | 3 | 4.29% |
| | USA | 3 | 4.29% |
| | Japan | 3 | 4.29% |
| | Saudi Arabia | 2 | 2.86% |
| | Egypt | 2 | 2.86% |
| | Greece | 1 | 1.43% |
| | Romania | 1 | 1.43% |
| | Denmark | 1 | 1.43% |
| | The Netherlands | 1 | 1.43% |
| | Turkey | 1 | 1.43% |
| Study design | | | |
| | Retrospective | 70 | 100.00% |
| | Prospective | 0 | 0.00% |
| Purpose | | | |
| | Diagnose | 59 | 81.94% |
| | Prognosis | 8 | 11.11% |
| | Therapy | 5 | 6.94% |
| Task | | | |
| | Classification | 50 | 66.67% |
| | Segmentation | 11 | 14.67% |
| | Prediction | 7 | 9.33% |
| | Detection | 4 | 5.33% |
| | Response | 3 | 4.00% |
| Model type | | | |
| | Deep Learning | 59 | 77.63% |
| | Classical Machine Learning | 17 | 22.37% |
| Outcome label | | | |
| | Binary classification | 51 | 68.00% |
| | Triple classification | 17 | 22.67% |
| | Multi-class | 7 | 9.33% |
| Imaging modality | | | |
| | Radiograph | 26 | 35.14% |
| | MRI | 19 | 25.68% |
| | CT | 11 | 14.86% |
| | Histological | 8 | 10.81% |
| | Clinical data | 7 | 9.46% |
| | Specialized Imaging | 3 | 4.05% |
| Applied metric | | | |
| | Accuracy | 59 | 20.21% |
| | Sensitivity | 57 | 19.52% |
| | AUC | 44 | 15.07% |
| | Specificity | 38 | 13.01% |
| | Precision | 31 | 10.62% |
| | F-score | 31 | 10.62% |
| | Dice score | 10 | 3.42% |
| | IoU | 10 | 3.42% |
| | Kappa | 3 | 1.03% |
| | C-index | 2 | 0.68% |
| | Brier score | 2 | 0.68% |
| | auPRC | 1 | 0.34% |
| | MCC | 1 | 0.34% |
| | Jaccard Index | 1 | 0.34% |
| | IBS | 1 | 0.34% |
| | PPV | 1 | 0.34% |
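The percentages in Table 3 are shares of the entries within each parameter, not shares of the 70 studies. For model type, for instance, the counts sum to 76 because several studies in Table 1 report both DL and classical ML models. The sketch below reproduces two of the reported percentage columns from the counts; the helper name `shares` is illustrative.

```python
def shares(counts):
    """Percentage share of each entity, relative to the sum of all entries."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}

# Counts taken from Table 3.
model_type = {"Deep Learning": 59, "Classical Machine Learning": 17}
purpose = {"Diagnose": 59, "Prognosis": 8, "Therapy": 5}

print(sum(model_type.values()), shares(model_type))
print(sum(purpose.values()), shares(purpose))
```

Reading the percentages this way explains why, for example, 59 DL entries correspond to 77.63% rather than 59/70 = 84.29%.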
Results of classical machine learning in bone tumors
Seventeen studies utilized classical ML algorithms for case analysis. The mean accuracy was 0.81 with an SD of 0.10, suggesting generally high predictive accuracy across studies, although some models still exhibited lower performance (minimum value: 0.59). The mean AUC was 0.86 (SD: 0.10), indicating robust overall discriminatory power. Sensitivity averaged 0.81 (SD: 0.11), reflecting reasonably consistent performance in identifying positive cases. Specificity had a mean of 0.83 (SD: 0.09), demonstrating a reliable ability to exclude negative samples. Precision maintained a high mean value of 0.89 (SD: 0.07), showing stable and accurate positive prediction performance across datasets. Additionally, the F-score, the harmonic mean of precision and sensitivity, had a mean of 0.77 with a broader SD of 0.13, suggesting greater variability and occasional trade-offs between recall and precision (Table 4).
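For readers less familiar with these metrics, the sketch below shows how accuracy, sensitivity, specificity, precision, and F-score follow from a binary confusion matrix. It is illustrative only: the counts are made up, the F-score is assumed to be the F1 variant (equal weight on precision and sensitivity), and the included studies may have computed their metrics differently, for example per class or per image.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one positive case,
    one negative case, and one positive prediction).
    """
    sensitivity = tp / (tp + fn)          # recall: share of true positives found
    specificity = tn / (tn + fp)          # share of true negatives correctly excluded
    precision = tp / (tp + fp)            # share of positive predictions that are correct
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F1: harmonic mean of precision and sensitivity.
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
    }

# Hypothetical test set: 100 malignant and 100 non-malignant cases.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model with high precision but low sensitivity still receives a modest F-score.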
Table 4.
Performance metrics of classical machine learning and deep learning models with interval, median, IQR, mean, and standard deviation
| Model type | Parameter | Interval | Median | IQR | Mean | SD |
|---|---|---|---|---|---|---|
| Classical machine learning | Accuracy | [0.59; 0.99] | 0.80 | 0.12 | 0.81 | 0.10 |
| | AUC | [0.63; 0.96] | 0.89 | 0.13 | 0.86 | 0.10 |
| | Sensitivity | [0.59; 0.99] | 0.84 | 0.15 | 0.81 | 0.11 |
| | Specificity | [0.67; 1.00] | 0.86 | 0.12 | 0.83 | 0.09 |
| | Precision | [0.78; 0.97] | 0.91 | 0.07 | 0.89 | 0.07 |
| | F-score | [0.53; 0.93] | 0.79 | 0.16 | 0.77 | 0.13 |
| Deep learning | Accuracy | [0.56; 1.00] | 0.89 | 0.15 | 0.86 | 0.11 |
| | AUC | [0.62; 0.99] | 0.88 | 0.11 | 0.89 | 0.08 |
| | Sensitivity | [0.34; 1.00] | 0.88 | 0.17 | 0.85 | 0.13 |
| | Specificity | [0.32; 1.00] | 0.91 | 0.12 | 0.86 | 0.15 |
| | Precision | [0.64; 1.00] | 0.91 | 0.13 | 0.89 | 0.10 |
| | F-score | [0.48; 1.00] | 0.87 | 0.17 | 0.85 | 0.13 |
IQR: interquartile range, SD: standard deviation
Results of deep learning in bone tumors
A total of 59 studies employed DL algorithms for analysis, showing consistently high performance across key evaluation metrics. The mean accuracy was 0.86 with an SD of 0.11, demonstrating the overall reliability of DL models, though individual model performance ranged as low as 0.56. The mean AUC reached 0.89 (SD: 0.08), reflecting excellent discriminatory capacity across different datasets. Sensitivity averaged 0.85 (SD: 0.13), indicating the models’ solid ability to identify true positive cases, albeit with some inter-study variability. Specificity had a mean of 0.86 and a wider SD of 0.15, suggesting greater fluctuation in identifying negative cases. Precision remained strong with a mean of 0.89 (SD: 0.10), while the F-score, which integrates both precision and sensitivity, had a mean of 0.85 and an SD of 0.13, highlighting a well-balanced overall performance despite moderate variation across studies (Table 4).
Impact of AI development timeline and sample size on model performance
Figure 3A shows the temporal trend in model performance from 2020 to 2024, revealing overall improvements in most key metrics (2025 is not included because the year is incomplete). Accuracy, AUC, specificity, and precision exhibited a continuous upward trend over the years. These improvements may be linked to concrete algorithmic advances. In particular, the increasing adoption of Vision Transformers (ViT), Swin Transformer architectures, and attention-based feature extractors has enhanced the ability of models to capture long-range dependencies in radiographs and MRI. At the same time, self-supervised pretraining, contrastive learning, and domain-specific foundation models (e.g., RadImageNet) have significantly improved feature robustness in limited or heterogeneous datasets. These methodological developments, together with optimization strategies such as stronger data augmentation pipelines, automated hyperparameter search, and ensemble training, likely contributed to the upward trend in accuracy, AUC, and specificity observed in recent years. Notably, AUC and specificity consistently remained above 0.85 after 2021, indicating enhanced discriminatory ability and stable negative case identification. Sensitivity and F-score, however, showed more fluctuation, suggesting variability in positive case detection and trade-off balance across models. Figure 3B presents the distribution of performance metrics stratified by sample size (large sample: ≥1000 cases; small sample: <1000 cases). Despite the variation in dataset scale, no statistically significant difference was observed for any metric (p > 0.05). The overlapping distributions in accuracy, AUC, sensitivity, specificity, precision, and F-score suggest that sample size did not have a decisive impact on model performance in the current dataset. This result must nevertheless be interpreted with caution. One possible explanation lies in the enhanced robustness of contemporary models. An equally plausible explanation is a “pseudo-large-sample” effect, in which a large number of slices, tiles, or patches are generated from a small patient population, so that the reported sample size overstates the number of independent patients.
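The text does not state which statistical test produced the reported p-values. As a stand-in, the sketch below compares two groups of study-level metric values with a two-sided permutation test on the difference in means, using only the Python standard library. The test choice, the function name, and the metric values are all assumptions made for illustration, not the authors' procedure or data.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the absolute difference in group means."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)              # random reassignment of group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical accuracies for large-sample (>= 1000) and small-sample (< 1000) studies.
large_sample = [0.88, 0.91, 0.84, 0.90, 0.86]
small_sample = [0.85, 0.93, 0.80, 0.89, 0.87, 0.82]

p = permutation_p_value(large_sample, small_sample)
print(f"p = {p:.3f}")
```

A non-significant result from such a test does not demonstrate that the groups perform equally, particularly when the groups are small or when the reported sample sizes mix patients with patient-derived units.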
Fig. 3.
Comparative performance analysis of AI models in bone tumor studies. (A) Line plot of performance metrics over time (2020–2024), showing general improvements in accuracy, AUC, and specificity in recent years. (B) Metric distribution in relation to sample size, categorized by large (≥ 1000) and small (< 1000) sample groups. There was no statistical difference between the two groups. “Sample size” values were taken as reported by the original studies and may represent either unique patients or patient-derived units (e.g., slices/patches) depending on reporting practices. (C) Scatterplot of various application metrics for deep learning and classical machine learning algorithms. There was no statistical difference between the two groups. (D) Radar plot of the weighted averages of the performance metrics for both algorithm types. The weighted accuracy, AUC, sensitivity, specificity, precision, and F-score for deep learning are 0.87, 0.89, 0.84, 0.88, 0.81, and 0.84, respectively. The weighted accuracy, AUC, sensitivity, specificity, precision, and F-score for classical machine learning are 0.83, 0.75, 0.83, 0.83, 0.90, and 0.75, respectively
Comparison of deep learning and classical machine learning
To compare model performance across algorithm types, we analyzed the six most frequently reported evaluation metrics (accuracy, AUC, sensitivity, specificity, precision, and F-score) for DL and classical ML models. As shown in Fig. 3C, DL models exhibited generally higher metric values across all parameters, though the differences between the two groups were not statistically significant (p > 0.05). The variability was more pronounced in the ML group, particularly for accuracy and sensitivity, suggesting less consistent performance.
Figure 3D summarizes the weighted metric values for DL and ML in a radar plot. The weighted performance values for the DL models were accuracy 0.87, AUC 0.89, sensitivity 0.84, specificity 0.88, precision 0.81, and F-score 0.84. The corresponding values for the classical ML group were accuracy 0.83, AUC 0.75, sensitivity 0.83, specificity 0.83, precision 0.90, and F-score 0.75. These results visualize the performance trade-off between the two algorithm types. Overall, DL models achieved higher weighted values on most metrics, particularly AUC and specificity; high specificity is critical for reducing false positives and ensuring diagnostic reliability. Meanwhile, ML models exhibited the highest precision (0.90), indicating strong positive predictive capability, but showed reduced consistency in global performance, as reflected by the lower F-score. These findings suggest that while DL offers more balanced and robust performance in complex tasks such as medical image interpretation, ML may still hold an advantage in structured-data scenarios or when model interpretability and precision are prioritized.
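The weighted values reported for Fig. 3D can be read as weighted means of per-study metrics. The following is a minimal Python sketch of such a calculation, assuming that each study is weighted by its reported sample size; that weighting scheme is an assumption made for illustration, and the study values below are hypothetical rather than data from the included studies.

```python
from typing import Optional, Sequence


def weighted_mean(values: Sequence[Optional[float]], weights: Sequence[float]) -> float:
    """Weighted mean that skips studies which did not report the metric (None)."""
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("no study reported this metric")
    return sum(v * w for v, w in pairs) / total_weight


# Hypothetical per-study AUC values and sample sizes (illustrative only).
auc_values = [0.90, 0.80, None, 0.70]   # None: study did not report AUC
sample_sizes = [1000, 500, 300, 500]

weighted_auc = weighted_mean(auc_values, sample_sizes)
reported = [v for v in auc_values if v is not None]
unweighted_auc = sum(reported) / len(reported)

print(f"weighted AUC = {weighted_auc:.3f}, unweighted AUC = {unweighted_auc:.3f}")
```

Under this kind of weighting, large studies dominate the pooled value. If a reported sample size counts slices or patches rather than patients, the corresponding study receives more weight than its patient cohort would justify.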
Discussion
The complexity of bone tumor diagnosis and treatment, together with the diversity of patient data, is one of the main motivations for applying AI in this field. Emerging trends in AI applications for bone tumors reflect both advances in computational techniques and shifts in clinical demand. Our analysis shows a clear transition from diagnostic tasks toward survival prediction and treatment-related decision support [90]. This shift is likely driven by both deepening clinical needs and continued methodological advances. As image-based diagnostic AI has matured, research has moved beyond simple tumor classification toward clinically actionable objectives, including prognosis stratification, recurrence/metastasis risk estimation, survival prediction, and treatment response assessment, reflecting clinicians’ growing demand for decision support across the full care pathway [91].
Our analysis indicates that DL dominates bone tumor research, being applied in 77.63% of studies. This is primarily attributed to DL’s suitability for processing high-dimensional image data [92, 93]. Classical ML methods, on the other hand, excel at handling structured data, particularly tabular datasets with fixed feature dimensions and clear semantics (e.g., patient clinical information), which makes them particularly advantageous in tasks such as prognostic analysis [94]. Moreover, 90.54% of the included studies were based on image processing, which plausibly explains the wider use of DL in this field: most tumor-related studies depend on processing large amounts of image information.
Temporal analysis (Fig. 3A) indicates a steady performance improvement from 2020 to 2024. This trajectory aligns with broader advances in AI, including the rise of Vision Transformers, Swin Transformer architectures, and foundation models, which have enhanced computer vision capabilities [95, 96]. Sample size analysis (Fig. 3B) showed no statistically significant differences in model performance between large-sample and small-sample studies for any metric (p > 0.05). This observation suggests that contemporary AI models, especially DL frameworks, may have achieved sufficient robustness to maintain stable performance even in data-limited environments. Nevertheless, the slightly greater dispersion of performance metrics in small-sample studies underscores the continued need for standardization in data preprocessing, annotation quality, and model validation practices to ensure generalizability.
A more important factor, however, may be the reporting of “pseudo-large samples.” Many included studies list substantial numbers of samples that may originate from a very small number of patients, for example multiple CT or MRI slices from the same case, different imaging sequences, or segmented pathological blocks extracted from a limited number of slides. Conversely, some studies with small sample sizes also expanded their sample sizes through various methods. This “pseudo-large-sample” phenomenon may inflate reported accuracy while obscuring the true performance of DL models trained on small patient cohorts [97]. It not only blurs the definition of sample size but also introduces the risk of “data leakage”: when studies report only the total image count and omit the number of unique patients, slices or patches from the same individual may inadvertently appear in both the training and validation sets. Furthermore, given that classical ML exhibited superior precision (0.90) and stability in our analysis, we hypothesize that, after adjustment for the actual number of patients, the advantage of ML in “true small data” regimes would be even more distinct, whereas the perceived robustness of DL in small patient cohorts might need to be re-evaluated.
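The leakage risk described above is usually avoided by splitting data at the patient level rather than the image level. The sketch below illustrates the idea in plain Python; the function name `patient_level_split`, the patient identifiers, and the toy dataset are hypothetical and are not taken from any included study.

```python
import random
from typing import List, Sequence, Tuple


def patient_level_split(
    patient_ids: Sequence[str], val_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, val_idx) such that no patient appears in both sets."""
    unique_patients = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_patients)
    n_val = max(1, round(len(unique_patients) * val_fraction))
    val_patients = set(unique_patients[:n_val])
    train_idx = [i for i, p in enumerate(patient_ids) if p not in val_patients]
    val_idx = [i for i, p in enumerate(patient_ids) if p in val_patients]
    return train_idx, val_idx


# Hypothetical dataset: 5 patients, each contributing 4 image slices (20 images).
slice_patient_ids = [f"P{p}" for p in range(5) for _ in range(4)]
train_idx, val_idx = patient_level_split(slice_patient_ids, val_fraction=0.2, seed=42)

train_patients = {slice_patient_ids[i] for i in train_idx}
val_patients = {slice_patient_ids[i] for i in val_idx}
n_patients = len(set(slice_patient_ids))  # number of unique patients
n_images = len(slice_patient_ids)         # number of images/slices

print(f"patients = {n_patients}, images = {n_images}")
print(f"patients shared between splits: {len(train_patients & val_patients)}")
```

A random split over the 20 images instead would very likely place slices from the same patient on both sides of the split. Libraries such as scikit-learn offer grouped splitters (for example `GroupKFold` and `GroupShuffleSplit`) that serve the same purpose.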
Analysis of the continuous parameters in Table 4 reveals nuanced differences in the stability and variability of model performance between classical ML and DL algorithms. For DL models, accuracy ranged from 0.56 to 1.00 and specificity from 0.32 to 1.00, indicating that although performance was often high (mean accuracy = 0.86, mean specificity = 0.86), some models struggled considerably on specific datasets. Sensitivity also had a low minimum (0.34), suggesting a risk of under-detection in certain clinical scenarios, though its mean (0.85) and median (0.88) remained high. In contrast, classical ML models showed narrower ranges and higher minimums: accuracy (0.59–0.99), AUC (0.63–0.96), and sensitivity (0.59–0.99). These results indicate more consistent lower-bound performance but potentially less upward flexibility than DL. Notably, precision in the ML group was both high and stable (range: 0.78–0.97, mean: 0.89, IQR: 0.07), reflecting strong reliability in positive case identification. The F-score showed slightly more dispersion in the ML group (IQR = 0.16, SD = 0.13), indicating a variable balance between precision and recall across studies. Importantly, for both algorithm types, the medians were close to the means for all six metrics, and the IQRs were generally small, particularly for precision, suggesting that the performance distributions were approximately symmetric and concentrated. These findings suggest that both DL and ML models achieve relatively stable results in most studies, though DL models exhibit greater performance variability, with a broader range that may reflect their application to more diverse data types and complexity levels.
Based on the scatter plot distribution (Fig. 3C), we observed that although DL models tended to cluster at higher performance levels with relatively low variability, the differences between the DL and classical ML groups did not reach statistical significance (p > 0.05). This suggests that under certain conditions, the practical performance of the two approaches can be comparable. Radar chart analysis (Fig. 3D) revealed that DL models achieved higher weighted averages on most key performance metrics, particularly AUC (0.89 vs. 0.75), specificity (0.88 vs. 0.83), and F-score (0.84 vs. 0.75). However, ML models performed better in precision (0.90 vs. 0.81), meaning that a larger share of their positive predictions were correct. These findings suggest that DL offers more balanced and consistent performance across a broader range of metrics, while ML shows a more polarized pattern, excelling in precision but lagging in AUC and F-score. In summary, these results suggest that DL models are better suited to complex image-based or high-dimensional biomedical tasks because of their large parameter space and nonlinear feature extraction capabilities. In contrast, classical ML models, despite their simplicity, remain highly competitive when applied to clean, well-structured datasets, especially in diagnostic tasks where high accuracy is crucial. In addition to multicenter collaborations that expand patient samples, the development of ML algorithms designed for small samples could be useful for diseases such as bone tumors, which have a low incidence and prevalence [98, 99].
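To make the relationship between precision, sensitivity, and F-score concrete, the sketch below computes the standard binary classification metrics from confusion-matrix counts. The counts are hypothetical and chosen only to show how a conservative classifier can combine high precision with a much lower F-score; they do not come from the included studies.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Binary classification metrics from confusion-matrix counts.

    Assumes all denominators are non-zero (no guard for empty classes).
    """
    precision = tp / (tp + fp)    # share of positive predictions that are correct
    sensitivity = tp / (tp + fn)  # recall: share of actual positives that are found
    specificity = tn / (tn + fp)  # share of actual negatives that are found
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
    }


# Hypothetical counts for a conservative classifier (illustrative only):
# it rarely raises a false alarm but misses many true positives.
metrics = classification_metrics(tp=45, fp=5, tn=95, fn=55)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because the F-score is the harmonic mean of precision and sensitivity, it is pulled toward the lower of the two. Note also that pooled averages of per-study metrics need not satisfy this identity, since each study contributes its own precision, sensitivity, and F-score.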
DL models showed high volatility in sensitivity (mean 0.85, SD 0.13), suggesting that the ability to recognize positive samples still needs to be strengthened during model development to reduce the rate of missed diagnoses. However, DL performed stably in terms of accuracy (0.86), AUC (0.89), and precision (0.89), suggesting that it could effectively reduce clinicians’ workload and improve diagnostic efficiency and consistency in the auxiliary diagnosis of bone tumors. While DL has a slight advantage in sensitivity (0.85 vs. 0.81) and specificity (0.86 vs. 0.83), classical ML exhibits lower standard deviations on several metrics, suggesting that its results are more stable. This stability makes classical ML more applicable to small and medium sample sizes, which is of practical value when data resources are limited.
Despite high reported performance, clinical translation remains limited. The gap between research findings and clinical application warrants attention and reflection from both researchers and clinicians. We attribute this gap to several factors. First, data heterogeneity and transparency issues persist: only 60% of studies specified tumor types, and inconsistent labeling across institutions hinders model generalizability. Second, the “black box” nature of DL models creates a trust gap. For high-stakes decisions, clinicians require interpretable predictions, yet many models lack transparency. Third, data privacy and ethical concerns impede the sharing of high-quality pathological data, particularly under strict regulations such as the General Data Protection Regulation (GDPR) [100–102]. Together, these factors have led to a disconnect in which computer scientists focus on algorithmic optimization without fully addressing clinical needs such as interpretability and prospective validation [5, 103].
To address the complex clinical reality of bone tumors, future research should prioritize hybrid model architectures, not merely for technical novelty but to satisfy the unmet need for multimodal prognostic stratification [104, 105]. While pure DL models excel at image perception and pure ML models lead in precision on structured data, neither can independently replicate the clinician’s workflow of synthesizing imaging phenotypes with the patient’s clinical context. A hybrid approach is well positioned to bridge this gap. Its added computational complexity is justified because it enables the integration of high-dimensional imaging data with low-dimensional clinical variables, thereby facilitating the transition from simple diagnostic classification to comprehensive survival prediction and treatment planning. For example, convolutional neural network (CNN) or transformer-based encoders can extract high-level imaging representations that are subsequently fed into logistic regression, support vector machine, or random forest classifiers. Feature selection strategies based on optimization have also shown significant potential in cancer classification tasks [106]. Similarly, attention mechanisms can be coupled with gradient boosting models to improve feature weighting, and ensemble frameworks can merge deep feature embeddings with traditional ML decision layers to enhance stability and interpretability. Although such strategies have been successfully applied in various diseases, their integration in bone tumor research remains insufficient, leaving broad scope for future exploration [107–110].
However, fully unlocking the potential of multimodal and hybrid systems requires high-quality, diverse, and consistently annotated datasets. Robust multimodal fusion must overcome practical obstacles, including heterogeneous data formats, missing modalities, and variations in imaging protocols across centers. To address these challenges while ensuring ethical compliance, future efforts should promote multicenter collaboration and adopt privacy-preserving techniques such as federated learning. To address the “pseudo-large-sample” phenomenon, future medical imaging AI research should adopt a “dual-reporting standard.” Researchers should explicitly distinguish and report both the patient count (Np), to assess biological diversity and generalizability, and the computational sample size (Ni) (e.g., number of images, slices, or patches), to assess computational load and overfitting risk. This distinction is critical for systematic reviews to identify performance overestimation caused by inconsistent sample size definitions. Ultimately, the focus should shift from diagnostic accuracy alone to integrated clinical decision support systems that unify diagnosis, prognosis, and treatment planning. Embedding interpretable AI modules, such as SHAP, Grad-CAM, and attention-based heatmaps, within these systems will further enhance clinicians’ trust and accelerate the practical clinical application of AI in bone oncology.
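The hybrid design discussed above, in which deep image representations are passed to a classical classifier together with clinical variables, can be sketched as a simple late-fusion pipeline. The example below is a minimal illustration in plain Python and not a method from any included study: the two-dimensional "embeddings" are hypothetical stand-ins for the output of a CNN or transformer encoder, the clinical variables and labels are invented, and the logistic regression is fitted with a basic gradient-descent loop.

```python
import math
from typing import List, Sequence, Tuple


def fuse(image_embedding: Sequence[float], clinical: Sequence[float]) -> List[float]:
    """Late fusion: concatenate an image embedding with clinical variables."""
    return list(image_embedding) + list(clinical)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical inputs (illustrative only): embeddings standing in for encoder
# outputs, plus two scaled clinical variables per patient.
embeddings = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8]]
clinical = [[0.7, 0.6], [0.6, 0.8], [0.3, 0.2], [0.2, 0.1]]
labels = [1, 1, 0, 0]  # e.g., 1 = malignant, 0 = benign

X = [fuse(e, c) for e, c in zip(embeddings, clinical)]
w, b = fit_logistic_regression(X, labels)
predictions = [int(predict_proba(w, b, x) >= 0.5) for x in X]
print("fused feature length:", len(X[0]))
print("predictions:", predictions)
```

One reason to keep the decision layer as a linear model is that its coefficients can be inspected directly, which separates the contribution of imaging features from that of clinical variables. In practice the embeddings would come from a trained encoder, and evaluation would use held-out patients rather than the training data shown here.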
Although this systematic review included a large body of literature, it has several limitations. (i) Although we incorporated literature published up to May 2025, the rapid evolution of AI, especially the emergence of large language models (LLMs), transformers, and multimodal systems, means that the included publications may not fully capture the most recent methodological advances. (ii) This review focused exclusively on primary bone tumors and excluded secondary bone tumors. While metastatic bone disease constitutes a substantial clinical burden, secondary tumors arise from biologically heterogeneous primary cancers and exhibit diverse imaging phenotypes. Including them would introduce major variability in disease behavior, diagnostic pathways, and data distribution, thereby increasing noise and reducing comparability among AI models. We propose that future work address secondary bone metastases in separate, dedicated systematic reviews. (iii) Although the “black box” nature of AI models was discussed, our review could not provide a quantitative assessment of model interpretability because most included studies did not report explainability metrics. Nonetheless, methods such as SHAP, LIME, Grad-CAM, and attention-based visualization tools represent valuable strategies for enhancing transparency and clinician trust. (iv) Because the numbers of ML and DL studies were uneven and “sample size” reporting in the original literature was inconsistent (some studies failed to distinguish between patient counts and image counts), our stratified analysis cannot entirely eliminate the confounding effects introduced by “pseudo-large samples.” The weighting algorithm we used does not necessarily reflect the true difference between the two approaches; it only reduces the impact of data-volume bias as far as possible. We also performed only a simple inductive analysis, which could influence our assessment of model performance across different data scales.
Conclusion
This systematic review highlights the potential of AI algorithms in the diagnosis, detection, survival prediction, image segmentation, and chemotherapy response analysis of bone tumors. Although statistical analysis indicates comparable overall performance between the two approaches, with both classical ML and DL methods demonstrating high accuracy across multiple tasks, DL is particularly well-suited for processing complex, high-dimensional medical images. Conversely, classical ML excels in structured data tasks with superior precision and stability. Therefore, the future direction of AI for bone tumors may not lie in choosing one over the other, but in leveraging their complementary strengths: applying DL to perception tasks while utilizing ML for decision logic. However, challenges such as small sample sizes, lack of multicenter studies, inconsistent data labeling, and the “black box” nature of AI models have hindered their clinical application. Future research should focus on developing AI models that integrate multimodal data, improve interpretability, and enhance robustness while fostering closer collaboration between computational scientists and clinicians. Addressing these issues is critical to bridging the gap between research results and clinical implementation, ultimately leading to more personalized clinical treatments and more effective clinical decision-making.
Acknowledgements
Not applicable.
Abbreviations
- AI: Artificial Intelligence
- ML: Machine Learning
- DL: Deep Learning
- LLM: Large Language Model
- AUC: Area Under the Curve
- ACC: Accuracy
- SD: Standard Deviation
- IQR: Interquartile Range
- PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
Author contributions
Y.Q. wrote the original draft, conducted data curation and formal analysis. C.E. and R.v.E.-R. reviewed and edited the manuscript and provided supervision. F.H. contributed to conceptualization, methodology, formal analysis, and supervision, and also reviewed and edited the manuscript.
Funding
Open Access funding enabled and organized by Projekt DEAL. This work was supported by the Nemetschek Innovationsfond and the Bavarian Ministry of Arts and Science.
Data availability
Data extracted from included studies are presented in tables. All included data can also be accessed directly from the relevant publications [20–69].
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
All authors reviewed the article and unanimously agreed to publish it.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Zhou X, Wang H, Feng C, Xu R, He Y, Li L, Tu C. Emerging applications of deep learning in bone tumors: current advances and challenges. Front Oncol. 2022;12:908873. 10.3389/fonc.2022.908873.
- 2. Siegel RL, Giaquinto AN, Jemal A. Cancer statistics. CA Cancer J Clin. 2024;74(1):12–49. 10.3322/caac.21820.
- 3. Biermann JS, Chow W, Reed DR, Lucas D, Adkins DR, Agulnik M, Benjamin RS, Brigman B, Budd GT, Curry WT, Didwania A, Fabbri N, Hornicek FJ, Kuechle JB, Lindskog D, Mayerson J, McGarry SV, Million L, Morris CD, Movva S, O’Donnell RJ, Randall RL, Rose P, Santana VM, Satcher RL, Schwartz H, Siegel HJ, Thornton K, Villalobos V, Bergman MA, Scavone JL, Insights NCCNG. Bone cancer, version 2.2017. J Natl Compr Canc Netw. 2017;15(2):155–67. 10.6004/jnccn.2017.0017.
- 4. Trama A, Bernasconi A, Canete A, Carulla M, Daubisse-Marliac L, Rossi S, De Angelis R, Sanvisens A, Katalinic A, Paapsi K, Went P, Mousavi M, Blum M, Eberle A, Lamy S, Capocaccia R, Didone F, Botta L, Eurocare WG. Incidence and survival of rare adult solid cancers in Europe (EUROCARE-6): A population-based study. Eur J Cancer. 2024;214:115147. 10.1016/j.ejca.2024.115147.
- 5. Hinterwimmer F, Consalvo S, Neumann J, Rueckert D, von Eisenhart-Rothe R, Burgkart R. Applications of machine learning for imaging-driven diagnosis of musculoskeletal malignancies-a scoping review. Eur Radiol. 2022;32(10):7173–84. 10.1007/s00330-022-08981-3.
- 6. Salazar C, Leite M, Sousa A, Torres J. Correlation between imagenological and histological diagnosis of bone tumors. A retrospective study. Acta Ortop Mex. 2019;33(6):386–90.
- 7. Choi JH, Ro JY. The 2020 WHO Classification of Tumors of Bone: An Updated Review. Adv Anat Pathol. 2021;28(3):119–38. 10.1097/pap.0000000000000293.
- 8. Horstmann PF, Hettwer WH, Petersen MM. Treatment of benign and borderline bone tumors with combined curettage and bone defect reconstruction. J Orthop Surg (Hong Kong). 2018;26(3):2309499018774929. 10.1177/2309499018774929.
- 9. Gutowski CJ, Basu-Mallick A, Abraham JA. Management of Bone Sarcoma. Surg Clin North Am. 2016;96(5):1077–106. 10.1016/j.suc.2016.06.002.
- 10. Gaur K, Jagtap MM. Role of artificial intelligence and machine learning in prediction, diagnosis, and prognosis of cancer. Cureus J Med Sci. 2022;14(11). 10.7759/cureus.31008.
- 11. Janiesch C, Zschech P, Heinrich K. Machine learning and deep learning. Electron Markets. 2021;31(3):685–95. 10.1007/s12525-021-00475-2.
- 12. Kourou K, Exarchos KP, Papaloukas C, Sakaloglou P, Exarchos T, Fotiadis DI. Applied machine learning in cancer research: A systematic review for patient diagnosis, classification and prognosis. Comput Struct Biotechnol J. 2021;19:5546–55. 10.1016/j.csbj.2021.10.006.
- 13. Huang SG, Yang J, Fong S, Zhao Q. Artificial intelligence in cancer diagnosis and prognosis: Opportunities and challenges. Cancer Lett. 2020;471:61–71. 10.1016/j.canlet.2019.12.007.
- 14. Jeon K, Park WY, Kahn CE Jr., Nagy P, You SC, Yoon SH. Advancing Medical Imaging Research Through Standardization: The Path to Rapid Development, Rigorous Validation, and Robust Reproducibility. Invest Radiol. 2025;60(1):1–10. 10.1097/rli.0000000000001106.
- 15. Deo RC. Machine learning in medicine. Circulation. 2015;132(20):1920–30. 10.1161/circulationaha.115.001593.
- 16. Parvaiz A, Nasir ES, Fraz MM. From Pixels to Prognosis: A Survey on AI-Driven Cancer Patient Survival Prediction Using Digital Histology Images. J Imaging Inf Med. 2024;37(4):1728–51. 10.1007/s10278-024-01049-2.
- 17. Abrar Y, Navneet Kumar V, Rabia Musheer A. Metaheuristic algorithms and their applications in different fields. In: Metaheuristics for machine learning: algorithms and applications. Wiley; 2024. pp. 1–35.
- 18. Papalia GF, Brigato P, Sisca L, Maltese G, Faiella E, Santucci D, Pantano F, Vincenzi B, Tonini G, Papalia R, Denaro V. Artificial intelligence in detection, management, and prognosis of bone metastasis: a systematic review. Cancers. 2024;16(15). 10.3390/cancers16152700.
- 19. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L, Tetzlaff JM, Akl EA, Brennan SE, Chou R, Glanville J, Grimshaw JM, Hrobjartsson A, Lalu MM, Li T, Loder EW, Mayo-Wilson E, McDonald S, McGuinness LA, Stewart LA, Thomas J, Tricco AC, Welch VA, Whiting P, Moher D. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. 10.1136/bmj.n71.
- 20. Alabdulkreem E, Saeed MK, Alotaibi SS, Allafi R, Mohamed A, Hamza MA. Bone Cancer Detection and Classification Using Owl Search Algorithm With Deep Learning on X-Ray Images. IEEE Access. 2023;11:109095–103. 10.1109/Access.2023.3319293.
- 21. Anand D, Khalaf OI, Hajjej F, Wong WK, Pan SH, Chandra GR. Optimized swarm enabled deep learning technique for bone tumor detection using histopathological image. SINERGI. 2023;27(3):16. 10.22441/sinergi.2023.3.016.
- 22. Breden S, Hinterwimmer F, Consalvo S, Neumann J, Knebel C, von Eisenhart-Rothe R, Burgkart RH, Lenze U. Deep learning-based detection of bone tumors around the knee in X-rays of children. J Clin Med. 2023;12(18). 10.3390/jcm12185960.
- 23. Chen X, Chen H, Wan J, Li J, Wei F. An enhanced AlexNet-Based model for femoral bone tumor classification and diagnosis using magnetic resonance imaging. J Bone Oncol. 2024;48:100626. 10.1016/j.jbo.2024.100626.
- 24. Cheng D, Liu D, Li X, Mi Z, Zhang Z, Tao W, Dang J, Zhu D, Fu J, Fan H. A deep learning model for accurately predicting cancer-specific survival in patients with primary bone sarcoma of the extremity: a population-based study. Clin Transl Oncol. 2024;26(3):709–19. 10.1007/s12094-023-03291-6.
- 25. Chianca V, Cuocolo R, Gitto S, Albano D, Merli I, Badalyan J, Cortese MC, Messina C, Luzzati A, Parafioriti A, Galbusera F, Brunetti A, Sconfienza LM. Radiomic machine learning classifiers in spine bone tumors: a multi-software, multi-scanner study. Eur J Radiol. 2021;137. 10.1016/j.ejrad.2021.109586.
- 26. Consalvo S, Hinterwimmer F, Neumann J, Steinborn M, Salzmann M, Seidl F, Lenze U, Knebel C, Rueckert D, Burgkart RHH. Two-Phase Deep Learning Algorithm for Detection and Differentiation of Ewing Sarcoma and Acute Osteomyelitis in Paediatric Radiographs. Anticancer Res. 2022;42(9):4371–80. 10.21873/anticanres.15937.
- 27. Do NT, Jung ST, Yang HJ, Kim SH. Multi-level seg-unet model with global and patch-based X-ray images for knee bone tumor detection. Diagnostics (Basel). 2021;11(4). 10.3390/diagnostics11040691.
- 28. Eweje FR, Bao B, Wu J, Dalal D, Liao WH, He Y, Luo Y, Lu S, Zhang P, Peng X, Sebro R, Bai HX, States L. Deep learning for classification of bone lesions on routine MRI. EBioMedicine. 2021;68:103402. 10.1016/j.ebiom.2021.103402.
- 29. Georgeanu VA, Mamuleanu M, Ghiea S, Selisteanu D. Malignant bone tumors diagnosis using magnetic resonance imaging based on deep learning algorithms. Medicina-Lithuania. 2022;58(5). 10.3390/medicina58050636.
- 30. Gitto S, Annovazzi A, Nulle K, Interlenghi M, Salvatore C, Anelli V, Baldi J, Messina C, Albano D, Di Luca F, Armiraglio E, Parafioriti A, Luzzati A, Biagini R, Castiglioni I, Sconfienza LM. X-rays radiomics-based machine learning classification of atypical cartilaginous tumour and high-grade chondrosarcoma of long bones. EBioMedicine. 2024;101:105018. 10.1016/j.ebiom.2024.105018.
- 31. Gitto S, Corino VDA, Annovazzi A, Milazzo Machado E, Bologna M, Marzorati L, Albano D, Messina C, Serpi F, Anelli V, Ferraresi V, Zoccali C, Aliprandi A, Parafioriti A, Luzzati A, Biagini R, Mainardi L, Sconfienza LM. 3D vs. 2D MRI radiomics in skeletal Ewing sarcoma: Feature reproducibility and preliminary machine learning analysis on neoadjuvant chemotherapy response prediction. Front Oncol. 2022;12:1016123. 10.3389/fonc.2022.1016123.
- 32. Gitto S, Cuocolo R, Albano D, Chianca V, Messina C, Gambino A, Ugga L, Cortese MC, Lazzara A, Ricci D, Spairani R, Zanchetta E, Luzzati A, Brunetti A, Parafioriti A, Sconfienza LM. MRI radiomics-based machine-learning classification of bone chondrosarcoma. Eur J Radiol. 2020;128:109043. 10.1016/j.ejrad.2020.109043.
- 33. Gitto S, Cuocolo R, van Langevelde K, van de Sande MAJ, Parafioriti A, Luzzati A, Imbriaco M, Sconfienza LM, Bloem JL. MRI radiomics-based machine learning classification of atypical cartilaginous tumour and grade II chondrosarcoma of long bones. EBioMedicine. 2022;75:103757. 10.1016/j.ebiom.2021.103757.
- 34. Guo CQ, Chen Y, Li JJ. Radiographic imaging and diagnosis of spinal bone tumors: AlexNet and ResNet for the classification of tumor malignancy. J Bone Oncol. 2024;48. 10.1016/j.jbo.2024.100629.
- 35. Hasei J, Nakahara R, Otsuka Y, Nakamura Y, Hironari T, Kahara N, Miwa S, Ohshika S, Nishimura S, Ikuta K, Osaki S, Yoshida A, Fujiwara T, Nakata E, Kunisada T, Ozaki T. High-quality expert annotations enhance artificial intelligence model accuracy for osteosarcoma X-ray diagnosis. Cancer Sci. 2024;115(11):3695–704. 10.1111/cas.16330.
- 36. He JT, Bi XJ. Automatic classification of spinal osteosarcoma and giant cell tumor of bone using optimized DenseNet. J Bone Oncol. 2024;46. 10.1016/j.jbo.2024.100606.
- 37. He Y, Pan I, Bao BT, Halsey K, Chang M, Liu H, Peng SP, Sebro RA, Guan J, Yi T, Delworth AT, Eweje F, States LJ, Zhang PJ, Zhang ZS, Wu J, Peng XJ, Bai HX. Deep learning-based classification of primary bone tumors on radiographs: a preliminary study. EBioMedicine. 2020;62. 10.1016/j.ebiom.2020.103121.
- 38. He YF, Guo JP, Ding XY, van Ooijen PMA, Zhang YP, Chen A, Oudkerk M, Xie XQ. Convolutional neural network to predict the local recurrence of giant cell tumor of bone after curettage based on pre-surgery magnetic resonance images. Eur Radiol. 2019;29(10):5441–51. 10.1007/s00330-019-06082-2.
- 39. Hinterwimmer F, Serena RS, Wilhelm N, Breden S, Consalvo S, Seidl F, Juestel D, Burgkart RHH, Woertler K, von Eisenhart-Rothe R, Neumann J, Rueckert D. Recommender-based bone tumour classification with radiographs-a link to the past. Eur Radiol. 2024;34(10):6629–38. 10.1007/s00330-024-10672-0.
- 40.Ho NH, Yang HJ, Kim SH, Jung ST, Joo SD. Regenerative Semi-Supervised Bidirectional W-Network-Based Knee Bone Tumor Classification on Radiographs Guided by Three-Region Bone Segmentation. Ieee Access. 2019;7:154277–89. 10.1109/Access.2019.2949125. [Google Scholar]
- 41.Holm CE, Grazal CF, Raedkjaer M, Baad-Hansen T, Nandra R, Grimer R, Forsberg JA, Petersen MM, Soerensen MS. Development and comparison of 1-year survival models in patients with primary bone sarcomas: external validation of a bayesian belief network model and creation and external validation of a new gradient boosting machine model. Sage Open Med. 2022;10. https://doi.org/Artn20503121221076387. [DOI] [PMC free article] [PubMed]
- 42.Ibrahim A, Vaidyanathan A, Primakov S, Belmans F, Bottari F, Refaee T, Lovinfosse P, Jadoul A, Derwael C, Hertel F, Woodruff HC, Zacho HD, Walsh S, Vos W, Occhipinti M, Hanin F-X, Lambin P, Mottaghy FM, Hustinx R. Deep learning based identification of bone scintigraphies containing metastatic bone disease foci. Cancer Imaging. 2023;23(1). 10.1186/s40644-023-00524-3. [DOI] [PMC free article] [PubMed]
- 43.Jiang J, Pan H, Li M, Qian B, Lin X, Fan S. Predictive model for the 5-year survival status of osteosarcoma patients based on the SEER database and XGBoost algorithm. Sci Rep. 2021;11(1):5542. 10.1038/s41598-021-85223-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Li J, Li SD, Li XL, Miao S, Dong C, Gao CP, Liu XJ, Hao DP, Xu WJ, Huang MQ, Cui JF. Primary bone tumor detection and classification in full-field bone radiographs via YOLO deep learning model. Eur Radiol. 2023;33(6):4237–48. 10.1007/s00330-022-09289-y. [DOI] [PubMed] [Google Scholar]
- 45.Li WL, Dong YZ, Liu WC, Tang ZR, Sun CY, Lowe S, Chen SY, Bentley R, Zhou Q, Xu C, Li WY, Wang B, Wang HS, Dong ST, Hu ZH, Liu Q, Cai XT, Feng XW, Zhao W, Yin CL. A deep belief network-based clinical decision system for patients with osteosarcoma. Front Immunol. 2022;13. https://doi.org/ARTN1003347. [DOI] [PMC free article] [PubMed]
- 46.Liu H, Jiao ML, Yuan Y, Ouyang HQ, Liu JF, Li Y, Wang CJ, Lang N, Qian YL, Jiang L, Yuan HS, Wang XD. Benign and malignant diagnosis of spinal tumors based on deep learning and weighted fusion framework on MRI. Insights Imaging 2022;13(1). 10.1186/s13244-022-01227-2. [DOI] [PMC free article] [PubMed]
- 47.Liu RY, Pan DR, Xu Y, Zeng H, He ZL, Lin JB, Zeng WX, Wu ZQ, Luo ZD, Qin GG, Chen WG. A deep learning-machine learning fusion approach for the classification of benign, malignant, and intermediate bone tumors. Eur Radiol. 2022;32(2):1371–83. 10.1007/s00330-021-08195-z. [DOI] [PubMed] [Google Scholar]
- 48.Liu YM, Yang P, Pi Y, Jiang LS, Zhong X, Cheng JJ, Xiang YZ, Wei JN, Li L, Yi Z, Cai HW, Zhao Z. Automatic identification of suspicious bone metastatic lesions in bone scintigraphy using convolutional neural network. BMC Med Imaging. 2021;21(1). 10.1186/s12880-021-00662-9. [DOI] [PMC free article] [PubMed]
- 49.Magdy O, Abd Elaziz M, Elgarayhi A, Ewees AA, Sallah M. Bone metastasis detection method based on improving golden jackal optimization using whale optimization algorithm. Sci Rep. 2023;13(1). 10.1038/s41598-023-41733-x. [DOI] [PMC free article] [PubMed]
- 50.Malibari AA, Alzahrani JS, Obayya M, Negm N, Al-Hagery MA, Salama AS, Hilal AM. Biomedical Osteosarcoma Image Classification Using Elephant Herd Optimization and Deep Learning. Cmc-Comput Mater Con. 2022;73(3):6443–59. 10.32604/cmc.2022.031324. [Google Scholar]
- 51.Motohashi M, Funauchi Y, Adachi T, Fujioka T, Otaka N, Kamiko Y, Okada T, Tateishi U, Okawa A, Yoshii T, Sato S. A New Deep Learning Algorithm for Detecting Spinal Metastases on Computed Tomography Images. Spine. 2024;49(6):390–7. 10.1097/Brs.0000000000004889. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Pan CY, Lian LY, Chen JY, Huang RS. FemurTumorNet: bone tumor classification in the proximal femur using DenseNet model based on radiographs. J Bone Oncol. 2023;42. 10.1016/j.jbo.2023.100504. [DOI] [PMC free article] [PubMed]
- 53.Park CW, Oh SJ, Kim KS, Jang MC, Kim IS, Lee YK, Chung MJ, Cho BH, Seo SW. Artificial intelligence-based classification of bone tumors in the proximal femur on plain radiographs: system development and validation. PLoS ONE. 2022;17(2):e0264140. 10.1371/journal.pone.0264140. [DOI] [PMC free article] [PubMed]
- 54.Saleena TS, Ilyas PM, Sajna VMK, Haque AKMB. Deep Learning Techniques for Quantification of Tumour Necrosis in Post-neoadjuvant Chemotherapy Osteosarcoma Resection Specimens for Effective Treatment Planning. Acta Inf Prag. 2023;12(1):87–103. 10.18267/j.aip.207. [Google Scholar]
- 55.Sampath K, Rajagopal S, Chintanpalli A. A comparative analysis of CNN-based deep learning architectures for early diagnosis of bone cancer using CT images. Sci Rep. 2024;14(1). 10.1038/s41598-024-52719-8. [DOI] [PMC free article] [PubMed]
- 56.Shao JJ, Lin HX, Ding L, Li B, Xu DY, Sun Y, Guan TM, Dai HY, Liu RH, Deng DM, Huang BS, Feng ST, Diao XF, Gao ZH. Deep learning for differentiation of osteolytic osteosarcoma and giant cell tumor around the knee joint on radiographs: a multicenter study. Insights Imaging. 2024;15(1):35. 10.1186/s13244-024-01610-1. [DOI] [PMC free article] [PubMed]
- 57.Shuai LM, Zou W, Hu N, Gao X, Wang JJ. An advanced W-shaped network with adaptive multi-scale supervision for osteosarcoma segmentation. Biomed Signal Proces. 2023;80:104243. 10.1016/j.bspc.2022.104243.
- 58.Song LW, Li CP, Tan LL, Wang MH, Chen XQ, Ye Q, Li SS, Zhang R, Zeng QH, Xie ZY, Yang W, Zhao YH. A deep learning model to enhance the classification of primary bone tumors based on incomplete multimodal images in X-ray, CT, and MRI. Cancer Imaging. 2024;24(1):135. 10.1186/s40644-024-00784-7. [DOI] [PMC free article] [PubMed]
- 59.Tao YZ, Huang X, Tan YW, Wang HW, Jiang WQ, Chen Y, Wang CL, Luo J, Liu Z, Gao KR, Yang W, Guo MK, Tang BY, Zhou AG, Yao ML, Chen TM, Cao YD, Luo CS, Zhang J. Qualitative histopathological classification of primary bone tumors using deep learning: a pilot study. Front Oncol. 2021;11:735739. 10.3389/fonc.2021.735739. [DOI] [PMC free article] [PubMed]
- 60.Vezakis IA, Lambrou GI, Matsopoulos GK. Deep learning approaches to osteosarcoma diagnosis and classification: a comparative methodological approach. Cancers. 2023;15(8). 10.3390/cancers15082290. [DOI] [PMC free article] [PubMed]
- 61.Vijayaraj J, Abirami B, Mohanty SN, Kavitha VP. An efficient convolutional histogram-oriented gradients and deep convolutional learning approach for accurate classification of bone cancer. Int J Imag Syst Tech. 2024;34(2). 10.1002/ima.23000.
- 62.von Schacky CE, Wilhelm NJ, Schäfer VS, Leonhardt Y, Gassert FG, Foreman SC, Gassert FT, Jung M, Jungmann PM, Russe MF, Mogler C, Knebel C, von Eisenhart-Rothe R, Makowski MR, Woertler K, Burgkart R, Gersing AS. Multitask Deep Learning for Segmentation and Classification of Primary Bone Tumors on Radiographs. Radiology. 2021;301(2):398–406. 10.1148/radiol.2021204531. [DOI] [PubMed] [Google Scholar]
- 63.Wang H, Xu SH, Fang KB, Dai ZS, Wei GZ, Chen LF. Contrast-enhanced magnetic resonance image segmentation based on improved U-Net and Inception-ResNet in the diagnosis of spinal metastases. J Bone Oncol. 2023;42:100498. 10.1016/j.jbo.2023.100498. [DOI] [PMC free article] [PubMed]
- 64.Wang LA, Yu L, Zhu J, Tang HY, Gou FF, Wu J. Auxiliary segmentation method of osteosarcoma in MRI images based on denoising and local enhancement. Healthcare-Basel. 2022;10(8). 10.3390/healthcare10081468. [DOI] [PMC free article] [PubMed]
- 65.Wang S, Sun M, Sun J, Wang Q, Wang G, Wang X, Meng X, Wang Z, Yu H. Advancing musculoskeletal tumor diagnosis: Automated segmentation and predictive classification using deep learning and radiomics. Comput Biol Med. 2024;175:108502. 10.1016/j.compbiomed.2024.108502. [DOI] [PubMed] [Google Scholar]
- 66.Wu J, Yang S, Gou FF, Zhou ZX, Xie P, Xu N, Dai ZH. Intelligent segmentation medical assistance system for MRI images of osteosarcoma in developing countries. Comput Math Method M. 2022;2022:7703583. 10.1155/2022/7703583. [DOI] [PMC free article] [PubMed]
- 67.Wu J, Yuan TY, Zeng JC, Gou FF. A Medically Assisted Model for Precise Segmentation of Osteosarcoma Nuclei on Pathological Images. IEEE J Biomed Health. 2023;27(8):3982–93. 10.1109/Jbhi.2023.3278303. [DOI] [PubMed] [Google Scholar]
- 68.Xie ZY, Zhao HM, Song LW, Ye Q, Zhong LM, Li SS, Zhang R, Wang MH, Chen XQ, Lu ZX, Yang W, Zhao YH. A radiograph-based deep learning model improves radiologists’ performance for classification of histological types of primary bone tumors: a multicenter study. Eur J Radiol. 2024;176:111496. 10.1016/j.ejrad.2024.111496. [DOI] [PubMed]
- 69.Xu DY, Li B, Liu WX, Wei D, Long XW, Huang TY, Lin HX, Cao KY, Zhong SN, Shao JJ, Huang BS, Diao XF, Gao ZH. Deep learning-based detection of primary bone tumors around the knee joint on radiographs: a multicenter study. Quant Imag Med Surg. 2024;14(8). 10.21037/qims-23-1743. [DOI] [PMC free article] [PubMed]
- 70.Xu ZY, Niu K, Tang S, Song TQ, Rong Y, Guo W, He ZQ. Bone tumor necrosis rate detection in few-shot X-rays based on deep learning. Comput Med Imag Grap. 2022;102:102141. 10.1016/j.compmedimag.2022.102141. [DOI] [PubMed]
- 71.Ye Q, Yang HN, Lin BM, Wang MH, Song LW, Xie ZY, Lu ZX, Feng QJ, Zhao YH. Automatic detection, segmentation, and classification of primary bone tumors and bone infections using an ensemble multi-task deep learning framework on multi-parametric MRIs: a multi-center study. Eur Radiol. 2024;34(7):4287–99. 10.1007/s00330-023-10506-5. [DOI] [PubMed] [Google Scholar]
- 72.Potter IY, Yeritsyan D, Mahar S, Wu JM, Nazarian A, Vaziri A, Vaziri A. Automated Bone Tumor Segmentation and Classification as Benign or Malignant Using Computed Tomographic Imaging. J Digit Imaging. 2023;36(3):869–78. 10.1007/s10278-022-00771-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Zhan XB, Liu J, Long HY, Zhu J, Tang HY, Gou FF, Wu J. An intelligent auxiliary framework for bone malignant tumor lesion segmentation in medical image analysis. Diagnostics. 2023;13(2). 10.3390/diagnostics13020223. [DOI] [PMC free article] [PubMed]
- 74.Zheng F, Yin P, Liang KW, Wang YJ, Hao WH, Hao Q, Hong N. Fusion Radiomics-Based Prediction of Response to Neoadjuvant Chemotherapy for Osteosarcoma. Acad Radiol. 2024;31(6):2444–55. 10.1016/j.acra.2023.12.015. [DOI] [PubMed] [Google Scholar]
- 75.Zhong JY, Zhang CX, Hu YF, Zhang J, Liu Y, Si LP, Xing Y, Ding DF, Geng J, Jiao Q, Zhang HZ, Yang G, Yao WW. Automated prediction of the neoadjuvant chemotherapy response in osteosarcoma with deep learning and an MRI-based radiomics nomogram. Eur Radiol. 2022;32(9):6196–206. 10.1007/s00330-022-08735-1. [DOI] [PubMed] [Google Scholar]
- 76.Aydin Simsek S, Aydin A, Say F, Cengiz T, Ozcan C, Ozturk M, Okay E, Ozkan K. Enhanced enchondroma detection from x-ray images using deep learning: A step towards accurate and cost-effective diagnosis. J Orthop Res. 2024;42(12):2826–34. 10.1002/jor.25938. [DOI] [PubMed] [Google Scholar]
- 77.Dalai SS, Ranjan Sahu BJ, Rautaray J, Khan MI, Jabr BA, Ali YA. Automated Bone Cancer Detection Using Deep Learning on X-Ray Images. Surg Innov. 2025;32(2):94–108. 10.1177/15533506241299886. [DOI] [PubMed] [Google Scholar]
- 78.Deng J, Moskalyk M, Shammas-Toma M, Aoude A, Ghert M, Bhatnagar S, Bozzo A. Development of Machine Learning Models for Predicting the 1-Year Risk of Reoperation After Lower Limb Oncological Resection and Endoprosthetic Reconstruction Based on Data From the PARITY Trial. J Surg Oncol. 2024;130(8):1706–16. 10.1002/jso.27854. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79.Deng S, Huang Y, Li C, Qian J, Wang X. Auxiliary diagnosis of primary bone tumors based on Machine learning model. J Bone Oncol. 2024;49:100648. 10.1016/j.jbo.2024.100648. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Gassert FG, Lang D, Hesse N, Durr HR, Klein A, Kohll L, Hinterwimmer F, Luitjens J, Weissinger S, Peeken JC, Mogler C, Knebel C, Bartzsch S, Gassert FT, Gersing AS. A deep learning model for classification of chondroid tumors on CT images. BMC Cancer. 2025;25(1):561. 10.1186/s12885-025-13951-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Hasei J, Nakahara R, Otsuka Y, Nakamura Y, Ikuta K, Osaki S, Hironari T, Miwa S, Ohshika S, Nishimura S, Kahara N, Yoshida A, Fujiwara T, Nakata E, Kunisada T, Ozaki T. The three-class annotation method improves the AI detection of early-stage osteosarcoma on plain radiographs: a novel approach for rare cancer diagnosis. Cancers (Basel). 2024;17(1). 10.3390/cancers17010029. [DOI] [PMC free article] [PubMed]
- 82.Hinterwimmer F, Guenther M, Consalvo S, Neumann J, Gersing A, Woertler K, von Eisenhart-Rothe R, Burgkart R, Rueckert D. Impact of metadata in multimodal classification of bone tumours. BMC Musculoskelet Disord. 2024;25(1):822. 10.1186/s12891-024-07934-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Hong R, Li Q, Ma J, Lu C, Zhong Z. Computed tomography-based radiomics machine learning models for differentiating enchondroma and atypical cartilaginous tumor in long bones. Rofo. 2025;197(4):416–23. 10.1055/a-2344-5398. [DOI] [PubMed] [Google Scholar]
- 84.Long QY, Wang FY, Hu Y, Gao B, Zhang C, Ban BH, Tian XB. Development of the interpretable typing prediction model for osteosarcoma and chondrosarcoma based on machine learning and radiomics: a multicenter retrospective study. Front Med (Lausanne). 2024;11:1497309. 10.3389/fmed.2024.1497309. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85.Nie P, Zhao X, Ma J, Wang Y, Li B, Li X, Li Q, Wang Y, Xu Y, Dai Z, Wu J, Wang N, Yang G, Hao D, Yu T. Can the preoperative CT-based deep learning radiomics model predict histologic grade and prognosis of chondrosarcoma? Eur J Radiol. 2024;181:111719. 10.1016/j.ejrad.2024.111719. [DOI] [PubMed] [Google Scholar]
- 86.Bolleddu Devananda R, Madhavi K. BCDNet: A deep learning model with improved convolutional neural network for efficient detection of bone cancer using histology images. Int J Comput Experimental Sci Eng. 2024;10(4). 10.22399/ijcesen.430.
- 87.Shouman M, Rahouma KH, Hamed HFA. Automatic segmentation, classification, and prediction of pelvic bone tumors using deep learning techniques. J Eng Appl Sci. 2024;71(1):214. 10.1186/s44147-024-00551-2. [Google Scholar]
- 88.Wang Z, Wu J, Li C, Wang B, Wu Q, Li L, Wang H, Tu C, Yin J. Diagnosis of osteosarcoma based on multimodal microscopic imaging and deep learning. J Innovative Opt Health Sci. 2025;18(02):2343001. 10.1142/s1793545823430010. [Google Scholar]
- 89.Yao S, Huang Y, Wang X, Zhang Y, Paixao IC, Wang Z, Chai CL, Wang H, Lu D, Webb GI, Li S, Guo Y, Chen Q, Song J. A Radiograph Dataset for the Classification, Localization, and Segmentation of Primary Bone Tumors. Sci Data. 2025;12(1):88. 10.1038/s41597-024-04311-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 90.Rizk PA, Gonzalez MR, Galoaa BM, Girgis AG, Van Der Linden L, Chang CY, Lozano-Calderon. Machine learning-assisted decision making in orthopaedic oncology. JBJS Rev. 2024;12(7):e24.00057. 10.2106/jbjs.Rvw.24.00057. [DOI] [PubMed]
- 91.Zhou X, Wang H, Feng C, Xu R, He Y, Li L, Tu C. Emerging applications of deep learning in bone tumors: current advances and challenges. Front Oncol. 2022;12. 10.3389/fonc.2022.908873. [DOI] [PMC free article] [PubMed]
- 92.Yu Y, Li M, Liu LL, Li YH, Wang JX. Clinical Big Data and Deep Learning: Applications, Challenges, and Future Outlooks. Big Data Min Analytics. 2019;2(4):288–305. 10.26599/bdma.2019.9020007. [Google Scholar]
- 93.Panayides AS, Amini A, Filipovic N, Sharma A, Tsaftaris SA, Young A, Foran DJ, Do N, Golemati S, Kurc T, Huang K, Nikita KS, Veasey B, Zervakis M, Saltz JH, Pattichis. AI in Medical Imaging Informatics: Current Challenges and Future Directions. IEEE J Biomed Health. 2020;24(7):1837–57. 10.1109/jbhi.2020.2991043. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 94.Guetari R, Ayari H, Sakly H. Computer-aided diagnosis systems: a comparative study of classical machine learning versus deep learning-based approaches. Knowl Inf Syst. 2023. 10.1007/s10115-023-01894-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 95.Kasneci E, Seßler K, Küchemann S, Bannert M, Dementieva D, Fischer F, Gasser U, Groh G, Günnemann S, Hüllermeier E. ChatGPT for good? On opportunities and challenges of large language models for education. Learn Individ Differ. 2023;103:102274. [Google Scholar]
- 96.Zhang C, Liu SS, Zhou XY, Zhou SY, Tian YL, Wang SL, Xu NF, Li WS. Examining the role of large language models in orthopedics: systematic review. J Med Internet Res. 2024;26. 10.2196/59607. [DOI] [PMC free article] [PubMed]
- 97.Tan Z, Liu A, Wan J, Liu H, Lei Z, Guo G, Li SZ. Cross-Batch Hard Example Mining With Pseudo Large Batch for ID vs. Spot Face Recognition. IEEE Trans Image Process. 2022;31:3224–35. 10.1109/tip.2021.3137005. [DOI] [PubMed] [Google Scholar]
- 98.Caiafa CF, Solé-Casals J, Marti-Puig P, Zhe S, Tanaka T. Decomposition methods for machine learning with small, incomplete or noisy datasets. Appl Sciences-Basel. 2020;10(23). 10.3390/app10238481.
- 99.Kokol P, Kokol M, Zagoranski S. Machine learning on small size samples: a synthetic knowledge synthesis. Sci Prog. 2022;105(1). 10.1177/00368504211029777. [DOI] [PMC free article] [PubMed]
- 100.Astarastoae V, Rogozea LM, Leasu F, Ioan BG. Ethical dilemmas of using artificial intelligence in medicine. Am J Ther. 2024;31(4):e388–97. 10.1097/mjt.0000000000001693. [DOI] [PubMed]
- 101.Jackson BR, Ye Y, Crawford JM, Becich MJ, Roy S, Botkin JR, de Baca ME, Pantanowitz L. The ethics of artificial intelligence in pathology and laboratory medicine: principles and practice. Acad Pathol. 2021;8. 10.1177/2374289521990784. [DOI] [PMC free article] [PubMed]
- 102.Savulescu J, Giubilini A, Vandersluis R, Mishra A. Ethics of artificial intelligence in medicine. Singapore Med J. 2024;65(3):150–8. 10.4103/singaporemedj.SMJ-2023-279. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 103.Galiana I, Gudino LC, González PM. Ethics and artificial intelligence. Rev Clin Esp. 2024;224(3):178–86. 10.1016/j.rce.2024.01.007. [DOI] [PubMed] [Google Scholar]
- 104.Yaqoob A, Verma NK, Aziz RM, Shah MA. Optimizing cancer classification: a hybrid RDO-XGBoost approach for feature selection and predictive insights. Cancer Immunol Immunother. 2024;73(12):261. 10.1007/s00262-024-03843-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 105.Yaqoob A, Verma NK, Aziz RM. Optimizing gene selection and cancer classification with hybrid sine cosine and cuckoo search algorithm. J Med Syst. 2024;48(1). 10.1007/s10916-023-02031-1. [DOI] [PubMed]
- 106.Abrar Y, Navneet Kumar V, Rabia Musheer A, Akash S. Enhancing feature selection through metaheuristic hybrid cuckoo search and Harris Hawks optimization for cancer classification. In: Metaheuristics for machine learning: algorithms and applications. Wiley; 2024. pp. 95–134.
- 107.Sadr H, Salari A, Ashoobi MT, Nazari M. Cardiovascular disease diagnosis: a holistic approach using the integration of machine learning and deep learning models. Eur J Med Res. 2024;29(1). 10.1186/s40001-024-02044-7. [DOI] [PMC free article] [PubMed]
- 108.Yen HH, Tsai HY, Wang CC, Tsai MC, Tseng MH. An improved endoscopic automatic classification model for gastroesophageal reflux disease using deep learning integrated machine learning. Diagnostics. 2022;12(11). 10.3390/diagnostics12112827. [DOI] [PMC free article] [PubMed]
- 109.Chen R, Mo X, Chen ZP, Feng PJ, Li HY. An integrated model combining machine learning and deep learning algorithms for classification of rupture status of IAs. Front Neurol. 2022;13. 10.3389/fneur.2022.868395. [DOI] [PMC free article] [PubMed]
- 110.Alquran H, Mustafa WA, Abu Qasmieh I, Yacob YM, Alsalatie M, Al-Issa Y, Alqudah AM. Cervical cancer classification using combined machine learning and deep learning approach. Cmc-Comput Mater Con. 2022;72(3):5117–34. 10.32604/cmc.2022.025692. [Google Scholar]
Data Availability Statement
Data extracted from the included studies are presented in the tables. All included data can also be accessed directly from the relevant publications [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69].



