Abstract
An abundance of medical data and enhanced computational power have led to a surge in artificial intelligence (AI) applications. Published studies involving AI in bone and osteoporosis research have increased exponentially, raising the need for transparent model development and reporting strategies. This review offers a comprehensive overview and systematic quality assessment of AI articles in osteoporosis while highlighting recent advancements. A systematic search of the PubMed database, covering December 17, 2020 to February 1, 2023, was conducted to identify AI articles related to osteoporosis. The quality assessment of the studies relied on the systematic evaluation of 12 quality items derived from the minimum information about clinical artificial intelligence modeling checklist. The systematic search yielded 97 articles that fell into 5 areas: bone properties assessment (11 articles), osteoporosis classification (26 articles), fracture detection/classification (25 articles), risk prediction (24 articles), and bone segmentation (11 articles). The average quality score for each study area was 8.9 (range: 7–11) for bone properties assessment, 7.8 (range: 5–11) for osteoporosis classification, 8.4 (range: 7–11) for fracture detection, 7.6 (range: 4–11) for risk prediction, and 9.0 (range: 6–11) for bone segmentation. A sixth area, AI-driven clinical decision support, identified the studies from the 5 preceding areas that aimed to improve clinician efficiency, diagnostic accuracy, and patient outcomes through AI-driven models and opportunistic screening by automating or assisting with specific clinical tasks in complex scenarios. The current work highlights disparities in study quality and a lack of standardized reporting practices. Despite these limitations, a wide range of models and examination strategies have shown promising outcomes that aid earlier diagnosis and improve clinical decision-making. Through careful consideration of sources of bias in model performance assessment, the field can build confidence in AI-based approaches, ultimately leading to improved clinical workflows and patient outcomes.
Keywords: Analysis/quantitation of bone, osteoporosis, orthopaedics, fracture risk assessment, screening
Introduction
Artificial intelligence (AI) and its subfields, machine learning (ML) and deep learning (DL), are revolutionizing many domains by developing algorithmic tools that mimic human reasoning and behaviors through the identification of high-dimensional patterns in data.1 This involves training complex models and architectures for task-specific questions. AI is being applied to complex medical scenarios and multifactorial conditions. However, AI models may produce unreliable or misleading results, prompting researchers to develop explainable AI (XAI) methods. XAI aims to enhance the interpretability and explainability of the models, making the decision-making process more transparent and accessible to humans.2
Osteoporosis is a systemic skeletal disorder characterized by compromised bone strength, increasing the risk of fractures. It is an increasingly prevalent condition that affects up to 1 in 2 women and 1 in 5 men after the age of 50.3,4 Fractures lead to significant health, societal, and economic burdens.4 Dual-energy X-ray absorptiometry (DXA), which assesses bone mineral density (BMD), is the reference standard for diagnosing osteoporosis. Determining fracture risk is crucial in osteoporosis prevention and treatment. FRAX® is the most widely used tool for quantifying fracture risk, integrating clinical risk factors and DXA-derived bone parameters.5 However, FRAX® remains an imperfect tool, and there is a need for more accessible and accurate tools for identifying patients with elevated risk of osteoporosis and/or fracture. The availability of vast amounts of medical data and improved computing power has facilitated an exponential increase in studies applying ML and DL methods to clinical and imaging data in bone research.6,7
Objectives
This review aims to summarize recent advancements in AI for osteoporosis management, report state-of-the-art AI methods, and evaluate key quality items associated with these methods. It is a logical update of the qualitative review by Smets et al. published in 2020,7 serving a broad audience from clinicians to researchers.
Materials and methods
Literature search strategy
A search strategy relevant to fracture and osteoporosis was created with a medical library expert from the University of Lausanne (C.J.). A systematic search was performed in PubMed from December 17, 2020 (end date of the systematic search performed by Smets et al.) to February 1, 2023. The search strategy is provided in the Supplementary Material (section I).
Study selection
All identified records were extracted and imported into Rayyan, where duplicates were removed and the titles and abstracts were screened.8 Full-text records were then retrieved after this initial selection. Inclusion criteria were: original study, written in English, use of AI methods, and gold standard approaches for osteoporosis management, including confirmed densitometric assessment for osteoporosis or fragility fractures of the forearm, hip, spine, or humerus.
Study qualitative analysis
The minimum information about clinical artificial intelligence modeling7 checklist was used for qualitative assessment. This 6-part checklist includes subsections designed to avoid common misuses or pitfalls and to rigorously report study design, data and optimization, model performance, model examination, and reproducibility. A simplified version of the checklist was created from the 12 most relevant items to cover the 6 main quality assessment parts (Supplementary Material section 2, Box S1).
Each article was scored with a maximum of 12 points, and methodological details summarizing key study characteristics were retrieved: country (population), task, input data modality, amount of data for model development, amount of data for external validation (EV), number of inputs, model architecture, cross-validation (CV) or train/validation/test split, evaluation metrics, best results, and quality score.
Results
Search strategy
The search identified 409 records from PubMed. A total of 97 articles meeting inclusion criteria were included in this qualitative review. The workflow and results of the search are summarized in Figure 1.
General characteristics of the studies
The 97 included articles fell within 5 broad areas: bone properties assessment (11 articles), osteoporosis classification (26 articles), fracture detection/classification (25 articles), risk prediction (24 articles), and bone segmentation (11 articles). Tables 1–5 provide a comprehensive summary of the general characteristics, methodological details, main results, and overall quality score for each of the 5 study areas. Most studies were performed in Asia (68%); China (27%), South Korea (13%), and the USA (13%) were the most represented countries. ML input data modalities included databases (49%), X-rays (27%), CT (15%), MRI (9%), DXA images (5%), and QUS (2%). Of note, the database input modality represents any form of input data that is not in visual image format, such as structured information combining qualitative or quantitative features. Most database studies included DXA-derived measurements such as BMD. Thus, DXA was a data source in more than 5% of studies. Similarly, CT was a data source in more than 15% of the studies, as some database studies included Hounsfield units (HU) derived from CT scans. Figure 2 presents the distribution of all study tasks per input modality for the 97 included articles.
Table 1.
Reference | Country (population) | Task | Modality | Data amount = n (slices) | Data amount external validation | Inputs (no.) | Model | Train/validation/test | K-fold cross validation | Evaluation metrics | Best result | Quality score (max 12) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Dai et al.9 | China | BMD | Database | 245 | 1218 | Stacking Linreg, RF, XGB, stacking ensemble | 80% train, 20% test | 10-fold | Adjusted R2, RMSE, MAE | R2 = 0.83 | 9 | |
Ho et al.10 | Taiwan | BMD | X-ray, database | 5027 | IMG3 | ResNet-18 | 80% train, 20% test | 5-fold | Pearson’s correlation coefficient, accuracy, sensitivity, specificity | R2 = 0.85, Accuracy = 0.88 | 10 | |
Hsieh et al.11 | Taiwan | BMD | X-ray, database | 36 279 | 5406 | IMG patches | VGG-11, VGG-16, ResNet-18, ResNet-34, ensemble | 50% train, 50% test | 4-fold | Pearson’s correlation coefficient, R2, RMSE, calibration slope, calibration bias | R2 = 0.9 | 11 |
Kang et al.12 | South Korea | BMD | CT | 547 | IMG patches | U-net and custom ResNet | 83% train, 17% test | 10-fold | Pearson’s correlation coefficient, MAPE, accuracy, specificity, sensitivity, F1-score | R2 = 0.90 | 7 | |
Min et al.13 | South Korea | BMD | Database | 736 | 45 | Linreg, MLP | 80% train, 20% test | 10-fold | MSE, Pearson’s correlation coefficient | R2 = 0.78 | 7 | |
Nguyen et al.14 | South Korea | BMD | X-ray, database | 660 | IMG | Custom VGG ensemble | 5-fold 80% train, 20% test | 5-fold | Pearson’s correlation coefficient, MAE | R2 = 0.81 | 9 | |
Nissinen et al.15 | Finland | Pathological features | DXA | 2949 | 574 | IMG | VGG-16, VGG-19, Inception-V3, DenseNet-121, custom | 10-fold train/validation | 10-fold | Accuracy, confusion matrix, sensitivity, specificity, AUC | AUC = 0.94 | 11 |
Sato et al.16 | Japan | BMD | X-ray | 17 899 | IMG patches | ResNet-50 | 70% train, 10% validation, 20% test | Pearson’s correlation coefficient, MAE, BA, calibration slope, calibration bias | R2 = 0.75 | 7 | ||
Tanphiriyakun et al.17 | Thailand | BMD change | Database | 13 562 | 225 | RF, XGB, Logreg, SVM, NB, MLP, KNN | 90% train, 10% test | 3-fold | Accuracy, precision, recall, F1-Score, AUC | AUC = 0.7 | 8 | |
Xiao et al.18 | USA | Bone stiffness | DXA | 522 | IMG1–3–6–9 | CNN | 80% train, 20% test | Pearson’s correlation coefficient, norm error | R2 = 0.97 | 9 | ||
Zhang et al.19 | China | Bone strength | Database | 80 | 46 | SVR | 68% train, 32% test | 10-fold | MSE, R2, mean bias, SD bias | MSE = 0.014, R2 = 0.93 | 10 |
Abbreviations: IMG, image; SVR, support vector regression; CNN, convolutional neural network; Linreg, linear regression; Logreg, logistic regression; RF, random forests; XGB, extreme gradient boosting; SVM, support vector machines; NB, Naïve Bayes; MLP, multilayer perceptron; R2, coefficient of determination; RMSE, root-mean-square error; AUC, area under the curve of the receiver operating characteristic; MAE, mean absolute error; MAPE, mean absolute percentage error; MSE, mean squared error; BA, Bland–Altman. If several models were evaluated for a given task, the best performing is highlighted in bold.
Table 5.
Reference | Country (population) | Task | Modality | Data amount | Data amount external validation | Inputs (no.) | Model | CV train / validation / test | K-fold cross validation | Evaluation metrics | Best result | Quality score (max 12) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Cheng et al.96 | China | SS | CT | 15 | 15 | IMG10 | Custom U-Net | 66.6% train, 33.3% test | Dice coefficient, location error, detection rate, IoU, Hausdorff distance, pixel accuracy | Dice coefficient: 0.95 | 10 | |
Deng et al.97 | China | HS | CT | 100 | IMG | U-Net | 85% train, 15% test | 10-fold | Dice coefficient, average surface distance, sensitivity, specificity | Dice coefficient: 0.98 | 10 | |
Kim et al.98 | South Korea | SS | X-ray | 797 | IMG | U-Net, hybrid | 80% train, 20% test | Dice coefficient, precision, sensitivity, specificity, area error, Hausdorff distance | Dice coefficient: 0.92 | 10 | ||
Kim et al.99 | South Korea | SS | X-ray | 339 | IMG | U-Net, R2U-Net, SegNet, E-Net, dilated recurrent residual U-Net | 80% train, 20% test | 5-fold | Sensitivity, specificity, accuracy, dice coefficient | Dice coefficient: 0.93 | 7 | |
Park et al.100 | South Korea | SS | CT | 467 | 102 | IMG | U-Net | 80% train, 20% test | Dice coefficient | Dice coefficient: 0.93 | 11 | |
Suri et al.101 | USA | SS | X-ray, CT, MRI | 6975 | IMG | Custom CNN | 5-fold | 5-fold | Accuracy, IoU, dice coefficient | Dice coefficient: 0.95 | 10 | |
Wang et al.102 | China | HS | CT | 50 | IMG | U-Net | 66% train, 20% validation, 14% test | 5-fold | Dice coefficient, precision, sensitivity | Dice coefficient: 0.92 | 8 | |
Wei et al.103 | China | FS | X-ray | 1274 | IMG | hybrid ResNet+FPN and DeepLabv3 | 60% train, 20% validation, 20% test | MAP, AUC | AUC: 0.98 | 10 | ||
Yang et al.104 | China | FS | DXA | 720 | IMG2 | U-Net Resblock | 83% train-validation, 17% test | 5-fold | Dice coefficient, Jaccard index | Dice coefficient: 0.99 | 10 | |
Yang et al.105 | China | HS | CT | 160 | IMG | DenseUnet and Mask R-CNN | 75% train, 25% test | Accuracy, dice coefficient | Accuracy: 0.89, dice: 0.90 | 6 | ||
Zhao et al.106 | China | SS | MRI | 222 | 25 | IMG | U-Net | 70% train, 30% test | Dice coefficient | Dice coefficient: 0.912 | 7 |
Abbreviations: FS, forearm segmentation; SS, spine segmentation; HS, hip segmentation; IMG, image; CNN, convolutional neural network; IoU, intersection over union; MAP, mean average precision; AUC, area under the curve of the receiver operating characteristic. If several models were evaluated for a given task, the best performing is highlighted in bold.
Qualitative analysis of the studies
The percentage of studies fulfilling each quality item is represented in Figure 3. The average quality score for each study area was 8.9 (range: 7–11) for bone properties assessment, 7.8 (range: 5–11) for osteoporosis classification, 8.4 (range: 7–11) for fracture detection, 7.6 (range: 4–11) for risk prediction, and 9.0 (range: 6–11) for bone segmentation.
The 3 most fulfilled items were: research task definition (100%), validation methodology (93%), and reliability and robustness discussion (92%). The least fulfilled items were reproducibility and transparency (6%), use of a baseline model for performance comparison (53%), and justification of performance metrics (53%).
Readers interested in AI subfields and model selection can refer to the Supplementary Material, Section 3. Supplementary Material, Box S1, gives an overview of the key ML/DL development, evaluation, and reporting steps. Concepts of data stratification, CV, classification tasks, regression tasks, segmentation tasks, and their respective performance metrics are provided in Tables S1–S3.
Studies on bone properties
Eleven studies evaluated AI models for bone properties assessment, including BMD prediction,9–14,16 BMD change over time,17 pathological feature detection,15 bone stiffness assessment,18 and bone strength prediction.19 The objective of these studies was to develop opportunistic screening tools that assess and predict bone health parameters, enabling earlier intervention and improved prognosis for individuals at high risk of osteoporotic fracture. Study details are presented in Table 1, and their individual quality scores are specified in Table S4.
AI models for BMD prediction used diverse approaches and input modalities. BMD prediction from spine, hip, or chest radiographs was investigated with convolutional neural networks (CNNs).10,11,16 Hip BMD prediction from X-ray was performed as a regression task for the first time.10,11 The ResNet-18 architecture was selected to predict hip BMD from pelvis X-rays. The selection of this model was motivated by its lighter architecture compared with larger neural networks that are better suited to multi-class classification tasks.10 Two image pre-processing approaches for hip BMD prediction compared an ROI with and without surrounding soft tissues. The model utilizing an ROI with surrounding soft tissues had the best correlation with DXA BMD (r = 0.84 ± 0.02 vs r = 0.81 ± 0.03). Gaussian occlusion sensitivity (GOS) enabled heat-map visualizations showing model activations in both the femur and adjacent soft tissues. Hsieh and colleagues advanced CNN applications for BMD prediction with a multi-stage approach.11 First, a deep adaptive graph CNN was used for precise anatomical landmark detection and ROI extraction of the LS vertebrae and hip, serving as an opportunistic vertebral fracture (VF) detection tool with 93.2% sensitivity and 91.5% specificity on independent cases. Then, PelviXNet CNN architectures identified cases with structural bone changes, including vertebral compression fractures (VCFs), hip fractures (HFs), and surgical implants. Finally, ResNet and VGG architectures, each augmented with transfer learning, performed BMD prediction. ResNet-18 and ResNet-34 provided the best results for hip BMD (r = 0.917), while VGG-16 performed best for spine BMD (r = 0.900). The robust performance across stages and successful EV highlighted the multi-stage approach’s potential for opportunistic BMD screening. Other pre-processing approaches for hip BMD prediction involved Sobel gradient filtering applied to 3 different Singh ROIs of 160 × 160 pixels.14 The Sobel gradient filtering enhanced trabecular patterns, significantly improving the DXA BMD correlation from 0.308 to 0.723. An ensemble of VGG-nets, each trained on one specific Singh ROI, aggregated the BMD predictions, improving the correlation (r = 0.807).
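To illustrate this kind of gradient-based pre-processing, the sketch below applies a Sobel filter to a grayscale hip ROI and stacks the raw and filtered images as CNN input channels. It is a minimal example assuming scikit-image is available; the file name, ROI size, and channel stacking are illustrative choices rather than the published pipeline.

```python
import numpy as np
from skimage import io, filters, exposure
from skimage.transform import resize

def preprocess_hip_roi(path, size=(160, 160)):
    """Illustrative pre-processing: Sobel enhancement of a grayscale hip ROI."""
    roi = io.imread(path, as_gray=True).astype(np.float32)   # grayscale ROI, e.g. a Singh region
    roi = resize(roi, size, anti_aliasing=True)              # match the 160 x 160 input size
    grad = filters.sobel(roi)                                # gradient magnitude emphasizes trabecular edges
    grad = exposure.rescale_intensity(grad, out_range=(0.0, 1.0))
    return np.stack([roi, grad], axis=-1)                    # stack raw + filtered channels for the CNN

# x = preprocess_hip_roi("hip_roi.png")   # hypothetical file; returns an array of shape (160, 160, 2)
```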
BMD prediction was also achieved from CT using CNNs with image slices12 or ML regression models using image-derived features such as HU and/or radiomics.9,13,19 BMD prediction from CT image slices using a ResNet-101v2 architecture showed a correlation of 0.905 with DXA BMD when including surrounding soft tissues, compared to 0.878 when these tissues were removed, corroborating previous findings for hip BMD prediction from X-rays.10
BMD change and response to treatment were investigated using multiple ML algorithms with various input features, including demographic data, diagnoses, laboratory results, medications, and initial BMD measurements.17 Nissinen et al. developed and evaluated different CNN architectures for detecting pathological features from DXA images, including severe scoliosis and unreliable BMD measurements due to structural abnormalities.15 Xiao et al. used DXA images to predict the apparent stiffness tensor of cadaveric trabecular bone cubes.18 Their model was trained with micro-CT-based finite element simulations as ground truth. Finally, Zhang et al. evaluated a support vector machine (SVM) regressor to predict femoral strength in elderly men from QCT-based finite element analysis.19
Six studies (54%) involved baseline models for comparison. Incorporating clinical variables such as age, height, and weight into CNN architectures improved hip BMD prediction from X-rays, increasing the correlation from 0.766 to 0.807.14 An image-based CNN performed equivalently to regression models based on histomorphometry and bone volume fraction tensor parameters for predicting the apparent stiffness tensor of trabecular bone cubes.18 Zhang and colleagues compared the performance of support vector regression models using different sets of input features and dimensionality reduction for predicting femoral strength from QCT.19 They obtained the best results by reducing the dimensionality from 46 radiomic features to 12 components with principal component analysis (PCA),107 keeping more than 95% of the explained variance. Hsieh et al. compared FRAX performance using DL-predicted BMD from X-ray or DXA BMD, finding no significant differences.11 Nissinen et al. compared custom CNN architectures with classical CNN classifiers for predicting scoliosis and unreliability in BMD measurements from LS DXA scans.15 Hyperparameter tuning was automated with random search and Hyperband. Their model outperformed radiologists in scoliosis detection (94.1% vs 92.5%) and in detecting unreliable BMD measurements (82.4% vs 78.8%). Finally, Dai et al. compared the performance of common ML algorithms against a 2-tier stacking ensemble model to predict spine BMD from CT images. The stacking ensemble model yielded superior performance, with a correlation with DXA BMD of 0.932 and a calibration bias of −0.01 ± 0.14 mg/cm2.9
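The dimensionality-reduction step described above can be sketched as follows: a PCA retaining at least 95% of the explained variance feeding a support vector regressor. The arrays are synthetic stand-ins for the radiomic features and strength values, not the authors' data or implementation.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVR

# X: (n_samples, 46) radiomic features; y: femoral strength (placeholder data for illustration)
rng = np.random.default_rng(0)
X, y = rng.normal(size=(80, 46)), rng.normal(size=80)

# PCA(n_components=0.95) keeps the smallest number of components explaining >= 95% of the variance
model = make_pipeline(StandardScaler(), PCA(n_components=0.95), SVR(kernel="rbf"))
model.fit(X, y)
print(model.named_steps["pca"].n_components_, "components retained")
```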
Image-based features or variable contributions were investigated in 6 studies (54%). Heat maps were generated using GOS maps,10 gradient-weighted class activation mapping (Grad-CAM),12,15 or vanilla gradient descent.15 Innovative strategies enabled heat-map visualizations from a regressor CNN instead of a classifier.12 Feature engineering strategies included PCA and Least Absolute Shrinkage and Selection Operator (LASSO)108 for dimensionality reduction,9,19 and Shapley Additive explanations (SHAP)109 for ranking variable importance and contribution.17 The LASSO model efficiently reduced the dimensionality of CT radiomics from 1218 features to 11 for predicting spine BMD.
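A minimal sketch of LASSO-based feature selection of the kind reported above is shown here, assuming scikit-learn and synthetic stand-ins for the CT radiomics and DXA BMD values; the fold count and penalty search are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LassoCV
from sklearn.feature_selection import SelectFromModel

# X: (n_samples, n_radiomics) CT radiomic features; y: DXA spine BMD (synthetic stand-ins)
rng = np.random.default_rng(0)
X, y = rng.normal(size=(245, 1218)), rng.normal(size=245)

X_std = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=10, max_iter=50_000).fit(X_std, y)   # 10-fold CV chooses the penalty strength
selector = SelectFromModel(lasso, prefit=True)          # keep features with non-zero coefficients
X_reduced = selector.transform(X_std)
print("kept", X_reduced.shape[1], "of", X.shape[1], "features")
```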
Studies on osteoporosis classification
Twenty-six studies on osteoporosis classification were included; their characteristics are summarized in Table 2, and quality assessment in Table S5. The objective of these studies was to furnish novel screening tools for opportunistic osteoporosis identification. The ground truth for osteoporosis diagnosis was DXA, using LS,20–27,30,34–37,41,45 hip,21,24–26,28,34–36,41,44 forearm,22,31,38 or the minimum BMD T-score of these regions.33,42,43 Input modalities included databases,20,21,23–26,30,33–37,40–45 X-ray images,27,28 QUS signals,22,31 or the combination of images and patient characteristics.32,41,44 Image-based osteoporosis classification involved CNNs in every study, with common architectures,23,27,28,32 multi-stage approaches,31 or ensemble models.22,41,44 Studies without image input used ML models with numerical features derived from CT,20,23,26,35,37,38,45 X-ray image features,43 population-based data and electronic medical records,20,21,24,25,29,33,34,39,40,42 or a combination of CT radiomics and clinical attributes.30 CT bone images and ROIs were obtained from 3D Slicer software,26,35–38 or automatically from AI models including a ResNet CNN to segment vertebral bodies23 and the FDA-approved AI-Rad Companion software.45 Unsupervised learning techniques including Fuzzy C-means and K-means clustering were used as X-ray pre-processing strategies to cluster pixels into trabecular patterns and predict osteoporosis.43 Pre-processing strategies from CT images included the extraction of 12 texture and shape features of vertebral bodies with gray-level co-occurrence matrices and Otsu binarization.30 Interestingly, these texture and shape features contributed more to osteoporosis classification than clinical parameters such as age, age at menopause, or BMI.
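The snippet below sketches how GLCM texture descriptors and an Otsu-based area measure can be extracted from a vertebral-body ROI, in the spirit of the pre-processing described above. It assumes scikit-image ≥ 0.19 (where the functions are named graycomatrix/graycoprops) and an 8-bit grayscale ROI; the chosen distances, angles, and properties are illustrative rather than those of the cited study.

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.feature import graycomatrix, graycoprops   # scikit-image >= 0.19

def vertebral_texture_features(roi):
    """Illustrative GLCM texture descriptors from a vertebral-body ROI (uint8 grayscale)."""
    mask = roi > threshold_otsu(roi)                     # Otsu binarization separates bone from background
    glcm = graycomatrix(roi, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    feats = {p: graycoprops(glcm, p).mean() for p in
             ("contrast", "homogeneity", "energy", "correlation")}
    feats["bone_area_fraction"] = mask.mean()            # simple shape/area descriptor
    return feats

# roi = ...  # uint8 array cropped around a vertebral body
# features = vertebral_texture_features(roi)
```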
Table 2.
Reference | Country (population) | Task | Modality | Data amount (%OP) | Data amount external validation (%OP) | Inputs (no.) | Model | CV train / validation / test | K-fold cross validation | Evaluation metrics | Best result | Quality score (max 12) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Biamonte et al.20 | Italy | Spine OP | Database | 240 (24%) | 20 | SVM | 75% train, 25% test | Accuracy, sensitivity, specificity, AUC | AUC: 0.79 | 6 | ||
Bui et al.21 | Vietnam | LS + Hip OP | Database | 1951 (28,9%) | 15 | Logreg, SVM, RF, MLP | 80% train, 20% test | 5-fold | AUC, precision, recall, F1-score | AUC: 0.85 | 9 | |
Chen et al.22 | China | Radius OP | QUS | 114 (29%) | 4 | CNN, RF | 75% train, 12% validation, 13% test | 5-fold | Accuracy, sensitivity, specificity, Kappa, AUROC, AUC | AUC: 0.80 | 9 | |
Chen et al.23 | Taiwan | LS OP | Database | 197 (51%) | 396 (1.3%) | 44 | ResNet50 + SVM | 5-fold CV | 5-fold | AUC, accuracy, sensitivity, specificity, PPV, NPV | AUC: 0.98 | 7 |
Erjiang et al.24 | Ireland | LS + Hip OP | Database | 13 577 (18.1%) | 30 | CatBoost, XGB, MLP, bagged flexible discriminant analysis (BFDa), RF, Logreg, SVM | Stratified, 80% train, 20% validation | AUC | AUC: 0.83 | 9 | ||
Fasihi et al.25 | Iran | LS + Hip OP | Database | 1224 (18%) | 19 | ET, RF, KNN, SVM, GB, extra trees (ET), AdaBoost, MLP | 80% train, 20% test | AUC, accuracy, precision, sensitivity, specificity, F-score | AUC: 0.95 | 5 | ||
Huang et al.26 | China | LS + Hip OP | Database | 172 (47.7 %) | 826 | GNB, RF, Logreg, SVM, GBM, XGB | 60% train, 40% test | 5-fold | AUC, sensitivity, specificity, accuracy | AUC: 0.86 | 8 | |
Jang et al.27 | South Korea | Spine OP | X-ray | 13 026 (33%) | 1089 (29%) | IMG | Inception-v3 | 70% train, 10% validation, 20% test | AUC, accuracy, sensitivity, specificity | AUC: 0.88 | 10 | |
Jang et al.28 | South Korea | Hip OP | X-ray | 1001 (50.3%) | 117 (73.5%) | IMG | VGG16 + NLNN | Stratified, 80% train, 10% validation, 10% test | Accuracy, sensitivity, specificity, AUC, PPV, NPV | AUC: 0.87 | 8 | |
Kwon et al.29 | South Korea | LS + Hip OP | Database | 1431 (57%) | 1151 | RF, AdaBoost, GBM | 80% train, 20% test | 5-fold | AUC | AUC: 0.91 | 8 | |
Liu et al.30 | China | Spine OP | Database | 246 | 28 | Logreg, SVM, MLP, RF, XGBoost, Stacking | 80% train, 20% test | 10-fold | AUC | AUC: 0.96 | 7 | |
Luo et al.31 | China | Radius OP | QUS | 274 (34%) | 4 | CNN | 70% train, 15% validation, 15% test | 5-fold | Accuracy, sensitivity, specificity, PPV, NPV, Kappa | Accuracy: 0.83 | 9 | |
Mao et al.32 | China | LS OP | X-ray, database | 5652 (33%) | 628 (34%) | IMG (+3) | DenseNet | 80% train, 10% validation, 10% test | AUC, sensitivity, specificity, PPV, NPV | AUC: 0.95 | 10 | |
Ou Yang et al.33 | Taiwan | Any OP | Database | 5982 (7%) | 19 | MLP, SVM, RF, KNN, Logreg | 65% train, 15% validation, 20% test | AUC, sensitivity, specificity, Youden’s index | AUC: 0.84 | 8 | ||
Park et al.34 | South Korea | LS + Hip OP | Database | 3309 (9.2%) | 20 | Logreg, XGB, MLP | 70% train, 30% validation | 10-fold | AUROC, AUPRC, sensitivity, specificity | AUC: 0.79 | 10 | |
Sebro et al.35 | USA | LS + Hip OP | Database | 253 (74%) | 10 | RF, XGBoost, NB, SVM | 70% train, 30% test | 10-fold | AUC, accuracy, sensitivity, specificity, PPV, NPV | AUC: 0.756 | 7 | |
Sebro et al.36 | USA | LS + Hip OP | Database | 394 (78%) | 15 | Logreg, LASSO, SVM | 70% train, 30% test | 10-fold | AUC, accuracy, sensitivity, specificity, PPV, NPV | AUC: 0.89 | 7 | |
Sebro et al.37 | USA | LS + Hip OP | Database | 364 (22.3%) | 15 | Lasso, elastic net, ridge regression, SVM with RBF | 80% train, 20% test | 10-fold | AUC, accuracy, sensitivity, specificity, PPV, NPV | AUC: 0.86 | 7 | |
Sebro et al.38 | USA | Forearm OP | Database | 196 (27.6%) | 24 | SVM, RF | 49% train, 51% test | 10-fold | AUC, sensitivity, specificity, accuracy, PPV, NPV | AUC: 0.82 | 7 | |
Shen et al.39 | USA | Any OP | Database | 211 (23.2%) | 178 | ZeroR, OneR, J48 tree, RF, KNN, Logreg, SVM, NB | 80% train, 20% test | 10-fold | Accuracy, sensitivity, specificity, AUC | AUC: 0.84 | 7 | |
Suh et al.40 | South Korea | Hip OP | Database | 8680 (3.1%) and 8274 (3.4%) | 89 and 162 | SVM, DT, ET, LGBM, Logreg, KNN, MLP | 5-fold CV | 5-fold | AUC | AUC: 0.92 | 7 | |
Sukegawa et al.41 | Japan | LS + Hip OP | X-ray, database | 778 (30%) | IMG + 3 | EfficientNet-b0, EfficientNet-b3, EfficientNet-b7, ResNet-18, ResNet-50, ResNet-152, Ensemble | 80% train, 20% test | 5-fold | Accuracy, AUC, precision, recall, specificity, F1 score | AUC: 0.92 | 11 | |
Wang et al.42 | China | Any OP | Database | 1419 (32%) | 18 | MLP, deep belief network (DBN), SVM, GA-DT | 70% train, 30% test | Accuracy, Sensitivity, Specificity, AUC | AUC: 0.90 | 6 | ||
Widyaningrum et al.43 | Indonesia | Any OP | Database | 102 (49%) | 15 | DT, NB, MLP | 60% train, 40% test | Accuracy, sensitivity, specificity | Accuracy: 0.90 | 6 | ||
Yamamoto et al.44 | Japan | Hip OP | X-ray, database | 1699 (53%) | IMG + 3 | ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152, ensemble | stratified,75% train, 25% test | 4-fold | Accuracy, precision, recall, specificity, F1 Score, AUC | AUC: 0.89 | 9 | |
Yang et al.45 | China | LS OP | Database | 1046 (28%) | 41 | Logreg | NA | NA | AUC, sensitivity, specificity | AUC: 0.97 | 5 |
Abbreviations: OP, osteoporosis; IMG, image; Logreg, logistic regression; XGB, extreme gradient boosting; MLP, multilayer perceptron; RF, random forests; SVM, support vector machines; CNN, convolutional neural network; GBM, Gradient Boosting Machine; KNN, k-nearest neighbors; GNB, Gaussian naïve Bayes; LASSO, least absolute shrinkage and selection operator; NB, Naïve Bayes; DT, decision tree; ET, extra trees, LGBM, light gradient boosting machine; RBF, radial basis functions; GA-DT, genetic algorithm - decision tree; ET, extra trees; GB, gradient boosting; AUC, area under the curve of the receiver operating characteristic; NPV, negative predictive value; PPV, positive predictive value; NA, not available. If several models were evaluated for a given task, the best performing is highlighted in bold.
The best performances were seen with boosting algorithms,22,24–26,34 SVM,20,33,35–38 RF,21,39 or multilayer perceptron (MLP).28,40,42,43 Ten (38%) studies assessed the efficiency of osteoporosis screening models using a baseline model or tool for comparison, such as the osteoporosis risk assessment index, osteoporosis self-assessment tool, osteoporosis self-assessment tool for Asians, osteoporosis index of risk, or simple calculated osteoporosis risk estimation.21,24,31–34,40,41,44 Consistently, ML models including XGB,24,34 SVM,33 MLP,40 or RF21 outperformed these tools. The synergistic effect of clinical variables with image features extracted from CNNs showed improved osteoporosis classification.22,31,32,41,44
Feature relationships and image regions contributing to osteoporosis classification were explored in several studies.20,22,23,26–28,34,39–41,45 Feature engineering strategies were mainly used to gain insights into the most relevant predictors and reduce dimensionality. These strategies included feature importance analysis,20,21,23,30,39 PCA,22 Local Interpretable Model-Agnostic Explanations (LIME),40 LASSO,26,40 SHAP,34 or odds ratios (ORs).45 The SHAP and LIME methods were used to visually rank variable importance.34,40 Grad-CAM analysis was used to visualize activated regions from input images.27,28,41
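For readers unfamiliar with Grad-CAM, the sketch below shows one common way to implement it in PyTorch by hooking the activations and gradients of the last convolutional block of a ResNet-18. It is a generic illustration of the technique, not the code used in the cited studies, and the model and class index are placeholders.

```python
import torch
import torch.nn.functional as F
from torchvision import models

class GradCAM:
    """Grad-CAM: weight the feature maps of a target layer by their pooled gradients."""
    def __init__(self, model, target_layer):
        self.model = model.eval()
        self.activations, self.gradients = None, None
        target_layer.register_forward_hook(self._save_activation)
        target_layer.register_full_backward_hook(self._save_gradient)

    def _save_activation(self, module, inp, out):
        self.activations = out.detach()

    def _save_gradient(self, module, grad_in, grad_out):
        self.gradients = grad_out[0].detach()

    def __call__(self, x, class_idx):
        logits = self.model(x)
        self.model.zero_grad()
        logits[0, class_idx].backward()
        weights = self.gradients.mean(dim=(2, 3), keepdim=True)          # global-average-pooled gradients
        cam = F.relu((weights * self.activations).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
        return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)        # normalized heat map

model = models.resnet18(weights=None, num_classes=2)                      # hypothetical osteoporosis classifier
cam = GradCAM(model, model.layer4[-1])
heatmap = cam(torch.randn(1, 3, 224, 224), class_idx=1)                   # (1, 1, 224, 224) activation map
```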
Studies on fracture detection/classification
Twenty-five studies investigated fracture detection or classification; their main characteristics are shown in Table 3, and their quality assessment in Table S6. Among them, 20% investigated HF,46,48,60,63,64 68% VF,47,49,50,52,53,56–59,61,62,65,66,68–71 and 12% multiple fracture sites.54,55,67 Studies used various medical image modalities as input: X-ray (48%),46–49,52,54,60,61,63–65,67 MRI (28%),50,53,57,62,66,68,69 CT (24%),55–57,62,70,71 DXA (8%),58,59 and both CT and MRI (8%).57,62
Table 3.
Reference | Country (population) | Task | Modality | Data amount = n (slices) (% fractures) | Data amount external validation (% fractures) | Inputs (no.) | Model | CV train/validation/test | K-fold cross validation | Evaluation metrics | Best result | Quality score (max 12) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Bae et al.46 | South Korea | HF | X-ray | 4189 (26%) | 2099 (25%) | IMG | ResNet-18 | 80% train, 10% validation, 10% test | Sensitivity, specificity, AUC, Youden index | AUC 0.98 | 9 | |
Chen et al.47 | China | VF | X-ray | 3754 (50%) | IMG | ResNeSt-50 18 | 67% train, 33% test | 5-fold | Accuracy, sensitivity, AUC | AUC: 0.80 | 11 | |
Cheng et al.48 | Taiwan | HF | X-ray | 7092 (61%) | IMG | DenseNet-169 | 75% train, 25% test | 5-fold | Accuracy, sensitivity, specificity, AUC, PPV, NPV | AUC: 0.97 | 10 | |
Chou et al.49 | Taiwan | VF | X-ray | 7459 (15%) | 1281 (8%) | IMG patches | YOLOv3, ResNet-34, DenseNet121, DenseNet201, ensemble | 60% train, 20% validation, 20% test | Accuracy, sensitivity, specificity | Accuracy: 0.92 | 8 | |
Del Lama et al.50 | Brazil | VF | MRI | 189 (53%) | IMG patches + features51 | VGG16, InceptionV3, Xception, hybrid | 89% train, 11% test | 10-fold | Precision, recall, f1-score, support, specificity, sensitivity, balanced accuracy | Balanced accuracy: 0.88 | 9 | |
Dong et al.52 | USA | VF | X-ray | 100 409 (1.2%) | IMG patches3 | GoogLeNet | 76.5% train, 8.5% validation, 15% test | AUC, AUC-PR | AUC: 0.99 | 10 | ||
Germann et al.53 | Switzerland | VF | MRI | 1000 (23.8%) | IMG | U-Net | 79% train, 9.8% validation, 9.8% test. 1.4% development | Accuracy, sensitivity, specificity, dice, ICC, Kappa | Accuracy: 0.96 | 10 | ||
Guermazi et al.54 | USA | Multiple | X-ray | 60 170 (NR) | 480 (50%) | IMG | Detectron2 | 70% train, 10% validation, 20% test | Sensitivity, specificity, AUC | AUC: 0.93 | 9 | |
Inoue et al.55 | Japan | Multiple | CT | 200 = 7664 (5.8%) | IMG | Faster R-CNN | 90% train, 10% test | Sensitivity, precision, F1-score | Sensitivity: 0.79 | 7 | ||
Li et al.56 | China | VF | CT | 433 (68%) | IMG | ResNet-50 | 10-fold CV | 10-fold | Sensitivity, specificity, accuracy | Accuracy: 0.85 | 7 | |
Li et al.57 | Taiwan | VF | CT, MRI | 941 (17%) | 52 (27%) | IMG | ResNet-34, ResNet-50, DenseNet121, DenseNet160, DenseNet201, ensemble | 60% train, 20% validation, 20% test | Accuracy, sensitivity, specificity, AUC, Cohen’s Kappa | Accuracy: 0.89 | 8 | |
Monchka et al.58 | Canada | VF | DXA | 31 152 (15.5%) | IMG | Inception-ResNet-v2 | 71% train, 1% validation, 28% test | Balanced accuracy, sensitivity, specificity, PPV, NPV, F1-score | Balanced accuracy: 0.95 | 8 | ||
Monchka et al.59 | Canada | VF | DXA | 12 742 (17%) | IMG | Inception-ResNet-v2, DenseNet, ensemble | 60% train, 10% validation, 30% test | Accuracy, balanced accuracy, sensitivity, specificity, PPV, NPV, F1-score, AUC | AUC: 0.95 | 10 | ||
Murphy et al.60 | UK | HF | X-ray | 3659 (56%) | IMG | GoogLeNet | 60% train, 20% validation, 20% test | Accuracy, Cohen’s Kappa, AUC, F1-score | Accuracy: 0.92 | 10 | ||
Ozkaya et al.61 | Turkey | VF | X-ray | 390 (49%) | IMG | ResNet-50 | 52% train, 22% validation, 26% test | Sensitivity, specificity, AUC | AUC 0.84 | 5 | ||
Rosenberg et al.62 | Switzerland | VF | CT, MRI | 222 = 630 (48%) | IMG patches | VGG16, ResNet18 | 90% train, 10% test | 10-fold | Accuracy, sensitivity, specificity, NPV, AUC | Accuracy: 0.88 | 7 | |
Sato et al.63 | Japan | HF | X-ray | 10 484 (50%) | IMG3 | EfficientNet-B4 | 80% train, 10% validation, 10% test | Accuracy, sensitivity, specificity, F1-score, AUC | Accuracy: 0.96 | 9 | ||
Twinprai et al.64 | Thailand | HF | X-ray | 1000 (50%) | IMG | Yolov4-tiny | 90% train, 10% test | Accuracy, precision, sensitivity, specificity, F1-score | Accuracy: 0.95 | 8 | ||
Xu et al.65 | China | VF | X-ray | 1460 = 2031 (100%) | 444 = 578 | IMG | ResNet-18 | 80% train, 20% test | Accuracy, sensitivity, specificity, AUC, precision, F1-score, PPV, NPV | Accuracy: 0.83 | 11 | |
Yabu et al.66 | Japan | VF | MRI | 1624 (60%) | IMG | VGG-16, VGG-19, DenseNet-121, DenseNet-169, DenseNet-201, InceptionResNet-V2, Inception-V3, ResNet-50, Xception, ensemble | 60% train, 40% test | Accuracy, sensitivity, specificity, AUC | AUC: 0.95 | 5 | ||
Yadav et al.67 | India | Multiple | X-ray | 34 000 augmented (50%) | IMG | AlexNet, VGG16, ResNeXt, MobileNetV2, SFNet | 80% train, 20% test | Precision, recall, f1-score, accuracy | Accuracy: 0.99 | 7 | ||
Yeh et al.68 | Taiwan | VF | MRI | 190 (26%) | IMG3 | ResNet-50 | 10-fold CV | 10-fold | Accuracy | Accuracy: 0.92 | 9 | |
Yoda et al.69 | China | VF | MRI | 112 = 697 (48%) | IMG | Xception | 5-fold CV | 5-fold | Accuracy, sensitivity, specificity, AUC | AUC: 0.98 | 8 | |
Zakharov et al.70 | Russia | VF | CT | 100 = 3565 (21%) | 300 and 100 (50%) | IMG | Custom CNN | 5-fold CV | 5-fold | Accuracy, precision, recall | AUC: 0.95 | 8 |
Zhang et al.71 | China | VF | CT | 1217 (96%) | IMG | U-GCN, 3DResNet | 70% train, 10% validation, 20% test | Accuracy, balanced accuracy, sensitivity, specificity, AUC | Accuracy: 0.98 (detection), balanced accuracy: 0.80 (classification) | 6 |
Abbreviations: HF, hip fracture; VF, vertebral fracture; MRI, magnetic resonance imaging; DXA, dual-energy X-ray absorptiometry; CT, computed tomography; IMG, image; CNN, convolutional neural network; AUC, area under the curve of the receiver operating characteristic; AUC-PR, area under the precision-recall curve; PPV, positive predictive value; NPV, negative predictive value; ICC, intra-class correlation coefficient. If several models were evaluated for a given task, the best performing is highlighted in bold.
All studies clearly stated the research task, and 19 (76%) reported data characteristics.46–49,53,54,56–60,63,64,66,68,69,71 Fracture prevalence varied from 1.2% to 100%, reflecting the different purposes of fracture detection and fracture classification.
All fracture detection and classification studies involved CNN architectures. Input images were used in their entirety or pre-processed to extract specific ROIs. ROIs were obtained from manual annotations by specialists,47,50,52,56,61–63,65,69 or automatically extracted using DL models including You Only Look Once (YOLO),49,57,64 U-net architectures,53,71 3D U-Net for CT,70 or a custom CNN.60 Studies that used the whole image as input mainly focused on detecting the presence of a fracture (yes/no classification) rather than classifying its type.46,48,54,55,58,59 In this context, heatmaps or bounding boxes highlighted regions suggesting fracture(s).46,48,58,59
Seventeen studies (68%) used transfer learning to leverage pre-trained models from previous datasets.47–50,52,57,60–67,69 Li et al. demonstrated improved accuracy, from 0.71 to 0.92, when using pre-trained model weights from ImageNet.57 Data augmentation techniques were used in 76% of the studies to protect against overfitting.46–48,50,54,56,58–60,62–66,68 Techniques included random shifts, rotations, scaling, contrast and brightness enhancement, or mosaic augmentation.47 Bae et al. found that model area under the receiver operating characteristic curve (AUC) increased from 0.88 to 0.99 when using data augmentation.46 However, Del Lama et al. reported a decrease in most metrics and explained that the augmentations did not accurately reflect characteristics in the original images.50
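The snippet below sketches the combination of transfer learning and data augmentation described above, using a torchvision ResNet-18 initialized with ImageNet weights (torchvision ≥ 0.13 API) and a few representative augmentations; the data loader, learning rate, and class count are placeholders rather than settings from any cited study.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Augmentations of the kind reported above (shifts, rotations, scaling, brightness/contrast jitter)
train_tf = transforms.Compose([
    transforms.RandomAffine(degrees=10, translate=(0.05, 0.05), scale=(0.9, 1.1)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Transfer learning: start from ImageNet weights and replace the final layer (fracture vs no fracture)
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
# for images, labels in train_loader:        # train_loader is assumed to yield augmented batches
#     optimizer.zero_grad()
#     loss = criterion(model(images), labels)
#     loss.backward()
#     optimizer.step()
```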
Nineteen studies (76%) involved comparison models or methods.46,48–50,54,56–61,63,65–69 Among these, 10 (40%) reported equivalent or improved performances using the AI models compared with specialist readings.53–55,60,63–66,69,70 Xu et al.65 reported that AUCs for VF classification as acute, chronic, or pathological in an external dataset were significantly higher than those of the trainee radiologist, similar to those of the competent radiologist, and only slightly lower than those of the expert radiologist. All 3 levels of radiological expertise improved when assisted by the DL model.
Only 11 studies (44%) justified the specific metrics used.46,47,54,60,62,64,65,68,70 For example, Dong et al. prioritized sensitivity, positive predictive value (PPV), and precision-recall curve over AUC due to class imbalance.52 Most studies (72%) investigated model decision-making processes using heat maps,46,47,57–60,63 sensitivity analysis,49,50,54,69 or error analysis.52,53,68
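The contrast between ROC AUC and the area under the precision-recall curve at low fracture prevalence can be illustrated with scikit-learn as follows; the labels and scores are synthetic and only meant to show the two metric calls.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
y_true = rng.binomial(1, 0.012, size=10_000)                  # ~1.2% fracture prevalence, as above
y_score = np.clip(0.3 * y_true + rng.normal(0.1, 0.2, 10_000), 0, 1)

print("ROC AUC:", round(roc_auc_score(y_true, y_score), 3))
print("AUPRC  :", round(average_precision_score(y_true, y_score), 3))  # more informative at low prevalence
```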
Studies on risk prediction
Twenty-four studies investigated prediction of osteoporosis outcomes such as incident VF,73,88,90,93–95 HF or related complications,72,78–81,83,84,89,91,92 major osteoporotic fracture (MOF),76,77,87 or BMD loss.75 Study characteristics are summarized in Table 4, and their quality assessments are in Table S7.
Table 4.
Reference | Country (population) | Task (prediction time) | Modality | Data amount (% cases) | Data amount external validation (% cases) | Inputs (no.) | Model | CV train / validation / test | K-fold cross validation | Evaluation metrics | Best result | Quality score (max 12) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Cary et al.72 | USA | HF Mortality (30 d, 1 yr) | Database | 17 140 (15%) | 15 | Logreg, MLP | 10-fold CV | 10-fold | Accuracy, AUC, precision, slope | AUC: 0.76 | 8 | |
Chen et al.73 | China | VF (5 yr) | Database | 1603 (8%) | 147 | Logreg, SVM, DT, KNN, RF, ERT, GBDT, AdaBoost, CatBoost, XGB, MLP, hybrid | 80% train, 20% test | NR | Accuracy, precision, recall, F1-score, AUC | Accuracy: 0.90 | 9 | |
Chen et al.74 | China | MOF (6 yr) | Database | 487 (NA) | 22 | RF, MLP, SVM, XGB, DT | 70% train, 30% test | 10-fold | AUC, DCA, CIC | AUC: 0.87 | 9 | |
Cheng et al.75 | Taiwan | BMD Loss (6 yr) | Database | 23 497 (14%) | 17 | Logreg, XGB, RF, SVM | 80% train, 20% test | 10-fold | Sensitivity, specificity, AUC, accuracy, precision, f1-score | AUC: 0.75 | 7 | |
Coco Martín et al.76 | Spain | MOF (> 1 yr) | Database | 993 (28%) | 25 | Logreg, MLP | 70% train, 30% test | Accuracy | Accuracy: 0.96 | 7 | ||
De Vries et al.77 | Netherlands | MOF (3 yr, 5 yr) | Database | 7578 (11%) | 46 | Coxreg, RSF, MLP | 10-fold CV | 10-fold | C-index | C-Index: 0.70 | 11 | |
DeBaun et al.78 | USA | HF Mortality (30 d) | Database | 19 835 (1051) | 43 | MLP, naive Bayes, Logreg | 80% train, 20% test | AUC | AUC: 0.92 | 4 | ||
Du et al.79 | China | HF | Database | 120 (NA) | 13 | R2U-Net and SVM, RF, GBDT, AdaBoost, MLP, XGB | 80% train, 20% test | Accuracy, specificity, recall, precision | Accuracy: 0.96 | 5 | ||
Forssten et al.80 | Sweden | HF Mortality (1 yr) | Database | 124 707 (17%) | 25 | Logref, SVM, RF, NB | 80%train, 20% test | 5-fold | Accuracy, sensitivity, specificity, AUC | AUC: 0.74 | 11 | |
Galassi et al.81 | Spain | HF | Database | 137 (65%) | 38 | Logreg, SVM, DT, RF | 70% train, 30% test | Sensitivity, specificity, accuracy | Accuracy: 0.87 | 6 | ||
Harris et al.82 | USA | HF Mortality (30 d) | Database | 82 168 (5%) | 46 | LASSO | 10-fold | 10-fold | Accuracy, C-Index | Accuracy: 0.76 | 8 | |
Kitcharanant et al.83 | Thailand | HF Mortality (1 yr) | Database | 492 (13%) | 15 | GB, RF, MLP, Logreg, NB, SVM, KNN | 70% train, 30% test | Accuracy, sensitivity, specificity, AUC, PPV, NPV | AUC: 0.99 | 10 | ||
Klemt et al.84 | USA | HF Revision Surgery (> 2 yr) | Database | 350 (5.2%) | NR | MLP, RF, KNN, PLR | 80% train, 20% test | 5-fold | AUC, intercept, calibration, Brier score | AUC: 0.81 | 6 | |
Kong et al.85 | South Korea | VF | Database, X-ray | 1595 (7.5%) | IMG patches (+7) | HRNet + ResNet and DeepSurv, Coxreg | 89% train, 11% test | 5-fold | AUC, sensitivity, specificity, PPV, NPV, C-Index | C-Index: 0.61 | 11 | |
Lei et al.86 | China | HF Mortality | Database | 391 (13.8%) | 165 (10.9%) | 27 | RF, GBM, DT, XGB | 67% train, 33% test | 10-fold | AUC, accuracy, sensitivity, specificity, Youden Index, intercept, calibration slope | AUC: 0.71 | 7 |
Lu et al.87 | UK | MOF | Database | 345 (28%) | 359 | Logreg, RF | 80% train, 20% test | 5-fold | AUC, sensitivity, specificity | AUC: 0.90 | 7 | |
Ma et al.88 | China | VF | Database | 529 (10.6%) | 27 | DT, RF, SVM, GBM, MLP, RDA, Logreg | 75% train, 25% test | 10-fold | AUC, Kappa, sensitivity, specificity | AUC: 0.94 | 9 | |
Oosterhoff et al.89 | Netherlands | HF Mortality (90 d, 2 yr) | Database | 2478 (9.1% and 23.5%) | 14 | SGB, RF, SVM, MLP, PLR | 80% train, 20% test | 10-fold | AUC, intercept, calibration, Brier score | AUC: 0.74 (90 d), AUC: 0.70 (2 yr) | 9 | |
Poullain et al.90 | France | VF | Database | 60 (50%) | 16 | RF, CART | k-fold | k-fold | Sensitivity, specificity, AUC | AUC: 0.92 | 5 | |
Shimizu et al.91 | Japan | HF, FF (> 2 yr) | Database | 6590 (4.4%) | 10 | LightGBM, ANN | 75% train, 25% test | AUC | AUC: 0.75 | 4 | ||
Shtar et al.92 | Israel | HF Rehabilitation (8 yr) | Database | 1896 (14%) | 18 | Linreg, Logreg, AdaBoost, CatBoost, ExtraTrees, KNN, RF, SVM, XGB, ensemble | NR | 10-fold | AUC, R2 | AUC: 0.86 | 10 | |
Takahashi et al.93 | Japan | VF (Nonunion) | Database | 153 (17%) | 17 | Logreg, DT, XGB, RF | 70% train, 30% test | 5-fold | AUC, accuracy | AUC: 0.86 | 9 | |
Ulivieri et al.94 | Italy | VF (9 yr) | Database | 174 (69) | 9 | MLP | 70% train, 30% test | Sensitivity, specificity, AUC, accuracy | AUC: 0.82 | 5 | ||
Ulivieri et al.95 | Italy | VF (3 yr) | Database | 172 (54%) | 26 | MLP | NR | Sensitivity, specificity, accuracy, AUC | Accuracy: 0.79 | 6 |
Abbreviations: VF, vertebral fracture; HF, hip fracture; FF, forearm fracture; MOF, major osteoporotic fracture; BMD, bone mineral density; NA, non assessable; MLP, multilayer perceptron; Logreg, logistic regression; GB, gradient boosting; RF, random forests; NB, Naïve Bayes; SVM, support vector machines; KNN, k-nearest neighbors; SGB, stochastic gradient boosting; PLR, Elastic-Net Penalized Logistic Regression; DT, decision tree; Linreg, linear regression; XGB, extreme gradient boosting; Coxreg, Cox regression; RSF, random survival forests; ERT, extremely randomized trees; GBDT, gradient boosted decision trees; GBM, gradient boosting machines; RDA, regularized discriminant analysis; CART, classification and regression tree; LASSO, least absolute shrinkage and selection operator; AUC, area under the curve of the receiver operating characteristic; PPV, positive predictive value; NPV, negative predictive value; C-Index, concordance index; R2, coefficient of determination; DCA, decision curve analysis; CIC, clinical impact curve. If several models were evaluated for a given task, the best performing is highlighted in bold.
Risk prediction studies developed ML and DL models trained from various input data modalities. Studies using non-image-based features trained ML models with retrospective data from large medical registries including demographic, clinical, or laboratory data to predict future events.72,74,75,78–80,83,84,86,88,89,91,92 Other studies developed models that combined clinical parameters and bone measurements or image features. Bone measurements included BMD, bone strain index, or finite element analysis parameters derived from DXA.74,77,81,94,95 Image features involved radiomics from HR-pQCT, CT, and MRI,87 gray-level co-occurrence matrix (GLCM) features from CT and MRI,90 vertebral-level signal changes across time from MRI,93 and image features from CNNs.85 Combining clinical information with image features showed improved performance for fracture discrimination and prediction.74,85,87 Notably, the LS X-ray image features extracted from a ResNet CNN showed higher performance than FRAX in terms of C-index values for predicting future VFs (0.612; 95% CI, 0.572-0.656 for the DL model vs 0.547 for FRAX).85 Research on optimizing spine X-ray pre-processing for fracture prediction using CNNs revealed that using multiple bounding boxes of L1 to L5 vertebrae including the surrounding soft tissues led to better performance in predicting future fractures. This method outperformed the use of L1 to L5 bounding boxes with the surrounding soft tissues replaced by a black mask, as well as the use of the whole L1-L5 bounding-box image with or without a black mask. These results align with the findings of Ho et al.10 and Kang et al.12 for predicting BMD from pelvis X-rays and suggest that CNNs analyze bone tissue better when the input images include the surrounding soft tissues.
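A concordance-index comparison of the kind reported above can be sketched with the lifelines package as follows; the follow-up times, event indicators, and the two risk scores are synthetic stand-ins for the DL and FRAX outputs, not data from the cited study.

```python
import numpy as np
from lifelines.utils import concordance_index

# Synthetic follow-up data: time to vertebral fracture (yr), event indicator, and two risk scores
rng = np.random.default_rng(0)
n = 500
times = rng.exponential(8, n)
events = rng.binomial(1, 0.3, n)
dl_risk = 0.5 * (times.max() - times) / times.max() + rng.normal(0, 0.2, n)    # hypothetical DL score
frax_risk = 0.3 * (times.max() - times) / times.max() + rng.normal(0, 0.3, n)  # hypothetical FRAX-like score

# Higher risk should mean earlier fracture, so scores are negated for lifelines' convention
print("C-index, DL model :", round(concordance_index(times, -dl_risk, events), 3))
print("C-index, FRAX-like:", round(concordance_index(times, -frax_risk, events), 3))
```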
The best-performing models were diverse, and when tested, ensemble voting algorithms or hybrid ML architectures showed improved performance.73,92 Specifically, a hybrid model concatenating an XGB output with an MLP demonstrated significant improvement in all performance metrics for predicting future fractures.73 Risk prediction studies showed promising results despite heterogeneous quality scores. The selection and justification of performance metrics for model evaluation were often overlooked, with only 13 studies (54%) providing justification.72,74–76,80,83,85,86,88,89,92 Confidence intervals (CIs) were reported in 16 studies (67%).72,77,80–84,87–90,92,93
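A stacking ensemble loosely mirroring the hybrid architectures described above can be assembled with scikit-learn, as in the sketch below; gradient boosting and random forest base learners feed an MLP meta-learner, and the data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1600, n_features=25, weights=[0.9, 0.1], random_state=0)

# Base learners' out-of-fold probabilities feed an MLP meta-learner, loosely mirroring
# the hybrid "boosting output concatenated with an MLP" idea described above.
stack = StackingClassifier(
    estimators=[("gb", GradientBoostingClassifier(random_state=0)),
                ("rf", RandomForestClassifier(n_estimators=200, random_state=0))],
    final_estimator=MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    cv=5, stack_method="predict_proba",
)
print("AUC:", cross_val_score(stack, X, y, cv=5, scoring="roc_auc").mean().round(3))
```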
Most risk prediction studies (75%) proposed techniques to explain model behavior and/or visualize informative variable(s).73–77,80,83,84,87–90,92–95 Methods included semantic connectivity maps from MLPs,94,95 SHAP analysis,83,92 feature importance from decision tree architectures,73,75,77,80,88,93 the Boruta algorithm,110 LASSO analysis,82 and ORs.76,87 MLP models and SHAP analyses provided individual-level risk explanations for the early failure of cementless total hip arthroplasty in osteoporotic patients84 and for mortality after HF.89
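The sketch below shows how SHAP values can be used to produce a patient-level ranking of risk contributors, assuming the shap package and a tree-based classifier trained on synthetic data; the feature names are placeholders for clinical variables such as age, BMD, or comorbidities.

```python
import numpy as np
import shap                                    # pip install shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]   # placeholders for clinical variables

model = GradientBoostingClassifier(random_state=0).fit(X, y)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)         # per-patient, per-feature contributions (log-odds scale)

# Patient-level explanation: the features driving one individual's predicted risk
i = 0
for j in np.argsort(-np.abs(shap_values[i]))[:3]:
    print(f"{feature_names[j]}: {shap_values[i, j]:+.3f}")
```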
Only de Vries et al. satisfied the reproducibility and transparency criterion by providing a source code repository.77
Studies on bone segmentation
Eleven studies investigated AI segmentation of various bone regions, including the forearm,103,104 hip,97,102,105 and spine,96,98–101,106 from different imaging modalities such as DXA,104 MRI,101,106 X-ray,98,99,101,103 and CT.96,97,100–102,105 Multimodal bone segmentation was investigated by Suri et al. with MRI, CT, and X-ray images.101 The main characteristics of bone segmentation studies are provided in Table 5, and their quality assessment is in Table S8. The main goal of bone segmentation is to automatically isolate the bone ROI from surrounding soft tissues or overlapping structures to derive bone measurements such as radiomics, BMD, and VF deformity ratios,97,98,101,104,106 or to visualize complex bone regions.96,100,102,103,105 More generally, bone segmentation strategies can be seen as pre-processing steps that improve the input quality of imaging-based tasks. All studies used CNNs. Nine (82%) considered U-Net-based architectures,96–100,102,104–106 while others relied on custom architectures.101,103 Hybrid or multi-stage CNN approaches enhanced segmentation of complex bone regions.96,98,103,105 These approaches separated the tasks of detection and segmentation to achieve more accurate results than segmenting from the whole image. Methods included a Region Proposal Network (RPN) with precise rotation of the ROIs to reduce redundant background information,103 instance segmentation from a Dense U-Net to define the fracture ROI,105 and a Pose-Net to predict the coordinates of the 5 lumbar vertebrae.98
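To make the U-Net idea concrete, the sketch below defines a deliberately small two-level encoder-decoder with skip connections in PyTorch; it is a didactic miniature under assumed input sizes, not one of the published architectures.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Two-level U-Net: encoder, bottleneck, and decoder with skip connections."""
    def __init__(self, in_ch=1, out_ch=1, base=16):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, out_ch, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection from encoder level 2
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection from encoder level 1
        return torch.sigmoid(self.head(d1))                   # per-pixel bone probability

mask = TinyUNet()(torch.randn(1, 1, 128, 128))  # -> (1, 1, 128, 128)
```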
All studies used supervised learning, and the ground truth, or reference mask, was obtained from medical experts in 9 studies (82%),98,100–104,106 from software in some cases,97 or was not reported.96 Eight studies (73%) compared performance against baseline models or methods, such as simpler CNN architectures,98,103,104 software or thresholding methods,97,102 or expert segmentation.99–101
To overcome limited numbers of training samples, 8 studies (73%) used data augmentation techniques.97,98,100,102,104,105
The overall performance of segmentation tasks showed robust results. The primary performance metric used was the Dice coefficient, reported in 10 studies (91%) with an average of 0.93 (SD 0.03; range 0.90-0.99). In 9 studies (82%), model performance was supported with appropriate statistical methods.97,98,100–104 Strategies to assist in visualizing the predictions were adopted in 8 studies (73%).96–98,100,102,103,106 Three studies (27%) compared model performance with manual expert segmentations and demonstrated no statistical difference.99–101
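Because the Dice coefficient is the dominant metric in this area, a reference implementation for binary masks is given below; it is a generic NumPy sketch rather than code from the cited studies.

```python
import numpy as np

def dice_coefficient(pred_mask, true_mask, eps=1e-7):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for binary masks."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    return (2.0 * intersection + eps) / (pred.sum() + true.sum() + eps)

# Example: two overlapping square masks
a = np.zeros((64, 64), dtype=np.uint8); a[10:40, 10:40] = 1
b = np.zeros((64, 64), dtype=np.uint8); b[15:45, 15:45] = 1
print(round(dice_coefficient(a, b), 3))   # approximately 0.69
```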
Studies on clinical decision support
The goal of the studies included in this review was to improve osteoporosis management algorithms and enhance patient outcomes by leveraging advanced ML and DL techniques. This section explores clinical decision support systems for osteoporosis management, utilizing previously evaluated articles from the 5 preceding sections, which aim to improve clinician efficiency, diagnostic accuracy, and patient outcomes through AI-driven models and opportunistic screening. Opportunistic screening addresses the need for cost-effective medical practices by utilizing routinely acquired imaging and clinical data not primarily intended for osteoporosis. Studies demonstrated the effectiveness of using opportunistic images for bone properties assessment and osteoporosis prediction from spine, chest, hip, or panoramic radiographs.10,11,27,28,32,41 Automated fracture detection tools opportunistically detected fractures in routinely acquired data.49,52,55,70,71 Opportunistic screening helps in earlier diagnosis and fracture risk identification and paves the way for introducing formal diagnostic tests like DXA scans for definitive osteoporosis diagnosis.
AI-driven clinical decision support aims to improve clinician efficiency and diagnostic accuracy by automating or assisting with specific clinical tasks in complex scenarios. These tasks included distinguishing pathological VCFs from those secondary to osteoporosis,56,69,100 fresh vs old VCFs,47,66 or predicting rehabilitation outcomes and mortality following HFs.72,78,80,82–84,89,92 Studies exploited ML algorithms to provide individualized risk explanations and further improve patient triage and management.74,77,83,86,89,92 Moreover, AI promotes consistency in decision-making by mitigating potential human bias and ensuring standardized interpretation of medical data. As fracture types strongly determine the chosen surgical treatment, accurate diagnosis is required to optimize patient outcomes and treatment costs. Studies developed computer-aided diagnosis tools to assist in complex fracture classification.48,50,52,54,57,60,61,63,65,98,99,102,105 Importantly, these approaches can empower junior doctors, improving cost-effectiveness, task distribution, and addressing physician shortages. However, translating research into clinical practice requires further development, robust AI models, and seamless integration into established workflows.
Large retrospective electronic medical records were used to develop AI models to optimize treatment plans for reducing BMD loss and fragility fractures.17,76 Tanphiriyakun et al. evaluated an AI model’s ability to predict inadequate treatment response, defined as >3% lumbar BMD loss or >5% femoral BMD loss, using 8981 variables from clinical, laboratory, DXA, and prescription data.17 The model’s AUC ranged from 0.61 (KNN) to 0.70 (RF) and provided proof of concept for integration into electronic medical record (EMR) systems to improve treatment selection and outcomes by identifying novel response predictors. Complementing this research, Coco Martín et al. analyzed anti-osteoporotic treatment responses using logistic regression and neural network models in 993 patients from the OSTEOMED registry.76 They found that fracture reduction probabilities were generally independent of sex, age, and comorbidities, though treatments like vitamin D and calcium showed increased efficacy in specific groups. The logistic regression model accurately classified 96% of cases. Both studies highlight the potential of AI-driven approaches to enhance clinical decision support, improving patient care through informed and personalized treatment decisions. Integrating these models into practice could significantly improve outcomes for patients at risk of fragility fractures, enabling more precise and targeted therapeutic interventions.
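The response label used by Tanphiriyakun et al. can be reproduced conceptually as in the sketch below, assuming a table of baseline and follow-up BMD values; the column names and example numbers are hypothetical, and only the thresholds come from the definition quoted above.

```python
import pandas as pd

# Hypothetical follow-up table (g/cm2); the thresholds mirror the definition quoted above
df = pd.DataFrame({
    "ls_bmd_baseline":  [0.95, 1.02, 0.88],
    "ls_bmd_followup":  [0.97, 0.97, 0.89],
    "fem_bmd_baseline": [0.80, 0.78, 0.74],
    "fem_bmd_followup": [0.81, 0.72, 0.75],
})
ls_loss = 100 * (df["ls_bmd_baseline"] - df["ls_bmd_followup"]) / df["ls_bmd_baseline"]
fem_loss = 100 * (df["fem_bmd_baseline"] - df["fem_bmd_followup"]) / df["fem_bmd_baseline"]
df["inadequate_response"] = (ls_loss > 3) | (fem_loss > 5)   # binary label used as the ML target
print(df["inadequate_response"].tolist())                    # [False, True, False]
```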
Discussion
This review provides a comprehensive summary of the recent AI literature in osteoporosis. Although novel and well-performing ML/DL methods emerged in bone health assessment, the systematic assessment revealed disparities in study quality, highlighting the need for consistent standards in AI development and reporting.
Quality assessment standards in AI
Among the 97 reviewed studies, only 3 (3%) followed reporting guidelines,11,80,89 such as the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD)111 and the Nature Portfolio reporting summary (https://www.nature.com/documents/nr-reporting-summary-flat.pdf). Both guidelines are comprehensive for assessing a model. However, they do not cover other important AI-specific aspects, such as data transformations, model optimization strategies, or examination methods that help identify risks of bias and overfitting. Studies that used reporting guidelines showed better overall quality compared to the others, with an average quality score of 10.3 vs 8.1 (maximum score: 12).
A recent meta-analysis on CNNs for fracture recognition and classification in orthopedics reported a lack of suitable tools to assess the risk of bias in ML.112 This meta-analysis relied on the modified Methodologic Index for Non-Randomized Studies113 checklist, which is suited for assessing the overall quality of studies with classical predictive statistics, but lacks specificity for AI applications. Kuo et al.51 recently conducted a systematic review and meta-analysis of AI for fracture detection and used 2 checklists, the TRIPOD and the Prediction model study Risk Of Bias Assessment Tool.114 These checklists were used in combination to provide a general indicator of reporting standards, and to assess potential bias, highlighting the need for AI development and reporting standards that are reflective of the complex methods and unique terminologies.
Overall quality of the studies
The best-scoring studies involved a wide range of sample sizes, from 467 to 124 707. These studies utilized high-quality ground truth (such as osteoporosis diagnosis, fracture classification, or reference bone masks for segmentation tasks) established from gold standard methods or expert consensus. The high quality of the ground truth assessment, combined with a proper understanding of the data characteristics, facilitated effective model training and relevant feature-outcome associations.
CV strategies provided an assessment of the model’s generalizability and performance, reducing the risk of overfitting. Many studies did not clearly describe the creation of independent training and testing sets.14,17,18,20,25,26,30,31,42,45,56,62–64,66–69,72,75–77,79,81,82,84,90,92,94,95,99,101,102,105 Generating an intermediate validation set, or employing CV within the training set, can help prevent overfitting and optimize hyperparameters. Some studies assessed performance on the complete dataset through k-fold CV, meaning that model optimization and evaluation were performed on the same data.23,56,68,69,72,77,82,101 This scenario assumes that the internal dataset is representative of the larger target population and usually demands EV.
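To illustrate the data-splitting principle discussed above, the following minimal Python sketch holds out an independent test set before any hyperparameter tuning and uses k-fold CV only within the training data. The random forest classifier, feature matrix, and parameter grid are illustrative assumptions, not taken from any of the reviewed studies.

```python
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Synthetic tabular data standing in for clinical features (X) and an
# osteoporosis label (y); purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = rng.integers(0, 2, size=500)

# 1) Hold out an independent test set that is never touched during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 2) Tune hyperparameters with k-fold CV restricted to the training set.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 5]},
    scoring="roc_auc",
    cv=cv,
)
search.fit(X_train, y_train)

# 3) Report performance once, on the untouched test set.
test_auc = roc_auc_score(y_test, search.best_estimator_.predict_proba(X_test)[:, 1])
print(f"CV AUC: {search.best_score_:.2f}, held-out test AUC: {test_auc:.2f}")
```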
Advances in AI in osteoporosis
AI research in osteoporosis has surged, covering more studies in a shorter period than the previous 5 yr assessed by Smets et al.7 The quality of research has also improved. Utilizing the same scoring employed by Smets et al., this review assessed the overall quality of recent studies. The mean quality score was 8.1, higher than the 7.4 average of the earlier review period. The improvement in quality was supported by a notable increase in model examination strategies (64.8% vs 33.7% in the earlier period) and in the use of EV (16.5% vs 4.5%).
AI tasks within the osteoporosis domain have become increasingly specific, moving beyond general triage to address precise clinical needs. Recent studies focused on diverse applications, including predicting HF complications such as in-hospital death, revision surgery, or rehabilitation outcomes.72,78,80,82–84,89,91,92 Combined with relevant feature engineering, AI applications have taken a further step in explaining individualized predictions to enhance therapeutic decisions in specific subgroups of patients, such as those with a primary fracture,83,86,89,92 rheumatoid arthritis,74 osteopenia, or osteoporosis.77,95 AI applications have also become more precise, supported by innovative multifactorial architectures. For the first time, CNNs were used to predict BMD as a continuous measure from pelvic X-rays, a significant improvement over classification-based CNNs.10,11 Strategies for model explainability in DL BMD regression tasks involved the adaptation of classical Grad-CAM to regression tasks (Grad-RAM)12 and GOS maps.10
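As an illustration of how classification-oriented saliency methods can be adapted to regression outputs, the sketch below computes a Grad-CAM-style map by backpropagating a scalar BMD prediction instead of a class score. This is a generic adaptation written in PyTorch, not the Grad-RAM or GOS implementations of the cited studies; the ResNet-18 backbone, dummy input, and layer choice are assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Regression CNN: ResNet-18 with a single-output head predicting BMD.
model = models.resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 1)
model.eval()

# Capture activations and gradients of the last convolutional block.
store = {}
model.layer4.register_forward_hook(lambda m, i, o: store.update(act=o))
model.layer4.register_full_backward_hook(lambda m, gi, go: store.update(grad=go[0]))

x = torch.randn(1, 3, 224, 224)  # dummy radiograph
pred = model(x)                  # predicted BMD, shape (1, 1)
pred.sum().backward()            # gradient of the regression output, not a class score

weights = store["grad"].mean(dim=(2, 3), keepdim=True)        # pooled gradients per channel
cam = F.relu((weights * store["act"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)      # saliency map scaled to [0, 1]
print(cam.shape)  # torch.Size([1, 1, 224, 224])
```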
Combining clinical information with image-based features or radiomics significantly improved performance compared with standalone clinical or image-based models.14,32,41,44,50,87 CNN architectures were adapted to incorporate additional features in the fully connected layers, improving the predictive performance of osteoporosis classifiers from spine, hip, or panoramic radiographs through the addition of simple covariates.14,32,41,44 Notably, combining CNN image features with radiomics, clinical, and histogram information improved the classification of non-traumatic (fragility) vs malignant (tumor-related) VCFs on MRI.50
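A minimal PyTorch sketch of this fusion strategy is shown below: CNN image features are concatenated with clinical covariates before the fully connected classification head. The backbone, covariate count, and layer sizes are illustrative assumptions rather than the architectures of the cited studies.

```python
import torch
import torch.nn as nn
from torchvision import models

class ImageClinicalNet(nn.Module):
    """Fusion sketch: CNN image features concatenated with clinical covariates
    (e.g., age, sex, BMI) before the fully connected classification head."""

    def __init__(self, n_clinical: int = 3, n_classes: int = 2):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop the final fc layer
        self.head = nn.Sequential(
            nn.Linear(512 + n_clinical, 128),
            nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, image: torch.Tensor, clinical: torch.Tensor) -> torch.Tensor:
        feats = self.cnn(image).flatten(1)           # (B, 512) image features
        fused = torch.cat([feats, clinical], dim=1)  # append covariates
        return self.head(fused)

model = ImageClinicalNet()
logits = model(torch.randn(4, 3, 224, 224), torch.randn(4, 3))
print(logits.shape)  # torch.Size([4, 2])
```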
In general, researchers have investigated more complex methods to improve pre-processing, explainability, and the overall performance of their models. Including the soft tissues surrounding a bone ROI enhanced BMD prediction and fracture risk assessment from CNNs, highlighting the importance of contextual information.10,12,85 Unsupervised learning techniques, including K-Means and Fuzzy C-Means, were used as pre-processing to cluster trabecular bone patterns and enhance osteoporosis prediction.43 Similarly, Sobel gradient filtering applied to specific hip ROIs identified relevant trabecular patterns and enhanced BMD prediction.14 The continuously evolving YOLO v3 and YOLO v4 architectures proved useful as a pre-processing step to automatically position bounding boxes around bone ROIs in medical images.49,57,64 Automatic ROI positioning was also achieved with dedicated segmentation architectures, including U-Net, Dense U-Net, R2U-Net, SegNet, E-Net, PSPNet, Faster R-CNN, and HRNet. Multi-stage approaches, comprising ROI extraction followed by bone segmentation with a CNN, provided more detailed and accurate segmentation of complex bone structures by reducing information loss and improving the clarity of segmented areas.103,105
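The following short Python sketch illustrates the Sobel pre-processing idea mentioned above: a gradient-magnitude map of a bone ROI is computed and stacked with the raw intensities as a multi-channel CNN input. The synthetic ROI and channel layout are assumptions for demonstration only.

```python
import numpy as np
from skimage import filters

# Synthetic 128 x 128 texture standing in for a hip radiograph ROI.
rng = np.random.default_rng(0)
roi = filters.gaussian(rng.random((128, 128)), sigma=2)

# Sobel gradient magnitude emphasizes trabecular edge patterns.
gradient_map = filters.sobel(roi)

# Stack raw intensities and the gradient map as a 2-channel CNN input.
cnn_input = np.stack([roi, gradient_map], axis=-1)
print(cnn_input.shape)  # (128, 128, 2)
```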
Themes in model performance
Understanding model properties and assumptions is crucial for identifying the best model for a specific application. Researchers developed and evaluated multiple algorithms or incorporated established tools for direct comparison with clinical standards.9,11,14,15,18,21,24,31–34,40,41,44,46,48–50,54,56–61,63,65–67,69,73,77,80,85,87,88,92,97,98,100–104 Statistical comparison of model performances proved valuable for reinforcing and validating conclusions about the best-performing models.9–11,15–18,20,22–24,27,28,30–32,34–42,44–50,52–55,57,59–61,63–65,69,71,72,76–94,97,98,100–104 Notably, 37% of the studies included in this review did not report model performance with CIs, which limits critical evaluation.
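One simple way to report CIs for a performance metric is the percentile bootstrap, sketched below for AUC with synthetic predictions. This is a generic approach, not the statistical method used by any particular cited study.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC (generic sketch)."""
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample with replacement
        if len(np.unique(y_true[idx])) < 2:              # need both classes in the resample
            continue
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lower, upper = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(y_true, y_score), (lower, upper)

# Synthetic labels and prediction scores for demonstration.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 300)
p = np.clip(0.3 * y + rng.normal(0.4, 0.2, 300), 0, 1)
auc, (lo, hi) = bootstrap_auc_ci(y, p)
print(f"AUC {auc:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```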
The diversity among best-performing models underscores the importance of considering multiple architectures. Ensemble models demonstrated enhanced performance compared with standalone models by leveraging their unique strengths and assigning different weights to input data. Training and optimizing ensemble models require efficient hyperparameter tuning and entail substantial computational costs. Hyperband optimization, reported in one study,15 benefits researchers training large models by speeding up random search through adaptive resource allocation and early stopping of poorly performing runs. DL models, especially CNNs, were extensively used for image-based analysis, with common architectures or customized ensemble and multi-stage models. Transfer learning optimized training, particularly with small sample sizes.16,41,44,47–50,52,57,60–67,69
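The sketch below shows a common transfer-learning pattern consistent with the strategy described above: a CNN pretrained on ImageNet is reused with its backbone frozen and only a new classification head trained. The ResNet-18 backbone, two-class head, and dummy batch are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision import models

# Reuse ImageNet weights (downloaded by torchvision) and freeze the backbone.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False

# New trainable head for a two-class task (e.g., osteoporosis vs normal).
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch.
images, labels = torch.randn(8, 3, 224, 224), torch.randint(0, 2, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.3f}")
```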
Justification of the performance metrics chosen for model evaluation was often lacking.12,13,17,20,21,23,30,35–37,39,40,42–45,48–50,53,56–59,61,63,66,69,73,78,79,81,87,90,91,93,95,102,105,106 Choosing suitable evaluation metrics is crucial, as it can affect decisions regarding the usefulness of the technology, and implausibly high performance metrics may indicate bias or overfitting. Studies that did not clearly demonstrate adequate validation methodology and/or independence of the training and testing datasets may have overestimated performance.14,17,18,20,23,25,26,30,31,40,42,45,55,56,62,63,66–69,75,76,78,79,81,82,87,90,91,99,101,102,105 Monitoring training and validation error rates helps ensure optimal model performance and prevents overfitting. Several studies used EV to assess performance.11,15,23,27,32,46,49,54,57,65,70,86,96,106 EV is the most demanding test of a model’s performance and a critical requirement before clinical deployment.115 Performance decreased with EV in 75% of cases,11,15,27,28,32,46,57,65,70,86,96,106 similar to findings in a systematic review on the EV of DL algorithms for radiologic diagnosis.116
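As a concrete example of monitoring validation performance to limit overfitting, the sketch below uses scikit-learn's built-in early stopping, which halts training once the score on an internal validation split stops improving. The synthetic data and network size are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic tabular data for demonstration.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 30))
y = (X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=1000) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Early stopping monitors an internal validation split and halts training
# once the validation score stops improving, limiting overfitting.
clf = MLPClassifier(
    hidden_layer_sizes=(64,),
    early_stopping=True,
    validation_fraction=0.15,
    n_iter_no_change=10,
    max_iter=500,
    random_state=0,
).fit(X_train, y_train)

print(f"stopped after {clf.n_iter_} epochs; test accuracy: {clf.score(X_test, y_test):.2f}")
```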
Model explainability and clinical implications
The demand for AI expertise in the osteoporosis field is largely driven by the currently limited understanding and explanation of bone fragility and fracture. Improvements in clinical workflow and practitioner accuracy were reported for fracture detection and classification applications,63,64 with performance comparable to that of experts for both fracture detection/classification and bone segmentation tasks.64,65,69,70,98,100–102 Demystifying algorithmic decision-making plays a pivotal role in clinical acceptance. Most studies (64%) made efforts to adopt XAI strategies, including activation maps for image-based applications in bone properties,10,12,15 osteoporosis classification,27,28,41 fracture detection,46,47,57–60,62,63 or risk prediction tasks.85 Other studies visualized complex high-dimensional relationships between variables using semantic connectivity maps,94,95 SHAP,17,34,83,92 LIME,40 LASSO,9,26,40,82 feature importance engineering,20,22,23,30,45,73–77,80,84,88,90 sensitivity analysis,10,49,50,54,69,89 or error and thresholding analysis.30,52,68,82 Some studies deployed their models as online applications, providing a platform for researchers and clinicians to access them and to extend validation beyond the original study.77,82,83,89
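For tabular risk models, SHAP values are a common way to quantify each feature's contribution, as in several of the cited studies. The sketch below shows generic SHAP usage with the open-source shap package on a synthetic gradient-boosting model; the feature names and data are assumptions, not drawn from any reviewed study.

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic tabular data; feature names are purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=400) > 0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# TreeExplainer attributes each prediction to the input features.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Mean absolute SHAP value per feature approximates global importance.
importance = np.abs(shap_values).mean(axis=0)
for name, value in zip(["age", "BMI", "prior_fracture", "T_score"], importance):
    print(f"{name}: {value:.3f}")
```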
Strengths and limitations of this study
The current work complements the report from Smets et al.7 by examining AI applications in osteoporosis published after December 2020. This review discusses recent AI advances in osteoporosis and their clinical implications, showcasing novel screening and predictive techniques. Bone segmentation was introduced as a relevant AI domain, and a thorough quality assessment was performed for each included study. However, this review has several limitations. First, the search was limited to the PubMed database, which may have resulted in the omission of relevant studies. Second, some studies overlapped between several application areas; each study was assigned to a given area based on its main AI output, regardless of derived applications. Third, the quality assessment was performed by a single assessor. Finally, the best performances are reported in Tables 1–5 with R2, AUC, accuracy, or Dice coefficients when available, but these metrics may not be optimal for a given clinical scenario; importantly, models should be assessed using multiple criteria.
Conclusion
The field of osteoporosis management has witnessed a surge of interest in the application of AI, fueled by the emergence of big data, advancements in computing power, and increased accessibility to AI technologies. This review offers a comprehensive overview of recent AI developments in osteoporosis, and details some key aspects of model development, optimization, and reporting. A wide range of models and examination strategies have shown promise and warrant further EV. By providing interpretable explanations, AI models can bridge the gap between complex algorithms and clinical practice, facilitating their integration into routine healthcare workflows. However, the pathway to clinical implementation of AI models necessitates careful consideration of potential biases in the training data and model development. Establishing reporting standards that ensure high-quality AI research is imperative and will build confidence in AI-based approaches, ultimately leading to improved patient outcomes in osteoporosis management.
Supplementary Material
Acknowledgments
Special thanks go to Cecile Jacques from the Medical Library of Lausanne University Hospital and University of Lausanne, for her valuable assistance with the literature search on PubMed.
Contributor Information
Guillaume Gatineau, Interdisciplinary Center of Bone Diseases, Rheumatology Unit, Bone and Joint Department, Lausanne University Hospital and University of Lausanne, Av. Pierre-Decker 4, 1011 Lausanne, Switzerland.
Enisa Shevroja, Interdisciplinary Center of Bone Diseases, Rheumatology Unit, Bone and Joint Department, Lausanne University Hospital and University of Lausanne, Av. Pierre-Decker 4, 1011 Lausanne, Switzerland.
Colin Vendrami, Interdisciplinary Center of Bone Diseases, Rheumatology Unit, Bone and Joint Department, Lausanne University Hospital and University of Lausanne, Av. Pierre-Decker 4, 1011 Lausanne, Switzerland.
Elena Gonzalez-Rodriguez, Interdisciplinary Center of Bone Diseases, Rheumatology Unit, Bone and Joint Department, Lausanne University Hospital and University of Lausanne, Av. Pierre-Decker 4, 1011 Lausanne, Switzerland.
William D Leslie, Department of Medicine, University of Manitoba, Winnipeg, MB R3T 2N2, Canada.
Olivier Lamy, Internal Medicine Unit, Internal Medicine Department, Lausanne University Hospital and University of Lausanne, 1005 Lausanne, Switzerland.
Didier Hans, Interdisciplinary Center of Bone Diseases, Rheumatology Unit, Bone and Joint Department, Lausanne University Hospital and University of Lausanne, Av. Pierre-Decker 4, 1011 Lausanne, Switzerland.
Author contributions
Guillaume Gatineau (Conceptualization, Data curation, Investigation, Methodology, Writing—original draft, Writing—review & editing), Enisa Shevroja (Conceptualization, Methodology, Writing—original draft, Writing—review & editing), Colin Vendrami (Writing—review & editing), Elena Gonzalez-Rodriguez (Writing—review & editing), William D. Leslie (Writing—review & editing), Olivier Lamy (Writing—review & editing), and Didier Hans (Conceptualization, Supervision, Validation, Writing—review & editing).
Funding
This work was supported by the Swiss National Science Foundation (SNSF 32473B_156978 and 320030_188886) and the Foundation of the Orthopedic Hospital of the Vaudois University Hospital, Lausanne, Switzerland.
Conflicts of interest
All authors state that they have no conflicts of interest relevant to this work.
Data availability
The data supporting the findings of this study are available within the article and its supplementary materials. R and Python scripts were used to generate Figures 1 and 2, and are available from the corresponding author upon request.
References
- 1. Xu Y, Liu X, Cao X, et al. Artificial intelligence: a powerful paradigm for scientific research. The Innovation. 2021;2(4):100179. 10.1016/j.xinn.2021.100179 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2. Champendal M, Müller H, Prior JO, Dos Reis CS. A scoping review of interpretability and explainability concerning artificial intelligence methods in medical imaging. Eur J Radiol. 2023;169:111159. 10.1016/j.ejrad.2023.111159 [DOI] [PubMed] [Google Scholar]
- 3. Schuit SCE, Van Der Klift M, Weel AEAM, et al. Fracture incidence and association with bone mineral density in elderly men and women: the Rotterdam study. Bone. 2004;34(1):195-202. 10.1016/j.bone.2003.10.001 [DOI] [PubMed] [Google Scholar]
- 4. Kanis JA, Norton N, Harvey NC, et al. SCOPE 2021: a new scorecard for osteoporosis in Europe. Arch Osteoporos. 2021;16(1):82. 10.1007/s11657-020-00871-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5. Kanis JA, Harvey NC, Johansson H, Odén A, Leslie WD, McCloskey EV. FRAX update. J Clin Densitom. 2017;20(3):360-367. 10.1016/j.jocd.2017.06.022 [DOI] [PubMed] [Google Scholar]
- 6. Weber GM, Mandl KD, Kohane IS. Finding the missing link for big biomedical data. JAMA. 2014;311(24):2457-2548. 10.1001/jama.2014.4228 [DOI] [PubMed] [Google Scholar]
- 7. Smets J, Shevroja E, Hügle T, Leslie WD, Hans D. Machine learning solutions for osteoporosis—a review. J Bone Miner Res. 2021;36(5):833-851. 10.1002/jbmr.4292 [DOI] [PubMed] [Google Scholar]
- 8. Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan—a web and mobile app for systematic reviews. Syst Rev. 2016;5(1):210. 10.1186/s13643-016-0384-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Dai H, Wang Y, Fu R, et al. Radiomics and stacking regression model for measuring bone mineral density using abdominal computed tomography. Acta Radiol. 2023;64(1):228-236. [DOI] [PubMed] [Google Scholar]
- 10. Ho CS, Chen YP, Fan TY, et al. Application of deep learning neural network in predicting bone mineral density from plain X-ray radiography. Arch Osteoporos. 2021;16(1):153. 10.1007/s11657-021-00985-8 [DOI] [PubMed] [Google Scholar]
- 11. Hsieh CI, Zheng K, Lin C, et al. Automated bone mineral density prediction and fracture risk assessment using plain radiographs via deep learning. Nat Commun. 2021;12(1):5472. 10.1038/s41467-021-25779-x [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12. Kang JW, Park C, Lee DE, Yoo JH, Kim MW. Prediction of bone mineral density in CT using deep learning with explainability. Front Physiol. 2022;13:1061911. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13. Min YK, Lee DH, Yoo JH, Park MJ, Huh JW, Kim MW. Estimation of bone mineral density in the femoral neck and lumbar spine using texture analysis of chest and pelvis computed tomography Hounsfield unit. Curr Med Imaging. 2022;19(10):1186-1195. https://pubmed.ncbi.nlm.nih.gov/36397633/ [DOI] [PubMed] [Google Scholar]
- 14. Nguyen TP, Chae DS, Park SJ, Yoon J. A novel approach for evaluating bone mineral density of hips based on Sobel gradient-based map of radiographs utilizing convolutional neural network. Comput Biol Med. 2021;132:104298. 10.1016/j.compbiomed.2021.104298 [DOI] [PubMed] [Google Scholar]
- 15. Nissinen T, Suoranta S, Saavalainen T, et al. Detecting pathological features and predicting fracture risk from dual-energy X-ray absorptiometry images using deep learning. Bone Rep. 2021;14:101070. 10.1016/j.bonr.2021.101070 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16. Sato Y, Yamamoto N, Inagaki N, et al. Deep learning for bone mineral density and T-score prediction from chest X-rays: a multicenter study. Biomedicines. 2022;10(9):2323. 10.3390/biomedicines10092323 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17. Tanphiriyakun T, Rojanasthien S, Khumrin P. Bone mineral density response prediction following osteoporosis treatment using machine learning to aid personalized therapy. Sci Rep. 2021;11(1):13811. 10.1038/s41598-021-93152-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18. Xiao P, Haque E, Zhang T, Dong XN, Huang Y, Wang X. Can DXA image-based deep learning model predict the anisotropic elastic behavior of trabecular bone? J Mech Behav Biomed Mater. 2021;124:104834. 10.1016/j.jmbbm.2021.104834 [DOI] [PubMed] [Google Scholar]
- 19. Zhang M, Gong H, Zhang M. Prediction of femoral strength of elderly men based on quantitative computed tomography images using machine learning. J Orthop Res Off Publ Orthop Res Soc. 2022;41(1):170-182. https://pubmed.ncbi.nlm.nih.gov/35393726/ [DOI] [PubMed] [Google Scholar]
- 20. Biamonte E, Levi R, Carrone F, et al. Artificial intelligence-based radiomics on computed tomography of lumbar spine in subjects with fragility vertebral fractures. J Endocrinol Investig. 2022;45(10):2007-2017. https://pubmed.ncbi.nlm.nih.gov/35751803/ [DOI] [PubMed] [Google Scholar]
- 21. Bui HM, Ha MH, Pham HG, et al. Predicting the risk of osteoporosis in older Vietnamese women using machine learning approaches. Sci Rep. 2022;12(1):20160. 10.1038/s41598-022-24181-x [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22. Chen Z, Luo W, Zhang Q, et al. Osteoporosis diagnosis based on ultrasound radio frequency signal via multi-channel convolutional neural network. Annu Int Conf IEEE Eng Med Biol Soc. 2021;2021:832-835. [DOI] [PubMed] [Google Scholar]
- 23. Chen YC, Li YT, Kuo PC, et al. Automatic segmentation and radiomic texture analysis for osteoporosis screening using chest low-dose computed tomography. Eur Radiol. 2023;33(7):5097-5106. https://pubmed.ncbi.nlm.nih.gov/36719495/ [DOI] [PubMed] [Google Scholar]
- 24. Erjiang E, Wang T, Yang L, et al. Machine learning can improve clinical detection of low BMD: the DXA-HIP study. J Clin Densitom. 2021;24(4):527-537. 10.1016/j.jocd.2020.10.004 [DOI] [PubMed] [Google Scholar]
- 25. Fasihi L, Tartibian B, Eslami R, Fasihi H. Artificial intelligence used to diagnose osteoporosis from risk factors in clinical data and proposing sports protocols. Sci Rep. 2022;12(1):18330. 10.1038/s41598-022-23184-y [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26. Huang CB, Hu JS, Tan K, Zhang W, Xu TH, Yang L. Application of machine learning model to predict osteoporosis based on abdominal computed tomography images of the psoas muscle: a retrospective study. BMC Geriatr. 2022;22(1):796. 10.1186/s12877-022-03502-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27. Jang M, Kim M, Bae SJ, Lee SH, Koh JM, Kim N. Opportunistic osteoporosis screening using chest radiographs with deep learning: development and external validation with a cohort dataset. J Bone Miner Res Off J Am Soc Bone Miner Res. 2022;37(2):369-377. 10.1002/jbmr.4477 [DOI] [PubMed] [Google Scholar]
- 28. Jang R, Choi JH, Kim N, Chang JS, Yoon PW, Kim CH. Prediction of osteoporosis from simple hip radiography using deep learning algorithm. Sci Rep. 2021;11(1):19997. 10.1038/s41598-021-99549-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29. Kwon Y, Lee J, Park JH, et al. Osteoporosis pre-screening using ensemble machine learning in postmenopausal Korean women. Healthcare. 2022;10(6):1107. 10.3390/healthcare10061107 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30. Liu L, Si M, Ma H, et al. A hierarchical opportunistic screening model for osteoporosis using machine learning applied to clinical data and CT images. BMC Bioinformatics. 2022;23(1):63. 10.1186/s12859-022-04596-z [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31. Luo W, Chen Z, Zhang Q, et al. Osteoporosis diagnostic model using a multichannel convolutional neural network based on quantitative ultrasound radiofrequency signal. Ultrasound Med Biol. 2022;48(8):1590-1601. 10.1016/j.ultrasmedbio.2022.04.005 [DOI] [PubMed] [Google Scholar]
- 32. Mao L, Xia Z, Pan L, et al. Deep learning for screening primary osteopenia and osteoporosis using spine radiographs and patient clinical covariates in a Chinese population. Front Endocrinol. 2022;13:971877. 10.3389/fendo.2022.971877 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33. Ou Yang WY, Lai CC, Tsou MT, Hwang LC. Development of machine learning models for prediction of osteoporosis from clinical health examination data. Int J Environ Res Public Health. 2021;18(14):7635. 10.3390/ijerph18147635 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34. Park HW, Jung H, Back KY, et al. Application of machine learning to identify clinically meaningful risk group for osteoporosis in individuals under the recommended age for dual-energy X-ray absorptiometry. Calcif Tissue Int. 2021;109(6):645-655. 10.1007/s00223-021-00880-x [DOI] [PubMed] [Google Scholar]
- 35. Sebro R, De la Garza-Ramos C. Utilizing machine learning for opportunistic screening for low BMD using CT scans of the cervical spine. J Neuroradiol J Neuroradiol. 2023;50(3):293-301. 10.1016/j.neurad.2022.08.001 [DOI] [PubMed] [Google Scholar]
- 36. Sebro R, De la Garza-Ramos C. Opportunistic screening for osteoporosis and osteopenia from CT scans of the abdomen and pelvis using machine learning. Eur Radiol. 2023;33(3):1812-1823. 10.1007/s00330-022-09136-0 [DOI] [PubMed] [Google Scholar]
- 37. Sebro R, De la Garza-Ramos C. Machine learning for the prediction of osteopenia/osteoporosis using the CT attenuation of multiple osseous sites from chest CT. Eur J Radiol. 2022;155:110474. 10.1016/j.ejrad.2022.110474 [DOI] [PubMed] [Google Scholar]
- 38. Sebro R, De la Garza-Ramos C. Machine learning for opportunistic screening for osteoporosis from CT scans of the wrist and forearm. Diagnostics (Basel). 2022;12(3):691. 10.3390/diagnostics12030691 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39. Shen Y, Sardar Z, Chase H, Coury JR, Cerpa M, Lenke LG. Predicting bone health using machine learning in patients undergoing spinal reconstruction surgery. Spine. 2023;48(2):120-126. 10.1097/BRS.0000000000004511 [DOI] [PubMed] [Google Scholar]
- 40. Suh B, Yu H, Kim H, et al. Interpretable deep-learning approaches for osteoporosis risk screening and individualized feature analysis using large population-based data: model development and performance evaluation. J Med Internet Res. 2023;25:e40179. 10.2196/40179 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41. Sukegawa S, Fujimura A, Taguchi A, et al. Identification of osteoporosis using ensemble deep learning model with panoramic radiographs and clinical covariates. Sci Rep. 2022;12(1):6088. 10.1038/s41598-022-10150-x [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42. Wang Y, Wang L, Sun Y, et al. Prediction model for the risk of osteoporosis incorporating factors of disease history and living habits in physical examination of population in Chongqing, Southwest China: based on artificial neural network. BMC Public Health. 2021;21(1):991. 10.1186/s12889-021-11002-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43. Widyaningrum R, Sela EI, Pulungan R, Septiarini A. Automatic segmentation of periapical radiograph using color histogram and machine learning for osteoporosis detection. Int J Dent. 2023;2023:1-9. 10.1155/2023/6662911 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44. Yamamoto N, Sukegawa S, Yamashita K, et al. Effect of patient clinical variables in osteoporosis classification using hip X-rays in deep learning analysis. Medicina. 2021;57(8):846. 10.3390/medicina57080846 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45. Yang J, Liao M, Wang Y, et al. Opportunistic osteoporosis screening using chest CT with artificial intelligence. Osteoporos Int. 2022;33(12):2547-2561. 10.1007/s00198-022-06491-y [DOI] [PubMed] [Google Scholar]
- 46. Bae J, Yu S, Oh J, et al. External validation of deep learning algorithm for detecting and visualizing femoral neck fracture including displaced and non-displaced fracture on plain X-ray. J Digit Imaging. 2021;34(5):1099-1109. 10.1007/s10278-021-00499-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47. Chen W, Liu X, Li K, et al. A deep-learning model for identifying fresh vertebral compression fractures on digital radiography. Eur Radiol. 2022;32(3):1496-1505. 10.1007/s00330-021-08247-4 [DOI] [PubMed] [Google Scholar]
- 48. Cheng CT, Wang Y, Chen HW, et al. A scalable physician-level deep learning algorithm detects universal trauma on pelvic radiographs. Nat Commun. 2021;12(1):1066. 10.1038/s41467-021-21311-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49. Chou PH, Jou THT, Wu HTH, et al. Ground truth generalizability affects performance of the artificial intelligence model in automated vertebral fracture detection on plain lateral radiographs of the spine. Spine J Off J North Am Spine Soc. 2022;22(4):511-523. 10.1016/j.spinee.2021.10.020 [DOI] [PubMed] [Google Scholar]
- 50. Del Lama RS, Candido RM, Chiari-Correia NS, Nogueira-Barbosa MH, de Azevedo-Marques PM, Tinós R. Computer-aided diagnosis of vertebral compression fractures using convolutional neural networks and radiomics. J Digit Imaging. 2022;35(3):446-458. 10.1007/s10278-022-00586-y [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51. Kuo RYL, Harrison C, Curran TA, et al. Artificial intelligence in fracture detection: a systematic review and meta-analysis. Radiology. 2022;304(1):50-62. 10.1148/radiol.211785 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52. Dong Q, Luo G, Lane NE, et al. Deep learning classification of spinal osteoporotic compression fractures on radiographs using an adaptation of the Genant semiquantitative criteria. Acad Radiol. 2022;29(12):1819-1832. https://pubmed.ncbi.nlm.nih.gov/35351363/ [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53. Germann C, Meyer AN, Staib M, Sutter R, Fritz B. Performance of a deep convolutional neural network for MRI-based vertebral body measurements and insufficiency fracture detection. Eur Radiol. 2023;33(5):3188-3199. 10.1007/s00330-022-09354-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54. Guermazi A, Tannoury C, Kompel AJ, et al. Improving radiographic fracture recognition performance and efficiency using artificial intelligence. Radiology. 2022;302(3):627-636. 10.1148/radiol.210937 [DOI] [PubMed] [Google Scholar]
- 55. Inoue T, Maki S, Furuya T, et al. Automated fracture screening using an object detection algorithm on whole-body trauma computed tomography. Sci Rep. 2022;12(1):16549. 10.1038/s41598-022-20996-w [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56. Li Y, Zhang Y, Zhang E, et al. Differential diagnosis of benign and malignant vertebral fracture on CT using deep learning. Eur Radiol. 2021;31(12):9612-9619. https://pubmed.ncbi.nlm.nih.gov/33993335/ [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57. Li YC, Chen HH, Horng-Shing LH, Hondar Wu HT, Chang MC, Chou PH. Can a deep-learning model for the automated detection of vertebral fractures approach the performance level of human subspecialists? Clin Orthop. 2021;479(7):1598-1612. 10.1097/CORR.0000000000001685 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58. Monchka BA, Schousboe JT, Davidson MJ, et al. Development of a manufacturer-independent convolutional neural network for the automated identification of vertebral compression fractures in vertebral fracture assessment images using active learning. Bone. 2022;161:116427. 10.1016/j.bone.2022.116427 [DOI] [PubMed] [Google Scholar]
- 59. Monchka BA, Kimelman D, Lix LM, Leslie WD. Feasibility of a generalized convolutional neural network for automated identification of vertebral compression fractures: the Manitoba bone mineral density registry. Bone. 2021;150:116017. 10.1016/j.bone.2021.116017 [DOI] [PubMed] [Google Scholar]
- 60. Murphy EA, Ehrhardt B, Gregson CL, et al. Machine learning outperforms clinical experts in classification of hip fractures. Sci Rep. 2022;12(1):2058. 10.1038/s41598-022-06018-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61. Ozkaya E, Topal FE, Bulut T, Gursoy M, Ozuysal M, Karakaya Z. Evaluation of an artificial intelligence system for diagnosing scaphoid fracture on direct radiography. Eur J Trauma Emerg Surg. 2022;48(1):585-592. 10.1007/s00068-020-01468-0 [DOI] [PubMed] [Google Scholar]
- 62. Rosenberg GS, Cina A, Schiró GR, et al. Artificial intelligence accurately detects traumatic thoracolumbar fractures on sagittal radiographs. Medicina. 2022;58(8):998. https://pubmed.ncbi.nlm.nih.gov/35893113/ [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63. Sato Y, Takegami Y, Asamoto T, et al. Artificial intelligence improves the accuracy of residents in the diagnosis of hip fractures: a multicenter study. BMC Musculoskelet Disord. 2021;22(1):407. 10.1186/s12891-021-04260-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64. Twinprai N, Boonrod A, Boonrod A, et al. Artificial intelligence (AI) vs. human in hip fracture detection. Heliyon. 2022;8(11):e11266. 10.1016/j.heliyon.2022.e11266 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65. Xu F, Xiong Y, Ye G, et al. Deep learning-based artificial intelligence model for classification of vertebral compression fractures: a multicenter diagnostic study. Front Endocrinol. 2023;14:1025749. 10.3389/fendo.2023.1025749 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66. Akito Y, Masatoshi H, Hitoshi T, et al. Using artificial intelligence to diagnose fresh osteoporotic vertebral fractures on magnetic resonance images. Spine J Off J North Am Spine Soc. 2021;21(10):1652-1658. 10.1016/j.spinee.2021.03.006 [DOI] [PubMed] [Google Scholar]
- 67. Yadav DP, Sharma A, Athithan S, Bhola A, Sharma B, Dhaou IB. Hybrid SFNet model for bone fracture detection and classification using ML/DL. Sensors. 2022;22(15):5823. https://pubmed.ncbi.nlm.nih.gov/35957380/ [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68. Yeh LR, Zhang Y, Chen JH, et al. A deep learning-based method for the diagnosis of vertebral fractures on spine MRI: retrospective training and validation of ResNet. Eur Spine J. 2022;31(8):2022-2030. 10.1007/s00586-022-07121-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69. Yoda T, Maki S, Furuya T, et al. Automated differentiation between osteoporotic vertebral fracture and malignant vertebral fracture on MRI using a deep convolutional neural network. Spine. 2022;47(8):E347-E352. 10.1097/BRS.0000000000004307 [DOI] [PubMed] [Google Scholar]
- 70. Zakharov A, Pisov M, Bukharaev A, et al. Interpretable vertebral fracture quantification via anchor-free landmarks localization. Med Image Anal. 2023;83:102646. 10.1016/j.media.2022.102646 [DOI] [PubMed] [Google Scholar]
- 71. Zhang J, Liu F, Xu J, et al. Automated detection and classification of acute vertebral body fractures using a convolutional neural network on computed tomography. Front Endocrinol. 2023;14:1132725. 10.3389/fendo.2023.1132725 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72. Cary MPJ, Zhuang F, Draelos RL, et al. Machine learning algorithms to predict mortality and allocate palliative care for older patients with hip fracture. J Am Med Dir Assoc. 2021;22(2):291-296. 10.1016/j.jamda.2020.09.025 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73. Chen Y, Yang T, Gao X, Xu A. Hybrid deep learning model for risk prediction of fracture in patients with diabetes and osteoporosis. Front Med. 2021;16(3):496-506. https://pubmed.ncbi.nlm.nih.gov/34448125/ [DOI] [PubMed] [Google Scholar]
- 74. Chen R, Huang Q, Chen L. Development and validation of machine learning models for prediction of fracture risk in patients with elderly-onset rheumatoid arthritis. Int J Gen Med. 2022;15:7817-7829. 10.2147/IJGM.S380197 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75. Cheng CH, Lin CY, Cho TH, Lin CM. Machine learning to predict the progression of bone mass loss associated with personal characteristics and a metabolic syndrome scoring index. Healthcare. 2021;9(8):948. 10.3390/healthcare9080948 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76. Coco Martín MB, Leal Vega L, Blázquez Cabrera J, et al. Comorbidity and osteoporotic fracture: approach through predictive modeling techniques using the OSTEOMED registry. Aging Clin Exp Res. 2022;34(9):1997-2004. https://pubmed.ncbi.nlm.nih.gov/35435583/ [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77. de Vries BCS, Hegeman JH, Nijmeijer W, Geerdink J, Seifert C, Groothuis-Oudshoorn CGM. Comparing three machine learning approaches to design a risk assessment tool for future fractures: predicting a subsequent major osteoporotic fracture in fracture patients with osteopenia and osteoporosis. Osteoporos Int. 2021;32(3):437-449. 10.1007/s00198-020-05735-z [DOI] [PubMed] [Google Scholar]
- 78. De Baun MR, Chavez G, Fithian A, et al. Artificial neural networks predict 30-day mortality after hip fracture: insights from machine learning. J Am Acad Orthop Surg. 2021;29(22):977-983. 10.5435/JAAOS-D-20-00429 [DOI] [PubMed] [Google Scholar]
- 79. Du J, Wang J, Gai X, Sui Y, Liu K, Yang D. Application of intelligent X-ray image analysis in risk assessment of osteoporotic fracture of femoral neck in the elderly. Math Biosci Eng. 2023;20(1):879-893. 10.3934/mbe.2023040 [DOI] [PubMed] [Google Scholar]
- 80. Forssten MP, Bass GA, Ismail AM, Mohseni S, Cao Y. Predicting 1-year mortality after hip fracture surgery: an evaluation of multiple machine learning approaches. J Pers Med. 2021;11(8):727. 10.3390/jpm11080727 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81. Galassi A, Martin-Guerrero J, Villamor E, Monserrat C, Rupérez MJ. Risk assessment of hip fracture based on machine learning. Appl Bionics Biomech. 2020;2020:1-13. 10.1155/2020/8880786 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82. Harris AHS, Trickey AW, Eddington HS, et al. A tool to estimate risk of 30-day mortality and complications after hip fracture surgery: accurate enough for some but not all purposes? A study from the ACS-NSQIP database. Clin Orthop. 2022;480(12):2335-2346. 10.1097/CORR.0000000000002294 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83. Kitcharanant N, Chotiyarnwong P, Tanphiriyakun T, et al. Development and internal validation of a machine-learning-developed model for predicting 1-year mortality after fragility hip fracture. BMC Geriatr. 2022;22(1):451. 10.1186/s12877-022-03152-x [DOI] [PMC free article] [PubMed] [Google Scholar]
- 84. Klemt C, Yeo I, Cohen-Levy WB, Melnic CM, Habibi Y, Kwon YM. Artificial neural networks can predict early failure of cementless total hip arthroplasty in patients with osteoporosis. J Am Acad Orthop Surg. 2022;30(10):467-475. 10.5435/JAAOS-D-21-00775 [DOI] [PubMed] [Google Scholar]
- 85. Kong SH, Lee JW, Bae BU, et al. Development of a spine X-ray-based fracture prediction model using a deep learning algorithm. Endocrinol Metab Seoul Korea. 2022;37(4):674-683. 10.3803/EnM.2022.1461 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86. Lei M, Han Z, Wang S, et al. A machine learning-based prediction model for in-hospital mortality among critically ill patients with hip fracture: an internal and external validated study. Injury. 2023;54(2):636-644. 10.1016/j.injury.2022.11.031 [DOI] [PubMed] [Google Scholar]
- 87. Lu S, Fuggle NR, Westbury LD, et al. Machine learning applied to HR-pQCT images improves fracture discrimination provided by DXA and clinical risk factors. Bone. 2023;168:116653. 10.1016/j.bone.2022.116653 [DOI] [PubMed] [Google Scholar]
- 88. Ma Y, Lu Q, Yuan F, Chen H. Comparison of the effectiveness of different machine learning algorithms in predicting new fractures after PKP for osteoporotic vertebral compression fractures. J Orthop Surg. 2023;18(1):62. 10.1186/s13018-023-03551-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 89. Oosterhoff JHF, Savelberg ABMC, Karhade AV, et al. Development and internal validation of a clinical prediction model using machine learning algorithms for 90 day and 2 year mortality in femoral neck fracture patients aged 65 years or above. Eur J Trauma Emerg Surg. 2022;48(6):4669-4682. 10.1007/s00068-022-01981-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 90. Poullain F, Champsaur P, Pauly V, et al. Vertebral trabecular bone texture analysis in opportunistic MRI and CT scan can distinguish patients with and without osteoporotic vertebral fracture: a preliminary study. Eur J Radiol. 2023;158:110642. 10.1016/j.ejrad.2022.110642 [DOI] [PubMed] [Google Scholar]
- 91. Shimizu H, Enda K, et al. Machine learning algorithms: prediction and feature selection for clinical refracture after surgically treated fragility fracture. J Clin Med. 2022;11(7):2021. 10.3390/jcm11072021 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 92. Shtar G, Rokach L, Shapira B, Nissan R, Hershkovitz A. Using machine learning to predict rehabilitation outcomes in postacute hip fracture patients. Arch Phys Med Rehabil. 2021;102(3):386-394. 10.1016/j.apmr.2020.08.011 [DOI] [PubMed] [Google Scholar]
- 93. Takahashi S, Terai H, Hoshino M, et al. Machine-learning-based approach for nonunion prediction following osteoporotic vertebral fractures. Eur Spine J. 2022;32(11):3788-3796. https://pubmed.ncbi.nlm.nih.gov/36269421/ [DOI] [PubMed] [Google Scholar]
- 94. Ulivieri FM, Rinaudo L, Messina C, et al. Bone strain index predicts fragility fracture in osteoporotic women: an artificial intelligence-based study. Eur Radiol Exp. 2021;5(1):47. 10.1186/s41747-021-00242-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 95. Ulivieri FM, Rinaudo L, Piodi LP, et al. Bone strain index as a predictor of further vertebral fracture in osteoporotic women: an artificial intelligence-based analysis. PLoS One. 2021;16(2):e0245967. 10.1371/journal.pone.0245967 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 96. Cheng P, Yang Y, Yu H, He Y. Automatic vertebrae localization and segmentation in CT with a two-stage dense-U-net. Sci Rep. 2021;11(1):22156. 10.1038/s41598-021-01296-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 97. Deng Y, Wang L, Zhao C, et al. A deep learning-based approach to automatic proximal femur segmentation in quantitative CT images. Med Biol Eng Comput. 2022;60(5):1417-1429. 10.1007/s11517-022-02529-9 [DOI] [PubMed] [Google Scholar]
- 98. Kim KC, Cho HC, Jang TJ, Choi JM, Seo JK. Automatic detection and segmentation of lumbar vertebrae from X-ray images for compression fracture evaluation. Comput Methods Prog Biomed. 2021;200:105833. 10.1016/j.cmpb.2020.105833 [DOI] [PubMed] [Google Scholar]
- 99. Kim DH, Jeong JG, Kim YJ, Kim KG, Jeon JY. Automated vertebral segmentation and measurement of vertebral compression ratio based on deep learning in X-ray images. J Digit Imaging. 2021;34(4):853-861. 10.1007/s10278-021-00471-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 100. Park T, Yoon MA, Cho YC, et al. Automated segmentation of the fractured vertebrae on CT and its applicability in a radiomics model to predict fracture malignancy. Sci Rep. 2022;12(1):6735. 10.1038/s41598-022-10807-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 101. Suri A, Jones BC, Ng G, et al. A deep learning system for automated, multi-modality 2D segmentation of vertebral bodies and intervertebral discs. Bone. 2021;149:115972. 10.1016/j.bone.2021.115972 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 102. Wang D, Wu Z, Fan G, et al. Accuracy and reliability analysis of a machine learning based segmentation tool for intertrochanteric femoral fracture CT. Front Surg. 2022;9:913385. 10.3389/fsurg.2022.913385 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 103. Wei D, Wu Q, Wang X, Tian M, Li B. Accurate instance segmentation in pediatric elbow radiographs. Sensors. 2021;21(23):7966. https://pubmed.ncbi.nlm.nih.gov/34883969/ [DOI] [PMC free article] [PubMed] [Google Scholar]
- 104. Yang F, Weng X, Miao Y, Wu Y, Xie H, Lei P. Deep learning approach for automatic segmentation of ulna and radius in dual-energy X-ray imaging. Insights Imaging. 2021;12(1):191. 10.1186/s13244-021-01137-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 105. Yang L, Gao S, Li P, Shi J, Zhou F. Recognition and segmentation of individual bone fragments with a deep learning approach in CT scans of complex intertrochanteric fractures: a retrospective study. J Digit Imaging. 2022;35(6):1681-1689. https://pubmed.ncbi.nlm.nih.gov/35711073/ [DOI] [PMC free article] [PubMed] [Google Scholar]
- 106. Zhao Y, Zhao T, Chen S, et al. Fully automated radiomic screening pipeline for osteoporosis and abnormal bone density with a deep learning-based segmentation using a short lumbar mDixon sequence. Quant Imaging Med Surg. 2022;12(2):1198-1213. 10.21037/qims-21-587 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 107. Jolliffe IT, Cadima J. Principal component analysis: a review and recent developments. Philos Trans R Soc Math Phys Eng Sci. 2016;374(2065):20150202. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 108. Tibshirani R. Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B Methodol. 1996;58(1):267-288. 10.1111/j.2517-6161.1996.tb02080.x [DOI] [Google Scholar]
- 109. Lundberg S, Lee SI. A unified approach to interpreting model predictions. 2017. https://arxiv.org/abs/1705.07874
- 110. Kursa MB, Rudnicki WR. Feature selection with the Boruta package. J Stat Softw. 2010;36(11):1-13. 10.18637/jss.v036.i11 [DOI] [Google Scholar]
- 111. Collins GS, Reitsma JB, Altman DG, Moons K. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMC Med. 2015;13(1):1. 10.1186/s12916-014-0241-z [DOI] [PMC free article] [PubMed] [Google Scholar]
- 112. Oliveira E, Carmo L, Van Den Merkhof A, et al. An increasing number of convolutional neural networks for fracture recognition and classification in orthopaedics: are these externally validated and ready for clinical application? Bone Jt Open. 2021;2(10):879-885. 10.1302/2633-1462.210.BJO-2021-0133 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 113. Slim K, Nini E, Forestier D, Kwiatkowski F, Panis Y, Chipponi J. Methodological index for non-randomized studies (MINORS): development and validation of a new instrument. ANZ J Surg. 2003;73(9):712-716. 10.1046/j.1445-2197.2003.02748.x [DOI] [PubMed] [Google Scholar]
- 114. Wolff RF, Moons KGM, Riley RD, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. 2019;170(1):51. 10.7326/M18-1376 [DOI] [PubMed] [Google Scholar]
- 115. König IR, Malley JD, Weimar C, Diener HC, Ziegler A, on behalf of the German Stroke Study Collaboration. Practical experiences on the necessity of external validation. Stat Med. 2007;26(30):5499-5511. 10.1002/sim.3069 [DOI] [PubMed] [Google Scholar]
- 116. Yu AC, Mohajer B, Eng J. External validation of deep learning algorithms for radiologic diagnosis: a systematic review. Radiol Artif Intell. 2022;4(3):e210064. 10.1148/ryai.210064 [DOI] [PMC free article] [PubMed] [Google Scholar]