Abstract
Introduction
Propofol is a widely used sedative-hypnotic agent for critically ill patients requiring invasive mechanical ventilation (IMV). Despite its clinical benefits, propofol is associated with increased risks of hypertriglyceridemia. Early identification of patients at risk for propofol-associated hypertriglyceridemia is crucial for optimising sedation strategies and preventing adverse outcomes. Machine-learning (ML) models offer a promising approach for predicting individualised patient risks of propofol-associated hypertriglyceridemia.
Methods and analysis
We propose the development of an ML model aimed at predicting the risk of propofol-associated hypertriglyceridemia in ICU patients receiving IMV. The study will use retrospective data from four Mayo Clinic sites. Nested cross validation (CV) will be employed, with a tenfold inner CV loop for model tuning and selection as well as an outer loop using leave-one-site-out CV for external validation. Feature selection will be conducted using Boruta and least absolute shrinkage and selection operator-penalised logistic regression. Data preprocessing steps include missing data imputation, feature scaling and dimensionality reduction techniques. Six ML algorithms will be tuned and evaluated. Bayesian optimisation will be used for hyperparameter selection. Global model explainability will be assessed using permutation importance, and local model explainability will be assessed using SHapley Additive exPlanations.
Ethics and dissemination
The proposed ML model aims to provide a reliable and interpretable tool for clinicians to predict the risk of propofol-associated hypertriglyceridemia in ICU patients. The final model will be deployed in a web-based clinical risk calculator. The model development process and performance measures obtained during nested CV will be described in a study publication to be disseminated in a peer-reviewed journal. The proposed study has received ethics approval from the Mayo Clinic Institutional Review Board (IRB #23-007416).
Keywords: INTENSIVE & CRITICAL CARE, Health informatics, Adult anaesthesia
STRENGTHS AND LIMITATIONS OF THIS STUDY
Robust external validation using a nested cross-validation framework will help assess the generalisability of models produced from the modelling pipeline across different hospital settings.
A diverse set of machine-learning (ML) algorithms and advanced hyperparameter tuning techniques will be employed to identify the optimal model configuration.
Integration of feature explainability will enhance the clinical applicability of the ML models by providing transparency in predictions, which can improve clinician trust and encourage adoption.
Reliance on retrospective data may introduce biases due to inconsistent or erroneous data collection, and the computational intensity of the validation approach may limit replication and future model expansion in resource-constrained settings.
Introduction
Propofol is a sedative-hypnotic agent commonly used for sedation in critically ill adults requiring invasive mechanical ventilation (IMV).1 It is recommended as one of the first-line regimens for this indication by the 2018 pain, agitation/sedation, delirium, immobility and sleep guidelines2 due to its rapid onset and short duration of action. As a highly lipophilic drug, propofol is formulated in a 10% fat emulsion, typically using soybean oil.1 However, this formulation has the disadvantage of predisposing patients to hypertriglyceridemia.1 3 Up to 10% of patients who develop propofol-associated hypertriglyceridemia may progress to pancreatitis,4 which substantially increases these patients’ risks of morbidity and mortality in the ICU.5
Thus, it is crucial to assess potential risk factors associated with the development of propofol-associated hypertriglyceridemia when selecting sedative regimens. Several retrospective cohort studies have identified important predictors of hypertriglyceridemia following propofol sedation, including advanced age,4 propofol dose and duration, body mass index, illness severity and concomitant medications.6 7 However, the question of how this knowledge can be applied systematically for guiding clinical practice remains unanswered.
Machine learning (ML), which is a modelling paradigm that can identify complex and non-linear patterns in large datasets,8 9 has proven effective for providing individualised patient risk stratification in medical diagnoses and prognoses.10 ML models could potentially help identify recently intubated ICU patients who are at higher risks for developing propofol-associated hypertriglyceridemia, allowing for these patients to be switched to alternative sedation regimens. Thus, the objective of the proposed ML development study is to create an ML-powered clinical calculator to aid in selecting sedating regimens based on the personalised patient risk predictions of developing propofol-associated hypertriglyceridemia in ICU settings.
Methods and analysis
The proposed study will be conducted and reported in accordance with the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis+AI Checklist for Prediction Model Development and Evaluation.11
Data sources
The proposed study involves a secondary analysis of data from a multicentred retrospective cohort investigation conducted at Mayo Clinic sites in the USA. The original retrospective study included consecutive adults (≥18 years of age) admitted to 1 of 11 ICUs across four Mayo Clinic sites: (1) Mayo Clinic Rochester; (2) Mayo Clinic Phoenix; (3) Mayo Clinic Jacksonville and (4) Mayo Clinic Health System community sites in Mankato, Minnesota and Eau Claire, Wisconsin. Patient inclusion criteria for the proposed model development study are the same as the eligibility criteria in the original retrospective study, which included: (1) admission to one of the study ICUs between 5 May 2018 and 30 June 2023; (2) required IMV for greater than 24 hours and (3) received continuous propofol infusion for >24 hours. Exclusion criteria included: (1) development of hypertriglyceridemia (defined as serum triglyceride levels >400 mg/dL)5 prior to propofol infusion and (2) lack of prior authorisation for medical records to be assessed for research purposes. The index ICU admission for eligible patients was identified through institutional data warehouses (Mayo Clinic ICU DataMart and Unified Data Platform).
Outcomes of interest
The goal of model development is to predict the probability of hypertriglyceridemia within 10 days following the start of propofol infusion. Hypertriglyceridemia is defined as serum triglyceride levels exceeding 400 mg/dL.5 The predicted probability estimates will be transformed into a binary classification to categorise patients into high-risk and low-risk groups for developing hypertriglyceridemia based on a decision threshold. The anticipated usage setting for the model is during the ICU admission process or shortly before/after intubation and sedation for IMV.
Nested cross validation (CV)
A nested CV methodology12 will be used to evaluate the performance and consistency of the overall ML modelling process. Nested CV involves two layers of CV (figure 1).
Figure 1. Illustration of the proposed nested CV approach. The outer loop consists of a LOSO-CV approach for repeated external validation of models produced by the inner loop. The inner loop involves a tenfold CV approach for model tuning and selection. CV, cross validation; LOSO, leave-one-site-out; MCHS, Mayo Clinic Health System (community sites).
Inner CV loop: The inner CV loop is used to tune each ML algorithm and to select the best-performing model. This is similar to regular, non-nested ‘flat’ CV.
Outer CV loop: The dataset is divided into k outer folds. In each iteration of the outer loop, one fold is held out as the test set, while the remaining k-1 folds are used for model tuning and selection using the inner CV loop. The main function of the outer loop is to estimate the performance of the best-performing model selected by the inner loop.13
For our proposed study, we will use stratified tenfold CV as our inner CV loop for hyperparameter tuning and model selection. For the outer CV loop, we will use leave-one-site-out CV (LOSO-CV). In our study, LOSO-CV involves tuning, selecting and training models on data from three of the four included Mayo Clinic sites and externally validating the best-performing model on data from the remaining site. This is repeated four times so that each Mayo Clinic site serves as the external validation set at least once. In essence, we are externally validating our modelling process four times to better assess how our models will perform on new, unseen data.14 Following the nested CV process, we will run the inner CV loop on the entire dataset to generate the final production model for deployment into a clinical calculator.
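The nested CV scheme described above can be sketched with scikit-learn, using LeaveOneGroupOut as the outer LOSO loop and a stratified tenfold inner loop. The synthetic data, site labels and logistic-regression search space below are illustrative placeholders, not the study's actual cohort or pipeline:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut, StratifiedKFold

# Synthetic stand-in for the four-site cohort; site labels are hypothetical
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
sites = np.repeat([0, 1, 2, 3], 100)

outer_scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=sites):
    # Inner loop: stratified tenfold CV for tuning on the three training sites
    search = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
        cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
        scoring="neg_log_loss",
    )
    search.fit(X[train_idx], y[train_idx])
    # Outer loop: external validation on the held-out site
    probs = search.predict_proba(X[test_idx])[:, 1]
    outer_scores.append(roc_auc_score(y[test_idx], probs))

print(len(outer_scores))  # 4: one external validation per site
```

Averaging `outer_scores` would give the generalisation estimate across sites, while a final run of the inner loop on the full dataset would produce the production model.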
Feature selection and sample size assessment
Candidate features will first be filtered based on data availability (≤10% missing data) and expert domain knowledge (see table 1 for the list of candidate features). For each training set in the outer CV loop, the dimension of the feature set will be further reduced using random-forest-based Boruta15 and least absolute shrinkage and selection operator (LASSO) penalised logistic regression.16 During feature selection, the regularisation hyperparameter α of the LASSO penalised logistic regression model will be determined via a grid search of 1000 α values along the regularisation path to minimise the LASSO objective function across tenfold CV. A union of features selected by the two methods will be chosen as the final feature set used for predictive modelling.
Table 1. Candidate features selected based on data availability and expert domain knowledge.
| Feature | Description |
| --- | --- |
| Patient demographics and comorbidities | |
| Age at ICU admission | Continuous feature, in the unit of years |
| Sex | Binary feature: yes, no (male sex) |
| BMI | Continuous feature, in the unit of kg/m2 |
| History of hypertension | Binary feature: yes, no |
| History of diabetes mellitus | Binary feature: yes, no |
| Charlson comorbidity index | Continuous feature, unitless |
| Clinical characteristics at ICU admission | |
| ICU admission 24-hour respiratory SOFA score | Categorical feature: 0–4 |
| ICU admission 24-hour CNS SOFA score | Categorical feature: 0–4 |
| ICU admission 24-hour cardiovascular SOFA score | Categorical feature: 0–4 |
| ICU admission 24-hour hepatic SOFA score | Categorical feature: 0–4 |
| ICU admission 24-hour coagulation SOFA score | Categorical feature: 0–4 |
| ICU admission 24-hour renal SOFA score | Categorical feature: 0–4 |
| COVID-19 positive at ICU admission | Binary feature: yes, no |
| First lab values during ICU admission | |
| First WBC count in ICU | Continuous feature, in the unit of ×109/L |
| First blood glucose in ICU | Continuous feature, in the unit of mmol/L |
| First serum lactate in ICU | Continuous feature, in the unit of mmol/L |
| Propofol and ventilation characteristics | |
| Initial propofol dose | Continuous feature, in the unit of mg/kg/hour, represents the average hourly propofol dose over the first 24 hours of infusion |
| Number of days from ICU admission to onset of IMV | Continuous feature, in the unit of days |
BMI, body mass index; CNS, central nervous system; ICU, intensive care unit; IMV, invasive mechanical ventilation; SOFA, sequential organ failure assessment (score); WBC, white blood cell (count)
The goal of the feature selection process is to follow the ‘one-in-ten’ rule—which states that there should be ten patients with the target outcome in the dataset to support every one feature included to ensure stable ML model predictions17 18—as closely as possible. Given that our previous analyses involving our dataset showed that there were 851 patients who developed hypertriglyceridemia following propofol administration (without applying the 10-day postpropofol-initiation restriction),19 we will likely have capacity for 70–80 features. This is far greater than the number of candidate features that we expect to include in the model; thus, we anticipate that our dataset will contain enough case samples for modelling purposes.
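The LASSO arm of the feature selection step might look as follows in scikit-learn; the Boruta mask is a hypothetical stand-in (BorutaPy would be run separately on the same training set), and the grid is shortened from the protocol's 1000 α values for brevity:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

# Synthetic data; feature names are placeholders
X, y = make_classification(n_samples=300, n_features=12, n_informative=4,
                           random_state=0)
feature_names = np.array([f"feat_{i}" for i in range(X.shape[1])])

# LASSO-penalised logistic regression tuned over a grid of penalty strengths
# across tenfold CV (25 grid points here; the protocol uses 1000)
lasso = LogisticRegressionCV(Cs=25, penalty="l1", solver="saga",
                             cv=10, max_iter=5000)
lasso.fit(StandardScaler().fit_transform(X), y)
lasso_selected = set(feature_names[np.abs(lasso.coef_[0]) > 1e-8])

# Boruta would be run separately (e.g. with BorutaPy); a placeholder mask here
boruta_selected = set(feature_names[:4])

# The union of the two methods forms the final feature set
final_features = sorted(lasso_selected | boruta_selected)
print(len(final_features))
```

The size of `final_features` would then be checked against the one-in-ten rule described above.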
Data preprocessing
Within each iteration of the outer CV loop, the training set will be split into ten folds of different training and testing subsets for the inner CV loop. A data preprocessing pipeline will be constructed to identify and impute missing data, correct for class imbalance and improve compatibility with ML algorithms before the data are entered into the ML models (figure 2).
Figure 2. Flowchart illustrating the flow of training data through our proposed modelling pipeline. Because PCA requires a complete dataset with no missing data, model configurations that did not use imputation (and relied on ML algorithms’ native missing data handling methods) cannot use PCA. The flow of data through configurations without imputation is shown using the dotted arrows. LASSO, least absolute shrinkage and selection operator (penalised logistic regression); MICE, multivariate imputation by chained equations; ML, machine learning; PCA, principal component analysis.
Categorical data encoding
As all of the candidate categorical features are non-ordinal, they will be transformed via one-hot encoding into multiple binary features.20
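A minimal sketch of the one-hot encoding step with scikit-learn; the categorical column shown is hypothetical:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical non-ordinal categorical column
admission_type = np.array([["medical"], ["surgical"], ["trauma"], ["medical"]])

# Each category becomes its own binary feature column
encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(admission_type).toarray()
print(encoded.shape)  # (4, 3): one binary column per category
```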
Missing data imputation
Because we expect the amount of missing data in the dataset to be low (≤10%), we will assume missing data to be missing at random and perform missing data imputation. Imputation will be conducted using multivariate imputation by chained equations (MICE).21 As previous research had suggested that the number of MICE iterations should be determined based on the highest percentage of data missing (eg, if 30% of data need to be imputed for one feature, then MICE will be run for 30 iterations),22 23 we will run MICE for ten iterations to account for the maximum missing data percentage of 10%. The training and testing subsets will be imputed separately to avoid data leakage.24 When assessing and tuning ML algorithms with native missing data handling methods, we will treat the use of MICE as a hyperparameter to trial both MICE and the algorithms’ native missing data handling approaches.
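As a sketch, scikit-learn's IterativeImputer provides a MICE-style chained-equations imputer (the study may use a dedicated MICE implementation instead); the data here are synthetic, with roughly 10% missingness as anticipated in the cohort:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 4))
X_train[rng.random((100, 4)) < 0.1] = np.nan  # roughly 10% missing

# max_iter matches the highest expected missing-data percentage (10)
imputer = IterativeImputer(max_iter=10, random_state=0)
X_train_imputed = imputer.fit_transform(X_train)  # fit on the training subset
# The testing subset would be imputed separately to avoid data leakage

print(np.isnan(X_train_imputed).any())  # False: no missing values remain
```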
Data imbalance and resampling
Given that we aim to develop a model that can perform probability predictions, we will not resample the dataset even if the dataset is imbalanced. However, we will tune ML algorithms’ weight scaling factors as a hyperparameter during model tuning to help reduce bias towards the majority class.
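Treating the class-weight scaling factor as a tunable hyperparameter might look as follows; the algorithm, grid values and synthetic imbalance are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Imbalanced synthetic outcome (~10% positives), as a stand-in for the cohort
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# Class weights are tuned as a hyperparameter instead of resampling the data
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"class_weight": [None, "balanced", {0: 1, 1: 5}]},
    cv=StratifiedKFold(n_splits=5),
    scoring="neg_log_loss",
)
search.fit(X, y)
print(search.best_params_)
```

Scoring with log loss keeps the selected weighting compatible with probability prediction, unlike resampling, which distorts the base rate.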
Feature scaling
Before the dataset is entered into dimensionality reduction steps and ML algorithms, the normality of each continuous feature within the dataset will be assessed using the Shapiro–Wilk test25 and by visual inspection of the features’ histograms.26 If the distribution of all continuous features is determined to be close to normal, the continuous features will be Z-score normalised to centre the feature data around the mean and scale the data according to its SD.27 If any of the continuous features are not close to normal, the continuous features will be transformed using robust normalisation, which centres the data around the median and scales the data around their IQR.27 Compared with Z-score normalisation, robust normalisation is more robust towards the presence of outlier values in the dataset.28
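The normality check and scaler choice can be sketched as follows (synthetic features; the visual histogram inspection is omitted):

```python
import numpy as np
from scipy.stats import shapiro
from sklearn.preprocessing import RobustScaler, StandardScaler

rng = np.random.default_rng(0)
# One roughly normal and one skewed continuous feature
X = np.column_stack([rng.normal(size=200), rng.exponential(size=200)])

# Shapiro-Wilk on each feature; p < 0.05 suggests departure from normality
all_normal = all(shapiro(X[:, j]).pvalue >= 0.05 for j in range(X.shape[1]))

# Z-score normalisation if everything looks normal, robust scaling otherwise
scaler = StandardScaler() if all_normal else RobustScaler()
X_scaled = scaler.fit_transform(X)
print(type(scaler).__name__)  # RobustScaler: the exponential feature is skewed
```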
Dimensionality reduction
Potential multicollinearity in the final feature set will be assessed using variance inflation factors (VIFs).29 If the VIF for any feature exceeds 5, we will trial the use of principal component analysis as a dimensionality and multicollinearity reduction approach in our data preprocessing pipeline. The lowest number of principal components needed to explain at least 95% of the variance will be kept.30
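For standardised features, the VIFs are the diagonal of the inverse correlation matrix, so the VIF check and conditional PCA step can be sketched without additional libraries (the near-duplicate column below is a contrived source of multicollinearity):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
base = rng.normal(size=(300, 5))
# A near-duplicate column to induce multicollinearity
collinear = 0.95 * base[:, 0] + rng.normal(scale=0.1, size=300)
X = np.column_stack([base, collinear])
X_std = StandardScaler().fit_transform(X)

# VIF for each feature is the diagonal of the inverse correlation matrix
vif = np.diag(np.linalg.inv(np.corrcoef(X_std, rowvar=False)))

if vif.max() > 5:
    # Keep the fewest principal components explaining >= 95% of the variance
    pca = PCA(n_components=0.95)
    X_reduced = pca.fit_transform(X_std)
    print(X_reduced.shape[1])  # fewer components than original features
```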
Hyperparameter tuning
Within each iteration of the outer CV loop, we will tune and evaluate the following.
Two classical ML algorithms: (1) logistic regression (with LASSO, ridge or elastic net penalisation) and (2) support vector machines (with linear, polynomial, radial basis function or sigmoid kernels).
Three ensemble ML algorithms: (1) random decision forest, (2) light gradient-boosting machine31 (LightGBM, with either the regular gradient boosting decision trees (GBDTs) algorithm32 or the dropouts meet multiple additive regression trees (DART) algorithm33) and (3) eXtreme Gradient Boosting (XGBoost, with either the regular GBDTs algorithm32 or the DART algorithm33 34).
A multilayer perceptron neural network with two, three or four hidden layers. Each hidden layer will use a rectified linear unit (ReLU) activation function with Kaiming kernel initialisation.35 Each ReLU layer will be followed by a dropout layer for regularisation36 and a batch normalisation layer to improve training stability and speed.37 All hidden layers will use the same number of neurons determined via hyperparameter tuning. The output layer will use a sigmoid function to produce the predicted probability. An AdamW optimiser will be used.38 An example of the proposed network architecture with two hidden layers is shown in figure 3.
Figure 3. Example schema of the proposed network architecture, with dropout and batch normalisation components in each hidden layer. An example with two hidden layers is shown. ReLU, rectified linear unit.
Each algorithm will be tuned to minimise cross-entropy loss across stratified tenfold CV. The full list of hyperparameters that will be tuned for each model is tabulated in online supplemental tables S1–S17. The optimal hyperparameters will be selected using Bayesian optimisation.39
Bayesian optimisation starts with several initial rounds of random hyperparameter searches to gather data points for building a probabilistic model that predicts the performance of different hyperparameter combinations. An acquisition function then uses this model to identify the most promising hyperparameter combinations for the next round of evaluations. The results are used to update the probabilistic model, and the process is repeated until a pre-established performance budget is exhausted.40 In essence, Bayesian optimisation can be thought of as a more intelligent and ‘guided’ version of random hyperparameter searching, and it has been shown empirically to outperform the traditional grid-search and random search approaches.41 Because random search can reliably identify hyperparameter combinations from the top 5% of the most performant combinations with 60 iterations,39 we aim to perform Bayesian optimisation at or beyond this performance budget. Given computational limitations, we estimate that Bayesian optimisation will be performed for at least 200 iterations per model.
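The study will likely rely on a dedicated library (e.g. scikit-optimize) for this step; purely to illustrate the surrogate-and-acquisition cycle described above, a minimal hand-rolled loop tuning a single hypothetical hyperparameter (the penalty strength of a logistic regression) could look like this:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

def objective(log_c):
    """Mean tenfold-CV score (negative log loss) for a given log10(C)."""
    model = LogisticRegression(C=10.0 ** log_c, max_iter=1000)
    return cross_val_score(model, X, y, cv=10, scoring="neg_log_loss").mean()

candidates = np.linspace(-3, 3, 61).reshape(-1, 1)
rng = np.random.default_rng(0)

# 1. Seed the surrogate model with a few random evaluations
tried = list(rng.choice(candidates.ravel(), size=5, replace=False))
scores = [objective(c) for c in tried]

# 2. Alternate between refitting the surrogate and querying the acquisition
for _ in range(10):  # the protocol budgets >= 200 iterations per model
    gp = GaussianProcessRegressor(alpha=1e-6, normalize_y=True)
    gp.fit(np.array(tried).reshape(-1, 1), scores)
    mean, std = gp.predict(candidates, return_std=True)
    nxt = candidates[np.argmax(mean + 1.96 * std), 0]  # upper-confidence bound
    tried.append(nxt)
    scores.append(objective(nxt))

best_c = 10.0 ** tried[int(np.argmax(scores))]
print(best_c)
```

The Gaussian process is the probabilistic surrogate and the upper-confidence-bound term is one possible acquisition function; production libraries offer better-calibrated alternatives such as expected improvement.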
Calibration
As the objective of our hyperparameter tuning process is to minimise cross-entropy loss, we can reasonably assume that our models will be well calibrated. Nevertheless, we will trial Platt/sigmoidal scaling,42 43 isotonic regression42 or no further calibration to assess if our calibration performance can be further improved. The calibration approach with the lowest average cross-entropy loss on stratified tenfold CV will be selected.
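Comparing the three calibration options by mean cross-entropy loss might be sketched as follows; the base model and synthetic data are placeholders:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)
base = LogisticRegression(max_iter=1000)

options = {
    "none": base,
    "platt": CalibratedClassifierCV(base, method="sigmoid", cv=5),
    "isotonic": CalibratedClassifierCV(base, method="isotonic", cv=5),
}
# Pick the option with the lowest mean cross-entropy loss on tenfold CV
losses = {
    name: -cross_val_score(model, X, y, cv=10, scoring="neg_log_loss").mean()
    for name, model in options.items()
}
print(min(losses, key=losses.get))
```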
Threshold selection
Following calibration, each model will undergo threshold tuning to maximise its binary classification performance. Youden’s index44 will be calculated for every classification threshold between 0.01 and 0.99 at an interval of 0.01, and the threshold with the highest average Youden’s index across stratified tenfold CV will be selected as the optimal threshold.
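The threshold scan can be sketched on a single train/test split (the protocol averages Youden's index across CV folds); the data and model are synthetic placeholders:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probs = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

def youden(y_true, y_prob, threshold):
    """Youden's J = sensitivity + specificity - 1 at a given threshold."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_prob >= threshold).ravel()
    return tp / (tp + fn) + tn / (tn + fp) - 1

# Scan thresholds 0.01-0.99 in steps of 0.01, as described above
thresholds = np.arange(0.01, 1.00, 0.01)
best_threshold = thresholds[np.argmax([youden(y_te, probs, t) for t in thresholds])]
print(round(float(best_threshold), 2))
```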
Model selection and evaluation
We will follow a previously published framework proposed for the evaluation of clinical prediction models45 to perform model selection and external validation. The classification performance of each tuned and calibrated model will be assessed using the area under the curve of the receiver operating characteristic curve, accuracy, sensitivity, specificity and Youden’s index. Calibration performance will be assessed using Brier score.
During model selection within the training set of each outer CV loop, all metrics will be produced and averaged across tenfold CV, and 95% CIs will be used to assess performance variance. The model configuration with the best classification and calibration performance will be selected as the optimal configuration.
For external validation within each iteration of the outer CV loop, the same set of classification and calibration metrics will be generated using the optimal model configuration and data from the held-out site not involved in model tuning and selection. The external validation metrics will be averaged across all four iterations of the outer CV loop to assess the generalisation performance of our modelling process, and 95% CIs will be used to assess the variance of the generalisation performance.
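Computing the named metric set for one model on one held-out set might look as follows in scikit-learn; the data, model and 0.5 threshold are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, brier_score_loss, recall_score,
                             roc_auc_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
probs = model.predict_proba(X_te)[:, 1]
preds = (probs >= 0.5).astype(int)  # illustrative decision threshold

metrics = {
    "auroc": roc_auc_score(y_te, probs),
    "accuracy": accuracy_score(y_te, preds),
    "sensitivity": recall_score(y_te, preds),
    "specificity": recall_score(y_te, preds, pos_label=0),
    "brier": brier_score_loss(y_te, probs),  # calibration performance
}
metrics["youden"] = metrics["sensitivity"] + metrics["specificity"] - 1
print({k: round(v, 3) for k, v in metrics.items()})
```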
Model explainability and fairness
To understand how the final production model makes its predictions and to assess model bias arising from over-reliance on patient demographics, we will use the permutation importance method to assess which features have the greatest impact on the performance of our final production model.46 Permutation importance is a global explainability method that involves repeatedly shuffling individual predictors in the training dataset, which effectively renders the predictor useless to the model. It then observes how this operation affects model performance, thereby identifying the most influential features. To understand how the production model makes individual predictions, we will use permutation-based SHapley Additive exPlanations (SHAP),47 which is a local explainability method that produces an estimate of the direction and magnitude of each feature’s effect on individual predictions. A bee swarm summary plot will be used to illustrate SHAP feature attributions for each patient in our dataset.
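The global permutation-importance step can be sketched with scikit-learn's built-in implementation (the study names eli5, which works similarly); data and model are synthetic placeholders:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each feature in turn and measure the resulting drop in performance
result = permutation_importance(model, X, y, n_repeats=10,
                                scoring="roc_auc", random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]  # most influential first
print(ranking[:3])  # indices of the three most influential features
```

The local SHAP step would use the shap package (e.g. a permutation-based explainer) on the fitted model; it is omitted here because it requires the shap library.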
Model deployment
The final production model will be integrated into a web-based clinical risk calculator built with the Shiny framework. SHAP waterfall plots will be provided for each individual prediction generated through the calculator to allow users to understand how the model is generating its predictions and to assess whether the predictions make clinical and biological sense.
Implementation of permutation SHAP in the clinical risk calculator requires access to the full training dataset at the time of use, so that SHAP can generate perturbations around a prediction to understand the local behaviour of the model. Involvement of training data in a web-based context represents a data privacy concern. To facilitate SHAP implementation, we will generate a synthetic dataset that mimics the distribution and characteristics of our training dataset for SHAP perturbations using a differentially private conditional tabular generative adversarial network (DP-CTGAN).48 DP-CTGAN uses random noise during network training and data generation to offer a privacy guarantee. Hyperparameters of the DP-CTGAN will be tuned manually to minimise generator and discriminator loss. The privacy budget ε will be set to between 0 and 10. We aim to generate a synthetic dataset that yields similar SHAP value predictions and SHAP base values for use with the clinical risk calculator.
Statistical software
The modelling and evaluation processes will be completed using Python 3.11. Boruta-based feature selection will be conducted using the BorutaPy package, and LASSO-based feature selection will be conducted using scikit-learn. ML models will be fitted, tuned, calibrated and evaluated using scikit-learn, LightGBM, XGBoost, TensorFlow, Keras and scikit-optimize. Permutation importance will be assessed using eli5, and permutation SHAP will be implemented using shap.
Patient and public involvement
None.
Ethics and dissemination
The proposed study has received ethics approval from the Mayo Clinic Institutional Review Board (IRB #23-007416). The requirement for informed consent was waived by institutional review. The development process of the final model and performance measures yielded by nested CV will be described in a research article to be disseminated in a peer-reviewed academic journal.
Discussion
While traditional statistical modelling is useful for identifying the predictors of hypertriglyceridemia in mechanically ventilated patients receiving continuous propofol infusion, it is difficult to systematically translate these findings to actual clinical practice. By leveraging ML-based modelling approaches and by incorporating ML models into an accessible clinical risk calculator, we aim to develop a tool that could guide clinicians in making informed decisions surrounding the choice of sedation regimens and potentially reduce the incidence of propofol-related hypertriglyceridemia and associated adverse events, such as acute pancreatitis.
A notable use case for our proposed clinical risk calculator is to identify patients who may benefit from increased triglyceride level monitoring. In the absence of widely accepted guidelines and protocols, routine measurements of triglycerides among patients receiving IMV with propofol sedation are often ad hoc and remain care-team dependent. Previous studies have shown that only 15%–24% of intubated patients with propofol sedation receive routine measurements of triglyceride levels, with about a fifth of these patients having elevated triglyceride levels >400 mg/dL.5 It is possible that an ML model can identify specific patient characteristics at the time of ICU admission or intubation that can predict patient risks of developing hypertriglyceridemia so that triglyceride levels in these patients can be monitored and used to tailor the patients’ sedation regimen accordingly.
The proposed ML modelling study has several notable strengths. First, we propose the use of multiple well-established feature selection methods to reduce the dimensionality of our training dataset. This helps us to keep only the most important predictors of hypertriglyceridemia in our models, and to avoid the ‘curse of dimensionality’, including overfitting the models on noise in the data. Second, we aim to use robust external validation methods involving a nested CV framework with LOSO-CV. This approach allows the generalisability of our modelling process to be rigorously and repeatedly evaluated across different hospital settings, which improves the external validity of our performance metrics. Third, we aim to trial a diverse set of popular ML algorithms, including classical, ensemble and neural network models, along with state-of-the-art hyperparameter tuning techniques using Bayesian optimisation. This exhaustive approach will allow us to identify the optimal ML model configuration for our dataset and target outcome. Finally, we considered the clinical applicability of our ML models by proposing to integrate both global and local feature explainability to assess how our final model makes its predictions, as well as by integrating our final model into a web-based clinical risk calculator.
Despite these strengths, we foresee several potential challenges associated with the proposed study. First, the study relies on retrospective data, which can introduce biases relating to misclassification from potential errors in charting and missing data. The retrospective nature of the data also results in limited granularity on the exact severity of comorbid conditions, such as liver disease, and prevents us from capturing baseline triglyceride values, which precludes their inclusion in the model. These shortcomings in our dataset highlight the need for additional prospective work to incorporate and further assess these features in future work. Second, while the proposed nested CV approach is thorough, it will be computationally intensive and time-consuming. This limits the feasibility of future replications of the proposed study methodology, especially in resource-constrained settings or when using larger datasets. Finally, while the generalisability of our dataset is improved by including information from multiple Mayo Clinic sites, these sites may share similar treatment or data recording protocols that reduce the external validity of our developed models.
Overall, our protocol outlines a comprehensive approach to developing and validating a clinical tool designed to provide a personalised risk estimate of developing hypertriglyceridemia when sedated with continuous propofol infusion while on IMV. In the absence of routine triglyceride monitoring, it remains important to prevent the development of hypertriglyceridemia in an effort to avoid a sequela of adverse events.
Supplementary material
Footnotes
Funding: This work is supported by the Mayo Clinic Critical Care Research Committee, as well as the National Heart, Lung, and Blood Institute (NHLBI) of the National Institute of Health (NIH) Grant Number K23HL151671 (Recipient: Hemang Yadav). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH.
Prepublication history and additional supplemental material for this paper are available online. To view these files, please visit the journal online (https://doi.org/10.1136/bmjopen-2024-092594).
Provenance and peer review: Not commissioned; externally peer reviewed.
Patient consent for publication: Not applicable.
Patient and public involvement: Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
References
- 1.Sahinovic MM, Struys M, Absalom AR. Clinical Pharmacokinetics and Pharmacodynamics of Propofol. Clin Pharmacokinet. 2018;57:1539–58. doi: 10.1007/s40262-018-0672-3. [DOI] [PMC free article] [PubMed] [Google Scholar]