Fig. 2.

Workflow for the development and validation of a machine learning model for predicting acquired taxane resistance (ATR). The pipeline consists of three main parts: cross-study normalization, transformation into pathway-level information, and model construction. The study cohort was preprocessed and split into an internal development and validation cohort and an external blind validation cohort. An empirical Bayes method (ComBat) was used for cross-study normalization. Gene expression levels were transformed into pathway-level scores for each individual sample using three curated pathway databases (Kyoto Encyclopedia of Genes and Genomes [KEGG], Pathway Interaction Database [PID], and BioCarta). A penalized regression model was then constructed from the resulting pathway-level score matrix. The parameters of the prediction model were optimized using leave-one-out cross-validation (LOOCV) with the Efficient Parameter Selection via Global Optimization (EPSGO) algorithm. QC, quality control; CGP, Cancer Genome Project; PTX, paclitaxel; DTX, docetaxel; CCLE, Cancer Cell Line Encyclopedia; EM, Empirical Bayes Method; PDS, pathway dysregulation scores; PC, principal component; AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision-recall curve; ACC, accuracy.
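
The model-construction step can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes an L2 (ridge) penalty on a logistic regression, and it replaces EPSGO with a plain grid search over the penalty strength, scoring each candidate by LOOCV accuracy. The toy matrix `X` is a hypothetical stand-in for a pathway-level score matrix, and all function names are illustrative.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_ridge_logistic(X, y, lam, lr=0.1, n_iter=500):
    """Fit L2-penalized logistic regression by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        # Mean loss gradient plus ridge term; the intercept is not penalized.
        w = [wj - lr * (gw / n + lam * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    # Classify as resistant (1) when the predicted probability is >= 0.5.
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

def loocv_accuracy(X, y, lam):
    """Hold out each sample once, train on the rest, and score the held-out one."""
    hits = 0
    for i in range(len(X)):
        w, b = fit_ridge_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
        hits += int(predict(w, b, X[i]) == y[i])
    return hits / len(X)

def select_lambda(X, y, grid):
    # Grid search used here in place of EPSGO (an assumption of this sketch).
    scores = {lam: loocv_accuracy(X, y, lam) for lam in grid}
    best = max(grid, key=lambda lam: scores[lam])
    return best, scores

# Hypothetical toy data: rows are samples, columns are two pathway-level scores.
X = [[-2.0, -1.5], [-1.5, -2.0], [-1.0, -1.2], [-1.8, -0.9],
     [1.0, 1.3], [1.6, 1.1], [2.0, 1.8], [1.2, 2.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = sensitive, 1 = resistant (illustrative labels)

best_lam, scores = select_lambda(X, y, [0.001, 0.1, 10.0])
print(best_lam, scores)
```

With few samples, LOOCV uses every sample for validation exactly once, which is why it is a common choice for tuning the penalty; a global optimizer such as EPSGO would search the penalty range more efficiently than the fixed grid shown here.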