Author manuscript; available in PMC: 2014 Aug 15.
Published in final edited form as: Clin Cancer Res. 2013 Jun 18;19(16):4315–4325. doi: 10.1158/1078-0432.CCR-12-3937

Table 1.

Practical issues and recommendations for the development and translation of molecular classifiers in oncology

Each step of development and translation is listed with its issues and the corresponding proposals for best practice.

Step: Experimental design
Issues:
- Selection and curation of the datasets
Proposals for best practice:
- Cross-talk between oncologists, biostatisticians, and bioinformaticians is warranted to choose the most appropriate data
- Select samples appropriately according to the clinical and biological variables
- Heterogeneity of the clinico-pathological variables between datasets should be evaluated and, where possible, adjusted for
- Sample size should be assessed a priori
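The recommendation that sample size be assessed a priori can be illustrated with a minimal sketch: a standard normal-approximation calculation for a two-group comparison. The effect size, alpha, and power below are placeholder values for illustration, not figures from the article; in practice a biostatistician would tailor the calculation to the actual endpoint and design.

```python
from math import ceil
from statistics import NormalDist

def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a two-group comparison.

    Returns the number of samples needed per group to detect a
    standardized effect size (Cohen's d) with a two-sided test at
    the given alpha and power.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # power requirement
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# A medium effect (d = 0.5) at alpha = 0.05 and 80% power:
print(samples_per_group(0.5))  # -> 63 per group
```

Smaller anticipated effects drive the required sample size up quickly, which is why this assessment must happen before dataset selection, not after.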
Step: Pre-processing
Issues:
- Latent unwanted structure embedded in the data can dramatically affect the analysis
- The impact of pre-processing procedures on subsequent data analysis is often neglected
- When translating the classifier to the bedside, pre-processing a single sample is challenging
Proposals for best practice:
- Raw data should be used for all subsequent analyses
- All pre-processing steps should be described explicitly (including the normalization, re-scaling, and correction for adjustment variables employed)
- Pre-processing code and pre-analytical plots showing the structure of the data should be provided
- Use reference or housekeeping features
- Use universal or control samples, processed alongside the patient sample, as a reference
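To make the single-sample problem concrete, here is a minimal sketch of reference-based normalization against housekeeping features: each patient sample is anchored to its own internal reference, so it can be processed without a batch of other samples. The raw values and housekeeping indices below are hypothetical.

```python
from math import log2

def normalize_single_sample(sample, housekeeping_idx):
    """Reference-based normalization of one patient sample.

    Log-transforms raw expression values and centers them on the mean
    of a fixed set of housekeeping features, so the result does not
    depend on which other samples happen to be in the batch.
    (housekeeping_idx is a hypothetical, pre-specified feature set.)
    """
    log_expr = [log2(x + 1.0) for x in sample]  # +1 avoids log(0)
    ref = sum(log_expr[i] for i in housekeeping_idx) / len(housekeeping_idx)
    return [v - ref for v in log_expr]

raw = [520.0, 31.0, 8.0, 256.0, 250.0]  # illustrative raw counts
hk = [3, 4]                             # indices of presumed stable features
print([round(v, 2) for v in normalize_single_sample(raw, hk)])
```

The same pre-specified reference features must be used at training time and at the bedside; otherwise the classifier's inputs silently change meaning.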
Step: Statistical analysis
Issues:
- Multiple comparisons
- Resubstitution bias
- Large p, small n
- Robustness of the model (multiple local optima)
Proposals for best practice:
- Apply and report multiple-comparison adjustments for all statistics
- Validation data should be kept entirely separate from the training data to prevent any contamination
- Apply methods that address the large-p, small-n problem (e.g., ridge regression, lasso, principal component regression, partial least squares)
- Assess the performance of the model internally (cross-validation, bootstrapping)
- The analysis method and the code used to run it should be made publicly available
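As a sketch of two of these recommendations, penalized regression for large p, small n, combined with internal assessment, the following fits ridge regression in closed form on simulated data with many more features than samples and chooses the penalty by leave-one-out cross-validation. The data are simulated and NumPy is assumed available; this is an illustration of the technique, not the article's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 200                       # far more features than samples
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]     # only a few informative features
y = X @ beta_true + 0.1 * rng.standard_normal(n)

def ridge_fit(X, y, lam):
    """Closed-form ridge regression; the penalty keeps the normal
    equations invertible even when p > n (OLS would fail here)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def loo_error(X, y, lam):
    """Leave-one-out cross-validation error: internal assessment only.
    External datasets are still required to validate the classifier."""
    idx = np.arange(len(y))
    errs = [(y[i] - X[i] @ ridge_fit(X[idx != i], y[idx != i], lam)) ** 2
            for i in idx]
    return float(np.mean(errs))

lams = [0.1, 1.0, 10.0, 100.0]
best = min(lams, key=lambda lam: loo_error(X, y, lam))
print("penalty chosen by internal CV:", best)
```

Note that the penalty is tuned entirely inside the training data; evaluating candidate penalties on the validation set would reintroduce the resubstitution bias the table warns about.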
Step: Performance assessment
Issues:
- Generalization of the model to other dataset(s)
- Kaplan-Meier plots and heatmaps are not adequate to assess the performance of the model
- Medical or biological utility
Proposals for best practice:
- Stability of the performance must be validated in external dataset(s)
- ROC AUC is relevant for binary endpoints; RMSE and R^2 for continuous endpoints; time-dependent AUC or the concordance index for survival endpoints
- The performance of the classifier must be compared with existing standard estimators
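The ROC AUC recommended for binary endpoints has a simple pairwise interpretation, the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (the Mann-Whitney statistic), which a few lines make concrete; the concordance index for survival endpoints generalizes the same pairwise idea to event times. The labels and scores below are illustrative.

```python
def roc_auc(labels, scores):
    """ROC AUC as the Mann-Whitney statistic: the fraction of
    positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(roc_auc(labels, scores))  # -> 0.888... (8 of 9 pairs concordant)
```

An AUC of 0.5 corresponds to random ranking, which is why a single well-separated Kaplan-Meier plot, with no such pairwise benchmark, cannot substitute for a proper performance metric.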
Step: Clinical development
Issues:
- Routine measurement of molecular classifiers is limited
- Current clinical trial designs do not incorporate biomarkers
- Testing of archival pathology specimens is limited
Proposals for best practice:
- Expand the routine capture of pathological specimens and data in the clinic
- Incorporate biomarker validation into the design of clinical trials (cross-talk with a biostatistician is required)
- Ensure that samples from patients enrolled in ongoing and future prospective trials are preserved for subsequent, unplanned analyses, provided appropriate consent is given
Step: Translation at the bedside
Issues:
- Molecular data derived from current sampling standards are highly context specific (intra-tumoral heterogeneity, treatment effect, host effect, etc.)
- The uncertainty of classifier results is rarely communicated to the oncologist
- Poor training in modern technologies and methods is a major limitation in translating molecular classifiers to the bedside
Proposals for best practice:
- Increase the number of tumor samples to achieve a better representation of the disease (sequential biopsies, primary and metastatic sites)
- Confidence intervals of the results should be provided to the oncologist so that decisions can be made in the patient's context
- Promote cross-training of oncologists and cancer biologists in computational biology, systems biology, and biostatistics
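As a sketch of how a classifier's output could reach the oncologist with its uncertainty attached, the following computes a percentile-bootstrap confidence interval around a summary statistic. The per-patient risk scores are hypothetical, and the percentile bootstrap is only one of several interval methods one might choose.

```python
import random

def bootstrap_ci(values, stat=lambda v: sum(v) / len(v),
                 n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for any summary
    statistic, so a point estimate can be reported with its
    uncertainty rather than as a bare number."""
    rng = random.Random(seed)  # fixed seed for reproducible reporting
    reps = sorted(stat([rng.choice(values) for _ in values])
                  for _ in range(n_boot))
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical per-patient risk scores from repeated assays:
scores = [0.62, 0.58, 0.71, 0.66, 0.60, 0.69, 0.64, 0.57]
low, high = bootstrap_ci(scores)
print(f"risk {sum(scores) / len(scores):.2f} (95% CI {low:.2f}-{high:.2f})")
```

Reporting "risk 0.63 (95% CI ...)" rather than "risk 0.63" lets the oncologist weigh the classifier's answer against the rest of the patient's context, as the table recommends.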