Table 1.
Step of development and translation | Issue | Proposal for best-practice |
---|---|---|
Experimental design | Selection and curation of the datasets | Cross-talk between oncologists, biostatisticians, and bioinformaticians is warranted to choose the most appropriate data. Samples should be selected appropriately according to clinical and biological variables. Heterogeneity of clinico-pathological variables between datasets should be evaluated and, where possible, adjusted for. Sample size should be assessed a priori. |
Pre-processing step | Latent unwanted structure embedded in the data can dramatically impact the analysis. The importance of pre-processing procedures for subsequent data analysis is often neglected. When translating the classifier to the bedside, pre-processing a single sample is challenging. | Raw data should be used for all subsequent analyses. All pre-processing steps should be described explicitly (including normalization, re-scaling, and correction for adjustment variables). Pre-processing code and pre-analytical plots showing the structure of the data should be provided. Use reference or housekeeping features. Use universal or control samples as references, processed simultaneously with the patient sample. |
Statistical analysis | Multiple comparisons. Resubstitution bias. Large p, small n. Robustness of the model (multiple local optima). | Report multiple-comparison adjustments for all statistics. Validation data should be kept entirely separate from the training data to ensure no potential for contamination. Apply methods that address the large p, small n problem (e.g. ridge regression, lasso, principal component regression, partial least squares). Assess the performance of the model internally (cross-validation, bootstrapping). The analysis method and the code used should be made publicly available. |
Performance assessment | Generalization of the model to other dataset(s). Kaplan-Meier plots and heatmaps are not adequate to assess the performance of the model. Medical or biological utility. | Stability of the performance must be validated in external dataset(s). ROC AUC is relevant for binary endpoints; RMSE and R² for continuous endpoints; time-dependent AUC or the concordance index for survival endpoints. The performance of the classifier must be compared with existing standard estimators. |
Clinical development | Routine measurement of molecular classifiers is limited. Current clinical trial designs do not incorporate biomarkers. Limits on testing of archival pathology specimens. | Expand the routine capture of pathological specimens and data in the clinic. Incorporate biomarker validation in the design of clinical trials (cross-talk with biostatisticians required). Ensure that samples from patients enrolled in ongoing and future prospective trials are preserved for subsequent, unplanned analyses, provided appropriate consent is given. |
Translation at bedside | Molecular data derived from current sampling standards are highly context-specific (intra-tumoral heterogeneity, treatment effect, host effect, etc.). The uncertainty of classifier results is rarely communicated to the oncologist. Poor training in modern technologies and methods is a major limit to translating molecular classifiers to the bedside. | Increase the number of tumor samples to better represent the disease (sequential biopsies; primary and metastatic sites). Confidence intervals of the results should be provided to the oncologist so decisions can be made in the patient's context. Promote cross-training of oncologists and cancer biologists in computational biology, systems biology, and biostatistics. |
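As a sketch of the reference-feature proposal in the pre-processing row: the snippet below normalizes a hypothetical expression matrix against designated housekeeping columns. The matrix, the column indices, and the function name are all illustrative assumptions, not taken from the source.

```python
import numpy as np

# Hypothetical raw expression matrix: 5 samples x 8 features.
# Columns 6 and 7 stand in for housekeeping/reference features
# (purely illustrative; real reference features are assay-specific).
rng = np.random.default_rng(42)
raw = rng.uniform(1.0, 100.0, size=(5, 8))
ref_cols = [6, 7]

def normalize_to_references(raw, ref_cols):
    """Divide each sample by the geometric mean of its reference
    features, then log2-transform. Because the scaling factor comes
    from the sample itself, a single new patient sample can be
    normalized the same way without reprocessing the whole cohort."""
    ref_factor = np.exp(np.log(raw[:, ref_cols]).mean(axis=1, keepdims=True))
    return np.log2(raw / ref_factor)

norm = normalize_to_references(raw, ref_cols)
```

After this transform, the mean log2 value of the reference columns is zero in every sample, which is what makes single-sample pre-processing at the bedside tractable.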
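One of the large p, small n methods listed in the statistical-analysis row, ridge regression, can be sketched on simulated data; the snippet below also keeps the validation split untouched during tuning, as the table recommends. All dimensions, values, and function names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "large p, small n" setting: 40 samples, 500 features,
# of which only the first 5 are truly informative (all simulated).
n, p = 40, 500
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0
y = X @ beta + rng.standard_normal(n)

# Validation data kept entirely separate from training data.
X_tr, y_tr = X[:30], y[:30]
X_val, y_val = X[30:], y[30:]

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X'X + lam * I)^{-1} X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def holdout_mse(lam):
    """Tune lambda using only the training half, never the validation set."""
    coef = ridge_fit(X_tr[:20], y_tr[:20], lam)
    return np.mean((X_tr[20:] @ coef - y_tr[20:]) ** 2)

best_lam = min([0.1, 1.0, 10.0, 100.0], key=holdout_mse)
coef = ridge_fit(X_tr, y_tr, best_lam)
val_mse = np.mean((X_val @ coef - y_val) ** 2)
print(f"lambda={best_lam}, validation MSE={val_mse:.2f}")
```

The ridge penalty makes X'X + λI invertible even with p ≫ n, which is exactly why ordinary least squares fails in this setting while penalized methods do not.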
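For the metrics named in the performance-assessment row, ROC AUC and RMSE/R² can be computed directly; this is a minimal sketch using the Mann-Whitney formulation of the AUC, with invented example arrays.

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC via the Mann-Whitney statistic: the probability that a
    randomly chosen positive case scores higher than a randomly
    chosen negative case (ties count one half)."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def rmse_r2(y_true, y_pred):
    """RMSE and R^2 for a continuous endpoint."""
    resid = y_true - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return rmse, r2

# Tiny illustrative example (values are made up):
y = np.array([0, 0, 1, 1])
s = np.array([0.1, 0.4, 0.35, 0.8])
print(roc_auc(y, s))  # -> 0.75
```

Comparing such a metric for the new classifier against the same metric for existing standard estimators, on the same external dataset, is the comparison the table calls for.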
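The confidence-interval proposal in the translation-at-bedside row can be approximated with a percentile bootstrap. This is a hedged sketch: the metric choice, the data, and the function name are all hypothetical, not prescribed by the source.

```python
import numpy as np

def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a performance
    metric, so the classifier's output reaches the oncologist with
    an uncertainty range rather than a bare point estimate."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # resample patients with replacement
        stats.append(metric(y_true[idx], y_pred[idx]))
    return tuple(np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)]))

# Illustrative call: accuracy of a thresholded risk score (data made up).
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 0])
risk = np.array([0.2, 0.9, 0.6, 0.4, 0.8, 0.1, 0.3, 0.7, 0.5, 0.2])
accuracy = lambda yt, yp: np.mean((yp > 0.5) == yt)
lo, hi = bootstrap_ci(y_true, risk, accuracy)
print(f"accuracy 95% CI: [{lo:.2f}, {hi:.2f}]")
```

A wide interval on a small cohort is itself clinically informative: it tells the oncologist how much weight the classifier's output should carry in the patient's context.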