Abstract
The rapid evolution of machine learning has led to a proliferation of sophisticated models for predicting therapeutic responses in cancer. While many of these show promise in research, standards for their clinical evaluation and adoption are lacking. Here we propose seven hallmarks that all predictive oncology models should strive to address. These are Data Relevance and Actionability, Expressive Architecture, Standardized Benchmarking, Generalizability, Interpretability, Accessibility and Reproducibility, and Fairness. Considerations for each hallmark are discussed along with an example model scorecard. We encourage the broader community – including researchers, clinicians, and regulators – to engage in shaping these guidelines towards a concise set of standards.
Introduction
Predictive oncology can be defined as the branch of precision medicine focused on improving cancer treatment outcomes by customizing therapeutic decisions for each patient based on all available information – genetic, molecular, cellular, and clinical (1–3). This strategy stands in contrast to a conventional one-size-fits-all approach, in which all patients with the same tumor type and stage receive the same standard of care therapy. Current examples of predictive oncology that are well-established in the clinic include BCR-ABL gene fusions in chronic myeloid leukemia, which are targeted by imatinib (4,5); BRCA gene mutations in ovarian, breast, and other cancers, which are targeted by PARP inhibitors (6–8); and activating mutations in BRAF in melanoma and non-small cell lung tumors, which are targeted by BRAF or MEK inhibitors (9,10).
While such targeted therapeutics have been transformative, many patients lack the required biomarkers, and those with the right biomarkers may fail to respond (11) or develop resistance (12). This challenge of distinguishing responders from non-responders extends to non-targeted chemotherapeutics, where standard molecular indications are often lacking despite the many patients with innate or acquired chemotherapy resistance (13). For these reasons, the last few years have seen a surge of interest in predictive oncology models that use extensive molecular profiling and machine learning to integrate information from not just one biomarker or gene, but from dozens to thousands (14–16) (Figs. 1A,B). Modern molecular profiling enables the large-scale identification of genetic mutations and copy number alterations from tumor tissue samples and/or whole blood, profiled for the numerous genes included on clinical cancer gene panels (17,18) or, alternatively, ascertained by whole-exome or whole-genome sequencing in cancer research studies (19–21). Molecular profiling may also include omics layers such as DNA methylation, mRNA expression, or protein abundance and phosphorylation. These data have enabled greatly expanded precision oncology models that consider modulators of drug response not only in the targeted mechanism(s), but also in proteins that physically or functionally interact with the target in related molecular pathways. A growing number of multi-genic models have been commercialized (22–28), some of which, such as MammaPrint (22), OncoTypeDX (23), and microsatellite instability (MSI) (24,25), are in active clinical use.
Figure 1. Predictive modeling workflow and development timeline.
A, High-level schematic of the standard pipeline for formulating a modern predictive oncology model. B, Approximate number of predictive oncology models published per year over the period 2000–2023. PubMed query using search string ‘predictive oncology AND (“machine learning” OR “artificial intelligence”)’ with filter: ‘from 2000–2023’.
Despite this progress, the vast majority of advanced predictive oncology models have not yet achieved widespread clinical impact (29,30), for several reasons. One key challenge lies in generalizability; models trained on preclinical datasets often fail to translate to patient data. This limitation primarily arises from limited access to data and disparities between preclinical training and real-world contexts, compounded by the heterogeneity of patient populations and the dynamic nature of disease status. Another major challenge relates to model transparency and interpretability – the ability to scrutinize the inner workings of a model and explain the biomolecular factors that underlie each of its predictions. The lack of model interpretation has been recognized as one of the most important barriers to building trustworthy AI systems in high-stakes clinical applications (31). In addition to challenges in model development, the successful clinical application of predictive oncology models also faces infrastructure and regulatory hurdles. The financial, computational, and regulatory resources needed to run both retrospective and prospective studies are rarely available outside major biomedical research campuses, especially in low-income regions. Such issues are exacerbated by the absence of a standardized predictive modeling “checklist”, such as the highly useful guidelines that have been developed for vetting prognostic biomarkers (32) or reporting clinical trials (33). These challenges, among others, highlight the need for a set of structured recommendations for model development, which clearly enumerate the methodological and clinical utility risks.
To gather a comprehensive perspective on these fundamental challenges, we engaged the community through a series of workshops involving a broad panel of cancer biologists, clinicians, ethicists, and machine learning specialists (including all of the authors on this Review). A major outcome of these meetings was to formulate key characteristics, or “modeling hallmarks”, which all predictive oncology models should strive to address (Fig. 2). The resulting recommended hallmarks are: 1) Data Relevance and Actionability, ensuring the model’s input is both pertinent and actionable; 2) Expressive Architecture, denoting the model’s ability to capture complex biological interactions; 3) Standardized Benchmarking, for consistent model evaluation; 4) Demonstrated Generalizability, to ensure model performance across diverse settings; 5) Mechanistic Interpretability, for understanding the biological basis of model predictions; 6) Accessibility and Reproducibility, ensuring that models can be readily obtained, run, and reproduced; and 7) Fairness, to promote equitable model application across different patient demographics and resource-constrained communities. In addition, we considered the ethical principles that apply to each of the seven hallmarks and are important to maximize the societal benefits of therapy response models. Notably, these hallmarks are not meant to replace, but to complement, the more general established guidelines in the fields of AI and clinical practice, including considerations of security, privacy, or energy consumption (34,35) (arXiv:2104.10350). In what follows, we describe each hallmark and its implications for best practices in model research, development, and ultimate deployment.
Figure 2. The seven hallmarks of predictive oncology.
Starting from the top left, in clockwise order the proposed hallmarks are: Data Relevance and Actionability (yellow), Expressive Architecture (blue), Standardized Benchmarking (black), Demonstrated Generalizability (green), Mechanistic Interpretability (orange), Accessibility and Reproducibility (purple) and Fairness (pink), with Ethics underlying each of these hallmarks (yellow ring).
Hallmark 1: Data Relevance and Actionability
Data relevance refers to how closely the data used to train and evaluate a model conform to the intended clinical settings, i.e., predicting patient responses to therapy and related clinical outcome(s) of interest. In contrast, data actionability refers to the practical utility of the model inputs in influencing treatment decisions, i.e., whether the acquisition of that type of data facilitates meaningful interventions to improve patient outcomes.
Data relevant to training predictive oncology models have typically been derived from one of several sources. To provide large numbers of samples for training, a major avenue has been high-throughput drug response screens conducted in preclinical cancer models such as immortalized cancer cell lines (21,36–42), patient-derived tumor cells (PDTC) (43–45), patient-derived organoids (PDO) (46–48), and patient-derived xenografts in mice (PDX) (49,50). Similar drug response datasets gathered in the clinic, such as those provided by clinical trials (51–56) and large clinical research consortia (19,20,57), currently have fewer samples than are provided by high-throughput screens in tumor models but nonetheless are rapidly growing to sizes useful either for validation or direct model training. With this landscape in mind, clinical relevance increases progressively as one moves from data collected in 2D cell cultures to 3D organoid or sphere-culture systems to patient xenografts and (ultimately) to patients, while scalability and data completeness decrease. Cell cultures have the advantage of scalability and cost-effectiveness but do not recapitulate the tumor microenvironment, and they are susceptible to genetic drift (58,59), limiting their translational accuracy in predicting clinical outcomes (60). Patient-derived organoids and xenografts attempt to mitigate these issues in several ways: first, by using cancer cells derived from primary patient tumors rather than from immortalized lines (61–63), and second, by co-culture of tumor cells with various other cell types in organoids, or with host cells in xenografts, to mimic at least some aspects of the tumor microenvironment. Relative to cell lines, such additional features come at the expense of lower screening throughput and higher cost (64). On the other hand, training predictive models based on PDO and PDX platforms is potentially more scalable than using patient data directly; while inherently relevant, the collection of patient data faces practical challenges such as the lengthy and resource-intensive process of clinical data acquisition and management, the impossibility of screening a wide range of drugs per tumor, and the additional complexities introduced by patient heterogeneity and evolving health status (65).
In terms of data actionability, the data types most often used in predictive modeling include molecular profiles, such as the genome, epigenome, transcriptome, proteome, and metabolome (16,66), as well as histology images (67) and radiomics (68,69). Among these types, genetic alteration profiles are commonly used in predictive oncology models (16) and are also available clinically via cancer gene panels (17,18), making them highly actionable. Other ‘omics profiles are considerably less common in both pre-clinical and clinical datasets, and thus have limited actionability. Actionability is an evolving concept, however, as new data types emerge with interest for clinical use. For example, circulating tumor DNA has become a promising clinical data modality based on its ease of measurement from blood over successive follow-up visits, allowing longitudinal tracking of therapy resistance (70,71). Another issue concerning data actionability relates to standardization. For example, a common practice is to normalize mRNA or protein levels so that all tumor samples have the same mean and standard deviation (z-scoring). Models trained on such cohort-normalized data do not easily apply to an isolated sample, making them less clinically actionable. Overall, data across different modalities and sources require careful preprocessing to harmonize measurements and avoid data leakage between the data used to train and test the models.
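To make the leakage pitfall concrete, the sketch below (a minimal illustration with synthetic data and the scikit-learn API, not any specific published pipeline) contrasts z-scoring fit on an entire cohort with the leakage-free practice of fitting normalization statistics on the training samples only:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))   # 200 tumor samples x 50 expression features (synthetic)
y = rng.normal(size=200)         # continuous drug-response labels (synthetic)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Leaky: z-scoring statistics computed on the full cohort, so information
# from the test samples contaminates the training features.
X_all_scaled = StandardScaler().fit_transform(X)

# Leakage-free: fit the scaler on training samples only, then apply the
# frozen means/SDs to held-out samples.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

The frozen training-set statistics in the second pattern are also what allow the same transform to be applied to a single incoming patient sample at deployment time.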
Hallmark 2: Expressive Architecture
The architecture of a computational model refers to the structural layout of its components. It is a key consideration, as model architecture strongly governs the ability to integrate and extract information from input data to make predictions. In this context, architectural expressivity can be defined as the number and types of unique functions or patterns an architecture can capture (arXiv:1611.03530v2, arXiv:1606.05336v6). For example, deep neural networks tend to be highly expressive because successive layers within the network describe increasingly complex non-linear patterns as information moves from one layer to the next (72). The number and size of these neural network layers can make the model more or less expressive (73). Since the factors regulating therapy response range from individual genetic mutations to multi-genic pathways to complex medical imaging features (74–76), it is likely that predictive oncology models will ultimately require highly expressive architectures.
Expressivity affects how closely a model fits the training data, on a sliding scale referred to as the bias-variance tradeoff (77). Insufficiently expressive architectures tend to learn marginal associations between input features and outcomes with few interacting combinations of features, and thus only “loosely” fit the data. The resulting models have high bias, as predictions are insufficiently sensitive to the feature values of individual patients; such underfit models perform relatively poorly on both training and test data. Conversely, overly expressive models make predictions that closely follow the training data, leading to a high variance of predictions across patients and therapies. These overfit models perform well on the training data but not when analyzing new “out-of-distribution” datasets. Given that most big biomedical datasets contain many irrelevant features unlikely to reappear in future samples, it is important to ensure that a complex model architecture does not create undue dependencies on spurious features at the expense of identifying predictive underlying relationships. A major goal of model development is therefore to balance these opposing forces, selecting a “best-fit” architecture sufficiently expressive to model complex input patterns but simple enough to keep the model generalizable (78).
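The tradeoff can be seen directly by sweeping a single complexity knob and watching training and validation error diverge. The sketch below is a generic illustration on synthetic data (tree depth standing in for architectural expressivity), not a protocol specific to any oncology model:

```python
import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))                                 # synthetic molecular features
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300)  # only two true signals

depths = np.arange(1, 15)
train_scores, val_scores = validation_curve(
    DecisionTreeRegressor(random_state=0), X, y,
    param_name="max_depth", param_range=depths,
    cv=5, scoring="neg_mean_squared_error",
)

# Training error keeps shrinking as depth grows (variance rises), while
# validation error bottoms out and then worsens: the bias-variance tradeoff.
for d, tr, va in zip(depths, -train_scores.mean(axis=1), -val_scores.mean(axis=1)):
    print(f"depth={d:2d}  train MSE={tr:.2f}  validation MSE={va:.2f}")
```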
Open-source libraries like PyTorch, TensorFlow, and Scikit-Learn provide access to a wide range of learning algorithms, making it relatively easy to create models with highly expressive architectures. This ease of access can tempt model developers to select a highly complex architecture by default. However, given the bias-variance tradeoff mentioned above, complex models are especially susceptible to overfitting. In considering architectural choices, a useful guideline is the principle of parsimony or “Occam’s Razor”, i.e., seeking predictive oncology models that are as simple as possible but as complex as necessary. Developing the simplest model that achieves adequate performance may be more favorable in terms of generalizability (Hallmark 4) and interpretability (Hallmark 5), facilitating clinical translation. Techniques such as meta-learning (79), inductive bias (80,81), regularization (82,83), pruning (84), and feature selection are all of interest in building parsimonious architectures, as illustrated in the sketch below. When trade-offs between simple and complex models are unclear, a practical approach may be to present results from different architectures in a transparent and unbiased manner.
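As a minimal illustration of parsimony in practice (synthetic data; scikit-learn’s LassoCV), L1 regularization can prune a large candidate feature set down to a handful of informative features, with the penalty strength itself chosen by cross-validation:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 500))          # many candidate features, few informative
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

# The L1 penalty drives most coefficients to exactly zero; alpha (penalty
# strength) is selected by internal cross-validation.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
selected = np.flatnonzero(lasso.coef_)
print(f"alpha={lasso.alpha_:.3f}, {selected.size} of {X.shape[1]} features retained")
```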
Hallmark 3: Standardized Benchmarking
Evaluating a predictive oncology model nearly always involves benchmarking, that is, the comparison of model performance against baseline models. However, the surge of published predictive oncology methods and the lack of agreed-upon benchmarking protocols within the community have made it challenging to carry out comparisons among the many available models (16). Currently, bespoke performance evaluations are devised for each new modeling publication, typically invoking simple baselines, varied dataset compositions, inconsistent data splits, arbitrary scoring metrics, and varying protocols for hyperparameter optimization (or none at all).
To enable rigorous comparative analysis of model strengths, weaknesses, and key factors contributing to predictive performance, it is crucial for the community to adopt standard procedures of evaluation against reference baseline models (i.e., baselines against which future models should be evaluated). Three types of baseline models are typically seen in predictive modeling applications (16). A first straightforward option is to use standard out-of-the-box models from open-source libraries. In this case, priority should be given to libraries known for their training efficiency and accurate predictions. For example, regularized linear regression (83) and boosting algorithms (87) have often (but not always) been used for benchmarking in predictive oncology publications (76,88,89), and these might constitute a starting standard. Here the complexities of the chosen baseline models may be progressively increased to justify the need for a proposed model. A second option is ablation analysis, which can be performed at the level of the model or data. Model ablation involves substituting model attributes with simpler alternatives (e.g., replacing convolutional layers with dense layers) (90–92), while data ablation can involve removing proposed data representations or replacing them with more conventional ones (e.g., replacing molecular graph structures representing drug compounds with Morgan fingerprints) (93,94). Regardless, conducting ablation studies that systematically analyze the progressive removal or addition of complexity is essential. Lastly, a new model can be compared with other complex state-of-the-art models that have been recently released by the scientific community. While this option is perhaps the most laborious, benchmarking performance against advanced community models boosts credibility and opens important connections to ongoing complementary efforts in the field.
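As a sketch of the first option, a benchmarking harness might score progressively more complex out-of-the-box baselines under identical cross-validation splits (synthetic data; the scikit-learn estimators below are generic stand-ins, not a prescribed panel):

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(250, 100))
y = X[:, :5].sum(axis=1) + rng.normal(scale=1.0, size=250)

# A shared, fixed split so every model, baseline or proposed, sees exactly
# the same training and test samples.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

baselines = {
    "mean predictor": DummyRegressor(strategy="mean"),
    "regularized linear": ElasticNet(alpha=0.1),
    "gradient boosting": GradientBoostingRegressor(random_state=0),
}
for name, model in baselines.items():
    r2 = cross_val_score(model, X, y, cv=cv, scoring="r2")
    print(f"{name:>20s}: R^2 = {r2.mean():.2f} +/- {r2.std():.2f}")
```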
Another key consideration for developing standard benchmarks relates to the datasets used for model training, validation, and testing. Cross-validation is often used in the development of predictive oncology models to ensure that they can make accurate predictions on previously unseen data (95). To achieve standardized benchmarking across a broad collection of models, however, all of these models should be trained and evaluated on the same cross-validation dataset(s). If these specific datasets are not available, models should be evaluated on other, better-established benchmarking datasets that are fully open to the community.
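One lightweight way to make such splits shareable is to publish the exact fold membership alongside the model. A minimal sketch (hypothetical sample identifiers; JSON chosen arbitrarily as the exchange format):

```python
import json
from sklearn.model_selection import KFold

sample_ids = [f"TUMOR_{i:04d}" for i in range(250)]   # hypothetical sample identifiers

# Record which samples fall in each held-out fold so that future methods
# can be trained and tested on identical partitions.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
folds = {
    f"fold_{k}": [sample_ids[i] for i in test_idx]
    for k, (_, test_idx) in enumerate(cv.split(sample_ids))
}
with open("benchmark_folds.json", "w") as f:
    json.dump(folds, f, indent=2)
```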
A final standard to consider relates to the evaluation of model accuracy, where the ideal approach is to evaluate and present multiple performance metrics. For example, correlation and mean squared error might be used for evaluating regression models; the area under the precision-recall curve (AUPRC) and odds ratio might be used for evaluating binary classification models; and hazard ratio and concordance index might be used for scoring survival analysis. Furthermore, providing complete information on true and predicted outcomes for each tumor-drug pair, for example in a supplemental table, is necessary to enable other investigators to evaluate model performance with their methods of choice. Multiple metrics paint a fuller picture of a model, especially when outcome labels are imbalanced (e.g., if drug-resistant outcomes are far more frequent than drug-responsive ones).
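The sketch below illustrates such multi-metric reporting on synthetic predictions, using scipy/scikit-learn for the regression and classification views and the lifelines package for a concordance index; the data, responder cutoff, and survival times are placeholders rather than a recommended protocol:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import average_precision_score, mean_squared_error
from lifelines.utils import concordance_index  # pip install lifelines

rng = np.random.default_rng(4)
y_true = rng.normal(size=100)                      # measured drug responses
y_pred = y_true + rng.normal(scale=0.7, size=100)  # model predictions

# Regression view: correlation and error magnitude answer different questions.
r, _ = pearsonr(y_true, y_pred)
rho, _ = spearmanr(y_true, y_pred)
print(f"Pearson r={r:.2f}, Spearman rho={rho:.2f}, "
      f"MSE={mean_squared_error(y_true, y_pred):.2f}")

# Classification view (responder vs non-responder at an arbitrary cutoff):
# AUPRC is especially informative when responders are rare.
responder = (y_true > np.quantile(y_true, 0.8)).astype(int)
print(f"AUPRC={average_precision_score(responder, y_pred):.2f}")

# Survival view: concordance between predictions and (synthetic) event times.
times = np.exp(-y_pred) + rng.exponential(0.1, size=100)
events = rng.integers(0, 2, size=100)
print(f"C-index={concordance_index(times, -y_pred, events):.2f}")
```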
Hallmark 4: Demonstrated Generalizability
Translating predictive oncology models into clinical practice poses a variety of challenges, reflecting the complexity of models as well as the stringent standards required for clinical application. The challenge of generalizability in predictive oncology revolves around the ability of models to perform accurately and reliably across different application scenarios, sites of care, clinical workflows, patient populations, or data distributions that may not have been part of the initial training data. Other problems with generalizability may arise if a new group fails to implement the model correctly due to insufficient documentation (see Hallmark 6).
Predictive models can fail to generalize for at least two reasons:
Disparity between preclinical and clinical contexts:
A model trained from preclinical drug response data (e.g., tumor cell lines, PDX, PDO) may fail to translate to clinical predictions in specific patient contexts. For example, models trained on preclinical drug screens are designed to predict change in cell fitness given a particular drug treatment (16). However, the intended clinical application may require predicting a patient’s overall or recurrence-free survival, or estimating other measurements included in clinical guidelines, e.g., RECIST response rate (Response Evaluation Criteria in Solid Tumors) (96). Disparities may also occur when some input features are not present in the patient data or when they are measured differently due to batch effects or the use of different technologies, requiring data harmonization to ensure reliable application of the predictive oncology models.
Heterogeneity in patient populations:
Extensive biological differences across patient populations, in age, sex, race or ethnicity, and genetic background, make generalization inherently difficult. A model trained on one population may not perform well when applied to a population with differing characteristics. Such problems are exacerbated by “drift in procedures” (97), whereby clinical practices evolve or the composition of the target population changes, leading to batch effects and degradation in model performance.
Strategies to ensure generalizability of predictive oncology models, and maximize the likelihood of successful clinical translation, are as follows. First, models should be rigorously assessed using external datasets that best reflect the intended clinical application. This assessment ensures that models can make sensible predictions for previously unseen patient data. Second, models should be continually tested against clinical confounders, including age, sex, and ethnicity. Ideally, the assessment should be conducted on a cancer-specific, representative cross-section of patients (98) selected under the same standards as for clinical trials, verifying that the demographic and clinical distributions of selected test sets match those observed in practice (arXiv:2001.09765). Evaluating the model from this perspective is closely related to fairness (Hallmark 7). Third, techniques such as domain adaptation and transfer learning (90) can bridge the gap between in vitro and in vivo data (99,100) while helping to ascertain a model’s robustness. Finally, generalizability can be evaluated on retrospective data, but ultimately a prospective clinical study is needed to validate the real-world utility of a model.
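As a sketch of the second strategy, confounder-stratified evaluation reports performance within each clinical subgroup rather than only in aggregate (synthetic external cohort; the subgroup labels and metric are illustrative assumptions):

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

rng = np.random.default_rng(5)
n = 400
external = pd.DataFrame({
    "y_true": rng.normal(size=n),
    "sex": rng.choice(["female", "male"], size=n),
    "age_group": rng.choice(["<50", "50-70", ">70"], size=n),
})
external["y_pred"] = external["y_true"] + rng.normal(scale=0.8, size=n)

# A model that looks accurate in aggregate can still fail a subpopulation,
# so report performance within each clinical subgroup.
for col in ["sex", "age_group"]:
    for level, grp in external.groupby(col):
        r, _ = pearsonr(grp["y_true"], grp["y_pred"])
        print(f"{col}={level:>6s}: n={len(grp):3d}, Pearson r={r:.2f}")
```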
Hallmark 5: Mechanistic Interpretability
The concept of model interpretability refers to the ability to understand the inner workings of a model and the logic by which it uses input features to make predictions. In the context of predictive oncology, interpretable models elucidate the underlying biological factors leading to a predicted drug response, such as genetic mutations, drug mechanisms of action, and molecular pathways relevant to cancer. Such interpretation is often accomplished by the use of a clear and simple model structure, such as that provided by decision trees (87,101) or linear regression models (83). In contrast, complex models such as neural networks tend to be more difficult to interpret, creating some degree of tension between this hallmark (Mechanistic Interpretability) and Hallmark 2 (Expressive Architecture). Nonetheless, a range of complex interpretable models have recently been developed by integrating prior biological knowledge, such as defined molecular pathways or tissue types, directly into the input data or the internal architecture of the model (74–76,102–106).
A closely related concept to interpretability is model explainability, which aims to provide meaningful justifications for a model’s decisions by highlighting the most influential features or factors considered by the model for that specific instance. Explainability is typically implemented using post-hoc analysis algorithms such as SHAP, LIME, and DeepLIFT, among others (107) (arXiv:1705.07874v2, arXiv:1704.02685v2, arXiv:1610.02391v4).
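A minimal post-hoc explanation sketch, assuming the shap package and a tree-based stand-in model rather than any specific oncology architecture:

```python
import numpy as np
import shap  # pip install shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.3, size=200)

model = RandomForestRegressor(random_state=0).fit(X, y)

# Post-hoc explanation: SHAP attributes each individual prediction to the
# input features, without requiring the model itself to be interpretable.
explainer = shap.Explainer(model)
explanation = explainer(X[:5])           # explain five samples
print(explanation.values.shape)          # (5 samples, 10 feature attributions)
print(explanation.values[0].round(2))    # per-feature contributions, sample 0
```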
Understanding the reasons why a model predicts specific treatment outcomes promotes trust by all parties involved, including patients, healthcare providers, and regulatory bodies (108,109). After all, cancer clinicians and biomedical scientists must routinely explain to each other the reasons behind their opinions, and such discussion can inform therapy selection and is a cornerstone of continued scientific progress. Should a predictive model be held any less accountable? On the other hand, model interpretability need not be a strict requirement provided strong trust has already been established. In this respect, we see two acceptable scenarios. The first scenario is a fully interpretable model which is able to explain its predictions by prediction rules that are in line with medical evidence or can be verified experimentally. The second scenario is a model that is not readily interpretable but has a significant track record of accuracy, robustness, and generalizability.
Beyond trust, the ability to readily interpret model decisions facilitates numerous aspects of predictive oncology. Examining the inner workings of a model can help detect biases, batch effects, errors in data collection or processing, and spurious features (110–112). Insights into feature relevance can improve model performance by highlighting which features should be included or excluded. Identifying relevant tumor features is also crucial to ensure that the most informative clinical data are available (Hallmark 1). Lastly, by fully understanding the limitations of the model, one can work to ensure that models are equally accurate across all populations (Hallmark 7).
Hallmark 6: Accessibility and Reproducibility
Given the complexity of developing a therapeutic response model, access to the model and its key components is necessary to ensure reproducibility, critical evaluation, and wide adoption. Here, accessibility refers to the availability of all materials related to a model of interest, including the data required to train and validate the model, the computer code, and the fully specified model itself, including architecture, weights, parameters, and detailed documentation describing the model inputs and outputs. Reproducibility refers to the faithful recapitulation of published model results, which requires the full software environment and computational resources used to implement the model, along with the train/test/validation datasets (113) (arXiv:2012.09932).
This hallmark is closely related to the so-called “FAIR” principles – Findability, Accessibility, Interoperability, and Reusability (114). To address the imperative of findability, training data and relevant fitted models should be easily discoverable across diverse use cases, employing standardized metadata and data formats (e.g., using programmatic pre-defined data structures or saved checkpoint states of data structures and trained models in PyTorch). Accessibility is achieved through the availability of original frameworks and components, the adoption of open-source practices, and clear specification of the required software environment. Interoperability is fostered by the use of common data types, standardized formats for model inputs (features) and outputs (predictions), and centralized repositories that facilitate exchange of models (115,116), data, and metadata (117). Reusability is heightened through configurations ready for execution, open-source model practices, and adherence to standardized protocols. When these principles are put into action, models can be reproduced and reused “out of the box”, and their results can be understood as they relate to model development (118,119).
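For instance, a PyTorch checkpoint can bundle model weights with the metadata needed to rediscover, reload, and audit the model later; the architecture description, dataset tag, and feature names below are hypothetical placeholders:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(50, 32), nn.ReLU(), nn.Linear(32, 1))

# Bundle weights with the metadata needed to reload and audit the model:
# architecture spec, training-data version, seed, and input feature names.
checkpoint = {
    "model_state_dict": model.state_dict(),
    "architecture": "MLP 50-32-1, ReLU",
    "training_data": "example_screen_v1.0",   # hypothetical dataset tag
    "random_seed": 42,
    "input_features": [f"gene_{i}" for i in range(50)],
}
torch.save(checkpoint, "model_checkpoint.pt")

# Reload elsewhere: restore the weights into a freshly constructed model.
restored = nn.Sequential(nn.Linear(50, 32), nn.ReLU(), nn.Linear(32, 1))
restored.load_state_dict(torch.load("model_checkpoint.pt")["model_state_dict"])
```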
Without such considerations, models can become too complex to reproduce and reuse, with the consequence that they may not be considered for future endeavors even if preliminary results are promising. Managing model complexity involves promoting transparency (through detailed documentation of model structure and parameters) coupled with open science practices. In turn, lack of reproducibility can be partially mitigated by comprehensive documentation, and limitations in computational infrastructure can be alleviated by embracing standardized data formats, open-source methodologies, and efficient training pipelines. In these respects, open-source training frameworks and execution methods are ideal for minimizing issues encountered during testing and reuse. Nonetheless, there are many published models that do not share data, computer code, or fitted models, calling for more transparency in AI research (119). Even when the original authors fully adhere to FAIR principles during model development, issues can arise as future investigators attempt to execute the model. Indications and limitations of published models should be clearly documented to ensure the model is used as intended.
Hallmark 7: Fairness
Fairness is an oft-neglected aspect of predictive oncology models and other medical technologies. In the growing literature on AI ethics (120,121), fairness is described in terms of parity, i.e., ensuring the model is beneficial to diverse patient populations. A first step is to characterize patient populations by the attributes for which equity is a key concern. Attributes of interest include, but are not limited to, race/ethnicity; sex; age; social determinants of health (SDH) such as levels of education and income (122); and relevant codes defined in the International Classification of Diseases (ICD) (123,124).
Because some of these patient attributes do not currently have sufficient coverage in the available cancer treatment data, therapy response models risk being unfair to the corresponding patient populations. BRAF mutations, for example, were first identified in melanoma patients of Caucasian descent, leading to the development of BRAF inhibitors like vemurafenib (10,125). The prevalence of these mutations has turned out to be lower in other ethnic groups, potentially influencing treatment strategies (126,127). On the other hand, the essence of precision medicine is to tailor predictive models to specific individuals and subpopulations. To balance personalized recommendations with fairness to all patients, we should encourage the continual development of models relevant to underrepresented individuals such as those who have rare cancers, live in less developed countries and regions, or come from minority ethnic backgrounds. Collecting such representative data should be an early step in the development of precision oncology models, whether these models are intended to be universal or specialized (Fig. 3A). For models developed from preclinical data such as immortalized cancer cell lines, attention should be paid to the biases that exist in the current portfolio of preclinical models (128,129).
Figure 3. Framework for ensuring fairness of a predictive oncology model.
From left to right are depicted four key aspects of fairness, related to A, training, B, testing, C, interpretation, and D, communication. SDH, Social Determinants of Health; ICD, International Classification of Diseases. Created in BioRender.
Regardless of whether a model has been fairly balanced during training, a subsequent step is to perform rigorous “fairness-aware” model validation and testing. The goal is to identify potential predictive bias within particular patient populations, which can prompt further model optimization in an attempt to explicitly rectify the model to reduce or eliminate such bias (Fig. 3B). For models with high mechanistic interpretability (Hallmark 5), the model can also be analyzed to reveal whether particular patient populations are associated with distinct molecular or cellular pathways used to make accurate predictions (Fig. 3C). Such pathway interpretations may be very helpful in illuminating key differences by which different race/SDH/ICD groups respond to a therapy.
Performance in machine learning is typically assessed by the four types of predictive outcomes – true positives, true negatives, false positives, and false negatives. Different combinations of these outcomes can result in different definitions of fairness (130). For instance, if a model mistakenly predicts a positive response to a drug for a particular population (false positive), and that drug has adverse health consequences, then the cost of a false positive prediction is high and one should consider more stringent model prediction thresholds. This design decision might be made even at the expense of increasing false negatives for that or another subpopulation. On the other hand, if the health consequences and opportunity costs of a drug are modest, then one might afford a more ambitious approach that maximizes true positives instead. One also has to consider inter-population trade-offs. If augmenting fairness can only be achieved by sacrificing accuracy for the largest population, then one may be forced to choose between prioritizing accuracy versus parity for less represented populations. There is no easy formula for optimizing accuracy/fairness trade-offs, although different moral theories (131,132) such as prioritarianism (133) may favor some choices over others. Ultimately, model developers should make design choices in consultation with clinicians, ethicists, and, ideally, with feedback from the various patient populations that will be affected (Fig. 3D).
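The sketch below (synthetic cohort; arbitrary thresholds) illustrates the quantity these design choices trade off: the same decision threshold can yield different false-positive and false-negative rates across subpopulations, and raising it shifts both.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 500
cohort = pd.DataFrame({
    "group": rng.choice(["A", "B"], size=n, p=[0.8, 0.2]),  # imbalanced subgroups
    "responder": rng.integers(0, 2, size=n),                # true outcome
})
cohort["score"] = cohort["responder"] + rng.normal(scale=0.9, size=n)

def error_rates(sub, threshold):
    """False-positive and false-negative rates at a decision threshold."""
    pred = sub["score"] >= threshold
    neg, pos = (sub["responder"] == 0), (sub["responder"] == 1)
    fpr = (pred & neg).sum() / max(neg.sum(), 1)
    fnr = (~pred & pos).sum() / max(pos.sum(), 1)
    return fpr, fnr

# A higher threshold lowers false positives (useful when a wrong "responder"
# call is costly), but the extra false negatives may fall unevenly by group.
for threshold in (0.5, 1.0):
    for g, sub in cohort.groupby("group"):
        fpr, fnr = error_rates(sub, threshold)
        print(f"threshold={threshold}  group={g}  FPR={fpr:.2f}  FNR={fnr:.2f}")
```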
Invoking the hallmarks: A practical example
To explore how these seven hallmarks might be implemented practically, we considered a particular precision oncology model known as TCRP (Translation of Cellular Response Prediction). TCRP was originally published by Ma et al. (134) and independently evaluated in a reusability report by So et al. (135), both of which involved present authors of this Review (Fig. 4A). This model uses techniques from transfer learning, specifically few-shot learning, to promote the construction of predictive models poised to readily generalize to new contexts, for example to new patient populations, patient-derived xenografts, or tumor cells. To assess this model against each of the defined hallmarks, we implemented a general hallmarks checklist (HARMONY, Table 1) along with a scoring rubric on a scale of 5 to 0, where 5 indicates the highest alignment with the hallmark and 0 signifies the lowest, as follows:
Figure 4. Sample model rating and comparison, with self-critique of authors’ own model.
A, Overview of the TCRP (Translation of Cellular Response Prediction) model. Adapted with permission from Ma et al. (134). B, Scorecard for the TCRP model based on the criteria defined for scoring each hallmark.
Table 1.
HARMONY Checklist (Hallmark-Adhering Recommendations for Models in predictive ONcologY).
RATIONALE
1 ☐ State the model objective and prior works.

ETHICAL CONSIDERATIONS
2 ☐ Discuss ethical considerations in model development and deployment.
3 ☐ Outline protocols for patient data privacy.

HALLMARKS
Data Relevance and Actionability
4 ☐ Specify data sources and criteria for data selection.
5 ☐ Define actionability in the context of model objectives.
Expressive Architecture
6 ☐ Describe the model structure and capability to represent biological complexity.
7 ☐ Detail the computational methods and algorithms used.
Standardized Benchmarking
8 ☐ Outline benchmark methods and metrics for evaluation.
9 ☐ Describe the procedure for model validation (e.g., cross-validation).
Demonstrated Generalizability
10 ☐ Outline validation datasets, methods, and metrics for evaluation.
11 ☐ Present model performance across various datasets.
Mechanistic Interpretability
12 ☐ Explain how the model elucidates molecular pathways or mechanisms.
13 ☐ Detail the explainability techniques used.
Accessibility and Reproducibility
14 ☐ Specify code repository and data format (input and output).
15 ☐ Describe the training and evaluation procedures including dependencies.
Fairness
16 ☐ Outline approaches to ensure equitable model performance across populations.
17 ☐ Describe measures to prevent bias in model predictions.

APPLICATION
18 ☐ Interpret results in the context of the hallmarks, discussing the model’s limitations and implications for future research.
5: Model implements the hallmark with no identifiable limitations.
4: Model implements the hallmark with very few limitations.
3: Model implements the hallmark with several major limitations.
2: Model implements the hallmark but fails the evaluation.
1: Model has the necessary foundation but does not implement the hallmark.
0: Model does not consider the hallmark at all, or claims to incorporate it with no available foundation (such as a codebase) for evaluation.
Below, we score the TCRP model against each hallmark and discuss its relative strengths and weaknesses (Fig. 4B).
Hallmark 1. Data relevance and actionability [Score = 5].
The datasets used to construct TCRP are accessible via public repositories (38,43,49,136). The model was trained to predict tumor cellular response to drugs, based on dose-response curves also available in the same repositories. Two data types, somatic mutations and mRNA expression profiles, were used as input. At least one of the data types is mandatory for the model to run inference on a new tumor sample, with the presence of both mutation and expression data enabling better performance. For incorporating clinically relevant and actionable data types, we give it a score of 5.
Hallmark 2. Expressive architecture [Score = 4].
The few-shot deep learning algorithm used in TCRP allows a model pre-trained on one set of tumor samples to be fine-tuned on a small number of examples from a new context (e.g., a new tumor type). By using a deep neural network architecture, TCRP can capture both linear and non-linear relationships between input features and drug responses, while allowing developers to tune the overall complexity of the model (for instance, by adjusting the number of hidden layers). However, optimization of the model hyperparameters is computationally intensive and may have a strong impact on its performance and translational capabilities. For its “fit-for-purpose” but computationally demanding architecture, we assign it a score of 4.
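As a stripped-down sketch of this pre-train-then-fine-tune pattern (synthetic data; not the authors’ actual few-shot meta-learning code), consider:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Source context: plentiful cell-line samples (synthetic stand-ins).
X_src, y_src = torch.randn(500, 30), torch.randn(500, 1)
# New context: only k=10 labeled samples (e.g., a new tumor type or PDX).
X_new, y_new = torch.randn(10, 30), torch.randn(10, 1)

model = nn.Sequential(nn.Linear(30, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()

# Phase 1: pre-train on the data-rich source context.
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss_fn(model(X_src), y_src).backward()
    opt.step()

# Phase 2: fine-tune on the handful of samples from the new context, using
# few steps and a small learning rate to avoid overfitting them.
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for _ in range(20):
    opt.zero_grad()
    loss_fn(model(X_new), y_new).backward()
    opt.step()
```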
Hallmark 3. Standardized benchmarking [Score = 3].
The predictive performance of the TCRP model was benchmarked using multiple independent preclinical datasets, covering tumor cell lines, PDTC, and PDX models. This variety of contexts was selected with the intent of showing the model’s flexibility and robustness. Pearson correlation between predicted and actual drug responses, a metric frequently used in preclinical biomarker discovery (137), was used to quantify performance. Using this metric, TCRP was benchmarked against a panel of standard statistical and machine learning strategies, including regularized linear regression, random forests, and a simplified (1-layer) artificial neural network. Although these models represent a good baseline, they did not include some state-of-the-art drug response models such as MERIDA (138), ENLIGHT (139), logic models (106,140), and OncoTreat/OncoTarget (28). We thus give TCRP a score of 3 for its good, but still incomplete, benchmarking.
Hallmark 4. Demonstrated generalizability [Score = 3].
The goal of the few-shot learning framework is to increase model generalization, and TCRP indeed outperformed other benchmarks in transfer across domains, including new tumor cell lines, patient-derived xenografts, and clinical data from cancer patients. The original TCRP publication trained the model on drug responses in tumor cell lines and validated it on PDTC and PDX datasets (134); the reusability report extended this evaluation to two patient clinical trial datasets (135). Notably, TCRP outputs a drug response as an “Area Under Dose Response Curve”, since this was the unit of measurement in the tumor cell-line training data. In contrast, clinical drug response outcomes are measured by RECIST criteria, progression-free survival, or overall survival. As the TCRP framework does not provide any explicit method or algorithm to translate its predictions to clinical units, ad-hoc procedures had to be used (135). Additionally, implementing the few-shot algorithm in a prospective clinical trial can be difficult, as some clinical data must be used to fine-tune the model, potentially preventing the first patients enrolled in the trial from benefiting from the model’s predictions. For its good performance but limited evaluation in clinical trials and an incomplete end-to-end pipeline for clinical application, we give it a score of 3.
Hallmark 5. Mechanistic interpretability [Score = 3].
The TCRP architecture is a deep neural network optimized for predictive performance, and its architecture did not facilitate interpretation. However, LIME (107) was used to explain important predictive features for both mutation and expression data types. Although TCRP could, in the future, be adapted to leverage model architectural designs that prioritize biological interpretability (75,76), at present we give it a score of 3.
Hallmark 6. Accessibility and reproducibility [Score = 5].
From a computational standpoint, the TCRP framework is implemented with a compact file setup, with easily accessible methods and processes. The training and testing modules can be debugged or modified easily, and results from the TCRP framework can be interpreted via raw prediction values or benchmark correlation metrics. The entire codebase was released by the original authors, along with the data needed to reproduce the results of the original paper. However, the reusability report found inconsistencies in hyperparameter configuration and training reference files. Consequently, So et al. (135) made improvements to the documentation and codebase and created a fully specified software environment hosted on Code Ocean [https://codeocean.com/capsule/8411716/tree/v2]. This more recent release increases the reusability and accessibility of TCRP and, as a result, we assign a score of 5.
Hallmark 7. Fairness [Score = 0].
TCRP was not originally designed or assessed with attention to principles of fairness. Clinical confounding factors such as age, sex, and race or ethnicity were not reported for the training and validation datasets, and model predictions were not accompanied by bias or uncertainty estimates. These shortcomings are frequent limitations of predictive modeling methods; regardless, it is imperative to ensure that the clinical deployment of these models benefits the larger population of patients, and that their indications and limitations are clearly considered and discussed. We assign TCRP a score of 0 for no explicit efforts in this category.
Discussion
Here, we have presented a set of seven hallmarks to serve as guidelines for the predictive modeling field, covering Data Relevance and Actionability, Expressive Architecture, Standardized Benchmarking, Demonstrated Generalizability, Mechanistic Interpretability, Accessibility and Reproducibility, and Fairness. Notably, these hallmarks are not intended as prescriptive, detailed instructions for building models: such implementation details are in fact expected to evolve rapidly, as they did even in the course of writing this article. Rather, the hallmarks are intended to capture fundamental, complementary concepts that are likely to remain relevant for many generations of model/biomarker development along the foreseeable trajectory of precision oncology.
With this article as a first step, we aim to establish the hallmarks as a foundation for reaching consensus and standards within the relevant research and clinical communities. Once the hallmarks have been established, a valuable next step would be to apply the scoring criteria to rate the broad collection of contemporary models. The scientific community would identify a panel of reviewers based on the intended scope of the model, with expertise in machine learning, medicine, or biology. Model scores from the reviewer panel could be calibrated by converting them to percentile ranks, or by using scores to cluster models into a small number of categories with similar hallmark characteristics. Such a platform would enable the scientific community to better assess the state of the art, identify gaps, and accordingly make recommendations.
Several hallmarks might be addressed in large part by following relevant best practices that have gained maturity in other fields of research. For example, Hallmarks 3 (Standardized Benchmarking) and 6 (Accessibility and Reproducibility) are at least partially addressed by following common practices developed in software engineering over the past several decades. Software best-practice examples include Open-Source licensing, which facilitates the use, modification, and distribution of models in ways that proprietary licenses do not; unit testing, which detects and corrects errors early in the development cycle using both manual and automated approaches; and clear user interface, application programming interface, and code documentation, which facilitates adoption by new users and developers.
As of this writing, the potential of AI to impact numerous fields has been widely recognized, but the ultimate ramifications and implications are still very much evolving. We expect the field will continue to progress rapidly, creating and coping with problems as they arise rather than waiting for regulatory guidelines to be implemented or governing bodies to take action. To this point, the AI-directed legislation currently under consideration around the world (108,109,141) advances rather cautiously in comparison to the pace at which new models and capabilities are being developed. Especially as it relates to model transparency and interpretability (Hallmarks 5 and 6), steps toward regulatory compliance will help to foster trust among users of predictive oncology models. While these regulatory requirements are still under development, AI developers must nonetheless spearhead guidelines for model transparency (such as the HARMONY checklist, Table 1) and collaborate with stakeholders with diverse levels of AI literacy to improve transparency in terms they understand (142). Beyond initial regulatory requirements, deploying models in electronic health record systems requires close collaboration with the information technology (IT) teams and staff at healthcare centers, with notable concerns for the care staff such as alert fatigue, lack of training, and increased workload (143). In addition, it is crucial to continuously monitor the performance of deployed models across the whole patient population to account for changes in profiling technologies and the baseline characteristics of the patients (144,145) (arXiv:2305.02474). By working with stakeholders such as healthcare providers, IT teams, and regulators, AI developers can help mitigate these challenges and perceived risks (146) and potentially increase the adoption of models in clinical settings.
Finally, we would be remiss not to also mention the impact of AI technologies on society more broadly. AI is a swiftly developing field, and we expect that the predictive oncology hallmarks developed herein can serve as a springboard for evolving discussions in responsible AI moving forward. By suggesting these standards and principles, we hope that the broader community – not just model developers but also regulators, clinicians, and lawmakers – will help refine and amend them to allow for a broad positive impact of predictive oncology and allied efforts. We call on these communities to be active participants in the discussion and to ensure that the discourse moves the field forward to realize the promise of technologies like AI for patient care.
Box 1. Ethical Considerations for Data Relevance and Actionability.
Collecting and sharing data derived from patients and tumor materials is not a trivial task, as it involves protecting patient privacy and must be done responsibly (35). If data can be de-identified with minimal risk of re-identification, the resulting data can be shared publicly; otherwise, sharing should be mediated by a data access committee or a binding data usage agreement.
Box 2. Ethical Considerations for Expressive Architecture.
Complex architectures often require larger computational resources, which may have a substantial ecological footprint and may not be readily available to a broad user base. Down-sized models can democratize their accessibility (85) (arXiv:2301.07014v3) and minimize the environmental impact (86) (arXiv:2104.10350).
Box 3. Ethical Considerations for Standardized Benchmarking.
Rigorously evaluating model performance using robust benchmarks and assessing added value compared to baseline and state-of-the-art models can justify future investment and increase the likelihood of clinical benefit. Additionally, whenever possible, model performance from both training and test data should be presented, including true labels and predictions of individual samples.
Box 4. Ethical Considerations for Demonstrated Generalizability.
A valuable exercise is to evaluate a predictive oncology model on diverse settings within its intended scope. For instance, does a model trained on adult tumors accurately predict response in pediatric tumors? Is a model trained predominantly on data from one ethnicity applicable for making drug response predictions in a new, ethnically diverse population? Reporting such outcomes defines the extent of the model’s generalizability and can help clinicians make informed decisions.
Box 5. Ethical Considerations for Mechanistic Interpretability.
A mechanistically interpretable model lends important insight into the treatment choice recommended to the patient. Even if the complexity of the information precludes complete comprehension by the physician or patient, providing an explanation adds legitimacy to the model’s prediction. Equally important, an interpretable model facilitates further research into the molecular pathways involved in response and resistance.
Box 6. Ethical Considerations for Accessibility and Reproducibility.
A fully specified predictive oncology model must be shared responsibly. Especially when it is trained or fine-tuned on clinical data, model sharing should not disclose private information from the original patient datasets. This issue can be prevented by enforcing differential privacy during model training or establishing a proper sharing mechanism (as is routinely done for sharing the data themselves) (arXiv:1412.7584). Regardless, access to the datasets used to train a model is highly preferred. Without knowledge of the training data, the only way to reliably assess a model is to carry out a prospective study, since any retrospective study could unwittingly include datasets used for training.
Box 7. Ethical Considerations for Fairness.
Beyond the above considerations, models can promote fairness through a variety of approaches to reduce health disparities. For example, predictive oncology models can focus attention on conditions that disproportionately affect underserved groups. Alternatively, developers can prioritize modeling of drugs that are highly accessible and affordable, as opposed to the many drugs that are unaffordable for underserved populations.
Statement of Significance.
As the field of AI evolves rapidly, these hallmarks are intended to capture fundamental, complementary concepts necessary for the progress and timely adoption of predictive modeling in precision oncology. Through these hallmarks, we hope to establish standards and guidelines that enable the symbiotic development of AI and precision oncology.
Acknowledgments
This study was supported by grants from the National Institutes of Health (NIH) including the National Cancer Institute (NCI U54 CA274502 and P30 CA023100) and National Institute for General Medical Sciences (NIGMS P41 GM103504). Additional support was provided from the Eric and Wendy Schmidt AI in Science Postdoctoral Fellowship; Cancer Moonshot Task Order No. 75N91019F00134; Frederick National Laboratory for Cancer Research Contract 75N91019D00024; U.S. Department of Energy support to Argonne National Laboratory under Contract DE-AC02-06-CH11357; the American Cancer Society grant # IRG-19-230-48.
Conflicts of Interest
TI is a co-founder of, a member of the advisory board of, and holds an equity interest in Data4Cure and Serinus Biosciences. TI is a consultant for and has an equity interest in Ideaya Biosciences. The terms of these arrangements have been reviewed and approved by the University of California San Diego in accordance with its conflict-of-interest policies. B.A.P. has received research support to the institution from Pfizer, Genentech/Roche, Novartis, GlaxoSmithKline, and Oncternal Therapeutics and has received consulting income from Daré Bioscience. K.T.Y. has received research support to the institution from Dantari, Gilead, Jazz Pharmaceuticals, Pfizer, Treadwell Therapeutics, and Zymeworks.
References
- 1.National Research Council, Division on Earth and Life Studies, Board on Life Sciences, Committee on A Framework for Developing a New Taxonomy of Disease. Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and a New Taxonomy of Disease. National Academies Press; 2012. [Google Scholar]
- 2.Ashley EA. Towards precision medicine. Nat Rev Genet. 2016;17:507–22. [DOI] [PubMed] [Google Scholar]
- 3.Nelson MR, Tipney H, Painter JL, Shen J, Nicoletti P, Shen Y, et al. The support of human genetic evidence for approved drug indications. Nat Genet. 2015;47:856–60. [DOI] [PubMed] [Google Scholar]
- 4.Faderl S, Talpaz M, Estrov Z, O’Brien S, Kurzrock R, Kantarjian HM. The biology of chronic myeloid leukemia. N Engl J Med. 1999;341:164–72. [DOI] [PubMed] [Google Scholar]
- 5.Druker BJ, Sawyers CL, Kantarjian H, Resta DJ, Reese SF, Ford JM, et al. Activity of a specific inhibitor of the BCR-ABL tyrosine kinase in the blast crisis of chronic myeloid leukemia and acute lymphoblastic leukemia with the Philadelphia chromosome. N Engl J Med. 2001;344:1038–42. [DOI] [PubMed] [Google Scholar]
- 6.Bryant HE, Schultz N, Thomas HD, Parker KM, Flower D, Lopez E, et al. Specific killing of BRCA2-deficient tumours with inhibitors of poly(ADP-ribose) polymerase. Nature. 2005;434:913–7. [DOI] [PubMed] [Google Scholar]
- 7.Farmer H, McCabe N, Lord CJ, Tutt ANJ, Johnson DA, Richardson TB, et al. Targeting the DNA repair defect in BRCA mutant cells as a therapeutic strategy. Nature. 2005;434:917–21. [DOI] [PubMed] [Google Scholar]
- 8.Moore K, Colombo N, Scambia G, Kim B-G, Oaknin A, Friedlander M, et al. Maintenance Olaparib in Patients with Newly Diagnosed Advanced Ovarian Cancer. N Engl J Med. 2018;379:2495–505. [DOI] [PubMed] [Google Scholar]
- 9.Planchard D, Besse B, Groen HJM, Souquet P-J, Quoix E, Baik CS, et al. Dabrafenib plus trametinib in patients with previously treated BRAF(V600E)-mutant metastatic non-small cell lung cancer: an open-label, multicentre phase 2 trial. Lancet Oncol. 2016;17:984–93. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Chapman Paul B, Hauschild Axel, Robert Caroline, Haanen John B, Ascierto Paolo, Larkin James, et al. Improved Survival with Vemurafenib in Melanoma with BRAF V600E Mutation. N Engl J Med. Massachusetts Medical Society; 364:2507–16. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11. Hu C, Dignam JJ. Biomarker-Driven Oncology Clinical Trials: Key Design Elements, Types, Features, and Practical Considerations. JCO Precis Oncol [Internet]. 2019;3. Available from: https://doi.org/10.1200/PO.19.00086
- 12. Nahta R, Esteva FJ. Herceptin: mechanisms of action and resistance. Cancer Lett. 2006;232:123–38.
- 13. Malviya R, Singh AK, Yadav D. Multi-Drug Resistance in Cancer: Mechanism and Treatment Strategies. John Wiley & Sons; 2023.
- 14. Liang G, Fan W, Luo H, Zhu X. The emerging roles of artificial intelligence in cancer drug development and precision therapy. Biomed Pharmacother. 2020;128:110255.
- 15. Ballester PJ, Stevens R, Haibe-Kains B, Huang RS, Aittokallio T. Artificial intelligence for drug response prediction in disease models. Brief Bioinform [Internet]. 2022;23. Available from: https://doi.org/10.1093/bib/bbab450
- 16. Partin A, Brettin TS, Zhu Y, Narykov O, Clyde A, Overbeek J, et al. Deep learning methods for drug response prediction in cancer: Predominant and emerging trends. Front Med. 2023;10:1086097.
- 17. Frampton GM, Fichtenholtz A, Otto GA, Wang K, Downing SR, He J, et al. Development and validation of a clinical cancer genomic profiling test based on massively parallel DNA sequencing. Nat Biotechnol. 2013;31:1023–31.
- 18. Beaubier N, Tell R, Lau D, Parsons JR, Bush S, Perera J, et al. Clinical validation of the tempus xT next-generation targeted oncology sequencing assay. Oncotarget. 2019;10:2384–96.
- 19. Hoadley KA, Yau C, Hinoue T, Wolf DM, Lazar AJ, Drill E, et al. Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer. Cell. 2018;173:291–304.e6.
- 20. AACR Project GENIE Consortium. AACR Project GENIE: Powering Precision Medicine through an International Consortium. Cancer Discov. 2017;7:818–31.
- 21. Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483:603–7.
- 22. van de Vijver MJ, He YD, van 't Veer LJ, Dai H, Hart AAM, Voskuil DW, et al. A Gene-Expression Signature as a Predictor of Survival in Breast Cancer. N Engl J Med. 2002;347:1999–2009.
- 23. Paik S, Tang G, Shak S, Kim C, Baker J, Kim W, et al. Gene expression and benefit of chemotherapy in women with node-negative, estrogen receptor-positive breast cancer. J Clin Oncol. 2006;24:3726–34.
- 24. Tian S, Roepman P, Popovici V, Michaut M, Majewski I, Salazar R, et al. A robust genomic signature for the detection of colorectal cancer patients with microsatellite instability phenotype and high mutation frequency. J Pathol. 2012;228:586–95.
- 25. Boland CR, Goel A. Microsatellite instability in colorectal cancer. Gastroenterology. 2010;138:2073–2087.e3.
- 26. Dinstag G, Shulman ED, Elis E, Ben-Zvi DS, Tirosh O, Maimon E, et al. Clinically oriented prediction of patient response to targeted and immunotherapies from the tumor transcriptome. Med. 2023;4:15–30.e8.
- 27. Alvarez MJ, Subramaniam PS, Tang LH, Grunn A, Aburi M, Rieckhof G, et al. A precision oncology approach to the pharmacological targeting of mechanistic dependencies in neuroendocrine tumors. Nat Genet. 2018;50:979–89.
- 28. Mundi PS, Dela Cruz FS, Grunn A, Diolaiti D, Mauguen A, Rainey AR, et al. A Transcriptome-Based Precision Oncology Platform for Patient-Therapy Alignment in a Diverse Set of Treatment-Resistant Malignancies. Cancer Discov. 2023;13:1386–407.
- 29. Markowetz F. All models are wrong and yours are useless: making clinical prediction models impactful for patients. NPJ Precis Oncol. 2024;8:54.
- 30. Spencer KL, Absolom KL, Allsop MJ, Relton SD, Pearce J, Liao K, et al. Fixing the Leaky Pipe: How to Improve the Uptake of Patient-Reported Outcomes-Based Prognostic and Predictive Models in Cancer Clinical Practice. JCO Clin Cancer Inform. 2023;7:e2300070.
- 31. Rudin C. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. Nat Mach Intell. 2019;1:206–15.
- 32. McShane LM, Altman DG, Sauerbrei W, Taube SE, Gion M, Clark GM, et al. REporting recommendations for tumour MARKer prognostic studies (REMARK). Br J Cancer. 2005;93:387–91.
- 33. ICH Official web site: ICH [Internet]. [cited 2024 Apr 9]. Available from: https://www.ich.org/page/efficacy-guidelines
- 34. 45 CFR part 170 subpart B -- standards and implementation specifications for health information technology [Internet]. [cited 2024 Apr 9]. Available from: https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-D/part-170/subpart-B
- 35. 45 CFR 164.514 -- Other requirements relating to uses and disclosures of protected health information [Internet]. [cited 2024 Apr 9]. Available from: https://www.ecfr.gov/current/title-45/section-164.514
- 36. Seashore-Ludlow B, Rees MG, Cheah JH, Cokol M, Price EV, Coletti ME, et al. Harnessing Connectivity in a Large-Scale Small-Molecule Sensitivity Dataset. Cancer Discov. 2015;5:1210–23.
- 37. Shoemaker RH. The NCI60 human tumour cell line anticancer drug screen. Nat Rev Cancer. 2006;6:813–23.
- 38. Iorio F, Knijnenburg TA, Vis DJ, Bignell GR, Menden MP, Schubert M, et al. A Landscape of Pharmacogenomic Interactions in Cancer. Cell. 2016;166:740–54.
- 39. Garnett MJ, Edelman EJ, Heidorn SJ, Greenman CD, Dastur A, Lau KW, et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature. 2012;483:570–5.
- 40. Pemovska T, Kontro M, Yadav B, Edgren H, Eldfors S, Szwajda A, et al. Individualized systems medicine strategy to tailor treatments for patients with chemorefractory acute myeloid leukemia. Cancer Discov. 2013;3:1416–29.
- 41. Haverty PM, Lin E, Tan J, Yu Y, Lam B, Lianoglou S, et al. Reproducible pharmacogenomic profiling of cancer cell line panels. Nature. 2016;533:333–7.
- 42. McMillan EA, Ryu M-J, Diep CH, Mendiratta S, Clemenceau JR, Vaden RM, et al. Chemistry-First Approach for Nomination of Personalized Treatment in Lung Cancer. Cell. 2018;173:864–878.e29.
- 43. Bruna A, Rueda OM, Greenwood W, Batra AS, Callari M, Batra RN, et al. A Biobank of Breast Cancer Explants with Preserved Intra-tumor Heterogeneity to Screen Anticancer Compounds. Cell. 2016;167:260–274.e22.
- 44. Yu N, Hwang M, Lee Y, Song BR, Kang EH, Sim H, et al. Patient-derived cell-based pharmacogenomic assessment to unveil underlying resistance mechanisms and novel therapeutics for advanced lung cancer. J Exp Clin Cancer Res. 2023;42:37.
- 45. Bennett AV, Tan X, Foster MC, Richardson DR, Bryant AL, Wood WA, et al. Is Disease Response a Patient-Centered Clinical Trial Endpoint in Acute Myeloid Leukemia: Differences in Symptom Burden and Physical Function By Response Status in the Beat-AML Master Trial. Blood. 2022;140:3367–8.
- 46. Lee SH, Hu W, Matulay JT, Silva MV, Owczarek TB, Kim K, et al. Tumor Evolution and Drug Response in Patient-Derived Organoid Models of Bladder Cancer. Cell. 2018;173:515–528.e17.
- 47. van de Wetering M, Francies HE, Francis JM, Bounova G, Iorio F, Pronk A, et al. Prospective derivation of a living organoid biobank of colorectal cancer patients. Cell. 2015;161:933–45.
- 48. Driehuis E, Kretzschmar K, Clevers H. Establishment of patient-derived cancer organoids for drug-screening applications. Nat Protoc. 2020;15:3380–409.
- 49. Gao H, Korn JM, Ferretti S, Monahan JE, Wang Y, Singh M, et al. High-throughput screening using patient-derived tumor xenografts to predict clinical trial drug response. Nat Med. 2015;21:1318–25.
- 50. Townsend EC, Murakami MA, Christodoulou A, Christie AL, Köster J, DeSouza TA, et al. The Public Repository of Xenografts Enables Discovery and Randomized Phase II-like Trials in Mice. Cancer Cell. 2016;30:183.
- 51. Fernandez-Martinez A, Krop IE, Hillman DW, Polley M-Y, Parker JS, Huebner L, et al. Survival, Pathologic Response, and Genomics in CALGB 40601 (Alliance), a Neoadjuvant Phase III Trial of Paclitaxel-Trastuzumab With or Without Lapatinib in HER2-Positive Breast Cancer. J Clin Oncol. 2020;38:4184–93.
- 52. O'Leary B, Cutts RJ, Liu Y, Hrebien S, Huang X, Fenwick K, et al. The Genetic Landscape and Clonal Evolution of Breast Cancer Resistance to Palbociclib plus Fulvestrant in the PALOMA-3 Trial. Cancer Discov. 2018;8:1390–403.
- 53. Wolf DM, Yau C, Wulfkuhle J, Brown-Swigart L, Gallagher RI, Lee PRE, et al. Redefining breast cancer subtypes to guide treatment prioritization and maximize response: Predictive biomarkers across 10 cancer therapies. Cancer Cell. 2022;40:609–623.e6.
- 54. Shepherd JH, Ballman K, Polley M-YC, Campbell JD, Fan C, Selitsky S, et al. CALGB 40603 (Alliance): Long-Term Outcomes and Genomic Correlates of Response and Survival After Neoadjuvant Chemotherapy With or Without Carboplatin and Bevacizumab in Triple-Negative Breast Cancer. J Clin Oncol. 2022;40:1323–34.
- 55. Zhu Z, Turner NC, Loi S, André F, Martin M, Diéras V, et al. Comparative biomarker analysis of PALOMA-2/3 trials for palbociclib. NPJ Precis Oncol. 2022;6:56.
- 56. van der Velden DL, Hoes LR, van der Wijngaart H, van Berge Henegouwen JM, van Werkhoven E, Roepman P, et al. The Drug Rediscovery protocol facilitates the expanded use of existing anticancer drugs. Nature. 2019;574:127–31.
- 57. Ma X, Liu Y, Liu Y, Alexandrov LB, Edmonson MN, Gawad C, et al. Pan-cancer genome and transcriptome analyses of 1,699 paediatric leukaemias and solid tumours. Nature. 2018;555:371–6.
- 58. Quevedo R, Smirnov P, Tkachuk D, Ho C, El-Hachem N, Safikhani Z, et al. Assessment of Genetic Drift in Large Pharmacogenomic Studies. Cell Syst. 2020;11:393–401.e2.
- 59. Ben-David U, Siranosian B, Ha G, Tang H, Oren Y, Hinohara K, et al. Genetic and transcriptional evolution alters cancer cell line drug response. Nature. 2018;560:325–30.
- 60. Trastulla L, Noorbakhsh J, Vazquez F, McFarland J, Iorio F. Computational estimation of quality and clinical relevance of cancer cell lines. Mol Syst Biol. 2022;18:e11017.
- 61. Rae C, Amato F, Braconi C. Patient-Derived Organoids as a Model for Cancer Drug Discovery. Int J Mol Sci [Internet]. 2021;22. Available from: https://doi.org/10.3390/ijms22073483
- 62. Golebiewska A, Hau A-C, Oudin A, Stieber D, Yabo YA, Baus V, et al. Patient-derived organoids and orthotopic xenografts of primary and recurrent gliomas represent relevant patient avatars for precision oncology. Acta Neuropathol. 2020;140:919–49.
- 63. Hidalgo M, Amant F, Biankin AV, Budinská E, Byrne AT, Caldas C, et al. Patient-derived xenograft models: an emerging platform for translational cancer research. Cancer Discov. 2014;4:998–1013.
- 64. He J, Zhang C, Ozkan A, Feng T, Duan P, Wang S, et al. Patient-derived tumor models and their distinctive applications in personalized drug therapy. Mechanobiology in Medicine. 2023;1:100014.
- 65. Nugent BM, Madabushi R, Buch B, Peiris V, Crentsil V, Miller VM, et al. Heterogeneity in treatment effects across diverse populations. Pharm Stat. 2021;20:929–38.
- 66. Chakraborty S, Hosen MI, Ahmed M, Shekhar HU. Onco-Multi-OMICS Approach: A New Frontier in Cancer Research. Biomed Res Int. 2018;2018:9836256.
- 67. Dawood M, Vu QD, Young LS, Branson K, Jones L, Rajpoot N, et al. Cancer drug sensitivity prediction from routine histology images. NPJ Precis Oncol. 2024;8:5.
- 68. Nakajima EC, Simpson A, Bogaerts J, de Vries EGE, Do R, Garalda E, et al. Tumor Size Is Not Everything: Advancing Radiomics as a Precision Medicine Biomarker in Oncology Drug Development and Clinical Care. A Report of a Multidisciplinary Workshop Coordinated by the RECIST Working Group. JCO Precis Oncol. 2024;8:e2300687.
- 69. Zhou H, Luo Q, Wu W, Li N, Yang C, Zou L. Radiomics-guided checkpoint inhibitor immunotherapy for precision medicine in cancer: A review for clinicians. Front Immunol. 2023;14:1088874.
- 70. Thompson JC, Scholes DG, Carpenter EL, Aggarwal C. Molecular response assessment using circulating tumor DNA (ctDNA) in advanced solid tumors. Br J Cancer. 2023;129:1893–902.
- 71. Patelli G, Mauri G, Tosi F, Amatu A, Bencardino K, Bonazzina E, et al. Circulating Tumor DNA to Drive Treatment in Metastatic Colorectal Cancer. Clin Cancer Res. 2023;29:4530–9.
- 72. Bengio Y, Delalleau O. On the Expressive Power of Deep Architectures. In: Algorithmic Learning Theory. Springer Berlin Heidelberg; 2011. page 18–36.
- 73. Poole B, Lahiri S, Raghu M, Sohl-Dickstein JN, Ganguli S. Exponential expressivity in deep neural networks through transient chaos. Adv Neural Inf Process Syst. 2016;3360–8.
- 74. Jin I, Nam H. HiDRA: Hierarchical Network for Drug Response Prediction with Attention. J Chem Inf Model. 2021;61:3858–67.
- 75. Zhao X, Singhal A, Park S, Kong J, Bachelder R, Ideker T. Cancer Mutations Converge on a Collection of Protein Assemblies to Predict Resistance to Replication Stress. Cancer Discov. 2024;14:508–23.
- 76. Park S, Silva E, Singhal A, Kelly MR, Licon K, Panagiotou I, et al. A deep learning model of tumor cell architecture elucidates response and resistance to CDK4/6 inhibitors. Nat Cancer. 2024;5:996–1009.
- 77. Briscoe E, Feldman J. Conceptual complexity and the bias/variance tradeoff. Cognition. 2011;118:2–16.
- 78. Domingos P. The Role of Occam's Razor in Knowledge Discovery. Data Min Knowl Discov. 1999;3:409–25.
- 79. Thrun S, Pratt L. Learning to Learn. Springer US; 1998.
- 80. De Raedt L, editor. Advances in Inductive Logic Programming. 1st ed. Amsterdam: IOS Press; 1996.
- 81. Cohen WW. Compiling Prior Knowledge Into an Explicit Bias. In: Sleeman D, Edwards P, editors. Machine Learning Proceedings 1992. San Francisco (CA): Morgan Kaufmann; 1992. page 102–10.
- 82. Tibshirani R. Regression Shrinkage and Selection Via the Lasso. J R Stat Soc Series B Stat Methodol. 1996;58:267–88.
- 83. Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw. 2010;33:1–22.
- 84. Quinlan JR. Simplifying decision trees. Int J Man Mach Stud. 1987;27:221–34.
- 85. Gou J, Yu B, Maybank SJ, Tao D. Knowledge Distillation: A Survey. Int J Comput Vis. 2021;129:1789–819.
- 86. CO2 Emissions and the 🤗 Hub: Leading the Charge [Internet]. [cited 2024 Jan 22]. Available from: https://huggingface.co/blog/carbon-emissions-on-the-hub
- 87. Friedman JH. Greedy function approximation: A gradient boosting machine. Ann Stat. 2001;29:1189–232.
- 88. Firoozbakht F, Yousefi B, Schwikowski B. An overview of machine learning methods for monotherapy drug response prediction. Brief Bioinform [Internet]. 2022;23. Available from: https://doi.org/10.1093/bib/bbab408
- 89. Park A, Lee Y, Nam S. A performance evaluation of drug response prediction models for individual drugs. Sci Rep. 2023;13:11911.
- 90. Zhu Y, Brettin T, Evrard YA, Partin A, Xia F, Shukla M, et al. Ensemble transfer learning for the prediction of anti-cancer drug response. Sci Rep. 2020;10:18040.
- 91. Sharifi-Noghabi H, Peng S, Zolotareva O, Collins CC, Ester M. AITL: Adversarial Inductive Transfer Learning with input and output space adaptation for pharmacogenomics. Bioinformatics. 2020;36:i380–8.
- 92. Lee K, Cho D, Jang J, Choi K, Jeong H-O, Seo J, et al. RAMP: response-aware multi-task learning with contrastive regularization for cancer drug response prediction. Brief Bioinform [Internet]. 2023;24. Available from: https://doi.org/10.1093/bib/bbac504
- 93. Li Y, Hostallero DE, Emad A. Interpretable deep learning architectures for improving drug response prediction performance: myth or reality? Bioinformatics [Internet]. 2023;39. Available from: https://doi.org/10.1093/bioinformatics/btad390
- 94. Tang Y-C, Gottlieb A. Explainable drug sensitivity prediction through cancer pathway enrichment. Sci Rep. 2021;11:3128.
- 95. Stone M. Cross-validatory choice and assessment of statistical predictions. J R Stat Soc. 1974;36:111–33.
- 96. Eisenhauer EA, Therasse P, Bogaerts J, Schwartz LH, Sargent D, Ford R, et al. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer. 2009;45:228–47.
- 97. Feaster DJ, Mitrani VB, Burns MJ, McCabe BE, Brincks AM, Rodriguez AE, et al. A randomized controlled trial of Structural Ecosystems Therapy for HIV medication adherence and substance abuse relapse prevention. Drug Alcohol Depend. 2010;111:227–34.
- 98. Schulman KL, Berenson K, Tina Shih Y-C, Foley KA, Ganguli A, de Souza J, et al. A checklist for ascertaining study cohorts in oncology health services research using secondary data: report of the ISPOR oncology good outcomes research practices working group. Value Health. 2013;16:655–69.
- 99. Partin A, Brettin T, Zhu Y, Dolezal JM, Kochanny S, Pearson AT, et al. Data augmentation and multimodal learning for predicting drug response in patient-derived xenografts from gene expressions and histology images. Front Med. 2023;10:1058919.
- 100. He D, Liu Q, Wu Y, Xie L. A context-aware deconfounding autoencoder for robust prediction of personalized clinical drug response from cell-line compound screening. Nat Mach Intell. 2022;4:879–92.
- 101. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and Regression Trees. Monterey (CA): Brooks/Cole Publishing; 1984.
- 102. Ma J, Yu MK, Fong S, Ono K, Sage E, Demchak B, et al. Using deep learning to model the hierarchical structure and function of a cell. Nat Methods. 2018;15:290–8.
- 103. Kuenzi BM, Park J, Fong SH, Sanchez KS, Lee J, Kreisberg JF, et al. Predicting Drug Response and Synergy Using a Deep Learning Model of Human Cancer Cells. Cancer Cell. 2020;38:672–684.e6.
- 104. Elmarakeby HA, Hwang J, Arafeh R, Crowdis J, Gang S, Liu D, et al. Biologically informed deep neural network for prostate cancer discovery. Nature. 2021;598:348–52.
- 105. Huang X, Huang K, Johnson T, Radovich M, Zhang J, Ma J, et al. ParsVNN: parsimony visible neural networks for uncovering cancer-specific and drug-sensitive genes and pathways. NAR Genom Bioinform. 2021;3:lqab097.
- 106. Ba-Alawi W, Nair SK, Li B, Mammoliti A, Smirnov P, Mer AS, et al. Bimodal Gene Expression in Patients with Cancer Provides Interpretable Biomarkers for Drug Sensitivity. Cancer Res. 2022;82:2378–87.
- 107. Ribeiro MT, Singh S, Guestrin C. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: Association for Computing Machinery; 2016. page 1135–44.
- 108. Office of the Commissioner. Artificial intelligence and medical products [Internet]. U.S. Food and Drug Administration; 2024 [cited 2024 Apr 4]. Available from: https://www.fda.gov/science-research/science-and-research-special-topics/artificial-intelligence-and-medical-products
- 109. Using artificial intelligence & machine learning in the development of drug and biological products [Internet]. [cited 2024 Apr 4]. Available from: https://www.fda.gov/media/167973
- 110. Narla A, Kuprel B, Sarin K, Novoa R, Ko J. Automated Classification of Skin Lesions: From Pixels to Practice. J Invest Dermatol. 2018;138:2108–10.
- 111. Winkler JK, Fink C, Toberer F, Enk A, Deinlein T, Hofmann-Wellenhof R, et al. Association Between Surgical Skin Markings in Dermoscopic Images and Diagnostic Performance of a Deep Learning Convolutional Neural Network for Melanoma Recognition. JAMA Dermatol. 2019;155:1135–41.
- 112. DeGrave AJ, Janizek J, Lee S-I. AI for radiographic COVID-19 detection selects shortcuts over signal. Nat Mach Intell. 2021;3:610–9.
- 113. Peng RD, Hicks SC. Reproducible Research: A Retrospective. Annu Rev Public Health. 2021;42:79–93.
- 114. Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016;3:160018.
- 115. Models - Hugging Face [Internet]. [cited 2024 Jan 22]. Available from: https://huggingface.co/models
- 116. MHub - The New Deployment Standard for Medical Imaging AI [Internet]. [cited 2024 Jan 22]. Available from: https://mhub.ai/
- 117. Wilson SL, Way GP, Bittremieux W, Armache J-P, Haendel MA, Hoffman MM. Sharing biological data: why, when, and how. FEBS Lett. 2021;595:847–63.
- 118. Heil BJ, Hoffman MM, Markowetz F, Lee S-I, Greene CS, Hicks SC. Reproducibility standards for machine learning in the life sciences. Nat Methods. 2021;18:1132–5.
- 119. Haibe-Kains B, Adam GA, Hosny A, Khodakarami F, Massive Analysis Quality Control (MAQC) Society Board of Directors, Waldron L, et al. Transparency and reproducibility in artificial intelligence. Nature. 2020;586:E14–6.
- 120. Ethics and governance of artificial intelligence for health: WHO guidance. World Health Organization; 2021.
- 121. Corrêa NK, Galvão C, Santos JW, Del Pino C, Pinto EP, Barbosa C, et al. Worldwide AI ethics: A review of 200 guidelines and recommendations for AI governance. Patterns (N Y). 2023;4:100857.
- 122. Social determinants of health [Internet]. [cited 2024 May 22]. Available from: https://www.who.int/health-topics/social-determinants-of-health
- 123. International Classification of Diseases (ICD) [Internet]. [cited 2024 May 22]. Available from: https://www.who.int/standards/classifications/classification-of-diseases
- 124. ICD-10-CM official guidelines for coding and reporting FY 2023 -- UPDATED April 1, 2023 (October 1, 2022 - September 30, 2023) [Internet]. 2023 [cited 2024 May 22]. Available from: https://stacks.cdc.gov/view/cdc/126426
- 125. Domingues B, Lopes JM, Soares P, Pópulo H. Melanoma treatment in review. Immunotargets Ther. 2018;7:35–49.
- 126. Byeon S, Cho HJ, Jang K-T, Kwon M, Lee J, Lee J, et al. Molecular profiling of Asian patients with advanced melanoma receiving check-point inhibitor treatment. ESMO Open. 2021;6:100002.
- 127. Castellani G, Buccarelli M, Arasi MB, Rossi S, Pisanu ME, Bellenghi M, et al. BRAF Mutations in Melanoma: Biological Aspects, Therapeutic Implications, and Circulating Biomarkers. Cancers [Internet]. 2023;15. Available from: https://doi.org/10.3390/cancers15164026
- 128. Mosher JT, Pemberton TJ, Harter K, Wang C, Buzbas EO, Dvorak P, et al. Lack of population diversity in commonly used human embryonic stem-cell lines. N Engl J Med. 2010;362:183–5.
- 129. Badal S. Ethnically diverse cancer cell lines for drug testing. Nat Rev Cancer. 2022;22:65–6.
- 130. Verma S, Rubin J. Fairness definitions explained. In: Proceedings of the International Workshop on Software Fairness. New York, NY, USA: Association for Computing Machinery; 2018. page 1–7.
- 131. Vaughn L. Bioethics: Principles, Issues, and Cases. Oxford University Press; 2020.
- 132. Tännsjö T. Setting Health-Care Priorities: What Ethical Theories Tell Us. Oxford University Press; 2019.
- 133. Arneson RJ. Prioritarianism. Cambridge University Press; 2022.
- 134. Ma J, Fong SH, Luo Y, Bakkenist CJ, Shen JP, Mourragui S, et al. Few-shot learning creates predictive models of drug response that translate from high-throughput screens to individual patients. Nat Cancer. 2021;2:233–44.
- 135. So E, Yu F, Wang B, Haibe-Kains B. Reusability report: Evaluating reproducibility and reusability of a fine-tuned model to predict drug response in cancer patient samples. Nat Mach Intell. 2023;5:792–8.
- 136. Meyers RM, Bryan JG, McFarland JM, Weir BA, Sizemore AE, Xu H, et al. Computational correction of copy number effect improves specificity of CRISPR–Cas9 essentiality screens in cancer cells. Nat Genet. 2017;49:1779–84.
- 137. Smirnov P, Smith I, Safikhani Z, Ba-Alawi W, Khodakarami F, Lin E, et al. Evaluation of statistical approaches for association testing in noisy drug screening data. BMC Bioinformatics. 2022;23:188.
- 138. Lenhof K, Gerstner N, Kehl T, Eckhart L, Schneider L, Lenhof H-P. MERIDA: a novel Boolean logic-based integer linear program for personalized cancer therapy. Bioinformatics. 2021;37:3881–8.
- 139. Aharonov R, Dinstag G, Shulman E, Elis E, Ben-Zvi D, Tirosh O, et al. ENLIGHT: Pancancer response prediction to targeted and immunotherapies via tumor transcriptomics. J Clin Oncol. 2022;40:e13556.
- 140. Knijnenburg TA, Klau GW, Iorio F, Garnett MJ, McDermott U, Shmulevich I, et al. Logic models to predict continuous outputs based on binary inputs with an application to personalized cancer therapy. Sci Rep. 2016;6:36812.
- 141. The Bletchley Declaration by Countries Attending the AI Safety Summit, 1–2 November 2023 [Internet]. Gov.uk; 2023 [cited 2024 May 22]. Available from: https://www.gov.uk/government/publications/ai-safety-summit-2023-the-bletchley-declaration/the-bletchley-declaration-by-countries-attending-the-ai-safety-summit-1-2-november-2023
- 142. Russell RG, Lovett Novak L, Patel M, Garvey KV, Craig KJT, Jackson GP, et al. Competencies for the Use of Artificial Intelligence-Based Tools by Health Care Professionals. Acad Med. 2023;98:348–56.
- 143. Lee TC, Shah NU, Haack A, Baxter SL. Clinical Implementation of Predictive Models Embedded within Electronic Health Record Systems: A Systematic Review. Informatics (MDPI) [Internet]. 2020;7. Available from: https://doi.org/10.3390/informatics7030025
- 144. Adam GA, Chang C-HK, Haibe-Kains B, Goldenberg A. Hidden Risks of Machine Learning Applied to Healthcare: Unintended Feedback Loops Between Models and Future Data Causing Model Degradation. In: Doshi-Velez F, Fackler J, Jung K, Kale D, Ranganath R, Wallace B, et al., editors. Proceedings of the 5th Machine Learning for Healthcare Conference. PMLR; 2020. page 710–31.
- 145. Sahiner B, Chen W, Samala RK, Petrick N. Data drift in medical machine learning: implications and potential remedies. Br J Radiol. 2023;96:20220878.
- 146. Choudhury A. Factors influencing clinicians' willingness to use an AI-based clinical decision support system. Front Digit Health. 2022;4:920662.