Real-world data and evidence: pioneering frontiers in precision oncology

Jingxin JIANG; Weiwei PAN; Liyang SUN; Liwei PANG; Hailang CHEN; Jian HUANG; Wuzhen CHEN

doi:10.1631/jzus.B2400285

2025 Dec 22;27(1):44–57. doi: 10.1631/jzus.B2400285

PMCID: PMC12848558 PMID: 41601366

Abstract

Real-world studies (RWSs) have emerged as a transformative force in oncology research, complementing traditional randomized controlled trials (RCTs) by providing comprehensive insights into cancer care within routine clinical settings. This review examines the evolving landscape of RWSs in oncology, focusing on their implementation, methodological considerations, and impact on precision medicine. We systematically analyze how RWSs leverage diverse data sources, including electronic health records (EHRs), insurance claims, and patient registries, to generate evidence that bridges the gap between controlled clinical trials and real-world clinical practice. The review underscores the key contributions of RWSs, including capturing therapeutic outcomes in traditionally underrepresented populations, expanding drug indications, and evaluating long-term safety and effectiveness in routine clinical settings. While acknowledging significant challenges, including data quality variability and privacy concerns, we discuss how emerging technologies like artificial intelligence are helping to address these limitations. The integration of RWSs with traditional clinical research is revolutionizing the paradigm of precision oncology and enabling more personalized treatment approaches based on real-world evidence.

Keywords: Real-world study (RWS), Precision oncology, Real-world data (RWD), Study design, Data characterization

1. Introduction

The landscape of oncology research has undergone a paradigm shift with the emergence of real-world studies (RWSs), which have become an indispensable complement to traditional randomized controlled trials (RCTs). This evolution reflects the growing recognition that while RCTs remain the gold standard for establishing treatment efficacy, they may not fully capture the complexities and heterogeneity of cancer care in routine clinical practice (Berger et al., 2017).

Real-world data (RWD) encompasses a broad spectrum of health-related information generated outside conventional clinical trials, including electronic health records (EHRs), medical and pharmacy claims, disease registries, and patient-generated data from wearable devices and social media platforms. The integration of these diverse data sources, particularly when combined with multi-omics data, offers unprecedented opportunities to understand cancer biology, treatment responses, and patient outcomes in real-world settings. This comprehensive approach has proven invaluable in supporting clinical drug development, regulatory decision-making, and the advancement of precision oncology (Agiro et al., 2018; Gross et al., 2023; Verkerk and Voest, 2024). High-quality, purpose-built RWSs have emerged as crucial tools for expanding drug and medical device indications, monitoring post-market safety, and validating clinical trial outcomes in broader patient populations.

Although there are inherent challenges in implementing RWSs, including data quality assurance, methodological rigor, and analytical complexity, this review provides a comprehensive analysis of their application in oncology by examining the characteristics and quality requirements of data sources, essential design elements, current challenges, and emerging solutions. We aim to illustrate how the integration of RWSs with traditional clinical research is transforming our understanding of cancer biology and treatment, while providing a framework for future developments in precision oncology.

2. Real-world studies in oncology

2.1. Insights into disease epidemiology and burden

RWSs that utilize comprehensive registry systems have revolutionized our understanding of cancer epidemiology and disease burden (Global Burden of Disease Cancer Collaboration, 2019; Global Burden of Disease 2019 Cancer Collaboration, 2022). Leading administrative cancer registries, notably the Surveillance, Epidemiology, and End Results (SEER) program, systematically collect and analyze data on cancer incidence, treatment patterns, and outcomes across diverse populations. These registries are particularly valuable for studying demographic groups traditionally underrepresented in RCTs, including elderly patients, those with multiple comorbidities, and vulnerable populations, enabling more targeted and effective public health interventions.

The integration of population-based cancer registries, exemplified by the United States National Cancer Database, has further enhanced our understanding by linking epidemiological data with detailed individual-level information (Roshandel et al., 2023). These registries capture comprehensive patient profiles, including sociodemographic factors, lifestyle behaviors, and genomic characteristics, while connecting them to treatment outcomes and insurance claims. This integration provides invaluable insights into disease burden, prognosis, and survival patterns. A landmark example of the impact of RWD is the identification of hormone-replacement therapy as a breast cancer risk factor, a finding initially observed through observational studies and subsequently validated by RCTs (Collaborative Group on Hormonal Factors in Breast Cancer, 1997; Chlebowski et al., 2003). This illustrates the complementary relationship between real-world observation and controlled clinical research.

The impact of registry-based studies extends beyond clinical insights to inform healthcare policy and resource allocation. Global Burden of Disease (GBD) studies have been particularly influential in shaping public health strategies, especially in cancer screening and prevention. Notable successes include the implementation of cost-effective human papillomavirus (HPV) vaccination programs and systematic screening protocols, which have made cervical cancer elimination a realistic goal in many regions (Global Burden of Disease Cancer Collaboration, 2019; Voelker, 2023). Similarly, evidence-based mammography screening programs have significantly reduced breast cancer mortality in developed nations (Farkas and Nattinger, 2023). These achievements have catalyzed the establishment of national and regional cancer control initiatives that extend to low-income countries (Newman, 2022), demonstrating how real-world evidence (RWE) can drive meaningful improvements in global cancer care.

2.2. Assessment of treatment modalities and outcomes through real-world studies

RWSs provide critical insights into the effectiveness and accessibility of oncological treatments in routine clinical practice, often revealing significant variations from outcomes observed in RCTs (Moore et al., 2019; Chakiryan et al., 2021; Jazieh et al., 2021; Lin et al., 2021; Martin et al., 2023; Samlowski et al., 2023; Yan et al., 2023). These studies are particularly valuable in evaluating treatment efficacy, safety profiles, and patient-reported outcomes (PROs) among populations typically underrepresented in clinical trials, including patients with poor performance status, advanced age, multiple malignancies, or rare cancers. A landmark example is the expansion of palbociclib indications to include men with hormone receptor (HR)-positive, human epidermal growth factor receptor 2 (HER2)-negative metastatic breast cancer, approved based on RWD from EHRs, insurance claims, and global safety databases (Wedam et al., 2020).

RWSs offer unique advantages in scenarios where traditional prospective trials may be impractical or ethically challenging. The “trial effect” or “Hawthorne effect” (McCarney et al., 2007) observed in RCTs often yields outcomes that differ from real-world clinical experience. By leveraging diverse data sources, RWSs provide deeper insights into tumor pathology and treatment impacts on patient quality of life (Penberthy et al., 2022; Tang et al., 2023).

Comparative analyses of RWS and RCT findings have revealed important disparities that inform clinical practice improvements. For example, RWSs have shown that 40%‒50% of younger women with early-stage breast cancer experience interruptions in endocrine therapy, including ovarian suppression, a rate substantially higher than that reported in RCTs (Cluze et al., 2012). Similarly, investigations into trastuzumab emtansine (T-DM1) effectiveness in stage IV HER2-positive breast cancer have demonstrated reduced benefits in second-line treatment compared to pertuzumab-naive patients (Ethier et al., 2021). The study found that only 50.1% of patients received T-DM1 as second-line therapy, with median overall survival of 12‒19 months, considerably shorter than the 30 months reported in RCTs (Diéras et al., 2017). These findings underscore the limitations of generalizing RCT results to broader clinical settings.

Recent advances in artificial intelligence (AI) have enhanced RWS capabilities through the integration and analysis of complex multi-omics datasets, including genomics, pathomics, and radiomics (Stein-O'Brien et al., 2023). These AI-driven approaches efficiently process censored data and complex interactions, enabling the identification of novel prognostic markers (Prelaj et al., 2024). However, while promising, these findings require validation through well-designed prospective trials before implementation in clinical practice, highlighting the complementary relationship between RWSs and traditional clinical research.

2.3. Economic evaluation of cancer treatments through real-world studies

RWSs have emerged as essential tools for evaluating the economic impact of cancer treatments because they offer more accurate assessments than traditional economic forecasting models based on idealized assumptions. By analyzing comprehensive data from actual clinical settings, RWSs provide crucial insights into treatment costs, patient economic burdens, and comparative cost-effectiveness across different therapeutic strategies (Dai et al., 2022b; Pollard et al., 2022; Sinha et al., 2022; Soliman et al., 2023; van den Puttelaar et al., 2023). These evaluations are particularly valuable in understanding the financial implications of complex treatment decisions, including drug sequencing and dosing protocols, which enable healthcare systems to optimize resource allocation while maintaining quality care.
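As a point of orientation, comparative cost-effectiveness is often summarized by the incremental cost-effectiveness ratio (ICER). The review does not name this measure, so the formula below is offered only as a standard textbook definition, with the symbols chosen here for illustration: C denotes mean cost per patient and E mean effectiveness (for example, quality-adjusted life years) for a new strategy and a reference strategy.

```latex
\[
\mathrm{ICER} = \frac{C_{\text{new}} - C_{\text{ref}}}{E_{\text{new}} - E_{\text{ref}}}
\]
```

In an RWS, the cost and effectiveness terms would be estimated from observed claims or EHR data rather than from modeled assumptions, which is why confounding control matters for both the numerator and the denominator.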

A compelling example of the economic impact of RWSs is the evaluation of immune checkpoint inhibitors, breakthrough therapies that have revolutionized cancer treatment but pose significant financial challenges to healthcare systems worldwide. The initial RWE suggested that lower doses of anti-programmed cell death protein-1 (anti-PD-1) therapies (nivolumab and pembrolizumab) could achieve comparable efficacy to standard dosing (Malmberg et al., 2022). This observation led to a pivotal randomized trial examining the combination of low-dose nivolumab with triple metronomic chemotherapy in head and neck cancer patients (Patil et al., 2023). The positive results from this study demonstrated that strategic dose optimization could maintain therapeutic efficacy while substantially reducing financial toxicity, thus establishing an alternative standard of care for patients with limited access to full-dose treatments. This exemplifies how RWS-driven insights can inform cost-effective treatment strategies without compromising clinical outcomes.

3. Real-world data in oncology

3.1. Characteristics of real-world data in oncology

The U.S. Food and Drug Administration (FDA) defines RWD as “data consistently accumulated from diverse sources pertinent to a patient’s health status and/or the provision of medical services” (Corrigan-Curay, 2018). While RCTs generate data in highly controlled environments with selected populations, RWD encompasses information from routine clinical practice, including preventive care, diagnostics, treatment outcomes, health examinations, insurance claims, and patient registries (Sherman et al., 2016; Jarow et al., 2017; Booth et al., 2019). This fundamental difference enables RWD to complement RCT findings by providing comprehensive insights into treatment effectiveness and safety across diverse patient populations (Fig. 1), ultimately enhancing clinical and regulatory decision-making (Gross et al., 2023; Merola et al., 2023). To better understand the complementary relationship between RWSs and RCTs, a comprehensive comparison of their respective strengths and limitations is essential (Table 1). This framework highlights how RWSs offer advantages in terms of generalizability and real-world applicability, while RCTs excel in internal validity and controlled assessment, underscoring the value of integrating both approaches in oncological research.

Table 1.

Characteristics of real-world studies (RWSs) and randomized controlled trials (RCTs)

Parameter	RWS	RCT
Study design	Observational studies using routine clinical data; Multiple design options	Controlled experiments with randomization; Standardized protocols
Population	Broad, diverse patient populations	Selected participants meeting strict criteria; Excluding comorbidities
Data quality	Variable quality; Potential bias and confounding	High quality; Minimized bias through randomization
Outcomes	Long-term effectiveness; Patient-reported outcomes; Real-world safety	Specific endpoints; Efficacy under ideal conditions; Short-term focus
Implementation	Cost-effective; Faster recruitment; Applying existing data	Resource-intensive; Lengthy recruitment; Complex logistics
Strength	Real-world applicability; Population representativeness; Long-term insights	Strong causal inference; Internal validity; Standardized evidence
Limitation	Data quality concerns; Bias control challenges; Privacy issues	Limited generalizability; High cost; Artificial setting
Regulatory role	Complementary evidence; Growing acceptance	Primary evidence; Established standard


RCTs often exclude key patient populations encountered in routine practice, such as elderly patients, those with poor performance status, or individuals with significant comorbidities. RWD is particularly valuable for studying rare cancers, specific molecular subtypes, and complex patient groups typically excluded from traditional research settings. Post-marketing studies utilizing RWD are therefore essential for validating trial findings in real-world settings and understanding therapeutic effectiveness across diverse patient populations (Black, 1996; Kennedy-Martin et al., 2015; Skovlund et al., 2018). Despite its advantages, RWD faces significant challenges in completeness and accuracy (Cook and Collins, 2015). The integration and management of data from multiple sources requires extensive standardization and verification processes. Methodological limitations include population representativeness bias, incomplete data in registries, and reliability concerns with certain data sources (Booth et al., 2019; Tsai et al., 2019; Collins et al., 2020; Velmovitsky et al., 2021). Additionally, establishing causal relationships and conducting safety evaluations present unique analytical challenges (Flynn et al., 2022; Zhu et al., 2023). These limitations suggest that RWD should complement rather than replace RCTs in clinical decision-making, particularly in safety-critical areas.

3.2. Real-world data application in real-world precision oncology research

3.2.1. Enhancing screening programs

RWD has revolutionized cancer-screening programs by enabling comprehensive population-level analyses that integrate diverse data sources, including EHRs, cancer registries, and multidimensional clinical-genetic features. For example, the CanPredict model for lung cancer risk assessment (Liao et al., 2023) was developed using multiple data dimensions: sociodemographic factors (age, sex, ethnicity, etc.), lifestyle characteristics (body mass index (BMI), smoking status, alcohol consumption, etc.), and comprehensive medical history (comorbidities, previous medical history, and family history). By analyzing data from 19.67 million individuals across two English primary care databases, this model has emerged as a robust tool for identifying high-risk individuals who would benefit most from targeted lung cancer screening.

The impact of RWD on screening programs extends beyond risk stratification. A landmark study by Trentham-Dietz et al. (2024) demonstrated this potential by integrating six Cancer Intervention and Surveillance Modeling Network (CISNET) models with national breast cancer data. Their analysis, which showed significant reductions in breast cancer mortality with mammography screening, provided compelling evidence to support the initiation of biennial screening at age 40 years. RWD has also proven valuable in uncovering disparities in screening adherence, as demonstrated by a U.S. national lung cancer screening program analysis (Núñez et al., 2021), which highlighted significant gaps—particularly among low-income individuals, Black populations, and those with mental health conditions—thereby informing targeted interventions to improve screening accessibility and utilization.

3.2.2. Advancing diagnostic accuracy and treatment optimization

RWD has transformed diagnostic practices by providing comprehensive insights into disease presentation and progression across diverse populations, transcending the traditional “symptom-imaging-pathology” paradigm. AI-assisted tools have transformed radiological and pathological image interpretation by offering remarkable accuracy in tasks such as image classification, reconstruction, detection, segmentation, registration, and synthesis (Kleppe et al., 2021; Chen et al., 2022; Jiang et al., 2023). Platforms like QuPath (Bankhead et al., 2017) exemplify this advancement by integrating deep learning and machine-learning capabilities; this enables automated tumor detection and classification while supporting customized models for specific cellular identification and biomarker quantification.

The integration of big data has catalyzed a paradigm shift in oncological treatment strategies, from uniform histology-based approaches toward genomic-guided precision oncology. This evolution demands robust methodologies for evaluating novel patient subgroups, assessing real-world drug efficacy, and validating new biomarkers (Terranova and Venkatakrishnan, 2024; Verkerk and Voest, 2024). The discrepancy between clinical trial outcomes and real-world treatment patterns highlights the importance of RWD in optimizing therapeutic strategies. A compelling example is the treatment of older women (aged >75 years) with ductal carcinoma in situ (DCIS), a population traditionally excluded from clinical trials. Analysis of real-world treatment patterns revealed that while these patients are not necessarily frail, they often receive de-escalated interventions, potentially increasing the risk of disease progression, particularly in estrogen receptor-negative DCIS (Karakatsanis and Markopoulos, 2020). This finding emphasizes the need for individualized treatment approaches that consider not only tumor characteristics but also patient-specific factors such as life expectancy, performance status, and personal preferences, with the goal of optimizing oncological outcomes while preserving quality of life.

3.2.3. Accelerating drug development

The integration of RWD into drug-approval processes has significantly enhanced regulatory decision-making in oncology. RWE provides crucial insights into post-approval drug performance and safety while enabling expedited effectiveness assessments, particularly for treatments that address urgent health needs or rare diseases (Arondekar et al., 2022). Major regulatory agencies, including the FDA, European Medicines Agency (EMA), and China’s National Medical Products Administration (NMPA), have formally recognized the value of RWD in drug-approval frameworks (Arlett et al., 2022; Xu et al., 2024). Blinatumomab’s accelerated approval in 2014 exemplifies the successful integration of RWE in drug development. Its approval for treating adults with poor-prognosis Philadelphia chromosome-negative relapsed/refractory B-cell precursor acute lymphoblastic leukemia was based on a single-arm phase II study of 189 patients (Przepiorka et al., 2015), supported by RWE from a historical control group of 694 patients (Topp et al., 2015). A subsequent phase III RCT confirmed blinatumomab’s survival benefit over standard chemotherapy (Kantarjian et al., 2017), validating the initial RWE-supported approval.

3.2.4. Applying RWD/RWSs in breast cancer research

RWD has especially advanced breast cancer research across multiple domains, including screening and diagnosis patterns (Duggento et al., 2021; Zhang et al., 2023; Gennaro et al., 2024; Trentham-Dietz et al., 2024), quality-of-life assessments (Kirkham et al., 2019; Timmins et al., 2024), treatment outcomes (Agiro et al., 2018; Veitch et al., 2019; Anwar et al., 2021; Lin et al., 2021; Dai et al., 2022a; Wang et al., 2022; Wu et al., 2023), prognostic predictions (Stabellini et al., 2023), and cost-effectiveness analyses (Dinan et al., 2019). The Canadian Rethinking Clinical Trials (REaCT) program demonstrates the efficiency of RWE in addressing recruitment challenges, achieving faster enrollment and improved cost-effectiveness compared to traditional RCTs (Basulaiman et al., 2019). A recent example is the expedited approval of goserelin (10.8 mg) for breast cancer treatment, based on the Ezreal study presented at the 2024 European Society for Medical Oncology (ESMO) Congress (Wang, 2024). This nationwide multicenter observational case-control study screened 15 629 patients across 16 Chinese hospitals, enrolling 1060 eligible participants. After propensity-score matching, analysis of 590 patients (295 per group) compared the estradiol-suppressing effects of 10.8 mg versus 3.6 mg goserelin in pre- and perimenopausal HR-positive breast cancer patients, demonstrating the non-inferiority of the higher-dose formulation.

4. Design essentials for real-world studies

The exponential growth of RWD applications in oncology research, marked by a seven-fold increase in publications over the past decade (Malone et al., 2018), highlights the critical need for robust methodological frameworks in conducting RWSs. While RWD offers unprecedented opportunities to bridge the gap between scientific discovery and clinical practice, the variable quality of studies underscores the importance of standardized approaches. Success in RWSs depends fundamentally on data accessibility, transparency, and methodological rigor. Essential elements include protocol pre-registration, adherence to data-privacy regulations, and maintenance of high research standards. Given the inherent complexities of RWD, generating high-quality RWE requires well-defined research questions, appropriate data selection, and robust study design. This systematic approach is crucial for identifying disparities between clinical evidence and practice, and ultimately advancing cancer care and improving patient outcomes (Franklin and Schneeweiss, 2017; Khozin et al., 2017; Miksad and Abernethy, 2018; Tang et al., 2023).

4.1. Articulating research questions using the PICOTS framework

RWSs serve three primary functions: evaluating intervention impacts, characterizing disease patterns in specific populations, and developing predictive models. The foundation of effective RWSs lies in well-defined research questions structured through the PICOTS framework: Population, Intervention, Comparison, Outcomes, Time, and Setting (U.S. Food and Drug Administration, 2023). This systematic approach ensures comprehensive consideration of all critical study elements, which facilitates proper study design and execution (Fig. 2).

Research questions in RWSs must be both scientifically meaningful and pragmatically answerable using available RWD. The focus should be on questions suited to observational studies rather than those requiring RCTs, such as the establishment of initial treatment efficacy (Baumfeld Andre et al., 2020). Successful RWS design demands a clear definition of the target population and setting, precise specification of interventions or exposures, selection of appropriate comparison groups, and establishment of well-defined, measurable outcomes within relevant time frames. Careful consideration must be given to the availability and quality of data sources that can support these research objectives.
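One way to make the PICOTS elements operational is to record them in a small structured object that can be checked for gaps before data selection begins. The sketch below is illustrative only: the class name `PicotsQuestion`, its field names, and the example values (loosely modeled on the goserelin comparison described earlier, with the setting and the blank time frame invented for demonstration) are assumptions, not part of the FDA framework itself.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PicotsQuestion:
    """Hypothetical container for the six PICOTS elements."""
    population: str
    intervention: str
    comparison: str
    outcomes: tuple[str, ...]
    time: str
    setting: str

    def missing_elements(self) -> list[str]:
        """Return the names of PICOTS elements that were left empty."""
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            # Empty strings, whitespace-only strings and empty tuples count as missing.
            if not value or (isinstance(value, str) and not value.strip()):
                missing.append(f.name)
        return missing


question = PicotsQuestion(
    population="Pre- and perimenopausal women with HR-positive breast cancer",
    intervention="Goserelin 10.8 mg",
    comparison="Goserelin 3.6 mg",
    outcomes=("Estradiol suppression",),
    time="",  # deliberately left blank to show the check
    setting="Multicenter hospital records",  # illustrative value
)
print(question.missing_elements())  # ['time']
```

A check of this kind does not judge whether a question is answerable with the available RWD; it only ensures that no element has been silently omitted from the protocol.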

4.2. Identifying the appropriate data source

The selection of optimal data sources is fundamental to the success of an RWS. This process requires a comprehensive understanding of various data-source characteristics, including population representativeness, variable scope, and data architecture (Table 2). Researchers must ensure that selected datasets not only align with their research questions but also adhere to ethical guidelines, while providing sufficient coverage of target populations and necessary variables (Miksad and Abernethy, 2018). Different research objectives demand different data sources. For instance, cancer-burden assessments should primarily utilize comprehensive national or global epidemiological registries rather than EHRs. However, EHRs play a crucial role in providing detailed clinical information, including patient demographics, diagnoses, laboratory results, and treatment data. The integration of such longitudinal data enables thorough investigations of disease characteristics, treatment patterns, and outcomes in routine clinical practice (Penberthy et al., 2022).

Table 2.

Characterization of different data sources in real-world studies

Data source	Volume and format	Accessibility and quality	Primary applications	Major limitations
Clinical source
Electronic health records	Large; Unstructured	Medium; Variable	Clinical outcomes; Treatment patterns; Patient history	Data inconsistency; Integration challenges
Disease registries	Medium; Semi-structured	Medium; High	Disease-specific outcomes; Population studies; Long-term follow-up	Limited scope; Selection bias
Administrative source
Insurance claims	Large; Structured	Low; Standardized	Resource utilization; Cost analysis; Treatment patterns	Limited clinical detail; Privacy restrictions
Patient-generated data
Wearable devices & apps	Small; Continuous	High; Variable	Real-time monitoring; Patient behavior; Quality of life	Small samples; Validation needs
Social media	Medium; Unstructured	High; Low	Patient experience; Treatment satisfaction; Side effects	Reliability issues; Verification challenges
Regulatory data
Safety monitoring systems	Medium; Structured	Low; High	Adverse event; Safety signals; Post-market surveillance	Reporting delays; Incomplete capture


The complexity of RWSs often necessitates the integration of multiple data sources to achieve comprehensive analysis. Successful examples include drug safety research, in which combining EHRs with national adverse event monitoring systems provides complementary insights into both adverse event patterns and their contributing factors (Booth et al., 2019; Kim, 2024). Similarly, oncology survival analyses benefit from supplementing traditional follow-up data with household registration records to minimize patient attrition. However, this integration process presents significant challenges, particularly in maintaining patient privacy while ensuring reliable cross-dataset identification. While AI-based solutions offer promising approaches for data integration, their implementation requires careful evaluation of accuracy and completeness (Kovačević et al., 2024), and balancing data integration with privacy protection remains a central consideration in RWS methodology.
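The tension between cross-dataset identification and privacy can be illustrated with keyed hashing, in which each data holder replaces direct identifiers with a token derived from a shared secret before records are joined. The following is a minimal sketch under stated assumptions: the field names, the example rows, and the key are hypothetical, and the approach shown is deterministic exact matching, which is pseudonymization rather than anonymization and does not tolerate typographical errors in identifiers.

```python
import hashlib
import hmac

# Hypothetical shared linkage key; in practice it would be stored and exchanged securely.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def link_token(national_id: str, birth_date: str, key: bytes = SECRET_KEY) -> str:
    """Derive a keyed SHA-256 token from normalized identifiers."""
    normalized = f"{national_id.strip().upper()}|{birth_date.strip()}"
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize(rows):
    """Replace direct identifiers in each row with the linkage token."""
    out = []
    for row in rows:
        token = link_token(row["national_id"], row["birth_date"])
        rest = {k: v for k, v in row.items() if k not in ("national_id", "birth_date")}
        out.append({"token": token, **rest})
    return out


# Illustrative rows only; no real patient data.
ehr_rows = [
    {"national_id": "ab123", "birth_date": "1960-05-01", "last_visit": "2021-03-10"},
    {"national_id": "cd456", "birth_date": "1955-11-23", "last_visit": "2020-08-02"},
]
registry_rows = [
    {"national_id": " AB123 ", "birth_date": "1960-05-01", "vital_status": "alive"},
]

registry_by_token = {r["token"]: r for r in tokenize(registry_rows)}
linked = [
    {**e, **registry_by_token[e["token"]]}
    for e in tokenize(ehr_rows)
    if e["token"] in registry_by_token
]
print(len(linked), linked[0]["vital_status"])  # 1 alive
```

Only the first EHR row finds a registry counterpart, and the linked record carries follow-up information from both sources without exposing the original identifiers to the analyst.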

4.3. Ensuring data collection quality

RWSs require rigorous standardization of data extraction and processing protocols to maintain scientific credibility. Adherence to established reporting guidelines and transparent documentation of methodology are essential for ensuring result traceability and reproducibility (Wang et al., 2021). This transparency extends to acknowledging data-handling limitations and their potential impact on study outcomes, thereby minimizing the risk of the results being misinterpreted. Quality assurance in RWSs demands a comprehensive understanding of data generation processes. This includes systematic assessment of variable completeness, identification of missing or miscoded data, and regular quality verification through sampling. The implementation of standardized data collection methods helps minimize quality discrepancies at the source. Notable initiatives such as the Observational Health Data Sciences and Informatics (OHDSI) Common Data Model (Hripcsak et al., 2015) and the Fast Healthcare Interoperability Resources (FHIR) standard (Vorisek et al., 2022) have emerged to address these challenges by facilitating data harmonization across disparate sources, though careful consideration must be given to maintaining the integrity of underlying pathophysiological relationships.
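The assessment of variable completeness and miscoded values described above can be sketched as a simple audit over extracted records. The variable names, the set of markers treated as missing, and the allowed stage codes below are assumptions chosen for illustration; a real study would take them from its data dictionary.

```python
# Values treated as "missing" in this sketch (an assumption, not a standard).
MISSING_MARKERS = {None, "", "NA", "unknown"}
# Hypothetical coding scheme for the stage variable.
ALLOWED_STAGES = {"I", "II", "III", "IV"}

# Illustrative records only.
records = [
    {"patient_id": "p1", "stage": "II", "ecog": 1},
    {"patient_id": "p2", "stage": "", "ecog": 0},
    {"patient_id": "p3", "stage": "5", "ecog": None},
    {"patient_id": "p4", "stage": "IV", "ecog": 2},
]


def completeness(rows, variable):
    """Fraction of rows in which the variable holds a non-missing value."""
    present = sum(1 for r in rows if r.get(variable) not in MISSING_MARKERS)
    return present / len(rows)


def miscoded(rows, variable, allowed):
    """Patient IDs whose value is present but outside the allowed code set."""
    return [
        r["patient_id"]
        for r in rows
        if r.get(variable) not in MISSING_MARKERS and r[variable] not in allowed
    ]


print(completeness(records, "stage"))              # 0.75
print(completeness(records, "ecog"))               # 0.75
print(miscoded(records, "stage", ALLOWED_STAGES))  # ['p3']
```

Reporting such figures per variable and per contributing site makes data-handling limitations explicit and gives sampling-based verification a concrete starting point.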

4.4. Confounding factors: identification and control

RWSs face inherent challenges in terms of controlling confounding variables. To enhance study validity, researchers must implement robust methodological approaches, including propensity-score matching and careful study design. Cross-validation with independent data sources, such as clinical trials or national registries, further strengthens the credibility of findings and ensures appropriate comparisons between treatment groups.
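To make propensity-score matching concrete, the sketch below estimates each patient's probability of receiving treatment from baseline covariates and then pairs treated patients with their nearest untreated counterparts. It is a simplified illustration under several assumptions: the cohort is synthetic, the logistic model is fitted by plain gradient descent, matching is greedy 1:1 without replacement, and the caliper of 0.2 on the raw propensity scale is an arbitrary illustrative choice. The function names are introduced here and do not come from any cited study.

```python
import math


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent; returns [bias, w1, ...]."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def propensity(w, xi):
    """Predicted probability of treatment for one covariate vector."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))


def match_nearest(scores, treated, caliper=0.2):
    """Greedy 1:1 nearest-neighbour matching without replacement."""
    controls = [i for i, t in enumerate(treated) if t == 0]
    pairs = []
    for i, t in enumerate(treated):
        if t != 1 or not controls:
            continue
        j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
        if abs(scores[j] - scores[i]) <= caliper:
            pairs.append((i, j))
            controls.remove(j)  # each control is used at most once
    return pairs


# Synthetic cohort: [age in decades centred at 60, ECOG performance status].
X = [[-1.0, 0], [-0.5, 0], [0.0, 1], [0.5, 1], [1.0, 2],
     [1.5, 2], [-1.0, 1], [0.0, 0], [0.5, 2], [1.0, 1]]
treated = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1]

w = fit_logistic(X, treated)
scores = [propensity(w, xi) for xi in X]
pairs = match_nearest(scores, treated, caliper=0.2)
print(pairs)  # list of (treated index, control index) pairs
```

Treated patients without a control inside the caliper are left unmatched, which is one reason a matched analysis typically covers fewer patients than were eligible. Covariate balance should be checked after matching, and unmeasured confounders remain uncontrolled.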

Pragmatic clinical trials (PCTs) represent an innovative solution that bridges the gap between traditional RCTs and RWSs (Franklin and Schneeweiss, 2017; Derksen et al., 2019). In addition to maintaining randomization, PCTs accommodate broader patient populations and implement streamlined processes, including simplified consent procedures and flexible data-collection methods. This approach not only reduces participant burden but also facilitates long-term follow-up data collection. The above-mentioned REaCT program exemplifies successful PCT implementation in oncology (Basulaiman et al., 2019). By employing simplified eligibility criteria, streamlined consent processes, and efficient data collection methods, REaCT achieved remarkable success, enrolling over 2100 patients across 11 cancer centers in four years.

5. Application of AI in real-world data analysis

AI has emerged as a transformative tool in oncology research, enabling sophisticated pattern recognition and prediction from complex real-world datasets (Perez-Lopez et al., 2024). The integration of AI with RWD encompasses multiple complementary approaches, each addressing specific analytical challenges in cancer research.

5.1. Machine-learning algorithms

Traditional machine-learning methods, such as random forests and gradient boosting machines (GBMs), provide robust frameworks for predictive modeling and patient classification (Poirion et al., 2021). Deep learning approaches have shown promise, with convolutional neural networks (CNNs) excelling in medical image analysis and pattern recognition, while recurrent neural networks (RNNs) effectively handle sequential and temporal data. For example, CNNs have demonstrated superior performance in analyzing medical imaging data, achieving results comparable to or exceeding those of expert radiologists in certain diagnostic tasks (Li et al., 2019).

5.2. Natural language processing

Natural language processing (NLP) has revolutionized the extraction of insights from unstructured clinical data (Yim et al., 2016). Through techniques such as latent Dirichlet allocation (LDA) and term frequency-inverse document frequency (TF-IDF) weighting, NLP enables automated analysis of clinical notes, pathology reports, and patient feedback. This capability extends to analyzing PROs and social media data, providing valuable insights into treatment satisfaction and patient experiences (Lu et al., 2021; Sim et al., 2023).
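
To make the TF-IDF idea concrete, the following sketch weights each term in a set of short free-text notes by how frequent it is within a note and how rare it is across notes. The three notes are invented, the function name is hypothetical, and the specific formula (raw term frequency divided by note length, multiplied by log(N/df)) is one common variant chosen as an assumption; production pipelines typically add smoothing, normalization, and clinical tokenization.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Uses tf = count / document length and idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Invented notes, for illustration only.
notes = [
    "patient reports severe nausea after chemotherapy",
    "patient reports mild fatigue",
    "patient tolerated chemotherapy well",
]
weights = tf_idf(notes)
for note, w in zip(notes, weights):
    print(note, "->", sorted(w, key=w.get, reverse=True)[:2])
```

Because "patient" occurs in every note, its weight is zero, whereas terms that appear in only one note, such as "nausea", receive the highest weights in that note.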

5.3. Reinforcement learning

Reinforcement learning (RL) represents a promising frontier in treatment optimization (Xu et al., 2023). By learning from patient interactions and outcomes, RL algorithms can simulate and identify optimal treatment pathways. This approach has successfully informed personalized cancer-screening strategies and diagnoses (Qaiser and Rajpoot, 2019; Yala et al., 2022), treatment-regimen design (Tortora et al., 2021; Lu et al., 2024), and clinical trial optimization (Zhao et al., 2009).
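
The learning loop behind such approaches can be sketched with tabular Q-learning on a deliberately tiny, hypothetical treatment model. The two states, two actions, rewards, and all parameter values below are assumptions made for illustration; they do not reproduce the models of the studies cited above, which rely on far richer patient simulators and deep RL.

```python
import random

# Hypothetical deterministic model: (state, action) -> (next_state, reward).
# Rewards are negative costs: toxicity when treating, burden when not.
TRANSITIONS = {
    ("high_burden", "treat"): ("low_burden", -1.0),
    ("high_burden", "pause"): ("high_burden", -3.0),
    ("low_burden", "treat"): ("low_burden", -1.0),
    ("low_burden", "pause"): ("high_burden", 0.0),
}
STATES = ["high_burden", "low_burden"]
ACTIONS = ["treat", "pause"]


def q_learning(steps=5000, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values from randomly sampled state-action pairs."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(steps):
        # Explore uniformly at random; Q-learning is off-policy, so the
        # learned values still describe the greedy policy.
        state, action = rng.choice(STATES), rng.choice(ACTIONS)
        next_state, reward = TRANSITIONS[(state, action)]
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # Standard Q-learning update toward reward + discounted best value.
        q[(state, action)] += alpha * (
            reward + gamma * best_next - q[(state, action)]
        )
    return q


q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
print(policy)
```

In this toy setting the learned policy treats when the burden is high and pauses when it is low, a pattern loosely reminiscent of intermittent dosing. The result follows entirely from the invented rewards and carries no clinical meaning.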

5.4. Challenges and considerations

Despite the potential benefits, the effectiveness of AI applications depends critically on data quality and representativeness. The “black box” nature of many AI models presents challenges for clinical interpretation, driving ongoing development of more transparent approaches, such as SHapley Additive exPlanations (SHAP) analysis (Hsu et al., 2023). These challenges underscore the importance of combining AI capabilities with robust clinical validation and interpretability frameworks.
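
SHAP builds on Shapley values from cooperative game theory, which attribute a prediction to individual features by averaging each feature's marginal contribution over all orderings in which features could be added. The sketch below computes exact Shapley values for a two-feature toy risk model. It does not use the SHAP library, and the model, feature names, and risk increments are hypothetical; exact enumeration is feasible only for a handful of features, which is why practical tools rely on approximations.

```python
from itertools import permutations


def shapley_values(features, value):
    """Exact Shapley values: average marginal contribution of each
    feature over every ordering of the features."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value(frozenset(present))
            present.add(f)
            totals[f] += value(frozenset(present)) - before
    return {f: total / len(orderings) for f, total in totals.items()}


def toy_risk(present):
    """Hypothetical predicted risk given the set of risk factors present."""
    risk = 0.10  # baseline risk with no factor present
    if "age_over_70" in present:
        risk += 0.20
    if "stage_iv" in present:
        risk += 0.30
    if "age_over_70" in present and "stage_iv" in present:
        risk += 0.10  # interaction between the two factors
    return risk


phi = shapley_values(["age_over_70", "stage_iv"], toy_risk)
print(phi)
```

Here the attributions are approximately 0.25 for age and 0.35 for stage. They sum to 0.60, the difference between the full prediction (0.70) and the baseline (0.10), with the interaction term shared equally between the two features.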

6. Challenges and advancements of real-world studies in oncology

While RWD offers unprecedented advantages in terms of scale, efficiency, and cost-effectiveness, it also faces significant challenges with regard to data quality, heterogeneity, and inherent biases. The diverse nature of RWD, which is collected from routine clinical practice, presents complex challenges, including missing values, coding inconsistencies, and the integration of disparate “data islands.” Standardization of data collection, management, and analysis processes is therefore key for oncological research, and this will require sophisticated data-cleansing algorithms and robust research guidelines.
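
A small example of what such data cleansing can involve is the harmonization of a single field recorded differently across sites. The sketch below maps inconsistent cancer-stage entries to one canonical label and reports the proportion of missing values. The mapping table, missing-value tokens, and records are all hypothetical; a real pipeline would follow an agreed data standard rather than an ad hoc dictionary.

```python
# Hypothetical mapping from site-specific spellings to a canonical label.
STAGE_MAP = {
    "1": "I", "i": "I",
    "2": "II", "ii": "II",
    "3": "III", "iii": "III",
    "4": "IV", "iv": "IV",
}
MISSING_TOKENS = {"", "na", "n/a", "unknown", "none"}


def clean_stage(raw):
    """Return a canonical stage label, or None when the value is
    missing or not recognised."""
    if raw is None:
        return None
    token = str(raw).strip().lower()
    # Drop an optional leading "stage" prefix, e.g. "Stage II" -> "ii".
    if token.startswith("stage"):
        token = token[len("stage"):].strip()
    if token in MISSING_TOKENS:
        return None
    return STAGE_MAP.get(token)


# Invented records, for illustration only.
records = [
    {"patient": "A", "stage": "Stage II"},
    {"patient": "B", "stage": "iv"},
    {"patient": "C", "stage": "N/A"},
    {"patient": "D", "stage": 3},
]
cleaned = [{**r, "stage": clean_stage(r["stage"])} for r in records]
missing_rate = sum(r["stage"] is None for r in cleaned) / len(cleaned)
print(cleaned, missing_rate)
```

Unrecognized values are returned as missing rather than guessed, so that the missing-data rate stays visible to downstream analyses.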

The management of sensitive health information poses additional ethical challenges, demanding stringent protocols for data security, access control, and privacy protection. These considerations necessitate the development of standardized ethical review procedures and strict data-anonymization protocols to safeguard patient confidentiality while maintaining data utility for research purposes.
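
Two building blocks that often appear in such protocols are keyed hashing of patient identifiers and per-patient date shifting. The sketch below illustrates both using only the Python standard library. The key, identifier format, and shift range are placeholders chosen for illustration. These steps amount to pseudonymization rather than full anonymization, and on their own they would not satisfy any particular regulation.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Placeholder key for illustration; a real key must be stored securely.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonymize_id(patient_id):
    """Replace an identifier with a keyed hash that is stable across
    datasets but cannot be reversed without the key."""
    digest = hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def shift_date(patient_id, original):
    """Shift a date back by a per-patient offset of 1 to 365 days, so
    intervals between one patient's events are preserved."""
    digest = hmac.new(
        SECRET_KEY, b"date:" + patient_id.encode("utf-8"), hashlib.sha256
    ).digest()
    offset_days = int.from_bytes(digest[:4], "big") % 365 + 1
    return original - timedelta(days=offset_days)


# Invented identifier and dates, for illustration only.
diagnosis = shift_date("MRN-001", date(2021, 3, 1))
treatment = shift_date("MRN-001", date(2021, 4, 15))
print(pseudonymize_id("MRN-001"), (treatment - diagnosis).days)
```

Because both dates for the same patient are shifted by the same offset, the 45-day interval between diagnosis and treatment is retained, which preserves data utility for time-to-event analyses.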

To address these challenges, the oncology research community is developing innovative solutions and collaborative frameworks. The Cancer Core Europe (CCE) initiative, comprising seven leading European cancer centers, exemplifies this approach through its establishment of a virtual data hub linking molecular profiles with clinical outcomes while addressing legal and privacy concerns (Eggermont et al., 2019). Such initiatives highlight the importance of multi-institutional collaboration in establishing unified data standards and enhancing the representativeness of research findings.

7. Conclusions

RWSs are bringing about a paradigm shift in oncological research by providing detailed information about patient demographics, long-term outcomes, and treatment safety. Their potential will continue to grow as regulatory frameworks evolve and technology advances. To fully realize this potential, standardized protocols for data collection, analysis, and interpretation are essential. Strict research standards and ethical guidelines will help ensure that the medical community can use RWSs to advance precision oncology and thereby improve patient outcomes.

Acknowledgments

This work was supported by the Zhejiang Provincial Natural Science Foundation (No. ZCLY24H1601), the National Natural Science Foundation of China (No. 82403697), the Medical and Health Science and Technology Project of Zhejiang Province (No. 2025KY411), and the National Key R&D Program of China (No. 2022YFC2505100). We thank Dr. Chao NI, Dr. Xuan SHAO, Dr. Zhigang CHEN, Dr. Ke WANG, Dr. Jun YE, and Dr. Pin WU (all from The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China) for their invaluable advice and insights during the preparation of this manuscript. Their expertise significantly contributed to the completion of this work.

Author contributions

Jingxin JIANG led the design and conceptualization of the review, drafted significant sections of the manuscript, and was responsible for the creation of the figures to visually represent key concepts. Wuzhen CHEN assisted in the conceptualization and contributed to drafting and refining the manuscript. Weiwei PAN, Liyang SUN, Liwei PANG, and Hailang CHEN were instrumental in collecting and analyzing the literature, and they actively participated in revising the manuscript. Jian HUANG supervised the overall project and focused on critically revising the manuscript for important intellectual content. All authors reviewed, edited, and approved the final manuscript, ensuring the accuracy and integrity of the work.

Compliance with ethics guidelines

Jingxin JIANG, Weiwei PAN, Liyang SUN, Liwei PANG, Hailang CHEN, Jian HUANG, and Wuzhen CHEN declare that they have no conflicts of interest.

This article does not contain any studies with human or animal subjects performed by any of the authors.

References

Agiro A, DeVries A, Malin J, et al., 2018. Real-world impact of a decision support tool on colony-stimulating factor use and chemotherapy-induced febrile neutropenia among patients with breast cancer. J Natl Compr Canc Netw, 16(2): 162-169. 10.6004/jnccn.2017.7033
Anwar M, Chen QT, Ouyang DJ, et al., 2021. Pyrotinib treatment in patients with HER2-positive metastatic breast cancer and brain metastasis: exploratory final analysis of real-world, multicenter data. Clin Cancer Res, 27(16): 4634-4641. 10.1158/1078-0432.CCR-21-0474
Arlett P, Kjær J, Broich K, et al., 2022. Real-world evidence in EU medicines regulation: enabling use and establishing value. Clin Pharmacol Ther, 111(1): 21-23. 10.1002/cpt.2479
Arondekar B, Duh MS, Bhak RH, et al., 2022. Real-world evidence in support of oncology product registration: a systematic review of new drug application and biologics license application approvals from 2015-2020. Clin Cancer Res, 28(1): 27-35. 10.1158/1078-0432.CCR-21-2639
Bankhead P, Loughrey MB, Fernández JA, et al., 2017. QuPath: open source software for digital pathology image analysis. Sci Rep, 7: 16878. 10.1038/s41598-017-17204-5
Basulaiman B, Awan AA, Fergusson D, et al., 2019. Creating a pragmatic trials program for breast cancer patients: rethinking clinical trials (REaCT). Breast Cancer Res Treat, 177(1): 93-101. 10.1007/s10549-019-05274-0
Baumfeld Andre E, Reynolds R, Caubel P, et al., 2020. Trial designs using real-world data: the changing landscape of the regulatory approval process. Pharmacoepidemiol Drug Saf, 29(10): 1201-1212. 10.1002/pds.4932
Berger M, Daniel G, Frank K, et al., 2017. A Framework for Regulatory Use of Real-World Evidence. Duke Margolis Center for Health Policy, Washington, DC, USA.
Black N, 1996. Why we need observational studies to evaluate the effectiveness of health care. BMJ, 312(7040): 1215-1218. 10.1136/bmj.312.7040.1215
Booth CM, Karim S, Mackillop WJ, 2019. Real-world data: towards achieving the achievable in cancer care. Nat Rev Clin Oncol, 16(5): 312-325. 10.1038/s41571-019-0167-7
Chakiryan NH, Jiang DD, Gillis KA, et al., 2021. Real-world survival outcomes associated with first-line immunotherapy, targeted therapy, and combination therapy for metastatic clear cell renal cell carcinoma. JAMA Netw Open, 4(5): e2111329. 10.1001/jamanetworkopen.2021.11329
Chen XX, Wang XM, Zhang K, et al., 2022. Recent advances and clinical applications of deep learning in medical image analysis. Med Image Anal, 79: 102444. 10.1016/j.media.2022.102444
Chlebowski RT, Hendrix SL, Langer RD, et al., 2003. Influence of estrogen plus progestin on breast cancer and mammography in healthy postmenopausal women: the Women’s Health Initiative randomized trial. JAMA, 289(24): 3243-3253. 10.1001/jama.289.24.3243
Cluze C, Rey D, Huiart L, et al., 2012. Adjuvant endocrine therapy with tamoxifen in young women with breast cancer: determinants of interruptions vary over time. Ann Oncol, 23(4): 882-890. 10.1093/annonc/mdr330
Collaborative Group on Hormonal Factors in Breast Cancer, 1997. Breast cancer and hormone replacement therapy: collaborative reanalysis of data from 51 epidemiological studies of 52 705 women with breast cancer and 108 411 women without breast cancer. Lancet, 350(9084): 1047-1059. 10.1016/S0140-6736(97)08233-0
Collins R, Bowman L, Landray M, et al., 2020. The magic of randomization versus the myth of real-world evidence. N Engl J Med, 382(7): 674-678. 10.1056/NEJMsb1901642
Cook JA, Collins GS, 2015. The rise of big clinical databases. Br J Surg, 102(2): e93-e101. 10.1002/bjs.9723
Corrigan-Curay J, 2018. Framework for FDA’s Real-World Evidence Program. U.S. Food and Drug Administration, Silver Spring. https://www.fda.gov/media/123160/download [accessed on Nov. 25, 2024].
Dai WF, Beca JM, Nagamuthu C, et al., 2022a. Comparative effectiveness and safety of pertuzumab and trastuzumab plus chemotherapy vs trastuzumab plus chemotherapy for treatment of metastatic breast cancer. JAMA Netw Open, 5(2): e2145460. 10.1001/jamanetworkopen.2021.45460
Dai WF, Beca JM, Nagamuthu C, et al., 2022b. Cost-effectiveness analysis of pertuzumab with trastuzumab in patients with metastatic breast cancer. JAMA Oncol, 8(4): 597-606. 10.1001/jamaoncol.2021.8049
Derksen JWG, May AM, Koopman M, 2019. The era of alternative designs to connect randomized clinical trials and real-world data. Nat Rev Clin Oncol, 16(9): 589. 10.1038/s41571-019-0250-0
Diéras V, Miles D, Verma S, et al., 2017. Trastuzumab emtansine versus capecitabine plus lapatinib in patients with previously treated HER2-positive advanced breast cancer (EMILIA): a descriptive analysis of final overall survival results from a randomised, open-label, phase 3 trial. Lancet Oncol, 18(6): 732-742. 10.1016/S1470-2045(17)30312-1
Dinan MA, Wilson LE, Reed SD, 2019. Chemotherapy costs and 21-gene recurrence score genomic testing among Medicare beneficiaries with early-stage breast cancer, 2005 to 2011. J Natl Compr Canc Netw, 17(3): 245-254. 10.6004/jnccn.2018.7097
Duggento A, Conti A, Mauriello A, et al., 2021. Deep computational pathology in breast cancer. Semin Cancer Biol, 72: 226-237. 10.1016/j.semcancer.2020.08.006
Eggermont AMM, Apolone G, Baumann M, et al., 2019. Cancer Core Europe: a translational research infrastructure for a European mission on cancer. Mol Oncol, 13(3): 521-527. 10.1002/1878-0261.12447
Ethier JL, Desautels D, Robinson A, et al., 2021. Practice patterns and outcomes of novel targeted agents for the treatment of ERBB2-positive metastatic breast cancer. JAMA Oncol, 7(9): e212140. 10.1001/jamaoncol.2021.2140
Farkas AH, Nattinger AB, 2023. Breast cancer screening and prevention. Ann Intern Med, 176(11): ITC161-ITC176. 10.7326/AITC202311210
Flynn R, Plueschke K, Quinten C, et al., 2022. Marketing authorization applications made to the European Medicines Agency in 2018‒2019: what was the contribution of real-world evidence? Clin Pharmacol Ther, 111(1): 90-97. 10.1002/cpt.2461
Franklin JM, Schneeweiss S, 2017. When and how can real world data analyses substitute for randomized controlled trials? Clin Pharmacol Ther, 102(6): 924-933. 10.1002/cpt.857
Gennaro G, Bucchi L, Ravaioli A, et al., 2024. The risk-based breast screening (RIBBS) study protocol: a personalized screening model for young women. Radiol Med, 129(5): 727-736. 10.1007/s11547-024-01797-9
Global Burden of Disease Cancer Collaboration, 2019. Global, regional, and national cancer incidence, mortality, years of life lost, years lived with disability, and disability-adjusted life-years for 29 cancer groups, 1990 to 2017: a systematic analysis for the Global Burden of Disease Study. JAMA Oncol, 5(12): 1749-1768. 10.1001/jamaoncol.2019.2996
Global Burden of Disease 2019 Cancer Collaboration, 2022. Cancer incidence, mortality, years of life lost, years lived with disability, and disability-adjusted life years for 29 cancer groups from 2010 to 2019: a systematic analysis for the Global Burden of Disease Study 2019. JAMA Oncol, 8(3): 420-444. 10.1001/jamaoncol.2021.6987
Gross AJ, Pisano CE, Khunsriraksakul C, et al., 2023. Real-world data: applications and relevance to cancer clinical trials. Semin Radiat Oncol, 33(4): 374-385. 10.1016/j.semradonc.2023.06.003
Hripcsak G, Duke JD, Shah NH, et al., 2015. Observational health data sciences and informatics (OHDSI): opportunities for observational researchers. Stud Health Technol Inform, 216: 574-578. 10.3233/978-1-61499-564-7-574
Hsu WH, Ko AT, Weng CS, et al., 2023. Explainable machine learning model for predicting skeletal muscle loss during surgery and adjuvant chemotherapy in ovarian cancer. J Cachexia Sarcopenia Muscle, 14(5): 2044-2053. 10.1002/jcsm.13282
Jarow JP, LaVange L, Woodcock J, 2017. Multidimensional evidence generation and FDA regulatory decision making: defining and using “real-world” data. JAMA, 318(8): 703-704. 10.1001/jama.2017.9991
Jazieh AR, Onal HC, Tan DSW, et al., 2021. Real-world treatment patterns and clinical outcomes in patients with stage III NSCLC: results of KINDLE, a multicountry observational study. J Thorac Oncol, 16(10): 1733-1744. 10.1016/j.jtho.2021.05.003
Jiang XY, Hu ZJ, Wang SH, et al., 2023. Deep learning for medical image-based cancer diagnosis. Cancers, 15(14): 3608. 10.3390/cancers15143608
Kantarjian H, Stein A, Gökbuget N, et al., 2017. Blinatumomab versus chemotherapy for advanced acute lymphoblastic leukemia. N Engl J Med, 376(9): 836-847. 10.1056/NEJMoa1609783
Karakatsanis A, Markopoulos C, 2020. The challenge of avoiding over- and under-treatment in older women with ductal cancer in situ: a scoping review of existing knowledge gaps and a meta-analysis of real-world practice patterns. J Geriatr Oncol, 11(6): 917-925. 10.1016/j.jgo.2020.02.005
Kennedy-Martin T, Curtis S, Faries D, et al., 2015. A literature review on the representativeness of randomized controlled trial samples and implications for the external validity of trial results. Trials, 16: 495. 10.1186/s13063-015-1023-4
Khozin S, Blumenthal GM, Pazdur R, 2017. Real-world data for clinical evidence generation in oncology. J Natl Cancer Inst, 109(11): djx187. 10.1093/jnci/djx187
Kim HS, 2024. Dark data in real-world evidence: challenges, implications, and the imperative of data literacy in medical research. J Korean Med Sci, 39(9): e92. 10.3346/jkms.2024.39.e92
Kirkham AA, Bland KA, Wollmann H, et al., 2019. Maintenance of fitness and quality-of-life benefits from supervised exercise offered as supportive care for breast cancer. J Natl Compr Canc Netw, 17(6): 695-702. 10.6004/jnccn.2018.7276
Kleppe A, Skrede OJ, de Raedt S, et al., 2021. Designing deep learning studies in cancer diagnostics. Nat Rev Cancer, 21(3): 199-211. 10.1038/s41568-020-00327-9
Kovačević A, Bašaragin B, Milošević N, et al., 2024. De-identification of clinical free text using natural language processing: a systematic review of current approaches. Artif Intell Med, 151: 102845. 10.1016/j.artmed.2024.102845
Li XC, Zhang S, Zhang Q, et al., 2019. Diagnosis of thyroid cancer using deep convolutional neural network models applied to sonographic images: a retrospective, multicohort, diagnostic study. Lancet Oncol, 20(2): 193-201. 10.1016/S1470-2045(18)30762-9
Liao WQ, Coupland CAC, Burchardt J, et al., 2023. Predicting the future risk of lung cancer: development, and internal and external validation of the CanPredict (lung) model in 19.67 million people and evaluation of model performance against seven other risk prediction models. Lancet Respir Med, 11(8): 685-697. 10.1016/S2213-2600(23)00050-4
Lin AP, Huang TW, Tam KW, 2021. Treatment of male breast cancer: meta-analysis of real-world evidence. Br J Surg, 108(9): 1034-1042. 10.1093/bjs/znab279
Lu YT, Chu Q, Li Z, et al., 2024. Deep reinforcement learning identifies personalized intermittent androgen deprivation therapy for prostate cancer. Brief Bioinform, 25(2): bbae071. 10.1093/bib/bbae071
Lu ZH, Sim JA, Wang JX, et al., 2021. Natural language processing and machine learning methods to characterize unstructured patient-reported outcomes: validation study. J Med Internet Res, 23(11): e26777. 10.2196/26777
Malmberg R, Zietse M, Dumoulin DW, et al., 2022. Alternative dosing strategies for immune checkpoint inhibitors to improve cost-effectiveness: a special focus on nivolumab and pembrolizumab. Lancet Oncol, 23(12): e552-e561. 10.1016/S1470-2045(22)00554-X
Malone DC, Brown M, Hurwitz JT, et al., 2018. Real-world evidence: useful in the real world of US payer decision making? How? When? And What studies? Value Health, 21(3): 326-333. 10.1016/j.jval.2017.08.3013
Martin P, Cohen JB, Wang M, et al., 2023. Treatment outcomes and roles of transplantation and maintenance rituximab in patients with previously untreated mantle cell lymphoma: results from large real-world cohorts. J Clin Oncol, 41(3): 541-554. 10.1200/JCO.21.02698
McCarney R, Warner J, Iliffe S, et al., 2007. The Hawthorne effect: a randomised, controlled trial. BMC Med Res Methodol, 7: 30. 10.1186/1471-2288-7-30
Merola D, Campbell U, Gautam N, et al., 2023. The Aetion coalition to advance real-world evidence through randomized controlled trial emulation initiative: oncology. Clin Pharmacol Ther, 113(6): 1217-1222. 10.1002/cpt.2800
Miksad RA, Abernethy AP, 2018. Harnessing the power of real-world evidence (RWE): a checklist to ensure regulatory-grade data quality. Clin Pharmacol Ther, 103(2): 202-205. 10.1002/cpt.946
Moore S, Leung B, Wu J, et al., 2019. Real-world treatment of stage III NSCLC: the role of trimodality treatment in the era of immunotherapy. J Thorac Oncol, 14(8): 1430-1439. 10.1016/j.jtho.2019.04.005
Newman LA, 2022. Breast cancer screening in low and middle-income countries. Best Pract Res Clin Obstet Gynaecol, 83: 15-23. 10.1016/j.bpobgyn.2022.03.018
Núñez ER, Caverly TJ, Zhang SQ, et al., 2021. Adherence to follow-up testing recommendations in US veterans screened for lung cancer, 2015-2019. JAMA Netw Open, 4(7): e2116233. 10.1001/jamanetworkopen.2021.16233
Patil VM, Noronha V, Menon N, et al., 2023. Low-dose immunotherapy in head and neck cancer: a randomized study. J Clin Oncol, 41(2): 222-232. 10.1200/JCO.22.01015
Penberthy LT, Rivera DR, Lund JL, et al., 2022. An overview of real-world data sources for oncology and considerations for research. CA Cancer J Clin, 72(3): 287-300. 10.3322/caac.21714
Perez-Lopez R, Ghaffari Laleh N, Mahmood F, et al., 2024. A guide to artificial intelligence for cancer researchers. Nat Rev Cancer, 24(6): 427-441. 10.1038/s41568-024-00694-7
Poirion OB, Jing Z, Chaudhary K, et al., 2021. DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data. Genome Med, 13: 112. 10.1186/s13073-021-00930-x
Pollard S, Weymann D, Chan B, et al., 2022. Defining a core data set for the economic evaluation of precision oncology. Value Health, 25(8): 1371-1380. 10.1016/j.jval.2022.01.005
Prelaj A, Miskovic V, Zanitti M, et al., 2024. Artificial intelligence for predictive biomarker discovery in immuno-oncology: a systematic review. Ann Oncol, 35(1): 29-65. 10.1016/j.annonc.2023.10.125
Przepiorka D, Ko CW, Deisseroth A, et al., 2015. FDA approval: blinatumomab. Clin Cancer Res, 21(18): 4035-4039. 10.1158/1078-0432.CCR-15-0612
Qaiser T, Rajpoot NM, 2019. Learning where to see: a novel attention model for automated immunohistochemical scoring. IEEE Trans Med Imaging, 38(11): 2620-2631. 10.1109/TMI.2019.2907049
Roshandel G, Badar F, Barchuk A, et al., 2023. REPCAN: guideline for REporting population-based CANcer registry data. Asian Pac J Cancer Prev, 24(9): 3297-3303. 10.31557/APJCP.2023.24.9.3297
Samlowski W, Robert NJ, Chen LW, et al., 2023. Real-world nivolumab dosing patterns and safety outcomes in patients receiving adjuvant therapy for melanoma. Cancer Med, 12(3): 2378-2388. 10.1002/cam4.5061
Sherman RE, Anderson SA, Dal Pan GJ, et al., 2016. Real-world evidence — what is it and what can it tell us? N Engl J Med, 375(23): 2293-2297. 10.1056/NEJMsb1609216
Sim JA, Huang XL, Horan MR, et al., 2023. Natural language processing with machine learning methods to analyze unstructured patient-reported outcomes derived from electronic health records: a systematic review. Artif Intell Med, 146: 102701. 10.1016/j.artmed.2023.102701
Sinha S, Laskar SG, Wadasadawala T, et al., 2022. Adopting health economic research in radiation oncology: a perspective from low- or middle-income countries. JCO Glob Oncol, 8: e2100374. 10.1200/GO.21.00374
Skovlund E, Leufkens HGM, Smyth JF, 2018. The use of real-world data in cancer drug development. Eur J Cancer, 101: 69-76. 10.1016/j.ejca.2018.06.036
Soliman R, Oke J, Sidhom I, et al., 2023. Cost-effectiveness of childhood cancer treatment in Egypt: lessons to promote high-value care in a resource-limited setting based on real-world evidence. eClinicalMedicine, 55: 101729. 10.1016/j.eclinm.2022.101729
Stabellini N, Cao LF, Towe CW, et al., 2023. Validation of the PREDICT prognostication tool in US patients with breast cancer. J Natl Compr Canc Netw, 21(10): 1011-1019.e6. 10.6004/jnccn.2023.7048
Stein-O'Brien GL, Le DT, Jaffee EM, et al., 2023. Converging on a cure: the roads to predictive immunotherapy. Cancer Discov, 13(5): 1053-1057. 10.1158/2159-8290.CD-23-0277
Tang M, Pearson SA, Simes RJ, et al., 2023. Harnessing real-world evidence to advance cancer research. Curr Oncol, 30(2): 1844-1859. 10.3390/curroncol30020143
Terranova N, Venkatakrishnan K, 2024. Machine learning in modeling disease trajectory and treatment outcomes: an emerging enabler for model-informed precision medicine. Clin Pharmacol Ther, 115(4): 720-726. 10.1002/cpt.3153
Timmins IR, Jones ME, O'Brien KM, et al., 2024. International pooled analysis of leisure-time physical activity and premenopausal breast cancer in women from 19 cohorts. J Clin Oncol, 42(8): 927-939. 10.1200/JCO.23.01101
Topp MS, Gökbuget N, Stein AS, et al., 2015. Safety and activity of blinatumomab for adult patients with relapsed or refractory B-precursor acute lymphoblastic leukaemia: a multicentre, single-arm, phase 2 study. Lancet Oncol, 16(1): 57-66. 10.1016/S1470-2045(14)71170-2
Tortora M, Cordelli E, Sicilia R, et al., 2021. Deep reinforcement learning for fractionated radiotherapy in non-small cell lung carcinoma. Artif Intell Med, 119: 102137. 10.1016/j.artmed.2021.102137
Trentham-Dietz A, Chapman CH, Jayasekera J, et al., 2024. Collaborative modeling to compare different breast cancer screening strategies: a decision analysis for the US Preventive Services Task Force. JAMA, 331(22): 1947-1960. 10.1001/jama.2023.24766
Tsai CJ, Riaz N, Gomez SL, 2019. Big data in cancer research: real-world resources for precision oncology to improve cancer care delivery. Semin Radiat Oncol, 29(4): 306-310. 10.1016/j.semradonc.2019.05.002
U.S. Food and Drug Administration, 2023. Using the PICOTS Framework to Strengthen Evidence Gathered in Clinical Trials—Guidance from the AHRQ’s Evidence-based Practice Centers Program. U.S. Food and Drug Administration, Silver Spring. https://www.fda.gov/media/109448/download [accessed on Nov. 25, 2024].
van den Puttelaar R, Meester RGS, Peterse EFP, et al., 2023. Risk-stratified screening for colorectal cancer using genetic and environmental risk factors: a cost-effectiveness analysis based on real-world data. Clin Gastroenterol Hepatol, 21(13): 3415-3423.e29. 10.1016/j.cgh.2023.03.003
Veitch Z, Khan OF, Tilley D, et al., 2019. Real-world outcomes of adjuvant chemotherapy for node-negative and node-positive HER2-positive breast cancer. J Natl Compr Canc Netw, 17(1): 47-56. 10.6004/jnccn.2018.7066
Velmovitsky PE, Bevilacqua T, Alencar P, et al., 2021. Convergence of precision medicine and public health into precision public health: toward a big data perspective. Front Public Health, 9: 561873. 10.3389/fpubh.2021.561873
Verkerk K, Voest EE, 2024. Generating and using real-world data: a worthwhile uphill battle. Cell, 187(7): 1636-1650. 10.1016/j.cell.2024.02.012
Voelker RA, 2023. Cervical cancer screening. JAMA, 330(20): 2030. 10.1001/jama.2023.21987
Vorisek CN, Lehne M, Klopfenstein SAI, et al., 2022. Fast healthcare interoperability resources (FHIR) for interoperability in health research: systematic review. JMIR Med Inform, 10(7): e35724. 10.2196/35724
Wang SV, Pinheiro S, Hua W, et al., 2021. STaRT-RWE: structured template for planning and reporting on the implementation of real world evidence studies. BMJ, 372: m4856. 10.1136/bmj.m4856
Wang X, Feng KX, Wang WY, et al., 2022. Long-term outcomes of intraoperative radiotherapy for early-stage breast cancer in China: a multicenter real-world study. Cancer Commun, 42(3): 277-280. 10.1002/cac2.12258
Wang YS, 2024. 123P Goserelin 10.8 mg and 3.6 mg depots in breast cancer (BC): a large real-world noninferiority study. ESMO Open, 9(S4): 103111. 10.1016/j.esmoop.2024.103111
Wedam S, Fashoyin-Aje L, Bloomquist E, et al., 2020. FDA approval summary: palbociclib for male patients with metastatic breast cancer. Clin Cancer Res, 26(6): 1208-1212. 10.1158/1078-0432.CCR-19-2580
Wu SY, Li JW, Wang YJ, et al., 2023. Clinical feasibility and oncological safety of non-radioactive targeted axillary dissection after neoadjuvant chemotherapy in biopsy-proven node-positive breast cancer: a prospective diagnostic and prognostic study. Int J Surg, 109(7): 1863-1870. 10.1097/JS9.0000000000000331
Xu C, Song YH, Zhang D, et al., 2023. Spatiotemporal knowledge teacher-student reinforcement learning to detect liver tumors without contrast agents. Med Image Anal, 90: 102980. 10.1016/j.media.2023.102980
Xu JY, Wu WK, Zhang X, et al., 2024. The use of real-world evidence for regulatory decisions in China. Clin Pharmacol Ther, 116(1): 82-95. 10.1002/cpt.3257
Yala A, Mikhael PG, Lehman C, et al., 2022. Optimizing risk-based breast cancer screening policies with reinforcement learning. Nat Med, 28(1): 136-143. 10.1038/s41591-021-01599-w
Yan YT, Lv R, Wang TY, et al., 2023. Real-world treatment patterns, discontinuation and clinical outcomes in patients with B-cell lymphoproliferative diseases treated with BTK inhibitors in China. Front Immunol, 14: 1184395. 10.3389/fimmu.2023.1184395
Yim WW, Yetisgen M, Harris WP, et al., 2016. Natural language processing in oncology: a review. JAMA Oncol, 2(6): 797-804. 10.1001/jamaoncol.2016.0213
Zhang JD, Wu JJ, Zhou XS, et al., 2023. Recent advancements in artificial intelligence for breast cancer: image augmentation, segmentation, diagnosis, and prognosis approaches. Semin Cancer Biol, 96: 11-25. 10.1016/j.semcancer.2023.09.001
Zhao YF, Kosorok MR, Zeng DL, 2009. Reinforcement learning design for cancer clinical trials. Stat Med, 28(26): 3294-3315. 10.1002/sim.3720
Zhu R, Vora B, Menon S, et al., 2023. Clinical pharmacology applications of real-world data and real-world evidence in drug development and approval‒an industry perspective. Clin Pharmacol Ther, 114(4): 751-767. 10.1002/cpt.2988

