Characterisation of oncology EHR-derived real-world data in the UK, Germany, and Japan

B Adamson; E Horne; C Xu; A Samani; C Buhl; P Mpofu; H Pittell; Q Zhang; D Ng; K Seidl-Rathkopf; N Schinwald; E Tajima; A Sujenthiran

2025 Feb 7;7:100113. doi: 10.1016/j.esmorw.2025.100113

B Adamson ¹,∗, E Horne ², C Xu ¹, A Samani ², C Buhl ³, P Mpofu ¹, H Pittell ¹, Q Zhang ¹, D Ng ⁴, K Seidl-Rathkopf ³, N Schinwald ³, E Tajima ⁴, A Sujenthiran ²

PMCID: PMC12836611 PMID: 41647352

Abstract

Oncology real-world data (RWD) from electronic health records (EHRs) can generate real-world evidence crucial for clinical and policy decisions. However, challenges in accessing and harmonising data across countries limit its global applicability. Flatiron Health has developed oncology EHR-derived RWD datasets in the UK, Germany, and Japan, integrating these with its United States database to support multinational studies. This paper outlines the methods, aligned with the ISPOR EHR-derived data SUITABILITY checklist, to ensure the trustworthiness, reliability, and relevance of these datasets.

Key words: electronic health records, oncology, Germany, UK, Japan, biomarkers

Highlights

• We created a multinational oncology dataset spanning four countries.
• We developed disease-specific common data models.
• We used a robust process for curating structured and unstructured EHR-derived data.
• Clinically meaningful variables are standardised across countries for pooling.
• A trusted research environment secures patient-level data access for global studies.

Introduction

In oncology, electronic health records (EHRs) are a vital source for generating real-world evidence (RWE) to inform clinical decisions and policy.¹ However, challenges persist in ensuring the clinical depth, consistency, and country-level comparability of such datasets, particularly outside the USA. While United States real-world data (RWD) may be more readily accessible, decision makers in other countries often require local data for national decisions.²,³ Additionally, multinational RWE is crucial for contextualising global clinical trials.⁴

EHR-derived data offer rich clinical detail, but accessing and harmonising these data across different systems is technically challenging. A recent literature review of oncology RWD sources in five large European countries found that standard or syndicated RWD was generally not used: 91 of 101 studies identified used RWD generated or curated specifically for the objectives reported in the publication.⁵ Continuous updates are essential for tracking patient outcomes and treatment effects longitudinally.⁶,⁷ Delays in data access can slow insights, and patient-level data are often difficult to obtain. Other RWD sources, such as claims and registries, lack the clinical depth and consistency needed for impactful oncology research. The challenge intensifies when comparing or pooling data across countries. Initiatives such as DARWIN EU have advanced standardisation and harmonisation across countries but, to date, have not published case studies of the comparative effectiveness of oncology treatments.⁸ Addressing this gap requires disease-specific common data models that harmonise data across nations. Such models improve understanding of how treatment patterns and outcomes vary globally and support the creation of real-world control arms for multi-country clinical trials.

Flatiron Health has expanded to create oncology EHR-derived RWD in the UK, Germany, and Japan. These datasets are built to harmonise with the United States Flatiron Health Research Database, which includes more than 3.8 million patients and has been described previously.⁹

The objective of this article is to describe in detail the infrastructure and processes developed to curate these multinational oncology datasets and to ensure that they are de-identified or anonymised in compliance with local data privacy regulations. The content and organisation of this article correspond to the ISPOR EHR-derived data SUITABILITY checklist.¹⁰ The first section supports an understanding of the data and an assessment of their trustworthiness by describing (i) data characteristics, (ii) data provenance, and (iii) data governance. The second section describes fitness for purpose through (i) data reliability items (i.e. accuracy and completeness) and (ii) data relevance items. Ultimately, this work has the potential to accelerate the development of therapies, improve treatment strategies, and, most importantly, benefit patients worldwide by ensuring that high-quality RWE informs clinical decisions and health policies on a global scale.

Characterisation

EHR sources

The multinational oncology data described here cover the UK, Germany, and Japan (Figure 1). Flatiron Health has growing multi-site oncology networks in each country. Key data elements originate from documentation by health care professionals during routine oncology care. Partner health care sites deliver oncology care according to local clinical pathways and the routine standard of care in each country. In the UK, the network includes multiple National Health Service (NHS) trusts. In Germany, it includes a combination of health care providers, reflecting different levels of care in inpatient and outpatient settings. In Japan, it includes large cancer centres and university hospitals, which are the main providers of oncology care in the country. New sites across the UK, Germany, and Japan are continually being integrated into the network, increasing patient counts in the resulting datasets. Sites are selected to provide sufficient geographic coverage and demographic variability to support representativeness.

Figure 1. Overview of the process for electronic health record-derived real-world data curation. EHR, electronic health record.

EHR software varies by country and site, requiring custom technical integrations to access data sources, often from multiple subsystems. These can include cloud-based and on-premises EHR vendor systems such as Epic, Cerner, or Fujitsu, as well as locally deployed in-house solutions. The data sources include both structured data (e.g. demographics, laboratory results, medication orders) and unstructured data (e.g. clinician notes, radiology and pathology reports). Filtering is carried out at the hospital so that only data from consented and non-opted-out patients are transferred to Flatiron Health. In Japan, data are de-personalised on-premises before being transferred to our cloud environments. Additional data sources (e.g. mortality data from third-party sources) may be linked to the EHR data after transfer to Flatiron Health to improve completeness, in compliance with the legal and ethical requirements of each country. This heterogeneity requires specialised teams to partner with each site and facilitate safe and secure data flow between the site EHR system and Flatiron Health.

Time period

The EHR-derived oncology RWD generally cover January 2015 to the present, with ongoing prospective cohort follow-up. For some disease types, historical data extend before 2015. New patients are added frequently, in line with local privacy requirements, increasing the sample size. Each dataset has a recency of 90 days. New sites are added over time. When significant changes occur in the EHR during the data-processing period (e.g. addition or removal of data elements, changes to the coding system, or software updates), they are identified and the transformation pipelines are adapted accordingly.

Common data models

We developed a set of common data models for different cancer types (e.g. breast, lung, colorectal, and prostate cancer, and multiple myeloma) to capture disease-specific, clinically meaningful variables in depth. The disease-specific data models are shared across the UK, Germany, and Japan to enable pooling of patient-level data for multinational studies. A core set of variables is used in all cancer types (e.g. birth year, birth sex, ECOG performance status, laboratory results), while other variables are relevant to specific cancer types (e.g. menopausal status for breast cancer, castrate-resistant status for prostate cancer, smoking status for lung cancer). Some variables appear in multiple data models (e.g. disease stage, histology) but have category levels defined specifically for each cancer type. The line of therapy table consists of derived variables defined uniquely for each cancer type based on input from local oncologists at Flatiron Health. While the table format is consistent across cancer types (e.g. line number, line name, start date, end date), the derivation algorithms and definitions are customised for each cancer type and account for variation in clinical guidelines between countries when appropriate.
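The line of therapy table format described above can be illustrated with a deliberately simplified sketch. The derivation rule below is hypothetical and is not Flatiron Health's algorithm, which is customised per cancer type and not specified here. It assumes that any drug first given within a 28-day window of a line's start belongs to that line's regimen, and that a drug outside the current regimen given after the window starts a new line. The names `Administration`, `LineOfTherapy`, `derive_lines`, and `REGIMEN_WINDOW_DAYS` are illustrative.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical regimen window (assumption, not from the source): drugs first
# given within this many days of a line's start join that line's regimen.
REGIMEN_WINDOW_DAYS = 28


@dataclass
class Administration:
    drug: str
    day: date


@dataclass
class LineOfTherapy:  # mirrors the table format: number, name, start, end
    line_number: int
    line_name: str
    start_date: date
    end_date: date


@dataclass
class _OpenLine:
    start: date
    end: date
    drugs: set = field(default_factory=set)


def derive_lines(admins: list[Administration]) -> list[LineOfTherapy]:
    """Toy line-of-therapy derivation for a single patient."""
    open_lines: list[_OpenLine] = []
    for a in sorted(admins, key=lambda x: x.day):
        current = open_lines[-1] if open_lines else None
        if current is not None and (
            a.drug in current.drugs
            or (a.day - current.start).days <= REGIMEN_WINDOW_DAYS
        ):
            # Same regimen: record the drug and extend the line's end date.
            current.drugs.add(a.drug)
            current.end = max(current.end, a.day)
        else:
            # First administration, or a new drug after the window: new line.
            open_lines.append(_OpenLine(start=a.day, end=a.day, drugs={a.drug}))
    return [
        LineOfTherapy(i + 1, ",".join(sorted(line.drugs)), line.start, line.end)
        for i, line in enumerate(open_lines)
    ]
```

For example, carboplatin and pemetrexed started on the same day form line 1, and docetaxel started months later opens line 2. Real derivations also handle gaps, maintenance therapy, and cancer-specific exceptions, which this sketch ignores.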

Structured data processing

Structured data processing standardises and cleans data elements from the EHR source for the data models. These elements can include demographic information, laboratory test results, and medications. As an illustrative example, we describe how structured data processing is completed for medications. Mapping drugs from structured codes to create a cleaned medication table involves several steps designed to standardise and harmonise drug information across different EHR data sources. Within a country, this mapping is typically implemented according to the steps outlined in Box 1.

Box 1. Process for mapping drugs from structured codes to create a cleaned medication table.

1. Source data and structured codes. Structured codes such as RxNorm, Anatomical Therapeutic Chemical (ATC), or local drug codes are used to identify medications in the source data, typically EHRs or other structured data sources.
2. Mapping to standardised terminologies. The structured codes from the source data are mapped to standardised drug terminologies such as RxNorm, so that drugs are categorised consistently across datasets and geographies. This mapping is limited to a list of anticancer drugs that are approved and marketed in the country.
3. Handling missing or unmapped drugs. Some drugs may not have a direct mapping in the standardised terminology. For example, some RxNorm codes used in Japan may not have corresponding entries in the target mappings, leading to null values for drug names and categories. To address this, teams may manually review and add missing mappings.
4. Drug category assignment. Once mapped to standardised codes, drugs are assigned to specific drug categories (e.g. antineoplastic, non-antineoplastic). This categorisation is crucial for filtering relevant drugs, especially in oncology datasets.
5. Building the cleaned medication table. After mapping and category assignment, the cleaned medication table is built. This table typically includes fields such as standardised drug name, drug category, harmonised dose and units, route of administration (e.g. oral, i.v.), and start and end dates.
6. Validation and harmonisation. This step involves validating the mappings and ensuring that the cleaned medication table is harmonised across countries. Replicas of the terminology database are refreshed frequently to ensure that all mappings are up to date before they are used to build the disease cohort data tables.
7. Use in analytical pipelines. The cleaned medication table is then used in various analytical pipelines, such as deriving lines of therapy, filtering drugs for specific analyses, or generating real-world evidence for regulatory submissions.
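Steps 1 to 5 of Box 1 can be sketched as a small lookup pipeline. This is a minimal illustration under stated assumptions, not Flatiron Health's implementation: the lookup tables, drug codes (`DRUG-0001` and so on), unit and route tables, and the names `SourceOrder`, `CleanMedication`, and `build_medication_table` are all hypothetical, and the approved-drug restriction of step 2 and the validation of step 6 are omitted.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup tables with illustrative codes only. In practice these
# would be derived from terminology databases such as RxNorm or ATC.
SOURCE_TO_STANDARD = {
    ("LOCAL", "DRUG-0001"): "carboplatin",
    ("LOCAL", "DRUG-0002"): "ondansetron",
}
# Step 3: mappings added after manual review of previously unmapped codes.
MANUAL_MAPPINGS = {("LOCAL", "DRUG-0099"): "osimertinib"}
# Step 4: category for each standardised drug name (hypothetical subset).
DRUG_CATEGORY = {
    "carboplatin": "antineoplastic",
    "osimertinib": "antineoplastic",
    "ondansetron": "non-antineoplastic",
}
UNIT_TO_MG = {"mg": 1.0, "g": 1000.0}     # dose harmonisation to mg
ROUTE_NAMES = {"PO": "oral", "IV": "i.v."}  # route harmonisation


@dataclass
class SourceOrder:  # step 1: a structured medication order from the EHR
    code_system: str
    code: str
    dose: float
    dose_unit: str
    route: str
    start_date: str
    end_date: Optional[str]


@dataclass
class CleanMedication:  # step 5: one row of the cleaned medication table
    drug_name: str
    drug_category: str
    dose_mg: float
    route: str
    start_date: str
    end_date: Optional[str]


def build_medication_table(orders: list[SourceOrder]):
    """Return (cleaned rows, orders queued for manual mapping review)."""
    cleaned, unmapped = [], []
    for order in orders:
        key = (order.code_system, order.code)
        # Step 2 lookup, falling back to manually curated mappings (step 3).
        name = SOURCE_TO_STANDARD.get(key) or MANUAL_MAPPINGS.get(key)
        if name is None or order.dose_unit not in UNIT_TO_MG:
            unmapped.append(order)  # needs human review before inclusion
            continue
        cleaned.append(CleanMedication(
            drug_name=name,
            drug_category=DRUG_CATEGORY.get(name, "uncategorised"),
            dose_mg=order.dose * UNIT_TO_MG[order.dose_unit],
            route=ROUTE_NAMES.get(order.route, order.route),
            start_date=order.start_date,
            end_date=order.end_date,
        ))
    return cleaned, unmapped
```

Returning unmapped orders separately, rather than silently dropping them, reflects the manual-review loop of step 3: codes that fail to map become a worklist instead of creating null drug names.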


The mapping process for drugs thus involves extracting structured codes, mapping them to standardised terminologies, handling missing mappings, assigning drug categories, and building a cleaned medication table that is validated and harmonised across datasets. This process ensures consistency and accuracy in drug data, which is critical for downstream analyses and regulatory submissions. Clinical study drugs are masked during harmonisation. Oral drugs are additionally abstracted from the patient chart and added to the medication tables, because they are often not included in structured orders or administrations within EHR systems; without this step, oral drugs would be systematically omitted. This is an example of unstructured data processing, which is described in detail later.

Processing data recorded in multiple languages (e.g. Japanese and English) is substantially more difficult, because terms from each language must be mapped to a single set of terms. To address this, we carry out this mapping manually.

Our data-processing approach is standard across countries, with adaptations to account for local differences. For example, German EHR systems are particularly fragmented, with data being sourced from a mix of community oncology sites and larger hospitals. This fragmentation necessitates additional harmonisation efforts to standardise local coding systems, such as converting German laboratory test codes to global standards like LOINC.
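The conversion of local laboratory codes to LOINC can be sketched as a code lookup followed by unit harmonisation. This is an illustration under assumptions: the local codes (`KREA`) and the function name `harmonise_lab` are hypothetical, and the unit-conversion step is an assumed extension of the code mapping described in the text. LOINC 2160-0 is the code for creatinine (mass/volume) in serum or plasma, and 1 mg/dL of creatinine corresponds to 88.4 µmol/L.

```python
from typing import Optional

# Hypothetical local (German) laboratory code mapped to a LOINC code and the
# unit used for the harmonised value.
LOCAL_TO_LOINC = {
    "KREA": ("2160-0", "mg/dL"),  # creatinine [mass/volume] in serum or plasma
}

# Conversions from a source unit into the harmonised unit, keyed by
# (source unit, target unit). 1 mg/dL creatinine = 88.4 umol/L.
CONVERSIONS = {
    ("umol/L", "mg/dL"): lambda v: v / 88.4,
    ("mg/dL", "mg/dL"): lambda v: v,
}


def harmonise_lab(local_code: str, value: float, unit: str) -> Optional[tuple]:
    """Return (loinc_code, value, unit), or None if code or unit needs review."""
    target = LOCAL_TO_LOINC.get(local_code)
    if target is None:
        return None  # unmapped local code: route to manual mapping
    loinc_code, target_unit = target
    convert = CONVERSIONS.get((unit, target_unit))
    if convert is None:
        return None  # unknown unit: do not guess a conversion
    return loinc_code, convert(value), target_unit
```

Returning `None` for unknown codes or units keeps unmapped results out of the harmonised table until they are reviewed, which matters when many sites use different local coding systems.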

Unstructured data processing

Critical information needed to reconstruct an oncology patient’s care journey for research is frequently found only within unstructured data. Unstructured data can include the text of clinician and visit notes; pathology, laboratory, and radiology reports; and other free-text documents located across the EHR and adjacent clinical systems. Abstraction is the process by which information from unstructured data is reviewed and converted into a structured format (e.g. the result of a biomarker test obtained from a pathology report, or the date of diagnosis obtained from a clinician note) for secondary use, following standardised processes. The abstracted data are curated using technology-enabled human abstraction (i.e. experienced and trained in-country medical professionals using Flatiron Health's proprietary abstraction software). In this custom software, Flatiron Health medical professionals (‘abstractors’) can securely view unstructured data alongside the disease-specific abstraction form, with standardised policies and procedures accompanying each data element. The software also allows the language of abstraction forms to be toggled between English, German, and Japanese.

In Japan, abstraction is conducted in Japanese; in Germany, it is conducted in German. Local clinical experts then ensure that the data are accurately translated into American English and harmonised with the Flatiron Health global data models. Small differences between British and American spellings, such as ‘tumour’ versus ‘tumor’, can cause problems and are accounted for in processing and harmonisation. The unstructured data are processed to ensure that key variables such as treatment start and end dates, biomarker results, and endpoints (e.g. disease progression) are accurately captured. The methods for achieving accuracy in unstructured data processing are described in the Accuracy section.
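Accounting for British versus American spellings can be sketched as a word-initial substitution table. This is a minimal illustration, assuming a small hypothetical list of spelling pairs and a function named `to_american_spelling`; the source does not describe how the harmonisation is implemented. Matching only at the start of a word also covers plurals and derived forms, at the cost of needing a carefully curated stem list.

```python
import re

# Illustrative British-to-American spelling stems (hypothetical subset).
BRITISH_TO_AMERICAN = {
    "tumour": "tumor",
    "oesophag": "esophag",
    "oestrogen": "estrogen",
    "haemoglobin": "hemoglobin",
}

# Anchor at the start of a word only, so "tumours" -> "tumors" and
# "oesophageal" -> "esophageal" are handled by the same stem.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, BRITISH_TO_AMERICAN)) + r")",
    re.IGNORECASE,
)


def to_american_spelling(text: str) -> str:
    def replace(match: re.Match) -> str:
        word = match.group(0)
        target = BRITISH_TO_AMERICAN[word.lower()]
        if word.isupper():  # preserve ALL-CAPS
            return target.upper()
        if word[0].isupper():  # preserve leading capital
            return target[0].upper() + target[1:]
        return target
    return _PATTERN.sub(replace, text)
```

For example, "Tumour size; two tumours" becomes "Tumor size; two tumors".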

Provenance

Provenance is critical for ensuring the authenticity, reliability, and trustworthiness of health care data. Before delivery of a final dataset into a so-called ‘trusted research environment’ (TRE; described below), we maintain provenance tables and a process that allows for regulatory-grade traceability back to the original EHR sources.

In Japan, for example, the process includes the following:

(i) Obtaining informed consent from eligible patients at site.
(ii) De-personalising the relevant EHR records at site (e.g. removing name and phone number, converting patient ID to research ID, and reducing date of birth to year and month).
(iii) Securely transferring data from site to an in-country Flatiron Health data-processing environment.
(iv) Structured data processing, including harmonisation of Japanese laboratory test codes and medication codes to global standards (e.g. RxNorm and LOINC).
(v) Expert clinical abstraction of records in Japanese, following policies and procedures developed and validated by local Japanese clinical data experts to achieve concept validity with Flatiron Health United States variable definitions and data models.
(vi) Standardisation into English.
(vii) De-identification for third-party use (e.g. further conversion of research ID to external ID for third-party provision).
(viii) Delivery of patient-level de-identified EHR-derived data into a TRE.
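The identifier handling in steps (ii) and (vii) can be sketched as follows. The source does not state how patient IDs are converted to research or external IDs; this sketch swaps in keyed hashing (HMAC-SHA256 from the Python standard library) as an assumed technique, and the record fields, keys, and function names (`depersonalise_at_site`, `to_external_id`) are hypothetical. Key management and the internal provenance tables are omitted.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import date


@dataclass
class SiteRecord:  # identifiable fields held in the site EHR (illustrative)
    patient_id: str
    name: str
    phone: str
    birth_date: date


@dataclass
class DepersonalisedRecord:  # what leaves the site after step (ii)
    research_id: str
    birth_year_month: str


def keyed_pseudonym(identifier: str, key: bytes) -> str:
    """Deterministic keyed pseudonym (HMAC-SHA256, truncated for readability)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def depersonalise_at_site(record: SiteRecord, site_key: bytes) -> DepersonalisedRecord:
    # Step (ii): name and phone are not carried over, the patient ID becomes
    # a research ID, and date of birth is reduced to year and month.
    return DepersonalisedRecord(
        research_id=keyed_pseudonym(record.patient_id, site_key),
        birth_year_month=record.birth_date.strftime("%Y-%m"),
    )


def to_external_id(research_id: str, release_key: bytes) -> str:
    # Step (vii): a second, independently keyed conversion for third parties.
    return keyed_pseudonym(research_id, release_key)
```

A deterministic keyed mapping lets repeated data refreshes assign the same research ID to the same patient without storing identifiable IDs downstream; a random lookup table held at site would be an equally plausible design.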

Governance

The Flatiron Health entities in the UK, Germany, and Japan are responsible for local data processing and for ensuring compliance with local data governance regulations, through processes including ethical approvals, data privacy protocols, de-identification processes, and anonymisation strategies tailored to each jurisdiction. In each country, we established a compliant legal framework and ethical basis for the processing of patient-level data. All study analyses are carried out under relevant ethical guidelines and regulations.

In the UK, this work uses data provided by patients and collected by the NHS as part of their care and support. UK Research Ethics Committee (REC) (Ref: 22/YH/0211) and Confidentiality Advisory Group (CAG) (Ref: 23/CAG/0015) approvals are in place to allow the processing of data under a national and local opt-out model and under the UK General Data Protection Regulation (GDPR).¹¹ Individual-level data are processed only within the UK and are anonymised to protect patient privacy, inter alia through the assignment of new patient identification numbers, deletion of personally identifiable information (e.g. name, address, residential postal code), and random shifting of dates while maintaining the original intervals between dates. Thus, end-users can access only anonymised data, within the TRE, for all analyses.
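Random date shifting that preserves intervals can be sketched by drawing one offset per patient and applying it to all of that patient's dates. The shift range (`MAX_SHIFT_DAYS`) and the function name `shift_patient_dates` are assumptions; the source does not state the range or implementation used.

```python
import random
from datetime import date, timedelta

# Hypothetical maximum shift in either direction; not stated in the source.
MAX_SHIFT_DAYS = 365


def shift_patient_dates(patient_dates: dict, rng: random.Random) -> dict:
    """Shift every date of ONE patient by the same random offset.

    A single per-patient offset preserves all intervals between that
    patient's dates (e.g. diagnosis to death) while hiding the true dates.
    Missing dates (None) are left as None.
    """
    offset = timedelta(days=rng.randint(-MAX_SHIFT_DAYS, MAX_SHIFT_DAYS))
    return {
        event: (d + offset if d is not None else None)
        for event, d in patient_dates.items()
    }
```

Calling this once per patient with `random.SystemRandom()` gives each patient an independent, non-reproducible offset, so time-to-event analyses are unaffected while calendar dates are obscured.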

In Germany, we operate under the relevant legal bases for research under the GDPR and applicable federal and regional data protection laws. Patients in our initial German cohort provided signed informed consent, and new national legislation on health data use has further facilitated access to and processing of health data for research. The locally hosted data are first pseudonymised and then anonymised before being made available for research. The Flatiron Health TRE ensures that researchers access the anonymised patient data only in a secure environment that strictly controls data exports.

In Japan, in line with relevant local legal and ethical requirements, including the Ethical Guidelines for Medical and Biological Research Involving Human Subjects (‘Ethical Guidelines’) and the Act on the Protection of Personal Information, all data collection and curation are approved by an institutional review board at each partner site. All patients in Japan whose treatment data are incorporated into Flatiron Health datasets have provided consent through appropriate mechanisms. Individual-level data are de-identified to protect patient privacy, inter alia through the assignment of new patient identification numbers and the deletion, replacement, or adjustment of personally identifiable information (e.g. name, address, and residential postal code). Thus, end-users can access only de-identified data for their analyses, via the single Flatiron Health TRE interface described below. The study analyses described below were conducted in accordance with the Ethical Guidelines and were approved by the Public Health Research Foundation Ethical Review Committee (approval date: 25 September 2024; approval number: 24G003).

Analysis in the TRE

The Flatiron Health TRE, powered by Lifebit’s technology and securely hosted in-country, ensures that combined analysis of patient-level data across multiple countries occurs under stringent data security and privacy controls, maintaining data integrity while enabling comprehensive insights. Patient-level data from each country, de-identified or anonymised in line with local data privacy regulations, are combined and accessed by users for analysis (e.g. using R, SQL, or Python) within the TRE. The TRE allows combined analysis (e.g. pooling or matching) of patient-level data across countries including Japan, Germany, the UK, and the USA. Users can download summarised results locally after an analysis has been completed and reviewed by a data privacy expert. Security controls prevent local download of any individual patient-level data, to protect patient privacy.

Fitness for purpose

By curating RWD in a coordinated way across countries, we create consistency in definitions, decisions, and cut-off dates to support high-quality multinational studies of real-world treatment effectiveness. The fitness for purpose of the Flatiron Health multinational datasets has already been demonstrated by their use in two recently published studies. The first, of patients diagnosed with non-small-cell lung cancer (NSCLC) or breast cancer from 2016 to 2024 in Germany and the UK, describes the prevalence of actionable biomarker mutations available to guide treatment decision making.¹² The second, of patients diagnosed with gastric or colorectal cancer in Japan and the USA, describes epidemiological differences between the cohorts.¹³

We assess the fitness for purpose of the multinational Flatiron Health Research Database more generally in the following text, using the ISPOR SUITABILITY checklist as a framework.¹⁰

Reliability

Accuracy

Data accuracy is supported across all curated data in the UK, Germany, and Japan through several processes. Abstractors are rigorously trained and certified, including extensive training on the policies and procedures for abstracting each unstructured variable. Validators built into the abstraction form prevent submission of illogical data at the point of abstraction. A series of quality assurance (QA) and quality control processes was developed to minimise errors during data transformation and curation. Recurring QA procedures, developed for each disease by a cross-functional team of clinicians, epidemiologists, and engineers, are in place to ensure the reliability of the data.

The QA process is carried out at the patient level and cohort level for each data refresh. Patient-level QA is applied to abstracted data and flags instances in which a patient’s abstracted information may not make clinical sense (e.g. pathological staging recorded for a patient who has not received surgery). When a patient-level QA flag is raised, a clinical team member reopens the patient’s record to verify that the information has been abstracted correctly and corrects it if not. Patient-level QA is an iterative process. Cohort-level QA is carried out on the final curated datasets to summarise the disease-specific cohort as a whole. It is reviewed by Flatiron Health oncologists to ensure clinical plausibility. This cohort-level QA process is also used to track the size and representativeness of the cohort, the distribution of variables, and to inform analytic guidance.
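Patient-level QA of the kind described above can be sketched as a set of named rules, each flagging a clinically implausible combination of abstracted values. The first rule is the example given in the text; the second rule, the record fields, and the names `AbstractedPatient` and `patient_level_qa` are hypothetical illustrations, not Flatiron Health's actual QA checks.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class AbstractedPatient:  # illustrative subset of abstracted variables
    patient_id: str
    diagnosis_date: date
    pathological_stage: Optional[str] = None
    had_surgery: bool = False
    death_date: Optional[date] = None


# Each rule returns True when the abstracted data look clinically implausible.
QA_RULES: dict[str, Callable[[AbstractedPatient], bool]] = {
    # Example from the text: pathological staging without surgery.
    "pathological_stage_without_surgery":
        lambda p: p.pathological_stage is not None and not p.had_surgery,
    # Hypothetical additional rule, for illustration only.
    "death_before_diagnosis":
        lambda p: p.death_date is not None and p.death_date < p.diagnosis_date,
}


def patient_level_qa(patients: list[AbstractedPatient]) -> dict[str, list[str]]:
    """Map patient_id to the list of raised flags (flagged patients only)."""
    flags: dict[str, list[str]] = {}
    for p in patients:
        raised = [name for name, rule in QA_RULES.items() if rule(p)]
        if raised:
            flags[p.patient_id] = raised  # queue the chart for clinical re-review
    return flags
```

Keeping rules as named, independent predicates makes it straightforward to add disease-specific checks and to report which rule triggered each re-review.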

Completeness

As an example, at the time of writing, the completeness of breast cancer variables across the UK, Germany, and Japan ranged from 81% to 95% for group stage, 97% to 99% for histology, 99% to 100% for laterality, 60% to 79% for menopausal status, and 97% to 98% for tumour grade. Variability in group stage completeness was largely attributable to differences in documentation practices across countries. For example, UK documents were most likely to contain tumour size and nodal status (‘positive’ or ‘negative’), which could not be unambiguously mapped to group stage, rather than explicit cT and cN stages. For ongoing transparency, more detailed reports of completeness and quality will be presented and published for each country and cancer type.¹²,¹³ These values are subject to change, as the dataset is dynamic, with sample size increasing over time and net-new source documents being incorporated. The availability of key variables is evaluated routinely, with completeness contextualised against clinical care practices. More detailed assessments are carried out for specific study cohorts to evaluate the data’s fitness for purpose for the research question at hand.
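Per-country completeness figures like those above can be computed as the percentage of patients with a non-missing value for each variable. This sketch uses toy records, not data from the paper; treating `""` and `"unknown"` as missing is an assumption (standing in for values that could not be unambiguously mapped), and `completeness_by_country` is a hypothetical name.

```python
from collections import defaultdict

# Values treated as missing (assumption): absent, empty, or explicitly
# unknown, e.g. a group stage that could not be unambiguously derived.
MISSING = {None, "", "unknown"}


def completeness_by_country(records: list[dict], variables: list[str]) -> dict:
    """Percentage of patients with a non-missing value, per country and variable."""
    by_country: dict[str, list[dict]] = defaultdict(list)
    for r in records:
        by_country[r["country"]].append(r)
    return {
        country: {
            v: 100.0 * sum(r.get(v) not in MISSING for r in rows) / len(rows)
            for v in variables
        }
        for country, rows in by_country.items()
    }
```

Grouping by country before computing percentages keeps documentation differences between countries visible rather than averaging them away in a pooled figure.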

Relevance

Data content

Once the infrastructure for curated EHR-derived RWD is established, the data become a powerful asset for a wide range of applications, from understanding local treatment patterns to informing regulatory assessments and health technology assessments (HTAs). Our disease-specific data models are evaluated for their ability to support decision making across jurisdictions, with ongoing efforts to validate clinically critical characteristics and endpoints and to ensure representativeness. Standardisation into English further increases the relevance of the data for use across countries.

Broad eligibility criteria were set so that, as new sites are integrated over time, the cohorts should become representative of the population of interest. Representativeness of the cohorts will be monitored and the findings published, as has been done previously for the US Flatiron Health Research Database.⁹ The broad eligibility criteria also reduce the risk that historically marginalised populations are under-represented, and Flatiron Health has a team dedicated to health equity research using the datasets.¹⁴⁻¹⁷

Care settings and time period

As well as being selected for population representativeness, sites are selected to reflect the standard of care in the given country. This ensures that the care setting of the EHR-derived dataset is relevant to the target setting of regulatory and HTA decision making in that country. In addition, recency of data is key to ensuring relevance to current decision making. Most patients in the German, Japanese, and UK datasets were diagnosed after 2015, and new patients are added frequently so that the data represent the current standard of care. Prospective follow-up was initiated on 1 January 2024.

Sample size and follow-up period

Within 1 year of the start of data curation, we presented findings from cohorts of 1813 patients with NSCLC and 1582 patients with breast cancer across Germany and the UK,¹² and 1203 patients with gastric or colorectal cancer in Japan.¹³ With the frequent addition of new sites and patients, these cohorts are projected to grow substantially over time. The study time horizon of 2015-2024, applied in most cases, was chosen to balance recency against sufficient follow-up to study relevant outcomes. Follow-up for existing patients who remain under care at the site will also extend as new information is extracted from their records.

Discussion

Limitations and challenges

A challenge of multinational studies is that EHRs from diverse health care systems may differ in completeness; we mitigate this through the country-specific data processing described above. These processes are regularly reviewed and updated, and we developed a rigorous QA process to flag areas for improvement. In addition, we regularly review the EHRs for new source documents to optimise the completeness of variables while maintaining content validity. RWD are limited to drugs that were available on the market and used during the period of interest. Similarly, biomarker-defined cohorts are limited to patients who received biomarker testing, the adoption and utilisation of which may vary between countries.

Future directions

Studies to further verify and validate clinically critical characteristics and endpoints are in development. Studies evaluating the representativeness of the multinational Flatiron Health Research Database are ongoing as the number of sites integrated in each country increases. We are exploring opportunities to incorporate artificial intelligence and machine learning for the extraction of certain data elements, with accuracy benchmarked against our datasets abstracted by trained clinical experts.¹⁸,¹⁹ Transportability of RWE across country borders is being explored through the Flatiron FORUM, a consortium to foster oncology RWE uses and methods.²⁰⁻²² For example, treatments and survival outcomes for patients diagnosed with metastatic breast cancer between 2015 and 2021 (with follow-up through 2024) were similar in the USA and the UK.²²

Conclusion

While the overarching process of curating unstructured data is consistent across the UK, Germany, and Japan, there are important differences driven by local health care systems, regulatory requirements, and data availability. In the UK, although EHR systems are fragmented, each NHS trust provides a more unified data source in which the longitudinal patient journey is captured. In contrast, Germany’s more decentralised health care system requires additional efforts to standardise data across multiple providers. Japan’s stringent consent requirements and the need for local-language abstraction add further complexity to the curation process but also help ensure that the data are highly accurate and reflective of local clinical practice.

These differences highlight the importance of tailoring the data curation process to the specific context of each country, ensuring that the resulting datasets are not only comparable across regions but also reflective of the unique health care environments in which they were generated. This approach enables the multinational Flatiron Health Research Database to support robust comparative-effectiveness research, providing valuable insights into global treatment patterns and outcomes.

Variation in health care systems, clinical guidelines, and patient populations across countries underscores the necessity of developing local RWD infrastructures, which can both provide region-specific insights and support multinational comparisons, particularly in regulatory submissions. By building credible, multinationally curated EHR-derived RWD infrastructure, we lay the foundation for evidence generation that bridges gaps in clinical knowledge, ultimately improving the quality of oncology care worldwide. Our expansion into global oncology care markets is a significant step forward, enabling access to harmonised EHR-derived datasets across geographies. This approach ensures that global researchers have the tools to investigate treatment patterns and outcomes across diverse health care systems, fostering high-quality, cross-border evidence generation.

Declaration of generative AI and AI-assisted technologies in the writing process

During the preparation of this work the authors used an AI tool to compile and summarise information from Flatiron Health documents. After using this tool, the authors reviewed, fact checked, and edited the content as needed and take full responsibility for the content of the publication.

Acknowledgements

We thank Michael Bierl of Flatiron Health, Inc., Lauren Brown of Flatiron Health K.K., Adam Manhi of Flatiron Health UK, and Maike Sauer of Flatiron Health Germany for establishing the partnerships with sites; Matthew Murchinson of Flatiron Health UK and Lauren Ellsworth of Flatiron Health Germany for engineering leadership; Darren Johnson of Flatiron Health, Inc. for editorial support; and Madeline Morenberg of Flatiron Health, Inc. for the visual design of Figure 1.

Funding

This work was supported by Flatiron Health Inc. (no grant number), which is an independent member of the Roche Group.

Disclosure

At the time of this work, BA, EH, CX, A. Samani, CB, PM, HP, QZ, DN, KS, NS, ET, and A. Sujenthiran reported employment with Flatiron Health, Inc. (an independent member of the Roche Group) and stock ownership in Roche.

References

1. Ramsey S.D., Onar-Thomas A., Wheeler S.B. Real-world database studies in oncology: a call for standards. J Clin Oncol. 2024;42(9):977–980. doi: 10.1200/JCO.23.02399.
2. Bando H., Tajima E., Aoyagi Y., et al. The emerging role of real-world data in oncology care in Japan. ESMO Real World Data Digit Oncol. 2023;2.
3. National Institute for Health and Care Excellence. NICE real-world evidence framework. 2022. Available at: https://www.nice.org.uk/corporate/ecd9/chapter/overview
4. Lewis J.R.R., Kerridge I., Lipworth W. Use of real-world data for the research, development, and evaluation of oncology precision medicines. JCO Precis Oncol. 2017;1(1):1–11. doi: 10.1200/PO.17.00157.
5. Liu C.R., Adamson B., Horne E., Sujenthiran A., Meadows E. Real-world data sources for oncology in five major European countries: a targeted literature review. ISPOR Europe 2024; Barcelona, Spain. 2024. Available at: https://www.ispor.org/conferences-education/conferences/upcoming-conferences/ispor-europe-2024/program/program/session/euro2024-4013/142090
6. Castellanos E.H., Wittmershaus B.K., Chandwani S. Raising the bar for real-world data in oncology: approaches to quality across multiple dimensions. JCO Clin Cancer Inform. 2024;8(8). doi: 10.1200/CCI.23.00046.
7. Ramsey S.D., Adamson B.J., Wang X., et al. Using electronic health record data to identify comparator populations for comparative effectiveness research. J Med Econ. 2020;23(12):1618–1622. doi: 10.1080/13696998.2020.1840113.
8. DARWIN EU. Studies. 2024. Available at: https://www.darwin-eu.org/index.php/studies
9. Ma X., Long L., Moon S., Adamson B.J.S., Baxi S.S. Comparison of population characteristics in real-world clinical oncology databases in the US: Flatiron Health, SEER, and NPCR. medRxiv. 2023. doi: 10.1101/2020.03.16.20037143.
10. Fleurence R.L., Kent S., Adamson B., et al. Assessing real-world data from electronic health records for health technology assessment: the SUITABILITY checklist: a good practices report of an ISPOR task force. Value Health. 2024;27(6):692–701. doi: 10.1016/j.jval.2024.01.019.
11. NHS England Digital. National Data Opt-Out. 2024. Available at: https://digital.nhs.uk/services/national-data-opt-out#:~:text=The%20National%20Data%20Opt%2DOut,used%20for%20research%20and%20planning.&text=Patients%20can%20find%20out%20more,choice%20on%20the%20NHS%20website
12. Adamson B., Horne E., Buhl C., et al. [788] Characterization of novel longitudinal oncology real world datasets in Germany and UK. Pharmacoepidemiol Drug Saf. 2024;33(S2).
13. Adamson B., Pittell H., Samani A., et al. Early-onset gastric and colorectal cancer in Japan and the US. Asian Conference on Pharmacoepidemiology. 2024. Available at: https://resources.flatiron.com/publications/early-onset-gastric-and-colorectal-cancer-in-japan-and-the-us
14. Pittell H., Calip G.S., Pierre A., Ryals C.A., Guadamuz J.S. Racialized economic segregation and inequities in treatment initiation and survival among patients with metastatic breast cancer. Breast Cancer Res Treat. 2024;206(2):411–423. doi: 10.1007/s10549-024-07319-5.
15. Guadamuz J.S., Wang X., Ryals C.A., et al. Socioeconomic status and inequities in treatment initiation and survival among patients with cancer, 2011-2022. JNCI Cancer Spectr. 2023;7(5). doi: 10.1093/jncics/pkad058.
16. Pittell H., Calip G.S., Pierre A., et al. Racial and ethnic inequities in US oncology clinical trial participation from 2017 to 2022. JAMA Netw Open. 2023;6(7). doi: 10.1001/jamanetworkopen.2023.22515.
17. Adamson B.J.S., Cohen A.B., Gross C.P., et al. ACA Medicaid expansion association with racial disparity reductions in timely cancer treatment. Am J Manag Care. 2021;27(7):274–281. doi: 10.37765/ajmc.2021.88700.
18. Adamson B., Waskom M., Blarre A., et al. Approach to machine learning for extraction of real-world data variables from electronic health records. Front Pharmacol. 2023;14. doi: 10.3389/fphar.2023.1180962.
19. Benedum C.M., Sondhi A., Fidyk E., et al. Replication of real-world evidence in oncology using electronic health record data extracted by machine learning. Cancers (Basel). 2023;15(6):1853. doi: 10.3390/cancers15061853.
20. Mpofu P., Kent S., Jónsson P., et al. Evaluation of US oncology electronic health record real-world data to reduce uncertainty in health technology appraisals: a retrospective cohort study. BMJ Open. 2023;13(10). doi: 10.1136/bmjopen-2023-074559.
21. Kent S., Duffield S., Adam J., et al. HTA233 transportability of overall survival estimates from US to UK populations receiving first-line treatment for advanced non-small-cell lung cancer. Value Health. 2023;26(12). doi: 10.1136/bmjopen-2024-085722.
22. Pittell H., Horne E., Mpofu P., et al. Transportability of overall survival estimates from the US to England in metastatic breast cancer using nationally representative data sources. Poster presented at the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) Europe 2024; November 17-20, 2024; Barcelona, Spain. Available at: https://www.ispor.org/heor-resources/presentations-database/presentation/euro2024-4018/144733
