Abstract
Objectives
As yet, there is no European data standard for naming and describing oncology regimens. To enable real-world cancer treatment data comparisons, the Oncology Data Network created a unified reference database for systemic anti-cancer regimens used in practice across Europe. Data are extracted from clinical systems and mapped to a single standard called the “Core Regimen Reference Library (CRRL)”. An automated matching algorithm has been designed based on: drug combinations; administration schedule; and dosing and route of administration. Incomplete matches are flagged for expert review. The aim of this pilot study is to have an expert pharmacist panel test the algorithm’s feasibility by comparing computerised and manual matching of regimens that are currently in use in different European countries.
Methods
The combined team pooled a diverse sample of 47 reference regimens used in Europe for solid and haematological cancers. These were then codified to the developed common data standard and the algorithm was used to match them to the CRRL. The expert pharmacist panel from the European Society of Oncology Pharmacy (ESOP) selected 12 regimens from the sample set, ranging from simple to complex, and performed a single-blind test of the algorithm, by systematically matching each original regimen to the CRRL.
Results
Manual and computer matches concurred fully for all 12 regimens tested. On this basis, ESOP validated the algorithm’s feasibility, including its rules and logic for defining the core characteristics of a regimen and for comparing similarities and differences.
Conclusions
ESOP’s validation of the matching algorithm and approach to curating a master library provides confidence in their utility for reliable comparison of real-world regimen usage across Europe.
Keywords: medical oncology, medical informatics, documentation, evidence-based medicine, pharmacy service, hospital
Introduction
Currently no unified reference source and data standard exists for oncology regimens used in Europe.1–4 The Oncology Data Network (ODN) addressed this challenge with the aims of studying real-world data to: reveal trends and variations in how anti-cancer therapies are being used for particular patient cohorts, to inform improvements to policy and clinical practice; help providers identify new areas for research; expedite enrolment for clinical trials and real-world studies; and enable novel payment agreements to help meet the challenge of financial sustainability.5–7
Previous attempts to standardise oncology regimen codification have been undertaken in the USA8 9 as well as in the UK.10 The European situation is distinctly different, however, with multiple languages in use and a large variety of automated hospital information systems on the market. Hence, achieving a completely standardised nomenclature in all systems across all borders in order to study real-world data would be a huge challenge. Extracting data and then matching them based on predefined criteria is a more feasible approach. Such an operating model, as proposed in this article, would involve extracting patient data from clinical systems, deidentifying them (to ensure anonymity), transforming them, and mapping them against the reference library. This would enable the effective analysis and comparison of real-world drug usage and treatment outcomes across different geographies and clinical settings. Information governance for this project was fully aligned with the European Union’s General Data Protection Regulation, and authorisation was also obtained from the “Commission nationale de l'informatique et des libertés”, which was the leading data protection authority for the ODN initiative.
The Core Regimen Reference Library (CRRL) is a live reference database containing systemic anti-cancer regimens that are recognised and used in clinical practice. Its base content has been compiled using reference regimens supplied by individual hospitals, some described in ‘summary of medicinal product characteristics’ available from pharmaceutical companies, and others published by national cancer registries. Before being added to the CRRL, the source content is conformed to a common data model in terms of structure, reference vocabularies, and units of measure. It is then classified and verified to avoid duplication or redundancy, and enriched using other reference data such as ‘class of therapy’. A matching algorithm was designed to automate the matching of regimens to the CRRL. This capability enables actual regimen usage to be more readily queried and compared within and across hospitals in Europe, using the CRRL as a master standard.
In 2019, IQVIA and five members of ESOP ran a pilot study to test the algorithm’s feasibility. This paper describes the approach and outcomes of this project.
Material and methods
Terminology
The terms ‘regimen’ vs ‘protocol’ vs ‘treatment plan’ are generally understood as concepts, but standards haven’t yet been established to specify them.1 2
For the purpose of data modelling, we define the following terms:
The foundation of an anti-cancer ‘regimen’ is the configuration of drug administrations to be given over time according to a schedule, frequency per day, dosing, and route of administration specified for each drug. Additional data such as infusion time could be included if necessary to distinguish different regimens.
A ‘protocol’ comprises one or more regimens, plus guidance about the supportive care required throughout the treatment pathway, including rules for protocol adjustment or breaks in treatment according to diagnostic tests and observations.
A ‘treatment plan’ is formulated to meet the needs of an individual patient, usually by selecting and adapting a ‘protocol’ or regimen(s) from the hospital reference library.
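The three definitions above can be read as a nested data model. The following is a minimal sketch only: the class and field names (`DrugAdministration`, `Regimen`, `Protocol`, `TreatmentPlan`, `patient_ref`, and so on) are assumptions made for illustration and are not taken from the CRRL schema. The example values are those of the FEC100 regimen as codified in table 2.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class DrugAdministration:
    """One drug line of a regimen (hypothetical field names)."""
    inn: str                                  # international non-proprietary name
    dose_quantity: float
    dose_unit: str                            # e.g. "mg/m2"
    route: str
    cycle_duration_days: int
    admin_days: Tuple[int, ...]               # day numbers within a cycle
    frequency_per_day: int = 1
    infusion_time_min: Optional[int] = None   # only if needed to tell regimens apart


@dataclass
class Regimen:
    name: str
    administrations: List[DrugAdministration]

    def drug_combination(self) -> FrozenSet[str]:
        """Set of drug names, ignoring order and letter case."""
        return frozenset(a.inn.lower() for a in self.administrations)


@dataclass
class Protocol:
    """One or more regimens plus supportive-care guidance and adjustment rules."""
    regimens: List[Regimen]
    supportive_care: List[str] = field(default_factory=list)
    adjustment_rules: List[str] = field(default_factory=list)


@dataclass
class TreatmentPlan:
    """A protocol selected and adapted for an individual patient."""
    patient_ref: str
    based_on: Protocol
    adaptations: List[str] = field(default_factory=list)


# FEC100, using the values shown in table 2
fec100 = Regimen(
    name="FEC100",
    administrations=[
        DrugAdministration("Fluorouracil", 500, "mg/m2", "Continuous Intravenous", 21, (1,)),
        DrugAdministration("Epirubicin", 100, "mg/m2", "Continuous Intravenous", 21, (1,)),
        DrugAdministration("Cyclophosphamide", 500, "mg/m2", "Continuous Intravenous", 21, (1,)),
    ],
)
protocol = Protocol(regimens=[fec100])
```

Keeping the regimen separate from the protocol and the treatment plan means that only the regimen level needs to be compared when matching against a reference library.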
Reference sample selection
Our main objective was to robustly test the feasibility of the IQVIA matching algorithm using expert opinion. This meant selecting a regimen sample set that represented sufficient qualitative diversity and relevance, which could be matched manually by the ESOP expert team within the available time.
Selection criteria were as follows:
- Broad representation in practice
  - Used in different European countries
  - Indicated for a range of solid and haematological cancers
- Basic through to complex structures
  - Drug combinations
  - Schedules
  - Dosage and route
  - Standard vs ‘accelerated’ or ‘intense’ versions of a given regimen
- Class of therapy and treatment settings
  - Cytotoxic chemotherapy agents and targeted therapies (however, broad-spectrum immunotherapies and hormone therapies weren’t included)
  - Adjuvant/neoadjuvant, and locally advanced or metastatic settings
Based on these criteria, the panel of pharmacists chose the regimens to be tested by consensus from the sample set. The chosen regimens stem from the thesauri of protocols from the participating hospitals, and are in daily clinical use.
Algorithm design
Figure 1 shows the decision logic of the algorithm. It starts by finding an exact match between the anti-cancer drug names in the reference regimen and the master CRRL regimens, which generates a filtered list of positive (logically ‘true’) results. For the next two stages, the reference regimens are separately matched against the filtered list of CRRL regimens according to cycle pattern and then dose pattern. This produces the range of possible match outputs shown in table 1.
Figure 1.
Decision logic for matching a hospital reference regimen to one or more CRRL master regimens.
Table 1.
Range of scenarios for possible match outputs from the algorithm
| Scenario | Drug combinations | Cycle pattern | Dose pattern | Match summary |
| A. | False | – | – | None |
| B. | True | False | False | Partial |
| C. | True | True | False | Partial |
| D. | True | False | True | Partial |
| E. | True | True | True | Full |
Regimens tagged as partially matched are flagged for expert review, which could result in either addition to the CRRL (maintaining a live and dynamic database), or correction of errors in hospital reference data and subsequent matching to an existing CRRL regimen. Until partial matches are converted to full matches, the tags also enable treatment data linked to the partially matched reference regimens (scenarios B–D) to be made available for usage comparison across hospitals, rather than being quarantined (which only happens in scenario A). First, regimen usage data can be selected based on the CRRL standard name for the matched drug combinations, and second, according to the true/false matches for cycle pattern and dose pattern.
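The decision logic and the scenarios in table 1 can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the IQVIA implementation: the names `CodifiedRegimen`, `scenario` and `match_regimen`, the tuple encodings of the cycle and dose patterns, and the identifiers `CRRL-A`, `CRRL-B` and `HOSP-1` are all hypothetical, and the second library entry is an invented lower-dose variant used only to show a partial match.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Tuple


class CodifiedRegimen(NamedTuple):
    regimen_id: str
    drugs: FrozenSet[str]    # drug combination as a set of INNs
    cycle_pattern: Tuple     # assumed hashable encoding of the schedule
    dose_pattern: Tuple      # assumed hashable encoding of dose and route


def scenario(cycle_ok: bool, dose_ok: bool) -> str:
    """Map the two stage results to scenarios B-E of table 1."""
    if cycle_ok and dose_ok:
        return "E"
    if cycle_ok:
        return "C"
    if dose_ok:
        return "D"
    return "B"


def match_regimen(reference: CodifiedRegimen,
                  library: List[CodifiedRegimen]) -> Tuple[str, List[Dict[str, str]]]:
    """Return the match summary and per-candidate scenario tags."""
    # Stage 1: exact match on the drug combination filters the library.
    candidates = [m for m in library if m.drugs == reference.drugs]
    if not candidates:
        return "None", []    # scenario A
    # Stages 2 and 3: cycle pattern and dose pattern, assessed separately.
    details = []
    for master in candidates:
        cycle_ok = master.cycle_pattern == reference.cycle_pattern
        dose_ok = master.dose_pattern == reference.dose_pattern
        details.append({"crrl_id": master.regimen_id,
                        "scenario": scenario(cycle_ok, dose_ok)})
    summary = "Full" if any(d["scenario"] == "E" for d in details) else "Partial"
    return summary, details


FEC = frozenset({"fluorouracil", "epirubicin", "cyclophosphamide"})
library = [
    CodifiedRegimen("CRRL-A", FEC, (21, (1,)), (500, 100, 500)),
    CodifiedRegimen("CRRL-B", FEC, (21, (1,)), (500, 75, 500)),   # invented variant
]
hospital = CodifiedRegimen("HOSP-1", FEC, (21, (1,)), (500, 100, 500))
summary, details = match_regimen(hospital, library)
```

Because each candidate keeps its own scenario tag, usage data linked to a partially matched regimen can still be selected by drug combination first and then by the cycle and dose flags.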
Using the algorithm prior to feasibility testing
Prior to feasibility testing, the IQVIA team used a spreadsheet to manually codify and align each of the reference regimens to a common data model in terms of structure, reference vocabularies, and units of measure. Figures 2 and 3 show a simple and a complex example reference regimen in the original formats, and tables 2 and 3 show them in their codified forms.
Figure 2.
Example of reference regimen (FEC100) in its original format published by Onco Normandie Réseau Régional de Cancérologie.13 Reprinted with permission from Reseau Onco Normandie.
Figure 3.
Example of reference regimen (DA-EPOCH) in its original format published in Vademecum hematologie, Amsterdam UMC.14 Reprinted with permission from Amsterdam UMC.
Table 2.
Example reference regimen (FEC100) codified and aligned to the reference library database
| Row | International non-proprietary name (INN) of drug | Dose quantity | Dose unit | Route of admin | Cycle numbers | Cycle durations (days) | Cycle admin day numbers | Cycle admin frequency per day |
| 1 | Fluorouracil | 500 | mg/m2 | Continuous Intravenous | 1,2,3,4,5,6 | 21,21,21,21,21,21 | 1,1,1,1,1,1 | 1,1,1,1,1,1 |
| 2 | Epirubicin | 100 | mg/m2 | Continuous Intravenous | 1,2,3,4,5,6 | 21,21,21,21,21,21 | 1,1,1,1,1,1 | 1,1,1,1,1,1 |
| 3 | Cyclophosphamide | 500 | mg/m2 | Continuous Intravenous | 1,2,3,4,5,6 | 21,21,21,21,21,21 | 1,1,1,1,1,1 | 1,1,1,1,1,1 |
Table 3.
Example reference regimen (DA-EPOCH-R) codified and aligned to the reference library database for dose level 1*
| Row | INN of drug | Dose quantity | Dose unit | Route of admin | Cycle numbers | Cycle durations (days) | Cycle admin day numbers | Cycle admin frequency per day |
| 1 | Prednisolone | 60 | mg/m2 | Oral | 1,2,3,4,5,6 | 21,21,21,21,21,21 | 1–5,1–5,1–5,1–5,1–5,1–5 | 2,2,2,2,2,2 |
| 2 | Rituximab | 375 | mg/m2 | Intravenous | 1,2,3,4,5,6 | 21,21,21,21,21,21 | 1,1,1,1,1,1 | 1,1,1,1,1,1 |
| 3 | Doxorubicin | 10 | mg/m2 | Continuous Intravenous | 1,2,3,4,5,6 | 21,21,21,21,21,21 | 1–4,1–4,1–4,1–4,1–4,1–4 | 1,1,1,1,1,1 |
| 4 | Vincristine | 0.4 | mg/m2 | Continuous Intravenous | 1,2,3,4,5,6 | 21,21,21,21,21,21 | 1–4,1–4,1–4,1–4,1–4,1–4 | 1,1,1,1,1,1 |
| 5 | Etoposide | 50 | mg/m2 | Continuous Intravenous | 1,2,3,4,5,6 | 21,21,21,21,21,21 | 1–4,1–4,1–4,1–4,1–4,1–4 | 1,1,1,1,1,1 |
| 6 | Cyclophosphamide | 750 | mg/m2 | Intravenous | 1,2,3,4,5,6 | 21,21,21,21,21,21 | 5,5,5,5,5,5 | 1,1,1,1,1,1 |
*Subsequent cycles will be recognisable in a clinical anonymised database, based on the fact that each patient starts at dose level 1.
INN, International non-proprietary name.
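The codified tables store per-cycle values as comma-separated lists, with administration days written either as single days or as ranges. As a minimal sketch of how such cells might be normalised before matching, the helpers below expand day ranges and parse dose quantities. The function names are hypothetical, and the tolerance for a plain hyphen in place of an en dash and for a decimal comma is an assumption about how hospital source data may vary.

```python
import re
from typing import List


def expand_days(token: str) -> List[int]:
    """Expand '1–5' (en dash or hyphen) or a single day such as '5'."""
    parts = re.split(r"[–-]", token.strip())
    if len(parts) == 1:
        return [int(parts[0])]
    start, end = int(parts[0]), int(parts[1])
    return list(range(start, end + 1))


def parse_cycle_days(cell: str) -> List[List[int]]:
    """Parse a 'Cycle admin day numbers' cell into one day list per cycle."""
    return [expand_days(token) for token in cell.split(",")]


def parse_dose(cell: str) -> float:
    """Parse a dose quantity, accepting a decimal comma or a decimal point."""
    return float(cell.strip().replace(",", "."))


prednisolone_days = parse_cycle_days("1–5,1–5,1–5,1–5,1–5,1–5")
cyclophosphamide_days = parse_cycle_days("5,5,5,5,5,5")
```

Normalising to explicit day lists means that two regimens written as ‘1–4’ and ‘1,2,3,4’ in different source systems would compare as equal in the cycle-pattern stage.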
Next, the IQVIA team used a copy of the CRRL, loaded into the same spreadsheet workbook, and systematically applied the rules of the algorithm to match each reference regimen against master regimens in the CRRL. The match results were recorded against each reference regimen for the respective stages of the algorithm (as per table 1), along with the unique identifiers of the CRRL regimens that were fully or partially matched.
Testing feasibility of the algorithm
The ESOP Working Group split into two teams of three, where one team included the IQVIA oncologist who hadn’t been involved prior to the feasibility testing. Each team selected a different set of ‘challenging’ or complex reference regimens in their original format (12 in total, identified in Annex A) from the sample set, then iteratively compared them against the master CRRL regimens in a spreadsheet to locate potential matches.
Results and discussion
The reference set of samples that was selected following the inclusion criteria consisted of 47 oncology regimens, depicted in Annex A. From this set, the ESOP experts selected 12 regimens to manually test against the algorithm. The ESOP experts corroborated the results of the IQVIA matching algorithm in all 12 cases, by identifying the exact same reference from the CRRL database.
Through further discussion, the algorithm rules and logic were validated with regard to what defines the core characteristics of a regimen and how to compare similarities and differences. Initially, the Working Group took an ‘exact match’ approach, not only regarding the drugs used in combination, but also the cycle durations, drug administration days, frequency per administration day, routes of administration, dose values, and dose basis (typically mg, mg per kg body weight, or mg per m2 of body surface area).
While analysing the data, the Group made the following observations about the reference library and the matching process:
- Regimens based on the same chemotherapy doublets or triplets tend to fall into sets where the dose basis for one or more of the drugs takes discrete values within an expected range, for example, FEC50, FEC75, FEC100.
- In some cases, the dose basis is expressed as a range rather than a discrete value, for example, docetaxel 60–100 mg/m2.
- In some regimens, notably for gastro-intestinal cancers, folinic acid is used to augment the anti-neoplastic action of fluorouracil (5FU), rather than to counter side effects, so exception rules are required to match folinic acid in this context.
- Similarly, steroids should be included in matches when used as antineoplastic agents for specific indications, for example,
  - Dexamethasone in doses of 21 mg or more for a haematological cancer.
  - Prednisolone for prostate cancer.
- Regimens for haematological malignancies can be complex, involving multiple dose variations, escalations/adjustments, and alternating or overlapping regimens, for example,
  - Dose-adjusted EPOCH-R for Burkitt’s lymphoma involves dose adjustment based on tolerability. The reference regimens for the adjusted doses were not yet present in the CRRL, but the starting dose level was an exact match.
  - A pair of hyper-CVAD reference regimens are titled ‘Cycle A’ and ‘Cycle B’ and must be administered in parallel, but with non-overlapping cycles (ie, A1, B1, A2, B2, A3, B3, A4, B4). The combined regimens were matched to a single regimen in the CRRL.
- The question of whether to match on infusion time was raised, given that it wasn’t consistently available in reference regimens or CRRL definitions. It was agreed to match on nominative descriptions of delivery rate, such as ‘intravenous bolus’ and ‘continuous intravenous’.
- A proportion of the reference regimens were assessed as ‘non-matched’ to the CRRL because the reference regimen defined a specific number of cycles (suggesting either a minimum or maximum) whereas those in the CRRL were ‘open-ended’, or vice versa.
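The exception cases observed for folinic acid and steroids could be expressed as simple contextual checks applied before the drug-combination stage. The sketch below is one hypothetical encoding of the three examples given above and nothing more: the function name `exception_applies`, its parameters, and the `cancer_group` labels are assumptions, and the only threshold used is the 21 mg dexamethasone figure stated in the observations.

```python
from typing import Iterable, Optional


def exception_applies(inn: str,
                      co_drugs: Iterable[str] = (),
                      dose_mg: Optional[float] = None,
                      cancer_group: Optional[str] = None) -> bool:
    """Return True if a normally supportive drug should be matched as anti-neoplastic.

    Encodes only the three examples from the observations; any other drug
    returns False, meaning no exception rule applies to it.
    """
    name = inn.strip().lower()
    partners = {d.strip().lower() for d in co_drugs}
    if name == "folinic acid":
        # Counts when it augments fluorouracil in the same regimen.
        return "fluorouracil" in partners
    if name == "dexamethasone":
        # Counts at 21 mg or more for a haematological cancer.
        return (cancer_group == "haematological"
                and dose_mg is not None
                and dose_mg >= 21)
    if name == "prednisolone":
        return cancer_group == "prostate"
    return False


folinic_with_5fu = exception_applies("Folinic acid", co_drugs=["Fluorouracil"])
low_dose_dex = exception_applies("Dexamethasone", dose_mg=8, cancer_group="haematological")
```

Keeping such rules in one place makes it easier for expert reviewers to audit which contextual exceptions are in force and to extend them as new cases are agreed.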
In general, the following criteria were validated for adding new reference regimens to the CRRL, via expert review:
- The regimen is not fully matched to an existing CRRL entry (self-evident); and
- The regimen definition is:
  - Confirmed to have been used in normal clinical practice, that is, not solely in a clinical trial, and not solely existing in a list of unused hospital reference regimens; or
  - Published by a recognised authority at regional (sub-national), national, or international level; or
  - Specified in the summary of product characteristics for the marketing authorisation of a drug in a given geography, including but not limited to those of the US Food and Drug Administration and the European Medicines Agency.
- The regimen is structured and defined to a level of data completeness such that it could conceivably be applied in practice by other healthcare organisations.
Improving the CRRL and algorithm rules
Based on the observations above, the experts agreed the following proposals for fine-tuning the reference library structure, and refining the rules of the algorithm to handle exception cases:
- Adapt the structure to contain parent regimens with dose ranges.
- Define rules which detect cases where one or more reference regimens should be matched as a superset, especially those designed to be administered in parallel.
- Extend the library structure to cater for dose escalation and de-escalation from one cycle to the next, allowing for conditional rules and reference scales based on diagnostic results (such as neutrophil count).
- Define the contextual rules for classifying and matching particular drugs as anti-neoplastic agents vs supportive agents.
- Specify when a regimen in the library requires the number of cycles to be defined as a minimum or maximum as opposed to being unspecified (‘open-ended’).
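Two of these proposals, parent regimens with dose ranges and explicit cycle limits, lend themselves to a small illustration. The sketch below is hypothetical: `DoseRange` and `CycleLimit` are names invented for this example, the docetaxel range is the one quoted in the observations, and treating an unset limit as ‘open-ended’ is an assumption about how the library might represent it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DoseRange:
    """Dose range held on a parent regimen (hypothetical structure)."""
    low: float
    high: float
    unit: str

    def contains(self, quantity: float, unit: str) -> bool:
        # The dose basis must agree before quantities are compared.
        return unit == self.unit and self.low <= quantity <= self.high


@dataclass(frozen=True)
class CycleLimit:
    """Minimum and maximum number of cycles; None means unspecified."""
    minimum: Optional[int] = None
    maximum: Optional[int] = None

    def allows(self, n_cycles: int) -> bool:
        if self.minimum is not None and n_cycles < self.minimum:
            return False
        if self.maximum is not None and n_cycles > self.maximum:
            return False
        return True


docetaxel = DoseRange(60, 100, "mg/m2")   # range quoted in the observations
open_ended = CycleLimit()                 # no limits set
up_to_six = CycleLimit(maximum=6)
```

With structures like these, a reference regimen that specifies six cycles would no longer be reported as non-matched against an open-ended library entry purely because of the cycle count.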
Strengths and limitations of the pilot study
While prior approaches have been designed to work with proprietary databases,4 8–10 the current study aimed to test an approach to the structured modelling of regimens in use in clinical databases anywhere in Europe from any hospital information system, and using a classification algorithm to match them at scale to regimens in a standardised master database. This concept is more likely to be of use in the real world, but also more complex than the previous approaches.
A key strength of the project is the inclusion of pharmacists who brought real-world regimens from five different countries. To our knowledge, this is the first attempt to collate real-world oncology drug treatment data across borders and languages. Prior to feasibility testing, IQVIA had applied the algorithm to a set of 47 real-world reference regimens, selected using criteria for diversity. A limitation is that the expert Working Group tested only 12 of these, chosen as the regimens deemed most difficult to codify. Future studies using a larger sample set are warranted.
The effectiveness of the algorithm is dependent on the completeness and quality of the reference regimens supplied by individual hospitals. Certain topics do require further investigation, such as defining rules for the limits of dose ranges, dose-escalation and de-escalation, interchangeability of families of drugs (eg, based on class of therapy such as ‘taxane’), and a harmonised regimen naming system.
Future algorithm development
Because oncology is a rapidly changing area of healthcare, maintaining an adequate reference library will be time consuming. In addition, chemotherapy is not always a stand-alone treatment modality; for example, many combination treatments with radiotherapy are in use. Useful future developments would therefore be an algorithm that incorporates combinations with non-chemotherapy interventions, and one that uses machine learning to detect regimen patterns in treatment data.11 12
Summary and conclusion
Currently, no unified reference source and data standard exists for oncology regimens used in Europe. We involved expert pharmacists from multiple countries in this study, who were able to validate an algorithm built for matching oncology regimens from real-life sources to the reference database. This will enable large-scale analyses of trends and variations of anti-cancer therapies used to treat patients with cancer throughout Europe.
What this paper adds.
What is already known on this subject
Standardising regimen nomenclature and codification in oncology can aid in analysing real-world big data and can also lead to prescription error reduction. Oncology regimens generally consist of multiple drugs that are administered in combination or sequentially, over multiple days and in a cyclic manner. This makes standardisation or codification more complex. In Europe, a multitude of local or regional thesauri exist, but there is no gold standard or reference standard available.
What this study adds
To enable comparison of treatment delivery on a European level, a reference library and an algorithm that can match oncology regimens was developed and validated. This approach will allow the use of real-world big data in oncology care.
ejhpharm-2021-002763supp001.pdf (35.5KB, pdf)
Footnotes
Contributors: Conceptualisation: RT, AW, JL. Methodology: RT, AF, RB, AW, JL. Formal analysis and investigation: all authors. Writing and editing of the paper: RB, MC. Reviewing and editing of the paper: all authors. Supervision: JL.
Funding: This work was fully supported by IQVIA World Publications Limited. A fixed fee was paid to ESOP. IQVIA, formerly Quintiles and IMS Health, Inc., is an American multinational human data science company based in Durham, NC, serving the combined industries of health information technology and clinical research. It launched the Collaboration for Oncology Data in Europe (CODE), which aimed to harness the power of data through a pioneering large-scale network of cancer treatment centres, the Oncology Data Network (ODN). As a member of the European Cancer Organisation (ECO), the European Society of Oncology Pharmacy (ESOP) provided the oncology pharmacy expertise.
Competing interests: None declared.
Provenance and peer review: Not commissioned; externally peer reviewed.
Supplemental material: This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
Data availability statement
Data are available on reasonable request and based on the data use agreement policies of Amsterdam UMC hospital. Data from this study are available on request by sending an email message to the corresponding author.
Ethics statements
Patient consent for publication
Not required.
Ethics approval
Formal ethical approval for this study was deemed not necessary since no data of subjects were incorporated into the study.
References
- 1. Shulman LN, Miller RS, Ambinder EP, et al. Principles of safe practice using an oncology EHR system for chemotherapy ordering, preparation, and administration, part 1 of 2. J Oncol Pract 2008;4:203–6. doi:10.1200/JOP.0847501
- 2. Shulman LN, Miller RS, Ambinder EP, et al. Principles of safe practice using an oncology EHR system for chemotherapy ordering, preparation, and administration, part 2 of 2. J Oncol Pract 2008;4:254–7. doi:10.1200/JOP.0857501
- 3. Malty AM, Jain SK, Yang PC, et al. Computerized approach to creating a systematic ontology of hematology/oncology regimens. JCO Clin Cancer Inform 2018;2:1–11. doi:10.1200/CCI.17.00142
- 4. Warner JL, Cowan AJ, Hall AC, et al. HemOnc.org: a collaborative online knowledge platform for oncology professionals. J Oncol Pract 2015;11:e336–50. doi:10.1200/JOP.2014.001511
- 5. Poortmans P, Banks I, Levy J. Understanding pragmatic outcome measures in oncology, 2018. Available: https://www.ecco-org.eu/-/media/Documents/ECCO-sections/Policy/Projects/CODEECCOreportchosenFINAL.pdf?la=en&hash=E69F71A95315B0353DB25288BD103E0137A45DC6 [Accessed Mar 2020].
- 6. Kerr D, Arnold D, Blay J-Y, et al. The oncology data network (ODN): a collaborative European data-sharing platform to inform cancer care. Oncologist 2020;25:e1–4. doi:10.1634/theoncologist.2019-0337
- 7. Woolmore A, Arnold D, Blay J-Y, et al. The oncology data network (ODN): methodology, challenges, and achievements. Oncologist 2020;25:e1428–32. doi:10.1634/theoncologist.2019-0855
- 8. Warner JL, Dymshyts D, Reich CG, et al. HemOnc: a new standard vocabulary for chemotherapy regimen representation in the OMOP common data model. J Biomed Inform 2019;96:103239. doi:10.1016/j.jbi.2019.103239
- 9. Kohler DR, Montello MJ, Green L, et al. Standardizing the expression and nomenclature of cancer treatment regimens. American Society of Health-System Pharmacist (ASHP), American Medical Association (AMA), American Nurses Association (ANA). Am J Health Syst Pharm 1998;55:137–44. doi:10.1093/ajhp/55.2.137
- 10. Bright CJ, Lawton S, Benson S, et al. Data resource profile: the systemic anti-cancer therapy (SACT) dataset. Int J Epidemiol 2020;49:15–15l. doi:10.1093/ije/dyz137
- 11. Rubinstein SM, Yang PC, Cowan AJ, et al. Standardizing chemotherapy regimen nomenclature: a proposal and evaluation of the HemOnc and National Cancer Institute Thesaurus regimen content. JCO Clin Cancer Inform 2020;4:60–70. doi:10.1200/CCI.19.00122
- 12. Yang S, Wei R, Guo J, et al. Semantic inference on clinical documents: combining machine learning algorithms with an inference engine for effective clinical diagnosis and treatment. IEEE Access 2017;5:3529–46. doi:10.1109/ACCESS.2017.2672975
- 13. Onco Normandie Réseau Régional de Cancérologie. Thésaurus régional harmonisé des protocoles de chimiothérapie Sein, 2013. Available: https://onconormandie.fr/qualite/protocoles-regionaux-de-chimiotherapies/ [Accessed 8 Aug 2019].
- 14. Amsterdam UMC, VUmc. “Vademecum hematologie” [Online]. Available: https://vademecum.hematologie.nl/ [Accessed 8 Aug 2019].