Abstract
Cancer patients are often treated with multiple sequential chemotherapy protocols ranging in complexity from simple to highly complex patterns of multiple repeating drugs. Clinical documentation procedures that focus on details of single drug events, however, make it difficult for providers and systems to efficiently abstract the sequence and nature of treatment protocols. We have developed a data driven method for cancer treatment plan recognition that takes as input pharmacy chemotherapy dispensing records and produces the sequence of identified chemotherapy protocols. Compared to a manually annotated gold standard, our method was 75% accurate and 80% precise for a breast cancer testing set (110 patients, 2,029 drug events), and 54% accurate and 63% precise for a lung cancer testing set (53 patients, 670 drug events). This method for cancer treatment plan recognition may provide clinicians and systems an abstracted view of the patient’s treatment history.
Introduction:
Given the progressive nature of the disease, cancer patients are often treated with multiple sequential chemotherapy treatment protocols. The treatment protocols range in complexity from a single agent given in a repeating pattern to a combination of agents given in a complex repeating pattern. An example of a simple plan is PACLITAXEL given every 3 weeks over 3 months, or TRASTUZUMAB given every 3 weeks over 10 months. An example of a more complex plan is a 3-day pattern of CISPLATIN and ETOPOSIDE repeated every 21 days, where CISPLATIN and ETOPOSIDE are given together on day 1, and then ETOPOSIDE is repeated alone on days 2 and 3. Physicians think of the prescribed medications abstractly, as a “plan” to be followed by a patient; the corresponding prescriptions, however, are recorded as discrete transactions in the electronic health record (EHR) system. When a physician wants to review the treatment plan during an encounter with a patient, he or she would like to see the plan in the same abstract form in which it was originally conceived. Figure 1 illustrates the simple and complex plans described above, as a provider may have them in mind while treating a cancer patient.
Figure 1:
Abstract view of chemotherapy protocols.
Clinical documentation procedures for chemotherapy ordering, dispensing and administration focus on the specific details of a single drug event, and these events are recorded as individual time-stamped transactions (Table 1). The discrete transactional nature of medication events makes it difficult for providers and systems to efficiently abstract the sequence and nature of treatment protocols. Chemotherapy flow sheets derived from such transactional data, as shown in Table 2, provide some clustering of medication events by drug class and date, and can assist clinicians in the plan recognition task. Clinicians, however, continue to devise workarounds in their documentation to record an abstract version of the plans, such as the one summarized in Table 3. This presents an opportunity for an automated plan recognition method that detects the sequence and patterns of chemotherapy plans from time-stamped medical transaction records. The task is to translate the temporal information into patterns that correspond to the clinician’s original concept of the medication plan.
Table 1:
Medications as listed in Pharmacy Dispensing Records
| Patient Number | Drug Name | Dose | Frequency | Administration Date |
|---|---|---|---|---|
| 53 | PEMETREXED DISODIUM | 800 MG | ONCE | 2007-11-21 16:02:31 |
| 53 | CARBOPLATIN | 360 MG | ONCE | 2007-11-21 16:03:16 |
| 53 | CARBOPLATIN | 376 MG | ONCE | 2007-12-12 16:04:00 |
| 53 | PEMETREXED DISODIUM | 800 MG | ONCE | 2007-12-12 16:04:27 |
| 53 | PEMETREXED DISODIUM | 800 MG | ONCE | 2008-01-04 17:40:08 |
| 53 | CARBOPLATIN | 377 MG | ONCE | 2008-01-04 17:41:38 |
| 53 | CARBOPLATIN | 355 MG | ONCE | 2008-01-30 14:39:49 |
| 53 | PEMETREXED DISODIUM | 800 MG | ONCE | 2008-01-30 14:40:31 |
| 53 | PEMETREXED DISODIUM | 800 MG | ONCE | 2008-02-27 19:14:11 |
| 53 | CARBOPLATIN | 350 MG | ONCE | 2008-02-27 19:15:17 |
| 53 | PEMETREXED DISODIUM | 785 MG | ONCE | 2008-03-21 16:20:51 |
| 53 | CARBOPLATIN | 370 MG | ONCE | 2008-03-21 16:21:44 |
| 53 | PEMETREXED DISODIUM | 820 MG | ONCE | 2008-11-19 18:51:54 |
| 53 | PEMETREXED DISODIUM | 745 MG | ONCE | 2008-12-10 13:50:07 |
| 53 | CARBOPLATIN | 330 MG | ONCE | 2008-12-31 17:33:06 |
| 53 | PEMETREXED DISODIUM | 745 MG | ONCE | 2008-12-31 17:33:53 |
| 53 | BEVACIZUMAB | 330 MG | ONCE | 2008-12-31 17:34:32 |
| 53 | CARBOPLATIN | 480 MG | ONCE | 2009-01-21 18:47:08 |
| 53 | PEMETREXED DISODIUM | 740 MG | ONCE | 2009-01-21 18:47:51 |
| 53 | BEVACIZUMAB | 330 MG | ONCE | 2009-01-21 18:51:27 |
Table 2:
Example chemotherapy flow sheet of chemotherapy administration events, as sometimes shown in an electronic health record.
| Date | Bevacizumab mg IV (administration time) | Carboplatin mg IV (administration time) | Pemetrexed mg IV (administration time) |
|---|---|---|---|
| 11/21/2007 | | 360 (16:03) | 800 (16:02) |
| 12/12/2007 | | 376 (16:04) | 800 (16:04) |
| 01/04/2008 | | 377 (17:41) | 800 (17:40) |
| 01/30/2008 | | 355 (14:39) | 800 (14:40) |
| 02/27/2008 | | 350 (19:15) | 800 (19:14) |
| 03/21/2008 | | 370 (16:21) | 785 (16:20) |
| 11/19/2008 | | | 820 (18:51) |
| 12/10/2008 | | | 745 (13:50) |
| 12/31/2008 | 330 (17:34) | 330 (17:33) | 745 (17:33) |
| 01/21/2009 | 330 (18:51) | 480 (18:47) | 740 (18:47) |
Table 3:
A sequence of abstracted medication plans for patient 53.
| Sequence Number | Medication Plan | Start Date | Start Day | Day Series | Normalized Day Series | Dose Count | Frequency (days) |
|---|---|---|---|---|---|---|---|
| 1 | CARB+PEME | 11/21/2007 | 1 | 1, 22, 45, 71, 99, 122 | 1, 22, 45, 71, 99, 122 | 6 | 24.2 |
| 2 | PEME | 11/19/2008 | 365 | 365, 386 | 1, 22 | 2 | 21 |
| 3 | BEVA+CARB+PEME | 12/31/2008 | 407 | 407, 428 | 1, 22 | 2 | 21 |
Background:
Knowledge-based temporal abstraction (KBTA) methods have been developed in the past. KBTA methods use a knowledge base of predefined patterns as a reference and find similar patterns in the data. Y. Shahar’s ‘Framework for knowledge-based temporal abstraction’1, for example, breaks the temporal abstraction (TA) task into five subtasks. These subtasks ‘assume a certain structure of time and of the proposition that can be interpreted over time’. Thus there is ‘a task specific TA ontology – a theory of what entities, relations and properties exist in any particular domain from the point of view of the TA task’ and the subtasks depend on ‘four domain specific knowledge types – structural, classification, temporal semantic and temporal dynamic’. The ‘Temporal pattern matching’ subtask then attempts to match patterns from the reference ontology to those that may exist in the data to extract TA patterns.
ChronoMiner2, by Raj et al., is another ontology-driven method that mines temporal patterns based on a predefined hierarchical structure. This method uses a domain-specific ontology of hierarchies and attempts to find similar structures in relational data tables.
Both of these methods, and others like them, use a curated knowledge base (e.g. a structured ontology) as a reference to mine similar abstractions from data. Here we develop a data-driven method to extract chemotherapy plans as an alternative that avoids the need to curate and maintain a reference knowledge base.
Methods:
Data collection:
Several sources can provide the medication transactions that the method takes as input to produce medication plans as output. Physician notes are a relatively poor source of medication transactions, because they are unstructured and extracting transactions from them with natural language processing (NLP) is prone to various types of errors. Provider orders may be a reliable source but may not accurately represent the patient’s medication, because provider orders do not correspond one-to-one to medication transactions. Pharmacy records do correspond one-to-one to medication transactions. Medications dispensed, however, may not represent medications administered, especially in the outpatient setting. In the inpatient setting, pharmacy dispensing data would be very accurate. The only better source would be the nurse administration record, because pharmacy records are prone to issues such as transaction reversals.
Because well-defined, anonymized nurse administration records were unavailable, pharmacy dispensing records were used as input to the plan recognition method, as a surrogate for medication administration events. Pharmacy dispensing records of the drugs ordered by 3 medical oncologists (2 lung cancer specialists and 1 breast cancer specialist) were used to identify a cohort of patients who were likely being treated for lung or breast cancer. All pharmacy dispensing records for this patient cohort were extracted from the pharmacy information system, spanning calendar dates from January 2007 through September 2010. Each record contained the drug name, dispensing timestamp, dose, frequency, and prescribing clinician.
The initial dataset contained over 200,000 records for over 1200 unique drugs. A practicing medical oncologist identified the subset of approximately 80 unique anti-neoplastic drugs from this list (oral medications excluded). The initial dataset was then filtered to eliminate all but the anti-neoplastic medications. The final dataset extracted from the Vanderbilt hospital pharmacy system contained 16,800 de-identified chemotherapy pharmacy-dispensing records. For our training set, we used a smaller dataset of 414 pharmacy-dispensing records from a cohort of 37 lung cancer patients (Table 4). We then tested the method using 670 pharmacy-dispensing records for a second cohort of 53 lung cancer patients and 2,029 pharmacy-dispensing records for a cohort of 110 breast cancer patients. A medical oncologist reviewed and annotated the pharmacy dispensing records for each of the 200 patients in the training and testing sets identifying the sequence, name, start and stop date of each chemotherapy plan for the patient.
Table 4:
Number of pharmacy dispensing records and patients in the training and testing set cohorts
| | Number of Patients: Lung Cancer | Number of Patients: Breast Cancer | Number of Pharmacy Dispensing Records |
|---|---|---|---|
| Training Set | 37 | | 414 |
| Testing Set (lung cancer) | 53 | | 670 |
| Testing Set (breast cancer) | | 110 | 2,029 |
| Total | 90 | 110 | 3,113 |
Pre-processing of data:
Because of limits on the quantity that can be dispensed at once, some drug doses were split into two or more dispensing records of the same drug. Although the records indicate multiple dispensing events, they ought to be recognized as a single event. After some trial and error, we found that such ‘splitting’ of a dispensing event may span up to 4 hours. We therefore aggregated the doses of drug events within a 4-hour span that were prescribed by the same physician for the same patient, had the same drug name, and were to be administered by the same route. We assigned the earliest timestamp of the original set of split records to the single aggregated record.
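The aggregation rule can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the `Dispense` record, its field names, the physician identifier and the split doses in the example are hypothetical, and the sketch assumes the 4-hour window is measured from the earliest record of a group.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Dispense:
    patient: int
    physician: str
    drug: str
    route: str
    dose_mg: float
    when: datetime


WINDOW = timedelta(hours=4)  # span over which split doses were observed


def aggregate_split_doses(records):
    """Merge records sharing patient, physician, drug and route that fall
    within WINDOW of the earliest record of their group."""
    merged = []
    latest = {}  # key -> index of the most recent merged record for that key
    for rec in sorted(records, key=lambda r: r.when):
        key = (rec.patient, rec.physician, rec.drug, rec.route)
        idx = latest.get(key)
        if idx is not None and rec.when - merged[idx].when <= WINDOW:
            # Same event: add the dose, keep the earliest timestamp.
            merged[idx].dose_mg += rec.dose_mg
        else:
            # New event: store a copy so the input records stay unchanged.
            merged.append(Dispense(rec.patient, rec.physician, rec.drug,
                                   rec.route, rec.dose_mg, rec.when))
            latest[key] = len(merged) - 1
    return merged


# Hypothetical example: one dose split into two records, then a later cycle.
records = [
    Dispense(53, "DR_A", "CARBOPLATIN", "IV", 200.0, datetime(2007, 11, 21, 16, 3)),
    Dispense(53, "DR_A", "CARBOPLATIN", "IV", 160.0, datetime(2007, 11, 21, 17, 30)),
    Dispense(53, "DR_A", "CARBOPLATIN", "IV", 376.0, datetime(2007, 12, 12, 16, 4)),
]
aggregated = aggregate_split_doses(records)
```

Because the records are processed in chronological order, the merged record automatically carries the earliest timestamp of its group.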
Plan Recognition Method:
Our data-driven plan recognition method takes as input the chemotherapy dispensing records from the pharmacy information system and generates as output a sequence of abstracted medication plans.
Step-1: For each patient, we arrange the medication records in chronological order and assign each record a day number relative to the earliest date in that patient’s records, the earliest date being day 1.
Step-2: If two or more drugs are dispensed on the same day (irrespective of the exact timestamp), we concatenate the drug names. This step creates a new dataset that has no more than one record per date for each patient, with drug names concatenated if there are several on that date. For brevity, we abbreviate each drug name to its first 4 characters.
Step-3: We then process the dataset created in step 2, ordered by patient and date (or day number, as assigned in step 1). For each patient, we construct the sequence of days on which the same drug, or the same combination of drugs, is taken in continuous succession. This process computes the ‘actual’ rate at which each drug or combination is administered by calculating the days between successive records. Each such repetitive set of the same drug (or same combination of drugs) is a ‘preliminary’ recognition of a plan. In addition to calculating the average rate of each preliminary plan, this step also evaluates the duration of each preliminary plan and the gap between successive preliminary plans. Figure 2 shows a pictorial version of these steps.
Step-4: Having computed the average frequency, the duration, and the gap (in days) between successive preliminary plans, this final step tries to identify patterns among these plans and checks whether these patterns repeat. If it detects a repeating pattern, the process computes the number of cycles of the pattern (the number of times it is repeated). This identifies the compound nature of the plan: the group of drugs, the rate at which individual drugs are taken within the group, and how many times, if at all, the whole group or pattern is repeated.
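Steps 1 to 3 can be sketched in a few lines. This is an illustration under stated assumptions, not the authors’ code: the function names and dictionary keys are hypothetical, drug abbreviations are joined in alphabetical order (the ordering seen in Table 3, though the text does not state it), and a preliminary plan is taken to be a run of consecutive days carrying the same label. The input reuses the dispensing dates of patient 53 from Table 1.

```python
from datetime import date
from itertools import groupby


def label_days(events):
    """Steps 1-2: number days from the patient's first record (day 1) and
    join the 4-character abbreviations of drugs dispensed on the same day."""
    first = min(d for d, _ in events)
    by_day = {}
    for d, drug in events:
        by_day.setdefault((d - first).days + 1, set()).add(drug[:4])
    return [(day, "+".join(sorted(drugs))) for day, drugs in sorted(by_day.items())]


def preliminary_plans(labelled_days):
    """Step 3: group consecutive days with the same label and compute the
    average days between doses, the duration and the gap to the prior plan."""
    plans = []
    for label, run in groupby(labelled_days, key=lambda item: item[1]):
        days = [day for day, _ in run]
        gaps = [b - a for a, b in zip(days, days[1:])]
        plans.append({
            "plan": label,
            "start_day": days[0],
            "days": days,
            "normalized": [d - days[0] + 1 for d in days],
            "dose_count": len(days),
            "frequency": round(sum(gaps) / len(gaps), 1) if gaps else None,
            "duration": days[-1] - days[0],
            "gap_before": days[0] - plans[-1]["days"][-1] if plans else None,
        })
    return plans


def _events(iso_dates, drugs):
    return [(date.fromisoformat(d), drug) for d in iso_dates for drug in drugs]


# Dispensing dates of patient 53, taken from Table 1 (times of day dropped).
PATIENT_53 = (
    _events(["2007-11-21", "2007-12-12", "2008-01-04",
             "2008-01-30", "2008-02-27", "2008-03-21"],
            ["CARBOPLATIN", "PEMETREXED DISODIUM"])
    + _events(["2008-11-19", "2008-12-10"], ["PEMETREXED DISODIUM"])
    + _events(["2008-12-31", "2009-01-21"],
              ["BEVACIZUMAB", "CARBOPLATIN", "PEMETREXED DISODIUM"])
)

plans_53 = preliminary_plans(label_days(PATIENT_53))
```

For this input the sketch yields three preliminary plans whose names, start days, day series, dose counts and frequencies agree with the rows of Table 3. It does not split a run when the interval between two doses of the same drug becomes very long; whether and how the original method does so is not described.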
Figure 2:
Steps involved in plan recognition method.
This last step, like the step for preliminary plans, also calculates the duration of each compound plan and the gap between successive compound plans. Steps 3 and 4 give the output shown in Table 3.
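The description of step 4 leaves the matching rule open, so the following is only one possible reading: a compound plan is assumed to be the shortest sequence of preliminary plans (compared by drug label and dose count) that repeats exactly, at least twice, across the whole list. The function name, the tuple layout and the example data are hypothetical; the example encodes the cisplatin and etoposide plan from the Introduction.

```python
def find_repeating_pattern(prelim):
    """prelim: chronological list of (label, start_day, dose_count) tuples.

    Returns the shortest motif that tiles the whole list at least twice,
    with its cycle count and mean cycle length in days, or None.
    """
    signatures = [(label, count) for label, _, count in prelim]
    n = len(signatures)
    for m in range(1, n // 2 + 1):
        tiles = n % m == 0 and all(
            signatures[i] == signatures[i % m] for i in range(n)
        )
        if tiles:
            # Start day of each repetition of the motif.
            starts = [prelim[i][1] for i in range(0, n, m)]
            gaps = [b - a for a, b in zip(starts, starts[1:])]
            return {
                "motif": signatures[:m],
                "cycles": n // m,
                "cycle_days": sum(gaps) / len(gaps),
            }
    return None


# Hypothetical preliminary plans: CISP+ETOP on day 1, ETOP alone on days 2-3,
# with the group repeated every 21 days for three cycles.
CISP_ETOP = [
    ("CISP+ETOP", 1, 1), ("ETOP", 2, 2),
    ("CISP+ETOP", 22, 1), ("ETOP", 23, 2),
    ("CISP+ETOP", 43, 1), ("ETOP", 44, 2),
]
pattern = find_repeating_pattern(CISP_ETOP)
```

Under this strict reading, a group that occurs only once, or a final cycle in which a drug is dropped, produces no match and the function returns `None`.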
Results:
We trained our method on a dataset of pharmacy dispensing records from a cohort of lung cancer patients. The training process identified a few flaws in the data, for example the dose splitting of some medications described in the pre-processing section above. We tuned the method to work around these issues. Some errors in the content of the data, however, could be neither circumvented nor fixed. Initial results were encouraging enough for us to continue development and to test the method on larger sets of chemotherapy records for lung and breast cancer.
Performance:
The method performed slightly better on the lung cancer test set than on the training set. Review of the gold standard, however, showed that the success rate could have been higher were it not for data errors and plans with dropped drugs at the end of a plan. Data errors included duplicate pharmacy records with different names in the case of investigational drugs, likely due to billing requirements. This is another issue with pharmacy dispensing data: medication names are recorded so as to comply with billing requirements, especially for investigational drugs. For example, when an investigational drug is dispensed, the corresponding transaction record may show an equivalent drug that is approved by the insurance company, which may distort the temporal series of transactions. Several plans with multiple-day dosing had changes in the pattern of drug administration during the last cycle of treatment, likely due to patient toxicity. These modified cycles were not recognized as part of the original plan because they did not match the original pattern.
The method performed even better on the breast cancer test set. We observed a recall in excess of 90%. Of all the chemotherapy drug plans it detected, 84% were correct (190 out of 226). Table 5 lists the detailed results of the method. The discussion section covers more about the observed performance.
Table 5:
Medication Events and Methodology Performance.
| | Training Set: Lung Cancer | Testing Set: Lung Cancer | Testing Set: Breast Cancer |
|---|---|---|---|
| Overall performance | | | |
| Patients | 37 | 53 | 110 |
| Pre-filter medication events | 414 | 670 | 2,029 |
| Post-filter medication events | 363 | 575 | 1,827 |
| Recall | 0.7955 | 0.7945 | 0.9235 |
| F1-Score | 0.6796 | 0.7030 | 0.8578 |
| Precision | 0.5932 | 0.6305 | 0.8009 |
| Accuracy | 0.5147 | 0.5421 | 0.7510 |
| Plan detection | | | |
| Detected Plans | 59 | 93 | 226 |
| True Plans | 42 | 73 | 190 |
| Success Rate (%) | 71.19 | 78.49 | 84.07 |
| Reasons for failure | | | |
| False Negatives | 9 | 15 | 15 |
| Data Error | 0 | 2 | 8 |
| Dropped drug at the end of plan | 0 | 8 | 4 |
| Non-repetition of pattern | 6 | 5 | 0 |
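The text does not state how the measures in Table 5 were computed. The sketch below assumes the standard definitions over plan counts, with accuracy taken as TP / (TP + FP + FN) because a plan recognition task has no natural true negatives. The function name and the example counts are assumptions: the true-positive and false-positive counts are not reported, and the values used here are hypothetical ones that are consistent with the breast cancer column (226 detected plans, 15 false negatives).

```python
def plan_metrics(tp, fp, fn):
    """Precision, recall, F1 and accuracy from plan-level counts.

    Accuracy is assumed to be TP / (TP + FP + FN), i.e. without true
    negatives; this definition is an assumption, not taken from the paper.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp / (tp + fp + fn)
    return {
        "precision": round(precision, 4),
        "recall": round(recall, 4),
        "f1": round(f1, 4),
        "accuracy": round(accuracy, 4),
    }


# Hypothetical counts: 226 detected plans split into 181 correct and
# 45 incorrect, plus the 15 false negatives listed in Table 5.
breast = plan_metrics(tp=181, fp=45, fn=15)
```

With these assumed counts the function returns the precision, recall, F1-score and accuracy listed for the breast cancer testing set. The success rate in Table 5 is a separate figure, the ratio of true plans to detected plans (190 / 226).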
Clinical Interpretations:
One of the goals of this project was to use the plan recognition method to evaluate physician practice patterns with respect to the selection and management of chemotherapy protocols. Table 6 summarizes a portion of our findings for the top 6 chemotherapy plans used by the breast cancer physician in our testing set. Among the 110 patients, there were a total of 190 true plans, of which 29 were unique. Over the 3-year period, Cyclophosphamide plus Doxorubicin was the most frequently prescribed chemotherapy protocol (n=35 patients). Cyclophosphamide plus Doxorubicin is a commonly used adjuvant chemotherapy protocol for breast cancer4. The standard of care is to prescribe these two medications every 2 weeks for 4 cycles. An analysis of the frequency and number of cycles for this patient cohort shows an average of 14.6 days between cycles and an average of 3.5 cycles, with a minimum of 1 cycle and a maximum of 4 cycles. Some patients will have toxicity with this protocol and require modification of the frequency or number of cycles. Our data show the degree to which patients in this cohort were able to complete the standard of care for this protocol, providing some indication of the relative toxicity or intolerance patients may have with that protocol.
Table 6:
Top six most frequently administered chemotherapy protocols in the breast cancer cohort (SD = Standard Deviation).
| Chemotherapy Protocol | Count of Patients with Protocol | Line of Therapy | Number of Repeating Cycles: Standard of Care | Number of Repeating Cycles: Average (Min, Max, SD) | Time between Cycles (days): Standard of Care | Time between Cycles (days): Average (Min, Max, SD) |
|---|---|---|---|---|---|---|
| Cyclophosphamide, Doxorubicin | 35 | Adjuvant | 4 | 3.5 (1, 4, 0.94) | 14 | 14.6 (13, 21.7, 1.7) |
| Paclitaxel | 31 | Adjuvant, Metastatic | 12, unlimited | 8.8 (1, 12, 3.6) | 7 | 7.5 (6.9, 10.5, 0.9) |
| Cyclophosphamide, Docetaxel | 20 | Adjuvant | 4 | 3.5 (1, 4, 1.0) | 21 | 21.8 (21, 25.7, 1.3) |
| Trastuzumab | 19 | Adjuvant, Metastatic | 1 year, unlimited | 6.9 (1, 15, 4.6) | 7, 14, 21 | 17.5 (7, 32, 7.2) |
| Cisplatin, Paclitaxel | 13 | Metastatic | unlimited | 8.1 (3, 15, 3.5) | 7 | 9.1 (7, 14.7, 2.2) |
| Carboplatin, Docetaxel, Trastuzumab | 10 | Adjuvant | 6 | 4.3 (1, 6, 1.7) | 21 | 21.6 (19.8, 27.3, 2.1) |
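Summary statistics of the kind shown in Table 6 can be derived from the recognized plans with a short aggregation. The sketch below is illustrative: the function names and input layout are hypothetical, the example values are invented, and the sample standard deviation is used because the text does not say which form of SD was reported.

```python
from collections import defaultdict
from statistics import mean, stdev


def _describe(values):
    """Return (average, min, max, SD), or None when there are no values."""
    if not values:
        return None
    sd = stdev(values) if len(values) > 1 else 0.0
    return (round(mean(values), 1), min(values), max(values), round(sd, 2))


def summarize_protocols(plans):
    """plans: iterable of (protocol, cycle_count, days_between_cycles).

    days_between_cycles is None for a plan with a single cycle.
    """
    grouped = defaultdict(list)
    for protocol, cycles, days_between in plans:
        grouped[protocol].append((cycles, days_between))
    summary = {}
    for protocol, rows in grouped.items():
        summary[protocol] = {
            "count": len(rows),
            "cycles": _describe([c for c, _ in rows]),
            "days_between": _describe([d for _, d in rows if d is not None]),
        }
    return summary


# Invented example: three recognized plans of the same protocol.
example = summarize_protocols([
    ("CYCL+DOXO", 4, 14.0),
    ("CYCL+DOXO", 4, 15.0),
    ("CYCL+DOXO", 1, None),
])
```

Comparing each protocol’s averages against its standard-of-care values then gives the kind of completion and interval deviations discussed above.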
Discussion:
Data-driven versus knowledge driven approach:
We have developed a data-driven method for chemotherapy plan recognition. Excluding data errors, the method performed very well for simple and moderately complex plans whose pattern of drug administration was not modified during the course of treatment. However, performance was diminished for complex plans that did not have a discernible pattern and for those with modifications in the later cycles (Table 5). Methods such as Shahar’s Knowledge Based Temporal Abstraction Method1 have been used to solve plan recognition problems in the oncology domain. These knowledge-based approaches require a knowledge engineer to model the set of known treatment plans before the method is run. While such an approach will often identify complex plans and those that are modified during the course of treatment, its accuracy comes at the cost of modeling and maintaining what is now a set of over 400 standard of care and thousands of experimental chemotherapy protocols. Other knowledge-driven approaches to plan recognition include the work by Das et al in the HIV domain2,3, which also requires modeling of known plans.
Our data-driven method for plan recognition is relatively simple and does not require maintenance of a knowledge source. We have demonstrated the ability to apply the method with similar performance to two oncology data sets, for lung cancer and breast cancer cohorts. The better performance on the breast cancer dataset is, we believe, due to the relative absence of non-repeating patterns and a much smaller proportion of plans with dropped drugs at the end of a true plan. Additionally, there were relatively fewer complex plans (with compound patterns). We will need to test the hypothesis of generalizability, however, outside of the oncology domain. In addition, our current training and testing sets were limited to non-oral medications, as oral anti-cancer therapies are typically ordered as outpatient prescriptions and not administered in the infusion center. Outpatient prescriptions may be for drugs taken daily but have a single prescription documentation event that may include refills for up to one year. The presence of a prescription, however, does not guarantee that the patient filled the prescription or took the medication as ordered. This class of anti-cancer drugs represents a unique and important challenge for plan recognition methods.
Implications for clinical practice and research:
A chemotherapy plan recognition method can have several clinical uses. Within clinical systems at the point of care, the output of this method can assist providers with the clinical task of treatment plan abstraction by providing a summary of the patient’s treatment history. This kind of automated treatment history abstraction is not currently available in clinical systems, and providers must review the detailed medication transaction data to perform this task. Likewise, such a method could assist with the evaluation of physician guideline compliance and protocol compliance4. We have shown with the examples in Table 6 how the output of this method can be used to derive provider-level statistics on the use of various treatment protocols and their compliance with standard of care therapy. The output could also be used for outcomes databases and comparative effectiveness research by providing information on the therapies and transitions in therapies for a patient population.
Limitations:
Although the data-driven approach has its advantages, it also has limitations. To recognize a plan, the method relies on repetition of a pattern, be it a single drug (for simple plans) or a group of drugs (for compound or complex plans). For compound plans (where a sequenced group of drugs itself repeats temporally), if the group of drugs occurred only once, the method did not recognize the group as a single plan. Because of this and a few data errors, the method did not detect the majority of the complex plans, especially in the smaller sets.
Conclusion:
New chemotherapy protocols continue to be developed and evaluated every day, requiring a flexible and easily extensible method for chemotherapy plan recognition. Existing flow sheet methods of presenting chemotherapy data in the EHR do not sufficiently provide an abstract representation of the patient’s treatment history. We believe an automated data-driven method for chemotherapy plan recognition could provide useful output for both clinical and clinical research uses. We have developed a simple method that does not require any external knowledge sources to recognize medication plans. The initial performance of our method is encouraging, and we will continue to develop and refine our approach as we incorporate new data sources, including oral medications, and new clinical domains.
Acknowledgments
The work is funded by NLM fellowship grant 5 T15 LM 7450-10.
References:
- 1. Shahar Y. A Framework for Knowledge-Based Temporal Abstraction. Artificial Intelligence. 1997;90:79–133. doi: 10.1016/0933-3657(95)00036-4.
- 2. Raj R, O’Connor MJ, et al. An ontology-driven method for hierarchical mining of temporal patterns: application to HIV drug resistance research. AMIA Symposium Proceedings. 2007:614–619.
- 3. Lin RS, Rhee SY, et al. Prediction of HIV mutation changes based on treatment history. AMIA Symposium Proceedings. 2006. p. 1011.
- 4. Citron ML, Berry DA, et al. Randomized trial of dose-dense versus conventionally scheduled and sequential versus concurrent combination chemotherapy as postoperative adjuvant treatment of node-positive primary breast cancer: first report of Intergroup Trial C9741/Cancer and Leukemia Group B Trial 9741. J Clin Oncol. 2003 Apr 15;21(8):1431–9. doi: 10.1200/JCO.2003.09.081. Epub 2003 Feb 13.


