JCO Clinical Cancer Informatics. 2024 Mar 5;8:e2300150. doi: 10.1200/CCI.23.00150

Development of an Automatic Rule-Based Algorithm for the Detection of Ovarian Cancer Recurrence From Electronic Health Records

Sanghee Lee 1,2, Ji Hyun Kim 3, Hyeong In Ha 4, Myong Cheol Lim 1,3,5,6, Hyunsoon Cho 1,7,8
PMCID: PMC10927333  PMID: 38442323

Abstract

PURPOSE

As the onset of cancer recurrence is not explicitly recorded in the electronic health record (EHR), a high volume of manual chart review is required to detect the cancer recurrence. This study aims to develop an automatic rule-based algorithm for detecting ovarian cancer (OC) recurrence on the basis of minimally preprocessed EHR data.

METHODS

The automatic rule-based recurrence detection algorithm (Auto-Recur), using notes on image reading (positron emission tomography-computed tomography [PET-CT], CT, magnetic resonance imaging [MRI]), biomarker (CA125), and treatment information (surgery, chemotherapy, radiotherapy), was developed to detect the first OC recurrence. Auto-Recur contains three single algorithms (images, biomarkers, treatments) and hybrid algorithms (combinations of the single algorithms). The performance of Auto-Recur was assessed using sensitivity, specificity, and accuracy of the recurrence time detected. The recurrence-free survival probabilities were estimated and compared with the retrospective chart review results.

RESULTS

The proposed Auto-Recur considerably reduced human resources and time; it saved approximately 1,340 days when scaled to 100,000 patients compared with the conventional retrospective chart review. The hybrid algorithm on the basis of a combination of image, biomarker, and treatment information was the most efficient (sensitivity: 93.4%, specificity: 97.4%) and precisely captured recurrence time (average time error: 8.5 days). The estimated 3-year recurrence-free survival probability (44%) was close to the estimates by the retrospective chart review (45%, log-rank P value = .894).

CONCLUSION

Our rule-based algorithm effectively captured the first OC recurrence from large-scale EHR while closely approximating the recurrence-free survival estimates obtained by conventional retrospective chart reviews. The study findings facilitate large-scale EHR analysis, enhancing clinical research opportunities.

INTRODUCTION

Ovarian cancer (OC) is the most lethal gynecologic cancer; more than 75% of affected patients are diagnosed at an advanced stage.1 As molecular-targeted therapies have emerged, therapeutic strategies have been investigated thoroughly to improve the median survival of patients with recurrent OC. For assessing the efficacy of new treatments, progression-free survival (PFS) is widely adopted as the primary end point. PFS is measured from the date of OC diagnosis to the date of first documented disease progression (eg, recurrence, metastasis) or the date of death owing to any cause. Despite recent improvements in treatment strategies, most patients experience their first recurrence within 18-24 months.2

CONTEXT

  • Key Objective

  • To develop an automated rule-based algorithm, Auto-Recur, designed to efficiently detect ovarian cancer (OC) recurrence in electronic health records (EHRs).

  • Knowledge Generated

  • Auto-Recur uses EHR data, including image reading, biomarkers, and treatment details to capture OC recurrence. The algorithm substantially reduces the resources needed for manual chart review while accurately capturing recurrence timing and closely approximates 3-year recurrence-free survival probability.

  • Relevance

  • This work advances the field of automated diagnostics through extraction of information from the medical record, which may improve clinical workflows in the future.

In OC, the date of recurrence is determined through multiple methods using imaging, biomarker levels, and symptom deterioration. The Response Evaluation Criteria in Solid Tumors (RECIST) define disease progression through radiologic assessment of changes in measurable anatomic tumor size using computed tomography, magnetic resonance imaging, or positron emission tomography scans.2 Vergote et al proposed doubling of the cancer antigen (CA)-125 level from the upper limit of normal (ULN) as a favorable biomarker after the first-line treatment.3 The Gynecologic Cancer InterGroup (GCIG) proposed a combined utilization of radiologic assessment measured by RECIST and CA-125 surveillance.4

However, the time to recurrence, determined by the aforementioned criteria, is not explicitly documented in the electronic health records (EHRs). The conventional method requires manually tracking a large volume of radiologic data and biomarker results by reviewing the patient's chart. Moreover, with the rapid growth of administrative and electronic health data, extracting and managing relevant information is very complicated. Furthermore, with the availability of vast EHRs in medical studies, developing an efficient algorithm for cancer recurrence detection is now critical.

Previous studies have developed structured algorithms to capture cancer recurrence in the health records in various cancer sites.5-20 Chubak et al5 constructed an EHR-based algorithm to identify the date of recurrent breast cancer, and Lash et al20 developed algorithms for a recurrence of colorectal cancer using information from Denmark medical registries. However, no study has investigated the development of an algorithm for OC because of the complexity of the criteria for OC recurrence, which requires additional biomarker surveillance.

This study aims to develop automatic rule-based algorithms to detect the first recurrence of OC using a vast array of EHR databases. To address the validity of the proposed algorithms, we estimated and compared the recurrence-free survival probabilities of patients with OC predicted by the rule-based algorithms and conventional methods of retrospective chart reviews by gynecologic oncologists.

METHODS

Study Population

We identified 830 patients who were newly diagnosed with OC as primary cancer between January 1, 2014, and December 31, 2016, in the National Cancer Center (NCC; International Classification of Diseases, Tenth Revision [ICD-10] diagnosis codes C48.1, C56, and C57). Patients with a history of other cancer diagnoses (n = 70) or recurrent cancer on the first visit (n = 105) and patients younger than 30 years at the time of the cancer diagnosis (n = 31) were excluded. Furthermore, patients who had not received initial treatment in NCC (n = 85), had persistent OC (n = 30), or were lost to follow-up (n = 6) were also excluded. The final cohort included 503 patients with OC (Data Supplement), who were followed from the day of epithelial ovarian cancer (EOC) diagnosis until December 2020 to capture OC recurrence.

Data Source

We used information extracted from the Clinical Research Data Warehouse (CRDW) of the NCC, Korea, to develop rule-based algorithms.21 The CRDW includes deidentified research data with over 2,000 columns containing EHRs of patients with cancer: hospital-based cancer registry (eg, enrollment and demographics), clinic visits (eg, outpatient and inpatient), nursing evaluations (eg, medical history and vitals), laboratory test results (eg, blood and urinary), imaging test results, and cancer treatment (eg, surgery, chemotherapy, and radiotherapy). This information is accumulated automatically in the EHR whenever a clinician issues a prescription. For post-treatment surveillance and diagnosis of recurrence, current clinical practice uses multiple imaging modalities and tumor markers. Accordingly, information was extracted from free-text notes on positron emission tomography-computed tomography (PET-CT), CT, and magnetic resonance imaging (MRI) medical image reading; CA-125; and treatment details and used in the development of rule-based algorithms.

The institutional review board of NCC approved the study (NCC2020-0313), and informed consent was waived because this was a retrospective, data-only study. All the data were deidentified.

Gold Standard (retrospective chart review by a gynecologic oncologist)

Medical records, including clinic visits, pathology reports, laboratory results, and hospitalizations, were reviewed to detect the first OC recurrence. Disease progression was confirmed by imaging evaluations followed by clinical symptoms with CA-125 elevation determined by GCIG criteria.5 The recurrence date was captured through a retrospective medical chart review by a gynecologic oncologist, independently of the automated rule-based algorithm. This review served as the standard for evaluating the accuracy of the automated rule-based algorithms in predicting recurrence dates.

OC-Specific Automatic Rule-Based Algorithms: Auto-Recur

Auto-Recur, an automatic rule-based algorithm, was structured and programmed on the basis of the GCIG criteria combined with RECIST version 1.1. Three single algorithms (image-based rule, biomarker-based rule, and treatment-based rule) and hybrid algorithms, combinations of single algorithms, were developed, and these algorithms are presented in Figure 1. More precise descriptions of algorithms are provided in the Data Supplement. The code is written in the SAS programming language and is available on request from the authors.

FIG 1.

Retrospective chart review versus automatic rule-based algorithms for ovarian cancer recurrence detection. CA, cancer antigen; CRS, cytoreductive surgery; CT, computed tomography; GCIG, Gynecologic Cancer InterGroup; PET, positron emission tomography.

Single Algorithms

Image-based rule using notes on the medical image reading.

The image-based rule used author-defined keywords to identify recurrent cases from free-text notes on medical image reading (PET-CT, CT, and MRI). A total of 179 keywords or phrases, such as “slightly increased,” “no significant change,” and “r/o recur,” consisting of 14 inclusion, 91 exclusion, 71 must-inclusion, and three must-exclusion keywords, were considered (Data Supplement). These keywords were selected from a clinical perspective by reviewing the imaging test results of actual recurrent patients in 2013. A note was selected if it contained either (1) at least one inclusion keyword and no exclusion or must-exclusion keyword or (2) at least one must-inclusion keyword and no must-exclusion keyword. To capture recurrence after the end of initial treatment, notes from the initial treatment period were excluded. The selected notes were then sorted by the date of the imaging test, and the earliest note was used to capture the presence and time of the first OC recurrence.
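The two keyword conditions can be sketched in Python as follows. This is only an illustration, not the authors' SAS code: the function names, the case-insensitive substring matching, and the strict "after the end of initial treatment" comparison are assumptions, and the actual 179 keywords and their categories are in the Data Supplement.

```python
from datetime import date


def note_suggests_recurrence(text, inclusion, exclusion, must_inclusion, must_exclusion):
    """Apply the two keyword conditions of the image-based rule to one note."""
    lowered = text.lower()

    def has(keywords):
        # Assumption: plain case-insensitive substring matching.
        return any(k.lower() in lowered for k in keywords)

    if has(must_exclusion):      # a must-exclusion keyword rules out both conditions
        return False
    if has(must_inclusion):      # condition (2)
        return True
    return has(inclusion) and not has(exclusion)  # condition (1)


def first_image_recurrence(notes, initial_treatment_end, **keyword_sets):
    """notes: (imaging_date, text) pairs. Return the earliest qualifying date or None."""
    dates = [
        d for d, text in notes
        if d > initial_treatment_end and note_suggests_recurrence(text, **keyword_sets)
    ]
    return min(dates) if dates else None
```

For example, with a hypothetical assignment such as `inclusion=["slightly increased"]`, `exclusion=["no significant change"]`, `must_inclusion=["r/o recur"]`, a note reading "slightly increased, otherwise no significant change" would be rejected, whereas a note containing "r/o recur" would be selected unless a must-exclusion keyword also appears.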

Biomarker-based rule using CA-125.

The biomarker-based algorithms captured OC recurrence cases on the basis of the occurrence of two consecutive tests wherein CA-125 levels were two times greater than the ULN. The date of OC recurrence was set as the earlier date of the two consecutive test dates. Our study takes a value of 35 U/mL as the ULN. If a patient’s CA-125 had never been below 35 U/mL during initial treatment, the minimum value of CA-125 during the initial treatment was used as the ULN.
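A minimal Python sketch of this rule is given below. It is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, "two times greater than the ULN" is read as strictly greater than 2 x ULN, and "consecutive" is read as adjacent tests in date order.

```python
from datetime import date

DEFAULT_ULN = 35.0  # U/mL, the ULN used in the study


def effective_uln(initial_treatment_values, default_uln=DEFAULT_ULN):
    """Return 35 U/mL unless CA-125 never fell below it during initial treatment,
    in which case the minimum value during initial treatment is used."""
    if initial_treatment_values and min(initial_treatment_values) >= default_uln:
        return min(initial_treatment_values)
    return default_uln


def first_biomarker_recurrence(followup_tests, uln):
    """followup_tests: (test_date, ca125_value) pairs after initial treatment.
    Return the earlier date of the first two consecutive tests above 2 x ULN."""
    ordered = sorted(followup_tests)
    for (d1, v1), (_, v2) in zip(ordered, ordered[1:]):
        if v1 > 2 * uln and v2 > 2 * uln:
            return d1
    return None
```

A single elevated value followed by a value at or below 2 x ULN does not trigger the rule, which is consistent with the requirement of two consecutive elevated tests.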

Treatment-based rule using treatment information.

The treatment-based rule identified patients who received second-line treatment as OC recurrence cases. Second-line treatment was confirmed if at least one of the following conditions was present: (1) secondary cytoreduction after the first cytoreductive surgery, (2) change in chemotherapy regimen, and (3) any two treatments with an interval longer than 3 months. The third condition reflects the fact that patients return after 3 months for clinical relapse evaluation after completion of first-line treatment.22 Use of maintenance or antihormone therapy (eg, “bevacizumab,” “olaparib,” “avelumab,” “farletuzumab,” and “tamoxifen”) was not considered second-line treatment. The beginning of the second-line treatment was defined as the time of recurrence.
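The three conditions can be sketched in Python as below. This is a simplified illustration, not the authors' code: the `Treatment` record, the approximation of 3 months as 90 days, the treatment of every surgery as cytoreductive, and the use of a single regimen label per chemotherapy event are all assumptions made here.

```python
from datetime import date, timedelta
from typing import NamedTuple, Optional

# Agents named in the text as maintenance or antihormone therapy.
EXCLUDED_AGENTS = {"bevacizumab", "olaparib", "avelumab", "farletuzumab", "tamoxifen"}
MAX_GAP = timedelta(days=90)  # assumption: "3 months" approximated as 90 days


class Treatment(NamedTuple):
    day: date
    kind: str                      # "surgery", "chemotherapy", or "radiotherapy"
    regimen: Optional[str] = None  # single regimen label (hypothetical field)


def second_line_start(treatments):
    """Return the start date of second-line treatment, or None if not found."""
    events = sorted(
        (t for t in treatments if (t.regimen or "").lower() not in EXCLUDED_AGENTS),
        key=lambda t: t.day,
    )
    seen_surgery = False
    last_regimen = None
    previous_day = None
    for t in events:
        # Condition (3): two treatments more than 3 months apart.
        if previous_day is not None and t.day - previous_day > MAX_GAP:
            return t.day
        if t.kind == "surgery":
            # Condition (1): a further surgery after the first cytoreductive surgery.
            if seen_surgery:
                return t.day
            seen_surgery = True
        elif t.kind == "chemotherapy":
            # Condition (2): the chemotherapy regimen changed.
            if last_regimen is not None and t.regimen != last_regimen:
                return t.day
            last_regimen = t.regimen
        previous_day = t.day
    return None
```

The returned date corresponds to the beginning of the second-line treatment, which the rule uses as the time of recurrence.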

Hybrid Algorithms

Hybrid algorithms are combinations of single rule-based algorithms that enable selection of algorithms depending on the available data source and study design. Only patients with cancer recurrence have second-line treatment information in their health records, indicating that the treatment-based rule (T) can be used when recurrence cases are captured retrospectively. By contrast, image-based (I) and biomarker-based (B) rules can be applied in prospective studies as well. An IB algorithm is a combination of image- and biomarker-based rules; an IT algorithm is a combination of image- and treatment-based rules, whereas a BT algorithm is a combination of biomarker- and treatment-based rules. An IBT algorithm is a combination of image-, biomarker-, and treatment-based rules. A hybrid algorithm captured a recurrence case when at least one of its single rule-based algorithms detected a recurrence. The earliest date captured by any of these algorithms was used as the time of OC recurrence.
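The combination logic amounts to taking the earliest date reported by any component rule. A minimal Python sketch, with a hypothetical function name and with `None` standing for "no recurrence detected", could look like this:

```python
from datetime import date


def hybrid_recurrence(*single_rule_dates):
    """Each argument is the recurrence date from one single rule, or None.
    A recurrence is captured if any rule fires; the earliest date is kept."""
    detected = [d for d in single_rule_dates if d is not None]
    return min(detected) if detected else None
```

Passing two dates gives an IB, IT, or BT result and passing three gives IBT, for example `hybrid_recurrence(image_date, biomarker_date, treatment_date)`.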

Statistical Analysis

This study intended to increase the efficiency of capturing OC recurrence from EHRs. The amount of time saved by Auto-Recur to capture OC recurrence status for 100, 1,000, 10,000, and 100,000 patients was calculated by subtracting the estimated time required for the algorithms from the time required for retrospective chart review.

To compare the performance of different rule-based algorithms, sensitivity, specificity, and overall accuracy were evaluated. Demographics and clinical characteristics were compared with cancer recurrence status captured by different rule-based algorithms and a retrospective chart review. Cancer recurrences captured within 3 months of the gold standard (retrospective chart review) were considered true-positive cases, whereas recurrences with a difference of 3 months or more were considered false-negative cases. The accuracy of capturing the first OC recurrence was measured by time error and its 95% CI. The time error was calculated on the basis of the absolute value of the time difference between the recurrence date captured by the algorithm and the recurrence date of the retrospective chart review.
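The classification and time-error definitions above can be sketched in Python as follows. The function name and the 90-day approximation of 3 months are assumptions made for illustration. The sketch also counts a recurrence that the algorithm missed entirely as a false negative and assumes the input contains both recurrent and nonrecurrent patients.

```python
from datetime import date

WINDOW_DAYS = 90  # assumption: "3 months" approximated as 90 days


def evaluate(pairs):
    """pairs: (gold_date, algorithm_date) per patient; None means no recurrence."""
    tp = fn = tn = fp = 0
    errors = []  # absolute time differences (days) in true-positive cases
    for gold, algo in pairs:
        if gold is None:
            if algo is None:
                tn += 1
            else:
                fp += 1
        elif algo is not None and abs((algo - gold).days) < WINDOW_DAYS:
            tp += 1
            errors.append(abs((algo - gold).days))
        else:
            fn += 1  # missed, or captured 3 months or more from the gold standard
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
        "mean_time_error_days": sum(errors) / len(errors) if errors else None,
    }
```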

Furthermore, we compared the time difference in true-positive cases using boxplots to demonstrate patterns across different rule-based algorithms in capturing OC recurrence. A time difference of zero means that the algorithm and the retrospective chart review found the OC recurrence on the same date. A time difference greater than 0 means that the retrospective chart review captured OC recurrence before the algorithms did. Recurrence-free survival probabilities were estimated using the Kaplan-Meier estimator. A log-rank test was used to test for a significant difference in recurrence-free survival time between the rule-based algorithms and the gold standard. Data were analyzed using SAS v9.4 from January 2021 through January 2022.
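For readers unfamiliar with the Kaplan-Meier estimator, the following pure-Python sketch shows the product-limit calculation. It is not the authors' analysis, which was run in SAS. The function name is hypothetical, and the event is taken to be recurrence, with patients who had no recurrence by the end of follow-up treated as censored.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each distinct event time.

    durations: follow-up time per patient (any consistent unit).
    events: True if recurrence was observed at that time, False if censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        same_time = [e for d, e in data[i:] if d == t]
        n_events = sum(same_time)
        if n_events:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= len(same_time)  # events and censored patients leave the risk set
        i += len(same_time)
    return curve
```

Applying the function once to the chart-review dates and once to the algorithm's dates yields two curves that can then be compared, for instance at 3 years.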

RESULTS

Time-Saving by Automatic Rule-Based Algorithms

For retrospective chart review, the estimated average time required to review a patient chart was 19.3 minutes. By contrast, the running time of the proposed Auto-Recur algorithms to capture the OC recurrence was 20.53 seconds on average per 100 patients: 3.80 seconds for data curation and 16.72 seconds for capturing recurrence (Fig 1). The data curation process includes extracting information on biomarker CA-125, image reading, surgery, and chemotherapy of the patient from CRDW and merging them in date order. The corresponding decrease in time would be 32.16 hours per 100 patients. The time-saving benefit grows with the size of the data; for 100,000 patients, the algorithms would save approximately 1,340 days.
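The reported savings can be reproduced with simple arithmetic. The sketch below assumes that both chart review time and algorithm running time scale linearly with the number of patients; the function name is hypothetical.

```python
CHART_REVIEW_MIN_PER_PATIENT = 19.3  # minutes per chart, from the text
AUTO_RECUR_SEC_PER_100 = 20.53       # seconds per 100 patients, from the text


def hours_saved(n_patients):
    """Chart review time minus Auto-Recur running time, in hours."""
    manual_hours = n_patients * CHART_REVIEW_MIN_PER_PATIENT / 60
    auto_hours = n_patients / 100 * AUTO_RECUR_SEC_PER_100 / 3600
    return manual_hours - auto_hours


for n in (100, 1_000, 10_000, 100_000):
    h = hours_saved(n)
    print(f"{n:>7} patients: {h:,.2f} hours ({h / 24:,.1f} days)")
```

For 100 patients this gives about 32.16 hours, and for 100,000 patients about 1,340 days, matching the figures above.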

Baseline Characteristics of Recurrence Cases by Algorithms

Of the 503 patients with OC, recurrence was confirmed in 273 patients (54.3%) by retrospective chart review. Of the patients with recurrence, 52.4% had International Federation of Gynecology and Obstetrics (FIGO) stage IV disease at diagnosis, whereas 26.5% of the nonrecurrent patients had FIGO stage I disease (Table 1). The image-based rule classified 239 patients as recurrent cases (47.5%). The biomarker-based rule captured only 161 recurrent patients, who had a relatively higher FIGO stage, with 55.3% of the recurrent cases at FIGO stage IV. The mean age at diagnosis of recurrent patients captured by the treatment-based rule was slightly lower than that of nonrecurrent patients. By contrast, recurrent patients detected by the other algorithms had a higher mean age at diagnosis than nonrecurrent patients.

TABLE 1.

Demographic and Clinical Characteristics of Ovarian Cancer According to Recurrence Captured by Different Single Algorithms

| Covariate | Gold standard (retrospective chart review): No recur (n = 230) | Gold standard: Recur (n = 273) | Image-based rule: No recur (n = 264) | Image-based rule: Recur (n = 239) | Biomarker-based rule: No recur (n = 342) | Biomarker-based rule: Recur (n = 161) | Treatment-based rule: No recur (n = 242) | Treatment-based rule: Recur (n = 261) |
|---|---|---|---|---|---|---|---|---|
| Age at diagnosis, years, mean (SD) | 54.5 (11.6) | 55.3 (10.4) | 54.3 (11.6) | 55.5 (10.2) | 54.5 (11.2) | 55.8 (10.5) | 55.1 (11.5) | 54.7 (10.5) |
| Age at diagnosis, years, No. (%) | | | | | | | | |
| 30-39 | 25 (10.9) | 18 (6.6) | 28 (10.6) | 15 (6.3) | 33 (9.6) | 10 (6.2) | 21 (8.7) | 22 (8.4) |
| 40-49 | 59 (25.7) | 69 (25.3) | 70 (26.5) | 58 (24.3) | 88 (25.7) | 40 (24.8) | 61 (25.2) | 67 (25.7) |
| 50-59 | 66 (28.7) | 92 (33.7) | 76 (28.8) | 82 (34.3) | 105 (30.7) | 53 (32.9) | 73 (30.2) | 85 (32.6) |
| 60-69 | 56 (24.4) | 73 (26.7) | 63 (23.9) | 66 (27.6) | 85 (24.9) | 44 (27.3) | 59 (24.4) | 70 (26.8) |
| 70+ | 24 (10.4) | 21 (7.7) | 27 (10.3) | 18 (7.5) | 31 (9.1) | 14 (8.7) | 28 (11.6) | 17 (6.5) |
| FIGO stage, No. (%) | | | | | | | | |
| I | 61 (26.5) | 9 (3.3) | 62 (23.5) | 8 (3.4) | 66 (19.3) | <5 | 58 (24.0) | 12 (4.6) |
| II | 33 (14.4) | 12 (4.4) | 34 (12.9) | 11 (4.6) | 39 (11.4) | 6 (3.7) | 35 (14.5) | 10 (3.8) |
| III | 50 (21.7) | 97 (35.5) | 68 (25.8) | 79 (33.1) | 94 (27.5) | 53 (32.9) | 58 (24.0) | 89 (34.1) |
| IV | 65 (28.3) | 143 (52.4) | 78 (29.6) | 130 (54.4) | 119 (34.8) | 89 (55.3) | 69 (28.5) | 139 (52.3) |
| Missing | 21 (9.1) | 12 (4.4) | 22 (8.3) | 11 (4.6) | 24 (7.0) | 9 (5.6) | 22 (9.1) | 11 (4.2) |
| Histology, No. (%) | | | | | | | | |
| Serous | 105 (46.7) | 198 (72.5) | 129 (48.9) | 174 (72.8) | 180 (52.6) | 123 (76.4) | 116 (47.9) | 187 (71.7) |
| Endometrioid | 20 (8.7) | 6 (2.2) | 20 (7.6) | 6 (2.5) | 22 (6.4) | <5 | 21 (8.7) | 5 (1.9) |
| Clear | 31 (13.5) | 20 (7.3) | 34 (12.9) | 17 (7.1) | 42 (12.3) | 9 (5.6) | 31 (12.8) | 20 (7.7) |
| Mucinous | 20 (8.7) | <5 | 20 (7.6) | <5 | 23 (6.7) | <5 | 19 (7.9) | 5 (1.9) |
| Others | 54 (23.5) | 45 (16.5) | 61 (23.1) | 38 (15.9) | 75 (21.9) | 24 (14.9) | 55 (22.7) | 44 (16.9) |
| Initial treatment, No. (%) | | | | | | | | |
| Only surgery | 18 (7.8) | <5 | 18 (6.8) | <5 | 21 (6.1) | <5 | 17 (7.0) | 5 (1.9) |
| Surgery + adjuvant CTx | 112 (48.7) | 89 (32.6) | 125 (47.4) | 76 (31.8) | 159 (46.5) | 42 (26.1) | 117 (48.4) | 84 (32.2) |
| Neoadjuvant CTx + surgery + adjuvant CTx | 33 (14.3) | 106 (38.8) | 43 (16.3) | 96 (40.2) | 64 (18.7) | 75 (46.6) | 37 (15.3) | 102 (39.1) |
| Only CTx | 67 (29.1) | 74 (27.1) | 78 (29.6) | 63 (26.4) | 98 (28.7) | 43 (26.7) | 71 (29.3) | 70 (26.8) |

Abbreviations: CTx, chemotherapy; FIGO, International Federation of Gynecology and Obstetrics.

Performance of the Algorithms in Capturing the Presence of Recurrence

Among the 273 true recurrent patients, the image-based rule alone captured 188 cases within 3 months of the gold standard, and it misclassified only two nonrecurrent patients as recurrent (sensitivity = 68.9%, specificity = 99.1%, and accuracy = 82.7%). The biomarker-based rule failed to capture more than half of the true recurrent cases, although it correctly classified 229 of the 230 nonrecurrent patients (sensitivity = 41.4%, specificity = 99.6%, and accuracy = 68.0%). The most sensitive single algorithm was the treatment-based algorithm, which captured 224 true recurrent cases (sensitivity = 82.1%, specificity = 97.8%, and accuracy = 89.3%; Table 2).

TABLE 2.

Performance of Single and Hybrid Rule-Based Algorithms

| Algorithm | Sensitivity, % | Specificity, % | Accuracy, % | Time error in true-positive cases, mean (95% CI), days |
|---|---|---|---|---|
| Retrospective chart review | Ref. | Ref. | Ref. | Ref. |
| Single algorithm | | | | |
| Image | 68.9 | 99.1 | 82.7 | 10.6 (8.0 to 13.1) |
| Biomarker | 41.4 | 99.6 | 68.0 | 5.6 (2.9 to 8.3) |
| Treatment | 82.1 | 97.8 | 89.3 | 23.4 (21.1 to 25.8) |
| Hybrid algorithm | | | | |
| IB | 80.2 | 98.7 | 88.7 | 6.3 (4.3 to 8.2) |
| IT | 91.9 | 97.8 | 94.6 | 13.7 (11.5 to 15.9) |
| BT | 84.6 | 97.4 | 90.5 | 13.9 (11.7 to 16.1) |
| IBT | 93.4 | 97.4 | 95.2 | 8.5 (6.6 to 10.3) |

Abbreviations: BT, biomarker- and treatment-based rule; IB, image- and biomarker-based rule; IBT, image-, biomarker-, and treatment-based rule; IT, image- and treatment-based rule; Ref, reference.

The hybrid IB algorithm combining image- and biomarker-based rules increased the sensitivity to 80.2% and the overall accuracy to 88.7%. The hybrid algorithm combining image- or biomarker-based rule with the treatment-based rule increased the sensitivity substantially (IT algorithm: 91.9%, BT algorithm: 84.6%). The combination of all algorithms (IBT algorithm) performed with the highest sensitivity at 93.4% (specificity = 97.4%, accuracy = 95.2%), which was the best-performing combination for capturing OC recurrence. The performance evaluated at 1 month and 2 months is also provided in the Data Supplement.

Performance of the Algorithms in Estimating Recurrence Time

The biomarker-based rule had the smallest mean time error of 5.6 days (95% CI, 2.9 to 8.3) despite its low sensitivity (Table 2). By contrast, the treatment-based rule showed the largest mean time error at 23.4 days (95% CI, 21.1 to 25.8), although its sensitivity was relatively high compared with the image-based or biomarker-based rule.

The boxplots of the time difference for true-positive cases are shown in Figure 2. The distribution of time differences in the biomarker-based and image-based rule showed that more than 50% of true-positive cases were close to zero. However, the treatment-based rule or hybrid algorithms combined with the treatment-based rule captured OC recurrence with a delay.

FIG 2.

Comparison of time differences between dates of ovarian cancer recurrence captured by automatic rule-based algorithms and retrospective chart review. The ANOVA test was used to analyze the time difference among single algorithms (image, biomarker, and treatment) and the time difference among hybrid algorithms (IB, IT, BT, and IBT). Both showed significant differences (P value <.001). The time difference (months) is the time between the recurrence date captured by the algorithm and the recurrence date of the retrospective chart review. ANOVA, analysis of variance; BT, biomarker- and treatment-based rule; IB, image- and biomarker-based rule; IBT, image-, biomarker-, and treatment-based rule; IT, image- and treatment-based rule.

Recurrence-Free Survival Probabilities Estimated by Different Rule-Based Algorithms

Recurrence-free survival probabilities were estimated with the Kaplan-Meier method for each rule-based algorithm and compared with the estimates from the retrospective chart review (Fig 3). The estimates of 3-year recurrence-free survival probability varied considerably, from a maximum of 68% (biomarker-based rule) to a minimum of 44% (IBT algorithm; Data Supplement). However, the recurrence-free survival time estimated by the algorithms, except for the image- and biomarker-based rules, was not significantly different from the estimates of the retrospective chart review. The 3- and 5-year recurrence-free survival probabilities were estimated to be 45% (95% CI, 40 to 49) and 39% (95% CI, 33 to 43), respectively, using a retrospective chart review. Our proposed IBT algorithm estimated the 3- and 5-year recurrence-free survival probabilities to be 44% (95% CI, 40 to 49) and 38% (95% CI, 33 to 42), respectively, which were close to the estimates by the retrospective chart review (log-rank P value = .894). In addition, the hazard ratios estimated from the Cox proportional hazards models were similar among the algorithms (data not shown).

FIG 3.

Estimates of recurrence-free survival probability by automatic rule-based algorithms. The log-rank test was used to test the null hypothesis of no difference in survival between gold standard and each algorithm. BT, biomarker- and treatment-based rule; IB, image- and biomarker-based rule; IBT, image-, biomarker-, and treatment-based rule; IT, image- and treatment-based rule.

DISCUSSION

The proposed automatic rule-based algorithm (Auto-Recur) efficiently captured the first OC recurrence from minimally preprocessed EHRs, saving human resources and time. Moreover, the recurrence-free survival estimated by the Auto-Recur was very close to the estimates obtained from a conventional retrospective chart review.

To our knowledge, this study is the first attempt to develop an algorithm for detecting OC recurrences from EHRs. When a rule-based algorithm for capturing breast or lung cancer recurrence is devised, the rules solely target the terms in radiology or pathology reports.17 However, additional biomarker surveillance is required in OC to assess recurrence by CA-125, which is accepted by GCIG.4 Despite the complexity of the criteria for OC recurrence, our study devised the implementation of a hybrid algorithm integrating image-based, biomarker-based, and treatment-based rules.

Single rule-based algorithms showed lower sensitivities than hybrid algorithms. The specificity of each single rule-based algorithm was as high as that of the hybrid algorithms, whereas the sensitivity of the biomarker-based rule was the lowest at 41.4%. Owing to the low sensitivity of recurrence detection, recurrence-free survival might be overestimated when a biomarker-based rule or image-based rule is applied (image-based rule: log-rank P value = .0158, biomarker-based rule: log-rank P value <.0001). CA-125 surveillance has a high false-negative rate as CA-125 can also be elevated in other conditions, such as inflammatory disease or endometriosis.23 Previous studies showed inconsistent sensitivity of CA-125 surveillance to detect recurrence, ranging from 56% to 94%.24

The current study presents specified hybrid algorithms with various combinations of each rule-based algorithm. The algorithms can be chosen on the basis of the research designs and data sources available. In the retrospective evaluation of cancer recurrence, if treatment information such as the start date of the second-line treatment or secondary cytoreductive surgery is available, the IBT algorithm is the most appropriate algorithm with the highest sensitivity and specificity. The IB algorithm might capture recurrence in the prospective evaluation with reasonable performance when retrospective information is unavailable.

There are some limitations to this study. First, retrospective chart reviews may be inaccurate as the gold standard.25 Furthermore, the data in the charts were recorded for multiple reasons, including billing, administrative recording, and legal issues, and may therefore lack comprehensiveness and quality.26 Second, this study was conducted using 3-year data (2014-2016) from a single hospital with a relatively small sample size. In addition, the sensitivity of the image-based rule could decrease in other settings, possibly because of different writing styles in physicians' interpretations or different hospital visit schedules. Recently, natural language processing (NLP) has been applied to pathology reports and physicians' clinical notes to detect cancer recurrence.6,9,16-19,27-32 In this study, we used author-defined keywords, which involved the abstraction of electronically available reports, owing to the bilingual texts and their small sample size. An image-based algorithm using NLP, which was left as future research, might improve the performance of recurrence capture and increase the likelihood of expanding the algorithm to other cancer types.

We used a maximum time difference of 3 months to classify true-positive cases, although no standard threshold for optimal accuracy has been defined for the timing of cancer recurrence.15 Among the 18 false-negative cases from the IBT algorithm, 13 were recurrences that the algorithm captured with a time error exceeding the 3-month window. Further validation with improved accuracy in detecting recurrence time could lead to better sensitivity. As the currently developed algorithm is based on retrospective chart review as the standard reference data, additional validation should be performed before applying it to prospective studies.

In the era of large databases, the volume of medical health data has grown rapidly. Huge medical data sets with great heterogeneity and diversity have been created, requiring new automated algorithms for integrating and analyzing the data.33 Algorithms designed to interpret such data will save time and cost and facilitate the use of these data in research. The proposed Auto-Recur algorithm detected the first OC recurrence precisely, reducing human resources and time. Therefore, it is expected that researchers will have more opportunities to conduct clinical research with large cohorts, and that the value of the algorithm will increase in the future.

In conclusion, the proposed algorithm efficiently captured the first OC recurrence while significantly reducing human resources and time, and its recurrence-free survival estimates closely approximated those obtained by conventional retrospective chart review. The algorithms will help oncologists capture OC recurrence from large EHRs.

ACKNOWLEDGMENT

We thank the staff of the National Cancer Data Center of Korea for the resources and data provided.

DISCLAIMER

The funding source had no role in the study design, data curation, or the analysis and interpretation of data.

SUPPORT

Supported by grants from the National Cancer Center, Korea (NCC-2010232-3, NCC-2310450-1), and the National Research Foundation of Korea (Grant No.: NRF-2020R1A2C1A01011584, RS-2023-00275999), funded by the Korea Ministry of Science and ICT.

EQUAL CONTRIBUTION

H.C. and M.C.L. contributed equally to this work as co-corresponding authors. S.L. and J.H.K. contributed equally to this work as cosenior and cofirst authors.

AUTHOR CONTRIBUTIONS

Conception and design: All authors

Financial support: Myong Cheol Lim

Administrative support: Myong Cheol Lim

Provision of study materials or patients: Myong Cheol Lim

Collection and assembly of data: Ji Hyun Kim, Myong Cheol Lim

Data analysis and interpretation: Sanghee Lee, Myong Cheol Lim, Hyunsoon Cho

Manuscript writing: All authors

Final approval of manuscript: All authors

Accountable for all aspects of the work: All authors

AUTHORS' DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST

The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated unless otherwise noted. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO's conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/cci/author-center.

Open Payments is a public database containing information reported by companies about payments made to US-licensed physicians.

Myong Cheol Lim

Consulting or Advisory Role: GI Innovation, Boryung, AstraZeneca, Takeda, CKD Pharm, Genexine, Hospicare

Research Funding: AbbVie (Inst), AstraZeneca (Inst), Amgen (Inst), Astellas Pharma, BeiGene (Inst), Cellid (Inst), CKD Pharm (Inst), Clovis Oncology (Inst), Genexine (Inst), GlaxoSmithKline (Inst), Incyte (Inst), Merck (Inst), MSD (Inst), OncoQuest (Inst), Pfizer (Inst), Roche (Inst), Eisai (Inst)

No other potential conflicts of interest were reported.

Articles from JCO Clinical Cancer Informatics are provided here courtesy of American Society of Clinical Oncology