JCO Clinical Cancer Informatics. 2018 Jul 13;2:CCI.17.00128. doi: 10.1200/CCI.17.00128

Automated Extraction of Grade, Stage, and Quality Information From Transurethral Resection of Bladder Tumor Pathology Reports Using Natural Language Processing

Alexander P Glaser 1, Brian J Jordan 1, Jason Cohen 1, Anuj Desai 1, Philip Silberman 1, Joshua J Meeks 1
PMCID: PMC7010439  PMID: 30652586

Abstract

Purpose

Bladder cancer is initially diagnosed and staged with a transurethral resection of bladder tumor (TURBT). Patient survival is dependent on appropriate sampling of layers of the bladder, but pathology reports are dictated as free text, making large-scale data extraction for quality improvement challenging. We sought to automate extraction of stage, grade, and quality information from TURBT pathology reports using natural language processing (NLP).

Methods

Patients undergoing TURBT were retrospectively identified using the Northwestern Enterprise Data Warehouse. An NLP algorithm was then created to extract information from free-text pathology reports and was iteratively improved using a training set of manually reviewed TURBTs. NLP accuracy was then validated using another set of manually reviewed TURBTs, and reliability was calculated using Cohen’s κ.

Results

Of 3,042 TURBTs identified from 2006 to 2016, 39% were classified as benign, 35% as Ta, 11% as T1, 4% as T2, and 10% as isolated carcinoma in situ. Of 500 randomly selected manually reviewed TURBTs, NLP correctly staged 88% of specimens (κ = 0.82; 95% CI, 0.78 to 0.86). Of 272 manually reviewed T1 tumors, NLP correctly categorized grade in 100% of tumors (κ = 1), correctly categorized if muscularis propria was reported by the pathologist in 98% of tumors (κ = 0.81; 95% CI, 0.62 to 0.99), and correctly categorized if muscularis propria was present or absent in the resection specimen in 82% of tumors (κ = 0.62; 95% CI, 0.55 to 0.73). Discrepancy analysis revealed pathologist notes and deeper resection specimens as frequent reasons for NLP misclassifications.

Conclusion

We developed an NLP algorithm that demonstrates a high degree of reliability in extracting stage, grade, and presence of muscularis propria from TURBT pathology reports. Future iterations can continue to improve performance, but automated extraction of oncologic information is promising in improving quality and assisting physicians in delivery of care.

INTRODUCTION

Bladder cancer is the fifth most common cancer overall1 and is one of the most expensive cancers to treat because of high recurrence rates and the need for intensive surveillance in non–muscle-invasive disease.2 Tumors are initially diagnosed, staged, and treated with cystoscopic transurethral resection of bladder tumor (TURBT). Critical staging information obtained during TURBT includes clinical stage, as well as tumor grade, pathologic stage, and histology of the resected tumor. This staging information guides further therapy and treatment. For example, oncologic guidelines recommend that patients with high-grade Ta tumors undergo repeat TURBT if no muscularis propria is present in the specimen and that all patients with T1 tumors undergo repeat TURBT, because up to one third of these patients may be understaged, and routine reresection may improve rates of recurrence and progression in these patients.3-5

Therefore, to accurately stage and risk stratify bladder cancer, the information in TURBT pathology reports must include whether muscularis propria is present in the resection specimen or not and explicitly state whether the tumor invades the muscularis propria or not.3,6 However, ≥ 20% of TURBT pathology reports do not contain all information needed to accurately stage and risk stratify patients with bladder cancer, including the presence or absence of muscularis propria.6-8 Furthermore, patterns of treatment, rates of re-TURBT use, and intensity of surveillance vary among providers,9-13 and recurrence rates for Ta and T1 tumors vary across institutions and surgeons.14

Our goal is to improve the quality of care for patients with bladder cancer by monitoring the quality of resection and pathologic reporting in TURBTs. However, TURBT pathology reports are considered endoscopic biopsies and frequently dictated as free text (eg, “Bladder tumor, left lateral wall: high-grade urothelial carcinoma with invasion into the lamina propria”) and not as categorical or standardized template data, as most radical resections are (eg, radical cystectomy). This makes large-scale analysis of TURBT data challenging, because the dictation elements and style vary by pathologist. Automated extraction of critical data from TURBT pathology reports could be used to improve quality and help clinicians determine the appropriate next steps in patient care.

Here, we present an algorithm based on natural language processing (NLP) to automatically extract critical stage, grade, and quality information from TURBT pathology reports. We hypothesized that NLP could reliably (κ > 0.8)15 extract this information, in comparison with manual record review.

METHODS

Patient Identification

After obtaining institutional review board approval, we performed a retrospective analysis of TURBTs performed at Northwestern Memorial Hospital from January 2006 to September 2016 using the Northwestern Enterprise Data Warehouse (EDW), which serves as a central repository for data from multiple sources, including Epic and Cerner electronic medical records. Procedures were identified using Current Procedural Terminology codes (52204, 52224, 52234, 52235, 52240). Pathology reports from these procedures, dates of the procedures, and surgeons, as well as patient age, sex, and demographics, were extracted from the EDW.

Development of Automated Staging Information Using NLP

Using the EDW and a training set of 867 manually reviewed TURBTs from a prior institutional review board–approved quality improvement database, we identified critical terminology for staging bladder cancer in TURBTs, including the terms carcinoma in situ (CIS), lamina propria, muscularis propria, high grade, low grade, and invading. Using the EDW, the SQL programming language, and regular expressions, an NLP algorithm was then created to extract information. This algorithm was then iteratively improved until most data points from the training set were accurately extracted. Iteration yielded inclusion of analogous terms such as invasion, invading, and invades and alternative terms and formatting deviations such as carcinoma in situ, carcinoma in-situ, in-situ carcinoma, and in situ carcinoma (Data Supplement).

Code structure and logic are summarized in Table 1, and examples of extraction from pathology reports are shown in Figure 1. Algorithm output terms included MRN, sex, race, ethnicity, date of birth, date of death, International Classification of Diseases (ninth or 10th revision) diagnosis code (eg, 188.9, C68.9), date of diagnosis, date of surgery, age at surgery, surgeon, stage (benign, CIS, Ta, T1, T2), presence of muscularis propria in resection specimen, and tumor grade.

Table 1.

Code Structure Describing NLP Algorithm


Fig 1.

Examples of extraction of information from transurethral resection of bladder tumor pathology reports using natural language processing. Gold highlighted fields show text corresponding to extracted information; red, tumor stage; gray, tumor grade; blue, presence or absence of muscularis propria. CIS, carcinoma in situ.

Manual Validation

After identification of 3,042 TURBTs from the EDW, manual validation of staging accuracy was performed via record review of 500 randomly selected TURBTs. Further validation of 272 T1 tumors was then performed to validate extraction of grade, stage, histology, whether muscularis propria was mentioned in the pathology report, and whether muscularis propria was present or absent in the resection specimen. T1 tumors were selected for additional validation because these represent a particularly high-risk group. All manual record reviews were performed by subject matter experts (A.P.G., J.C., A.D.).

Statistical Analysis

Database manipulation was performed using R software (version 3.3.3; https://www.r-project.org/) and the dplyr package (version 0.7.0; https://cran.r-project.org/web/packages/dplyr/index.html). Raw percentage agreement was calculated as the number of accurate NLP-extracted data points divided by the number of manually reviewed data points. Cohen’s κ statistic16 was used to measure the level of agreement between NLP-extracted data points and manually reviewed data points using R package irr (version 0.84).
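For readers unfamiliar with these two measures, the following minimal sketch computes raw agreement and Cohen's κ in Python rather than R. It assumes the unweighted form of κ, (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance. The function names and the six-specimen toy data are invented for this example and are not study data.

```python
from collections import Counter


def raw_agreement(nlp_labels, manual_labels):
    """Fraction of data points on which NLP and manual review agree."""
    if len(nlp_labels) != len(manual_labels) or not nlp_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(nlp_labels, manual_labels))
    return matches / len(nlp_labels)


def cohens_kappa(nlp_labels, manual_labels):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(nlp_labels)
    p_o = raw_agreement(nlp_labels, manual_labels)
    nlp_counts = Counter(nlp_labels)
    manual_counts = Counter(manual_labels)
    # Chance agreement: product of the two marginal proportions per category.
    p_e = sum(nlp_counts[c] * manual_counts[c] for c in nlp_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # both sources used one identical category throughout
    return (p_o - p_e) / (1 - p_e)


# Toy data (invented): stage assigned to six specimens by each method.
nlp = ["Ta", "Ta", "T1", "benign", "Ta", "T1"]
manual = ["Ta", "benign", "T1", "benign", "Ta", "T1"]
print(raw_agreement(nlp, manual), cohens_kappa(nlp, manual))
```

In the toy data, five of six labels agree (raw agreement, 0.83) and chance agreement is one third, giving κ = 0.75. This shows why κ is lower than raw agreement whenever some agreement would be expected by chance alone.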

RESULTS

Description of All TURBTs

A total of 3,042 TURBTs performed by 20 urologists in 1,324 patients were identified. Demographics, stage, grade, whether muscularis propria was mentioned by the pathologist, and if muscularis propria was present in the resection specimen are listed in Table 2. Patients with T1 or T2 disease were slightly older (P = .0145), and as expected, T1 and T2 tumors were more likely to be high grade (P < .001), and T1 tumors were more likely to have muscularis propria both mentioned by the pathologist (P < .001) and present in the resection specimen (P < .001).

Table 2.

Demographic and Clinical Characteristics of TURBT Pathology Reports (N = 3,042)


Validation of Staging

Accuracy of staging was assessed by manual review of 500 randomly selected TURBTs not included in the initial training data set. Table 3 lists stage information as captured by NLP compared with stage information captured by manual record review. NLP accurately staged 441 of 500 specimens (raw accuracy, 88%; κ = 0.82; 95% CI, 0.78 to 0.86).

Table 3.

Tumor Stage in TURBT Pathology Reports As Determined by NLP and Manual Record Review (n = 500)


Validation of Quality Metrics in T1 Tumors

Accuracy was further characterized in a group of 272 T1 tumors, for which grade and presence of muscularis propria in the resection specimen are critical to patient care (Table 4). NLP correctly categorized grade in 272 of 272 tumors (raw accuracy, 100%; κ = 1), correctly categorized if muscularis propria was reported by the pathologist in 268 of 272 tumors (raw accuracy, 98%; κ = 0.81; 95% CI, 0.62 to 0.99), and correctly categorized if muscularis propria was present or absent in the resection specimen in 222 of 272 tumors (raw accuracy, 82%; κ = 0.62; 95% CI, 0.55 to 0.73).

Table 4.

Tumor Grade and Quality Indicators in T1 Tumors As Determined by NLP and Manual Record Review (n = 272)


Discrepancy Analysis

Discrepancy analysis was then performed to elucidate why specimens were categorized incorrectly by NLP and to improve future iterations. Ta tumors were the most frequently misstaged by NLP. Our NLP algorithm inaccurately categorized 27 of 204 Ta specimens as benign and 18 of 204 Ta specimens as CIS, underscoring the challenges with extracting categorical information from free text. Discrepancy analysis demonstrated that small fragments of Ta tumors were often misclassified as benign by NLP (eg, “small fragments of residual urothelial carcinoma, with no lamina propria invasion”). In addition, supplied clinical history in the pathology report was responsible for misclassification of Ta tumors as CIS (eg, “Clinical information: The patient is a 62-year-old female with a history of urothelial carcinoma in situ undergoing cystoscopy and bladder biopsy.”). A note by the pathologist was also responsible for misclassification of benign specimens as bladder tumors (eg, “Note: multiple deeper H&E [hematoxylin and eosin] stained sections have been examined. Given the history of a prior biopsy showing high-grade urothelial carcinoma…”). Furthermore, despite accounting for misspellings and formatting deviations in our initial training set, unaccounted-for human and typographic errors remained an issue for several misclassifications (eg, “high-grade papillary urothelial carcinoma invading amina propria”).

Of the T1 tumors, NLP performed well at assigning grade and determining if muscularis propria was mentioned in the report by the pathologist, but it misclassified 50 tumors that did have muscularis propria present in the resection specimen as not having muscularis propria in the specimen. Discrepancy analysis of these samples demonstrated that 23 of these samples had muscularis propria present in a so-called deep bite that did not have any carcinoma in the specimen (eg, “A. Bladder tumor, transurethral resection: High-grade papillary urothelial carcinoma invading lamina propria. No muscularis propria is identified. B. Bladder tumor, deep bite, transurethral resection: Muscularis propria identified without tumor involvement.”), and 19 had multifocal T1 cancer, but only some of the specimens in the report contained muscularis propria (eg, “A. Bladder tumor, right lateral wall, high-grade urothelial carcinoma involving lamina propria. Muscularis propria present and not involved. B. Bladder tumor, left lateral wall, high-grade urothelial carcinoma involving lamina propria. Muscularis propria not identified.”).

Importantly, analysis of these discrepancies is informative and will help improve future iterations by excluding supplied clinical information and pathologist notes and including deeper resection specimens.
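One possible way to act on these findings is sketched below in Python. It is not the authors' code. It first removes supplied clinical information and pathologist notes, then evaluates muscularis propria separately for each lettered specimen part and reports it as present if any part contains it. The section labels, the "A." and "B." part format, the absence phrases, and all function names are assumptions made for this example.

```python
import re

# Assumed section labels; other institutions may use different headers.
EXCLUDED_SECTION = re.compile(
    r"\b(?:clinical\s+(?:information|history)|note)\s*:.*?(?=\n\s*\n|\Z)",
    re.IGNORECASE | re.DOTALL,
)
# Assumed part labels of the form "A. ", "B. " preceding each specimen.
PART_BOUNDARY = re.compile(r"(?<![A-Za-z])(?=[A-Z]\.\s)")
MP_MENTION = re.compile(r"\bmuscularis\s+propria\b", re.IGNORECASE)
# Assumed phrasings that indicate muscularis propria was not sampled.
MP_ABSENT = re.compile(
    r"\bno\s+muscularis\s+propria\b"
    r"|\bmuscularis\s+propria\s+(?:is\s+)?"
    r"(?:not\s+(?:identified|present|seen)|absent)\b",
    re.IGNORECASE,
)


def strip_excluded_sections(report: str) -> str:
    """Drop clinical-history and note paragraphs before term matching."""
    return EXCLUDED_SECTION.sub("", report)


def muscularis_status(part: str):
    """Return 'present', 'absent', or None (not mentioned) for one part."""
    if not MP_MENTION.search(part):
        return None
    return "absent" if MP_ABSENT.search(part) else "present"


def report_muscularis_status(report: str):
    """Report-level status: present if any specimen part contains muscle."""
    cleaned = strip_excluded_sections(report)
    parts = [p.strip() for p in PART_BOUNDARY.split(cleaned) if p.strip()]
    statuses = [muscularis_status(p) for p in parts]
    if "present" in statuses:
        return "present"
    if "absent" in statuses:
        return "absent"
    return None


deep_bite = (
    "A. Bladder tumor, transurethral resection: High-grade papillary "
    "urothelial carcinoma invading lamina propria. No muscularis propria "
    "is identified. B. Bladder tumor, deep bite, transurethral resection: "
    "Muscularis propria identified without tumor involvement."
)
print(report_muscularis_status(deep_bite))
```

The any-part rule mirrors how the manually reviewed deep-bite and multifocal examples above were counted as having muscularis propria present. Whether that rule is appropriate for multifocal tumors is a clinical judgment that this sketch does not settle.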

DISCUSSION

Pathology reports from TURBTs are often reported as free text instead of as distinct categories. Physicians currently review and interpret these reports to direct next steps for patients with bladder cancer. Automated extraction of stage and other information from these pathology reports could assist physicians in determining next steps, provide avenues for quality improvement, and potentially improve outcomes for patients. Here, we present an NLP algorithm to extract stage, grade, and quality information from plain-text TURBT pathology reports.

NLP is a dynamic and powerful tool, and in urologic oncology, it has previously been used for risk stratification in prostate cancer17 and in identification of type of urinary diversion after radical cystectomy for bladder cancer.18 To our knowledge, this is the first report of the use of NLP to extract staging and quality information from TURBTs. Automated extraction of tumor stage and grade with NLP demonstrates a high degree of reliability (κ > 0.8) in comparison with manual record review, which is labor intensive, especially for a large number of cases. In addition, NLP demonstrates a high degree of reliability in determining whether muscularis propria is reported or not by the pathologist (κ > 0.8). This information is critical to the urologic surgeon because tumors for which muscularis propria is not described or not sampled may be understaged, which may compromise patient outcomes.4,5,13 Reliability was lower for determining whether muscularis propria was present in the resection specimen of T1 tumors (κ = 0.62), but all misclassifications were false negatives, and in this setting, false negatives are more desirable than false positives, because false negatives would prompt the physician to review the case, whereas false positives could provide inappropriate reassurance that the resection was adequate. Furthermore, discrepancy analysis revealed that most of these false negatives resulted from muscularis propria being present in deeper levels of resection sent as separate specimens (ie, deep bites), an issue that is patchable in future iterations.

Automated extraction of information from TURBT pathology reports has multiple possible applications. We are currently investigating use of this algorithm to monitor for rates of muscle presence in T1 tumors by surgeons at our institution and validating this as a marker of surgical quality. Report cards19 are provided to surgeons, and TURBT reports are then monitored, with the goal of improving both quality of resection and patient care. Furthermore, we plan to build and deploy a readily accessible dashboard (eg, Shiny; https://shiny.rstudio.com/) to assist surgeons in monitoring results and improving patient care. This algorithm could also be used as a quality assurance tool for pathologists to ensure appropriate reporting of muscle presence or absence in a specimen. Finally, this algorithm can be used to create and maintain a research database for academic institutions. Because the code is open source and live updating, minimal cost is associated with upkeep.

Several limitations of this study should be acknowledged. First, our algorithm was created using the electronic medical record and pathology reports of a single institution. This may limit generalizability of our reliability to other centers. However, SQL and regular expressions are platform agnostic (ie, applicable to multiple electronic medical record systems), and our open-source code is customizable to the tendencies and patterns of other health care institutions. Furthermore, we plan to expand application of future iterations to other centers in our medical group and encourage others to apply our algorithm as well. In addition, although we included the quality indicator of the presence or absence of muscularis propria, we did not include lymphovascular invasion, which is also important for risk stratification of bladder cancer20 and is now included in the National Comprehensive Cancer Network guidelines.3 Furthermore, clinical stage (ie, bimanual examination), tumor size, and number of tumors are not included in the pathology report and are instead dictated in an operative note. Creation and validation of an NLP algorithm to extract this information from operative notes could be included in future iterations. Inclusion of all of these factors in an automated algorithm could help improve incorporation of these important prognostic factors into routine clinical practice. Future directions and next steps could also include incorporation of machine learning to improve reliability and expansion to other primary cancer sites. Finally, our algorithm demonstrates good, but not perfect, agreement with manual review by physicians. It is critical to remember that technology can be used to assist in care delivery but cannot supplant physician-patient decision making.

In conclusion, application of NLP demonstrates a high degree of reliability in extracting critical stage, grade, and presence or absence of muscularis propria from TURBT reports, and future iterations can continue to improve performance. Automated extraction of oncologic information shows promise in assisting physicians in delivery of care and providing avenues for quality improvement.

Footnotes

Supported in part by Grant No. UL1TR001422 from the National Institutes of Health National Center for Advancing Translational Sciences and by Veterans Health Administration Merit Grant No. BX0033692-01 and the John P. Hanson Foundation for Cancer Research at the Robert H. Lurie Comprehensive Cancer Center, Northwestern University (J.J.M.).

Presented in part at the Annual Meeting of the American Urological Association, Boston, MA, May 12-16, 2017.

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Funding sources had no role in the writing of the manuscript or decision to submit it for publication.

AUTHOR CONTRIBUTIONS

Conception and design: Jason Cohen, Joshua J. Meeks

Collection and assembly of data: Alexander P. Glaser, Jason Cohen, Anuj Desai, Philip Silberman, Joshua J. Meeks

Data analysis and interpretation: All authors

Manuscript writing: All authors

Final approval of manuscript: All authors

Accountable for all aspects of the work: All authors

AUTHORS' DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST

The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO's conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/jco/site/ifc

Alexander P. Glaser

No relationship to disclose

Brian J. Jordan

No relationship to disclose

Jason Cohen

No relationship to disclose

Anuj Desai

No relationship to disclose

Philip Silberman

No relationship to disclose

Joshua J. Meeks

Honoraria: AstraZeneca

REFERENCES

1. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2016. CA Cancer J Clin. 2016;66:7–30. doi: 10.3322/caac.21332.
2. Svatek RS, Hollenbeck BK, Holmäng S, et al. The economics of bladder cancer: Costs and considerations of caring for this disease. Eur Urol. 2014;66:253–262. doi: 10.1016/j.eururo.2014.01.006.
3. National Comprehensive Cancer Network. NCCN Guidelines in Clinical Oncology: Bladder cancer 5.2017, 2017. https://www.nccn.org/
4. Dalbagni G, Vora K, Kaag M, et al. Clinical outcome in a contemporary series of restaged patients with clinical T1 bladder cancer. Eur Urol. 2009;56:903–910. doi: 10.1016/j.eururo.2009.07.005.
5. Divrik RT, Sahin AF, Yildirim U, et al. Impact of routine second transurethral resection on the long-term outcome of patients with newly diagnosed pT1 urothelial carcinoma with respect to recurrence, progression rate, and disease-specific survival: A prospective randomised clinical trial. Eur Urol. 2010;58:185–190. doi: 10.1016/j.eururo.2010.03.007.
6. Amin MB, Delahunt B, Bochner BH, et al. Protocol for the examination of specimens from patients with carcinoma of the urinary bladder. http://www.cap.org/ShowProperty?nodePath=/UCMCon/Contribution%20Folders/WebContent/pdf/urinary-17protocol-3300.pdf
7. Schroeck FR, Pattison EA, Denhalter DW, et al. Early stage bladder cancer: Do pathology reports tell us what we need to know? Urology. 2016;98:58–63. doi: 10.1016/j.urology.2016.07.040.
8. Capogrosso P, Capitanio U, Ventimiglia E, et al. Detrusor muscle in TUR-derived bladder tumor specimens: Can we actually improve the surgical quality? J Endourol. 2016;30:400–405. doi: 10.1089/end.2015.0591.
9. Strope SA, Ye Z, Hollingsworth JM, et al. Patterns of care for early stage bladder cancer. Cancer. 2010;116:2604–2611. doi: 10.1002/cncr.25007.
10. Skolarus TA, Ye Z, Zhang S, et al. Regional differences in early stage bladder cancer care and outcomes. Urology. 2010;76:391–396. doi: 10.1016/j.urology.2009.12.079.
11. Hollenbeck BK, Ye Z, Dunn RL, et al. Provider treatment intensity and outcomes for patients with early-stage bladder cancer. J Natl Cancer Inst. 2009;101:571–580. doi: 10.1093/jnci/djp039.
12. Hollingsworth JM, Zhang Y, Krein SL, et al. Understanding the variation in treatment intensity among patients with early stage bladder cancer. Cancer. 2010;116:3587–3594. doi: 10.1002/cncr.25221.
13. Skolarus TA, Ye Z, Montgomery JS, et al. Use of restaging bladder tumor resection for bladder cancer among Medicare beneficiaries. Urology. 2011;78:1345–1349. doi: 10.1016/j.urology.2011.05.071.
14. Brausi M, Collette L, Kurth K, et al. Variability in the recurrence rate at first follow-up cystoscopy after TUR in stage Ta T1 transitional cell carcinoma of the bladder: A combined analysis of seven EORTC studies. Eur Urol. 2002;41:523–531. doi: 10.1016/s0302-2838(02)00068-4.
15. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–174.
16. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960;20:37–46.
17. Gregg JR, Lang M, Wang LL, et al. Automating the determination of prostate cancer risk strata from electronic medical records. JCO Clin Cancer Inform. http://ascopubs.org/doi/full/10.1200/CCI.16.00045
18. Tan H-J, Clarke R, Chamie K, et al. Development and validation of an automated method to identify patients undergoing radical cystectomy for bladder cancer using natural language processing. Urol Pract. 2017;4:365–372. doi: 10.1016/j.urpr.2016.09.011.
19. Matulewicz RS, Tosoian JJ, Stimson CJ, et al. Implementation of a surgeon-level comparative quality performance review to improve positive surgical margin rates during radical prostatectomy. J Urol. 2017;197:1245–1250. doi: 10.1016/j.juro.2016.11.102.
20. Mathieu R, Lucca I, Rouprêt M, et al. The prognostic role of lymphovascular invasion in urothelial carcinoma of the bladder. Nat Rev Urol. 2016;13:471–479. doi: 10.1038/nrurol.2016.126.

Articles from JCO Clinical Cancer Informatics are provided here courtesy of American Society of Clinical Oncology
