Classifying Stereotactic Radiosurgery Patients by Primary Diagnosis Using Natural Language Processing of Clinical Notes

Mario Fugal; David Marshall; Alexander V Alekseyenko; Xia Jing; Graham Warren; Jihad Obeid

doi:10.1200/CCI-24-00268

2025 Jun 13;9:e2400268. doi: 10.1200/CCI-24-00268

PMCID: PMC12178166 PMID: 40513052

Abstract

PURPOSE

Accurate identification of the primary tumor diagnosis of patients who have undergone stereotactic radiosurgery (SRS) from electronic health records is a critical but challenging task. Traditional methods of identifying the primary tumor histology that rely on International Classification of Diseases (ICD)-9 and ICD-10 CM codes often fall short in granularity and completeness, particularly for patients with metastatic cancer.

METHODS

In this study, we propose an approach leveraging natural language processing (NLP) algorithms to enhance the accuracy of extracting primary tumor histology from the patient's electronic records.

RESULTS

Through manual annotation of patient data and subsequent algorithm training, we achieved improvements in accuracy and efficiency in primary tumor type classification and finding histology subtypes not available in ICD10 CM.

CONCLUSION

Our findings underscore the value of NLP in refining research processes, identifying patients' cohorts, and improving efficiencies with the goal of potentially improving patient outcomes in SRS treatment.

INTRODUCTION

Stereotactic radiosurgery (SRS) has emerged as a pivotal therapeutic modality for the management of brain metastases (BMs) and other intracranial lesions. Over the past decade, the use of SRS for treating metastatic disease has increased.^1-4 The rapid expansion of the use of SRS has been accompanied by observational studies that have influenced the clinical use of SRS.^2,5-7 For example, using observational data, Yamamoto et al⁸ showed that overall survival did not differ between patients with 10 or more metastases and those with two to nine metastases treated with SRS. Limon et al reported the efficacy and safety of SRS in patients with four or more BMs.⁹ These studies highlight the importance of observational data to the SRS community. As the use of SRS has expanded rapidly, clinician researchers have published observational data, and clinicians use them to inform how they treat patients. Curating data for metastatic radiosurgery patients is challenging because important information is not always in a structured form. One of the first concerns is how to identify the metastasis type by finding the primary tumor histology, which is an important factor in the treatment and outcome of SRS patients.

CONTEXT

Key Objective
Can International Classification of Diseases (ICD) codes reliably identify primary histology in patients with brain metastases (BMs) treated with radiosurgery in the electronic health record?
Knowledge Generated
ICD codes were found to be inaccurate and incomplete for identifying histologies. Basic natural language processing (NLP) improved the F1-score to 0.97 or higher for most histologies.
Relevance (S. Aneja)
Identifying patients with BMs is increasingly difficult with current structured data formats. This study demonstrates the utility of NLP to improve our ability to identify and study this unique population. Further work in this area can help us better conduct real-world studies of BM treatments.*
*Relevance section written by JCO Clinical Cancer Informatics Associate Editor Sanjay Aneja, MD.

Accurate identification of the primary tumor histology from the electronic health record (EHR) remains a formidable challenge in retrospective metastatic cancer analyses. The reliance on International Classification of Diseases (ICD), Clinical Modification (CM) codes, namely, ICD-9 CM and ICD-10 CM (hereafter referred to as ICD), for primary tumor diagnosis often provides inadequate granularity and completeness, particularly in cases of metastatic cancer and where patients might have multiple cancer types or initially present with BMs at the time of cancer diagnosis.^10-12

Our first attempts to analyze our patient population relied on using ICD codes to classify by histology. However, when reviewing the patients' records identified by ICD codes, we noted that many patients were classified into more than one primary tumor class. Others did not have a primary cancer diagnosis. The main limitations of relying on ICD codes only include the following:

1. Incomplete coding, such as only having BM codes without the primary tumor type code.
2. The lack of ICD codes for relevant primary tumor subtypes.
3. Patients with multiple primary cancer diagnoses with no clear indication of which cancer has metastasized to the brain.

Patients sometimes presented to the SRS clinic with BMs at the time of cancer diagnosis and were treated with SRS without the primary cancer diagnosis code ever being added to the EHR. However, the clinician nearly always documented the histology, or at least the presumed histology, in the note (Data Supplement).

Furthermore, ICD codes do not differentiate histologic subtypes of cancers such as the major subdivisions of lung cancer. Small cell lung cancer (SCLC) is meaningfully different from non–small cell lung cancers (NSCLCs). These subtypes have clinically different disease courses and treatments. In the case of SCLC, for example, the BMs are considered very radiosensitive but prone to multiplying rapidly, making SRS historically less common versus whole-brain radiotherapy. Grouping all lung cancers into one cohort in the study of outcomes of radiosurgery is typically not ideal for these reasons, making the ICD codes less relevant in the setting of SRS.

The third limitation of using ICD codes is that the patient might have a history of multiple cancers, especially skin, lung, breast, or prostate cancers. Searching for patients who have a BM ICD code (C79.31) and another cancer code (from the “C” and “D” chapters of ICD 10 CM) will lead to patients having conflicting diagnoses as shown below. The most accurate and meaningful source of information for the indication for the SRS procedure was the consult note from either the radiation oncologist or the neurosurgeon. While the clinicians are limited by the diagnosis codes that are available to choose in ICD, they are free to communicate meaningful data in free form in the consult and follow-up notes.

To address these limitations of relying on ICD codes only, we used natural language processing (NLP) algorithms to extract tumor histology and SRS indication from clinical narratives. We hypothesize that this approach improves the accuracy and efficiency of identifying primary tumor histology and SRS indication in the EHR. By harnessing the power of NLP with the information-rich notes, we enhanced the efficiency and precision of this critical clinical task, with the goal of ultimately facilitating more efficient and effective observational studies of SRS cohorts.

BACKGROUND

Primary tumor histology diagnosis plays a pivotal role in guiding treatment decisions and predicting patient outcomes after SRS.^13-15 However, the conventional methods for ascertaining primary tumor types, primarily relying on ICD codes, often lack the granularity and accuracy required, particularly in the context of metastatic cancer.^10,11,16 Patients with metastatic lesions may present with a complex clinical history involving multiple cancer types or may initially present with BMs where clinicians only enter in the ICD code for BMs, making the ICD-coded data for the patients incomplete or conflicting, necessitating a more nuanced approach to primary tumor identification.

There are several publications on the difficulty and importance of identifying metastatic cancers and classifying them correctly.^10,16 These reports highlight the metastatic blind spot in ICD, although, given the heterogeneity of metastases and the many possible combinations, a structured hierarchy would be cumbersome and daunting. However, clinicians are interested in the more granular information that is not reflected in the current ICD codes because it directs their decision making to optimize patients' outcomes. To automate the tedious but important task of identifying primary histology, we developed an NLP tool to accurately extract this information.

This is the first publication to the authors' knowledge that describes an automated NLP method for electronic phenotyping of BMs. While many studies have explored various aspects of SRS for treating BMs, they typically lack detailed descriptions of the methods used to extract primary histology, which is presumably done through manual curation of the data sets.^13-15

There has been growing interest in leveraging advanced computational techniques, such as NLP, to augment traditional clinical and research workflows. NLP offers the promise of automating tedious manual tasks, extracting relevant information from unstructured clinical narratives, and facilitating more accurate and timely gathering of diagnosis and outcome information.^10,16-21 Using the notes as the primary source of information takes advantage of the clinician's interest in accuracy and completeness of the notes versus the difficulty and oftentimes lack of useful granularity of ICD codes. In this study, we explore the potential of NLP algorithms to optimize identifying primary tumor histology diagnosis among patients undergoing SRS in the EHR, with the overarching goal of improving clinical decision making and patient care outcomes.

METHODS

Ethical Approval

This study received approval from the Institutional Review Board (IRB) at the Medical University of South Carolina (MUSC) as exempt, in accordance with institutional guidelines. Data acquisition, analysis, and storage adhered strictly to the protocols established by the IRB.

Data Collection

A data request was submitted to the MUSC data warehouse through SPARCRequest,^22,23 seeking patient records spanning the period from the beginning of 2012 to the end of 2021 for individuals who underwent treatment with the Gamma Knife system. Three primary data tables were used for this research: Diagnosis, Procedures, and Notes. The data set included 1,461 patients, 82,000 notes, and 333,000 cancer-related diagnoses.

Data Preprocessing

Rigorous preprocessing and data quality assurance steps were undertaken to ensure data quality and consistency. Each table was manually reviewed to identify and rectify any instances of duplication, inconsistency, or incompleteness. Examples of rectification included limiting the number of SRS procedures per day to one, splitting strings of multiple ICD diagnosis codes into individual codes, and using regex to extract the diagnosis portion of the notes (see Appendices A and B in the Data Supplement for the string-splitting code; Appendix C shows sample text strings used for training). A major data preparation step was to manually review consult and follow-up notes to identify the primary tumor type, or the reason for the SRS for nonmalignant neoplasms, for the 2,012 SRS sessions. The radiation oncology note with the date of service closest to the SRS session was used to identify the primary histology for metastases or, for nonmalignant conditions, the condition being treated. The focus of this research was the identification of primary tumor types, but other conditions indicated for SRS (arteriovenous malformations [AVMs], trigeminal neuralgia, and acoustic nerve neoplasms) were included for additional context to show the capabilities and limitations of both ICD and NLP.
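Two of the rectification steps described above can be sketched in Python. This is an illustrative sketch only and is not the code from the Data Supplement: the set of delimiters, the tuple layout of the procedure records, and the helper names `split_icd_codes` and `one_srs_per_day` are all assumptions made for the example.

```python
import re

def split_icd_codes(raw):
    """Split a string holding several ICD codes into individual codes.

    Assumes codes are separated by commas, semicolons, pipes, or whitespace;
    the real export format may differ. Duplicates are dropped, order is kept.
    """
    codes = []
    for code in re.split(r"[,\s;|]+", raw.strip()):
        code = code.upper()
        if code and code not in codes:
            codes.append(code)
    return codes

def one_srs_per_day(procedures):
    """Keep at most one SRS procedure per patient per calendar day.

    `procedures` is an iterable of (patient_id, iso_date) tuples (hypothetical layout).
    """
    return sorted(set(procedures))

codes = split_icd_codes("C79.31, C34.90; c43.9 C79.31")
sessions = one_srs_per_day(
    [("p1", "2015-03-02"), ("p1", "2015-03-02"), ("p1", "2016-01-10")]
)
```

Here `codes` holds three distinct codes and `sessions` holds two sessions, because the repeated same-day record is collapsed.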

Manual Labeling of Primary Tumor Types

To accurately ascertain the primary tumor type for each patient, radiation oncology consultation records were meticulously reviewed by a clinical expert. Given the complexity of patient histories, particularly in cases of metastatic cancer, manual annotation was necessary. Notably, many patients presented with multiple cancer diagnoses, with the primary billing diagnosis often coded as C79.31, indicating secondary malignant neoplasm of brain. A more precise cancer diagnosis is needed to provide sufficient information for research. The consult or follow-up note with the date closest to each SRS date was reviewed, and the primary tumor was categorized as shown in Table 1. Note that the two columns “No. of Patients” and “No. of SRS sessions” have different values. This is because patients, especially those with metastases, often undergo multiple SRS sessions, either because the lesions are too numerous or too large to treat in one session, because new lesions appear later, or, less frequently, because treated lesions recur.
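The rule of selecting the note closest in date to each SRS session can be expressed compactly. The sketch below is an assumption about how such a lookup might be written, not the authors' implementation; the function name `closest_note`, the tuple layout, the tie-breaking rule, and the example note texts are all invented for illustration.

```python
from datetime import date

def closest_note(srs_date, notes):
    """Return the note whose date of service is nearest to the SRS date.

    `notes` is a list of (service_date, text) tuples. Ties go to the earlier
    note, which is an arbitrary choice made for this sketch.
    """
    if not notes:
        return None
    return min(notes, key=lambda n: (abs((n[0] - srs_date).days), n[0]))

# Invented example notes, not patient data.
notes = [
    (date(2018, 4, 1), "DIAGNOSIS: metastatic melanoma"),
    (date(2018, 5, 28), "DIAGNOSIS: brain metastases from melanoma"),
    (date(2018, 9, 15), "follow-up visit"),
]
picked = closest_note(date(2018, 6, 1), notes)
```

With an SRS date of June 1, 2018, the May 28 note is selected because it is 4 days away, compared with 61 and 106 days for the other two.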

TABLE 1.

Overview of Primary Tumor/Target Types for Patients and Number of SRS Sessions

Clinical Diagnosis Including Grouping and ICD Codes in Brackets	No. of Patients	No. of SRS Sessions
Lung [C30-C39]	438	670
Cranial nerves	200	212
Benign neoplasm of cranial nerves [D33.3]
Trigeminal neuralgia [G50.0]
Benign neoplasm of peripheral nerves and autonomic nervous system [D36.1]
Meningioma [D32.0, D32.9]	171	207
Melanoma and other skin cancers [C43-C44]	82	147
Breast [C50, D05]	75	143
Digestive organs [C15-C26]	42	60
Arteriovenous malformation [Q28.2]	38	39
Renal [C64, C65]	33	56
Head and neck [C00-C14]	21	29
Other (primary brain or categories with <20 SRS sessions)	382	478
Total	1,461	2,012

NOTE. The number of SRS sessions is higher because of repeated treatments for additional lesions or for salvage therapy.

Abbreviations: ICD, International Classification of Diseases; SRS, stereotactic radiosurgery.
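To make the ICD-based grouping concrete, the malignant code ranges in Table 1 can be written as a lookup. This is a minimal sketch under stated assumptions: the function name `icd_groups` is hypothetical, only the three-character ICD-10-CM category is compared, and categories that do not follow the letter-plus-two-digits pattern are ignored. It illustrates how one patient can fall into more than one primary tumor class.

```python
import re

# Code ranges copied from Table 1 (inclusive, compared on the 3-character category).
GROUPS = {
    "Lung": [("C30", "C39")],
    "Melanoma and other skin cancers": [("C43", "C44")],
    "Breast": [("C50", "C50"), ("D05", "D05")],
    "Digestive organs": [("C15", "C26")],
    "Renal": [("C64", "C65")],
    "Head and neck": [("C00", "C14")],
}

def icd_groups(codes):
    """Return the sorted Table 1 groups matched by a list of ICD-10-CM codes."""
    found = set()
    for code in codes:
        category = code.strip().upper()[:3]  # e.g. "C34.90" -> "C34"
        if not re.fullmatch(r"[A-Z]\d{2}", category):
            continue  # skip categories such as "C4A" in this simplified sketch
        for group, ranges in GROUPS.items():
            if any(low <= category <= high for low, high in ranges):
                found.add(group)
    return sorted(found)

# A BM code plus lung and skin cancer codes yields two candidate primaries.
ambiguous = icd_groups(["C79.31", "C34.90", "C44.91"])
# A BM code alone yields no primary at all.
missing = icd_groups(["C79.31"])
```

The first call returns two groups and the second returns none, which mirrors the conflicting and incomplete coding described in the Introduction.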

RESULTS

Results From Using ICD Codes to Classify BM Histology

The utilization of ICD codes for classifying diagnosis types was found to be ineffective for BMs. For example, we found, as shown in Table 2, that for metastatic skin cancer (primarily melanoma), the recall was high at 0.94, but the precision was low at 0.45, likely because skin cancer is very common, including in patients with BM from metastatic lung or breast cancer. In nearly all BM cases, we found that the primary ICD code C79.31 indicated only BMs, whereas additional cancer codes in the patient's chart may or may not accurately reflect the primary cancer type. Table 2 presents the precision, recall, and F1-score of using ICD for each primary tumor type and for nonmalignant cases where SRS was indicated. The accuracy of these classifications varied significantly, with AVM achieving a high F1-score of 0.96, whereas GI tumors exhibited a lower F1-score of 0.58. The high accuracy for AVM is likely because patients with AVMs often have not developed cancers by the time of diagnosis and treatment for the AVM, as the average age at diagnosis is in the 30s.²⁴

TABLE 2.

Results of Pulling Diagnosis Codes of Interest From ICD Compared With True Brain Metastasis Origin for Each SRS Session

ICD Data	No. (SRS sessions)	Precision	Recall	F1-Score
Lung	670	0.81	0.93	0.87
Cranial nerves	212	0.80	1.00	0.89
Meningioma	207	0.77	1.00	0.87
Melanoma/skin	147	0.45	0.94	0.61
Breast	143	0.63	0.97	0.77
GI	60	0.42	0.93	0.58
AVM	39	0.92	1.00	0.96
Renal	56	0.61	0.88	0.72
Head and neck	29	0.60	0.57	0.58

Abbreviations: AVM, arteriovenous malformation; ICD, International Classification of Diseases; SRS, stereotactic radiosurgery.
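For readers less familiar with the metric, the F1-score reported in Table 2 is the harmonic mean of precision and recall. The short sketch below recomputes two rows of Table 2 from their precision and recall values; the helper name `f1_score` is defined here for illustration and is not taken from the study code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

melanoma_f1 = round(f1_score(0.45, 0.94), 2)  # Table 2, Melanoma/skin row
avm_f1 = round(f1_score(0.92, 1.00), 2)       # Table 2, AVM row
```

The harmonic mean penalizes imbalance: the melanoma row has high recall, but its low precision pulls the F1-score down to 0.61, whereas the AVM row reaches 0.96.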

Training the Primary Diagnosis NLP Classifier

The text sections pertaining to diagnosis within the radiation oncology consultation notes followed distinct patterns, with the diagnosis typically following keywords such as “DIAGNOSIS:” or “TREATMENT HISTORY:.” Using regular expressions (regex), the diagnosis strings, limited to a maximum length of 300 characters, were extracted for training purposes. Manually curated ground truth labels were used for training and testing the NLP algorithm. To show the flexibility and utility of such an algorithm, diverse groupings were used, including BM histologies, a composite group of cranial nerve diagnoses, and several nonmalignant pathologies. Random forest models for each primary diagnosis were trained using the scikit-learn²⁵ package in Python, with text preprocessing steps including lowercase conversion and the removal of numbers and special characters. Hyperparameter tuning was used to find the best model for each histology using bag of words. Model performance was evaluated using precision, recall, and F1-score. Some primary histologies with too few examples for a test set were excluded.
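The extraction and cleaning steps described above might look like the following. This is a hedged sketch rather than the pattern used in the study: the exact regular expression, the helper names `extract_diagnosis` and `clean_text`, and the example note are assumptions, and real notes would likely need additional keywords and section boundaries.

```python
import re

# Capture up to 300 characters following either keyword named in the text.
DIAGNOSIS_RE = re.compile(r"(?:DIAGNOSIS|TREATMENT HISTORY):\s*(.{1,300})", re.DOTALL)

def extract_diagnosis(note):
    """Return the text after the first diagnosis keyword, or None if absent."""
    match = DIAGNOSIS_RE.search(note)
    return match.group(1).strip() if match else None

def clean_text(text):
    """Lowercase, then replace digits and special characters with spaces."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()

# Invented example note, not patient data.
note = "HISTORY: headaches. DIAGNOSIS: Stage IV NSCLC (adenocarcinoma), 3 brain metastases."
cleaned = clean_text(extract_diagnosis(note))
```

For the invented note, `cleaned` is the string "stage iv nsclc adenocarcinoma brain metastases", which is the kind of input a bag-of-words vectorizer would then receive.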

The random forest classifier from the scikit-learn Python package was used in NLP training as well. We used a term frequency-inverse document frequency (TF-IDF) vectorizer,²⁶ which converts a collection of text documents into a numerical format, where each word's importance is based on its frequency in a particular document and its rarity across the entire data set. This helps in emphasizing more meaningful or informative words while reducing the influence of very common, less informative ones. A classifier for each histology was trained using a 0.80 train set ratio, consistent with the test set sizes in Table 3, and 10-fold cross-validation. A stop word list was manually generated from the data set to avoid inadvertently treating a meaningful word as a stop word. The results for training the classifier are listed in Table 3.
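A one-histology classifier of the kind described above can be assembled from standard scikit-learn components. The sketch below is an illustration under stated assumptions and not the study code: the ten diagnosis strings and the stop word list are invented, the split ratio, fold count, and `n_estimators` value are placeholders sized for a toy data set, and hyperparameter tuning is omitted.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline

# Invented diagnosis strings; label 1 = lung primary, 0 = any other indication.
texts = [
    "metastatic non small cell lung cancer adenocarcinoma",
    "small cell lung cancer with brain metastases",
    "squamous cell carcinoma of the lung",
    "lung adenocarcinoma with multiple brain metastases",
    "nsclc of the right upper lobe",
    "metastatic melanoma to the brain",
    "left vestibular schwannoma",
    "breast cancer with brain metastases",
    "trigeminal neuralgia",
    "arteriovenous malformation",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Hypothetical hand-built stop word list.
STOP_WORDS = ["the", "of", "with", "to"]

# TF-IDF features feeding a random forest, one binary model per histology.
model = make_pipeline(
    TfidfVectorizer(stop_words=STOP_WORDS),
    RandomForestClassifier(n_estimators=100, random_state=0),
)

# Hold out a stratified test set, then cross-validate on the training portion.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=0
)
cv_f1 = cross_val_score(model, X_train, y_train, cv=2, scoring="f1")

model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

Training a separate binary model per histology, as sketched here, keeps each classifier simple and lets groups with too few examples be dropped without affecting the others.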

TABLE 3.

Test Data Results After Training NLP Classifiers Using a Random Forest Model

NLP	No. (test SRS sessions)	Precision	Recall	Test F1-Score (STDEV)
Lung	129	0.96	1.00	0.97 (0.01)
Cranial nerves	43	1.00	1.00	1.00 (0.02)
Meningioma	42	1.00	0.81	0.89 (0.02)
Breast	29	1.00	0.93	0.97 (0.02)
Melanoma	29	1.00	0.93	0.97 (0.02)
GI	12	1.00	1.00	0.92 (0.08)
AVM	8	1.00	1.00	1.00 (0.00)
Renal	—	—	—	—
Head and neck	—	—	—	—

NOTE. Renal and head and neck had too few cases for providing meaningful test statistics (AVM included for comparison with ICD classification). The cross-validation standard deviation of the F-score is listed in the final column in parentheses.

Abbreviations: AVM, arteriovenous malformation; ICD, International Classification of Diseases; NLP, natural language processing; SRS, stereotactic radiosurgery; STDEV, standard deviation.

Results From Training the NLP Classifier

To enhance the clinical research impact of this project, additional NLP classifiers were trained to identify subtypes of lung cancer. ICD codes for lung cancer are constrained to the anatomic location of the cancer. However, there are clinically meaningful subtypes of lung cancer that affect clinicians' decision making. Despite ICD code limitations, the doctors' notes often specify the subtype of the cancer. Some notes specify the major categories of NSCLC and SCLC. Other times, the type of NSCLC, such as adenocarcinoma of the lung or squamous cell carcinoma of the lung, is specifically mentioned. The results are shown in Table 4.

TABLE 4.

Test Data Results After Training NLP Classifiers Using a Random Forest Model Differentiating the Subtypes of Lung Cancer

NLP—Lung	No. (test SRS sessions)	Precision	Recall	Test F1-Score
NSCLC	173	1.00	1.00	1.00
Adenocarcinoma of the lung	254	0.96	0.96	0.96
Squamous cell carcinoma of the lung	36	1.00	1.00	1.00
SCLC	63	1.00	1.00	1.00

NOTE. The cross-validation standard deviation of the F-score is listed in the final column in parentheses.

Abbreviations: NLP, natural language processing; NSCLC, non–small cell carcinoma of the lung; SCLC, small cell lung cancer; SRS, stereotactic radiosurgery.

A summary of the F1-score comparison between ICD and NLP classification is shown in Table 5. Note that NLP has a consistently higher F1-score, with gains of roughly 60% for GI and melanoma. The one exception is the case of AVM, where the F1-scores are nearly equivalent (0.96 v 1.00). This is due to the relative simplicity of the diagnosis and of the AVM patients' EHR records (see Appendix D in the Data Supplement for relative word importance for each histology type).

TABLE 5.

Summary of Comparison of ICD and Test NLP F1-Scores

Histology	No. of SRS Sessions	ICD F1-Score	NLP Test F1-Score (STDEV)
Lung	645	0.87	0.97 (0.01)
Cranial nerves	212	0.89	1.00 (0.02)
Meningioma	207	0.87	0.89 (0.02)
Breast	141	0.77	0.97 (0.02)
Melanoma	142	0.61	0.97 (0.02)
GI	59	0.58	0.92 (0.08)
AVM	39	0.96	1.00 (0.00)
Renal	56	0.72	—
Head and neck	29	0.58	—

Abbreviations: AVM, arteriovenous malformation; ICD, International Classification of Diseases; NLP, natural language processing; SRS, stereotactic radiosurgery; STDEV, standard deviation.

DISCUSSION

The successful implementation of NLP classification in this study highlights the transformative potential of NLP in enhancing the accuracy and efficiency of primary tumor histology classification for patients undergoing SRS. By addressing the significant limitations of ICD codes, which often fall short in cases involving complex metastatic histories or multiple malignancies, our NLP-based approach provides a more precise method for identifying primary tumor types. This advancement not only facilitates more accurate data curation for research but also lays the groundwork for improving clinical decision making and patient outcomes in SRS.

Improving clinical outcomes after radiosurgery depends on studying patient outcomes using clinical data. For example, current research on radiation necrosis after radiosurgery is observational and relies on clinical data generated during routine clinical work. Because structured data do not capture critical information for SRS patients with BM, the data must be curated manually, which is time-consuming and means that substantial work is required before analysis can begin.

Here, we show that even categorizing by primary tumor type is not trivial and requires manual labeling of each case. However, using NLP, the primary diagnosis can be found with high accuracy and efficiency. Even with few training samples, the desired signal is strong enough to achieve excellent results by training a relatively lightweight and simple NLP model to identify both composite groups such as patients with a variety of cranial nerve abnormalities or defined groups such as patients with a specific subtype of lung cancer. The same method likely could be used to further discern different histologic subtypes of breast or other sites with further training and larger subpopulations.

There is little to no mention in the publications on outcomes of SRS of how the authors determined the primary diagnosis of the patients with BM.^13-15 It is presumed that they used a manual approach. However, using NLP to categorize specific or generic data about patients from the best source of data, the clinical notes, takes advantage of both the data-rich notes and the efficiency of an automated algorithm for extracting the necessary information.

Implementing these algorithms in a learning health system-type classifier for clinical research would be both simple and useful. The raw data processing and output need no user intervention, and continual data processing can happen in the background and keep the clinical measures ready for researchers to study.

We aim to leverage this primary histology classifier to enhance our observational data analysis, enabling us to report the most up-to-date radiation necrosis risk factors from our institution. This work serves as a foundational step toward advancing our research in SRS, particularly in the evolving landscape of BM management and emerging therapeutic options.

In conclusion, the success of our NLP algorithm in accurately categorizing primary tumors highlights its potential for broader application beyond the scope of this study. The developed pipeline can be easily adapted and retrained for use at other institutions with minimal effort, providing a scalable solution to the challenges of data extraction and classification in clinical research. Furthermore, our approach streamlines data set preparation by enabling the model to be trained on a subset of data and then applied to the entire cohort of SRS patients. This significantly reduces the time and manual effort typically required for such tasks, facilitating faster and more comprehensive analyses of patient outcomes.

As we look to the future, the integration of NLP with other data modalities, such as imaging, presents an exciting opportunity to further refine predictive models for SRS outcomes, including the risk of radiation necrosis. The continuous processing of clinical data through an automated NLP pipeline could empower researchers to maintain up-to-date data sets, facilitating ongoing studies and enabling real-time insights into patient care. While external validation is needed to confirm the generalizability of our findings, the simplicity and effectiveness of our approach suggest that NLP could become a vital tool in the evolution of clinical research methodologies in oncology.

This study represents a critical step forward in leveraging advanced computational techniques to overcome the limitations of traditional coding systems, ultimately paving the way for more personalized and effective treatment strategies in SRS.

SUPPORT

Supported in part by Hollings Cancer Center, Medical University of South Carolina.

AUTHOR CONTRIBUTIONS

Conception and design: Mario Fugal, David Marshall, Graham Warren, Jihad Obeid

Financial support: Mario Fugal

Administrative support: Jihad Obeid

Provision of study materials or patients: Mario Fugal

Collection and assembly of data: Mario Fugal

Data analysis and interpretation: All authors

Manuscript writing: All authors

Final approval of manuscript: All authors

Accountable for all aspects of the work: All authors

AUTHORS' DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST

The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated unless otherwise noted. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO's conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/cci/author-center.

Open Payments is a public database containing information reported by companies about payments made to US-licensed physicians (Open Payments).

Mario Fugal

Stock and Other Ownership Interests: Synlogic

David Marshall

Employment: Medical University of South Carolina, Hollings Cancer Center

Leadership: First String Research

Stock and Other Ownership Interests: First String Research, Baebies Inc (I), NIRVana Sciences, Inc (I)

Consulting or Advisory Role: Isoray (Inst)

Graham Warren

Patents, Royalties, Other Intellectual Property: Patent pending for radioprotective compound (Inst), patent or royalties associated with a radioprotective compound

Other Relationship: Non-profit organizations, expert testimony

Jihad Obeid

Research Funding: National Science Foundation (Inst), Centers for Disease Control and Prevention (Inst), NIH (Inst), Patient-Centered Outcomes Research Institute (PCORI) (Inst)

No other potential conflicts of interest were reported.

REFERENCES

1. Fecci PE, Champion CD, Hoj J, et al. The evolving modern management of brain metastasis. Clin Cancer Res. 2019;25:6570–6580. doi: 10.1158/1078-0432.CCR-18-1624.
2. O'Beirn M, Benghiat H, Meade S, et al. The expanding role of radiosurgery for brain metastases. Medicines. 2018;5:90. doi: 10.3390/medicines5030090.
3. Higuchi Y, Yamamoto M, Serizawa T, et al. Modern management for brain metastasis patients using stereotactic radiosurgery: Literature review and the authors’ gamma knife treatment experiences. Cancer Manag Res. 2018;10:1889–1899. doi: 10.2147/CMAR.S116718.
4. Nieder C, Grosu AL, Gaspar LE. Stereotactic radiosurgery (SRS) for brain metastases: A systematic review. Radiat Oncol. 2014;9:155. doi: 10.1186/1748-717X-9-155.
5. Mohammadi AM, Recinos PF, Barnett GH, et al. Role of Gamma Knife surgery in patients with 5 or more brain metastases: Clinical article. J Neurosurg. 2012;117:5–12. doi: 10.3171/2012.8.GKS12983.
6. Minniti G, Clarke E, Lanzetta G, et al. Stereotactic radiosurgery for brain metastases: Analysis of outcome and risk of brain radionecrosis. Radiat Oncol. 2011;6:48. doi: 10.1186/1748-717X-6-48.
7. Stockham AL, Ahluwalia M, Reddy CA, et al. Results of a questionnaire regarding practice patterns for the diagnosis and treatment of intracranial radiation necrosis after SRS. J Neurooncol. 2013;115:469–475. doi: 10.1007/s11060-013-1248-6.
8. Yamamoto M, Kawabe T, Sato Y, et al. Stereotactic radiosurgery for patients with multiple brain metastases: A case-matched study comparing treatment results for patients with 2-9 versus 10 or more tumors. J Neurosurg. 2014;121:16–25. doi: 10.3171/2014.8.GKS141421.
9. Limon D, McSherry F, Herndon J, et al. Single fraction stereotactic radiosurgery for multiple brain metastases. Adv Radiat Oncol. 2017;2:555–563. doi: 10.1016/j.adro.2017.09.002.
10. Yang R, Zhu D, Howard LE, et al. Identification of patients with metastatic prostate cancer with natural language processing and machine learning. JCO Clin Cancer Inform. doi: 10.1200/CCI.21.00071.
11. Senders JT, Karhade AV, Cote DJ, et al. Natural language processing for automated quantification of brain metastases reported in free-text radiology reports. JCO Clin Cancer Inform. doi: 10.1200/CCI.18.00138.
12. Liede A, Hernandez RK, Roth M, et al. Validation of International Classification of Diseases coding for bone metastases in electronic health records using technology-enabled abstraction. Clin Epidemiol. 2015;7:441–448. doi: 10.2147/CLEP.S92209.
13. Miller JA, Bennett EE, Xiao R, et al. Association between radiation necrosis and tumor biology after stereotactic radiosurgery for brain metastasis. Int J Radiat Oncol Biol Phys. 2016;96:1060–1069. doi: 10.1016/j.ijrobp.2016.08.039.
14. Jaboin JJ, Ferraro DJ, DeWees TA, et al. Survival following gamma knife radiosurgery for brain metastasis from breast cancer. Radiat Oncol. 2013;8:131. doi: 10.1186/1748-717X-8-131.
15. Cochran DC, Chan MD, Aklilu M, et al. The effect of targeted agents on outcomes in patients with brain metastases from renal cell carcinoma treated with Gamma Knife surgery: Clinical article. J Neurosurg. 2012;116:978–983. doi: 10.3171/2012.2.JNS111353.
16. Ling AY, Kurian AW, Caswell-Jin JL, et al. Using natural language processing to construct a metastatic breast cancer cohort from linked cancer registry and electronic medical records data. JAMIA Open. 2019;2:528–537. doi: 10.1093/jamiaopen/ooz040.
17. Kazemzadeh F, Snoek JAA, Voorham QJ, et al. Association of metastatic pattern in breast cancer with tumor and patient-specific factors: A nationwide autopsy study using artificial intelligence. Breast Cancer. 2024;31:263–271. doi: 10.1007/s12282-023-01534-6.
18. Kirshner J, Cohn K, Dunder S, et al. Automated electronic health record-based tool for identification of patients with metastatic disease to facilitate clinical trial patient ascertainment. JCO Clin Cancer Inform. doi: 10.1200/CCI.20.00180.
19. Banerjee I, Bozkurt S, Caswell-Jin JL, et al. Natural language processing approaches to detect the timeline of metastatic recurrence of breast cancer. JCO Clin Cancer Inform. doi: 10.1200/CCI.19.00034.
20. Causa Andrieu P, Golia Pernicka JS, Yaeger R, et al. Natural language processing of computed tomography reports to label metastatic phenotypes with prognostic significance in patients with colorectal cancer. JCO Clin Cancer Inform. doi: 10.1200/CCI.22.00014.
21. Zeng J, Banerjee I, Henry AS, et al. Natural language processing to identify cancer treatments with electronic medical records. JCO Clin Cancer Inform. doi: 10.1200/CCI.20.00173.
22. SPARCRequest. https://sparc.musc.edu/
23. Sampson RR, Glenn JL, Cates AM, et al. SPARC: A multi-institutional integrated web based research management system. AMIA Jt Summits Transl Sci Proc. 2013;2013:230.
24. Luo J, Lv X, Jiang C, et al. Brain AVM characteristics and age. Eur J Radiol. 2012;81:780–783. doi: 10.1016/j.ejrad.2011.01.086.
25. scikit-learn: Machine learning in Python. scikit-learn 1.5.2 documentation. https://scikit-learn.org/stable/
26. TfidfVectorizer. scikit-learn. https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html

Classifying Stereotactic Radiosurgery Patients by Primary Diagnosis Using Natural Language Processing of Clinical Notes

Mario Fugal, MS

David Marshall, MD, MS

Alexander V Alekseyenko, PhD

Xia Jing, PhD

Graham Warren, MD, PhD

Jihad Obeid, MD
