Natural Language Processing of Large-Scale Structured Radiology Reports to Identify Oncologic Patients With or Without Splenomegaly Over a 10-Year Period

Simon Sun; Kaelan Lupton; Karen Batch; Huy Nguyen; Lior Gazit; Natalie Gangai; Jessica Cho; Kevin Nicholas; Farhana Zulkernine; Varadan Sevilimedu; Amber Simpson; Richard K G Do

doi:10.1200/CCI.21.00104

2022 Jan 6;6:e2100104. doi: 10.1200/CCI.21.00104


PMCID: PMC9848545 PMID: 34990210

PURPOSE

To assess the accuracy of a natural language processing (NLP) model in extracting splenomegaly described in patients with cancer in structured computed tomography radiology reports.

METHODS

In this retrospective study between July 2009 and April 2019, 387,359 consecutive structured radiology reports for computed tomography scans of the chest, abdomen, and pelvis from 91,665 patients spanning 30 types of cancer were included. A randomized sample of 2,022 reports from patients with colorectal cancer, hepatobiliary cancer (HB), leukemia, Hodgkin lymphoma (HL), and non-HL was manually annotated as positive or negative for splenomegaly. NLP model training/testing was performed on 1,617/405 reports, and a new validation set of 400 reports from all cancer subtypes was used to test NLP model accuracy, precision, and recall. Overall survival was compared between the patient groups (with and without splenomegaly) using Kaplan-Meier curves.

RESULTS

The final cohort included 387,359 reports from 91,665 patients (mean age 60.8 years; 51.2% women). In the testing set, the model achieved accuracy of 92.1%, precision of 92.2%, and recall of 92.1% for splenomegaly. In the validation set, accuracy, precision, and recall were 93.8%, 92.9%, and 86.7%, respectively. In the entire cohort, splenomegaly was most frequent in patients with leukemia (32.5%), HB (17.4%), non-HL (9.1%), colorectal cancer (8.5%), and HL (5.6%). A splenomegaly label was associated with an increased risk of mortality in the entire cohort (hazard ratio 2.10; 95% CI, 1.98 to 2.22; P < .001).

CONCLUSION

Automated splenomegaly labeling by NLP of radiology reports demonstrates good accuracy, precision, and recall. Splenomegaly is most frequently reported in patients with leukemia, followed by patients with HB.

INTRODUCTION

A broad spectrum of pathologies including hematologic malignancy, infection, and portal hypertension can lead to splenomegaly.^1-5 In patients with cancer, the onset of splenomegaly has important clinical ramifications, including in patients with colorectal cancer (CRC)^6,7 and in patients with lymphoma whose cancer can be upgraded from stage II to stage III with splenic involvement.⁸ Despite the important clinical implications of splenomegaly in patients with cancer, accurate radiologic assessment of splenomegaly remains controversial, with a lack of consensus on its definition. For example, the use of a craniocaudal benchmark of 13 cm is reported to have 68% sensitivity and 76% specificity as a threshold for mild to moderate splenomegaly.^9,10 Across patient populations, the criteria for splenomegaly may also vary, eg, between patients with lymphoma and patients with secondary splenic enlargement from portal hypertension.¹⁰

CONTEXT

Key Objective
To determine the rate of splenomegaly in different cancer patients, by applying natural language processing to structured radiology reports at a cancer center.
Knowledge Generated
Splenomegaly can be extracted from a large database of structured radiology reports with high accuracy. Splenomegaly rates vary widely across cancers, being most common in patients with lymphoma and leukemia, as well as in patients with hepatobiliary and colorectal cancer.
Relevance
Extraction of splenomegaly labels at large scale can help identify patient populations likely to develop portal hypertension from chemotherapy or at increased risk for complications related to splenomegaly.

Despite this uncertainty, radiologists continue to identify splenomegaly in daily clinical practice, creating large databases of reports identifying normal and abnormal enlarged spleens. Hence, with the increased use of structured reporting, there is an opportunity to measure the frequency of splenomegaly and range of splenic volumes across large populations who undergo medical imaging, using tools such as natural language processing (NLP) of radiologic reports and applying automated segmentation algorithms to clinical computed tomography (CT) images. NLP is increasingly applied to large databases of radiologic reports, eg, in the labeling of pathologic entities such as venous thromboembolism or pulmonary emboli.¹¹

Our hypothesis was that NLP could be used on spleen-related subsections of structured radiologic reports to identify splenomegaly and splenic lesions. The primary objective of our study was thus to develop an NLP model to label large cohorts of patients with or without splenomegaly, allowing for frequency determination across 10 years of structured radiology reports at a tertiary cancer center. To examine the clinical relevance of our findings, we also explored the overall survival (OS) for patients with or without splenomegaly. Our long-term goal was the creation of a large, searchable database of patients with or without splenomegaly that can open up new areas of investigation, including defining normal ranges of splenic volumes across different cancer populations.

METHODS

Patient Cohort

Our research protocol received institutional review board approval with a waiver of informed patient consent.

A retrospective cohort of consecutive patients between July 1, 2009, and March 26, 2019 (Table 1), at a tertiary cancer center who underwent CT imaging was identified in our institutional database (Darwin). The cohort was restricted to patients who had undergone CT scans of the chest, abdomen, and pelvis, with clinical reports that adhered to our departmental structured reporting template (Fig 1). Reports deviating from the template were excluded.

TABLE 1.

Patient Demographics


FIG 1. — Overview of model structure. CT, computed tomography; NLP, natural language processing; TF-IDF, term frequency-inverse document frequency. Natural language processing can accurately identify splenomegaly in CT reports of patients with cancer.

To determine patient cancer type, an algorithm was developed to return the patient's most likely primary cancer as of the date on the report. The algorithm looked back one year from the report date, considering any disease-relevant information found on several data sources (orders, clinical notes, and pathology reports) along with biologic sex. A rule-based system was developed to maximize match rate on historic records, specifically the most recent primary cancer recorded in the Cancer Database (ie, Tumor Registry) before the report date, if available. The algorithm included 30 potential disease labels on the basis of a clinically relevant, high-level classification of tumors according to the International Classification of Diseases for Oncology (ie, site-level labels for solid tumors such as colorectal and hepatobiliary tumors, and high-level disease labels for hematologic malignancies such as leukemia and Hodgkin lymphoma [HL]).¹² Basal cell and squamous cell carcinomas of the skin were excluded from this algorithm.

Data Curation

The Spleen and Impression subsections of the report were extracted for manual curation. Two curators (S.S., a radiology clinical fellow training in oncologic imaging, and J.C., a research assistant) were trained to evaluate these reports for the presence or absence of splenomegaly. Training details are included in Table 2. Before the reports were selected for manual curation, a rule-based system was developed to automatically label reports where the Spleen section contained only the default text of unremarkable, or the text unchanged or stable.
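The automatic labeling rule described above can be sketched as a simple text check. This is a minimal illustration, not the authors' implementation: the function name, the normalization steps, and the example strings are assumptions, and only the three default phrases come from the text. The label actually assigned to qualifying reports follows the rules in Table 2, so the sketch only decides whether a report qualifies.

```python
import re

# Default phrases named in the text; exact matching rules are an assumption.
DEFAULT_SPLEEN_TEXTS = {"unremarkable", "unchanged", "stable"}


def qualifies_for_auto_label(spleen_text: str) -> bool:
    """Return True when the Spleen section contains only a default phrase."""
    # Normalize case, surrounding whitespace, and trailing periods (assumed).
    normalized = re.sub(r"[\s.]+$", "", spleen_text.strip().lower())
    return normalized in DEFAULT_SPLEEN_TEXTS


# Hypothetical Spleen-section strings, for illustration only.
examples = ["Unremarkable.", "Stable", "Enlarged, measuring 15 cm."]
for text in examples:
    print(repr(text), qualifies_for_auto_label(text))
```

Reports for which the check returns False would go on to manual curation or model prediction.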

TABLE 2.

Guidelines for Annotating Reports and Automatic Labeling Rules for Splenomegaly, Splenic Metastases, and Indeterminate Splenic Lesions


For the reports that did not qualify for automatic labeling, the two curators evaluated a random sample of 2,022 reports (curation set). A separate validation set of 400 reports is described below. The reports were selected from patients among five cancer subtypes where splenomegaly is known to frequently occur: CRC, hepatobiliary cancer (HB), leukemia, HL, and non-Hodgkin lymphoma (NHL). Inter-rater agreement was calculated as the percentage of ratings that were concordant between the two readers, out of a total sample size of 95 cases, with corresponding 95% binomial exact CIs.
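An exact (Clopper-Pearson) binomial CI for a percentage agreement can be computed without external libraries, as in the sketch below. The function names are hypothetical, and the count of 94 concordant ratings out of 95 is inferred from the 98.9% agreement reported in the Results rather than stated in the text.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_binomial_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes out of n trials."""

    def solve(g, target):
        # Bisection for g(p) = target, where g is increasing in p on [0, 1].
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if g(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha / 2.
    lower = 0.0 if k == 0 else solve(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2.
    upper = 1.0 if k == n else solve(lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


# 94 of 95 concordant ratings is an inferred count, not a reported one.
lower, upper = exact_binomial_ci(94, 95)
print(f"agreement = {94 / 95:.1%}, 95% CI = {lower:.1%} to {upper:.1%}")
```

Under that assumption, the interval should land close to the 94.3 to 100 range reported for inter-rater agreement.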

Model Development and Statistical Analysis

From the 2,022 manually curated reports (curation set), 80% were used for training and 20% were used to test the accuracy of various NLP models. Reports were tokenized in two different ways to compare their performance: first as a bag of words (BoW) count and then as a term frequency-inverse document frequency (TF-IDF) count. The BoW approach builds a dictionary of all the words found in the available corpus and tallies the frequency of each word. The TF-IDF approach adds a TF-IDF weight to approximate how important each word is to the documents in which it appears within the corpus. A TF-IDF weight is typically the product of two terms: the first is the normalized TF, the number of times a word appears in a document divided by the number of words in that document, and the second is the IDF, calculated as the logarithm of the total number of documents in the corpus divided by the number of documents in which the word appears. Compared with the simpler BoW, TF-IDF can differentiate between a word appearing in one document many times and a word appearing in many documents once.¹³ For the purposes of this study, a single radiology report constituted a document, and a collection of radiology reports constituted the corpus. Cleaned and tokenized data were then passed into several classifiers to determine which produced the best results (Fig 1). Multiple iterations of logistic regression and random forest classification models were tested. For each model, approximate levels of separation between classes were analyzed using principal component analysis, a common technique for reducing dimensionality in wide data sets to look for visible separation among data points, which would indicate a linearly separable class structure. All models were written and compiled in Python 3.7 using Jupyter notebooks. Accuracy, precision (positive predictive value), recall (sensitivity), and F1 score were assessed for the training and test sets.
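The BoW and TF-IDF definitions given above can be illustrated with a few lines of standard-library Python. This is a sketch of the textbook formulas as described in this section, not the study's code; the whitespace tokenizer, function names, and three-report toy corpus are assumptions. Library implementations often use smoothed or rescaled variants of the IDF term, so their weights may differ from these.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def bag_of_words(text):
    """BoW: tally how often each word occurs in one document."""
    return Counter(tokenize(text))


def tf_idf(corpus):
    """Return one {word: weight} dict per document in the corpus."""
    docs = [tokenize(text) for text in corpus]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        weights.append({
            # TF = count / document length; IDF = log(N / document frequency).
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Toy corpus of hypothetical Spleen-section texts, for illustration only.
corpus = [
    "spleen is enlarged",
    "spleen is unremarkable",
    "spleen is enlarged with splenomegaly",
]
for text, weight in zip(corpus, tf_idf(corpus)):
    print(text, "->", {w: round(v, 3) for w, v in weight.items()})
```

In this toy corpus, "spleen" appears in every document and receives a weight of zero, whereas "unremarkable" appears in only one document and is weighted most heavily there.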

The best-performing NLP model was used to label the remaining 77,756 unannotated reports for splenomegaly (NLP prediction set). Accuracy, precision, recall, and F1 score of the prediction set were assessed by selecting a random sample of 400 reports for manual validation (validation set). The frequency of splenomegaly was calculated in all reports and was also calculated for all patients who had a single disease label throughout all their radiology reports. Patients with more than one disease label (ie, cancer subtype) assigned across their reports were excluded. For example, if a patient was identified with both prostate and lymphoma as disease labels, they were excluded from the frequency calculation at the patient level.
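The patient-level exclusion rule can be expressed as a short grouping step. The sketch below assumes each report is available as a (patient ID, disease label) pair; the function name and example records are hypothetical.

```python
from collections import defaultdict


def patients_with_single_label(reports):
    """Keep only patients whose reports all carry the same disease label.

    reports: iterable of (patient_id, disease_label) pairs, one per report.
    Returns a {patient_id: disease_label} dict for the retained patients.
    """
    labels_by_patient = defaultdict(set)
    for patient_id, disease_label in reports:
        labels_by_patient[patient_id].add(disease_label)
    return {
        patient_id: next(iter(labels))
        for patient_id, labels in labels_by_patient.items()
        if len(labels) == 1
    }


# Hypothetical records, for illustration only.
reports = [
    ("P1", "colorectal"),
    ("P1", "colorectal"),
    ("P2", "prostate"),
    ("P2", "lymphoma"),  # second label, so P2 is excluded
    ("P3", "leukemia"),
]
print(patients_with_single_label(reports))
```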

To explore the clinical relevance of the NLP prediction model, we compared the OS of patients labeled for splenomegaly as follows: (1) patients where all reports were positive for splenomegaly, (2) patients where all reports were negative for splenomegaly, and (3) all other patients, where labels for splenomegaly varied between positive and negative. Time to event (death, loss to follow-up, and end of administrative follow-up) analysis was conducted using Kaplan-Meier survival curves in R 3.5.2 (R Core Team 2017). Follow-up started from the time of initial CT scan on or after July 1, 2009, and survival time was measured from this time point. To calculate hazard ratios (HRs), the Cox proportional hazards model was used with time to event as the dependent variable and status of splenic involvement as the stratifying variable. The HRs were compared between strata using the likelihood ratio test. The type I error rate was set to α = .05. Statistical analysis was performed by a biostatistician (**).
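The three-way grouping used for the survival comparison can be sketched as follows. The group names follow the Fig 3 legend (Yes, No, Other); the function name, the boolean encoding of report labels, and the example patients are assumptions.

```python
def survival_group(report_labels):
    """Assign a patient to a survival stratum from per-report labels.

    report_labels: list of booleans, True when a report was labeled
    positive for splenomegaly.
    """
    if not report_labels:
        raise ValueError("patient has no labeled reports")
    if all(report_labels):
        return "Yes"    # every report positive
    if not any(report_labels):
        return "No"     # every report negative
    return "Other"      # labels varied between positive and negative


# Hypothetical patients, for illustration only.
patients = {
    "P1": [True, True],
    "P2": [False, False, False],
    "P3": [False, True],
}
groups = {pid: survival_group(labels) for pid, labels in patients.items()}
print(groups)
```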

RESULTS

Cohort

The initial database search yielded 400,129 CT reports; 12,770 reports were excluded for deviating from the standardized reporting template. The final cohort included 387,359 reports from 91,665 patients (mean [standard deviation] age 60.8 [14.8] years; 46,939 [51.2%] females; Table 1). The most common cancers represented in the database were CRC, ovarian, urinary, lung, soft tissue (eg, sarcomas), breast, and prostate (Table 3). The mean number of reports per patient was 4.23, the median was 2, and the interquartile range was 5.

TABLE 3.

Frequency of Splenomegaly


The number of reports that qualified for automatic labeling was 307,581. Of the 79,778 remaining reports, 2,022 randomly selected reports from patients with CRC, HB, HL, NHL, and leukemia were used for NLP model training and testing. The remaining 77,756 reports from all patients with cancer were reserved for predictive labeling using the best-performing NLP model as well as validation of the best-performing NLP model in 400 randomly selected reports.
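The report counts given in this article can be cross-checked with simple arithmetic. The sketch below uses only figures reported in the text; the variable names are illustrative, and truncating the 80% share to a whole number of reports is an assumption chosen to match the 1,617/405 split stated in the abstract.

```python
initial_reports = 400_129      # initial database search
excluded_reports = 12_770      # deviated from the reporting template
final_cohort = initial_reports - excluded_reports

auto_labeled = 307_581         # default Spleen text, labeled by rule
remaining = final_cohort - auto_labeled

curated = 2_022                # manually annotated curation set
prediction_set = remaining - curated

train = int(curated * 0.8)     # 80% for training (truncated, an assumption)
test = curated - train         # 20% for testing

print(final_cohort, remaining, prediction_set, train, test)
print(f"auto-labeled share: {auto_labeled / final_cohort:.1%}")
print(f"remaining share:    {remaining / final_cohort:.1%}")
```

The shares printed at the end correspond to the split between automatically labeled reports and all other reports.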

NLP Model Performance

Testing set.

Across the 2,022 manually curated reports, the frequency of splenomegaly was 41.9% (847 of 2,022). Inter-rater agreement between the two curators was 98.9% (95% CI, 94.3 to 100).

The best-performing NLP model passed TF-IDF tokenization through a logistic regression classifier.¹⁴ In the testing set, the model achieved high accuracy (92.1%), precision (92.2%), recall (92.1%), and F1 score (92.1%). Figure 2A presents the receiver operating characteristic curves, which show that the splenic indeterminate lesion classifier achieved the best performance, with the highest area under the curve of 0.99. Figure 2B presents the precision-recall curves, which allow for better visualization of model performance when taking into account the distribution differences between splenomegaly, splenic metastasis, and indeterminate splenic lesions data.

FIG 2. — Performance of logistic regression TF-IDF classifier for splenomegaly as shown by (A) ROC curves and (B) precision-recall curves. ROC, receiver operating characteristic; TF-IDF, term frequency-inverse document frequency.

Validation set.

The best-performing NLP model was applied to label 77,756 unlabeled reports. The model's performance was evaluated on a validation set of 400 reports. In this validation set, the frequency of splenomegaly was 22.5% (90 of 400). The model achieved an accuracy of 93.8%, precision of 92.9%, recall of 86.7%, and F1 score of 89.7%.
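The F1 score is the harmonic mean of precision and recall, so the validation-set value can be checked from the two reported figures. In the sketch below, the function names and the confusion-matrix counts in the second example are hypothetical; only the 92.9% precision and 86.7% recall come from the text.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)   # positive predictive value
    recall = tp / (tp + fn)      # sensitivity
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return accuracy, precision, recall, f1_score(precision, recall)


# Validation-set precision and recall as reported in the text.
print(f"F1 = {f1_score(0.929, 0.867):.1%}")

# Hypothetical counts, for illustration only (not the study's confusion matrix).
print(classification_metrics(tp=8, fp=2, fn=2, tn=8))
```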

Clinical Relevance

The entire database was labeled for the presence or absence of splenomegaly by combining the automatic label set, the curation set, and the NLP model prediction set. In patients with a single disease label, the frequency of splenomegaly per report was 1.2%-2.1% for the most common cancers, except for CRC where it was 8.5% (Table 3). The frequency of splenomegaly was highest in reports of patients with leukemia (32.5%), followed by patients with HB cancers (17.4%). At the patient level, a splenomegaly label was associated with an increased risk of mortality in the entire cohort (HR, 2.10; 95% CI, 1.98 to 2.22; P < .001) and in patients with CRC (HR, 3.68; 95% CI, 3.07 to 4.42; P < .001; Fig 3).

FIG 3. — OS for patients with colorectal cancer with or without splenomegaly. Patients were grouped by cohorts of patients without splenomegaly on any report (No), as positive for splenomegaly on every report (Yes), and all others (Other). OS, overall survival.

DISCUSSION

Our results show that NLP can be applied to a large database of nearly 400,000 radiologic reports to rapidly annotate for splenomegaly and splenic lesions in an oncologic setting. With increasingly available health information technology and structured radiology reporting, vast amounts of electronic patient medical data are now readily available to advance oncologic research.^15,16 The best-performing NLP model for the prediction of splenomegaly achieved an accuracy of more than 90%, with comparable performance to prior NLP studies done for different clinical entities such as carotid stenosis by Wu et al¹⁷ (93% accuracy) and surgical-site infections by Thirukumaran et al¹⁸ (97% precision). These prediction accuracies were calculated on unlabeled reports, which accounted for 20.6% of radiologic reports, whereas the remaining 79.4% of reports were automatically labeled. The slight increase in splenomegaly prediction accuracy in the validation set (93.8%) when compared with the testing set (92.1%) may be because of differences in cancer types between the testing and validation sets.

Although NLP has been used to assess oncologic response,¹⁹ our study is the first to use NLP to assess for splenomegaly in a large cancer patient cohort of more than 90,000 patients. The highest frequency of splenomegaly in CT reports was found in patients with leukemia (32%) and HB (17%) cancers, which is in line with reported splenomegaly in the literature of 39% because of hematologic malignancy and 18% because of hepatic disease.^4,20 Patients with CRC also showed a high rate of splenomegaly (8.5%) comparable to patients with HL (5.6%) or NHL (9.1%), possibly because of the frequent use of oxaliplatin plus fluoropyrimidine combination adjuvant chemotherapy that is associated with high rates of splenomegaly.⁶ In our exploratory analysis, the presence of splenomegaly was associated with decreased OS in our cohort of more than 10,000 patients with CRC, which is in line with prior studies.^21,22

Among the various approaches to building an NLP pipeline, we found that the use of TF-IDF data as an input to a logistic regression classifier led to the best results over other model architectures, most likely because of the more elaborate counting approach and specific consideration of individual documents provided by the TF-IDF tokenization.¹³ A test attempt was made using deep learning models, but the size of the training set limited their effectiveness. Deep learning models benefit from larger amounts of training data and are known to underperform with smaller data sets like those used in this study.²³

NLP studies in radiology usually fall into one of five categories in terms of operational use: diagnostic surveillance, cohort building for epidemiologic studies, query-based case retrieval, quality assessment of radiologic practice, and clinical support services.²⁴ The performance of our NLP prediction model for splenomegaly could be used for cohort building and automated retrieval of a large volume of cases with or without splenomegaly to investigate normal splenic volume distributions in different cancers. Our exploratory results demonstrating survival differences among a large number of patients with cancer with or without splenomegaly may spur new investigations into specific clinical questions. For example, these results may inform the use of automated splenic segmentation from CT scans in patients at risk of developing portal hypertension, such as patients with CRC.

Our study has a number of limitations, including its retrospective nature and the lack of generalizability to unstructured radiologic reports from other institutions. However, structured reporting is gaining broader acceptance in the radiology community, and our NLP model may one day be validated using external data. The choice of a subset of cancers for initial annotation and training was another limitation that may explain differences in accuracies between test and validation sets. The choice of limiting patients to CRC, HB, HL, NHL, and leukemia in the test set was deliberate, to enrich for the presence of splenomegaly, but the same choice may have negatively affected the model performance in the validation set. Future studies would benefit from training on a larger representative set of training data, preferably from multiple centers, to improve the generalizability of our results. Any study on splenomegaly is also limited by lack of a standard definition for this entity, with variability between individual radiologists in measuring technique and lack of consensus on the optimal splenic size cutoff to identify splenomegaly for different cancers.^9,10 Our study is also limited by the lack of splenic volumetric calculations for validation on the basis of the CT scans associated with the report. A splenic volume is considered the gold standard for evaluation of splenomegaly, with 314.5 cm³ often used as the upper limit of normal.^2,9,25 Automated splenic segmentation may one day be used to improve splenomegaly detection in comparison to radiologist assessment,⁹ and efforts are underway to apply an automated segmentation pipeline to our patient database. 

Our OS analysis has two further limitations. First, the time of the initial CT scan was considered a reasonable proxy for the time of initial diagnosis. Second, patients who had a CT scan before July 1, 2009, were eligible for the study only on or after July 1, 2009, and therefore their start date was taken as the date of their first CT scan on or after this date. However, the latter is unlikely to introduce any bias in HR estimates because it likely affects the starting date of follow-up in all strata equally.

In our study, we developed an NLP model to automatically label splenomegaly in patients with cancer from a large database of nearly 400,000 structured radiology reports and more than 90,000 patients. This database also opens the door for hypothesis-generating population studies and for further exploration of the clinical relevance of dynamic changes in disease labels. The tools developed in this NLP model are also generalizable and could be applied to investigate the presence of metastatic disease in different organs across cancer populations.

ACKNOWLEDGMENT

The authors thank Joanne Chin, MFA, ELS, for her editorial assistance on this paper. Ms Chin is a full-time Editor/Grant Writer at Memorial Sloan Kettering Cancer Center (MSK) and provided editorial assistance for this paper as part of her position at MSK.

PRIOR PRESENTATION

Presented at the European Society of Gastrointestinal and Abdominal Radiology 2020 Virtual Annual Meeting, May 19-22, 2020.

SUPPORT

Supported by the NIH/NCI Cancer Center Support Grant (P30 CA008748).

AUTHOR CONTRIBUTIONS

Conception and design: Simon Sun, Natalie Gangai, Farhana Zulkernine, Varadan Sevilimedu, Amber Simpson, Richard K. G. Do

Financial support: Amber Simpson

Administrative support: Huy Nguyen, Natalie Gangai

Provision of study materials or patients: Natalie Gangai

Collection and assembly of data: Simon Sun, Huy Nguyen, Lior Gazit, Natalie Gangai, Jessica Cho, Kevin Nicholas, Richard K. G. Do

Data analysis and interpretation: Simon Sun, Kaelan Lupton, Karen Batch, Huy Nguyen, Lior Gazit, Farhana Zulkernine, Varadan Sevilimedu, Amber Simpson, Richard K. G. Do

Manuscript writing: All authors

Final approval of manuscript: All authors

Accountable for all aspects of the work: All authors

AUTHORS' DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST

The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated unless otherwise noted. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO's conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/cci/author-center.

Open Payments is a public database containing information reported by companies about payments made to US-licensed physicians (Open Payments).

Huy Nguyen

Employment: Caremark, LLC

Lior Gazit

Stock and Other Ownership Interests: Within Health

Consulting or Advisory Role: Within Health

Richard K. G. Do

Honoraria: ALK (I), Genentech (I)

Consulting or Advisory Role: DBV Technologies (I), Bayer Healthcare, GE Healthcare

Patents, Royalties, Other Intellectual Property: UpToDate chapters on Food Allergy (I)

No other potential conflicts of interest were reported.

REFERENCES

1. Vancauwenberghe T, Snoeckx A, Vanbeckevoort D, et al: Imaging of the spleen: What the clinician needs to know. Singapore Med J 56:133-144, 2015
2. Prassopoulos P, Daskalogiannaki M, Raissaki M, et al: Determination of normal splenic volume on computed tomography in relation to age, gender and body habitus. Eur Radiol 7:246-248, 1997
3. Cotran R, Kumar V, Robbins S: Diseases of white cells, lymph nodes and spleen, in Robbins Pathologic Basis of Disease. Philadelphia, PA, Saunders, 1989, pp 703-754
4. Curovic Rotbain E, Lund Hansen D, Schaffalitzky de Muckadell O, et al: Splenomegaly—Diagnostic validity, work-up, and underlying causes. PLoS One 12:e0186674, 2017
5. Reinert CP, Kloth C, Fritz J, et al: Discriminatory CT-textural features in splenic infiltration of lymphoma versus splenomegaly in liver cirrhosis versus normal spleens in controls and evaluation of their role for longitudinal lymphoma monitoring. Eur J Radiol 104:129-135, 2018
6. Kim MJ, Han SW, Lee DW, et al: Splenomegaly and its associations with genetic polymorphisms and treatment outcome in colorectal cancer patients treated with adjuvant FOLFOX. Cancer Res Treat 48:990-997, 2016
7. Accurso V, Santoro M, Raso S, et al: Splenomegaly impacts prognosis in essential thrombocythemia and polycythemia vera: A single center study. Hematol Rep 11:8281, 2019
8. Cheson BD, Fisher RI, Barrington SF, et al: Recommendations for initial evaluation, staging, and response assessment of Hodgkin and non-Hodgkin lymphoma: The Lugano classification. J Clin Oncol 32:3059-3068, 2014
9. Linguraru MG, Sandberg JK, Jones EC, et al: Assessing splenomegaly: Automated volumetric analysis of the spleen. Acad Radiol 20:675-684, 2013
10. Nuffer Z, Marini T, Rupasov A, et al: The best single measurement for assessing splenomegaly in patients with cirrhotic liver morphology. Acad Radiol 24:1510-1516, 2017
11. Tian Z, Sun S, Eguale T, et al: Automated extraction of VTE events from narrative radiology reports in electronic health records: A validation study. Med Care 55:e73-e80, 2017
12. Fritz A, Percy C, Jack A, et al: International Classification for Diseases in Oncology (ed 3rd, 1st rev). Geneva, Switzerland, World Health Organization (WHO), 2013
13. Jones KS: A statistical interpretation of term specificity and its application in retrieval. J Doc 60:493-502, 1988
14. GitHub, Inc: Splenomegaly-Detection. https://github.com/karenbatch19/Splenomegaly-Detection
15. Yim WW, Yetisgen M, Harris WP, et al: Natural language processing in oncology: A review. JAMA Oncol 2:797-804, 2016
16. Sorin V, Barash Y, Konen E, et al: Deep learning for natural language processing in radiology-fundamentals and a systematic review. J Am Coll Radiol 17:639-648, 2020
17. Wu X, Zhao Y, Radev D, et al: Identification of patients with carotid stenosis using natural language processing. Eur Radiol 30:4125-4133, 2020
18. Thirukumaran CP, Zaman A, Rubery PT, et al: Natural language processing for the identification of surgical site infections in orthopaedics. J Bone Joint Surg Am 101:2167-2174, 2019
19. Chen PH, Zafar H, Galperin-Aizenberg M, et al: Integrating natural language processing and machine learning algorithms to categorize oncologic response in radiology reports. J Digit Imaging 31:178-184, 2018
20. Saboo SS, Krajewski KM, O'Regan KN, et al: Spleen in haematological malignancies: Spectrum of imaging findings. Br J Radiol 85:81-92, 2012
21. Katayama M, Nakano H, Kishi S, et al: A splenic volume increase due to preoperative chemotherapy may impair the long-term outcome after hepatectomy in patients with initially non-optimally resectable colorectal cancer liver metastases. Hepatogastroenterology 60:1420-1425, 2013
22. Simpson AL, Leal JN, Pugalenthi A, et al: Chemotherapy-induced splenic volume increase is independently associated with major complications after hepatic resection for metastatic colorectal cancer. J Am Coll Surg 220:271-280, 2015
23. Haenssle HA, Fink C, Schneiderbauer R, et al: Man against machine: Diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Ann Oncol 29:1836-1842, 2018
24. Pons E, Braun LM, Hunink MG, et al: Natural language processing in radiology: A systematic review. Radiology 279:329-343, 2016
25. Bezerra AS, D'Ippolito G, Faintuch S, et al: Determination of splenomegaly by CT: Is there a place for a single measurement? AJR Am J Roentgenol 184:1510-1513, 2005


Simon Sun, MD

Kaelan Lupton, BCmpH

Karen Batch, BCmpH

Huy Nguyen, MSc

Lior Gazit, MSc

Natalie Gangai, MPH

Jessica Cho, BA

Kevin Nicholas, MPA

Farhana Zulkernine, PhD

Varadan Sevilimedu, MBBS, DrPH

Amber Simpson, PhD

Richard K G Do, MD, PhD
