Abstract
Background & aims:
Diagnostic codes lack the precision to identify specific complications of diverticulitis, limiting their utility in large-scale, real-world data. We developed a natural language processing (NLP) algorithm to classify diverticulitis and associated features using computed tomography (CT) reports.
Methods:
Using data from Mass General Brigham Research Patient Data Registry (1979–2024), we identified patients with a diagnosis code for diverticular disease (ICD-9: 562; ICD-10: K57) and a prior abdominopelvic CT report. We developed and validated our NLP algorithm to detect diverticulitis and associated features. We subsequently investigated the associations between NLP-defined severity at first diagnosis (i.e., uncomplicated, mild, severe, or chronic complications) and risk of severe diverticulitis recurrence using a Cox proportional hazards regression model. We assessed the predictive value of NLP-detected features using random forest models.
Results:
The NLP algorithm achieved positive and negative predictive values of 85.8% to 99.8%, outperforming both ICD codes and a generalist large language model. Among 16,349 patients with NLP-detected diverticulitis, 3,192 developed severe recurrence over 76,736 person-years. Compared to uncomplicated diverticulitis, the multivariable-adjusted hazard ratio (HR) for severe recurrence was 1.39 (95% confidence interval [CI]: 1.14–1.69) for mild complications, 3.02 (95% CI: 2.80–3.27) for severe complications, and 5.41 (95% CI: 4.78–6.13) for chronic complications. NLP-detected features significantly improved the prediction of severe diverticulitis recurrence compared to codified variables.
Conclusion:
Our NLP algorithm accurately classifies diverticulitis features, facilitating the construction of large and high-quality EHR-based cohorts. Severity at initial diagnosis predicts risk of severe recurrence, supporting the use of artificial intelligence for risk stratification and long-term management.
Keywords: Diverticulitis, Natural Language Processing, Artificial Intelligence, Diagnostics
Introduction
Diverticular disease is a common gastrointestinal condition in Western populations, affecting over 2 million patients in the United States alone.1 There has been an increase in emergency admissions, hospitalization costs, and mortality due to complications (e.g., diverticulitis, abscess, perforation) from diverticular disease,2, 3 with more recent data showing a notable rise in early-onset cases diagnosed prior to the age of 50 years.4
Among patients who experience an initial episode of diverticulitis, at least 20% will have one or more recurrences within 10 years, and 12% will develop severe complications that require surgical intervention, such as colectomy.5–7 Yet, predicting which patients are at higher risk for these unfavorable outcomes after the first diagnosis remains largely uncertain. Computed tomography (CT) is the most reliable tool for confirming a diagnosis of diverticulitis and its relevant complications.5, 8 Previous studies have found that the presence of complications and the severity of inflammation are associated with recurrent diverticulitis.9 However, the labor-intensive process of medical records review and data abstraction poses a challenge for applying these findings to large-scale validation. While International Classification of Diseases (ICD) codes are effective in identifying diverticular disease in general, they are not capable of distinguishing between disease subtypes (i.e., complicated vs. uncomplicated) or specifying the type of complications.10
With the advances in modern artificial intelligence techniques in healthcare, natural language processing (NLP) has significantly facilitated the high-throughput extraction of useful information embedded within narrative, unstructured text reports and has been successfully applied to disease phenotyping, but how these specialized workflows compare to generalist models (i.e., large language models [LLMs]) not specifically geared towards medical applications is unknown.11–14 In this study, we aimed to develop an NLP-based algorithm to identify diverticulitis subtypes, including specific complications, using information from CT reports of the abdomen and pelvis. We defined the severity of diverticulitis using NLP-detected features and assessed its association with the risk of future recurrence requiring hospitalization.
Methods
Study Population
The study population includes patients from the Mass General Brigham (MGB) system, which is an integrated healthcare system in the Greater Boston area that serves 1.5 million patients annually. The data were collected from the Research Patient Data Registry (RPDR), a centralized clinical data registry that contains electronic health records for the MGB system. Patients with a diagnosis code for any type of diverticular disease (ICD codes: 562 for ICD-9, K57 for ICD-10) and a radiology report mentioning terms related to diverticular disease of the intestine (diverticulitis, diverticulosis, diverticular, diverticular disease, diverticulum, diverticula) from January 1979 to June 2024 were selected. This study was approved by the MGB Institutional Review Board.
NLP Algorithm
We used information from CT reports of the abdomen and pelvis, which are the gold standard for confirming a diagnosis of diverticulitis and its complications. The NLP algorithm was developed using the NLP packages spaCy15 and NegEx/negspaCy16, 17 in Python version 3.9.18. The pretrained en_core_sci_lg model for biomedical texts from the Allen AI ScispaCy package was used. The concepts in the algorithm consisted of diverticulitis and relevant features. The detected features included wall thickening, fat stranding, inflammatory change, microperforation, abscess, perforation, free air, peritonitis, fistula, large bowel obstruction, phlegmon, and stricture, which were later grouped based on severity: mild complications (microperforation, phlegmon), severe complications (abscess, perforation, free air, peritonitis), chronic complications (fistula, large bowel obstruction, stricture), and uncomplicated features (wall thickening, fat stranding, and inflammatory change). Each concept was defined by a positive mention of the feature term in the predefined gastrointestinal locations (Supplemental Table 1). The algorithm first parsed the findings and impressions or conclusions sections of each CT report through a previously published method18 and split these sections into sentences. It then mapped synonymous terms to a single standardized term for each concept and checked whether any mention was negated. If a concept was mentioned multiple times, it was defined as present if there was at least one positive mention. Finally, the algorithm returned a Boolean variable indicating whether the concept was present in the report (Supplemental Figure 1). Our algorithm is publicly available (https://github.com/lilowuyilun/NLP-diverticulitis.git).
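The sentence-level logic described above can be illustrated with a minimal, standard-library sketch. It is not the published algorithm: the study uses spaCy, ScispaCy, and negspaCy, whereas this sketch substitutes regular expressions, and the synonym map, negation cues, and scope terminators are hypothetical placeholders rather than the terms in Supplemental Table 1. The severity precedence (chronic over severe over mild) is an assumption inferred from the overlap of features across groups in Table 4.

```python
import re

# Illustrative synonym map (surface form -> canonical concept). The study's
# actual term lists are in its Supplemental Table 1 and are not reproduced here.
SYNONYMS = {
    "abscess": "abscess",
    "fistula": "fistula",
    "fistulous tract": "fistula",
    "free air": "free air",
    "pneumoperitoneum": "free air",
    "phlegmon": "phlegmon",
    "wall thickening": "wall thickening",
}
# Hypothetical negation cues and scope terminators, in the spirit of NegEx.
NEGATION_CUES = ["no evidence of", "without", "no"]
TERMINATORS = ["but", "however"]

# Checked from most to least severe (ordering is an assumption, see lead-in).
SEVERITY_GROUPS = [
    ("chronic", {"fistula", "large bowel obstruction", "stricture"}),
    ("severe", {"abscess", "perforation", "free air", "peritonitis"}),
    ("mild", {"microperforation", "phlegmon"}),
]


def split_sentences(text):
    """Naive sentence splitter standing in for spaCy's sentence segmentation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_negated(sentence, start):
    """True if a negation cue precedes the mention with no terminator between."""
    prefix = sentence[:start]
    for term in TERMINATORS:
        hits = list(re.finditer(rf"\b{term}\b", prefix))
        if hits:
            prefix = prefix[hits[-1].end():]  # keep only text after the terminator
    return any(re.search(rf"\b{cue}\b", prefix) for cue in NEGATION_CUES)


def detect_concepts(report):
    """Return {concept: bool}; present if at least one non-negated mention exists."""
    found = {concept: False for concept in set(SYNONYMS.values())}
    for sentence in split_sentences(report.lower()):
        for surface, concept in SYNONYMS.items():
            for m in re.finditer(rf"\b{re.escape(surface)}\b", sentence):
                if not is_negated(sentence, m.start()):
                    found[concept] = True
    return found


def severity_group(found):
    """Map detected concepts to a single severity label."""
    present = {concept for concept, flag in found.items() if flag}
    for label, concepts in SEVERITY_GROUPS:
        if present & concepts:
            return label
    return "uncomplicated"


report = ("Findings: Sigmoid wall thickening with a small phlegmon. "
          "No abscess or free air. Impression: No evidence of fistula.")
found = detect_concepts(report)
print(severity_group(found))
```

A simple cue-before-mention rule of this kind also negates the wall thickening in a sentence such as "No significant change in a short segment of wall thickening", which is the type of false negative reported in the Results.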
Structured Validation
Blinded internal validation was performed through a double-blinded manual review. The programmer (YW) generated results from the NLP algorithm and randomly selected 1,220 reports that were subsequently manually reviewed by four investigators (JD, DS, WM, PC), who were blinded to the NLP results. The reports were selected using stratified sampling to ensure a sufficient number of positive cases for concepts with low prevalence.19 We also included reports of negative diverticulitis cases for the purpose of specificity evaluation. Each validator received an additional overlapping set of 20 reports, selected from the remaining reports for consistency among validators. All validators independently worked on the overlapping reports, and disagreements were resolved through thorough discussions until a consensus was reached and an inter-rater agreement of 100% was achieved. Each validator then reviewed their reports according to the consensus from the discussion.
Large Language Model Comparison
As a secondary analysis, we compared the performance between our NLP algorithm and Llama 3.1–8B, an open-source large language model (LLM) from Meta AI.20 We selected this model based on its satisfactory performance in disease detection within clinical notes and radiology reports13, 21 and its compatibility with our computational resources. We adapted our concept definition table into an LLM prompt for extraction (Supplemental Table 2). For each concept, we provided a firm definition, instructions to identify relevant mentions, and specific examples of inclusion/exclusion phrases. The LLM was then prompted to return a formatted output after extracting all concepts.
Statistical Analysis
We evaluated the performance of the NLP algorithm against the gold standard defined by the validators, using sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for each concept. Quadratic weighted kappa was used to assess agreement across severity groups. We also compared the performance of the NLP algorithm to ICD diagnosis codes.
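For reference, the four per-concept metrics and the quadratic weighted kappa can be computed as in the following sketch. The function names and the toy counts are illustrative assumptions and are not taken from the study data; severity groups are assumed to be encoded as ordered integers (0 = uncomplicated through 3 = chronic).

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def quadratic_weighted_kappa(rater_a, rater_b, k):
    """Agreement between two ordinal ratings coded 0..k-1.

    Disagreements are penalized by the squared distance between categories.
    """
    n = len(rater_a)
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            num += weight * observed[i][j]          # observed disagreement
            den += weight * row[i] * col[j] / n     # disagreement expected by chance
    return 1.0 - num / den


# Toy counts, for illustration only.
print(diagnostic_metrics(tp=90, fp=5, fn=10, tn=95))
# Toy severity labels from the algorithm and from a reviewer.
print(quadratic_weighted_kappa([0, 1, 2, 3, 3], [0, 1, 2, 3, 2], k=4))
```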
We built the cohort by applying the NLP algorithm to patients with an abdominopelvic CT report mentioning diverticular disease. Among 41,434 patients with ICD codes for diverticular disease and radiology reports, we excluded patients who did not have a primary or admitting ICD code for diverticulitis (N=7,474), those who did not have an NLP-based positive abdominopelvic CT report of diverticulitis within +/− 30 days of their ICD diagnosis (N=15,523), those who had a missing end-of-follow-up time, died, or were last encountered within +/− 30 days of the first diagnosis (N=1,784), and those who had a concurrent diagnosis of inflammatory bowel disease within +/− 30 days of the first diagnosis (N=304), leaving 16,349 patients with a first diagnosis of diverticulitis and a matched CT report in the analysis (Supplemental Figure 1).
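The exclusion counts above can be checked with a short calculation. The counts are those reported in the text; the dictionary labels are shortened paraphrases, and applying the exclusions sequentially in the listed order is an assumption.

```python
initial = 41_434  # patients with ICD codes for diverticular disease and radiology reports

# Exclusion counts as reported in the text, assumed to be applied in this order.
exclusions = {
    "no primary or admitting ICD code for diverticulitis": 7_474,
    "no NLP-positive CT within 30 days of ICD diagnosis": 15_523,
    "missing follow-up, death, or last encounter within 30 days": 1_784,
    "concurrent inflammatory bowel disease within 30 days": 304,
}

remaining = initial
for reason, n_excluded in exclusions.items():
    remaining -= n_excluded
    print(f"{reason}: -{n_excluded:,} -> {remaining:,}")

print(f"Analytic cohort: {remaining:,}")
```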
The primary endpoint was severe diverticulitis recurrence, defined as any inpatient encounter with primary or admitting diagnosis of diverticulitis that happened more than 30 days after the initial episode. We assessed the validity of identifying severe diverticulitis recurrence using this approach by reviewing the medical records of a randomly selected group of 90 patients and confirmed a 94.0% accuracy rate.22 Our secondary outcome was severe diverticulitis recurrence between 31 days and two years after the first diagnosis.
All patients were followed from day 31 after the first diagnosis until the date of first severe recurrence, death, or last encounter, whichever happened earliest. We used Kaplan-Meier curves to describe the cumulative incidence of severe diverticulitis recurrence according to severity groups at first diagnosis based on NLP-derived features. We investigated the associations between severity groups and the risk of severe diverticulitis recurrence using a Cox proportional hazards regression model, adjusting for the following covariates at the time of first diagnosis: age, sex, race, obesity, tobacco use disorder, alcohol use disorder, and Charlson Comorbidity Index (CCI). We used Schoenfeld residuals to check the proportional hazards assumption. Tobacco use disorder and alcohol use disorder were identified using machine learning algorithms based on ICD codes and NLP terms (positive predictive value [PPV]: 0.91–1.00). Obesity was defined by a mean body mass index (BMI) greater than or equal to 30 kg/m² or by the machine learning computed phenotype if BMI data were missing. CCI was defined as the weighted sum of 19 medical condition categories (e.g., a weight of 1 for congestive heart failure and a weight of 6 for metastatic solid tumor) using ICD-9 and ICD-10 codes. The score we used implemented a newer claims-based scheme than the original version.23 All comorbidities were assessed prior to the first diagnosis of diverticulitis.
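The follow-up definition and the Kaplan-Meier estimate can be sketched as follows. This is a minimal illustration under stated assumptions, not the study code: days are counted from the first diagnosis (day 0), death and last encounter are treated as censoring, a recurrence on the same day as another end point counts as an event, and the patient tuples are toy values.

```python
def follow_up(recurrence_day, death_day, last_encounter_day, start_day=31):
    """Return (days at risk, event flag) for one patient.

    Days are counted from the first diagnosis; None means "did not occur".
    Follow-up ends at the earliest of recurrence, death, or last encounter.
    """
    ends = [d for d in (recurrence_day, death_day, last_encounter_day) if d is not None]
    end = min(ends)
    event = int(recurrence_day is not None and recurrence_day == end)
    return end - start_day, event


def cumulative_incidence(durations, events):
    """Kaplan-Meier estimate, returned as [(time, 1 - S(time))] at event times."""
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        while i < len(data) and data[i][0] == t:  # group patients tied at time t
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, 1 - survival))
        at_risk -= n_leaving  # events and censored patients both leave the risk set
    return curve


# Toy patients: (recurrence day, death day, last encounter day).
patients = [(100, None, 400), (None, 200, 500), (300, None, 900), (None, None, 700)]
durations, events = zip(*(follow_up(*p) for p in patients))
print(cumulative_incidence(durations, events))
```

In the study, such curves are stratified by severity group; the sketch omits stratification and the Cox regression itself.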
We further examined the impact of surgery or antibiotic use at the first diagnosis on severe diverticulitis recurrence, within subgroups based on severity. Surgery was defined as colectomy or sigmoidectomy performed within +/− 30 days of the first diagnosis from procedure data. Antibiotic use was defined as any use of listed antibiotics within +/− 30 days of the first diagnosis from medication data (Supplemental Table 3).
We used random forest classifiers (random seed: 12629) to predict severe diverticulitis recurrence based on NLP-detected features and covariates. Our key hyperparameters included the number of trees (n_tree=500) and number of features at each split (mtry=3). The classifier was generated using a 60/20/20 random split of training/validation/test sets. The variable importance was scaled from the raw importance measured by the total decrease in node impurity via Gini importance. We performed a DeLong test to compare the Receiver Operating Characteristic (ROC) curves among the model with covariates only (age, sex, race, obesity, tobacco use disorder, alcohol use disorder, and CCI), the model with covariates and codified severity, defined as ICD-derived severity (complicated vs. uncomplicated) and prior admission (inpatient diagnosis at the first episode of diverticulitis), and the model further adding NLP-detected features.
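Two steps of this workflow, the 60/20/20 random split and the area under the ROC curve, can be sketched with the standard library alone. The function names are hypothetical, the classifier is omitted, and reusing the reported seed with Python's random number generator will not reproduce the authors' actual split, which depends on their software.

```python
import random


def split_indices(n, seed=12629, fractions=(0.6, 0.2, 0.2)):
    """Shuffle indices 0..n-1 and cut them into training/validation/test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative case.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


train, val, test = split_indices(100)
print(len(train), len(val), len(test))

# Toy labels (1 = severe recurrence) and predicted probabilities.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise definition is quadratic in the number of patients; it is used here for clarity rather than efficiency.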
Results
Sensitivity and specificity of our NLP algorithm ranged from 82.8% to 99.9%, with PPV and NPV ranging from 85.8% to 99.8% for diverticulitis and its relevant features including wall thickening, fat stranding, inflammatory change, and complications of varying severity (Table 1). The quadratic weighted kappa between the NLP-derived severity group and that determined by human validators was 0.94. The NLP algorithm significantly outperformed ICD codes in classifying complicated diverticulitis across varying severity groups, showing a marked increase in sensitivity and NPV (Table 2). The NLP algorithm also outperformed the Llama LLM on both PPV and NPV across features (Table 3).
Table 1.
Performance of the natural language processing algorithm for identification of diverticulitis and associated features from blinded internal validation¹
| Concept | Positive cases | Negative cases | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) |
|---|---|---|---|---|---|---|
| Diverticulitis | 1146 | 74 | 99.1 | 86.5 | 99.1 | 86.5 |
| Wall thickening | 846 | 374 | 96.8 | 91.8 | 96.3 | 92.8 |
| Fat stranding | 783 | 437 | 92.6 | 96.9 | 98.5 | 85.8 |
| Inflammatory change | 582 | 638 | 87.7 | 97.0 | 97.1 | 87.6 |
| Mild complications | ||||||
| Microperforation | 133 | 1087 | 98.5 | 99.9 | 99.2 | 99.8 |
| Phlegmon | 160 | 1060 | 98.1 | 99.8 | 98.8 | 99.7 |
| Severe complications | ||||||
| Abscess | 297 | 923 | 96.6 | 98.7 | 96.0 | 98.9 |
| Perforation | 320 | 900 | 96.5 | 98.6 | 95.9 | 98.8 |
| Free air | 164 | 1056 | 82.8 | 99.0 | 93.9 | 97.0 |
| Peritonitis | 53 | 1167 | 89.7 | 99.9 | 98.1 | 99.5 |
| Chronic complications | ||||||
| Fistula | 176 | 1044 | 98.1 | 98.3 | 89.8 | 99.7 |
| Obstruction | 45 | 1175 | 91.1 | 99.7 | 91.1 | 99.7 |
| Stricture | 50 | 1170 | 93.8 | 99.6 | 90.0 | 99.7 |
A total of 1,220 CT reports were reviewed in the blinded internal validation with four validators.
PPV: positive predictive value; NPV: negative predictive value
Table 2.
Comparison of NLP algorithm and ICD codes in distinguishing complicated vs. uncomplicated diverticulitis across severity groups¹
| Severity group | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) |
|---|---|---|---|---|
| Any complications | ||||
| ICD | 56.6 | 90.3 | 93.9 | 44.2 |
| NLP | 98.9 | 94.9 | 98.1 | 97.2 |
| Mild complications | ||||
| ICD | 29.2 | 90.3 | 59.6 | 72.3 |
| NLP | 100 | 94.9 | 90.6 | 100 |
| Severe complications | ||||
| ICD | 64.5 | 90.3 | 90.9 | 62.8 |
| NLP | 98.2 | 94.9 | 96.7 | 97.2 |
| Chronic complications | ||||
| ICD | 58.8 | 90.3 | 79.2 | 77.8 |
| NLP | 100 | 94.9 | 92.5 | 100 |
A total of 1,220 CT reports from the blinded internal validation were included. ICD-9 codes for complicated diverticulitis included diverticulitis (562.01, 562.03, 562.11, 562.13) with concurrent complications (560.9, 567.21, 567.22, 567.38, 567.9, 568.89, 569.5, 569.83, 569.81, 593.82, 596.1, 596.2, 619.1, and 619.2) within +/− 30 days of the episode. ICD-10 codes for complicated diverticulitis included diverticulitis (K57.00, K57.01, K57.12, K57.13, K57.20, K57.21, K57.32, K57.33, K57.40, K57.41, K57.52, K57.53, K57.80, K57.81, K57.92, K57.93) with concurrent complications (K56.6, K63.0, K63.1, K63.2, K65.0, K65.1, K65.9, K66.8, K68.19, N28.89, N32.1, N32.2, N82.3, N82.4, N82.5). Bleeding was not classified as a complication.
PPV: positive predictive value; NPV: negative predictive value
Table 3.
Performance of the Llama 3.1 model with 8B parameters for identification of diverticulitis and associated features from blinded internal validation¹
| Concept | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) |
|---|---|---|---|---|
| Diverticulitis | 97.7 | 77.0 | 98.5 | 68.7 |
| Wall thickening | 82.7 | 96.0 | 97.9 | 71.3 |
| Fat stranding | 75.8 | 94.1 | 96.5 | 64.3 |
| Inflammatory change | 91.6 | 64.8 | 74.4 | 87.4 |
| Mild complications | ||||
| Microperforation | 92.5 | 94.8 | 68.5 | 99.0 |
| Phlegmon | 100 | 90.8 | 62.4 | 100 |
| Severe complications | ||||
| Abscess | 93.2 | 96.1 | 88.4 | 97.8 |
| Perforation | 92.5 | 93.0 | 82.4 | 97.2 |
| Free air | 83.9 | 96.2 | 80.0 | 97.1 |
| Peritonitis | 82.8 | 99.3 | 85.7 | 99.1 |
| Chronic complications | ||||
| Fistula | 90.1 | 99.3 | 95.4 | 98.5 |
| Obstruction | 84.4 | 95.3 | 40.9 | 99.4 |
| Stricture | 93.8 | 99.3 | 84.9 | 99.7 |
A total of 1,220 CT reports were reviewed in the blinded internal validation with 4 validators. Each validator validated 305 distinct reports and 20 overlapping reports for results comparison.
PPV: positive predictive value; NPV: negative predictive value
Common errors included linguistic ambiguity, negation errors, and annotation inconsistencies. Linguistic ambiguity involved uncertainty about a feature, such as “if there is clinical concern for fistula”, which indicated the absence of the feature but was detected as positive by the NLP algorithm because it lacked an explicit negation. Negation errors were false negatives caused by phrases that contained negation words without negating the feature, such as “no significant change in a short segment of wall thickening”. Annotation inconsistencies were of two types. The first involved additional descriptors before features that were not recognized as medical term tokens in the spaCy model, such as “concentric wall thickening” and “surrounding fat strandings”, which confused feature detection. The second involved inconsistent interpretations of the same phrase, such as “cannot be excluded”, in different contexts. This uncertainty led to disagreement between the annotator, who made the decision based on the entire report, and the NLP algorithm, which extracted features from each split sentence.
Among 16,349 patients with diverticulitis included in the cohort analysis, 12,274 cases were uncomplicated, 547 had mild complications, 2,906 had severe complications, and 622 had chronic complications (Table 4). Fistula was the most common chronic complication, while abscess and perforation were the most frequent severe complications. Compared to those with uncomplicated diverticulitis, patients with any complications were less likely to be female or obese but were more likely to use antibiotics and undergo surgery. Patients with chronic complications displayed a higher proportion of tobacco or alcohol use disorder, as well as a higher CCI at baseline.
Table 4.
Baseline characteristics of patients at their first diagnosis of diverticulitis according to severity groups (N=16,349)
| Characteristic¹ | Uncomplicated (N = 12,274) | Mild complications (N = 547) | Severe complications (N = 2,906) | Chronic complications (N = 622) |
|---|---|---|---|---|
| Age at first diagnosis, years | 61.0 (14.3) | 57.1 (14.1) | 60.3 (14.2) | 65.4 (13.6) |
| Female sex | 61.1 | 52.7 | 53.8 | 51.8 |
| White race | 82.8 | 81.7 | 85.0 | 87.8 |
| Non-Hispanic | 86.0 | 87.8 | 86.8 | 87.5 |
| Body mass index, kg/m² | 29.9 (10.1) | 29.4 (6.5) | 29.9 (19.4) | 30.1 (6.6) |
| Obesity | 32.8 | 27.6 | 28.7 | 30.5 |
| Tobacco use disorder | 28.1 | 27.2 | 30.9 | 35.4 |
| Alcohol use disorder | 6.4 | 5.3 | 6.5 | 8.0 |
| Surgery² | 0.5 | 1.1 | 5.2 | 10.1 |
| Antibiotic use | 81.6 | 86.5 | 88.9 | 87.5 |
| Charlson Comorbidity Index | 3.4 (4.5) | 2.5 (3.8) | 3.0 (4.3) | 3.7 (5.2) |
| Uncomplicated features | ||||
| Wall thickening | 67.9 | 77.5 | 75.7 | 80.7 |
| Fat stranding | 69.9 | 72.4 | 65.6 | 68.6 |
| Inflammatory change | 45.4 | 53.9 | 58.4 | 59.3 |
| Mild complications | ||||
| Microperforation | 0 | 64.2 | 5.9 | 4.3 |
| Phlegmon | 0 | 39.5 | 11.9 | 12.4 |
| Severe complications | ||||
| Abscess | 0 | 0 | 55.0 | 42.8 |
| Perforation | 0 | 0 | 61.3 | 28.0 |
| Free air | 0 | 0 | 22.8 | 10.3 |
| Peritonitis | 0 | 0 | 2.2 | 1.3 |
| Chronic complications | ||||
| Fistula | 0 | 0 | 0 | 84.2 |
| Obstruction | 0 | 0 | 0 | 14.6 |
| Stricture | 0 | 0 | 0 | 3.5 |
Means (SD) are shown for continuous variables (age, BMI, and CCI) and percentages are shown for categorical variables.
Surgery was defined as whether a participant underwent colectomy or sigmoidectomy within +/− 30 days of their initial diverticulitis diagnosis.
A total of 3,192 patients developed severe diverticulitis recurrence during a total follow-up of 76,736 person-years (median: 3.3 years). NLP-derived complications were significantly associated with an increased risk of severe diverticulitis recurrence (Figure 1). Compared to uncomplicated diverticulitis, the multivariable-adjusted hazard ratio (HR) was 1.39 (95% confidence interval [CI]: 1.14–1.69) for mildly complicated diverticulitis, 3.02 (95% CI: 2.80–3.27) for diverticulitis with severe complications, and 5.41 (95% CI: 4.78–6.13) for diverticulitis with chronic complications. The results remained robust in the analysis of severe diverticulitis recurrence within two years after the first diagnosis (Supplemental Table 4).
Figure 1. Severe diverticulitis recurrence by disease severity based on NLP-derived features.

(A) Kaplan-Meier (KM) curves illustrate the cumulative incidence of recurrent diverticulitis requiring hospitalization according to severity groups categorized as uncomplicated (diverticulitis without any complications), mildly complicated (microperforation or phlegmon), severely complicated (abscess, perforation, free air, or peritonitis), and chronic complications (fistula, obstruction, or stricture). (B) Results from Cox proportional hazards models in crude model and multivariate model adjusting for age at first diagnosis, sex, race, obesity, tobacco use disorder, alcohol use disorder, and Charlson Comorbidity Index.
A random forest classifier successfully used NLP-detected features to distinguish participants who developed severe diverticulitis recurrence from those who did not (Figure 2). The model’s AUC was 0.70 when including codified severity, NLP-detected features, and covariates, and 0.63 when including codified severity and covariates, compared to 0.52 with covariates only (DeLong P<0.0001 for both models). The model with additional NLP-detected features showed a significant improvement from the model with codified severity and covariates (DeLong P = 0.0001). The most important features included prior admission, ICD-derived severity, non-white race, wall thickening, and inflammatory change.
Figure 2. Receiver operating characteristic (ROC) curves for the random forest models predicting recurrent diverticulitis requiring hospitalization with covariates only and with covariates and detected features.

(A) Area under the curve (AUC) is 0.52 for the model with covariates only, 0.63 for the model with covariates and codified severity, and 0.70 for the model with covariates, codified severity, and NLP-detected features. Covariates included age at first diagnosis, sex, race, obesity, tobacco use disorder, alcohol use disorder, and Charlson Comorbidity Index. Codified severity represented ICD-derived severity (complicated vs. uncomplicated) and prior admission (inpatient diagnosis at the first episode of diverticulitis). NLP-detected features included wall thickening, fat stranding, inflammatory change, microperforation, phlegmon, abscess, perforation, free air, peritonitis, fistula, obstruction, and stricture (severity group was not included). Further including NLP-detected features significantly improved the performance compared to the model with covariates and codified severity (DeLong P = 0.0001). (B) Importance of variables was scaled from the raw importance measured by the total decrease in node impurity via Gini importance. CCI, Charlson Comorbidity Index
Using the established longitudinal cohort, we further examined the long-term impact of surgery or antibiotic use at the time of initial diverticulitis diagnosis on future outcomes (Supplemental Table 5). In participants with uncomplicated diverticulitis, there was no significant difference in the risk of severe diverticulitis recurrence between those who underwent surgery at their first diagnosis and those who did not. However, among participants who had complicated diverticulitis, surgery was associated with a lower risk of severe recurrence (multivariable-adjusted HR: 0.78; 95% CI: 0.62–1.00). This reduction was most pronounced in those with chronic complications (HR: 0.37; 95% CI: 0.21–0.63; Supplemental Table 6). Antibiotic use was not significantly associated with a reduced risk of severe diverticulitis recurrence. Instead, there was an increased risk of recurrence among patients with complicated diverticulitis who received antibiotics at the first diagnosis (Supplemental Table 7).
Discussion
In this study, we developed and validated a novel NLP-based algorithm to accurately identify diverticulitis as well as its subtypes. Using the extracted features, we defined the severity of diverticulitis at the first diagnosis and found significant associations with future risk of developing severe diverticulitis recurrence. NLP-detected features significantly improved prediction performance in differentiating participants with severe diverticulitis recurrence from those without recurrence. We further found that surgery was associated with a lower risk of recurrence requiring hospitalization among participants with complicated diverticulitis at their first diagnosis.
Natural language processing has significantly enhanced clinical decision-making and documentation in gastroenterology by extracting information from clinical notes and improving endoscopic quality metrics.24 It has also enabled automated analyses of free-text reports, facilitating identification of disease patterns and disease phenotyping.25 For instance, NLP has outperformed manual chart reviews in flagging patients at risk during colorectal cancer screenings.26 One NLP pipeline has shown high accuracy in detecting extraintestinal manifestations of inflammatory bowel disease.27 A prior study explored NLP in diverticular disease, showing good PPV but did not differentiate between diverticulosis and diverticulitis or assess disease severity.28 Another study applied a rule-based algorithm to determine the severity of diverticular disease with procedure codes but it still lacked detailed characterization of specific complications.29 Our study introduces a novel NLP algorithm with strong performance in identifying diverticulitis, its complications, and severity. Due to the complexity of concepts, variety of complications, and dispersed information in CT reports, our tool offers an efficient solution to support clinicians in assessing diverticulitis severity and guiding treatment decisions.
Consistent with our results, previous studies have found that complicated diverticulitis, in particular abscess and perforation, was associated with an increased risk of recurrence, emergency surgery, or mortality compared to uncomplicated diverticulitis.30–33 The current study supports these findings and also reveals a role for chronic complications, such as fistula, in predicting the risk of severe recurrence. This highlights the importance of considering diverticulitis severity and chronic complications when screening high-risk patients at their initial visit. Early identification and targeted management have the potential to help reduce recurrence rates and improve long-term outcomes.
Among patients with complicated diverticulitis, we observed a reduced risk of severe diverticulitis recurrence for those who underwent surgery compared to those treated conservatively. In a previous study of patients with extraluminal air, rates of recurrent diverticulitis were significantly lower in the elective surgery group than in the observation group, although all recurrent episodes were successfully managed without surgery.34 A four-year follow-up of the Laparoscopic Elective Sigmoid Resection Following Diverticulitis (LASER) randomized controlled trial reported that elective sigmoid resection greatly reduced the risk of recurrence compared with conservative treatment but did not improve the quality of life in recurring, persistent painful, or complicated diverticulitis.35 We observed no benefit of antibiotics for patients with uncomplicated diverticulitis, consistent with prior studies36, 37 and the shift away from routine antibiotic use for uncomplicated disease. The increased risk of severe diverticulitis recurrence associated with antibiotics in complicated diverticulitis should be interpreted with caution, as it may primarily reflect that less severe cases were not treated with antibiotics.
To our knowledge, this study included the most comprehensive list of features relevant to diverticulitis, extending beyond the most widely studied complications. It drew on the extensive information in unstructured clinical reports through a validated algorithm that outperformed existing diagnosis codes in categorizing diverticulitis severity. These extracted features enabled the development of a large cohort and a more detailed and robust comparison of the risk of recurrent diverticulitis requiring hospitalization between complicated and uncomplicated diverticulitis.
However, our study has several limitations. First, our NLP algorithm was developed and validated using data exclusively from the MGB system. External validation using data from institutions outside MGB is warranted to improve its generalizability. We have made our algorithm publicly available on GitHub, welcoming applications to other EHR resources. Second, due to the constraints of EHR data, cases of recurrent diverticulitis managed at other hospitals or in outpatient settings were not captured. Finally, we relied on CT reports only and extracted the presence of each feature from text but did not incorporate details, such as size or location of abscess, which are not always mentioned. Future studies integrating other EHR data domains including CT imaging are needed to improve outcomes in diverticulitis.
In conclusion, we developed an NLP algorithm to accurately classify diverticulitis and its subtypes based on a full spectrum of features and complications, demonstrating better performance in identifying diverticulitis severity than ICD codes. NLP-detected features have the potential to be incorporated in a clinical decision tool to improve risk stratification and identify patients who are more susceptible to readmission after their initial episode, thus helping guide management and prevention of diverticular disease.
Supplementary Material
Acknowledgment:
The authors thank Shawn Murphy and Henry Chueh and the Mass General Brigham Health Care Research Patient Data Registry group for facilitating use of their database.
Funding
This work was supported by the National Institutes of Health NIDDK 1K01DK135854-01A1 (WM); R01DK101495 (LLS, ATC); American Gastroenterological Association AGA Research Scholar Award AGA2021-13-01 (WM); AGA Research Scholar Award AGA2025-13-02 (JMD); and Massachusetts General Hospital MGH Claflin Distinguished Scholar Award (WM). The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.
Conflict of interest
The authors disclose the following: ATC served as a consultant for Pfizer Inc. and Boehringer Ingelheim and received grants from Pfizer Inc., Zoe Ltd, and Freenome. LLS serves as a consultant for Medtronic. The remaining authors disclose no conflicts.
Data transparency statement
The data used in this study were obtained from the Research Patient Data Registry (RPDR) at Mass General Brigham. Data, analytic methods, and study materials are available to registered users of RPDR. These data are not publicly available due to patient privacy concerns and institutional data use policies. Researchers interested in accessing RPDR data may submit a data request through the Mass General Brigham RPDR system, subject to institutional review and approval.
