Natural Language Processing As an Alternative to Manual Reporting of Colonoscopy Quality Metrics

GOTTUMUKKALA S RAJU; PHILLIP J LUM; REBECCA SLACK; SELVI THIRUMURTHI; PATRICK M LYNCH; ETHAN MILLER; BRIAN R WESTON; MARTA L DAVILA; MANOOP S BHUTANI; MEHNAZ A SHAFI; ROBERT S BRESALIER; ALEXANDER A DEKOVICH; JEFFREY H LEE; SUSHOVAN GUHA; MALA PANDE; BORIS BLECHACZ; ASIF RASHID; MARK ROUTBORT; GLADIS SHUTTLESWORTH; LOPA MISHRA; JOHN R STROEHLEIN; WILLIAM A ROSS

doi:10.1016/j.gie.2015.01.049

Author manuscript; available in PMC: 2016 Sep 1.

Published in final edited form as: Gastrointest Endosc. 2015 Apr 22;82(3):512–519. doi: 10.1016/j.gie.2015.01.049

PMCID: PMC4540652 NIHMSID: NIHMS686001 PMID: 25910665

Abstract

BACKGROUND & AIMS

The adenoma detection rate (ADR) is a quality metric tied to interval colon cancer occurrence. However, manual extraction of data to calculate and track the ADR in clinical practice is labor-intensive. To overcome this difficulty, we developed a natural language processing (NLP) method to identify patients who underwent their first screening colonoscopy and to identify adenomas and sessile serrated adenomas (SSAs). We compared the NLP-generated results with those of manual data extraction to test the accuracy of NLP, and we report colonoscopy quality metrics obtained using NLP.

METHODS

Identification of screening colonoscopies using NLP was compared with that using the manual method for 12,748 patients who underwent colonoscopies from July 2010 to February 2013. Also, identification of adenomas and SSAs using NLP was compared with that using the manual method with 2259 matched patient records. Colonoscopy ADRs using these methods were generated for each physician.

RESULTS

NLP correctly identified 91.3% of the screening examinations, whereas the manual method identified 87.8% of them. Both the manual method and NLP identified examinations of patients with adenomas and SSAs in the matched records almost perfectly. NLP and the manual method produced comparable ADR values for each endoscopist and for the group as a whole.

CONCLUSIONS

NLP can correctly identify screening colonoscopies, accurately identify adenomas and SSAs in a pathology database, and provide real-time quality metrics for colonoscopy.

Keywords: Natural language processing, adenoma detection rate, serrated adenoma detection rate, colonoscopy quality metrics

Introduction

Adenoma detection rate (ADR) is an important quality metric for colonoscopy performance.¹ A recent study demonstrated an inverse association between the ADR and subsequent risk of interval, advanced-stage interval, and fatal interval colorectal cancer.² Currently, calculation of the ADR and other colonoscopy quality metrics requires careful review of all electronic medical records and endoscopy and pathology reports to identify screening colonoscopies, followed by manual entry of data into a database. Manual reporting of ADRs is laborious, time-consuming, and resource-intensive, even for a small sample, in the absence of highly structured procedural reports.³ Reporting the quality metrics takes many man-hours.⁴ Given that the number of colonoscopies performed in the United States is increasing (>14 million per year in 2012),⁵ manual reporting of the ADR and other quality metrics is not sustainable. Although endoscopy software has made progress in providing colonoscopy technique-based quality metrics, most products receive data from pathology software as PDF documents rather than as structured data from which the ADR could be calculated automatically (Table 1). A professional must therefore review the pathology data and enter it into the endoscopy software before quality metrics such as the ADR can be calculated or uploaded to the GI Quality Improvement Consortium (GIQuIC) for reporting to Medicare.

Table 1.

Current interaction of available endoscopy software with pathology databases^*

Endoscopy Software	Vendor	Endoscopy-Pathology interface
Endoworks	Olympus	Can only interface with Caris Life Science pathology software
Provation MD Gastroenterology	Provation	Can receive PDF from Pathology Software
EndoPro	Pentax Medical	Can receive PDF from Pathology Software
Endosoft	Endosoft	Can receive PDF from Pathology Software

^* Currently available software does not allow merging of pathology data with endoscopy data to report the adenoma detection rate automatically; manual entry is required.

Natural language processing (NLP) uses computer-based linguistics and artificial intelligence to identify and extract information from free-text data sources such as progress notes, endoscopy procedure reports, laboratory test results, radiology reports, and pathology reports. NLP offers the opportunity to report data from unstructured procedural and pathology reports that may suffice in producing colonoscopy quality metrics.^6-9 Therefore, we examined the use of NLP in reporting colonoscopy quality metrics. We developed an NLP-based software program to identify asymptomatic patients at average risk for colon cancer who were undergoing their first screening colonoscopy and to extract information from their corresponding pathology reports, including the number and type of polyps (e.g., adenomas, hyperplastic polyps). We then compared the results with those obtained using manual data extraction. We report herein the colonoscopy quality metrics for our group obtained using NLP.

Methods

A computer application for ADR reporting (CAADRR) using the NLP method was developed to supplant the manual method of reporting the ADR as part of a quality improvement project. Three endoscopists manually reviewed and collected data on patients at The University of Texas MD Anderson Cancer Center who underwent screening colonoscopy. The MD Anderson Institutional Review Board approved this project.

CAADRR Design

The CAADRR consists of 3 separate programs: data abstraction into a staging database (containing data from different sources required to generate the ADR), data processing (using NLP to extract data from paragraphs into structured fields and linking colonoscopy reports with correct pathology reports), and data presentation and result reporting (Figure 1).

Figure 1. Architecture of the computer application for ADR reporting (CAADRR).

Data Extraction

The data extraction software program interfaces with external computer systems and pulls data into the staging database according to the patient's medical record number (MRN). Data extraction consists of the 3 steps described below (Figure 1).

Step 1: Endoscopy Data Extraction

Our institution's endoscopy information system (EndoWorks 7; Olympus America, Center Valley, PA), coupled with a data warehousing platform, has a feature that allows a weekly data dump into a SQL Server database. The extraction program copies all colonoscopy reports from this database into the staging database.

Step 2: Demographic Information and Transcribed Document Extraction

Race, sex, and medical data are extracted from the electronic medical transcribed records (Clinic Station, The University of Texas MD Anderson Cancer Center, Houston) into the staging database, according to the MRNs in the colonoscopy reports.

Step 3: Pathology Report Extraction

All pathology reports matching the MRNs in the colonoscopy reports are extracted from the pathology database (Sunquest PowerPath; Sunquest Information Systems, Tucson, AZ).

Data Processing

The data processing software program in the CAADRR uses NLP to convert paragraph text into structured fields in seven steps. (1) Abstract key terms from the pathology reports into structured fields. (2) Match the MRNs on the pathology reports with the MRNs on the given colonoscopy reports. (3) Identify keywords in the pathology report to determine whether the report is linked with a colonoscopy report. (4) Abstract key terms for cecal intubation and preparation quality from the procedure report into structured fields. (5) Match the colonoscopy procedure date with either the collection or receiving date on the pathology report. If the procedure date does not match either date exactly, the program tries to match it to a collection or receiving date within 5 days. (6) Identify past colonoscopies and/or pathology reports in relation to the patient's current colonoscopy report. If a past colonoscopy pathology report is detected, it is noted in the previous colonoscopy field. The same process is applied if no past pathology reports are detected for the patient.
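The MRN and date matching in steps 2 and 5 can be sketched as follows. This is a minimal illustration rather than the CAADRR code (which was written in C#): the dictionary keys (`mrn`, `date`, `collected`, `received`) are hypothetical, and choosing the nearest report when several fall inside the 5-day window is an assumption the text does not specify.

```python
from datetime import date

MAX_GAP_DAYS = 5  # matching window stated in the text


def link_pathology(procedure, pathology_reports, max_gap_days=MAX_GAP_DAYS):
    """Return the pathology report linked to one colonoscopy, or None.

    All field names are hypothetical stand-ins for staging-database columns.
    """
    # Step 2: only reports with the same MRN are candidates.
    candidates = [r for r in pathology_reports if r["mrn"] == procedure["mrn"]]

    # Step 5, first pass: exact match on collection or receiving date.
    for report in candidates:
        if procedure["date"] in (report["collected"], report["received"]):
            return report

    # Step 5, second pass: closest report within the window (assumed tie-break).
    best, best_gap = None, None
    for report in candidates:
        gap = min(
            abs((procedure["date"] - report["collected"]).days),
            abs((procedure["date"] - report["received"]).days),
        )
        if gap <= max_gap_days and (best_gap is None or gap < best_gap):
            best, best_gap = report, gap
    return best
```

A colonoscopy with no candidate inside the window stays unlinked, which is the kind of failure the testing phases were designed to surface.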

The next step in data processing is to decide whether the colonoscopy report is a screening examination. The data processing program first looks for the keyword “screening” in the indication section of the colonoscopy report. Once an examination is flagged as a screening examination in this way, the program excludes it as non-screening if it meets any of the following criteria: patients not 50 to 75 years old as indicated by the endowriter; patients with symptoms (examples: bleeding, abdominal pain, constipation, diarrhea, bowel obstruction); patients with familial adenomatous polyposis, Peutz-Jeghers syndrome, Li-Fraumeni syndrome, Lynch syndrome, or serrated polyposis; patients with a history of inflammatory bowel disease as indicated in colonoscopy reports; patients with a family history of colon cancer or polyps; patients with a prior history of colon polyps; and patients with previous colonoscopy reports (the program looks for keywords in the colonoscopy reports [e.g., “Last Colonoscopy x years ago,” “1st Colonoscopy x years ago”] and searches the staging database for a past colonoscopy or pathology report).

(7) The final step in data processing is to process the transcribed medical documents to determine whether any past colonoscopy procedures are mentioned in them. If so, the transcribed document date must precede the colonoscopy procedure date. This step is performed if the examination is, up until this point, still considered a screening examination.
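The screening decision described above can be sketched as a keyword check followed by exclusion rules. The term lists below contain only the examples named in the text, the record keys (`indication`, `text`, `age`) are hypothetical, and plain substring matching is a simplification: it would, for instance, treat "no bleeding" as a symptom, which a production keyword dictionary would need to handle.

```python
import re

# Exclusion keywords taken from the examples in the text (not the full dictionary).
EXCLUSION_TERMS = (
    "bleeding", "abdominal pain", "constipation", "diarrhea", "bowel obstruction",
    "familial adenomatous polyposis", "peutz-jeghers", "li-fraumeni", "lynch",
    "serrated polyposis", "inflammatory bowel disease",
    "family history of colon cancer", "history of colon polyps",
)
# Matches phrases such as "Last Colonoscopy 5 years ago" in lowercased text.
PRIOR_EXAM = re.compile(r"\b(?:last|1st) colonoscopy \d+ years? ago")


def classify_screening(report, has_prior_record):
    """Return (is_screening, reasons); reasons lists what triggered exclusion."""
    if "screening" not in report["indication"].lower():
        return False, ["no 'screening' keyword in indication"]

    text = report["text"].lower()
    reasons = []
    if not 50 <= report["age"] <= 75:
        reasons.append("age outside 50-75")
    reasons += [f"keyword: {term}" for term in EXCLUSION_TERMS if term in text]
    if PRIOR_EXAM.search(text):
        reasons.append("prior colonoscopy mentioned")
    if has_prior_record:  # result of searching the staging database
        reasons.append("prior report in staging database")
    return not reasons, reasons
```

Returning the reasons alongside the decision mirrors the presentation layer, which displays the key words used to decide whether a colonoscopy is a screening procedure.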

Data Presentation

The data presentation software application in the CAADRR shows the raw data and NLP results on one tab and the resulting statistics on another. (1) Each colonoscopy report is shown as a row along with the associated pathology report, if present, and contains an extended field that indicates whether the report is for a screening colonoscopy and the key words used to determine whether a colonoscopy is a screening procedure or not. (2) Each pathology report associated with a screening colonoscopy report contains the following abstracted pathology fields: tubular adenoma, tubulovillous adenoma, villous adenoma, adenomatous polyps, mixed adenoma, sessile serrated adenoma (SSA), serrated adenoma, traditional serrated adenoma, hyperplastic polyp in the right colon, hyperplastic polyp in the left colon, unknown location of hyperplastic polyp, low-grade and high-grade dysplasia, cancer, and miscellaneous polyps (inflammatory polyps, lymphoid polyps, etc.). Whether an endoscopist identified one or more adenomas per patient is based on the pathology report for each specimen jar submitted for analysis. Multiple adenomas were said to be present only if adenomatous tissue was found in more than one pathology jar submitted. (3) The data processing results are grouped and then fed into the statistical algorithms. The reporting application can give the ADR, adenoma burden (number of adenomas in a patient found to have an adenoma), and number and percentage of SSAs, serrated lesions, adenomas, advanced adenomas, and polyps. These data can be presented in a multitude of combinations (whole group from beginning to present and by year; each endoscopist in groups of 100 procedures categorized by year, race, or age [≤60, 60-70, and ≥70 years]).
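The jar-based rule and the ADR calculation can be illustrated with a short sketch. It assumes each pathology report has already been split into one diagnosis string per specimen jar, uses only a few of the field names listed above as keywords, and takes the ADR in its usual sense: the share of screening examinations with at least one adenoma. The function names are hypothetical.

```python
ADENOMA_TERMS = (
    "tubular adenoma", "tubulovillous adenoma", "villous adenoma",
    "adenomatous polyp", "mixed adenoma",
)
SSA_TERMS = ("sessile serrated adenoma",)


def abstract_pathology(jars):
    """Turn per-jar diagnosis text into structured flags for one examination."""
    adenoma_jars = 0
    ssa = False
    for diagnosis in jars:
        text = diagnosis.lower()
        if any(term in text for term in ADENOMA_TERMS):
            adenoma_jars += 1  # counted once per jar, however many terms match
        if any(term in text for term in SSA_TERMS):
            ssa = True
    return {
        "adenoma": adenoma_jars >= 1,
        # Multiple adenomas only when adenomatous tissue is in more than one jar.
        "multiple_adenomas": adenoma_jars > 1,
        "ssa": ssa,
    }


def adenoma_detection_rate(exams):
    """Fraction of screening examinations with at least one adenoma."""
    return sum(exam["adenoma"] for exam in exams) / len(exams)
```

Because counting is per jar, several adenomas submitted together in one jar register as a single adenoma.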

Development of the CAADRR

The entire CAADRR project was developed using the C# programming language and the Visual Studio Ultimate software program (2012 edition; Microsoft, Redmond, WA). C# is a modern object-oriented programming language that provides a rich set of string manipulation functions built into the string variable. These functions are the building blocks for NLP.

Application Development Shortcuts

The ADR reporting application is built around an advanced data grid (XtraGrid) purchased from a third-party software vendor (DevExpress, Glendale, Calif) with advanced built-in functions such as wild-card searching of free text, filtering of any field, grouping of data, and exporting of data to an Excel spreadsheet (Microsoft). The code algorithms used to generate the statistics in the application were taken from the Remondo website (http://www.remondo.net/category/algorithm/).

CAADRR Development Cycle

The CAADRR development cycle consisted of 5 periods. (1) Two weeks for replication of the Olympus data warehouse tables, pathology tables, transcribed documents, and demographic data from the electronic medical record in the staging database. (2) Two weeks for development and testing of an application for transfer of data from the various source databases to the staging database. (3) Six weeks for development of the NLP algorithms to process the pathology and colonoscopy reports, link pathology reports with current colonoscopy reports, and identify preceding colonoscopy and pathology reports to confirm that a colonoscopy report identified as screening is not a surveillance report. (4) Two weeks for development of a reporting application with statistical output. (5) Nine weeks for testing and refinement of the NLP algorithms.

CAADRR Testing

CAADRR testing was performed on 19,331 colonoscopy reports for 15,471 patients from June 19, 2009, to February 28, 2013, together with the 158,645 pathology reports (endoscopy, surgical pathology, and cytology reports) recorded for these patients in our pathology database. In addition, 32,450 colonoscopy reports (November 1998 to June 2009) from our retired EndoPRO database (Pentax Medical, Montvale, NJ) were exported into the staging database to identify patients with prior colonoscopies. The CAADRR was tested in 3 phases.

Phase 1

The NLP data were organized in a linear fashion—one row, one colonoscopy examination record (indications, findings, etc.) along with the full text and abstracted pathology data—allowing us to easily check thousands of records to see whether the NLP engine was functioning correctly. An iterative process was used to fine-tune the NLP algorithms and collect the keyword dictionary search terms. After running the data processing program, the data in the data grid were examined to filter, sort, and search for key terms to determine whether the NLP engine produced the desired results. New keyword definitions or features were then introduced into the NLP engine to achieve the desired results.

Phase 2

The NLP results were compared with a manually abstracted data set from a previous study to check the adenoma, serrated adenoma, advanced adenoma, and left-sided hyperplastic polyp detection rates in 343 patients.³ The accuracy of parsing of the data from the pathology reports was verified. The program failed to link 3 examinations with pathology reports.

Phase 3

The NLP results were compared with those of another manual method of data extraction from July 1, 2010, to February 28, 2013 (12,748 colonoscopy procedure reports) to identify patients undergoing screening colonoscopy. In addition, MRN-matched records for manual and NLP processing were used to test the NLP program's ability to report the ADR.

In the manual review, all colonoscopy reports were evaluated to determine whether the examination was an initial screening examination. The endoscopy report, reason for referral to endoscopy, pathology report, and past records from both MD Anderson and outside facilities were included in the review. This labor-intensive approach was adopted because the use of “screening” in the colonoscopy template was inconsistent among endoscopists. The intent was to generate ADR data for as many providers as possible, with a minimum of 90 screening examinations during the study period. Because earlier work has shown that low procedural volumes produce large ADR confidence intervals that complicate interpretation, the aim was to capture as many examinations that could be considered screening as possible.¹⁰ For example, referrals for heme-positive stool or rectal bleeding in an otherwise asymptomatic 66-year-old with no prior colonoscopy would be considered screening examinations for the manual review.

Statistics

The numbers of screening examinations are reported using manual data extraction and NLP. Manual data extraction is the standard practice of collecting data retrospectively from medical records. In our analysis, the total number of patients having colonoscopy examinations was defined as the number of examinations extracted using NLP. All of the examinations that were identified using both methods were then used to compare the methods’ accuracy. When both methods agreed, the relevant information was assumed to be true. Upon disagreement between the manually extracted and NLP-extracted data, the patient's medical record was re-examined by an experienced gastroenterologist not involved in initial manual review, who was aware of the results of manual and NLP review process, to resolve the disagreement. The agreements and resolved disagreements make up the criterion standard for determining the accuracy of each measure (called “reviewed”), including whether an examination is considered a screening examination. The numbers and percentages of correct (and incorrect) identification were tabulated for screening identification. Then ADR and SSA detection rate were assessed in all screens to identify whether NLP could accurately identify such detections. Finally, the ADR in screening visits was calculated separately by each method as if it were going to be reported without knowledge of the other method to establish whether NLP may be a reasonable approach for routine ADR reporting.
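The construction of the criterion standard can be sketched as below. The function names and the `adjudicate` callback, which stands in for the gastroenterologist's review of a disputed record, are hypothetical. Examinations absent from the manual data set are treated as "not screening", following the convention stated for the manually extracted data.

```python
def build_reference(manual, nlp, adjudicate):
    """Combine two sets of screening calls into a reviewed reference standard.

    manual, nlp: dicts mapping examination id -> bool (True = screening).
    adjudicate:  callable(exam_id) -> bool, the expert's ruling on a dispute.
    """
    reference = {}
    for exam_id, nlp_call in nlp.items():  # NLP defines the full examination list
        manual_call = manual.get(exam_id, False)  # not in manual set = not screening
        if manual_call == nlp_call:
            reference[exam_id] = nlp_call  # agreement is assumed to be true
        else:
            reference[exam_id] = adjudicate(exam_id)
    return reference


def percent_correct(calls, reference, label):
    """Percentage of examinations with reference value `label` that a method matched."""
    relevant = [exam_id for exam_id, truth in reference.items() if truth == label]
    correct = sum(calls.get(exam_id, False) == label for exam_id in relevant)
    return 100 * correct / len(relevant)
```

Under this scheme, an error shared by both methods goes undetected, because agreement is accepted without review.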

Results

Identification of Patients Who Underwent Screening Colonoscopy

A total of 12,748 patients underwent colonoscopy examinations from July 1, 2010, to February 28, 2013. Table 2 shows that 2259 and 2169 of these examinations were initially identified as screening examinations using the manual and automated methods, respectively. After resolving differences between manual and automated methods, we observed 2288 true screenings and 10,460 examinations that did not meet the screening criteria. The manual data extraction method identified 2009 (87.8%) of the true screening examinations, whereas NLP identified 2089 (91.3%) of them.
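The percentages quoted here follow directly from the counts, with the reviewed totals as denominators. A short check, using only numbers reported in this paragraph and in Table 2 (the variable names are illustrative):

```python
REVIEWED = {"screening": 2288, "not_screening": 10460}
CORRECT = {
    "manual": {"screening": 2009, "not_screening": 10210},
    "nlp": {"screening": 2089, "not_screening": 10380},
}


def pct(count, total):
    """Percentage rounded to one decimal place, as reported in the table."""
    return round(100 * count / total, 1)


# The two reviewed categories account for every examination in the study period.
assert sum(REVIEWED.values()) == 12748

for method, counts in CORRECT.items():
    for status, count in counts.items():
        print(method, status, pct(count, REVIEWED[status]))
# manual screening 87.8
# manual not_screening 97.6
# nlp screening 91.3
# nlp not_screening 99.2
```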

Table 2.

Manual and NLP-Based Extraction of Screening Information for All Patients Undergoing Colonoscopy Examinations (N = 12,748)

	N (%)
Screening Status	Manual Extraction^a	NLP^a	Reviewed
Screening^b	2259	2169	2288

Correctly identified	2009 (87.8)	2089 (91.3)	2288
Incorrectly identified	279 (12.2)	199 (8.7)	0

Not screening^b	10,489	10,579	10,460

Correctly identified	10,210 (97.6)	10,380 (99.2)	10,460
Incorrectly identified	250 (2.4)	80 (0.8)	0

^a Percentages are based on the totals in the Reviewed column.

^b For manually extracted data, the intention was to obtain all screening examinations. Therefore, if the examination was included in the data set, it was called “screening”; if not included, it was called “not screening.”

Reporting of Colonoscopy Quality Metrics

Both the manual data extraction method and NLP correctly identified adenomas in nearly every case (Table 3). Both methods also correctly identified SSAs in almost all cases, although NLP performed slightly better (Table 4). Table 5 presents the ADRs for each method using the number of screening examinations detected by that respective method; this is the ADR that would be reported if each were the only method performed. Across male, female, and combined estimates, NLP differed from review by more than 3 percentage points in 4 cases, compared with 15 cases for the manual method. The maximum ADR difference between NLP and review is 7 percentage points and appears for physician 2, for whom the female ADR is under-reported by 7 points and the male ADR is over-reported by 7 points. For manual collection, the maximum ADR difference from review is 6 percentage points and occurs twice for males, once for physician 5 and once for physician 10. Overall, NLP and the manual method always produce an ADR that is within the confidence interval of the reviewed ADR. There is no consistent trend showing that NLP over- or under-reports the ADR in routine reporting.
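The text does not state how the 95% confidence intervals in Table 5 were computed. As a rough illustration only, the sketch below uses the normal-approximation (Wald) interval, which is an assumption and not necessarily the authors' method. Applied to the pooled reviewed row (ADR 43%, N = 2288), it gives bounds that round to 41% and 45%. Other interval methods can give noticeably different bounds for small per-physician samples.

```python
import math


def wald_interval(p, n, z=1.96):
    """Normal-approximation 95% CI for a proportion p observed in n examinations."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Pooled reviewed row of Table 5: ADR 43% in 2288 screening examinations.
pooled_lo, pooled_hi = wald_interval(0.43, 2288)

# Physician 2, reviewed: ADR 29% in 51 examinations. Far fewer procedures,
# so the interval is much wider around a similar kind of estimate.
small_lo, small_hi = wald_interval(0.29, 51)

print(f"pooled:      {pooled_lo:.3f} to {pooled_hi:.3f}")
print(f"physician 2: {small_lo:.3f} to {small_hi:.3f}")
```

The wider interval for the low-volume physician shows why a minimum number of screening examinations per provider was sought before reporting an ADR.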

Table 3.

Manual and NLP-Based Identification of Adenomas

	N (%)
Adenoma diagnosis	Manual Extraction^a	NLP^a	Reviewed
Called adenoma^b	951	962	966

Correctly identified	950 (98.3)	960 (99.4)	966
Incorrectly identified	16 (1.7)	6 (0.6)	0

Called not adenoma^b	1308	1297	1293

Correctly identified	1292 (99.9)	1291 (99.8)	1293
Incorrectly identified	1 (0.1)	2 (0.2)	0

^a Percentages are based on totals in the Reviewed column.

^b Adenomas were identified in all matched records, regardless of screening status, to assess how accurately each method determines the presence of adenoma.

Table 4.

Manual and NLP-Based Identification of Sessile Serrated Adenomas (SSAs)

	N (%)
SSA diagnosis	Manual Extraction^a	NLP^a	Reviewed
Called SSA^b	190	194	193

Correctly identified	186 (96.4)	193 (100)	193
Incorrectly identified	7 (3.6)	0	0

Called not SSA^b	2069	2065	2066

Correctly identified	2062 (99.8)	2065 (99.9)	2066
Incorrectly identified	4 (0.2)	1 (0.0)	0

^a Percentages are based on totals in the Reviewed column.

^b SSAs were identified in all matched records, regardless of screening status, to assess how accurately each method determines the presence of SSA.

Table 5.

ADR by Physician and Gender for Screening Visits as Identified by the Same Method.

Physician	Gender	N	Manual ADR% (95% CI)	N	NLP ADR% (95% CI)	N	Reviewed ADR% (95% CI)
All	All	2259	42 (40, 44)	2169	43 (41, 45)	2288	43 (41, 45)
	F	1453	36 (34, 39)	1348	37 (34, 39)	1439	37 (34, 39)
	M	806	52 (49, 56)	821	54 (50, 57)	849	54 (51, 58)

1	All	142	50 (42, 58)	138	49 (41, 58)	149	51 (43, 59)
	F	88	43 (33, 54)	85	46 (35, 57)	93	45 (35, 55)
	M	54	61 (48, 75)	53	55 (41, 69)	56	61 (48, 74)

2	All	63	30 (19, 42)	45	29 (15, 43)	51	29 (16, 42)
	F	39	28 (13, 43)	26	19 (3, 35)	31	26 (9, 42)
	M	24	33 (13, 54)	19	42 (18, 67)	20	35 (12, 58)

3	All	131	47 (38, 55)	119	41 (32, 50)	123	44 (35, 53)
	F	93	39 (29, 49)	85	32 (22, 42)	85	34 (24, 44)
	M	38	66 (50, 82)	34	65 (48, 82)	38	66 (50, 82)

4	All	231	29 (24, 35)	228	30 (24, 36)	224	32 (26, 38)
	F	146	23 (16, 30)	140	24 (16, 31)	140	25 (18, 32)
	M	85	40 (29, 51)	88	41 (30, 51)	84	43 (32, 54)

5	All	152	38 (30, 46)	116	43 (34, 52)	149	41 (33, 49)
	F	97	29 (20, 38)	71	31 (20, 42)	95	29 (20, 39)
	M	55	55 (41, 68)	45	62 (47, 77)	54	61 (48, 75)

6	All	255	37 (31, 43)	224	36 (30, 43)	243	36 (30, 42)
	F	161	28 (21, 35)	136	26 (18, 33)	151	25 (18, 32)
	M	94	53 (43, 63)	88	52 (42, 63)	92	54 (44, 65)

7	All	203	49 (42, 56)	197	52 (45, 59)	203	51 (44, 58)
	F	130	44 (35, 52)	122	46 (37, 55)	128	45 (36, 53)
	M	73	59 (47, 70)	75	63 (51, 74)	75	63 (51, 74)

8	All	252	60 (54, 66)	234	62 (55, 68)	256	61 (55, 67)
	F	158	58 (50, 66)	142	59 (51, 67)	156	59 (51, 67)
	M	94	64 (54, 74)	92	65 (55, 75)	100	64 (54, 74)

9	All	185	55 (47, 62)	187	52 (45, 60)	197	52 (45, 59)
	F	122	48 (39, 57)	124	46 (37, 55)	130	46 (37, 55)
	M	63	68 (56, 80)	63	65 (53, 77)	67	64 (52, 76)

10	All	377	33 (29, 38)	388	37 (32, 42)	406	37 (32, 42)
	F	254	31 (25, 37)	249	33 (27, 39)	265	33 (28, 39)
	M	123	38 (30, 47)	139	45 (36, 53)	141	44 (36, 52)

11	All	78	21 (11, 30)	76	22 (13, 32)	75	21 (12, 31)
	F	49	12 (3, 22)	47	17 (6, 28)	46	15 (4, 26)
	M	29	34 (16, 53)	29	31 (13, 49)	29	31 (13, 49)

12	All	190	44 (37, 51)	217	46 (39, 52)	212	46 (39, 53)
	F	116	40 (31, 49)	121	39 (30, 48)	119	39 (30, 48)
	M	74	51 (40, 63)	96	54 (44, 64)	93	55 (45, 65)

Discussion

Our study demonstrates that NLP can accurately identify patients who undergo screening colonoscopy and the pathology of colon polyps thereby generating reliable ADR values for quality assurance. In addition, it facilitates reporting of the ADR stratified by age, sex, and race, as well as over time (and additional variables) more rapidly and with less effort than does the manual data extraction method.

Currently, ADR is calculated using manual methods, which are not only laborious but also cumbersome because of the frequent use of unstructured reports as well as the use of free-form text in report generation. In addition, some endoscopists use the term “screening” liberally to include colon polyp surveillance examinations, examinations of patients with colonic symptoms, examinations of patients with family histories of colon cancer, and other types of examinations in cases with a high likelihood of finding colon polyps; these reports require additional manual search of the electronic medical records to determine whether this was truly a screening examination or not.

Manual data extraction requires dedicated personnel who are knowledgeable in endoscopy and pathology terminology to enter data, which is time-consuming and costly over the long term. In our study, endoscopists manually extracting records for internal audit spent approximately 5 minutes per record entry, which included reviewing the endoscopy and pathology records as well as past medical records. Because of these challenges in calculating the ADR, some centers have resorted to calculation of the polypectomy rate using administrative claims data, which has proven to be an accurate surrogate for the ADR in preliminary reports and may become an important quality measure for external and internal use.¹¹ However, the endoscopy community perceives the polypectomy rate as open to misuse, because endoscopists could remove normal tissue or nonneoplastic polyps and report these resections as polypectomies. This is not an issue with the ADR, which can be reported using NLP as shown in our study and by others.^12,13 Admittedly, the ADR can be influenced by examinations misclassified as screening examinations. Therefore, we made an intense effort to accurately identify all examinations that were screening examinations, not just those noted in the procedure indication, by reviewing the past pathology, endoscopy, and electronic medical records.

NLP accurately identifies the pathology of polyps as demonstrated in our study as well as one by Imler et al.¹⁴ In addition, it can identify the locations, sizes, and numbers of adenomas as well as hyperplastic polyps when these data are included in either the pathology or endoscopic report. This allows for detailed reporting of colonoscopy quality metrics, such as the ADR, serrated lesion detection rate, multiple ADR, and adenoma burden per patient.

NLP-based measurement of the ADR and other colonoscopy quality metrics has several benefits. For example, NLP can report the ADR and various quality metrics in different groups according to age, sex, and race. This is particularly meaningful when using these subgroups in comparing quality metrics for different endoscopists, practices, and regions of the country because patient demographics and evolving standards of practice have bearing on the ADR.^15-17 Also, the ability of NLP to search pathology databases, transcribed medical documents, and endoscopy databases for prior examinations allows for separating screening examinations from surveillance examinations to report quality metrics. Once the NLP is set up, it can review tens of thousands of records quickly and provide accurate reports. In our study, reviewing and recording the data for 12,748 patients at the rate of 5 minutes per record entry took 1062 hours, or 26.5 weeks of full-time labor. A study this large would be prohibitively expensive when using a manual data extraction method over the long term and on a national scale.
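The labor estimate above is simple arithmetic and can be reproduced as follows. The 40-hour working week is an assumption made here to convert hours into weeks; the text gives only the resulting figures.

```python
RECORDS = 12748          # colonoscopy records reviewed manually
MINUTES_PER_RECORD = 5   # approximate time per record entry
HOURS_PER_WEEK = 40      # assumed full-time working week

hours = RECORDS * MINUTES_PER_RECORD / 60
weeks = hours / HOURS_PER_WEEK

print(f"{hours:.0f} hours, about {weeks:.1f} weeks of full-time labor")
```

This gives roughly 1062 hours, or a little over 26.5 weeks, in line with the figures quoted in the text.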

The design of the CAADRR using 3 separate programs has several benefits. Both data extraction and data processing, which are time-consuming tasks, can run during off hours to enable the system to perform efficiently. In addition to efficiency of operation, the CAADRR design allows for easy adoption of changes in electronic medical records and endoscopy writing software as well as changes in hospital computer systems. The data extraction program is designed as an independent program, which minimizes the effect of changes in electronic medical records and endoscopy writing software in generating the quality metrics. Also, the staging database has all of the necessary fields for generating quality metrics. This design permits use of the program even if the hospital computer system changes because the only change in the CAADRR necessary is in how the data extraction program accesses endoscopy, pathology and electronic medical record systems to obtain the necessary data and place them in the correct fields in the staging database.

NLP provides information as reliably and accurately as the manual method of extracting data, as demonstrated in our study. Therefore, this NLP method could potentially be used for reporting quality metrics to the Centers for Medicare & Medicaid Services (CMS) for national benchmarking. However, our NLP method has some limitations. We used data from one academic institution with one particular type of endoscopy reporting system, one electronic medical record system, and one pathology management software system. It has yet to be tested in terms of its ability to link with the GI Quality Improvement Consortium. The program has not been tested on scanned documents. It is not set up to account for colonoscopies that patients underwent at an outside facility, especially when there is no entry about the procedure in the transcribed documents or in the endoscopy report. The program also depends on accurate data entry regarding indications to correctly identify patients who have undergone true screening examinations, because discrepancies exist among NLP, the manual data extraction method, and the reviewed information regarding the screening examinations. The reviewed data are considered our criterion standard even though screening examinations and adenomas may be missed by both the manual method and NLP. In the absence of a better method, we have no means of identifying those screening examinations without an additional manual audit of all colonoscopies called not screening or with no adenoma or SSA detected. Our comparison of NLP with a careful manual data collection procedure consistent with standard practice is a reasonable approach. Another limitation of our study is the potential for underreporting of adenoma burden per patient, because the system cannot identify multiple adenomas when they are submitted in a single jar.

In conclusion, the present study demonstrated the value of NLP in (1) identifying patients who underwent screening colonoscopy, (2) reporting correctly the identification of adenomas and serrated adenomas, and (3) reporting in real time the performance of colonoscopists according to the types of patients they serve. The program must be explored further regarding its potential use with different types of reporting systems as well as ability to sync with registries such as The GI Quality Improvement Consortium. Such programs have the potential to significantly reduce the burden on practitioners in reporting quality metrics in a timely, accurate, low-cost manner, freeing up time for them to care for patients and address other regulatory issues.

Acknowledgments

Grant support:

Supported by the NIH/NCI under award numbers P30CA016672 and K07CA160753. The study used the Biostatistics Shared Resource.

Abbreviations

NLP: Natural language processing
ADR: adenoma detection rate
SSA: sessile serrated adenoma

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Contributions of authors: 1. study concept and design; 2. acquisition of data; 3. analysis and interpretation of data; 4. drafting of the manuscript; 5. critical revision of the manuscript for important intellectual content; 6. statistical analysis; 7. obtained funding; 8. technical or material support; 9. study supervision

GOTTUMUKKALA S. RAJU: 1, 2, 3, 4, 5, 7, 8, 9

PHILLIP J. LUM: 1, 2, 4, 8, 9

REBECCA SLACK: 3, 5, 6

SELVI THIRUMURTHI: 2, 5, 8

PATRICK M. LYNCH: 1, 2, 5, 8, 9

ETHAN MILLER: 1, 2, 5, 8

BRIAN R. WESTON: 1, 2, 5, 8

MARTA L. DAVILA: 1, 2, 5, 8

MANOOP S. BHUTANI: 1, 2, 5, 8

MEHNAZ A. SHAFI: 1, 2, 5, 8

ROBERT S. BRESALIER: 1, 2, 5, 8

ALEXANDER A. DEKOVICH: 1, 2, 8 (deceased)

JEFFREY H. LEE: 1, 2, 5, 8

SUSHOVAN GUHA: 1, 2, 5, 8

MALA PANDE: 3, 5, 8

BORIS BLECHACZ: 1, 2, 8

ASIF RASHID: 1, 2, 5, 8, 9

MARK ROUTBORT: 1, 2, 8, 9

GLADIS SHUTTLESWORTH: 1, 2, 8, 9

LOPA MISHRA: 1, 2, 7, 8

JOHN R. STROEHLEIN: 1, 2, 5, 7, 8

WILLIAM A. ROSS: 1, 2, 3, 5, 8, 9

References

1. Calderwood AH, Jacobson BC. Colonoscopy quality: metrics and implementation. Gastroenterol Clin North Am. 2013;42:599–618. doi: 10.1016/j.gtc.2013.05.005.
2. Corley DA, Jensen CD, Marks AR, et al. Adenoma detection rate and risk of colorectal cancer and death. N Engl J Med. 2014;370:1298–1306. doi: 10.1056/NEJMoa1309086.
3. Raju GS, Vadyala V, Slack R, et al. Adenoma detection in patients undergoing a comprehensive colonoscopy screening. Cancer Med. 2013;2:391–402. doi: 10.1002/cam4.73.
4. Ross WA, Thirumurthi S, Lynch PM, et al. Detection rates of premalignant polyps during screening colonoscopy: time to revise quality standards? Gastrointest Endosc. 2014. doi: 10.1016/j.gie.2014.07.030. Accepted for publication.
5. Seeff LC, Richards TB, Shapiro JA, et al. How many endoscopies are performed for colorectal cancer screening? Results from CDC's survey of endoscopic capacity. Gastroenterology. 2004;127:1670–1677. doi: 10.1053/j.gastro.2004.09.051.
6. Ohno-Machado L. What's new in informatics. J Am Med Inform Assoc. 2011;18:1. doi: 10.1136/jamia.2010.009910.
7. Ohno-Machado L. Realizing the full potential of electronic health records: the role of natural language processing. J Am Med Inform Assoc. 2011;18:539. doi: 10.1136/amiajnl-2011-000501.
8. Nadkarni PM, Ohno-Machado L, Chapman WW. Natural language processing: an introduction. J Am Med Inform Assoc. 2011;18:544–551. doi: 10.1136/amiajnl-2011-000464.
9. Hou JK, Imler TD, Imperiale TF. Current and future applications of natural language processing in the field of digestive diseases. Clin Gastroenterol Hepatol. 2014;12:1257–1261. doi: 10.1016/j.cgh.2014.05.013.
10. Do A, Weinberg J, Kakkar A, Jacobson BC. Reliability of adenoma detection rate is based on procedural volume. Gastrointest Endosc. 2013;77:376–80. doi: 10.1016/j.gie.2012.10.023.
11. Patel NC, Islam RS, Wu Q, et al. Measurement of polypectomy rate by using administrative claims data with validation against the adenoma detection rate. Gastrointest Endosc. 2013;77:390–394. doi: 10.1016/j.gie.2012.09.032.
12. Mehrotra A, Dellon ES, Schoen RE, et al. Applying a natural language processing tool to electronic health records to assess performance on colonoscopy quality measures. Gastrointest Endosc. 2012;75:1233–9.e14. doi: 10.1016/j.gie.2012.01.045.
13. Deutsch JC. Colonoscopy quality, quality measures, and a natural language processing tool for electronic health records. Gastrointest Endosc. 2012;75:1240–1242. doi: 10.1016/j.gie.2012.02.031.
14. Imler TD, Morea J, Kahi C, et al. Natural language processing accurately categorizes findings from colonoscopy and pathology reports. Clin Gastroenterol Hepatol. 2013;11:689–694. doi: 10.1016/j.cgh.2012.11.035.
15. Kahi CJ, Vemulapalli KC, Johnson CS, et al. Improving measurement of the adenoma detection rate and adenoma per colonoscopy quality metric: the Indiana University experience. Gastrointest Endosc. 2014;79:448–454. doi: 10.1016/j.gie.2013.10.013.
16. Hernandez LV, Deas TM, Catalano MF, et al. Longitudinal assessment of colonoscopy quality indicators: a report from the Gastroenterology Practice Management Group. Gastrointest Endosc. 2014;80:835–41. doi: 10.1016/j.gie.2014.02.1043.
17. Gawron AJ, Thompson WK, Keswani RN, et al. Anatomic and advanced adenoma detection rates as quality metrics determined via natural language processing. Am J Gastroenterol. 2014;109:1844–9. doi: 10.1038/ajg.2014.147.
