Journal of Veterinary Diagnostic Investigation: Official Publication of the American Association of Veterinary Laboratory Diagnosticians, Inc
. 2017 Nov 30;30(2):211–217. doi: 10.1177/1040638717744002

Identifying free-text features to improve automated classification of structured histopathology reports for feline small intestinal disease

Abdullah Awaysheh 1,2,3, Jeffrey Wilcke 1,2,3, François Elvinger 1,2,3, Loren Rees 1,2,3, Weiguo Fan 1,2,3, Kurt Zimmerman 1,2,3
PMCID: PMC6505871  PMID: 29188759

Abstract

The histologic evaluation of gastrointestinal (GI) biopsies is the standard for diagnosis of a variety of GI diseases (e.g., inflammatory bowel disease [IBD] and alimentary lymphoma [ALA]). The World Small Animal Veterinary Association (WSAVA) Gastrointestinal International Standardization Group proposed a reporting standard for GI biopsies consisting of a defined set of microscopic features. We compared the machine classification accuracy of free-text microscopic findings with that of the same findings represented in the WSAVA format for the diagnosis of IBD and ALA. Unstructured free-text duodenal biopsy pathology reports from cats (n = 60) with a diagnosis of IBD (n = 20), ALA (n = 20), or normal (n = 20) were identified. Biopsy samples from these cases were then scored following the WSAVA guidelines to create a set of structured reports. Three supervised machine-learning algorithms were trained using the structured and then the unstructured reports. Diagnosis classification accuracy for the 3 algorithms was compared using the structured and unstructured reports. Using naive Bayes and neural networks, unstructured information-based models achieved higher diagnostic accuracy (0.90 and 0.88, respectively) compared to the structured information-based models (0.74 and 0.72, respectively). Results suggest that discriminating diagnostic information was lost using current WSAVA microscopic guideline features. Addition of free-text features (e.g., number of plasma cells) increased WSAVA auto-classification performance. The methodologies reported in our study represent a way of identifying candidate microscopic features for use in structured histopathology reports.

Keywords: Histopathology report, machine learning, structured report, text mining

Introduction

A pathology report is an important communication link between a pathologist and a clinician. Different pathology reporting formats have been shown to have an effect on the message transferred. A 1992 study by the College of American Pathologists showed that standardized or checklist reporting practice was the only factor significantly associated with increased likelihood of providing complete or adequate pathology information.24 The use of different formats in pathology reporting was revisited in a 2000 study, in which the authors uncovered a communication gap between pathologists and surgeons resulting from unfamiliarity with different reporting formats, and emphasized the need for more complete, clear, and standardized reporting.20 Researchers have also highlighted the need for a controlled vocabulary to improve communication and the quality of reported data. A 2004 survey of reports from 96 veterinary clinical pathologists showed that 68 unique terms were used to express probability or likelihood of cytologic diagnoses.7 A 2014 study found expressions of uncertainty in 35% of 1,500 human surgical pathology reports.16 Traditional unstructured (free-text) reports can be more detailed, explicit, and representative of real-world findings, but they can be incomplete, unclear, and not easily converted into a computable format.5 Therefore, in healthcare, to increase message clarity, standardize healthcare practice, and increase data interoperability across different systems, clinicians and pathologists are increasingly using structured reporting formats.9,17,18,23

In 2005, the World Small Animal Veterinary Association (WSAVA) Gastrointestinal (GI) International Standardization Group took the responsibility of standardizing the histologic evaluation of the gastrointestinal tract of cats and dogs (https://goo.gl/zidmVH). Three years later, the group proposed standards for reporting microscopic findings from GI biopsy samples.8 The proposed standards were developed to help minimize variation among pathologists’ determination of microscopic severity of changes and to ensure consistent reporting of a standardized set of variables. Moreover, using structured recording of microscopic features will provide a basis for the development of a standardized and evidence-based diagnosis.

A 2010 study conducted by the College of American Pathologists showed that institutions that used checklists reported all required pathologic elements at a higher rate than those that did not use the checklists (88% vs. 34%).10 This suggests that the use of WSAVA checklist variables to record microscopic findings will ensure capturing most of the required information. However, to our knowledge, there has been no formal assessment of the coverage of WSAVA structured reports of the information needed to make a diagnosis.

We identified free-text histopathologic features not currently expressed in the WSAVA format that may provide evidence for distinguishing between inflammatory bowel disease (IBD) and alimentary lymphoma (ALA) in cats. We compared auto-classification accuracy of structured and unstructured histopathology reports into IBD and ALA classes using a variety of machine-learning algorithms. We hypothesized that WSAVA-based structured reports would have higher classification accuracy for these disorders compared to the unstructured format.

Materials and methods

We examined free-text histopathology reports from 3 groups of 20 (60 total) client-owned cats presented to the Virginia Tech Veterinary Teaching Hospital (VTH; Blacksburg, VA) in 2008–2015. All cats were patients presented to the VTH with chronic GI disease of at least 2-wk duration and had undergone an upper GI endoscopic biopsy procedure. All cases were classified histologically as normal, IBD, or small-cell ALA by any 1 of 8 board-certified pathologists who was on duty at the time of original case presentation to the VTH.

Only cats with duodenal lymphocytic or plasma cellular inflammation were placed in the IBD group. None of the IBD cats had lymphoma at any other biopsy site. All cats in the ALA group had small-cell lymphoma based upon the 2008 World Health Organization (WHO) histologic classification standards.11 Cats with ALA may have had a similar or a different diagnosis at another biopsy site.

The original hematoxylin and eosin duodenal biopsy slides for the unstructured cases were retrieved and randomized. The slides were reevaluated and reported in a structured format using the WSAVA guidelines by a single board-certified pathologist (K Zimmerman) who was blinded to the original diagnosis.8 Two (1 ALA, 1 normal) of the 60 cases had been originally diagnosed by this pathologist.

Data mining

Three supervised machine-learning models (naive Bayes, C4.5 decision tree, and artificial neural networks) were developed for each reporting format (structured and unstructured) using the Waikato Environment for Knowledge Analysis (WEKA) data mining software (v.3.7, https://goo.gl/xD7W7i).25 This open-source software provides tools for data pre-processing, classification, and visualization. NaiveBayes, J48, and MultilayerPerceptron are the 3 classification algorithms in WEKA that implement the naive Bayes, C4.5 decision tree, and artificial neural networks algorithms, respectively.

Structured (WSAVA) data transformation

Following the WSAVA standards, 9 variables were evaluated for each duodenal section including: “villus stunting,” “villus epithelial injury,” “crypt distension,” “lacteal dilation,” “mucosal fibrosis,” “intraepithelial lymphocytes,” “lamina propria lymphocytes and plasma cells,” “lamina propria eosinophils,” “lamina propria neutrophils.” Each variable was recorded as follows: normal, mild increase, moderate increase, or marked increase. Only cats with moderate or marked duodenal lymphocytic or plasma cellular inflammation were placed in the IBD group. These ordinal values were transformed numerically into 0, 1, 2, and 3, respectively.
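The ordinal encoding described above can be sketched in a few lines; the variable names and example grades shown are illustrative, not taken from the study data:

```python
# Hypothetical sketch of the ordinal encoding described above: the four
# WSAVA severity grades are mapped onto 0-3.
SEVERITY = {"normal": 0, "mild increase": 1, "moderate increase": 2, "marked increase": 3}

def encode_wsava(record):
    """Map each WSAVA microscopic feature's ordinal grade to its numeric code."""
    return {feature: SEVERITY[grade] for feature, grade in record.items()}

# Illustrative case record (not real study data)
case = {"villus stunting": "mild increase",
        "lacteal dilation": "normal",
        "lamina propria lymphocytes and plasma cells": "marked increase"}
encoded = encode_wsava(case)
# {'villus stunting': 1, 'lacteal dilation': 0,
#  'lamina propria lymphocytes and plasma cells': 3}
```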

Unstructured (free-text) data transformation

The free-text description of each unstructured report was transformed into a word-occurrences vector using the “bag of words” method.7,22 In this approach, every document was represented by a set of words (called features) that were extracted from its text. These features were tokenized from the text using the WEKA “AlphabeticTokenizer” algorithm, in which non-alphabetical elements were excluded. A lowercase transformation factor was applied to convert all text letters into lowercase. Irrelevant, non-informative, stop words (such as “the” and “of”) were excluded from the list of tokens by setting the WEKA “useStoplist” variable to “True.” The words “neoplasm” and “neoplasia” were excluded from the text tokens to prevent any bias introduced by using the classification class (diagnosis) as a predictor of itself.
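A minimal stand-in for the pipeline described above (alphabetic tokenization, lowercasing, stop-word removal, and exclusion of the diagnosis terms) might look like the following; the abbreviated stop-word list is an assumption for illustration, not WEKA's actual list:

```python
import re
from collections import Counter

STOPWORDS = {"the", "of", "and", "a", "in", "is", "with"}  # abbreviated stand-in list
EXCLUDED = {"neoplasm", "neoplasia"}  # diagnosis terms removed to avoid label leakage

def bag_of_words(text):
    """Alphabetic tokenization, lowercasing, stop-word and label-term removal."""
    tokens = re.findall(r"[a-z]+", text.lower())  # non-alphabetic characters are dropped
    return Counter(t for t in tokens if t not in STOPWORDS | EXCLUDED)

# Illustrative sentence, not a report from the study corpus
report = "The lamina propria is expanded by plasma cells; neoplasia is not observed."
bow = bag_of_words(report)
```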

The data type of the word-occurrences vector was defined as a quantitative data type by adjusting the WEKA “outputWordCounts” setting to “True.” Then, 2 transformation factors that have been shown to improve document categorization were applied to create a quantifiable, weighted representation of words.13

The term frequency transformation factor was applied using the WEKA "TFTransform" setting, in which word frequencies were transformed into log(1 + f_ij), where f_ij is the frequency of word i in document j. This weighting factor assumes that the more often a word occurs in a document, the more representative it is of the document's content, so it should receive a higher weight.22

Inverse document frequency is the second weighting factor that was used, as defined by the WEKA "IDFTransform" setting. This setting transforms word frequencies in a document into:

f_ij × log(number of all documents / number of documents with word i),

where f_ij is the frequency of word i in document j. This weighting factor treats words that appear in many documents as having little discriminating power in classification, and therefore weights them less.
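The two transforms can be sketched in sequence, under the assumption that the factors combine multiplicatively (the log(1 + f_ij) term-frequency value multiplied by the inverse-document-frequency factor):

```python
import math

def tfidf(counts_per_doc):
    """Apply the two transforms above in sequence: log(1 + f_ij), then
    multiply by log(N / df_i), where df_i is the number of documents
    containing word i. counts_per_doc: list of {word: raw count} dicts."""
    n_docs = len(counts_per_doc)
    df = {}
    for doc in counts_per_doc:
        for w in doc:
            df[w] = df.get(w, 0) + 1
    return [{w: math.log(1 + f) * math.log(n_docs / df[w]) for w, f in doc.items()}
            for doc in counts_per_doc]

# Toy two-document corpus (illustrative words only)
docs = [{"plasma": 3, "lamina": 1}, {"lamina": 2, "round": 4}]
weighted = tfidf(docs)
# "lamina" appears in both documents, so its IDF factor log(2/2) = 0:
# no discriminating power, zero weight.
```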

Using the WEKA “minTermFreq” option, we chose to include only words that occurred at least 3 times; this threshold was enforced on a per-class basis. Word frequencies were normalized by document length using the WEKA “Normalize all data” setting. All other parameters were left at their default values unless otherwise noted. Table 1 summarizes the WEKA parameter settings used to convert each document string into a word vector.

Table 1.

Parameter settings of the “StringToWordVector” filter in the Waikato Environment for Knowledge Analysis (WEKA) used to convert terms (word strings) in the free-text (unstructured) microscopic description portion of the pathology reports into a word vector.

Parameter for “StringToWordVector” filter Setting
IDFTransform True
TFTransform True
attributeIndices 1*
attributeNamePrefix Plain (no entry)
doNotOperatePerClassBasis False
invertSelection False
lowerCaseTokens True
minTermFreq 3
normalizeDocLength Normalize all data
outputWordCounts True
periodicPruning −1.0
stemmer NullStemmer
stopwords Weka-3-7
tokenizer AlphabeticTokenizer
useStoplist True
wordsToKeep 1000
* Setting is the attribute number that carried the string to be converted, which in our case was “1.”

Structured and unstructured feature selection

Only subsets of the structured and unstructured extracted features were selected for use in classification, in an attempt to improve model performance, minimize processing time, and avoid “overfitting” (a model so specific to the examined dataset that it describes that one dataset in detail rather than capturing the generalizable features of the training data as a whole), as has been shown in previous studies.4,22

The “Best First” searching method was used to create different subsets of features (Korf RE. Linear-space best-first search: summary of results. Proc Assoc Advance Artif Intell; 1992. Available from: https://goo.gl/DZyYnA), and “wrapper” was used as an evaluator of each subset.14 As a result, 3 subsets of features (one by each algorithm) were selected from each data format (structured and unstructured).
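Best-first search with a wrapper evaluator is more involved than can be shown briefly, but a simplified greedy forward variant conveys the idea: each candidate subset is scored by the wrapped classifier's cross-validated accuracy (replaced here by a toy evaluator), and a feature is kept only if it improves that score:

```python
def wrapper_select(features, evaluate):
    """Greedy forward search (a simplified stand-in for best-first): grow the
    subset one feature at a time, keeping an addition only if the wrapper
    score improves."""
    selected, best = [], evaluate([])
    improved = True
    while improved:
        improved = False
        for f in features:
            if f in selected:
                continue
            score = evaluate(selected + [f])
            if score > best:
                best, selected, improved = score, selected + [f], True
                break
    return selected, best

# Toy evaluator: accuracy rises with each informative feature included.
# Feature names are illustrative, not the study's selected features.
informative = {"plasma", "mitotic", "round"}
score = lambda subset: 0.5 + 0.1 * len(informative & set(subset))
subset, acc = wrapper_select(["plasma", "cells", "mitotic", "round"], score)
# subset contains exactly the informative features; acc = 0.8
```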

New candidate features for structured reports

New candidate features that might improve the classification performance of the structured models were identified by manually reviewing the list of 74 features identified from the unstructured reports that are not currently part of the structured reports. These new features were quantified in all of the tissue specimens by averaging results from 5 different microscopic fields of the lamina propria at 1,000× (1.25 oil lens).

Classification models

Naive Bayes, C4.5 decision tree, and artificial neural networks algorithms were utilized to create the classification models. The 3 models were trained on the selected structured features, and independently on the unstructured features. Then, the same structured data models were retrained after adding new candidate features extracted from the unstructured reports.

To train and test the generated models, a 10-fold cross-validation technique was performed as described previously (Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. Proc Intern Joint Conf Artif Intell, vol. 2, 1995:1137–1143. Available from: https://goo.gl/UtnmGk). In this technique, 10 different datasets were created; in each, a different 10% partition of the data was held out for testing and the remaining 90% was used for training, so that the testing set was never part of the training set in any of the 10 folds.
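The fold construction can be sketched as follows (stdlib only; the seed is illustrative, and repeating with 10 different seeds would yield the 10 random repeats used in the study):

```python
import random

def ten_fold_indices(n, seed=0):
    """Partition n case indices into 10 folds; each fold serves once as the
    held-out test set while the other 9 folds form the training set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::10] for i in range(10)]
    return [(sorted(set(idx) - set(test)), sorted(test)) for test in folds]

splits = ten_fold_indices(60)  # the 60 biopsy reports in this study
# Each of the 10 (train, test) pairs holds out 6 cases; no case appears in
# both sides of the same split.
```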

To exclude any distinction that could result from an artifactual division of the training and testing sets, 10 random repeats of the 10-fold cross-validation were performed. The parameter settings used in WEKA for classification are shown in Table 1.

As in the literature, we use the term “sensitivity” to mean “classification accuracy.” Average sensitivity and confidence intervals for every classifier were considered to assess and compare models’ performances. Differences in model performance were evaluated by one-way analysis of variance and Tukey comparison (significance p < 0.05) using commercial statistical software (JMP Pro 11, SAS Institute, Cary, NC).

Results

Features selected from the structured reports

From the original 9 features of the structured (WSAVA) reports, the selected features were as follows: 1) using naive Bayes: “villus epithelial injury,” “crypt distension,” “lacteal dilation,” “mucosal fibrosis,” “intraepithelial lymphocytes,” “lamina propria lymphocytes and plasma cells,” and “lamina propria neutrophils”; and 2) using the decision tree and the artificial neural networks: “lamina propria lymphocytes and plasma cells” was the only selected feature.

Features selected from the unstructured reports

From the unstructured reports, initial word vector tokenization resulted in the extraction of 74 unique words (Table 2). From these words, 18 were selected in the feature selection approach using the 3 algorithms (Table 3). Frequency of occurrence for the 18 words in association with the class diagnosis (normal, IBD, and ALA) is shown in Table 4.

Table 2.

Word features (terms) extracted from the free-text (unstructured) microscopic description portion of the pathology reports using “bag of words” methodology before applying specific feature selection methodologies associated with each algorithm.

abundant clusters expands inflammatory mucosa observed score
amounts cytoplasm fields lamina mucosal occasional sections
amphophilic dense figures large multifocal plasma sheets
approximately diameter glands lymphocytes muscularis population sized
arranged diffusely glandular markedly neutrophils present small
array distinct grade medium normal propria surface
basophilic eosinophils increased mitotic noted rare ten
borders epithelial indistinct moderate nuclei roth villi
cells epithelium infiltrate moderately nucleoli round
central expand infiltrating monomorphic numbers scant
chromatin expanded infiltration monotonous numerous scattered

Table 3.

Word features (terms) selected by each algorithm using its feature-selection process.

All words selected Selected by naive Bayes Selected by C4.5 Selected by neural networks
cells
lamina
plasma
numbers
small
moderate
round
figures
population
normal
present
mitotic
large
inflammatory
expands
medium
numerous
surface

Table 4.

Frequency of occurrence of word features (terms) in the unstructured reports for the 3 cohorts: normal, inflammatory bowel disease (IBD), and alimentary lymphoma (ALA).

Word selected No. of reports with the word
Normal cases (n = 20) IBD cases (n = 20) ALA cases (n = 20) Total (n = 60)
cells 6 16 15 37
lamina 10 13 13 36
plasma 5 18 7 30
numbers 4 8 5 17
small 2 3 11 16
moderate 0 8 6 14
round 0 0 11 11
mitotic 0 0 10 10
figures 0 0 9 9
population 0 0 7 7
normal 7 0 0 7
present 2 2 3 7
large 0 1 4 5
inflammatory 0 1 4 5
expands 0 4 0 4
medium 0 3 0 3
numerous 0 0 3 3
surface 0 0 3 3

New candidate features identified

From the unstructured reports, the word “plasma” appeared in 25% (5 of 20) of normal cases, in 90% (18 of 20) of IBD cases, and in 35% (7 of 20) of ALA cases. In the 7 ALA cases with plasma cells mentioned, 3 described few and 4 described rare numbers.

The number of plasma cells was a candidate feature that was not independently recorded by the structured (WSAVA) format. Quantifying the number of plasma cells per 5 microscopic fields revealed that the average number of cells was 4 (95% confidence interval [CI]: 2–6) in normal cases, 19 (95% CI: 15–24) in cases with IBD, and 8 (95% CI: 6–10) in ALA cases.

The words “round,” “population,” “surface,” “numerous,” “figures,” and “mitotic” were associated with ALA 100% of the time and never appeared in any normal or IBD reports (Table 4). “Round” described the shape of the cells, and “population” described distributional characteristics at large, such as a monotonous or homogeneous population. “Surface” referenced the epithelium surface in 3 reports, with 2 mentioning infiltration on the surface. “Numerous” was used in association with lymphocytes. In all cases, “figures” appeared in context with “mitotic,” describing cellular mitotic activity. Of the reports that recorded “mitotic” (ALA only), 7 of 10 described no mitotic activity observed, 2 recorded low mitotic activity, and 1 recorded rare mitotic activity.

The feature “small” was associated with ALA (55%, 11 of 20) more than it was associated with normal (10%, 2 of 20) or IBD (15%, 3 of 20) cases. The 11 ALA and 2 of the 3 IBD cases described the size of the lymphocytes, whereas the remainder described small quantities of cells or tissues. For “large,” in 4 ALA cases: 1 described large nodules formed between glands, 1 described slightly large nuclei, and 2 cases referred to large numbers of lymphocytes; the only IBD case with “large” described large lymphocytes and large nucleus.

Classification model performance

Two-tailed t-tests showed that sensitivities achieved by naive Bayes and neural network classifiers using the unstructured reports (0.898 and 0.883, respectively) were higher than using the structured reports (0.735 and 0.717, respectively, p < 0.0001; Table 5). Two-tailed t-tests showed that sensitivity rates increased using the structured reports after adding the “lamina propria plasma cells” feature (0.792, 0.770, and 0.782 compared to 0.735, 0.717, and 0.717, respectively, for naive Bayes, decision tree, and neural networks, p < 0.05; Table 6).

Table 5.

Sensitivity (classification accuracy) when applying the 3 classifiers on the structured and unstructured datasets.

Naive Bayes
C4.5 decision tree
Artificial neural networks
Sensitivity 95% CI Sensitivity 95% CI Sensitivity 95% CI
Structured reports (WSAVA)
 Normal 0.845 (0.797–0.893) 0.750 (0.696–0.804) 0.750 (0.696–0.804)
 IBD 0.660 (0.590–0.730) 0.650 (0.582–0.718) 0.650 (0.582–0.718)
 ALA 0.700 (0.635–0.765) 0.750 (0.689–0.812) 0.750 (0.689–0.812)
 Average 0.735a (0.696–0.774) 0.717 (0.678–0.755) 0.717b (0.678–0.755)
Unstructured reports (free-text)
 Normal 0.850 (0.802–0.898) 0.845 (0.795–0.895) 0.840 (0.789–0.891)
 IBD 0.950 (0.920–0.980) 0.695 (0.628–0.762) 0.990 (0.970–1.00)
 ALA 0.895 (0.850–0.940) 0.725 (0.661–0.789) 0.820 (0.764–0.876)
 Average 0.898a (0.873–0.924) 0.755 (0.722–0.788) 0.883b (0.858–0.908)

ALA = alimentary lymphoma; CI = confidence interval; IBD = inflammatory bowel disease; WSAVA = World Small Animal Veterinary Association. Different superscripts (a,b) represent a statistically different pair (2-tailed t-test, p < 0.0001).

Table 6.

Sensitivity (classification accuracy) when applying the 3 classifiers on the structured datasets after adding the “plasma cells” feature (term) into the learning set.

Naive Bayes
C4.5 decision tree
Artificial neural networks
Sensitivity 95% CI Sensitivity 95% CI Sensitivity 95% CI
Structured reports (WSAVA) + “plasma cells” feature
 Normal 0.850 (0.794–0.906) 0.890 (0.844–0.936) 0.855 (0.804–0.906)
 IBD 0.845 (0.793–0.897) 0.680 (0.613–0.750) 0.780 (0.712–0.845)
 ALA 0.680 (0.618–0.742) 0.740 (0.680–0.800) 0.710 (0.655–0.765)
 Average 0.792a (0.759–0.824) 0.770b (0.738–0.802) 0.782c (0.751–0.812)
Recalling values from Table 5 structured reports (without “plasma cells” feature)
 Average 0.735a (0.696–0.774) 0.717b (0.678–0.755) 0.717c (0.678–0.755)

ALA = alimentary lymphoma; CI = confidence interval; IBD = inflammatory bowel disease; WSAVA = World Small Animal Veterinary Association. Different superscripts (a,b,c) represent a statistically different pair (2-tailed t-test, p < 0.05).

Discussion

The models developed in our study to classify the normal, IBD, and ALA classes using the unstructured histopathology reports achieved very good performance (sensitivity of 76–90%). The classification models developed using data from WSAVA structured reports demonstrated lower performance (sensitivity of 72–74%).

Frequency of occurrence analysis on the features extracted from the unstructured reports showed that the term “plasma” was more commonly associated with IBD (90% of cases, 18 of 20) than normal (25%, 5 of 20) or ALA cases (35%, 7 of 20). The 7 ALA finding descriptions with “plasma” described either few or rare plasma cells. Moreover, our study showed that recording the number of plasma cells in conjunction with some of the 9 WSAVA variables improved the classification accuracy of the 3 models (Table 6). Such findings are consistent with the literature, which shows that ALA is characterized by the infiltration of lymphocytes, unlike IBD, which is represented by the infiltration of lymphocytes and plasma cells.3,6,12,24 To date, the WSAVA standards do not distinguish between the number of lymphocytes and the number of plasma cells when reporting; 1 variable called “lamina propria lymphocytes and plasma cells” is being used to represent the infiltration of any of the 2 cell types.

Mitotic activity was another feature selected as a predictor variable to distinguish among the 3 groups. However, all reports described low or rare mitoses and, while reviewing the tissue samples, no mitotic activity was identified; this finding is not surprising given that all of the ALA cases were of small-cell type and likely an indolent form of lymphoma. Although mitotic activity did not prove to be a good candidate in our study, we believe that recording this variable may have importance in other cases, such as large-cell lymphoma, in which the mitotic activity may be more notable.

The naive Bayes, decision trees, and artificial neural networks are common examples of machine-learning algorithms that can exploit underlying complex patterns in large datasets to classify cases into related groups. Used for classification and prediction, these 3 algorithms have been shown to support the decision-making process in many areas of medicine.1,2 Our study showed that supervised machine-learning models developed using naive Bayes and neural networks achieve performance similar to or higher than that achieved using the C4.5 decision tree. The naive Bayes algorithm assumes conditional independence of features, which is not always true (Zhang H. The optimality of naive Bayes. Assoc Adv Artif Intell 2004;1:3. Available from: https://goo.gl/DUMC7t). Therefore, the naive Bayes algorithm does not perform as well as other classifiers when the training features are dependent on each other. That the naive Bayes outperformed the decision tree classifier when learning from free-text suggests that the independence assumption is reasonable for free-text pathologic features. A previous study of diagnostic models showed similar results, in which the conditional independence assumption of the naive Bayes did not have any negative effect on performance.27

Despite the relative similarities in the performance of the 3 algorithms, some other factors should be considered when comparing them. The naive Bayes assumes independence of features. This assumption makes the algorithm simple and computationally inexpensive (which was clearly the motivation for the development of the naive version of the Bayes algorithm).28 The decision-tree model output, although easy to understand and implement, can be inefficient in solving complex problems. On the other hand, the multilayer artificial neural network readily captures the complexity of the relationships between features during its learning phase, but at a higher computational cost and with final results whose underlying logic is harder to understand.21,22

In feature selection, our results showed that only 18 words from the free-text (76% of the 74 extracted features were removed) were needed to classify the reports into the 3 classes. This finding is consistent with a previous feature-selection study showing that the performance of classification models either improves or remains unchanged after removal of 75% of the features.19 Our study underscores the importance of selecting the subset of features needed for classification, which improves model performance while minimizing training time, data dimensionality, and application complexity.

The reports selected in our study represent a corpus of simple cases for which no diseases other than IBD and ALA were considered. This selection represents a good example of testing the ability to classify histopathology reports into 2 of the most common GI diseases in companion animals. However, a larger corpus needs to be examined in the future to validate the results of our study and extract other features that can be noteworthy in other diseases. In our study, the free-text reports were created by 1 of 8 pathologists, and goodness-of-fit testing indicated they were not randomly associated with the 3 classes of reports. The bias that can be introduced by the reporting style of one or more pathologists might be one limitation of our study that can be overcome by utilizing a larger number of cases as well. The pathologist performing the WSAVA scoring (K Zimmerman) made the original diagnosis on 2 of the cases more than one year prior; given the timespan, limited number of cases, and blinding, this overlap was unlikely to have introduced any significant bias in the scoring process.

Although our study focused on histopathology reports for GI diseases, we believe that this work can be extended to assist in the creation of standardized histopathology reports involving other body structures and disorders. The methodologies reported in our study represent a way of extracting and selecting candidate features to be used when structuring or classifying reports. It can also serve as a quality assurance process to highlight any information loss when changing reporting formats.

Finally, a previous study has shown that structured reporting formats provide information that is considered to be simple and computable.15 Another study found that structured pathology reports provide complete or adequate pathology information.26 A 2000 study showed a communication gap between clinicians and pathologists, and then concluded that the old, less-structured reporting format is associated with a lower misinterpretation rate by clinicians, and further improvement of the structured reports is essential.20 Consistent with that study, our results showed that machine-learning models achieved a higher accuracy in classifying unstructured reports into the actual diagnosis when compared with structured reports. Our results suggest that discriminating information is lost when using just those features currently listed in the WSAVA GI biopsy reporting format. Therefore, further recording of some of the features identified from the unstructured reports, such as the number of plasma cells, in conjunction with the WSAVA features will improve auto-classification for similar reports.

Footnotes

Declaration of conflicting interests: The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The authors received no financial support for the research, authorship, and/or publication of this article.

References

1. Abbass HA. An evolutionary artificial neural networks approach for breast cancer diagnosis. Artif Intell Med 2002;25:265–281.
2. Al-Omari FA, et al. An intelligent decision support system for quantitative assessment of gastric atrophy. J Clin Pathol 2011;64:330–337.
3. Barrs VR, Beatty JA. Feline alimentary lymphoma: 2. Further diagnostics, therapy and prognosis. J Feline Med Surg 2012;14:191–201.
4. Blum AL, Langley P. Selection of relevant features and examples in machine learning. Artif Intell 1997;97:245–271.
5. Branavan SRK, et al. Learning document-level semantic properties from free-text annotations. J Artif Intell Res 2009;34:569–603.
6. Carreras JK, et al. Feline epitheliotropic intestinal malignant lymphoma: 10 cases (1997–2000). J Vet Intern Med 2003;17:326–331.
7. Christopher MM, Hotz CS. Cytologic diagnosis: expression of probability by clinical pathologists. Vet Clin Pathol 2004;33:84–95.
8. Day MJ, et al. Histopathological standards for the diagnosis of gastrointestinal inflammation in endoscopic biopsy samples from the dog and cat: a report from the World Small Animal Veterinary Association Gastrointestinal Standardization Group. J Comp Pathol 2008;138(Suppl 1):S1–43.
9. Gephardt GN, Baker PB. Lung carcinoma surgical pathology report adequacy: a College of American Pathologists Q-Probes study of over 8300 cases from 464 institutions. Arch Pathol Lab Med 1996;120:922–927.
10. Idowu MO, et al. Adequacy of surgical pathology reporting of cancer: a College of American Pathologists Q-Probes study of 86 institutions. Arch Pathol Lab Med 2010;134:969–974.
11. Jaffe ES. The 2008 WHO classification of lymphomas: implications for clinical practice and translational research. Hematology Am Soc Hematol Educ Program 2009:523–531.
12. Jergens AE, et al. Idiopathic inflammatory bowel disease in dogs and cats: 84 cases (1987–1990). J Am Vet Med Assoc 1992;201:1603–1608.
13. Jouhet V, et al. Automated classification of free-text pathology reports for registration of incident cases of cancer. Methods Inf Med 2012;51:242–251.
14. Kushmerick N. Wrapper induction: efficiency and expressiveness. Artif Intell 2000;118:15–68.
15. Leslie KO, Rosai J. Standardization of the surgical pathology report: formats, templates, and synoptic reports. Semin Diagn Pathol 1994;11:253–257.
16. Lindley SW, et al. Communicating diagnostic uncertainty in surgical pathology reports: disparities between sender and receiver. Pathol Res Pract 2014;210:628–633.
17. Miller PR. Inpatient diagnostic assessments: 2. Interrater reliability and outcomes of structured vs. unstructured interviews. Psychiatry Res 2001;105:265–271.
18. Miller PR, et al. Inpatient diagnostic assessments: 1. Accuracy of structured vs. unstructured interviews. Psychiatry Res 2001;105:255–264.
19. Noureldien NA, et al. The effect of feature selection on detection accuracy of machine learning algorithms. Int J Eng Res Tech 2013;2:1407–1410.
20. Powsner SM, et al. Clinicians are from Mars and pathologists are from Venus. Arch Pathol Lab Med 2000;124:1040–1046.
21. Rezaei-Darzi E, et al. Comparison of two data mining techniques in labeling diagnosis to Iranian pharmacy claim dataset: artificial neural network (ANN) versus decision tree model. Arch Iran Med 2014;17:837–843.
22. Sebastiani F. Machine learning in automated text categorization. ACM Comput Surv 2002;34:1–47.
23. Thompson JF, Scolyer RA. Cooperation between surgical oncologists and pathologists: a key element of multidisciplinary care for patients with cancer. Pathology 2004;36:496–503.
24. Willard MD. Feline inflammatory bowel disease: a review. J Feline Med Surg 1999;1:155–164.
25. Witten IH, et al. Data Mining: Practical Machine Learning Tools and Techniques. Amsterdam: Elsevier, 2016:363–423.
26. Zarbo RJ, Fenoglio-Preiser CM. Interinstitutional database for comparison of performance in lung fine-needle aspiration cytology. A College of American Pathologists Q-Probe Study of 5264 cases with histologic correlation. Arch Pathol Lab Med 1992;116:463–470.
27. Zelic I, et al. Induction of decision trees and Bayesian classification applied to diagnosis of sport injuries. J Med Syst 1997;21:429–444.
28. Zhang X, et al. Ontology driven decision support for the diagnosis of mild cognitive impairment. Comput Methods Programs Biomed 2014;113:781–791.
