Author manuscript; available in PMC: 2008 Aug 14.
Published in final edited form as: Pac Symp Biocomput. 2007:233–244.

Table 2.

The fault model and results of its application to Johnson et al.’s erroneous outputs. The rows in bold are the subtotaled percentages of the broad categories of errors in relation to all errors. The non-bolded rows indicate the percentages of the subtypes of errors in relation to the broad category that they belong to. The counts for the subtypes of text processing errors exceed the total text processing count because multiple types of text processing errors can contribute to one erroneously matched relationship.

| Type of error | Percent (count) | Inter-judge agreement, pre-resolution | Inter-judge agreement, post-resolution |
|---|---|---|---|
| Lexical ambiguity errors | | | |
| biological polysemy | 56% (105/186) | 86% | 98% |
| ambiguous abbreviation | 44% (81/186) | 96% | 99% |
| Lexical Ambiguity Total | 38% (186/481) | | |
| Text processing errors | | | |
| stemming | 6% (29/449) | 100% | 100% |
| digit removal | 51% (231/449) | 100% | 100% |
| punctuation removal | 27% (123/449) | 100% | 100% |
| stop word removal | 14% (65/449) | 99% | 100% |
| Text Processing Total | 60% (290/481) | | |
| Matched Metadata Total | 1% (5/481) | 100% | 100% |
| Total | 99% (481/481) | 95% | 99% |
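
The two denominators described in the caption can be illustrated with a minimal sketch. The counts are copied from Table 2; the dictionary layout and the helper name `percent` are assumptions made for illustration and are not part of the original analysis. Recomputed values may differ from the published percentages by a point, depending on the rounding convention used.

```python
# Counts copied from Table 2. The data layout and helper name are illustrative only.
TOTAL_ERRORS = 481

# Broad categories: each is reported relative to all errors.
CATEGORY_COUNTS = {
    "lexical ambiguity": 186,
    "text processing": 290,
    "matched metadata": 5,
}

# Subtypes: each is paired with the denominator printed in the table.
SUBTYPE_COUNTS = {
    "biological polysemy": (105, 186),
    "ambiguous abbreviation": (81, 186),
    "stemming": (29, 449),
    "digit removal": (231, 449),
    "punctuation removal": (123, 449),
    "stop word removal": (65, 449),
}


def percent(count, denominator):
    """Return count / denominator expressed as a percentage."""
    return 100.0 * count / denominator


# Category share of all errors.
category_share = {
    name: percent(count, TOTAL_ERRORS) for name, count in CATEGORY_COUNTS.items()
}

# Subtype share within its broad category.
subtype_share = {
    name: percent(count, denom) for name, (count, denom) in SUBTYPE_COUNTS.items()
}
```

The text processing subtypes use a denominator of 449 rather than the category count of 290 because, as the caption notes, a single erroneously matched relationship can involve more than one type of text processing error.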