Author manuscript; available in PMC: 2009 Jul 24.
Published in final edited form as: J Am Soc Inf Sci Technol. 2008 Sep 18;60(1):123–134. doi: 10.1002/asi.20955

TABLE 1.

Length of caption and paragraph text in the corpora.

Corpus             Type of text    Document count (N)  Word count
Training           Paragraph only  13                  26 ± 14
Training           Caption only    85                  13 ± 16
Test               Paragraph       81^a                44 ± 42
Test               Caption         81^a                17 ± 20
Test and Training  Paragraph       94                  42 ± 40
Test and Training  Caption         166                 15 ± 18
^a In the test corpus, one radiograph showed two hands and had to be split into two images (one for each hand) to make IRMA indexing possible. For text processing, however, the same caption and paragraph text had to be used for both images.
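
The combined "Test and Training" rows can be checked against the separate rows with a little arithmetic. The sketch below assumes that the "Word count" column reports a mean ± standard deviation per document; the table itself does not state this. The helper name `pooled` is hypothetical and used only for illustration. Under that assumption, the combined document counts are plain sums and the combined means are count-weighted averages.

```python
# (document count, mean word count) per corpus, copied from Table 1.
# Assumption: the first number in the "Word count" column is a mean.
training = {"paragraph": (13, 26), "caption": (85, 13)}
test = {"paragraph": (81, 44), "caption": (81, 17)}


def pooled(kind):
    """Return (total document count, count-weighted mean word count)."""
    n_train, mean_train = training[kind]
    n_test, mean_test = test[kind]
    n_total = n_train + n_test
    mean_total = (n_train * mean_train + n_test * mean_test) / n_total
    return n_total, mean_total


for kind in ("paragraph", "caption"):
    n_total, mean_total = pooled(kind)
    print(kind, n_total, round(mean_total))
# prints:
# paragraph 94 42
# caption 166 15
```

Both results agree with the "Test and Training" rows (94 documents at about 42 words, 166 documents at about 15 words), which is consistent with the mean-based reading of the column. The ± values are not recomputed here.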