TABLE 1.
Corpus | Type of text | Document count N | Word count |
---|---|---|---|
Training | Paragraph only | 13 | 26 ± 14 |
Training | Caption only | 85 | 13 ± 16 |
Test | Paragraph | 81a | 44 ± 42 |
Test | Caption | 81a | 17 ± 20 |
Test and Training | Paragraph | 94 | 42 ± 40 |
Test and Training | Caption | 166 | 15 ± 18 |
a) In the test corpus, one radiograph showed two hands and had to be split into two images (one for each hand) to make IRMA indexing possible. For text processing, however, the same caption and paragraph text had to be used for both images.
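
The combined "Test and Training" rows can be cross-checked against the separate training and test rows. The following is a minimal sketch, assuming that the first number in the "Word count" column is a mean and the second a standard deviation (the table does not say so explicitly). The names `table1` and `pooled` are hypothetical and used for illustration only. Because the reported means are rounded, the recomputed values can only be expected to agree after rounding.

```python
# (corpus, text type) -> (document count N, first "Word count" value) from Table 1.
# Assumption: the first "Word count" value is the mean number of words per document.
table1 = {
    ("training", "paragraph"): (13, 26),
    ("training", "caption"): (85, 13),
    ("test", "paragraph"): (81, 44),
    ("test", "caption"): (81, 17),
}


def pooled(text_type):
    """Return the total document count and the count-weighted mean word count
    over the training and test rows for one type of text."""
    rows = [value for (_, kind), value in table1.items() if kind == text_type]
    total_n = sum(n for n, _ in rows)
    weighted_mean = sum(n * mean for n, mean in rows) / total_n
    return total_n, weighted_mean


paragraph_n, paragraph_mean = pooled("paragraph")
caption_n, caption_mean = pooled("caption")

print(paragraph_n, round(paragraph_mean))
print(caption_n, round(caption_mean))
```

Under this assumption, the recomputed counts and rounded means can be compared directly with the "Test and Training" rows of the table. Note that the test counts include the split radiograph as two documents that share the same caption and paragraph text.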