Table 4. Corpus statistics
| Name | Description | Filter | # reports | % of all reports |
|---|---|---|---|---|
| | all TTE reports | | 70441 | 100.0 |
| | only relevant sites | fsite | 68915 | 97.8 |
| T_d | dominant layouts | fsite, fchar≥800, | 63489 | 90.1 |
| T_u | mostly unstructured | fsite, fchar≥100, fchar<800, | 2712 | 3.9 |
| T_c | uncommon layout | fsite, fli | 1041 | 1.5 |
| | mostly defective | fsite, fchar<100 | 1673 | 2.4 |
fsite: excludes reports from three sites of the hospital. fchar≥n: requires at least n non-whitespace characters. fli: requires at least 5 list elements.
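
The filter definitions in the table note can be read as simple predicates over a report. The following is a minimal sketch under stated assumptions, not the authors' implementation: the site identifiers in `EXCLUDED_SITES` are hypothetical placeholders (the three excluded sites are not named here), and the rule for what counts as a "list element" (a line starting with a bullet or an enumerator) is an assumption made for illustration.

```python
import re

# Hypothetical placeholders: the three excluded hospital sites are not named in the source.
EXCLUDED_SITES = {"site_a", "site_b", "site_c"}

# Assumption: a "list element" is a line that starts with a bullet or an enumerator.
LIST_ITEM = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+\S")


def f_site(site: str) -> bool:
    """fsite: keep a report only if it does not come from an excluded site."""
    return site not in EXCLUDED_SITES


def count_chars(text: str) -> int:
    """Count the non-whitespace characters of a report."""
    return sum(1 for ch in text if not ch.isspace())


def f_char_at_least(text: str, n: int) -> bool:
    """fchar>=n: the report has at least n non-whitespace characters."""
    return count_chars(text) >= n


def f_li(text: str, min_items: int = 5) -> bool:
    """fli: the report has at least min_items list elements (5 in the table note)."""
    items = sum(1 for line in text.splitlines() if LIST_ITEM.match(line))
    return items >= min_items
```

A condition such as fchar<800 is then just the negation `not f_char_at_least(text, 800)`, so each row of the table corresponds to a conjunction of these predicates.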