Table 2.
| | Grammar | | GG1 | GG2 | GG3 | GG4 | GG5 | GG6 | Best |
|---|---|---|---|---|---|---|---|---|---|
| | Grammar found by | | Local | IO | IO | CYK | CYK | CYK | |
| CYK | Sensitivity | 0.496 | 0.505 | 0.330 | 0.374 | 0.474 | 0.469 | 0.526 | 0.675 |
| | PPV | 0.479 | 0.481 | 0.258 | 0.322 | 0.454 | 0.467 | 0.479 | 0.585 |
| | F–score | 0.478 | 0.441 | 0.426 | 0.435 | 0.461 | 0.339 | 0.461 | 0.622 |
| IO | Sensitivity | 0.387 | 0.392 | 0.408 | 0.413 | 0.373 | 0.404 | 0.410 | 0.450 |
| | PPV | 0.552 | 0.517 | 0.551 | 0.550 | 0.566 | 0.556 | 0.583 | 0.584 |
| | F–score | 0.461 | 0.443 | 0.473 | 0.470 | 0.449 | 0.471 | 0.488 | 0.493 |
The sensitivities, PPVs, and F–scores of grammars GG1–GG6 and on the evaluation set, using different methods of training and testing. ‘CYK’ indicates that the CYK algorithm was used, and ‘IO’ that the inside and outside algorithms were used. The column ‘Best’ was calculated by selecting, for each structure, the prediction with the highest F–score, and then recording the sensitivity, PPV, and F–score for that prediction. It is perhaps not surprising that the ‘best’ predictions for CYK are better than those for IO, as IO in some sense averages over all predictions. For the same reason, one might expect the IO predictions of the different grammars to be more similar to one another than the CYK predictions are, so that selecting the prediction with the best F–score gives a smaller increase; this can be seen by comparing the IO values for GG6 with those in the ‘Best’ column.
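The ‘Best’ column can be illustrated with a small sketch. It assumes the usual base-pair definitions (sensitivity = correctly predicted pairs / reference pairs, PPV = correctly predicted pairs / predicted pairs, F–score = their harmonic mean) and that scores are averaged over structures; neither detail is stated in the caption. The function names and the toy structures below are hypothetical and serve only to show the per-structure selection.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]  # a base pair (i, j) with i < j


def pair_scores(predicted: Set[Pair], reference: Set[Pair]) -> Tuple[float, float, float]:
    """Sensitivity, PPV and F-score of one prediction (assumed standard definitions)."""
    tp = len(predicted & reference)  # correctly predicted base pairs
    sens = tp / len(reference) if reference else 0.0
    ppv = tp / len(predicted) if predicted else 0.0
    f = 2 * sens * ppv / (sens + ppv) if sens + ppv > 0 else 0.0
    return sens, ppv, f


def best_column(predictions: Dict[str, List[Set[Pair]]],
                references: List[Set[Pair]]) -> Tuple[float, ...]:
    """For each structure keep the prediction with the highest F-score,
    then average sensitivity, PPV and F-score over the kept predictions."""
    chosen = []
    for k, ref in enumerate(references):
        scores = [pair_scores(preds[k], ref) for preds in predictions.values()]
        chosen.append(max(scores, key=lambda s: s[2]))  # select by F-score only
    n = len(chosen)
    return tuple(sum(s[i] for s in chosen) / n for i in range(3))


# Hypothetical toy data: two reference structures, two grammars' predictions.
references = [{(0, 9), (1, 8), (2, 7)}, {(0, 5), (1, 4)}]
predictions = {
    "GG1": [{(0, 9), (1, 8)}, {(0, 5), (2, 3)}],
    "GG2": [{(0, 9), (1, 8), (2, 7), (3, 6)}, set()],
}
print(best_column(predictions, references))
```

Here GG2 is chosen for the first structure and GG1 for the second, so the ‘Best’ sensitivity and PPV come from whichever prediction won on F–score, not from separate maxima of each measure.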