Table 4.
Performance of classification models using only lexical features according to different evaluation metrics for the task of annotating adolescent interview session transcripts. Highest value for each metric and codebook size across all models is highlighted in boldface.
| Cls. | Model | Acc. | Prec. | Rec. | F1 | Kappa |
|---|---|---|---|---|---|---|
| 17 | NB | 0.544 | 0.603 | 0.544 | 0.552 | 0.497 |
| | NB-M | 0.670 | 0.662 | 0.670 | 0.643 | 0.622 |
| | J48 | 0.595 | 0.573 | 0.595 | 0.580 | 0.539 |
| | AdaBoost | 0.627 | 0.600 | 0.627 | 0.609 | 0.574 |
| | RF | 0.670 | 0.662 | 0.670 | 0.625 | 0.616 |
| | DiscLDA | 0.477 | 0.454 | 0.477 | 0.431 | 0.388 |
| | CNN | 0.678 | 0.633 | 0.678 | 0.670 | 0.509 |
| | SVM | **0.708** | **0.705** | **0.708** | **0.680** | **0.663** |
| 20 | NB | 0.487 | 0.509 | 0.487 | 0.482 | 0.448 |
| | NB-M | 0.579 | 0.582 | 0.579 | 0.559 | 0.537 |
| | J48 | 0.479 | 0.467 | 0.479 | 0.470 | 0.431 |
| | AdaBoost | 0.504 | 0.488 | 0.504 | 0.493 | 0.458 |
| | RF | 0.563 | 0.564 | 0.563 | 0.519 | 0.514 |
| | DiscLDA | 0.400 | 0.410 | 0.400 | 0.356 | 0.330 |
| | CNN | 0.586 | 0.588 | 0.586 | 0.587 | 0.476 |
| | SVM | **0.610** | **0.611** | **0.610** | **0.592** | **0.571** |
| 41 | NB | 0.406 | 0.434 | 0.406 | 0.405 | 0.375 |
| | NB-M | 0.513 | 0.479 | 0.513 | 0.484 | 0.478 |
| | J48 | 0.396 | 0.375 | 0.396 | 0.382 | 0.356 |
| | AdaBoost | 0.436 | 0.412 | 0.436 | 0.421 | 0.398 |
| | RF | 0.495 | 0.487 | 0.495 | 0.453 | 0.455 |
| | DiscLDA | 0.362 | 0.387 | 0.362 | 0.301 | 0.304 |
| | CNN | 0.396 | 0.369 | 0.396 | 0.382 | 0.170 |
| | SVM | **0.537** | **0.513** | **0.537** | **0.504** | **0.502** |
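
In every row of the table the recall value equals the accuracy value, which is what one would expect if per-class scores are averaged with weights proportional to class frequency. The source does not state the averaging scheme, so the following is only a minimal sketch under that assumption. The function name `weighted_metrics` and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Accuracy, support-weighted precision/recall/F1, and Cohen's kappa.

    Assumption: per-class scores are weighted by the number of true
    instances of each class (the table's source does not confirm this).
    """
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    true_counts = Counter(y_true)   # support of each class
    pred_counts = Counter(y_pred)   # how often each class was predicted
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(hits.values()) / n
    precision = recall = f1 = 0.0
    for label in labels:
        support = true_counts[label]
        p = hits[label] / pred_counts[label] if pred_counts[label] else 0.0
        r = hits[label] / support if support else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support / n
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = sum(true_counts[l] * pred_counts[l] for l in labels) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}

# Hypothetical toy annotation with three codes.
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "c", "c"]
m = weighted_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in m.items()})
```

With support weighting, each class's recall is multiplied by its share of the data, so the weighted sum collapses to the overall fraction of correct predictions. This is why the two columns coincide, while precision and F1 can differ from them.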