Table 2:
Accuracy and macro-F1 scores of interpretable models, averaged over providers, for the neutral-high classification task. LR, DTC, SVM, RF, and GBDT denote Logistic Regression, Decision Tree Classifier, Support Vector Machine, Random Forest, and Gradient Boosted Decision Trees, respectively. Despite the small sample size and class imbalance, most models perform above a macro-F1 threshold of 0.5.
| Social Signal | Metric | LR | DTC | SVM (Linear) | SVM (Radial) | RF | GBDT |
|---|---|---|---|---|---|---|---|
| Provider Dominance | Accuracy | 0.719 | 0.673 | 0.703 | 0.704 | 0.722 | 0.717 |
| | Macro-F1 | 0.658 | 0.606 | 0.650 | 0.614 | 0.651 | 0.657 |
| Provider Interactiveness | Accuracy | 0.660 | 0.641 | 0.657 | 0.690 | 0.692 | 0.616 |
| | Macro-F1 | 0.559 | 0.550 | 0.548 | 0.519 | 0.523 | 0.525 |
| Provider Engagement | Accuracy | 0.863 | 0.947 | 0.852 | 0.903 | 0.898 | 0.934 |
| | Macro-F1 | 0.547 | 0.736 | 0.566 | 0.605 | 0.651 | 0.672 |
| Provider Warmth | Accuracy | 0.556 | 0.543 | 0.581 | 0.581 | 0.571 | 0.565 |
| | Macro-F1 | 0.515 | 0.501 | 0.544 | 0.460 | 0.521 | 0.515 |
| Patient Dominance | Accuracy | 0.975 | 0.968 | 0.973 | 0.973 | 0.961 | 0.970 |
| | Macro-F1 | 0.910 | 0.841 | 0.892 | 0.893 | 0.840 | 0.842 |
| Patient Interactiveness | Accuracy | 0.651 | 0.641 | 0.660 | 0.679 | 0.647 | 0.653 |
| | Macro-F1 | 0.559 | 0.509 | 0.557 | 0.525 | 0.523 | 0.519 |
| Patient Engagement | Accuracy | 0.826 | 0.663 | 0.827 | 0.883 | 0.804 | 0.826 |
| | Macro-F1 | 0.556 | 0.529 | 0.551 | 0.599 | 0.557 | 0.594 |
| Patient Warmth | Accuracy | 0.517 | 0.544 | 0.551 | 0.634 | 0.610 | 0.605 |
| | Macro-F1 | 0.537 | 0.562 | 0.568 | 0.604 | 0.614 | 0.603 |
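To make the caption's point about class imbalance concrete, the sketch below (stdlib-only, not the evaluation code used for the table) computes macro-F1 as the unweighted mean of per-class F1 scores. On an imbalanced toy split, a majority-class baseline reaches high accuracy yet falls below the 0.5 macro-F1 threshold, which is why both metrics are reported.

```python
def macro_f1(y_true, y_pred):
    """Macro-F1: unweighted mean of per-class F1 scores,
    so minority classes count as much as the majority class."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Toy illustration: majority-class baseline on a 3:1 imbalanced split.
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]  # always predict the majority class
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                   # 0.75 accuracy despite learning nothing
print(macro_f1(y_true, y_pred))   # ~0.429, below the 0.5 threshold
```

The 0.5 macro-F1 threshold in the caption thus filters out models that inflate accuracy by defaulting to the majority class.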