PLoS One. 2019 Feb 5;14(2):e0211735. doi: 10.1371/journal.pone.0211735

Table 1. Correlations between human- and computer-generated valence ratings.

| Model: Data Set                 | r (+)          | r (–)          | ICC(1) (+)        | ICC(1) (–)        |
|---------------------------------|----------------|----------------|-------------------|-------------------|
| RF Ratings: Training            | .89 [.88, .90] | .77 [.75, .78] | .88 [.87, .89]    | .71 [.69, .72]    |
| RF Ratings: Test                | .88 [.87, .89] | .74 [.72, .77] | .87 [.86, .88]    | .68 [.65, .71]    |
| FACET Ratings: Training + Test  | .71 [.70, .73] | .40 [.38, .43] | −.43 [−.46, −.41] | −.22 [−.25, −.20] |

Notes. (+) = positive valence ratings; (–) = negative valence ratings; r = Pearson’s correlation; ICC = intraclass correlation coefficient. Each cell reports the correlation with its 95% CI in brackets. The training and test sets contained 3,060 and 1,588 recordings, respectively. Because FACET’s default positive and negative valence scores were not informed by our dataset, we report the correlations of FACET scores across the entire dataset rather than separately for the training and test sets. ICC(1) scores are not necessarily interpretable for FACET’s positive and negative affect scores because FACET’s scale of measurement is arbitrary (ranging from about −16 to +16), whereas the human coders made judgments on a meaningful 1–7 scale. We nevertheless report them for completeness.
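For readers who want to reproduce statistics of this form, the following is a minimal sketch (not the authors’ analysis code) of how Pearson’s r with a Fisher-z 95% CI and a one-way random-effects ICC(1) could be computed for one pair of rating vectors. The variable names and the simulated `human`/`model` ratings are hypothetical; only the sample size (1,588, the test set) and the 1–7 human rating scale come from the table notes.

```python
import numpy as np
from scipy import stats

def pearson_with_ci(x, y, alpha=0.05):
    """Pearson's r with a (1 - alpha) CI via the Fisher z-transformation."""
    r, _ = stats.pearsonr(x, y)
    z = np.arctanh(r)                       # Fisher z-transform of r
    se = 1.0 / np.sqrt(len(x) - 3)          # standard error of z
    z_crit = stats.norm.ppf(1 - alpha / 2)  # e.g., 1.96 for a 95% CI
    lo, hi = np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)
    return r, (lo, hi)

def icc1(x, y):
    """ICC(1): one-way random-effects intraclass correlation, two raters."""
    data = np.column_stack([x, y])          # rows = targets, cols = raters
    n, k = data.shape
    grand_mean = data.mean()
    # Between-target and within-target mean squares from one-way ANOVA
    ms_between = k * ((data.mean(axis=1) - grand_mean) ** 2).sum() / (n - 1)
    ms_within = ((data - data.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical usage with simulated ratings:
rng = np.random.default_rng(0)
human = rng.uniform(1, 7, size=1588)            # 1-7 human valence scale
model = human + rng.normal(0, 0.8, size=1588)   # noisy model predictions
r, (lo, hi) = pearson_with_ci(human, model)
print(f"r = {r:.2f} [{lo:.2f}, {hi:.2f}], ICC(1) = {icc1(human, model):.2f}")
```

Unlike r, which is invariant to linear rescaling, ICC(1) penalizes mean and scale disagreement between raters; this is why the notes caution that ICC(1) is hard to interpret when one “rater” (FACET) uses an arbitrary roughly −16 to +16 scale while the other uses a 1–7 scale.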