2020 Oct 13;7:347. doi: 10.1038/s41597-020-00680-2

Table 3.

Movie word and face annotation information.

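Movie	Words: Matched/Similar on and offsets (%)	Words: Estimated on and offsets, Continuous (%)	Words: Estimated on and offsets, Partial (%)	Words: Estimated on and offsets, Full (%)	Words: Truncated	Words: N	Faces: >95% (%)	Faces: Time (%)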
500 Days of Summer	65.46	4.57	21.84	8.12	4.13	8,286.00	93.15	80.83
Citizenfour	80.68	3.82	14.62	0.88	1.32	13,936.00	93.04	70.79
12 Years a Slave	67.41	6.06	19.66	6.86	3.64	7,984.00	88.48	77.54
Back to the Future	72.52	4.40	17.32	5.77	2.35	8,634.00	89.85	71.21
Little Miss Sunshine	72.48	3.18	22.47	1.87	3.12	8,555.00	87.96	79.17
The Prestige	77.22	4.69	15.19	2.89	2.39	10,954.00	88.84	77.09
Pulp Fiction	73.14	4.13	18.35	4.38	2.77	16,155.00	88.88	79.63
The Shawshank Redemption	81.62	4.92	10.86	2.60	2.12	11,779.00	85.30	78.55
Split	82.21	4.34	8.58	4.88	2.09	7,032.00	96.27	70.13
The Usual Suspects	84.80	3.36	10.61	1.23	1.27	9,913.00	94.94	74.12
Mean	75.75	4.35	15.95	3.95	2.52	10,322.80	90.67	75.91
SD	6.57	0.82	4.83	2.46	0.92	2,909.40	3.49	4.01

Word onsets and offsets were obtained from machine learning-based speech-to-text transcriptions, which were aligned to the subtitles using dynamic time warping. If a word in a subtitle page ‘Matched’ or was ‘Similar’ to a word in the transcript, it received the transcript timing; otherwise its timing was estimated. ‘Continuous’ estimations are single subtitle words that inherit their start and end times from the end of the prior transcribed word and the start of the next. ‘Partial’ estimations are similar but involve two or more missing words between transcribed words. ‘Full’ estimations occurred when no words in a subtitle page were transcribed, so word timings were estimated from the start and end times of the subtitle page. Words with unreasonable lengths were ‘Truncated’. This procedure resulted in an average number (‘N’) of >10,000 words per movie. Face onsets and offsets were also obtained with a machine learning-based approach. The final two columns are the average percentage of face labels with >95% confidence and the percentage of time faces were on screen.
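
The estimation rules in the note above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `Word`, `estimate_page`, `_spread` and the `max_len` threshold are hypothetical, and three details are assumptions that the note does not specify: a gap is divided evenly among its missing words, missing words at the start or end of a page are bounded by the page times, and a word is truncated by capping its duration at `max_len` seconds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    onset: Optional[float] = None   # seconds; None means "not transcribed"
    offset: Optional[float] = None
    label: str = "matched"          # matched / continuous / partial / full
    truncated: bool = False


def _spread(words: List[Word], lo: int, hi: int,
            start: float, end: float, label: str) -> None:
    """Divide [start, end] evenly among words[lo:hi] (assumed rule)."""
    end = max(end, start)           # guard against overlapping anchors
    step = (end - start) / (hi - lo)
    for k in range(lo, hi):
        words[k].onset = start + (k - lo) * step
        words[k].offset = start + (k - lo + 1) * step
        words[k].label = label


def estimate_page(words: List[Word], page_start: float, page_end: float,
                  max_len: float = 1.0) -> List[Word]:
    """Fill in missing word timings for one subtitle page."""
    if not words:
        return words
    if all(w.onset is None for w in words):
        # 'Full': nothing was transcribed, so use the page times only.
        _spread(words, 0, len(words), page_start, page_end, "full")
    else:
        i = 0
        while i < len(words):
            if words[i].onset is not None:
                i += 1
                continue
            # Find the end of this run of untranscribed words.
            j = i
            while j < len(words) and words[j].onset is None:
                j += 1
            # Bound the run by its transcribed neighbours; falling back
            # to the page edges is an assumption of this sketch.
            start = words[i - 1].offset if i > 0 else page_start
            end = words[j].onset if j < len(words) else page_end
            # One missing word is 'continuous'; two or more are 'partial'.
            label = "continuous" if j - i == 1 else "partial"
            _spread(words, i, j, start, end, label)
            i = j
    # 'Truncated': cap word durations at max_len (hypothetical threshold).
    for w in words:
        if w.offset - w.onset > max_len:
            w.offset = w.onset + max_len
            w.truncated = True
    return words
```

For example, calling `estimate_page` on a page whose second word is missing between two transcribed words labels that word `continuous`, while a page with no transcribed words has every word labelled `full`.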