The impact of text segmentation on subtitle reading

Olivia Gerber-Morón; Agnieszka Szarkowska; Bencie Woll

doi:10.16910/jemr.11.4.2

. 2018 Jun 30;11(4):10.16910/jemr.11.4.2. doi: 10.16910/jemr.11.4.2

The impact of text segmentation on subtitle reading

Olivia Gerber-Morón ¹, Agnieszka Szarkowska ^{2, 3,}^{2, 3}, Bencie Woll ²

PMCID: PMC7901653 PMID: 33828704

Abstract

Understanding the way people watch subtitled films has become a central concern for subtitling researchers in recent years. Both subtitling scholars and professionals generally believe that in order to reduce cognitive load and enhance readability, line breaks in twoline subtitles should follow syntactic units. However, previous research has been inconclusive as to whether syntactic-based segmentation facilitates comprehension and reduces cognitive load. In this study, we assessed the impact of text segmentation on subtitle processing among different groups of viewers: hearing people with different mother tongues (English, Polish, and Spanish) and deaf, hard of hearing, and hearing people with English as a first language. We measured three indicators of cognitive load (difficulty, effort, and frustration) as well as comprehension and eye tracking variables. Participants watched two video excerpts with syntactically and non-syntactically segmented subtitles. The aim was to determine whether syntactic-based text segmentation as well as the viewers’ linguistic background influence subtitle processing. Our findings show that non-syntactically segmented subtitles induced higher cognitive load, but they did not adversely affect comprehension. The results are discussed in the context of cognitive load, audiovisual translation, and deafness.

Keywords: eye movement, reading, region of interest, subtitling, audiovisual translation, media accessibility, cognitive load, segmentation, line breaks, revisits

Introduction

In the modern world, we are surrounded by screens, captions, and moving images more than ever before. Technological advancements and accessibility legislation, such as the United Nations Convention on the Rights of Persons with Disabilities (2006), Audiovisual Media Services Directive or the European Accessibility Act,have empowered different types of viewers across the globe in accessing multilingual audiovisual content. Viewers who do not know the language of the original production or people who are deaf or hard of hearing can follow film dialogues thanks to subtitles (15).

Because watching subtitled films requires viewers to follow the action, listen to the soundtrack and read the subtitles, it is important for subtitles to be presented in a way that facilitates rather than hampers reading (12, 21). Some typographical subtitle parameters, such as small font size, illegible typeface or optical blur, have been shown to impede reading (2, 63). In this study, we examine whether segmentation, i.e. the way text is divided across lines in a two-line subtitle, affects the subtitle reading process. We predict that segmentation not aligned with grammatical structure may have a detrimental effect on the processing of subtitles.

Readability and syntactic segmentation in subtitles

The general consensus among scholars in audiovisual translation, media regulation, and television broadcasting is that to enhance readability, linguistic phrases in two-line subtitles should not be split across lines (6, 12, 18, 21, 42). For instance, subtitle (1a) below is an example of correct syntactic-based line segmentation, whereas in (1b) the indefinite article “a” is incorrectly separated from the accompanying noun phrase (6).

(1a)

We are aiming to get

a better television service.

(1b)

We are aiming to get a

better television service.

The underlying assumption is that more cognitive effort is required to process text when it is not segmented according to syntactic rules (44). However, segmentation rules are not always respected in the subtitling industry. One of the reasons for this might be the cost: editing text in subtitles requires human time and effort, and as such is not always cost-effective. Another reason is that syntactic-based segmentation may require substantial text reduction in order to comply with maximum line length limits. As a result, when applying syntactic rules to segmentation of subtitles, some information might be lost. Following this line of thought, BBC subtitling guidelines (6) stress that well-edited text and synchronisation should be prioritized over syntactically-based line breaks.

The widely held belief that words “intimately connected by logic, semantics, or grammar” should be kept in the same line whenever possible (18, p. 77) may be rooted in the concept of parsing in reading (51, p. 216). Parsing, i.e. the process of identifying which groups of words go together in a sentence (68), allows a text to be interpreted incrementally as it is read. It has been reported that “line breaks, like punctuation, may have quite profound effects on the reader’s segmentation strategies” (23, p. 56). Insight into these strategies can be obtained through studies of readers’ eye movements, which reflect the process of parsing: longer fixation durations, higher frequency of regressions, and longer reading time may be indicative of processing difficulties (49). An inappropriately placed line break may lead a reader to incorrectly interpret the meaning and structure, luring the reader into a parse that turns out to be a dead end or yield a clearly unintended reading – a so-called “garden path” experience (14, 51). The reader must then reject their initial interpretation and re-read the text. This takes extra time and, as such, is unwanted in subtitling, which is supposed to be as unobtrusive as possible and should not interfere with the viewer’s enjoyment of the moving images (12).

Despite a substantial body of experimental research on subtitling (7, 9, 10, 24, 29, 30, 47, 61), the question of whether text segmentation affects subtitle processing (44) still remains unanswered. Previous research is inconclusive as to whether linguistically segmented text facilitates subtitle processing and comprehension. Contrary to arguments underpinning professional subtitling recommendations, Perego, Del Missier, Porta, & Mosconi (46), who used eye-tracking to examine subtitle comprehension and processing, found no disruptive effect of “syntactically incoherent” segmentation of noun phrases on the effectiveness of subtitle processing in Italian. In their study, the number of fixations and saccadic crossovers (i.e. gaze jumps between the image and the subtitle) did not differ between the syntactically segmented and non-segmented conditions. In contrast, in a study on live subtitling, Rajendran, Duchowski, Orero, Martínez, & Romero-Fresco (48) showed benefits of linguistically-based segmentation by phrase, which induced fewer fixations and saccadic crossovers, and resulted in shortest mean fixation duration, together indicating less effortful processing.

Ivarsson & Carroll (18) noted that “matching line breaks with sense blocks is especially important for viewers with any kind of linguistic disadvantage, e.g. immigrants or young children learning to read or the deaf with their acknowledged reading problems” (p. 78). Indeed, early deafness is strongly associated with reading difficulties (37, 41). Researchers investigating subtitle reading by deaf viewers have demonstrated processing difficulties resulting in lower comprehension and more time spent by deaf viewers on reading subtitles (25, 26, 60). Lack of familiarity with subtitling is another aspect which may affect the way people read subtitles. In a recent study, Perego et al. (47) found that subtitling can hinder viewers accustomed to dubbing from fully processing film images, especially in the case of structurally complex subtitles.

Cognitive load

Watching a subtitled video is a complex task: not only do viewers need to follow the dynamically unfolding on-screen actions, accompanied by various sounds, but they also need to read the subtitles (32). This complex processing task may be hindered by poor quality subtitles, possibly including aspects such as non-syntactic segmentation. The processing of subtitles has been previously studied in association with the concept of cognitive load (27), rooted in cognitive load theory (CLT) and instructional design (56). Drawing on the central tenet of CLT, the design of materials should aim at reducing any unnecessary load to free the processing capacity for task-related activities (58).

In the initial formulation of CLT, two types of cognitive load were distinguished: intrinsic and extraneous (8). Intrinsic cognitive load is related to the complexity and characteristics of the task (54). Extraneous load relates to how the information is presented; if presentation is inefficient, learning can be hindered (57). For instance, too many colours or blinking headlines in a lecture presentation can distract students rather than help them focus, wasting attentional resources on taskirrelevant details (54). Later studies in CLT also distinguish the concept of ‘germane cognitive load’ and, more recently, ‘germane resources’ (54, 57). It is believed that germane load is not imposed by the characteristics of the materials and germane resources should be “high enough to deal with the intrinsic cognitive load caused by the content” (54). In this paper, we set out to test whether non-syntactically segmented text may strain working memory capacity and prevent viewers from efficiently processing subtitled videos. It is our contention that just as the goal of instructional designers is to foster learning by keeping extraneous cognitive load as low as possible (54), so it is the task of subtitlers to reduce the extraneous load on viewers, enabling them to focus on what is important during the filmwatching experience.

The concept of cognitive load encompasses different categories (58, 67). Mental effort is understood, following Paas, Tuovinen, Tabbers, & Van Gerven (43, p. 64) and Sweller et al. (57, p. 73), as “the aspect of cognitive load that refers to the cognitive capacity that is actually allocated to accommodate the demands imposed by the task”. As mental effort invested in a task is not necessarily equal to the difficulty of the task, difficulty is a construct distinct from effort (66). Drawing on the multidimensional NASA Task Load Index (16), some researchers also included other aspects of cognitive load, such as temporal demand, performance, and frustration with the task (57). Apart from effort, difficulty and frustration, of particular importance in the present study is performance, operationalised here as comprehension score, which demonstrates how well a person carried out the task. Performance may be positively affected by lower cognitive load, as there is more unallocated processing capacity to carry out the task. As the task complexity increases, more effort needs to be expended to keep the performance at the same level (43).

Cognitive load can be measured using subjective or objective methods (27, 57). Subjective cognitive load measurement is usually done indirectly using rating scales (43, 54), where people are asked to rate their mental effort or the perceived difficulty of a task on a 7- or 9-point Likert scale, ranging from “very low” to “very high” (66). Subjective rating scales have been criticised for using only one single item (usually either mental load or difficulty) in assessing cognitive load (54). Yet, they have been found to effectively show the correlations between the variation in cognitive load reported by people and the variation in the complexity of the task they were given (43). According to Sweller et al. (57), “the simple subjective rating scale [...], has, perhaps surprisingly, been shown to be the most sensitive measure available to differentiate the cognitive load imposed by different instructional procedures” (p. 74). The problem with rating scales is they are applied to the task as a whole, after it has been completed. In contrast, objective methods, which include physiological tools such as eye tracking or electroencephalography (EEG), enable researchers to see fluctuations in cognitive load over time (3, 65). Higher number of fixations and longer fixation durations are generally associated with higher processing effort and increased cognitive load (17, 28). In our study, we combine subjective rating scales with objective eye-tracking measures to obtain a more reliable view on cognitive load during the task of subtitle processing.

Various types of measures have been used to evaluate cognitive load in subtitling. Some previous studies have used subjective post-hoc rating scales to assess people’s cognitive load when watching subtitled audiovisual material (27, 30, 69); subtitlers’ cognitive load when producing live subtitles with respeaking (62); or the level of translation difficulty (55). Some studies on subtitling have used eye tracking to examine cognitive load and attention distribution in a subtitled lecture (30); cognitive load while reading edited and verbatim subtitles (60); or the processing of native and foreign subtitles in films (7); to mention just a few. Using both eye tracking and subjective self-report ratings, Łuczak (34) tested the impact of the language of the soundtrack (English, Hungarian, or no audio) on viewers’ cognitive load. Kruger, Doherty, Fox, et al. (28) combined eye tracking, EEG and selfreported psychometrics in their examination of the effects of language and subtitle placement on cognitive load in traditional intralingual subtitling and experimental integrated titles. For a critical overview of eye tracking measures used in empirical research on subtitling, see (13), and of the applications of cognitive load theory to subtitling research, see Kruger & Doherty (27).

Overview of the current study

The main goal of this study is to test the impact of segmentation on subtitle processing. With this goal in mind, we showed participants two videos: one with syntactically segmented text in the subtitles (SS) and one where text was not syntactically segmented (NSS). In order to compensate for any differences in the knowledge of source language and accessibility of the soundtrack to deaf and hearing participants, we used videos where the soundtrack was in Hungarian – a language that participants could not understand.

All subtitles in this study were shown in English. The reason for this is threefold. First, the noncompliance with the subtitling guidelines with regard to text segmentation and line breaks is particularly visible on British television in Englishto-English subtitling. Although the UK is the leader in subtitling when it comes to the quantity of subtitle provision, with many TV channels having 100% subtitling to its programmes, the quality of prerecorded subtitles is often below professional subtitling standards with regard to subtitle segmentation. Another reason for using English – as opposed to showing participants subtitles in their respective mother tongues – was to ensure identical linguistic structures in the subtitles. A final reason for using English is that, as participants live in the UK, they are able to watch English subtitles on television. The choice of English subtitles is therefore ecologically valid.

We measured participants’ cognitive load and comprehension as well as a number of eye tracking variables. Following the established method of measuring self-reported cognitive load previously used by Kruger et al. (30), (61), and Łuczak (34), we measured three aspects of cognitive load: perceived difficulty, effort, and frustration, using subjective 17 rating scales (54). We also related viewers’ cognitive load to their performance, operationalised here as comprehension score. Based on the subtitling literature (45), we predicted that non-syntactically segmented text in subtitles would result in higher cognitive load and lower comprehension. We hypothesised that subtitles in the NSS condition would be more difficult to read because of increased parsing difficulties and extra cognitive resources which might be expended on additional processing.

In terms of eye tracking, we hypothesised that people would spend more time reading subtitles in the NSS condition. To measure this, we calculated the absolute reading time and proportional reading time of subtitles as well as fixation count in the subtitles. Absolute reading time is the time the viewers spent in the subtitle area, measured in milliseconds, whereas proportional reading time is a percentage of time spent in the subtitle area relative to subtitle duration (11, 24). Furthermore, because we thought that the non-syntactically segmented text would be more difficult to process, we also expected higher mean fixation duration and more revisits to the subtitle area in the NSS condition (17, 50, 51).

To address the contribution of hearing status and experience with subtitling to cognitive processing, our study includes British viewers with varying hearing status (deaf, hard of hearing, and hearing), and hearing native speakers of different languages: Spanish people, who grew up in a country where the dominant type of audiovisual translation is dubbing, and Polish people, who come from the tradition of voice-over and subtitling. We conducted two experiments: Experiment 1 with hearing people from the UK, Poland, and Spain, and Experiment 2 with English hearing, hard of hearing and deaf people. We predicted that for those who are not used to subtitling, cognitive load would be higher, comprehension would be lower and time spent in the subtitle would be higher, as indicated by absolute reading time, fixation count and proportional reading time.

By using a combination of different research methods, such as eye tracking, self-reports, and questionnaires, we have been able to analyse the impact of text segmentation on the processing of subtitles, modulated by different linguistic backgrounds of viewers. Examining these issues is particularly relevant from the point of view of current subtitling standards and practices.

Methods

The study took place at University College London and was part of a larger project on testing subtitle processing with eye tracking. In this paper, we report the results from two experiments using the same methodology and materials: Experiment 1 with hearing native speakers of English, Polish, and Spanish; and Experiment 2 with hearing, hard of hearing, and deaf British participants. The Englishspeaking hearing participants are the same in both experiments. In each of the two experiments, we employed a mixed factorial design with segmentation (syntactically segmented vs. nonsyntactically segmented) as the main within-subject independent variable, and language (Exp. 1) or hearing loss (Exp. 2) as a between-subject factor.

All the study materials and results are available in an open data repository RepOD hosted by the University of Warsaw (59).

Participants

Participants were recruited from the UCL Psychology pool of volunteers, social media (Facebook page of the project, Twitter), and personal networking. Hard of hearing participants were recruited with the help of the National Association of Deafened People. Deaf participants were also contacted through the UCL Deafness, Cognition, and Language Research Centre participant pool. Participants were required not to know Hungarian.

Table 1.

Demographic information on participants

Experiment 1
		English	Polish	Spanish
Gender	Male	13	5	10
	Female	14	16	16
Age	Mean	27.59	24.71	28.12
	(SD)	(7.79)	(5.68)	(5.88)
	Range	20-54	19-38	19-42
Experiment 2
		Hearing	Hard of hearing	Deaf
Gender	Male	13	2	4
	Female	14	8	5
Age	Mean	27.59	46.40	42.33
	(SD)	(7.79)	(12.9)	(14.18)
	Range	20-54	22-72	24-74

	English	Polish	Spanish
Subtitling	24	11	22
Dubbing	0	0	1
Voice-over	1	0	0
I watch films in their original version	1	10	3
I never watch foreign films	1	0	0

	English	Polish	Spanish
Secondary education	5	9	6
Bachelor degree	14	4	6
Master degree	8	8	13
PhD	0	0	1

Linguistic unit	Chef	Philomena
Auxiliary and lexical verb	2	2
Subject and predicate	3	3
Article and noun	3	3
Conjunction between two clauses	4	5

Linguistic unit	SS condition	NSS condition
Auxiliary and lexical verb	Now, should we have served	Now, should we have
	that sandwich?	served that sandwich?
Subject and predicate	That's my son. Get back in there.	That's my son. Get back in there. We
	We got some hungry people.	got some hungry people.
Article and noun	I've loved the hotels,	I've loved the
	the food and everything,	hotels,the food and everything,
Conjunction between two clauses	Now I've made a decision	Now I've made a decision and
	and my mind's made up.	my mind's made up.

Eye tracking measure	Description
Absolute reading time	The sum of all fixation durations and saccade durations, starting from the duration of the saccade entering the AOI, referred to in SMI software as ‘glance duration’. Longer time spent on reading may be indicative of difficulties with extracting information (17).
Proportional reading time	The percentage of dwell time (the sum of durations of all fixations and saccades in an AOI starting with the first fixation) a participant spent in the AOI as a function of subtitle display time. For example, if a subtitle lasted for 3 seconds and the participant spent 2.5 seconds in that subtitle, the proportional reading time was 2500/3000 ms = 83% (i.e. while the subtitle was displayed for 3 seconds, the participant was looking at that subtitle for 83% of the time). Longer proportional time spent in the AOI translates into less time available to follow on-screen action.
Mean fixation duration	The duration of a fixation in a subtitle AOI, averaged per clip per participant. Longer mean fixation duration may indicate more effortful cognitive processing (17).
Fixation count	The number of fixations in the AOI, averaged per clip per participant. Higher numbers of fixations have been reported in poor readers (17).
Revisits	The number of glances a participant made to the subtitle AOI after visiting the subtitle for the first time. Revisits to the AOI may indicate problems with processing, as people go back to the AOI to re-read the text.

Measure	df	F	p	𝜂_p²
Difficulty	2,71	.592	.556	.016
Effort	2,71	2.382	.100	.063
Frustration	2,71	1.850	.165	.050

	Language	Mean	(SD)
Comprehension SS	English	4.11	(1.01)
	Polish	4.48	(.81)
	Spanish	4.08	(1.09)
	Total	4.20	(.99)
Comprehension NSS	English	4.26	(1.02)
	Polish	4.76	(.43)
	Spanish	3.88	(1.21)
	Total	4.27	(1.02)

Measure	df	F	p	𝜂_p²
Absolute reading time	2,63	5.593	.006*	.151
Proportional reading time	2,63	5.398	.007*	.146
Mean fixation duration	2,63	6.166	.004*	.164
Fixation count	2,63	2.980	.058	.086
Revisits	2,63	.332	.719	.010

	Degree of hearing loss
	Hearing	Hard of hearing	Deaf	df	F	p	𝜂_p²
	M (SD)	M (SD)	M (SD)
Difficulty				1,43	6,580	.014*	.133
SS	2.37 (1.27)	1.60 (1.07)	2.56 (1.42)
NSS	2.63 (1.44)	2.20 (1.31)	3.44 (1.59)
Effort				1,43	4,372	.042*	.092
SS	2.78 (1.55)	1.60 (1.07)	2.78 (1.64)
NSS	2.89 (1.60)	2.50 (1.35)	3.44 (1.42)
Frustration				1,43	7,669	.008*	.151
SS	2.15 (1.40)	1.00 (.00)	2.56 (1.59)
NSS	3.04 (1.85)	2.10 (1.28)	3.00 (1.58)

	Deafness	Mean (SD)
Comprehension SS	Hearing	4.11 (1.01))
	Hard of hearing	4.60 (.51)
	Deaf	4.00 (.70)
	Total	4.20 (.88)
Comprehension NSS	Hearing	4.26 (1.02)
	Hard of hearing	4.50 (.70)
	Deaf	3.44 (1.23)
	Total	4.15 (1.05)

	Language
	English	Polish	Spanish	df	F	P	𝜂_p²
Difficulty				1,71	15,584	< .001*	.18
SS	2.37 (1.27)	2.05 (1.02)	1.96 (1.14)
NSS	2.63 (1.44)	2.67 (1.46)	3.42 (1.65)
Effort				1,71	7,788	.007*	.099
SS	2.78 (1.55)	1.90 (1.26)	2.23 (1.50)
NSS	2.89 (1.60)	2.43 (1.16)	3.54 (2.10)
Frustration				1,71	27,030	< .001*	.276
SS	2.15 (1.40)	1.38 (.80)	1.62 (.89)
NSS	3.04 (1.85)	2.48 (1.91)	3.27 (2.07)

	Language
	English	Polish	Spanish	df	F	p	𝜂_p²
Absolute reading time (ms)				1,63	2.950	.091	.045
SS	1614	1634	1856
NSS	1617	1529	1817
Proportional reading time				1,63	2.128	.150	.033
SS	.65	.67	.76
NSS	.66	.62	.74
Mean fixation duration (ms)				1,63	2.128	.906	.000
SS	209	194	214
NSS	211	187	218
Fixation count				1,63	2.279	.136	.035
SS	6.41	6.68	7.27
NSS	6.45	6.42	6.95
Revisits				1,63	11.839	.001*	.158
SS	.28	.27	.21
NSS	.39	.34	.36

	Hearing loss
	Hearing	Hard of hearing	Deaf	Df	F	p	𝜂_p²
Absolute reading time (ms)				1,33	1.752	.195	.050
SS	1614	1619	1222
NSS	1617	1519	1522
Proportional reading time				1,33	2.270	.141	.064
SS	.65	.66	.45
NSS	.66	.61	.62
Mean fixation duration (ms)				1,33	.199	.659	.006
SS	209	199	214
NSS	211	185	219
Fixation count				1,33	2.686	.111	.075
SS	6.41	6.73	4.63
NSS	6.45	6.45	5.90
Revisits				1,33	.352	.557	.011
SS	.28	.20	.45
NSS	39	.30	.15

PERMALINK

The impact of text segmentation on subtitle reading

Olivia Gerber-Morón

Agnieszka Szarkowska

Bencie Woll

Abstract

Introduction

Readability and syntactic segmentation in subtitles

Cognitive load

Overview of the current study

Methods

Participants

Table 1.

Table 2.

Figure 1.

Table 3.

Table 4.

Table 5.

Materials

Table 6.

Table 7.

Eye tracking recording

Dependent variables

Table 8.

Procedure

Results

Experiment 1

Cognitive load

Table 9.

Table 10.

Comprehension

Table 11.

Eye tracking measures

Table 12.

Table 13.

Experiment 2

Cognitive load

Table 14.

Comprehension

Table 15.

Eye tracking measures

Table 16.

Interviews

Discussion

Conclusions

Ethics and Conflict of Interest

Acknowledgements

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases