Author manuscript; available in PMC: 2008 Sep 1.
Published in final edited form as: Cognition. 2006 Sep 12;104(3):495–534. doi: 10.1016/j.cognition.2006.07.013

Linguistic complexity and information structure in Korean: Evidence from eye-tracking during reading

Yoonhyoung Lee a, Hanjung Lee b, Peter C Gordon a,*
PMCID: PMC2084389  NIHMSID: NIHMS14275  PMID: 16970936

Abstract

The nature of the memory processes that support language comprehension and the manner in which information packaging influences online sentence processing were investigated in three experiments that used eye-tracking during reading to measure the ease of understanding complex sentences in Korean. All three experiments examined reading of embedded complement sentences; the third experiment additionally examined reading of sentences with object-modifying, object-extracted relative clauses. In Korean, both of these structures place two NPs with nominative case marking early in the sentence, with the embedded and matrix verbs following later. The type (pronoun, name or description) of these two critical NPs was varied in the experiments. When the initial NPs were of the same type, comprehension was slowed after participants had read the sentence-final verbs, a finding that supports the view that working memory in language comprehension is constrained by similarity-based interference during the retrieval of information necessary to determine the syntactic or semantic relations between noun phrases and verb phrases. Ease of comprehension was also influenced by the association between type of NP and syntactic position, with the best performance being observed when more definite NPs (pronouns and names) were in a prominent syntactic position (e.g., matrix subject) and less definite NPs (descriptions) were in a non-prominent syntactic position (embedded subject). This pattern provides evidence that the interpretation of sentences is facilitated by consistent packaging of information in different linguistic elements.

Keywords: Linguistic complexity, Information structure, Korean, Working memory, Online comprehension, Noun phrase

1. Introduction

An important question in the investigation of human sentence processing is whether, when, and to what extent sentence comprehension can be influenced by structural and nonstructural factors. Sentences with restrictive relative clauses (RCs) are a type of complex structure that has proven very useful for exploring this issue. This is particularly so for subject-extracted and object-extracted RCs, as illustrated below.

  • (1) The lawyer that irritated the banker filed a hefty lawsuit.

  • (2) The lawyer that the banker irritated filed a hefty lawsuit.

In a subject-extracted RC, like (1), the extracted element (e.g., lawyer) serves as the unexpressed logical subject of the verb in the embedded clause (i.e., irritated). In an object-extracted RC, like (2), the extracted element is understood to function as the unexpressed logical object of the verb in the relative clause. Research using a variety of methods has shown that sentences with object-extracted RCs are harder to understand than those with subject-extracted RCs (e.g., Ford, 1983; King & Just, 1991; King & Kutas, 1995), with this difference typically being attributed to the greater demands on working memory imposed by object-extracted as compared to subject-extracted RCs. Object-extracted RCs impose these memory demands because two NPs are stacked at the beginning of the sentence before any verbs are encountered; thus, these structures create a milder version of the extreme memory demands that are seen in English for doubly center-embedded sentences, where three NPs are stacked at the beginning of a sentence. Research on the comprehension of these types of structures has played a very important role in the development of theories of human sentence processing (e.g., Caplan & Waters, 1999; Gibson, 1998; Just & Carpenter, 1992; Lewis, 1996; Miller & Chomsky, 1963).

Although object-extracted RCs are generally harder to comprehend than subject-extracted RCs, there are cases where this difference in difficulty is significantly reduced. Bever (1974) noted that doubly center-embedded sentences [e.g., (3)], which are usually nearly impossible to understand, appear to become much more intelligible when they have a mixture of different types of NPs [e.g., (4)].

  • (3) The reporter the politician the commentator met trusts said the president won't resign.

  • (4) The reporter everyone I met trusts said the president won't resign.

Greatly decreased difficulty in the processing of center-embedded structures has been attributed to the non-similarity of the two critical NPs and the accessibility/definiteness of the embedded NP. We consider each of these explanations in turn; additionally we examine patterns of linguistic markedness that may play a role in how the association of types of NPs with grammatical roles may influence sentence comprehension.

1.1. Similarity-based interference account

The similarity-based-interference account (Gordon, Hendrick, & Johnson, 2001, 2004) is based on the idea that the syntactic complexity of object-extracted RCs is compounded by the confusability of the entities represented by the NPs. That is, in (2), the representations of lawyer and banker are quite similar. Interpreting object-extracted RCs like (2) requires comprehenders to represent and successfully retrieve two similar NPs, a task that is made more difficult by the complex syntax. In a series of self-paced reading experiments, Gordon et al. (2001) tested sentences where the first (modified) NP was a description, and the (manipulated) second NP was either a description, the pronoun you, or a short, common proper name as illustrated in (5) and (6).

  • (5) The banker that the judge/you/Bob praised climbed the mountain.

  • (6) The banker that praised the judge/you/Bob climbed the mountain.

  • (7) It was the barber/John that the lawyer/Bill saw in the parking lot.

  • (8) It was the barber/John that saw the lawyer/Bill in the parking lot.

Comparing (5) and (6), they found decreased difficulty in object-extracted RCs when the second NP was the pronoun you or a proper name. Gordon, Hendrick, Johnson, and Lee (in press), using eye tracking during reading, showed that having a second NP that was a proper name decreased comprehension difficulty for object-extracted RCs at both early and late levels of processing. Additional self-paced reading experiments reported in Gordon, Hendrick, and Johnson (2004) have shown that no reduction in the difference in processing difficulty between object-extracted and subject-extracted RCs occurs when the second NP: (i) is an indefinite expression, (ii) differs in number from the first expression, (iii) is a generic expression, or (iv) is always the NP the person. The difference between object-extracted and subject-extracted RCs is significantly reduced when the second NP is the quantified expression everyone.

Similarity-based interference effects were also shown in cleft sentences such as (7) and (8) where the logical subject and object were manipulated as either proper names or definite descriptions (Gordon et al., 2001). Sentences in which both of the critical NPs were either descriptions or names yielded greater difficulty than sentences with one name and one description, and object extractions like (7) had higher error rates and longer reading times than subject extractions. Furthermore, when the two critical NPs were both descriptions or both names, object extractions increased the error rates and reading times more than subject extractions. By using a memory load task and a sentence comprehension task concurrently, Gordon, Hendrick, and Levine (2002) showed that more errors occurred on comprehension questions when the memory load items matched the NP type of the critical NPs in the target sentences of the comprehension task. Taken together, these results support the similarity-based interference account and underscore the importance of the memory representations that underlie language processing.

1.2. Accessibility-based account

The accessibility-based account formulated by Gibson and Warren (Gibson, 1998; Warren & Gibson, 2002) views the effects of NP types on processing difficulty in a different way. On this account, the number of discourse referents that intervene between a filler and the site where it is attached determines how difficult it will be to integrate the filler, and hence how much disruption comprehenders will experience. The account proposes further that local person pronouns such as you, me, I and us represent a special class of referring expressions. Essentially, because local person pronouns refer to entities that are immediately available in the comprehender's environment, they impose less of a load on working memory than referring expressions such as the lawyer whose meaning must be retrieved from long-term memory.

In offline studies, Warren and Gibson (2002) showed that the rated ease of understanding object-extracted RCs in doubly center-embedded sentences increases when local person pronouns replace other types of referring expressions. Further, Warren and Gibson (2002) found that sentence complexity is affected by gradations of the accessibility of the embedded subject, such as those proposed in the definiteness hierarchy (9) (Aissen, 2003) or in the givenness hierarchy (10) (Gundel, Hedberg, & Zacharski, 1993; Ariel, 1991), not just by whether or not the embedded subject was a local person pronoun.

  • (9) Aissen's definiteness hierarchy: Pronoun > Name > Definite > Indefinite

  • (10) The Gundel et al. (1993) givenness hierarchy, ordered from central to peripheral: in focus > activated > familiar > uniquely identifiable > referential

The use of particular referring expressions such as pronouns, proper nouns, definites and indefinites is a strategy for marking the accessibility of the mental representation of discourse referents. Ariel (1991) argues that all referring expressions in all languages are arranged on a scale of accessibility, and the use of high accessibility referring expressions implies that the discourse referent has high accessibility to the addressee, while the use of low accessibility referring expressions implies that it has low accessibility to the addressee. Thus, the definiteness hierarchy can be interpreted as a ranking of accessibility markers from high accessibility markers (e.g. pronouns) to low accessibility markers (e.g. indefinites).

Warren and Gibson (2002) conducted a rating study on ease of understanding in which they varied the type of the subject NP in the object-extracted RCs across six levels of the givenness hierarchy: (1) first-/second-person pronouns; (2) third-person pronouns; (3) first names; (4) full names; (5) definite descriptions; (6) indefinite descriptions. Because they found that complexity was increasingly reduced as the definiteness of the embedded NP increased from least definite/accessible to most definite/accessible, they concluded that their findings support a referential processing theory based on accessibility and NP type, consistent with referent access models proposed by Garrod and Sanford (1994) and Myers and O'Brien (1998). In particular, Warren and Gibson (2002) account for the observed complexity differences by assuming that integrations crossing NPs from the more peripheral end of the definiteness/givenness hierarchy are more complex than integrations crossing NPs from more central levels.

1.3. Markedness and markedness reversal

The hierarchies (9) and (10) must be understood in connection with the grammatical function hierarchy ‘Subject > Non-subject’; there is a correlation between the type of an NP and the grammatical function of the NP. Both pronouns and other definite NPs tend to refer to elements that are assumed to be known to the reader/listener. Pronouns, however, are generally used to refer to discourse-salient entities (including the speaker and the addressee), or discourse topics (Ariel, 1991; Garrod & Sanford, 1982, 1985; Prince, 1981). Such salient entities tend to be encoded in subject position (Gordon & Chan, 1995; Gordon, Grosz, & Gilliom, 1993; Prince, 1992; among others). Hence, pronouns occur in subject position more often than other definite NPs. Unlike definites, indefinites do not carry presuppositions of uniqueness and familiarity. Instead, they typically introduce referents. Such entities that are new to the discourse are generally introduced in non-subject positions.

The association of elements higher on the definiteness hierarchy with subjects is supported by Keenan (1976), who states that “highly referential” NPs such as pronouns and proper nouns can always be subjects, by Givón (1979), who shows that subjects are usually definite, and by the fact that in a number of languages subjects cannot be indefinites (Aissen, 2003; Diesing & Jelinek, 1995; Foley & Van Valin, 1984; Kroeger, 1993). The association of elements lower on the hierarchy with objects is supported by Keenan (1976), who cites Philippine languages in which objects cannot be definites (at least with non-relativized verbs). There are also languages like Chamorro (Chung, 1998), Mam (England, 1983) and Halkomelem (Gerdts, 1988), which exclude personal pronoun objects. All these languages resort to constructions other than simple active clauses (e.g., passive voice) to express the combination of non-pronoun agent and pronoun patient.

Although none of the structures that are disallowed in languages like Chamorro, Mam and Halkomelem are categorically avoided in English, numerous studies have shown fairly robust frequentistic definiteness effects on the choice between active and passive in English. For instance, Givón (1979) shows that indefinite subjects in English main clause active declarative sentences occur at a very low frequency – approximately 10% of English subjects are indefinite, as opposed to 90% definite. Estival and Myhill (1988) demonstrate that pronominal agents are less likely to passivize (0%) than nominal agents (5%), and that definite agents are less likely to passivize (1%) than indefinite agents (4%). They also show that pronominal patients are more likely to passivize (17%) than nominal patients (5%), and definite patients more likely to passivize (12%) than indefinite patients (4%). Svartvik (1966) finds consistently across three texts that the proportion of pronouns in subject position of passives is much higher than the proportion of pronouns in object position of actives. Similarly, the proportion of pronouns in subject position of actives is much higher than the proportion of pronouns in by-phrases of passives.

The tendency for high-prominent elements to be associated with subject function and for low-prominent elements to be associated with object function has been called markedness reversal (Aissen, 1999, 2003; Croft, 1990) – the elements at the top of the hierarchy are unmarked as subjects but marked as objects, while the elements at the bottom are marked as subjects and unmarked as objects. As demonstrated by a large body of literature on markedness, markedness reversal is not random, but is consistent across languages: languages in general treat definite subjects as unmarked but definite objects as marked.

In addition, patterns of markedness reversal can create a dissociation between frequency at the word level and frequency at the level of words in specific grammatical positions. For example, pronouns are among the most frequent words in any language, occurring with far greater frequency than nominal expressions. However, as noted above, markedness reversal can cause this difference in overall frequency to vary as a function of grammatical position, as shown by the fact that pronominal agents are less likely to passivize than nominal agents but that the reverse is true for patients. A large body of findings in experimental psycholinguistics has shown that higher overall frequency is associated with easier processing as measured in a variety of ways (e.g., Just & Carpenter, 1980; Rayner & Duffy, 1986 for evidence from eye tracking), but little-to-no evidence exists about how ease of processing is affected by the frequency with which a class of expressions appears in particular grammatical positions.

1.4. Comparison of models

As discussed in Gordon et al. (2004), the two memory-based models of complex-sentence processing (similarity-based interference and accessibility) are not incompatible; each may accurately characterize factors that contribute to processing difficulty. The similarity-based interference model focuses on how multiple NPs are represented and retrieved in working memory during sentence processing. Understanding object-extracted RCs requires that both critical NPs be stored in memory and accurately retrieved when each is needed for integration with a verb. A substantial literature on human memory indicates that such processes are impaired when the items are similar (Crowder, 1976; Gillund & Shiffrin, 1984; Hintzman, 1986). The accessibility model focuses on the ease of remembering an NP as a function of its type independently of the other NPs that must be remembered. Thus, it is possible that both the similarity of NPs and differences in the ease of remembering different types of NPs could contribute to how NP type interacts with sentence structure to influence ease of comprehension.

Linguistic analyses of markedness patterns point to another way in which NP type might interact with sentence structure to influence processing. There may be a bias for interpreting NPs with prototypical subject characteristics as the subjects of the verb and NPs with prototypical object characteristics as the objects of the verb. Research on English has focused on object-extracted RCs, a structure that deviates from canonical word ordering (subject–verb–object). In the absence of normal word-ordering cues, a bias for assigning syntactic roles to an NP based on its type could be an important factor in facilitating comprehension.

Interference, accessibility and markedness accounts of how NP types affect sentence processing are not mutually exclusive; each could explain different aspects of comprehension. However, it is worthwhile to explore the extent to which each is necessary. In particular, the interference and accessibility models make identical predictions in many of the constructions that have been studied so far, so further tests are needed to see which has greater generality. In addition, the various factors affecting comprehension may be separable in that they may impact processing at different points over the time course of comprehension. If the frequency-based cue of markedness plays a role in facilitating the determination of syntactic functions of NPs, that role would likely be seen early in comprehension, especially in the absence of the normal word order cues or morphological cues. Similarity-based interference, in contrast, is not due to the properties of individual NPs, but rather it arises from the confusable relationship between items being processed as readers put together the meanings of linguistic constituents. Hence, one can expect that this interference in memory retrieval would tend to be detectable later in the course of comprehension than markedness effects. Because most previous research on this topic has been conducted using off-line methods, or online methods with relatively low temporal resolution, measuring such effects separately during the process of comprehension has been difficult. The current experiments use eye-tracking during reading, a method with excellent temporal resolution that allows effects to be measured during natural language comprehension (Rayner, 1998; Rayner & Pollatsek, 1989).

The three experiments reported here examine the effects of similarity, accessibility and markedness on the processing of Korean center-embedded complement clause structures and object-extracted relative clauses. Most discussions of NP type and sentence processing have been focused on complex sentences, most notably relative clauses, in English and other Germanic languages. Korean differs structurally from English and other European languages in a number of ways. Most obviously, Korean is a left-branching, head-final language with SOV (Subject–Object–Verb) structure. In Korean, it is possible to stack a large number of sentence-initial NPs without causing severe processing difficulty because the syntactic role of a noun in a sentence is cued by the particles attached to the noun. This makes it easier to study the effects of NP types in Korean than in a fixed-word order language, such as English, because there are more ways in which to create sequences of adjacent NPs.

Also, Korean orthography allows the size of a visually presented word to be held constant across types of NPs, which facilitates comparisons across conditions when using eye-tracking methodology. As discussed by Lee and Ramsey (2000), the Korean writing system combines the major features of an alphabet and a syllabary. Written Korean consists of a left-to-right sequence of characters each of which represents a syllable. In turn, each character consists of a grouping of symbols that represent the onset, nucleus and coda (if present) of the syllable. The coda is always at the bottom of the syllabic character with the nucleus above it; the onset can either be above or to the left of the nucleus depending on the style of writing that is adopted. Specific graphic features of these alphabetic symbols correspond to phonological distinctions (e.g., conveying whether a consonant is voiced or voiceless). Thus, the Korean writing system provides a highly regular, yet visually compact representation of the sound of a word.

2. Experiment 1

Experiment 1 tested center-embedded complement clause structures in Korean to determine how the definiteness and similarity of two adjacent subject NPs affect processing difficulty. We varied whether the matrix subject NP and the embedded subject NP were pronouns or descriptions, as illustrated in (11)–(14). All of the nouns were three-syllable words. Interference models predict that comprehension should be easier when the type of the two critical NPs is not matched (pronoun–description or description–pronoun) than when it is matched (pronoun–pronoun or description–description). In contrast, accessibility models predict that comprehension should be easier when the critical NPs (particularly the embedded NP) are pronouns as compared to when they are descriptions. An approach to sentence processing based on markedness predicts that comprehension should be easiest when the more definite NP (a pronoun) is in matrix subject position and the less definite NP (the description) is in embedded subject position. The use in this experiment of eye-tracking during reading allows us to measure the time course of processing difficulty associated with each of the experimental conditions.

  • (11) Matched (Pronoun–Pronoun)
    • Kutul-i wuli-ka silhum-ul haysstako malhayssta.
    • 3.pl-nom 1.pl-nom experiment-acc ran said
    • ‘They said that we ran experiments’.
  • (12) Matched (Description–Description)
    • Uysa-ka haksayng-i silhum-ul haysstako malhayssta.
    • doctor-nom student-nom experiment-acc ran said
    • ‘The doctor said that the student ran experiments’.
  • (13) Non-matched (Pronoun–Description)
    • Kutul-i haksayng-i silhum-ul haysstako malhayssta.
    • 3.pl-nom student-nom experiment-acc ran said
    • ‘They said that the student ran experiments’.
  • (14) Non-matched (Description–Pronoun)
    • Uysa-ka wuli-ka silhum-ul haysstako malhayssta.
    • doctor-nom 1.pl-nom experiment-acc ran said
    • ‘The doctor said that we ran experiments’.

Center-embedded complement clause structures, like those illustrated above, are locally ambiguous in a head-final language like Korean: The sequence of the two nominative NPs can be analyzed as a nominative subject followed by a nominative object because Korean stative verbs and nonagentive verbs may assign nominative case to their object. The two nominative NPs may also be analyzed as subjects of different verbs. This is illustrated in (15).

  • (15) a. Coargument analysis: [NPnom (subj) NPnom (obj) …
    • b. Two-clause analysis: [NPnom (subj) [NPnom (subj) …

However, as the third, accusative-marked NP is read, a clause boundary is created between the two nominative NPs. When the next word, the embedded form of a transitive verb, is encountered, the bi-clausal analysis is confirmed and a matrix verb is predicted:

  • (16) [NPnom (subj) [NPnom (subj) NPacc (obj) V1] V2]

There is not yet strong empirical evidence relevant to whether the preferred interpretation of the locally ambiguous part of the center-embedded complement clause structure is (15a) or (15b) (cf. Kim, ms). The co-argument analysis (15a) is syntactically simpler, and requires fewer head-argument relations. But lexical and morphological frequencies strongly bias the initial interpretation towards the two-clause analysis (15b), because the nominative marker -ka is much more frequent as a subject marker than as an object marker. It is also possible that readers do not commit themselves to an analysis of the structure before they read the word after the sequence of nominative NPs. Though this is an interesting issue in its own right, it goes beyond the focus of the present study. The crucial facts are that the first nominative NP is unambiguously a subject and that the biclausal, embedded complement structure is unambiguously specified once the accusative NP is read.

2.1. Method

2.1.1. Participants

Twenty-four Koreans in their twenties or thirties from the UNC Chapel Hill community participated in the experiment. All were native Korean speakers and received $10 for their participation. All had normal or corrected-to-normal vision.

2.1.2. Design and procedure

Forty-eight sentences were created with different matrix verbs and embedded verbs, and the test sentences were mixed with an equal number of unrelated filler sentences to form a single list of 96 sentences. In the region of the two critical NPs, all words had three syllables. There was also a sentence-initial adverb in all sentences. Four counterbalanced lists were created such that each experimental sentence appeared in only one condition in a list. Across lists, every experimental sentence occurred in all conditions. There were 10 initial warm-up sentences followed by 48 experimental and 48 filler sentences. Appendix A shows the experimental stimuli.
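The counterbalancing scheme described above can be sketched as a Latin-square rotation. This is only an illustration, not the authors' actual procedure: the rotation rule, the function name `build_lists`, the condition labels, and the resulting equal split of 12 items per condition per list are all assumptions introduced here.

```python
from collections import Counter

# Condition labels are illustrative names for the four NP-type pairings.
CONDITIONS = [
    "pronoun-pronoun",
    "description-description",
    "pronoun-description",
    "description-pronoun",
]


def build_lists(n_items=48, conditions=CONDITIONS):
    """Return one {item_index: condition} mapping per counterbalanced list.

    Assumed rule: list k shows item i in condition (i + k) mod 4, so each
    item appears once per list and in every condition across lists.
    """
    n_cond = len(conditions)
    lists = []
    for list_idx in range(n_cond):
        assignment = {
            item: conditions[(item + list_idx) % n_cond]
            for item in range(n_items)
        }
        lists.append(assignment)
    return lists


lists = build_lists()
# Tally how many items fall in each condition within the first list.
print(Counter(lists[0].values()))
```

Under this assumed rotation, each participant sees every experimental sentence exactly once, and the four conditions are equally represented within each list.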

Participants performed the sentence reading task while wearing an SMI Eyelink eye-movement tracking device. The eye-tracker sampled pupil location at a rate of 250 Hz and it parsed the samples into fixations and saccades. The device was calibrated before the experiment began and this calibration was validated on a fixation point before each trial. After the fixation validation, participants were shown a sentence. Each Korean character subtended slightly more than one degree of visual angle, though this measure was variable because participants could move their heads. The participants were instructed to read the sentences silently at a natural pace and to press the space bar as soon as they finished. Following this, a true/false comprehension statement was presented, and the participants responded by pressing “/” for true and “z” for false. Eye movements were recorded throughout the experiment. After the experiment, all participants were asked to rate the naturalness of one set of the experimental sentences. These experimental sentences were not shown during the experimental session.

2.1.3. Analyses and measures

Fixations of less than 80 ms that fell within the same word as an adjacent fixation were incorporated into larger fixations; otherwise they were deleted (e.g. Pickering, Traxler, & Crocker, 2000; Rayner, 1975, 1978). Short fixations made up 1.8% of total fixations; 0.8% were combined with longer fixations, and 1% were omitted. Fixations longer than 800 ms were trimmed to 800 ms. Only 0.4% of total fixations were longer than 800 ms. Multiple behavioral measures of sentence processing will be reported, including a metalinguistic measure, ratings of naturalness, as well as accuracy in answering comprehension questions. As online measures of sentence processing, gaze duration, right-bounded reading time, rereading time and regression-path duration will be reported. Gaze duration refers to the sum of all fixations on a region of the sentence before the eyes move out of the region to either the left or right (Rayner, 1998). The term first-pass reading time is preferred to gaze duration when the region consists of multiple words (Rayner & Pollatsek, in press), but gaze duration is sometimes used to refer to regions up to two words (Rayner, Warren, Juhasz, & Liversedge, 2004). We use the term gaze duration because our regions of interest consist of either one or two words. Right-bounded time is the sum of all fixations on a region before the first fixation to the right of the region (Calvo, 2001; Pickering et al., 2000; Sturt & Lombardo, 2005); right-bounded reading time has also been called quasi-first pass reading time (Traxler, Morris, & Seely, 2002). Rereading time is the total time spent fixating the region minus initial reading (gaze duration). Regression-path duration (sometimes called go-past time) counts all the time spent on the target and pre-target regions from the first fixation in a target region until fixating to the right of the target region (Liversedge, Paterson, & Pickering, 1998; Rayner & Duffy, 1986).
In addition to these measures, rereading time on the critical region after direct regression from the verbs will be reported separately. As argued by a number of researchers (e.g., Mauner, Melinger, Koenig, & Bienvenue, 2002; Pickering, Frisson, McElree, & Traxler, 2004; Rayner, 1998; Rayner, Sereno, Morris, Schmauder, & Clifton, 1989; among others), examining various eye movement measures is very important to obtaining a good picture of complex eye movement patterns across conditions. In particular, it is important to examine eye movement measures that aggregate fixations by temporal sequences, in addition to those that aggregate by region, because temporally contiguous measures like regression-path duration can provide an index of the time a subject has spent detecting a problem and then re-reading the text prior to fixating novel linguistic material (Liversedge et al., 1998). Cases where a word (or region) was skipped during first-pass reading were not included in the computation of gaze duration, right-bounded duration or regression-path duration (see Rayner, 1998 for a discussion of reasons not to use a duration of zero for instances where a word is skipped).
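The four region-based measures defined above can be made concrete with a short sketch. It is not the authors' analysis code. It assumes a trial is represented as a temporally ordered list of (region index, duration in ms) pairs that has already been cleaned, that regions are numbered left to right, and that a region counts as skipped in first pass if it is never fixated or is first fixated only after some region to its right. The function name and the handling of rereading time for skipped regions are likewise assumptions.

```python
def reading_measures(fixations, region):
    """Compute reading-time measures for one region of one trial.

    fixations: list of (region_index, duration_ms) in temporal order.
    Returns None for first-pass measures when the region was skipped.
    """
    total = sum(d for r, d in fixations if r == region)
    first = next(
        (i for i, (r, _) in enumerate(fixations) if r == region), None
    )

    # Assumed skip rule: no fixation at all, or the eyes had already
    # landed to the right of the region before first fixating it.
    skipped = first is None or any(r > region for r, _ in fixations[:first])
    if skipped:
        # Assumption: with no first pass, all time on the region is rereading.
        return {"gaze": None, "right_bounded": None,
                "regression_path": None, "rereading": total}

    # Gaze duration: consecutive fixations on the region until the eyes
    # leave it in either direction.
    gaze = 0
    for r, d in fixations[first:]:
        if r != region:
            break
        gaze += d

    # Right-bounded time and regression-path duration both stop at the
    # first fixation to the right of the region; regression-path also
    # counts time spent on earlier regions in the meantime.
    right_bounded = 0
    regression_path = 0
    for r, d in fixations[first:]:
        if r > region:
            break
        regression_path += d
        if r == region:
            right_bounded += d

    return {"gaze": gaze, "right_bounded": right_bounded,
            "regression_path": regression_path, "rereading": total - gaze}


# Hypothetical trial: the reader regresses from region 2 to region 1,
# returns to region 2, moves on to region 3, then revisits region 2.
trial = [(0, 200), (1, 220), (2, 250), (1, 180), (2, 210), (3, 240), (2, 190)]
print(reading_measures(trial, 2))
```

For region 2 of this made-up trial, gaze duration is 250 ms, right-bounded time is 460 ms, regression-path duration is 640 ms, and rereading time is 400 ms, which shows how the measures diverge once a regression occurs.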

2.2. Results

2.2.1. Preliminaries

Some basic measures of eye movements were calculated over the entire set of stimuli (experimental stimuli and fillers). Although these basic eye-movement characteristics are not the focus of this research, this was done because to our knowledge no papers have yet been published that present data on eye movements during the reading of Korean. In contrast, a great deal is known about eye movements during the reading of English (see Rayner, 1978, 1998 for reviews) and increasingly about Chinese (e.g. Chen, Song, Lau, Wong, & Tang, 2003; Chen & Tang, 1998; Inhoff & Liu, 1998; Rayner, Li, Juhasz, & Yan, 2005). In the current study, the average fixation duration was 216 ms, which is squarely in the range (200–250 ms) reported for the reading of English (Rayner, 1998) and similar to the value of 220 ms reported for Chinese (Inhoff, Liu, & Tang, 1999). The average length of a forward saccade was 3.4 characters as compared to 7–8 letters for English (Rayner, 1998) and 2.6 characters for Chinese (Chen et al., 2003). Presumably, this variation across languages results from differences in the horizontal compactness of the writing systems, with English being the least compact because it uses alphabetic characters, Korean intermediate because it uses syllabic characters and Chinese the most compact because it uses logographic characters.

The average landing position of the first fixation on a word was slightly to the right of center for words that were two characters or less, and slightly to the left of center for words that were two characters or more. Qualitatively similar patterns are observed for English words (e.g. Radach & McConkie, 1998; Vitu, 1991). The overall first-pass skipping rate was 9.8% for the Korean data. For English, Rayner et al. (2005) offer an overall estimate of 20%, though skipping rate varies inversely with the number of letters in a word (Brysbaert & Vitu, 1998; Rayner, 1998). For Chinese, estimates of skipping rates range from 10% to 42% (Chen et al., 2003; Rayner et al., 2005; Tsai, Lee, Tzeng, Hung, & Yen, 2004). Regressive eye movements constituted 24% of the first-pass saccades, which is higher than the rates reported for English (15–20%) (Rayner & Pollatsek, 1989) and Chinese (10%) (Chen et al., 2003). The relatively low skipping rate and relatively high rate of regressive eye movements observed here are likely due to our focus on Korean sentences that are relatively complex. Studies examining the reading of English have shown that the proportion of regressive saccades varies with the difficulty of the reading material (e.g. Rayner & Pollatsek, 1989). In sum, these basic characteristics of eye movements during the reading of Korean are either very similar to those observed for other languages or differ from other languages in ways that can be understood in a straightforward way by considering orthographic variation across languages. These results support the use of eye movement measures as a tool to study higher-level comprehension processes in Korean as has been done for other languages.

2.2.2. Effects of experimental manipulations

Table 1 shows the mean percent correct for comprehension questions and the mean naturalness ratings for the four sentence types. There were no significant differences between the conditions in accuracy rates or rated naturalness.

Table 1.

Accuracy and naturalness rating (1–7 scale, with 1 being most natural) in Experiment 1

Measure               Desc–Desc  Pron–Pron  Desc–Pron  Pron–Desc
Accuracy              .94        .95        .96        .96
Naturalness rating    2.67       2.50       2.56       2.53

Table 2 shows measures of reading time for five regions in the sentence: initial adverb, a critical region comprised of the two sentence-initial subject NPs (NP1 and NP2), the object (NP3), the embedded verb, and the matrix verb. For the sentence-initial adverb, the three measures of first-pass reading (gaze duration, right-bounded time and regression-path duration) are identical by definition, so we only report gaze duration and rereading for that region. For the matrix verb at the end of the stimulus sentence, regression-path duration is not meaningful because readers cannot read past the end of the sentence, therefore that measure is not reported. All of the relevant reading time measures (gaze duration, right-bounded time, regression-path duration and rereading time) are reported for the critical NP region, the object and the embedded verb.

Table 2.

Reading times for various measures in Experiment 1

Match type                      Match                 Non-match
Sentence type                   Desc–Desc  Pron–Pron  Desc–Pron  Pron–Desc
Gaze duration
 Adverb                         345        321        332        331
 Critical region (NP1 + NP2)    599        590        649        517
 Object                         243        249        241        233
 Embedded verb                  359        333        353        333
 Matrix verb                    242        224        268        215
Right-bounded reading
 Critical region (NP1 + NP2)    732        708        770        615
 Object                         297        312        324        261
 Embedded verb                  394        389        401        377
 Matrix verb                    384        333        398        305
Regression-path duration
 Critical region (NP1 + NP2)    766        756        811        644
 Object                         424        468        490        342
 Embedded verb                  491        672        511        523
Rereading
 Adverb                         204        168        208        213
 Critical region (NP1 + NP2)    907        866        822        612
 Object                         228        286        248        182
 Embedded verb                  207        198        190        146
 Matrix verb                    111        79         115        65
Time on critical region after regression from verbs
 Critical region (NP1 + NP2)    543        572        520        430

2.2.3. Gaze duration

For the critical NP region, gaze duration was longer when the matrix subject (NP1) was a description (624 ms) as compared to when it was a pronoun (554 ms), an effect that was close to significant by participants and significant by items, [F1(1,23) = 3.82 MSe = 443,009, p = .063, F2(1,47) = 15.44 MSe = 83,295, p < .001]. Gaze durations for this region were longer when the embedded subject (NP2) was a pronoun (620 ms) as compared to when it was a description (558 ms), [F1(1,23) = 14.92 MSe = 62,603, p < .001, F2(1,47) = 7.19 MSe = 136,544, p < .01]. The experimental manipulations did not significantly affect gaze duration for any other region of the sentence.
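The condition means quoted in the text are marginal means over the four cells of the design. A minimal Python sketch of this arithmetic follows, using the gaze-duration cell means for the critical region from Table 2. It assumes that every cell contributes equally, so that a marginal mean is the simple average of the two relevant cell means; the dictionary layout and the helper name `marginal_mean` are illustrative only.

```python
# Gaze duration (ms) on the critical region, cell means from Table 2.
# Keys are (type of matrix subject NP1, type of embedded subject NP2).
gaze_critical = {
    ("desc", "desc"): 599,
    ("pron", "pron"): 590,
    ("desc", "pron"): 649,
    ("pron", "desc"): 517,
}


def marginal_mean(cells, position, np_type):
    """Average the cells whose NP at `position` (0 = NP1, 1 = NP2) is `np_type`.

    Assumes equal weighting of cells (a balanced design).
    """
    values = [v for key, v in cells.items() if key[position] == np_type]
    return sum(values) / len(values)


np1_desc = marginal_mean(gaze_critical, 0, "desc")  # 624.0
np1_pron = marginal_mean(gaze_critical, 0, "pron")  # 553.5
np2_pron = marginal_mean(gaze_critical, 1, "pron")  # 619.5
np2_desc = marginal_mean(gaze_critical, 1, "desc")  # 558.0
print(np1_desc, np1_pron, np2_pron, np2_desc)
```

These values agree with the means reported above to within rounding.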

2.2.4. Right-bounded reading

For the critical region, right-bounded reading time showed significant effects of the type of the matrix subject NP, with the description condition (751 ms) taking longer to read than the pronoun condition (662 ms) [F1(1,23) = 5.72 MSe = 577,739 p < .05, F2(1,47) = 15.60 MSe = 128,889, p < .001] and of the type of the embedded subject NP, with the pronoun condition (739 ms) taking longer to read than the description condition (674 ms) [F1(1,23) = 20.25 MSe = 64,982, p < .001, F2(1,47) = 6.88 MSe = 176,019, p < .05]. Right-bounded times for the matrix verb region were longer when the matrix subject was a description (391 ms) as compared to when it was a pronoun (319 ms) [F1(1,23) = 4.18 MSe = 347,779, p = .052, F2(1,47) = 31.19 MSe = 31,508, p < .001].
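Each effect is reported with one test by participants (F1) and one by items (F2). For a factor with two levels that is varied within participants (or within items), the F statistic with 1 and n − 1 degrees of freedom equals the square of a paired t statistic computed on per-unit condition means. The sketch below illustrates that relation only; the data are hypothetical, the function name `within_f` is an assumption, and the MSe values reported in the text are not reproduced.

```python
import math
from statistics import mean, stdev


def within_f(cond_a, cond_b):
    """F(1, n - 1) for a two-level within-unit factor, as the squared paired t.

    cond_a and cond_b hold one mean per unit (participant for F1, item for F2),
    already averaged over the levels of the other factor.
    """
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t * t, (1, n - 1)


# Hypothetical per-participant means (ms) for four participants.
matrix_desc = [700, 760, 810, 690]
matrix_pron = [650, 700, 790, 660]
f_value, df = within_f(matrix_desc, matrix_pron)
print(round(f_value, 2), df)  # 19.2 (1, 3)
```

Under this relation, the degrees of freedom in the reported tests, (1,23) and (1,47), would correspond to 24 participants and 48 items.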

2.2.5. Regression-path duration

As expected based on the right-bounded reading time, regression-path duration in the critical region showed significant effects of the type of the matrix subject NP, with the description condition (789 ms) taking longer to read than the pronoun condition (705 ms) [F1(1,23) = 4.97 MSe = 463,130 p < .05, F2(1,47) = 14.67 MSe = 174,976, p < .001] and of the type of the embedded subject NP, with the pronoun condition (784 ms) taking longer to read than the description condition (705 ms) [F1(1,23) = 26.91 MSe = 75,258, p < .001, F2(1,47) = 7.40 MSe = 268,004, p < .01]. Regression-path duration in the object region showed significant (or marginal) effects of the matrix NP and the embedded NP, such that times were longer for sentences with a matrix description (457 ms) than for those with a matrix pronoun (405 ms) [F1(1,23) = 4.53 MSe = 208,161, p < .05, F2(1,47) = 4.26 MSe = 168,970, p = .051], and longer for sentences with an embedded pronoun (479 ms) than for those with an embedded description (383 ms) [F1(1,23) = 12.66 MSe = 177,234, p < .01, F2(1,47) = 8.11 MSe = 218,553, p < .01].

2.2.6. Rereading

Rereading times for the critical region (NP1 and NP2) and for the object region were longer when the matrix subject (NP1) and the embedded subject NP (NP2) were the same type as compared to when they were different types. This effect was significant for the critical region (same type: 886 ms vs. different types: 717 ms) [F1(1,23) = 13.67 MSe = 551,111, p < .001, F2(1,47) = 21.37 MSe = 348,293, p < .001] and for the object region (same type: 257 ms vs. different types: 215 ms) [F1(1,23) = 7.73 MSe = 54,300, p < .05, F2(1,47) = 11.00 MSe = 55,964, p < .01].
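Because the design crosses the type of NP1 with the type of NP2, the comparison of matched and non-matched sentences is the interaction contrast of those two factors. The short Python sketch below works through this with the rereading cell means for the critical region from Table 2; the variable names are illustrative, and equal weighting of cells is assumed.

```python
# Rereading time (ms) on the critical region, cell means from Table 2.
# First label = matrix subject (NP1), second label = embedded subject (NP2).
desc_desc, pron_pron = 907, 866  # matched conditions
desc_pron, pron_desc = 822, 612  # non-matched conditions

matched = (desc_desc + pron_pron) / 2      # 886.5
non_matched = (desc_pron + pron_desc) / 2  # 717.0
match_effect = matched - non_matched       # 169.5

# Interaction contrast: effect of NP2 type (description minus pronoun) when
# NP1 is a description, minus the same effect when NP1 is a pronoun.
interaction = (desc_desc - desc_pron) - (pron_desc - pron_pron)  # 339

print(matched, non_matched, match_effect, interaction)
```

The interaction contrast is exactly twice the match effect, so testing the match effect amounts to testing the interaction of matrix-subject type and embedded-subject type, while the main effects of each NP's type are separate contrasts.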

Rereading time includes time spent reading a region after regression from any point after that region. In addition to this overall measure, time spent rereading after regressions from the verb region was examined because the reading of the verbs occasions integration of the parts of the sentence. Rereading time after regression from the verb region was longer for the critical region (NP1 and NP2) when the initial NPs were of matched type than when they were not of matched type (same type: 557 ms vs. different types: 475 ms) [F1(1,23) = 4.78 MSe = 440,058, p < .05, F2(1,47) = 7.61 MSe = 341,683, p < .01]. In addition, the overall rereading measure showed longer times in the critical region and the object region when the embedded subject (NP2) was a pronoun as compared to when it was a description; this effect was significant for the critical region (pronoun: 844 ms vs. description: 760 ms) [F1(1,23) = 4.33 MSe = 635,472, p < .05, F2(1,47) = 4.15 MSe = 625,362, p < .05] and for the object region (pronoun: 267 ms vs. description: 205 ms) [F1(1,23) = 8.34 MSe = 122,713, p < .01, F2(1,47) = 7.85 MSe = 111,660, p < .01].

2.3. Discussion

The results of this experiment have clear implications for the similarity-based interference (Gordon et al., 2001, 2002, 2004) and accessibility-based (Gibson, 1998; Warren & Gibson, 2002) accounts of sentence complexity effects, and they also demonstrate additional ways in which NP type affects sentence processing. Before addressing those implications, it is important to note that the observed effects cannot solely reflect the consequence of differences in word frequency across conditions. For first-pass effects, reading times were shorter when the matrix subject was a pronoun as compared to when it was a description. Because pronouns are among the most frequent words in any language, it is possible that some (or even all) of this difference is due to the greater frequency of the pronouns as compared to the descriptions. However, reading time was longer when the embedded subject was a pronoun as compared to when it was a description, a finding that is directly opposite to standard word frequency effects and to what was observed for the matrix subject. For the rereading effects, the critical comparisons involve whether the matrix and embedded subjects were matched (pronoun–pronoun or description–description) or not matched (pronoun–description or description–pronoun), which means that characteristics of individual NPs (including word frequency) had the same impact on both conditions.

2.3.1. Similarity-based interference

The results of the experiment provide support for the similarity-based interference model and also demonstrate some effects that the model does not predict. Support for the model is found in the significant slowing of rereading of the two sentence-initial subject NPs when those NPs were the same type (two descriptions or two pronouns) as compared to when they were different types. This pattern is similar to that observed previously in English for sentences with relative clauses and clefts (Gordon et al., 2001, 2004, in press). Eye-tracking measures for the reading of English (Gordon et al., in press) showed effects of NP similarity on measures of early processing (right-bounded reading and regression-path duration) for the relative clause and the matrix verb. In contrast, similarity effects in the current experiment appeared in a measure of later processing (rereading). This difference between English and Korean in the timing of effects of similarity-based interference is not surprising given differences in the sentence structure of the two languages. Similarity-based interference is considered primarily to be a phenomenon of memory retrieval (Gordon et al., 2001, 2004). When a verb is encountered, information from the correct NPs must be retrieved from memory to fill the argument slots of the verb; this process is more difficult when NPs in memory are similar. For relative clauses in English, the verbs and NPs that must be integrated occur in the middle of the sentence, so the processing difficulty created by similarity-based interference in memory retrieval must be at least partially overcome for the reader to move past the RC and matrix verb with some understanding of the sentence. For embedded complements in Korean, the verbs occur at the end of the sentence and therefore the necessary memory retrieval operations occur late in sentence understanding. Further, those processes occur at the same time as more general sentence wrap-up processes (Rayner, Kambe, & Duffy, 2000), making it more difficult to precisely localize similarity-based interference effects in Korean as compared to English.

While similarity-based interference offers a good account of how the types of NPs in a sentence affect later stages of comprehension, it offers no account of the differential difficulty shown in measures of early processing for the two non-matched conditions. Sentences with description main clause subjects (NP1) and pronoun embedded subjects (NP2) were more difficult to process than any other kinds of sentences, while sentences with pronoun main clause subjects (NP1) and description embedded subjects (NP2) were the easiest. These findings show that early stages of sentence processing, before the parts of the sentence are integrated, are strongly influenced by the definiteness/topicality of NPs in the sentence. According to the similarity-based interference model, these characteristics of NPs do not mediate sentence complexity effects. However, the model does not address the question of whether definiteness or topicality influences other aspects of sentence processing.

2.3.2. Accessibility-based accounts

Accessibility-based models (Gibson, 1998; Warren & Gibson, 2002) do assign a major role to definiteness as a mediating factor in sentence complexity effects. According to these models, the number of discourse referents that intervene between a head and its dependent(s) determines how difficult it will be to integrate these elements, and this process is less costly when the integration step crosses more definite elements that are correspondingly more accessible. The finding in this experiment of decreased processing time when the matrix subject (NP1) was a pronoun as compared to when it was a description is broadly consistent with such models, though in Gibson (1998) it is stipulated that there is no cost associated with the matrix subject. However, these models are challenged by the finding that having the embedded subject (NP2) be a pronoun increases reading time in measures of early processing. Accessibility models predict the opposite pattern of facilitation for these NPs. Finally, on the measures of later processing, accessibility-based models do not account for why both types of matched conditions, including the pronoun–pronoun condition, cause increased processing difficulty.

2.3.3. Markedness effects on sentence processing

The difficulty seen in measures of early processing, when the matrix subject (NP1) was a description and when the embedded subject (NP2) was a pronoun, parallels the manner in which syntactic embedding correlates with degree of definiteness and topicality. It has been observed that non-pronominal lexical NPs strongly tend to have referents whose topic status is low, and that they often occur in backgrounded portions of a discourse. Lambrecht (1986, p. 317) enumerates five parameters associated with the higher and lower degree of topicality of an NP referent:

(17) High Topicality:                    Low Topicality:
     More salient referent               Less salient referent
     More anaphoric referent             Less anaphoric referent
     More specific referent              Less specific referent
     Higher transitivity of clause       Lower transitivity of clause
     Little or no syntactic embedding    Frequent syntactic embedding

In a quantitative study of spoken French corpora, Lambrecht (1986) showed that lexical subject NPs tend to appear more frequently in embedded clauses than other referential expressions of higher topicality. Of the 83 non-topicalized lexical subject NPs in the three corpora Lambrecht analyzed, 22 (26.5%) were embedded. But only 10 out of the 102 occurring topicalized NPs (9.8%) were found in embedded clauses.

This parallel between the reading time results in this experiment and the moderating effect of embedding on the association between subject status and NP type suggests that, at least in Korean, early stages of reading comprehension are influenced by the expected topicality of NP types in embedded and non-embedded positions.

3. Experiment 2

Experiment 2 was designed to further evaluate the similarity-based interference and markedness effects found in Experiment 1. It does so by replacing the pronouns used in Experiment 1 with names. Research on English has shown that names are similar to pronouns in that they have sufficient contrast with descriptions so that mixing them with descriptions reduces similarity-based interference in complex sentences (Gordon et al., 2001, in press). Thus, we predict that the effects of NP match in this experiment on Korean should be similar to those observed in the previous experiment.

In addition, names, while being less definite than pronouns, still are more definite than descriptions, leading to the prediction that the results on markedness in this experiment should mirror those of Experiment 1. Such a finding for the contrast between names and descriptions would suggest differences in the effects of topicality during the comprehension of Korean and English. Gordon et al. (in press) used eye tracking during the reading of English sentences to assess whether the inherent differences in topicality between names and descriptions would yield a processing advantage for names that were subjects as compared to names that were objects; this subject vs. object contrast parallels the matrix subject vs. embedded subject contrast in the sense of one syntactic position being more inherently associated with topical NPs than the other. Gordon et al. (in press) found no evidence that this type of definiteness affected reading comprehension in English. A finding of definiteness effects, like those seen in Experiment 1, would indicate that definiteness affects comprehension differently in Korean, a language with relatively free word order, and English, a language with relatively fixed word order.

3.1. Method

3.1.1. Participants

Twenty-four Koreans from the UNC Chapel Hill community participated in the experiment. All of them were native Korean speakers and received $10 for their participation. All had normal or corrected-to-normal vision and had not taken part in the first experiment.

3.1.2. Design and procedure

The design and procedure were as in Experiment 1.

3.1.3. Analyses and measures

All the analyses and measures were as in Experiment 1. Short fixations made up 2.7% of total fixations; 0.8% were combined with longer fixations and 1.9% were omitted. Fixations longer than 800 ms were treated as 800 ms fixations. Only 0.5% of total fixations were longer than 800 ms.
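The fixation-cleaning steps summarized above (combining or omitting short fixations, then capping long ones at 800 ms) can be sketched in Python as follows. Only the 800 ms cap is taken from the text. The 80 ms threshold for a short fixation, the one-character merging distance, and the function name `clean_fixations` are hypothetical values chosen for illustration; they should be replaced by the criteria given in the method section of Experiment 1.

```python
from typing import List, Tuple

Fix = Tuple[float, int]  # (horizontal position in characters, duration in ms)


def clean_fixations(
    fixations: List[Fix],
    short_ms: int = 80,       # hypothetical threshold for a "short" fixation
    merge_dist: float = 1.0,  # hypothetical merging distance, in characters
    cap_ms: int = 800,        # cap on long fixations, as stated in the text
) -> List[Fix]:
    """Merge short fixations into a nearby long neighbour, drop the rest,
    then cap the remaining durations at cap_ms."""
    durations = [d for _, d in fixations]
    keep = [d >= short_ms for _, d in fixations]
    for i, (pos, dur) in enumerate(fixations):
        if keep[i]:
            continue
        # Try the preceding, then the following fixation as a merge target.
        for j in (i - 1, i + 1):
            if (
                0 <= j < len(fixations)
                and keep[j]
                and abs(fixations[j][0] - pos) <= merge_dist
            ):
                durations[j] += dur
                break
        # A short fixation with no suitable neighbour is simply omitted.
    return [
        (fixations[i][0], min(durations[i], cap_ms))
        for i in range(len(fixations))
        if keep[i]
    ]


# Hypothetical input: one short fixation is merged, one dropped, one long capped.
raw = [(3.0, 210), (3.5, 60), (9.0, 50), (12.0, 950)]
print(clean_fixations(raw))  # [(3.0, 270), (12.0, 800)]
```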

3.2. Results

Table 3 shows the mean accuracy rates on the comprehension questions and the mean naturalness ratings for all four sentence types. There was no significant difference for the accuracy rates but there was a marginally significant difference by NP matching for the naturalness ratings. The Non-Matched NP conditions (2.62) were rated more natural than the Matched NP conditions (2.77) [F1(1,23) = 3.79 MSe = 1.67, p = .064, F2(1,47) = 3.75 MSe = 1.78, p = .059].

Table 3.

Accuracy and naturalness rating (1–7 scale, with 1 being most natural) in Experiment 2

Measure             Desc–Desc  Name–Name  Desc–Name  Name–Desc
Accuracy rate       .96        .94        .96        .96
Naturalness rate    2.84       2.70       2.61       2.63

Table 4 shows reading times for the five regions in the sentence that were analyzed in the previous experiment, using the same measures.

Table 4.

Reading times for various measures in Experiment 2

Match type                      Match                 Non-match
Sentence type                   Desc–Desc  Name–Name  Desc–Name  Name–Desc
Gaze duration
 Adverb                         324        315        328        334
 Critical region (NP1 + NP2)    555        540        582        520
 Object                         223        212        220        217
 Embedded verb                  263        257        240        271
 Matrix verb                    248        234        229        229
Right-bounded reading
 Critical region (NP1 + NP2)    731        673        745        622
 Object                         289        252        269        277
 Embedded verb                  349        334        319        329
 Matrix verb                    393        358        328        344
Regression-path duration
 Critical region (NP1 + NP2)    805        703        789        656
 Object                         473        364        404        411
 Embedded verb                  654        571        606        629
Rereading
 Adverb                         237        210        219        210
 Critical region (NP1 + NP2)    1114       994        917        916
 Object                         355        282        260        302
 Embedded verb                  271        204        208        219
 Matrix verb                    136        110        85         100
Time on critical region after regression from verbs
 Critical region (NP1 + NP2)    741        645        603        573

3.2.1. Gaze duration

For the critical NP region, gaze duration was longer when the matrix subject (NP1) was a description (569 ms) as compared to when it was a name (530 ms), an effect that was close to significant by participants and significant by items, [F1(1,23) = 3.30 MSe = 119,128, p = .082, F2(1,47) = 4.53 MSe = 99,815, p < .05]. The experimental manipulations did not significantly affect gaze duration for any other region of the sentence.

3.2.2. Right-bounded reading

For the critical NP region, right-bounded reading was longer when the matrix subject (NP1) was a description (738 ms) as compared to when it was a name (648 ms) [F1(1,23) = 19.87 MSe = 109,154, p < .001, F2(1,47) = 11.45 MSe = 180,438, p < .01]. For the object region, this measure showed longer reading times when the embedded subject (NP2) was a description (283 ms) as compared to when it was a name (261 ms) [F1(1,23) = 8.10 MSe = 11,693, p < .01, F2(1,47) = 4.28 MSe = 21,238, p < .05]. For the matrix verb region, right-bounded reading took more time when the matrix and embedded NPs were matched (376 ms) (both descriptions or both names) as compared to when they were not matched (336 ms) [F1(1,23) = 4.52 MSe = 71,513, p < .05, F2(1,47) = 6.05 MSe = 54,697, p < .05].

3.2.3. Regression-path duration

For the critical NP region, regression-path duration was longer when the matrix subject (NP1) was a description (797 ms) as compared to when it was a name (680 ms) [F1(1,23) = 18.38 MSe = 225,617, p < .001, F2(1,47) = 18.24 MSe = 233,821, p < .001]. Regression-path duration for the object region was longer when the matrix subject was a description (439 ms) as compared to when it was a name (388 ms), a pattern that was significant by participants but marginal by items [F1(1,23) = 7.89 MSe = 134,634, p < .01, F2(1,47) = 2.69 MSe = 223,780, p = .108]. This measure also showed an effect of the embedded subject, yielding longer reading times when it was a description (442 ms) as compared to when it was a name (384 ms) [F1(1,23) = 8.91, MSe = 201,658, p < .01, F2(1,47) = 4.28 MSe = 144,157, p < .05].

3.2.4. Rereading

Rereading times for the critical region (NP1 and NP2) and for the object region were longer when the matrix subject (NP1) and the embedded subject NP (NP2) were the same type as compared to when they were different types. This effect was significant for the critical region (1054 ms vs. 917 ms) [F1(1,23) = 6.12 MSe = 844,809, p < .05, F2(1,47) = 7.08 MSe = 859,696, p < .05] and for the object region (318 ms vs. 281 ms) [F1(1,23) = 4.22 MSe = 95,777, p = .051, F2(1,47) = 4.18 MSe = 93,139, p < .05]. In addition, rereading of the object region took longer when the embedded NP was a description (329 ms) as compared to when it was a name (271 ms) [F1(1,23) = 10.22 MSe = 92,589, p < .01, F2(1,47) = 9.29 MSe = 94,301, p < .01].

As in the last experiment, we also examined rereading time after regression from the verb region. For the critical region, this time was longer when the critical NPs were of matched type (693 ms) as compared to when they were of different types (588 ms) [F1(1,23) = 3.92 MSe = 737,742, p = .060, F2(1,47) = 6.51 MSe = 652,117, p < .05].

3.3. Discussion

Most of the results were similar to those of Experiment 1. With respect to similarity-based interference, rereading times were longer when the two adjacent subject NPs were of matched type as compared to when they were of different types, a pattern that was found for both the critical region and object region, as well as for the rereading of the critical region after regression from the verb region. These results again support the idea that the similarity of two adjacent NPs contributes to the processing in a manner that is consistent with the similarity-based interference model.

With respect to definiteness, results for the name/description manipulation paralleled those of the pronoun/description manipulation of the preceding experiment in the case of the matrix subject. Reading times were shorter when the matrix subject was a name as compared to a description for measures of early and relatively early comprehension of the sentence. These effects were found for the critical region consisting of the matrix and embedded subject NPs and for the following region consisting of the object NP. As with the previous experiment, these results indicate that comprehension is easier when the matrix subject of a Korean sentence is an NP with greater definiteness (pronoun or name) as compared to one with less definiteness (a description).

The results for the embedded subject NP are not consistent with those of Experiment 1, where we found that there were shorter reading times for the critical region when the embedded NP was a less definite description than when it was a more definite pronoun. The current experiment showed a trend in this direction but it was not significant. Further, it showed that reading times for the object region were longer on three measures (right-bounded reading, regression-path duration and rereading), when the embedded subject was a less definite description as compared to a more definite name. These significant main effects are difficult to interpret. In the case of rereading, the significant main effect is qualified by a significant interaction of type of embedded subject and type of matrix subject (i.e., by the match effect). In the case of the measures of early processing (right-bounded reading and regression-path duration) the effect, while significant, is small, and it occurs immediately after the critical region, where a non-significant difference in the opposite direction was found. This suggests that there may have been some tradeoff in processing between the critical region and the object region.

In summary, the results of Experiment 2 (which used descriptions and names) replicated and extended the findings of Experiment 1 (which used descriptions and pronouns) by showing that similarity-based interference is observed during the later stages of reading Korean embedded complement sentences and by showing that sentences with a more definite matrix subject show an early processing advantage as compared to those with a less definite matrix subject. The pattern of effects for the embedded subject in Experiment 2 was not consistent with that of Experiment 1, a result that is difficult to interpret because of possible tradeoffs in the reading of different regions of the sentence.

4. Experiment 3

Experiment 3 had two goals. The first was to see whether similarity-based interference occurs during the comprehension of Korean relative clauses; the second was to assess further the specificity and generality of markedness effects during the comprehension of Korean sentences. The experiment manipulates NP match (descriptions and names) in sentences with object-modifying, object-extracted RCs and also with the embedded complements used in the preceding experiments.

4.0.1. Similarity-based interference

Research on similarity-based interference in English has focused on object-extracted RCs (and also clefts) because they involve stacked NPs. As a head-final language, Korean has a greater number of structures with stacked NPs but not in the specific types of RC sentences that stack NPs in English (object-extracted RCs that modify subjects). In Korean, RCs are pre-nominal modifiers that accordingly do not interrupt a clause when modifying a matrix subject. However, when modifying a matrix object, object-extracted RCs in Korean appear directly after the matrix subject. This pattern is illustrated schematically in (19). It allows manipulation of the similarity of the matrix subject and the subject of the embedded clause, as shown in (20)–(23):

  • (19) [NP-nom (subj) [RC NP-nom (subj) e_i V1] NP_i-acc (obj) V2]

  • (20) Matched (Name–Name)
    • Yongjin-ika Eunsuk-ika hyepbakha-n chongcang-ul mannassta.
    • Yongjin-nom Eunsuk-nom threaten-comp president-acc met
    • ‘Yongjin met the president that Eunsuk threatened.’
  • (21) Matched (Description–Description)
    • pencipcang-i enronin-i hyepbakha-n chongcang-ul mannassta.
    • editor-nom journalist-nom threaten-comp president-acc met
    • ‘The editor met the president that the journalist threatened.’
  • (22) Non-matched (Name–Description)
    • Yongjin-ika enronin-i hyepbakha-n chongcang-ul mannassta.
    • Yongjin-nom journalist-nom threaten-comp president-acc met
    • ‘Yongjin met the president that the journalist threatened.’
  • (23) Non-matched (Description–Name)
    • pencipcang-i Eunsuk-ika hyepbakha-n chongcang-ul mannassta.
    • editor-nom Eunsuk-nom threaten-comp president-acc met
    • ‘The editor met the president that Eunsuk threatened.’

As discussed earlier, Korean center-embedded structures are locally ambiguous between a single-clause interpretation and a bi-clausal one through the second nominative NP. However, a clause boundary must be posited as the accusative-marked NP is read in complement clause sentences (as in (11)–(13)) or as the first verb is read for object-modifying object-extracted RC sentences (as in (20)–(23)). These are the critical locations where it is first clear that the single clause interpretation is incorrect.1 The manner in which the complement clause structure is determined has already been discussed. For the relative clause sentences, the NP after the verbal complex (consisting of the verb root and complementizer affix) makes it clear that the sentence contains an RC. The accusative marker on this NP requires it to be the object of the matrix clause, and the RC gap has to be posited. Since the NP within the RC is nominative, the RC gap (i.e., the empty position in the RC, indicated by e with the index i in (19)) has to be the logical object of the embedded verb. Thus, object-modifying object-extracted RCs in Korean set up the possibility of similarity-based interference between the initial NPs of the sentence, a possibility that is assessed in this experiment by examining how matched and non-matched NP types (names and descriptions) affect sentence processing.

4.0.2. Definiteness

Our findings in Experiments 1 and 2 showed consistent effects on sentence processing of the definiteness of the sentence-initial, matrix-subject NP. More definite NPs resulted in faster reading times than did less definite NPs. Less consistent effects were observed for the definiteness of the second, embedded-subject NP, with Experiment 1 showing that less definite descriptions led to faster reading than more definite pronouns, while Experiment 2 if anything showed that less definite descriptions led to slower reading times than more definite names. Experiment 3 modified the stimuli of Experiment 2 in order to provide a more sensitive test of whether there is a processing advantage associated with less definite embedded subjects. Greater sensitivity may be required when comparing names and descriptions, rather than pronouns and descriptions, because the definiteness difference between them is less. The stimuli were changed by increasing the length of the name and description NPs from three syllables to four syllables. Short descriptions (three syllables) were used in Experiment 1 so that the descriptions and pronouns could be matched in length. These same descriptions were carried forward to Experiment 2 and used with names of matched length. Having longer NPs is likely to increase reading time for the critical region, which should concentrate processing in that region rather than having it distributed across different regions of the sentence. In addition, an adverbial phrase was inserted after the critical region so that the syntactic structure (most notably whether the sentence contained an embedded complement or RC) could not be determined through parafoveal preview while reading the critical region. Again, this should allow observation of effects of NP type independently of any processing activity associated with determining the grammatical structure of the sentence.

In addition, the use of RCs in Experiment 3 allows additional exploration of the effects of topicality on sentence processing. In sentences with restrictive RCs there is a strong tendency for embedded NPs to convey given information and for matrix NPs to convey new information (Fox & Thompson, 1990; Francis et al., 1999; Gordon & Hendrick, 2005), a pattern that reverses the one found in Korean sentences with embedded complements. The function of RCs, identifying referents and anchoring them in discourse, provides a straightforward explanation of this pattern. In object-extracted RCs the subject NP of the RC provides an important mechanism for such anchoring while the head of the RC will generally be new to the discourse. Here we examine whether this shift in expectations about the topicality of matrix and embedded NPs extends to matrix NPs other than the head.

Experiments 1 and 2 have shown that easier processing is associated with matrix subject NPs that are more definite. This pattern should be present in the early portions of the sentence but is predicted to change at the point in the text where the sentence is unambiguously seen to contain an RC.

4.1. Method

4.1.1. Participants

Thirty-six students at Korea University served as participants in the experiment. They were native Koreans and received credit for an introductory psychology course for their participation. All had normal or corrected-to-normal vision.

4.1.2. Materials

Thirty-six experimental sentences from Experiment 2 were adapted and modified to create the complement clause stimuli, and 36 sentences from the stimuli used in Kwon, Polinsky, and Kluender (2004) were adapted and modified to create the relative clause stimuli. In addition to the experimental sentences, 76 filler sentences were created. As in the previous two experiments, we varied whether the matrix subject NP and the embedded subject NP were names or descriptions, but whereas the previous experiments used three-syllable words, all the critical nouns here were four-syllable words.

In addition, adverbial phrases were inserted immediately after the embedded subject so that participants could not determine the sentence type (complement or RC) through parafoveal preview as they were reading the critical NPs.

4.1.3. Design, procedure and equipment

The design, procedure and equipment were as in the previous experiments except there was no naturalness rating after the experiment.

4.1.4. Analyses and measures

The results were analyzed following the general strategy employed in the previous experiments, with adjustments made for design differences in this experiment. Analysis of gaze duration and right-bounded reading for the critical region was done jointly on both complement and RC sentences because the two types of sentences have the same structure through that region and because the gaze and right-bounded measures only include fixations that occur before participants have read past the region in question. Further, for the object relative clause sentences, rereading time on the critical region after direct regression from the object and the matrix verb is reported instead of rereading time after regression from the verbs. Other than that, all the analyses and measures were as in the previous experiments. For the complement clause structure, short fixations made up 5.2% of total fixations: 4.1% were combined with longer fixations and 1.1% were omitted. Only 0.4% of total fixations were longer than 800 ms. For the object relative clause structure, short fixations made up 4.6% of total fixations: 3.9% were combined with longer fixations and 0.7% were omitted. Only 0.1% of total fixations were longer than 800 ms.
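The reading-time measures named above can be made concrete with a minimal sketch. It assumes that a trial is recorded as a list of (region index, duration in ms) fixations with regions numbered left to right. The function name `reading_measures`, this data layout, and the definition of rereading as total time minus gaze duration are assumptions made for illustration; they are not taken from the experiments' analysis software.

```python
from typing import Dict, List, Tuple

Fixation = Tuple[int, int]  # (region index, duration in ms); hypothetical layout


def reading_measures(fixations: List[Fixation], region: int) -> Dict[str, int]:
    """Compute reading-time measures for one region on one trial."""
    gaze = right_bounded = regression_path = total = 0
    entered = False          # the region has been fixated at least once
    first_pass_open = False  # the eyes have not yet left the region
    passed_right = False     # a region further to the right has been fixated

    for reg, dur in fixations:
        if reg == region:
            total += dur  # total time counts every fixation on the region
        if reg > region:
            # Moving past the region closes all first-pass style measures.
            passed_right = True
            first_pass_open = False
            continue
        if passed_right:
            continue  # later fixations contribute to total time only
        if reg == region:
            if not entered:
                entered = True
                first_pass_open = True
            if first_pass_open:
                gaze += dur           # first visit only
            right_bounded += dur      # any visit before moving past the region
            regression_path += dur
        elif entered:
            # Regression to an earlier region before moving past this one.
            first_pass_open = False
            regression_path += dur

    return {
        "gaze": gaze,
        "right_bounded": right_bounded,
        "regression_path": regression_path,
        "rereading": total - gaze,  # assumed definition, see lead-in
    }


# Hypothetical trial: region 1 is read, regressed from, reread, and revisited.
trial = [(0, 200), (1, 250), (1, 100), (0, 150), (1, 120),
         (2, 300), (1, 180), (3, 200)]
measures = reading_measures(trial, region=1)
```

In this sketch a region that is skipped on first pass receives a gaze duration of zero; whether such trials are excluded or treated as missing is a separate analysis decision that the sketch does not address.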

4.2. Results

Table 5 shows the mean accuracy rates on the comprehension questions. There were no significant differences between versions for the complement clause sentences. However, for the object-extracted RC sentences, there were effects that were significant by items only for the type of the matrix subject (.82 vs. .87) [F2(1,35) = 4.92 MSe = .12, p < .05] and for NP matching (.83 vs. .86) [F2(1,35) = 4.54 MSe = .10, p < .05]. Participants responded more accurately when the matrix subjects were names, and the non-matched NP conditions were answered more accurately than the matched NP conditions.

Table 5.

Accuracies in Experiment 3

Desc–Desc Name–Name Desc–Name Name–Desc
Complement .93 .95 .96 .95
Object relative .81 .84 .83 .89
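Effects in this section are tested both by participants (F1) and by items (F2). As a rough sketch of how the two analyses differ, the code below aggregates the same trial-level records either over participants or over items and computes the F statistic for a two-level within-unit contrast, which equals the squared paired t statistic. The record layout (participant, item, condition, value), the condition labels "A" and "B", and the data are hypothetical; this is not the authors' analysis code.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def paired_f(trials, unit_index):
    """Return (F, error df) for condition A vs. B, aggregating by one unit.

    unit_index = 0 aggregates by participant (F1);
    unit_index = 1 aggregates by item (F2).
    Every unit must contribute observations to both conditions.
    """
    cells = defaultdict(lambda: defaultdict(list))
    for trial in trials:
        unit, condition, value = trial[unit_index], trial[2], trial[3]
        cells[unit][condition].append(value)
    # One difference score per unit: mean of A minus mean of B.
    diffs = [mean(c["A"]) - mean(c["B"]) for c in cells.values()]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t * t, n - 1


# Hypothetical reading times (ms): 3 participants x 4 items, with each item
# rotated across conditions over participants.
trials = [
    ("p1", "i1", "A", 500), ("p1", "i2", "B", 450),
    ("p1", "i3", "A", 520), ("p1", "i4", "B", 470),
    ("p2", "i1", "B", 480), ("p2", "i2", "A", 530),
    ("p2", "i3", "B", 440), ("p2", "i4", "A", 560),
    ("p3", "i1", "A", 510), ("p3", "i2", "B", 490),
    ("p3", "i3", "A", 505), ("p3", "i4", "B", 465),
]
f1, df1 = paired_f(trials, unit_index=0)  # by participants
f2, df2 = paired_f(trials, unit_index=1)  # by items
```

Because the two analyses average over different units, an effect can reach significance in one and not the other, as with the accuracy effects reported above that were significant by items only.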

Table 6 shows gaze duration and right-bounded reading of the critical region jointly for the complement and RC sentences.

Table 6.

Gaze duration and right-bounded reading times for critical region in Experiment 3

Match type Match Match Non-match Non-match
Sentence type Desc–Desc Name–Name Desc–Name Name–Desc
Adverb 476 491 461 465
Critical region (NP1 + NP2)
 Gaze duration 639 618 682 606
 Right-bounded reading 784 711 821 710
 Regression-path duration 820 749 872 755

Gaze duration was longer when the matrix subject was a description (661 ms) as compared to when it was a name (612 ms) [F1(1,35) = 13.55 MSe = 110,906, p < .001, F2(1,71) = 10.61 MSe = 147,220, p < .001]. In addition, gaze duration was longer when the embedded subject was a name (650 ms) as compared to when it was a description (620 ms) [F1(1,35) = 5.57 MSe = 90,539, p < .05, F2(1,71) = 3.71 MSe = 119,406, p = .058]. Right-bounded reading times were longer when the matrix subject was a description (803 ms) as compared to when it was a name (711 ms) [F1(1,35) = 23.99 MSe = 223,615, p < .001, F2(1,71) = 36.81 MSe = 148,125, p < .001]. Regression-path durations were longer when the matrix subject was a description (846 ms) as compared to when it was a name (752 ms) [F1(1,35) = 9.90 MSe = 676,169, p < .01, F2(1,71) = 28.18 MSe = 224,883, p < .001].
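The condition means quoted in the text are marginal means over the four cells of the 2 × 2 design (matrix subject type × embedded subject type). A small sketch of that arithmetic follows, using the gaze-duration cell means for the critical region from Table 6. The helper names `marginal` and `match_means` are hypothetical, and averages of rounded cell means can differ by a few milliseconds from the values reported in the text.

```python
# Gaze duration (ms) on the critical region, cell means from Table 6.
# Keys are (matrix subject type, embedded subject type).
gaze = {
    ("desc", "desc"): 639,
    ("name", "name"): 618,
    ("desc", "name"): 682,
    ("name", "desc"): 606,
}


def marginal(cells, position, np_type):
    """Mean over cells whose NP at `position` (0 = matrix, 1 = embedded)
    is of the given type."""
    values = [v for key, v in cells.items() if key[position] == np_type]
    return sum(values) / len(values)


def match_means(cells):
    """Mean of the matched cells and mean of the non-matched cells."""
    same = [v for (m, e), v in cells.items() if m == e]
    different = [v for (m, e), v in cells.items() if m != e]
    return sum(same) / len(same), sum(different) / len(different)


matrix_desc = marginal(gaze, 0, "desc")    # matrix subject is a description
matrix_name = marginal(gaze, 0, "name")    # matrix subject is a name
embedded_name = marginal(gaze, 1, "name")  # embedded subject is a name
embedded_desc = marginal(gaze, 1, "desc")  # embedded subject is a description
matched, non_matched = match_means(gaze)
```

The same two helpers apply to any row of Tables 6 to 8, which makes it straightforward to check a reported contrast against the tabled cell means.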

Table 7 shows reading time for the four regions in the sentence that were analyzed in the previous experiment using the same measures, excluding those measures that were analyzed jointly for the complement and RC sentences (Table 6).

Table 7.

Reading times for critical regions of complement clause sentences in Experiment 3

Match type Match Match Non-match Non-match
Sentence type Desc–Desc Name–Name Desc–Name Name–Desc
Gaze duration
 Adverb  403  370  397  403
 Object  206  221  217  209
 Embedded verb  292  290  284  298
 Matrix verb  223  224  216  225
Right-bounded reading
 Adverb  461  423  436  455
 Object  223  242  230  217
 Embedded verb  332  304  313  327
 Matrix verb  347  307  317  322
Regression-path duration
 Object  327  303  292  275
 Embedded verb  898  726  773  648
Rereading
 Adverb  379  431  409  424
 Critical region (NP1 + NP2) 1290 1172 1156 1106
 Adverb  421  381  385  438
 Object  130  105  109  112
 Embedded verb  164  122  129  142
 Matrix verb   97   66   79   77
Time on critical region after regression from verbs
1031  923  899  856

4.2.1. Gaze duration

For the object region, gaze duration showed longer reading times when the embedded subject was a name (219 ms) as compared to when it was a description (208 ms) [F1(1,35) = 4.15 MSe = 8681, p < .05, F2(1,35) = 4.14 MSe = 7370, p < .05]. The experimental manipulations did not significantly affect gaze duration for any other region of the sentence.

4.2.2. Right-bounded reading

For the object region, right-bounded reading showed longer reading times when the embedded subject was a name (239 ms) as compared to when it was a description (220 ms) [F1(1,35) = 4.45 MSe = 13,889, p < .05, F2(1,35) = 5.13 MSe = 10,996, p < .05]. Right-bounded reading for the sentence middle adverb region was longer when the embedded subjects were descriptions (458 ms) than when they were names (430 ms) [F1(1,35) = 4.39 MSe = 47,428, p < .05, F2(1,35) = 6.15 MSe = 47,789, p < .05].

4.2.3. Regression-path duration

No significant effects were observed for regression-path duration.

4.2.4. Rereading

Rereading times for the critical region (NP1 and NP2) were longer when the matrix subject (NP1) and the embedded subject NP (NP2) were the same type (1231 ms) as compared to when they were different types (1131 ms) [F1(1,35) = 5.22 MSe = 619,873, p < .05, F2(1,35) = 6.89 MSe = 470,084, p < .05].

Rereading time for the critical region after regression from the verb region was longer when the critical NPs were of matched type (977 ms) as compared to when they were of non-matched type (877 ms) [F1(1,35) = 4.81 MSe = 685,480, p < .05, F2(1,35) = 6.82 MSe = 549,070, p < .05].

Table 8 shows results for the RC sentences, providing the reading time measures for the four regions that we have examined previously for the complement sentences, excluding those measures that were analyzed jointly for the complement and RC sentences (Table 6). Note that in the RC sentences the embedded verb occurs before the object NP.

Table 8.

Reading times for critical regions of relative clause sentences in Experiment 3

Match type Match Match Non-match Non-match
Sentence type Desc–Desc Name–Name Desc–Name Name–Desc
Gaze duration
 Adverb  563  554  564  568
 Embedded verb  261  258  254  258
 Object  220  222  226  236
 Matrix verb  209  211  209  217
Right-bounded reading
 Adverb  754  728  742  754
 Embedded verb  278  270  270  271
 Object  238  250  247  267
 Matrix verb  372  348  363  362
Regression-path duration
 Embedded verb  317  378  346  446
 Object  314  308  365  318
Rereading
 Adverb  247  362  310  290
 Critical region (NP1 + NP2) 1518 1252 1276 1218
 Adverb  623  552  590  570
 Embedded verb  287  247  268  271
 Object  274  245  258  259
 Matrix verb  132  122  136  127
Time on critical region after regression from verbs
1220 1045 1015 1015

4.2.5. Gaze duration

There were no significant effects on this measure for the regions in Table 8.

4.2.6. Right-bounded reading

There were no significant effects on this measure for the regions in Table 8.

4.2.7. Regression-path duration

For the embedded verb region, regression-path durations were longer when the matrix subject was a name (412 ms) as compared to when it was a description (332 ms) [F1(1,35) = 6.52 MSe = 203,119, p < .05, F2(1,35) = 5.20 MSe = 348,304, p < .05].

4.2.8. Rereading

Rereading times for the critical region (NP1 and NP2) were longer when the matrix subject (NP1) and the embedded subject (NP2) were the same type (1385 ms) as compared to when they were different types (1247 ms) [F1(1,35) = 14.74 MSe = 419,399, p < .001, F2(1,35) = 8.97 MSe = 689,348, p < .01]. In addition, rereading time was longer when the matrix subject was a description (1397 ms) as compared to when it was a name (1235 ms) [F1(1,35) = 10.44 MSe = 818,172, p < .01, F2(1,35) = 13.62 MSe = 627,183, p < .001] and when the embedded subject was a description (1368 ms) as compared to when it was a name (1235 ms) [F1(1,35) = 5.92 MSe = 589,558, p < .05, F2(1,35) = 4.62 MSe = 754,890, p < .05]. Also, rereading time for the initial adverb region was longer when the matrix subject was a description (300 ms) as compared to when it was a name (276 ms) [F1(1,35) = 7.69 MSe = 95,345, p < .01, F2(1,35) = 4.09 MSe = 179,132, p = .051] and when the embedded subject was a description (336 ms) as compared to when it was a name (269 ms) [F1(1,35) = 10.44 MSe = 818,172, p < .01, F2(1,35) = 13.62 MSe = 627,183, p < .001].

Rereading times for the critical region were also examined after regression from the final two constituents of the sentence (object plus matrix verb in the RCs, as compared to embedded and matrix verbs in the complement sentences). These times were longer when the critical NPs were of matched type (1132 ms) as compared to non-matched type (1025 ms) [F1(1,35) = 10.32 MSe = 359,962, p < .01, F2(1,35) = 4.35 MSe = 678,096, p < .05].

4.3. Discussion

4.3.1. Similarity-based interference

An increase in reading time when the critical NPs are of the same type provides evidence supporting the idea that similarity-based interference affects memory retrieval during sentence comprehension. For both the embedded complements and the RC sentences, more time was spent rereading the critical NPs when they were the same type (two names or two descriptions) than when they were different types (a name and a description). This pattern was significant when the analysis was restricted to rereading that occurred after reading had progressed to the last two constituents of the sentence, where retrieval of the information associated with the NPs would be necessary in order to determine the arguments of the verbs. Thus, the results of this experiment replicate the pattern observed in the preceding experiments and extend it to sentences that contain RCs. This extension shows that similarity-based interference operates in Korean during the understanding of a type of structure that has been the focus of research in English. However, because Korean is a head-final language with pre-nominal RCs, the sentential roles of the NPs that must be stored in memory and retrieved as part of interpretation differ from the roles of the relevant NPs in English. Taken together, the results from English RCs, English clefts, Korean embedded complements and Korean RCs indicate that similarity-based interference impacts sentence comprehension when similar NPs must be held in memory before they are integrated into the meaning of the sentence. These results are not readily explained by an analysis of the relationship between types of NPs and the roles of those NPs in the sentence.

4.3.2. Markedness

On measures of initial processing of the critical NPs, reading times were faster when the matrix subject was a more definite NP (a name) than when it was a less definite NP (a description). This is the same pattern that was observed in Experiment 1, which used pronouns and descriptions, and in Experiment 2, which used names and descriptions. Gaze durations were shorter in this region when the embedded subject was a less definite NP (a description) than when it was a more definite NP (a name), an effect that was significant by participants and very close (p = .058) by items. This is the same pattern of definiteness effects that was observed in Experiment 1, which used pronouns and descriptions, but differs from Experiment 2, which like the current experiment used names and descriptions. Experiment 2, like this one, showed strong definiteness effects for the matrix subject, but showed inconsistent effects for the embedded subjects, which if anything were the opposite of what was found here. However, the current experiment was designed to be more sensitive through the use of longer NPs and the insertion of an adverbial phrase after the embedded subject in order to avoid preview effects. In sum, effects due to the definiteness of the matrix subject in sentences with embedded complements are consistent across the three experiments, with more definite NPs leading to easier comprehension. Effects due to the definiteness of embedded subjects are less consistent, but the results of this experiment tip the balance toward a pattern where having less definite NPs as embedded subjects leads to easier comprehension.

The RC sentences that were included in this experiment differ from the complement sentences in that the semantic function of the embedded RC creates a pressure for it to contain more definite, or given, NPs as compared to those in the matrix clause; that pattern reverses the one found in embedded complements. The results for the RCs showed that the initial processing advantage for a more definite matrix subject was reversed when reading progressed to the initial verb phrase. Regression-path durations were longer when the matrix NP was a more definite name as compared to when it was a less definite description. The initial verb phrase, where this effect is observed, is the point at which it is clear that the sentence contains an RC. As discussed earlier, RCs serve to ground the matrix clause in the current discourse and because of this they reverse the typical expectations about where new and given information reside within a sentence. The current results show that the effects of the conventional definiteness or givenness of NPs are shaped by a reader's emerging understanding of the structure of a sentence.

5. General discussion

The results of the three experiments are summarized schematically in Fig. 1, which shows the sequence of NPs and verbs in the embedded-complement and RC sentences that we studied. Initial processing of the two nominative NPs shows effects of the alignment of definiteness with syntactic position. Comprehension was easier (as indicated by shorter reading times) when the matrix subject was a more definite NP (pronoun or name) than when it was a less definite NP (a description). Comprehension was harder when the embedded subject was a highly definite NP (a pronoun) than when it was a less definite NP (a description). This pattern was also seen when a moderately definite NP (a name) was compared to a less definite NP (a description), though this effect was not observed in one experiment. This general pattern of effects, where processing was facilitated when a more definite NP was in a prominent syntactic position, reversed for the matrix subject when readers encountered information that indicated that the sentence contained a relative clause. Finally, comprehension was more difficult when both initial NPs of the sentence were of the same type (pronouns, names, or descriptions) when readers looked back at the initial NPs after reading to the later part of the sentence. Below, we discuss the implications of this pattern of results for understanding the nature of the memory processes used during language comprehension and for understanding the ways in which alignment of markedness values affects sentence processing.

Fig. 1.

A schematic depiction of how comprehension is affected by the definiteness and similarity of NPs as embedded-complement and RC sentences are read.

5.1. Memory and sentence processing

Our memories are far better for meaningfully integrated information than for less meaningful, list-like information, a fact that likely contributes to the incremental nature of language comprehension where linguistic input is interpreted more or less as it becomes available. Some sentences are difficult to understand, even though they contain no misleading local ambiguities, because they contain words and phrases that must be held in memory awaiting subsequent linguistic information that is necessary for integrating the parts of a sentence into meaningful representations. The difficulties that people have in understanding such sentences have provided evidence for developing and evaluating theories of the nature of the memory processes that support language comprehension. While most such theories have stressed the limited capacity of working memory (e.g., Gibson, 1998; Just & Carpenter, 1992; Wanner & Maratsos, 1978), our work has emphasized the susceptibility of human memory to interference due to similarity of the items that must be recalled (Gordon et al., 2001; Gordon et al., 2002; Gordon et al., 2004; Gordon et al., in press). The experiments reported in this paper provide evidence that is inconsistent with a current, capacity-based model of memory constraints on comprehending complex sentences (Warren & Gibson, 2002) and provide new information that helps localize the operation of memory interference during language comprehension.

According to Gibson's (1998) Dependency Locality Theory, comprehension difficulty increases with the demands imposed by remembering discourse entities that intervene between a filler and the site where it is attached. Warren and Gibson (2002) elaborated on Gibson's (1998) treatment of the memory demands of different types of NPs, replacing Gibson's characterization of indexical pronouns as unique in creating no demands on memory capacity with a graded treatment in which the demands caused by different types of intervening NPs vary with their position on the givenness hierarchy (Gundel et al., 1993); more given NPs are characterized as imposing smaller memory demands and therefore less difficulty in comprehension. The current experiments provide no support for this position and show that, if anything, comprehension of the embedded-complement and RC sentences was more difficult when the embedded subject NP, which intervened between the matrix subject and verb, was a more definite/given NP, a finding that was particularly clear in Experiment 1, which contrasted highly definite pronouns with descriptions. This effect is the opposite of what is predicted by the Warren and Gibson (2002) theory, though the occurrence of the effect early in processing creates some uncertainty about its implications for their theory, which does not indicate exactly when in the time course of processing effects of the definiteness of NPs should be observed.

The absence in Korean sentence comprehension of any facilitative effect of having the embedded subject be a highly definite NP provides important perspective on debates about the nature of memory constraints that have arisen from studies on the comprehension of complex sentences in English. Research on memory constraints in comprehending English has focused on object-extracted RCs because sentences containing such clauses are one of the few instances in English where unintegrated NPs are stacked in a sentence. Models addressing those memory constraints have differentially emphasized the roles of accessibility and similarity of NPs (Gordon et al., 2004; Warren & Gibson, 2002). Because the head of an RC is in most instances a description, the easiest way to create a difference between the head and the immediately following subject of the embedded clause is to have that subject be a more definite name or pronoun. Experiments that have done this have shown that comprehension is facilitated when the embedded subject is either a name or a pronoun (Gordon et al., 2001, 2004, in press; Warren & Gibson, 2002), a finding that can be explained both by memory-interference models and by accessibility models. The absence in Korean of a facilitative effect of having a more definite embedded NP suggests that memory-interference models provide a more general explanation than accessibility models of the effects of NP types on the ease of processing complex sentences. The high incidence of pronouns as subject NPs in English object-extracted RCs (Fox & Thompson, 1990; Gordon et al., 2004; Gordon & Hendrick, 2005) is consistent with the semantic function of an RC in identifying its head by relating it to given information, but that semantic effect on language use is not directly tied to experimental evidence concerning the difficulty of understanding complex sentences.

Difficulty in understanding the complex sentences studied here, embedded complements and RCs in Korean, was increased when the two initial nominative NPs in the sentence were of the same type. This effect was observed in the re-reading times of those NPs after participants progressed toward the end of the sentence and then looked back at the NPs. The timing of this effect differs from that in English, where interference due to similar NPs is observed in the middle of a sentence (Gordon et al. ms). This difference suggests that similarity-based interference is associated with the interpretation of NPs in relation to verbs, which have different sentential positions in the two languages. This leads to the conclusion that similarity-based interference is primarily a phenomenon of memory retrieval, not of the encoding or maintenance of information about NPs. It occurs when readers encounter information about the verb and must retrieve NP information from memory in order to establish the syntactic or semantic relations between the verb and NPs in the sentence. Previous results on English have not provided clear evidence on this issue because the proximity of the critical NPs and verbs has made it difficult to distinguish these aspects of memory.

The Korean sentences used in this experiment contained two NPs in succession that had the same case markers. As such, the order of the NPs was the only cue to their syntactic role, a situation that is comparable to object-extracted RCs in English. While Korean syntax leads to frequent stacking of NPs, successive NPs usually have different case markers. The current experiments provide no evidence about whether case markers lead to more easily retrievable representations of the syntactic role of an NP than does linear order. Nonetheless, it is tempting to speculate that they do because otherwise the memory demands of understanding head-final languages such as Korean might be excessive.

5.2. Alignment of markedness values

The manner in which different hierarchies are aligned has generated considerable interest in linguistics (Croft, 1990; Givón, 1979; Aissen, 2003). Across many studies there is clear evidence that unmarked values tend to go with unmarked values while marked values go with marked values. This pattern of alignment is seen as a statistical preference in some languages while in others it is a grammatical constraint (Givón, 1979; Bresnan et al., 2001). The use of particular types of NPs provides a mechanism for marking the accessibility of the discourse entity that is being referred to. The sequence of pronoun, name, definite description defines a hierarchy that goes from more definite to less so (Croft, 1990; Aissen, 2003). A comparable hierarchy is seen in grammatical positions, with subjects referring to more given information than non-subjects. The NP and grammatical-position hierarchies align such that NP types that mark given information go with grammatical positions that are also associated with given information, while NP types that mark less given information go with grammatical positions that are not associated with given information. While these patterns of alignment have received much attention in linguistics, they do not appear to have been examined in psycholinguistic studies of sentence processing.
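The alignment idea can be stated compactly as a toy sketch. It encodes the NP-type hierarchy described above (pronoun, name, description, from more to less definite) as numeric ranks and classifies a pair of NPs in an embedded-complement sentence, where the matrix subject is taken to be the more prominent position. The rank values, the function name `alignment`, and the three labels are hypothetical choices for illustration; this is not a processing model proposed in this paper.

```python
# Hypothetical numeric ranks for the NP-type hierarchy: higher = more definite.
DEFINITENESS = {"pronoun": 3, "name": 2, "description": 1}


def alignment(matrix_np: str, embedded_np: str) -> str:
    """Classify how NP types line up with syntactic prominence.

    The matrix subject is treated as the prominent position and the
    embedded subject as the non-prominent one (embedded-complement case).
    """
    matrix_rank = DEFINITENESS[matrix_np]
    embedded_rank = DEFINITENESS[embedded_np]
    if matrix_rank > embedded_rank:
        return "aligned"      # more definite NP in the prominent position
    if matrix_rank < embedded_rank:
        return "misaligned"   # more definite NP in the non-prominent position
    return "same type"        # no definiteness contrast between the two NPs


# The four name/description conditions of Experiments 2 and 3.
conditions = {
    "Desc-Desc": alignment("description", "description"),
    "Name-Name": alignment("name", "name"),
    "Desc-Name": alignment("description", "name"),
    "Name-Desc": alignment("name", "description"),
}
```

Under this encoding the Name-Desc condition is the aligned one and Desc-Name the misaligned one, while the two matched conditions carry no definiteness contrast and differ from the others only in NP similarity.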

The experiments reported here provide clear evidence that early phases in the processing of Korean are facilitated (as measured by shorter reading times) when the matrix subject of a sentence is a definite NP and also when the embedded subject is a less definite NP. This pattern of processing facilitation is consistent with the patterns of frequency of association and grammaticality that have been observed in linguistic research (Aissen & Bresnan, 2002; Bresnan et al., 2001). Two possible mechanisms come to mind as explanations for this facilitation.

The first explanation is that the facilitation is driven by experience, with processing being easiest when the association between NP type and grammatical position is one that has been previously encountered at a relatively high frequency. This type of mechanism is consistent with models where parsing preferences and mechanisms are seen as developing in response to the relative frequencies of patterns in linguistic input (e.g., Juliano & Tanenhaus, 1994; Jurafsky, 1996; MacDonald, Perlmutter, & Seidenberg, 1994; Townsend & Bever, 2001). While evidence on relative frequency of the occurrence of different types of NPs in different syntactic positions is consistent with this explanation, the correlation between relative frequency and ease of processing does not provide conclusive evidence that experience is the causal factor. Research relating the relative frequency of grammatical patterns to ease of processing has yielded a mixed record of correlations between the two measures. This research has been dogged by the “grain problem” (e.g., Gordon et al., 2004; Desmet & Gibson, 2003; Desmet, De Baecke, Drieghe, Brysbaert, & Vonk, 2006; Mitchell, Cuetos, Corley, & Brysbaert, 1995; Townsend & Bever, 2001), which refers to uncertainty about the level of linguistic analysis (e.g., frequency of co-occurrence of NP types in grammatical positions vs. frequency of co-occurrence of particular NPs as arguments of particular verbs) at which experience-driven theories should be tested.

The second explanation is that associations between NP type and grammatical position affect ease of processing because those associations are in some way better formed or more meaningful. On this view, preferred associations are not formed through experience with elements that are arbitrarily paired but instead reflect basic syntactic or semantic principles. Such principles could influence parsing preferences or could influence the ease of creating a model of the meaning of a sentence. The timing of the association effects that we observed is more consistent with processes that operate at relatively early stages of sentence comprehension (e.g., parsing) than with processes involved in deeper levels of comprehension.

A major question arising from the findings here is whether the effects of the association between NP types and grammatical position, observed here for Korean, occur in other languages. In Gordon et al. (in press), we varied the type of NPs (name vs. description) in subject and indirect object position in English sentences. No effects of the association between NP type and grammatical position were observed in two experiments that used the same eye-tracking-during-reading procedures that were used here for Korean. This difference between the results on Korean and English must be interpreted with caution. For one thing, the English studies contrasted subject and object positions while the Korean studies contrasted matrix and embedded subject positions. Further, the studies on English contrasted names and descriptions, which differ less in definiteness than do pronouns and descriptions. While the results of the current Experiments 2 and 3 show that the name/description contrast is sufficient to create association effects in Korean, contrasting pronouns and descriptions would provide a stronger test of whether there are association effects in English.

If the difference between Korean and English holds in future research, then differences between the languages could provide information that is helpful in understanding why associations between NP type and grammatical position affect comprehension in Korean. One possibility arises from the relatively flexible word order in Korean, at least as compared to English. The dominant word order in Korean is SOV but OSV sentences are possible and tend to occur in discourse contexts where the object constituent refers to given information. This flexibility may increase attention in Korean to the grammatical and/or serial position of constituents as indicative of how information is packaged in a sentence. A second possibility is that information about the structure of Korean sentences is not available until the end of a sentence because it is a head-final language. This creates difficulty in determining the structure of a sentence and may increase reliance on markedness as a source of information about the sentence.

6. Conclusion

The findings reported in this paper show that the types of NPs (i.e., pronouns, names or descriptions) in a sentence affect ease of comprehension in two distinct ways. First, the similarity of NPs that are stacked in a sentence makes it more difficult to retrieve the correct NP from memory when a verb is encountered that requires that NP as an argument. This effect is consistent with the view that similarity-based interference is a fundamental constraint on memory during sentence comprehension, a view that grounds characterization of the operation of memory during language comprehension in more general theories of memory that address non-linguistic phenomena. Second, the alignment of type of NP and syntactic prominence affects ease of sentence comprehension in a manner that is consistent with analyses of linguistic patterns: comprehension is easiest when more definite NPs are in prominent syntactic positions and when less definite NPs are in less prominent syntactic positions. This effect shows that the online interpretation of a sentence is guided in part by the alignment of the information packaged in different linguistic elements.

Acknowledgements

The research reported here was supported by BCS-0112231 and RO1 MH066271. We gratefully acknowledge the assistance of Nayoung Kwon, who provided the relative clause sentences that we used in Experiment 3. We also thank Marcus Johnson, Mary Michael, Scott Hajek, Xena Kim, Yuan Kwon and Gary Feng for help with many aspects of this research.

Appendix A

Examples of the stimuli from Experiments 1 and 2 are shown below. The entire set of stimuli is available upon request.

Interview professor/they/Hyunsu-NOM painter/we/Yonghee-NOM property-ACC all donated applauded

During the interview the professor/they/Hyunsu applauded that the painter/we/Yonghee donated all the properties.

Law office general/we/Myungsu-NOM director/she/Sunghee-NOM bribe-ACC received saw

At the law office the general/we/Myungsu saw that the director/she/Sunghee received the bribe.

Meal while chief/they/Minsu-NOM instructor/she/Yongmi-NOM contract-ACC violated exposed

While eating the chief/they/Minsu exposed that the instructor/we/Yongmi violated the contract.

Drinking owner/that person/Jinsu-NOM executive/we/Sunmi-NOM gamble-ACC was making told

During drinking the owner/that person/Jinsu told that the executive/we/Sunmi was making a gamble.

Appendix B

Examples of the stimuli from complement clause sentences of Experiment 3 are shown below. The entire set of stimuli is available upon request.

Interview researcher/Yongman-NOM nurse/Jihyun-NOM very respectfully property-ACC all donated applauded

During the interview the researcher/Yongman applauded that the nurse/Jihyun donated all the properties very respectfully.

Law office inspector/Jieun-NOM custom officer/Hosuk-NOM behind secretly bribe-ACC received saw

At the law office the inspector/Jieun saw that the officer/Hosuk received the bribe secretly.

Meal while manager/Inchul-NOM gangster/Jungmin-NOM very viciously contract-ACC violated exposed

While eating the manager/Inchul exposed that the gangster/Jungmin violated the contract very viciously.

Drinking tax collector/Yonghwan-NOM patent attorney/Jihyun-NOM very hopelessly gamble-ACC was making told

During drinking the tax collector/Yonghwan told that the patent attorney/Jihyun was making a gamble very hopelessly.

Appendix C

Examples of the stimuli from relative clause sentences of Experiment 3 are shown below. These sentences were adapted from stimuli developed by Nayoung Kwon (Kwon et al., 2004). Requests for information about the entire set of stimuli should be made to her (nayoung@ling.ucsd.edu).

Yesterday night editor/Yongchul-NOM reporter/Eunsuk-NOM bribery charge threatened-COM executive director-ACC met

Yesterday night the editor met the executive director whom the reporter threatened with a bribery charge.

Just now diplomat/Yongjin-NOM president/Jihyun-NOM press reception received-COM minister-ACC remembered

Just now the diplomat remembered the minister whom the president received at the press reception.

Expected repairman/Donghun-NOM driver/Sueun-NOM Seoul suburb guided-COM soldier-ACC saw

As expected the repairman/Donghun saw the soldier whom the driver/Sueun guided to suburban Seoul.

Any proof without union president/Yonghwan-NOM chairperson/Suhyun-NOM salary negotiation for met-COM director-ACC blamed

Without any proof the union president/Yonghwan blamed the director whom the chairperson/Suhyun met for a salary negotiation.

Footnotes

1

Note that Korean does not have relative pronouns and that, because it is a head-final language, its complementizers, which are verbal affixes, come at the end of the clause.

References

  1. Aissen J, Bresnan J. Optimality theory and typology. Course taught at the DGFS/LSA summer school ‘formal and functional linguistics’. Heinrich-Heine University; 2002.
  2. Aissen J. Markedness and subject choice in Optimality Theory. Natural Language and Linguistic Theory. 1999;17:673–711.
  3. Aissen J. Differential object marking: Iconicity vs. economy. Natural Language and Linguistic Theory. 2003;21:435–483.
  4. Ariel M. Accessing noun-phrase antecedents. Routledge; London: 1991.
  5. Bever TG. The ascent of the specious, or there's a lot we don't know about mirrors. In: Cohen D, editor. Explaining linguistic phenomena. Hemisphere; Washington: 1974. pp. 173–200.
  6. Bresnan J, Dingare S, Manning C. Soft constraints mirror hard constraints: voice and person in English and Lummi. In: Butt M, King TH, editors. Proceedings of the LFG 01 Conference; 2001. Online, CSLI Publications: http://www-csli.stanford.edu/publications.
  7. Brysbaert M, Vitu F. Word skipping: Implications for theories of eye movement control in reading. In: Underwood G, editor. Eye guidance in reading and scene perception. Elsevier; New York: 1998. pp. 125–147.
  8. Calvo MG. Working memory and inferences: Evidence from eye fixations during reading. Memory. 2001;9:365–381. doi: 10.1080/09658210143000083.
  9. Caplan D, Waters GS. Verbal working memory and sentence comprehension. Behavioral and Brain Sciences. 1999;22:77–94. doi: 10.1017/s0140525x99001788.
  10. Chen HC, Song H, Lau WY, Wong KFE, Tang SL. Developmental characteristics of eye movements in reading Chinese. In: McBride-Chang C, Chen H-C, editors. Reading development in Chinese children. Praeger; Westport, CT: 2003. pp. 157–169.
  11. Chen HC, Tang CK. The effective visual field in Chinese. Reading and Writing. 1998;10:245–254.
  12. Chung S. The design of agreement. Evidence from Chamorro. University of Chicago Press; Chicago: 1998.
  13. Croft W. Typology and universals. Cambridge University Press; Cambridge: 1990.
  14. Crowder RG. In: Principles of learning and memory. Melton AW, editor. Erlbaum; Hillsdale, NJ: 1976.
  15. Desmet T, De Baecke C, Drieghe D, Brysbaert M, Vonk W. Relative clause attachment in Dutch: on-line comprehension corresponds to corpus frequencies when lexical variables are taken into account. Language and Cognitive Processes. 2006;21:453–485.
  16. Desmet T, Gibson E. Disambiguation preferences and corpus frequencies in noun phrase conjunction. Journal of Memory and Language. 2003;49:353–374.
  17. Diesing M, Jelinek E. Distributing arguments. Natural Language Semantics. 1995;3:123–176.
  18. England N. Ergativity in Mamean (Mayan) languages. International Journal of American Linguistics. 1983;49:1–19.
  19. Estival D, Myhill J. Formal and functional aspects of the development from passive to ergative systems. In: Shibatani M, editor. Passive and voice. John Benjamins; Amsterdam: 1988. pp. 441–491.
  20. Foley AW, Van Valin RD., Jr. Functional syntax and universal grammar. Cambridge University Press; Cambridge: 1984.
  21. Ford M. A method for obtaining measures of local parsing complexity throughout sentences. Journal of Verbal Learning and Verbal Behavior. 1983;22:203–218.
  22. Fox AB, Thompson SA. A discourse explanation of the grammar of relative clauses in English conversation. Language. 1990;66:297–316.
  23. Francis SH, Gregory LM, Michaelis AL. CLS. Vol. 35. Chicago Linguistic Society; Chicago: 1999. Are Lexical Subjects Deviant? pp. 85–97.
  24. Garrod CS, Sanford A. The mental representation of discourse in a focused memory system: Implications for the interpretation of anaphoric noun phrases. Journal of Semantics. 1982;1:21–41.
  25. Garrod CS, Sanford A. On the real-time character of interpretation during reading. Language and Cognitive Processes. 1985;1:43–59.
  26. Garrod CS, Sanford A. Resolving sentences in a discourse context: How discourse representation affects language understanding. In: Gernsbacher MA, editor. Handbook of psycholinguistics. Academic Press; San Diego: 1994. pp. 675–698.
  27. Gerdts D. Object and absolutive in Halkomelem Salish. Garland; New York: 1988.
  28. Gibson E. Linguistic complexity: locality of syntactic dependencies. Cognition. 1998;68:1–76. doi: 10.1016/s0010-0277(98)00034-1.
  29. Gillund G, Shiffrin RM. A retrieval model for both recognition and recall. Psychological Review. 1984;91:1–67.
  30. Givón T. On understanding grammar. Academic Press; New York: 1979.
  31. Gordon PC, Chan D. Pronouns, passives, and discourse coherence. Journal of Memory and Language. 1995;34:216–231.
  32. Gordon PC, Grosz B, Gilliom L. Pronouns, names and the centering of attention in discourse. Cognitive Science. 1993;17:311–347.
  33. Gordon PC, Hendrick R, Johnson M. Memory interference during language processing. Journal of Experimental Psychology: Learning, Memory and Cognition. 2001;27:1411–1423. doi: 10.1037//0278-7393.27.6.1411.
  34. Gordon PC, Hendrick R, Johnson M. Effects of noun phrase type on sentence complexity. Journal of Memory and Language. 2004;51:97–114.
  35. Gordon PC, Hendrick R, Levine WH. Memory-load interference in syntactic processing. Psychological Science. 2002;13:425–430. doi: 10.1111/1467-9280.00475.
  36. Gordon PC, Hendrick R. Relativization, ergativity, and corpus frequency. Linguistic Inquiry. 2005;36:456–463.
  37. Gordon PC, Hendrick R, Johnson M, Lee Y. Similarity-based interference during language comprehension: evidence from eye tracking during reading. Journal of Experimental Psychology: Learning, Memory and Cognition. In press. doi: 10.1037/0278-7393.32.6.1304.
  38. Gundel J, Hedberg N, Zacharski R. Cognitive status and the form of referring expressions in discourse. Language. 1993;69:274–307.
  39. Hintzman DL. “Schema abstraction” in a multiple-trace memory model. Psychological Review. 1986;93:411–428.
  40. Inhoff AW, Liu W. The perceptual span and oculomotor activity during the reading of Chinese sentences. Journal of Experimental Psychology: Human Perception & Performance. 1998;24:20–34. doi: 10.1037//0096-1523.24.1.20.
  41. Inhoff AW, Liu W, Tang Z. Use of prelexical and lexical information during Chinese sentence reading: Evidence from eye-movement studies. In: Wang J, et al., editors. Reading Chinese script: A cognitive analysis. Lawrence Erlbaum; Mahwah, NJ: 1999. pp. 223–238.
  42. Juliano C, Tanenhaus MK. A constraint based lexicalist account of the subject. Journal of Psycholinguistic Research. 1994;23:459–471. doi: 10.1007/BF02146685.
  43. Jurafsky D. Probabilistic model of lexical and syntactic access and disambiguation. Cognitive Science. 1996;20:137–194.
  44. Just MA, Carpenter PA. A theory of reading: from eye fixations to comprehension. Psychological Review. 1980;87:137–172.
  45. Just MA, Carpenter PA. A capacity theory of comprehension: individual differences in working memory. Psychological Review. 1992;99:122–149. doi: 10.1037/0033-295x.99.1.122.
  46. Keenan EL. The logical diversity of natural languages. In: Harnad S, Steklis H, Lancaster J, editors. Annals of the New York Academy of Sciences. Vol. 280. 1976. pp. 73–91.
  47. Kim Y. Resolving grammatical marking ambiguities in Korean: An eye-tracking study. ms.
  48. King J, Just MA. Individual differences in syntactic processing: the role of working memory. Journal of Memory and Language. 1991;30(5):580–602.
  49. King J, Kutas M. Who did what and when? Using word- and clause-level ERPs to monitor working memory usage in reading. Journal of Cognitive Neuroscience. 1995;7:376–395. doi: 10.1162/jocn.1995.7.3.376.
  50. Kroeger P. Phrase structure and grammatical relations in Tagalog. CSLI Publications; Stanford: 1993.
  51. Kwon N, Polinsky M, Kluender R. Processing of relative clause sentences in Korean; Proceedings of AMLaP 2004 conference; 2004.
  52. Lambrecht K. Topic, focus, and the grammar of spoken French. University of California; Berkeley: 1986. Doctoral dissertation.
  53. Lee L, Ramsey R. The Korean language. SUNY Press; Albany, New York: 2000.
  54. Lewis RL. Interference in short-term memory: the magical number two (or three) in sentence processing. Journal of Psycholinguistic Research. 1996;25:93–115. doi: 10.1007/BF01708421.
  55. Liversedge SP, Paterson KB, Pickering MJ. Eye movements and measures of reading time. In: Underwood G, editor. Eye guidance in reading and scene perception. Elsevier; New York: 1998. pp. 55–75.
  56. MacDonald MC, Pearlmutter NJ, Seidenberg MS. The lexical nature of syntactic ambiguity resolution. Psychological Review. 1994;101:676–703. doi: 10.1037/0033-295x.101.4.676.
  57. Mauner G, Melinger A, Koenig J, Bienvenue B. When is schematic participant information encoded? Evidence from eye-monitoring. Journal of Memory and Language. 2002;47:386–406.
  58. Miller GA, Chomsky N. Finitary models of language users. In: Luce DR, Bush RR, Galanter E, editors. Handbook of mathematical psychology. Vol. 2. Wiley; New York: 1963.
  59. Mitchell DC, Cuetos F, Corley MMB, Brysbaert M. Exposure based models of human parsing: evidence for the use of coarse-grained (nonlexical) statistical records. Journal of Psycholinguistic Research. 1995;24:469–488.
  60. Myers J, O'Brien E. Accessing the discourse representation during reading. Discourse Processes. 1998;26:137–157.
  61. Pickering MJ, Frisson S, McElree B, Traxler MJ. Eye movements and semantic composition. In: Carreiras M, Clifton C Jr., editors. The on-line study of sentence comprehension. Psychology Press; New York: 2004. pp. 30–55.
  62. Pickering MJ, Traxler MJ, Crocker MW. Ambiguity resolution in sentence processing: Evidence against frequency-based accounts. Journal of Memory and Language. 2000;43:447–475.
  63. Prince FE. Toward a taxonomy of given-new information. In: Cole P, editor. Radical pragmatics. Academic Press; New York: 1981. pp. 223–255.
  64. Prince FE. The ZPG letter: subjects, definiteness, and information status. In: Thompson S, Mann W, editors. Discourse description: Diverse analyses of a fund raising text. John Benjamins; Philadelphia/Amsterdam: 1992. pp. 295–325.
  65. Radach R, McConkie GW. Determinants of fixation positions in words during reading. In: Underwood G, editor. Eye guidance in reading and scene perception. Elsevier; Oxford, England: 1998. pp. 77–100.
  66. Rayner K. The perceptual span and peripheral cues in reading. Cognitive Psychology. 1975;7:65–81.
  67. Rayner K. Eye movements in reading and information processing. Psychological Bulletin. 1978;85:618–660.
  68. Rayner K. Eye movements in reading and information processing: 20 years of research. Psychological Bulletin. 1998;124:372–422. doi: 10.1037/0033-2909.124.3.372.
  69. Rayner K, Pollatsek A. The psychology of reading. Prentice-Hall; New York: 1989.
  70. Rayner K, Pollatsek A. Eye movement control in reading. In: Traxler M, Gernsbacher M, editors. Handbook of psycholinguistics. 2nd ed. Elsevier; in press.
  71. Rayner K, Duffy S. Lexical complexity and fixation times in reading: effects of word frequency, verb complexity, and lexical ambiguity. Memory & Cognition. 1986;14:191–201. doi: 10.3758/bf03197692.
  72. Rayner K, Kambe G, Duffy S. The effect of clause wrap-up on eye movements during reading. The Quarterly Journal of Experimental Psychology. 2000;53A:1061–1080. doi: 10.1080/713755934.
  73. Rayner K, Li X, Juhasz B, Yan G. The effect of word predictability on the eye movements of Chinese readers. Psychonomic Bulletin & Review. 2005;12:1089–1093. doi: 10.3758/bf03206448.
  74. Rayner K, Sereno S, Morris R, Schmauder AR, Clifton C. Eye movements and on-line language comprehension processes. Language and Cognitive Processes. 1989;4:SI21–SI50.
  75. Rayner K, Warren T, Juhasz BJ, Liversedge SP. The effect of plausibility on eye movements in reading. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2004;30:1290–1301. doi: 10.1037/0278-7393.30.6.1290.
  76. Sturt P, Lombardo V. Processing coordinated structures: Incrementality and connectedness. Cognitive Science. 2005;29:291–305. doi: 10.1207/s15516709cog0000_8.
  77. Svartvik J. On voice in the English verb. Mouton; The Hague: 1966.
  78. Townsend D, Bever T. Sentence comprehension: The integration of habits and rules. We understand everything twice. MIT Press; Cambridge: 2001.
  79. Traxler M, Morris R, Seely RE. Processing subject and object relative clauses: evidence from the eye movements. Journal of Memory and Language. 2002;47:69–90.
  80. Tsai JL, Lee CY, Tzeng OJL, Hung DL, Yen NS. Use of phonological codes from Chinese characters: evidence from processing of parafoveal preview when reading sentences. Brain and Language. 2004;91:235–244. doi: 10.1016/j.bandl.2004.02.005.
  81. Vitu F. The existence of a center of gravity effect during reading. Vision Research. 1991;31:1289–1313. doi: 10.1016/0042-6989(91)90052-7.
  82. Wanner E, Maratsos M. An ATN approach to comprehension. In: Halle M, Bresnan J, Miller G, editors. Linguistic theory and psychological reality. MIT Press; Cambridge: 1978.
  83. Warren T, Gibson E. The influence of referential processing on sentence complexity. Cognition. 2002;85:79–112. doi: 10.1016/s0010-0277(02)00087-2.
