Author manuscript; available in PMC: 2014 Feb 1.
Published in final edited form as: Cognition. 2012 Nov 28;126(2):268–284. doi: 10.1016/j.cognition.2012.10.008

Language experience changes subsequent learning

Luca Onnis a,*, Erik Thiessen b
PMCID: PMC3800190  NIHMSID: NIHMS419966  PMID: 23200510

Abstract

What are the effects of experience on subsequent learning? We explored the effects of language-specific word order knowledge on the acquisition of sequential conditional information. Korean and English adults were engaged in a sequence learning task involving three different sets of stimuli: auditory linguistic (nonsense syllables), visual non-linguistic (nonsense shapes), and auditory non-linguistic (pure tones). The forward and backward probabilities between adjacent elements generated two equally probable and orthogonal perceptual parses of the elements, such that any significant preference at test must be due either to general cognitive biases or to prior language-induced biases. We found that language modulated parsing preferences with the linguistic stimuli only. Intriguingly, these preferences are congruent with the dominant word order patterns of each language, as corroborated by corpus analyses, and are driven by probabilistic preferences. Furthermore, although the Korean participants had received extensive formal explicit training in English and lived in an English-speaking environment, they exhibited statistical learning biases congruent with their native language. Our findings suggest that mechanisms of statistical sequential learning are implicated in language across the lifespan, and that experience with language may affect cognitive processes and later learning.

Keywords: Corpus analyses, Experience-dependent learning, Implicit learning, Linguistic typology, Prediction, Retrodiction, Second language acquisition, Sequential learning, Statistical learning, Transitional probabilities, Word order

1. Introduction

Over the course of language acquisition, discovering the units that organize linguistic utterances (such as words and phrases) is of particular importance. Statistical information has been argued to be an important cue to identifying linguistic units. For example, sounds within a word are more predictable than sounds across word boundaries, which may help infants discover words in fluent speech (Aslin, Saffran, & Newport, 1998; Saffran, Aslin, & Newport, 1996). Indeed, after exposure to the kinds of artificial languages used in statistical segmentation tasks, learners appear to treat the items they have segmented as discrete, unitized, word-like percepts (e.g., Giroux & Rey, 2009; Saffran, 2001). Furthermore, infants are better at learning labels for novel objects when those labels occurred as predictable syllable sequences (i.e., words) in the speech stream than when the syllable groupings were not predictable (Graf Estes, Evans, Alibali, & Saffran, 2007). Likewise, acquired sensitivity to statistical regularities between words may also be useful in parsing utterances into phrase constituents (Thompson & Newport, 2007), thus supporting the discovery of syntactic properties of a language. In general, unit boundaries tend to correlate with the points where the predictability of successive or spatially contiguous elements is lowest, while high probabilities promote perceptual grouping (Edelman, 2008; Goldstone, 2000).

Because information about the predictability of linguistic elements is present in all languages, statistical information may be a particularly important cue early in development, one that can be used without requiring prior experience with the native language (e.g., Thiessen & Saffran, 2003). But while statistical learning may be a universal cue to linguistic structure, it is also the case that the statistical structure across languages differs. Sensitivity to a particular predictable relation could be adaptive in one linguistic environment, but less so in another. A variety of research has examined how statistical learning helps learners adapt to the structure of their native language (e.g., Maye, Werker, & Gerken, 2002; Thiessen & Saffran, 2007). In this series of experiments, we ask whether statistical learning itself also changes as a function of linguistic experience. That is, does experience with the native language alter the kinds of statistical regularities that learners use to identify predictable relations among elements of the input in ways that are consistent with the predominant statistical structure of the native language?

Implicit in measures of ‘predictability’ is the assumption that mechanisms of learning, memory, and perception are inherently specialized for detecting forward-directed temporal relationships (Hawkins & Blakeslee, 2004; see Table 1). This assumption is shared by an established tradition conceptualizing language knowledge as the ability to predict incoming linguistic material based on previous context (Barr, 2007; Conway, Bauernschmidt, Huang, & Pisoni, 2010; Elman, 1990; Miller, Heise, & Lichten, 1951; Rubenstein, 1973). However, another useful source of information in the service of unitization documented in humans is retrodiction-based (Table 1). Jones and Pashler (2007) showed participants sequences of shapes governed by probabilistic relations, and then asked them to choose which shape reliably came after a probe shape (prediction test) or before a probe shape (retrodiction test). They found that prediction was never superior to retrodiction, and that both cues were used effectively in recall. Likewise, both adult participants (Perruchet & Desaulty, 2008) and infants (Pelucchi, Hay, & Saffran, 2009) can perceive word boundaries in a continuous sequence of language-like stimuli based on backward transition probabilities alone.

Table 1.

Two measures of predictability of sequential structure.

Measure Description Formula
Prediction For any given sequence of elements XY, the forward conditional probability of Y given X, calculated by normalizing the co-occurrence frequency of X and Y by the frequency of X fwd-TP(Y|X) = freq(XY)/freq(X)
Retrodiction Operationalized as the backward conditional probability of X given Y (in the sequence XY), calculated by normalizing the co-occurrence frequency of X and Y by the frequency of Y back-TP(X|Y) = freq(XY)/freq(Y)
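As a concrete illustration of these two measures, the following R sketch (R being the language used for all analyses reported below) computes both quantities for every adjacent pair in a toy sequence; the function name and toy data are illustrative, not part of the original study.

```r
# Sketch: forward and backward transitional probabilities for every
# adjacent pair in a sequence (cf. Table 1). Edge effects at the
# sequence boundaries are ignored for simplicity.
transition_probs <- function(x) {
  first  <- head(x, -1)
  second <- tail(x, -1)
  uni <- table(x)                     # freq(X), freq(Y)
  big <- table(paste(first, second))  # freq(XY)
  f_xy <- as.numeric(big[paste(first, second)])
  data.frame(
    first, second,
    fwdTP  = f_xy / as.numeric(uni[first]),   # fwd-TP(Y|X) = freq(XY)/freq(X)
    backTP = f_xy / as.numeric(uni[second])   # back-TP(X|Y) = freq(XY)/freq(Y)
  )
}

transition_probs(c("A", "X", "D", "Y", "B", "X", "E", "Y"))
```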

The fact that both prediction and retrodiction are informative in memory, perception, and language tasks suggests that learners need to combine these cues. In this study we asked whether experience with language-specific forms of predictive and retrodictive regularities emerging from different word orders may alter the way learners perceive groupings of syllables in otherwise unparsed sequences.

1.1. A link between prediction, retrodiction and word order structure in language

Our study started with a simple intuitive rationale. In English, the head elements in a phrase come first, while in Korean the head comes at the end of the phrase. The English sentence “I saw him go there” is glossed as “I him there go saw” in Korean. Likewise, “Give me the ball” is glossed as “Ball me give”, and “Let’s go get some food” is glossed as “Food get go let’s”. English is also prepositional (‘at school’), while Korean is postpositional (‘school at’). Thus, the most frequent constructions in English, such as transitives, imperatives, and exhortatives, have a reversed word order in Korean.

We conjectured that because the linear word order relations in English and Korean are often mirrored, different sets of expectancies for predictive and retrodictive dependencies may emerge during learning of each specific language. For example, in English the predictive probability of the noun school following the preposition at (p(school|at)) is lower than the retrodictive probability of at preceding school (p(at|school)), because many nouns can follow the preposition at, while given the noun school the preposition at is one of the few possible preceding words. Because the natural word order in Korean is the reverse of English (school at), the opposite predictive and retrodictive patterns should apply.

To corroborate these intuitions, we first conducted large-scale corpus studies of the two languages. Then an artificial grammar was created that inherently contained alternative patterns of predictive and retrodictive relations between adjacent elements. This grammar is equally parsable on the basis of predictive or retrodictive cues to structure, so any preference that learners demonstrate for one directionality over another must derive from previous biases that learners bring to the experiment. Thus, the grammar was used as a litmus test for assessing potential prior biases on learning. To establish whether the biases should be attributable to experience with language or general sequential biases, we tested the learnability of our grammar in a sequential learning task across speakers of the two languages with opposite word order in question – Korean and English – and in three different modalities: auditory linguistic (speech), visual non-linguistic (abstract shapes), and auditory non-linguistic (pure tones).

We further reasoned that if sequential learning mechanisms are directly involved in language acquisition and processing, as has been proposed, these mechanisms should show language-specific bias effects when adults are engaged in sequence-learning tasks with speech-like stimuli. In addition, if the bias is due to language experience – and not to some more general temporal processing bias – adult participants engaging in the same sequence task with non-linguistic stimuli (visual shapes and auditory tones) should behave consistently irrespective of their language background. Another possibility is that sequential learning mechanisms are shared among perceptual modalities and exhibit inherent a priori biases for sequences of stimuli, for example for predictive relations. In this latter case, we would expect a consistent pattern of preference across languages and modalities. Finally, there may be patterns of preference that are consistent across languages but differ by modality, in which case any effect may be attributable to modality-specific biases.

2. Corpus analyses

We quantitatively tested the hypothesis that word order tendencies in Korean and English generate opposite patterns of predictive and retrodictive conditional probabilities, which signal phrase cohesiveness and syntactic information.

2.1. Corpora

For English we used the SUSANNE Corpus, consisting of 130,000 words of published American English annotated with part-of-speech (POS) and syntactic information (Tree-bank). 1 For Korean we sampled the freely available Sejong Corpus, with a syntactically annotated subcomponent containing 800,000 words.2 For each sentence in the corpus, we derived unigram and bigram frequency counts as well as forward and backward transitional probability statistics between any two words. Ngram frequencies in English were sampled from the Google Ngram database for the year 2000 (~4 million unigrams and 60 million bigrams). Korean ngram token frequencies were summed over three different corpora: Sejong, HC Korean (55 million unigrams), and KAIST (70 million unigrams) in order to obtain reliable frequency counts. Finally, for each word pair in a sentence we derived the level of syntactic boundary inherent in the syntactic annotation. We then ran ordinal logistic regressions to predict the level of the syntactic constituent node (a measure related to phrase cohesiveness; see definition below) between any two adjacent words in a sentence in the corpus. The independent variables were the forward and backward transition probabilities between adjacent words, as well as the unigram (single word) and bigram (two words) frequencies. Details of our measures are described below.
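A minimal sketch of this step, assuming each sentence has already been tokenized into a character vector of words; variable names (and the one-row-per-bigram layout) are our own illustration rather than the authors' actual pipeline:

```r
# Sketch: one row per adjacent word pair (bigram token) in the corpus,
# carrying the regression predictors described in the text.
bigram_predictors <- function(sentences) {  # sentences: list of word vectors
  w1 <- unlist(lapply(sentences, function(s) head(s, -1)))
  w2 <- unlist(lapply(sentences, function(s) tail(s, -1)))
  uni  <- table(unlist(sentences))          # unigram frequencies
  big  <- table(paste(w1, w2))              # bigram frequencies
  f_xy <- as.numeric(big[paste(w1, w2)])
  f_x  <- as.numeric(uni[w1])
  f_y  <- as.numeric(uni[w2])
  data.frame(
    w1, w2,
    logBigram = log(f_xy),        # log bigram frequency
    logFreq1  = log(f_x),         # log frequency of first word
    logFreq2  = log(f_y),         # log frequency of second word
    fwdTP     = f_xy / f_x,       # forward transitional probability
    backTP    = f_xy / f_y        # backward transitional probability
  )
}
```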

2.2. Corpus measures of phrase cohesiveness

2.2.1. Independent variable I: Ngram frequencies

Because several psycholinguistic studies have shown that humans are sensitive to the logarithm of event frequencies rather than to raw frequencies, log-frequencies were used throughout. The log-frequency of a sequence of two words (logBigram) can be taken as an approximation of phrase cohesiveness, following Tremblay, Derwing, Libben, and Westbury (2011). The log-frequency of each individual word can also be useful in predicting headedness, as higher frequency words tend to be heads of phrase constituents (Gervain, Nespor, Mazuka, Horie, & Mehler, 2008).

2.2.2. Independent variable II: Conditional probability

Another way to measure how likely two words are to occur together is to look at a word and estimate which words are likely to follow it. The likelihood of a given word following another is the forward probability of the word pair. For example, for the sequence ‘in Sapporo’:

fwdTP(Sapporo | in) = freq(in Sapporo) / freq(in)

The calculation can also be computed in the opposite direction: that is, examine a word and estimate which words are likely to precede it. The likelihood of a given word preceding another is known as the backward probability between the two words.

backTP(in | Sapporo) = freq(in Sapporo) / freq(Sapporo)

For example, suppose the word “in” occurs 2853 times in the corpus, the word “Sapporo” occurs only nine times, and the sequence “in Sapporo” occurs three times. Since the word “in” occurs 2853 times, and only 3 of those times with the word “Sapporo”, this pair of words has a very low forward probability (3/2853). However, if we examine the pair from the opposite direction, we see that three out of the nine times the word “Sapporo” appears, it is preceded by the word “in”. Thus, the backward probability is 3/9, or .33.
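In the notation above, the two calculations for this example are:

fwdTP(Sapporo | in) = 3/2853 ≈ .001

backTP(in | Sapporo) = 3/9 ≈ .33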

2.2.3. Dependent variable: Phrase structure cohesiveness

To estimate the informativeness of transitional probabilities and frequencies in parsing at the constituent level, we followed Johnson (1965): Sentences from the tree-tagged Susanne corpus and Sejong corpus were divided up into phrasal constituents. For every word pair transition considered linearly from left to right, it is possible to rank-order the level at which a constituent node for that transition occurs. For example, in “[[[The house] [across [the street]]] [is burning]]” the highest node is at transition 5 (street_is), followed in rank by transition 2 (house_across), then transition 3 (across_the). Finally, transitions 1, 4, and 6 are tied on the same rank. Using the syntactically annotated corpora, every bigram in each sentence can be assigned a syntactic rank, following the example above.

Typological studies of the world’s languages have uncovered important correlations in the linear order of constituents across different subdomains: the constituent order of a clause (the relative order of subject, object, and verb); the order of modifiers (adjectives, numerals, demonstratives, possessives, and adjuncts) in a noun phrase; and the order of adverbials and adpositions. Most languages appear to have a preferred word order, which is usually also the most frequent. This ordering of constituents is often represented as a tree whose branches can be divided into other minor branches, which may branch in turn. English is often described as a right-branching language, because it tends to place dependents after the head words: nouns follow determiners, direct objects follow verbs, and adpositions are prepositional. This type of branching is also known as head-first order. Left-branching languages, like Korean and Japanese, exhibit the opposite tendency: they tend to place dependents before the head, so the head element of a phrase comes last. Objects appear to the left of verbs, sentences appear to the left of subordinating conjunctions, and noun phrases appear to the left of adpositions (which, for this reason, are often called postpositions in these languages). Since postpositions come after the noun in left-branching languages, our example phrase, “in Sapporo,” would actually be in the opposite order, “Sapporo in”.

While these typologies are well-established, the way that they influence transitional probabilities between elements of the input has not previously been studied. Are there systematic correlations between word order typology and language-specific probabilistic expectations between sequences of adjacent words in English and Korean? For example, considering the English sequence “in Sapporo” the forward probability is expected to be low, arguably because many words can follow “in” (Rome, New York, summer, me, the, lovely, etc.). Conversely, the backward probability should be high, because only a few words are expected before “Sapporo” (to, in). Thus a pattern of “forward low-backward high” probability (LoHi for short) is expected to indicate tighter constituents in English. If we express this combined pattern as the algebraic difference between the forward and the backward probability values (TPdiff), for any word pair in a sentence we should expect larger negative TPdiff values to be associated with more cohesive phrase units in the syntactically-tagged English corpus. Thus, the difference between forward and backward probabilities could be taken to predict the level of syntactic constituency between any word pair in the syntactically tagged corpora. Notably, for Korean the pattern of transition probabilities is expected to be reversed. For “Sapporo in”, the forward probability should be high relative to the backward probability. Thus, HiLo patterns are expected to be associated with tighter phrase boundaries in Korean. Using the same differential measure (TPdiff) between forward and backward probability, this time we can expect larger positive values of TPdiff associated with more cohesive phrase units in the syntactically-tagged Korean corpus.
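In symbols, for a word pair w1 w2, the differential measure is:

TPdiff(w1 w2) = fwdTP(w2 | w1) − backTP(w1 | w2) = freq(w1 w2)/freq(w1) − freq(w1 w2)/freq(w2)

so that LoHi patterns yield TPdiff < 0 (expected to mark cohesive phrases in English) and HiLo patterns yield TPdiff > 0 (expected to mark cohesive phrases in Korean).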

These predictions “by example” are by no means guaranteed to hold across the whole language. In the syntactic literature it has long been noted that the right-branching/left-branching dichotomy may not hold for an entire language, and in the case of English it is not fully consistent even at the phrasal level (for instance, for word ordering within a noun phrase; see Cook & Newson, 2007). Thus, it is important to evaluate whether these probabilistic biases are significantly and robustly correlated with word order across the two language corpora.

2.3. Results

All statistical analyses in this study were conducted in R (R Development Core Team, 2011). Ordinal logistic regressions were run to predict the syntactic tree level between any two members of word pairs (word1, word2, e.g., “in Sapporo”) in a sentence. The syntactic tree level was obtained from the syntactic parsing provided in the Susanne and Sejong corpora (henceforth English and Korean corpus respectively). Therefore, the tree level (henceforth tree) was the dependent variable to be predicted by the regression models. The following predictors were considered: log frequency of each bigram, forward probability, backward probability, log-frequency of first word, and log-frequency of second word. Because node levels above 6 were very infrequent in both corpora, we considered the first six node levels, accounting for 99.5% of bigrams in the English corpus (109,861 bigrams entered in the analyses) and for 98.1% of bigrams in the Korean corpus (22,382 bigrams entered).

2.3.1. Model fit

For each corpus, ordinal logistic regression models of increasing complexity were fitted and compared for goodness of fit. The null model contained no predictors; increasingly complex models then added logBigram, Forward Probability, Backward Probability, LogFrequency of first word, and LogFrequency of second word as predictors. Analyses of deviance between each pair of increasingly complex models indicated that every predictor except LogFrequency of second word significantly increased the fit of the regression model relative to the previous, less complex model by reducing deviance. This result held for both corpora. Thus, in the following analyses the log-frequency of the second word was excluded as a predictor. All other variables contributed significantly to predicting the level of syntactic node between any two adjacent words in a sentence. Using the lrm function in R we also assessed the goodness of fit of the models. Since the p-value of the G test statistic was effectively 0 in both language models, we can reject the null hypothesis of no overall relationship between the dependent variable tree and the independent variables. The predictive ability of each model can also be measured using C, an index of concordance between predicted probabilities and observed responses (C = 0.5 indicates random prediction; C = 1 indicates perfect prediction). With C = 0.71 for the English and C = 0.64 for the Korean corpus, both models have moderate to strong predictive capacity. Somers’ Dxy, a rank correlation between predicted probabilities and observed responses ranging from 0 (randomness) to 1 (perfect prediction), was 0.43 for English and 0.30 for Korean, again indicating moderate predictive capacity. Kendall’s Tau-a rank correlations between predicted probabilities and observed responses were 0.26 (English) and 0.16 (Korean).
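A sketch of this model-comparison step, assuming a data frame d holding one row per adjacent word pair with the predictors described above and the ordinal outcome tree (node rank, 1–6); the paper names the lrm function from the rms package, but the surrounding scaffolding here is our reconstruction:

```r
library(rms)  # provides lrm() for ordinal (proportional-odds) regression

d$TPdiff <- d$fwdTP - d$backTP   # forward minus backward probability

# Increasingly complex nested models, compared by analysis of deviance:
m1 <- lrm(tree ~ logBigram, data = d)
m2 <- lrm(tree ~ logBigram + TPdiff, data = d)
m3 <- lrm(tree ~ logBigram + TPdiff + logFreq1, data = d)
lrtest(m2, m3)   # likelihood-ratio test between nested fits (rms::lrtest)

# lrm's fit statistics include the indices reported in the text:
m3$stats[c("C", "Dxy", "Tau-a")]
```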

2.3.2. English corpus

We were particularly interested in the sign of the coefficients of the independent variables across the two corpora. Table 2 reports the coefficients, their standard errors, the (Wald) z-tests, and associated p-values. Coefficients are given in units of ordered log odds. For example, for a one unit increase in logBigram, we expect a 0.05 increase in the expected value of syntactic node rank (the dependent variable) on the log odds scale, given that all of the other independent variables in the model are held constant. For English, the coefficient for logBigram was significant and positive (see also Fig. 1, first row, first panel), indicating that more frequent bigrams are associated with higher (i.e., less cohesive) phrase boundaries. This result runs counter to the hypothesis that the frequency of a bigram can be used to partially predict phrase cohesiveness (Tremblay et al., 2011).3 Most importantly for the hypothesis being tested in this study, lower forward probabilities with concurrently higher backward probabilities (a LoHi pattern) were associated with higher phrase cohesiveness, as indicated by the positive coefficient for TPdiff (see Fig. 1, first row, third panel): larger (HiLo) values of TPdiff predicted higher, less cohesive nodes, while more negative (LoHi) values predicted more cohesive phrases. Remember that low syntactic levels indicate that the word pair tends to occur within the same phrase, or across a transition that is low in the syntactic tree. In addition, higher frequency of first words was associated with more phrasal cohesiveness (Fig. 1, first row, second panel), in accord with Gervain et al. (2008).

Table 2.

Coefficient estimates of ordinal logistic regressions of syntactic node rank in the corpus.

Independent variable Coefficient (ordered log odds) SE Wald Z
English corpus
logBigram 0.05 0.005 9.19a
logFrequency 1st word −0.41 0.007 −58.36a
TPdiff 3.25 0.051 63.04a
Korean corpus
logBigram 0.46 0.022 20.20a
logFrequency 1st word 0.50 0.017 28.79a
TPdiff −0.52 0.081 −6.33a

Note: Coefficients are in units of ordered log odds. For example, for a one unit increase in TPdiff, we expect a 3.25 increase in the expected value of Node Rank on the log odds scale, given all of the other variables in the model are held constant.

a p < 0.001.

Fig. 1. Partial effects of the three independent corpus variables entered in the ordinal logistic regressions. The Y axis indicates the probability of a higher constituent rank (i.e., a less cohesive phrase boundary) between any two words in the English and Korean corpora.

2.3.3. Korean corpus

LogBigram frequency was positively associated with syntactic depth, indicating again that more frequent bigrams tend to span less cohesive phrase boundaries (Fig. 1, second row, first panel). Crucially, the coefficient for TPdiff was now negative (reversed with respect to English), indicating that higher forward probabilities with concurrently lower backward probabilities (a HiLo pattern) were associated with higher phrase cohesiveness (Fig. 1, second row, third panel). In addition, higher frequency of first words was associated with less phrasal cohesiveness (Fig. 1, second row, second panel), in accord with Gervain et al. (2008).

2.3.4. Summary

When comparing English and Korean, the patterns of probability that support syntactic parsing are clearly reversed in the two languages, as predicted. In particular, phrase cohesiveness correlates with a LoHi pattern of transition probabilities in English, and with a HiLo pattern in Korean. Below we ask whether these language-specific patterns of probabilities are a source of experience-induced bias when learners group novel stimuli in a sequence learning task.

3. Experiment 1: Sequential learning with language-like stimuli

The corpus analyses above provide an empirical basis to test our main hypothesis that the predictive regularities most consistently experienced in one’s native language impose processing biases on human sequential learning (Table 3). A speech-synthesized stream of syllables was constructed so that two mutually exclusive sets of syllable groupings could possibly be perceptually parsed, according to either a bias for a LoHi probability pattern (as in English ‘to school’, Table 3), or a HiLo probability pattern (as in Korean ‘school to’, Table 3). Because the two sets were equally frequent (for the HiLo grouping, the mean frequency was M = 59.2, SD = 2.9; for the LoHi grouping, M = 59.3, SD = 3.2; difference ns; Cohen’s d = .03), a consistent preference for either of them would be indicative of a statistical learning bias developed prior to the experiment.

Table 3.

Forward and backward transition probabilities and frequencies associated with any two adjacent stimuli in Experiments 1–3.

Transition from Transition to Forward TP Backward TP Grouping type Frequency of 1st symbol Frequency of 2nd symbol Frequency of grouping
X D 0.33 1.00 LoHi 178 59 59
X E 0.34 1.00 LoHi 178 60 60
X F 0.33 1.00 LoHi 178 59 59
D Y 1.00 0.33 HiLo 59 178 59
E Y 1.00 0.34 HiLo 60 178 60
F Y 1.00 0.33 HiLo 59 178 59
Y A 0.34 1.00 LoHi 178 60 60
Y B 0.36 1.00 LoHi 178 64 64
Y C 0.30 1.00 LoHi 178 54 54
A X 1.00 0.34 HiLo 60 178 60
B X 0.98 0.35 HiLo 64 178 63
C X 1.00 0.30 HiLo 54 178 54

Note: Letter symbols stand for variables, which were instantiated as monosyllabic pseudowords in Experiment 1, abstract shapes in Experiment 2, and pure tones in Experiment 3. For example, given A only X can follow, with a forward TP(X|A) = 1. Given X, there is a .34 probability that A precedes it (backward TP(A|X) = .34). Frequencies of individual symbols and symbol pairs are also reported. Notice that because the sequence was generated by a true random process, probabilities and frequencies for the particular training sequence take slightly different values. This sequential arrangement yields at least two perceptual parses: groupings could emerge either when the forward transitional probability between adjacent symbols was high and the backward probability was low (HiLo groupings), or vice versa (LoHi groupings).
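The generative process summarized in Table 3 (and in Fig. 2 below) can be sketched in a few lines of R; the 711-symbol length follows the Method below, and function and variable names are ours:

```r
# Sketch: stochastic Markov template of Experiments 1-3.
# X -> D/E/F (p = 1/3 each); D, E, F -> Y; Y -> A/B/C (p = 1/3 each);
# A, B, C -> X.
next_symbol <- list(
  X = c("D", "E", "F"), D = "Y", E = "Y", F = "Y",
  Y = c("A", "B", "C"), A = "X", B = "X", C = "X"
)

generate_template <- function(n = 711) {
  s <- character(n)
  s[1] <- sample(names(next_symbol), 1)          # random starting symbol
  for (i in 2:n) s[i] <- sample(next_symbol[[s[i - 1]]], 1)
  s
}

set.seed(1)
template <- generate_template()
round(prop.table(table(template)), 2)  # X and Y about three times as frequent
```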

3.1. Method

3.1.1. Participants

Thirty-seven English monolingual students and 36 native Korean students participated. The Korean participants were enrolled in graduate programs at the University of Hawaii, and their scores on the TOEFL test of English as a Second Language were on the high end (M = 252.14 out of 300, SD = 16).

3.1.2. Materials

For Experiments 1–3, a template sequence of 711 letter symbols was generated according to the rules of a stochastic Markovian grammar chain (see Knowlton & Squire, 1996). The process started by choosing one of eight possible symbols (X, Y, A, B, C, D, E, F) at random, and then generating the next symbol according to the probabilistic sequencing rules specified in Table 3. For example, given the symbol X, three possible symbols could follow (D, E, or F), each with equal probability. Given any of these three symbols, say E, only one symbol could follow (Y), and so on. Table 3 specifies all possible transitions in the grammar (see also Fig. 2), as well as the frequencies of occurrence of the symbols. While the sequence templates for both training and test stimuli are common to Experiments 1–3, the letter symbols functioned as placeholders for different stimulus instantiations in each experiment, according to a specific modality. In Experiment 1, the sequence was realized as the continuous concatenation of eight monosyllabic words to form a pauseless 5-min speech stream. We assigned each letter placeholder to a given monosyllabic word (X = /fʊ/, Y = /dɪ/, A = /bʊ/, B = /ɹa/, C = /ti/, D = /ʃε/, E = /gε/, F = /ni/), with 80 ms for consonants and 260 ms for vowels. Because we were interested in the perception of grouping boundaries as driven by statistical biases alone, a speech synthesizer was used (MBROLA; Dutoit, 1997), eliminating possible prosodic cues to grouping boundaries. In addition, the sequence faded in and out for 5 s, giving the impression of an infinite loop. The Italian diphone set in MBROLA was chosen to make the words dissimilar to English and Korean, but still clearly perceivable in both languages, and to engage both groups of participants in a foreign language learning task. Phonemes had equivalent phonemic realizations in English and Korean, and all syllable sequences were phonotactically legal. No participant knew Italian. Importantly, whenever the forward probability was low between any two adjacent syllables (fwdTP(zi|ʃε) = .33), the backward probability was high (backTP(ʃε|zi) = 1), and vice versa (Table 3). At test, two groupings corresponding to a pattern of HiLo probability and LoHi probability were pitted against each other in a forced-choice task. None of the possible groupings was an actual syllable sequence in either language. Six test pair trials were presented in random sequential order, and the order within a pair was counterbalanced by repeating each test pair twice, for a total of 12 test trials.

Fig. 2. Upper half: a representation of the Markov rule chain used in Experiments 1–3. Arrows represent the possible continuations at time t + 1 for a given symbol at time t, together with the associated probabilities; values above arrows indicate forward probabilities, and values below arrows indicate backward probabilities. Lower half: a sample of the sequential template generated by the Markov process (first row), with the corresponding instantiation with syllable stimuli (second row) that English and Korean speakers listened to in the learning phase of Experiment 1. Perceptual parses could emerge during training either when the forward transitional probability between adjacent syllables was high and the backward probability was low (HiLo groupings, third row), or vice versa (LoHi groupings, fourth row). At test, participants were queried about which of these two sets of groupings they preferred.

3.1.3. Procedure

Stimuli were presented via headphones and the experiment was controlled by a computer program. Participants in each language group were randomly assigned either to the experimental condition, which included Training and Test, or to a control condition, which included the Test phase only (18 English native speakers, 21 Korean native speakers). This was to further ensure that any preference for a specific grouping was not due to syllable sequences sounding inherently familiar in English or Korean, irrespective of the training phase. In the experimental condition, participants listened to the training stream for 5 min, and were then presented with a two-alternative forced-choice task between pairs of LoHi and HiLo groupings. For each pair they were asked to choose which sound sequence formed a grouping in the novel language they had just heard. Instructions were administered in the native language of the participants.

3.2. Results

Participants’ responses were coded in terms of the proportion of endorsements of HiLo groupings (see Fig. 3); low endorsement rates for HiLo thus indicate preferences for LoHi groupings. A 2 (Language: Korean, English) × 2 (Condition: Experimental, Control) ANOVA revealed a main effect of Language (F(1, 72) = 11.22, p < 0.01) and a Language by Condition interaction (F(1, 72) = 10.67, p < 0.01). In particular, English native speakers exposed to training endorsed HiLo groupings at a mean proportion of 0.38, significantly below chance, i.e., a reliable preference for LoHi groupings (t(18) = −3.68, p = 0.0016, Cohen’s d = 0.70, 95% CI = 0.31 ≤ μ1 ≤ 0.44). Korean native speakers exposed to training preferred HiLo groupings, with a mean proportion of 0.59 (t(14) = 2.53, p = 0.024, Cohen’s d = 0.52, 95% CI = 0.51 ≤ μ1 ≤ 0.66). Thus, English and Korean participants reliably attended to the transitional probabilities that were most predictive of the canonical word order of their native language, as predicted by the corpus analyses. When presented with the test items alone without training, neither group showed a preference for either grouping above or below chance (English, M = 0.51, t(17) = 0.44, p = 0.67, 95% CI = 0.45 ≤ μ1 ≤ 0.58; Korean, M = 0.52, t(20) = 1.03, p = 0.32, 95% CI = 0.47 ≤ μ2 ≤ 0.57). This confirmed that the bias in the experimental condition was not due to inherent preferences for certain sound combinations in the test items.
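For concreteness, the analyses just reported could be run as follows in R, assuming a data frame resp with one row per participant and hypothetical columns hilo (proportion of HiLo endorsements), language, and condition; these names are ours, not the authors':

```r
# 2 x 2 between-subjects ANOVA on HiLo endorsement proportions:
summary(aov(hilo ~ language * condition, data = resp))

# One-sample t-tests against chance (0.5) within each trained group:
with(subset(resp, language == "English" & condition == "Experimental"),
     t.test(hilo, mu = 0.5))
with(subset(resp, language == "Korean" & condition == "Experimental"),
     t.test(hilo, mu = 0.5))
```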

Fig. 3. Proportion of endorsements of HiLo (versus LoHi) groupings of items in Experiments 1–3. Error bars are standard errors. Top left panel: opposite preferences for predictive and retrodictive patterns by English and Korean speakers in Experiment 1 appear to reflect learning biases for regularities that are most relevant in the participant’s native language, based on its canonical word order. The results from the Korean speakers, who had received extended explicit formal instruction in English, suggest that implicit statistical learning biases may continue to ‘leak’ into second language learning and processing. Top right panel: control condition in Experiment 1; participants not trained on the sequence exhibited no preference for language-like groupings. Bottom left panel: no language bias emerges when participants are exposed to the same grammar instantiated with shapes. Bottom right panel: when instantiated with musical tones, both Korean and English speakers preferred HiLo patterns.

A word is in order regarding the possible ‘level of analysis’ that our artificial sequence of syllables afforded, as it relates to our corpus analyses. Those analyses indicate that patterns of forward and backward probabilities between words are informative about phrase-level regularities. In the current experiments, however, participants are not asked to analyze the input in terms of words and phrases. The instructions to the participants were minimally informative, and simply conveyed that participants would be listening to a novel “sequence of sounds”. These instructions – and this experimental procedure – are consistent with what are typically described as “word segmentation” experiments: exposure to a sequence of sounds, followed by an assessment of how participants grouped those sounds (e.g., Aslin et al., 1998).

This raises an important question: why should language-wide phrase-level regularities influence participants’ performance in a word segmentation task? One possibility is that participants did not treat the task as one of word segmentation. Because our instructions were minimally constraining, and the sequence itself minimally informative (a monotone sequence of syllables played in a loop with no clear beginning or end), there are a variety of ways in which participants could have interpreted the task. As such, it is possible that participants implicitly treated the task as one of grouping (monosyllabic) words into (two-word) phrases. From this perspective, it is straightforward to explain how phrase-level native-language regularities might influence participants’ interpretation of the artificial grammar.

An alternative possibility is that phrase-level regularities, once learned, are applied more widely to linguistic input. This could be the case, for example, if the same statistical learning mechanisms that underlie word segmentation also underlie phrase learning (e.g., Thompson & Newport, 2007). If so, a bias acquired from one level of analysis might naturally extend to other levels of linguistic structure that are parsed via the same underlying mechanism. Our goal in this experiment was not to construe the task in a specific way, but to explore aspects of perceptual grouping, and how people group items as a function of their prior experience. The effects of experience we found can be interpreted as reflecting the statistical properties of word order in two different languages. It would be interesting to expand these initial findings in future studies, and assess for example whether the corpus statistics gleaned from a morphological versus a phrasal analysis in natural languages conflict. If they did, one would expect that participants ‘switch’ to these different statistics in artificial language tasks according to the level they are induced to tap into, for instance by manipulating the initial task instructions, or by adding pauses between syllables to make words stand out perceptually. For the purpose of this study, it was central for us to demonstrate possible language-specific biases in a generic sequential learning task that was not construed to tap into any specific level of analysis. These kinds of grouping biases may ultimately be important for both word segmentation and for phrase learning, especially in terms of phrase knowledge developing from item-based grammars (e.g., Tomasello, 2003).

4. Experiment 2: Visual sequential learning

In order to further ascertain that the different endorsements of HiLo patterns between Korean and English speakers in Experiment 1 were due to language-specific biases, in Experiment 2 we tested whether learning biases would arise when the same miniature grammar was implemented with non-linguistic stimuli in the visual modality. As discussed previously, sensitivity to forward and backward probabilities has been demonstrated in non-language domains, including visual processing (Fiser & Aslin, 2002; Jones & Pashler, 2007). However, the two cues were not pitted against each other in those experiments; rather, one or the other was maximally informative in the input. Here, at any stimulus transition the two cues are equally informative, but pitted against each other. Therefore we expected one of two scenarios. A null result would obtain if learners as a group weighed each cue equally, as indeed Jones and Pashler’s (2007) data suggest. Alternatively, if a priori visual non-linguistic preferences were attested in participants’ responses, we expected them to be general visual sequential processing biases not influenced by language experience. Thus, regardless of scenario, we expected no differential preferences based on the language of our participants.

4.1. Method

4.1.1. Participants

Fourteen new English native speakers and 15 new Korean native speakers from the same population as Experiment 1 participated.

4.1.2. Materials

A continuous sequence was generated that had exactly the same structure and length as in Experiment 1, except that the eight synthesized syllables of Experiment 1 were replaced by eight abstract shapes (a subset of Kroll & Potter, 1984; see Appendix A). Shapes appeared in succession on the screen for 340 ms each. The training sequence and test items had the same statistical properties as the language in Experiment 1.

4.1.3. Procedure

The same learning and test procedure as in the experimental condition of Experiment 1 applied. At test, participants received a two-alternative forced-choice task between 12 pairs of LoHi and HiLo shape groupings. For each pair they were asked to choose which one formed a grouping in the sequence they had just seen. All instructions were administered in the native language of the participants to make Experiment 2 as comparable as possible to the conditions of Experiment 1.

4.2. Results

As in Experiment 1, participant responses were coded in terms of the proportion of endorsements of HiLo test items (see Fig. 3). The English group (M = 0.50, SD = 0.16) and the Korean group (M = 0.51, SD = 0.17) did not differ (t(27) = −0.16, p = .87, Cohen’s d = −0.06, 95% CI = −0.14 ≤ μ1–μ2 ≤ 0.12). Moreover, mean test item endorsements did not differ from chance in either language group (English, t(13) = 0.03, p = 0.97, 95% CI = 0.40 ≤ μ2 ≤ 0.60; Korean, t(14) = 0.26, p = 0.8, 95% CI = 0.42 ≤ μ1 ≤ 0.60). To check whether the non-significant difference in means between the English and Korean groups was due to a lack of statistical power, we conducted post hoc power analyses, with power (1 − β) set at 0.80 and α = .05, two-tailed. These showed that sample sizes would have to increase to N = 4,668 per group for group differences to reach statistical significance at the .05 level, given the obtained effect size of −0.06. Thus, it is unlikely that our negative findings can be attributed to a limited sample size. In addition, the magnitude of the effect size for this experiment is negligible (d = 0.06), especially when compared to the effect size obtained for the mean difference between English and Korean in Experiment 1 (d = 1.50). One interpretation of this null finding is that both groups gave equal weight to forward and backward probabilities and thus endorsed them equally often at test; without a prior bias from a specific form of experience, this is expected. Another interpretation is that both groups were unable to learn any regularities in the stream, not because they were tracking both cues, but because they were tracking neither. Although this latter interpretation cannot be ruled out a priori, our design intentionally used training conditions and visual stimuli very similar to those of Jones and Pashler (2007) and Fiser and Aslin (2002), whose joint findings indicate successful visual learning based on prediction and retrodiction. Previous literature using a very similar training and testing paradigm thus established that statistical learning operates over visual sequential stimuli for both forward and backward probabilities independently, lending support to the interpretation that our results in Experiment 2 are not due to an absence of learning or to disengagement of our participants from the task, but to the absence of a prior experiential bias for specific patterns of predictive/retrodictive expectations for visual stimuli.
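Such a post hoc power computation can be reproduced with a standard power routine; below is a sketch using the pwr package (our choice of tool; the paper does not name its software), which returns a required n per group in the same range as the figure reported above, with the exact value depending on the routine's assumptions:

```r
library(pwr)

# Required n per group for a two-sample, two-tailed t-test to detect
# d = 0.06 with power = .80 at alpha = .05 (Experiment 3 below repeats
# the computation with d = 0.09):
pwr.t.test(d = 0.06, power = 0.80, sig.level = 0.05,
           type = "two.sample", alternative = "two.sided")
```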

5. Experiment 3: Auditory sequential learning with non-linguistic stimuli

The results of Experiment 2 indicate that the difference in directional preference between English and Korean speakers with speech stimuli may not extend to visual sequences. This is consistent with the hypothesis that what drives the difference between language speakers in Experiment 1 is their experience with language. The ability to learn structural properties of sequences in the auditory modality with non-speech sounds has also been documented (e.g., Conway & Christiansen, 2006; Creel, Newport, & Aslin, 2004; Saffran, Johnson, Aslin, & Newport, 1999). Some of these studies have documented that there may be important differences between processing visual and auditory stimuli, due in part to the more transient nature of auditory information (e.g., Conway & Christiansen, 2005) or to possible inherent learning asymmetries in the two modalities (Marcus, Fernandes, & Johnson, 2007). As such, it may be the case that the differences between English and Korean speakers are not specific to language, but rather arise from more general differences in auditory processing. To assess this possibility, we created a tonal analog of the input from Experiment 1.

5.1. Method

5.1.1. Participants

Both English monolinguals (N = 15) and Korean/English bilinguals (N = 15) who reported Korean as their dominant language participated in this experiment. All English monolinguals were undergraduates at Carnegie Mellon University, as were six of the Korean/English bilinguals. The other nine bilingual participants were recruited via advertising in Pittsburgh churches.

5.1.2. Materials

Each of the placeholders in the language used in Experiment 1 was replaced by a unique tone (letter A = tone A4, letter C = tone B, letter E = tone C#, letter Y = tone D, letter X = tone E, letter F = tone F#, letter D = tone G#, letter B = tone A5) in the key of A major. The resulting training and test tonal sequences thus had an identical statistical structure as the language used in Experiments 1 and 2. Each tone lasted 330 ms and there were no inter-tone latencies.

5.1.3. Procedure

Participants listened to the tone sequence over headphones. Next, participants were given 12 forced choice questions and asked to indicate, on a response sheet, which of two items sounded “more like” the tone sequence they had just heard. On each of the 12 questions, a tone item with a high-forward, low-backward transitional probability was paired with an item with low-forward, high-backward transitional probability, making this test structurally comparable to those in Experiments 1 and 2.

5.2. Results

Again, participant choices were scored in terms of the proportion of endorsements of HiLo tone patterns (see Fig. 3). The English group (M = 0.57) and the Korean group (M = 0.58) did not differ (t(28) = −0.25, p = 0.80, Cohen’s d = −0.09, 95% CI = −0.10 ≤ μ1–μ2 ≤ 0.08). As in Experiment 2, to establish whether the non-significant difference in means between the English and Korean groups was due to a lack of statistical power, we conducted post hoc power analyses, with power (1 − β) set at 0.80 and α = .05, two-tailed. These showed that sample sizes would have to increase to N = 1,856 per group for group differences to reach statistical significance at the .05 level, given the obtained effect size of −0.09. Thus, it is unlikely that our negative findings can be attributed to a limited sample size. In addition, even if that power were achieved, the magnitude of the effect size for this experiment would be negligible (d = 0.09), especially when compared to the effect size obtained for the mean difference between English and Korean with word stimuli in Experiment 1 (d = 1.50).

In addition, both language groups selected test items with high forward transitional probabilities (HiLo items) at a rate above chance (English, M = 0.57, SD = 0.09, t(14) = 2.6, p = .02, 95% CI = 0.51 ≤ μ1 ≤ 0.62; Korean, M = 0.58, SD = 0.14, t(14) = 2.1, p = .05, 95% CI = 0.50 ≤ μ2 ≤ 0.66). The fact that the two groups performed equivalently in this experiment is consistent with the hypothesis that the differences between them in Experiment 1 are language-specific, and strengthens the claim that those differences arise from linguistic experience. Unlike the adults’ lack of preference for shape test items in Experiment 2, though, participants in this experiment did have a consistent preference for test items with high forward transitional probabilities, regardless of language background. We refer to the Discussion below for an interpretation of these data. For our purposes, the data strengthen the main hypothesis that the preferential differences in transition probabilities in Experiment 1 are driven by experience with specific language patterns.

6. Experiment 4: Probabilistic sensitivity or linguistic parameter setting?

In Experiment 1 we attempted to control for some of the frequency effects that might provide an alternative explanation of our results. For example, we ensured that HiLo and LoHi groupings did not differ in frequency. However, in order to create the specific balance of forward and backward transition probabilities, the frequency of individual syllables was not equal across all syllables: Table 3 shows that the elements X and Y were three times more frequent than any other element. For our training and test stimuli this translated into alternating patterns of more frequent and less frequent elements. Given this, learners could have paid attention to the differences in frequency (and not to the difference in transition probabilities) during training, and then preferred test items with the frequent element in either initial position (e.g., test item XD, in which X is more frequent than D) or final position (e.g., test item AX, in which A is less frequent than X). A role for frequency is not inconsistent with a distributional approach to sequential learning (e.g., Perruchet & Pacteau, 1990). Frequency potentially plays a role in word order for natural languages as well; indeed, our regression models for the corpus analyses indicated that the frequency of the first word was an independent predictor of phrase constituency along with transition probabilities. Specifically, higher frequency of a word in the English corpus was positively associated with a tighter phrase, while in Korean the association was negative. This finding matches the intuition that in English more frequent words occur at the beginning of a phrase (e.g., “at school”, “the dog”, where prepositions and articles are vastly more frequent than specific noun tokens), while this tendency is reversed in Korean (see our example of postpositions, “school at”).

Researchers have noted that in natural languages there is a large frequency discrepancy between functor and content words, and this distributional arrangement may be useful in finding structural aspects of the input (Gervain et al., 2008; Gomez, 2002). For example, Gervain et al. (2008) argued that prelexical infants are sensitive to such differences, and use them to bootstrap word order in their native language. While this account is compatible with a distributional approach to sequential and language learning, Gervain et al. (2008) interpret the role of frequency as bootstrapping learners into language-specific abstract prewired structural representations. Under this account, speakers of a given language would possess knowledge of a Head–Complement parameter that determines whether a language places the Head of a syntactic phrase first and its Complement second, or in the reverse order (e.g., Rizzi, 1986, quoted in Gervain et al.). English is described as Head-first, while Korean is Head-last. As a consequence of this theoretical interpretation, it is in principle possible that the adult participants in Experiment 1 treated the frequent elements X and Y as equivalent to function words, and the less frequent elements A, B, C, D, E, and F as open-class words (see Table 3 for frequencies). The head-parameter account makes the specific prediction that English participants would prefer items like XD on the basis of a ‘Head-first’ head-order parameter, while Korean participants would prefer items like AX, consistent with the predominant ‘Head-last’ head-order parameter in Korean.

We focus on the Gervain et al. study here specifically because, when applied to Experiment 1, their head-order account makes exactly the same prediction as our statistical bias account. This is because the probabilistically determined LoHi items like XD that were preferred by the English group all begin with a frequent element (see Table 3), which is compatible with a Head-first account. Likewise, the HiLo items like AX preferred by the Korean group have the frequent element in second position, compatible with a Head-last account for the Korean language. Therefore, the results of Experiment 1 may not necessarily be the consequence of statistical learning biases, but of a linguistically preset head-order parameter in each language that participants applied to the stream by paying attention to frequent words. To rule out this latter possibility we devised Experiment 4, in which we pitted transition probabilities against frequency.

6.1. Method

6.1.1. Participants

Twenty-nine English native speakers and 42 Korean-dominant speakers were recruited from the same population as Experiments 1–3; none had participated in the previous studies.

6.1.2. Materials

We first describe the general template structure of the stimuli. The goal of this experiment was to create training sequences in which participants could attend either to transition probabilities between elements or to the frequency of individual elements to identify two-element groupings, as in Experiments 1–3. If, in line with a parameter-setting account, participants were sensitive to the position of the most frequent element in their language (Frequent First in English, Frequent Last in Korean), they should disprefer groupings based on informative transition probabilities. Conversely, if participants were most sensitive to transition probabilities, they should prefer groupings based on TPs despite the fact that the position of the most frequent element in those groupings ran contrary to the canonical frequency order of their language.

To achieve this state of affairs, we needed to create one sequence in which the Korean group would disprefer frequent-last test items if attending to TPs, and another in which the English group would disprefer frequent-first items if attending to TPs. These constraints led to the realization that a single training sequence could not be constructed with all the desired statistical properties, and that we needed instead to generate one sequence for each language group. We found that the desired sequences had been created in a design by Perruchet and Desaulty (2008),4 in which participants were assigned to two comparable training and test sequences. Our sequences were modeled directly after theirs. Both contained 12 unique letter symbols (A, B, …, H, I, X, Y, Z) appearing in sequence with comparable ngram frequencies. What differed was the arrangement of these symbols, which generated certain predictive groupings. In the Forward Sequence (see Table 4, first half), forward transition probabilities were informative, with word transitions alternating between high and low forward TPs (1 and 0.11, respectively). Backward probabilities were uninformative, in that they ranged uniformly across all transitions between 0.17 and 0.45. It is assumed that what counts as informative is not a specific TP value, but whether high or low TP values correlate consistently with symbol transitions. Therefore, if participants were doing a form of probability-based parsing, we would expect them to perceive unitary groupings at those transitions where forward TPs were high, and to prefer items like BX over XA in Table 4, Forward Sequence (see also Perruchet & Desaulty, 2008, Table 2).

Table 4.

Test items used in Experiment 4, arranged by training sequence type and grouping type.

Transition from Transition to Forward TP Backward TP Grouping type Frequency of 1st symbol Frequency of 2nd symbol Frequency of grouping
Forward Sequence test items
B X 0.96 0.19 ProbBased 26 131 25
C X 1.00 0.17 ProbBased 22 131 22
E Y 1.00 0.21 ProbBased 23 109 23
F Y 1.00 0.22 ProbBased 24 109 24
H Z 1.00 0.19 ProbBased 22 115 22
I Z 1.00 0.19 ProbBased 22 115 22
X A 0.09 0.14 FreqFirst 131 83 12
X D 0.19 0.40 FreqFirst 131 62 25
Y D 0.16 0.27 FreqFirst 109 62 17
Y G 0.19 0.30 FreqFirst 109 71 21
Z A 0.32 0.45 FreqFirst 115 83 37
Z G 0.10 0.17 FreqFirst 115 71 12
Backward Sequence test items
A X 0.16 0.10 FreqLast 83 132 13
A Y 0.47 0.35 FreqLast 83 113 39
D X 0.41 0.20 FreqLast 66 132 27
D Y 0.26 0.15 FreqLast 66 113 17
G X 0.59 0.32 FreqLast 71 132 42
G Y 0.25 0.16 FreqLast 71 113 18
X B 0.20 1.00 ProbBased 132 27 27
X C 0.17 1.00 ProbBased 132 22 22
Y E 0.19 0.96 ProbBased 113 23 22
Y F 0.20 0.96 ProbBased 113 24 23
Z H 0.17 0.91 ProbBased 115 22 20
Z I 0.18 0.95 ProbBased 115 22 21

Note: English participants were tested on items from the Forward Sequence, pitting “FreqFirst” items against “ProbBased” items in a forced-choice test. FreqFirst groupings like ‘X A’ would be preferred if English learners relied on the higher frequency of the first syllable (see the values in the seventh column), on the assumption that higher frequency indicates the linguistic head of a grouping, as expected by Gervain et al. (2008). Conversely, if English speakers used high transition probabilities (see the values in the third column) to group items together, they should prefer ProbBased items like ‘B X’, despite the fact that the most frequent element X is in last position. Korean participants were tested on items from the Backward Sequence, pitting “FreqLast” items against “ProbBased” items. This arrangement allows us to test whether learners rely on frequency or probability to parse the sequence of syllables they experience.

In addition to TP cues, the Forward Sequence contained alternating high-frequency and low-frequency words (see Experiment 1). Unlike Experiment 1, however, TP information and individual word frequency conflicted. Attending to high forward TPs would result in grouping words whose last syllable is more frequent (Frequent Last, e.g., BX). Because this Frequent Last order contravenes a putative Head-First parameter in English, the Forward Sequence allowed us to test whether English participants used head-parameter information or probabilities to parse the stream. Therefore, the English group was assigned to the Forward Sequence. For test items, we chose 12 groupings: six that could be preferred on the basis of TP information but whose frequency profile conflicted with the expected Head-First parameter (“ProbBased” items, e.g., BX), and six that could be preferred under a putative Head-First parameter, but not on the basis of transition probabilities (“FreqFirst” items, e.g., XA). A sketch of this selection logic is given below.
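The following sketch is our reconstruction of the selection logic under a simplifying assumption (a fixed high-TP threshold); the function and threshold are ours, while the example values come from Table 4.

```python
def classify_forward_items(bigrams, forward_tp, unigram_freq, high_tp=0.9):
    """Sort candidate bigrams into the two conflicting Forward Sequence
    test categories.

    ProbBased: high forward TP into the pair's *more frequent* element
               (grouping favored by probabilities, e.g., BX).
    FreqFirst: the more frequent element comes first (grouping favored
               by a putative Head-First parameter, e.g., XA).
    """
    prob_based, freq_first = [], []
    for x, y in bigrams:
        if forward_tp[(x, y)] >= high_tp and unigram_freq[y] > unigram_freq[x]:
            prob_based.append((x, y))
        elif unigram_freq[x] > unigram_freq[y]:
            freq_first.append((x, y))
    return prob_based, freq_first

# Example values taken from Table 4 (Forward Sequence):
tp = {("B", "X"): 0.96, ("X", "A"): 0.09}
freq = {"B": 26, "X": 131, "A": 83}
print(classify_forward_items([("B", "X"), ("X", "A")], tp, freq))
# -> ([('B', 'X')], [('X', 'A')])
```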

Similarly, the Backward Sequence (Table 4, second half) contained the same 12 unique letter symbols as the Forward Sequence (A, B, …, H, I, X, Y, Z), occurring with equivalent frequency. Here, backward transition probabilities were informative, with word transitions alternating between high and low backward TPs (approximately 1 and 0.11, respectively), while forward probabilities were uninformative, ranging from 0.16 to 0.59 across transitions. We reasoned that participants might perceive unitary groupings in the sequence probabilistically where the statistics are most informative, i.e., at transitions where the backward TPs are highest, in accord with previous research (Pelucchi et al., 2009; Perruchet & Desaulty, 2008). However, parsing according to the backward probabilities would result in groupings whose first element has the higher frequency. Doing so would automatically rule out an account of parsing based on the Head parameter, which according to Gervain et al. determines parsing in Head-Last languages such as Korean.5 For this reason, the Korean group was assigned the Backward Sequence training and test items.6

Actual stimuli were generated for both sequences as in Experiment 1 by randomly assigning 12 synthesized monosyllabic words (/fʊ/, /dzi/, /m…/, /dɪ/, /ʃε/, /bʊ/, /ɹa/, /ni/, /kε/, /gε/, /lo/, /va/) to the 12 letter symbols in each sequence, and concatenating the words in a pauseless stream. The total length of the sequence, the length of individual words, and the synthesized voice were as in Experiment 1. Test stimuli were created as in Experiment 1 by grouping the monosyllabic words in Table 4. Thus, while the surface structure and presentation of the stimuli were comparable to Experiment 1, the distribution of probabilities and frequencies that governed the ordering of the words differed (compare Tables 1 and 4 directly). This allowed us to discriminate whether participants were using probabilities or frequencies to parse the sequence.
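As an illustration of this instantiation step, the following sketch randomly maps symbols to syllables per participant and concatenates the result into a pauseless stream. The ASCII syllable spellings are stand-ins for the IPA inventory above, and the code is ours, not the generation script actually used.

```python
import random

# ASCII stand-ins for the 12 synthesized syllables (illustrative only).
SYLLABLES = ["fu", "dzi", "mo", "di", "sce", "bu",
             "ra", "ni", "ke", "ge", "lo", "va"]
SYMBOLS = list("ABCDEFGHIXYZ")  # the 12 placeholders of Table 4

def instantiate_stream(symbol_sequence, seed=None):
    """Randomly map each letter symbol to a syllable (a fresh mapping
    per participant) and concatenate into a pauseless stream."""
    rng = random.Random(seed)
    syllables = SYLLABLES[:]
    rng.shuffle(syllables)
    mapping = dict(zip(SYMBOLS, syllables))
    return "".join(mapping[s] for s in symbol_sequence)

print(instantiate_stream("BXCXEYFYHZIZ", seed=1))
```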

6.1.3. Procedure

The learning and test procedure was identical to that of Experiment 1. English participants were administered training and test items from the Forward Sequence, while Korean participants were administered training and test items from the Backward Sequence (Table 4). At test, participants received a two-alternative forced-choice task over 36 pairs of ProbBased versus FreqLast (Korean) or FreqFirst (English) groupings, in counterbalanced order; a sketch of how such pairs can be constructed follows below. As in the previous experiments, for each pair participants were asked to choose which item formed a grouping in the sequence they had just heard. All instructions were administered in the native language of the participants, consistent with the previous experiments.
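One way to construct such a counterbalanced test list is sketched here; the position-alternation rule is our assumption, as the text does not specify how left/right order was balanced.

```python
import itertools
import random

def make_test_pairs(prob_based, freq_based, seed=None):
    """Cross six ProbBased items with six Freq* items to yield 36
    two-alternative trials, alternating which item is presented first
    and shuffling the trial order."""
    rng = random.Random(seed)
    trials = []
    for i, (p, f) in enumerate(itertools.product(prob_based, freq_based)):
        trials.append((p, f) if i % 2 == 0 else (f, p))
    rng.shuffle(trials)
    return trials

# Forward Sequence items from Table 4:
pairs = make_test_pairs(["BX", "CX", "EY", "FY", "HZ", "IZ"],
                        ["XA", "XD", "YD", "YG", "ZA", "ZG"], seed=0)
print(len(pairs))  # 36
```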

6.2. Results

Participants’ responses are reported as the proportion of preference for ProbBased groupings. The English group preferred the ProbBased groupings on a mean proportion of 0.63 of trials (SD = 0.18, t(29) = 3.9, p < 0.001, 95% CI = 0.56 ≤ μ1 ≤ 0.70). The Korean group chose the ProbBased groupings on a mean proportion of 0.53 of trials (SD = 0.18, t(41) = 0.98, p = 0.33, 95% CI = 0.47 ≤ μ2 ≤ 0.58). Thus, the English group appears to have relied on transition probabilities to perceive word groupings, and crucially not on the frequency of words in first position (e.g., XA), ruling out a putative head-parameter explanation for the data in Experiments 1 and 4. Similarly, for the Korean group there was no evidence of a preference for groupings with the frequent word in last position (e.g., AX), which would be predicted if they had relied on a Head-Last parameter set in Korean. Thus, Experiment 4 rules out a linguistic bootstrapping explanation of parsing based on frequency in adults, and suggests that the language preferences of Experiment 1 were driven by probabilistic biases, as we originally hypothesized. In addition, the results of Experiment 1 may appear to be at odds with those obtained by Gervain et al. (2008) with Italian and Japanese infants. How can they be reconciled? While Gervain et al. claimed that their infants parsed an artificial grammar differently by mapping the higher-frequency pseudowords onto function words, a close inspection of their grammar reveals that its forward and backward patterns of probabilities were not controlled for. Indeed, they appear to be similar to our artificial grammar in Experiment 1. Thus, the results of Gervain et al. could be reinterpreted in terms of infants’ developed language-specific sensitivities to the predictive and retrodictive patterns of Italian and Japanese, respectively.
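For readers wishing to reproduce this style of analysis, a sketch of the one-sample test against chance (0.5) follows. It uses simulated data matched to the reported mean and SD for the English group, not the actual participant scores.

```python
import numpy as np
from scipy import stats

# Simulated per-participant proportions of ProbBased choices
# (hypothetical values; the real individual data are not reproduced here).
rng = np.random.default_rng(0)
english_prefs = rng.normal(0.63, 0.18, size=30).clip(0, 1)

# One-sample t-test against chance (0.5), the analysis reported above.
t, p = stats.ttest_1samp(english_prefs, popmean=0.5)
ci = stats.t.interval(0.95, df=len(english_prefs) - 1,
                      loc=english_prefs.mean(),
                      scale=stats.sem(english_prefs))
print(f"t({len(english_prefs) - 1}) = {t:.2f}, p = {p:.4f}, 95% CI = {ci}")
```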

7. Discussion

We tested the hypothesis that adult English and Korean speakers come to the lab having already developed opposite statistical preferences for parsing continuous speech. The results of Experiment 1 supported that hypothesis: English speakers preferred items with high backward probabilities, whereas Korean speakers preferred items with high forward probabilities. Experiments 2 and 3 suggest that this preference is limited to linguistic materials. This limitation is consistent with the possibility that the preference arises from experience with language. Furthermore, our large-scale corpus analyses of English and Korean are consistent with this possibility, as the predominant word order of each language mirrors the directional preference of its speakers in Experiment 1 with an artificial grammar.

Our findings extend knowledge of human sequential learning in several novel directions. While recent studies show that learners can exploit both predictive and retrodictive relations, such relations were previously explored only individually. For instance, Jones and Pashler (2007) showed participants sequences of shapes governed by probabilistic relations, and then asked them to choose which shape reliably came after a probe shape (prediction test) or before a probe shape (retrodiction test). In experiments where forward and backward probabilities were made informative, they found that both prediction and retrodiction were used effectively for recalling memories. In a similar experiment using a continuous sequence of nonsense syllables, Perruchet and Desaulty (2008) found that participants perceived word boundaries equally well on the basis of backward and forward transitional probabilities. Likewise, Pelucchi et al. (2009) provided evidence that infants can track backward statistics in running speech. The studies above tested cases in which forward and backward probabilities were never in conflict: each cue was made maximally informative in a given experiment, while the other was made uninformative. Yet in naturalistic circumstances, such as when attending to the word order of a heard utterance, prediction and retrodiction need to be combined effectively. To our knowledge, our study demonstrates for the first time that human learners combine prediction and retrodiction to find structure in sequential stimuli. In addition, our cross-linguistic corpus analyses and artificial grammar experiments corroborate the hypothesis that these cues act in language-specific ways in adult learners, at least when the stimuli are language-like.

The current results are not the first to demonstrate that statistical learning allows learners to adapt to the structure of their native language in ways that influence the subsequent grouping of a sequence of elements. For example, statistical learning has been suggested to play a role in infants’ discovery of the phonotactic and phonological structure of words in their native language (e.g., Saffran & Thiessen, 2003; Thiessen & Saffran, 2003). Once infants have learned these patterns, they alter the way they group syllables in fluent speech into words (Thiessen & Saffran, 2007). That is, statistical learning allows learners to discover additional cues to grouping, beyond the predictability of elements. However, the current results are novel in that they suggest that statistical learning adapts in ways more fundamental than allowing learners to discover additional cues (such as phonotactics or lexical stress) to grouping. Instead, as the results of Experiment 1 demonstrate, linguistic experience actually changes the kinds of predictability that learners are biased to discover. These results suggest that statistical learning itself changes over the course of experience in ways that make it better adapted to the structure of the linguistic input. While the current results are limited to language, it is possible that this kind of adaptation occurs in many different domains.

The results of Experiment 3 are potentially consistent with adaptation in the musical domain: participants exposed to the same underlying grammar instantiated with pure tone sequences exhibited a consistent preference for test items with high forward transitional probabilities, regardless of language background. One explanation that does not require adaptation through experience is that the preference for forward-going items constitutes a domain-general auditory preference, similar to the Iambic–Trochaic Law (e.g., Hay & Diehl, 2007). Such a preference may be early-developing and then modified by linguistic experience: notably, strengthened in Korean learners and contravened in English learners. Alternatively, it may be that experience with music inculcates a bias in both English and Korean listeners; on this account, musical experience is more consistent cross-culturally than linguistic experience. While the locus of tone preferences remains speculative, further experimental and corpus studies could tease apart the roles of experiential and universal biases. For example, large-scale corpus analyses of Asian and Western musical scores might reveal similar biases for forward-going probabilities between adjacent notes and musical phrases, suggesting a possible role of experience with music. The alternative possibility that these biases precede experience with music could be substantiated if young English and Korean infants with little exposure to music, tested on the musical tone grammar, exhibited the same cross-language preference for forward-going probabilities. Although undoubtedly interesting, these questions lie outside the immediate remit of this study.

In Experiment 4, we were able to unconfound the roles of probabilities and frequencies in sequence parsing, in favor of probabilities. The experiment also allowed us to tackle the important theoretical question of whether participants exposed to sequential stimuli use putatively innate word order principles to map words onto Heads and Complements, as proposed by Gervain et al. (2008). The distributional arrangement of the training sequences in Experiment 4 afforded two orthogonal types of parsing, probability-based or frequency-based. In probability-based parsing, participants should perceive groupings in the sequence wherever the TP between adjacent words is highest, and boundaries where TPs are lowest, in accord with previous research (Perruchet & Desaulty, 2008; Saffran et al., 1996). Conversely, if participants relied on the higher frequency of the alternating syllables in a given position to set a Head parameter congruent with their language, as suggested by Gervain et al. (2008), then they should prefer groupings opposite to the ones most naturally indicated by the transition probabilities. Thus, the setup of Experiment 4 allowed us to test directly whether participants implicitly adopted a parsing strategy based on a putatively innate linguistic mechanism, which would support an interpretation of Experiment 1 (and of Gervain et al.’s own study) in those terms, or whether parsing is guided by inductive probabilistic learning. The results provided no evidence for frequency-based parsing in either the Korean or the English group, and for the English group there was a significant preference for probability-based parsing.

Our findings are not straightforwardly accounted for by popular current computational models of human sequential learning. For example, PARSER produces sensitivity to conditional probabilities via a competition parameter that influences forward and backward probabilities equally. Differential sensitivity to forward or backward probabilities may not be possible in the current instantiation of the model. Likewise, the Simple Recurrent Network (SRN; Elman, 1990) is a connectionist model that uses its internal memory to process arbitrary sequences of inputs, allowing it to perform many sequence-prediction tasks. The SRN and similar recurrent architectures have been used extensively to model language acquisition and processing. However, because of constraints inherent in its architecture, the SRN is sensitive only to forward probabilities, and as such cannot model combined effects of prediction and retrodiction (see the sketch below). Recently, a connectionist autoassociator model has been proposed that is sensitive to both forward and backward probabilities (TRACX; French, Addyman, & Mareschal, 2011). In that respect, one directly testable prediction is that the grammar we used in Experiments 1–3 would not be easily parsable by TRACX unless the model were first given experience with English-like or Korean-like stimuli. Conversely, the Forward and Backward Sequences used in Experiment 4 should be parsable on the basis of probabilities rather than frequencies (indeed, French et al., 2011, provide evidence that this is the case). Testing whether TRACX can capture the experience-dependent, modality-specific biases documented here would provide an explicit computational explanation of our data. More generally, and beyond specific implementations, our findings, coupled with recent ones on retrodiction, challenge the widespread assumption that the brain is inherently forward-looking and that much of human cognition and behavior relies on the ability to make implicit predictions about upcoming events (Bar, 2007; Conway et al., 2010). Minimally, the predictive claim may need to be revised to include retrodictive abilities.
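To make the architectural point concrete, here is a minimal Elman-style sketch of ours (not code from Elman, 1990, or French et al., 2011). The hidden state is a function of past inputs only, so the trained readout can approximate forward TPs but has no channel through which backward TPs could be represented; for brevity, only the output weights are trained.

```python
import numpy as np

class MinimalSRN:
    """Sketch of an Elman (1990)-style recurrent network predicting the
    next symbol. The hidden state is built solely from past inputs, so
    whatever the network learns reflects forward statistics only."""

    def __init__(self, n_symbols, n_hidden=20, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0, 0.5, (n_hidden, n_symbols))
        self.W_rec = rng.normal(0, 0.5, (n_hidden, n_hidden))
        self.W_out = np.zeros((n_symbols, n_hidden))
        self.lr = lr
        self.h = np.zeros(n_hidden)

    def step(self, x):
        # Context (previous hidden state) + current input -> new hidden state.
        self.h = np.tanh(self.W_in @ x + self.W_rec @ self.h)
        z = self.W_out @ self.h
        e = np.exp(z - z.max())
        return e / e.sum()  # softmax estimate of the next symbol

    def train(self, seq, epochs=10):
        eye = np.eye(self.W_in.shape[1])
        for _ in range(epochs):
            self.h = np.zeros_like(self.h)
            for cur, nxt in zip(seq, seq[1:]):
                y = self.step(eye[cur])
                # Delta rule on the readout only (full BPTT omitted).
                self.W_out += self.lr * np.outer(eye[nxt] - y, self.h)

# Train on a toy stream in which symbol 0 is always followed by symbol 1.
srn = MinimalSRN(n_symbols=3)
srn.train([0, 1, 2, 0, 1, 0, 1, 2, 0, 1] * 50)
srn.h = np.zeros_like(srn.h)
print(srn.step(np.eye(3)[0]))  # probability mass concentrates on symbol 1
```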

The findings of Experiments 1 and 4 also have potential ramifications for understanding processes of second language (L2) acquisition. Our Korean participants were advanced second language speakers of English, and were enrolled in graduate programs in the United States. The majority of them had even been hired as Graduate Teaching Assistants to teach English for academic purposes in the English Language Institute run by the university. Their persistent sensitivity to the learning bias opposite to that of the English participants suggests that they implicitly applied the solidified learning biases of their L1 when learning the novel artificial language, despite being induced to believe that the grammar came from another language. Admittedly, providing participants with instructions in their native language could have acted as a prime (e.g., Marian & Kaushanskaya, 2011). However, the same would have been true of Experiment 4, and yet no preference consistent with a putative head-parameter setting of the native language was found there.

Because patterns that conform to those initially learned are further promoted, learning to track the statistics relevant to one language may interfere with tracking the statistics relevant to the structure of a different language. Although accounts of second language learning difficulty based on interference and transfer from the first language exist, they have typically highlighted transfer of higher-order aspects of language, such as syntactic or conceptual knowledge (for a review see Ortega, 2009). The Competition Model (MacWhinney, 2002), for example, stipulates transfer of L1-tuned cue strengths and reliabilities when processing the L2 (for instance, when identifying agency). This model focuses on the use of linguistic cues, i.e., cues that can be identified by linguistic analysis, such as verb agreement morphology and nominative case-marking for pronouns. Our findings are not incompatible with such a model, and suggest that as yet unexplored lower-order kinds of transfer, involving basic sequential processing biases, are at play and may have a ripple effect on the encoding of higher-order structures such as word order in a second language. In other words, these acquired biases may engender difficulties in learning a second language later in life. Such reasoning can only remain speculative at this stage, as there is little evidence for a direct link between sequential learning biases and second language abilities. We found no correlation between TOEFL scores and preference scores for Experiment 1 (r = 0.02, t(11) = 0.0778, p = 0.94; TOEFL scores from two participants could not be obtained). The absence of a correlation may be due to the small sample size or to insufficient variance, as our participants all scored in the high range of the TOEFL. In work in progress we are exploring more sensitive and comprehensive assessments of second language proficiency, and testing whether sequential learning tasks correlate directly with online processing tasks in a second language.

It should be noted that our results do not directly address the strength of the bias, or whether the bias is ‘unrecoverable’. The design of our test does not necessarily imply that participants completely failed to track the statistic inconsistent with their native language. The nature of the forced-choice test may have obscured gradations in learning the two kinds of statistics. While Experiment 1 clearly suggests that native English and Korean speakers have different biases, it may have obscured latent sensitivity to the non-preferred statistic. For example, learners in both groups may have considered both HiLo and LoHi sequences acceptable, but one type better, leading them to choose that type more often at test. It is possible that they would have endorsed the less-preferred type at greater than chance levels when given a foil comparison with less reliable statistics. Although we recognize such design limitations, we believe they do not detract from the current finding that, when participants are asked to choose, these biases emerge. In addition, our design is directly comparable to an established tradition of studies on forward and backward probability, which also used a forced-choice paradigm. In the future, it may be useful to adopt more sensitive measures of learning to tap into the degree and gradations of statistical learning and biases (see Dale, Duran, & Morehead, 2012).

While preliminary, this study establishes a basis for future explorations of the causal links between statistical learning abilities and second language acquisition. One outstanding debate in the literature, for example, concerns what determines the ultimate level of attainment in a second language, and the large variations in proficiency and fluency that persist even when factoring in biological predictors (e.g., initial age of L2 acquisition) and biographical factors (e.g., amount or length of exposure to the L2). Participants in our experiments also showed variability in scores. Thus, one possible prediction is that individual differences in statistical learning can predict aptitude for acquiring an L2. For example, Korean individuals showing less L1 bias on the artificial grammar should perform better in sentence processing tasks involving word order manipulations in English. In addition, while our studies suggest a certain degree of entrenchment affecting subsequent language learning, different individuals may exhibit less entrenchment, and consequently more flexibility in late learning. This in turn raises the possibility that training interventions targeting specific statistical learning abilities may directly help improve second language acquisition, in line with recent studies on cognitive reserve and experience-dependent brain plasticity (May, 2011).

One additional open question raised by the current study regards the time course of language-specific statistical learning biases in language development. If these biases emerge early in infancy as an effect of exposure to sequences of sounds in the ambient language, prior to the discovery of word order syntactic relations and phrase structure, then they would be extremely useful in constraining the search space of possible linguistic structures for a given language. Applied to word order learning, this scenario would parallel the pre-linguistic constraining processes that underlie the onset of speech perception: language-specific discrimination of minimally different syllables (e.g., [ba] versus [pa]) precedes infants’ ability to distinguish between similar word pairs, and is mediated by sensitivity to sub-phonemic distributional information in the ambient language, prior to the discovery of the higher-order information about language to which the discrimination applies (Maye et al., 2002). However, it is also possible that language-specific statistical biases emerge as a consequence, rather than a cause, of syntactic knowledge about word order. Thus, the question of the onset of statistical learning biases in childhood is important for circumscribing the nature of the acquisition mechanisms. The current experimental setup, with its equally possible grouping hypotheses, constitutes a fitting learning scenario adaptable for testing young children on their pre-existing statistical preferences. Indeed, a recent study with 7- and 13-month-old English monolingual infants (Thiessen & Onnis, submitted for publication) corroborates the hypothesis of the early onset of a statistical proto-grammar. Using a preferential looking paradigm adapted to preverbal infants, we found that, like our adult English speakers, the 13-month-olds preferred test items with a LoHi structure, that is, groupings consistent with the predominant word order regularity of English.

Taken as a whole, our results suggest that statistical learning changes throughout development by adapting to the characteristics of the native language. This opens many avenues for subsequent research, including understanding the mechanisms and developmental time course through which experience with the native language alters subsequent learning.

Acknowledgments

This study was partly supported by NICHD Grant 5R03HD051671-02, and by a Language Learning Small Grant to the first author. We thank Ju Young Min and Daehoi Lee for translating the materials and instructions into Korean and for running the experiments with the Korean-speaking participants. Tyler Heston and Daniel Chang prepared the corpus databases. We thank two anonymous reviewers and Pierre Perruchet for valuable suggestions. This manuscript is a considerably extended and reworked version of a paper presented at the 34th Annual Meeting of the Cognitive Science Society, Sapporo, Japan, August 1–4, 2012.

Appendix A


The abstract figures used in Experiment 2. The assignment of each figure to a placeholder in the grammar (X, Y, …) was random and different for each participant, to reduce spurious idiosyncratic preferences for particular figure sequences.

Footnotes

3. However, the coefficient value for logBigram, expressed in ordered log odds, is quite small in absolute size relative to the other two predictors, suggesting that the other predictors have a more sizeable effect.

4. We thank Pierre Perruchet for providing the original sequences from Experiment 2 of Perruchet and Desaulty (2008). Our Forward Sequence and corresponding test items come from the sequence assigned to their TPfor group, while our Backward Sequence and corresponding test items come from the sequence assigned to their TPback group. Our implementation of Perruchet and Desaulty uses different synthesized syllables (they used French syllables; we used Italian ones, as in our Experiment 1). Furthermore, our sequences were shortened to match the length of training in Experiment 1, and we removed immediate repetitions of the same two elements (e.g., …XAXA…) to prevent repetition patterns from giving away the structure of the language: a chunk …XAXA… within the running sequence might be immediately perceived as two instances of XA.

5. Gervain et al. made their claim based on Japanese and Italian infant learners. However, to the extent that Korean has word order characteristics similar to Japanese and can be classified in the same group of Head-Last languages, and that English has word order characteristics similar to Italian, the reasoning of Gervain et al. naturally extends to our case.

6. Note that both the Forward and Backward Sequences were originally generated by random concatenation of a few syllabic sequences, while the sequence in Experiments 1–3 was generated using a stochastic Markov model. In both cases the result is the same kind of outcome: a sequence of elements that can be described in terms of frequencies and transition probabilities between any two adjacent elements.
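For illustration, a first-order stochastic Markov generator of the kind referred to in this footnote can be sketched as follows; the transition table shown is a hypothetical fragment, not the actual grammar.

```python
import random

def markov_stream(transitions, start, length, seed=None):
    """Generate a symbol stream from a first-order Markov model.

    `transitions` maps each symbol to a list of (next_symbol, probability)
    pairs; the probabilities for each symbol must sum to 1.
    """
    rng = random.Random(seed)
    state, out = start, [start]
    for _ in range(length - 1):
        nexts, probs = zip(*transitions[state])
        state = rng.choices(nexts, weights=probs)[0]
        out.append(state)
    return out

# Hypothetical fragment: B is always followed by X, while X itself is
# followed by several symbols, yielding high and low forward TPs.
toy = {"B": [("X", 1.0)],
       "X": [("A", 0.5), ("D", 0.5)],
       "A": [("B", 1.0)],
       "D": [("B", 1.0)]}
print("".join(markov_stream(toy, "B", 20, seed=2)))
```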

References

1. Aslin RN, Saffran JR, Newport EL. Computation of conditional probability statistics by human infants. Psychological Science. 1998;9:321–324.
2. Bar M. The proactive brain: Using analogies and associations to generate predictions. Trends in Cognitive Sciences. 2007;11:280–289. doi: 10.1016/j.tics.2007.05.005.
3. Conway CM, Bauernschmidt A, Huang SS, Pisoni DB. Implicit statistical learning in language processing: Word predictability is the key. Cognition. 2010;114:356–371. doi: 10.1016/j.cognition.2009.10.009.
4. Conway CM, Christiansen MH. Modality-constrained statistical learning of tactile, visual, and auditory sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2005;31:24–39. doi: 10.1037/0278-7393.31.1.24.
5. Conway CM, Christiansen MH. Statistical learning within and between modalities: Pitting abstract against stimulus-specific representations. Psychological Science. 2006;17:905–912. doi: 10.1111/j.1467-9280.2006.01801.x.
6. Cook VJ, Newson M. Chomsky’s universal grammar: An introduction. 3rd ed. Wiley-Blackwell; 2007.
7. Creel SC, Newport EL, Aslin RN. Distant melodies: Statistical learning of non-adjacent dependencies in tone sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2004;30:1119–1130. doi: 10.1037/0278-7393.30.5.1119.
8. Dale R, Duran ND, Morehead JR. Prediction during statistical learning, and implications for the implicit/explicit divide. Advances in Cognitive Psychology. 2012;8:196–209. doi: 10.2478/v10053-008-0115-z.
9. Dutoit T. An introduction to text-to-speech synthesis. Dordrecht: Kluwer Academic Publishers; 1997.
10. Edelman S. Computing the mind. New York: Oxford University Press; 2008.
11. Elman JL. Finding structure in time. Cognitive Science. 1990;14:179–211.
12. Fiser J, Aslin RN. Statistical learning of new visual feature combinations by infants. Proceedings of the National Academy of Sciences. 2002;99:15822–15826. doi: 10.1073/pnas.232472899.
13. French RM, Addyman C, Mareschal D. TRACX: A recognition-based connectionist framework for sequence segmentation and chunk extraction. Psychological Review. 2011;118(4):614–636. doi: 10.1037/a0025255.
14. Gervain J, Nespor M, Mazuka R, Horie R, Mehler J. Bootstrapping word order in prelexical infants: A Japanese–Italian cross-linguistic study. Cognitive Psychology. 2008;57(1):56–74. doi: 10.1016/j.cogpsych.2007.12.001.
15. Giroux I, Rey A. Lexical and sub-lexical units in speech perception. Cognitive Science. 2009;33:260–272. doi: 10.1111/j.1551-6709.2009.01012.x.
16. Goldstone RL. Unitization during category learning. Journal of Experimental Psychology: Human Perception and Performance. 2000;26:86–112. doi: 10.1037//0096-1523.26.1.86.
17. Gómez RL. Variability and detection of invariant structure. Psychological Science. 2002;13(5):431–436. doi: 10.1111/1467-9280.00476.
18. Graf Estes KM, Evans J, Alibali MW, Saffran JR. Can infants map meaning to newly segmented words? Statistical segmentation and word learning. Psychological Science. 2007;18:254–260. doi: 10.1111/j.1467-9280.2007.01885.x.
19. Hawkins J, Blakeslee S. On intelligence. New York: Times Books; 2004.
20. Hay JF, Diehl RL. Perception of rhythmic grouping: Testing the Iambic/Trochaic Law. Perception and Psychophysics. 2007;69:113–122. doi: 10.3758/bf03194458.
21. Johnson NF. The psychological reality of phrase structure rules. Journal of Verbal Learning and Verbal Behavior. 1965;4:469–475.
22. Jones J, Pashler H. Is the mind inherently forward looking? Comparing prediction and retrodiction. Psychonomic Bulletin and Review. 2007;14(2):295–300. doi: 10.3758/bf03194067.
23. Kroll JF, Potter MC. Recognizing words, pictures, and concepts: A comparison of lexical, object, and reality decisions. Journal of Verbal Learning and Verbal Behavior. 1984;23:39–66.
24. Knowlton BJ, Squire LR. Artificial grammar learning depends on implicit acquisition of both rule-based and exemplar-specific information. Journal of Experimental Psychology: Learning, Memory & Cognition. 1996;22:169–181. doi: 10.1037//0278-7393.22.1.169.
25. MacWhinney B. The competition model: The input, the context, and the brain. In: Robinson P, editor. Cognition and second language instruction. New York: Cambridge University Press; 2002.
26. Marcus GF, Fernandes KJ, Johnson SP. Infant rule learning facilitated by speech. Psychological Science. 2007;18:387–391. doi: 10.1111/j.1467-9280.2007.01910.x.
27. Marian V, Kaushanskaya M. Language-dependent memory: Insights from bilingualism. In: Zelinsky-Wibbelt C, editor. Relations between language and memory. Frankfurt: Peter Lang; 2011.
28. May A. Experience-dependent structural plasticity in the adult human brain. Trends in Cognitive Sciences. 2011;15:475–482. doi: 10.1016/j.tics.2011.08.002.
29. Maye J, Werker JF, Gerken L. Infant sensitivity to distributional information can affect phonetic discrimination. Cognition. 2002;82(3):B101–B111. doi: 10.1016/s0010-0277(01)00157-3.
30. Miller GA, Heise GA, Lichten W. The intelligibility of speech as a function of the context of the test materials. Journal of Experimental Psychology. 1951;41:329–335. doi: 10.1037/h0062491.
31. Ortega L. Understanding second language acquisition. Hodder Arnold; 2009.
32. Pelucchi B, Hay JF, Saffran JR. Learning in reverse: 8-month-old infants track backwards transitional probabilities. Cognition. 2009;113:244–247. doi: 10.1016/j.cognition.2009.07.011.
33. Perruchet P, Pacteau C. Synthetic grammar learning: Implicit rule abstraction or explicit fragmentary knowledge. Journal of Experimental Psychology: General. 1990;119(3):264–275.
34. Perruchet P, Desaulty S. A role for backward transitional probabilities in word segmentation? Memory & Cognition. 2008;36:1299–1305. doi: 10.3758/MC.36.7.1299.
35. R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2011. http://www.R-project.org/
36. Rizzi L. Null objects in Italian and the theory of pro. Linguistic Inquiry. 1986;17:501–558.
37. Rubenstein H. Language and probability. In: Miller GA, editor. Communication, language, and meaning: Psychological perspectives. New York: Basic Books; 1973. pp. 185–195.
38. Saffran JR. Words in a sea of sounds: The output of statistical learning. Cognition. 2001;81:149–169. doi: 10.1016/s0010-0277(01)00132-9.
39. Saffran JR, Thiessen ED. Pattern induction by infant language learners. Developmental Psychology. 2003;39:484–494. doi: 10.1037/0012-1649.39.3.484.
40. Saffran JR, Aslin RN, Newport EL. Statistical learning by 8-month-old infants. Science. 1996;274:1926–1928. doi: 10.1126/science.274.5294.1926.
41. Saffran JR, Johnson EK, Aslin RN, Newport EL. Statistical learning of tone sequences by human infants and adults. Cognition. 1999;70:27–52. doi: 10.1016/s0010-0277(98)00075-4.
42. Thiessen ED, Onnis L. Evidence for the early development of syntactic knowledge in infancy. (submitted for publication)
43. Thiessen ED, Saffran JR. When cues collide: Use of statistical and stress cues to word boundaries by 7- and 9-month-old infants. Developmental Psychology. 2003;39:706–716. doi: 10.1037/0012-1649.39.4.706.
44. Thiessen ED, Saffran JR. Learning to learn: Infants’ acquisition of stress-based strategies for word segmentation. Language Learning & Development. 2007;3:73–100.
45. Thompson SP, Newport EL. Statistical learning of syntax: The role of transitional probability. Language Learning and Development. 2007;3:1–42.
46. Tomasello M. Constructing a language: A usage-based theory of language acquisition. Harvard University Press; 2003.
47. Tremblay A, Derwing B, Libben G, Westbury C. Processing advantages of lexical bundles: Evidence from self-paced reading and sentence recall tasks. Language Learning. 2011;61:569–613.
