Developmental Cognitive Neuroscience. 2025 Jul 23;75:101599. doi: 10.1016/j.dcn.2025.101599

The development of semantic integration in bilingual toddlers measured by N400

Itziar Lozano a,b,⁎⁎,1, Anna Duszyk-Bogorodzka c,⁎⁎,1, Ingeborg Sophie Ribu d,⁎⁎,1, Natalia Falkiewicz a, Wiktoria Ogonowska a, Agnieszka Dynak a, Franziska Köder e, Przemysław Tomalski b, Ewelina Fryzowska a, Grzegorz Krajewski a, Cecilie Rummelhoff e, Elena C Varona e, Karolina Krupa-Gaweł e,f, Lisa Laumann f, Nina Gram Garmann f, Ewa Haman a,
PMCID: PMC12344258  PMID: 40763370

Abstract

Semantic integration is a mechanism of lexical-semantic processing. When indexed by the N400, it emerges coupled with the vocabulary spurt in the second year of life. To what extent maturation and language exposure contribute to its development remains unclear. Bilingual toddlers split their time between two languages: while experiencing similar concepts as monolinguals, bilinguals are less exposed to each language’s words. This makes them a key group for disentangling the relative contributions of maturation and language exposure to the emergence of semantic integration. We investigate (1) whether bilinguals follow the same developmental trajectory of semantic integration as monolinguals, and (2) whether semantic integration differs between bilinguals’ dominant and non-dominant languages across time. If language exposure drives semantic integration, we expect an earlier onset of semantic integration in monolinguals than in bilinguals, and language-dominance effects within bilinguals. In this event-related potential study with a mixed longitudinal and cross-sectional design (N = 131), bilingual and monolingual 18- and 24-month-olds were tested with a picture-word priming paradigm. We found an N400 effect at 18 and 24 months in monolinguals. Bilinguals showed no evidence of an N400 effect in either the dominant or the non-dominant language at either time-point. Although the bilingual sample was smaller than planned, our results contribute to general neurodevelopmental and dual language acquisition models.

Keywords: Semantic Integration, Bilinguals, N400, Language Development, Longitudinal

Highlights

  • Indexed by the N400 component, semantic integration was observed in monolinguals at 18 and 24 months, but not in bilinguals at either age.

  • There was no evidence of differences in semantic integration between dominant vs. non-dominant language at 18 and 24 months in the bilinguals.

  • There was no association between receptive and expressive vocabulary size at 18 months and semantic integration at 24 months in the longitudinal subsample.

  • However, exploratory analyses showed an association between receptive vocabulary size and the robustness of semantic integration.

1. Introduction

Despite its crucial role in language development, lexical-semantic acquisition is still one of the least understood sub-components of early language development. Acquiring this skill requires, among other things, that young children succeed in integrating the concepts of things that surround them (i.e., referents) with their corresponding labels in their language(s) (i.e., words). This ability is known as semantic integration (SI) and is thought to be one of the main mechanisms underlying lexical-semantic processing at the end of the second year of life, as well as to facilitate long-term vocabulary acquisition (Friedrich and Friederici, 2006, Friedrich and Friederici, 2010). One classical approach to measuring this mechanism has been the semantic-priming task, in which a picture representing a given referent (e.g., an eye) is shown before either a congruent (e.g., the word eye) or an incongruent (e.g., the word fish) label. In this paradigm, semantic integration refers to integrating a word into a preceding semantic context (see Friedrich and Friederici, 2005, Friedrich and Friederici, 2006; Junge et al., 2021 in support of this definition). The prime (i.e., the picture representing a referent) provides the semantic context for the target (i.e., the auditory word), and the target is expected to be more easily (semantically) integrated in the congruent than in the incongruent condition. The underlying rationale is that if the child responds differently to incongruent than to congruent pairs, at either the behavioral or the neural level, this indicates the presence of semantic integration. In electrophysiological studies, the N400 has been the event-related potential (ERP) component commonly used to test semantic integration in infancy and toddlerhood (for a systematic review, see Junge et al., 2021).
The N400 is thought to reflect an electrophysiological correlate of the integration of a given word with the previously provided context (Brown and Hagoort, 1993, Kutas and Federmeier, 2011). Crucially, unlike earlier components (e.g., N200, N350) that would reflect other earlier mechanisms involved in lexical-semantic processing (e.g., lexical priming, occurring from 200 to 400 ms), N400 would genuinely reflect semantic integration (Friedrich and Friederici, 2005).

In monolinguals, evidence across several experimental paradigms has shown that the emergence of semantic priming effects at a neural level takes place between 18 and 24 months (Friedrich and Friederici, 2004; see Wojcik, 2018 for a review), with toddlers with higher receptive vocabulary scores showing such effects as early as 12–14 months (Friedrich and Friederici, 2010, Friedrich and Friederici, 2006). This timing of language development is key because it overlaps with the onset of the vocabulary spurt and the use of specific mechanisms for word learning (e.g., disambiguation; Halberda, 2003). More importantly, it has led some to argue that the emergence of semantic processing in the second half of the second year of life is tightly coupled with the child’s trajectory of vocabulary development. While very recent longitudinal research points to this possibility (Avila-Varela et al., 2021; Arias-Trejo et al., 2022), the relative contribution of child maturation, lexical proficiency, and language exposure to the emergence of this mechanism has not yet been disentangled. Similarly, the role of early language exposure in the onset of this mechanism is generally taken for granted or indirectly explored (e.g., see works comparing semantic integration between real words, pseudowords – phonotactically plausible – and nonsense – phonotactically implausible – words; Friedrich and Friederici, 2005), but it has not been directly measured.

Some current theoretical approaches to bilingual early language acquisition have argued that studying the development of semantic integration in bilingual toddlers offers the opportunity to examine if semantic integration is shaped by early exposure, more specifically, whether it is conceptually or lexically driven (Jardak and Byers-Heinlein, 2019). Bilinguals are exposed to the same number of concepts as monolinguals, but they are less frequently exposed to the words of each of their languages (Conboy and Mills, 2006, Jardak and Byers-Heinlein, 2019). Crucially, bilinguals’ language exposure is also characterized by reduced access to labeling concepts in each language and, therefore, less robust word-concept mappings. Thus, they enable us to test whether equivalent conceptual, but differential language exposure may modify the emergence of semantic integration compared to monolinguals, who are exposed to univocal word-concept relationships. This approach predicts that, if language exposure drives the emergence of semantic integration, differences between monolinguals and bilinguals would be present in at least two aspects: in the timing of onset of semantic integration, with monolinguals showing semantic priming effects earlier than bilinguals, and in the effect of language dominance, with bilinguals showing more robust semantic priming effects in their dominant than in their non-dominant language. In line with this approach, we argue that, ultimately, bilinguals constitute a window to disentangle the relative role of maturation and language exposure in the emergence of the mechanism of semantic integration (see Sander-Montant et al., 2022, for a similar approach). And, more specifically, we argue that bilinguals allow us to investigate whether this mechanism is driven by a particular dimension of language experience, namely, the level of robustness of the word-concept mappings toddlers are exposed to.

Despite the theoretical relevance of studying the development of semantic integration in bilingual toddlers, little is known about the development of this mechanism in this population. The available evidence (Kuipers and Thierry, 2012, Kuipers and Thierry, 2013, Kuipers and Thierry, 2015, Rämä et al., 2018, Sirri and Rämä, 2019) is mixed for several reasons. First, most studies are cross-sectional and compare semantic integration between monolinguals and bilinguals at a single time-point, thus tracking the developmental stages of this ability only in a fragmentary way. Second, some of these cross-sectional studies measured semantic integration across a wide range of ages (from 19 months to 4 years), which may obscure when exactly in development this skill emerges; i.e., mean group results may be driven by the performance of older toddlers instead of reflecting the average performance of all individuals. Third, and more crucially, longitudinal evidence is insufficient. To our knowledge, only one longitudinal study has been conducted at the behavioral level (De Anda and Friend, 2020), and none at the neural level, so there is scant research tracking changes in this mechanism over time. Fourth, a great variety of paradigms have been used, including procedures that test semantic integration in different sensory modalities (auditory-auditory and visual-auditory priming paradigms) and at different levels, namely, behavioral (e.g., Jardak and Byers-Heinlein, 2019) and neural (e.g., Kuipers and Thierry, 2013).

As a consequence of the scant and mixed evidence in bilinguals and the scarcity of longitudinal studies, several key questions remain unresolved (see De Anda et al., 2016b, for a critical review). First, it remains to be established whether monolinguals and bilinguals differ in the onset of semantic integration: does semantic integration emerge at the same age in both groups despite their different language exposure, or does having less exposure to each language protract the onset of this mechanism in bilingual toddlers? Addressing this question in bilinguals with early but different onsets of exposure to L1 and L2 (rather than exposure to both languages from birth, the most frequently tested profile so far) would more clearly disentangle the relative roles of maturation and language exposure in the emergence of semantic integration. Second, it is unclear to what extent early language exposure shapes the emergence of semantic integration in each of a bilingual’s languages; in other words, whether dual language exposure affects semantic integration in both languages or only in the dominant one. To our knowledge, few studies have directly explored the role of bilingual toddlers’ language exposure, both cumulative and current, in the emergence of semantic integration (but see Ward, 2020 for an example). Third, it is underexplored to what extent the robustness of the N400 (i.e., its magnitude) is driven by language exposure, that is, whether it is equivalent in bilinguals’ dominant and non-dominant languages or, instead, stronger in the language with more exposure. Together, the answers to these questions would be crucial for the more general theoretical debate on what drives the emergence of semantic integration: language exposure or maturation.

To date, two sets of the studies mentioned above have attempted to answer these research questions, each proposing a different design. Whereas some have compared semantic integration in bilingual toddlers with that of their monolingual peers in one language (usually the dominant one; Kuipers and Thierry, 2012, Kuipers and Thierry, 2013, Kuipers and Thierry, 2015; Rämä et al., 2018), others have compared whether semantic integration differs between bilinguals’ dominant and non-dominant languages (Sirri and Rämä, 2019). From our point of view, both designs are incomplete. If one compares the semantic integration of monolingual and bilingual toddlers in only one language, the effect of age is controlled for, but the effect of language exposure in bilinguals is only partially measured. Since bilingual toddlers split their time between two languages, ignoring one of them (usually the non-dominant) may overlook the impact of part of their language exposure on the development of semantic integration. More importantly, as stated above, although a given concept and its label are systematically mapped in the language experience of monolinguals, this is not the case for bilinguals across languages, since they regularly hear several labels for the same concept. Thus, measuring semantic integration only in bilinguals’ dominant language and comparing it with the performance of their monolingual peers means losing the effect that exposure to the non-dominant language may have on the emergence and robustness of this mechanism. Conversely, comparing semantic integration in bilinguals’ dominant vs. non-dominant language without including monolingual groups controls for neither the effects of maturation nor those of exposure to a single language system.

In our approach, two comparisons should be made within the same study to explore (1) whether bilingual toddlers follow the same developmental time course of semantic integration as monolingual toddlers and (2) what role early language exposure plays in the emergence of semantic integration. First, monolingual and bilingual toddlers need to be compared at relevant time-points of lexical-semantic development (i.e., between 18 and 24 months). Second, bilingual toddlers need to be compared across their two languages (i.e., dominant vs. non-dominant) and with two groups of monolingual peers in the same study. We believe this design will be informative about the effects of both maturation and single-language exposure, relative to dual-language exposure, on the emergence of semantic integration. One of the few attempts in this direction that we know of is the cross-sectional study by Jardak and Byers-Heinlein (2019), who tested semantic integration in monolingual and bilingual toddlers aged 24 and 30 months and, within the latter group, in both their dominant and non-dominant languages. They found no conclusive evidence on whether the onset of semantic integration differs between monolinguals and bilinguals. More crucially, they showed that semantic priming effects are challenging to observe in bilinguals at 24 months but are strong at 30 months. This points to the need to investigate the age of onset of semantic integration in bilinguals younger than 24 months. Although their findings are very relevant to the field, we think there is room to explore certain issues further, for two reasons: first, their design was not longitudinal; second, the method used (i.e., behavioural measures in a semantic priming eye-tracking task) may not have been sensitive to potential group differences.
Indeed, overall, the field is still missing brain activity studies that systematically test changes in the development of semantic integration using the same paradigm across early toddlerhood.

The current study aims to investigate the developmental trajectory of semantic integration in bilingual and monolingual toddlers at 18 and 24 months by using a picture-word priming task while measuring their ERPs (the N400 component). First, we aim to test whether the developmental time-course of semantic integration is equivalent in bilinguals and monolinguals or whether, instead, bilinguals show a later onset. The latter would support the view that reduced language exposure protracts the emergence of semantic priming in bilinguals relative to their monolingual peers. Second, we aim to examine the role of early language exposure in the emergence of semantic integration in two ways: 1) by exploring differences in performance between the dominant and non-dominant languages across time in bilinguals and 2) by directly measuring the effect of the cumulative and current language exposure they receive.

We consider a longitudinal design that uses the same paradigm to examine semantic integration at two time-points to be the most suitable methodological approach to address our two research questions. We also believe it is the most accurate approach to examining the relationship, across development, between bilingual toddlers’ early language exposure, their vocabulary (both receptive and productive) and semantic priming, since longitudinal designs allow changes in these three variables to be tracked over time. Testing at two ages relevant for early lexical acquisition (18 and 24 months) would be particularly informative about the neural mechanisms underlying semantic integration in monolingual and bilingual toddlers. Additionally, we think that neuroimaging methods (ERPs) offer a more fine-grained measure of developmental changes in semantic integration, which are more difficult to observe with behavioral methods (e.g., eye tracking).

We are aware, though, of the practical challenges of longitudinally testing developmental populations with EEG (Norton et al., 2021). For toddlers in our age range of interest, the data attrition rate reaches 50%, mainly due to cap refusal and data loss (Noreika et al., 2020; Webb et al., 2015). Knowing this upfront, we propose several methodological decisions to maximize the statistical power of our study. First, we will recruit a sufficiently large sample of toddlers, i.e., almost twice the size reported in the prior studies reviewed above (see Junge et al., 2021 for a systematic review). Second, we will use a combined cross-sectional and longitudinal design, which increases the chances of having enough power to test our hypotheses despite drop-outs (see Arredondo et al., 2022 for a similar approach). This choice is crucial for the bilingual group, given the moderate prevalence of the specific language combination we will explore (i.e., Polish-Norwegian).

Our study makes further unique contributions. Unlike most prior evidence (e.g., Kuipers and Thierry, 2012, Kuipers and Thierry, 2013, Kuipers and Thierry, 2015; Rämä et al., 2018; Sirri and Rämä, 2019), we will study bilinguals (i.e., Polish toddlers living in Norway) who are not intensively and systematically exposed to their L2 until they enroll in Early Childhood Education and Care (henceforth ECEC; European Commission, 2023²) around the age of one year (Bartsch, 2022). This type of bilingual exposure allows us to investigate more clearly whether the age of onset and the amount of exposure to the L2 modulate the emergence and robustness of N400 effects indexing semantic integration. The reason is that in bilinguals exposed to both languages from birth, dual language exposure and maturation are confounded (i.e., both unfold together in time), whereas in bilinguals first exposed to their L2 later in life these two variables are separated in time, which offers a clearer scenario for testing the effect of each on the emergence of semantic integration. Further, we see the specific combination of L1 and L2 as a strength of our study because (1) it increases the homogeneity of the bilinguals’ language background, thus minimizing potential confounds derived from high between-participant diversity in their L2s, and (2) it contributes to overcoming the Anglocentric approach common in the literature (Kidd and Garcia, 2022), thus increasing the generalizability of our results to other populations. Finally, we will additionally explore the impact of language exposure on N400 effects by treating the amount of exposure to L1 and L2 as a continuous variable (ranging from 0% to 100%) instead of a categorical one (monolingual vs. bilingual).

2. Aims and hypotheses

Our study has three aims:

  1. To test whether bilingual toddlers follow the same developmental trajectory of semantic integration as monolinguals. (H1)

  2. To study the role of early language exposure in the emergence of semantic integration: does semantic integration differ between bilinguals’ dominant and non-dominant languages across time? (H2)

  3. To investigate whether there is language-specificity in the concurrent and longitudinal associations between vocabulary size and the robustness of semantic priming effects. (H3)

To achieve these aims, we have formulated the following specific hypotheses and predictions:

H1

Monolinguals will show semantic integration earlier than bilinguals. More specifically, monolingual toddlers will differ from bilingual toddlers in their semantic integration at 18 months of age, but not at 24 months of age, thus showing a protracted onset of this skill in the latter group. We expect so because, despite both groups being comparable in chronological age, bilinguals have had reduced exposure to both L1 and L2 compared to monolinguals, which may protract the onset of their semantic priming to later stages of the second year of life (vs. 18 months, when it has been observed in monolinguals; see Wojcik, 2018, for a review).

We make this prediction for two dependent variables: Amplitude and Latency.



Amplitude

We expect that, at 18 months of age, monolinguals will show a larger negative deflection in the incongruent condition relative to the congruent condition (i.e., N400 effects), which will be indicative of semantic integration. In contrast, bilinguals will show no differences in amplitude between the incongruent and congruent conditions, suggesting a lack of N400 effects.

However, at 24 months of age, both monolinguals and bilinguals (in their L1 or dominant language) will show N400 effects, i.e., a larger negative deflection in the incongruent condition compared to the congruent condition, indicating that both succeed in semantic integration.
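As a rough illustration of how these amplitude predictions can be quantified, the sketch below computes an N400 effect as the difference between the mean incongruent and mean congruent amplitudes, pooled over the six centro-parietal electrodes named in the exclusion criteria (C3, Cz, C4, P3, Pz, P4). The 400–800 ms window is a hypothetical placeholder, not the window specified in the registered protocol, and the function names are illustrative. A negative value indicates a larger negativity for incongruent words, the direction predicted for an N400 effect; whether such an effect is reliable would still have to be tested statistically.

```python
from statistics import mean

# Electrodes listed in the exclusion criteria (all six had to be valid).
CENTRO_PARIETAL = ("C3", "Cz", "C4", "P3", "Pz", "P4")


def mean_amplitude(erp, times_ms, window_ms, electrodes=CENTRO_PARIETAL):
    """Mean voltage (µV) over `electrodes` and the samples inside `window_ms`.

    `erp` maps each electrode name to a list of voltages, one per entry in
    `times_ms` (e.g., one toddler's average for one condition).
    """
    start, end = window_ms
    in_window = [i for i, t in enumerate(times_ms) if start <= t <= end]
    if not in_window:
        raise ValueError("the time window contains no samples")
    # Average over time within each electrode, then across electrodes.
    return mean(mean(erp[e][i] for i in in_window) for e in electrodes)


def n400_effect(erp_incongruent, erp_congruent, times_ms, window_ms=(400, 800)):
    """Incongruent minus congruent mean amplitude (µV).

    Negative values mean the incongruent condition is more negative, i.e. an
    effect in the direction predicted for the N400. The default window is a
    hypothetical placeholder.
    """
    return (mean_amplitude(erp_incongruent, times_ms, window_ms)
            - mean_amplitude(erp_congruent, times_ms, window_ms))
```

Computing this effect separately per group, age and language would give the quantities that the H1 and H2 amplitude predictions compare.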



Latency

We expect slower speed in semantic integration (i.e., longer latencies of the N400 effect) at 18 months than at 24 months in monolinguals. Moreover, at 24 months of age, bilinguals in their L1 will show slower speed in semantic integration than monolinguals – i.e., longer latencies of N400 effects.
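Latency predictions require a latency measure. One common option, used here purely as an illustrative assumption because this section does not specify the measure, is the peak latency of the difference wave: the time of the most negative point of the incongruent-minus-congruent waveform within a search window. The sketch assumes waveforms already averaged over the centro-parietal electrodes; the 300–900 ms search window and the function names are hypothetical. Peak latency is sensitive to noise, which is why alternatives such as fractional-area latency are also used in ERP research.

```python
def difference_wave(incongruent, congruent):
    """Sample-by-sample incongruent minus congruent voltage (µV)."""
    if len(incongruent) != len(congruent):
        raise ValueError("waveforms must have the same number of samples")
    return [i - c for i, c in zip(incongruent, congruent)]


def negative_peak_latency(wave, times_ms, window_ms=(300, 900)):
    """Time (ms) of the most negative sample of `wave` inside `window_ms`.

    Ties are resolved in favour of the earliest sample.
    """
    start, end = window_ms
    candidates = [(v, t) for v, t in zip(wave, times_ms) if start <= t <= end]
    if not candidates:
        raise ValueError("the search window contains no samples")
    _, latency = min(candidates)  # tuples compare by voltage first, then time
    return latency
```

Under this measure, a longer latency at 18 than at 24 months in monolinguals would correspond to the slower semantic integration predicted above.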

H2

Bilingual toddlers will show no semantic integration in any language at 18 months, and a reduced semantic integration in their non-dominant vs. dominant language at 24 months. More specifically, they will not show significant differences between congruent and incongruent trials in either of their languages at 18 months of age. In contrast, at 24 months of age, bilinguals will show reduced semantic integration in their L2 (non-dominant language) relative to their L1 (dominant language), thus reflecting the effect of their language exposure to L1 and L2 between 18 and 24 months.

Amplitude

At 18 months, bilinguals will show no differences in amplitude between the congruent and incongruent conditions in either of their languages. At 24 months, bilinguals will show larger negative amplitudes in the incongruent vs. congruent condition in their dominant language but not in their non-dominant language.

H3

Language-specificity of longitudinal associations between vocabulary size and semantic priming. Collapsing across all groups, children’s vocabulary (both receptive and expressive, but particularly receptive) at 18 months will predict the robustness of semantic priming at 24 months of age.
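In its simplest form, the H3 prediction amounts to regressing the robustness of semantic priming at 24 months on vocabulary size at 18 months. The sketch below fits an ordinary least-squares line and a Pearson correlation using only the standard library; it is a simplification for illustration and not the study’s own statistical model. It assumes that robustness is expressed as the sign-flipped N400 effect (congruent minus incongruent amplitude), so that larger values mean a stronger effect; the function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def ols_fit(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("the predictor has zero variance")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx * syy == 0:
        raise ValueError("both sequences need non-zero variance")
    return sxy / sqrt(sxx * syy)
```

A positive slope of robustness at 24 months on receptive vocabulary at 18 months would be the pattern H3 predicts.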

Our registered protocol was deposited at the following link after In-Principle Acceptance at Stage 1: https://osf.io/utpgb/?view_only=6bed1518d6e24719873483b3e1dc469b

3. Method

3.1. Participants

We recruited 228 toddlers (112 Polish monolinguals, 76 Norwegian monolinguals and 40 Polish-Norwegian bilinguals), henceforth referred to as the Large Sample. Of these, 132 toddlers were recruited cross-sectionally (60 Polish monolingual, 42 Norwegian monolingual, and 30 Polish-Norwegian bilingual), to be tested either at 18 months (+/− 1 month) or at 24 months (+/− 1 month), and 96 toddlers were recruited longitudinally (52 Polish monolingual, 34 Norwegian monolingual, and 10 Polish-Norwegian bilingual), to be tested at two time-points: 18 months (+/− 1 month) and 24 months of age (+/− 1 month). Given the high attrition rate and data loss in toddler EEG studies (see Van der Velde and Junge, 2020 and Noreika et al., 2020 for reviews of the effect of these two factors specifically in longitudinal designs), we expected a 50% drop-out rate at each time-point due to low EEG data quality and, of the remaining sample, an additional 10% loss due to the attrition typical of longitudinal designs.
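The expected losses described above can be written out as simple arithmetic. The sketch below applies the 50% EEG-quality loss at a time-point and, for longitudinally recruited toddlers, a further 10% attrition to the remainder, using the recruitment counts reported above. Applying the 10% once per time-point is one reading of the text and is an assumption here; the function name is illustrative, and the result is a planning expectation rather than an observed count.

```python
# Recruitment counts reported in the text, by design and language group.
RECRUITED = {
    "cross-sectional": {"Polish": 60, "Norwegian": 42, "Polish-Norwegian": 30},
    "longitudinal": {"Polish": 52, "Norwegian": 34, "Polish-Norwegian": 10},
}

EEG_QUALITY_LOSS = 0.50        # expected drop-out at each time-point
LONGITUDINAL_ATTRITION = 0.10  # additional loss among the remaining longitudinal toddlers


def expected_valid(recruited, longitudinal):
    """Expected number of toddlers with usable EEG data at one time-point."""
    remaining = recruited * (1 - EEG_QUALITY_LOSS)
    if longitudinal:
        remaining *= 1 - LONGITUDINAL_ATTRITION
    return remaining


if __name__ == "__main__":
    for design, groups in RECRUITED.items():
        for group, n in groups.items():
            expected = expected_valid(n, design == "longitudinal")
            print(f"{design:15} {group:17} {expected:5.1f}")
```

For example, the 60 Polish monolinguals recruited cross-sectionally give an expected 30 toddlers with valid data, and the 52 recruited longitudinally an expected 23.4.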

Language exposure was measured as a continuous variable (from no exposure, 0%, to exclusive exposure, 100%). Monolinguals were defined as toddlers with regular exposure to a single language from birth and less than 20% exposure to a second language, if any, by the time of testing (as in De Anda et al., 2016). Bilinguals were Polish toddlers living in Norway, with regular exposure to Polish from birth, who were systematically exposed to Norwegian after they enrolled in ECEC, around the age of one year. Their minimum exposure to Norwegian by the time of testing was 20%. Fifteen parents reported that their child was exposed to a third language in addition to Polish and Norwegian, and five parents reported that their child was routinely exposed to four languages, including Polish and Norwegian. The onset of dual exposure in this group was controlled for by measuring the age of onset of exposure to the L2 via parent report. To classify toddlers as monolingual or bilingual, Polish and Norwegian translated versions of the Language Exposure Assessment Tool (LEAT; De Anda et al., 2016a) and the Questionnaire for Parents of Bilingual Children (PABIQ-IT; Gatt et al., 2015) were used. Maturation was operationalized as toddlers’ chronological age in months.
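The group definitions above reduce to two thresholds on the parent-reported exposure percentages. The sketch below encodes them; it assumes the percentage of exposure to the second language has already been derived from the LEAT, and the function name and the "unclassified" label for toddlers who meet neither definition are hypothetical. Third or fourth languages, which some bilingual families reported, are not modelled.

```python
from typing import Optional

MONOLINGUAL_MAX_L2 = 20.0       # monolinguals: < 20% exposure to any second language
BILINGUAL_MIN_NORWEGIAN = 20.0  # bilinguals: at least 20% exposure to Norwegian


def classify_toddler(l1: str, l1_from_birth: bool, l2: Optional[str], l2_percent: float) -> str:
    """Return 'monolingual', 'bilingual' or 'unclassified' from exposure data.

    `l2_percent` is the percentage of exposure to the second language (0-100)
    at the time of testing; `l2` is None when there is no second language.
    """
    if not 0.0 <= l2_percent <= 100.0:
        raise ValueError("l2_percent must lie between 0 and 100")
    if not l1_from_birth:
        return "unclassified"  # both groups require regular L1 exposure from birth
    if l2_percent < MONOLINGUAL_MAX_L2:
        return "monolingual"
    if l1 == "Polish" and l2 == "Norwegian" and l2_percent >= BILINGUAL_MIN_NORWEGIAN:
        return "bilingual"
    return "unclassified"
```

A single cut-off of 20% separates the groups here, which mirrors the text: below it a toddler counts as monolingual, at or above it only the Polish-Norwegian profile counts as bilingual.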

The toddlers living in Poland were mainly recruited from Warsaw, and those living in Norway from Oslo and surrounding areas. Both online (i.e., social media) and face-to-face recruitment channels (i.e., contact with ECEC centres and invitation letters to parents) were used. After exclusions and data quality checks, we expected a final sample of at least 30 participants per group with valid, analysable EEG data for the cross-sectional design, and a longitudinal subsample of 60 participants (i.e., n = 20 per language group) with valid, analysable EEG data who successfully attended visits at both time-points (i.e., complete cases). Note, however, that we expected a larger longitudinal subsample with one missing time-point (i.e., unbalanced cases; n = 40 per time-point and language group), which, as shown below in Section 4.4, is a well-powered dataset when using statistical models such as Linear Mixed Models (LMMs).

Recruitment for cross-sectional data at 18 and 24 months was simultaneously conducted for both time-points, while the longitudinal subsample included the participants who were recruited at 18 months and successfully followed up at 24 months. This study was approved by the Research Ethics Committee at the Faculty of Psychology, University of Warsaw, and evaluated by the Norwegian Agency for Shared Services in Education and Research in Norway. Parents completed an online informed consent form before taking part in the study. All toddlers received small gifts for their participation.

Among the recruited Large Sample of toddlers (N = 228 participants who provided N = 324 datasets), we excluded datasets due to cap/gel refusal (n = 36), preterm birth (gestational age < 36 weeks or low birth weight < 2300 g; n = 10), having fewer than 20 valid trials per condition (n = 69), low data quality based on visual inspection (n = 35), discovering at the lab visit that a toddler recruited as monolingual had systematic bilingual exposure from birth (n = 4), perinatal complications (n = 4), technical issues (n = 3), or having fewer than six valid centro-parietal electrodes (C3, Cz, C4, P3, Pz, and P4; n = 6). In total, we excluded N = 167 datasets.

The final Large Sample after exclusions consisted of 157 datasets (83 males, 74 females; 42.7% from Warsaw and 57.3% from Oslo, collapsing across time-points; see full descriptives in Table 1). These datasets were provided by N = 131 toddlers with valid, analysable EEG data. Of these, n = 73 toddlers were tested cross-sectionally and n = 58 longitudinally. By longitudinal we mean toddlers who came to the lab, accepted the cap, and watched the procedure at both time-points (18 and 24 months). However, only n = 26 of these toddlers had valid, analysable EEG data at both time-points (i.e., were complete cases), while n = 12 had valid data only at 18 months and n = 20 only at 24 months (see Table 2 for full descriptive statistics).
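The exclusion and sample counts reported above can be cross-checked with simple bookkeeping: cross-sectional toddlers and longitudinal toddlers with valid data at only one time-point each contribute one dataset, while complete cases contribute two. The sketch below is an illustrative consistency check written for this description, not part of the authors’ pipeline; the dictionary keys paraphrase the exclusion reasons and the function name is hypothetical.

```python
# Excluded datasets by reason, as reported in the text.
EXCLUSIONS = {
    "cap/gel refusal": 36,
    "preterm birth or low birth weight": 10,
    "fewer than 20 valid trials per condition": 69,
    "low data quality on visual inspection": 35,
    "bilingual exposure from birth in a recruited monolingual": 4,
    "perinatal complications": 4,
    "technical issues": 3,
    "fewer than six valid centro-parietal electrodes": 6,
}
RECORDED_DATASETS = 324


def datasets_contributed(cross_sectional, complete_cases, only_18m, only_24m):
    """Number of valid datasets given the toddler counts in each category."""
    return cross_sectional + 2 * complete_cases + only_18m + only_24m


excluded = sum(EXCLUSIONS.values())
retained = RECORDED_DATASETS - excluded
toddlers = 73 + 26 + 12 + 20  # cross-sectional + complete + only 18 m + only 24 m
```

Both routes give 157 datasets from 131 toddlers. Stored in long format (one row per toddler and time-point), the unbalanced cases simply contribute a single row, which mixed models can use alongside the complete cases.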

Table 1.

Descriptive statistics of the final large sample by language group and time-point; Mean (SD), Range.

Large Sample (N = 157): Monolingual Polish (n = 67), Monolingual Norwegian (n = 59), Bilingual Polish-Norwegian (n = 31).

| Variable | Monolingual Polish, 18 months (n = 31) | Monolingual Polish, 24 months (n = 36) | Monolingual Norwegian, 18 months (n = 24) | Monolingual Norwegian, 24 months (n = 35) | Bilingual Polish-Norwegian, 18 months (n = 11) | Bilingual Polish-Norwegian, 24 months (n = 20) |
|---|---|---|---|---|---|---|
| Sex (% females) | 51.6% | 50% | 50% | 37.1% | 36.4% | 55% |
| Age (in days) | 564.87 (43.5), 509–751 | 735 (15.36), 706–779 | 558.29 (19.11), 515–613 | 724.20 (17.84), 698–779 | 558.64 (18.72), 532–596 | 730.85 (19.49), 703–783 |
| Age (in months) | 18.57 (1.43), 16.73–24.69 | 23.21 (.50), 23.21–25.61 | 18.35 (.63), 16.93–20.15 | 23.65 (1.00), 18.90–25.61 | 18.36 (.59), 17.49–19.59 | 24.02 (.64), 23.11–25.74 |
| Gestational Age (weeks) | 38.45 (1.72), 36–43 | 39.14 (1.78), 35–42 | 39.91 (1.08), 38–42 | 39.80 (1.23), 36–42 | 39.75 (1.39), 38–42 | 39.92 (1.18), 38–42 |
| Birth weight (g) | 3602.74 (477.56), 2500–4350 | 3508.46 (510.67), 2600–4600 | 3587.74 (461.13), 2465–4455 | 3435.60 (423.27), 2465–4455 | 3558.50 (624.01), 2400–4500 | 3514.21 (538.49), 2400–4460 |
| Exposure to L2 from birth (%)ᵃ | .85 (2.27), 0–9.33 | 1.71 (6.68), 0–38.98 | 1.81 (8.13), 0–39.00 | 1.87 (8.34), 0–46.11 | 27.4 (16.49), 0–44.1 | 20.4 (15.37), 0–45.19 |
| Exposure to L2 last week (%)ᵇ | 2.23 (7.38), 0–35.71 | .71 (4.18), 0–25.09 | 1.68 (7.82), 0–37.56 | 1.91 (8.26), 0.00–45.09 | 27.29 (16.77), 0–43.05 | 20.89 (14.78), .50–45.81 |
| Exposure to L2 from birth (hh)ᶜ | 3.41 (3.61), .03–8.82 | 9.26 (12.06), .2–32.70 | 16.27 (21.60), 1–31.55 | 17.16 (18.60), 1.88–37.87 | 22.92 (14.08), 0–36.17 | 17.55 (12.41), .42–38.48 |
| Exposure to L2 last week (hh)ᵈ | 9.67 (11.83), .50–30 | 3.60 (8.57), 0–21.08 | 13.63 (17.87), 1–26.27 | 22.67 (15.37), 5–33 | 22.61 (10.57), 0–34 | 33.02 (12.90), 14.86–55.00 |
| Age of onset of exposure to L2 (in months)ᵉ | 8.33 (7.42), 0–16 | 6.40 (6.50), 0–13 | 0 (0), 0–0 | 7.25 (8.4), 0–15 | 1.7 (5.38), 0–17 | 7.89 (9.27), 0–23 |
| SES Educationᶠ C1 | 3.93 (.45), 2–5 | 3.97 (.51), 3–5 | 3.83 (.58), 2–4 | 3.89 (.63), 2–5 | 3.91 (.70), 3–5 | 3.68 (.88), 1–5 |
| SES Occupationᵍ C1 | 4 (3.4), 1–11 | 3.09 (2.61), 1–11 | 3.52 (3.57), 1–13 | 3.09 (2.69), 1–10 | 5 (4.08), 1–11 | 3.56 (2.95), 1–10 |
| SES Education C2 | 4.07 (2.29), 2–12 | 4.20 (2.09), 2–12 | 3.74 (.75), 2–5 | 4.00 (1.59), 1–12 | 3.55 (1.21), 1–5 | 3.32 (1.25), 1–5 |
| SES Occupation C2 | 3 (3.04), 1–12 | 3.06 (3.19), 0–12 | 2.22 (1), 1–5 | 2.88 (2.17), 1–12 | 3.30 (2.11), 2–7 | 2.85 (2.87), 1–9 |

Note. g = grams; hh = hours. SES = Socioeconomic Status. C1 = Caregiver 1. C2 = Caregiver 2.

ᵃ The percentage of exposure to a second language in toddlers' lifetime.

ᵇ The percentage of exposure to a second language in toddlers' last week.

ᶜ Number of hours of exposure to a second language per day in toddlers' lifetime.

ᵈ Number of hours of exposure to a second language per day in toddlers' last week.

Variables a, b, c, d, and e were all calculated according to LEAT (De Anda et al., 2016a). Calculations of c and d were based on an assumed average of 12 h of awake time per day.

ᶠ SES Education ranges 1–5: 1 = Primary level education; 2 = Secondary level education; 3 = Professional training; 4 = University/tertiary education; 5 = Post-graduate education; we further added codes 10–13 (10 = not working; 11 = parental leave; 12 = No second caregiver; 13 = Working & studying).

ᵍ SES Occupation ranges 0 (higher) to 9 (lower), according to the ISCO-08 code (International Labour Office, 2012); we further added codes 10–13 (10 = not working; 11 = parental leave; 12 = No second caregiver; 13 = Working & studying).

Table 2.

Descriptive statistics of the final longitudinal subsample by language group and time-point; Mean (SD), Range.

Longitudinal subsample (N = 58; 26 complete cases; 12 only at T1 and 20 only at T2)
Monolingual Polish: n = 29 (12 complete cases; 8 only T1; 9 only T2). Monolingual Norwegian: n = 24 (10 complete cases; 3 only T1; 11 only T2). Bilingual Polish-Norwegian: n = 5 (4 complete cases; 1 only T1).
Variable | Monolingual Polish, 18 months (N = 20) | Monolingual Polish, 24 months (N = 21) | Monolingual Norwegian, 18 months (N = 13) | Monolingual Norwegian, 24 months (N = 21) | Bilingual Polish-Norwegian, 18 months (N = 5) | Bilingual Polish-Norwegian, 24 months (N = 4)
Sex (% females) | 55% | 35% | 38.5% | 23.8% | 60% | 50%
Age (in days) | 569.86 (49.56), 509–751 | 735.75 (14.53), 706–758 | 551.23 (14.69), 515–574 | 721.81 (19.57), 698–779 | 550.40 (3.65), 544–553 | 719 (5.94), 711–724
Age (in months) | 18.73 (1.63), 16.73–24.69 | 24.18 (.48), 23.21–24.92 | 18.12 (.48), 16.93–18.87 | 23.46 (1.22), 18.90–25.61 | 18.09 (.12), 17.88–18.18 | 23.63 (.20), 23.37–23.80
Gestational Age (weeks) | 39.79 (1.90), 36–43 | 39.05 (1.73), 35–42 | 39.85 (.69), 39–41 | 39.62 (1.24), 36–42 | 39.80 (1.78), 38–42 | 39.25 (1.50), 38–41
Birth weight (g) | 3681.90 (450.14), 2580–4350 | 3544.50 (453.27), 2720–4350 | 3658 (503.75), 2465–4455 | 3440 (450.1), 2465–4455 | 3487.00 (646.45), 2400–3930 | 3376.25 (689.51), 2400–3875
Exposure to L2 from birth (%)ᵃ | .42 (1.53), 0–6.82 | 2.34 (8.74), 0–38.98 | - | .10 (.45), 0–1.9 | 32.60 (17.59), 1.40–43.15 | 34.04 (11.58), 18.40–45.19
Exposure to L2 last week (%)ᵇ | .45 (1.82), 0–8.33 | 1.28 (5.60), 0–25.09 | - | .12 (.52), 0–2.23 | 31.70 (16.87), 1.65–40.48 | 34.29 (10.95), 20.17–45.81
Exposure to L2 from birth (hh)ᶜ | 3.83 (4.47), .67–7 | 12.77 (17.42), .38–32.7 | - | 1.88 (-), 1.88–1.88 | 26.63 (14.17), 1.39–34 | 28.80 (9.20), 16.95–38.48
Exposure to L2 last week (hh)ᵈ | 4 (4.24), 1–7 | 7.19 (12.02), 0–21.08 | - | 5 (-), 5–5 | 30.48 (3.69), 25–34 | 35.21 (19.98), 14.86–55
Age of onset of exposure to L2 (in months)ᵉ | 3 (4.24), 0–6 | 2 (3.46), 0–6 | - | 7.50 (10.61), 0–15 | 3.40 (7.60), 0–17 | 0 (0), 0–0
SES Education C1ᶠ | 4 (.32), 3–5 | 4 (.46), 3–5 | 3.85 (.55), 2–4 | 3.76 (.77), 2–5 | 3.80 (.84), 3–5 | 3.50 (1.29), 2–5
SES Occupation C1ᵍ | 3.58 (3.01), 1–10 | 2.68 (2.03), 1–10 | 3.23 (3.24), 1–11 | 3.19 (2.96), 1–10 | 5 (4.24), 2–11 | 2.67 (1.15), 2–4
SES Education C2 | 4.30 (2.74), 2–12 | 4.60 (2.62), 2–12 | 3.92 (.64), 3–5 | 4.14 (1.93), 2–12 | 3 (1.58), 1–5 | 2.50 (1.73), 1–5
SES Occupation C2 | 3.67 (3.65), 1–12 | 3.37 (3.82), 0–12 | 1.92 (.49), 1–3 | 2.86 (2.48), 1–12 | 4 (2.31), 2–6 | 3.33 (2.31), 3–2

Note. g = grams; hh = hours. SES = Socioeconomic Status. C1 = Caregiver 1. C2 = Caregiver 2. T1 = Time-point 1; T2 = Time-point 2.

ᵃ The percentage of exposure to a second language in toddlers' lifetime.

ᵇ The percentage of exposure to a second language in toddlers' last week.

ᶜ Number of hours of exposure to a second language per day in toddlers' lifetime.

ᵈ Number of hours of exposure to a second language per day in toddlers' last week.

Variables a, b, c, d, and e were all calculated according to LEAT (De Anda et al., 2016a). Calculations of c and d were based on an assumed average of 12 h of awake time per day.

ᶠ SES Education ranges 1–5: 1 = Primary level education; 2 = Secondary level education; 3 = Professional training; 4 = University/tertiary education; 5 = Post-graduate education; we further added codes 10–13 (10 = not working; 11 = parental leave; 12 = No second caregiver; 13 = Working & studying).

ᵍ SES Occupation ranges 0 (higher) to 9 (lower), according to the ISCO-08 code (International Labour Office, 2012); we further added codes 10–13 (10 = not working; 11 = parental leave; 12 = No second caregiver; 13 = Working & studying).

3.2. Data sampling plan

To check whether our sampling plan would allow us to test hypotheses 1 and 2 in the cross-sectional sample with adequate power, we (1) reviewed effect sizes reported in prior literature on N400 effects in bilinguals using picture-word priming tasks, and (2) ran statistical power analyses in R (R Core Team, 2020).

Few N400 studies on bilingual toddlers report effect sizes for interactions, our main contrast of interest; most report them only for main effects. One of the few studies following a design almost equivalent to ours (i.e., a three-factor design with two levels for each factor) reported an effect size of d = .34 when testing n = 30 toddlers (Conboy and Mills, 2006). We therefore took this sample size as the minimum needed to detect a small effect in our study. This size is in line with the sample sizes most commonly reported in studies on N400 effects in the first two years of life (i.e., n = 25–30; see Junge et al., 2021, for a systematic review).

To test whether this assumption was empirically supported, we ran simulation-based power analyses for hypotheses 1 and 2. This approach is suitable, and indeed necessary, when linear mixed models are planned (Kumle et al., 2021). We used lme4 (Bates et al., 2015b) and two complementary R packages, simr (Green and MacLeod, 2016) and mixedpower (Kumle et al., 2018). First, we simulated data for n = 30 per Language Group and Time-point, with the number of trials set to 180. Second, we specified our model as y ~ LangGroup * Time-point * Type of trial + (1 | id) + (1 | item) with the makeLmer function of the simr package. Third, we estimated the power for the interaction (our contrast of interest) over 1000 iterations. For the data-based estimate, the final sample with which we expected to test H1 and H2 (n = 30 in the bilingual group and n = 30 in the monolingual comparison group; total n = 60) would achieve power close to 99%. For the SESOI (smallest effect size of interest) estimate, a sample of n = 60 yielded an achieved power of 90%. Thus, our planned sample size of 30 participants with analyzable datasets per Language Group and Time-point was well-powered to test hypotheses 1 and 2. Results are shown in Figure S2 (Supplementary Materials).
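The logic of a simulation-based power analysis can be illustrated with a short Python sketch. It is a simplified stand-in for the simr/mixedpower workflow described above, not the authors' code: instead of refitting the full LMM on every iteration, it collapses each child's trials into an incongruent-minus-congruent difference score, so that the Language Group × Type of Trial interaction reduces to a two-sample t-test on those scores (Time-point is omitted). All variance components and effect sizes (in µV) are hypothetical.

```python
import numpy as np
from scipy import stats


def simulate_difference_scores(rng, n_children, n_trials, effect_uv, slope_sd, noise_sd):
    """Return each child's mean(incongruent) - mean(congruent) amplitude in microvolts."""
    per_condition = n_trials // 2
    intercept = rng.normal(0.0, 5.0, n_children)               # child-specific baseline
    slope = effect_uv + rng.normal(0.0, slope_sd, n_children)  # child-specific N400 effect
    congruent = intercept[:, None] + rng.normal(0.0, noise_sd, (n_children, per_condition))
    incongruent = (intercept + slope)[:, None] + rng.normal(
        0.0, noise_sd, (n_children, per_condition)
    )
    return incongruent.mean(axis=1) - congruent.mean(axis=1)


def interaction_power(n_per_group=30, n_trials=180, effect_uv=-2.0, slope_sd=2.0,
                      noise_sd=10.0, n_sims=1000, alpha=0.05, seed=2024):
    """Monte Carlo power for a Language Group x Type of Trial interaction.

    One group has an N400 effect of `effect_uv` (negative = more negative
    amplitude for incongruent trials); the other group has none.
    All default values are hypothetical.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        group_with_effect = simulate_difference_scores(
            rng, n_per_group, n_trials, effect_uv, slope_sd, noise_sd)
        group_without_effect = simulate_difference_scores(
            rng, n_per_group, n_trials, 0.0, slope_sd, noise_sd)
        # Comparing difference scores between groups tests the 2 x 2 interaction.
        p_value = stats.ttest_ind(group_with_effect, group_without_effect).pvalue
        hits += int(p_value < alpha)
    return hits / n_sims


print(f"Estimated power: {interaction_power():.2f}")
```

Power is simply the share of simulated datasets in which the interaction reaches p < .05; increasing `n_sims` tightens that estimate, which is why many iterations are run.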

For hypothesis 3, an a priori power analysis was conducted in G*Power version 3.1.9.7 (Faul et al., 2007) to determine the minimum sample size required for Pearson correlations. The analysis indicated that n = 67 was required to achieve 80% power for detecting a medium effect size (r = .30) at a significance criterion of α = .05. As we expected a final longitudinal sample of about n = 20 complete cases per Time-point and Language Group, plus around 10% additional participants with missing values at one time-point, our expected final sample was sufficiently powered to test this hypothesis.
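As a rough cross-check of the G*Power figure, the sample size needed to detect a correlation can be approximated with the Fisher z transformation, n ≈ ((z₁₋α + z₁₋β) / atanh(r))² + 3. The sketch below is that textbook approximation, not G*Power's exact routine, and the number of tails is our assumption because it is not reported above.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80, tails=2):
    """Approximate sample size to detect a correlation r (Fisher z method)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / tails)  # critical value for the chosen number of tails
    z_beta = z(power)               # standard-normal quantile for the desired power
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)


print(n_for_correlation(0.30, tails=1))  # 68
print(n_for_correlation(0.30, tails=2))  # 85
```

Under a one-tailed test, the approximation gives n = 68, close to the reported n = 67; a two-tailed test would require about 85 participants.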

3.3. Timeline

Our study started at the end of December 2022. Both cross-sectional and longitudinal data of the two time-points (18 and 24 months) were continuously collected until December 2024. EEG data was pre-processed and analyzed from June 2024 to December 2024.

3.4. Stimuli

The stimuli consisted of 45 words and 45 pictures. The words were nouns, uttered in an infant-directed manner, with a normative comprehension rate among 18-month-olds higher than 55% in both the Polish and the Norwegian MacArthur-Bates Communicative Development Inventory (CDI), Words & Gestures form (Smoczyńska et al., 2015, and Simonsen et al., 2014, respectively; see Supplementary Table S1). Trisyllabic words, words that were difficult to depict, and cognates were excluded. The pictures were identifiable and attractive coloured cartoons representing the selected words. All stimuli are available at https://osf.io/utpgb/?view_only=6bed1518d6e24719873483b3e1dc469b. To check whether participants comprehended the words used in the procedure, we initially planned to give parents a list containing all stimulus words and ask them to indicate the words their child did not comprehend. This would have allowed us to reject offline, for each participant, the trials containing non-comprehended words. However, during the study, caregivers reported different numbers of words, and different items, on this list than on the CDI. Moreover, caregivers of bilingual toddlers found it difficult to assess words in their child's L2 (for similar evidence in children, see Łuniewska et al., 2024). There were also logistical constraints on filling out the list at the end of the procedure or obtaining lists from the ECECs. As these inconsistencies could have affected the comparability of rejected words between language groups, we decided not to use the list to reject trials.

3.5. Procedure

A picture-word priming paradigm adapted from Friedrich and Friederici (2004) was used. At the onset of each trial, a picture was displayed on the screen. After a randomized interval of 1.4–1.6 s from picture onset, a word was presented that was either semantically related (congruent condition) or unrelated (incongruent condition) to the picture. The picture remained on the screen for 2.1 s after word onset, so each trial lasted 3.6 s on average (range 3.5–3.7 s; see Figure S1 in Supplementary Materials for an example of an incongruent trial). The task consisted of four blocks of 45 trials separated by 20–60 s video breaks (total number of trials: 180; approx. length: 15 min). Congruent and incongruent trials were equal in number in each experiment. The order of presentation was randomized under four constraints: 1) no more than three congruent or three incongruent trials were presented in a row; 2) the same picture was not presented twice consecutively; 3) words starting with the same letter were not paired in the same incongruent trial; 4) words with close semantic relatedness (e.g., head and cap) were not paired in incongruent trials (operationalized as a cosine value between words equal to zero, following the index by Buchanan et al., 2013, and as in De Anda and Friend, 2020). Attention getters (2-s animated clips) were presented after every 3–5 trials. The intertrial interval (ITI) was zero. Monolinguals were tested with either the Polish or the Norwegian version of the procedure, whereas bilinguals were assessed in both languages within the same session (with a break in between). Bilinguals who were unable to complete both language procedures in one session were invited back for a second session. In the bilingual group, the order of the two language procedures was counterbalanced across participants. The stimuli were presented with PsychoPy (Peirce, 2007).
Throughout the experiment, EEG and video (from a webcam placed on top of the monitor) were recorded simultaneously.
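Order constraints such as 1) and 2) can be met by drawing trials one at a time from those that keep the sequence valid and restarting whenever a dead end is reached. The Python sketch below shows one such approach; it is not the authors' PsychoPy implementation. The (picture, word, congruent) trial tuples are a hypothetical representation, and constraints 3) and 4), which govern how incongruent pairs are built rather than their order, are left out.

```python
import random


def _allowed(sequence, trial, max_run):
    """Check order constraints 1) and 2) for appending `trial` to `sequence`."""
    picture, _word, congruent = trial
    if sequence and sequence[-1][0] == picture:
        return False  # 2) same picture twice in a row
    tail = sequence[-max_run:]
    if len(tail) == max_run and all(t[2] == congruent for t in tail):
        return False  # 1) would exceed max_run trials of one type in a row
    return True


def order_trials(trials, max_run=3, max_restarts=1000, seed=0):
    """Return a pseudo-random order of `trials` satisfying constraints 1) and 2).

    trials: list of (picture, word, congruent) tuples (hypothetical format).
    Draws uniformly among currently valid trials and restarts on a dead end.
    """
    rng = random.Random(seed)
    for _ in range(max_restarts):
        remaining = list(trials)
        sequence = []
        while remaining:
            valid = [i for i, t in enumerate(remaining) if _allowed(sequence, t, max_run)]
            if not valid:
                break  # dead end: start again from scratch
            sequence.append(remaining.pop(rng.choice(valid)))
        if not remaining:
            return sequence
    raise RuntimeError("no valid order found; relax constraints or raise max_restarts")
```

Drawing only among currently valid trials keeps restarts rare, whereas shuffling the whole list and checking afterwards would almost never produce 180 trials without a run of four.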

3.6. EEG data acquisition

EEG data were acquired using a TMSi Porti amplifier at a sampling frequency of 512 Hz. Nineteen channels of the 10–20 system were used (M1 and M2 as reference, ground at AFz). Auditory stimuli were synchronized with the EEG by feeding the audio controller signal directly into an auxiliary channel of the EEG amplifier. Dedicated software, SVAROG, was used for data acquisition.³

3.7. Language outcomes and exposure measures

At both time-points, expressive and receptive language outcomes were assessed with the Polish and Norwegian adaptations of the MacArthur-Bates Communicative Development Inventory, CDI (Fenson et al., 2007). At 18 months, we used the Words & Gestures forms (Smoczyńska et al., 2015, in Polish; Simonsen et al., 2014, in Norwegian) to measure vocabulary comprehension and production. At 24 months, the Words & Sentences form was used to assess productive language skills (Smoczyńska et al., 2015, in Polish; Simonsen et al., 2014, in Norwegian). For bilingual children, a representative from the child's ECEC was asked to fill in the CDI in the non-dominant language. To measure vocabulary size, we used CDI z-scores calculated separately for each language group and time-point, as they facilitate comparisons by taking score variability (mean and standard deviation) into account. This transformation was needed because the Polish CDI W&G (Smoczyńska et al., 2015) has norms only up to 18 months, and we tested toddlers beyond this age, to whom norm scores could not be assigned. Furthermore, no bilingual norms are available for this specific population.
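The within-cell standardization described above can be sketched in a few lines of Python. Each raw CDI score is centred on its (language group, time-point) mean and divided by that cell's standard deviation; the record keys are hypothetical, and we assume the sample standard deviation is used.

```python
from collections import defaultdict
from statistics import mean, stdev


def cdi_z_scores(records):
    """Standardize raw CDI scores within each (language group, time-point) cell.

    records: list of dicts with hypothetical keys "group", "time_point", "score".
    Uses the sample standard deviation, so each cell needs at least two children.
    """
    cells = defaultdict(list)
    for rec in records:
        cells[(rec["group"], rec["time_point"])].append(rec["score"])
    moments = {cell: (mean(scores), stdev(scores)) for cell, scores in cells.items()}
    z_scores = []
    for rec in records:
        cell_mean, cell_sd = moments[(rec["group"], rec["time_point"])]
        z_scores.append((rec["score"] - cell_mean) / cell_sd)
    return z_scores
```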

Language exposure was assessed in all language groups at both time-points with two instruments: PABIQ-IT (Gatt et al., 2015) and a partial translation of LEAT (De Anda et al., 2016a) to Polish and Norwegian (i.e., “Primary Input by Language and Person” and “Parent Estimate” sections). Bilinguals’ caregivers filled out PABIQ-IT and the translated version of LEAT in their preferred language. CDI and PABIQ-IT were administered online before the visit to the Babylab, while LEAT was administered onsite. Filling them out took approximately 35 min for the caregivers of monolinguals and 60 min for those of bilinguals. The full protocol of testing by time-point and language group (including all measures, instruments and expected time of application) is available in Supplementary materials.

3.8. Pilot data

Before starting data collection, we completed a pilot study (n = 10) to assess the feasibility of our procedure in both Baby Labs (Warsaw and Oslo). Pilot data (EEG raw data and metadata on the list of tags per participant) is available at https://osf.io/utpgb/

Videos are not included due to the data protection policy agreed in our ethical approvals.

4. Plan of analyses

Exclusion/inclusion criteria: participants were excluded in case of preterm birth (gestational age < 36 weeks), birth weight < 2300 g, or perinatal complications. Participants were included if they met the monolingual criterion (i.e., on average, exposure to a primary language for at least 80% of their lifetime) or the bilingual criterion (i.e., on average, more than 20% exposure to a second language in their current daily life and over their lifetime, as measured by LEAT; De Anda et al., 2016a). In rare cases, toddlers exposed to a third or fourth language were included as long as exposure to L3 or L4 did not exceed 20% and exposure to the common L2 (i.e., Norwegian) was at least 20%.
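The inclusion rules can be written as two small predicates. This Python sketch reflects our own reading of the criteria: we treat the bilingual criterion as requiring more than 20% L2 exposure both currently and over the lifetime, and we cap each additional language at 20%. The function names and the LEAT-percentage inputs are hypothetical.

```python
def meets_health_criteria(gestational_weeks, birth_weight_g, perinatal_complications):
    """Exclusion rules: preterm (< 36 weeks), birth weight < 2300 g, perinatal complications."""
    return gestational_weeks >= 36 and birth_weight_g >= 2300 and not perinatal_complications


def language_group(l1_lifetime_pct, l2_lifetime_pct, l2_current_pct, other_language_pcts=()):
    """Return 'monolingual', 'bilingual', or None (not eligible) from LEAT percentages.

    Assumption: the bilingual criterion requires > 20% L2 exposure both currently
    and over the lifetime, and each additional language (L3, L4) stays at or below 20%.
    """
    if l1_lifetime_pct >= 80:
        return "monolingual"
    if (l2_lifetime_pct > 20 and l2_current_pct > 20
            and all(p <= 20 for p in other_language_pcts)):
        return "bilingual"
    return None
```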

4.1. Descriptive statistics (Means, SDs and ranges)

  • Demographics (sex, birth weight, gestational age in weeks or days, caregivers’ Socioeconomic Status).

  • Descriptive statistics of all dependent variables (N400 magnitude and latency of N400 effects) by time-point and language group and, within bilinguals, also by language.

  • Descriptive statistics of productive and receptive (comprehension) vocabulary size by time-point and language group and, within bilinguals, by language.

  • Descriptive statistics of language exposure (age of onset of L2 exposure; percentage and number of hours of exposure to L1 and L2 in the last week, i.e., current, and from birth, i.e., cumulative).

4.2. Quality control of EEG data

Overall, we report the number of valid participants by time-point and language group (and, in the bilingual group, by language: L1, L2), as well as the means, SDs, and ranges of valid trials by condition, time-point, and language group.

Participant exclusion criteria:

  • 1.

    Number of valid trials by condition: our pre-registered criterion was to exclude participants with fewer than 25 congruent or 25 incongruent trials. Deviating from this plan, we included participants with at least 19 congruent and 19 incongruent trials in order to maximize the amount of analyzable data (number of toddlers) while keeping only good-quality trials. This threshold is nonetheless in line with common practice in prior EEG studies on semantic integration with toddlers (e.g., Borgström, 2015; Cantiani et al., 2017; Rämä et al., 2013; Sirri and Rämä, 2015), and it is considerably higher than the minimum generally used to obtain a reliable ERP estimate for a child (usually n = 10; e.g., Rämä et al., 2013; Junge et al., 2012; Mani et al., 2012; Helo et al., 2017; Hendrickson et al., 2019). Importantly, there is no clear standard in the literature on the exact number of trials to include in a paradigm like ours with this population, as shown by the variability across the studies cited above (see also Van der Velde and Junge, 2020). We therefore chose 19 trials as a compromise between retaining enough datasets for our analyses and staying as close as possible to the pre-registered number of trials. The resulting numbers of valid trials were as follows. Monolinguals: congruent trials, M = 32.51, SD = 9.2, 19–56 at 18 months and M = 34.90, SD = 10.04, 20–59 at 24 months; incongruent trials, M = 32.40, SD = 8.9, 19–52 at 18 months and M = 33.87, SD = 10.7, 20–63 at 24 months. Bilinguals at 18 months: dominant language, M = 40.44, SD = 16.93, 19–69 (congruent) and M = 39.11, SD = 15.72, 24–62 (incongruent); non-dominant language, M = 35.42, SD = 18.38, 20–69 (congruent) and M = 39.86, SD = 14.95, 25–68 (incongruent).
    Bilinguals at 24 months: dominant language, M = 41.53, SD = 14.19, 22–64 (congruent) and M = 41.27, SD = 13.44, 23–59 (incongruent); non-dominant language, M = 32.92, SD = 15.57, 20–77 (congruent) and M = 31.69, SD = 14.53, 21–78 (incongruent).

  • 2.

    Participants were excluded from the analyses in case of child or parent refusal of the cap or gel, or of extremely poor data quality (due to fussiness, fatigue, movement, or crying during the task).
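The participant-level trial criterion and the trial-count summaries reported above can be sketched in Python as follows. The input structure and function name are hypothetical; the threshold of 19 is the one applied in the analyses.

```python
from statistics import mean, stdev

MIN_TRIALS = 19  # threshold applied in the analyses (25 was pre-registered)


def apply_trial_threshold(valid_counts, min_trials=MIN_TRIALS):
    """Keep children with at least `min_trials` valid trials in both conditions and
    summarize the retained counts as (mean, SD, min, max) per condition.

    valid_counts: dict child_id -> (n_congruent, n_incongruent); hypothetical structure.
    """
    kept = {cid: c for cid, c in valid_counts.items() if min(c) >= min_trials}
    summary = {}
    for idx, condition in enumerate(("congruent", "incongruent")):
        values = [counts[idx] for counts in kept.values()]
        summary[condition] = (mean(values), stdev(values), min(values), max(values))
    return kept, summary
```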

Trial exclusion criteria: see subsection 4.3.

4.3. EEG data analysis

The EEG analysis was conducted in Python 3 with in-house scripts and the MNE toolbox.⁴ Because EEG recorded from children is, in most cases, heavily contaminated by artifacts, the EEG preprocessing was mostly based on the approach of Dovgialo et al. (2019), Duszyk et al. (2019), and Zieleniewska et al. (2019), who proposed the automatic removal of artifacts from low-quality data.

As shown by Li et al. (2019), Jörn et al. (2011), and Duda-Goławska et al. (2022), time-frequency pattern analysis makes it possible to extract ERP components (including the N400) from a few trials of EEG data while removing spontaneous brain activity and residual artifacts. To obtain more reliable parameters of the N400 response (magnitude and latency), we used Matching Pursuit decomposition (specifically, we adapted the approach presented by Duda-Goławska et al., 2022; for more details, see Supplementary Materials C).
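To illustrate what Matching Pursuit does, here is a minimal greedy MP over a small dictionary of real Gabor atoms. It is not the adapted Duda-Goławska et al. (2022) procedure, whose details are in Supplementary Materials C; the time grid, widths, frequencies, and phases are hypothetical, and practical MP implementations use far larger dictionaries.

```python
import numpy as np


def gabor_dictionary(n_samples, fs, centers, widths, freqs, phases=(0.0, np.pi / 2)):
    """Unit-norm real Gabor atoms g(t) = exp(-pi((t-u)/s)^2) cos(2 pi f (t-u) + phi)."""
    t = np.arange(n_samples) / fs
    atoms, params = [], []
    for u in centers:
        for s in widths:
            envelope = np.exp(-np.pi * ((t - u) / s) ** 2)
            for f in freqs:
                for phi in phases:
                    g = envelope * np.cos(2 * np.pi * f * (t - u) + phi)
                    norm = np.linalg.norm(g)
                    if norm > 1e-12:  # skip atoms that vanish on this time grid
                        atoms.append(g / norm)
                        params.append({"center_s": u, "width_s": s, "freq_hz": f, "phase": phi})
    return np.array(atoms), params


def matching_pursuit(signal, atoms, n_iter=5):
    """Greedy MP: repeatedly pick the atom most correlated with the residual."""
    residual = np.asarray(signal, dtype=float).copy()
    decomposition = []
    for _ in range(n_iter):
        products = atoms @ residual           # inner product with every atom
        k = int(np.argmax(np.abs(products)))  # best-matching atom
        decomposition.append((k, float(products[k])))
        residual = residual - products[k] * atoms[k]  # remove its contribution
    return decomposition, residual
```

Each selected atom carries an occurrence time (`center_s`), width, frequency, and phase, which is what makes MP attractive for parametrising a slow, late component such as the N400.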

The analysis included several main steps:

  • 1.

    EEG dataset quality check: we estimated the energy of each EEG channel and the correlations between channels. Based on these parameters, we identified broken or very noisy channels and either excluded them from further analysis or rejected the whole dataset. The same procedure was applied to find EEG segments in which the signal quality was too low for further analysis.

  • 2.

    EEG data preprocessing: first, three types of artifacts were considered: outliers, muscle activity, and abrupt slopes. Their occurrence was detected based on statistical properties of the EEG signal (standard deviation and median) and on fixed thresholds; for each artifact type, the final threshold was the lowest of the three candidate values. Next, the EEG signal was filtered with a notch filter (Chebyshev type II band-stop; passband edges: 47.5 Hz and 52.5 Hz; stopband edges: 49.9 Hz and 50.1 Hz), a high-pass filter (Butterworth; passband edge 0.5 Hz, stopband edge 0.25 Hz), and a low-pass filter (Butterworth; passband edge 30 Hz, stopband edge 60 Hz). Then, to remove eye-movement and eye-blink artifacts, we ran an independent component analysis (Extended-Infomax algorithm, as implemented in the MNE toolbox). The ICA component with the highest correlation with the averaged signal from the frontal electrodes (Fp1 and Fp2) was removed, and the signal was projected back onto the electrode space. Due to the low quality of the raw EEG data (resulting from missing channels, muscle artifacts, and movement artifacts), we focused our main analysis on the six centro-parietal channels (C3, Cz, C4, P3, Pz, and P4), where the signal quality was highest. This decision was both theoretically and data driven. We chose this subset of channels based on previous reports in the developmental N400 literature (Junge et al., 2021); these channels are also usually more artifact-free (e.g., Gabard-Durnam et al., 2018). In parallel, visual inspection of the datasets showed that channels outside the centro-parietal region were often contaminated by artifacts, so they were not included in the analyses.
    In the last step, the EEG signal was segmented into epochs from 0.2 s before to 1.4 s after the onset of each stimulus (the spoken word), as marked by the technical channel. From the final analysis, we discarded epochs 1) contaminated by any artifact detected in the first step in any channel (except the removed channels), or 2) during which the child did not attend to the screen for at least 500 ms of the picture presentation preceding word onset.

  • 3.

    Matching Pursuit decomposition and N400 parametrisation: the EEG signal was decomposed with the MP algorithm into Gabor functions (atoms), which can be precisely described in terms of occurrence time, frequency, phase, and duration (for a detailed description, see Section C in Supplementary Materials). The N400 component was to be parametrised using atoms matching its occurrence time and low frequency, with the averaged amplitude and latency of the selected atoms entering the statistical analysis. Based on the ERP curve obtained for each child, we had planned to select the atom related to the N400 effect. However, because of strong non-phase-locked brain activity and the lack of a visible N400 deflection at the individual level, it was not possible to identify the relevant atoms. Therefore, to assess the N400, we calculated the mean N400 magnitude in the 0.6–1.2 s time window at the Cz electrode. This decision was both theoretically and data driven, as there is no clear consensus on how to select the time window and scalp distribution in N400 studies with infants and toddlers (see Junge et al., 2021, for further discussion). Theoretically, we followed Junge et al. (2021) in covering the most relevant time window for both 18- and 24-month-olds (0.4–1.2 s). However, we adjusted its onset to the characteristics of our data: on visual inspection, an onset at 0.6 s seemed more appropriate than one at 0.4 s.
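The filtering, epoching, and N400-window steps can be sketched with SciPy for a single channel (e.g., Cz). The band edges, epoch limits, and 0.6–1.2 s window come from the text; the passband ripple and stopband attenuation (1 dB and 20 dB), zero-phase filtering with `sosfiltfilt`, and the helper names are assumptions. Artifact detection, ICA, and attention-based epoch rejection are omitted.

```python
import numpy as np
from scipy import signal

FS = 512  # Hz, sampling frequency of the amplifier


def design_filters(fs=FS, gpass=1.0, gstop=20.0):
    """Notch, high-pass, and low-pass filters with the band edges given in the text.

    gpass/gstop (dB) are not reported and are assumptions.
    """
    notch = signal.iirdesign([47.5, 52.5], [49.9, 50.1], gpass, gstop,
                             ftype="cheby2", output="sos", fs=fs)
    highpass = signal.iirdesign(0.5, 0.25, gpass, gstop, ftype="butter", output="sos", fs=fs)
    lowpass = signal.iirdesign(30.0, 60.0, gpass, gstop, ftype="butter", output="sos", fs=fs)
    return [notch, highpass, lowpass]


def filter_channel(x, fs=FS):
    """Apply each filter forward and backward (zero-phase; an assumption)."""
    y = np.asarray(x, dtype=float)
    for sos in design_filters(fs):
        y = signal.sosfiltfilt(sos, y)
    return y


def epoch(x, onsets, fs=FS, tmin=-0.2, tmax=1.4):
    """Cut epochs from tmin to tmax seconds around each word-onset sample."""
    start, stop = int(round(tmin * fs)), int(round(tmax * fs))
    times = np.arange(start, stop) / fs
    return np.stack([x[o + start:o + stop] for o in onsets]), times


def window_mean(epochs, times, t0=0.6, t1=1.2):
    """Mean amplitude of each epoch in the N400 window (0.6-1.2 s)."""
    mask = (times >= t0) & (times <= t1)
    return epochs[:, mask].mean(axis=1)
```

The per-epoch window means can then be averaged per child and condition to obtain the mean N400 magnitude that enters the LMMs.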

4.4. Main analyses

To test hypotheses 1 and 2 in both the cross-sectional and the longitudinal datasets, we used linear mixed-effect models (LMMs). This choice allowed us (1) to analyse our full sample even with multiple missing values in the longitudinal dataset and (2) to maximize the comparability of the results from the cross-sectional and longitudinal datasets. Crucially, LMMs are particularly useful in developmental ERP research because they yield accurate and unbiased results even when participants contribute few valid trials (Heise et al., 2022).

Model fitting was conducted stepwise. First, we compared several model structures until finding the one with the best fit. To make them comparable, we kept all predictors fixed and used restricted maximum likelihood (REML) estimation and an Unstructured variance-covariance matrix (Field, 2018). Using forward variable selection, we estimated eight models by adding Language Group (Monolingual vs. Bilingual), Time-point (18 months vs. 24 months), Type of Trial (Congruent vs. Incongruent), and all their possible interactions as predictors one at a time, and evaluated the change in the likelihood-ratio test (Δχ²) and the Akaike Information Criterion (AIC). The outcome variable was amplitude (H1, H2). We assessed whether including each predictor significantly improved model fit (i.e., whether Δχ² was significant) and selected the model with the best fit (lowest AIC).

Second, after selecting the model structure, we estimated three models to identify the type of effects with the best fit. Ranging from an all-fixed-effects structure to a maximal random-effects structure (i.e., random intercepts and slopes for subjects), we selected the model that (1) had the lowest AIC and (2) converged. We considered this step necessary because, although maximizing random effects has been commonly advised (Barr et al., 2013), it is only recommended when the model converges (Bates et al., 2015a; Matuschek et al., 2017). Thus, intercepts and slopes for Participants and Trial Type were defined as random effects only if this choice led to the best model fit (and only for the longitudinal dataset, where correlated within-subject variance needs to be accounted for). Assumptions were checked by visual inspection of the residuals.
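The model-comparison step can be sketched in Python: the likelihood-ratio statistic is Δχ² = 2(log L_complex − log L_simple), with degrees of freedom equal to the difference in parameter counts, and AIC = 2k − 2 log L. The log-likelihoods in the usage example are hypothetical, and the helper names are ours. When models differ in their fixed effects, these log-likelihoods are conventionally taken from maximum likelihood (ML) rather than REML fits.

```python
from scipy.stats import chi2


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_likelihood


def forward_selection(models, alpha=0.05):
    """Compare nested models in the order given (simplest first).

    models: list of (name, log_likelihood, n_params) tuples (hypothetical values).
    Returns one row per step (name, delta chi-square, df, p, significant?)
    and the name of the lowest-AIC model.
    """
    steps = []
    for (_, ll_a, k_a), (name_b, ll_b, k_b) in zip(models, models[1:]):
        stat = 2 * (ll_b - ll_a)  # likelihood-ratio statistic
        df = k_b - k_a
        p = chi2.sf(stat, df)
        steps.append((name_b, stat, df, p, p < alpha))
    best = min(models, key=lambda m: aic(m[1], m[2]))[0]
    return steps, best


models = [("intercept", -500.0, 3), ("+trial_type", -490.0, 4), ("+group", -489.5, 5)]
print(forward_selection(models))
```

Here adding `trial_type` improves fit significantly, adding `group` does not (Δχ² = 1, p ≈ .32), and the lowest-AIC model is `+trial_type`.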

Analyses of the cross-sectional and longitudinal datasets were conducted separately. All analyses were conducted blind: we partially deleted participants' identification codes (i.e., the fragment indicating age and language group) before and during data pre-processing and restored them before fitting the linear mixed-effect models.

5. Results

5.1. N400 effects in monolinguals and bilinguals by time-point

We ran a 2 × 2 × 2 linear mixed-effect model on N400 magnitude, with Language Group (Monolingual vs. Bilingual) as a between-subjects factor and Time-point (18 months vs. 24 months) and Type of Trial (Congruent vs. Incongruent) as within-subject factors. We expected a significant Language Group × Time-point × Type of Trial interaction. In post-hoc comparisons (with Bonferroni corrections), we expected a larger (more negative) amplitude for incongruent than for congruent trials only in the monolingual group (not in the bilingual group) and only at 18 months of age. In the cross-sectional dataset, Time-point was entered as a between-subjects factor (this applies to all subsequent analyses of H1). A separate analysis was run for each monolingual group (vs. bilinguals), and we expected similar results in both.

As our sample was smaller than planned when the cross-sectional and longitudinal data were considered separately (especially in the bilingual group), we deviated from the registered plan and first analyzed the two datasets collapsed (the Large Sample) and then the longitudinal subsample alone. The rationale was to have enough statistical power to test H1, as the longitudinal subsample alone was not sufficiently powered.

In the first analysis, the sample size was N = 157, as not all bilinguals had valid, analyzable EEG data in their dominant language. As planned, we estimated eight models with REML as the estimator and an Unstructured covariance matrix. We then selected the model structure that had the lowest AIC and −2LL and that converged (Model 7; see Table S2). As the planned Unstructured covariance led to convergence issues, we used Compound Symmetry instead, which is suitable for longitudinal designs (Field, 2018) and with which the model converged. Next, we tested which type of effects gave the best fit for the selected model (see Table S3). Ranging from all fixed to all random effects, we selected the model with the lowest AIC and −2LL that converged (Model 7.1), which had all effects fixed (i.e., the intercepts and slopes for subjects and type of trial were also fixed).

The final selected LMM had mean amplitude as the outcome variable, and Language Group (Monolinguals vs. Bilinguals), Time-point (18 months vs. 24 months), Type of Trial (Congruent vs. Incongruent), all possible two-way interactions, and the three-way Language Group × Time-point × Type of Trial interaction as fixed-effect factors (see full descriptives in Table 2). Type of Trial and Time-point were entered as within-subject factors and Language Group as a between-subjects factor. Bonferroni corrections were used for post hoc comparisons instead of False Discovery Rate corrections, as SPSS does not provide the latter as an option in LMMs.
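For readers working in Python rather than SPSS, a model with the same fixed-effects structure can be sketched with statsmodels' `mixedlm`. This is not the authors' SPSS model: the data are simulated, every value is hypothetical, and a random intercept per child is used. A random intercept implies a compound-symmetry covariance among a child's repeated measures, the structure the authors settled on.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def simulate_amplitudes(n_per_group=20, seed=0):
    """Hypothetical long-format data: one mean amplitude per child x time-point x trial type."""
    rng = np.random.default_rng(seed)
    rows = []
    for group in ("monolingual", "bilingual"):
        for child in range(n_per_group):
            child_offset = rng.normal(0.0, 2.0)  # child-specific baseline (random intercept)
            for time_point in ("18m", "24m"):
                for trial_type in ("congruent", "incongruent"):
                    # Hypothetical N400 effect of -2.5 microvolts in monolinguals only.
                    effect = -2.5 if group == "monolingual" and trial_type == "incongruent" else 0.0
                    rows.append({
                        "subject": f"{group}_{child}",
                        "group": group,
                        "time_point": time_point,
                        "trial_type": trial_type,
                        "amplitude": child_offset + effect + rng.normal(0.0, 1.5),
                    })
    return pd.DataFrame(rows)


data = simulate_amplitudes()
model = smf.mixedlm("amplitude ~ group * time_point * trial_type", data, groups=data["subject"])
result = model.fit(reml=True)
print(result.summary())
```

The formula expands to the three main effects, all two-way interactions, and the three-way interaction (eight fixed-effect coefficients including the intercept).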

There was a significant main effect of Trial Type, F(1, 150) = 14.61, p < .001, r = .01, indicating that the mean amplitude significantly differed between the congruent and incongruent trials. Furthermore, there was a significant Language Group × Type of Trial interaction, F(1, 150) = 5.29, p = .02, r = .21, suggesting that monolinguals and bilinguals differed in their mean amplitude responses to the congruent and incongruent trials. There were no other significant main effects or interactions (all ps > .09).

Post-hoc comparisons showed that the interaction (depicted in Fig. 1) was driven by monolinguals showing a significantly more negative mean amplitude for incongruent than for congruent trials (congruent M = −.63, 95% CI: −1.36, 0.105; incongruent M = −3.31, 95% CI: −4.04, −2.6), F(1, 150) = 52.85, p < .001, whereas bilinguals showed equivalent mean amplitudes for both types of trials (congruent M = −0.32, 95% CI: −1.90, 1.25; incongruent M = −0.99, 95% CI: −2.57, 0.59), F(1, 150) = .704, p = .40. Moreover, the interaction was also reflected in monolinguals and bilinguals showing equivalent mean amplitudes in congruent trials (monolinguals M = −.63, 95% CI: −1.36, .10; bilinguals M = −.32, 95% CI: −1.9, 1.25), F(1, 238.33) = .118, p = .73, whereas monolinguals showed a more negative amplitude than bilinguals in incongruent trials (monolinguals M = −3.31, 95% CI: −4.04, −2.5; bilinguals M = −.99, 95% CI: −2.57, 0.59), F(1, 238.33) = 6.88, p = .009. Overall, this suggests that monolinguals showed N400 effects and, thus, semantic integration. In contrast, the results indicate a lack of evidence for semantic integration indexed by N400 in bilinguals. The grand average ERPs for bilinguals and monolinguals are shown in Fig. 2.
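Bonferroni and False Discovery Rate adjustments differ in how conservative they are. The stdlib Python sketch below implements Bonferroni and the Benjamini-Hochberg FDR procedure side by side; the p-values in the usage lines are hypothetical, not those reported above.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests (capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (control the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.005]  # hypothetical post-hoc p-values
print(bonferroni(p))
print(benjamini_hochberg(p))
```

Because each BH-adjusted value is at most p·m/rank, it never exceeds the corresponding Bonferroni value, which is why BH is the less conservative option.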

Fig. 1.


Mean amplitude of the N400 component by type of trial (congruent and incongruent) and language group (all monolinguals collapsed vs bilinguals) across time-points from central brain areas (Cz electrode). Error bars represent standard error of the mean.

Fig. 2.


Grand average ERPs (Cz electrode) for bilinguals and monolinguals (large sample). The upper part of the graph displays the grand average ERP for bilinguals, with the data for 18-month-olds and 24-month-olds merged. The lower part of the graph displays the grand average ERP for monolinguals, with the data for 18-month-olds and 24-month-olds merged. Both graphs have 95% confidence intervals (shaded area).

We also re-ran the same model with the two monolingual groups (Monolingual Polish and Monolingual Norwegian) entered separately instead of collapsed, which replicated the former results (a significant main effect of Type of Trial, F(1, 144.65) = 30.56, p < .001, and a marginal Language Group × Type of Trial interaction, F(2, 139.48) = 3.13, p = .047).

Finally, for H1, we had also planned to test our main hypothesis on the latency of the N400 component with LMMs (H1b). This required first identifying an N400 component at the individual level, and a reliable way to do so was to test whether the differences between congruent and incongruent trials were significant within each child. Thus, during the pre-processing phase, we ran a cluster-based permutation test within individual children, although this step had not been planned. This approach was intended to ensure that our latency measures reflected the actual component rather than random fluctuations in brain activity. However, because no N400 effects were observed at the individual level in either the 18-month or the 24-month group, and because of strong non-phase-locked brain activity, we decided not to test H1b on latencies.
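The logic of a within-child cluster-based permutation test can be sketched for one child and one channel as follows. This is a minimal, illustrative sketch, not the pipeline used in this study: it assumes trial-level amplitudes (µV) at a single electrode, Welch's t at each sample, a cluster-forming threshold of |t| > 2.0, cluster mass defined as the sum of t values, and random reassignment of trials to conditions. All function names, parameter values and the synthetic data are hypothetical.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

Trial = Sequence[float]  # one trial: amplitudes (µV) at successive samples of one channel


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for mean(a) - mean(b)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se if se > 0 else 0.0


def pointwise_t(cond_a: Sequence[Trial], cond_b: Sequence[Trial]) -> List[float]:
    """t value at every sample, comparing the trials of condition A with those of B."""
    n_samples = len(cond_a[0])
    return [welch_t([tr[i] for tr in cond_a], [tr[i] for tr in cond_b])
            for i in range(n_samples)]


def find_clusters(t_vals: Sequence[float], threshold: float) -> List[Tuple[int, int, float]]:
    """Runs of adjacent samples with |t| > threshold and the same sign.

    Returns (start, stop, mass) tuples; stop is exclusive, mass is the sum of t in the run.
    """
    clusters: List[Tuple[int, int, float]] = []
    start, sign = 0, 0
    for i, t in enumerate(t_vals):
        s = 1 if t > threshold else -1 if t < -threshold else 0
        if s != 0 and s == sign:
            continue  # the current run continues
        if sign != 0:  # close the run that just ended
            clusters.append((start, i, sum(t_vals[start:i])))
        start, sign = i, s  # begin a new run (ignored while sign == 0)
    if sign != 0:
        clusters.append((start, len(t_vals), sum(t_vals[start:])))
    return clusters


def cluster_permutation_test(cond_a: Sequence[Trial], cond_b: Sequence[Trial],
                             threshold: float = 2.0, n_perm: int = 1000,
                             seed: int = 0) -> List[Tuple[int, int, float, float]]:
    """Permutation test on cluster mass with max-statistic correction.

    Trials are randomly reassigned to conditions; each permutation contributes its
    largest |cluster mass| to the null distribution. Returns (start, stop, mass, p).
    """
    rng = random.Random(seed)
    observed = find_clusters(pointwise_t(cond_a, cond_b), threshold)
    pooled, n_a = list(cond_a) + list(cond_b), len(cond_a)
    null_max: List[float] = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of trials
        perm = find_clusters(pointwise_t(pooled[:n_a], pooled[n_a:]), threshold)
        null_max.append(max((abs(m) for _, _, m in perm), default=0.0))
    return [(start, stop, mass,
             (1 + sum(m >= abs(mass) for m in null_max)) / (n_perm + 1))
            for start, stop, mass in observed]


if __name__ == "__main__":
    # Hypothetical single-child data: 20 trials per condition, 40 samples each;
    # incongruent trials carry a -3 µV deflection at samples 10-19.
    rng = random.Random(1)
    congruent = [[rng.gauss(0, 1) for _ in range(40)] for _ in range(20)]
    incongruent = [[rng.gauss(-3 if 10 <= i < 20 else 0, 1) for i in range(40)]
                   for _ in range(20)]
    for start, stop, mass, p in cluster_permutation_test(incongruent, congruent, n_perm=200):
        print(f"samples {start}-{stop - 1}: mass = {mass:.1f}, p = {p:.3f}")
```

In this sketch, an individual N400 effect (incongruent minus congruent) would show up as a negative cluster with a small p; taking the maximum cluster mass per permutation is what controls for testing many samples at once.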

5.2. N400 effects in bilinguals by language dominance and time-point

We conducted a 2 × 2 × 2 linear mixed-effects model on the amplitude of the N400, with Time-point (18 months vs. 24 months), Language dominance (Dominant vs. Non-dominant) and Type of Trial (Congruent vs. Incongruent) as within-subject factors. This analysis was conducted only in the bilingual group. We expected a significant three-way interaction. In post-hoc contrasts (with Bonferroni corrections), we expected a larger (more negative) amplitude for incongruent than for congruent trials only in the dominant language and exclusively at 24 months of age. In the cross-sectional data set, Time-point was entered as a between-subjects factor.
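The Bonferroni step for these post-hoc contrasts can be illustrated with a small sketch. It assumes one incongruent-vs-congruent contrast per Time-point × Language dominance cell (four contrasts in total); the actual family of contrasts and the raw p-values come from the fitted LMM, and the p-values in the usage note are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

TIME_POINTS = ("18 months", "24 months")
DOMINANCE = ("Dominant", "Non-dominant")
Cell = Tuple[str, str]


def planned_contrasts() -> List[Cell]:
    """One incongruent-vs-congruent contrast per Time-point x Language dominance cell."""
    return list(product(TIME_POINTS, DOMINANCE))


def bonferroni(p_values: Dict[Cell, float]) -> Dict[Cell, float]:
    """Multiply each raw p by the number of contrasts in the family, capped at 1."""
    m = len(p_values)
    return {cell: min(1.0, p * m) for cell, p in p_values.items()}


if __name__ == "__main__":
    raw = dict(zip(planned_contrasts(), [0.30, 0.45, 0.012, 0.60]))  # hypothetical p-values
    for cell, p_adj in bonferroni(raw).items():
        print(cell, round(p_adj, 3))
```

With these hypothetical values, only the 24-month dominant-language contrast (raw p = 0.012, adjusted to about 0.048) would survive correction, which is the pattern predicted for this analysis.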

We used the same model structure and type of effects as in our main LMM analysis (see 5.1.). In this analysis, the sample size was n = 31, collapsing across cross-sectional and longitudinal toddlers (Large Sample). Results showed no significant main effects or interactions (all ps > .18). This suggests no differences in N400 effects between the dominant and non-dominant languages across time-points in the bilinguals, which is consistent with the absence of N400 effects in this group in our first analysis (see 5.1.).

5.3. Longitudinal associations between vocabulary size and semantic priming

To test H3, as planned, we ran correlation analyses between receptive and expressive vocabulary size at 18 months and semantic priming at 24 months in the longitudinal subsample, collapsing all language groups (n = 33). Data were normally distributed for the CDI Z-scores for receptive vocabulary and for the N400 differential score, but not for the CDI Z-scores for expressive vocabulary. Thus, we used both Pearson and Spearman ρ correlations. We used one-tailed tests because our hypothesis was directional. The raw CDI scores are reported in Table 3. To measure the robustness of semantic priming, we used a differential mean amplitude score (calculated as the mean amplitude for the congruent trials minus the mean amplitude for the incongruent trials). There were no significant associations between receptive vocabulary at 18 months and the robustness of semantic priming at 24 months, r(31) = −.13, p = .24, nor between expressive vocabulary at 18 months and the robustness of semantic priming at 24 months, r(31) = −.00, p = .49.
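The differential score and the two correlation coefficients can be written out explicitly. This is a minimal sketch in plain Python under our own assumptions: one differential score per child computed from that child's condition means, Pearson's r on raw values, and Spearman's ρ computed as Pearson's r on average ranks (tied values share their mean rank). Function names are illustrative and are not taken from the study's analysis scripts.

```python
import math
import statistics
from typing import List, Sequence


def differential_score(congruent_uv: Sequence[float], incongruent_uv: Sequence[float]) -> float:
    """Mean congruent amplitude minus mean incongruent amplitude (µV) for one child.

    Positive values mean incongruent trials were more negative, i.e. a larger N400 effect.
    """
    return statistics.fmean(congruent_uv) - statistics.fmean(incongruent_uv)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's product-moment correlation."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the block of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as Pearson's r computed on average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

A child whose congruent and incongruent means were −0.6 µV and −3.3 µV would, under this definition, receive a differential score of 2.7; these per-child scores are then correlated with the CDI Z-scores.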

Table 3.

Descriptive statistics (mean, SD and range) of raw CDI scores for expressive (productive) and receptive (comprehension) vocabulary by time-point and language group (and, within bilinguals, by language dominance). Cells show M (SD), range.

| CDI measure | Monolingual Polish | Monolingual Norwegian | Bilingual Polish-Norwegian, Dominant | Bilingual Polish-Norwegian, Non-dominant |
|---|---|---|---|---|
| 18 months | | | | |
| CDI-WG expressive vocabulary | 46 (59.60), 1–229 | 56 (58.91), 5–283 | 30 (34.94), 3–107 | 23 (36.16), 0–95 |
| CDI-WG receptive vocabulary | 226 (104.63), 48–379 | 186 (68.33), 40–309 | 145 (108.66), 14–328 | 52 (68.277), 10–185 |
| 24 months | | | | |
| CDI-WS expressive vocabulary | 224.20 (160.22), 4–582 | 372 (188.62), 65–710 | 144 (169.31), 12–633 | 64 (84.97), 0–329 |

Note: CDI = Communicative Development Inventories. WG = Words and Gestures; WS = Words and Sentences. Sample size (N) varied between 6 and 81, depending on the CDI measure.

5.4. Exploratory analyses

In a deviation from our pre-registered analysis for H3, we ran one-tailed Pearson correlation analyses in the Large Sample to explore potential concurrent relations between toddlers’ receptive and expressive vocabulary and the robustness of semantic priming effects. We included all toddlers who had CDI data and valid EEG data, with all language groups collapsed. For bilinguals, we included only their dominant language in the analyses. We found that toddlers’ receptive vocabulary at 18 months (n = 61) was concurrently related to the robustness of semantic priming, r(59) = .39, p < .001 (r² = .15, a medium effect size; see Fig. 3). A concurrent association also held between toddlers’ expressive vocabulary at 18 months and the robustness of their semantic priming effects, r(59) = .26, p = .02, r² = .07, although with a small effect size. At 24 months (n = 81), no significant relations were found, r(79) = .04, p = .35, r² = .0002.
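The reported r, df and one-tailed p values can be related through the standard t-transformation of Pearson’s r, t = r·√(df / (1 − r²)) with df = n − 2, and the upper tail of Student’s t distribution. The sketch below evaluates that tail with the regularized incomplete beta function (a continued-fraction evaluation in the style of Numerical Recipes) so that it needs only the Python standard library; the effect-size labels assume Cohen’s conventional benchmarks for r (.1 small, .3 medium, .5 large). It is a checking aid under these assumptions, not the authors’ analysis code.

```python
import math


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step, then odd step, of the continued fraction
        for aa in (m * (b - m) * x / ((a + m2 - 1.0) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor close to 1: converged
            break
    return h


def reg_incomplete_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_upper_tail(t: float, df: float) -> float:
    """P(T > t) for Student's t with df degrees of freedom."""
    two_tailed = reg_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))
    return two_tailed / 2.0 if t > 0 else 1.0 - two_tailed / 2.0


def pearson_one_tailed_p(r: float, df: int) -> float:
    """One-tailed p for H1: rho > 0, via t = r * sqrt(df / (1 - r^2)), df = n - 2."""
    return t_upper_tail(r * math.sqrt(df / (1.0 - r * r)), df)


def cohen_label(r: float) -> str:
    """Cohen's conventional benchmarks for |r| (an assumption of this sketch)."""
    size = abs(r)
    if size < 0.1:
        return "negligible"
    if size < 0.3:
        return "small"
    return "medium" if size < 0.5 else "large"


if __name__ == "__main__":
    for r, df in [(0.39, 59), (0.26, 59)]:
        p = pearson_one_tailed_p(r, df)
        print(f"r({df}) = {r}: r^2 = {r * r:.2f}, one-tailed p = {p:.4f}, {cohen_label(r)}")
```

With r = .39 and df = 59 this gives t ≈ 3.25 and a one-tailed p just below .001, and with r = .26 it gives p ≈ .02, in line with the values reported above.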

Fig. 3.


Scatterplot showing the association between differential mean amplitude and words understood at 18 months. A greater difference between the mean amplitude of congruent relative to incongruent trials was associated with higher Z-scores on CDI for receptive vocabulary. mo = months.

To further explore the potential effects of time-window and topography on our main results for H1, we ran a new Linear Mixed Model (LMM) analysis, adding region of interest (ROI) and time-window (TW) as additional predictors (see Supplementary Materials for full details of ROIs and TWs). We entered three ROIs (Frontal, Central and Parietal) and five time-windows into the analysis. Each TW was 200 ms long, starting from 0.4 s after word onset (i.e., 0.4–0.6 s, 0.6–0.8 s, 0.8–1.0 s, 1.0–1.2 s, 1.2–1.4 s). This overall window was wider than the window in our main analysis. This decision was based on the literature in same-aged toddlers, as this is the period in which N400 effects are typically expected in toddlers between 18 and 24 months old, despite the considerable timing variability observed during the second year of life (see the systematic review by Junge et al., 2021). This analysis was unplanned, not pre-registered, and therefore exploratory. Full details on building the model, model selection and the results of the best-fitting model are in the Supplementary Materials. The results replicated the significant Type of Trial × Language Group interaction: monolingual infants exhibited N400 effects, indicating semantic integration, at both 18 and 24 months, while bilingual infants showed no evidence of semantic integration at either time-point. Additionally, the results provided no evidence that timing or topography influenced the main findings, as there were no significant five-way interactions between Trial Type, Language Group, Time-Point, Time Window, and Region of Interest.
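The five exploratory time-windows and the per-window mean amplitude can be sketched as follows. The sketch assumes an averaged ERP stored as one value per sample, a known sampling rate and a known epoch start (tmin); windows are treated as half-open [start, stop) intervals, and an ROI waveform is formed by averaging its channels sample by sample. The sampling rate, epoch start and channel grouping are hypothetical; the actual ROI definitions are given in the Supplementary Materials.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Window = Tuple[float, float]  # (start, stop) in seconds after word onset


def time_windows(first_onset_s: float = 0.4, width_s: float = 0.2,
                 n_windows: int = 5) -> List[Window]:
    """Consecutive windows of equal width; rounding removes float drift (0.6000000000000001)."""
    return [(round(first_onset_s + k * width_s, 3), round(first_onset_s + (k + 1) * width_s, 3))
            for k in range(n_windows)]


def window_mean(erp: Sequence[float], srate_hz: float, tmin_s: float, window: Window) -> float:
    """Mean amplitude over samples with start <= t < stop, where t = tmin_s + i / srate_hz.

    Raises statistics.StatisticsError if the window contains no samples.
    """
    start, stop = window
    tol = 1e-9  # guards against float error exactly at window edges
    i0 = max(0, math.ceil((start - tmin_s) * srate_hz - tol))
    i1 = min(len(erp), math.ceil((stop - tmin_s) * srate_hz - tol))
    return statistics.fmean(erp[i0:i1])


def roi_waveform(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average several channels sample by sample to obtain one ROI waveform."""
    return [statistics.fmean(samples) for samples in zip(*channels)]
```

For one ROI and one condition, `[window_mean(roi_erp, srate_hz, tmin_s, w) for w in time_windows()]` would yield the five window means that enter the model as separate observations; half-open windows keep a sample on a shared boundary from being counted twice.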

6. Discussion

6.1. Monolinguals show semantic integration earlier than bilinguals

In line with our first hypothesis, we found that bilinguals followed a different developmental trajectory of semantic integration compared to monolinguals. More specifically, we observed evidence of semantic integration earlier in monolinguals than in bilinguals. As we predicted, monolinguals showed N400 effects at 18 and 24 months, which indicates that they show semantic integration at both time-points (in line with, e.g., Friedrich and Friederici, 2004; Wojcik, 2018). In contrast, bilinguals showed no evidence for N400 effects at either time-point, so we have no evidence that they show semantic integration in this period of life. Importantly, we did not expect this latter result; instead, we predicted that at 24 months both monolinguals and bilinguals would show N400 effects (and, therefore, semantic integration). Overall, these findings suggest that, as predicted, the developmental trajectory of semantic integration might be driven more by language experience than by maturation, as both language groups were comparable in chronological age but differed in their language exposure during the first two years of life. They also suggest that reduced exposure to each of L1 and L2 may delay the onset of semantic integration in bilinguals until later in development. The absence of evidence for semantic integration in the bilingual toddlers at either time-point is compatible with it arising later, as suggested by Sirri and Rämä (2019), who found N400 effects in bilingual toddlers between 2 and 4 years old, only in their dominant language, and by Kuipers and Thierry (2015), who found an N400 effect at 29–30 months.
One possible interpretation, similar to the approach by Conboy and Mills (2006) and Jardak and Byers-Heinlein (2019), is that bilinguals, compared to their monolingual peers, are exposed to the same number of concepts but less frequently to the words in each language. Therefore, bilinguals might need more time and more exposure to word labels to establish robust word-concept mappings in both languages before these are reflected in an N400 effect. Alternatively, because bilinguals’ language experience is more diverse than that of their monolingual peers, bilinguals might adapt to this increased variability by becoming more tolerant of variation in word-referent mappings, which could then be reflected as a lack of N400 effect (e.g., see Kuipers and Thierry, 2013, for findings supporting this idea). Our findings are novel because they provide direct support for an idea that has so far generally been taken for granted (or explored only indirectly, as we argued in Section 1): that early language exposure (particularly, the robustness of word-concept mappings) might play a crucial role in the onset and developmental trajectory of the mechanism of SI.

6.2. No evidence for semantic integration in bilingual toddlers for either language or any time-points

As predicted in our second hypothesis, bilingual toddlers showed no evidence for semantic integration in either of their languages at 18 months: they showed no significant differences between the mean amplitudes for congruent and incongruent trials in either language at this age. However, contrary to our prediction, this absence of N400 effects extended to 24 months (as Jardak and Byers-Heinlein, 2019, found at a behavioral level), whereas we had hypothesized reduced semantic integration in the non-dominant relative to the dominant language at this age. Thus, rather than reduced semantic integration in the non-dominant vs. dominant language at 24 months, we found no evidence for N400 effects in either language. Overall, as discussed above, these results suggest that semantic integration in bilinguals may follow a more protracted developmental trajectory than we expected and may instead emerge later in both languages. One possibility is that it emerges once their expressive and, especially, receptive vocabularies grow. This interpretation seems plausible, as we found that at 18 months bilinguals understood, on average, 145 words in their dominant language (compared to 226 and 186 words in the Polish and Norwegian monolingual groups, respectively) and said, on average, 30 words (compared to 46 and 56 words in the Polish and Norwegian monolingual groups, respectively). It is likely that a minimum (as yet unknown) threshold of language exposure and vocabulary knowledge in both L1 and L2 is necessary to establish robust word-concept mappings and for semantic integration to emerge.
In line with this possibility, previous studies have found N400 effects later in development: only in the dominant language from 19 months to 4 years of age (Conboy and Mills, 2006, Sirri and Rämä, 2019), and in both the dominant and non-dominant languages at 30 months of age (Jardak and Byers-Heinlein, 2019).

Alternatively, the lack of evidence for N400 effects in bilinguals might reflect the only moderate statistical power of our sample in this group (n = 31, collapsing across time-points and designs) to test this hypothesis, which is in line with the small-to-medium effect size we found for the Language Group × Type of Trial interaction (r = .21). Notably, although the bilingual group did not reach our planned 30 participants with analyzable datasets per Language group and Time-point (see our power analyses for H1 and H2), it did reach this number when data were collapsed across time-points (n = 31). This is important because we found neither a significant main effect of Time-point nor any significant interaction involving Time-point. Nevertheless, our results on the first and second hypotheses should be interpreted cautiously until they are replicated with larger samples of bilingual toddlers. Moreover, larger bilingual samples in future studies would allow potential language dominance effects on semantic integration within this group to be tested; such effects may have gone undetected here because of the moderate size of our bilingual sample. Despite this limitation, we think that, overall, our results suggest an effect of bilinguals’ reduced exposure to each of L1 and L2 on the emergence of semantic integration between 18 and 24 months. Additionally, our design compares semantic integration between two monolingual groups and a bilingual group at relevant time-points for lexical-semantic development, while also comparing the emergence of semantic integration in both languages within bilingual toddlers. We included both a cross-sectional and a longitudinal sample, tested with the same paradigm.
The toddlers in our study spanned a narrower age range than is often the case in the literature (see e.g., Junge et al., 2021), which limited variability in maturational stage and allowed us to disentangle the effects of language experience more clearly. This is novel: our study addresses the scarcity of neural evidence on the relative effects of maturation (which we controlled for) and of single- and dual-language exposure on the emergence of semantic integration.

6.3. Language-specificity of longitudinal associations between vocabulary size and semantic priming

Regarding our third hypothesis, in line with the idea that the trajectory of semantic integration is intertwined with that of vocabulary acquisition, we predicted that toddlers’ vocabulary skills at 18 months (both receptive and expressive, but particularly receptive), with all groups collapsed, would be longitudinally related to the robustness of their semantic priming at 24 months. Contrary to our prediction, we did not find any prospective relations in the longitudinal subsample, likely due to its modest size. However, in the Large Sample, we found concurrent associations between receptive and expressive vocabulary skills and the robustness of semantic priming (indexed by the N400 differential score) at 18 months, but not at 24 months. This suggests that the presence and robustness of semantic integration are linked to toddlers’ ability to understand and say words at a specific time-point: the more words they understand and say at 18 months, the stronger their N400 effects (and, thus, the more advanced their semantic integration; as observed by Torkildsen et al., 2008, at 20 months). However, note that some studies have found this pattern earlier, at 12 months (e.g., Friedrich and Friederici, 2010). We believe this is a contribution of our study to the field since, as discussed by Junge et al. (2021), few studies have used correlations; most have instead compared N400 effects between subgroups of toddlers with high vs. low vocabulary size. This result is partly in line with prior studies showing that toddlers with higher receptive vocabulary scores showed N400 effects already at 12–14 months (Friedrich and Friederici, 2010, Friedrich and Friederici, 2006), so this relation might hold both prospectively (as in Friedrich and Friederici’s studies, 2006, 2010) and concurrently in toddlerhood (as in our study; see also Borgström et al., 2015). This result is also partly consistent with results by Rämä et al. (2013), who found the N400 effect only in children with higher word production ability at 18 months. We interpret the absence of an association at 24 months as possibly driven by the higher variability in vocabulary size at this age (indicated by higher standard deviations in CDI scores) compared to 18 months. At 24 months, vocabulary becomes much more diverse and there are greater individual differences in both vocabulary comprehension and production, in both the number of words and the categories known (Fenson et al., 1994; Rowe et al., 2012). This may mask robust relationships between semantic integration and vocabulary at this age. Moreover, our findings open the possibility that both expressive and receptive vocabulary knowledge might drive semantic integration, rather than vice versa, or that relations across development between the occurrence of semantic integration and receptive vocabulary are bidirectional (see Panda et al., 2021, and Schneider et al., 2023, for discussion of the complexity of these relations later in childhood).

However, our results do not allow us to draw conclusions about the directionality of these relations across development. Although we did not find support for vocabulary knowledge at 18 months predicting the N400 effect at 24 months in our longitudinal subsample, we cannot rule out such an association, again because of our modest sample size. Future studies with larger longitudinal samples should explore these potential bidirectional relations across the first two years of life. Our results also suggest that, as we predicted, the relations between the presence of semantic integration and language skills might be stronger for receptive than for productive vocabulary. This might be partly accounted for by the fact that, while our toddlers understood many words at 18 months (between 52 and 226, on average across language groups), they still said only a few words at this time-point (between 23 and 56, on average across language groups). Overall, the contribution of these results to the literature is evidence, across toddlers from two sites (Norway and Poland) and three language groups (monolinguals at each site and Polish-Norwegian bilinguals), that the trajectory of semantic processing early in the second half of the second year of life (18 months) may be tightly coupled with the trajectory of the child’s receptive and expressive vocabulary development (especially receptive).

7. Limitations and future directions

Our study is not without limitations. First, the bilingual sample was smaller than originally planned, as it was challenging to recruit a sample of this size from the Polish-Norwegian population within the timeframe of our project. As a result, the statistical power to detect effects was not equivalent across language groups.

Second, regarding recruitment, we planned to investigate a relatively homogeneous group of bilinguals (Polish-Norwegian) to overcome a limitation of prior literature, in which bilingual samples usually included many different L2s. However, it was more difficult than expected to recruit such a specific bilingual group. Even though our bilingual group was fairly homogeneous, the amount of exposure to L2 and, particularly, the age of onset of exposure to L2 varied among the bilingual toddlers. As a result, our sample included not only sequential bilinguals but also some simultaneous bilinguals. Despite this limitation, a strength of our study is the relatively high homogeneity of the bilingual sample compared with previous studies on semantic integration indexed by N400 effects in this population, where the diversity of L1 and L2 combinations was greater. This homogeneity has allowed us to interpret our findings specifically in the context of Polish-Norwegian bilingual toddlers (i.e., mainly raised in Polish-speaking families living in Norway), and to minimize other potential confounds in our results (e.g., the proximity between the toddlers’ L1 and L2).

Third, a study like ours, which examines the development of a lexical-semantic processing mechanism in bilingual toddlers, requires measuring skills twice (once in each language) and across time-points. This resulted in missing data for several of the assessment tools we used. Indeed, we had missing CDI data, particularly for the CDIs completed by ECEC staff in the bilingual group (i.e., for the non-dominant language). In addition, the CDI raters may have differed in their own knowledge of the language being assessed (e.g., caregivers compared to ECEC staff, especially in the non-dominant language). Moreover, the CDI form used at 24 months (Words and Sentences) does not measure receptive vocabulary, so we cannot fully assess how the occurrence of semantic integration relates to both receptive and expressive vocabulary across ages. For the same reason, our design does not allow claims about the associations between the N400 and receptive vocabulary at 24 months. Future studies should incorporate measures of receptive vocabulary to better understand these associations. EEG data were another source of missing data. As bilinguals were assessed with the same procedure, in the same session, in two different languages, it was extremely difficult to obtain enough good-quality data in both languages. This was due to fatigue effects in the second language tested, which led to partial EEG recordings, insufficient numbers of trials, insufficient data quality, excessive movement, or lack of cooperation from the child. Data loss was also related to the nature of our longitudinal design. Although we lost data in all language groups, losses were particularly high for bilinguals. This led to a lack of complete longitudinal cases, which is common in EEG studies in toddlers (see Van der Velde and Junge, 2020), and meant that we were unable to test our hypotheses in this subsample.

Finally, we propose some future lines of research. We suggest that the role of language exposure in the emergence of semantic integration be further explored by analyzing the links between the amount of L2 exposure, vocabulary size and the robustness of N400 effects, as the relative contribution of each to the emergence of SI is currently unclear.

CRediT authorship contribution statement

Itziar Lozano*: Writing – review & editing, Writing – original draft, Visualization, Validation, Supervision, Resources, Project administration, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Anna Duszyk-Bogorodzka*: Writing – review & editing, Writing – original draft, Visualization, Validation, Supervision, Software, Resources, Project administration, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Ingeborg Sophie Ribu*: Writing – review & editing, Writing – original draft, Validation, Supervision, Resources, Project administration, Methodology, Investigation, Funding acquisition, Data curation, Conceptualization. Natalia Falkiewicz: Writing – review & editing, Methodology, Investigation, Formal analysis, Data curation. Wiktoria Ogonowska: Writing – review & editing, Methodology, Investigation, Formal analysis, Data curation. Agnieszka Dynak: Writing – review & editing, Methodology, Funding acquisition, Conceptualization. Franziska Köder: Writing – review & editing, Methodology, Conceptualization. Przemysław Tomalski: Writing – review & editing, Supervision, Methodology, Funding acquisition, Conceptualization. Ewelina Fryzowska: Writing – review & editing, Investigation, Data curation. Grzegorz Krajewski: Writing – review & editing, Software, Methodology, Funding acquisition, Data curation. Cecilie Rummelhoff: Writing – review & editing, Project administration, Investigation, Data curation. Elena C. Varona: Writing – review & editing, Investigation, Data curation. Karolina Krupa-Gaweł: Writing – review & editing, Project administration, Investigation. Lisa Laumann: Writing – review & editing, Investigation. Nina Gram Garmann: Writing – review & editing, Supervision, Project administration, Methodology, Funding acquisition, Conceptualization. Ewa Haman: Writing – review & editing, Supervision, Resources, Project administration, Methodology, Funding acquisition, Conceptualization.
* These authors have contributed equally to this work and share first authorship.

Data statement

The data that support the findings of this study are openly available in OSF at https://osf.io/utpgb/?view_only=6bed1518d6e24719873483b3e1dc469b.

Funding

The scientific research presented in this registration (registered report) has been funded by the Norwegian Financial Mechanism for 2014-2021 via NCN GRIEG programme (Project no. 2019/34/H/HS6/00615), and partially funded by the Faculty of Psychology, University of Warsaw (501-D125-01-1250000, no 5011000613). During the preparation of this paper IL’s work was financed by NCN Sonatina 7 (2023/48/C/HS6/00264).

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

We would also like to thank all infants and parents who participated in the study for their generous contribution. We thank Charlotte Bergsaune for her assistance with recruitment and study coordination, and Karolina Muszyńska, Natalia Wojtkowiak, Ewa Komorowska, and Katarzyna Bajkowska for their work in recruitment. Thanks to Anna Malinowska-Korczak and Karolina Babis for their early testing tips and recruitment support. Many thanks to Dr. Stephanie De Anda for sharing the LEAT materials and for allowing the use of its shorter form and the translation into Polish and Norwegian.

Appendix A

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.dcn.2025.101599.

2 Early childhood education and care (ECEC) refers to any regulated arrangement that provides education and care for children from birth to compulsory primary school age. https://education.ec.europa.eu/education-levels/early-childhood-education-and-care

Contributor Information

Itziar Lozano, Email: itziar.lozanosanchez@psych.uw.edu.pl.

Anna Duszyk-Bogorodzka, Email: aduszyk1@swps.edu.pl.

Ingeborg Sophie Ribu, Email: ininge@oslomet.no.

Ewa Haman, Email: ewa.haman@psych.uw.edu.pl.

Appendix A. Supplementary material


mmc1.docx (2.2MB, docx)

Data availability

Data will be made available on request.

References

  1. Arias-Trejo N., Angulo-Chavira A.Q., Avila-Varela D.S., Chua-Rodriguez F., Mani N. Developmental changes in phonological and semantic priming effects in spanish-speaking toddlers. Dev. Psychol. 2022;58:236. doi: 10.1037/dev0001290. [DOI] [PubMed] [Google Scholar]
  2. Arredondo M.M., Aslin R.N., Zhang M., Werker J.F. Attentional orienting abilities in bilinguals: evidence from a large infant sample. Infant Behav. Dev. 2022;66 doi: 10.1016/j.infbeh.2021.101683. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Avila-Varela D.S., Arias-Trejo N., Mani N. A longitudinal study of the role of vocabulary size in priming effects in early childhood. J. Exp. Child Psychol. 2021;205 doi: 10.1016/j.jecp.2020.105071. [DOI] [PubMed] [Google Scholar]
  4. Barr D.J., Levy R., Scheepers C., Tily H.J. Random effects structure for confirmatory hypothesis testing: keep it maximal. J. Mem. Lang. 2013;68:255–278. doi: 10.1016/j.jml.2012.11.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Bartsch, B., 2022. Facts Educ. Nor. 2022. Tech. Rep.
  6. Bates, D., Kliegl, R., Vasishth, S., Baayen, H., 2015a. Parsimonious mixed models. arXiv preprint arXiv:1506.04967 doi: 10.48550/arXiv.1506.04967. [DOI]
  7. Bates D., Maechler M., Bolker B., Walker S., Christensen R.H., Singmann H., Dai B., et al. lme4: linear mixed-effects models using eigen and S4. R. Package Version. 2015;2014(1):1–7. doi: 10.18637/jss.v067.i01. [DOI] [Google Scholar]
  8. Borgström K., von Koss Torkildsen J., Lindgren M. Substantial gains in word learning ability between 20 and 24 months: a longitudinal ERP study. Brain Lang. 2015;149:33–45. doi: 10.1016/j.bandl.2015.07.002. [DOI] [PubMed] [Google Scholar]
  9. Brown C., Hagoort P. The processing nature of the N400: evidence from masked priming. J. Cogn. Neurosci. 1993;5:34–44. doi: 10.1162/jocn.1993.5.1.34. [DOI] [PubMed] [Google Scholar]
  10. Buchanan E.M., Holmes J.L., Teasley M.L., Hutchison K.A. English semantic word-pair norms and a searchable web portal for experimental stimulus creation. Behav. Res. Methods. 2013;45:746–757. doi: 10.3758/s13428-012-0284-z. [DOI] [PubMed] [Google Scholar]
  11. Cantiani C., Riva V., Piazza C., Melesi G., Mornati G., Bettoni R., Molteni M. ERP responses to lexical-semantic processing in typically developing toddlers, in adults, and in toddlers at risk for language and learning impairment. Neuropsychologia. 2017;103:115–130. doi: 10.1016/j.cortex.2021.04.020. [DOI] [PubMed] [Google Scholar]
  12. Conboy B.T., Mills D.L. Two languages, one developing brain: event-related potentials to words in bilingual toddlers. Dev. Sci. 2006;9:F1–F12. doi: 10.1111/j.1467-7687.2005.00453.x. [DOI] [PubMed] [Google Scholar]
  13. Core Team, R., 2020. R: A language and environment for statistical computing. R foundation for statistical computing. Vienna, Austria.https://www.R-project.org/.
  14. De Anda S., Arias-Trejo N., Poulin-Dubois D., Zesiger P., Friend M. Minimal second language exposure, SES and early word comprehension: new evidence from a direct assessment. Biling. Lang. Cogn. 2016;19:162–180. doi: 10.1017/S1366728914000820. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. De Anda S., Bosch L., Poulin-Dubois D., Zesiger P., Friend M. The language exposure assessment tool: quantifying language exposure in infants and children. J. Speech Lang. Hear. Res. 2016;59:1346–1356. doi: 10.1044/2016_JSLHR-L-15-0234. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. De Anda S., Friend M. Lexical-semantic development in bilingual toddlers at 18 and 24 months. Front. Psychol. 2020;11 doi: 10.3389/fpsyg.2020.508363. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. De Anda S., Poulin-Dubois D., Zesiger P., Friend M. Lexical processing and organization in bilingual first language acquisition: guiding future research. Psychol. Bull. 2016;142:655. doi: 10.1037/bul0000042. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Dovgialo M., Chabuda A., Duszyk A., Zieleniewska M., Pietrzak M., Różański P., Durka P. Assessment of statistically significant command-following in pediatric patients with disorders of consciousness, based on visual, auditory and tactile event-related potentials. Int. J. Neural Syst. 2019;29:1850048. doi: 10.1142/S012906571850048X. [DOI] [PubMed] [Google Scholar]
  19. Duda-Goławska J., Imbir K.K., Żygierewicz J. ERP analysis using a multi-channel matching pursuit algorithm. Neuroinformatics. 2022:1–36. doi: 10.1007/s12021-022-09575-6. [DOI] [PubMed] [Google Scholar]
  20. Duszyk A., Dovgialo M., Pietrzak M., Zieleniewska M., Durka P. Event-related potentials in the odd-ball paradigm and behavioral scales for the assessment of children and adolescents with disorders of consciousness: a proof of concept study. Clin. Neuropsychol. 2019;33:419–437. doi: 10.1080/13854046.2018.1555282. [DOI] [PubMed] [Google Scholar]
  21. European Commission. Early childhood education and care. 2023. https://education.ec.europa.eu/education-levels/early-childhood-education-and-care. Accessed: 2023-02-13.
  22. Faul F., Erdfelder E., Lang A.G., Buchner A. G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav. Res. Methods. 2007;39:175–191. doi: 10.3758/BF03193146.
  23. Fenson L., et al. MacArthur-Bates Communicative Development Inventories. Baltimore, MD: Paul H. Brookes Publishing Company; 2007. doi: 10.1037/t11538-000.
  24. Fenson L., Dale P.S., Reznick J.S., Bates E., Thal D.J., Pethick S.J. Variability in early communicative development. Monogr. Soc. Res. Child Dev. 1994;59(5, Serial 242). doi: 10.2307/1166093.
  25. Field A. Multilevel linear models. In: Discovering Statistics Using IBM SPSS Statistics. Fifth ed. SAGE; 2018. pp. 936–990.
  26. Friedrich M., Friederici A.D. N400-like semantic incongruity effect in 19-month-olds: processing known words in picture contexts. J. Cogn. Neurosci. 2004;16:1465–1477. doi: 10.1162/0898929042304705.
  27. Friedrich M., Friederici A.D. Lexical priming and semantic integration reflected in the event-related potential of 14-month-olds. Neuroreport. 2005;16:653–656. doi: 10.1097/00001756-200504250-00028.
  28. Friedrich M., Friederici A.D. Early N400 development and later language acquisition. Psychophysiology. 2006;43:1–12. doi: 10.1111/j.1469-8986.2006.00381.x.
  29. Friedrich M., Friederici A.D. Maturing brain mechanisms and developing behavioral language skills. Brain Lang. 2010;114:66–71. doi: 10.1016/j.bandl.2009.07.004.
  30. Gabard-Durnam L.J., Mendez Leal A.S., Wilkinson C.L., Levin A.R. The Harvard Automated Processing Pipeline for Electroencephalography (HAPPE): standardized processing software for developmental and high-artifact data. Front. Neurosci. 2018;12:97. doi: 10.3389/fnins.2018.00097.
  31. Gatt D., O'Toole C., Haman E. Using parental report to assess early lexical production in children exposed to more than one language. Assess. Multilingual Child. Disentangling Biling. Lang. Impair. 2015;13:151. doi: 10.21832/9781783093137-009.
  32. Green P., MacLeod C.J. SIMR: an R package for power analysis of generalized linear mixed models by simulation. Methods Ecol. Evol. 2016;7:493–498. doi: 10.1111/2041-210X.12504.
  33. Halberda J. The development of a word-learning strategy. Cognition. 2003;87:B23–B34. doi: 10.1016/S0010-0277(02)00186-5.
  34. Heise M.J., Mon S.K., Bowman L.C. Utility of linear mixed effects models for event-related potential research with infants and children. Dev. Cogn. Neurosci. 2022;54. doi: 10.1016/j.dcn.2022.101070.
  35. Helo A., Azaiez N., Rämä P. Word processing in scene context: an event-related potential study in young children. Dev. Neuropsychol. 2017;42(7-8):482–494. doi: 10.1080/87565641.2017.1396604.
  36. Hendrickson K., Love T., Walenski M., Friend M. The organization of words and environmental sounds in the second year: behavioral and electrophysiological evidence. Dev. Sci. 2019;22(1). doi: 10.1111/desc.12746.
  37. Jardak A., Byers-Heinlein K. Labels or concepts? The development of semantic networks in bilingual two-year-olds. Child Dev. 2019;90:e212–e229. doi: 10.1111/cdev.13050.
  38. Jörn M., Sielużycki C., Matysiak M., Żygierewicz J., Scheich H., Durka P., König R. Single-trial reconstruction of auditory evoked magnetic fields by means of template matching pursuit. J. Neurosci. Methods. 2011;199:119–128. doi: 10.1016/j.jneumeth.2011.04.019.
  39. Junge C., Boumeester M., Mills D.L., Paul M., Cosper S.H. Development of the N400 for word learning in the first 2 years of life: a systematic review. Front. Psychol. 2021;12:2420. doi: 10.3389/fpsyg.2021.689534.
  40. Junge C., Cutler A., Hagoort P. Electrophysiological evidence of early word learning. Neuropsychologia. 2012;50(14):3702–3712. doi: 10.1016/j.neuropsychologia.2012.10.012.
  41. Kidd E., Garcia R. How diverse is child language acquisition research? First Lang. 2022;42:702–735. doi: 10.1177/01427237211066405.
  42. Kuipers J.R., Thierry G. Event-related potential correlates of language change detection in bilingual toddlers. Dev. Cogn. Neurosci. 2012;2:97–102. doi: 10.1016/j.dcn.2011.08.002.
  43. Kuipers J.R., Thierry G. ERP-pupil size correlations reveal how bilingualism enhances cognitive flexibility. Cortex. 2013;49:2853–2860. doi: 10.1016/j.cortex.2013.01.012.
  44. Kuipers J.R., Thierry G. Bilingualism and increased attention to speech: evidence from event-related potentials. Brain Lang. 2015;149:27–32. doi: 10.1016/j.bandl.2015.07.004.
  45. Kumle L., Võ M., Draschkow D. Mixedpower: a library for estimating simulation-based power for mixed models in R. 2018. https://github.com/DejanDraschkow/mixedpower.
  46. Kumle L., Võ M.L.H., Draschkow D. Estimating power in (generalized) linear mixed models: an open introduction and tutorial in R. Behav. Res. Methods. 2021;53:2528–2543. doi: 10.3758/s13428-021-01546-0.
  47. Kutas M., Federmeier K.D. Thirty years and counting: finding meaning in the N400 component of the event-related brain potential (ERP). Annu. Rev. Psychol. 2011;62:621–647. doi: 10.1146/annurev.psych.093008.131123.
  48. Li B., Liu Z., Gao X., Lin Y. N400 extraction from a few trials of EEG data using spatial and temporal-frequency pattern analysis. J. Neural Eng. 2019;16. doi: 10.1088/1741-2552/ab434c.
  49. Łuniewska M., Krysztofiak M., Haman E. Parental report of vocabulary in 3- to 6-year-old Polish children: reliable but not valid. Int. J. Lang. Commun. Disord. 2024;56(6):2483–2496. doi: 10.1111/1460-6984.13101.
  50. Mani N., Mills D.L., Plunkett K. Vowels in early words: an event-related potential study. Dev. Sci. 2012;15(1):2–11. doi: 10.1111/j.1467-7687.2011.01092.x.
  51. Matuschek H., Kliegl R., Vasishth S., Baayen H., Bates D. Balancing type I error and power in linear mixed models. J. Mem. Lang. 2017;94:305–315. doi: 10.1016/j.jml.2017.01.001.
  52. Noreika V., Georgieva S., Wass S., Leong V. 14 challenges and their solutions for conducting social neuroscience and longitudinal EEG research with infants. Infant Behav. Dev. 2020;58. doi: 10.1016/j.infbeh.2019.101393.
  53. Norton E.S., MacNeill L.A., Harriott E.M., Allen N., Krogh-Jespersen S., Smyser C.D., Rogers C.E., Smyser T.A., Luby J., Wakschlag L. EEG/ERP as a pragmatic method to expand the reach of infant-toddler neuroimaging in HBCD: promises and challenges. Dev. Cogn. Neurosci. 2021;51. doi: 10.1016/j.dcn.2021.100988.
  54. Panda E.J., Emami Z., Valiante T.A., Pang E.W. EEG phase synchronization during semantic unification relates to individual differences in children's vocabulary skill. Dev. Sci. 2021;24(1). doi: 10.1111/desc.12984.
  55. Peirce J.W. PsychoPy: psychophysics software in Python. J. Neurosci. Methods. 2007;162:8–13. doi: 10.1016/j.jneumeth.2006.11.017.
  56. Rämä P., Sirri L., Goyet L. Event-related potentials associated with cognitive mechanisms underlying lexical-semantic processing in monolingual and bilingual 18-month-old children. J. Neurolinguist. 2018;47:123–130. doi: 10.1016/j.jneuroling.2018.04.012.
  57. Rämä P., Sirri L., Serres J. Development of lexical–semantic language system: N400 priming effect for spoken words in 18- and 24-month-old children. Brain Lang. 2013;125(1):1–10. doi: 10.1016/j.bandl.2013.01.009.
  58. Rowe M.L., Raudenbush S.W., Goldin-Meadow S. The pace of vocabulary growth helps predict later vocabulary skill. Child Dev. 2012;83(2):508–525. doi: 10.1111/j.1467-8624.2011.01710.x.
  59. Sander-Montant A., Byers-Heinlein K., Perez M.L. The more they hear the more they learn? Using data from bilinguals to test models of early lexical development. Preprint. 2022. doi: 10.31234/osf.io/zd3m8.
  60. Schneider J.M., Poudel S., Abel A.D., Maguire M.J. Age and vocabulary knowledge differentially influence the N400 and theta responses during semantic retrieval. Dev. Cogn. Neurosci. 2023;61. doi: 10.1016/j.dcn.2023.101251.
  61. Simonsen H.G., Kristoffersen K.E., Bleses D., Wehberg S., Jørgensen R.N. The Norwegian Communicative Development Inventories: reliability, main developmental trends and gender differences. First Lang. 2014;34:3–23. doi: 10.1177/0142723713510997.
  62. Sirri L., Rämä P. Cognitive and neural mechanisms underlying semantic priming during language acquisition. J. Neurolinguist. 2015;35:1–12. doi: 10.1016/j.jneuroling.2015.01.003.
  63. Sirri L., Rämä P. Similar and distinct neural mechanisms underlying semantic priming in the languages of the French–Spanish bilingual children. Biling. Lang. Cogn. 2019;22:93–102. doi: 10.1017/S1366728917000578.
  64. Smoczyńska M., Krajewski G., Łuniewska M., Haman E., Bulkowski K., Kochańska M. Inwentarze rozwoju mowy i komunikacji (IRMIK) słowa i gesty, słowa i zdania: podręcznik [Inventories of speech and communication development (IRMIK), words and gestures, words and sentences: manual]. Warszawa: Instytut Badań Edukacyjnych; 2015.
  65. Torkildsen J.V.K., Svangstu J.M., Hansen H.F., Smith L., Simonsen H.G., Moen I., Lindgren M. Productive vocabulary size predicts event-related potential correlates of fast mapping in 20-month-olds. J. Cogn. Neurosci. 2008;20(7):1266-128. doi: 10.1162/jocn.2008.20087.
  66. Van der Velde B., Junge C. Limiting data loss in infant EEG: putting hunches to the test. Dev. Cogn. Neurosci. 2020;45. doi: 10.1016/j.dcn.2020.100809.
  67. Ward A.L. Language Dominance and Lexical-Semantic Processing in Bilingual Toddlers. Ph.D. thesis. University of Oregon; 2020. https://hdl.handle.net/1794/25824.
  68. Webb S.J., Bernier R., Henderson H.A., Johnson M.H., Jones E.J., Lerner M.D., McPartland J.C., Nelson C.A., Rojas D.C., Townsend J., et al. Guidelines and best practices for electrophysiological data collection, analysis and reporting in autism. J. Autism Dev. Disord. 2015;45:425–443. doi: 10.1007/s10803-013-1916-6.
  69. Wojcik E.H. The development of lexical–semantic networks in infants and toddlers. Child Dev. Perspect. 2018;12:34–38. doi: 10.1111/cdep.12252.
  70. Zieleniewska M., Duszyk A., Różański P., Pietrzak M., Bogotko M., Durka P. Parametric description of EEG profiles for assessment of sleep architecture in disorders of consciousness. Int. J. Neural Syst. 2019;29. doi: 10.1142/S0129065718500491.

Associated Data


Supplementary Materials

Supplementary material

mmc1.docx (2.2MB, docx)

Data Availability Statement

Data will be made available on request.


Articles from Developmental Cognitive Neuroscience are provided here courtesy of Elsevier
