When function words carry content

João Vieira; Elisângela Teixeira; Erica Rodrigues; Hayward J Godwin; Denis Drieghe

doi:10.1177/17470218241307582

2024 Dec 26;78(10):2235–2248. doi: 10.1177/17470218241307582

PMCID: PMC12432282 PMID: 39628381

Abstract

Studies on eye movements during reading have primarily focussed on the processing of content words (CWs), such as verbs and nouns. Those few studies that have analysed eye movements on function words (FWs), such as articles and prepositions, have reported that FWs are typically skipped more often and, when fixated, receive fewer and shorter fixations than CWs. However, those studies were often conducted in languages where FWs contain comparatively little information (e.g., “the” in English). In Brazilian Portuguese (BP), FWs can carry gender and number marking. In the present study, we analysed data from the RASTROS corpus of natural reading in BP and examined the effects of word length, predictability, frequency and word class on eye movements. Very limited differences between FWs and CWs were observed, mostly restricted to the skipping rates of short words, such that FWs were skipped more often than CWs. For fixation times, differences were either nonexistent or restricted to atypical FWs, such as low frequency FWs, warranting further research. As such, our results are more compatible with studies showing limited or no differences in processing speed between FWs and CWs when influences of word length, frequency and predictability are taken into account.

Keywords: Eye movements, corpus study, function words, content words, Brazilian Portuguese

Reading is a very common activity in everyday life. Research using eye-tracking methodology during reading has shown that eye movement patterns closely reflect the cognitive processes active while processing text, with eye movements slowing down as processing gets more effortful (Liversedge & Findlay, 2000). Unlike what happens when we are listening to someone speak, in reading we are free to advance or return to previously read parts and as a result our eyes’ behaviour can tell us a great deal about cognitive processing, as they provide insights into which parts of text are being processed at certain moments. For this reason, eye-tracking, which offers great spatial and temporal accuracy about eye movements, has been implemented for decades to study the cognitive processes behind reading (for a review, see Rayner, 2009).

Eye movements during reading can be categorised into two basic behaviours. The first one is fixations, which happen when our eyes remain relatively still. During reading, fixations typically last about 200–250 ms for adult, proficient readers (Rayner, 2009). It is during fixations that we acquire new visual information. The second behaviour is saccades which are fast eye movements between fixations, during which the eyes are functionally blind and do not acquire useful visual information (Matin, 1974). The number and duration of our fixations are influenced by various factors such as the word length of the fixated word, whereby shorter words are fixated less often and for less time (Rayner, 1998). Other influences are word frequency (Inhoff & Rayner, 1986) and a word’s predictability from the preceding context (Balota et al., 1985), such that more frequent and more predictable words receive fewer and shorter fixations compared with low-frequency and unpredictable words.

Not all words are fixated. Word skipping happens when a word is not directly fixated during first pass and most models of eye movements during reading assume that word skipping is usually related to how much information is acquired before the eyes reach that word (e.g., the EZ-Reader model, Reichle et al., 1998). If an advanced stage of processing of the next word has been obtained, the word will be skipped. Variables that impact fixation durations have often been shown to impact word skipping as well. For example, words that appear frequently in written material are skipped more often than infrequent words (e.g., Rayner et al., 1996). Short words are also skipped more often than long words. For example, two- or three-letter long words are only fixated around 25% of the time, while words with eight letters or more are almost always fixated and often multiple times (Brysbaert et al., 2005). In addition, it has been well-documented that words that are more predictable from the preceding context are skipped more often compared with unpredictable words (Balota et al., 1985). Predictability in reading relates to our ability to anticipate what comes next in the sentence. Contextual information, which can be world knowledge or specific information present in the preceding text, often aids us in predicting the next word in full (i.e., the exact word) or, when that is not possible, partially, such as when we predict a word’s part-of-speech (e.g., noun, verb, and preposition) but not the specific word (Luke & Christianson, 2016). Furthermore, words can also be skipped or fixated upon due to mislocated saccades, such as when our oculomotor system aims for a specific word and overshoots or undershoots it thereby landing on another word (Nuthmann et al., 2005). Simulations from Nuthmann et al. suggest involuntary skipping of or landing on a word happens relatively often, especially for short words.

Research also shows that word skipping can be influenced by word class (Angele & Rayner, 2013; Staub et al., 2018). Words can be categorised as either content words (CWs) or function words (FWs). As the name suggests, CWs are words that typically carry more content, or semantic information, such as verbs, nouns, and adjectives. In contrast, FWs, such as prepositions, conjunctions, and articles, typically carry little to almost no semantic information, as is the case, for instance, with the article “the” in English. In addition, new FWs are rarely if ever created, which is why they are also called closed-class words, while CWs are considered open-class words, as they can be more easily created. According to Rayner (1998), FWs are fixated upon only about 35% of the time, whereas the average fixation rate on CWs is around 85%. Part of the reason why FWs are skipped so often lies in the previously mentioned effects of word length and word frequency on skipping. FWs are typically shorter and more frequent than CWs in most languages. An open question is whether, when influences of word length and frequency are considered, FWs still differ from CWs regarding skipping and fixation durations. This will be the topic of the current investigation.

Researchers have hypothesised that FWs and CWs are processed differently during both language comprehension and production. Some indications of differences in how these word types are processed come from imaging studies. Diaz and McCarthy (2009) reported that when processing FWs and CWs, both classes shared activation of several brain areas (such as the temporal-parietal cortex, middle and anterior temporal cortex, and inferior frontal gyrus), but CWs generated stronger activations of the middle and anterior temporal cortex, as well as the parahippocampal regions. Also, Thi et al. (2022), in an EEG experiment during which participants listened to English sentences, found larger N1 amplitudes, from the onset of the target words, on CWs than on FWs, suggesting there is a difference in their integration processes. Juste et al. (2012), in a study in Brazilian Portuguese comparing stuttering individuals of different age groups, found that children show a tendency to stutter more on FWs, while adolescents and adults express their disfluency more on CWs. Juste et al. suggest that a possible explanation for this is that children retrieve CWs from their mental lexicon faster than they can plan the correct syntactic structure, which is where FWs play a bigger role. Reports from reading-like experiments also suggest differences between content and function words. In a letter detection task, wherein participants are asked to read a text and mark all occurrences of a specific letter (e.g., “e”), Corcoran (1966) found that participants missed the letter “e” more often on very frequent FWs, such as “the,” than on CWs. The missing letter effect has been well replicated when comparing content and function words, with participants failing to detect the letters on FWs more often than on CWs (see Roy-Charland et al., 2022; Saint-Aubin & Poirier, 1997 for studies conducted in French).

Importantly, eye movement studies corroborate the idea that FWs and CWs are processed differently. Gautier et al. (2000), in a reading study in French, found that the article “les” (“the” in English) was skipped more often than a three-letter long verb. Staub et al. (2018) found that the repetition of certain FWs (e.g., “Amanda jumped off the the swing and landed on her feet.”) was often not perceived by readers, even when their eyes landed on both the first and repeated words, contrary to what happened with repeated CWs. Angele and Rayner (2013) used the eye-contingent boundary paradigm (Rayner, 1975), whereby an invisible boundary is placed in the text such that when the eyes cross it, the preview of a target word located after the boundary is changed. In their experiment the preview was either the correct target word or the article “the.” They found that the article “the” was often skipped even when it was used as a grammatically incorrect parafoveal preview for the target word (e.g., “She was sure she would | the/ace all the tests,” where | marks the position of the invisible boundary). Angele and Rayner (2013) suggested readers are so used to skipping the article “the” that even when its presence goes against the syntactic structure, the parafoveal preview of the article is enough for the oculomotor system to decide to skip it. In addition, Luke and Christianson (2016) ran a series of analyses on FWs and CWs using data from the PROVO corpus of Natural Language in English. The authors analysed the effects of partial predictability. For instance, in the sentence “Last summer, Peter travelled to _____,” without more information, it is difficult to guess where Peter went (e.g., Paris) or the purpose of his trip (e.g., to relax), but there is a high chance the word will be a noun (e.g., Paris, Rio de Janeiro, or Tokyo). The predictability of part-of-speech is a form of partial predictability. Luke and Christianson (2016) found that partial predictability was enough to facilitate FW processing, which translated into shorter fixations and higher skipping rates, while the same was true for fixation durations on CWs, but not skipping rates of CWs. Correctly anticipating the grammatical class of an FW should be enough to be fairly certain of its syntactic function in the sentence, which could, in turn, be enough to influence the decision of skipping that word. The same is not true with CWs, since anticipating the grammatical class of a verb or noun does not give enough semantic information to justify the decision of skipping it.

However, not all studies of eye movements during reading indicate processing differences between FWs and CWs. Following the results reported by Angele and Rayner (2013), Angele et al. (2014), again using the eye-contingent boundary paradigm, reported an experiment with controlled high- and low-frequency words as parafoveal previews which could be identical to the target word or be an incorrect preview that violated the syntactic fit. The authors found that highly frequent words (i.e., not only the article “the”) were skipped more often, even in the incorrect preview condition, suggesting that the results found by Angele and Rayner (2013) on the article “the” may have been caused by its extremely high frequency, rather than by “the” specifically triggering word skipping. The 2014 study therefore does not suggest FWs and CWs would be processed differently. Schmauder et al. (2000) found no difference in skipping rates and early eye movement measures between FWs and CWs when word frequency and word length were matched. It should be noted that the pattern of results they found was not in the expected direction (i.e., while statistically not significant, FWs had longer mean fixation times than CWs) and it has been argued that some of the FWs they used were especially infrequent, which could mean their results might not generalise to more frequent words (Roy-Charland & Saint-Aubin, 2006). Unfortunately, Schmauder et al. (2000) did not provide a list of the stimuli they used, so this criticism cannot be verified. More recently, stronger evidence for a lack of difference between FWs and CWs was provided by Staub (2024), who reported the results of two large-scale analyses carried out in a corpus study in English in which FWs and CWs were matched on word length, predictability, and frequency. Staub found no effect of word class on word skipping and little to no difference in fixation times. We will return to this study in the “General Discussion.”

Another set of analyses on word classes comes from Kliegl (2007). Kliegl analysed single fixation durations (the duration of the fixation on a word that was fixated exactly once) and skipping rates on word triplets composed of a target word, the preceding word, and the following word: word n, word n – 1 and word n + 1, respectively. For example, Kliegl found that FWs had longer fixation times when word n + 1 was a CW about to be skipped. The author also reported that skipping costs were only observed after FWs were skipped, but not CWs. Of particular interest to us is that these differences between CWs and FWs were observed in the Potsdam Corpus, a corpus of sentence reading in German that allowed for control of word length and frequency (Kliegl et al., 2006). Also, in a research field often dominated by studies in English, a language in which FWs carry little semantic information, have no gender or number marking, and are generally shorter, it is important to explore how different characteristics of FWs may impact reading in other languages, such as German.

To state that FWs usually do not carry any content at all might be too strong a statement. Even in a language such as English, it can be said that FWs do carry some content. As Michel Lange et al. (2017) argue, in the phrase “the wife of my friend,” both “the” and “of” have content, those being possession (“of”) and definiteness (“the”), the latter in contrast with indefiniteness (“some”). However, in some languages, FWs often clearly carry more content compared with English, for instance in the form of gender and number marks in Brazilian Portuguese (BP), as well as in European Portuguese.¹ An example to compare FWs in BP to English is the word “algum,” which translates to “some” in English. First, “some” contrasts with the idea of “all,” which in itself carries some information. This distinction between “all” and “some” holds in English as well. However, in BP, “algum” also carries both gender and number marking (“algum”² is male and singular, “alguns” is male plural, “alguma” is female singular, and “algumas” is female plural). In addition, the gender and number marking of an FW will often, particularly in formal written language, indicate the gender and number for the following CW, except when the FW does not have gender and number marking. For instance, “algumas” (female and plural) will be followed by a CW that is also female and plural, except when “algumas” is the last word in the sentence. This means that, in BP, FWs carry comparatively richer semantic and syntactic information in the form of gender and number marking.

In this study, our objective was to examine eye movement behaviour on FWs and CWs in BP. To that end, we analysed data from the RASTROS corpus of natural reading in BP (Leal et al., 2022; Vieira, 2020). The RASTROS Corpus is the first corpus of natural reading in BP and it contains eye movement measures during reading on 50 nonmanipulated short paragraphs taken from books and websites, as well as predictability values for every word, except the first word in each paragraph. The most common way to establish the predictability of a word is by doing a Cloze Task (Taylor, 1953). In this task, participants are given part of a sentence and they have to continue the sentence with the word they think will come next (Kuperberg & Jaeger, 2016). This task can be used for one word in a sentence (e.g., Last summer we bought a _____), or for all the words in a sentence or paragraph. When a Cloze Task is used for all words, participants are usually given the first word and have to keep guessing each following word while receiving feedback after each answer. The most commonly given answers are considered to be the more predictable words in that context. Previous corpora studies that used the Cloze Task for multiple words include the PROVO Corpus, in English (Luke & Christianson, 2018) and the Potsdam corpus, in German (Kliegl et al., 2006). The same procedure was also used to create the RASTROS corpus. Besides predictability, the RASTROS also contains frequency measures and the word length of the words that make up the corpus.
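As an illustration of how Cloze Task responses can be turned into a predictability value, here is a minimal Python sketch. It assumes that predictability is scored as the proportion of responses matching the word that actually appears in the text, which is the usual cloze probability; the exact scoring used for RASTROS is described in Leal et al. (2022), not here. The function name and the example responses are hypothetical.

```python
from collections import Counter


def cloze_predictability(responses, target):
    """Proportion of cloze responses that match the target word.

    Assumption: matching ignores case and surrounding whitespace.
    """
    if not responses:
        raise ValueError("need at least one response")
    normalised = [r.strip().lower() for r in responses]
    counts = Counter(normalised)
    return counts[target.strip().lower()] / len(normalised)


# Hypothetical responses for "Last summer we bought a _____"
responses = ["car", "house", "car", "boat", "Car"]
print(cloze_predictability(responses, "car"))    # 3 of 5 responses match
print(cloze_predictability(responses, "plane"))  # never produced
```

A word that no participant produced receives a predictability of 0 under this scoring.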

To analyse reading data from corpus studies as opposed to experiments with controlled stimuli is a methodological choice in this study. Unlike what happens in manipulated studies, where the researcher can focus on specific predictors and control others (e.g., match word length and compare eye movements on high and low-frequency words), corpus studies offer data that are more natural and, as such, are influenced by many factors. This allows researchers to invest in exploratory analyses which, in turn, may lead to interesting findings that can be subsequently examined in controlled experiments with manipulated stimuli.

We expected to replicate the well-reported effects of word length, frequency and predictability, such that shorter, more frequent and more predictable words will show higher skipping rates and shorter fixation durations. The novelty of the current study is that we are also analysing word class when comparing CWs to FWs in BP, a language in which FWs carry comparatively more semantic and syntactic information than in English, for example. We expected FWs to be skipped more often and have shorter fixation times than CWs, but the extent of this difference could be smaller than what is reported in the literature in English, as the difference between CWs and FWs might be less pronounced in BP. Our focus was also on the extent to which eye movement behaviour on FWs versus CWs would be different when effects of word length, frequency and predictability were accounted for.

Methods

For detailed information on the RASTROS corpus, we refer to Vieira (2020), but will repeat the most relevant information here as Vieira (2020) is accessible online but as an unpublished thesis.

Participants

Sixty undergraduate students (29 female; mean age: 22.2 years; range: 18–40 years) from the Federal University of Ceará, Brazil participated in the eye-tracking task. Eleven participants had to be removed for not finishing the task or due to bad calibration, leaving a total of 49 participants. As is common in Brazil, participants were volunteers and did not receive any compensation (resulting in some of them not finishing the experiment). All participants had normal or corrected to normal vision, had no reported reading difficulties, and were native speakers of Brazilian Portuguese. The experimental procedure for the RASTROS Corpus was approved by the ethics committee at the Federal University of Ceará, and the secondary analyses reported here were approved by the ethics committee at the University of Southampton. All participants signed an informed consent form.

Material

Participants read 50 short paragraphs taken from various websites and books in the public domain. In total, the paragraphs had 120 sentences and 2,494 words (1,237 unique words). The range of the number of sentences per paragraph was 1–5 (M = 2.4); the range of words per sentence was 3–60 (M = 20); the words per paragraph range was 36–70 (M = 49); the word length range was 1–15 (M = 4.7); and the average length of function words was 2.4 letters whereas the average content word length was 6.4 letters. In accordance with the Brazilian Portuguese hyphen rules, hyphenated words were considered as one, hence word length could be up to 15 letters long. Predictability values for every word, except the first word, in each paragraph, were acquired through a Cloze Task on the Simpligo Online Platform (as described by Leal et al., 2022). Four hundred and seventeen participants completed the Cloze Task for every word in five out of the 50 paragraphs. They were given the first word and filled in every word until the end of the paragraph. After each word, the correct answer appeared on the screen before they continued to the next one.

Apparatus and procedure

Eye movements were recorded using an Eyelink 1000 eye tracker (SR Research) with a chin rest. The experiment was programmed in Experiment Builder (SR Research). Paragraphs were shown using Courier New font, size 18-point with double spacing between each line. The background was light grey and the text font was black. The distance between the participant’s eye and the monitor was about 65 cm, which amounts to about three letters per degree of visual angle.
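The relation between viewing distance and visual angle can be checked with a short calculation. The Python sketch below converts one degree of visual angle at 65 cm into centimetres on the screen and derives the letter width implied by "about three letters per degree". The helper name is hypothetical, and the resulting letter width is an inference from the reported figures, not a value stated by the authors.

```python
import math


def visual_angle_to_cm(angle_deg, distance_cm):
    """Size on the screen (cm) subtended by a visual angle at a given distance."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


one_degree_cm = visual_angle_to_cm(1.0, 65.0)  # roughly 1.13 cm
implied_letter_width_cm = one_degree_cm / 3    # roughly 0.38 cm per letter

print(round(one_degree_cm, 2), round(implied_letter_width_cm, 2))
```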

Participants read all 50 paragraphs and the reading task took 30 min on average. A nine-point calibration was performed at the beginning of the experiment and repeated every 10 min to ensure precision. For each paragraph, once a stable fixation was registered on the dot presented on the screen, the paragraph was revealed ensuring the eyes were fixating on the first word. If the drift correction deviated more than 0.5° from the focal point, a full recalibration was carried out.

First, participants read two practice paragraphs, and then the 50 experimental paragraphs were read in a pseudo-random sequence. After finishing a paragraph, participants had to press a button on a joystick in front of them to continue. To ensure participants were reading attentively, 20 of the trials were followed by simple yes-no comprehension questions. One participant had 70% accuracy and was removed. All remaining participants had an accuracy of 75% or above (93% on average). Participants were asked to move as little as possible and to read silently.

Data availability

All materials, including eye movement data, cloze data, and R scripts used for statistical analyses are available at https://osf.io/pqhx9/

Results

We used Data Viewer (SR Research) to do the initial data processing. Following standard procedures, we merged all fixations shorter than 80 ms with fixations that were no more than 0.5° away. Second, we repeated the procedure, but for fixations shorter than 40 ms with fixations within 1.25° distance. Finally, we removed all remaining fixations shorter than 80 ms, longer than 800 ms, and that were outside any interest areas (i.e., words). Subsequently, we examined each trial for tracking loss or whether participants accidentally ended a trial prematurely by pressing the button too soon, which resulted in removing 0.5% of all paragraphs. We also removed outliers (2.5 standard deviations from the mean per subject) from each reading time measure individually, which meant removing approximately 2% of the remaining data.
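The duration filter and the per-subject outlier trimming described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the actual processing was done in Data Viewer and in the R scripts on OSF, the spatial merging of short fixations is omitted, and the use of the sample standard deviation is an assumption. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def drop_extreme_fixations(durations_ms, low=80, high=800):
    """Keep fixations whose duration lies between `low` and `high` ms."""
    return [d for d in durations_ms if low <= d <= high]


def trim_outliers_per_subject(rows, n_sd=2.5):
    """Drop values more than `n_sd` SDs from the mean of the same subject.

    `rows` is a list of (subject, value) pairs for one reading time measure.
    """
    by_subject = defaultdict(list)
    for subject, value in rows:
        by_subject[subject].append(value)

    bounds = {}
    for subject, values in by_subject.items():
        if len(values) < 2:
            # An SD cannot be estimated from one value, so nothing is trimmed.
            bounds[subject] = (float("-inf"), float("inf"))
        else:
            m, s = mean(values), stdev(values)
            bounds[subject] = (m - n_sd * s, m + n_sd * s)

    return [(subj, v) for subj, v in rows if bounds[subj][0] <= v <= bounds[subj][1]]


print(drop_extreme_fixations([40, 80, 250, 800, 950]))
rows = [("s1", 200)] * 9 + [("s1", 1000)]
print(len(trim_outliers_per_subject(rows)))
```

In the second example the single 1,000 ms value lies more than 2.5 standard deviations above that subject's mean and is dropped, leaving nine observations.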

Finally, the first word of every paragraph, the first word on every line, and the last word of each sentence (to avoid wrap-up effects) were removed from the analyses. The analyses were run on all remaining words. All words were tagged for their part-of-speech (e.g., noun, verb, preposition, conjunction) using the Palavras Parser, a part-of-speech tagger in Brazilian Portuguese (Bick, 2000) and, for the purpose of the analyses we ran, words were marked as either content words or function words.

Eye movement data

The eye movement measures we report here are first fixation duration (FFD), which is the duration of the first fixation on any given word; Gaze duration (GD), which is the sum of all fixation durations during the first pass on a given word; Go past time (GPT), which is the time taken, including regressions, from entering a given word to going past it for the first time; and Skipping Rates, which is how often a given word is not directly fixated during the first pass. We do not report Single Fixation Durations in this article because those only rarely happen on longer words. Words that have eight or more letters are fixated 90% of the time, often with more than one fixation (Rayner, 1998), and our data includes words with up to 15 letters. Therefore, this measure is less suited for our analyses. To normalise the distribution of the fixation duration measures, values were log-transformed. Predictability, frequency, and length values were centred before being entered as predictors in the models. The RASTROS corpus contains frequency norms from two corpora, the BrWac (Wagner Filho et al., 2018) and the Brasileiro corpus (Sardinha, 2009), both of which show a very high correlation between word length and frequency for the words in the RASTROS corpus. As the Brasileiro corpus had a somewhat weaker correlation (-.75) than the BrWac corpus (-.85), we selected the frequency norms of the Brasileiro corpus.³ Previous research (Brysbaert & New, 2009) has indicated that corpus size does not matter much once the size exceeds 30 million words, a criterion easily met by both corpora (~2.7 billion tokens in the BrWac and ~1 billion tokens in the Brasileiro).
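To make the measure definitions concrete, the following Python sketch derives FFD, GD, GPT, and skipping for one word from a temporally ordered list of fixations. It is a simplified illustration, not the procedure used by Data Viewer: fixations are assumed to be already assigned to word positions, and the function name and example data are hypothetical.

```python
def first_pass_measures(fixations, word):
    """Compute first-pass measures for the word at position `word`.

    `fixations` is a list of (word_position, duration_ms) in temporal order.
    Returns None if the eyes never reached or passed the word.
    """
    for start, (position, duration) in enumerate(fixations):
        if position < word:
            continue  # the eyes are still to the left of the word
        if position > word:
            # The eyes landed beyond the word before fixating it: a skip.
            return {"skipped": True, "ffd": None, "gd": None, "gpt": None}

        # Gaze duration: consecutive fixations on the word before leaving it.
        gd, i = 0, start
        while i < len(fixations) and fixations[i][0] == word:
            gd += fixations[i][1]
            i += 1

        # Go past time: everything until a fixation lands to the right,
        # including fixations made after regressing to earlier words.
        gpt, i = 0, start
        while i < len(fixations) and fixations[i][0] <= word:
            gpt += fixations[i][1]
            i += 1

        return {"skipped": False, "ffd": duration, "gd": gd, "gpt": gpt}
    return None


# Hypothetical trial: word 2 is skipped, then revisited via a regression.
trial = [(0, 210), (1, 180), (3, 250), (3, 120), (2, 200), (3, 190), (4, 230)]
print(first_pass_measures(trial, 3))
print(first_pass_measures(trial, 2))
```

For word 3 this gives an FFD of 250 ms, a GD of 370 ms, and a GPT of 760 ms, because the regression to word 2 and the return to word 3 both count towards go past time. The skipping rate of a set of words is then the proportion of observations with `skipped` set to true.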

We employed a two-step modelling approach to find the best statistical models. First, we used the buildmer package (version 2.11) in RStudio (R version 4.1.1) to identify the optimal logistic or linear mixed model. Specifically, we used the forward function built into the buildmer package to iteratively build the models by adding predictors and comparing their fit. This approach allowed us to find the largest fixed and random structures that resulted in converging linear mixed models, yielding different models depending on the dependent variable. However, this often resulted in a random structure that did not contain any slopes for the fixed factors, which could be anti-conservative (Barr et al., 2013). We therefore used the fixed structures identified in the previous step to build a Bayesian Linear Mixed model using the brms package (version 2.21.0), for each dependent variable. This technique allows for the maximal random structure for both participants and items. A Gaussian distribution was used for fixation time measures and a Bernoulli distribution was used for skipping rates. All models converged with four chains with 5,000 iterations each. In addition, when the initial Bayesian model showed that the 95% credible intervals for three-way or four-way interactions included zero, these terms were removed and we ran a new model with the remaining fixed structure. Reading times were log-transformed for these analyses. We report the estimates, standard errors, and lower and upper 95% Bayesian Credible intervals in Table 1. We consider that there is an effect for a predictor or interaction if the value 0 is not part of the 95% Credible Interval. Figures were plotted in RStudio (R version 4.1.1) using the ggplot2 package (version 3.5.1).
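Because the skipping model is on the log-odds scale and the fixation time models are on the log-millisecond scale, the estimates in Table 1 are easier to read after back-transformation. The Python sketch below back-transforms the Skipping Rates and FFD intercepts from Table 1. It assumes a logit link for the Bernoulli model (the brms default) and natural logarithms for the reading times. How word class was coded is not stated in this section, so the values should be read as rough illustrations for a word with average length, frequency, and predictability, not as exact model predictions.

```python
import math


def inverse_logit(log_odds):
    """Convert log-odds to a probability."""
    return 1 / (1 + math.exp(-log_odds))


skipping_intercept = -1.45  # Table 1, Skipping Rates, log-odds
ffd_intercept = 5.34        # Table 1, FFD, log milliseconds

skip_probability = inverse_logit(skipping_intercept)  # about 0.19
ffd_ms = math.exp(ffd_intercept)                      # about 208.5 ms

print(round(skip_probability, 2), round(ffd_ms, 1))
```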

Table 1.

Bayesian linear mixed models for word predictability, word length, word frequency, and word class for all eye movement measures.

Measure	Fixed Effects	b	SE	l-95%	u-95%
Skipping Rates	Intercept	-1.45	0.10	-1.64	-1.25
	Length	-0.37	0.02	-0.42	-0.32
	Word Class	0.24	0.08	0.08	0.39
	Predictability	0.59	0.09	0.41	0.76
	Frequency	0.18	0.03	0.12	0.23
	Length × Word Class	-0.21	0.05	-0.31	-0.11
	Length × Predictability	-0.14	0.04	-0.22	-0.07
	Length × Frequency	-0.06	0.01	-0.08	-0.05
	Word Class × Predictability	-0.38	0.14	-0.66	-0.10
	Word Class × Frequency	-0.15	0.08	-0.31	0.00
	Predictability × Frequency	-0.14	0.07	-0.27	0.00
FFD	Intercept	5.34	0.02	5.30	5.37
	Length	0.01	0.00	0.01	0.01
	Frequency	-0.01	0.00	-0.02	-0.01
	Predictability	-0.05	0.01	-0.08	-0.02
	Word Class	-0.01	0.02	-0.05	0.02
	Length × Frequency	0.00	0.00	0.00	-0.00
	Frequency × Predictability	0.02	0.01	0.00	0.05
	Predictability × Word Class	0.01	0.06	-0.10	0.12
	Length × Word Class	-0.02	0.01	-0.04	0.00
	Length × Predictability	0.01	0.01	0.00	0.02
	Frequency × Word Class	0.01	0.02	-0.04	0.05
	Length × Predictability × Class	-0.04	0.02	-0.07	-0.01
	Length × Frequency × Predictability	0.00	0.00	-0.01	0.00
	Length × Frequency × Class	0.01	0.01	0.00	0.03
	Frequency × Predictability × Class	-0.09	0.03	-0.15	-0.03
Gaze Duration	Intercept	5.49	0.02	5.45	5.53
	Length	0.04	0.00	0.04	0.05
	FrequencyLog	-0.03	0.00	-0.04	-0.02
	Predictability	-0.14	0.02	-0.17	-0.10
	Word Class	0.01	0.03	-0.05	0.07
	Frequency × Predictability	0.07	0.02	0.03	0.11
	Length × Frequency	0.00	0.00	-0.01	0.00
	Predictability × Word Class	0.08	0.08	-0.07	0.23
	Frequency × Word Class	-0.05	0.04	-0.12	0.03
	Length × Word Class	-0.03	0.01	-0.06	0.00
	Length × Predictability	0.02	0.01	0.00	0.04
	Frequency × Predictability × Class	-0.14	0.04	-0.22	-0.05
	Length × Frequency × Class	0.00	0.01	-0.02	0.03
	Length × Frequency × Predictability	-0.01	0.01	-0.02	0.01
	Length × Predictability × Class	-0.04	0.02	-0.09	0.00
Go Past Time	Intercept	5.71	0.03	5.65	5.76
	Length	0.05	0.00	0.05	0.06
	FrequencyLog	-0.06	0.01	-0.07	-0.04
	Predictability	-0.19	0.03	-0.24	-0.14
	Word Class	0.00	0.03	-0.06	0.06
	Length × Frequency	0.00	0.00	-0.01	0.00
	Frequency × Predictability	0.05	0.02	0.01	0.10
	Length × Word Class	-0.04	0.02	-0.07	0.00
	Predictability × Word Class	0.01	0.05	-0.08	0.10
	Frequency × Word Class	-0.03	0.03	-0.09	0.03
	Length × Predictability	0.01	0.01	-0.01	0.03

Note: Credible intervals that do not include 0 are in bold.
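The decision rule behind this note can be expressed as a small Python check, applied here to a few Skipping Rates rows copied from Table 1. Because the table reports rounded bounds, an interval with a bound printed as 0.00 is treated as including zero in this sketch; that treatment is an assumption, although it matches the effects described in the text. The function name is hypothetical.

```python
def excludes_zero(lower, upper):
    """True when a credible interval lies entirely above or below zero."""
    return lower > 0 or upper < 0


# (lower, upper) bounds of the 95% credible intervals from Table 1
skipping_rows = {
    "Length": (-0.42, -0.32),
    "Word Class": (0.08, 0.39),
    "Length × Word Class": (-0.31, -0.11),
    "Word Class × Frequency": (-0.31, 0.00),
}

for term, (lower, upper) in skipping_rows.items():
    label = "effect" if excludes_zero(lower, upper) else "no clear effect"
    print(f"{term}: {label}")
```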

Eye movement measures

All mean values and standard deviations are presented in Table 2.

Table 2.

Mean values and standard deviations for eye movement measures for content and function words.

Measure	CW Mean	CW SD	FW Mean	FW SD
Skipping (%)	18	38	62	49
FFD (ms)	219	70	203	70
GD (ms)	286	137	230	108
GPT (ms)	414	348	318	310

Skipping rates

We found a main effect of predictability such that more predictable words were skipped more often. Our analyses also showed a main effect of word length whereby shorter words were skipped more often. In addition, the main effect of word frequency was such that more frequent words were skipped more often. Finally, there was also a main effect of word class where FWs were skipped more often than CWs. These main effects were all qualified by two-way interactions. The interaction between predictability and word length was such that the skipping of shorter words was more influenced by predictability, while there was barely any effect of predictability on skipping rates of longer words (Figure 1a). In addition, the interaction between length and frequency showed that the effect of frequency was stronger on skipping rates of shorter words, with again little effect on longer words (Figure 1b). Similarly, the interaction between word length and word class was such that shorter FWs were skipped more often than same-sized CWs, whereas for longer words there was no difference between classes (Figure 1c). Finally, we also found an interaction between predictability and word class such that there was a clear predictability effect for CWs but hardly any for FWs (Figure 1d).

Figure 1. Two-way interactions in skipping rates.

*Note*: Word length is centred around the mean of 4.7. The shaded areas represent the 95% confidence intervals.

FFD

We found a main effect of predictability such that more predictable words had shorter FFD compared with less predictable words. Also, as expected, we found a main effect of word length such that longer words had a longer FFD. In addition, the main effect of frequency was apparent from more frequent words having shorter fixations. We found no main effect of word class. However, we did find a three-way interaction between word length, word predictability, and word class. As can be seen from Figure 2a, first fixation times look very similar between FWs and CWs when predictability is low. However, for highly predictable words, the picture is quite different, with inverse word length effects for FWs (longer first fixation times for shorter words). We also found a three-way interaction of word frequency, predictability and word class. As can be seen in Figure 2b, effects of predictability look quite similar for FWs and CWs for high-frequency words but not for low-frequency words where, for FWs, predictability effects are absent or almost going in the opposite direction (longer times for more predictable FWs). However, the figure also shows that this is driven by a small number of low-frequency FWs, as is evident from the very wide confidence intervals.

Figure 2. — Two-way interactions in fixation times.

*Note*: Word length is centred around the mean of 4.7. The shaded areas represent the 95% confidence intervals.

GD

We also found main effects of predictability, frequency, and word length on GD such that more predictable, more frequent, and shorter words received shorter GDs. The data showed no main effect of word class on GD. We also found a two-way interaction between frequency and predictability that was qualified by a three-way interaction that also included word class (Figure 2c). This interaction closely mimics the interaction observed in FFD (Figure 2b). The effects of predictability are quite comparable for high-frequency FWs and CWs, but for low-frequency FWs the effects of predictability are limited and almost in the opposite direction (longer GDs for highly predictable words). Again, this interaction seems to be driven by a low number of low-frequency FWs, as is clear from the very wide confidence intervals (Figure 2c).

GPT

Analyses showed main effects of predictability, frequency, and length on GPTs such that more predictable, frequent and shorter words received shorter GPTs. There was no main effect of word class on GPTs, and all interactions involving word class had credible intervals that included zero. The two-way interaction between frequency and predictability showed that the effect of predictability was stronger on words with low frequency (Figure 2d).

Discussion

Studies on eye movements during reading have primarily focused on the processing of CWs. Those few studies that have examined FWs have consistently suggested that FWs are processed more quickly than CWs, as indicated by FWs having higher skipping rates (Angele & Rayner, 2013; Staub et al., 2018) and shorter fixation times (Rayner, 1998) than CWs. The nature of these findings has been questioned by two studies on reading in English. Schmauder et al. (2000) found no difference in early processing measures between CWs and FWs when matching both word classes on length and frequency (though see Roy-Charland & Saint-Aubin, 2006, for a criticism of this study). More recently, Staub (2024) reported very limited differences in eye movement behaviour between word classes (restricted to CWs unexpectedly receiving slightly shorter fixation times than FWs) when matching on word length, frequency, and predictability. These latter findings raise questions as to whether the commonly observed higher skipping rates and shorter fixation times on FWs are merely a side effect of FWs typically having a higher frequency and shorter word length than CWs.

In the current study, we set out to compare CWs and FWs in BP, a language that, compared with English, can have longer FWs which contain more semantic and syntactic information, to examine if eye movement behaviour would be similar to what is found in languages where most FWs carry comparatively little information. We analysed data from the RASTROS corpus of natural reading in BP (Vieira, 2020) and replicated previous findings showing that when looking at overall averages, FWs were skipped considerably more often than CWs and received shorter fixation times when they were fixated (Rayner, 1998).

Delving deeper into the results of the statistical models, first we found main effects of word length, word frequency, and word predictability on all measures. Longer words were fixated for longer and skipped less often than short words, while more frequent and predictable words received shorter fixations and were skipped more often than less frequent and unpredictable words. These results are consistent with the literature (Rayner, 1998).

In skipping rates, we also found a main effect of Word Class in that FWs were skipped more often than CWs. All main effects in word skipping were qualified by four two-way interactions. Three of these interactions are with word length showing that for longer words the effects of Predictability, Frequency and Word Class disappear, very likely due to floor effects on word skipping. Note that for the first interaction between word predictability and word length (see Figure 1a), Rayner et al. (2011) found no interaction on skipping rates. However, our current study has words that are even longer (up to 15 letters long) than the long words in the Rayner et al. study (10–12 letters). Therefore, the most parsimonious explanation for the observed patterns points towards floor effects on the longest words, which were only very rarely skipped.
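
The floor-effect account can be illustrated with a small numerical sketch. It assumes, purely for illustration, that skipping is described on a logit scale; the baseline values and the predictability slope below are hypothetical and are not estimates from the models reported here. The sketch shows only that the same additive change on the logit scale produces a much smaller change in skipping probability when baseline skipping is close to zero, as it is for the longest words.

```python
import math


def inv_logit(x: float) -> float:
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predictability_gain(baseline_logit: float, slope: float = 1.0) -> float:
    """Increase in skipping probability when predictability rises by one unit.

    baseline_logit and slope are hypothetical illustration values,
    not estimates from the reported models.
    """
    return inv_logit(baseline_logit + slope) - inv_logit(baseline_logit)


# Hypothetical baselines: short words are skipped often, long words rarely.
SHORT_WORD_LOGIT = 0.0   # baseline skipping probability of 0.50
LONG_WORD_LOGIT = -4.0   # baseline skipping probability of about 0.02

gain_short = predictability_gain(SHORT_WORD_LOGIT)
gain_long = predictability_gain(LONG_WORD_LOGIT)

print(f"short words: +{gain_short:.3f}")
print(f"long words:  +{gain_long:.3f}")
```

This does not reproduce the reported interactions, which were estimated within the statistical models; it only shows why differences in skipping probability are compressed for words that are almost never skipped.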

The effect of Word Class on word skipping is further qualified by an additional interaction which includes predictability (see Figure 1d) such that the influence of word predictability on the skipping rates of CWs was stronger than on FWs. One possible reason for the comparatively reduced influence of predictability on FWs compared with CWs might lie in the distinction between full predictability and partial predictability. Full predictability was included in the models and means anticipating the exact word, whereas partial predictability means being able to anticipate part of a word’s information, such as the part-of-speech or inflection. The exact definition of partial predictability can vary from study to study. Here, when we refer to partial predictability, we refer to part-of-speech predictability. The syntactic structure of the sentence often has sufficient information for the reader to predict, with good certainty, what the grammatical class of the following word is. If the system can predict that the next word is an FW and has learned that skipping FWs usually does not result in slowing down the reading process, it may be that part-of-speech predictability instead of full predictability is the stronger and more appropriate predictor for skipping rates of FWs. As a result, the influence of full predictability would be more pronounced for CWs than FWs. The relevance of the distinction between full and partial predictability is also supported by Luke and Christianson (2016), who found that predicting a word’s part-of-speech meant shorter fixation times on both CWs and FWs in English, but only facilitated the skipping rates of FWs. In contrast, the authors found full predictability to be facilitative in all analyses for both CWs and FWs.

The RASTROS corpus also includes measures of part-of-speech predictability and to test the hypothesis that partial predictability might be a stronger influence on the skipping rates of FWs, we ran one extra analysis with the same fixed factors in the Bayesian linear mixed model reported above, which are Predictability, Word Class, Word Frequency, and Word Length, and added Part-of-Speech Predictability as a main effect to the model. The full model is reported in the Online Supplementary Material. The results were very straightforward: there was no main effect of part-of-speech predictability on skipping rates. This lack of effect indicates that the observed interaction between full predictability and word class was not due to a differential impact of partial predictability on FWs versus CWs. Therefore, the observation remains that predictability from the preceding context plays a smaller role in the decision to skip the next word if it is an FW compared with when the word is a CW.

For an FW to be skipped more often than a CW, the system can rely on two sources of information for determining whether the next word is an FW or a CW: the predictability from the preceding context (full or partial) and information about the actual word acquired during parafoveal processing. The lack of effects due to partial predictability on word skipping and the comparatively smaller impact of full predictability on the skipping of FWs compared with CWs suggest that the decision to skip an FW more often than a CW is primarily based on parafoveal processing. This interpretation is also compatible with the interaction we found on skipping rates between word length and word class (Figure 1c). This interaction shows that short FWs were skipped more often than short CWs but for longer words, there is no discernible difference between word classes. If the decision to skip an FW is based to a lesser extent on the preceding context predicting the FW (and this entails both full and part-of-speech predictability) and more on whether parafoveal processing suggests the next word will be an FW, an interaction with word length is expected, in that parafoveal processing will be more likely to reach an advanced state for short words. This more advanced state in processing the parafoveal word could entail determining whether the word is an FW or CW.

Turning to fixation times, we observed two three-way interactions on FFD. The first three-way interaction, between word length, predictability, and word class, shows that, for CWs, word length effects are such that longer words received longer FFDs. However, for FWs the opposite seems to be true in that longer words received shorter fixation durations, especially so for highly predictable FWs. Reverse word length effects in FFDs have been observed before: shorter words are more often fixated only once, whereas longer words more often receive multiple fixations, and single fixations tend to be longer than the first of multiple fixations (e.g., Rayner et al., 1996). Across the RASTROS corpus, when fixated, 43% of CWs received exactly one fixation whereas 65% of FWs received exactly one fixation. As such, FFDs for FWs will more often be single fixation durations compared with FFDs on CWs. Likewise, more predictable FWs will be even more likely to receive a single fixation rather than multiple fixations, explaining the interaction with predictability. However, as clearly shown in Figure 2a, the confidence intervals for highly predictable FWs are substantial due to a very low number of highly predictable FWs (see Figure 3), so caution is needed when interpreting the effects of predictability on FWs. Note that the absence of this interaction in GD is compatible with the hypothesis that this interaction is due to a different mixture of single fixations and first of multiple fixations in the FFD analysis for FWs compared with CWs. As such, this interaction is unlikely to reflect processing differences between FWs and CWs.
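
The mixture argument can be made concrete with a short calculation. The proportions of single-fixation cases (43% for CWs, 65% for FWs) are the corpus figures given above; the two duration values are hypothetical placeholders chosen only to respect the reported ordering (single fixations longer than the first of multiple fixations), and the sketch assumes that both fixation types have identical durations in the two word classes.

```python
def expected_ffd(p_single: float, single_ms: float, first_of_multiple_ms: float) -> float:
    """Mean FFD as a weighted mix of single and first-of-multiple fixations."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a proportion between 0 and 1")
    return p_single * single_ms + (1.0 - p_single) * first_of_multiple_ms


# Proportions of fixated words receiving exactly one fixation (from the text).
P_SINGLE_CW = 0.43
P_SINGLE_FW = 0.65

# Hypothetical durations in ms (assumption: the same for both word classes).
SINGLE_MS = 230.0
FIRST_OF_MULTIPLE_MS = 200.0

ffd_cw = expected_ffd(P_SINGLE_CW, SINGLE_MS, FIRST_OF_MULTIPLE_MS)
ffd_fw = expected_ffd(P_SINGLE_FW, SINGLE_MS, FIRST_OF_MULTIPLE_MS)

print(f"CW mean FFD: {ffd_cw:.1f} ms")
print(f"FW mean FFD: {ffd_fw:.1f} ms")
```

Under these assumptions the mean FFD comes out higher for FWs even though each fixation type lasts equally long in both classes, which is the sense in which such a difference need not reflect a difference in processing.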

Figure 3. — Word predictability per word length for content and function words.

The second three-way interaction we found on FFD involves predictability, frequency, and word class (Figure 2b) and closely resembles the same three-way interaction observed in GD (Figure 2c). This interaction shows that for high-frequency words there is little difference in the effects of predictability between FWs and CWs. For low-frequency words, FWs do show quite different patterns compared with CWs, such that predictability effects in FWs are minimal and are almost in the opposite direction (longer times on more predictable words). However, it is important to note that this interaction is driven by low-frequency FWs, of which there were close to none in the corpus (see Figure 4), as is also clear from the large confidence intervals in Figures 2b and 2c for low-frequency FWs. As such, we argue that the interpretation of these interactions is difficult, if not futile, and further exploration, in the form of an experiment analysing CWs and FWs with orthogonally manipulated frequency (high and low) and predictability (high and low), is required. When focusing on the frequency range where we do have adequate numbers of FWs (above 0 in Figure 2b), we replicate standard predictability effects.

Figure 4. — Word predictability per word frequency for content and function words.

Finally, our results also showed a two-way interaction between word frequency and word predictability on GPTs where there was only an effect of predictability on low frequency words. While previous research has shown that the effects of frequency and predictability are mostly additive (see Rayner et al., 2004 for an example of a mild interaction), we believe the simpler explanation for this interaction in our data is floor effects on the very high frequency words, where fixation times cannot be shorter, causing the effects of predictability to be slightly reduced on high frequency words.

Overall, our analyses of eye movement measures during reading point towards rather limited differences between FWs and CWs in that only the skipping of short words seems to clearly show higher skipping of FWs compared with CWs. Any differences observed in fixation times were limited and are likely a consequence of the little data we have on very low-frequency FWs and potentially a different mixture of single versus first of multiple fixations in FFDs. There were no differences between CWs and FWs on later processing stages, as indicated by GPTs. This differential impact on skipping rates versus fixation times illustrates how both eye movement measures can reflect different processing stages in the recognition of a word. Models such as the E-Z Reader model (Reichle et al., 1998) assume that a parafoveal word is skipped because an advanced stage in word recognition is reached such that the word will typically be recognised by the time the saccade that skips the word has landed further in the text. Alternative models such as the Extended Optimal Viewing Position model (Brysbaert & Vitu, 1998) assume that less advanced parafoveal processing informs the decision to skip a word, and in this model the decision to skip a word is mostly viewed as an educated guess that skipping the next word will not hinder overall text understanding. Regardless of the specific theoretical framework, the decision to skip a word is usually not seen as based on the word being completely identified at the moment the decision is made. Fixation times, on the other hand, are thought to typically reflect complete word identification, at least in a serial model such as the E-Z Reader model. From this perspective, our limited effects on fixation times would indicate a very similar processing ease for fully identifying FWs and CWs when word length, frequency, and predictability are taken into account.
However, in those instances when parafoveal processing is fast enough (i.e., a short word in the parafovea) to provide indications of word class, this information is taken into account when deciding whether or not to skip the next word. This could reflect that the system has learned that FWs in BP can often be skipped without hindering overall text understanding. This strategy would make sense in BP given that much of the information contained in short FWs (gender and number marking) would often also be present in the word following the FWs.⁴ This is true particularly for determiners, which always carry gender and number marking, but also for contractions between prepositions and determiners (e.g., do/da/dos/das, which correspond to “of the” or “from the” in English), whereas only certain classes of FWs do not carry morphological information (e.g., conjunctions and prepositions outside of contractions).

When we started this research, our premise was that, given that FWs in BP contain comparatively more information than in English, FWs would behave more like CWs in comparison to languages where FWs carry comparatively little information. Our results seem to indicate that FWs in BP behave similarly to CWs, with the exception of skipping rates of very short words. However, given the lack of differences recently reported by Staub (2024) between FWs and CWs in English, our results are more properly interpreted as adding crosslinguistic evidence that CWs and FWs elicit similar eye movement behaviour when the differences between FWs and CWs in terms of word length, frequency, and predictability are taken into account. This compatibility of results was also apparent despite the differences in how predictability was implemented between the two studies. Whereas Staub (2024) used next word predictability obtained from the large language model GPT-2, the predictability norms we used were the more traditional sentence completion rates embedded in the RASTROS corpus. The similarity in results can be considered a form of convergent validity between these two approaches to quantifying predictability.

Our interpretation that the higher skipping rates of short FWs, when compared with CWs, were mostly based on parafoveal processing and only to a lesser extent on predictability can be more directly explored. Such an experiment could make use of the eye contingent boundary paradigm (Rayner, 1975) to mask the identity of a function or content word n + 1, while controlling for its predictability and part-of-speech predictability to further examine our interpretation of the current results.
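
The display logic of such a boundary experiment can be sketched as follows. This is a simplified illustration and not the procedure of any specific study: the class, field, and method names are hypothetical, gaze position is reduced to a horizontal pixel coordinate, the mask string is an arbitrary placeholder, and a real implementation depends on the eye tracker's own software and must complete the display change during the saccade.

```python
from dataclasses import dataclass


@dataclass
class BoundaryTrial:
    """One sentence with an invisible boundary placed before word n + 1."""
    prefix: str          # text up to and including word n
    preview: str         # string shown in place of word n + 1 before the crossing
    target: str          # actual word n + 1 (a function or content word)
    suffix: str          # remainder of the sentence
    boundary_x: float    # horizontal boundary position in pixels (hypothetical)
    changed: bool = False

    def text_for_gaze(self, gaze_x: float) -> str:
        """Return the sentence to display for the current gaze sample."""
        # Once the eyes cross the boundary, the change is permanent for the trial.
        if gaze_x > self.boundary_x:
            self.changed = True
        word = self.target if self.changed else self.preview
        return f"{self.prefix} {word} {self.suffix}"


trial = BoundaryTrial(
    prefix="e agora",
    preview="xx",  # hypothetical mask hiding the identity of the word "as"
    target="as",
    suffix="abelhas também se juntaram ao clube.",
    boundary_x=120.0,
)

before = trial.text_for_gaze(80.0)    # gaze still left of the boundary
after = trial.text_for_gaze(130.0)    # gaze has crossed the boundary
regress = trial.text_for_gaze(90.0)   # regression back across the boundary

print(before)
print(after)
print(regress)
```

Keeping the change permanent after the first crossing means that a later regression does not restore the mask, so the reader only ever fixates the real word.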

An important contribution of the present study is its comprehensive examination of a range of predictors of eye movements during reading. Specifically, our analyses incorporate a very wide spectrum of word length (ranging from 1 to 15 letters), as well as word frequency and predictability (from very low to very high), along with all possible interactions among these factors and word class (function vs. content words). This extensive range is an important advancement over similar studies, which often do not encompass such variability in their predictors. For example, the Provo Corpus (Luke & Christianson, 2018) is an important contribution to the literature on the processing of content and function words, but that study did not include word class as a predictor in its statistical models. Staub (2024), whose important contribution does include word class as a predictor, limited word length to four to six letters. Whereas our wide ranges in the predictors did come with a cost (e.g., difficulties interpreting findings on low-frequency FWs due to their scarcity), overall, we think our approach allows for a more thorough examination of FW versus CW processing. Whereas restricting our analyses to the ranges of predictors previously used in the literature (e.g., Staub, 2024) mostly replicates previous findings, our wider ranges allow for new observations (e.g., floor effects of word length and frequency for the skipping of longer words).

To summarise, our results show that when controlling for word length, frequency, and predictability, word class had very limited effects on eye movement measures in BP, mostly restricted to the skipping rates of very short words, such that short FWs were skipped more often than short CWs. In terms of fixation times, we found few differences between classes; these were restricted to early measures and are likely due to the limited data we have on low-frequency FWs. These results indicate that whereas the processing ease for an FW might be very similar to that for a CW when word length, predictability, and frequency are taken into account, at least in BP a preference to skip short FWs is present. This latter mechanism could reflect an educated guess that skipping FWs often does not hinder text understanding, as much of the information contained in the FW (gender and number marking) will also often feature in the word following the FW.

Supplemental Material

Supplemental material, sj-docx-1-qjp-10.1177_17470218241307582 for When function words carry content by João Vieira, Elisângela Teixeira, Erica Rodrigues, Hayward J Godwin and Denis Drieghe in Quarterly Journal of Experimental Psychology

1. Even though the present study focuses on Brazilian Portuguese, there is no reason to believe that FWs would be processed any differently in any other variety of Portuguese.

2. For instance, a direct translation of “Some time ago” in BP is “Algum tempo atrás”, in which “algum” is a masculine singular function word.

3. All VIF values were below 1.3.

4. For example, in the sentence “[. . .] e agora as abelhas também se juntaram ao clube,” the word “as” (“the” in English), which is feminine and plural, is followed by the word “abelhas” (“bees” in English), which is also feminine and plural. A direct translation would be “[. . .] and now the bees also joined the club.”

Footnotes

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) received no financial support for the research, authorship, and/or publication of this article.

Pre-registration: This study was not preregistered.

ORCID iDs: João Vieira https://orcid.org/0000-0001-5215-2020

Elisângela Teixeira https://orcid.org/0000-0003-3924-3985

Erica Rodrigues https://orcid.org/0000-0002-3524-5820

Hayward J Godwin https://orcid.org/0009-0005-1232-500X

Denis Drieghe https://orcid.org/0000-0001-9630-8410

Data accessibility statement: The data and materials from the present experiment are publicly available at the Open Science Framework website: https://osf.io/pqhx9/.

Supplemental material: The supplementary material is available at qjep.sagepub.com.

References

Angele B., Laishley A. E., Rayner K., Liversedge S. P. (2014). The effect of high- and low-frequency previews and sentential fit on word skipping during reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40(4), 1181–1203. https://doi.org/10.1037/a0036396
Angele B., Rayner K. (2013). Processing *the* in the parafovea: Are articles skipped automatically? Journal of Experimental Psychology: Learning, Memory, and Cognition, 39(2), 649–662. https://doi.org/10.1037/a0029294
Balota D. A., Pollatsek A., Rayner K. (1985). The interaction of contextual constraints and parafoveal visual information in reading. Cognitive Psychology, 17(3), 364–390. https://doi.org/10.1016/0010-0285(85)90013-1
Barr D. J., Levy R., Scheepers C., Tily H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278. https://doi.org/10.1016/j.jml.2012.11.001
Bick E. (2000). The parsing system Palavras: Automatic grammatical analysis of Portuguese in a constraint grammar framework [Doctoral thesis, Aarhus University].
Brysbaert M., Drieghe D., Vitu F. (2005). Word skipping: Implications for theories of eye movement control in reading. In Underwood G. (Ed.), Cognitive processes in eye guidance (pp. 53–77). Oxford University Press. https://doi.org/10.1093/acprof:oso/9780198566816.003.0003
Brysbaert M., New B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977–990. https://doi.org/10.3758/brm.41.4.977
Brysbaert M., Vitu F. (1998). Word skipping: Implications for theories of eye movement control in reading. In Underwood G. (Ed.), Eye guidance in reading and scene perception (pp. 125–147). Elsevier Science Ltd. https://doi.org/10.1016/B978-008043361-5/50007-9
Corcoran D. W. (1966). An acoustic factor in letter cancellation. Nature, 210(5036), Article 658. https://doi.org/10.1038/210658a0
Diaz M. T., McCarthy G. (2009). A comparison of brain activity evoked by single content and function words: An fMRI investigation of implicit word processing. Brain Research, 1282, 38–49. https://doi.org/10.1016/j.brainres.2009.05.043
Gautier V., O’Regan J. K., Le Gargasson J. F. (2000). “The-skipping” revisited in French: Programming saccades to skip the article “les.” Vision Research, 40(18), 2517–2531. https://doi.org/10.1016/s0042-6989(00)00089-4
Inhoff A. W., Rayner K. (1986). Parafoveal word processing during eye fixations in reading: Effects of word frequency. Perception & Psychophysics, 40(6), 431–439. https://doi.org/10.3758/BF03208203
Juste F. S., Sassi F. C., de Andrade C. R. F. (2012). Exchange of disfluency with age from function to content words in Brazilian Portuguese speakers who do and do not stutter. Clinical Linguistics & Phonetics, 26(11–12), 946–961. https://doi.org/10.3109/02699206.2012.728278
Kliegl R. (2007). Toward a perceptual-span theory of distributed processing in reading: A reply to Rayner, Pollatsek, Drieghe, Slattery, and Reichle. Journal of Experimental Psychology: General, 136(3), 530–537. https://doi.org/10.1037/0096-3445.136.3.530
Kliegl R., Nuthmann A., Engbert R. (2006). Tracking the mind during reading: The influence of past, present, and future words on fixation durations. Journal of Experimental Psychology: General, 135(1), 12–35. https://doi.org/10.1037/0096-3445.135.1.12
Kuperberg G. R., Jaeger T. F. (2016). What do we mean by prediction in language comprehension? Language, Cognition and Neuroscience, 31(1), 32–59. https://doi.org/10.1080/23273798.2015.1102299
Leal S. E., Lukasova K., Carthery-Goulart M. T., Aluísio S. M. (2022). Rastros project: Natural Language Processing Contributions to the development of an eye-tracking corpus with predictability norms for Brazilian Portuguese. Language Resources and Evaluation, 56(4), 1333–1372. https://doi.org/10.1007/s10579-022-09609-0
Liversedge S. P., Findlay J. M. (2000). Saccadic eye movements and cognition. Trends in Cognitive Sciences, 4(1), 6–14. https://doi.org/10.1016/s1364-6613(99)01418-7
Luke S. G., Christianson K. (2016). Limits on lexical prediction during reading. Cognitive Psychology, 88, 22–60. https://doi.org/10.1016/j.cogpsych.2016.06.002
Luke S. G., Christianson K. (2018). The Provo Corpus: A large eye-tracking corpus with predictability norms. Behavior Research Methods, 50(2), 826–833. https://doi.org/10.3758/s13428-017-0908-4
Matin E. (1974). Saccadic suppression: A review and an analysis. Psychological Bulletin, 81(12), 899–917. https://doi.org/10.1037/h0037368
Michel Lange V., Messerschmidt M., Harder P., Siebner H. R., Boye K. (2017). Planning and production of grammatical and lexical verbs in multi-word messages. PLOS ONE, 12(11), Article e0186685. https://doi.org/10.1371/journal.pone.0186685
Nuthmann A., Engbert R., Kliegl R. (2005). Mislocated fixations during reading and the inverted optimal viewing position effect. Vision Research, 45(17), 2201–2217. https://doi.org/10.1016/j.visres.2005.02.014
Rayner K. (1975). The perceptual span and peripheral cues in reading. Cognitive Psychology, 7(1), 65–81. https://doi.org/10.1016/0010-0285(75)90005-5
Rayner K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3), 372–422. https://doi.org/10.1037/0033-2909.124.3.372
Rayner K. (2009). Eye movements and attention in reading, scene perception, and visual search. Quarterly Journal of Experimental Psychology, 62(8), 1457–1506. https://doi.org/10.1080/17470210902816461
Rayner K., Ashby J., Pollatsek A., Reichle E. D. (2004). The effects of frequency and predictability on eye fixations in reading: Implications for the E-Z Reader Model. Journal of Experimental Psychology: Human Perception and Performance, 30(4), 720–732. https://doi.org/10.1037/0096-1523.30.4.720
Rayner K., Sereno S. C., Raney G. E. (1996). Eye movement control in reading: A comparison of two types of models. Journal of Experimental Psychology: Human Perception and Performance, 22(5), 1188–1200. https://doi.org/10.1037/0096-1523.22.5.1188
Rayner K., Slattery T. J., Drieghe D., Liversedge S. P. (2011). Eye movements and word skipping during reading: Effects of word length and predictability. Journal of Experimental Psychology: Human Perception and Performance, 37(2), 514–528. https://doi.org/10.1037/a0020990
Reichle E. D., Pollatsek A., Fisher D. L., Rayner K. (1998). Toward a model of eye movement control in reading. Psychological Review, 105(1), 125–157. https://doi.org/10.1037/0033-295x.105.1.125
Roy-Charland A., Collin M.-M., Richard J. (2022). The development of the missing-letter effect revisited: The role of word frequency and word function. Experimental Psychology, 69(5), 275–283. https://doi.org/10.1027/1618-3169/a000565
Roy-Charland A., Saint-Aubin J. (2006). Short article: The interaction of word frequency and word class: A test of the GO model’s account of the missing-letter effect. Quarterly Journal of Experimental Psychology, 59(1), 38–45. https://doi.org/10.1080/17470210500269428
Saint-Aubin J., Poirier M. (1997). The influence of word function in the missing-letter effect: Further evidence from French. Memory & Cognition, 25(5), 665–676.
Sardinha T. B. (2009). Corpus Brasileiro [Brazilian Corpus]. https://www.linguateca.pt/acesso/corpus.php?corpus=CBRAS
Schmauder A. R., Morris R. K., Poynor D. V. (2000). Lexical processing and text integration of function and content words: Evidence from priming and eye fixations. Memory & Cognition, 28(7), 1098–1108. https://doi.org/10.3758/BF03211811
Staub A. (2024). The function/content word distinction and eye movements in reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 50(6), 967–984. https://doi.org/10.1037/xlm0001301
Staub A., Dodge S., Cohen A. L. (2018). Failure to detect function word repetitions and omissions in reading: Are eye movements to blame? Psychonomic Bulletin & Review, 26(1), 340–346. https://doi.org/10.3758/s13423-018-1492-z
Taylor W. L. (1953). “Cloze procedure”: A new tool for measuring readability. Journalism Quarterly, 30(4), 415–433.
Thi T. L., Na Y., Choi I., Woo J. (2022). Revealing differential importance of word categories in spoken sentence comprehension using phoneme-related representation. Journal of Integrative Neuroscience, 21(1), 029. https://doi.org/10.31083/j.jin2101029
Vieira J. M. M. (2020). The Brazilian Portuguese eye tracking corpus with a predictability study focusing on lexical and partial prediction [Master’s thesis, Federal University of Ceará]. Biblioteca Universitária, Universidade Federal do Ceará. http://www.repositorio.ufc.br/handle/riufc/55798
Wagner Filho J. A., Wilkens R., Idiart M., Villavicencio A. (2018). The brWaC corpus: A new open resource for Brazilian Portuguese. Language Resources and Evaluation, LREC 2018, 4339–4344. https://www.aclweb.org/anthology/L18-1686.pdf

Data Availability Statement

All materials, including eye movement data, cloze data, and R scripts used for statistical analyses are available at https://osf.io/pqhx9/
