1. Introduction
Verbal fluency tests are standard cognitive instruments that engage semantic, lexical, and executive function. The participant is asked to generate a list of words during a time limit, usually 60 s. These tests are sensitive to brain damage in various forms and have the advantages of being brief and easy to administer. The most common tasks are letter fluency (e.g., words that start with F) and category fluency (e.g., animals). These tasks may have differential sensitivity to damage within certain networks, and, therefore, to various forms of brain pathology.
Patterns of performance on these instruments across the spectrum from normal cognition to dementia are usually characterized according to the count of unique, valid items generated during the time limit. This tally is often referred to as the “raw score.” On the most common tasks (letter F and animals), older participants generate an average of 9.2 more animals than F-words (Vaughan et al., 2016). A series of meta-analyses has shed light on patterns of verbal fluency performance in the setting of traumatic brain injury (Henry and Crawford, 2004), schizophrenia (Henry and Crawford, 2005), focal cortical lesions (Henry and Crawford, 2004b), Parkinson disease (PD) (Henry and Crawford, 2004c), Huntington disease (Henry et al., 2005), and Alzheimer disease (AD) (Henry et al., 2004). These studies indicate that letter fluency is sensitive to the presence of traumatic brain injury and focal frontal lobe lesions, while semantic fluency is impacted more in the setting of PD, schizophrenia, focal temporal lobe lesions, and AD. More recent work directly comparing individuals with AD, frontotemporal dementia (FTD), and “Parkinson-Plus” syndromes (e.g., progressive supranuclear palsy [PSP]) suggests that verbal fluency raw scores are excellent for detecting cognitive impairment, but not for differentiating among the clinical syndromes (Henderson et al., 2023). Nevertheless, other work has revealed contrasting patterns of raw score performance between syndromes. Studies comparing category and letter fluency performance suggest that AD patients are likely to invert the typical pattern, producing (for example) fewer animals than words beginning with F, while patients with “subcortical” dementia, such as vascular disease or PSP, exhibit similar reductions on both tasks, maintaining the typical pattern (Rosser and Hodges, 1994; Canning et al., 2004). In this setting, raw scores have clear prognostic value. 
Among individuals with AD, top-tertile performance on a composite of six raw scores is associated with a 29% reduction in risk of death (Cosentino et al., 2006). Raw scores figure prominently in machine learning models trained to identify individuals at increased risk for dementia (Clark et al. 2014, 2016).
A goal of the current work is to shed light on the mechanisms underlying VF task performance, to aid in the interpretation of studies employing detailed analysis of the word lists produced. The most common approach to word list analysis is to identify clusters of related, consecutive words within a list and then to quantify the sizes of the clusters and the number of switches (transitions from one group of related words to a different group) (Troyer et al. 1997, 1998a, 1998b; Troyer, 2000). These scores have some value for predicting the development of dementia in large studies, though results differ in terms of whether clustering (Pakhomov and Hemmy, 2013) or switching (Raoux et al., 2008) has greater value. Cluster scores in non-demented AD mutation carriers correlate with AD biomarkers, while raw scores do not (Yucebas et al., 2024). Further, patients with AD may perform better than patients with vascular dementia on both clustering and switching scores (Zhao et al., 2013). Individuals with amnestic mild cognitive impairment (MCI) differ from controls in terms of category fluency cluster size and letter fluency switching scores (Mueller et al., 2015).
Another feature of VF word lists consists of the lexical frequencies of the words produced, which may be influenced by genetic variations and neuropathological changes. Among a large set of candidate lexical measures, frequency is second only to raw score for detecting neurodegenerative cognitive impairment (Henderson et al., 2023). Lexical frequencies of words produced show a decreasing trend over time, i.e., they are lower, on average, in each successive 10 s interval of the task (Linz et al., 2019; Vonk et al., 2019). Asymptomatic individuals with increased genetic risk for AD due to having one or more copies of the apolipoprotein E ε4 (APOE4) allele generate animal names with higher average lexical frequency than individuals without APOE4 (Vonk et al., 2019). A similar pattern is observed in individuals with MCI (Linz et al., 2019).
Patterns of VF performance in terms of raw score and lexical frequency may reflect the anatomy of pathologic changes. Individuals with PSP produce fewer words on letter fluency than AD patients (Rosser and Hodges, 1994), but these words may be lower in frequency (Rittman et al., 2013), suggesting that these patients have access to less common words, but do so less efficiently. This reduction in raw score could be related to pathologic involvement of left dorsolateral frontal and striatal regions (Rittman et al., 2013). Studies of verbal fluency indicate that animal knowledge depends on the integrity of the anterior temporal lobes. In controls, the anterior temporal lobes exhibit relative activation during verbal fluency for animals, in comparison to letter fluency tasks (Mummery et al., 1996). These same regions exhibit profound atrophy in the syndrome of semantic dementia (Hodges et al., 1992). During verbal fluency tasks, individuals with semantic dementia produce animal names with significantly higher frequency than those produced by controls or patients with other syndromes, but this frequency effect is inverted with category fluency for grocery items, suggesting task-specific or category-specific effects (Marczinski and Kertesz, 2006). Given that anterior temporal atrophy correlates with animal fluency raw scores (but not supermarket fluency) in MCI and AD (Ahn et al., 2011), the increased frequency of animal names generated by patients with semantic dementia supports the view that temporal lobe degeneration disproportionately affects performance on animal fluency (Henderson et al., 2023).
Researchers have been analyzing the timings of verbal fluency responses since the 1940s, and have shown that the rate of word generation during fluency tasks decays according to an exponential or hyperbolic function (Bousfield and Sedgewick, 1944; Gruenewald and Lockhead, 1980). These observations find application to the study of cognitive impairment through two general approaches. The first approach is to identify clusters and switches according to Troyer’s method, then to examine intra- and inter-cluster transition times before and after correcting for the decay in word production rate (Lenio et al., 2016). This approach may help to separate executive and semantic components of category fluency. The second approach, which we take in this work, is to identify transitions between related and unrelated words via the slope difference algorithm. This algorithm involves fitting an exponential curve to an individual’s actual response function (i.e., a plot showing how raw score increases over time). The slopes of the two curves are then compared at points between consecutively generated words. If the slope of the response function exceeds the slope of the fitted curve, the pair of words bounding that portion of the response function is considered related (or linked). Otherwise, the pair of words is considered to be unrelated, and the transition is marked as a switch. This method has been used to show that between-cluster retrieval times are prolonged among individuals with APOE4 performing the animal fluency task (Rosen et al., 2005). In a previous analysis of these data, scores based on animal fluency switch and edge transition speeds led to a better model fit than classic clustering and switching scores when predicting incident cognitive impairment (ICI) with logistic regression (Bushnell et al., 2022). 
Focusing on individuals with progressive cognitive decline, this analysis shows that switch and edge transition speed scores yield an 8% net reclassification improvement (Pencina et al., 2008) over a base model.
Patterns of verbal fluency performance likely depend on common mechanisms or strategies, which may become distorted in the setting of brain damage. Priming is a well-recognized phenomenon in which the speed of access to an item is affected by exposure to a related stimulus (Lashley, 1951; Meyer and Schvaneveldt, 1971). Priming may, therefore, influence performance on verbal fluency tasks. There are several types of priming. Most relevant to the current work is semantic priming, in which hearing or reading one word (the “prime,” e.g., lemon) diminishes the time required to read or produce a semantically related word (the “target,” e.g., apple). Other forms of lexical similarity (e.g., phonetic or orthographic) may also result in priming. Associative priming is a phenomenon similar to semantic priming in which the prime and the target are not alike in the sense of sharing semantic features, but are nevertheless strongly associated with one another (e.g., cow → milk). Negative priming refers to an increase in processing time that occurs when a participant is made to suppress or ignore the prime before presentation of the target (Tipper and Cranston, 1985). For example, one could imagine a lexical decision task in which each stimulus is flanked by items the participant is instructed to ignore (e.g. “LION qrzvwp LION”). Recognition on the next frame of a word related to these ignored flankers is then delayed (e.g., “xprzq TIGER xprzq”). In the case of verbal fluency, we may examine the production of any given word (after the initial word) as if it is a target item, potentially primed by the most recently produced word. Positive and negative priming may occur during verbal fluency tasks but may influence subsequent word production over different time scales, or the polarity of priming effects may depend on the nature of the associations between words. 
(We focus on the influence of the immediately preceding word. We do not exclude the possibility of priming by words produced even earlier in the task, but we do not examine it directly in this work.)
While the influence of priming on word generation within clusters may seem transparent, factors that precipitate or affect the speed of switch transitions are less obvious. Two models that seek to account for patterns of clustering and switching are the censored random walk with priming (Zemla and Austerweil, 2017) and optimal foraging (Hills et al., 2012; Avery and Jones, 2018; Zemla et al., 2023). Both models account, at least qualitatively, for the observation that performance of verbal fluency includes alternation between local exploitation that depends on a semantic linkage and global exploration that depends on some other kind of cue. In the case of the random walk model, the global cue consists of the empirically measured lexical association between the word “animal” and each specific animal name, while in the case of optimal foraging, the global cue is lexical frequency. Although it is not our goal to evaluate or compare these models, we do anticipate that the optimal foraging model would be relevant to our interpretation of effects of lexical frequency on verbal fluency performance. Investigations focusing on timing of verbal fluency responses in the setting of AD suggest that executive aspects of word retrieval (e.g., switching, suppression of unwanted responses) are disrupted (Itaguchi et al., 2025; Tröger et al., 2019).
The current work seeks to identify factors impacting the durations of intervals between words generated during verbal fluency tasks (letter F and animals). Compared to the approaches described in the previous paragraph, our analysis is somewhat pre-theoretical, as we are not seeking support for any specific hypothesis regarding strategies or mechanisms underlying verbal fluency performance. Our goal is to simultaneously examine several predictors that may play a role in the organization of word lists generated during verbal fluency tasks. We will accomplish this goal by using linear mixed-effects models to examine clinical and lexical predictors of interword intervals in the verbal fluency tasks, first using all consecutive word pairs and then repeating the same analysis separately for pairs of words classified as linked or unlinked based on a data-driven algorithm. We anticipate that some of our findings will recapitulate those of a smaller previous analysis based on verbal fluency word order, in which we examined the effect of sound, spelling, or meaning similarity on the probability that two words would be listed consecutively during ten different fluency tasks (Clark et al., 2014a,b). We anticipate that in the current analysis, the larger sample size and use of interword intervals (IWI) as an outcome measure will improve sensitivity and permit analysis of key interactions. Based on the existing literature on priming, and our previously published work on VF tasks, we hypothesize that when words are lexically similar, the retrieval times for these words will be shorter. Phonological and orthographic similarity are strongly correlated and therefore will compete for variance. Semantic similarity correlates (at least weakly) with phonological similarity within individual languages (Dautriche et al., 2017). Lexical frequency correlates inversely with word length (Zipf, 1946), and word length, in turn, influences measures of phonological and orthographic similarity. 
At least with the animal fluency task, words retrieved earlier tend to be higher in frequency (Vonk et al., 2019)—thus, we hypothesize a negative correlation between latency and frequency. On the other hand, we expect higher frequency items to be retrieved faster than lower frequency items. Therefore, we hypothesize potential interactions between latency and frequency when predicting word retrieval times. Based on a previous analysis of this data set (Ayers et al., 2022), we are aware of a relationship between speed of word retrieval and clinical group (i.e., future cognitive decline). Furthermore, as some common dementing diseases have a strong impact on semantic or lexical processing, we anticipate possible interactions between clinical group and lexical similarity measures when predicting timings—thus, we hypothesize slower word retrieval times in the future cognitive decline group. Finally, owing to the observation that words retrieved occur in clusters (with switches occurring between clusters), we will perform a stratified analysis by re-fitting the mixed-effects models after partitioning the transitions into those within clusters (edge transitions) and those between clusters (switch transitions) with the hypothesis that interword intervals will be shorter for the edge transitions and longer for switch transitions—reflecting faster retrieval of lexically related items and the increased retrieval demands when shifting to a new cluster.
2. Subjects and methods
2.1. The REGARDS study
The Reasons for Geographic and Racial Differences in Stroke (REGARDS) study is a longitudinal epidemiological study that has followed 30,239 individuals at least 45 years of age in the 48 contiguous United States (Howard et al., 2005, 2011; Howard, 2013). Enrollment oversampled the “stroke belt,” a region across the southeastern US with increased incidence of stroke relative to the rest of the country, and evenly included men and women and Black and White participants. During the initial phase of the project, participants were visited by a nurse who took vitals and drew blood. Participants were followed with serial remote evaluations over the following years. Incident strokes were identified and adjudicated by a team of specialists. Cognitive tests were conducted by telephone. Category and letter VF tasks were administered and recorded biennially. During the VF tasks, participants were given 60 s to generate a list of animals or words starting with the letter F. Incident cognitive impairment (ICI) in our models was determined by the pattern of longitudinal scores on the Six-Item Screener (SIS). The SIS is an easily administered test in which participants are given three words to remember, then undergo testing with three temporal orientation items, followed by recall for the three words. The three temporal orientation items and three recall items are worth 1 point each. We defined an abnormal score on the SIS as any score <5 (sensitivity 74.2% and specificity 80.2%) (Callahan et al., 2002).
The REGARDS Study has been approved by the Institutional Review Board at the University of Alabama at Birmingham. The data analysis was approved by the Institutional Review Board of Indiana University.
2.2. Subjects
We identified 641 cases of ICI and matched them to 641 controls who did not experience ICI from the REGARDS dataset. For details of the matching process, see our previous work (Ayers et al., 2022; Bushnell et al., 2022). We did not include participants with a raw score of 0 or 1 on either VF task, as these participants would have no IWI to analyze. Matching variables included age (<3 years absolute difference), educational level (matched within four categories, including less than high school, high school graduate, some college, and college graduate or above), race (Black or White), sex, and geographic region (stroke belt vs. non-belt). Participants selected for this analysis were required to have had no clinical stroke and to have undergone baseline verbal fluency assessment with animals and letter F. Cases were required to meet the following criteria for ICI: (1) normal SIS scores up to at least the first SIS evaluation after the baseline fluency evaluation, (2) at least one abnormal SIS score thereafter, and (3) an abnormal score on the most recent SIS evaluation. Stipulation that the most recent SIS evaluation must be abnormal is meant to exclude individuals who perform in the abnormal range but later revert to a normal score. Controls were defined as having no abnormal SIS score at any evaluation.
2.3. Acute vs. progressive decline
To examine verbal fluency patterns possibly arising from different pathological mechanisms, we divided the ICI participants into those who exhibited a progressive decline and those who declined acutely. Groups were divided based on an individual’s entire array of SIS scores. We defined a “downward step” as a transition from a normal score (5 or 6) to an abnormal score (<5) or from an abnormal score to a lower abnormal score. We then defined progressive decline to be the presence of >1 downward step to reach the lowest SIS score and acute decline as a transition from any normal SIS score to the lowest recorded abnormal SIS score in a single downward step.
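The grouping rule above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, and it assumes that downward steps are counted between consecutive SIS assessments, up to the first occurrence of the lowest score, in a series that begins with a normal score.

```python
NORMAL_MIN = 5  # SIS scores of 5 or 6 are normal; scores below 5 are abnormal


def count_downward_steps(sis_scores):
    """Count downward steps up to the first occurrence of the lowest score.

    Assumption: steps are evaluated between consecutive assessments only.
    """
    lowest = min(sis_scores)
    steps = 0
    for prev, curr in zip(sis_scores, sis_scores[1:]):
        normal_to_abnormal = prev >= NORMAL_MIN and curr < NORMAL_MIN
        abnormal_to_lower = prev < NORMAL_MIN and curr < prev
        if normal_to_abnormal or abnormal_to_lower:
            steps += 1
        if curr == lowest:
            break  # the lowest score has been reached
    return steps


def classify_decline(sis_scores):
    """Label a chronological list of SIS scores (hypothetical helper)."""
    if min(sis_scores) >= NORMAL_MIN:
        return "no decline"
    return "progressive" if count_downward_steps(sis_scores) > 1 else "acute"


print(classify_decline([6, 6, 5, 3, 3]))  # one step from normal to the lowest score
print(classify_decline([6, 5, 4, 4, 2]))  # two steps are needed to reach the lowest score
```

How a series that returns to the normal range between two abnormal scores should be handled is not specified in the text; the sketch simply counts each normal-to-abnormal transition as a separate step.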
2.4. Data pre-processing
We transcribed 2564 verbal fluency recordings by first obtaining a rough transcription with the Amazon Web Services (AWS) Transcribe tool. This preliminary transcription included utterances and approximate timings. With the aid of a custom graphical transcription tool built in MATLAB (MathWorks, Natick, MA), we amended the transcriptions by correcting the content and making the timings precise.
2.5. Lexical properties and comparisons
This work depended on measures of individual words and comparisons between pairs of consecutively listed words. We tabulated lexical frequency and latency for each individual word. Lexical frequency is a count of occurrences of a word in a corpus and is invariant across subjects. Latency consists of the time elapsed before a word is produced; this measure will rarely be the same for two instances of the same word, i.e., it varies across subjects.
We computed three measures of similarity between each consecutive pair of valid words: orthographic, phonological, and semantic. For orthographic similarity, we simply computed Levenshtein edit distance between the two words (Levenshtein, 1966) and multiplied by −1, thus converting the distance to a similarity.
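As an illustration of the orthographic measure, the following Python sketch computes the Levenshtein edit distance with the standard dynamic-programming recurrence and negates it. The function names are our own, and treating words as lowercase strings is an assumption.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,                        # deletion
                            curr[j - 1] + 1,                    # insertion
                            prev[j - 1] + substitution_cost))   # substitution
        prev = curr
    return prev[-1]


def orthographic_similarity(a, b):
    """Edit distance multiplied by -1, so larger values mean more similar spellings."""
    return -levenshtein(a, b)


print(orthographic_similarity("fell", "feel"))  # one substitution apart
print(orthographic_similarity("dog", "cat"))    # every letter differs
```

Under this definition, identical spellings receive the maximum similarity of 0, and the measure becomes more negative as the spellings diverge.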
For phonological similarity, we used a weighted version of the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970), which is commonly used for alignment of genetic sequences. In our case, however, we were aligning word pronunciations in the form of sequences of ARPAbet symbols, each symbol representing an English phoneme. We drew the majority of our word pronunciations from the CMU Pronouncing Dictionary (2013), which lists pairs of written words with their ARPAbet pronunciations (e.g., ‘horse’: HH AO R S). We defined additional ARPAbet pronunciations for words not present in the original dictionary. We created a vector description of each ARPAbet symbol in terms of phonetic features (including, for example, voicing, place of articulation, presence/absence of a “stop,” etc.) by assigning values of 1, 0, or −1 to each feature. We then calculated the Euclidean distance between every possible pair of vectors and subjected these distance measurements to a pair of sequential transformations, such that identical phonemes (distance of 0) received a weighting of +1, while the least similar pair received a weighting of −1. First, the distances were normalized to the range of [0, 1] using the minmax algorithm. Second, we multiplied the minmax normalized distances by −2 and added 1.0. The result of this second transformation is transparent if one applies it to the boundaries of the interval resulting from minmax (0 and 1); the 0 from the first interval maps to +1 and the 1 maps to −1. The Needleman-Wunsch algorithm uses these weights to identify and score the optimal alignment of two phonetic sequences.
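The weighting and alignment steps can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the feature vectors in `FEATURES` are invented for a handful of ARPAbet symbols, and the gap penalty of −1 is an assumption, since the text does not state the value used.

```python
import math

# Hypothetical feature vectors (voicing, stop, vowel, backness); illustrative only.
FEATURES = {
    "K": (-1, 1, -1, 1), "G": (1, 1, -1, 1),
    "T": (-1, 1, -1, 0), "D": (1, 1, -1, 0), "B": (1, 1, -1, -1),
    "AE": (1, -1, 1, -1), "AO": (1, -1, 1, 1),
}


def phoneme_weights(features):
    """Euclidean distance -> min-max normalization -> multiply by -2 and add 1."""
    dist = {(p, q): math.dist(features[p], features[q])
            for p in features for q in features}
    lo, hi = min(dist.values()), max(dist.values())
    return {pair: 1.0 - 2.0 * (d - lo) / (hi - lo) for pair, d in dist.items()}


def needleman_wunsch_score(seq_a, seq_b, weights, gap=-1.0):
    """Score of the optimal global alignment; gap penalty is an assumed value."""
    n, m = len(seq_a), len(seq_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + weights[(seq_a[i - 1], seq_b[j - 1])],
                score[i - 1][j] + gap,   # gap in seq_b
                score[i][j - 1] + gap)   # gap in seq_a
    return score[n][m]


WEIGHTS = phoneme_weights(FEATURES)
cat, bat, dog = ["K", "AE", "T"], ["B", "AE", "T"], ["D", "AO", "G"]
print(needleman_wunsch_score(cat, bat, WEIGHTS))  # shares two phonemes with "cat"
print(needleman_wunsch_score(cat, dog, WEIGHTS))  # shares none
```

With these toy vectors, identical phonemes receive a weight of +1 and the most distant pair receives −1, so a word aligned with itself scores its own length in phonemes.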
For semantic similarity, we employed a vector space model of word meanings derived from neural network modeling, known as global vectors or GloVe (Pennington et al., 2014). Each word was associated with a vector, and the vectors for two words were compared by means of cosine similarity (i.e., normalized dot product). This method yields a number ranging between −1 and +1, with +1 indicating that the meanings are identical. A value of 0 indicates no similarity between the meanings of the two words. In practice, negative values are uncommon, but a negative value would indicate that the words are dissimilar or mutually exclusive (e.g., a value close to −1 might be expected for ‘alive’ and ‘dead’).
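Cosine similarity itself is simple to compute. The Python sketch below uses short made-up vectors in place of real GloVe embeddings, which have many more dimensions; the vector values and variable names are purely illustrative.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors standing in for word embeddings (hypothetical values).
toy_vectors = {
    "dog": [0.8, 0.6, 0.1],
    "cat": [0.7, 0.7, 0.2],
    "fern": [-0.1, 0.2, 0.9],
}

print(cosine_similarity(toy_vectors["dog"], toy_vectors["cat"]))
print(cosine_similarity(toy_vectors["dog"], toy_vectors["fern"]))
```

Because the measure is normalized by vector length, it depends only on the angle between the two vectors, not on their magnitudes.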
2.6. Statistical analyses
Statistical analyses were carried out in R 4.1.0. Linear mixed-effects models were fit using the lmer function of the lme4 package. We performed three analyses for each verbal fluency task, each with interword interval (IWI) as the outcome variable. The first analysis incorporated all measured timings between consecutive, valid items. Repeated items and intrusions were not considered valid. Thus, given the letter F sequence “fast, furtive, furious, phone, fussy,” the analysis would include the consecutive word pairs (fast, furtive) and (furtive, furious) but not (furious, fussy) because of the intervening intrusion. The second and third analyses stratified these timings into those that occurred between linked and unlinked pairs of words, which we refer to as edge transitions and switch transitions, respectively. A random per-subject intercept was included in each model to account for faster or slower performance of each individual participant that was not captured by the predictor variables.
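The pair-selection rule in the letter F example can be sketched in Python. The function name and the `is_member` predicate are hypothetical, and the sketch assumes that a pair is retained only when both adjacent responses are valid (neither a repetition nor an intrusion).

```python
def consecutive_valid_pairs(responses, is_member):
    """Return adjacent response pairs in which both items are valid.

    A response is treated as invalid if it fails the task rule (an intrusion)
    or has already been produced (a repetition).
    """
    seen = set()
    flagged = []
    for word in responses:
        flagged.append((word, is_member(word) and word not in seen))
        seen.add(word)
    return [(a, b)
            for (a, a_ok), (b, b_ok) in zip(flagged, flagged[1:])
            if a_ok and b_ok]


letter_f = ["fast", "furtive", "furious", "phone", "fussy"]
pairs = consecutive_valid_pairs(letter_f, lambda w: w.startswith("f"))
print(pairs)  # (furious, fussy) is never formed because "phone" intervenes
```

Only adjacent responses are ever paired, so an intrusion removes both the transition into it and the transition out of it.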
2.6.1. Predictors of interword interval
We examined the influence of several predictors by means of regression models. Fig. 1 depicts an example data point. Each data point consisted of an IWI (the outcome variable), a pair of consecutively listed words, measures on the words, and data on the subject who generated the words. The IWI measures were severely positively skewed and were normalized by taking the fourth root. Predictors included clinical variables (acute vs. progressive group), and lexical measures (log-transformed frequency of the second word in each pair, square root-transformed latency of the first word in each pair, and semantic, orthographic, and phonological similarity of each consecutive word pair). We incorporated two-way interactions between frequency and latency and between clinical group and each form of lexical similarity. Transformation of lexical frequencies with the natural logarithm is standard in the natural language processing literature. We normalized the distribution of latencies with a square root transformation. Use of square root rather than logarithm to normalize latencies prevented any negative infinity result that would arise in the rare event that a word latency was measured as zero. In addition, square roots may be easier to interpret for latency, as there will be no negative values. These considerations are not relevant for lexical frequencies, which are always positive (i.e., word counts). Demographic variables (age, sex, race, education, geographic region) were included as covariates in all models.
Fig. 1. Schematic depicting the measures taken on one data point.

The participant has generated the words dog and cat, in that order. The latency is measured from the beginning of the recording to the onset of the word dog. The interword interval (IWI) is measured from the offset of the word dog to the onset of the word cat. Semantic, orthographic, and phonological measurements are tabulated by comparing the words dog and cat in terms of meaning, spelling, and pronunciation, respectively. The lexical frequency of the word cat is tabulated. Demographic and clinical measures (not illustrated here) are identical for all data points from a given subject.
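A single data point of this kind, with the transformations described in Section 2.6.1, might be assembled as in the Python sketch below. The dictionary layout, the function name, and the example timings and corpus count are all hypothetical.

```python
import math


def make_data_point(first_word, second_word, second_word_count):
    """Build the timing and frequency measures for one consecutive word pair.

    first_word / second_word: dicts with 'onset' and 'offset' in seconds,
    measured from the beginning of the recording (assumed layout).
    second_word_count: corpus count of the second word (must be positive).
    """
    latency = first_word["onset"]                       # recording start -> first word onset
    iwi = second_word["onset"] - first_word["offset"]   # first word offset -> second word onset
    if latency < 0 or iwi < 0 or second_word_count <= 0:
        raise ValueError("timings must be non-negative and counts positive")
    return {
        "iwi_fourth_root": iwi ** 0.25,                  # outcome variable
        "sqrt_latency": math.sqrt(latency),              # defined even when latency is 0
        "log_frequency": math.log(second_word_count),    # natural logarithm
    }


dog = {"onset": 4.0, "offset": 4.4}
cat = {"onset": 5.4, "offset": 5.8}
point = make_data_point(dog, cat, second_word_count=1000)
print(point)
```

The square root is well defined for a latency of exactly zero, whereas a logarithm would not be, which is the motivation given in the text for using different transformations for latency and frequency.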
2.6.2. Switch and edge transition times
We partitioned the data into transitions between unrelated words (switch transitions) and those between related words (edge transitions) by means of the slope difference algorithm. This algorithm identifies related response items through a data-driven approach utilizing timings between words. Following Bousfield and Sedgewick (1944) and Gruenewald and Lockhead (1980), we first expressed each individual’s performance as a function of increasing raw score over time. We then used a custom MATLAB program to fit an exponential curve to this function (Fig. 2). The fitted curve was defined as $\hat{y} = c\,(1 - e^{-mt})$, where e is the base of the natural logarithm, c is a scale factor, m is a slope term, t is the vector of untransformed word latencies, and ŷ is the estimated number of words generated up to each point in time. Values of c and m were identified that minimized the sum of squared differences between this curve and the participant’s raw score. To decide whether a particular pair of consecutively generated words were related, the slopes of the actual response and the fitted estimate were compared at a time point halfway between the two words. Words produced faster than predicted by the fitted estimate (positive slope difference) were considered to be linked to the previous word for that subject (edge transition). Words produced more slowly than predicted (negative slope difference) were considered to be switches or not linked to the previous word for that subject (switch transition).
Fig. 2. Examples of curve fitting for the slope difference algorithm.

(A) An example animal list with latencies, forming the response curve in black. A fitted exponential curve is shown in green. Chains of linked items include [dogs, cats, rabbits], [mice], [kittens], [goats, cow, horses, mules, ponies], [squirrels, ants, bees], [mosquitos]. (B) An example letter F list with latencies with response and fitted curves shown as in panel A. Chains of linked items include [frank, follow, fell, feel, famous, father], [found, figure, felt], [finish, forward, field]. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
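The slope difference procedure described in Section 2.6.2 can be sketched in Python as follows. This is our reading of the description rather than the authors' MATLAB program. Two assumptions are worth flagging: the least-squares fit is done with a simple grid search over m (with c obtained in closed form for each candidate m), and the slope of the actual response between two words is taken as one word divided by the time between their onsets. The function names and example latencies are hypothetical.

```python
import math


def fit_exponential(latencies):
    """Least-squares fit of y_hat = c * (1 - exp(-m * t)) to the cumulative count.

    Grid search over m (0.001 to 2.0 per second, an assumed range); for each m
    the best c has the closed form sum(y * g) / sum(g * g).
    """
    counts = list(range(1, len(latencies) + 1))  # raw score after each word
    best = None
    for k in range(1, 2001):
        m = k * 0.001
        g = [1.0 - math.exp(-m * t) for t in latencies]
        c = sum(y * x for y, x in zip(counts, g)) / sum(x * x for x in g)
        sse = sum((y - c * x) ** 2 for y, x in zip(counts, g))
        if best is None or sse < best[0]:
            best = (sse, c, m)
    return best[1], best[2]


def classify_transitions(latencies):
    """Label each consecutive pair as an 'edge' or a 'switch' transition."""
    c, m = fit_exponential(latencies)
    labels = []
    for t0, t1 in zip(latencies, latencies[1:]):
        actual_slope = 1.0 / (t1 - t0)                # one word gained over the interval
        t_mid = (t0 + t1) / 2.0                       # halfway between the two words
        fitted_slope = c * m * math.exp(-m * t_mid)   # derivative of the fitted curve
        labels.append("edge" if actual_slope - fitted_slope > 0 else "switch")
    return labels


# Two bursts of three words separated by a long pause (onset times in seconds).
example_latencies = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
print(classify_transitions(example_latencies))
```

In this toy list the long pause between the third and fourth words is the only transition slower than the fitted curve predicts, so it is the only one labeled as a switch.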
Fourth root-transformed IWI measures for switch and edge transitions were analyzed using the same methods outlined for the unpartitioned analysis (Section 2.6.1). We generated split violin plots to illustrate the distributions of IWI without partitioning, switch IWI, and edge IWI for animals and letter F fluency, first comparing cognitively normal (CN) and ICI groups, then comparing progressive and acute decliners.
3. Results
3.1. Group comparisons
Table 1 shows data on the 641 clinically stroke-free individuals with ICI and matched controls. We observed the progressive pattern in 104 cases and the acute pattern in 537 cases. Cases performed significantly worse than controls on both animal fluency and letter fluency.
Table 1.
REGARDS matched participant data.
| | Controls (N = 641) | ICI (N = 641) | Acute decliners (N = 537) | Progressive decliners (N = 104) |
|---|---|---|---|---|
| Age (years) | 74.94 (8.62) | 74.96 (8.72) | 74.4 (8.72) | 77.87 (8.16)a,* |
| Sex (F:M) | 347:294 | 347:294 | 277:260 | 70:34a,* |
| Region (NB:Belt:Buckle) | 258:243:140 | 258:243:140 | 219:201:117 | 39:42:23 |
| Education (<HS:HS:SC:College+) | 75:191:171:204 | 75:191:171:204 | 60:167:141:169 | 15:24:30:35 |
| Race (B:W) | 244:397 | 244:397 | 209:328 | 35:69 |
| Animal fluency (words) | 16.66 (5.35) | 15.2 (5.15)a | 15.36 (5.23)a | 14.38 (4.66)a,* |
| Letter fluency (words) | 11.96 (4.56) | 11.17 (4.65)a | 11.1 (4.6)a | 11.52 (4.89) |
| Number of SIS assessments | 8.88 (2.00) | 8.79 (2.04) | 8.69 (2.10) | 9.27 (1.61)a,* |
| Minimum SIS score | 5.17 (0.38) | 3.27 (1.12)a | 3.54 (0.89)a | 1.84 (1.08)a,* |
Metric variables are shown as mean (standard deviation) and were compared with t-tests. Categorical variables were compared using χ² tests.
SIS = Six-Item Screener; ICI = incident cognitive impairment; HS = high school; SC = some college; College+ = college graduate or graduate school.
a: p < 0.05 compared to controls.
*: p < 0.05 compared to acute decliners.
As has been discussed in other published work, separation of cases into acute and progressive decliners did not reveal significant demographic differences between the acute decliners and controls (Ayers et al., 2022; Bushnell et al., 2022). However, progressive decliners were significantly older than both controls and acute decliners, and more likely to be female. The progressive group performed significantly worse than the acute group on animal fluency (but not letter fluency), with the progressive group scoring a mean and SD of 14.38 (4.66) and acute group scoring 15.36 (5.23) on animal fluency.
Table 2 shows mean IWI measures for each verbal fluency task and clinical group. See Supplementary Figures e1–e4 for split violin plots comparing CN and ICI groups and ICI cases separated into acute and progressive groups.
Table 2.
Fourth root-transformed durations of interword intervals (units: $\mathrm{s}^{1/4}$).
| | Cognitively normal | All ICI | Acute | Progressive |
|---|---|---|---|---|
| Animal Fluency | | | | |
| All IWI | 1.01 (0.39) | 1.04 (0.40) | 1.03 (0.39) | 1.07 (0.41) |
| Edges | 0.85 (0.34) | 0.90 (0.35) | 0.89 (0.35) | 0.92 (0.35) |
| Switches | 1.31 (0.31) | 1.33 (0.32) | 1.33 (0.32) | 1.39 (0.32) |
| Letter F Fluency | | | | |
| All IWI | 1.20 (0.35) | 1.22 (0.35) | 1.22 (0.35) | 1.21 (0.36) |
| Edges | 1.07 (0.32) | 1.08 (0.30) | 1.08 (0.30) | 1.07 (0.30) |
| Switches | 1.41 (0.30) | 1.46 (0.31) | 1.46 (0.30) | 1.43 (0.33) |
Standard deviations are in parentheses.
ICI = incident cognitive impairment; IWI = interword intervals.
3.2. Interword intervals – all-items or unpartitioned analysis
The unpartitioned analysis consisted of one linear mixed-effects model per fluency task (animals and letter F) fit with all transition events.
3.2.1. Animal task
The statistical results for this part of the analysis are depicted in blue within Figs. 3 and 4; any confidence interval that does not include zero is considered statistically significant (p < 0.05). The same results are shown numerically in Supplementary Table e1.
Fig. 3. Forest plots for clinical and lexical predictors of animal fluency IWI.

Confidence intervals for regression coefficients are shown for models fit with all (unpartitioned) items (blue), edge transition items only (green), and switch transition items (red). Confidence intervals containing zero overlap the dotted vertical line and are not statistically significant. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
Fig. 4. Forest plots for two-way interactions predicting animal fluency IWI.

Color coding is exactly as in Fig. 3. Ac = acute cognitive decline; Pr = progressive cognitive decline; ortho = orthographic similarity; phono = phonological similarity; lat = latency (square root transformed); freq = frequency (natural log transformed). (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
3.2.1.1. Lexical effects.
Phonological similarity [b = −0.014, SE = 0.004, t(16475) = −3.172, p < 0.01] and semantic similarity [b = −0.198, SE = 0.005, t(16566) = −43.337, p < 0.0001] were associated with shorter IWI. Latency of the first word in a pair was associated with increased IWI [b = 0.028, SE = 0.011, t(16522) = 2.486, p < 0.02]. Frequency of the second word in a pair was associated with decreased IWI [b = −0.018, SE = 0.004, t(16456) = −4.855, p < 0.0001].
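The significance criterion used for the forest plots (a confidence interval that does not cross zero) can be checked against the reported coefficients with a few lines of code. This is a minimal sketch that uses a normal approximation (critical value 1.96), which is reasonable given the thousands of degrees of freedom reported. The intervals plotted in the figures may have been computed differently, and the function names are ours.

```python
# Normal-approximation 95% confidence interval for a regression coefficient.
# ASSUMPTION: with thousands of degrees of freedom, the t critical value is ~1.96.
Z_95 = 1.96

def ci95(b, se):
    """Return the (lower, upper) bounds of an approximate 95% interval."""
    return (b - Z_95 * se, b + Z_95 * se)

def excludes_zero(b, se):
    """True when the approximate interval lies entirely on one side of zero."""
    low, high = ci95(b, se)
    return low > 0 or high < 0

# Coefficients and standard errors reported above (animal fluency, all items).
animal_lexical = {
    "phonological similarity": (-0.014, 0.004),
    "semantic similarity": (-0.198, 0.005),
    "latency": (0.028, 0.011),
    "frequency": (-0.018, 0.004),
}

significant = {name: excludes_zero(b, se) for name, (b, se) in animal_lexical.items()}

for name, (b, se) in animal_lexical.items():
    low, high = ci95(b, se)
    print(f"{name}: [{low:.3f}, {high:.3f}] excludes zero: {significant[name]}")
```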
3.2.1.2. Clinical effects.
Both acute decliners [b = 0.035, SE = 0.009, t(1877) = 3.682, p < 0.001] and progressive decliners [b = 0.073, SE = 0.018, t(2079) = 4.143, p < 0.0001] were slower than controls.
3.2.1.3. Interactions.
As shown in Figs. 4 and 5, there was a significant interaction between progressive decliner status and semantic similarity [b = −0.027, SE = 0.013, t(16618) = −2.049, p < 0.05], with longer intervals between words with low semantic similarity among the progressive decliners. For word pairs with high semantic similarity, the intervals of progressive decliners are more like those of the other two groups.
Fig. 5. Interactions between semantic similarity and IWI, separated by clinical group.

Regression lines depicting a main effect of semantic similarity for all three clinical groups, with shorter IWI as semantic similarity increases. There is a negative interaction for the progressive group due to increased slowing with transitions between items with low semantic similarity.
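The interaction can be made concrete by combining the reported main effect of progressive decline (b = 0.073) with the progressive × semantic similarity coefficient (b = −0.027). The sketch below assumes that semantic similarity enters the model as a standardized (z-scored) predictor and that controls are the reference group. Neither detail is stated in this section, so the numbers are illustrative only, and all other predictors and interactions are held equal.

```python
# Coefficients reported in the text for the unpartitioned animal fluency model.
B_PROGRESSIVE = 0.073        # main effect of progressive decline vs. controls
B_PROG_X_SEMANTIC = -0.027   # progressive x semantic similarity interaction

def progressive_gap(semantic_z):
    """Model-implied extra IWI for progressive decliners relative to controls.

    ASSUMPTION: semantic_z is semantic similarity in standard-deviation units.
    """
    return B_PROGRESSIVE + B_PROG_X_SEMANTIC * semantic_z

gap_low = progressive_gap(-1.0)   # word pair one SD below mean similarity
gap_high = progressive_gap(1.0)   # word pair one SD above mean similarity

print(f"gap at low similarity:  {gap_low:.3f}")
print(f"gap at high similarity: {gap_high:.3f}")
```

Under these assumptions the gap between progressive decliners and controls is roughly twice as large for dissimilar word pairs as for similar ones, which matches the convergence of the regression lines described for Fig. 5.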
There was a significant latency × frequency interaction [b = 0.003, SE = 0.001, t(16534) = 3.068, p < 0.01]. This interaction is depicted in the upper left panel of Fig. 6, which shows that early in the task, words with lower frequency take longer to produce, while later in the task, frequency does not seem to exert an effect.
Fig. 6. Interactions between IWI and latency, separated by lexical frequency tertiles.

A key finding is the inversion of the influence of frequency at long latencies for animal switch and edge transitions. See text for further description and discussion.
3.2.2. Letter F task
The statistical results for this part of the analysis are depicted in blue within Figs. 7 and 8, where any confidence intervals which do not cross zero are considered significant, p < 0.05. The same results are shown numerically in Supplementary Table e2.
Fig. 7. Forest plots for clinical and lexical predictors of letter fluency IWI.

Confidence intervals for regression coefficients are shown for models fit with all (unpartitioned) items (blue), edge transition items only (green), and switch transition items (red). Confidence intervals containing zero overlap the dotted vertical line and are not statistically significant. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
Fig. 8. Forest plots for two-way interactions predicting letter fluency IWI.

Color coding is exactly as in Fig. 3. Ac = acute cognitive decline; Pr = progressive cognitive decline; ortho = orthographic similarity; phono = phonological similarity; lat = latency (square root transformed); freq = frequency (natural log transformed). (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
3.2.2.1. Lexical effects.
Shorter IWI were observed for greater amounts of lexical similarity, regardless of the type of similarity. The magnitude of reduction was somewhat greater for orthographic [b = −0.023, SE = 0.006, t(10429) = −3.94, p < 0.0001] than for phonological similarity [b = −0.019, SE = 0.006, t(10172) = −3.37, p < 0.001]. The magnitude of reduction was the largest for semantic similarity [b = −0.064, SE = 0.006, t(10196) = −10.53, p < 0.0001].
IWIs were longer between word pairs with greater latency [b = 0.034, SE = 0.013, t(10040) = 2.57, p < 0.02]. Increasing lexical frequency was associated with faster word generation [b = −0.009, SE = 0.004, t(10086) = −2.31, p < 0.05].
3.2.2.2. Clinical effects.
Acute, but not progressive, decliners were significantly slower than controls [b = 0.034, SE = 0.014, t(2907) = 2.51, p < 0.02].
3.2.2.3. Interactions.
There was a significant latency × frequency interaction, as depicted in the right upper panel of Fig. 6 [b = 0.002, SE = 0.001, t(10055) = 2.75, p < 0.01]. Early in the task, the IWIs correspond to intuition, with high-frequency words generated faster than low-frequency words. However, late in the task, this pattern inverts.
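One way to see why the frequency effect fades or inverts late in the task is to compute the model-implied frequency slope at different latencies from the unpartitioned coefficients reported for the two tasks. The sketch below assumes that latency (square root of seconds) and frequency (natural log) enter the models uncentered, which this section does not confirm. The coefficients are also rounded to three decimals, so the crossover points are indicative at best. Function names are ours.

```python
def frequency_slope(b_freq, b_interaction, sqrt_latency):
    """Model-implied effect of log frequency on IWI at a given sqrt(latency)."""
    return b_freq + b_interaction * sqrt_latency

def crossover_seconds(b_freq, b_interaction):
    """Latency in seconds at which the frequency slope reaches zero.

    ASSUMPTION: predictors are uncentered, so the slope is
    b_freq + b_interaction * sqrt(latency).
    """
    sqrt_latency = -b_freq / b_interaction
    return sqrt_latency ** 2

# Frequency main effects and latency x frequency interactions from the text.
animals = crossover_seconds(-0.018, 0.003)    # unpartitioned animal model
letter_f = crossover_seconds(-0.009, 0.002)   # unpartitioned letter F model

print(f"animals: frequency slope reaches zero near {animals:.1f} s")
print(f"letter F: frequency slope reaches zero near {letter_f:.1f} s")
```

With these assumptions the frequency advantage disappears at about 36 s for animals and about 20 s for letter F. The earlier crossover for letter F leaves more of the 60-s task in which the effect is reversed.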
3.3. Interword intervals - partitioned analysis
The partitioned analysis consisted of two linear mixed-effects models per task, one fit with edge transition times and one fit with switch transition times.
3.3.1. Animals task
The results from this part of the analysis are shown in Figs. 3 and 4, and Supplementary Figure e5, with confidence intervals from the edge timings depicted in green and those from switch timings depicted in red. Any confidence intervals which do not cross zero are considered significant, p < 0.05. The same data are presented numerically in Supplementary Tables e3 and e4.
3.3.1.1. Lexical effects.
Contrary to expectations, greater orthographic similarity was associated with longer IWIs for switch transitions [b = 0.012, SE = 0.005, t(5012) = 2.42, p < 0.02]. For edge transitions, the effect was numerically similar, but not statistically significant. As was noted in the unpartitioned analysis, greater phonological similarity was associated with shorter IWIs for edge transitions [b = −0.017, SE = 0.005, t(10574) = −3.66, p < 0.001]. Semantic similarity remained a strong predictor of shorter IWIs for both edge and switch transitions, but the magnitude of the effect was more than twice as large for edge transitions (b = −0.117) than for switch transitions (b = −0.053).
Latency, but not frequency, was associated with a significant effect on edge transitions [b = 0.110, SE = 0.011, t(10662) = 9.802, p < 0.0001]. With switch transitions, both latency and frequency exerted significant influences that were similar to those seen in the unpartitioned analysis [latency: b = 0.077, SE = 0.015, t(5019) = 5.26, p < 0.0001; frequency: b = −0.013, SE = 0.004, t(5076) = −3.006, p < 0.01].
3.3.1.2. Clinical effects.
Slowing in switch and edge transitions for those with progressive cognitive decline was statistically significant and highly similar for the two types of transition [edge: b = 0.08, SE = 0.02, t(2249) = 3.962, p = 0.0001; switch: b = 0.079, SE = 0.021, t(1457) = 3.816, p = 0.0001]. The same was true for those with acute cognitive decline, though the effect sizes were smaller [edge: b = 0.054, SE = 0.011, t(2078) = 4.981, p < 0.0001; switch: b = 0.031, SE = 0.011, t(1300) = 2.84, p < 0.05].
3.3.1.3. Interactions.
Fig. 4 shows forest plots for interactions between variables in the partitioned analyses of animal fluency. Individuals with progressive cognitive decline exhibited additional slowing associated with greater phonological similarity on edge transitions only [b = 0.035, SE = 0.014, t(10756) = 2.483, p < 0.05]. The interaction plot in the top panel (A) of Fig. 9 reveals that individuals in the progressive group exhibited a paradoxical pattern regarding phonological similarity among linked words, with shorter IWI between words that do not sound similar and longer IWI between words that do sound similar, as if phonological similarity is interfering with word retrieval. The opposite interaction was seen among the edge transitions with orthographic similarity (see bottom panel, B, of Fig. 9). While there was a trend toward a paradoxical main effect of orthographic similarity, in which greater similarity was associated with longer IWI (p = 0.0592), individuals in the progressive group exhibited the opposite pattern [b = −0.027, SE = 0.014, t(10970) = −1.962, p < 0.05].
Fig. 9. Interactions between similarity and edge transition IWI, separated by clinical group.

(A) Regression lines depicting diminished IWI for NC and acute groups with increasing phonological similarity, but an inversion of the effect for the progressive group. (B) Regression lines depicting a non-significant trend toward increasing IWI for NC and acute groups with increasing orthographic similarity, but a significant negative interaction for the progressive group.
Latency and frequency interact in statistically significant, but antithetical, ways during edge and switch transitions [edge: b = −0.003, SE = 0.001, t(10674) = −3.046, p < 0.01; switch: b = 0.004, SE = 0.001, t(5021) = 3.38, p < 0.001]. Referring to the left middle panel of Fig. 6, it appears that late in the task, low-frequency words are being produced faster during switch transitions. This pattern resembles what was observed for letter fluency in the unpartitioned analysis. Inspection of the left lower panel shows inversion of the pattern at long latencies, with low-frequency words taking longer to access.
3.3.2. Letter F task
Refer to Figs. 7 and 8, and Supplementary Figure e6 for forest plots depicting the statistical findings for this part of the analysis. Confidence intervals for edge transitions are shown in green and those for switch transitions are shown in red, and confidence intervals which do not cross zero are considered significant, p < 0.05. The same data are presented numerically in Supplementary Tables e5 and e6.
3.3.2.1. Lexical effects.
Edge transitions were faster for words with higher orthographic similarity, while switch transitions were slower [edge: b = −0.015, SE = 0.014, t(6109) = −2.580, p < 0.01; switch: b = 0.017, SE = 0.007, t(3294) = 2.415, p < 0.05]. Phonological and semantic similarity were associated with faster transition times only in the edge analysis [phonological: b = −0.017, SE = 0.0045, t(10574) = −3.663, p < 0.001; semantic: b = −0.117, SE = 0.0048, t(10741) = −24.050, p < 0.0001]. As in the unpartitioned analysis, switches and edges were associated with slowing in proportion to the latency of the first word in a pair [edge: b = 0.110, SE = 0.011, t(10662) = 9.802, p < 0.0001; switch: b = 0.059, SE = 0.017, t(3209) = 3.419, p < 0.001]. However, the effect of frequency that was seen in the unpartitioned analysis (faster transitions for higher frequency words) was evident only for switch transitions [b = −0.011, SE = 0.005, t(3227) = −2.280, p < 0.05].
3.3.2.2. Clinical effects.
As in the unpartitioned analysis, progressive cognitive decline was not associated with any statistically significant effect on letter F transition times. For the group with acute cognitive decline, the analysis of switch transitions revealed slowing similar to that seen in the unpartitioned analysis [b = 0.059, SE = 0.018, t(2415) = 3.290, p < 0.01]. Edge transitions were not associated with slowing.
3.3.2.3. Interactions.
For letter F edge and switch transitions, none of the interactions between lexical similarity and pattern of cognitive decline were statistically significant. The latency × frequency interaction was significant only for switch transitions [b = 0.003, SE = 0.001, t(3215) = 2.421, p < 0.05]. The middle panel of the right column of Fig. 6 shows the same pattern as that seen in the unpartitioned analysis (upper panel). Early in the task, switches to a word with higher frequency are generated somewhat more quickly, while later in the task, they are generated more slowly. Of note, this is the same pattern observed with switches during animal fluency.
4. Discussion
We have evaluated lexical, clinical, and task-related factors contributing to the duration of intervals between words during verbal fluency tasks. After analyzing the IWIs in aggregate, we applied the slope difference algorithm to partition IWI into those between likely unrelated (switch transitions) and those between likely related items (edge transitions). Repeating the analysis on the partitioned data revealed several qualitative differences between predictors of the durations of switch and edge transitions.
Among the main effects of lexical similarity, semantic similarity was the strongest predictor of IWI for both letter and animal (category) fluency; these effects were most apparent in the unpartitioned analysis. This finding replicates observations in previous work in which semantic similarity was associated with a higher probability that two words would be listed consecutively across a range of fluency tasks (Clark et al., 2014a,b). For animal fluency, the effect remains robust in the partitioned analyses, but for letter F fluency, there was no significant effect for the switch analysis, suggesting that participants rely on semantic associations when forming clusters, but abandon the semantic strategy when switching (or resort to switching when a semantic neighborhood becomes depleted).
We observed very similar influences of phonological similarity for semantic and letter fluency. In both cases, significant negative effects (indicating shorter IWI) were present only in the unpartitioned and edge analyses. We interpret these findings as indicating that phonetic similarity sometimes accelerates word retrieval enough to result in an edge transition. These findings contrast with our previous analysis (Clark et al., 2014a,b), in which phonological similarity was not a predictor of word adjacency either for animals or F-words (although it was significant for S-words). Perhaps the new analysis is more sensitive due to use of IWI as an outcome measure, the larger sample size, or use of a more fine-grained metric for phonological similarity.
Orthographic similarity did not accelerate lexical transitions in any animal fluency analysis and resulted in paradoxical slowing (longer IWI) of switch transitions for both VF tasks. For letter fluency, the contrast was more stark, as orthographic similarity was associated with shorter IWI in the unpartitioned and edge analyses, but with paradoxical slowing of switch transitions. Two things should be noted. First, it is possible that the paradoxical slowing in this context reflects a feature of a common search strategy, in which the candidate words with similar spellings are deactivated during periods of more expansive search, rendering them less accessible. Perhaps this phenomenon represents a form of negative priming, induced by the desire to avoid repeating words. That is, active suppression of words recently produced may result in deactivation of their orthographic neighbors. Second, it is possible that the high correlation between phonological and orthographic similarity is leading to a multicollinearity effect in our models. However, it is remarkable that this multicollinearity effect emerges only in the switch analysis, and for both letter and category fluency tasks. Summarizing the three analyses for letter fluency another way: the unpartitioned and edge analyses show the same pattern, with shorter IWI for greater lexical similarity of any form, while in the switch analysis there was no effect of semantic or phonological similarity and paradoxical slowing as orthographic similarity increased.
Those who subsequently experienced cognitive decline, whether acute or progressive, exhibited slowing of animal fluency transitions in all three analyses. Slowing was slightly greater among those with progressive cognitive decline. We argued in previous work that the progressive cognitive decline group is likely composed chiefly of individuals with AD (Ayers et al., 2022). Thus, it is possible that the slowing observed here is a component of early semantic deficits that have been described in the setting of AD (Kertesz et al., 1986; Giffard et al., 2005; Chertkow et al., 2008; Rogers and Friedman, 2008; Verma and Howard, 2012; Papp et al., 2016). However, although semantic similarity is a robust driver of letter fluency edge transitions, there was no effect of progressive group on letter IWI. These findings coincide with the observation that amyloid accumulation in cognitively normal individuals is associated with early reduction in category fluency, but possibly enhanced performance on letter fluency, and with a faster rate of decline in category fluency over time (Papp et al., 2016). Those who experienced acute cognitive decline exhibited longer IWI in the unpartitioned and switch analyses for letter fluency, suggesting that a delay in letter fluency switching may precede acute memory decline. The differing patterns on the two fluency tasks between the two clinical groups raise some noteworthy considerations. For purposes of argument, we may consider the possibility that the acute group is not pathologically different from the progressive group but was simply tested earlier in the disease course (i.e., that with further follow up, its members would have exhibited ongoing progression). If so, then the pattern of slowed switches on both animal and letter fluency would have potential as an early marker of decline.
On the other hand, if we assume that the acute group consists of individuals with a spectrum of pathologies, consisting predominantly of Alzheimer disease, Lewy body disease, cerebrovascular disease, and frontotemporal dementia (in proportions corresponding to those observed epidemiologically), then the pattern of performance we observed may relate to some of the underlying pathologies rather than the timing of testing. Apart from AD, the other common pathologies listed are usually associated with early executive dysfunction. Some combination of these two extremes may be at play, and this question is a matter for future research.
We observed two opposing interactions between progressive cognitive decline and lexical similarity measures in the animal fluency edge analysis. There was a significant negative progressive × orthographic interaction, indicating that edge transitions occurred somewhat faster in the progressive group when word spellings were more similar. This increase in speed is offset by a larger positive progressive × phonological interaction. These interactions attenuate the main effects, which included significantly faster IWI with greater phonological similarity and a nonsignificant trend (p = 0.059) toward slower IWI with greater orthographic similarity (see Fig. 9). We hypothesize that these findings indicate that most individuals activate phonological neighbors of recently produced animal names while relatively deactivating orthographic neighbors, but that this phenomenon is attenuated in patients with AD.
We identified a negative progressive × semantic interaction only in the unpartitioned analysis of animal fluency. From Fig. 5, we conclude that individuals in the progressive group retrieve words with lower semantic similarity more slowly and the slowing effect dissipates as semantic similarity increases. The bulk of evidence here is that progressive decline (but perhaps not acute decline) is accompanied by disruption of semantic associations, a phenomenon that has been reported many times over the decades in the setting of AD and has been thought to underlie the well-established finding that patients with AD usually exhibit difficulty with semantic fluency (Chan et al. 1993, 1997, 1998; Rosser and Hodges, 1994; Adlam et al., 2006; Eastman et al., 2014; Henderson et al., 2023). Again, we propose a mediating effect of the switch/edge distinction, i.e., detection of the interaction depends on the contrast between switch and edge transitions, and partitioning the data obliterates this contrast. Thus, the greater slowing of retrieving less semantically similar words by those in the progressive group likely occurs with switches. This finding resembles that of Rosen et al. (2005), in which individuals with APOE4 exhibited longer switch times than individuals without APOE4.
Lexical frequency plays a role in the performance of verbal fluency and may serve as a sort of “global cue” when foraging for words that meet the constraints of fluency tasks (Avery and Jones, 2018; Zemla et al., 2023). Vonk et al. (2019) and Linz et al. (2019) observed that the average frequency of animal names generated declines with each 10-s interval during a 60-s fluency task, suggesting that individuals begin the task by retrieving the most common and familiar animals and gradually resort to less familiar and more exotic animals. Latency of words generated during fluency tasks has uncontroversially been associated with greater slowing of word generation, leading to the familiar curve representing decay of retrieval rate, as illustrated in Fig. 2 (Bousfield and Sedgewick, 1944; Gruenewald and Lockhead, 1980). We sought to extend these findings by examining the effects of latency and frequency on IWI as well as the frequency × latency interaction. As predicted, latency is associated with significantly increased IWI (slowing) in all analyses. Increasing lexical frequency was associated with significantly reduced (faster) IWI in the unpartitioned and switch analyses for both animal and letter F fluency. No significant effect of lexical frequency was detected in either edge analysis. These last two findings support the optimal foraging model, in which edge transitions (clusters) entail retrieval of words that are in some sense “neighbors,” (especially due to semantic relatedness) while lexical frequency serves as a global cue when an individual departs from the vicinity of one cluster to explore a different cluster (Avery and Jones, 2018). Our finding that this pattern emerges in both animal and letter F fluency suggests that it may reflect a general strategy individuals apply to any fluency task.
The interactions between latency and frequency are complex. Although the unpartitioned analysis invites a simple interpretation, partitioning the data reveals an inversion of the expected effect of frequency late in the task for switch transitions. For animal fluency edge transitions, although there was no main effect of frequency, the interaction was significant: as latency increases, higher lexical frequency imparts a slight advantage, i.e., shorter IWI. Referring to the bottom left panel of Fig. 6, we see that late in the task, edge transitions are being affected in a straightforward way by lexical frequency. That is, words with higher frequency are being retrieved somewhat faster. However, a different pattern holds in both the unpartitioned and switch analyses, where low-frequency words are retrieved more slowly only early in the task. These two analyses show slightly different interactions. The unpartitioned analysis (left upper panel of Fig. 6) reveals slower retrieval of low-frequency words early in the task and essentially no effect of frequency late in the task. The switch analysis (left middle panel) shows the same pattern early in the task, but the pattern inverts late in the task, such that high frequency words are paradoxically more difficult to retrieve toward the end of the time limit. These findings could be seen as consistent with the optimal foraging model, if individuals are actively seeking exotic (low-frequency) exemplars late in the task when exploring new territory.
The global pattern for the frequency × latency interaction is recapitulated in the analysis of letter fluency, except that the interaction is not significant in the edge analysis. For the unpartitioned and switch analyses, the right upper and middle panels of Fig. 6 reveal an unsurprising advantage for high-frequency words early in the task with a paradoxical inversion of this effect late in the task. One highly speculative explanation for the presence of this general pattern in both tasks is that switching depends on “exploration,” such that each switch tends to carry one further afield, away from more familiar, higher frequency items. Moves in the opposite direction, however, incur a penalty. At risk of being simplistic, we propose the comparison of verbal fluency performance to tree climbing. Early in the task, the trunk (composed of the most common or familiar animals) is readily accessible, but later in the task it is easier to move from one distal branch to another distal branch than it is to backtrack toward the trunk, possibly due to a desire to avoid re-exploring depleted territory (repeating words).
Several limitations of this work should be noted. The one we are most eager to remedy is the absence of biomarker or imaging data on the participants. We were fortunate to have access to a large pool of data through the REGARDS epidemiological study. In research, there is often a tradeoff between sample size and granularity of data, and in this case the sample size was sufficiently generous to reveal interesting patterns in the data that might have been difficult to detect with fewer participants. Future work will focus on identifying the contribution of the predictors defined here related to or in conjunction with biomarker-defined outcomes. A second limitation pertains to use of the slope difference algorithm for partitioning the data into switch and edge transitions. This algorithm entails fitting an exponential curve to the word latencies, then comparing the slope of the actual performance curve to the predicted curve. The distinction between switches and edges therefore depends on latency and IWI, though in a different way for every subject. In some cases, we may report that a given variable contributes to a decrease in IWI for switch transitions, but if the effect were stronger, some of the transitions might have been labeled as edges. The opposite phenomenon could apply to variables that result in increased IWI for edge transitions. Despite the potential complexity of interpretation, we selected this data-driven approach because it has the potential to account for individual, idiosyncratic associations rather than taking an approach that imposes the judgments of the researchers on all participants. In addition, use of (for example) Troyer’s classic method would have meant explicitly defining switches and edges in terms of subjectively perceived lexical similarity, leading to circularity when performing the partitioned analyses. 
Moreover, use of semantic subcategories for the animal fluency analysis and orthographic/phonological criteria for the letter fluency analysis would have greatly complicated comparison of the findings from the two types of VF tasks. One further limitation of the current work is that our measure of semantic similarity possibly fails to capture some semantic relationships, as it is based on a semantic space model derived from word co-occurrences.
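The general logic of the slope difference approach described in the limitations above (fit an exponential curve to the word latencies, then compare observed and predicted slopes) can be sketched as follows. This is an illustration under our own assumptions and not the implementation used in the study: the curve form n(t) = c(1 − e^(−mt)), the grid-search fit, the decision rule (a transition is a switch when its observed rate falls below the fitted rate), the function names, and the onset times are all hypothetical.

```python
import math

def fit_exponential(times):
    """Least-squares fit of n(t) = c * (1 - exp(-m * t)) to cumulative word counts.

    ASSUMPTION: a simple grid search over m, with c solved in closed form.
    """
    counts = range(1, len(times) + 1)
    best = None
    for step in range(1, 2001):                 # decay rates m in (0, 0.2]
        m = step * 0.0001
        f = [1.0 - math.exp(-m * t) for t in times]
        c = sum(n * fi for n, fi in zip(counts, f)) / sum(fi * fi for fi in f)
        sse = sum((n - c * fi) ** 2 for n, fi in zip(counts, f))
        if best is None or sse < best[0]:
            best = (sse, c, m)
    return best[1], best[2]

def label_transitions(times):
    """Label each consecutive word pair as 'edge' or 'switch'.

    Expects strictly increasing, positive onset times in seconds.
    """
    c, m = fit_exponential(times)
    labels = []
    for prev, cur in zip(times, times[1:]):
        observed_slope = 1.0 / (cur - prev)             # words per second
        predicted_slope = c * m * math.exp(-m * prev)   # slope of fitted curve
        labels.append("edge" if observed_slope >= predicted_slope else "switch")
    return labels

# Invented word-onset times (seconds), for illustration only.
onsets = [1.0, 1.8, 2.5, 6.0, 6.7, 7.5, 14.0, 15.0, 16.2, 30.0, 31.5, 50.0]
labels = label_transitions(onsets)
print(list(zip(onsets[1:], labels)))
```

Because the fitted curve flattens over time, the same interword interval can be labeled a switch early in the task and an edge late in the task. This is the sense in which the switch/edge distinction depends on both latency and IWI, and the curve is fit separately for each participant.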
In summary, we find that semantic similarity has a strong effect on IWI within both letter F and animal fluency, though for letter fluency we saw no effect of semantic similarity on switches. With animal fluency, phonological similarity tends to accelerate edge transitions, while orthographic similarity slows down switch transitions. With letter fluency, both phonological and orthographic similarity accelerate edge transitions, while orthographic similarity paradoxically slows down switch transitions. The main clinical effects include slowing of animal word retrieval for both acute and progressive groups and slowing of letter F word retrieval only for the acute group—a finding driven by slowed switch transitions. For animal fluency in the progressive group, the orthographic and phonological effects are attenuated (with relative acceleration of orthographically similar items and relative slowing of phonologically similar items). Regarding latency and frequency, animal and letter F fluency exhibited very similar patterns. Latency predictably leads to slowing of word retrieval. Higher lexical frequency generally facilitates switching, but this effect interacts with latency, such that there is facilitation early in the task, but paradoxical slowing with higher frequency words late in the task.
Regarding the clinical implications of this work, our findings suggest that examining the timing of verbal fluency, rather than just total word counts, may offer earlier clues about type and trajectory of cognitive decline. Many measures of cognitive decline rely on count, but slowing on animal fluency across both within-cluster and switching transitions may signal emerging semantic impairment consistent with prodromal Alzheimer’s disease, whereas isolated slowing of switching during letter fluency tasks may point to early executive dysfunction or more heterogeneous reasons underlying cognitive decline. The observed distinct timing patterns associated with lexical similarity also support the idea that subtle changes in how patients retrieve related words could help clinicians differentiate semantic versus executive contributors to decline. With further study, focusing on the timing of fluency measures may ultimately serve to complement current neuropsychological testing by providing additional, more sensitive markers of early, concerning cognitive impairment.
Future work will investigate the relationship between verbal fluency timings and groups defined by clinical syndrome or biomarkers. Analysis of repeated verbal fluency assessments will evaluate the prognostic value of individual changes in timing measures. Cognitive models of fluency performance will need to account for the predictors of timing reported here, especially the paradoxical slowing occurring with two measures during switching: that seen with increased orthographic similarity and that seen with increased lexical frequency late in the task.
Funding sources
NIH U01 NS041588 and NIH P30 AG010133.
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.neuropsychologia.2026.109400.
Footnotes
CRediT authorship contribution statement
Olivia Murray: Writing – review & editing, Writing – original draft, Visualization, Formal analysis. Justin Bushnell: Writing – review & editing, Software, Formal analysis, Data curation, Conceptualization. Frederick Unverzagt: Writing – review & editing, Supervision, Conceptualization. John Del Gaizo: Writing – review & editing, Software. Virginia G. Wadley: Supervision, Project administration, Conceptualization. Richard Kennedy: Writing – review & editing, Supervision, Formal analysis. Matthew R. Ayers: Writing – review & editing, Conceptualization. David Glenn Clark: Writing – review & editing, Writing – original draft, Visualization, Validation, Supervision, Software, Resources, Project administration, Methodology, Formal analysis, Data curation, Conceptualization.
We borrow the term “edge” from graph theoretical verbal fluency analyses, like that of Goñi (2010), in which related words are connected by an edge in a graph. Importantly, a series of edge transitions does not necessarily indicate the presence of a cluster, as the slope difference algorithm does not categorize the relatedness of non-consecutive words. Yet, edge transitions are sufficient for our purposes, as we are analyzing intervals only between consecutive words.
References
- Adlam AL, Bozeat S, Arnold R, Watson P, Hodges JR, 2006. Semantic knowledge in mild cognitive impairment and mild Alzheimer’s disease. Cortex 42 (5), 675–684.
- Ahn HJ, Seo SW, Chin J, Suh MK, Lee BH, Kim ST, Im K, Lee JM, Lee JH, Heilman KM, Na DL, 2011. The cortical neuroanatomy of neuropsychological deficits in mild cognitive impairment and Alzheimer’s disease: a surface-based morphometric analysis. Neuropsychologia 49 (14), 3931–3945.
- Avery JE, Jones MN, 2018. Comparing models of semantic fluency: do humans forage optimally, or walk randomly? Proceedings of the Annual Meeting of the Cognitive Science Society.
- Ayers MR, Bushnell J, Gao S, Unverzagt F, Gaizo JD, Wadley VG, Kennedy R, Clark DG, 2022. Verbal fluency response times predict incident cognitive impairment. Alzheimer’s Dement. 14 (1), e12277.
- Bousfield W, Sedgewick C, 1944. An analysis of sequences of restricted associative responses. J. Gen. Psychol 30 (2), 149–165.
- Bushnell J, Svaldi D, Ayers MR, Gao S, Unverzagt F, Gaizo JD, Wadley VG, Kennedy R, Goni J, Clark DG, 2022. A comparison of techniques for deriving clustering and switching scores from verbal fluency word lists. Front. Psychol 13, 743557.
- Canning SJ, Leach L, Stuss D, Ngo L, Black SE, 2004. Diagnostic utility of abbreviated fluency measures in Alzheimer disease and vascular dementia. Neurology 62 (4), 556–562.
- Chan AS, Butters N, Paulsen JS, Salmon DP, Swenson MR, Maloney LT, 1993. An assessment of the semantic network in patients with Alzheimer’s disease. J. Cognit. Neurosci 5 (2), 254–261.
- Chan AS, Butters N, Salmon DP, 1997. The deterioration of semantic networks in patients with Alzheimer’s disease: a cross-sectional study. Neuropsychologia 35 (3), 241–248.
- Chan A, Salmon DP, Nordin S, Murphy C, Razani J, 1998. Abnormality of semantic network in patients with Alzheimer’s disease: evidence from verbal, perceptual, and olfactory domains. Ann. N. Y. Acad. Sci 855, 681–685.
- Chertkow H, Whatmough C, Saumier D, Duong A, 2008. Cognitive neuroscience studies of semantic memory in Alzheimer’s disease. Prog. Brain Res 169, 393–407.
- Clark DG, Kapur P, Geldmacher DS, Brockington JC, Harrell L, DeRamus TP, Blanton PD, Lokken K, Nicholas AP, Marson DC, 2014a. Latent information in fluency lists predicts functional decline in persons at risk for Alzheimer disease. Cortex 55, 202–218.
- Clark DG, Wadley VG, Kapur P, DeRamus TP, Singletary B, Nicholas AP, Blanton PD, Lokken K, Deshpande H, Marson D, Deutsch G, 2014b. Lexical factors and cerebral regions influencing verbal fluency performance in MCI. Neuropsychologia 54, 98–111.
- Clark DG, McLaughlin PM, Woo E, Hwang KS, Hurtz S, Ramirez L, Eastman JA, Dukes R-M, Kapur P, DeRamus TP, Apostolova LG, 2016. Novel verbal fluency scores and structural brain imaging for prediction of cognitive outcome in mild cognitive impairment. Alzheimer’s Dement.: Diag Assess and Dis Monit.
- CMU Pronouncing Dictionary, 2013.
- Cosentino S, Scarmeas N, Albert SM, Stern Y, 2006. Verbal fluency predicts mortality in Alzheimer disease. Cognit. Behav. Neurol 19 (3), 123–129.
- Dautriche I, Mahowald K, Gibson E, Piantadosi ST, 2017. Wordform similarity increases with semantic similarity: an analysis of 100 languages. Cogn. Sci 41 (8), 2149–2169.
- Eastman JA, Hwang KS, Lazaris A, Chow N, Ramirez L, Babakchanian S, Woo E, Thompson PM, Apostolova LG, 2014. Cortical thickness and semantic fluency in Alzheimer’s disease and mild cognitive impairment. Am. J. Alzheim. Dis
- Giffard B, Desgranges B, Eustache F, 2005. Semantic memory disorders in Alzheimer’s disease: clues from semantic priming effects. Curr. Alzheimer Res 2 (4), 425–434.
- Gruenewald P, Lockhead G, 1980. The free recall of category examples. J. Exp. Psychol.: Human Learning and Memory 6, 225–240.
- Henderson SK, Peterson KA, Patterson K, Lambon Ralph MA, Rowe JB, 2023. Verbal fluency tests assess global cognitive status but have limited diagnostic differentiation: evidence from a large-scale examination of six neurodegenerative diseases. Brain Commun 5 (2), fcad042.
- Henry JD, Crawford JR, 2004. A meta-analytic review of verbal fluency performance in patients with traumatic brain injury. Neuropsychology 18 (4), 621–628.
- Henry JD, Crawford JR, 2004b. A meta-analytic review of verbal fluency performance following focal cortical lesions. Neuropsychology 18 (2), 284–295.
- Henry JD, Crawford JR, 2004c. Verbal fluency deficits in Parkinson’s disease: a meta-analysis. J. Int. Neuropsychol. Soc 10 (4), 608–622.
- Henry JD, Crawford JR, 2005. A meta-analytic review of verbal fluency deficits in schizophrenia relative to other neurocognitive deficits. Cogn. Neuropsychiatry 10 (1), 1–33.
- Henry JD, Crawford JR, Phillips LH, 2004. Verbal fluency performance in dementia of the Alzheimer’s type: a meta-analysis. Neuropsychologia 42 (9), 1212–1222.
- Henry JD, Crawford JR, Phillips LH, 2005. A meta-analytic review of verbal fluency deficits in Huntington’s disease. Neuropsychology 19 (2), 243–252.
- Hills TT, Jones MN, Todd PM, 2012. Optimal foraging in semantic memory. Psychol. Rev 119 (2), 431–440.
- Hodges JR, Patterson K, Oxbury S, Funnell E, 1992. Semantic dementia: progressive fluent aphasia with temporal lobe atrophy. Brain 115, 1783–1806.
- Itaguchi Y, Waterloo K, Johnson SH, Rodríguez-Aranda C, 2025. Understanding the semantic organization of animal fluency in mild Alzheimer’s disease through time-course analysis and LDA topic modeling. 10.1016/j.neuropsychologia.2025.109126.
- Kertesz A, Appell J, Fisman M, 1986. The dissolution of language in Alzheimer’s disease. Can. J. Neurol. Sci 13 (S4), 415–418.
- Lashley KS, 1951. The problem of serial order in behavior. In: Jeffress LA (Ed.), Cerebral Mechanisms in Behavior. Wiley, New York, pp. 112–131.
- Lenio S, Lissemore FM, Sajatovic M, Smyth KA, Tatsuoka C, Woyczynski WA, Lerner AJ, 2016. Detrending changes the temporal dynamics of a semantic fluency task. Front. Aging Neurosci 8 (252), 1–9.
- Levenshtein VI, 1966. Binary codes capable of correcting deletions, insertions, and reversals. Sov. Phys. Dokl 10 (8), 707–710.
- Linz N, Fors KL, Lindsay H, Eckerström M, Alexandersson J, Kokkinakis D, 2019. Temporal analysis of the semantic verbal fluency task in persons with subjective and mild cognitive impairment. Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology.
- Marczinski CA, Kertesz A, 2006. Category and letter fluency in semantic dementia, primary progressive aphasia, and Alzheimer’s disease. Brain Lang. 97 (3), 258–265.
- Meyer DE, Schvaneveldt RW, 1971. Facilitation in recognizing pairs of words: evidence of a dependence between retrieval operations. J. Exp. Psychol 90, 227–234.
- Mueller KD, Koscik RL, LaRue A, Clark LR, Hermann B, Johnson SC, Sager MA, 2015. Verbal fluency and early memory decline: results from the Wisconsin registry for Alzheimer’s prevention. Arch. Clin. Neuropsychol 30 (5), 448–457.
- Mummery CJ, Patterson K, Hodges JR, Wise RJ, 1996. Generating ‘tiger’ as an animal name or a word beginning with T: differences in brain activation. Proc. Roy. Soc. Lond. B Biol. Sci 263 (1373), 989–995.
- Needleman SB, Wunsch CD, 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol 48 (3), 443–453.
- Pakhomov SV, Hemmy LS, 2013. A computational linguistic measure of clustering behavior on semantic verbal fluency task predicts risk of future dementia in the Nun Study. Cortex.
- Papp KV, Mormino EC, Amariglio RE, Munro C, Dagley A, Schultz AP, Johnson KA, Sperling RA, Rentz DM, 2016. Biomarker validation of a decline in semantic processing in preclinical Alzheimer’s disease. Neuropsychology 30 (5), 624.
- Pencina MJ, D’Agostino RB Sr., D’Agostino RB Jr., Vasan RS, 2008. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat. Med 27 (2), 157–172; discussion 207–212.
- Pennington J, Socher R, Manning CD, 2014. GloVe: Global Vectors for Word Representation.
- Raoux N, Amieva H, Le Goff M, Auriacombe S, Carcaillon L, Letenneur L, Dartigues J-F, 2008. Clustering and switching processes in semantic verbal fluency in the course of Alzheimer’s disease subjects: results from the PAQUID longitudinal study. Cortex 44, 1188–1196.
- Rittman T, Ghosh BC, McColgan P, Breen DP, Evans J, Williams-Gray CH, Barker RA, Rowe JB, 2013. The Addenbrooke’s Cognitive examination for the differential diagnosis and longitudinal assessment of patients with parkinsonian disorders. J. Neurol. Neurosurg. Psychiatry 84 (5), 544–551.
- Rogers SL, Friedman RB, 2008. The underlying mechanisms of semantic memory loss in Alzheimer’s disease and semantic dementia. Neuropsychologia 46 (1), 12–21.
- Rosen VM, Sunderland T, Levy J, Harwell A, McGee L, Hammond C, Bhupali D, Putnam K, Bergeson J, Lefkowitz C, 2005. Apolipoprotein E and category fluency: evidence for reduced semantic access in healthy normal controls at risk for developing Alzheimer’s disease. Neuropsychologia 43, 647–658.
- Rosser A, Hodges JR, 1994. Initial letter and semantic category fluency in Alzheimer’s disease, Huntington’s disease, and progressive supranuclear palsy. J. Neurol. Neurosurg. Psychiatry 57 (11), 1389–1394.
- Tipper SP, Cranston M, 1985. Selective attention and priming: inhibitory and facilitatory effects of ignored primes. Q. J. Exp. Psychol 37A (4), 591–611.
- Tröger J, Linz N, König A, Robert P, Alexandersson J, Peter J, Kray J, 2019. Exploitation vs. exploration – computational temporal and semantic analysis explains semantic verbal fluency impairment in Alzheimer’s disease. Neuropsychologia 131, 53–61.
- Troyer AK, 2000. Normative data for clustering and switching on verbal fluency tasks. J. Clin. Exp. Neuropsychol 22 (3), 370–378.
- Troyer AK, Moscovitch M, Winocur G, 1997. Clustering and switching as two components of verbal fluency: evidence from younger and older healthy adults. Neuropsychology 11 (1), 138–146.
- Troyer AK, Moscovitch M, Winocur G, Alexander MP, Stuss D, 1998a. Clustering and switching on verbal fluency: the effects of focal frontal- and temporal-lobe lesions. Neuropsychologia 36 (6), 499–504.
- Troyer AK, Moscovitch M, Winocur G, Leach L, Freedman M, 1998b. Clustering and switching on verbal fluency tests in Alzheimer’s and Parkinson’s disease. J. Int. Neuropsychol. Soc 4 (2), 137–143.
- Vaughan RM, Coen RF, Kenny RA, Lawlor BA, 2016. Preservation of the semantic verbal fluency advantage in a large population-based sample: normative data from the TILDA Study. J. Int. Neuropsychol. Soc 22, 570–576.
- Verma M, Howard RJ, 2012. Semantic memory and language dysfunction in early Alzheimer’s disease: a review. Int. J. Geriatr. Psychiatr 27 (12), 1209–1217.
- Vonk JMJ, Flores RJ, Rosado D, Qian C, Cabo R, Habegger J, Louie K, Allocco E, Brickman AM, Manly JJ, 2019. Semantic network function captured by word frequency in nondemented APOE epsilon4 carriers. Neuropsychology 33 (2), 256–262.
- Yucebas D, Fox-Fuller JT, Badillo Cabrera A, Baena A, Pluim McDowell C, Aduen P, Vila-Castelar C, Bocanegra Y, Tirado V, Sanchez JS, Cronin-Golomb A, Lopera F, Quiroz YT, 2024. Associations of category fluency clustering performance with in vivo brain pathology in autosomal dominant Alzheimer’s disease. J. Int. Neuropsychol. Soc 30 (1), 77–83.
- Zemla JC, Austerweil JL, 2017. Modeling semantic fluency data as search on a semantic network. Cogsci 2017, 3646–3651.
- Zemla JC, Gooding DC, Austerweil JL, 2023. Evidence for optimal semantic search throughout adulthood. Sci. Rep 13 (1), 22528.
- Zhao Q, Guo Q, Hong Z, 2013. Clustering and switching during a semantic verbal fluency test contribute to differential diagnosis of cognitive impairment. Neurosci. Bull 29, 75–82.
- Zipf GK, 1946. The psychology of language, pp. 332–341.
